WEBVTT

00:00:09.029 --> 00:00:11.490
Hey everyone, welcome back to Data and AI with

00:00:11.490 --> 00:00:14.289
Mukundan. I'm your host Mukundan Sankar and today

00:00:14.289 --> 00:00:17.289
we're going to go deep on a topic that's

00:00:17.289 --> 00:00:19.390
very close to my heart. Something that could

00:00:19.390 --> 00:00:23.769
completely change how we share our work as data

00:00:23.769 --> 00:00:26.429
analysts, data scientists or machine learning

00:00:26.429 --> 00:00:31.230
professionals. So we're going to be talking about

00:00:31.230 --> 00:00:36.210
automating the creation of engaging animated

00:00:37.200 --> 00:00:41.200
And even beautifully designed presentations that

00:00:41.200 --> 00:00:44.979
are straight from your Jupyter notebooks. And

00:00:44.979 --> 00:00:51.380
we use AI for this, right? This is not just about

00:00:51.380 --> 00:00:55.280
saving time. This is about telling better stories

00:00:55.280 --> 00:01:00.420
with your data. And more importantly, this is

00:01:00.420 --> 00:01:05.719
about making your work shine in interviews. And

00:01:05.719 --> 00:01:10.400
even in team reviews, it could be even if you

00:01:10.400 --> 00:01:15.219
want to do a public talk. So let's just dive

00:01:15.219 --> 00:01:20.180
right in. So here's the problem that we all faced

00:01:20.180 --> 00:01:22.319
early on. And this is something I've personally

00:01:22.319 --> 00:01:26.359
faced. I'm hoping if this is something that you

00:01:26.359 --> 00:01:28.519
faced as well, then this is something that can

00:01:28.519 --> 00:01:32.599
work for you. So I want to take you back to...

00:01:32.969 --> 00:01:35.209
when I was just starting out in the data science

00:01:35.209 --> 00:01:40.310
and analytics field. Like many of you, I was building

00:01:40.310 --> 00:01:43.310
all kinds of projects in Jupyter notebooks. You

00:01:43.310 --> 00:01:45.530
know, when you first started out, you're probably

00:01:45.530 --> 00:01:48.310
thinking, like, what kind of projects would stick,

00:01:48.310 --> 00:01:55.689
what would get you hired. So I was doing as

00:01:55.689 --> 00:01:58.829
most of my peers were, and I'm sure maybe you

00:01:58.829 --> 00:02:02.200
probably, you know, relate to this, trying to see

00:02:02.200 --> 00:02:04.900
what kind of projects, maybe on Kaggle.com or

00:02:04.900 --> 00:02:07.299
could be some, you know, government websites.

00:02:07.859 --> 00:02:09.780
You're trying to find data sets to work with.

00:02:10.319 --> 00:02:14.039
And then you analyze this data set in Jupyter

00:02:14.039 --> 00:02:18.219
Notebooks, right? And it could be something like

00:02:18.219 --> 00:02:20.500
a customer churn model that you're trying to

00:02:20.500 --> 00:02:23.560
build or time series forecasting. So these were

00:02:23.560 --> 00:02:24.979
some of the models that I was trying to build.

00:02:26.219 --> 00:02:31.000
So I had the code, the insights, even the charts,

00:02:31.000 --> 00:02:37.000
right? Even the, uh, you know, graphs, all

00:02:37.000 --> 00:02:40.280
of those were ready. But something still was not,

00:02:40.280 --> 00:02:44.319
um, you know, adding up, right? Something was still

00:02:44.319 --> 00:02:48.360
missing. And when it came time to share that work,

00:02:48.360 --> 00:02:52.919
whether in an interview or with a stakeholder,

00:02:52.919 --> 00:02:59.530
I struggled, because the notebook that I had built,

00:02:59.530 --> 00:03:04.629
it just wasn't built for storytelling. It was

00:03:04.629 --> 00:03:06.689
great for exploration, you know, the exploratory

00:03:06.689 --> 00:03:09.590
analysis, great for iteration, all the machine

00:03:09.590 --> 00:03:13.810
learning tasks, data visualization. But for someone

00:03:13.810 --> 00:03:17.889
unfamiliar with the analysis, well, it was overwhelming,

00:03:18.250 --> 00:03:21.909
right? All they saw was just code, maybe a chart

00:03:21.909 --> 00:03:24.830
or two, if they had scrolled enough. But

00:03:24.830 --> 00:03:30.750
there was no flow as such, no explanation, and

00:03:30.750 --> 00:03:39.650
no guidance. So I started creating these slides

00:03:39.650 --> 00:03:49.610
manually: copying markdown, exporting charts, rewriting

00:03:49.610 --> 00:03:56.879
speaker notes. And it was just a nightmare. I'm

00:03:56.879 --> 00:03:59.879
sure you know that feeling, especially if you've

00:03:59.879 --> 00:04:03.259
worked in this. That's when I thought, what if

00:04:03.259 --> 00:04:09.939
I could just upload my notebook and get some

00:04:09.939 --> 00:04:15.560
well-structured presentation with slides, with

00:04:15.560 --> 00:04:20.839
speaker notes and visuals, well, instantly, right?

00:04:24.459 --> 00:04:28.800
So that brings me to the vision of my app. So

00:04:28.800 --> 00:04:31.100
that's the idea behind this app that I built

00:04:31.100 --> 00:04:34.339
here. It's a Streamlit app. Again, Streamlit

00:04:34.339 --> 00:04:36.839
is a Python-based library, something that I

00:04:36.839 --> 00:04:40.139
use time and again for my projects. And you can

00:04:40.139 --> 00:04:43.240
view some of my other episodes where some of

00:04:43.240 --> 00:04:45.360
my recent episodes, actually, they all talk about

00:04:45.360 --> 00:04:48.300
me using the Streamlit library. It's like a front-end

00:04:48.300 --> 00:04:53.000
Python-based library, if you don't know

00:04:53.000 --> 00:04:56.379
how to use JavaScript, which I don't. So Streamlit

00:04:56.379 --> 00:04:58.759
is, like, Python-based, so that makes it very easy

00:04:58.759 --> 00:05:01.759
to use, right? So it's a Streamlit app where you

00:05:01.759 --> 00:05:04.939
upload a Jupyter notebook and it does all the

00:05:04.939 --> 00:05:10.040
heavy lifting for you. And here's what it gives

00:05:10.040 --> 00:05:15.779
you. It gives you, like, a PowerPoint file with

00:05:15.779 --> 00:05:23.300
slides, and each slide has a clean title, AI-generated

00:05:23.300 --> 00:05:27.439
content as well, and it gives you bullet points.

00:05:27.439 --> 00:05:35.040
So, um, the main aim is also to give you some speaker

00:05:35.040 --> 00:05:39.339
notes, which can guide the way you speak, to guide

00:05:39.339 --> 00:05:43.579
your narration, right? And now, the big one:

00:05:43.579 --> 00:05:46.860
it gives you AI-generated visual diagrams like

00:05:46.860 --> 00:05:50.379
flowcharts, process trees, and conceptual graphics.

00:05:50.379 --> 00:05:54.759
So, well, instead of opening PowerPoint and starting

00:05:54.759 --> 00:06:00.079
from a blank canvas, you now get, like, 70 to 80%

00:06:00.079 --> 00:06:03.220
of your slide deck already done. All from just

00:06:03.220 --> 00:06:05.920
your notebook, right? You still tweak it, of

00:06:05.920 --> 00:06:09.779
course, and add your own polish. But the structure

00:06:09.779 --> 00:06:13.060
is there. The story is there. The visuals are

00:06:13.060 --> 00:06:17.079
there. And sometimes it may not even give you

00:06:17.079 --> 00:06:19.379
visuals, but I think the idea is at least it'll

00:06:19.379 --> 00:06:22.649
give you ideas to generate those visuals. And

00:06:22.649 --> 00:06:24.910
that's just as important, because you can create

00:06:24.910 --> 00:06:26.709
those visuals, but you don't know what exactly

00:06:26.709 --> 00:06:30.089
to create. Um, and that's where you can save a

00:06:30.089 --> 00:06:33.170
lot of time, by just knowing what to create, right?

00:06:33.170 --> 00:06:37.410
Um, and you're just, like, replicating, uh, what's

00:06:37.410 --> 00:06:42.290
being asked of you, right? Um, so let's break down

00:06:42.290 --> 00:06:45.389
the tech in this app. So let's talk about what's

00:06:45.389 --> 00:06:51.379
under the hood. First, uh, we use the nbformat library

00:06:51.379 --> 00:06:56.959
to parse the uploaded notebook. We extract the

00:06:56.959 --> 00:07:00.660
markdown cells and look for any charts that were

00:07:00.660 --> 00:07:05.240
generated using Matplotlib or Plotly. Then we

00:07:05.240 --> 00:07:09.240
feed those markdown cells into GPT-4, and here's

00:07:09.240 --> 00:07:13.139
where the magic starts. GPT then gives us a slide

00:07:13.139 --> 00:07:17.259
title, two to three bullet points for that slide,

00:07:18.000 --> 00:07:21.579
and speaker notes. And, oh, and of course, this is

00:07:21.579 --> 00:07:23.879
the cool part, a textual description of the diagram.
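
NOTE
A minimal sketch of the steps described above (pull the markdown cells out of the notebook, ask the model for a slide outline, read the reply), not the app's actual code. Assumptions: it reads the .ipynb file with the standard json module as a stand-in for the nbformat library named in the episode; the prompt wording and the JSON reply shape are hypothetical; SlideSpec, markdown_cells, build_prompt and parse_reply are illustrative names. The GPT-4 call itself is left out.
```python
import json
from dataclasses import dataclass
from typing import List
#
@dataclass
class SlideSpec:
    """One slide's worth of content: title, bullets, notes, diagram text."""
    title: str
    bullets: List[str]
    speaker_notes: str
    diagram_description: str
#
def markdown_cells(notebook_json: str) -> List[str]:
    """Return the text of every markdown cell in an .ipynb document."""
    nb = json.loads(notebook_json)
    texts = []
    for cell in nb.get("cells", []):
        if cell.get("cell_type") == "markdown":
            source = cell.get("source", "")
            # On disk, a cell's source is either a string or a list of lines.
            texts.append(source if isinstance(source, str) else "".join(source))
    return texts
#
def build_prompt(markdown: str) -> str:
    """Hypothetical prompt asking the model for a JSON slide outline."""
    return (
        "Turn this notebook section into one slide. Reply with JSON using the "
        'keys "title", "bullets" (2 to 3 items), "speaker_notes", and '
        '"diagram_description".\n\n' + markdown
    )
#
def parse_reply(reply: str) -> SlideSpec:
    """Parse the model's JSON reply (assumed shape) into a SlideSpec."""
    data = json.loads(reply)
    return SlideSpec(
        title=data["title"],
        bullets=list(data["bullets"]),
        speaker_notes=data["speaker_notes"],
        diagram_description=data["diagram_description"],
    )
```
In a real run, the string handed to parse_reply would be the model's answer to build_prompt for one markdown cell; asking for JSON is just one way to keep the four pieces separable.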

00:07:23.879 --> 00:07:27.100
So what I mentioned was, you know, there could

00:07:27.100 --> 00:07:28.779
be times when it gives the diagram, but it also

00:07:28.779 --> 00:07:31.779
can give you a textual description of what you

00:07:31.779 --> 00:07:37.259
wanted to create, right? Um, so then we

00:07:37.259 --> 00:07:39.480
take that diagram description and we feed it

00:07:39.480 --> 00:07:43.120
to the Graphviz library or another visual rendering

00:07:43.120 --> 00:07:48.259
library to actually create that diagram. So you

00:07:48.259 --> 00:07:51.740
have this textual description now, but you also

00:07:51.740 --> 00:07:54.319
want to create it. And that's when you can use

00:07:54.319 --> 00:07:58.399
the Graphviz library or any other visual rendering

00:07:58.399 --> 00:08:02.160
library, like I mentioned, to actually go ahead

00:08:02.160 --> 00:08:04.579
and create the diagram and not just suggest it,

00:08:04.620 --> 00:08:07.040
right? Suggesting, yeah, everybody has an opinion.
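
NOTE
A rough sketch of the "create it, not just suggest it" step for the simplest case, a linear pipeline. Assumptions: the diagram description has already been reduced to an ordered list of step names (the episode does not say how the app does this), and pipeline_to_dot is an illustrative name. The function only builds Graphviz DOT text using the standard library; rendering it to an image still needs Graphviz installed.
```python
from typing import List
#
def pipeline_to_dot(steps: List[str], name: str = "pipeline") -> str:
    """Build Graphviz DOT source for a left-to-right linear flowchart."""
    lines = [f"digraph {name} {{", "  rankdir=LR;", "  node [shape=box];"]
    for i, label in enumerate(steps):
        # Escape backslashes and double quotes so the label stays valid DOT.
        escaped = label.replace("\\", "\\\\").replace('"', '\\"')
        lines.append(f'  n{i} [label="{escaped}"];')
    # Connect each step to the one that follows it.
    for i in range(len(steps) - 1):
        lines.append(f"  n{i} -> n{i + 1};")
    lines.append("}")
    return "\n".join(lines)
#
if __name__ == "__main__":
    print(pipeline_to_dot(["Load data", "Train model", "Evaluate"]))
```
Saving the output as pipeline.dot and running "dot -Tpng pipeline.dot -o pipeline.png" would produce an image that can be placed on a slide.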

00:08:08.500 --> 00:08:13.579
What I think is really helpful here is that you

00:08:13.579 --> 00:08:18.540
are getting ideas, and you are also able to

00:08:18.540 --> 00:08:23.100
create them using these libraries. So, yeah, and of

00:08:23.100 --> 00:08:25.920
course it can handle flowcharts, decision trees,

00:08:25.920 --> 00:08:31.480
linear pipelines, and I'm experimenting with more

00:08:31.480 --> 00:08:35.039
SmartArt-style visuals. You know what SmartArt

00:08:35.039 --> 00:08:38.539
is, right? It's the one in PowerPoint. You get these

00:08:38.539 --> 00:08:44.759
amazing ideas to, you know, create

00:08:44.759 --> 00:08:48.100
diagrams from whatever text you have. So something

00:08:48.100 --> 00:08:52.580
like giving you process flow diagram ideas, how

00:08:52.580 --> 00:08:55.320
to use a process flow in PowerPoint. So that's

00:08:55.320 --> 00:08:57.200
something you can explore as well, but this is

00:08:57.200 --> 00:09:02.720
something that I'm exploring to use as a next

00:09:02.720 --> 00:09:09.279
step in this app. So then we also use the python-pptx

00:09:09.279 --> 00:09:13.940
library to stitch everything together into a

00:09:13.940 --> 00:09:18.500
.pptx presentation. So python-pptx, like, python

00:09:18.500 --> 00:09:25.940
hyphen pptx, it helps to convert to a .pptx presentation,

00:09:25.940 --> 00:09:32.039
right? With slides, content, images, and speaker

00:09:32.039 --> 00:09:38.519
notes. And it's all done in Python. Now, why does

00:09:38.519 --> 00:09:42.149
this matter? You might be thinking, okay, cool.

00:09:42.470 --> 00:09:45.690
Cool project, right? But what does this actually

00:09:45.690 --> 00:09:51.409
change? Here's the thing. Presentation is leverage.

00:09:53.049 --> 00:09:57.269
You might have the best model, the cleanest data

00:09:57.269 --> 00:10:01.830
pipeline, or even the sharpest insights. But

00:10:01.830 --> 00:10:05.549
if you can't communicate that clearly, you lose

00:10:05.549 --> 00:10:09.509
opportunities. In interviews, in team meetings,

00:10:09.879 --> 00:10:11.720
So let's say you go into these team meetings

00:10:11.720 --> 00:10:14.519
unprepared. You have the cleanest data pipeline,

00:10:14.759 --> 00:10:17.759
everything. You have solved the problem in your

00:10:17.759 --> 00:10:21.259
head. But if you don't present those correctly,

00:10:21.519 --> 00:10:24.279
what's the use? You're going into a client demo.

00:10:24.860 --> 00:10:29.620
Your ability to explain what you did, how you

00:10:29.620 --> 00:10:33.100
did it, and why it matters is really everything.

00:10:33.279 --> 00:10:36.100
That's how you can close deals. If you're not

00:10:36.100 --> 00:10:39.659
involved in that, how do you satisfy your client?

00:10:40.519 --> 00:10:42.360
Because if you're not able to explain what you

00:10:42.360 --> 00:10:46.799
did, they're not going to be happy with you, slash,

00:10:46.799 --> 00:10:53.019
re-sign with you, right? And this app is about

00:10:53.019 --> 00:10:57.320
reducing that friction. It gives you a strong

00:10:57.320 --> 00:11:00.419
foundation to build from, like a head start.

00:11:01.480 --> 00:11:05.919
And even more, it brings storytelling into your

00:11:05.919 --> 00:11:10.629
workflow. It encourages you to think about the

00:11:10.629 --> 00:11:13.970
narrative while you're still in the Jupyter Notebook.

00:11:15.429 --> 00:11:19.210
That's great, right? So let's look at some real-world

00:11:19.210 --> 00:11:21.669
use cases of how you can end up using

00:11:21.669 --> 00:11:26.769
this. So let's just say you're a student who's

00:11:26.769 --> 00:11:29.070
working on a Kaggle project. So this is something

00:11:29.070 --> 00:11:30.889
I mentioned at the beginning. Maybe you're just

00:11:30.889 --> 00:11:33.929
starting out in your journey, trying to figure

00:11:33.929 --> 00:11:35.789
out what to do, right? And you're a student,

00:11:36.779 --> 00:11:39.000
finding some cool data sets to work with. So, a Kaggle

00:11:39.000 --> 00:11:41.899
data set: you get, I mean, you use this Kaggle data

00:11:41.899 --> 00:11:45.419
set to analyze what you're trying to do, or analyze

00:11:45.419 --> 00:11:51.679
the data set, and you produce some results. But

00:11:51.679 --> 00:11:54.940
now you are able to turn your notebook into a

00:11:54.940 --> 00:12:00.980
job-interview-ready presentation. And another

00:12:00.980 --> 00:12:05.360
use case could be a data analyst who can prepare

00:12:05.360 --> 00:12:11.240
stakeholder presentations faster, without

00:12:11.240 --> 00:12:16.379
copy-pasting plots all day, right? And a machine learning

00:12:16.379 --> 00:12:22.679
engineer can generate internal tech talks, explaining

00:12:22.679 --> 00:12:27.279
their model pipeline using real diagrams that

00:12:27.279 --> 00:12:32.500
you get out of this. So those are some real examples,

00:12:32.500 --> 00:12:36.610
I mean, real-world examples, which are applicable

00:12:36.610 --> 00:12:39.750
for this, right? Like, this is how it can be used

00:12:39.750 --> 00:12:45.830
in the real world. Um, so, like, I already

00:12:45.830 --> 00:12:47.809
gave you, like, an indication of where I want to

00:12:47.809 --> 00:12:49.950
take this app, so I just want to talk about that:

00:12:49.950 --> 00:12:53.889
what are the future plans for this? So I am not

00:12:53.889 --> 00:12:56.009
stopping here, and I hope you are not

00:12:56.009 --> 00:12:58.950
either. I hope you're trying to use this with

00:12:58.950 --> 00:13:03.340
me. Um, a link to the blog I've already written,

00:13:03.460 --> 00:13:06.639
which has the full code, will be pasted in the

00:13:06.639 --> 00:13:08.840
show notes. And here's what I'm building next.

00:13:09.419 --> 00:13:12.779
Support for more diagram types like mind maps,

00:13:13.039 --> 00:13:16.360
Venn diagrams, org charts, organization charts,

00:13:16.480 --> 00:13:19.940
right? And connecting with Napkin.ai or DiagramGPT

00:13:19.940 --> 00:13:25.299
for richer illustrations. Integrating

00:13:25.299 --> 00:13:28.340
with Google Slides so you don't even need PowerPoint.

00:13:30.720 --> 00:13:35.019
Editable slide themes, layout presets.

00:13:35.019 --> 00:13:41.960
Also trying to add voice narration generation

00:13:41.960 --> 00:13:47.039
for each slide. It's like a voiceover. And if you're

00:13:47.039 --> 00:13:49.179
listening and want to contribute, let's contribute,

00:13:49.179 --> 00:13:52.879
let's collaborate, right? Like, this project is

00:13:52.879 --> 00:13:54.919
open source. I mean, you'll have access to this,

00:13:54.919 --> 00:13:59.159
um, the code. The code will be very easy for you,

00:13:59.159 --> 00:14:01.100
and if you are able to build something cool

00:14:01.100 --> 00:14:04.759
with it, which I wasn't able to, just let me know.

00:14:04.759 --> 00:14:08.879
It'd be great for us to collaborate. Um, you know,

00:14:08.879 --> 00:14:11.259
I'm really excited for where this is

00:14:11.259 --> 00:14:14.139
going, and there are a lot of interesting opportunities

00:14:14.139 --> 00:14:21.600
here. So, to wrap it up: this isn't just about automating

00:14:21.600 --> 00:14:27.980
slide creation. This is about making it easier

00:14:27.980 --> 00:14:34.000
for all of us to share our value, to share the

00:14:34.000 --> 00:14:39.139
value of the work that we're doing. So if you're

00:14:39.139 --> 00:14:43.639
a data scientist, a data analyst, an ML engineer,

00:14:43.639 --> 00:14:48.840
or even a student just starting out in the AI space,

00:14:48.840 --> 00:14:52.539
right? If you've ever felt like, you know, you're

00:14:52.539 --> 00:14:56.559
stuck explaining your project, this tool

00:14:56.559 --> 00:15:00.820
is for you. And you can find the code, like I mentioned,

00:15:00.820 --> 00:15:04.080
in the blog post, which will be in the

00:15:04.080 --> 00:15:09.500
show notes. The Streamlit app screenshots, you

00:15:09.500 --> 00:15:12.519
will see in that blog post as well, and there's

00:15:12.519 --> 00:15:14.399
a full walkthrough in that blog post. So

00:15:14.399 --> 00:15:18.679
I'd love to hear how you use it or what features

00:15:18.679 --> 00:15:21.320
you want next, so just feel free to comment on

00:15:21.320 --> 00:15:27.620
that. You can reach me on, uh, the same blog post.

00:15:27.620 --> 00:15:29.500
It'll be, like, a Medium blog post. If you don't

00:15:29.500 --> 00:15:32.820
have a Medium account, you'll also, uh, get this

00:15:32.820 --> 00:15:36.440
on Substack. Substack is free; you don't need

00:15:36.440 --> 00:15:39.200
a paid membership for that. Um, thanks for tuning

00:15:39.200 --> 00:15:42.539
in, and until next time, just keep building, keep

00:15:42.539 --> 00:15:44.759
sharing, and let your work speak.
