WEBVTT

00:00:00.000 --> 00:00:02.720
Hey, I'm Mukundan. This show is about solving

00:00:02.720 --> 00:00:06.080
real problems with small, useful AI, things that

00:00:06.080 --> 00:00:09.220
you can actually use today. Each week, we pick

00:00:09.220 --> 00:00:12.279
one problem, build a simple workflow or tool,

00:00:12.400 --> 00:00:14.820
and talk through the decisions, what to automate,

00:00:15.060 --> 00:00:18.359
how to check quality, and how to make it reliable.

00:00:19.379 --> 00:00:22.760
If you're a builder, creator, or just AI curious,

00:00:23.120 --> 00:00:25.899
you'll leave with the steps that you can copy

00:00:25.899 --> 00:00:46.350
tonight. You know those quiet moments before

00:00:46.350 --> 00:00:49.210
a project begins, when the cursor blinks on a

00:00:49.210 --> 00:00:52.329
blank screen and you just sit there, waiting

00:00:52.329 --> 00:00:56.429
for the spark? That's how this all started. No

00:00:56.429 --> 00:01:00.350
big idea. No vision board. Just curiosity and

00:01:00.350 --> 00:01:04.250
a little frustration. I was looking at a dataset

00:01:04.250 --> 00:01:08.590
at work. A big one. Customer transactions across

00:01:08.590 --> 00:01:12.709
multiple months. And I realized something that...

00:01:12.810 --> 00:01:15.930
stopped me in my tracks. I didn't know what to

00:01:15.930 --> 00:01:20.370
ask. Sure, I could slice it by week, by product,

00:01:20.450 --> 00:01:25.810
by region as well. But that wasn't insight. That

00:01:25.810 --> 00:01:32.730
was just navigation. And it hit me. If I, someone

00:01:32.730 --> 00:01:37.230
who lives and breathes analytics, sometimes struggle

00:01:37.230 --> 00:01:40.049
to know where to begin, what chance do others

00:01:40.049 --> 00:01:41.870
have? Like people on the business side maybe,

00:01:41.989 --> 00:01:45.849
right? And that moment became the seed of an

00:01:45.849 --> 00:01:50.290
idea. Could I build an AI that thinks like an

00:01:50.290 --> 00:01:54.090
analyst? Not one that spits out charts, but one

00:01:54.090 --> 00:01:56.269
that asks better questions. Hey everyone, I'm

00:01:56.269 --> 00:01:59.250
Mukundan. And this is Data and AI with Mukundan.

00:01:59.790 --> 00:02:02.810
This show is about learning AI by building it.

00:02:03.349 --> 00:02:06.829
No jargon, no fluff, just honest stories, practical

00:02:06.829 --> 00:02:09.789
tools, and a bit of curiosity along the way.

00:02:11.340 --> 00:02:14.780
Today's story is special. It's the story of a

00:02:14.780 --> 00:02:19.180
tool that started as a side project and turned

00:02:19.180 --> 00:02:22.000
into something that taught me more about thinking

00:02:22.000 --> 00:02:25.479
than any analytics course could ever do that.

00:02:25.819 --> 00:02:29.080
And it's the story of the AI that thinks like

00:02:29.080 --> 00:02:35.099
an analyst and how I, by building it, changed

00:02:35.099 --> 00:02:39.719
how I see curiosity, data, and even... creativity

00:02:39.719 --> 00:02:43.539
itself. Let's look at the problem behind the

00:02:43.539 --> 00:02:46.620
problem. Here's what most people don't talk about.

00:02:47.120 --> 00:02:50.740
The hardest part about data analysis isn't SQL,

00:02:51.020 --> 00:02:55.699
Python, or data visualization. It's asking the

00:02:55.699 --> 00:02:58.879
right question. You can have all the best tools

00:02:58.879 --> 00:03:01.379
in the world, the Tableaus and the Looker, Power

00:03:01.379 --> 00:03:05.060
BI, whatever, right? But if you're solving the

00:03:05.060 --> 00:03:08.849
wrong problem, you're just wasting time. And

00:03:08.849 --> 00:03:13.050
I found myself doing that quite a bit. I'd jump

00:03:13.050 --> 00:03:16.530
into cleaning, filtering, building charts. And

00:03:16.530 --> 00:03:19.250
only halfway through, I'd realize that my question

00:03:19.250 --> 00:03:23.830
wasn't the one that mattered. That realization

00:03:23.830 --> 00:03:28.969
sent me down a big rabbit hole, right? And I

00:03:28.969 --> 00:03:33.129
started reading about exploratory data analysis.

00:03:33.789 --> 00:03:36.310
Asking friends in data science how they would

00:03:36.310 --> 00:03:39.509
approach it. The best analysts I knew weren't

00:03:39.509 --> 00:03:42.949
the fastest coders. They were the ones who would

00:03:42.949 --> 00:03:46.629
pause before touching the data. They'd say things

00:03:46.629 --> 00:03:49.870
like, let's define what success means first.

00:03:50.430 --> 00:03:53.150
Or what decision are we actually trying to make

00:03:53.150 --> 00:03:57.509
here? And that was it. That's what I wanted to

00:03:57.509 --> 00:04:02.189
teach my AI as well. Not to answer, but to ask.

00:04:03.439 --> 00:04:07.639
So I did what, you know, I would, I think most

00:04:07.639 --> 00:04:11.659
people in data would do. I opened my laptop and

00:04:11.659 --> 00:04:14.060
I mean, my most people, I mean, the people who

00:04:14.060 --> 00:04:18.019
would know how to use Python and more specifically

00:04:18.019 --> 00:04:22.740
Streamlit, which is like a front end streaming,

00:04:22.920 --> 00:04:26.180
front end web based library, which would help

00:04:26.180 --> 00:04:32.879
you to create these amazing. and short web applications,

00:04:33.040 --> 00:04:35.500
which is very easy to make. So I opened my laptop

00:04:35.500 --> 00:04:41.720
in this instance, started my streamlet, and wrote

00:04:41.720 --> 00:04:44.100
that first line of code. So when you build something

00:04:44.100 --> 00:04:47.319
like this, it's tempting to focus on the shiny

00:04:47.319 --> 00:04:52.680
part. The UI, the charts, the GPT output. But

00:04:52.680 --> 00:04:57.100
I wanted to focus on the experience. The first

00:04:57.100 --> 00:05:01.439
prototype that I had here was rough. You'd upload

00:05:01.439 --> 00:05:05.279
a CSV file and the app would use GPT -4 to scan

00:05:05.279 --> 00:05:08.139
the column names and generate about 10 questions

00:05:08.139 --> 00:05:12.279
about the data. At first, it was basic. What's

00:05:12.279 --> 00:05:14.699
the average? What's the trend? Nothing exciting.

00:05:15.079 --> 00:05:18.899
But then I started adding prompts that would

00:05:18.899 --> 00:05:24.459
mirror a human curiosity. Things like, which

00:05:24.459 --> 00:05:28.439
variable seems most unpredictable? If this were

00:05:28.439 --> 00:05:31.560
your business, what metric would you track weekly?

00:05:32.399 --> 00:05:37.740
Where might hidden seasonality exist? And suddenly

00:05:37.740 --> 00:05:41.139
the tone shifted. It didn't feel like a chatbot

00:05:41.139 --> 00:05:45.019
anymore. It felt like a mentor. The kind of curious

00:05:45.019 --> 00:05:48.379
teammate who nudges you towards better thinking.

00:05:49.160 --> 00:05:53.339
And one person who actually tested this said

00:05:53.339 --> 00:05:55.360
that it's like the app that is teaching me how

00:05:55.360 --> 00:05:59.410
to explore, not just analyze. That's when I knew

00:05:59.410 --> 00:06:01.310
that I was onto something. Let's look at that.

00:06:01.430 --> 00:06:05.389
You know, technically speaking, here's what happens

00:06:05.389 --> 00:06:09.069
really under the hood. The app reads the metadata,

00:06:09.470 --> 00:06:14.089
column names, types, maybe a small preview of

00:06:14.089 --> 00:06:18.329
values. Now, instead of sending the entire data

00:06:18.329 --> 00:06:22.829
set, which could expose the PII, it summarizes

00:06:22.829 --> 00:06:25.610
the schema, converts it into a structured context.

00:06:26.189 --> 00:06:30.310
and passes that to GPT -4. Then GPT crafts 10

00:06:30.310 --> 00:06:34.089
thoughtful domain -agnostic questions, tuned

00:06:34.089 --> 00:06:39.170
for exploration. Not answers, just sparks. Later

00:06:39.170 --> 00:06:43.250
versions added an objective field, a little text

00:06:43.250 --> 00:06:45.829
box where you could say, I'm exploring customer

00:06:45.829 --> 00:06:49.930
churn, or I want to find high -value users. The

00:06:49.930 --> 00:06:52.930
AI would then adapt its curiosity to match your

00:06:52.930 --> 00:06:58.149
intent. That was version 2. But by version 4,

00:06:58.310 --> 00:07:01.649
I was obsessed with more security. I tend to

00:07:01.649 --> 00:07:04.649
want people uploading data at all. So I built

00:07:04.649 --> 00:07:07.509
a no -upload version as well, wherein you paste

00:07:07.509 --> 00:07:10.930
the schema or sample rows locally. That converts

00:07:10.930 --> 00:07:14.189
it into embeddings and sends only abstracted

00:07:14.189 --> 00:07:18.069
info. Here's what, you're not having any PII

00:07:18.069 --> 00:07:21.029
at this point. No data leaks, no corporate nightmares.

00:07:21.709 --> 00:07:24.939
Because here's what can happen, right? You are

00:07:24.939 --> 00:07:29.959
feeding the AI some kind of sensitive information

00:07:29.959 --> 00:07:32.959
these days. And you may not realize that it may

00:07:32.959 --> 00:07:36.879
be more of a negligence case. You know, you're

00:07:36.879 --> 00:07:39.259
trying to paste something in ChartGPT or Cloud

00:07:39.259 --> 00:07:41.920
or whatever. And this is not an internal company

00:07:41.920 --> 00:07:44.699
tool. This could be something that you just open

00:07:44.699 --> 00:07:47.540
your regular ChartGPT and type in over there.

00:07:48.339 --> 00:07:50.879
But what you may not realize is sometimes you're

00:07:50.879 --> 00:07:54.790
training their models. And by providing the data,

00:07:54.889 --> 00:07:56.529
like let's just say you're working for a company

00:07:56.529 --> 00:07:58.850
and the company name appears over there in the

00:07:58.850 --> 00:08:01.209
data set. Yeah, you're probably trading the data

00:08:01.209 --> 00:08:05.529
for sure. And when you're doing that, it can

00:08:05.529 --> 00:08:08.189
be a problem for the company. The company would

00:08:08.189 --> 00:08:10.810
be looking at it very differently. Maybe they

00:08:10.810 --> 00:08:14.949
find you doing that and there could be consequences,

00:08:15.069 --> 00:08:19.149
which I don't think, I don't know if most companies

00:08:19.149 --> 00:08:22.170
have gotten to that stage yet, but. I feel like

00:08:22.170 --> 00:08:29.290
it might at some point. So the whole point is

00:08:29.290 --> 00:08:33.450
to avoid that kind of scenario. So you're just

00:08:33.450 --> 00:08:36.730
sending out a few samples and remove those PII's.

00:08:37.409 --> 00:08:41.029
And this is why it hit me that this isn't just

00:08:41.029 --> 00:08:44.710
like a coding challenge. It's about designing

00:08:44.710 --> 00:08:49.710
trust. Now, let's look at when AI became a mirror

00:08:49.710 --> 00:08:53.659
to me. Something strange started happening as

00:08:53.659 --> 00:08:57.000
I started testing it. Every time the AI asked

00:08:57.000 --> 00:09:00.460
a question, I felt this urge to defend my assumptions.

00:09:01.200 --> 00:09:03.740
It would say something like, why do you measure

00:09:03.740 --> 00:09:07.559
engagement that way? And I'd catch myself thinking,

00:09:07.700 --> 00:09:10.700
because that's how we've always done it. And

00:09:10.700 --> 00:09:14.980
that right there was the point. The app was making

00:09:14.980 --> 00:09:18.480
me reflect on my own thinking. It wasn't telling

00:09:18.480 --> 00:09:22.639
me that I was wrong. It was making me curious

00:09:22.639 --> 00:09:27.259
again. You know how as kids we ask why about

00:09:27.259 --> 00:09:30.240
everything? Why is the sky blue? Why does 2 plus

00:09:30.240 --> 00:09:34.559
2 make 4? Why can't I eat dessert first? Somewhere

00:09:34.559 --> 00:09:39.340
in adulthood, we lose that instinct. We replace

00:09:39.340 --> 00:09:43.600
curiosity with correctness. And this project

00:09:43.600 --> 00:09:49.809
reawakened that childlike curiosity in me. In

00:09:49.809 --> 00:09:52.370
a world of dashboards and KPIs, that's quite

00:09:52.370 --> 00:09:55.049
rare. The biggest lesson this AI taught me wasn't

00:09:55.049 --> 00:09:59.490
technical. It was philosophical. Curiosity isn't

00:09:59.490 --> 00:10:02.009
something you have. It's something you practice.

00:10:02.649 --> 00:10:05.850
Every time you face a new problem, curiosity

00:10:05.850 --> 00:10:09.190
is the courage to say, I don't know that yet.

00:10:09.490 --> 00:10:14.269
Let's find out. Now, in data analysis, this means

00:10:14.269 --> 00:10:18.559
resisting the urge to rush to a conclusion. It

00:10:18.559 --> 00:10:21.600
means sitting with uncertainty long enough to

00:10:21.600 --> 00:10:23.940
understand what the data is actually trying to

00:10:23.940 --> 00:10:27.460
tell you. And I started applying this outside

00:10:27.460 --> 00:10:31.799
of work too. Instead of reacting, I started asking

00:10:31.799 --> 00:10:35.220
questions in daily life. When something frustrated

00:10:35.220 --> 00:10:40.840
me, I'd ask, what else could this mean? Now,

00:10:40.860 --> 00:10:44.000
when I got feedback, I'd ask, what can I learn

00:10:44.000 --> 00:10:49.370
from this? That same mindset. Help me make my

00:10:49.370 --> 00:10:54.110
AI better. We often talk about teaching AI to

00:10:54.110 --> 00:10:57.370
think like humans. But what if we flipped it?

00:10:57.669 --> 00:11:02.190
What if the goal is to help humans think a little

00:11:02.190 --> 00:11:07.669
more like AI, patiently, systematically, with

00:11:07.669 --> 00:11:13.549
curiosity, but also humility? The line between

00:11:13.549 --> 00:11:16.919
human and machine curiosity is... thinner than

00:11:16.919 --> 00:11:20.679
it seems. Now, AI asks questions without ego.

00:11:21.600 --> 00:11:26.159
Humans sometimes, well, they don't. And maybe

00:11:26.159 --> 00:11:29.299
that's the balance we need in the age of intelligent

00:11:29.299 --> 00:11:33.220
systems. Not machines that replace us, but ones

00:11:33.220 --> 00:11:37.480
that remind us to stay curious. Now, when I built

00:11:37.480 --> 00:11:40.139
this project, I thought it was a data tool. But

00:11:40.139 --> 00:11:43.889
looking back, it was a reflection tool. It showed

00:11:43.889 --> 00:11:46.549
me that asking better questions isn't just the

00:11:46.549 --> 00:11:49.850
key to better data. It's the key to better decisions,

00:11:50.090 --> 00:11:52.789
better creativity, and maybe even better living.

00:11:52.950 --> 00:11:54.870
If there's one thing I want you to take away

00:11:54.870 --> 00:11:58.330
from this episode, it's this. Don't just build

00:11:58.330 --> 00:12:02.289
tools that save time. Build tools that make you

00:12:02.289 --> 00:12:07.269
curious again. Because curiosity is the hidden

00:12:07.269 --> 00:12:11.029
engine of every great analysis, every insight.

00:12:11.759 --> 00:12:15.720
and every invention. And if you ever feel stuck

00:12:15.720 --> 00:12:19.460
with data, with work, or with life, maybe what

00:12:19.460 --> 00:12:23.399
you need isn't another answer. Maybe what you

00:12:23.399 --> 00:12:27.759
need is a better question. Well, thanks for listening.

00:12:27.919 --> 00:12:30.519
You can try the AI app yourself. It's linked

00:12:30.519 --> 00:12:34.700
in the show notes. And if you linked... Well,

00:12:34.799 --> 00:12:37.179
I mean, if you like this show, if you like this

00:12:37.179 --> 00:12:39.960
story... share it with someone who's forgotten

00:12:39.960 --> 00:12:42.179
what curiosity feels like before you go i want

00:12:42.179 --> 00:12:44.320
to take this time to thank my affiliate partners

00:12:44.320 --> 00:12:48.019
riverside fm which i use for recording my podcast

00:12:48.019 --> 00:12:51.799
with its amazing ai editing features where i

00:12:51.799 --> 00:12:55.080
can edit my podcast and remove any kind of cider

00:12:55.080 --> 00:12:58.440
.ai for any research i wanted to do with the

00:12:58.440 --> 00:13:02.019
topics i want to talk about and of course my

00:13:02.019 --> 00:13:06.639
podcast hosting platform rss where i host my

00:13:06.639 --> 00:13:09.340
podcast and distribute it to multiple channels

00:13:09.340 --> 00:13:13.100
and using their dynamic ads feature anybody can

00:13:13.100 --> 00:13:15.620
get paid if you just have 10 downloads a month

00:13:15.620 --> 00:13:17.919
so just 10 downloads a month is all you need

00:13:17.919 --> 00:13:23.759
to start making money and well that's it from

00:13:23.759 --> 00:13:26.379
this episode i'm mukundan and this is data and

00:13:26.379 --> 00:13:28.980
ai with mukundan see you next hey it's mukundan

00:13:28.980 --> 00:13:31.679
if this episode helped you two tiny favors that

00:13:31.679 --> 00:13:34.360
make a huge difference rate the show five stars

00:13:34.990 --> 00:13:37.769
On Spotify, you can just open the show page and

00:13:37.769 --> 00:13:40.350
tap the star button. On Apple Podcasts, scroll

00:13:40.350 --> 00:13:42.690
to the bottom of the show page and tap 5 stars.

00:13:42.889 --> 00:13:45.850
Also, leave a one -to -line review on Apple Podcasts.

00:13:46.769 --> 00:13:49.309
Tell me one takeaway. I read every single one

00:13:49.309 --> 00:13:51.009
and it helps more people find the show.
