00:04
Hello and welcome to Learning from Machine Learning. On this episode, we have a very special guest, Ines Montani, the co-founder and creator of Explosion, the company behind spaCy, which is an open source library that every natural language processing practitioner should know. She's been the keynote speaker at many different Python and data science conferences around the world.

00:30
Overall, just a rock star and inspiration in the field. Ines, thank you so much for joining me. Yeah, thanks. Thanks for having me. I'll start off with this. Did you ever expect to be the CEO of a software company like Explosion? No. I had no idea what I was ever gonna be. I think even when I was younger, I kind of had no real concept of what it would actually be like to have like a job.

01:00
Um, and yeah, definitely if you told me that this is what I'll be doing, I, yeah, I would have been very confused. Um, so what initially attracted you to computer science, machine learning, Explosion, all of this? Yeah, I think it's kind of, yeah, the combination of computers and in my case, especially language and, um, teaching computers about language. So I've always programmed. I started out as a teenager

01:30
on the internet, making websites. Like I found out that Microsoft Word let you make websites and then you could put these on the internet. And there was some like web space included with our internet contract. And so I uploaded a website that was super exciting. And there was this whole online community of girls making websites, doing design that I was super active in. So I was mostly like an indoor teenager. That was my main hobby.

01:59
Um, and, um, yeah, so I didn't actually then go into computer science. I don't really have the classical tech background because I don't know, at the time I didn't really see myself as a programmer. Like the only frame of reference I had was like, I don't know, the guys in, I don't know, computer class, and I really didn't see myself as part of that. Um, so I did communication science, media science and linguistics because language was also something I was always into.

02:27
Um, I did Latin in school for like a really long time because people told me like, Hey, it's going to help you when you learn languages later on, which then I never really did. I just stayed with English and Latin, um, till the bitter end. So I didn't have to do, I think chemistry or physics. Um, I, you know, preferred Latin and yeah, so that's kind of, um, yeah, that's kind of my background. And then in NLP, I found a way to really combine.

02:57
all of the stuff I'm passionate about, um, and, you know, really making something useful for people. And, um, yeah, that's kind of how it started. When I met my co-founder, Matt, he had just been writing spaCy and was kind of, you know, right in the middle of it. And I found that really exciting. I learned that like, well, actually computers don't learn grammar. It's all statistical, which made it even more interesting in some other way. And there was like so much to innovate.

03:24
at the time, it was very early, most software was really just written for research and there was really this gap of, you know, a software tool that was really designed for practical use with like visualizations, good documentation and really, you know, a practical design. And that's kind of what we ended up doing with spaCy. Very cool. That's really exciting. Two fun facts.

03:52
When I was in college, I did engineering. And one of the reasons that I stuck with engineering was because I didn't need to take a language. I ended up taking more computer languages, which sort of count. And back, I don't even remember when, I guess starting middle school, they, you have to choose a language in the education system. So they would say like Spanish, French or Latin. I definitely did not sign up for Latin.

04:20
And my teacher pulled me over to the side and was like, I think it would be a good idea if you take Latin. Yeah, it's kind of a special type of person. Like I think there's something nerdy to Latin as well. Like, there are people I meet in our field who, yeah, did Latin for a very long time. Yeah, absolutely. I think.

04:49
Well, it helps you sort of break down words. It helps you, you know, understand, I guess, where some of the words come from. And then I found there's just like a lot of memorization when I was doing Latin. Yeah, yeah, we would also pretend to speak Latin to each other and like mess with the other kids who were doing French and like didn't get it and really thought we actually learned to speak Latin in Latin class.

05:19
Yes. It gives you the occasional Roman saying that you could throw around. But yeah, a dead language they say, no one's speaking it today, but it is the basis for a lot of the Romance languages, which is very nice. In the position that you were in, you were able to sort of, when you were probably developing websites back.

05:46
when you were younger, it was like HTML and CSS. That's what it was for me. And then we've seen sort of the evolution of the web into what it is today, which is using JavaScript frameworks and all of these things where a lot of the control for some of the aspects are maybe, you're kind of giving it up for some of like, things are more.

06:12
automated, if you know what I mean. But if you could speak a little bit to the changes that you've seen in web development. Yeah. I mean, back in the day, it was definitely, you could just look at the source code and, you know, really see how everything works and that's kind of, you know, not something you easily get these days. Um, and, you know, there is a lot more complexity around it, but it also is solving more problems like, um, you know, we've kind of come

06:37
full circle with like static site generation. Like our sites became so complex and slow that we now generate them statically. And it kind of reminds me of, yeah, back in the day I had this blog software called Greymatter. I don't know if anyone remembers this. Maybe people who listen to this do. And it was basically a CGI script that would then generate static sites. So it was like a super old school static site generator. And it's kind of, you know, I also use a static site generator now for my website but it's all.

07:06
different. But actually, I think you can really see a lot of parallels between like the journey of the web and how things have developed there and machine learning. Because, you know, it's the same, you know, considerations about like lowering the floor versus raising the ceiling. It's like easier than ever to start a website, like, your grandma can just do it using Squarespace or

07:36
you know, really easy to get started, but at the same time, there's a huge demand for web developers and there's constantly, um, you know, new developments, the technology is changing even, you know, browsers and all of that. So, um, you know, just because it's easy to make a website now doesn't mean web developers are obsolete. And similarly, just because it's easier and easier to do machine learning now.

08:01
doesn't mean we don't need machine learning developers anymore or developers in general. If anything, we need more of them because there's just so much more to do. Absolutely. I can definitely relate to that. Seeing the current status of machine learning and people are pushing AutoML and it's easy to create a model. You know what I mean? It's hard to get it into production, but

08:30
If you're using, well, I should say harder, but if you're using something like AutoML and everything goes smoothly, then you're all set, right? But once you do run into a problem, and you will eventually run into a problem because context changes, right? The uses of the model change over time. That's when you really need machine learning practitioners who have a deep understanding.

08:57
of the field and what's going on. You need to know. And also what they're doing like with machine learning because often, you know, you don't just do machine learning for the sake of machine learning. There's like a deeper thing you want to solve. And then it's like, you know, the hard part is in, you know, breaking down that business problem into something you can actually solve and really understanding the domain or like, yes, we all really want to neatly put the world into categories, but it just doesn't work like that. So I feel like, you know, by the time you get to the point

09:27
where you press train, most of the hard parts are already done. Right. The heavy lifting is done when you're up to the training part, right? Cause well, you assume that you already have your data set. Yeah. And then like nothing, you know, nothing works and your model isn't learning. And then you have to kind of, yeah, go back and figure out why. Yeah. Or the scariest thing, you get very high performance metrics, like too high.

09:54
Yeah, yeah, it's like, oh, 99% accuracy, you're like, oh shit, something's broken. That's something. Yeah. Or it's like, hey, if your evaluation is shit, like, cool. Yeah, you can always get like really good results. Yeah. Yeah, absolutely. So, you know, speaking of, you know, solving a business problem and applying natural language processing to, you know, businesses, that's, you know, spaCy and Explosion written all over it.

10:22
Me personally, I've used spaCy when I was really getting into natural language processing, say four or five years ago, I was watching one of your videos. And I was using displaCy, which is one of my favorite tools still. I think it's still incredible. Yeah, I had a lot of fun building that. And yeah. Yeah. Being able to visualize what you're doing.
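
For readers who have not seen the visualizer mentioned here, the following is a minimal sketch of rendering named entities with displaCy. It assumes spaCy v3 is installed. The example sentence, character offsets, and labels are made up for illustration, and manual mode is used so that no trained pipeline is required.

```python
from spacy import displacy

# Hand-written example data: in manual mode displaCy renders the entity
# offsets we supply, so no trained model is needed. Text and labels are
# illustrative only.
example = {
    "text": "Ines Montani co-founded Explosion in Berlin.",
    "ents": [
        {"start": 0, "end": 12, "label": "PERSON"},
        {"start": 24, "end": 33, "label": "ORG"},
        {"start": 37, "end": 43, "label": "GPE"},
    ],
    "title": None,
}

# Outside a notebook, render() returns the markup as a string that can be
# saved to an .html file and shared with non-technical stakeholders.
html = displacy.render(example, style="ent", manual=True, jupyter=False)
```

With a trained pipeline, the same call would typically take a processed `Doc` instead of the hand-written dictionary.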

10:48
The nice thing about it was that it was easy to explain it to other stakeholders, which was really cool. Because there's so much that we do in machine learning, it's kind of hard to explain it to other people, but I think that tool and, you know, other things in the spaCy universe, like you can sort of bring people in better, I think Prodigy, I mean, even though I guess it's really created for developers. If you want to be labeling different things, you can show them.

11:16
you know, you can show them the process of how it's done. I remember vividly the first time that I showed displaCy with named entity recognition to somebody who is doing media and things like that. His jaw dropped. He's like, this is exactly what I want to do. We need to be extracting all this information. So in a way, I guess, thank you. This is so nice. Like, it's always, of course, I know that like, hey,

11:46
People are using our software and I think they find it valuable, but it always kind of hits differently, like to actually talk to someone who's like, Oh, I'm using your software. And I'm like, Oh my God, you use my code. That's so cool. And also if you do open source, like a lot of what you see is mostly on the issue tracker when people have problems and nobody goes on the issue tracker and just says like, Hey, everything worked out. No problem. Like, you know, what's the point. It also clogs up an issue tracker. I'm not saying people should be doing this, but of course that means, yeah, you only get like.

12:15
you know, some sort of view of what people find difficult. And yeah, so it's really nice to hear from people who are like, yeah, it worked. Yeah, that's a new feature for GitHub. There should be a new one, right? Where you can give some claps or some positive feedback for the amazing work that people are doing in open source. And when you think about a, um, a library like spaCy, so

12:41
there weren't really like that many libraries that were out there that you could really use for production. I mean, I'm not even gonna get started on NLTK. Like, I mean, it just had a very different purpose. Like this was a good example of a library that was, you know, designed for research, teaching, you have a lot of different implementations and algorithms, and it's kind of just a very different focus. But yes, spaCy very consciously kind of went the other way. Yeah, took a very opinionated approach.

13:09
Yeah, which is great for the real world. So I've been able to use spaCy for everything from tokenization and lemmatization, breaking things down by sentences with the beautiful name Sentencizer. Yeah, we always have a lot of fun naming our components. Yes, no stemming, which I like that actually.
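
As a small illustration of the tokenization and sentence splitting mentioned here, the sketch below builds a blank English pipeline and adds the rule-based `sentencizer` component. It assumes spaCy v3 is installed; the example text is made up, and no trained model is downloaded.

```python
import spacy

# A blank pipeline contains only the language-specific tokenizer.
nlp = spacy.blank("en")

# The sentencizer splits sentences on punctuation using rules,
# so it works without any trained components.
nlp.add_pipe("sentencizer")

doc = nlp("spaCy splits text into tokens. It can also split it into sentences.")

# Sentences are exposed as spans via doc.sents; tokens via iteration.
sentences = [sent.text for sent in doc.sents]
first_tokens = [token.text for token in doc[:3]]

for sentence in sentences:
    print(sentence)
```

Lemmatization is left out of the sketch because it needs either a trained pipeline or additional lookup data.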

13:39
Yeah, all of the amazing new things that you guys have developed over the years. I mean, SpanCat is incredible. Um, and now with spacy-llm, um, it's really just amazing to be able to have a pipeline, um, that you can depend on, that you can extract information that you need, that you can, you know, process text, you know, any type of text in a way where you can extract meaningful.

14:08
information from it. So it's been very valuable for the work that we do at my company and the work that I do. Yeah, that's nice to hear. But yeah, I mean, this pipeline approach, I do think that's something, you know, that kind of makes spaCy special, that spaCy has always focused on, and this idea of like, hey, often what you're trying to do is a series of steps that you want to apply in order and these steps can be powered by pretty much anything. You might have

14:34
something really simple and rule-based, from good old regular expressions or spaCy's token rules where you can really match on different predictions by a model, or you might want to have like a regular deep learning model powering the component, or nowadays also a large language model via spacy-llm that basically behaves exactly like the regular named entity recognizer and lets you get started quickly. And it really

15:02
kind of solves that problem of getting to a prototype stage because before that, hey, it still needed like, yeah, serious annotation effort to get to something that might not even work, that you can improve. And so mixing and matching these different components and techniques I think is really important for practical use cases because yeah, a lot of them don't very neatly turn into like an end-to-end prediction problem that you can just solve.

15:29
It's almost always a combination including like really specific business logic. And we think that that's where the pipeline can really shine. Absolutely. I mean, like you were saying before, it always stems from the business understanding and understanding like what problem are you trying to solve? I think, and we'll get into it later, but like people are distracted a bit by like the shiny new object, right? But when it comes down to solving problems,

15:58
heuristics and regular expressions and rules. Yeah, they might not, you know, solve everything. And they might not be the most sophisticated and the newest thing out there. But it'll give you a really good start. And having that as part of your system is really the baseline. Like, you want to have, you know, you want to know what you have to beat like, you know, yes, you can train a fancy machine learning model. But if it doesn't perform better than some like dictionary lookups, then

16:26
what are you really doing? And I think also the ability to teach by example, and supervised learning, people often see the need for annotated examples as this obstacle that like, oh, we need to avoid at all costs when actually for a lot of use cases, especially practical ones where you know exactly what you want the computer to do. It's like an opportunity. Like, we have this new way of showing the computer what we want it to do by showing it examples where we know the answer. And that's like,

16:55
really cool and that's really effective in a lot of ways. If you have a good feeling for how to create these examples, this is still for a lot of use cases, like the absolute best, most straightforward way to get something done. Yeah, absolutely. I think sometimes people wanna find like a trick, like some trick to do it. I don't know, like.

17:24
use unsupervised learning or something like, but like just spend a week, you know, labeling data sometimes. And then you can do a really good job of creating a model, at least like same, like get the data and then use basic methods, get your baseline and see where you're at. You know, a scikit-learn, you know, support vector machine or, you know, one of the simpler, more traditional models, it's

17:52
has its own pipeline, which you can get into production much faster. And you don't have to be reliant on either an extremely large model, which has its own trade-offs or an API call, which, you know, you never, you kind of want to reduce API calls. Yeah. And also if you're really only interested in like a really small, specific thing, not even small, but like a specific thing, there's like so much, so much that you don't need of like a really massive model. And so I think.

18:22
the kind of the trick here is to see, hey, how can we kind of combine the best of both worlds and take what's there in these very large models and their representation and what they know about the world and the language and use that to then train a model that's more specific that knows a lot more about what you're trying to do. And a bit less about the language and the world. Yeah, absolutely. So having you know, someone who's

18:51
you know, co-founder of spaCy and now, you know, spaCy version three was released, say, about two years ago. How have the goals, if the goals have changed, or how has the design changed? How have you seen like the expansion and evolution of spaCy?

19:21
a given component that's like optimized for real world usage. So it needs to be accurate enough. It needs to be fast enough. Um, it needs to run on all kinds of Python versions and platforms and just have a, um, you know, unified API that's like easy to program and work with. So that's, that's always been the core philosophy. And we've kind of kept that up over the evolution of machine learning, like from linear models that were like super fast.

19:49
to deep learning, different types of model implementations, and now also large language models where you kind of have to do the same work in a way. Like you have to look at what's there in research, what works, how can we improve these methods? And before we did that for model implementations, and now we're doing that for prompts and really optimizing those for information extraction. And like, what do we need to do named entity recognition?

20:15
because even though, you know, if you just scroll through your LinkedIn feed, you might think like, oh, NLP is solved. Like, there are actually a lot of, you know, a lot of things that even we haven't quite figured out how to get that information out of even the largest large language models. So that's kind of always been what we've been doing. But of course, you know, the library has also changed over the years. Like we've improved a lot of things. We also implemented new components that kind of address different needs people have.

20:45
Like, people really want to extract spans from text and like arbitrary spans and a named entity recognizer that's like designed to be really sensitive to boundaries just isn't the right fit for that because it works so well because it's so sensitive to boundaries, which is great for person names, but not for arbitrary spans. So we built the SpanCat component. We also worked on other things like coreference resolution, which actually solves a lot of

21:13
problems in extracting relations from texts. Like you want to know which pronouns refer to, like which person and create these clusters of like, this is kind of the same person or object. So stuff like that. And I think also with like the growing complexity of machine learning, like one trade off, one challenge we've always faced is like on the one hand, we want it to be really easy to get started and have.

21:40
defaults that make sense out of the box. And actually, if you want to do NER, you shouldn't have to think about the model implementation and what to choose. It's just like, it should just work out of the box. But the library also needs to grow with you as your needs get more specific and you want to do more sophisticated things. And so, you know, we want everything to be customizable and like powerful, and it doesn't just mean simple.

22:09
like it also needs to have this extensibility. And in spaCy v3, we introduced the config files that really save all defaults and all settings for a model. And that you can kind of, yeah, exchange with your team and really make sure you run reproducible training runs. And also the project system, that's also actually something that's very popular whenever even we talk to companies or introduce them to it, it really like changes the workflow to have like,

22:38
project structure to have like a file that kind of works a bit like a Makefile and really orchestrate all the different steps that need to happen as part of a project, because that's also part of it. It's not just the machine learning model. It's also the whole workflow around it basically. Right. That makes a lot of sense. Yeah, I, you know, when you're a data scientist, machine learning engineer, whatever you want to call it.

23:06
software engineering best practices are so important and seeing a library like spaCy where it's clearly like thought about, you know, how these things are going to be done. And you can, you know, you could read it, you can understand it, you're able to modify it. And I guess it's sort of, it's a library that's meant for developers, you know, in a lot of, you know, I guess all libraries are, but it's like, it's really thought about.

23:34
made by developers, for developers. So you can take this pipeline and you can do so many things with it. I mentioned some of the use cases before, but I'm also thinking how I've done like language detection, using spaCy and the spaCy universe, something in the spaCy universe. I have to get more into those config files. I applied it to one of my more recent projects, like,

24:03
outside of spaCy, but yeah, spaCy's config files. Um, that's something that I'm writing down. We also have a, we released that config system as like a separate library called confection. So that was basically, we had it in spaCy before and people liked the concept. And so we were like, Hey, how can we, um, yeah, make this available as a more general purpose tool. So even if you want to use it for other projects and the nice thing about it is that it doesn't only have like values in kind of

24:33
Python's configparser format, you can also refer to registered functions. And so you can really build up this tree of objects like, Hey, here's this object that needs to be passed into that function as an argument and you can define all of it. It uses Pydantic under the hood for validation and, you know, type checking. We have a VS Code plugin for spaCy config files. So you can actually like,

24:57
you know, hover over like a variable or registered function and it tells you what it is. There's a lot of cool stuff you can do with modern Python. And I think especially this kind of, these improvements to the developer experience that close the gap between prototyping and production are really important. Like you don't want to have this workflow where, you know, it's fine to work in notebooks and experiment, but like if...

25:22
Yeah, you have this really rough prototyping workflow and then someone needs to take that and translate it for production that really holds teams back. And so if you can have a workflow that works for prototyping and that you can also ship to production, that's like a huge win. Yeah, that's the dream. And I mean, it's so tough, I think, because in machine learning, it's like in the data science in the

25:48
feasibility and exploration phase and tinkering phase of a project. Like you need to have all of this flexibility. And then experimentation is obviously very important, trying out different parameters. But then when you finally are settled on something, things can be a little bit more narrow, like more narrow. And you have to get it into production. It needs to work every time. Yeah.
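
To make the config system discussed above more concrete, here is a minimal sketch of a config that refers to a registered function and is then resolved into objects. It assumes spaCy v3 (which ships with Thinc's `Config` class) is installed. The function name `demo.make_greeter.v1`, the section names, and the values are hypothetical and exist only for this example.

```python
import spacy
from thinc.api import Config


# Hypothetical function, registered under a made-up name for this example.
# The type hints are used to validate the values coming from the config.
@spacy.registry.misc("demo.make_greeter.v1")
def make_greeter(name: str, excited: bool = False) -> str:
    return f"Hello, {name}{'!' if excited else '.'}"


CONFIG_TEXT = """
[settings]
name = "world"

[greeting]
@misc = "demo.make_greeter.v1"
name = ${settings.name}
excited = true
"""

# Parse the text; ${settings.name} is interpolated from the other section.
config = Config().from_str(CONFIG_TEXT)

# Resolving calls the registered function with the block's arguments and
# replaces the block with the function's return value.
resolved = spacy.registry.resolve(config)
print(resolved["greeting"])
```

Because the whole tree lives in one text file, the same file can be shared with a team and reused between a prototype and a production run, which is the workflow described in the conversation.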

26:14
Yeah. And it's like, yeah, it sucks if you then find out, oh, it doesn't actually, you know, work on those types of machines or it doesn't. Yeah. It's not fast enough. And then it's like, yep, that's hard to fix. If you know, you started off like not caring about this at all. Right. You run into some bug or I call it Python dependency hell.

26:35
Yeah, that's also where, like, there is a lot of thought that goes into that as well on our end. Like we actually, a lot of the libraries we depend on are our own. So we implement, like we have this library that handles serialization because we often ended up being like, I don't know, ending up with broken spaCy installations because of some dependency. And then there's some things we don't like. So we're like, okay, let's like

27:03
Maybe also take those that aren't maintained anymore and update them a bit and really control our own dependencies to prevent as much as possible that like people end up in these weird dependency hell scenarios because that really sucks. So what's next for spaCy and Explosion? How are you thinking about, you know, I guess, you know, where you are now and what expansions there would be? Yeah, so we definitely have...

27:31
um, some components in mind for spaCy that we want to develop for more, you know, practical tasks like around relation extraction. And also there are a lot of, um, you know, semantic things like there's, there's a lot of semantic role labeling. For example, there's like a lot of that stuff that's actually really, really useful and addresses a lot of the problems people have. But, um, it's also some of them are tricky to develop because, you know, we need good benchmark data sets and a lot of the stuff that comes out of

28:00
research isn't especially useful if we really want to find out, hey, does this solve real problems? So that's also why we like working together with companies occasionally and doing some of these consulting pipeline development projects, because we get to see what people are actually doing. And we also get to use our own library, which is really important because we have this whole document that we call collateral contributions that came out of,

28:30
where I don't know, yeah, you built something, you're like, why is this so shit? And who wrote this? And it's like, oh, me, yeah, okay, we should fix that. And there's a lot of, yeah, a lot of these things you really only realize when you use the library yourself. So yeah, there's definitely kind of components. There's also a lot more work around like large language models and really, you know, how to go beyond just like, hey, using an LLM as a like dialogue system or,

28:59
you know, kind of that as the last step. Like if you can have like this large language model that's surprisingly good at stuff, then we can also distill that down into something smaller, better, faster, private. And, you know, yes, that takes some work, but there are, like, use cases where this is absolutely worth it. And so, yeah, we're kind of trying to take it like one step further and not just stop at like, oh yeah, the LLM is like decent given it wasn't trained for it and it's...

29:28
Yeah. Kind of cheap via an API. So that's like, yeah, that's kind of, that's not good enough. And those are the use cases we're really interested in. Yeah. LLMs, they're kind of just like pretty good at everything. Yeah. But yeah, and I think a lot of these workflows are workflows of training task-specific models that I described of like, hey, using an LLM to label some data to then train, um, you know, a smaller,

29:55
um, you know, BERT-based model, um, from those examples that really just targets that one specific thing. Um, and it's better, faster, um, more accurate and so on, like those are tricks that like people are using, but it's still pretty difficult for people who are just getting started. Like it is easy to just, you know, use ChatGPT, but, um, we're really trying to basically make these workflows, these more advanced workflows more accessible.

30:25
to people who maybe don't come in with this like deep machine learning expertise. And that's actually also where Prodigy and Prodigy Teams fits in. So Prodigy Teams, we just started our private beta and that's really a SaaS product that takes spaCy and Prodigy into the cloud and enables these types of workflows. Because you know, you want to experiment together, you want to take a, you know, start with the

30:52
collect some data, train models, try out different things, evaluate, and also label data with the help of all kinds of different automations from models to rules. Um, and do that collaboratively in a team with different people and kind of have this shared state. Um, and that's really where Prodigy Teams fits in and also providing the data privacy, which is, um, you know, a big part of our.

31:21
or core of our stack that, hey, you can run it yourself. And it will have this cluster component that you can run in your own cloud. So no data has to ever go to our servers. And yeah, and also I believe it shouldn't, like a lot of, so many of our users are working in finance, medical, insurance, lots of sensitive use cases where, yeah, you can't just send that data to an API or upload it to some random startup's cloud. And-

31:51
we think that's fine. And yeah, so we built an architecture that supports that. Very cool. So that's, you know, it's like, it's similar to what you were saying before, you know, designing solutions for, you know, many different companies. And then what I found, you know, the startup that I'm at, we are trying to develop systems that can work for

32:17
many different companies who are in different industries with different contexts, with different corpuses of data, with different terminology that's being used. But yeah, it's sort of creating this system that can be used for anyone. So it's like creating this generalized thing that can be applied to particular companies. So it sounds like a solution like that where you can combine spaCy and Prodigy. That's a system.

32:46
Things. Yeah. It's also because, yeah, yeah. Because you're hosting this cluster. You could run Python on it. It, you know, it's automatically turned into a form in the web UI. So you kind of have all the advantages of like a SaaS product, but still have retained that scriptability. And yeah, exactly. I think it's also important to meet companies where they are. And that's also something we've always wanted to do. Like even, yes, there are like super hot, visionary ideas and a lot of those are cool and we also have some of those. But, um, I think,

33:16
you know, it's also important to build something that can be useful for people right now and help them in the transition into maybe some other technology. Instead of saying like, oh, here we built some, you know, something completely different that, you know, you all should adopt now, which, you know, really isn't, you know, what fits into people's workflows. Or it's like how we call it, like it's fine. We often, we do sometimes reinvent the wheel, but like you shouldn't reinvent the whole road.

33:45
and kind of try to own the whole thing. And that's kind of how we see it. I think, yeah, it's fine to reinvent the wheel sometimes. Sometimes. It's funny that you say that. I was just talking about that with someone, but you mentioned and you've spoken about some really interesting things about creating a software startup. So you gave a keynote, I know.

34:14
It was probably now four or five years ago. Five years ago. That I loved, that I love. I listened to it again. How to ignore most startup advice and build a decent software business. What a title for a talk. Now, you know, five years later, can we revisit some of the, you know, the misconceptions or, you know, how you still feel about the misconceptions about running a startup?

34:43
Yeah, I mean, I think it's still, it's kind of, you know, these ideas from that talk are, you know, really kind of the DNA, um, you know, of our company. And a lot of that has also carried through, even though, you know, we've grown in between and, and, um, you know, the, the team's changed. We also, we are working on a SaaS product. So we did take some venture funding for, you know, that product and there are, you know, pros and cons of that. But I still think, for example, you know, one move we made very early on in the company is to.

35:11
say, Hey, let's release a paid product kind of as this transition from, you know, spaCy, which is fully open source to, um, you know, something that's more of a premium thing and, you know, get some revenue from that and see like, Hey, does this work? Is this viable? Can this be, um, you know, sustainable? And that's, um, because yeah, and when we were starting out, we were like, yes, there are reasons you might need to run at a loss and, you know, it makes sense if you need upfront capital, but it doesn't.

35:41
know, it doesn't have to be the status quo. You can also do stuff that makes sense, create value and people will pay for that. And companies like open source, not because it's free, it's because it's open and you can work with it and script with it. And that's, that's what it's about. And yeah, there's definitely, yeah, there are definitely ways in which, yeah, we've done a lot of things very differently from, you know, your typical startup. We're like, you know, we're also quite reasonable.

36:09
I mean, you know, we often, I think we're seen as, you know, in general, if you, you know, on the scale of startups, we're like, you know, you know, we may be a bit too reasonable sometimes and not visionary enough, but, you know, we also want to do a thing that makes sense. And that's important. And especially at the moment, we're also looking at more ways to really make open source development more sustainable and really self-funded because it's like, you know, we don't want to have that dependent on.

36:39
eventual outcome. That's not, that's not great. And there are ways, um, this can be done. So that's something, you know, we're exploring. Yeah. The, um, you know, the business model of startups, there's a tendency, I would say a couple of years ago, um, where it was all about growth, right? Growth, growth, growth, growth, growth. And that comes at the trade-off of sustainability.

37:07
Right? And that's something that you, you know, were, you know, that you're, you're mentioning and sustainability, working for sustainability is a business model that will work in any climate, right? In, in, in any economic climate, whether VC is hot or VC, you know, VC funding is cold, whether, you know, the economy is hot or the economy is cold, which I guess is sort of related, but you can't go wrong.

37:34
Yeah. Or what the current trends are. Like there are, you know, it's, that's also something where it's like, in a way, yes, if you raise funding, you're always sort of, you know, dependent on what, you know, whether VCs like you or what they're into at the moment, but it's also, you know, there have been periods where it's like, oh, this is kind of the new hot trend and that's what you should be doing. And we never wanted to be in a position where we really have to, yeah, completely change.

37:59
what we're doing and pivot all the time to meet whatever people find attractive at the moment. Like we want to do what makes sense for our customers and users. And I think also there's so much value that can easily be destroyed if you really go down that narrow path where it's like either big or bust and where there's kind of no in between.

38:26
or way out to say, okay, even if you have to downsize and you can be profitable again in some ways. And it's not like, oh, you didn't bet everything on your growth. And then if it fails, then well, that's like all this value that you've created is like just gone. And yeah, we've always found that frustrating and counterproductive. And especially if you do open source, which is creating so much value and it's supposed to like...

38:56
of be the opposite of that. Yeah, I think it's an interesting, you know, concept where Explosion, it appears, it's seen, I mean, it does, it had a clear, you know, pretty clear vision, right, to bring NLP to industry, and to bring in investors would almost create an unnecessary

39:21
I mean, I know that you did a small raise, but you do what you gotta do to build in a way that you're comfortable with. But yeah, investors taking major equity stake in your company, your goals start to change a little bit. You're thinking about what am I gonna do to get to the next fundraising round as opposed to what am I gonna do to provide value for my customers, for my users, which.

39:47
you know, spaCy and Explosion clearly keep that in mind, which, you know, I think- Yeah, I mean, look, at the same time, it's like, okay, if you go for like the venture path, like, you know, what you are building is a very large company and you are kind of aiming at that sort of outcome, which I still think, hey, if you're in developer tools, if you're in enterprise tools, that's totally realistic. You can, you know, there's a lot of money that you can make if you build things that are valuable and good. So,

40:16
It's like, yes, that is, that is kind of, you know, a side of, um, a side of things and, um, you know, you can, yeah, I guess you do, you know, you do have to deliver something. And for example, you know, right, like right now we're right in, right in, before launching our next product. And, you know, this is not, you know, we can't really raise at the moment. And that's like, um, you know, that sucks and that's, you know, a problem because we need to, um, you know, get the product out and make sure we can keep.

40:46
the rest of our work sustainable. And there is kind of a dependency there that's like, you know, not always ideal, but at the same time, we also, you know, we have customers, we have users, we have revenue, and we can, we have like a lot of what we have is open source, and we have products, we have like things there. And yeah, we do have graphs that go kind of up like this as well.

41:14
but there are other graphs that are like, maybe not like, woo, growth. And that's fine, we know we're doing something useful. And that's also at least important. Yeah, I mean, I think it's about finding that healthy balance, right? Between velocity and growth and sustainability.

41:40
There's a really interesting point that you make in that talk where you talk about, well, the team that you had building sort of a software engineering team and you mentioned something called T-shaped and tree-shaped people or employees. Skillsets, yes. Can you talk about T-shaped versus?

42:06
for tree shaped and how it applies. Yeah. So I think T-shape, I mean, I think that that is kind of a common phrase to represent like, yeah, our skills where you have this like strong base in the middle and then kind of, you know, these two other, or like other, um, things that you're like less strong at, but that you also have, and that's kind of, you know, that is desirable, uh, for many roles, but, um, yeah, I've always found this metaphor to be like too, it's too static and it's also, you know, it doesn't really, um, represent skills very well. So.

42:36
I like to think of it more as like, you know, this tree and yes, you have a solid base and then there are like all these little branches. And you know, they can also grow like you can grow a new branch if you have to branch out into some other area. And branches can overlap. Like if you think of your team as like this little forest, like you can have like some trees overlapping and maybe then you know, you have a tree that's a bit, you know, covers some areas that the other trees don't.

43:05
cover. And, um, yeah, maybe it's also something I identify with, because that is how I see, you know, myself and my background, like, um, I feel like I kind of had to create a job or role for myself, because I don't really fit neatly into many of the traditional software engineering, whatever roles, there's just like, all kinds of different things I can do. And somehow, I make that fit together. And actually, similarly, my co-founder Matt, like in the beginning, we really did

43:34
most of the work, it was just us for quite a while. And we really, we were able to build like the, you know, the prototype or MVP of Prodigy, just the two of us, because together we had like everything needed for that. And that is an advantage you can have in a small team. And we kind of try to keep that up as we've grown the team, but it's also, you know, people will argue like, hey, that's not very scalable. And that is true. You can't even just,

44:04
you know, scaling like workflows where you have like two people who happen to have these very complementary skill sets working together, trying to replicate that with more people is really hard. And that's also something, you know, we found hard and it's not because like, Oh, we're so amazing or something. It's more about like, okay, there is, you know, this is very specific combination and like, you know, combination of instincts that we had that worked. And then how do you replicate that with other trees? Right. Yeah.

44:33
Yeah, it sounds like a really special relationship where you're able to complement each other and then the whole is greater than the sum of its parts. And speaking about like that forest metaphor, that's what I've been thinking about. Like, you know, the teams that I've worked with, you know, how we're not just these static entities, right? We are these things that are growing through time, changing our...

45:02
skill sets changing, our context is changing, so many of these things. And then I do, I think about like the tree, the intertwining, right? And like you're able to like create this incredible thing. And you also bring up, you know, it's sort of like a mindset also where it's not static. You have this growth mindset, skills aren't fixed, skills you're able to improve them over time. And I think that's what has helped me

45:32
think about hiring and building a data science team is finding people that are able to do that. If you can find problem solvers, if you can find people that don't necessarily fit into a particular box, and I can relate to that also. What you were saying, there's many things that I can do, and I think it's important for a startup also where, yeah, you know...

45:59
For me, I started out as the data scientist. So like anything numbers. Oh, was, you know, go to me. And it's like, yeah, you know, okay. It's not just like anything, right? But- You also did design, right? Yeah, that's true. So yeah, I did lots of graphic design and web design. So like something funny, we have these emojis as part of our product.

46:28
and we wanted like a holiday special. So somehow it was me, I was the data scientist that was creating these holiday special emojis that then went into production and whatever hundreds of users saw. But it's nice to be able to think about things from different perspectives. I think that gives you a really, it's the healthy way of doing projects and you can kind of zoom in and zoom out and you know.

46:57
do different types of things. Yeah, so where to go from here? I mean, we've talked about so many things. Let's talk about, here's a good one. What's like, instead of not most, but one of the most innovative ways that you've seen spaCy used or unexpected ways that you've seen spaCy used, they could be the same, they could be different.

47:25
Yeah, I mean, there are, like, you know, there's so many. It's actually also, you know, it's a question I get asked, like, you know, sometimes and it's like, oh, I don't always have like all the like super mind blowing like use case, like, you know, there's really a lot in like, small details. But, you know, if I think about it, also, you know, machine learning more generally, like it also comes up a lot with like Prodigy, we had people, now people working with like

47:49
horses or cows in a barn and all these everyday life things, where you're like, oh, wow. Yeah. Okay. Sure. Yes. You can detect this, the cow will say or something like that. And that's like, oh, that's important. Or these machine, general machine learning use cases of, I don't know, try to predict how many meals you need on a given flight to reduce food waste. And I find these types of things super interesting. And there are a lot of them also in

48:19
getting the model like 1% more accurate or half a percent more accurate, really translates to like millions and really significant impact. Or also what I really like is stuff in journalism or investigative journalism, like there's a, like years and years ago, I don't know if you remember the net neutrality debate, I think, yeah, in the US. And then it turned out that like all these like letters that were...

48:48
you know, supposedly written in the name of like dead people or people who never sent these letters, like, you know, being like, you know, advocating, you know, against, you know, freedom on the internet basically. And there was a data scientist who, I actually think, used like

49:09
Back in the day, it was mostly word vectors. I think also using spaCy and was able to prove that like, oh, these letters are all like fake and they're very similar and they're like programmatically generated. And on top of that, also these people don't exist or are dead. That's really like an investigative case where like, hey, the technology was used to really show something or uncover something.

49:34
Um, important. And, um, I think also, you know, there are probably a lot of use cases I don't know about in like a lot of these leaks and all of these data amounts are getting bigger and bigger and there is a lot in there and it's way more than a person can read. So, um, I think using NLP there is, um, you know, pretty. Exciting and, um, yeah, very important. So, yeah, absolutely. Um,

50:03
being able to, I mean, that's the power of natural language processing, right? Being able to take like endless amounts of data. I mean, you know, these models now are reading more data, more texts than anyone could in like, what do they say, like 40 lifetimes or some ridiculous statistic like that. But that's also why models, the text and everything, text grows faster than like computers get more efficient. That was actually, there was even really motivated like

50:33
even spaCy and like a lot of my co-founder Matt's philosophy, like to program, like programming and how he started writing his code. Like when I think when he was still doing his PhD, it was kind of, yeah, his advisor had this idea that like, well, the internet grows faster, like text grows much faster. Like we can't, you know, can't just sit around and wait for computers to get more efficient. We need to be smarter than that. And yeah.

51:00
That's how Matt became, randomly became known for writing like super fast and efficient software from, you know, being a linguist. Right. I'm sure it's even greater now, but I know a couple of years ago, there, I mean, I don't know, many statistics are made up. But they say, they said, you know, the amount of data that was created in the last two years is greater than the amount of data that was created in the last hundred, like, hundred years before it, like some, you know, ridiculous statistic like that.

51:27
which just shows how important this type of work is. Yeah. A more philosophical question, if you don't mind, maybe. What do you view as the difference between natural language processing and natural language understanding? Yeah, I mean, that's one of these like definition terminology questions, but like, I don't know, I guess I can try to approach it from a philosophical point of view.

51:57
rather than like, I don't know, I guess it's also, it's worth mentioning that like, yeah, that this terminology gets a bit vague and like, you know, previously stuff like text generation, that's generally seen as part of natural language processing, even though, you know, intuitively you would think like processing is really a lot more predictive and information extraction, but that's also all under that umbrella now. And I'm sure, you know,

52:25
there are official definitions. So I don't want anyone to, you know, listen and be like, it's not meant to be a gotcha. I'm always trying to, you know, even in talks I do, in everything, I always try to optimize for the like, well, actually, point that may or may not come up. But, but yeah, I do think natural language processing, yeah, you know, it's kind of all, you know, all the technology, all the different

52:51
know, techniques from information extraction, text generation, and so on. And, you know, natural language understanding is more like, you know, the goal of what, you know, you're building, like natural language understanding pipelines, for example, you know, you use, basically uses natural language processing techniques to help you build natural language understanding pipelines.

53:16
I don't know if that makes sense. No, that does. It does. But I might be wrong. Like it's again, I'm just like, you know, yes, I'm a person on a podcast, but I'm like just saying things. Aren't we all right? Aren't we all. The point that I was gonna make was, well, yeah, I mean, processing and understanding, oh, this was it, is that I love the companies or like someone will say, no, that company, they don't do NLP, they do NLU. I'm like. Oh.

53:45
Like what? Like I never really understood that, right? Because- Yeah, but you kind of need to, I mean, I guess you need the P for the U. Yeah, I wholeheartedly agree. Yeah. Oh, I guess the human can, you know, maybe everything, you know, in that sense, like if it's just a person, like we do natural language understanding all the time without doing natural language processing. So. Yeah, I'm not, well, I don't even know.

54:12
Right? I think maybe we do have to sort of process it in order to understand you have to process. What does understanding mean? There you go. How could we understand what understanding is? Do we ever understand anything? Do we really understand? Yeah. The only thing I know is that I know nothing, right? That's like, yeah, full circle philosophy. Yeah. In the industry now, there's so much hype, right? Like...

54:40
like what you were saying before, you know, it appears on social media, like influencers, basically, that all the problems are solved by large language models. I'm curious, from your perspective, how do you view the gap between the hype, you know, that's the frenzy that's been going on, I would say since like the launch of ChatGPT, but you know, now GPT-4 has been out for months.

55:08
and the reality of machine learning and AI. Yeah, I mean, there are different ways to look at it. Of course, you have a lot of new experts that I don't know previously were experts in the metaverse and before that experts in blockchain. And now they're like large language model experts. And that's like it's- Beware, beware. Whole other categories. So it's like, yes, there is a lot of hype happening for like very different motives.

55:38
But of course, one thing that's kind of clear and different here is that it's really one of the first times that like, you know, a really good version of the technology has been like so accessible even to, you know, I don't even like say like not lay people, like people who are not like developers and just, you know, everyone has used ChatGPT and knows it and, you know, can actually experience like what a model.

56:08
can do even though, you know, just some, you know, specific capacity or specific things for everyday life. And that is different from, you know, a lot of other releases and, you know, other discussions in the past. And there are also things that like, well, work surprisingly well. And it's like, you know, really just making the thing bigger and bigger and bigger really, you know, makes a difference. It's kind of one of these things. It's like, well,

56:34
Yeah, but like somebody just went and did like OpenAI and others just went and did that. And we're like, oh, how about we make this like even bigger? And it's like, hey, it works. And it's like, you know, it's surprising, you know, how well it works. But then, you know, there's kind of, there's also the difference. There is this difference between like, well, what do you want to do with it? Like, you know, you can make it generate stuff, but then there are also a lot of, if you look at what's actually done in NLP or

57:04
today in companies, enterprise, what's in production. Like again, I don't have any stats, but I would say like information extraction systems are probably the majority of what's in production today also because that's the stuff that has been working pretty reliably for quite a while now. So, and actually a lot of these things are also things that companies were doing way before machine learning and even way before like computers in some cases. Like there's just like,

57:34
stuff and text and someone needs to go and like order that somehow with all kinds of different objectives and people would write index cards and like punch cards and did all kinds of stuff and then did stuff manually and now there's more you can automate or helping humans perform tasks. Like it's not just all automation. It's like an analyst would previously make more mistakes and it'd be less efficient and now they can use

58:00
suggestions and work faster, stuff like that. And I think these are just a lot of the fundamental problems. And I don't think this is all, you know, everyone's now just gonna stop what they're doing and replace that with like, you know, a large language model. Like that doesn't really, you know, that's not really in my opinion, the path, like a reasonable path. It's just that like, hey, we have more tools now.

58:28
to actually solve these problems even better, but the problems remain the problems. And there is definitely still quite a gap if you're looking at the predictive tasks, like going from this, from a generative model or like a model that's trained with in-context learning. So really that's kind of the distinction I wanna make because people now call BERT an LLM and it's like, and it makes the discussion really.

58:51
And it is an LLM maybe, but it makes it really difficult. It's murky. It's a little murky. What people are referring to as it. Yes. Yeah. And it's like, oh, by that definition, well, then we've all had LLMs in production for years is that what's new, but basically in context learning and transfer learning. And yeah, if you look at, yeah, the performance of a generative model where then you kind of have to find the right output, parse it and try to really make it do like a predictive task.

59:19
Entity recognition, text classification, anything structured. The models don't, there's still a lot of room for improvements and, you know, like this, the even, you know, the newest large language models don't perform as well as, you know, maybe some of the hype would suggest. And yeah, so we've done, we've done some experiments like for, you know, we gave a talk on the topic where we just basically took.

59:47
Um, some data sets, benchmark data sets from research for different text classification problems, um, evaluated, um, just GPT out of the box, um, on it and on the evaluation data and then trained on different portions of the data to see like how many examples do we need to beat that accuracy and it's, um, you know, maybe we can, we can link the slides in the show notes like the, we've always, we've, we've been meaning to publish this properly, but we haven't, because there's so much other stuff.

01:00:17
But basically for simple problems like news text or even sentiment out of the box accuracy is really, pretty good and quite impressive. But even with a few percent of the data, you can beat that. And so you just need, with some labeled examples, you can still exceed that. And especially on like harder problems like banking, you can really see that while out of the box,

01:00:46
still impressive accuracy given that, you know, it wasn't trained to do this at all, but, um, you know, you can very easily beat that. And even if, you know, if you take that position that, Hey, these models are going to get even bigger and even better and even more efficient. It's like, well, but at the same time also, um, so do the other techniques we can use to, you know, always take it one step further. And there's clearly this advantage in, um, you know, transfer learning, learning by example, that

01:01:15
you know, we can exploit in order to build something even better and smaller. And that's, I don't know, that's the kind of stuff I find exciting or more exciting than, um, how can we, I don't know, replace everything with ChatGPT. Yeah, absolutely. You brought up so many great points. Um, um, I would encourage listeners to listen to two of your talks, um, large language models from prototype to production and, um, incorporating LLMs into practical NLP.
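
The experiment described in this part of the conversation (score a prompted model out of the box, then train a small supervised model on growing portions of the labeled data and see when it overtakes that score) can be sketched as a learning curve. This is only an illustration under loud assumptions, not the setup from the talk: the tiny Naive Bayes classifier, the toy banking-style examples, and the `ZERO_SHOT_ACCURACY` value are all hypothetical placeholders, and no real benchmark numbers are reproduced here.

```python
import math
from collections import Counter, defaultdict

def train_naive_bayes(examples):
    """Fit a tiny multinomial Naive Bayes model on (text, label) pairs."""
    word_counts = defaultdict(Counter)
    label_counts = Counter()
    vocab = set()
    for text, label in examples:
        tokens = text.lower().split()
        word_counts[label].update(tokens)
        label_counts[label] += 1
        vocab.update(tokens)
    return word_counts, label_counts, vocab

def predict(model, text):
    """Return the label with the highest add-one-smoothed log probability."""
    word_counts, label_counts, vocab = model
    total = sum(label_counts.values())
    best_label, best_score = None, -math.inf
    for label in sorted(label_counts):
        score = math.log(label_counts[label] / total)
        denom = sum(word_counts[label].values()) + len(vocab) + 1
        for token in text.lower().split():
            score += math.log((word_counts[label][token] + 1) / denom)
        if score > best_score:
            best_label, best_score = label, score
    return best_label

def accuracy(model, examples):
    correct = sum(predict(model, text) == label for text, label in examples)
    return correct / len(examples)

def learning_curve(train, test, fractions):
    """Train on the first `fraction` of `train` and score each model on `test`."""
    curve = []
    for fraction in fractions:
        n = max(1, round(len(train) * fraction))
        curve.append((fraction, accuracy(train_naive_bayes(train[:n]), test)))
    return curve

def first_fraction_beating(curve, baseline):
    """Smallest training fraction whose accuracy exceeds the baseline, or None."""
    for fraction, acc in curve:
        if acc > baseline:
            return fraction
    return None

# Hypothetical toy data; a real run would use a benchmark train/test split.
train = [
    ("card payment declined at the shop", "card"),
    ("transfer money to another account", "transfer"),
    ("my card was stolen yesterday", "card"),
    ("how long does a transfer take", "transfer"),
    ("order a new card please", "card"),
    ("transfer failed with an error", "transfer"),
    ("card pin is blocked", "card"),
    ("cancel the pending transfer", "transfer"),
]
test = [
    ("lost my card", "card"),
    ("pending transfer not arrived", "transfer"),
    ("card declined again", "card"),
    ("transfer money abroad", "transfer"),
]

# Placeholder for the out-of-the-box score of a prompted model (not measured).
ZERO_SHOT_ACCURACY = 0.75

curve = learning_curve(train, test, fractions=[0.25, 0.5, 1.0])
print(curve, first_fraction_beating(curve, ZERO_SHOT_ACCURACY))
```

In a real comparison the baseline would come from evaluating the prompted model on the same held-out data, and the supervised model would typically be a fine-tuned transformer rather than this toy classifier.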

01:01:46
Um, the idea of using generative models, not necessarily for your solution in production, but using generative models to, and in context learning to help you gain labeled data to then fine tune models that you have control of. That's my, and I think we share, we share that. It also resonates with people. Like I've always at conferences, I do talk to, um, you know, developers and, you know, it.

01:02:15
generally makes sense. It's not just, you know, oh, me going on stage and saying some crazy stuff. And similarly, also companies we talk to and work with, like, you know, not for all use cases. And there are lots of new things that of course the generative capabilities add. And there are whole other areas like chatbots that we're not, you know, really invested in or working much in, but there are definitely, I think, you know, a lot of use cases where that sort of workflow really resonates.
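
The workflow discussed here (let a generative model suggest labels, parse its free-text answer into a fixed label scheme, and send anything ambiguous to a human) might look roughly like the sketch below. Everything in it is an assumption made for illustration: `fake_model` is a canned stand-in for a real LLM call, and `LABELS`, `parse_label`, and `prelabel` are hypothetical names, not part of spaCy, Prodigy, or any other library.

```python
from typing import Callable, List, Optional, Tuple

LABELS = ("card", "transfer")  # hypothetical label scheme

def parse_label(response: str, labels=LABELS) -> Optional[str]:
    """Map a free-text model response onto exactly one known label, else None."""
    text = response.lower()
    found = [label for label in labels if label in text]
    return found[0] if len(found) == 1 else None

def prelabel(
    texts: List[str], ask_model: Callable[[str], str]
) -> Tuple[List[Tuple[str, str]], List[str]]:
    """Split texts into model-suggested (text, label) pairs and a review queue."""
    suggested, needs_review = [], []
    for text in texts:
        label = parse_label(ask_model(text))
        if label is None:
            needs_review.append(text)  # ambiguous or unparseable answer
        else:
            suggested.append((text, label))
    return suggested, needs_review

def fake_model(text: str) -> str:
    """Stand-in for a real LLM call; returns canned free-text answers."""
    if "card" in text:
        return "Label: CARD"
    if "transfer" in text:
        return "I think this is about a transfer."
    return "Not sure, could be card or transfer."

suggestions, review_queue = prelabel(
    ["my card is blocked", "where is my transfer", "hello there"], fake_model
)
print(suggestions)
print(review_queue)
```

The suggested pairs would normally still be checked by a person before being used as training data for a smaller model that you control.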

01:02:45
Yeah, absolutely. I mean, for me, it's about having control. Not that I need to have control over everything, but it's about having control. It's about having the control over your model. The server being down isn't going to stop you from making predictions, right? That doesn't cut it for your clients. And in actuality, I mean, companies just want you to solve their business

01:03:15
problems. They don't care about the algorithm behind it that does it. So if you're able to create a way of extracting information in a way that provides them whatever the actionable insights that they're looking for, then you can, you know, get closer to your goal. And that's a lot of the problems that I deal with these predictive problems of, you know, extracting information, bringing structure to this unstructured text, which is a lot of, you know, NLP.

01:03:45
And yeah, all of the generative side of things, it's different and it has different use cases. For my work, it's helped me build up data sets and it's helped me validate my predictions sometimes. Like I kind of try to use it in the loop, which I know is something that Prodigy is having LLMs in the loop, which is exciting and amazing to be able to use the newest things. Yeah.

01:04:14
So it's incredible what's going on in the field. And I want to zoom out to learning from machine learning. Welcome, that's where we are. I have a couple of questions in this vein. So who are the people in the machine learning field that influence you? Yeah, so I don't want to go like, yeah, I think the.

01:04:38
classic answer to this would be to rattle off like some, you know, big names or people, you know, on the internet, but I actually think even in my, you know, my day to day life for, you know, what inspires or influences me in the work I'm doing, is actually seeing a lot of the use case of people who come in from like, you know, really specific domain that they know a lot about, and then seeing them like, pick up the tools.

01:05:05
and build something and often, you know, again, we already talked about like some interesting use cases of like spaCy or like Prodigy with like cows and horses, but there are also so many other like, um, areas and domains that like, I know nothing about and then, you know, there's so many people who like, really have like this deep understanding and come in with a problem and succeed at solving it because, you know, they like, they pick up the tools, they're really doing it.

01:05:35
for a purpose. And I feel like these things keep coming up and people, you know, I keep, you know, meeting and talking to people who, you know, kind of came into the field that way. And I think it definitely also shapes how I think about the tools I'm building and the work I'm doing. Very cool. Yeah. I mean, it's, there will never be like problems aren't going to go away.

01:06:01
Right, people are just gonna wanna apply it to different areas. And one of the interesting branches of NLP is applying a lot of the, you know, cutting edge research and things that are done a lot on like English language and applying it to other languages where there's not, you know, I don't know, on Hugging Face, there's probably tens of thousands, if not more data, you know, English data sets.

01:06:29
So yeah, having tools in your toolbox like spaCy and Prodigy and being able to apply it to other languages, that's a really interesting use case. And it makes you think about things in a different way. So continuing learning from machine learning, I'm not going to ask general advice, but what advice would you give yourself?

01:06:57
when you were starting out in your career, you know, or maybe when you were starting Explosion.

01:07:05
Yeah. Um, I mean, you know, like, you know, you can kind of take it full circle. It's like, would I have listened to my own advice at the time? Would I even give myself advice? But I don't know. I mean, it's difficult because, you know, you always have the, you know, kind of hindsight bias and survivorship bias. And there are like just a lot of these kind of biases that, um, you know, make you interpret things that went well or.

01:07:32
didn't go so well. And that, I don't know, draw conclusions from it. So I don't know, it could be that like, Hey, my, you know, whatever I concluded is like really bad advice. But I do think there is some element of, um, I don't know, trusting my instincts, uh, or trust your instincts more and like, you know, really do that because, um, but it's also difficult because, you know, you don't want to be, if you're doing things that are opinionated or a bit like,

01:08:01
different and you know, you have these ideas that like make total sense to you. And are like super reasonable. And you're like, of course we should do it that way. You also don't want to lose sight of, you know, other things you might be missing and you want to be open to suggestions or you want to be open. Um, especially in later stages of your company to think about like, Oh, here's how this is normally done. Maybe, you know, we, you know, maybe you don't want to be stubborn. Um, because that can also have

01:08:30
you know, bad outcomes, but I definitely feel like there were, you know, situations where, you know, it would have helped like, you know, like my instincts were there and it was all reasonable and logical. And it's easy, you know, I let myself get carried away, um, you know, trying to trying out something else and trying to be, you know, open to the way of doing whatever.

01:09:00
it was that we were facing and even though it didn't make sense and the outcome was that, yeah, well, what did you expect? It didn't make sense. Right. I mean, yeah, hindsight is 20-20, right? And yeah, it's interesting. Well, two things, like when you're working on something, first off, like how involved you are in it. And sometimes like you put these blinders on and then maybe like you might ignore some of the other factors.

01:09:29
um, that are happening, but that, yeah, that's, yeah. And I also, I don't know everything. I mean, it's like, I was still, you know, I was still relatively young when we started the company. It's, you know, my first, the first time I founded a company and yes, I've had like jobs before, I freelanced before, but like, this is, you know, not something I've done before and I'm operating off, um, you know, what's reasonable and what's logical and, you know, drawing conclusions and then also making, you know, making

01:09:58
decisions, like if you're thinking about what product should I build or what, um, what feature is it that people need? Like you have to make a decision and you have to be right. And, um, then you kind of, you know, check your track record. And if you're mostly right, that's good. And you want to be right most of the time and not wrong. And if you're always wrong, that's bad, but, um, you know, that's not really, again, that's also something you can't really, you know, that's not, um, you know, a playbook to operate by.

01:10:28
It's like, be less wrong, be more right. Right. But that's ultimately what it comes down to. Yeah, that works. But it's like, there's no formula for doing that. No, exactly. And I feel like in terms of looking at where's the market going and what types of, how are things developing, how's technology developing, we've been, our track record's pretty good. Like we've always, things have developed the way we.

01:10:57
predicted they would develop. I mean, not because, yeah, we have like these amazing prediction abilities, but more because, hey, it all made sense, right? It's largely logical. And so we were like, okay, we feel pretty good on that. Then there are lots of other areas of like, how do you do a thing? How do you scale up things that previously worked well? How do you grow a team, grow the right team, decide?

01:11:26
what to focus on. Um, and you know, those are all things that are hard. And I think again, there's no, there's no playbook, like every, every time we. Yeah. Deviate from, you know, something and do what feels right and makes sense. Yeah. It usually makes sense. And it is usually, you know, the, you know, the most logical next step. I mean, it's all a bit abstract or philosophical, but that's basically.

01:11:55
No, it makes a lot of sense. Yeah, I mean, you know, trusting, trusting your gut, trusting your instincts, thinking logically about things, you know, sometimes you just know, like deep down inside, maybe you don't have like the data to show it, but you just kind of, you know, you know, like what it's gonna be. Yeah, like, actually, this is my reminder. I don't know, it's mirror. But yeah. Oh, and, and

01:12:25
and reason. A logic tattoo and reason. I see that when I type. There you go, so yeah, that's awesome. Yeah, I mean, a reminder of logic and reason. It'll get a lot of people a long way. I'm going to flip the last question, and I'm going to say, what's one piece of advice that you've received that stuck with you, either for your startup or just your career in general?

01:12:54
Yeah, I think, I don't know if it's exactly, if it's like, yeah, advice I specifically received, but like to go back a bit, like I, you know, before I went into tech and really started my career, I was kind of, you know, I didn't really know what to do with myself and I was young and like, you know, living on my own. And I got really into like self-help kind of stuff. And, you know, it's all a bit cringy, you know, if you think about it and, you know, it's definitely, I think when...

01:13:23
It was always weird, but when I really caught on and was like, this is really bullshit, like, you know, the kind of The Secret-esque like stuff, it's like, I just refuse to believe in the universe. Like that's kind of where it like stopped. There are a lot of things where you can be like, oh, psychology, and I guess blah, blah, blah. But then if it's like, when it comes, came to the universe, I was like, what? No, like the universe is not gonna, I don't know, reward me with whatever I'm manifesting and that kind of bullshit. Like the...

01:13:53
Yeah. So, and, you know, I kind of, I think I also wrote about this on my blog and I got, I kind of got, yeah, to this point where I was like, okay, no, I just need to kind of get my shit together and like, take responsibility for my life. And, you know, this whole like, you know, follow your dreams type thing and manifest stuff that's like, you know, that's not working. And that kind of also led me to reading more like, I don't know.

01:14:20
more kind of career or business minded things that had never appealed to me before. Like, you know, I never really saw myself as like wanting to have a career or like it just, a lot of these things like, yeah, would have felt really wrong to me before that. But after going through this whole like, you know, shift, after refusing to believe in the universe, that really, I don't know, I think it woke up my

01:14:50
entrepreneurial spirit or like really also that like, Hey, things can be pretty simple. Like you can, I mean, it's not like nothing is easy, but like, Hey, if there is something you can create that like people can pay some money for, you can live off that. And, um, like, okay, you just have to, you know, you can have these ideas and you can actually do these things. And, um, you know, you can, if you, if it's something you enjoy, you can actually.

01:15:20
You know, you don't have to, you don't have to like force yourself to like, um, work, but like it actually makes sense if you're like passionate about it. Yes, you can work more and more and more. And then maybe at some point in the future, I can, you know, take some time off when I'm like, um, sick of it and, uh, need a break. And there were a lot of, there's a lot of this sort of, you know, mindset, um, that. Yeah. I discovered through that whole. Um.

01:15:49
experience of like not really knowing what to do with myself. And I do think that has definitely influenced me a lot. Yeah. When we started the company and now we're actually like, yeah, running a business. I, yeah, I think that that's, that's great. Thank you for sharing that. I can definitely relate. I think that there's, I'm not sure if everyone goes through it or experiences it, but like, there comes a time when you sort of realize like,

01:16:20
no one's gonna do it for you, right? Like you're the one, this is your life and kind of just taking responsibility for your actions. And then if you're lucky or maybe luck's not the word, but to find something internal that can kind of like, drive you, which is something. This type of work that we're doing is fascinating. I mean, I find natural language processing, machine learning fascinating. I was...

01:16:48
willing to create a podcast about it. No. And I think there's also, you know, that then ties into that. That's also, you know, of course we're also very, um, you know, privileged or like, you know, I certainly am or being in this industry, it's like, um, you know, of course, like things, you know, the world is not fair and it's like, we happen to be working in this industry where like, what, what we are doing is, um, very valuable and, um, you know,

01:17:17
often in ways that are like, you know, a pretty stark contrast to other things. And, you know, and also other parts of like every individual story. Like in my case, yes, I was able to take some more risks because I was young. I don't have any dependents. And even now I'm like, you know, I have, you know, different options than other people. And, you know, other people are disadvantaged in other ways or.

01:17:44
know, for example, my co-founder Matt, he also makes sure to always point out, Hey, he got, when he started spaCy, he got an inheritance from like a small inheritance, but, um, it allowed him to take some time off and really focus on writing an open source library, because like that is not like, you know, something people can just do. Like, you know, you, you need to be able to live somehow. And, you know, I didn't really go down this nerdy path of saying like, Oh, I'll start.

01:18:13
I implement this library that kind of doesn't really exist yet from scratch. And I start, you know, I implement it in Cython, which is this, you know, even back then, even more like somewhat more obscure language that looks like Python and compiles to C extensions.

01:18:34
And I start with the data structures. That's like, you know, true story. And they're still super stable. Like it was worth it, but it's still again, like doing, um, yeah. Now I really, and again, I think maybe also comes from like my, you know, disdain for like the self-help narratives, but like, you know, I really don't want any of these things to be turned into this aspirational story of like, Oh, quit your job and write an open source library. It's like, yes, you can do that. But. Yeah. There's

01:19:04
a lot of circumstances that go into that. Yeah. I mean, it makes a lot of sense. The unfortunate reality, you know, everyone doesn't have the same opportunities and resources, but I mean, it's also, it's important to make the most of those things and to make the most of those opportunities. So yeah, you know, there's something to it. And I mean, we've been touching on it, but the last, you know, really, really good one,

01:19:34
What has a career in machine learning and software taught you about life? Yeah. I mean, we kind of, we were almost like, we did it already, but I have to ask it. Thinking that also, yeah. I mean, also with like, you know, so many things going on, good or bad, like, you know, I don't want this to sound too depressing, but it's like, well, life is hard and the world is kind of, um, you know, hard and unfair and, you know, with privilege also comes responsibility and.

01:20:03
But at the same time, it's also good to be able to create something that's valuable and do things that are not terrible. And again, yeah, I think that's like, sorry for ending your podcast on this. Do things that are not terrible? No, it's fine. This really negative.

01:20:32
I'm gonna steal it down a note, but yeah. No, it's all good. There's so many things that we were able to go into and unpack. There's gonna be tons of great things for listeners. So no worries. Ines, it's been such a pleasure. If somebody wants to learn more about Explosion or your work, where would you lead them? Well, our website, explosion.ai, or my website, which is ines.io.

01:21:03
Um, and yeah, we can put that in the show notes. Um, that's kind of where we, you know, where we collect all the things you could find us on various social media, um, on GitHub. Um, you can, you know, check out our projects, whatever you're interested in. Awesome. Oh, and of course like conferences, like check out our events page, which is on our website, which is where we, um, yeah. Um, list upcoming events like.

01:21:29
PyData New York City, for example, that, um, yeah, I don't know if you, if I'll see you there. I will see you there. Yes. Yeah. But conferences like that and, uh, talks that are upcoming because it's really nice to, yeah, also meet people in person again. Yeah. Um, that that's great. Ines, I really appreciate the time. Um, thank you for answering my questions, letting me pick your brain and, um, really thank you for all of the great work, um, that you're doing with Explosion.

01:21:59
It's much appreciated. Thank you so much. Yeah. Thanks. Thanks again for having me. This was a lot of fun.