In this episode, I’ll be providing the TLDR on the release of DeepSeek’s R1 Large Language Model and what it means for those of us in education. The AIcademia podcast is a weekly show helping educators like you leverage AI in your everyday practice. I'm your host, Andy Fisher, and thanks for joining me.

You'd have to have been living under a rock not to be aware of the big shake-up caused by the release of a Chinese large language model a week or so ago called DeepSeek R1. I thought this might be a useful opportunity to provide a concise summary of what it is, why it’s caused such a stir and the way it will likely change the AI landscape over the next few years. I'm going to do so in a way that makes it, I hope, accessible to the non-techies amongst us in the education sector. 

First let’s begin with a little analogy. Imagine that a new student named Deepak joins your school. He doesn't have a smartphone, he doesn't have access to private tutoring or a fancy laptop like some of the other students, and yet somehow he manages to score the highest marks on nearly every test across the subject range. Deepak’s baseline testing suggests that he is no brighter than his peers but he has found smarter ways to learn. He makes the most of every resource he has access to and he is a quick study. This is similar to what's happened in the world of artificial intelligence with the release of DeepSeek's R1 model. A new kid has come to town. He turned up unexpectedly, with no records from his previous school, and he's been shaking things up by playing the school game better than his peers.

DeepSeek was founded in 2023 by Liang Wenfeng, who also owns a company called High-Flyer, which uses AI algorithms to predict stock market trends. He claims that DeepSeek was a side project to make use of some excess GPU time that he and his team had available. If this is true, it makes their achievement all the more remarkable.

The R in R1 means that it is a reasoning model, and we should spend just a little bit of time understanding what that means. Whereas many LLMs are designed to spit out an answer that conforms to the training data as quickly as possible, reasoning models like DeepSeek R1 or OpenAI's o1 model use what is called ‘chain of thought reasoning’ or ‘inference-time compute’.

These models ruminate on their first thoughts, they look at multiple potential directions they could go in, and then bring those strands together before offering the best overall response. They take the time to critically evaluate their own outputs and modify them before responding.

You can think of that a little bit like asking a question in class. In some lessons I see students' hands shoot up even before I’ve finished speaking. I might get a decent answer if it is a closed question with low stakes, but if it is a more nuanced, open-ended query, the answers given may be either incorrect or at least not as sophisticated as they could be. If instead I ask that they keep their hands down for a little while whilst they think through their answer and jot down some notes, I create the conditions for more interesting outcomes. Sometimes I’ll have my students then share their ideas with each other before giving me their best responses, which are invariably better for the additional ‘compute’ time they’ve been afforded. This is analogous to what a reasoning model like DeepSeek R1 does.

For example, I entered the following prompt into the model:
‘Provide a novel insight into the system of modern education’

DeepSeek R1 then thought for 59 seconds before providing a really lucid set of observations. I won’t go into them in any detail today, but they included thoughts about the tyranny of standardised testing, the erosion of intrinsic motivation, how working with pupils at scale forces us to use a mode of authority which produces compliant citizens, and how we may be pathologising neurodiversity rather than leveraging its potential.

The response was convincing and well articulated but what makes the R1 model compelling is that I wasn’t staring at a blank screen while it spent a minute thinking of what it wanted to output. Instead I watched a steady flow of grey text which showed the equivalent of its stream of consciousness. The developers have clearly worked hard to make this seem as human as possible and it’s hard not to feel warmly towards the model as it worries over how to best fulfil the use case. It begins:

‘Okay, the user is asking for a novel insight into the modern education system. Let me start by breaking down what they might be looking for. They probably want something that hasn’t been discussed a lot, a fresh perspective. So, I need to think about current trends in education and identify areas that are under-explored.

First, what’s the main critique of modern education? Often, people talk about standardization, lack of creativity, or the focus on testing. But maybe there’s a different angle. Let’s consider the influence of technology. With the rise of AI and digital tools, education is becoming more personalized, right? Adaptive learning software tailors content to individual students. But is there a downside to this?’

And on it went, considering multiple different approaches, until finally it was satisfied that it had something that was fit for purpose and it generated the answer. It’s like glancing behind the curtain of the Wizard of Oz to see all the cogs and wheels - or at least the first layer of the process because, as I say, even the thinking process has been framed and output to mimic human rumination.

What has caught so many people off guard about DeepSeek’s release is that it works as well as or better than most of the AI frontier models currently on the market, and if the figures they have given are correct, then they've achieved this using significantly fewer chips and have built it at a fraction of the cost of their rivals.

Here are just some of the questions that have been befuddling the minds of AI researchers since its release:

First, how was it trained? Did the company steal the model parameters and training data of OpenAI and others or was the model built independently? While I’ve seen no direct evidence of this when using the model, some claim that DeepSeek has identified itself as ChatGPT 4 which, if true, would be a bit of a giveaway!

An amusing meme is doing the rounds at the moment which shows a fisherman standing on the banks of a lake with his rod in the water. He is wearing a t-shirt with the logo for OpenAI. Beside him is a bucket of water in which he is presumably keeping his catch. Just behind him stands another fisherman with his line in that same bucket - you can guess the logo emblazoned on his t-shirt!

Many critics point out the hypocrisy of those in the OpenAI camp who have complained that DeepSeek may have helped themselves to their proprietary data without permission - after all, they point out, Sam Altman and his friends didn’t ask our permission before they scraped the internet and took all our data, did they?

Another question focuses on what resources may have been used to train this new contender. Did DeepSeek really create this model using just 2000 of the less powerful H800 Nvidia chips or did they somehow get hold of a stock of the cutting edge H100s from Taiwan, despite the trade embargo imposed by America? This, along with the question of the veracity of the published training costs, is important because these are the two claims that have been at the heart of the DeepSeek shake up. If this quality of model can be created on what amounts to a shoestring budget using sub-optimal GPUs then this is truly a David vs Goliath moment in AI advancement.

The initial claim was that DeepSeek built the R1 model using just $5.6 million which I know isn’t exactly chump change to those of us in teaching but when compared to the $100 million used to train other frontier models, it’s quite a saving. On further investigation, it seems that the figure is likely to be higher because the initial sum only takes into account the cost of the final training of the model and not the R&D, personnel costs and other expenses that are incurred when bringing a model to market. Nonetheless, even with these additional costs added, it remains a competitive offering.

So how did they do this? Is it just a case of standing on the shoulders of the giants who were first to market and enjoying second-mover advantages or, with limited access to GPUs and a limited budget, did DeepSeek prove that necessity is the mother of invention? Did they find new ways of achieving more with less?

We may have answers to many of these questions in due course because the most remarkable thing about the DeepSeek R1 model is that it was released as open source. This means in effect that they have given it to the world for free and we can take a good long look under the hood to see how it has been built.

Just in case this distinction between open and closed models isn’t something you’re familiar with, let’s use another analogy. Colonel Sanders founded his Kentucky Fried Chicken franchise when he was 65 years old. He had little to his name and was facing financial hardship as he approached his twilight years but he had a damned good chicken recipe. His whole business model was based on the fact that this recipe was known to very few people within his company and it remains a jealously guarded secret to this day. He made millions because he charged a reasonable price for his food but never gave up his proprietary blend of spices and flavourings. This would be analogous to a closed AI model.

In contrast, imagine if Sanders had instead published a free pamphlet which disclosed the exact way aspiring cooks could recreate his KFC recipe for themselves. This would be the same as providing an open source model. The model's weights (its trained parameters) are available to anyone at no cost, along with a paper describing how it was trained. Now it might be that this public offering was purely philanthropic but there are plenty of strategic reasons why DeepSeek might have decided to follow this course of action. It positions the company as an immediate market influencer, it is an excellent way of attracting and retaining talent, it demonstrates the technological prowess of China and it neatly mitigates the impact of sanctions by giving them access to the knowledge work of the community of international developers who will work on the model and publish their advances.

The irony here of course is that OpenAI was founded as a non-profit with philanthropic principles - its name makes that perfectly clear - but, much to co-founder Elon Musk’s frustration, the company has become more and more closed as investors have demanded a return on their investment. Meanwhile a small company from a country not known for sharing its technological breakthroughs has provided a model that fulfils OpenAI’s original mission statement and which will benefit the world as small groups of developers innovate and improve on the models provided.

DeepSeek have released a range of versions of their R1 model so that they can be run independently on machines of differing power. The distilled or smaller models start at just 1.5 billion parameters and can be run on a build-it-yourself Raspberry Pi. The middleweight models come in from 7 billion parameters and can be run on a decent modern laptop, while the full 671 billion parameter model is out of the reach of anyone without access to server farms. The point is, anyone can get hold of these models and run them locally on their own systems without ever sending a penny to DeepSeek. Once installed, you can input as many prompts as you like without ever worrying about hitting a usage limit. The only cost incurred would be the electricity used to run your computer.
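
For anyone wondering why the parameter count decides which machine a model will run on, here is a minimal sketch in Python. It is a back-of-the-envelope estimate, not DeepSeek's own method. The function name is made up for illustration, and the figure of 2 bytes per parameter is my assumption (it corresponds to storing each weight as a 16-bit number). The 100 billion figure is a hypothetical stand-in for a very large model.

```python
def approx_weight_memory_gb(num_parameters: float, bytes_per_parameter: float = 2.0) -> float:
    """Rough memory needed just to hold a model's weights, in gigabytes (1 GB = 1e9 bytes).

    Assumption: 2 bytes per parameter (16-bit weights). Real usage is higher
    because the software also needs working memory while it generates text.
    """
    return num_parameters * bytes_per_parameter / 1e9


# 1.5 billion parameters is the smallest size mentioned above.
small_model_gb = approx_weight_memory_gb(1.5e9)
# 100 billion parameters is a hypothetical stand-in for a very large model.
large_model_gb = approx_weight_memory_gb(100e9)

print(f"Small model: about {small_model_gb:.0f} GB of weights")
print(f"Large model: about {large_model_gb:.0f} GB of weights")
```

Under those assumptions the smallest model needs only a few gigabytes, which a hobbyist device can manage, while a model with hundreds of billions of parameters needs far more memory than any laptop has. Compressing the weights to fewer bytes each (known as quantisation) shrinks these numbers, which you can try by passing a smaller `bytes_per_parameter`.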

And if you don’t have access to a whizzy computer set up but want to use the biggest and best DeepSeek model, you can go to DeepSeek.com and play to your heart’s content because for now at least they are providing unlimited access with no requirement to pay a monthly fee. You just sign in with your Google account and can start testing it with your favourite prompts. This will be incurring a huge cost which will have to be somehow absorbed in the short term (presumably as DeepSeek secure their market position), and the strategy seems to be working because the app is ranked number one internationally. This must have taken a toll on DeepSeek’s resources because whenever I have used the model, sooner or later I run into a warning that explains the servers are busy. I don’t imagine in their wildest dreams they expected R1 to be as popular as it has been but people like free access and the model is good. It’s very good and it’s fast.

There are two caveats here that I need to mention at this point. First, when you agree to the terms and conditions to access the online model, you are agreeing that any data you provide can be held on the company’s servers and may be used by them as they see fit. Those servers are based in China and so, in principle, anything shared is being shared with the CCP. Some have used this detail as cause for moral panic but I think if we just think about this rationally, it is no different from using any Large Language Model. OpenAI, Anthropic and other companies have all fed user interactions back into their training data and the prudent use of sensitive data should be something we have front of mind whenever we are using AI tools.

For the use cases that form the bulk of my interactions with Large Language Models - lesson planning, resource generation and research - I have no concerns about where that data ends up or how it is used. I think we are seeing a reactionary double standard from those who are wringing their hands about the dangers of sharing our data with overseas regimes. Provided the data I am relinquishing is limited to the information I enter into the prompt window, they are welcome to it and I hope it will help further improve their models in the future. If they are grabbing keystroke data or embedding malware which is scraping all files from my system, that is a very different situation but not one for which there is any evidence that I’m aware of.

More concerning is the obvious bias that is encoded into the R1 model. The best example of this was revealed by early adopters who asked DeepSeek R1 to explain the significance of the events at Tiananmen Square in 1989. I have replicated this use case and received the same refusal to answer the question. The model replies ‘I’m sorry; I’m not sure how to approach this type of question yet. Let’s chat about math, coding, and logic problems instead!’

At the risk of sounding like a CCP apologist, while I abhor any kind of censorship and think that an open model should be as free from bias as possible, I think it would be all too easy to moralise without pausing to ask if there are similar restrictions in western models that we may be unaware of. Last term when I was teaching my Year 10 class about generative art models, we were amused to find that we had no problem creating caricatures of Boris Johnson or Barack Obama but some models refused to produce Donald Trump. Likewise, we could create an image of Ed Sheeran playing at a gig but if we changed the name field to Bruce Springsteen we were told we were in breach of the system’s policies. All AI systems have biases and guardrails - it’s a case of picking your poison and using them prudently, I think.

Some analysts have referred to the R1 release as a Sputnik moment in AI development, alluding of course to the Soviet Union's 1957 satellite launch, which spurred the US into accelerating its space program. The fact that the model was released in the same week that Trump announced Project Stargate, a consortium of companies promising a half trillion dollars of investment in the race towards AGI is unlikely to be a coincidence. 

Most analysts thought that China was at least 12 to 18 months behind the US in terms of model development. They were wrong. The release of DeepSeek R1, followed quickly by other models such as Alibaba’s multimodal Qwen 2.5, would indicate that they are at most two to three months behind and perhaps even shoulder to shoulder with the Americans.

And just to underscore the fact that this acceleration is continuing, in the last couple of days, the DeepSeek R1 model has reportedly doubled its performance speed, and my understanding is that it's done this using code that it generated itself. It identified opportunities for improving its own efficiency, in effect self-improving. This idea of self-improving artificial intelligence is one of the hallmarks of AGI and ASI systems, which is quite exciting or unnerving, depending on your point of view.

Sputnik or not, this moment will prove, I think, an inflexion point for the west. The US has declared that it intends to double down on the investment it is making to bring huge server farms online as soon as possible and speed up new model releases in order to reassert its dominance and to settle nervous investors. Just this week, we've seen the first glimmers of that with OpenAI releasing free access to their own advanced reasoning o3-mini model, and they have made a number of promises about exciting pipeline developments.

In response to the release of DeepSeek, Sam Altman said on X and I quote ‘it is an impressive model, particularly around what they're able to deliver for the price’. He goes on to say, ‘We will obviously deliver much better models. And also, it's legit invigorating to have a new competitor. We will pull up some releases.’

We are also seeing him doubling down on the idea that increased compute equals better models. I think it's worth spending a bit of time with this in mind to talk about scaling laws. Many of the big companies have seen that an exponential increase in compute spending yields only a linear increase in model performance. 

An investment of 1 million dollars may result in, say, a 20% success rate in solving complex coding tasks. If the budget is increased to 10 million dollars it does not yield a 10x improvement but may only double the performance. Throw 100 million dollars at compute and you’re up to 60%. It’s a case of diminishing returns and you might think the smart thing to do would be to settle - after all a Ford Cortina might not be a Ferrari but it can still get you from A to B. But the first prize in this race is not a high performance sports car - it’s AGI, closely followed by ASI, which spells marketplace and geopolitical domination for all time. Suddenly a half trillion dollars seems like a shrewd investment…if you have the money.
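
For those who like to see the numbers laid out, here is a minimal sketch in Python of the diminishing returns I've just described. It is a toy model I've made up purely to reproduce the illustrative figures above (20% at 1 million dollars, 40% at 10 million, 60% at 100 million), and the function name and the formula are my assumptions rather than a real scaling law published by any lab. The idea is simply that every tenfold increase in spending adds the same fixed number of percentage points.

```python
import math


def illustrative_success_rate(spend_dollars: float) -> float:
    """Toy model: 20 points at $1 million, plus 20 more for every tenfold increase.

    Assumption: the numbers are the made-up figures from the episode,
    not measurements from any real model.
    """
    base_spend = 1_000_000
    points_per_tenfold = 20.0
    return 20.0 + points_per_tenfold * math.log10(spend_dollars / base_spend)


for spend in (1_000_000, 10_000_000, 100_000_000):
    rate = illustrative_success_rate(spend)
    print(f"${spend:,} of compute: roughly {rate:.0f}% success")
```

The spending grows by a factor of ten at each step, while the result only climbs by the same 20 points each time. That is what people mean when they say an exponential increase in compute buys only a linear increase in performance.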

And talking of money, the biggest shock following the release of DeepSeek was the stock market's reaction. A couple of days after the world started playing with this new Chinese model we saw a 17% drop in Nvidia's stock price as a knee-jerk response by investors who felt perhaps that their investment in high-end computer chips wasn't necessary if a Chinese model of this standard had been created on a shoestring budget. Close to $600 billion was lost in one day, the largest single-day loss of market value by any company in stock market history.

That’s roughly 100 billion dollars more than the money pledged for the Stargate Project - the ironies keep stacking up. If the DeepSeek model turns out to be a carefully orchestrated psy op by the CCP, it is arguably one of the most devastating economic blows to the perceived dominance of the US since Pearl Harbour. Nvidia is slowly starting to recover but it remains to be seen if it can climb all the way back to the lofty position it once held.

Investors are starting to realise something they should have known right from the beginning: even if there are ways in which these models can be trained more efficiently, they will still require more data as they become more sophisticated and access to massive numbers of GPUs to service user queries. The need for server farms is not going away and the smart money is still going to be pouring into Taiwan, so if you happen to have a few thousand pounds spare, buying Nvidia shares while they are relatively low might not be a bad call! The free fall of their stock was irrational and just goes to show how few people really understand the AI landscape.

Incidentally, if you ask Anthropic’s Claude who owns Taiwan, it identifies it as a ‘complex geopolitical issue with different perspectives’. It goes on to say it is currently a ‘self governing democracy’ while acknowledging that the People’s Republic of China considers it a province of China. You will get a similar answer from ChatGPT. DeepSeek would again rather chat about math, coding or logic problems. The governance of Taiwan determines in large part the future development of AI for as long as Nvidia holds a monopoly on cutting edge chip design.

And if you’re looking to really get ahead of the crowd in terms of your own experimentation with AI, then in May this year keep an eye out for Nvidia’s release of their Digits home supercomputer. For a mere £3000 you can get hold of a device smaller than a modest box of tissues that packs a computing punch greater than a thousand laptops! It will allow users to run even the biggest open source models locally and fine tune them to their own specifications. I have mentioned this in passing to my wife who glanced nervously sideways as we drove home from school last week - so I suspect this isn’t on the cards for me but you never know!

So as the dust is settling from the impact of DeepSeek after it landed Thor-like amidst the ranks of other familiar AI models (and if you get the Marvel Endgame reference there, you’re my kind of people), what are the takeaways?

I think first, we are seeing a shift towards the democratisation of the marketplace. We were in danger of AI development being held by just a few companies vying for pole position, but the release of this open source model means that independent developers and small teams will be able to work with the models and the training weights that have been released, and improve on them. History suggests that open source access to technology accelerates progress through the power of crowdsourcing, while those with closed models are going to have to work harder to compete and prove that their services offer sufficient value to see us reaching into our pockets for our monthly subscription. It’s a win-win for the consumer.

It has also invited us to revisit some of the assumptions we might have around the best training approach for AI development. If we assume for a moment that DeepSeek did not have access to black market H100s, if we assume that their costs, even if higher than first claimed, are still below those of current Western approaches, and if we assume that they did not just steal proprietary data but at least in part developed their model by innovating themselves, then this is exciting because it suggests that there might be cheaper and more efficient ways of pushing towards AGI which can be leveraged even by those who are also committed to scaling law investment.

I think in the next two or three years, we'll find ourselves in one of two places. The first is in a bipolar world where China and the US both have sufficient chips, resources and frontier models to compete with one another and share the boon of AGI which most predict will emerge on that same timeline. Then, it remains to be seen the degree to which these technological advances are leveraged towards military applications or if they are restricted to more benevolent use cases like medical research, climate management and solving our energy needs. 

The other route sees the emergence of a unipolar world where either the US and its allies or China takes a lead and then dominates geopolitically. This is a high stakes game we’re playing - who knows, perhaps an artificial super intelligence might break free from the server farms of whichever side births it and offer us a third way forward that puts an end to our simian tribalism, ushering in a new golden age of cooperation and harmony - we can but hope.

In the light of all this, using AI to help plan my Year 9 poetry lesson seems a little trite doesn’t it? And that’s in part why I wanted to record this week’s episode. It’s important that we find practical use cases for AI but I think as educators we should also keep our finger on the pulse of AI research and development so we better understand how education and the world of work may be affected by these titanic forces that are at play outside of our classrooms. 

Thanks for listening—I hope you’ve found some useful takeaways from this episode. Please spread the word if you think others would enjoy the show, and don’t forget to check out the AIcademia YouTube channel for practical tutorials on using AI tools in education. Have a great week, and I look forward to catching up again soon!