In this episode, we explore two new frontier AI model releases - Grok 3 from xAI and Claude 3.7 from Anthropic. The AIcademia podcast is a weekly show helping educators like you leverage AI in your everyday practice. I'm your host, Andy Fisher, and thanks for joining me.

At the end of last month, February 2025, you may have felt a disturbance in the Force as two formidable models were released into the AI universe - Grok 3 and Claude 3.7. Today, we're going to have some fun comparing these tools using a Star Wars metaphor that I think works surprisingly well. Elon Musk's Grok 3 is arguably the Sith Lord of AI offerings - it’s powerful, it’s the least censored model on the market and it even has a mode labelled ‘unhinged’. In contrast, Claude 3.7, with its constitutional approach and careful guardrails, represents the more benign and disciplined path of the Jedi. So, let’s compare the two and ask whether either of these AI tools should become the daily driver for educators.

If you are a ChatGPT user, you may not have tried either of these tools yet, so here’s a little background.

Grok from xAI was originally released in November 2023. It is a large language model similar to ChatGPT, but it was designed from the start to reflect its founder’s somewhat controversial political stance. Musk was, after all, one of the co-founders of OpenAI, but he left following a power struggle with Sam Altman and has since condemned OpenAI as too left-leaning.

Grok was therefore built with what some refer to as an ‘anti-woke’ mandate and is programmed to give edgy and provocative responses. It is also distinctive because from the outset it has had access to the X platform, formerly known as Twitter. This real-time stream of information means that Grok does not suffer from the knowledge cut-off that limits most other models. Initially, it was only available to Premium users on X, but it is now also offered as standalone iOS and Android apps and through a desktop website.

The name Grok comes from Robert A. Heinlein’s 1961 sci-fi novel ‘Stranger in a Strange Land’, in which the Martian word ‘grok’ describes an intuitive and immediate understanding of a complex idea. If you ‘grok’ something, you get it, in the same way that Neo in The Matrix receives a download to his operating system and suddenly knows kung fu. The company’s mission statement is a lofty one - ‘to assist humanity in its quest for understanding and knowledge’ - but, as we will come to see, the Grok model does have the whiff of the Sith despite these philanthropic claims.

Anthropic was founded in 2021 by several ex-OpenAI researchers, including Daniela and Dario Amodei, who left after expressing concerns about what they considered the unchecked acceleration of OpenAI’s GPT models.

Anthropic’s Claude was first released in March 2023, eight months before the first Grok model. Its name is likely inspired by Claude Shannon, an American pioneer of computing and artificial intelligence who is often referred to as the ‘father of information theory’.

The model is built on a constitutional framework that has been praised by those who advocate a cautious approach to AI development, while others have criticised the company for being too stringent in applying ethical alignment principles. Its constitution asks the model to choose the response that is least dangerous or hateful, most truthful and most clearly conveys benevolent intentions. While these seem sound directives for any AI system, they have resulted in a slower and less headline-grabbing path to market. OpenAI and xAI have been vying for headlines while Anthropic quietly works on its models in the background, stoic or perhaps Jedi-like in its demeanour.

Anthropic’s more recent models were named after poetic forms - Claude Haiku, Sonnet and Opus - and they were renowned for their large context window, that is, the number of tokens the model can work with across your prompt and its answer. Claude also became the go-to model for copywriters and coders because it seemed to produce more consistent and human-like responses.

So that gives you a little background on each company. Let’s now turn to the state of play in March 2025: what do Grok 3 and Claude 3.7 have to offer the teacher looking to integrate AI tools into their workflow?

Before you choose which side of the Force to align with, let's talk about what it costs to wield these powers. Both models offer free tiers with certain limitations. Grok 3's free version allows you about 10 questions every two hours, which might be enough for casual exploration but won't sustain intensive classroom use. For full access to Grok's powers, you'll need an X Premium+ subscription at $40 per month - not insignificant for educators on a budget.

Alternatively, if you’re not an X user, there's also the standalone "SuperGrok" option. It can be accessed via the app or on desktop, costs $30 a month, and is the subscription I’m currently trialling. While it’s not clear what the usage limits are on the paid tier, I’ve been using it intensively over the last few weeks and haven’t hit a cap, if one exists.

Claude 3.7 also has a free plan, which gives you limited access to its latest model, but some features are only available if you upgrade to the Pro plan. This costs $18 per month if billed annually or $20 monthly, so it’s significantly more affordable than Grok's offering and on par with the ChatGPT Plus subscription. For schools and universities, there's also a Team plan at $25-$30 per user per month with a minimum of 5 users, and custom Enterprise pricing for larger institutions. I’m currently trialling the monthly Pro plan so I can meaningfully compare the two models and, again, so far I’ve not hit usage limits despite heavy testing.

Now, if you’re a teacher with no intention of paying for an AI chatbot subscription, my recommendation would be to sign up for ChatGPT, Claude, Grok, Perplexity and Gemini accounts and switch between the models as your daily usage allowances run out. You won’t have the continuity or convenience of having all your work in one place, and you’ll have to get familiar with each model’s interface, but it is certainly possible. Incidentally, most of my focus today is on using these systems as a teacher. If you are looking for options to use in the classroom with students, make sure you check the age restrictions for account access. For most LLMs the minimum age is currently 13, but Anthropic clearly states that users must be at least 18, and this might affect your choice of model, too.

Before moving back to the Sith vs. Jedi focus of this episode, I should mention that there is now an increasing number of services offering umbrella access to a range of models, including image generation and the option to build agents, at a very reasonable cost. These companies pay for access to the underlying models, typically through their APIs, and then offer limited usage for a monthly subscription. The one I’ve used for some time now is called SimTheory. It is provided by brothers Michael and Chris Sharkey, who are based in Australia and also produce a weekly podcast of the same name.

They describe SimTheory as an AI ecosystem, offering Member, Pro and Max accounts with increasing usage limits. On their platform you can access most of the OpenAI models, as well as Claude, Grok, DeepSeek, Gemini and many more, and I pay $20 a month for the Pro plan. I’m not affiliated with SimTheory, but I have to say that if you want to explore the range of options out there, this is an excellent way to play with them all without forking out for several monthly subscriptions - and their podcast is really worth a listen too!

Returning to the Grok/Claude face-off, let’s look at what each model can do, starting with the dark side. Grok 3 is a very impressive model in terms of its speed and processing power. I mentioned a couple of episodes ago that billionaire Musk has spared no expense in building the infrastructure for his frontier challenger. His team built a server farm of 100,000 Nvidia H100 GPUs in just 122 days, which is an astonishing feat. He has since doubled the computing power to 200,000 GPUs, which means Grok can hold its own against any other AI model on the market.

In the various benchmarks used to test LLMs, Grok 3 has shown impressive results, outperforming Gemini, DeepSeek and the older Claude 3.5 in science, maths, coding and problem-solving. Perhaps most impressively, it is the first model to break the 1400 Elo rating in the LMSYS Chatbot Arena, which you can think of as an A/B testing site for AI models: users blind-test two models side by side with the same prompt and then vote for the better output. This matters because there’s evidence that companies are now building systems with specific benchmarks in mind; in contrast, the Arena leaderboard is ranked by real people trying to solve real-world challenges, so for me it has more credibility.

In the Grok interface, the plus icon on the left-hand side of the prompt box allows you to attach photos or files as part of your prompt, and I’ll return to this functionality in a while.

Grok 3 has advanced reasoning capability which you can activate by clicking on the ‘Think’ icon in the prompt interface. This means that it employs a chain of thought with self-correction, using more inference time before outputting a better and more nuanced response. 

It also has a ‘Deep Search’ feature, which is Grok’s equivalent of Perplexity and of OpenAI’s ‘Deep Research’ tool that I explored a couple of episodes ago. This allows the model to scan the internet and draw on a wider range of sources, including real-time data, before providing its answer. In recent use cases, I’ve seen the model draw on close to 100 sources, all cited at the bottom of the output in case you want to dig into them later. OpenAI’s Deep Research is coming to ChatGPT Plus accounts but has, up to this point, been hidden behind a $200-a-month paywall. For teachers looking to research developments in an academic field or curate real-time data on a topic, this mode is invaluable, and it is probably the feature I’ve used most with my Grok subscription.

It can also read images and generate them using ‘Aurora’, a text-to-image tool built into the platform. The images it creates are far less censored than those from any other model I have used - for example, it had no problem creating a photorealistic image of Trump in a clown costume riding a donkey, a request denied by many models on the market. This is, of course, a cause for concern: because the images are photorealistic, the tool could be misused to produce deepfakes. Nonetheless, the images are of a high standard and are generated quickly.

It also has one of the largest input context windows on the market, with a staggering 1 million tokens available, compared with Claude’s more modest 200,000. That said, 200,000 tokens is the equivalent of about 150,000 words, so unless you are uploading whole novels as part of your prompt, this might not make any practical difference. Both also have an output limit of 128,000 tokens, or roughly 96,000 words at the same ratio, which is plenty for most use cases.

Before going on to consider its limitations, I should also mention the extraordinary voice mode available in the mobile app. Similar to ChatGPT’s Advanced Voice Mode, it is triggered by tapping one of the icons at the top of the app screen, and you’ll then see a number of options: it defaults to ‘assistant’ mode, but you can also select ‘storyteller’, ‘romantic’, ‘meditation’, ‘conspiracy’, ‘not a therapist’, ‘not a doctor’, ‘unhinged’, ‘sexy’, ‘motivation’ and finally ‘argumentative’. Many of these modes come with an 18+ advisory rating, and with good reason. If you are looking to test the limits of the uncensored nature of the Grok model, this is a good place to start, but make sure there are no kids in earshot and prepare to blush - these modes are not for the faint-hearted, nor do they align with Musk’s lofty mission statement cited earlier.

Putting aside the moral reservations I might have about supporting Musk in the current political climate, there are some issues with the Grok platform from a functional perspective.

First, users have commented on the inconsistency of the model. At times it can give brilliant insights and laser-sharp accuracy and at other times it can be riddled with hallucinations or bland in its content.

Next, image analysis is not its strong suit. If you upload a chart or a screenshot of a data table and ask for insights, it can sometimes misinterpret the content.

Finally, while blisteringly fast, Grok can give shallow and repetitive answers which look good at first glance but lack depth or nuance when read closely.

Now, to be fair to the xAI team, Grok 3 is still in beta, and at launch they said it would be improved over the next couple of months, but this review is based on what is available to date, not what it might deliver in the future.

Turning now to Claude 3.7, this is also a multimodal model: you can attach images or files and use them along with your prompt. There is also a style setting that lets you choose between the default response and something more concise, explanatory or formal. As with Grok, you can type your prompt or press the microphone icon and speak, and the model will transcribe your input to save time.

On the desktop version, paid-plan users can also choose between ‘normal’ and ‘extended’ modes. Extended mode is Anthropic’s reasoning option: it uses the same kind of inference-time chain of thought as Grok’s ‘Think’ mode and is better suited to complex problem-solving and advanced coding.

Anthropic recently published research on which occupations in the US are making most use of AI, based on conversations with Claude. Unsurprisingly, office and administrative support ranked high, while farming, fishing and forestry showed the lowest usage. Education ranked fairly high, with educational instruction accounting for 9.3% of the conversations analysed. This of course only takes Claude’s usage into account, and I suspect that ChatGPT has a larger share of teachers because of its first-to-market prominence. Nonetheless, by far the highest-ranked group using Anthropic’s models was computer and mathematical occupations, because it is in coding that Claude really excels.

Having played with a number of no-code and low-code platforms like Replit, Bolt and Cursor over the last year, I cannot overstate how impressive the new Claude is at spinning up apps and code snippets from one-shot prompts. I asked for a 3D animation of the solar system and 3 minutes later I had a fully working model. If I paused it and clicked on any planet, it displayed interesting facts and figures - not something I asked for, but functionality the model intuited. Next, I uploaded a screenshot of a collection of IGCSE poems from my current anthology, along with the dates when each poem had last been used in an exam. With a single prompt, it converted this dry data into a colourful interactive infographic. Finally, I asked for a flashcard app for testing French weather vocabulary, and it produced a fully functioning learning tool with icons and a scoring system. In each case, Claude’s ‘Artifacts’ feature displayed the output on screen as a working preview, and with a single click I could download it as an HTML file to run independently in the future.

This is not to say that Grok 3 is bad at coding - it’s good, but not, in my experience, a rival for Anthropic’s latest release.

My point here is that Claude 3.7 is not just a model suited to developers - it heralds a new way in which I think teachers will be using AI in the future. We are entering an era in which we will spin up disposable software in real time to teach a specific topic or illustrate a concept. If you are teaching the periodic table and want to demonstrate the atomic structure of the first 10 elements, a single prompt can create an interactive learning tool that visualises it. If you are teaching the water cycle, the golden mean or magnetic fields, you will be able to generate a bespoke application to suit your needs, then save it for future use, share it with learners on their devices or simply delete it when the lesson is over. Anthropic’s ‘Publish’ feature also lets you publish the output to the web at a public-facing URL, separate from the rest of the chat.

In this sense, Claude 3.7 is a versatile and impressive model. But that is not to say old Obi-Wan is without his shortcomings - and they are a real cause for concern. First, Claude is not web-enabled: it has no search facility, and its knowledge cut-off is October 2024. This alone may be a deal-breaker for teachers who want to work with the most up-to-date information.

Next, Claude cannot produce images. While it is great at interpreting and analysing uploaded images, it has no native ability to generate them and so you’d need to go to a different platform like Midjourney or Ideogram for this. Again, perhaps not a critical issue but it would be nice to have a model that does everything rather than have to move between platforms.

Finally, while the constitutional foundation on which Claude is built makes it the ethical choice for the discerning teacher, at times it can feel too restrictive, refusing what I consider perfectly reasonable requests because they conflict with its hair-trigger policy guidelines.

Censorship is, of course, a huge topic and one with potentially far-reaching consequences. As a liberal-minded teacher, my instincts incline towards freedom of choice. I don’t want the nanny state dictating what I can or can’t read, say or think - but, on the other hand, I’m very grateful that the roads are policed, my food is protected by stringent regulations and bad actors can’t pick up a six-pack of depleted uranium at the local supermarket. There is always a balance to be struck, and the challenge with AI is that it is evolving too quickly for most governments to introduce fit-for-purpose policies that keep these systems suitably aligned and contained.

Artificial intelligence companies are not, on the whole, Sith Lords seeking to dominate the universe and destroy whole planets on a whim, but most are profit-driven, and they all believe they are in a race to achieve AGI, so temperance is in short supply.

Over half term, I was fortunate to attend an online presentation from IBM aimed primarily at businesses wanting to know how to manage the risks that come with scaling AI. The discussion identified three kinds of risk: regulatory, reputational and operational. Governance is about ensuring that the right policies, expertise, practices and oversight are in place to protect the company and individuals when they use AI to augment their work, and so reduce these risks. I was struck by just how much of what we covered also applies to schools.

When we use an AI system in our own time, on our own device and in our own home, it is simply a case of caveat emptor. But if we are on a school device or network, or using an LLM in the classroom as a member of staff, we need to be much more cautious and appreciate how easily we might place ourselves or our school in jeopardy.

There are some concerning posts online from users who claim they have been able to use Grok 3 to access information with significant risks attached. While I’m not willing to test the veracity of their claims, for reasons that will become apparent, they are urging Musk’s team to undertake more robust red-team testing of the model, because they say they were able to obtain the knowledge needed to create biological and chemical weapons from consumer-available components. They also claim to have received advice on how to assassinate a head of state and on how to end their own life painlessly. It is not that this information isn’t available elsewhere on the web if you know where to look, but the fact that anyone can generate it so easily on this platform is deeply worrying. The published minimum age for this platform is 13 with parental permission, but it is all too easy for an even younger user to create an account with no parental oversight.

I have stated in earlier episodes that the future of AI in schools and businesses will likely involve locally hosted fine-tuned models which will not only reduce the risk of private data leakage but will also mean that institutions will be able to decide for themselves what level of access and censorship they would like to impose on users. Until then, we are obliged to use these models with a clear sense of what they can and can’t do.

For now, my personal advice is as follows. I am experimenting with a number of different models in my own time and on my own devices. In school, however, I use Claude 3.7 to create resources or carry out research, because its constitutional approach gives me confidence that the appropriate governance is in place. I don’t allow my pupils to use this model in lessons themselves, though, because Anthropic’s policies are clear that no one under 18 can use their services. And if you do decide to give pupils access to a model with a 13+ sign-up, be aware that this requires explicit parental consent, which you as a teacher cannot give in loco parentis.

Yes - it’s a potential legal minefield - so make sure you are clear about your school’s policies and get the right forms filled in before putting AI tools in the hands of learners.

And on that sobering note, it’s time to wrap up this episode. Whether you are Sith or Jedi, may the Force be with you.

Thanks for listening. Please do spread the word if you think others would like the show, and do check out the AIcademia YouTube channel, where you'll find practical tutorials that complement the topics covered on this podcast. Have a great week, and I look forward to catching up again soon.