1
00:00:00,000 --> 00:00:08,700
Welcome back to Voices of Tomorrow, the podcast where we explore the cutting edge of science, technology, and artificial intelligence.

2
00:00:08,700 --> 00:00:14,800
In recent years, AI has been nothing short of revolutionary, reshaping entire fields of study.

3
00:00:14,800 --> 00:00:23,700
In fact, AI's contributions to science have been so impactful that recently, two Nobel Prizes, one in physics and another in chemistry,

4
00:00:23,700 --> 00:00:27,800
have been awarded to researchers leveraging AI in their discoveries.

5
00:00:27,800 --> 00:00:35,000
We've discussed the profound advancements in AI-enabled protein prediction and design that led to these Nobel wins.

6
00:00:35,000 --> 00:00:42,600
But today, we dive into an area that, until recently, has remained hidden from our view: the RNA virosphere.

7
00:00:42,600 --> 00:00:52,800
This is the world of RNA viruses, an incredibly diverse and fast-evolving group of viruses that, despite their ubiquity, have remained largely unexplored.

8
00:00:52,800 --> 00:00:58,700
Why? Because traditional methods have struggled to keep pace with their rapid evolution and diversity.

9
00:00:58,700 --> 00:01:04,000
But now, thanks to artificial intelligence, we are witnessing a monumental shift.

10
00:01:04,000 --> 00:01:13,300
AI is not only accelerating the discovery of RNA viruses but also revealing entirely new species and viral ecosystems we never knew existed.

11
00:01:13,300 --> 00:01:18,600
Today's episode will be focused on a groundbreaking study recently published in Cell,

12
00:01:18,600 --> 00:01:24,400
which used AI to document the vast, hidden diversity of RNA viruses in the environment.

13
00:01:24,400 --> 00:01:29,000
To understand the significance of this study, let's first set the stage.

14
00:01:29,000 --> 00:01:38,600
RNA viruses are everywhere. They infect nearly every form of life, plants, animals, fungi, bacteria and even other viruses.

15
00:01:38,600 --> 00:01:45,900
Some RNA viruses are well known to us, like those that cause influenza, AIDS, or COVID-19,

16
00:01:45,900 --> 00:01:51,200
but the vast majority of RNA viruses remain unknown and uncharacterized.

17
00:01:51,200 --> 00:01:56,100
Scientists refer to this uncharted territory as the RNA dark matter.

18
00:01:56,100 --> 00:02:02,500
So why have these viruses been so elusive? It's because RNA viruses are masters of evolution.

19
00:02:02,500 --> 00:02:11,000
They mutate and evolve much faster than their DNA counterparts, making it difficult for traditional tools to recognize and categorize them.

20
00:02:11,000 --> 00:02:19,400
Moreover, the key protein used to identify RNA viruses, RNA-dependent RNA polymerase, or RdRP,

21
00:02:19,400 --> 00:02:26,100
often evolves so rapidly that it becomes almost unrecognizable to standard sequence-based search tools.

22
00:02:26,100 --> 00:02:33,300
Without a reliable way to detect this polymerase, vast sections of the RNA virosphere remain undiscovered.

23
00:02:33,300 --> 00:02:36,900
Now, imagine the challenge this presents to researchers.

24
00:02:36,900 --> 00:02:43,600
It's like trying to solve a jigsaw puzzle where many of the pieces are not just missing but constantly changing shape.

25
00:02:43,600 --> 00:02:46,900
This is where artificial intelligence comes into play.

26
00:02:46,900 --> 00:02:52,100
AI offers a way to make sense of this puzzle, even as the pieces shift before our eyes.

27
00:02:52,100 --> 00:03:00,100
Enter LucaProt, the AI-powered tool at the heart of this study, which is revolutionizing the way we explore the RNA virosphere.

28
00:03:00,100 --> 00:03:06,100
LucaProt is built using deep learning, specifically leveraging the transformer architecture,

29
00:03:06,100 --> 00:03:11,600
the same architecture behind language models like GPT-3 and GPT-4.

30
00:03:11,600 --> 00:03:20,500
But instead of generating text, LucaProt is tasked with finding hidden RNA viruses in massive datasets of metagenomic sequences.

31
00:03:20,500 --> 00:03:24,700
The researchers behind LucaProt, as detailed in the Cell paper,

32
00:03:24,700 --> 00:03:29,000
aim to tackle one of the most fundamental problems in viral discovery,

33
00:03:29,000 --> 00:03:34,900
identifying highly divergent RdRP sequences in vast amounts of metagenomic data.

34
00:03:34,900 --> 00:03:41,200
Traditional tools had been limited by their reliance on close sequence similarity to known viruses.

35
00:03:41,200 --> 00:03:46,100
In contrast, LucaProt doesn't just match sequences to known examples.

36
00:03:46,100 --> 00:03:53,000
It uses AI to predict the structure and function of highly divergent sequences that have no close known relatives.

37
00:03:53,000 --> 00:03:58,200
LucaProt integrates sequence data with structural predictions from ESMFold,

38
00:03:58,200 --> 00:04:04,400
a powerful AI tool developed by Meta that predicts the three-dimensional structures of proteins.

39
00:04:04,400 --> 00:04:07,900
By combining both the sequence and structural information,

40
00:04:07,900 --> 00:04:12,300
LucaProt is able to detect viruses that traditional models would miss,

41
00:04:12,300 --> 00:04:17,600
even when their RdRP sequences have diverged dramatically from known viruses.

42
00:04:17,600 --> 00:04:23,700
This approach allowed LucaProt to scan through 10,487 metatranscriptomes,

43
00:04:23,700 --> 00:04:29,300
datasets representing genetic material collected from diverse ecosystems around the world.

44
00:04:29,300 --> 00:04:34,200
In total, LucaProt analyzed 51 terabytes of sequencing data,

45
00:04:34,200 --> 00:04:44,900
ultimately identifying a staggering 161,979 putative RNA virus species and 180 RNA virus supergroups.

46
00:04:44,900 --> 00:04:52,700
To put this in perspective, this discovery represents one of the largest expansions of the known RNA virosphere in history.

47
00:04:52,700 --> 00:04:56,400
Now, let's break down how LucaProt actually works.

48
00:04:56,400 --> 00:05:02,200
At its core, LucaProt uses a deep learning model based on the transformer architecture.

49
00:05:02,200 --> 00:05:08,100
What makes the transformer architecture so powerful is its ability to process sequences of varying lengths

50
00:05:08,100 --> 00:05:12,600
while capturing both local and long-range dependencies within the data.

51
00:05:12,600 --> 00:05:16,100
This makes it ideal for biological sequence analysis,

52
00:05:16,100 --> 00:05:21,700
where relationships between distant parts of a sequence can be crucial for understanding function.

53
00:05:21,700 --> 00:05:25,300
LucaProt doesn't stop at analyzing the sequence alone.

54
00:05:25,300 --> 00:05:31,200
It integrates the predicted 3D structure of proteins, using the AI model ESMFold.

55
00:05:31,200 --> 00:05:35,900
ESMFold predicts how a protein will fold into its three-dimensional shape,

56
00:05:35,900 --> 00:05:40,700
which is critical for understanding how that protein functions in the real world.

57
00:05:40,700 --> 00:05:46,200
This structural prediction is key because even if two protein sequences are highly divergent,

58
00:05:46,200 --> 00:05:50,600
their three-dimensional structures might still be similar and perform similar functions.

59
00:05:50,600 --> 00:05:57,700
For example, let's say you have two RNA viruses whose RdRP sequences look almost nothing alike.

60
00:05:57,700 --> 00:06:01,400
Traditional models would miss the connection, but LucaProt,

61
00:06:01,400 --> 00:06:04,900
thanks to its deep learning model and structural predictions,

62
00:06:04,900 --> 00:06:11,200
can recognize that both sequences fold into similar structures and therefore likely perform the same function.

63
00:06:11,200 --> 00:06:16,500
This allows LucaProt to identify highly divergent viruses that have evolved to the point

64
00:06:16,500 --> 00:06:20,900
where their sequences are barely recognizable compared to known examples.

65
00:06:20,900 --> 00:06:27,200
In fact, LucaProt was able to identify 70,000 novel viruses in just this way.

66
00:06:27,200 --> 00:06:30,800
Many of these viruses were found in extreme environments,

67
00:06:30,800 --> 00:06:34,800
such as salt lakes, hot springs, and hydrothermal vents,

68
00:06:34,800 --> 00:06:39,500
places where viral life has adapted to thrive under extreme conditions.

69
00:06:39,500 --> 00:06:44,200
Some of the viruses LucaProt discovered were unlike anything seen before,

70
00:06:44,200 --> 00:06:49,700
with genomes that stretched up to 47,250 nucleotides in length,

71
00:06:49,700 --> 00:06:53,600
far exceeding the typical size of known RNA viruses.

72
00:06:53,600 --> 00:06:57,700
To give you a better sense of the power of LucaProt's AI-driven approach,

73
00:06:57,700 --> 00:07:01,800
imagine trying to identify distant relatives at a family reunion.

74
00:07:01,800 --> 00:07:07,000
Traditional viral discovery methods would be like searching for familiar faces in the crowd.

75
00:07:07,000 --> 00:07:12,400
If someone doesn't closely resemble a family member you already know, you'd miss them completely.

76
00:07:12,400 --> 00:07:18,800
LucaProt, however, is like having a tool that can identify even the most distantly related family members

77
00:07:18,800 --> 00:07:22,200
by looking at more than just surface-level features.

78
00:07:22,200 --> 00:07:25,100
It can find commonalities at a deeper level,

79
00:07:25,100 --> 00:07:28,600
whether they share a similar bone structure or genetic markers,

80
00:07:28,600 --> 00:07:31,500
even if their appearance has changed dramatically.

81
00:07:31,500 --> 00:07:34,700
This ability to detect hidden patterns and make connections

82
00:07:34,700 --> 00:07:40,400
that humans and traditional tools might miss is exactly what makes LucaProt so revolutionary.

83
00:07:40,400 --> 00:07:44,000
It's a reminder of the power of AI not just to analyze data,

84
00:07:44,000 --> 00:07:48,100
but to discover new realms of knowledge that were previously inaccessible.

85
00:07:48,100 --> 00:07:53,400
LucaProt's discoveries don't just expand our understanding of the RNA virosphere.

86
00:07:53,400 --> 00:07:58,100
They point to a future where AI is at the forefront of biological discovery.

87
00:07:58,100 --> 00:08:00,700
The implications of this work are profound.

88
00:08:00,700 --> 00:08:05,200
We're now able to explore viral diversity at a scale never before possible,

89
00:08:05,200 --> 00:08:11,800
opening up new avenues for research into viral ecology, evolution, and host-virus interactions.

90
00:08:11,800 --> 00:08:16,200
One of the most exciting applications of this research is in understanding the role

91
00:08:16,200 --> 00:08:19,500
these newly discovered viruses play in ecosystems.

92
00:08:19,500 --> 00:08:25,400
Are they infecting plants, animals, or even microorganisms we haven't yet identified?

93
00:08:25,400 --> 00:08:28,300
How do they influence the environments they inhabit?

94
00:08:28,300 --> 00:08:32,500
These are questions that LucaProt's discoveries are just beginning to address.

95
00:08:32,500 --> 00:08:36,000
And as AI models like LucaProt continue to improve,

96
00:08:36,000 --> 00:08:41,800
we will gain more detailed insights into the complex web of life shaped by RNA viruses.

97
00:08:41,800 --> 00:08:45,800
However, this work also highlights a number of open questions.

98
00:08:45,800 --> 00:08:50,900
For instance, while LucaProt identified tens of thousands of novel viruses,

99
00:08:50,900 --> 00:08:56,200
there remains a significant gap in our understanding of the hosts these viruses infect.

100
00:08:56,200 --> 00:09:02,800
No RNA virus has been clearly shown to infect archaea, for example, leaving us with the question:

101
00:09:02,800 --> 00:09:07,500
are there RNA viruses infecting these organisms, or are they immune?

102
00:09:07,500 --> 00:09:11,500
These are the kinds of questions that future research will need to answer.

103
00:09:11,500 --> 00:09:16,500
While LucaProt has made incredible strides, we are still only scratching the surface

104
00:09:16,500 --> 00:09:19,500
of what AI can uncover in the biological world.

105
00:09:19,500 --> 00:09:25,800
Let's take a moment to step back and see where this work fits within the larger context of AI-driven discoveries.

106
00:09:25,800 --> 00:09:29,800
Earlier this year, we celebrated AI's contribution to science

107
00:09:29,800 --> 00:09:33,700
when the Nobel Prize in Chemistry was awarded for AlphaFold,

108
00:09:33,700 --> 00:09:39,000
a deep learning model that predicts the structure of proteins with unprecedented accuracy.

109
00:09:39,000 --> 00:09:42,800
LucaProt represents the next phase of this AI evolution.

110
00:09:42,800 --> 00:09:45,700
While AlphaFold focuses on known proteins,

111
00:09:45,700 --> 00:09:51,800
LucaProt pushes the boundaries even further by discovering completely unknown RNA viruses,

112
00:09:51,800 --> 00:09:54,500
many of which have never been seen before.

113
00:09:54,500 --> 00:09:58,200
In a way, LucaProt builds on AlphaFold's legacy,

114
00:09:58,200 --> 00:10:03,700
using AI to explore the hidden corners of biology that have long eluded scientists.

115
00:10:03,700 --> 00:10:09,800
Both LucaProt and AlphaFold highlight the incredible power of AI not just to solve existing problems

116
00:10:09,800 --> 00:10:14,500
but to uncover entire realms of life that were previously beyond our reach.

117
00:10:14,500 --> 00:10:17,500
AI is no longer just an analytical tool;

118
00:10:17,500 --> 00:10:21,400
it's becoming an indispensable partner in biological discovery.

119
00:10:21,400 --> 00:10:28,300
The RNA viruses uncovered by LucaProt are not simply an addition to our database of known life forms.

120
00:10:28,300 --> 00:10:31,200
They represent a new frontier in virology,

121
00:10:31,200 --> 00:10:36,900
one that could reshape how we think about viral evolution, ecology, and even human health.

122
00:10:36,900 --> 00:10:38,400
So, what's next?

123
00:10:38,400 --> 00:10:44,100
LucaProt's discovery of tens of thousands of new RNA viruses is only the beginning.

124
00:10:44,100 --> 00:10:50,500
AI is poised to play an even bigger role in biology as these tools become more refined and more powerful.

125
00:10:50,500 --> 00:10:54,300
With AI's help, we're not just uncovering hidden viruses,

126
00:10:54,300 --> 00:10:58,900
we're gaining new insights into how ecosystems function, how life evolved,

127
00:10:58,900 --> 00:11:03,600
and even how viruses might be harnessed for future biomedical applications.

128
00:11:03,600 --> 00:11:10,800
The sheer scale and speed at which AI can process and analyze biological data is unlike anything we've seen before.

129
00:11:10,800 --> 00:11:14,000
As models like LucaProt continue to improve,

130
00:11:14,000 --> 00:11:18,400
we will unlock even more secrets hidden in the biological dark matter,

131
00:11:18,400 --> 00:11:21,400
transforming our understanding of life on Earth.

132
00:11:21,400 --> 00:11:25,000
Thank you for tuning into this episode of Voices of Tomorrow,

133
00:11:25,000 --> 00:11:32,400
where we explored the vast, hidden virosphere and how AI is unlocking secrets that have eluded humanity for centuries.

134
00:11:32,400 --> 00:11:37,300
As we've seen, the boundaries of science are expanding faster than ever before,

135
00:11:37,300 --> 00:11:41,900
and with the help of AI, we are rewriting the very code of life itself.

136
00:11:41,900 --> 00:11:43,700
But this is just the beginning.

137
00:11:43,700 --> 00:11:50,900
The discoveries we've discussed today are a glimpse into a future where AI will lead us to answers we didn't even know to ask.

138
00:11:50,900 --> 00:11:59,300
So, whether you're a researcher on the front lines of innovation, or someone fascinated by the potential of technology to transform our world,

139
00:11:59,300 --> 00:12:01,200
I invite you to stay connected.

140
00:12:01,200 --> 00:12:07,900
Be sure to subscribe, share your thoughts, and let us know what excites you most about the future of AI.

141
00:12:07,900 --> 00:12:15,800
In the episodes to come, we'll continue our deep dive into the cutting-edge intersections of technology, biology, and beyond.

142
00:12:15,800 --> 00:12:19,300
Will we explore the mysterious depths of quantum computing?

143
00:12:19,300 --> 00:12:23,600
Or perhaps investigate the next wave of breakthroughs in AI-driven healthcare?

144
00:12:23,600 --> 00:12:26,500
Whatever it may be, I can promise you this.

145
00:12:26,500 --> 00:12:33,200
The discoveries we will uncover together will challenge what we thought we knew and inspire new visions of what's possible.

146
00:12:33,200 --> 00:12:49,900
So, join us again next time on Voices of Tomorrow, where today's voices of innovation are shaping the future, one breakthrough at a time.

