WEBVTT

00:00:00.000 --> 00:00:02.540
Welcome back to the deep dive, everybody. Today,

00:00:02.600 --> 00:00:07.559
we're diving into a pretty complex topic. Multiscale

00:00:07.559 --> 00:00:10.480
modeling. Yeah. You know, it's kind of like trying

00:00:10.480 --> 00:00:12.679
to understand, let's say, a hurricane, right?

00:00:12.699 --> 00:00:15.279
OK. You could look at the swirling clouds from

00:00:15.279 --> 00:00:19.179
a satellite. But to really predict its path and

00:00:19.179 --> 00:00:21.859
gauge its power, you also have to understand, like,

00:00:21.859 --> 00:00:24.000
those tiny little air currents and pressure

00:00:24.000 --> 00:00:26.600
systems. That's kind of what multiscale modeling

00:00:26.600 --> 00:00:28.699
is all about. It's about using all these different

00:00:28.699 --> 00:00:30.859
lenses, right? Right. To try and understand this

00:00:30.859 --> 00:00:33.539
complex system at different levels of detail.

00:00:33.579 --> 00:00:36.159
And this is important for, well, tackling a lot

00:00:36.159 --> 00:00:38.939
of different things like biological systems and

00:00:38.939 --> 00:00:40.920
even designing materials. Yeah. And especially

00:00:40.920 --> 00:00:43.079
biological systems are so hierarchical, right?

00:00:43.079 --> 00:00:47.000
You go from the quantum mechanics of the atom

00:00:47.000 --> 00:00:48.939
and then the molecules and the cells and the

00:00:48.939 --> 00:00:51.439
tissues and the organs. And to really understand

00:00:51.439 --> 00:00:53.939
how life works, you have to connect those dots

00:00:53.939 --> 00:00:56.399
between all those different levels. Yeah, absolutely.

00:00:56.560 --> 00:00:58.939
And that's what multiscale modeling tries to

00:00:58.939 --> 00:01:01.880
do. And this is helpful for all kinds of applications

00:01:01.880 --> 00:01:07.140
in biology, from how enzymes work to ecosystems.

00:01:07.540 --> 00:01:10.079
It's really amazing. It's like, if you're trying

00:01:10.079 --> 00:01:13.040
to appreciate a painting and you only look at

00:01:13.040 --> 00:01:14.819
one brushstroke, you'd miss the big picture.

00:01:14.989 --> 00:01:17.810
Totally. And this isn't just limited to biology.

00:01:18.290 --> 00:01:19.969
Right. I mean, chemical engineers use this a

00:01:19.969 --> 00:01:23.170
lot to design catalysts or to optimize industrial

00:01:23.170 --> 00:01:26.069
processes. So it really is this powerful tool,

00:01:26.290 --> 00:01:28.790
this interdisciplinary approach that is really

00:01:28.790 --> 00:01:32.359
fascinating. And a lot of the most important

00:01:32.359 --> 00:01:36.459
processes in biology hinge on these complex interactions

00:01:36.459 --> 00:01:41.000
between proteins, lipids, nucleic acids. These

00:01:41.000 --> 00:01:43.659
interactions drive everything from how our cells

00:01:43.659 --> 00:01:46.819
communicate to how genes are processed. And modeling

00:01:46.819 --> 00:01:49.359
these is a huge challenge because it occurs at

00:01:49.359 --> 00:01:51.780
such a huge range of scales. I mean, we're talking

00:01:51.780 --> 00:01:55.620
about events that occur at fractions of a nanometer

00:01:55.620 --> 00:01:58.859
in femtoseconds all the way to micrometer-sized

00:01:58.859 --> 00:02:01.140
structures that change over seconds or even longer.

00:02:01.760 --> 00:02:03.980
OK, so let's unpack that a little bit. Sure.

00:02:04.159 --> 00:02:06.260
You're saying that these traditional computer

00:02:06.260 --> 00:02:09.000
simulations really have a hard time capturing

00:02:09.000 --> 00:02:11.219
those tiny little interactions and these slow

00:02:11.219 --> 00:02:14.340
changes simultaneously. It's kind of like trying

00:02:14.340 --> 00:02:17.240
to photograph a bullet train and a snail with

00:02:17.240 --> 00:02:19.400
the same camera settings. Exactly. You'd probably

00:02:19.400 --> 00:02:22.379
miss one or the other. Totally. But this is where

00:02:22.379 --> 00:02:24.639
machine learning comes in. Yeah, exactly. This

00:02:24.639 --> 00:02:26.400
is where things get really, really interesting.

00:02:26.500 --> 00:02:29.039
That's right. So that leads us to what we're

00:02:29.039 --> 00:02:30.560
talking about today, this new infrastructure

00:02:30.560 --> 00:02:34.259
called MuMMI, the Multiscale Machine-Learned Modeling

00:02:34.259 --> 00:02:38.159
Infrastructure. And our mission, really, in this

00:02:38.159 --> 00:02:41.879
deep dive is to explore how MumMI is tackling

00:02:41.879 --> 00:02:46.259
these challenges, specifically how proteins interact

00:02:46.259 --> 00:02:49.379
with the membranes of cells, using the RAS-RAF

00:02:49.379 --> 00:02:52.719
pathway as a key example. And this pathway, of

00:02:52.719 --> 00:02:54.599
course, is really important because it regulates

00:02:54.599 --> 00:02:57.319
cell growth. And when things go wrong with it,

00:02:57.129 --> 00:03:00.430
due to mutations or other things, it can lead

00:03:00.430 --> 00:03:02.889
to cancer. Right. So let's start with the challenges

00:03:02.889 --> 00:03:05.909
then of bridging these incredible scales, these

00:03:05.909 --> 00:03:07.810
time scales, these size scales. You mentioned

00:03:07.810 --> 00:03:10.550
molecular dynamics simulations. And these are

00:03:10.550 --> 00:03:13.069
crucial, but they have limitations. They do.

00:03:13.210 --> 00:03:16.250
They do. Molecular dynamics simulations are really

00:03:16.250 --> 00:03:19.289
amazing. They allow us to see these movements

00:03:19.289 --> 00:03:23.870
of atoms and molecules. But the problem is to

00:03:24.139 --> 00:03:27.060
accurately represent the forces and interactions

00:03:27.060 --> 00:03:29.719
across those spatial scales and across those

00:03:29.719 --> 00:03:33.879
time scales from femtoseconds to seconds is computationally

00:03:33.879 --> 00:03:36.759
a huge, huge challenge. Yeah. I mean, it's like

00:03:36.759 --> 00:03:39.539
trying to map every single ant in a colony, but

00:03:39.539 --> 00:03:41.560
also track how the whole colony is moving all

00:03:41.560 --> 00:03:43.960
at the same time. That's a great analogy, yeah.

00:03:44.080 --> 00:03:46.449
It's kind of mind-boggling. It is. And many important

00:03:46.449 --> 00:03:49.090
processes, like when proteins assemble or the

00:03:49.090 --> 00:03:50.710
rearrangements of membranes, involve these huge

00:03:50.710 --> 00:03:53.650
structures that span many orders of magnitude.

00:03:54.750 --> 00:03:57.689
And traditional MD simulations struggle to capture

00:03:57.689 --> 00:04:00.889
both the rapid little vibrations and the slower

00:04:00.889 --> 00:04:02.569
movements. At the same time. At the same time.

00:04:02.610 --> 00:04:04.710
And then it just becomes computationally way

00:04:04.710 --> 00:04:07.770
too expensive. So you're forced to choose

00:04:07.770 --> 00:04:10.430
between this incredible level of detail and this

00:04:10.430 --> 00:04:12.669
broader perspective. Yeah, that's right. And

00:04:12.669 --> 00:04:16.370
what's missing is a truly integrated framework

00:04:16.370 --> 00:04:18.470
that can handle these different levels at the

00:04:18.470 --> 00:04:21.810
same time. OK, right. So you can have an atomistic

00:04:21.810 --> 00:04:26.149
simulation that tells you how two molecules bind,

00:04:26.149 --> 00:04:28.509
but it won't tell you how often that happens

00:04:28.509 --> 00:04:30.689
in a cell, which is what you need for those larger

00:04:30.689 --> 00:04:33.129
scale models. Right. And then there's this time

00:04:33.129 --> 00:04:35.910
scale difference. Huge. Yeah. I mean the vibrations

00:04:35.910 --> 00:04:39.430
in a molecule happen so fast compared to a membrane

00:04:39.430 --> 00:04:42.449
changing shape. Exactly. So it's not just running

00:04:42.449 --> 00:04:44.449
simulations at different scales separately. Right.

00:04:44.569 --> 00:04:46.509
You have to somehow transfer that information

00:04:46.509 --> 00:04:48.850
from one level to another. Yes. So it's like

00:04:48.850 --> 00:04:51.089
translating an engineering blueprint into a simple

00:04:51.089 --> 00:04:54.110
sketch. Yeah. Yeah. But you can't lose that structural

00:04:54.110 --> 00:04:56.209
information. Exactly. That's right. And even

00:04:56.209 --> 00:04:59.129
within methods that simplify, like coarse-graining

00:04:59.129 --> 00:05:02.269
methods, bridging different time scales is still

00:05:02.269 --> 00:05:06.120
a big, big problem. To create a truly robust

00:05:06.120 --> 00:05:09.839
workflow, we need to not just overcome the computational

00:05:09.839 --> 00:05:12.439
limitations, but also figure out how to transfer

00:05:12.439 --> 00:05:15.259
this information, integrate it. Okay, so these

00:05:15.259 --> 00:05:18.519
are some pretty big hurdles, but you mentioned

00:05:18.519 --> 00:05:21.709
machine learning could help. Yes. How so? Yeah,

00:05:22.089 --> 00:05:23.949
so this is where machine learning really comes

00:05:23.949 --> 00:05:27.709
in and helps us bridge those gaps. And machine

00:05:27.709 --> 00:05:29.790
learning is great at learning complex patterns

00:05:29.790 --> 00:05:32.829
from data. So it can really help connect different

00:05:32.829 --> 00:05:36.230
scales in this multiscale framework. OK, so

00:05:36.230 --> 00:05:39.100
how does that work? Yeah, so think of it as like

00:05:39.100 --> 00:05:40.959
feedback loops between these different levels,

00:05:41.139 --> 00:05:43.199
okay? So machine learning can help with something

00:05:43.199 --> 00:05:45.879
called forward coupling. This is where you use

00:05:45.879 --> 00:05:48.540
information from a larger scale to improve things

00:05:48.540 --> 00:05:51.519
at the smaller scale. So for example, imagine

00:05:51.519 --> 00:05:54.060
a model that shows specific regions of the cell

00:05:54.060 --> 00:05:56.959
membrane changing a lot. That information can

00:05:56.959 --> 00:05:59.740
then be used to focus a smaller, more detailed

00:05:59.740 --> 00:06:02.300
simulation on that exact region. So this is like

00:06:02.300 --> 00:06:04.319
the bigger simulations guiding the smaller one.

00:06:04.480 --> 00:06:08.279
Exactly, exactly. Then there's backward coupling, where the

00:06:08.279 --> 00:06:11.779
detailed simulations give feedback to improve

00:06:11.779 --> 00:06:15.019
the larger scale model. So for example, a detailed

00:06:15.019 --> 00:06:18.240
simulation might reveal some very specific atomic

00:06:18.240 --> 00:06:21.560
interaction. Then machine learning can use that

00:06:21.560 --> 00:06:23.920
to adjust the parameters of the larger model

00:06:23.920 --> 00:06:26.180
so it becomes more accurate. So you have this

00:06:26.180 --> 00:06:28.480
back and forth and things get more accurate.

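NOTE
A toy sketch of the backward-coupling loop described above, assuming a coarse model with one hypothetical parameter, an effective binding free energy dG in units of kT. Random draws stand in for the detailed simulations, and every name here is illustrative rather than MuMMI's actual code or API.
```python
import math
import random
def bound_fraction(dg_kt: float) -> float:
    """Coarse model: two-state bound fraction for binding free energy dG (in kT)."""
    return 1.0 / (1.0 + math.exp(dg_kt))
def fine_scale_samples(true_dg_kt: float, n: int, rng: random.Random) -> list[bool]:
    """Hypothetical stand-in for detailed simulations: True means a bound snapshot."""
    p = bound_fraction(true_dg_kt)
    return [rng.random() < p for _ in range(n)]
def refit_dg(samples: list[bool]) -> float:
    """Backward coupling: invert the coarse model to match the observed bound fraction."""
    f = sum(samples) / len(samples)
    f = min(max(f, 1e-6), 1.0 - 1e-6)  # keep the logarithm finite
    return math.log((1.0 - f) / f)
rng = random.Random(0)
true_dg = -1.5   # hidden value that only the detailed simulations "see"
dg_coarse = 0.0  # initial guess used by the coarse model
pooled: list[bool] = []
for round_idx in range(3):
    pooled.extend(fine_scale_samples(true_dg, 2000, rng))
    dg_coarse = refit_dg(pooled)  # refine the coarse parameter with all data so far
    print(round_idx, round(dg_coarse, 2))
```
Pooling samples across rounds is what tightens the estimate, a simple analogue of the back-and-forth refinement in the conversation.
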
00:06:28.579 --> 00:06:31.040
So it's this iterative process. Exactly. Can

00:06:31.040 --> 00:06:33.980
you give me some concrete examples of how ML

00:06:33.980 --> 00:06:36.860
is being used? Yeah, yeah, sure. So one example

00:06:36.860 --> 00:06:40.459
is in QM/MM methods, where you combine quantum

00:06:40.459 --> 00:06:43.120
mechanics with molecular mechanics. Machine learning

00:06:43.120 --> 00:06:45.480
can be used to handle these very complex interactions

00:06:45.480 --> 00:06:49.220
and reduce the computational cost. Another example

00:06:49.220 --> 00:06:52.600
is in material science, where you can train machine

00:06:52.600 --> 00:06:55.259
learning on simulations to predict how materials

00:06:55.259 --> 00:06:58.319
might fail or classify defects. And then you

00:06:58.319 --> 00:07:01.079
can use that in these larger continuum models

00:07:01.079 --> 00:07:05.540
to go from very small to very large. Really connecting

00:07:05.540 --> 00:07:07.540
those scales. Exactly. And then you also have

00:07:07.540 --> 00:07:10.779
things like ML potentials. These are ways to

00:07:10.779 --> 00:07:14.019
predict energy and forces between atoms with

00:07:14.019 --> 00:07:16.560
almost the accuracy of quantum mechanics, but

00:07:16.560 --> 00:07:20.079
much, much faster. Wow. And then ML is also used

00:07:20.079 --> 00:07:22.680
to go from these simplified representations back

00:07:22.680 --> 00:07:25.620
to the full atomic detail, like enhancing a blurry

00:07:25.620 --> 00:07:28.199
picture. That's a great analogy. Yeah. And it's

00:07:28.199 --> 00:07:31.139
not just simulations. It's also integrating it

00:07:31.139 --> 00:07:34.350
with experiments, with biophysical methods. So

00:07:34.350 --> 00:07:36.829
ML is helping us overcome those computational

00:07:36.829 --> 00:07:39.810
limitations, making things faster, focusing on

00:07:39.810 --> 00:07:42.610
the important parts. And these deep learning

00:07:42.610 --> 00:07:44.649
advancements are really making this accessible

00:07:44.649 --> 00:07:46.490
to more researchers. That's really exciting.

00:07:46.790 --> 00:07:49.930
So where does MuMMI fit into this? Yeah, so MuMMI

00:07:49.930 --> 00:07:52.910
is a platform designed specifically to leverage

00:07:52.910 --> 00:07:55.649
machine learning for multiscale modeling. And

00:07:55.649 --> 00:07:59.490
it focuses on those intricate biomolecular systems.

00:07:59.930 --> 00:08:02.790
And it does this by really tightly coupling the

00:08:02.790 --> 00:08:04.850
different scales using machine learning. OK,

00:08:04.850 --> 00:08:08.250
so what makes MuMMI stand out? Yeah, so one

00:08:08.250 --> 00:08:10.970
key thing is its ability to create these huge

00:08:10.970 --> 00:08:13.769
ensembles of micro simulations. We're talking

00:08:13.769 --> 00:08:16.829
tens of thousands even, all guided by machine

00:08:16.829 --> 00:08:20.110
learning. It also uses this thing called a

00:08:20.110 --> 00:08:23.889
machine-learned latent space, which is a way to simplify

00:08:23.889 --> 00:08:26.889
complex data. So it's like taking all this complex

00:08:26.889 --> 00:08:29.060
information and boiling it down to its essence.

00:08:29.740 --> 00:08:31.920
And that allows us to see patterns and connections

00:08:31.920 --> 00:08:34.379
we couldn't see otherwise. And it focuses initially

00:08:34.379 --> 00:08:37.159
on these interactions between RAS-RAF proteins

00:08:37.159 --> 00:08:39.679
and the cell membrane. And as we said, this pathway

00:08:39.679 --> 00:08:42.139
is crucial for cell growth. And when it goes

00:08:42.139 --> 00:08:45.899
wrong, it can cause cancer. So MuMMI is trying

00:08:45.899 --> 00:08:48.360
to understand how these proteins interact with

00:08:48.360 --> 00:08:51.240
the membrane at a molecular level. And it can

00:08:51.240 --> 00:08:54.340
identify these lipid-protein fingerprints, which

00:08:54.340 --> 00:08:56.840
are these patterns of lipid molecules that can

00:08:56.840 --> 00:08:59.240
affect how these proteins work. Interesting.

00:08:59.419 --> 00:09:01.899
OK, so tell me more about this three-scale architecture

00:09:01.899 --> 00:09:05.779
that MuMMI uses. Yeah, so MuMMI has these three

00:09:05.779 --> 00:09:08.519
levels to bridge these gaps in space and time.

00:09:08.700 --> 00:09:11.720
The first level is the macro scale. OK. And it

00:09:11.720 --> 00:09:13.960
uses something called dynamic density functional

00:09:13.960 --> 00:09:18.730
theory, or DDFT. OK. And this allows us to simulate

00:09:18.730 --> 00:09:22.110
milliseconds of time over a fairly large area,

00:09:22.269 --> 00:09:24.649
like one square micrometer of the membrane. So

00:09:24.649 --> 00:09:27.289
it's the broad view, right? Exactly. Exactly.

00:09:27.429 --> 00:09:29.789
It doesn't show every atom, but it can capture

00:09:29.789 --> 00:09:32.350
the general movement of the membrane and how

00:09:32.350 --> 00:09:35.690
proteins like RAS and RAF generally prefer to

00:09:35.690 --> 00:09:38.149
sit on the membrane. OK, so that's the big picture.

00:09:38.149 --> 00:09:39.980
Right. And then we zoom in. Right, so then you

00:09:39.980 --> 00:09:43.419
have the coarse -grained or CG scale. OK. And

00:09:43.419 --> 00:09:46.919
this uses the Martini bead model. And in this

00:09:46.919 --> 00:09:48.960
model, groups of atoms are represented as single

00:09:48.960 --> 00:09:51.580
beads, so we can simulate larger systems for

00:09:51.580 --> 00:09:55.279
longer. And this level focuses on smaller patches

00:09:55.279 --> 00:09:58.700
of the membrane, maybe 30 by 30 nanometers, areas

00:09:58.700 --> 00:10:02.200
that the macro scale has identified as interesting.

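NOTE
A minimal sketch of the bead idea, assuming a center-of-mass mapping: Martini typically maps about four heavy atoms onto one bead, and the 8-atom fragment and its grouping below are hypothetical. Real Martini mappings are defined per molecule type and are not reproduced here.
```python
from dataclasses import dataclass
@dataclass
class Atom:
    mass: float
    pos: tuple[float, float, float]
def map_to_beads(atoms: list[Atom], groups: list[list[int]]) -> list[tuple[float, ...]]:
    """Place one bead at the center of mass of each group of atom indices."""
    beads = []
    for group in groups:
        total = sum(atoms[i].mass for i in group)
        com = tuple(
            sum(atoms[i].mass * atoms[i].pos[k] for i in group) / total for k in range(3)
        )
        beads.append(com)
    return beads
# Hypothetical linear fragment of eight carbons spaced 1 unit apart, mapped 4-to-1.
atoms = [Atom(12.0, (float(i), 0.0, 0.0)) for i in range(8)]
groups = [[0, 1, 2, 3], [4, 5, 6, 7]]
print(map_to_beads(atoms, groups))  # [(1.5, 0.0, 0.0), (5.5, 0.0, 0.0)]
```
Two beads now stand in for eight atoms, which is one reason the coarse-grained scale can reach larger systems and longer times.
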
00:10:02.340 --> 00:10:05.639
OK, so the action spots. Exactly, yeah. And these

00:10:05.639 --> 00:10:08.970
patches often have proteins like RAS or the

00:10:08.970 --> 00:10:12.029
RAS-RBDCRD complex, where RBDCRD is a fragment of RAF.

00:10:12.730 --> 00:10:15.110
And we use the Martini force field, which gives

00:10:15.110 --> 00:10:17.529
the rules for how these beads interact. And we

00:10:17.529 --> 00:10:20.830
can run these on GPUs, and we can get about a

00:10:20.830 --> 00:10:23.809
microsecond per day on a single GPU. And this

00:10:23.809 --> 00:10:26.789
gives us a good view of how proteins change their

00:10:26.789 --> 00:10:28.929
shape and interact with each other. OK, so we've

00:10:28.929 --> 00:10:30.610
got the big picture. We've got some more local

00:10:30.610 --> 00:10:34.129
details. What about the really, really fine details?

00:10:34.230 --> 00:10:36.710
Right, so then you go to the all-atom, or AA,

00:10:36.970 --> 00:10:39.049
level, and now we're back to every single atom

00:10:39.049 --> 00:10:42.269
being represented. We use the CHARMM36 force

00:10:42.269 --> 00:10:44.629
field, which is very detailed. And this allows

00:10:44.629 --> 00:10:48.370
us to see those very specific interactions between

00:10:48.370 --> 00:10:51.350
lipids and the amino acids in the protein and

00:10:51.350 --> 00:10:53.889
how those lipids might affect the protein structure.

00:10:54.129 --> 00:10:56.250
But this must be very computationally expensive.

00:10:56.549 --> 00:10:59.470
It is. It is. So we take snapshots from the CG

00:10:59.470 --> 00:11:02.370
simulations and we convert them back to all atom,

00:11:02.450 --> 00:11:05.529
a process called backmapping. And then we use

00:11:05.529 --> 00:11:08.710
energy minimization to kind of relax the structure.

00:11:09.009 --> 00:11:11.429
And we run these simulations using things like

00:11:11.429 --> 00:11:14.789
AMBER or GROMACS. And these are much slower.

00:11:14.929 --> 00:11:18.950
We're talking maybe 14 nanoseconds per day on a GPU. But

00:11:18.950 --> 00:11:21.769
it gives us that really crucial atomic detail

00:11:21.769 --> 00:11:25.210
that we can't get otherwise. And so these transitions

00:11:25.210 --> 00:11:28.149
between scales, are there tools that help you

00:11:28.149 --> 00:11:31.230
with that? Yes. Yes. MuMMI has tools to help

00:11:31.230 --> 00:11:33.549
move between these different levels. OK. For

00:11:33.549 --> 00:11:35.870
example, there's a tool called Create Sims that

00:11:35.870 --> 00:11:38.029
converts the macro scale to the coarse-grained

00:11:38.029 --> 00:11:41.009
scale. And then for going from CG to AA, there's

00:11:41.009 --> 00:11:43.889
a backmapping protocol to make sure we get an

00:11:43.889 --> 00:11:46.250
accurate atomic structure. OK. So all these tools

00:11:46.250 --> 00:11:48.990
help us move between these different levels consistently.

00:11:49.330 --> 00:11:52.340
OK, so we've got these three scales, each with

00:11:52.340 --> 00:11:54.679
their own methods. How does machine learning

00:11:54.679 --> 00:11:57.600
tie into all of this? Yeah, so machine learning

00:11:57.600 --> 00:12:00.840
is not just an add-on here. It's really essential

00:12:00.840 --> 00:12:03.740
to MuMMI's ability to connect these scales and

00:12:03.740 --> 00:12:06.620
to manage all this data. And it does this by

00:12:06.620 --> 00:12:10.559
coupling these scales, both macro to CG and also

00:12:10.750 --> 00:12:14.269
sort of indirectly CG to AA through choosing

00:12:14.269 --> 00:12:16.570
those starting configurations. OK. So it's really

00:12:16.570 --> 00:12:19.330
the glue that holds everything together. So let's

00:12:19.330 --> 00:12:21.769
talk about that coupling in more detail. How

00:12:21.769 --> 00:12:24.830
does ML do this forward and backward flow? Yeah.

00:12:24.870 --> 00:12:28.110
So in the forward direction, the ML models are

00:12:28.110 --> 00:12:31.029
trained on the macro scale data. And they're

00:12:31.029 --> 00:12:32.830
trained to pick up the interesting parts, the

00:12:32.830 --> 00:12:34.429
parts we want to look at in more detail. OK.

00:12:34.809 --> 00:12:38.970
So let's say a protein starts to cluster, or it

00:12:38.970 --> 00:12:42.169
really likes a specific lipid or the membrane

00:12:42.169 --> 00:12:45.250
changes shape in a specific way. The ML model

00:12:45.250 --> 00:12:47.950
will flag that, and MuMMI will then say, okay, let's

00:12:47.950 --> 00:12:50.409
look at that region with the CG model. So it's

00:12:50.409 --> 00:12:52.590
like a smart filter. Yeah, yeah, like a scout

00:12:52.590 --> 00:12:54.330
that says, OK, look here. This is where the action

00:12:54.330 --> 00:12:56.990
is. And then in the backward direction, machine

00:12:56.990 --> 00:12:59.710
learning helps refine those larger models based

00:12:59.710 --> 00:13:01.809
on what we learn from the smaller simulations.

00:13:01.970 --> 00:13:05.970
OK. So as the CG simulations run, we analyze

00:13:05.970 --> 00:13:09.190
in real time how those proteins and lipids are

00:13:09.190 --> 00:13:12.090
interacting. OK. And that information can then

00:13:12.090 --> 00:13:14.950
be used to adjust the macro model, make it more

00:13:14.950 --> 00:13:17.990
accurate. Similarly, the information from the

00:13:17.990 --> 00:13:20.350
all -atom simulations can be used to improve

00:13:20.350 --> 00:13:22.649
the CG model. So it's this constant learning

00:13:22.649 --> 00:13:26.129
and refining. Exactly, exactly. It's fascinating.

00:13:26.350 --> 00:13:28.769
And beyond that, machine learning also helps

00:13:28.769 --> 00:13:31.929
with how MuMMI manages all these simulations,

00:13:32.149 --> 00:13:34.850
how it decides which simulations to run. Okay.

00:13:34.909 --> 00:13:37.629
It uses something called dynamic importance sampling.

00:13:38.110 --> 00:13:40.470
Okay, so what does that mean? What makes a simulation

00:13:40.470 --> 00:13:43.419
important? So it really depends on what the scientist

00:13:43.419 --> 00:13:46.299
is looking for. OK. Right. So maybe it's a protein

00:13:46.299 --> 00:13:49.419
binding to a specific lipid, or proteins clustering

00:13:49.419 --> 00:13:52.220
together, or maybe it's the membrane changing

00:13:52.220 --> 00:13:54.980
shape. The machine learning model will look at

00:13:54.980 --> 00:13:57.039
all these configurations from the macro scale

00:13:57.039 --> 00:13:59.700
and say, OK, this one's interesting. This one's

00:13:59.700 --> 00:14:02.539
not so interesting. OK. And then MuMMI uses

00:14:02.539 --> 00:14:05.139
that to decide which regions to simulate in more

00:14:05.139 --> 00:14:08.039
detail. So it's like a priority system. Exactly,

00:14:08.240 --> 00:14:10.519
yeah. And this is really important because it

00:14:10.519 --> 00:14:14.120
helps us use those computational resources efficiently.

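NOTE
One way to picture dynamic importance sampling, sketched under assumptions: each candidate macro-scale patch has already been encoded as a latent vector, and "importance" is taken to be novelty, the distance to the nearest patch already simulated. This greedy farthest-point rule is an illustrative stand-in, not necessarily MuMMI's exact criterion, and the patch names and vectors are made up.
```python
import math
def select_novel(candidates: dict[str, list[float]], history: list[list[float]], k: int) -> list[str]:
    """Greedily pick up to k patches whose latent vectors are farthest from everything seen so far."""
    chosen: list[str] = []
    reference = list(history)  # latent vectors of patches already simulated
    pool = dict(candidates)
    while pool and len(chosen) < k:
        # Novelty of a candidate = distance to its nearest already-covered point.
        best = max(
            pool,
            key=lambda name: min((math.dist(pool[name], r) for r in reference), default=math.inf),
        )
        chosen.append(best)
        reference.append(pool.pop(best))  # a chosen patch now counts as covered
    return chosen
history = [[0.0, 0.0]]  # one region already simulated
candidates = {
    "patch_a": [0.1, 0.0],   # nearly identical to what was already run
    "patch_b": [3.0, 4.0],
    "patch_c": [3.0, 4.2],   # close to patch_b
    "patch_d": [-4.0, 3.0],
}
print(select_novel(candidates, history, 2))  # ['patch_c', 'patch_d']
```
Because each pick updates the reference set, near-duplicates such as patch_b are deprioritized once their neighbor has been chosen, which is the resource-saving effect described here.
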
00:14:14.460 --> 00:14:16.799
So we're not wasting time on things that aren't

00:14:16.799 --> 00:14:19.549
important. So this must require huge amounts

00:14:19.549 --> 00:14:22.350
of computing power. It does. It does. High-resolution,

00:14:22.690 --> 00:14:25.850
long-timescale models need a lot of power. Yeah.

00:14:26.570 --> 00:14:29.629
And MuMMI is designed to run on all kinds of

00:14:29.629 --> 00:14:32.149
high performance computing systems, from smaller

00:14:32.149 --> 00:14:35.110
clusters to supercomputers. Wow. It's been run

00:14:35.110 --> 00:14:37.690
on systems like Sierra at Lawrence Livermore,

00:14:37.909 --> 00:14:40.789
which has hundreds of thousands of CPU cores

00:14:40.789 --> 00:14:44.129
and thousands of GPUs. Wow. It's hard to even

00:14:44.129 --> 00:14:46.269
imagine that much power. How do you manage that?

00:14:46.450 --> 00:14:49.690
Yeah, so MuMMI is designed to use both CPUs

00:14:49.690 --> 00:14:53.230
and GPUs. OK. The macro scale model with those

00:14:53.230 --> 00:14:55.629
equations on a grid, it's really good for CPUs.

00:14:55.750 --> 00:14:58.690
And we can get almost a millisecond per day with

00:14:58.690 --> 00:15:02.129
thousands of cores. Wow, that's fast. And then

00:15:02.129 --> 00:15:05.330
the CG and AA simulations with all those particles,

00:15:05.370 --> 00:15:09.409
they're better for GPUs. OK. So CG can get a

00:15:09.409 --> 00:15:13.429
microsecond per day on one GPU, and AA about 14

00:15:13.429 --> 00:15:17.269
nanoseconds per day. So by using both, we can

00:15:17.269 --> 00:15:19.350
handle all those different scales. Right. And

00:15:19.350 --> 00:15:21.509
with thousands of simulations running, how do

00:15:21.509 --> 00:15:24.210
you keep track of everything? Yeah. So we have

00:15:24.210 --> 00:15:27.379
the Workflow Manager, or WM, and this is like

00:15:27.379 --> 00:15:30.120
the control center. It monitors resources, it

00:15:30.120 --> 00:15:32.460
starts jobs, it tracks progress, it restarts

00:15:32.460 --> 00:15:35.759
things that fail, and it uses these tools like

00:15:35.759 --> 00:15:38.860
Flux or Maestro to handle all the communication.

00:15:39.000 --> 00:15:40.480
So it's like the conductor of the orchestra.

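NOTE
A toy sketch of the workflow-manager duties just listed (launch jobs into free slots, track outcomes, restart failures), assuming a hypothetical run_job callable that reports success or failure. It does not use Flux or Maestro and is not MuMMI's WM; it only shows the bookkeeping pattern.
```python
import random
from collections import deque
from typing import Callable
def run_workflow(jobs: list[str], run_job: Callable[[str], bool], max_slots: int, max_attempts: int) -> tuple[list[str], list[str]]:
    """Run jobs in waves of at most max_slots, requeueing failures until max_attempts is reached."""
    queue = deque((job, 1) for job in jobs)  # (job name, attempt number)
    done: list[str] = []
    failed: list[str] = []
    while queue:
        wave = [queue.popleft() for _ in range(min(max_slots, len(queue)))]
        for job, attempt in wave:
            if run_job(job):
                done.append(job)
            elif attempt < max_attempts:
                queue.append((job, attempt + 1))  # restart the failed job in a later wave
            else:
                failed.append(job)  # give up and report it
    return done, failed
rng = random.Random(1)
def flaky_job(name: str) -> bool:
    """Hypothetical simulation that fails about 30% of the time."""
    return rng.random() > 0.3
jobs = [f"cg_patch_{i}" for i in range(10)]
done, failed = run_workflow(jobs, flaky_job, max_slots=4, max_attempts=3)
print(f"{len(done)} finished, {len(failed)} gave up")
```
A real manager also watches node availability and runs jobs concurrently; the sequential waves here only stand in for that.
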
00:15:40.740 --> 00:15:42.940
Exactly, yeah. And all these simulations must

00:15:42.940 --> 00:15:45.620
generate a ton of data. Oh, they do, hundreds

00:15:45.620 --> 00:15:48.220
of terabytes, sometimes even petabytes. How do

00:15:48.220 --> 00:15:50.799
you manage that? So we need special storage solutions,

00:15:51.039 --> 00:15:52.860
special analysis tools, and a lot of this data

00:15:52.860 --> 00:15:55.700
is made public in a repository

00:15:55.700 --> 00:15:57.980
so other researchers can use it. That's great.

00:15:58.200 --> 00:16:00.940
Yeah. So how has MuMMI actually been used to study

00:16:00.940 --> 00:16:04.139
this RAS-RAF system? Yeah, so studying RAS-RAF

00:16:04.139 --> 00:16:07.179
has really been a key focus for MuMMI. OK. This

00:16:07.179 --> 00:16:09.820
pathway is so important for cell growth. And

00:16:09.820 --> 00:16:12.299
when it goes wrong, it can cause cancer. Right.

00:16:12.379 --> 00:16:14.600
So we really need to understand how these proteins

00:16:14.600 --> 00:16:16.879
work and how they interact with the membrane.

00:16:16.899 --> 00:16:19.399
OK. And MuMMI is perfect for this because it can

00:16:19.399 --> 00:16:21.659
handle those atomic details and those long time

00:16:21.659 --> 00:16:24.330
scales. OK, so what have you learned? Well, one

00:16:24.330 --> 00:16:25.889
of the biggest things is that we've been able

00:16:25.889 --> 00:16:29.070
to identify those lipid-protein fingerprints.

00:16:29.789 --> 00:16:32.049
These are those patterns of lipids that surround

00:16:32.049 --> 00:16:35.549
the protein, and they can affect how it's oriented,

00:16:35.610 --> 00:16:39.169
how it binds to other molecules. OK. So by simulating

00:16:39.169 --> 00:16:42.710
both RAS on its own and the RAS-RAF complex

00:16:42.710 --> 00:16:45.149
on the membrane, we've learned a lot about how

00:16:45.149 --> 00:16:47.149
they interact with the lipids. OK. Can you give

00:16:47.149 --> 00:16:49.629
me some more specific examples? Sure. So for

00:16:49.629 --> 00:16:51.950
example, we've looked at how a specific region

00:16:51.950 --> 00:16:56.179
of RAF, the cysteine-rich domain, or CRD, moves

00:16:56.179 --> 00:16:59.759
when it binds to RAS. This is important for understanding

00:16:59.759 --> 00:17:03.159
how RAF gets activated. We've also looked at

00:17:03.159 --> 00:17:07.460
how the orientation of the RAS-RAF complex affects

00:17:07.460 --> 00:17:10.440
how it interacts with lipids. And we've even

00:17:10.440 --> 00:17:13.900
looked at specific RAS variants like KRAS4B, which

00:17:13.900 --> 00:17:16.200
is known to be really difficult to target with

00:17:16.200 --> 00:17:19.039
drugs. And we've been able to simulate all this

00:17:19.039 --> 00:17:21.920
on realistic membranes. So it's really relevant

00:17:21.920 --> 00:17:23.700
to what's happening in a cell. Okay, so really

00:17:23.700 --> 00:17:25.660
connecting those details to the bigger picture.

00:17:26.200 --> 00:17:28.819
Exactly, exactly. And that's the power of this

00:17:28.819 --> 00:17:31.019
multiscale approach to get that full picture.

00:17:31.400 --> 00:17:33.559
So what are some of the key findings from all

00:17:33.559 --> 00:17:36.839
this? Yeah. So one surprising thing is that RAS

00:17:36.839 --> 00:17:38.779
proteins are not just sitting there passively

00:17:38.779 --> 00:17:41.380
on the membrane. OK. They can actually change

00:17:41.380 --> 00:17:43.619
the lipids around them. Interesting. So they're

00:17:43.619 --> 00:17:45.960
shaping their environment. Exactly. Yeah. And

00:17:45.960 --> 00:17:48.799
we've also seen that RAS proteins like to cluster

00:17:48.799 --> 00:17:52.119
together. And the way they cluster is influenced

00:17:52.119 --> 00:17:54.720
by those lipid fingerprints. So the lipids might

00:17:54.720 --> 00:17:57.759
be controlling how RAS signals. So the membrane's

00:17:57.759 --> 00:17:59.920
not just a backdrop. It's an active participant.

00:18:00.140 --> 00:18:03.579
Yeah, absolutely. And we've also seen how RAS

00:18:03.579 --> 00:18:06.539
changes its orientation when it binds to RAF,

00:18:06.940 --> 00:18:08.900
which then affects how it interacts with lipids.

00:18:08.960 --> 00:18:11.559
OK. And it turns out that RAS has different lipid

00:18:11.559 --> 00:18:13.180
fingerprints, depending on whether it's bound

00:18:13.180 --> 00:18:15.619
to RAF or not. So it's like the lipids are telling

00:18:15.619 --> 00:18:19.079
us what state RAS is in. Yeah, exactly. And we've

00:18:19.079 --> 00:18:21.559
also learned a lot about how the RAS-RAF complex

00:18:21.559 --> 00:18:24.240
forms and how it moves. OK. The continuum model,

00:18:24.279 --> 00:18:27.400
when we use data from the CG simulations to refine

00:18:27.400 --> 00:18:31.140
it, it can accurately show how RAF binds

00:18:31.140 --> 00:18:34.799
to and unbinds from RAS. And the all-atom simulations

00:18:34.799 --> 00:18:37.519
have shown us very specific lipid-dependent

00:18:37.519 --> 00:18:39.859
changes in the complex, which we can then use

00:18:39.859 --> 00:18:42.359
to improve the CG model. So again, that back

00:18:42.359 --> 00:18:44.940
and forth. And the CG simulations have even hinted

00:18:44.940 --> 00:18:47.099
that you can get these larger clusters of RAS

00:18:47.099 --> 00:18:50.619
and RAF. So it's really a complex and dynamic

00:18:50.619 --> 00:18:52.920
picture. And this all gives us a much more accurate

00:18:52.920 --> 00:18:55.359
understanding of these interactions. Yes, absolutely.

00:18:55.519 --> 00:18:58.519
And by using both the continuum and CG models

00:18:58.519 --> 00:19:00.769
together, we can actually make the simulations

00:19:00.769 --> 00:19:03.750
run faster. Oh, really? Yeah, so the CG simulations

00:19:03.750 --> 00:19:06.230
reach equilibrium faster. Okay, so it's more

00:19:06.230 --> 00:19:09.309
efficient. More efficient, yeah. So all of this,

00:19:09.309 --> 00:19:13.009
I think, shows how powerful MuMMI is and how it

00:19:13.009 --> 00:19:15.210
can be used to generate these hypotheses that

00:19:15.210 --> 00:19:17.990
we can then test in the lab. This has been a

00:19:17.990 --> 00:19:20.640
fascinating deep dive, so what's the key takeaway?

00:19:20.920 --> 00:19:22.920
Well, I think the key takeaway is that MuMMI

00:19:22.920 --> 00:19:25.660
is a major step forward in computational biology.

00:19:25.839 --> 00:19:29.140
OK. Its design and how it uses machine learning

00:19:29.140 --> 00:19:32.140
lets us study these complex systems across these

00:19:32.140 --> 00:19:35.180
huge ranges of time and size. And what it's told

00:19:35.180 --> 00:19:37.420
us about RAS-RAF is really exciting, right?

00:19:37.519 --> 00:19:40.779
It is. It's giving us this really detailed understanding

00:19:40.779 --> 00:19:42.880
of how these proteins work, which is important

00:19:42.880 --> 00:19:45.400
for understanding cancer and potentially for

00:19:45.400 --> 00:19:48.779
developing new drugs. So looking ahead, what's

00:19:48.779 --> 00:19:51.210
next for MuMMI? Well, I think we can expand

00:19:51.210 --> 00:19:53.450
it to study all kinds of other systems, other

00:19:53.450 --> 00:19:57.230
proteins, other signaling pathways. We can use

00:19:57.230 --> 00:19:59.130
even more advanced machine learning, like deep

00:19:59.130 --> 00:20:02.170
learning or reinforcement learning. And I think

00:20:02.170 --> 00:20:04.869
this could really change how we develop drugs,

00:20:05.390 --> 00:20:08.210
especially for targets like RAS-RAF that are so

00:20:08.210 --> 00:20:10.450
difficult. It sounds like these machine-learning-driven

00:20:10.450 --> 00:20:12.250
simulations are going to be essential

00:20:12.250 --> 00:20:14.910
tools going forward. Oh, I think so, yeah. I

00:20:14.910 --> 00:20:17.529
think with the way computing power is increasing

00:20:17.529 --> 00:20:19.910
and how these methods are getting better, this

00:20:19.910 --> 00:20:22.589
is going to be the standard way to study complex

00:20:22.589 --> 00:20:25.190
molecular systems. And maybe we'll even see fully

00:20:25.190 --> 00:20:27.490
automated simulations in the future. Oh, I think

00:20:27.490 --> 00:20:32.369
that's very likely. Thinking about this complex

00:20:32.369 --> 00:20:35.369
dance of proteins and their environment makes

00:20:35.369 --> 00:20:38.089
you wonder, how will this change how we treat

00:20:38.089 --> 00:20:41.109
diseases? Maybe we'll go beyond targeting just

00:20:41.109 --> 00:20:43.410
the proteins themselves and start thinking about

00:20:43.410 --> 00:20:46.549
the whole context. It's really fascinating. Thank

00:20:46.549 --> 00:20:48.049
you so much for joining us. This has been a great

00:20:48.049 --> 00:20:50.450
deep dive. My pleasure. And for everyone listening,

00:20:50.970 --> 00:20:53.829
keep thinking about these questions. And we'll

00:20:53.829 --> 00:20:54.430
see you next time.
