WEBVTT

00:00:03.497 --> 00:00:10.497
This is the Convergent Science Network podcast. Leading researchers in the domain

00:00:10.497 --> 00:00:16.777
of neuroscience, brain theory and technology are interviewed by Paul Verschure and Tony Prescott.

00:00:27.157 --> 00:00:32.357
This is Paul Verschure with the Convergent Science Network. And today I'm speaking

00:00:32.357 --> 00:00:37.177
with Etienne Koechlin, who was also a speaker at our summer school.

00:00:37.817 --> 00:00:42.257
And Etienne, you focus very much on the human prefrontal cortex.

00:00:43.037 --> 00:00:49.737
Not only in your talk, but also in your work in general. So what is so special about this part of the brain?

00:00:53.497 --> 00:00:56.897
It's a huge question. First,

00:00:57.157 --> 00:01:01.357
why I'm interested in the prefrontal cortex and more generally in the frontal

00:01:01.357 --> 00:01:08.117
lobe function is that I think part of what makes us really human compared to

00:01:08.117 --> 00:01:14.797
other primates lies especially in the prefrontal cortex.

00:01:14.797 --> 00:01:18.837
So, for me, it's a curiosity. It's a question of curiosity.

00:01:20.057 --> 00:01:25.877
And it's also a huge region, and it's a region that is really involved in,

00:01:25.937 --> 00:01:33.837
I think, in how we feel we are the actors of our own actions and behavior.

00:01:34.337 --> 00:01:39.097
And it's related to consciousness, to many different things that I'm interested in.

00:01:39.877 --> 00:01:49.817
Right. But now, you have sort of summarized your view of the prefrontal cortex also

00:01:49.817 --> 00:01:51.557
in a very formal way.

00:01:51.697 --> 00:01:56.137
You had actually one very simple equation that you thought captured most of

00:01:56.137 --> 00:02:00.317
its function in terms of sensory states and actions and so on.

00:02:00.577 --> 00:02:05.217
So how can you then characterize the function of this complex structure in a

00:02:05.217 --> 00:02:07.877
simple equation? What is that equation exactly?

00:02:08.857 --> 00:02:15.217
Your question is actually twofold: what the equation is,

00:02:15.217 --> 00:02:23.477
and how I got there. So first, I like simple models because I think

00:02:24.921 --> 00:02:31.041
the simpler the model, the more explanatory it is, in one sense.

00:02:31.241 --> 00:02:35.581
So I really try to first find simple models.

00:02:36.881 --> 00:02:41.121
And I think also simple models are more intelligible.

00:02:41.561 --> 00:02:46.561
And also I develop simple models because I try to develop models that are

00:02:46.561 --> 00:02:53.681
testable in experiments. And even simple models are not so easy to test

00:02:53.681 --> 00:02:55.861
and to confirm or disconfirm in experiments.

00:02:57.321 --> 00:03:03.201
And that's why I develop simple models, because I know that

00:03:03.201 --> 00:03:06.381
these models make some straightforward predictions.

00:03:06.481 --> 00:03:11.621
They might be a bit simplistic in one sense, but still you can test these models,

00:03:11.621 --> 00:03:17.461
and you can tease apart some quite deep conceptual differences

00:03:17.461 --> 00:03:19.701
between different hypotheses.

00:03:20.741 --> 00:03:25.321
Right. So the prefrontal cortex is essentially, in some way, bringing together

00:03:26.741 --> 00:03:28.441
perceptual states, states of

00:03:28.441 --> 00:03:34.801
the world, actions, and a sense of value or utility of their combination.

00:03:36.621 --> 00:03:42.141
Do you see those as the key ingredients upon which these areas operate? Or is

00:03:42.141 --> 00:03:43.821
there another element to that?

00:03:44.951 --> 00:03:52.191
I think one of the key elements of prefrontal function is action.

00:03:53.151 --> 00:03:59.151
Action has very specific constraints. I mean, first, action requires choosing.

00:04:00.211 --> 00:04:08.291
You cannot say, I will do this 80% of the time and that 20% of the time.

00:04:08.391 --> 00:04:12.311
When you really act in the world, you do one action or another.

00:04:12.311 --> 00:04:14.871
So it requires making decisions, making choices.

00:04:15.311 --> 00:04:21.251
And this has huge consequences, because making a choice commits you, and in one

00:04:21.251 --> 00:04:22.971
sense making a choice is suboptimal.

00:04:24.611 --> 00:04:30.591
Better to wait forever? No, I mean, being optimal always means keeping multiple

00:04:30.591 --> 00:04:37.491
interpretations of the world and continuing with this kind of multiple representation.

00:04:37.491 --> 00:04:42.111
And when you take an action, you basically stick with one interpretation.

00:04:42.711 --> 00:04:48.751
So action is a very specific constraint. And I think one of the roles of the

00:04:48.751 --> 00:05:01.251
prefrontal cortex, from a very general view, is to introduce this constraint into internal processing.

00:05:01.431 --> 00:05:05.791
So action is about choosing. It's about seriality. It's about reality,

00:05:06.151 --> 00:05:10.631
and basically prefrontal function introduces all these constraints into the

00:05:10.631 --> 00:05:14.671
way the mind, or cognitive processing, operates in the brain.

00:05:15.611 --> 00:05:21.071
So action is very important. And second, of course, utility and values,

00:05:21.131 --> 00:05:22.191
if you want, are important.

00:05:24.571 --> 00:05:32.131
But, of course, they're important because you need value to know what is good for you or not.

00:05:32.311 --> 00:05:37.491
But value is very archaic. And I don't think that value is actually one of the

00:05:37.491 --> 00:05:39.391
key components of the prefrontal function.

00:05:39.551 --> 00:05:42.851
It's a key component of action and decision-making, of course.

00:05:43.411 --> 00:05:45.511
But I think prefrontal

00:05:46.626 --> 00:05:57.146
function is more related to understanding and learning what seems to be a true

00:05:57.146 --> 00:06:01.866
representation of the world than to what is really good or bad.

00:06:02.006 --> 00:06:07.426
Because I think even the simplest insect, the brain of the simplest insect,

00:06:07.546 --> 00:06:12.146
knows in some way what is good and bad for the organism.

00:06:12.726 --> 00:06:17.306
Right. So what is specific to human prefrontal function is that we have

00:06:17.306 --> 00:06:24.946
all these, we could say, reasoning processes that are not so much interested in values,

00:06:25.146 --> 00:06:29.586
but rather in what is true, what can be predicted,

00:06:30.126 --> 00:06:33.626
what is reliable, and so on. Right.

00:06:33.766 --> 00:06:38.246
But now if I combine it, because on the one hand you're saying it's action and

00:06:38.246 --> 00:06:42.386
action is unitary at each point in time, I can execute only one.

00:06:42.826 --> 00:06:46.746
I have one body to act with. But on the other hand, you talk about,

00:06:46.826 --> 00:06:50.286
let's say, modeling the world, context, and so on.

00:06:50.346 --> 00:06:55.706
So these would be two functions. So the action-selection component of this,

00:06:55.786 --> 00:07:01.966
where you are actually collapsing all of these possibilities that you can engage

00:07:01.966 --> 00:07:04.546
with into one interpretation,

00:07:04.766 --> 00:07:08.266
one action: do you see both of these things residing in frontal areas,

00:07:08.446 --> 00:07:11.066
or is that in synergy with other areas?

00:07:11.726 --> 00:07:15.006
No, of course. I mean, the prefrontal

00:07:15.006 --> 00:07:21.366
cortex is in synergy with most other associative areas in the brain.

00:07:22.766 --> 00:07:28.686
I mainly focus on the prefrontal cortex because in one sense I think it's simpler,

00:07:28.686 --> 00:07:37.486
because other associative regions are actually, I think, at the interface between

00:07:39.424 --> 00:07:44.384
peripheral systems, like sensory systems such as the visual system.

00:07:44.524 --> 00:07:46.344
Take the parietal region, for instance.

00:07:46.624 --> 00:07:52.224
The parietal cortex is very complex, actually, because it's at the interface

00:07:52.224 --> 00:07:58.204
between all these low-level sensory systems, like vision and so on,

00:08:00.664 --> 00:08:06.084
and it interfaces these systems with the internal cognitive system,

00:08:06.264 --> 00:08:07.564
which is the prefrontal cortex.

00:08:08.344 --> 00:08:11.424
So you have two levels of complexity within this region.

00:08:11.564 --> 00:08:18.204
Whereas the prefrontal cortex is far away from these peripheral systems.

00:08:18.664 --> 00:08:24.864
And I think, for me at least today, I think it's easier to understand what's

00:08:24.864 --> 00:08:29.644
going on in the prefrontal cortex than what's going on in these other associative

00:08:29.644 --> 00:08:33.164
areas, like the temporal associative regions.

00:08:33.304 --> 00:08:39.404
They are very complex. I'm not sure anybody has a very good idea about what's going on in these regions.

00:08:39.564 --> 00:08:43.024
In the parietal cortex, we have some clues about some specific things.

00:08:43.544 --> 00:08:49.104
But this is a huge region. There are so many things going on. I think that's

00:08:49.104 --> 00:08:50.364
what I like about the prefrontal cortex.

00:08:50.684 --> 00:08:55.004
I have the feeling that I understand something about the prefrontal cortex.

00:08:55.244 --> 00:08:57.504
But now, if you had to choose, right? Because, like I said earlier,

00:08:58.024 --> 00:09:04.124
you emphasize both action, unitary action, and you emphasize something like

00:09:04.124 --> 00:09:05.824
context, an internal model.

00:09:06.564 --> 00:09:11.124
Do you see them both as functions of the prefrontal cortex, or do you see the prefrontal

00:09:11.124 --> 00:09:15.984
cortex more as maintaining, let's say, these representations of what is possible?

00:09:16.844 --> 00:09:23.224
Or do you see them together, that there are both representations of what is possible together

00:09:23.224 --> 00:09:26.364
with... No, I think it's exactly the converse.

00:09:26.464 --> 00:09:33.084
The prefrontal cortex forces the mind, the human mind, the cognitive system

00:09:33.084 --> 00:09:34.384
everywhere in the brain,

00:09:35.184 --> 00:09:41.504
not to multiply possibilities. Okay.

00:09:42.084 --> 00:09:48.664
And it forces it to make a choice, in one sense, to say, okay, the most probable

00:09:48.664 --> 00:09:54.044
interpretation of what I see in the scene is this, so I am going to act like that. And then

00:09:55.160 --> 00:10:01.180
it's no longer necessary to represent other alternatives,

00:10:01.300 --> 00:10:04.180
because we decided to do that. So now we go for that.

00:10:04.600 --> 00:10:08.400
And at some point you need to simplify the representation.

00:10:08.520 --> 00:10:13.380
Otherwise the system saturates very easily and very rapidly

00:10:13.380 --> 00:10:18.420
when making inferences about what is possible.

00:10:18.760 --> 00:10:23.040
So I think this is a region that, I mean, it's a decision region,

00:10:23.040 --> 00:10:29.560
one that really makes decisions in the sense that it excludes alternative interpretations. Okay.

00:10:30.340 --> 00:10:35.540
But then, we can look later at how many alternatives you might want to consider.

00:10:36.300 --> 00:10:42.060
But then what you emphasize is that there are actually three sources of information

00:10:42.060 --> 00:10:43.480
in the prefrontal cortex, right?

00:10:43.540 --> 00:10:50.780
You talked about context, episodic events or memory, and expected rewards.

00:10:50.780 --> 00:10:58.000
So what are the boundaries of these notions, right? So what's the

00:10:58.000 --> 00:11:06.220
difference exactly between the context of action and episodic memory, for instance? So, yeah,

00:11:08.000 --> 00:11:14.800
the idea is that the context is something that is present when you make the selection.

00:11:16.109 --> 00:11:20.789
So, basically, there is no memory involved.

00:11:22.589 --> 00:11:27.309
The context is present here. Of course, it involves some memorized representation

00:11:27.309 --> 00:11:31.009
about how the context is connected to your action.

00:11:31.529 --> 00:11:37.269
But the context is present, basically. It's part of your environment where you make the selection.

00:11:39.169 --> 00:11:43.049
Episodic events are the past, basically. It's everything that happened in the

00:11:43.049 --> 00:11:48.349
past that, of course, you may or may not memorize, and that may influence your actions.

00:11:49.129 --> 00:11:55.549
And I would say the opposite is expected reward, which is in the future,

00:11:55.629 --> 00:12:00.409
or, more generally, expected outcome, which is about the future; it's just the

00:12:00.409 --> 00:12:02.869
mirror image of episodic events.

00:12:03.249 --> 00:12:07.069
So basically, the idea is very simple. I mean, you have the past,

00:12:07.189 --> 00:12:10.169
the present, and the future: information from the past, episodic events;

00:12:10.309 --> 00:12:16.549
information from the present, the context; and information about the future. Very simple.

00:12:17.569 --> 00:12:21.869
But of course, in its generality, this also becomes,

00:12:21.989 --> 00:12:24.169
again, problematic, right? Yeah, I agree.

00:12:24.329 --> 00:12:27.049
Because each of these will be bounded in some way.

00:12:27.309 --> 00:12:33.349
So if you say, look, the prefrontal or the frontal lobes have access to past,

00:12:33.349 --> 00:12:38.249
present, and future, the question arises: okay, if we imagine that these

00:12:38.249 --> 00:12:42.189
are not of infinite capacity, there must be boundaries on this.

00:12:42.189 --> 00:12:47.209
There must be aspects of past, present, future that you are considering because

00:12:47.209 --> 00:12:48.209
they're highly relevant.

00:12:48.389 --> 00:12:53.729
And there will probably be many aspects of it that you have to neglect entirely to stay operational.

00:12:54.309 --> 00:12:56.409
So where would you draw that line?

00:12:57.749 --> 00:13:02.809
I think the line is drawn by your internal representation, by your learning, by your experience.

00:13:02.809 --> 00:13:06.089
So your internal model is what is very important, and I think this is

00:13:06.089 --> 00:13:14.369
one of the roles of other associative regions: to memorize, to implement,

00:13:14.369 --> 00:13:17.569
and to encode internal models of the world.

00:13:18.569 --> 00:13:24.129
So according to your internal model, I mean, a past event, even very close to

00:13:24.129 --> 00:13:29.309
your action, could be totally irrelevant and not worth memorizing.

00:13:29.309 --> 00:13:35.769
Or an event could have happened a long time ago and still be very

00:13:35.769 --> 00:13:41.949
informative, because your internal model tells you what is important in the world and what is not.

00:13:42.729 --> 00:13:47.969
And this has to be learned. And this is why I think these other associative regions,

00:13:47.969 --> 00:13:51.409
the parietal or temporal cortex, are very complex, because this is

00:13:51.409 --> 00:13:59.729
probably where all these internal models that allow you to capture information from the world

00:14:00.669 --> 00:14:04.329
to make your selections are encoded.

00:14:04.469 --> 00:14:10.189
And the prefrontal cortex is organized in such a way that it can distinguish

00:14:12.989 --> 00:14:13.989
what is,

00:14:15.353 --> 00:14:19.493
what is part of the immediate context.

00:14:19.773 --> 00:14:23.373
So there are some specific regions in the prefrontal cortex that allow you to

00:14:23.373 --> 00:14:26.093
include immediate information in your choice.

00:14:27.813 --> 00:14:33.373
But the prefrontal cortex by itself doesn't know which immediate information

00:14:33.373 --> 00:14:35.553
is useful in this situation.

00:14:35.953 --> 00:14:39.793
It can just say, okay, and include this in the selection process.

00:14:40.433 --> 00:14:46.973
So its role is really at the very end of the selection process: to make the selection

00:14:46.973 --> 00:14:53.353
and to include as much information as can be processed by your

00:14:53.353 --> 00:14:59.473
internal models elsewhere in the brain into the action selection process.

00:14:59.993 --> 00:15:06.153
Okay, so then in some way you're saying the magic resides in these areas at

00:15:06.153 --> 00:15:10.373
the interface between the sensory systems and the frontal lobe, where in some

00:15:10.373 --> 00:15:11.953
way these internal models are constructed.

00:15:12.933 --> 00:15:22.393
But then, I guess I would expect the frontal area to also add some intrinsic

00:15:22.393 --> 00:15:24.373
aspects to that process.

00:15:24.533 --> 00:15:29.353
It cannot just be, let's say, a selector driven and enslaved by information

00:15:29.353 --> 00:15:31.013
provided by other systems.

00:15:31.433 --> 00:15:36.613
So what would this added value be? So one added value is that it's

00:15:36.613 --> 00:15:43.193
a monitoring system. This is what other people usually call metacognition, in one sense.

00:15:43.513 --> 00:15:48.753
It means that it's a system that constantly monitors the processing,

00:15:48.993 --> 00:15:53.073
the behavior, and is able to make some important switches.

00:15:54.893 --> 00:16:00.473
So, for example, you may have an internal system, a very sophisticated system

00:16:00.473 --> 00:16:03.033
in the parietal cortex that you use to guide your behavior.

00:16:03.853 --> 00:16:10.033
And the added value of the prefrontal cortex is to monitor all the time

00:16:10.033 --> 00:16:16.473
whether I should persevere with this very complex strategy, possibly adjusting it, or

00:16:17.393 --> 00:16:23.373
whether something is wrong with this strategy and I need to switch to something

00:16:23.373 --> 00:16:27.513
else. So this is really the added value of the prefrontal cortex: to have this

00:16:27.513 --> 00:16:31.513
kind of metacognitive role in

00:16:31.693 --> 00:16:35.013
judging whether I should persevere,

00:16:35.213 --> 00:16:41.533
continue to learn, or switch to something else, just give up on

00:16:41.533 --> 00:16:44.673
that behavior and try something else.

00:16:44.913 --> 00:16:47.253
This is exactly the problem in learning.

00:16:48.033 --> 00:16:53.313
When you learn something, of course you make errors or you get some negative feedback.

00:16:53.933 --> 00:16:59.293
At some point, there is a system that needs to say, okay, you made this error,

00:16:59.453 --> 00:17:01.633
but persevere, learn.

00:17:03.173 --> 00:17:03.733
Or,

00:17:04.884 --> 00:17:09.264
too many errors, it's no longer worth learning this. You should change,

00:17:09.264 --> 00:17:10.524
give up, and do something else.

00:17:10.704 --> 00:17:14.164
And this is the added value of the prefrontal cortex. Should I persevere in

00:17:14.164 --> 00:17:19.824
what I am doing, in what I am learning, or should I give up and switch to something else? Right.

00:17:20.544 --> 00:17:23.724
But then, so this is clear, right?

00:17:23.804 --> 00:17:28.764
So now we have sort of a functional understanding of this frontal area.

00:17:29.304 --> 00:17:34.624
And then already in your early work on this area, you seem to have identified

00:17:34.624 --> 00:17:41.784
a fairly clean mapping, if you want, of these functional components onto specific

00:17:41.784 --> 00:17:44.104
structures in the frontal lobe.

00:17:44.184 --> 00:17:46.384
So could you explain that in a bit more detail?

00:17:48.304 --> 00:17:51.884
So the idea is that, because, as I said before,

00:17:52.084 --> 00:17:55.424
action is very important, the prefrontal cortex is organized

00:17:55.424 --> 00:18:03.884
on the basis of the motor system, the motor-premotor system.

00:18:04.384 --> 00:18:10.044
And the more anteriorly you go, the more layers are added that allow

00:18:10.044 --> 00:18:13.484
you to add additional information into the decision process.

00:18:13.904 --> 00:18:15.824
So this is the general idea of the organization.

00:18:17.004 --> 00:18:23.524
And the general idea is that the best, I would say the goal, of the prefrontal

00:18:23.524 --> 00:18:26.124
system is not to be involved, in one sense.

00:18:26.524 --> 00:18:33.044
That is, the more you are able to use routines or to routinize your actions, the better.

00:18:35.584 --> 00:18:41.884
So it means that you recruit additional layers when the routines in lower layers

00:18:41.884 --> 00:18:45.164
are not enough to resolve ambiguities in your actions.

00:18:45.164 --> 00:18:48.784
So in that way you recruit more and more anterior

00:18:50.004 --> 00:18:57.264
regions in the prefrontal cortex to solve the decision problem, because decision

00:18:57.264 --> 00:19:02.184
is first a problem rather than a solution.

00:19:02.444 --> 00:19:09.824
But then, if you recruit more areas, what are the criteria for doing that, and how deep can you go?

00:19:10.084 --> 00:19:18.864
Yeah, so the idea is that, first, of course, you have the basic stimulus that triggers

00:19:18.864 --> 00:19:24.484
the action, and these stimulus-response associations are stored in the premotor cortex.

00:19:24.644 --> 00:19:26.944
So this is a very basic level. Then,

00:19:28.038 --> 00:19:33.198
if you have some ambiguities at this level, the first thing you want to know

00:19:33.198 --> 00:19:36.598
is whether in the immediate context, in the present context,

00:19:36.918 --> 00:19:39.958
there are some cues that help to disambiguate this.

00:19:40.758 --> 00:19:47.818
And this is the role of the posterior prefrontal regions that lie just next to the premotor cortex.

00:19:48.098 --> 00:19:52.078
So this is the first layer to disambiguate action selection.

00:19:52.078 --> 00:19:58.998
Then if, in the immediate context, there are no cues that help you to know

00:19:58.998 --> 00:20:07.258
which action you should select, then you go more anteriorly, and in this layer you have regions

00:20:08.178 --> 00:20:15.258
that have access to more distant information, more temporally distant information, and, as we said, mainly

00:20:16.378 --> 00:20:17.838
episodic information, information about

00:20:17.838 --> 00:20:24.718
past events that occurred maybe one minute ago that may provide some cues.

00:20:26.078 --> 00:20:31.898
And then you have the frontal pole, which has a specific role: it allows you

00:20:31.898 --> 00:20:33.698
to consider multiple alternatives,

00:20:34.258 --> 00:20:41.458
which is important to break the pure seriality of actions and to be

00:20:41.458 --> 00:20:48.778
able to consider several alternatives in the selection process.

00:20:49.818 --> 00:20:53.918
And so this is more or less the way I think the prefrontal cortex is organized, and of course we have data that

00:20:53.918 --> 00:21:00.578
provide evidence for this organization.

00:21:00.858 --> 00:21:03.158
So you would see it as a three-step process?

00:21:06.058 --> 00:21:11.438
Yes. Yeah, I would say that basically I think these are three steps.

00:21:11.438 --> 00:21:16.358
So there is a sensorimotor level, then there is a contextual level that allows

00:21:16.358 --> 00:21:21.538
you to select appropriate sensorimotor associations according to the present context,

00:21:21.638 --> 00:21:27.338
and then an additional layer that provides you information about the past, episodic events.

00:21:27.518 --> 00:21:30.738
And then there is this specific region, which is a top layer,

00:21:30.978 --> 00:21:41.318
the frontopolar cortex, that enables you to process different alternatives at the same time,

00:21:41.438 --> 00:21:46.218
to consider different alternatives that might influence the selection process. Right.

00:21:46.778 --> 00:21:54.038
But now, if I face a certain problem-solving task, who decides that I switch processing levels?

00:21:55.358 --> 00:22:00.578
Or which system, or what criteria, would switch between these levels of processing?

00:22:02.758 --> 00:22:05.818
So you can see a problem like

00:22:07.684 --> 00:22:12.064
an environment that you don't know and that you travel within.

00:22:13.104 --> 00:22:19.364
Like a city: you arrive in a new city, you have no map, and basically you travel within this city.

00:22:21.704 --> 00:22:26.704
And so I think that every problem

00:22:27.544 --> 00:22:37.044
can be reduced to navigating in an unknown space and finding a way, a path, within it.

00:22:38.584 --> 00:22:45.284
And so it means that, at the beginning, you first start with some very

00:22:45.284 --> 00:22:48.424
basic routine, maybe one stored in the premotor cortex.

00:22:49.064 --> 00:22:54.884
And at some point, this routine will fail because this is a new situation, a new city.

00:22:55.304 --> 00:23:01.304
And then you start to see whether in the immediate environment

00:23:01.304 --> 00:23:09.164
there are some cues that will trigger in your memory some other basic routine

00:23:09.164 --> 00:23:10.864
you learned somewhere else.

00:23:11.064 --> 00:23:15.024
Some system must be monitoring this, right? Some system must be monitoring like,

00:23:15.204 --> 00:23:18.424
okay, sensory cues are not helping me now.

00:23:19.064 --> 00:23:23.384
So there must be some integrator somewhere with some threshold that says,

00:23:23.524 --> 00:23:25.804
okay, we're lost at the level of sensory cues.

00:23:26.904 --> 00:23:29.464
Let's move on. Let's try context.

00:23:31.384 --> 00:23:35.904
So is it really that sequential and that scheduled, or are these systems…

00:23:35.904 --> 00:23:36.844
No, of course, everything is combined.

00:23:37.684 --> 00:23:41.304
It's just an easy way to describe things.

00:23:41.744 --> 00:23:45.624
Okay, so in your mind, this all runs concurrently. All these systems run in

00:23:45.624 --> 00:23:48.864
parallel at the same time, generating solutions. Yes, of course,

00:23:48.864 --> 00:23:51.204
there is no reason to switch them on or off.

00:23:53.344 --> 00:23:59.364
The context where you are is always integrated within the process.

00:23:59.364 --> 00:24:02.804
Yeah, but what I'm asking is this: on the one hand, you were saying earlier,

00:24:03.464 --> 00:24:09.644
this frontal area has this intrinsic property to monitor and to regulate, if you want.

00:24:09.864 --> 00:24:14.444
But now, if we look at how this system is deployed in a task,

00:24:14.704 --> 00:24:19.344
where it actually performs multiple functions in parallel, this in itself would

00:24:19.344 --> 00:24:20.764
require some form of monitoring.

00:24:21.444 --> 00:24:25.064
So that then raises the question: okay, where is that coming from?

00:24:25.064 --> 00:24:32.004
Because it's monitoring other areas, taking that into account in its own processing,

00:24:32.004 --> 00:24:36.404
but now we need a monitor that monitors the monitor. So how is that done?

00:24:38.445 --> 00:24:43.965
Yeah, I see what you mean, but there is only one type of monitoring.

00:24:44.045 --> 00:24:48.385
I don't think so, but this is an interesting question in itself.

00:24:49.845 --> 00:24:53.445
Your question is whether there is some monitoring of the monitoring process,

00:24:54.605 --> 00:24:59.165
because in one sense we could go on without limit. Infinite regress, exactly.

00:24:59.645 --> 00:25:02.065
My view is that there is only one level of monitoring.

00:25:03.225 --> 00:25:06.645
So basically you have the basic process and then you have the monitoring process,

00:25:06.645 --> 00:25:09.465
and the prefrontal cortex is about this level of monitoring.

00:25:10.205 --> 00:25:17.705
And there is no system that basically monitors the monitoring;

00:25:17.925 --> 00:25:26.625
there is no recursive way of monitoring, I think, in the prefrontal cortex.

00:25:26.665 --> 00:25:30.325
We have some evidence for that, some tiny evidence.

00:25:30.325 --> 00:25:33.645
When you ask people to, you know,

00:25:33.665 --> 00:25:43.885
we know that the prefrontal cortex is important in suspending a task you are

00:25:43.885 --> 00:25:45.765
performing in order to perform another task.

00:25:47.665 --> 00:25:54.185
What we notice is that people are very bad at doing this recursively, twice.

00:25:54.585 --> 00:25:57.685
That is, you interrupt the first

00:25:57.685 --> 00:26:01.565
task to perform a subtask. When I say interrupt, I don't mean stopping.

00:26:01.705 --> 00:26:05.745
I mean you interrupt it, you suspend it. So you have to keep some information about the task.

00:26:05.865 --> 00:26:10.305
So you suspend it, then you switch to a subtask to perform it.

00:26:10.725 --> 00:26:14.085
And so people can do that very easily.

00:26:14.305 --> 00:26:20.105
But when you ask them to suspend this secondary task to perform a tertiary task,

00:26:20.525 --> 00:26:22.305
then they have real problems.

00:26:23.085 --> 00:26:26.305
So it seems that they don't have the ability

00:26:29.105 --> 00:26:34.085
to have a two-level monitoring system. Right. It's tiny evidence,

00:26:34.285 --> 00:26:36.985
but it's some evidence. It's an interesting prediction, right?

00:26:37.705 --> 00:26:44.485
So, in some sense you're saying, look, overall the brain is organized

00:26:44.485 --> 00:26:48.565
in such a way that it hopes the frontal areas don't get involved, because that

00:26:48.565 --> 00:26:50.505
means it knows what to do automatically.

00:26:52.505 --> 00:26:57.825
So, how is that then linked to this whole debate on controlled versus automated

00:26:57.825 --> 00:27:00.705
processing? Because now it's not only about decision making,

00:27:00.845 --> 00:27:05.185
it's also about the discovery of the structure in a decision-making problem

00:27:05.185 --> 00:27:07.325
so that you can automate it.

00:27:08.565 --> 00:27:09.945
So how does that play out?

00:27:11.771 --> 00:27:16.891
As far as I understand your question, for me, automation is

00:27:17.031 --> 00:27:19.571
a process in itself.

00:27:21.071 --> 00:27:25.371
And when I say that the goal of prefrontal function is not to be involved,

00:27:25.631 --> 00:27:30.911
it means that when it is involved, you basically face a situation

00:27:30.911 --> 00:27:35.631
where you don't really know what to do.

00:27:36.371 --> 00:27:43.211
But I'm not sure there is, I think there is no control

00:27:43.211 --> 00:27:45.971
process that controls automatization.

00:27:45.991 --> 00:27:53.831
Automatization is what occurs by default, but when it

00:27:53.831 --> 00:27:56.851
fails, the prefrontal cortex is engaged. Okay.

00:27:57.131 --> 00:28:01.211
But this is a default. There is a default. I think it's this notion of default.

00:28:01.211 --> 00:28:04.271
I mean, you work with robots.

00:28:05.071 --> 00:28:11.491
I think the notion of default behavior is very important.

00:28:12.051 --> 00:28:16.571
Don't you think this is the same in robots? Of course. That you need to have some default.

00:28:16.831 --> 00:28:21.011
Sure. If everything goes wrong, I do that by default. Yes.

00:28:21.371 --> 00:28:28.831
What I'm after is this: so now we have this, what you also call the cascade of cognitive

00:28:28.831 --> 00:28:30.951
control, right? This is really what you described

00:28:31.211 --> 00:28:34.051
previously. But now, if I

00:28:34.051 --> 00:28:37.871
engage in a certain task, like talking to you, if

00:28:37.871 --> 00:28:40.951
we did this 20 times over and I said the same things, at

00:28:40.951 --> 00:28:44.631
some point I wouldn't have to invent questions anymore because I'd know them

00:28:44.631 --> 00:28:50.591
by heart. I'd have automated this task. So do you see that automation as an

00:28:50.591 --> 00:28:55.531
active process that is regulated by this frontal area, or do you see this

00:28:55.531 --> 00:29:00.671
as a concurrent process, dependent on other neural structures, that is just picking

00:29:00.751 --> 00:29:03.251
up these regularities and automating them?

00:29:03.791 --> 00:29:11.811
I think it's simply a sensorimotor model, an internal model that is stored

00:29:11.811 --> 00:29:18.231
probably in a premotor region, the basal ganglia, and some posterior associative regions.

00:29:19.651 --> 00:29:24.131
It becomes more or less encapsulated in this system, and it can just be triggered

00:29:24.131 --> 00:29:27.931
or stopped as a whole by the prefrontal system.

00:29:28.491 --> 00:29:34.051
But then it can proceed, I mean, by itself. It runs by itself.

00:29:34.631 --> 00:29:37.591
That's what, you see what I mean?

00:29:39.371 --> 00:29:44.871
It becomes a fully consistent representation driving behavior.

00:29:45.691 --> 00:29:50.631
But do you think that this distinction, controlled versus automatic,

00:29:50.911 --> 00:29:53.171
is actually helpful to look at this system?

00:29:56.765 --> 00:30:02.665
I think so. At least, yes, I think it's an important distinction because for

00:30:02.665 --> 00:30:03.565
the prefrontal function,

00:30:03.805 --> 00:30:13.765
I really think that prefrontal functions work above what's going on in an automatic system.

00:30:14.125 --> 00:30:18.345
So a task can be as complex as you like.

00:30:18.445 --> 00:30:24.725
As long as it is automatized, it is stored in these wonderful areas,

00:30:24.725 --> 00:30:28.285
which are the premotor cortex, the parietal cortex, the temporal cortex,

00:30:28.385 --> 00:30:32.165
which have impressive representational power,

00:30:33.225 --> 00:30:35.745
and it can run automatically.

00:30:36.865 --> 00:30:42.285
And the prefrontal cortex is not concerned with this. There are other regions

00:30:42.285 --> 00:30:44.345
that do this job perfectly.

00:30:45.685 --> 00:30:50.705
The prefrontal cortex just wants to know when this should be activated and when

00:30:50.705 --> 00:30:51.805
it should not be activated.

00:30:52.325 --> 00:30:54.085
So this is the monitoring part.

00:30:54.985 --> 00:31:00.705
And, of course, the monitoring part is a way of controlling things.

00:31:01.605 --> 00:31:05.805
Right. This is a notion of control. Control is also a notion of,

00:31:05.805 --> 00:31:08.705
I mean, acting a little bit on things.

00:31:09.685 --> 00:31:17.045
But monitoring, of course, sounds easy, but it does imply that you have norms of monitoring.

00:31:18.465 --> 00:31:21.265
You have to have criteria on

00:31:21.265 --> 00:31:25.865
the grounds of which we say, like, oh wait, this is now a relevant exception that we

00:31:25.865 --> 00:31:32.505
have to deal with, right? So just saying monitoring is... Yeah, so there

00:31:32.505 --> 00:31:40.105
are two general views. One view is that everything is relative,

00:31:40.465 --> 00:31:47.605
that is, you always monitor different alternatives,

00:31:48.825 --> 00:31:55.765
and you are interested in selecting the most relevant alternative among the ones you monitor.

00:31:58.134 --> 00:32:04.074
It's fine. The problem with this view is that you are always stuck within this

00:32:04.074 --> 00:32:08.434
collection of alternatives you monitor.

00:32:09.614 --> 00:32:14.994
You have no system that allows you to say, okay, I should look elsewhere,

00:32:14.994 --> 00:32:23.014
not just within this small collection of alternatives I have collected,

00:32:23.174 --> 00:32:28.514
to, maybe, select an even more relevant alternative.

00:32:29.574 --> 00:32:35.174
So the other notion is that, rather than comparing alternatives with each other,

00:32:35.494 --> 00:32:41.554
for each alternative you monitor, you try to have a measure of whether

00:32:41.554 --> 00:32:46.234
this alternative remains reliable or relevant or not,

00:32:46.414 --> 00:32:52.574
to try to have an absolute measure of whether this alternative is,

00:32:52.734 --> 00:32:55.374
let us say, relevant, a quite elusive term.

00:32:58.214 --> 00:33:03.154
And how do you decide that an alternative is relevant?

00:33:03.494 --> 00:33:09.274
So there are, I think, probably different factors that can contribute to judge

00:33:09.274 --> 00:33:15.954
an alternative as relevant or irrelevant, but an important one is its ability to predict

00:33:18.194 --> 00:33:23.874
action outcomes. For example, I use this...

00:33:26.394 --> 00:33:31.474
Let us say that a behavioral strategy is like a map with some paths.

00:33:34.174 --> 00:33:38.414
So if you use a map with these paths, what you expect first is that when you follow

00:33:38.414 --> 00:33:45.394
the path, you see in the real world what the map shows.

00:33:46.594 --> 00:33:55.534
So the first important criterion for relevance is the ability to predict the result of your actions.

00:33:57.054 --> 00:34:00.214
There might be others, but I think this is probably one of the most important.

00:34:02.094 --> 00:34:06.194
And this is the way I think

00:34:06.194 --> 00:34:11.114
prefrontal function monitors strategies,

00:34:11.114 --> 00:34:16.414
that is, mainly in

00:34:16.414 --> 00:34:20.094
an absolute way: for each strategy, it tries to figure out whether the strategy is

00:34:20.094 --> 00:34:24.954
relevant or not. But this has interesting consequences, right? Because then, although

00:34:24.954 --> 00:34:31.354
with respect to action you might want to say, I want to go to one, then if you

00:34:31.354 --> 00:34:33.054
want to be able to monitor its outcome,

00:34:33.254 --> 00:34:40.974
you must actually be able to load in memory any reference for future consultation.

00:34:41.414 --> 00:34:47.654
So, I mean, that basically means a whole set of possible outcomes must now be

00:34:47.654 --> 00:34:52.914
considered because any action in a complex world can have a quite wide range

00:34:52.914 --> 00:34:54.934
of consequences. So, this world is dynamic.

00:34:59.856 --> 00:35:06.936
First, I mean, when I talk about behavioral strategy,

00:35:07.216 --> 00:35:15.456
I think about a set of internal representations that include representations

00:35:15.456 --> 00:35:22.656
about what kind of outcome I expect when I do this action in this situation.

00:35:24.596 --> 00:35:32.836
So, uh, of course, and this is what you have in your memory, basically, you know that,

00:35:33.696 --> 00:35:40.376
if, for example, I am at home and I press this light switch, I will get some light,

00:35:41.336 --> 00:35:45.516
so you learn this; this is part of your strategy. Yeah, but all

00:35:45.516 --> 00:35:48.136
I'm saying is that for monitoring to work effectively,

00:35:48.916 --> 00:35:53.736
it must consider a set of possible outcomes. It's not only one.

00:35:54.616 --> 00:35:59.036
Yeah, of course. I mean, maybe in this example, there is only one,

00:35:59.116 --> 00:36:00.556
but you may have several.

00:36:02.156 --> 00:36:07.836
Yeah, I see what you mean. You mean, for example, I do an action and I

00:36:07.836 --> 00:36:10.976
may have a chain of consequences. That's right. This is what you mean.

00:36:12.296 --> 00:36:14.996
Yeah, but I think in one sense.

00:36:17.016 --> 00:36:23.836
There is no reason to believe that in principle you can code all the consequences.

00:36:24.176 --> 00:36:26.556
But of course there are some problems of dimensionality.

00:36:27.196 --> 00:36:34.536
So it's possible that at some point there are some criteria that allow you to

00:36:34.536 --> 00:36:41.156
identify some landmarks, specific outcomes and landmarks.

00:36:41.996 --> 00:36:45.556
And it's part of the complexity of the system, of course.

00:36:49.596 --> 00:36:54.636
It is an interesting counterpoint because you could say, well,

00:36:54.656 --> 00:36:59.876
one thing I'm doing is pruning away all less preferable alternatives.

00:37:00.036 --> 00:37:04.956
So I have my one interpretation of the task and the action I have to execute.

00:37:05.376 --> 00:37:09.956
But you could say conversely, the more complex the task, the more pruning I

00:37:09.956 --> 00:37:14.836
have to do to get to my action, the more outcome alternatives I have to consider for my monitoring.

00:37:18.821 --> 00:37:25.181
Yes. As an example, we can walk out of the studio, we can go through that door,

00:37:25.301 --> 00:37:31.061
but maybe Giovanni, our sound engineer, stands there with a baseball bat to

00:37:31.061 --> 00:37:33.621
chase us down the hallway. We don't know.

00:37:34.581 --> 00:37:40.321
Or maybe the building has disappeared, etc. So these are all consequences,

00:37:40.501 --> 00:37:45.061
all future states of the world that we must be able to consider.

00:37:45.841 --> 00:37:50.681
Yeah, but you don't consider it. Okay. You know, I mean, this is because when

00:37:50.681 --> 00:37:54.321
you are in the studio, you know the studio, you are used to the studio.

00:37:54.541 --> 00:38:03.401
So you know that 99% of the time when you use it, nobody waits for you with a baseball bat.

00:38:05.161 --> 00:38:10.021
So you don't code that. Baguette, maybe, the French version. Baguette.

00:38:10.741 --> 00:38:15.761
I mean, you know, every situation, this is part, every situation.

00:38:15.761 --> 00:38:21.001
I mean, if you go to the airport, you have some expectations, okay?

00:38:21.401 --> 00:38:27.461
Of course, if you go to a place where nobody looks like what you experienced

00:38:27.461 --> 00:38:33.221
before, I think you start to be very scared. But it never really happens.

00:38:33.621 --> 00:38:36.521
Right. Okay, now this is resolved, right? So you're saying,

00:38:36.621 --> 00:38:42.721
no, monitoring acts upon a rather explicitly defined world model,

00:38:42.721 --> 00:38:48.701
which is the same one that feeds into the action you generate and then also

00:38:48.701 --> 00:38:50.121
the monitoring of its outcomes.

00:38:50.421 --> 00:38:56.521
This is roughly what you would say. So this frontal area is really compressing

00:38:56.521 --> 00:39:00.561
everything down into just one unitary interpretation of what you're doing. Yeah.

00:39:02.104 --> 00:39:05.404
Yes, I think, and this is an important point, what you say that,

00:39:05.524 --> 00:39:13.604
at least for myself, is that there are discrete entities, which are different

00:39:13.604 --> 00:39:16.964
worlds. I call that strategy, or behavioral strategy.

00:39:17.404 --> 00:39:21.704
I mean, in psychology they say task set, but it's the same concept.

00:39:21.704 --> 00:39:25.244
Set, that is, you have discrete sets which are consistent.

00:39:26.984 --> 00:39:32.044
Each set is a consistent collection of internal world representations,

00:39:33.144 --> 00:39:38.004
and each set can be selected independently.

00:39:39.264 --> 00:39:42.744
And this is the role of the prefrontal cortex to select them independently,

00:39:43.144 --> 00:39:44.224
to monitor them independently,

00:39:44.724 --> 00:39:54.444
to possibly persevere with one set so that this set develops and learns the world better.

00:39:54.964 --> 00:39:59.124
And that's it. So this is a basic unit that the prefrontal cortex manipulates.

00:39:59.184 --> 00:40:00.364
This is this discrete set.

00:40:00.524 --> 00:40:03.984
So there is this notion of discreteness, which I think is important.

00:40:04.244 --> 00:40:07.144
That's an important point, because also in your experimental work,

00:40:07.924 --> 00:40:10.044
I think this is really one of the elements you emphasize a lot.

00:40:10.224 --> 00:40:15.584
So one set of experiments you described was about a comparison between,

00:40:15.704 --> 00:40:21.544
let's say, rule-free tasks and rule-based tasks, right?

00:40:21.604 --> 00:40:28.984
So why is that an important manipulation for understanding what this frontal area is doing?

00:40:30.724 --> 00:40:37.524
So there are several, I think, important things related to this issue.

00:40:39.384 --> 00:40:43.844
First, there is a general question, I would say,

00:40:43.884 --> 00:40:49.564
this very general question, why do we follow rules?

00:40:51.896 --> 00:40:55.456
I mean, uh, we follow rules all the time.

00:40:55.776 --> 00:41:01.216
I mean, uh, especially when you behave in a group, there are some rules,

00:41:01.276 --> 00:41:04.736
you follow rules, often at the expense of your own preferences.

00:41:05.156 --> 00:41:10.076
So there is, for me, it was one of the important things to understand why basically

00:41:10.076 --> 00:41:13.276
we follow rules and especially in human groups.

00:41:13.376 --> 00:41:18.916
I mean, there are some, um, many rules are about cooperative rules and,

00:41:18.936 --> 00:41:21.096
um, coordination rules.

00:41:21.896 --> 00:41:27.956
And coordination rules especially are very sensitive to deviations by others.

00:41:28.276 --> 00:41:35.076
I mean, a coordination rule is meaningful if everybody follows the rules, okay?

00:41:35.156 --> 00:41:38.336
Like driving on the right.

00:41:41.436 --> 00:41:49.936
So the idea was to, okay, if rules are very important to follow in groups,

00:41:51.056 --> 00:41:57.776
It means that there might be some specific process that allows rules to prevail

00:41:57.776 --> 00:42:01.196
on subjective values or subjective preferences.

00:42:01.616 --> 00:42:04.256
So I was interested in this question, this general question.

00:42:04.416 --> 00:42:10.816
So it's more a question about how it is possible that rules that are very sensitive

00:42:10.816 --> 00:42:14.196
to individual variation develop in human groups.

00:42:14.196 --> 00:42:19.336
So there might be some very specific mechanisms or functional architecture in

00:42:19.336 --> 00:42:21.036
the brain that make it possible.

00:42:21.636 --> 00:42:24.636
So a very general evolutionary question.

00:42:24.836 --> 00:42:30.416
The second question was about, it's related to the notion of context.

00:42:30.716 --> 00:42:38.016
So the rule is basically you have cues, and these cues trigger some specific behavior.

00:42:39.956 --> 00:42:47.816
And you have rewards, expected rewards, that can drive some behavior. So the question

00:42:47.816 --> 00:42:55.556
is exactly how these two processes, which we can identify independently, interact. Mm-hmm.

00:42:57.505 --> 00:43:00.885
And this is related to what I said. There is, of course, the notion of values,

00:43:01.045 --> 00:43:02.685
which is important to select action.

00:43:02.825 --> 00:43:06.845
But the rule seems not to be about values, but about relevance.

00:43:07.005 --> 00:43:08.885
What is relevant in this situation?

00:43:09.245 --> 00:43:12.805
So that's why I was interested in this issue,

00:43:12.945 --> 00:43:23.045
that is, whether this notion of relevance or reliability is really relevant,

00:43:23.045 --> 00:43:31.565
or whether a rule is simply some representations that at some point are transformed into values,

00:43:31.885 --> 00:43:36.245
subjective value, or modulate what I can expect as a reward in the future,

00:43:37.545 --> 00:43:44.745
so that every selection ends up as a choice between two options with different values.

00:43:46.525 --> 00:43:51.045
And what we found is that actually this is not the case.

00:43:51.045 --> 00:44:02.985
We found that, according to our data, the selection process at the end occurs in the rule space.

00:44:04.105 --> 00:44:10.445
And preferences or expected reward are just some additional information that

00:44:10.445 --> 00:44:16.465
is provided to this rule-based space to make the selection.

00:44:17.585 --> 00:44:26.045
Right. But the rules prevail in the selection. That is, if you have rules...

00:44:26.045 --> 00:44:33.905
Another way to say it, maybe more explicitly, is that as long as you have rules that allow you

00:44:35.062 --> 00:44:40.002
to decide what to do, the system doesn't care about your preferences.

00:44:40.662 --> 00:44:46.682
In the selection process, of course, it cares when it evaluates the result of

00:44:46.682 --> 00:44:47.962
the action, the action outcome.

00:44:48.342 --> 00:44:52.942
But in the selection process, it doesn't care. The subjective preferences or

00:44:52.942 --> 00:44:57.682
expected rewards start to influence selection as soon as the rules become ambiguous.

00:44:58.542 --> 00:45:05.362
But now, so underlying this is like a two-dimensional space that also maps onto

00:45:05.362 --> 00:45:07.162
the anatomy of the frontal cortex.

00:45:09.562 --> 00:45:15.242
This was the idea you were sort of investigating here, that along a medial axis,

00:45:15.382 --> 00:45:19.842
it's more value-oriented, and along a lateral axis, so towards the outside,

00:45:20.042 --> 00:45:21.302
it's more rule-oriented.

00:45:22.762 --> 00:45:28.242
And so what you're saying is that, in the fMRI experiment you did,

00:45:28.482 --> 00:45:32.642
it gave you the impression that the real action selection, like the dominant

00:45:32.642 --> 00:45:36.242
axis here, would then be more this lateral axis where the rules reside.

00:45:36.822 --> 00:45:41.862
But is it really that discrete? I mean, on what grounds can you really say that?

00:45:44.982 --> 00:45:50.602
So there is a lot of evidence that there is this dual system.

00:45:50.822 --> 00:45:56.182
The first is the medial system, as you said, which is related to processing

00:45:56.182 --> 00:46:02.302
expected reward, values, subjective preference, whatever you call that.

00:46:03.382 --> 00:46:11.262
And this processing is implemented mainly in the medial prefrontal system.

00:46:12.322 --> 00:46:17.402
Then you have this lateral prefrontal system that seems to be involved whenever

00:46:17.402 --> 00:46:24.662
you have some rules, some instructions, some internal model that,

00:46:26.642 --> 00:46:27.802
drives the selection.

00:46:29.342 --> 00:46:34.002
And we know also, of course, from anatomy that these two systems are tightly connected.

00:46:37.542 --> 00:46:38.102
So

00:46:42.355 --> 00:46:50.975
So the idea is that you have, I think this is part of, I think there are two possible views.

00:46:51.835 --> 00:46:57.055
One view is the homogeneous view. That is, there is no real specialization,

00:46:57.055 --> 00:47:03.635
and preferences and rules are mixed in the interaction between the two systems.

00:47:03.675 --> 00:47:07.075
In the end, the selection is made by the system as a whole.

00:47:07.275 --> 00:47:12.015
Okay. The system relaxes to a given state and it makes a selection.

00:47:13.295 --> 00:47:15.715
It's a possible view. The other view is that

00:47:17.735 --> 00:47:22.835
one of these subsystems, the medial preference system or the lateral

00:47:22.835 --> 00:47:25.995
rule system, actually

00:47:27.415 --> 00:47:32.815
is the system that makes the final decision, that commits behavior. That is, the

00:47:32.815 --> 00:47:37.815
information it codes is actually what the system, or the

00:47:37.815 --> 00:47:42.735
organism, is going to do as an action. And...

00:47:46.675 --> 00:47:52.015
We found evidence about this second interpretation, this second hypothesis,

00:47:52.715 --> 00:47:58.615
that is, that the lateral system makes final selections that commit the organism.

00:47:58.615 --> 00:48:02.375
And the question is why it's like that.

00:48:03.215 --> 00:48:04.995
In my view, it's that,

00:48:07.355 --> 00:48:13.255
you know, in humans the lateral prefrontal cortex developed a lot. And

00:48:15.855 --> 00:48:21.615
this is a rule system. So I think what is very specific to humans is that we

00:48:21.615 --> 00:48:24.615
have the ability to build rules all the time.

00:48:26.275 --> 00:48:32.055
And I think the selection is moved to the lateral system where basically there

00:48:32.055 --> 00:48:34.595
are all the processes that allow us to learn rules.

00:48:36.115 --> 00:48:36.755
And,

00:48:39.429 --> 00:48:44.229
And promoting, therefore, all this, the learning of rules and the use of rules

00:48:44.229 --> 00:48:47.909
in the selection process, because we are social organisms.

00:48:48.169 --> 00:48:51.129
And in groups, you need rules.

00:48:51.549 --> 00:48:55.609
Right. It's very important. And especially, as I said at the beginning,

00:48:55.869 --> 00:48:59.049
coordination rules are critical in groups.

00:48:59.729 --> 00:49:05.549
So, much of what you're describing here you have described in terms of a utilitarian

00:49:05.549 --> 00:49:09.649
model versus a normative model, right? So the utilitarian one would be more

00:49:09.649 --> 00:49:12.889
value-dominated, and the normative one is more rule-dominated.

00:49:13.169 --> 00:49:16.149
And you're saying, look, your data is pointing you in this direction,

00:49:16.189 --> 00:49:21.309
that this normative model is, if you want, dominating the action outcome more

00:49:21.309 --> 00:49:24.669
than this utilitarian model.

00:49:25.109 --> 00:49:29.869
But now, the experiments on which you base this, which are sort of human subjects

00:49:29.869 --> 00:49:33.369
performing different decision-making tasks, and you do fMRI on them,

00:49:33.529 --> 00:49:38.229
so you look at the brain activity in these areas. There are,

00:49:38.249 --> 00:49:44.349
of course, a number of caveats, if you want, because imagine I interpret your

00:49:44.349 --> 00:49:46.109
statement in a very categorical sense.

00:49:46.309 --> 00:49:50.669
You would say, look, that would have the implication that I have a utilitarian

00:49:50.669 --> 00:49:52.489
module that just worries about value.

00:49:52.729 --> 00:49:55.989
This is more in the medial prefrontal cortex.

00:49:56.329 --> 00:50:02.229
Then I have this normative rule-based system sitting more lateral as a well-delineated,

00:50:02.249 --> 00:50:09.289
again, module, and they exchange well-defined information chunks, if you want.

00:50:09.369 --> 00:50:10.809
One is informing the other about

00:50:10.809 --> 00:50:13.729
the value, and the other one is informing back about the rules, right?

00:50:13.829 --> 00:50:16.089
But now I could argue, well, that's great.

00:50:16.229 --> 00:50:20.569
That's a really nice way to interpret the data, and it is consistent with the

00:50:20.569 --> 00:50:22.629
experiments you performed. There's no doubt about it.

00:50:23.869 --> 00:50:30.809
But for starters, the signals you interpret become significant only at a scale

00:50:30.809 --> 00:50:36.269
of seconds, while the performance is occurring at the scale of hundreds of milliseconds.

00:50:38.249 --> 00:50:43.309
So it's possible that the neural process that really is driving this action

00:50:43.309 --> 00:50:49.049
selection is really below the radar of your fMRI evaluation and that what you're

00:50:49.049 --> 00:50:50.629
analyzing is maybe more,

00:50:50.809 --> 00:50:57.329
let's say, how you process decisions in memory after the decision has been made

00:50:57.329 --> 00:51:01.569
than the real-time performance of the subject. Thank you.

00:51:08.007 --> 00:51:13.967
Yes. Of course, this is a problem with fMRI data, that we don't have access

00:51:13.967 --> 00:51:20.087
to the millisecond time scale, which is important for neural processing.

00:51:20.447 --> 00:51:29.007
But I can tell you that if you look at neural data on these regions,

00:51:29.687 --> 00:51:32.967
I mean, they are consistent with what we found.

00:51:32.967 --> 00:51:39.887
I mean, we know, for example, that in the dorsomedial prefrontal cortex,

00:51:40.027 --> 00:51:44.527
we have neurons that encode action-outcome associations,

00:51:44.847 --> 00:51:53.807
in larger proportions than in the lateral prefrontal regions.

00:51:53.807 --> 00:52:00.527
And we also know that during a decision, neurons activate first in medial

00:52:00.527 --> 00:52:06.087
regions, before neurons in lateral prefrontal regions.

00:52:06.407 --> 00:52:13.507
So neurons in the lateral prefrontal region activate closer to the decision

00:52:13.507 --> 00:52:16.267
time than neurons in the medial prefrontal cortex.

00:52:16.447 --> 00:52:19.347
So it's consistent with what I am saying. Right, okay. Okay.

00:52:20.987 --> 00:52:24.747
But then there's still another missing link, which is that I could argue,

00:52:24.787 --> 00:52:26.427
but look, if we look at these cortical areas,

00:52:26.807 --> 00:52:31.327
if we just, I give you a cubic millimeter of this medial prefrontal cortex,

00:52:31.547 --> 00:52:34.527
and I give you another cubic millimeter of lateral prefrontal cortex,

00:52:34.627 --> 00:52:38.267
and I don't tell you where they came from, you will have a hard time on morphological

00:52:38.267 --> 00:52:39.607
grounds to tell me what's what.

00:52:40.127 --> 00:52:43.547
So, I mean, there's huge similarity between these circuits.

00:52:43.647 --> 00:52:49.207
So what makes them so different in their functional contribution to decision-making?

00:52:54.422 --> 00:53:01.922
I think what is different is that they are located in different positions

00:53:01.922 --> 00:53:07.222
in the network, so that each has access to different types of information,

00:53:08.082 --> 00:53:16.682
and what is important, in the end, to describe the function of every region,

00:53:16.942 --> 00:53:22.522
ideally, is to be able to describe the inputs and the outputs of these regions.

00:53:22.762 --> 00:53:33.642
And of course, we cannot do that comprehensively for a region because there

00:53:33.642 --> 00:53:35.042
are many connections from everywhere.

00:53:35.562 --> 00:53:44.482
But at least if I take this example about medial and lateral prefrontal region,

00:53:44.942 --> 00:53:51.062
I mean, if we try to understand what kind of information the medial regions

00:53:51.062 --> 00:53:55.922
send to the lateral regions, and conversely, what kind of information the lateral

00:53:55.922 --> 00:53:57.422
regions send to the medial regions,

00:53:58.542 --> 00:54:01.582
we see that it's not the same.

00:54:02.882 --> 00:54:08.402
Well, but in some sense, with your data, you don't really know what travels

00:54:08.402 --> 00:54:09.742
over these axons, right?

00:54:09.802 --> 00:54:14.982
You only know something about the covariance of their activity under certain task conditions.

00:54:15.462 --> 00:54:20.702
And the information exchange could possibly be regulated through another structure.

00:54:23.504 --> 00:54:29.124
Yes, it might not be direct, for sure, but this data tells you that in one direction

00:54:29.124 --> 00:54:32.664
there is something happening which is not the same as in the other direction.

00:54:33.184 --> 00:54:35.084
Okay. This is what it means.

00:54:35.984 --> 00:54:39.584
But... That is, the information that is shared between the two regions differs

00:54:39.584 --> 00:54:42.224
in one direction and in the other direction.

00:54:42.384 --> 00:54:46.764
So it provides some cues about the function of every region.

00:54:46.764 --> 00:54:55.924
Because my deep belief is that every region has a specific operation,

00:54:56.164 --> 00:54:57.664
implements a specific operation,

00:54:57.924 --> 00:55:07.944
which is quite abstract and every region is like an operator,

00:55:08.184 --> 00:55:10.544
if you want, an information processing operator.

00:55:10.544 --> 00:55:16.064
And of course, at least what I try to find out is which operator is implemented

00:55:16.064 --> 00:55:17.944
in this given region and so on. Sure.

00:55:18.004 --> 00:55:24.884
I understand that. But now what we see is that if the local circuit is fairly

00:55:24.884 --> 00:55:29.804
uniform, right, the information exchange we cannot directly assess.

00:55:30.484 --> 00:55:35.804
So that means possibly there are actually other areas that are dictating this

00:55:35.804 --> 00:55:39.724
function, right? For instance, let's say it might be an interaction with basal

00:55:39.724 --> 00:55:41.944
ganglia-related structures or so. We don't know.

00:55:42.884 --> 00:55:46.104
But then I could argue, hey, but wait one moment, Etienne.

00:55:46.284 --> 00:55:49.604
This might also imply that it's actually these tertiary structures that we have

00:55:49.604 --> 00:55:53.464
not identified that are really doing the decision-making.

00:55:53.544 --> 00:55:57.324
And these other guys in this prefrontal area you're measuring from are just

00:55:57.324 --> 00:56:02.584
echoing in a way you can detect with your system, with your device, right?

00:56:03.475 --> 00:56:06.915
The result of that decision that has been made.

00:56:10.635 --> 00:56:19.975
Yes. You're perfectly right. I mean, but I think it's part of the scientific process.

00:56:20.035 --> 00:56:23.015
First, you start with the simplest hypothesis.

00:56:23.975 --> 00:56:28.635
Okay. We found correlations or some information transfer from one region to another.

00:56:29.095 --> 00:56:32.355
We know that these regions are deeply and densely connected.

00:56:33.475 --> 00:56:40.215
Okay, if we observe some influence, the first, simplest interpretation is

00:56:40.215 --> 00:56:43.675
to consider that it goes directly from one region to the other.

00:56:44.215 --> 00:56:48.415
It might be wrong at some point, but I think we should always start with the most

00:56:48.415 --> 00:56:49.195
simple interpretation.

00:56:49.935 --> 00:56:52.595
As I said at the very beginning, with the simplest model.

00:56:52.955 --> 00:56:57.955
And then you make the model more complex gradually, if you need to.

00:56:58.335 --> 00:57:02.475
And I'm sure it's a simplistic view in one sense.

00:57:02.475 --> 00:57:08.875
I don't say that this is, it's probably much more complex, but it's always useful

00:57:08.875 --> 00:57:15.855
to start from very simple principles and to refine them progressively. Absolutely.

00:57:16.055 --> 00:57:26.675
And I am sure that in the selection process, basal ganglia, of course,

00:57:26.715 --> 00:57:28.355
involves many loops. Mm-hmm.

00:57:30.283 --> 00:57:37.603
It's true, but you need to start from one point. No problem.

00:57:38.003 --> 00:57:47.383
It's true that it's good to have a simple model, but you must never forget

00:57:47.383 --> 00:57:51.063
that your models are just models and simple models.

00:57:51.283 --> 00:57:55.743
Sure, exactly. These are probably wrong at some point, and for sure they are wrong at some point.

00:57:55.963 --> 00:57:58.183
But what I'm challenging you on is that sometimes I'm saying,

00:57:58.283 --> 00:58:02.343
well, maybe your model is not as simple as it could be because you are already

00:58:02.343 --> 00:58:06.683
assuming that these are like two distinct modules with distinct functions that

00:58:06.683 --> 00:58:08.623
are exchanging well-defined information.

00:58:09.863 --> 00:58:12.883
Well, these are actually really pretty strong assumptions.

00:58:13.623 --> 00:58:16.103
That's why I used anatomy as an example. If I go to the anatomy,

00:58:16.423 --> 00:58:21.043
it will be really difficult to distinguish these circuits, medial, lateral.

00:58:21.283 --> 00:58:24.683
They will look rather similar, and their differences will be in,

00:58:24.723 --> 00:58:30.143
let's say, fairly subtle differences in how other structures project into them

00:58:30.143 --> 00:58:33.163
and receive information from them. But these are all really minute.

00:58:33.323 --> 00:58:34.463
These will be minute variations.

00:58:36.163 --> 00:58:40.223
So I'm sort of challenging your idea. I completely buy the method,

00:58:40.363 --> 00:58:42.663
but I would say, well, maybe this is not a minimal interpretation.

00:58:44.383 --> 00:58:48.463
Yeah, you said that it's already quite complex. But why is it complex?

00:58:50.403 --> 00:58:56.283
It's complex simply because we consider a coupled system with reciprocal connections.

00:58:57.383 --> 00:59:04.563
And our minds don't seem to be very well adapted to understanding reciprocal interactions.

00:59:04.803 --> 00:59:10.903
As soon as you have reciprocal interactions, you need a mathematical model to make sense of them. Fair enough.

00:59:11.203 --> 00:59:17.003
However, you do assume that within each of these modules, a very specific function

00:59:17.003 --> 00:59:20.983
is performed because one does value-based operations and the other one does

00:59:20.983 --> 00:59:23.583
rule-based operations. And that's not necessarily simple.

00:59:24.683 --> 00:59:28.343
Yes. Yes. But this is what I said at the very beginning.

00:59:28.983 --> 00:59:32.563
Imagine that you raise the question: okay,

00:59:32.623 --> 00:59:36.603
we have this medial region, we have this lateral region; what could be the difference

00:59:36.603 --> 00:59:38.363
between the two regions, functionally?

00:59:39.243 --> 00:59:44.123
If you start to think, you look at the literature, you look at all the data,

00:59:44.883 --> 00:59:51.183
that many labs and people have collected, and you ask, okay, what could be the differences,

00:59:51.363 --> 00:59:55.563
the functional differences, the functional segregation between these two regions, if it exists,

00:59:55.823 --> 00:59:59.903
you may end up with the conclusion that, okay, they're quite similar.

01:00:01.083 --> 01:00:04.763
And this is, as I said at the beginning, the homogeneous view.

01:00:05.903 --> 01:00:09.263
But if you say, okay, let us really assume that they perform different functions,

01:00:09.603 --> 01:00:11.543
you cannot end up with

01:00:12.280 --> 01:00:16.540
so many different assumptions, you know. There are a few, but not that many.

01:00:16.800 --> 01:00:22.460
Right. Because we still have data, and there is also a consistency you need

01:00:22.460 --> 01:00:25.620
to look at when you build this kind of hypothesis.

01:00:26.400 --> 01:00:31.220
It's not consistent to imagine that, for example, I don't know, but...

01:00:33.020 --> 01:00:36.520
Yeah, this is an important point, right? Because if you look at your more recent

01:00:36.520 --> 01:00:41.200
experiments, you have actually found only further support for this way of thinking

01:00:41.200 --> 01:00:46.400
about the system as opposed to falsification of it.

01:00:46.560 --> 01:00:48.820
So this, I think, would argue for that.

01:00:49.100 --> 01:00:56.460
And so that had a lot to do with these experiments where you looked at the transfer

01:00:56.460 --> 01:01:02.660
of value in different tasks of varying complexity.

01:01:03.640 --> 01:01:07.720
And also in these tasks, what you did, which I think was extremely interesting,

01:01:07.960 --> 01:01:12.980
you really looked at how different models of decision-making actually scaled

01:01:12.980 --> 01:01:16.320
on these tasks, what their problems were, and on the basis of that,

01:01:16.400 --> 01:01:19.640
you have formulated an alternative where you say, look, actually,

01:01:19.640 --> 01:01:21.820
all these other models that have been very popular in the literature,

01:01:22.360 --> 01:01:26.060
they might be very interesting, but they fail to really explain what this

01:01:26.060 --> 01:01:28.780
frontal area is doing. So what was the trajectory there exactly?

01:01:30.800 --> 01:01:38.200
You mean, okay, your question is about the way we ended up with

01:01:38.200 --> 01:01:39.960
this model? Yeah, exactly, exactly.

01:01:42.500 --> 01:01:48.540
Yeah, this is always the same process: starting with very simple

01:01:49.320 --> 01:02:00.140
stuff and trying to explain things with very simple models. So we have, and...

01:02:03.421 --> 01:02:07.561
It's difficult to explain. I mean… Well, maybe as a hint, you know.

01:02:07.601 --> 01:02:12.161
So the point was basically we're saying, okay, we look at the prefrontal cortex, right?

01:02:12.241 --> 01:02:14.541
So what can it really do in a task?

01:02:14.801 --> 01:02:20.061
Well, it can decide to stay. It just keeps on executing, following the same

01:02:20.061 --> 01:02:21.381
rule because it has been successful.

01:02:22.261 --> 01:02:26.021
Secondly, it might decide to switch rules because things are failing.

01:02:27.261 --> 01:02:30.101
Or it can decide, okay, I don't know what to do.

01:02:30.161 --> 01:02:36.421
I'd better explore. Something new has to happen, right? And I think it was

01:02:36.421 --> 01:02:38.601
that consideration that really gave rise to

01:02:38.801 --> 01:02:45.321
the model. Yes. So at the very beginning, there was an intuition that

01:02:46.801 --> 01:02:51.181
something that was largely overlooked in the literature is this exploration

01:02:51.181 --> 01:02:58.721
process. There are models, a few models, that explain how you switch into exploration.

01:03:00.261 --> 01:03:03.141
That means, basically, for example, you learn

01:03:03.141 --> 01:03:06.681
something, and at some point the monitoring system tells you that you need

01:03:06.681 --> 01:03:10.541
to switch, and in this model you just reset everything and start from

01:03:10.541 --> 01:03:21.661
scratch. But in the literature there was no model explaining how you switch out

01:03:21.921 --> 01:03:25.101
of exploration. That is,

01:03:25.101 --> 01:03:27.821
at some point

01:03:27.821 --> 01:03:30.621
you may explore,

01:03:30.621 --> 01:03:33.461
but when you explore,

01:03:33.461 --> 01:03:38.641
you

01:03:38.641 --> 01:03:43.201
suddenly notice that you can

01:03:43.201 --> 01:03:47.041
switch back and return to something you know, so you

01:03:47.041 --> 01:03:49.821
quit exploration. And there was no model

01:03:49.821 --> 01:03:56.981
of this. So that was the basic idea, the basic intuition: okay, we really

01:03:56.981 --> 01:04:02.701
need a model that explains how we decide to explore something new and how

01:04:02.701 --> 01:04:07.041
we decide to quit exploration and return to what we already know.

01:04:09.570 --> 01:04:16.490
This was the intuition at the very beginning. The second intuition, which we also need to understand,

01:04:18.510 --> 01:04:22.770
the second intuition was about learning. So learning means, as I already said a

01:04:22.770 --> 01:04:27.770
little bit about that before, learning means experiencing negative feedback,

01:04:29.350 --> 01:04:31.690
which means that you need to persevere.

01:04:33.610 --> 01:04:37.490
But you also get negative feedback when you should switch.

01:04:37.490 --> 01:04:41.030
So there needs to be a system that pushes you to keep learning,

01:04:42.950 --> 01:04:48.970
and, conversely, this same system can

01:04:50.550 --> 01:04:53.910
decide that at some point it's no longer worth persevering,

01:04:54.070 --> 01:04:58.010
too much negative feedback or whatever, and then you need to switch.

01:04:58.150 --> 01:05:03.430
So it is this intuition at the very beginning that helped us,

01:05:05.550 --> 01:05:06.730
at least, to...

01:05:09.750 --> 01:05:15.870
to set the problem. I mean, to raise the problem.

01:05:16.210 --> 01:05:19.550
And I think in science, it's very important to be able to raise problems.

01:05:20.550 --> 01:05:29.370
Right. And to delimit, to circumscribe a given problem,

01:05:29.590 --> 01:05:31.670
to raise this problem. And then, after that,

01:05:34.270 --> 01:05:37.570
given this issue of how you explore and how you quit exploration,

01:05:37.850 --> 01:05:41.550
how you persevere to learn or switch when it's no longer worth learning,

01:05:43.570 --> 01:05:50.390
we started developing a model and, at the same time, an experiment.

01:05:54.610 --> 01:06:00.630
So we developed a model based on our ideas, our intuitions, and then we tested it

01:06:00.630 --> 01:06:05.670
with this experiment. And of course, on this experiment, we tested whether some

01:06:05.670 --> 01:06:09.190
simpler or more standard models were able to explain the performance.

01:06:11.450 --> 01:06:13.950
And as our intuitions

01:06:16.530 --> 01:06:17.650
had suggested,

01:06:20.250 --> 01:06:26.410
we were able to show that in this kind of experiment,

01:06:26.770 --> 01:06:29.290
standard models that have no exploration

01:06:32.030 --> 01:06:43.210
capabilities cannot explain the data. And conversely,

01:06:43.210 --> 01:06:47.710
if you consider very sophisticated models, I mean normative models in the sense

01:06:47.710 --> 01:06:51.090
of statistical learning, very sophisticated models,

01:06:52.130 --> 01:06:57.170
they perform the task, of course, but they outperform humans; they don't

01:06:57.170 --> 01:07:00.170
explain human performance. What's the difference there?

01:07:00.650 --> 01:07:02.410
In what sense do they outperform humans?

01:07:04.835 --> 01:07:13.795
So first, they are able to adjust to uncertainty and to the variability of their

01:07:13.795 --> 01:07:16.235
environment much faster than humans.

01:07:16.775 --> 01:07:26.315
And they are basically able to use every piece of information

01:07:26.315 --> 01:07:31.615
to infer what should be learned and when to switch.

01:07:31.615 --> 01:07:35.415
Which, in a way, is of course impossible.

01:07:36.115 --> 01:07:40.615
So this was this Dirichlet optimal agent, right? This was your criterion.

01:07:41.275 --> 01:07:45.295
But does it have access to other information that humans don't have access to?

01:07:45.475 --> 01:07:47.435
No, no, no. Where does the difference come from?

01:07:48.555 --> 01:07:51.615
One of the major differences is that this kind of model

01:07:53.635 --> 01:07:58.675
will, for example, explore at some point and create a new strategy or a new task set.

01:07:59.495 --> 01:08:02.095
But then later on, they will get a feedback.

01:08:03.075 --> 01:08:11.695
And whenever they get new feedback, they revise the entire history of creating task sets, new strategies.

01:08:12.255 --> 01:08:15.675
So every time they get new information, they revise the entire history.

01:08:15.895 --> 01:08:21.115
So they memorize the entire history and they try to find, given any new information,

01:08:21.295 --> 01:08:25.415
what the best history would be. Right. Right.

01:08:26.594 --> 01:08:30.794
And that's why they are very powerful. Sure. But what I found interesting,

01:08:30.954 --> 01:08:35.874
though, if you compare their performance to that of humans, indeed,

01:08:35.874 --> 01:08:39.434
in most task conditions, they were better than humans.

01:08:39.714 --> 01:08:45.514
But there were other task conditions where humans actually outperformed this optimal agent.

01:08:47.954 --> 01:08:54.874
I'm not sure you're right. Okay. No, they don't really outperform. I'm not sure.

01:08:55.234 --> 01:08:57.274
Yeah, I remember this question.

01:08:59.794 --> 01:09:03.314
I saw it in one of your plots, right? You have these two conditions.

01:09:03.854 --> 01:09:08.094
There's sort of a recurrent task and there's an open task.

01:09:09.254 --> 01:09:15.474
And then in one of these, even though the optimal agent can always adjust its

01:09:15.474 --> 01:09:18.894
full policy space to any outcome,

01:09:18.954 --> 01:09:25.174
you still saw that humans, with their presumably more restricted capabilities,

01:09:25.174 --> 01:09:27.474
would still outperform these optimal agents.

01:09:28.354 --> 01:09:30.174
So I was wondering whether this had to do with, for instance,

01:09:30.214 --> 01:09:33.954
perceptual capabilities that such agents don't have and humans do have.

01:09:37.834 --> 01:09:41.134
I'm sorry, but I think you are wrong. I don't see this in my graphs.

01:09:41.414 --> 01:09:45.714
Maybe we should have the graph. Yeah. Okay. We'll look at that later.

01:09:45.914 --> 01:09:50.274
That's very good. But then you also compared it to standard algorithms like reinforcement

01:09:50.274 --> 01:09:56.974
learning models where you would basically learn policies given the feedback

01:09:56.974 --> 01:09:58.554
that you receive from your environment.

01:09:58.774 --> 01:10:02.694
So why does a standard reinforcement learning model fail in this task?

01:10:04.694 --> 01:10:11.834
So it fails because this model just adjusts continuously to the new contingencies.

01:10:11.994 --> 01:10:15.174
So in this model, there is no memory. Mm-hmm.

01:10:16.269 --> 01:10:19.989
It's just: you learn something, it works, and when it no longer works,

01:10:20.049 --> 01:10:23.109
you just unlearn this and relearn something new.

01:10:23.429 --> 01:10:31.409
But you never store the specific mappings you learn.

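NOTE A minimal sketch of the point that a standard reinforcement learner keeps no memory of past mappings. It assumes a single-stimulus task with three actions, a +1/-1 reward scheme and greedy choice, none of which come from the interview; the function name run_memoryless is hypothetical.
```python
import random
def run_memoryless(correct_actions, trials_per_block=40, alpha=0.3, n_actions=3, seed=0):
    """Delta-rule learner with one value table and no stored mappings.
    Each entry of `correct_actions` is the rewarded action in one block,
    so a change between consecutive blocks is a contingency switch."""
    rng = random.Random(seed)
    q = [0.0] * n_actions  # current action values: the only thing the learner keeps
    errors_per_block = []
    for correct in correct_actions:
        errors = 0
        for _ in range(trials_per_block):
            best = max(q)
            # Greedy choice with random tie-breaking.
            action = rng.choice([a for a in range(n_actions) if q[a] == best])
            reward = 1.0 if action == correct else -1.0
            errors += int(action != correct)
            # The update overwrites the old mapping in place; nothing is stored for reuse.
            q[action] += alpha * (reward - q[action])
        errors_per_block.append(errors)
    return errors_per_block, q
errors, q = run_memoryless([0, 1, 0])
print(errors)
```
Returning to the first mapping in the third block still costs errors, because that mapping was unlearned rather than stored.
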
01:10:31.629 --> 01:10:35.329
Right. But if I would have a reinforcement learning model that would just store

01:10:35.329 --> 01:10:39.869
its different mappings so it knows when to switch tasks, it would possibly be appropriate

01:10:39.869 --> 01:10:42.289
for this task, for this problem.

01:10:43.329 --> 01:10:46.169
Of course, but then you need a monitoring system to know

01:10:46.169 --> 01:10:49.249
when to switch. Okay, right. And this

01:10:49.249 --> 01:10:53.549
is exactly the point: you start to need a monitoring system

01:10:53.549 --> 01:10:59.109
to monitor these different mappings you store, and then you also need a system

01:10:59.109 --> 01:11:03.569
that decides, okay, you have learned all these mappings, but you need to

01:11:03.569 --> 01:11:11.389
maybe learn a new one now, right, and explore a new one. And we end up with our model, basically.

01:11:11.829 --> 01:11:15.729
Right. Of course, it raises the question whether it was a completely fair comparison

01:11:15.729 --> 01:11:21.809
or whether it was more like a straw man because you knew a priori that that

01:11:21.809 --> 01:11:23.909
model would fail given the task conditions.

01:11:24.389 --> 01:11:27.089
Of course, of course. And the task was built to make this model fail.

01:11:27.129 --> 01:11:31.589
Otherwise, we would not have developed this model. Right. But it made the point, right?

01:11:32.369 --> 01:11:37.809
Yeah, it makes the point. It's just a pedagogical way to show it.

01:11:38.349 --> 01:11:42.409
But what was interesting with the reinforcement learning model, I don't know whether you noticed,

01:11:42.409 --> 01:11:49.449
is that it captures the overall dynamics. You see, I mean, it doesn't capture qualitatively

01:11:49.449 --> 01:11:50.929
the difference between conditions,

01:11:51.889 --> 01:11:57.969
when there is a recurrent mapping that can be reused or no recurrent mapping, the open

01:11:57.969 --> 01:12:01.909
condition, but it still captures the overall dynamics of adaptation.

01:12:03.509 --> 01:12:08.969
And I think it's interesting; it shows that

01:12:10.409 --> 01:12:16.449
what really matters for discriminating between models is in the details:

01:12:18.149 --> 01:12:25.549
small differences at specific points that are very informative about what subjects do or don't do.

01:12:26.809 --> 01:12:30.589
Because the basic reinforcement model captures the overall dynamics,

01:12:31.809 --> 01:12:36.069
which is actually an artifact of averaging across episodes. Right.

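NOTE A minimal sketch of how averaging across episodes can produce the smooth curve that an incremental learner would also fit. It assumes each simulated episode jumps abruptly from chance (1/3) to perfect performance at a random trial; this is an illustrative assumption, not data from the study, and the function names are hypothetical.
```python
import random
def abrupt_episode(n_trials, switch_trial, chance=1 / 3):
    # One episode: at chance until an all-or-none switch, then always correct.
    return [chance if t < switch_trial else 1.0 for t in range(n_trials)]
def average_curve(n_episodes=500, n_trials=20, seed=1):
    rng = random.Random(seed)
    # Switch points are drawn uniformly from trials 1..n_trials-1.
    episodes = [abrupt_episode(n_trials, rng.randint(1, n_trials - 1)) for _ in range(n_episodes)]
    # Average proportion correct at each trial position across episodes.
    return [sum(ep[t] for ep in episodes) / n_episodes for t in range(n_trials)]
curve = average_curve()
print([round(p, 2) for p in curve])
```
Every single episode is a step function, yet the averaged curve rises gradually, which is why matching only the average dynamics says little about the underlying process.
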
01:12:37.589 --> 01:12:41.109
But now in the model you proposed, that was your alternative,

01:12:41.269 --> 01:12:49.089
you have, let's say, a Bayesian inference process that sort of is trying to

01:12:49.089 --> 01:12:50.549
figure out what's going on in the world.

01:12:50.829 --> 01:12:55.109
How well do my policies probably match this, right? Because my policies are also

01:12:55.109 --> 01:12:56.749
tied to states of the world.

01:12:57.449 --> 01:13:02.789
Then you have a hypothesis testing component and you have an exploration component,

01:13:03.229 --> 01:13:07.029
right? So how do these three components really work together in your model?

01:13:09.469 --> 01:13:14.109
If you want to isolate components, there are three components.

01:13:14.209 --> 01:13:21.129
The first component is the inferential buffer, then the hypothesis testing component,

01:13:22.429 --> 01:13:28.969
and the way we build new strategies from long-term memory.

01:13:29.169 --> 01:13:32.829
This is the third component. Of course, they are intrinsically linked,

01:13:32.949 --> 01:13:44.529
but the idea is that you can monitor only a small number of concurrent strategies or policies.

01:13:46.669 --> 01:13:52.649
So this is a constraint we put on the model.

01:13:55.159 --> 01:13:58.299
Then you need to update this small collection.

01:13:58.859 --> 01:14:05.779
And so to update this small collection, we consider that the best way is to

01:14:05.779 --> 01:14:07.519
have a hypothesis-testing system.

01:14:07.619 --> 01:14:10.679
I test a new policy.

01:14:11.899 --> 01:14:17.499
I start monitoring this new policy. If it seems to be reliable at the end,

01:14:17.619 --> 01:14:19.659
I continue to monitor it.

01:14:19.739 --> 01:14:28.699
Or if it's not competitive compared to other strategies, or I just don't need

01:14:28.699 --> 01:14:30.459
to monitor it, I can discard it.

01:14:31.819 --> 01:14:38.719
So this was the idea that hypothesis testing is important to update this monitoring

01:14:38.719 --> 01:14:41.579
buffer because it cannot monitor everything.

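NOTE A minimal sketch, under stated assumptions, of a capacity-limited monitoring buffer with hypothesis testing. Each monitored strategy carries a reliability, the probability that it currently fits the environment, updated by Bayes' rule from whether it predicted the outcome. The likelihoods p_fit and p_misfit, the confirm and reject thresholds, and all names are hypothetical choices, not the parameters of the model discussed in the interview.
```python
from dataclasses import dataclass
@dataclass
class Strategy:
    name: str
    reliability: float  # belief that this strategy fits the current environment
def update_reliabilities(strategies, fits, p_fit=0.9, p_misfit=0.2):
    """fits[name] is True when that strategy predicted the observed outcome.
    p_fit / p_misfit: assumed probability of a correct prediction when the
    strategy is / is not the one that fits."""
    for s in strategies:
        like_fit = p_fit if fits[s.name] else 1 - p_fit
        like_misfit = p_misfit if fits[s.name] else 1 - p_misfit
        num = like_fit * s.reliability
        s.reliability = num / (num + like_misfit * (1 - s.reliability))
def evaluate_probe(buffer, probe, capacity=3, confirm=0.5, reject=0.2):
    """Confirm the probe (keep monitoring it), reject it (discard), or keep testing."""
    if probe.reliability >= confirm:
        if len(buffer) >= capacity:
            # The buffer is small: evict the least reliable monitored strategy.
            buffer.remove(min(buffer, key=lambda s: s.reliability))
        buffer.append(probe)
        return "confirmed"
    if probe.reliability < reject:
        return "rejected"
    return "testing"
buffer = [Strategy("A", 0.6), Strategy("B", 0.3), Strategy("C", 0.1)]
probe = Strategy("P", 0.3)
update_reliabilities(buffer + [probe], {"A": False, "B": False, "C": False, "P": True})
print(evaluate_probe(buffer, probe), [s.name for s in buffer])
```
Here the probe predicts the outcome while the monitored strategies miss, so it is confirmed and takes the place of the least reliable one (C).
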
01:14:42.939 --> 01:14:49.119
And then, of course, when you decide to go for a strategy which is not in the

01:14:49.119 --> 01:14:54.259
monitoring buffer, a new policy, you see, you first need to go...

01:14:54.479 --> 01:14:58.059
I mean, you go to your long-term memory, but you don't monitor things there.

01:14:58.179 --> 01:15:03.899
So the only way you can build something from your long-term memory is to have

01:15:03.899 --> 01:15:11.419
a weighted mixture of this long-term memory, weighted by some cues,

01:15:12.259 --> 01:15:15.219
given some cues, and given, of course, some internal models.

01:15:16.099 --> 01:15:20.579
And you build a new strategy from your long-term memory like that, and you try it.

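NOTE A minimal sketch, under stated assumptions, of building a new strategy as a mixture of strategies stored in long-term memory, weighted by how well each one's contextual model explains the current cue. The dictionary layout (mapping, cue_model, prior), the 1e-3 floor for unseen cues and the function name are hypothetical.
```python
def build_probe(long_term_memory, cue, floor=1e-3):
    """Each stored entry has:
    'mapping':   {stimulus: [probability of each action]}
    'cue_model': {cue: likelihood of this cue when the strategy applied}
    'prior':     assumed base weight (e.g. how often it was used before)."""
    weights = [e["prior"] * e["cue_model"].get(cue, floor) for e in long_term_memory]
    total = sum(weights)
    weights = [w / total for w in weights]  # normalized, context-dependent weights
    probe = {}
    for stimulus, probs in long_term_memory[0]["mapping"].items():
        probe[stimulus] = [
            sum(w * e["mapping"][stimulus][a] for w, e in zip(weights, long_term_memory))
            for a in range(len(probs))
        ]
    return probe, weights
memory = [
    {"mapping": {"s1": [1.0, 0.0]}, "cue_model": {"kitchen": 0.9, "field": 0.1}, "prior": 0.5},
    {"mapping": {"s1": [0.0, 1.0]}, "cue_model": {"kitchen": 0.1, "field": 0.9}, "prior": 0.5},
]
print(build_probe(memory, "kitchen")[0])  # leans towards the first stored strategy
print(build_probe(memory, "field")[0])    # same memory, different context, different mixture
```
The sketch assumes every stored mapping covers the same stimuli and actions. The same memory yields different new strategies in different contexts, which is the context dependence described in the conversation.
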
01:15:22.439 --> 01:15:28.399
And all three components are important for one another.

01:15:29.939 --> 01:15:34.379
Right. So to build a new strategy is like: I have a long-term memory system;

01:15:34.639 --> 01:15:39.639
in this long-term memory system I have different policies, which effectively means I have

01:15:40.339 --> 01:15:44.439
stages in every policy where I have a certain sensory state and a certain action

01:15:44.439 --> 01:15:48.179
that goes with it, and then I say, therefore, the future should look like

01:15:48.179 --> 01:15:52.639
this, and if I keep on doing this, following this chain, I get some reward, right?

01:15:52.819 --> 01:15:57.499
And then you say, well, but what I can do, I can actually now just cut out bits

01:15:57.499 --> 01:16:02.859
and pieces of all these different policies in long-term memory and build a new one, try a new one.

01:16:03.599 --> 01:16:12.879
So what would be the key criterion to perform a selection on the pool of segmented, if you want, policies?

01:16:13.899 --> 01:16:20.399
So the way it works in our model is... But first, the model I described is not primarily a model

01:16:20.399 --> 01:16:23.459
of long-term memory.

01:16:24.639 --> 01:16:29.699
So, of course, long-term memory is described in the model in a quite simplistic

01:16:29.699 --> 01:16:32.399
way. But still, there are some important ingredients.

01:16:35.479 --> 01:16:42.139
So, one is that actually a policy which is stored in long-term memory is always stored with some,

01:16:43.843 --> 01:16:49.623
internal representations that link it to external cues.

01:16:49.843 --> 01:16:51.843
Some contextual cues, if you want.

01:16:52.223 --> 01:16:57.983
Which means that in the process of mixing strategies from long-term memory,

01:16:58.223 --> 01:17:00.663
it's always weighted by the contextual cues.

01:17:01.543 --> 01:17:06.163
So it means that in a given context, when you are in a given context and you create a new strategy,

01:17:07.023 --> 01:17:14.683
the way the strategies combine in this mixing process may be different from a

01:17:14.683 --> 01:17:20.303
mixture done in a different context.

01:17:21.203 --> 01:17:24.523
In other contexts. I don't know whether you see what I mean.

01:17:25.663 --> 01:17:29.663
Because every strategy is stored in long-term memory with some contextual model.

01:17:31.243 --> 01:17:35.183
That encodes the relevance of the strategy within this context.

01:17:35.483 --> 01:17:41.103
When you are in a given context, your marginalization over the strategies

01:17:41.103 --> 01:17:46.843
in long-term memory could be rather different in one context than another one.

01:17:46.843 --> 01:17:48.983
To me, it's clear, right? Because I have different tasks.

01:17:49.563 --> 01:17:53.803
Let's say one can be playing football and the other one can be playing tennis, right?

01:17:53.923 --> 01:17:56.903
So these are different tasks, different rules. And now, depending on the context

01:17:56.903 --> 01:18:01.223
I'm in, I'm either playing tennis or football, there's a different subset of

01:18:01.223 --> 01:18:06.703
policies I should now start to worry about to improve my football game.

01:18:06.703 --> 01:18:11.563
But, yeah, of course, but it could be more subtle than that.

01:18:11.903 --> 01:18:16.243
Of course, I mean, if you are trying a new recipe in your kitchen,

01:18:16.283 --> 01:18:18.923
you are not going to use a policy from football.

01:18:19.483 --> 01:18:22.323
Exactly right. For sure. Exactly right. This is, I would say,

01:18:22.363 --> 01:18:31.223
the most evident way, but it could be more subtle because the system can have some links.

01:18:31.223 --> 01:18:38.823
For example, if you have a maze and there are some cues in

01:18:38.823 --> 01:18:42.403
the maze that are related to some strategy, when you combine them,

01:18:42.643 --> 01:18:47.183
then this internal model linking external cues or contextual cues to some

01:18:47.183 --> 01:18:54.123
strategy could make the combinations at the end when you create a new strategy quite subtle.

01:18:55.043 --> 01:18:59.823
Right, I understand. And quite unexpected. Sure. Sure, but what I found interesting

01:18:59.823 --> 01:19:02.863
is that on the one hand you're saying, well, so now in my long-term memory,

01:19:03.003 --> 01:19:07.883
I have this enormous space of policies, and this is stuff I did in the past, right?

01:19:07.983 --> 01:19:10.123
So this is how I have acquired these.

01:19:10.543 --> 01:19:12.663
Yeah, along my lifetime. Exactly.

01:19:13.283 --> 01:19:19.403
And then you said, but now if I invent or create a new sequence,

01:19:19.523 --> 01:19:25.883
a new policy or hypothesis on a policy, I actually want to equalize the outcome expectation.

01:19:26.703 --> 01:19:29.303
I just, this is what you call the dumb strategy. You say, look,

01:19:29.423 --> 01:19:33.403
if I now invent a bunch of new policies I can try out,

01:19:34.454 --> 01:19:38.374
The predicted outcomes that I associate with them, I just set to the same uniform

01:19:38.374 --> 01:19:42.614
level, as if I'm randomizing this outcome space.

01:19:43.174 --> 01:19:46.534
So why are you doing that? Why is that necessary?

01:19:48.714 --> 01:19:53.234
Yeah, so the way you describe things is not exactly correct. Okay.

01:19:55.134 --> 01:20:05.094
You need to have... So, as you said before, the key point is what the criteria are

01:20:05.094 --> 01:20:07.874
for monitoring the relevance of strategies.

01:20:08.834 --> 01:20:14.354
And of course, in a situation where you know all possible strategies,

01:20:14.714 --> 01:20:16.434
your criteria is quite simple.

01:20:16.434 --> 01:20:24.854
For example, you just compare, let us say, how well each strategy predicts the next state,

01:20:25.014 --> 01:20:31.654
and you will, at the end, find which strategy is appropriate to this situation,

01:20:31.834 --> 01:20:34.374
because you know all possible strategies,

01:20:34.614 --> 01:20:38.114
which, of course, never happens in real life.

01:20:38.114 --> 01:20:45.654
And this means that it's very difficult to judge the relevance of a strategy

01:20:45.654 --> 01:20:49.594
because you don't even know all the alternatives.

01:20:49.594 --> 01:20:54.214
So at some point you need to have

01:20:55.614 --> 01:21:03.334
an estimate, given the different strategies I am monitoring, of the probability

01:21:03.334 --> 01:21:08.034
that the true strategy

01:21:09.174 --> 01:21:17.694
doesn't belong to the ones I am monitoring. And to compute this probability

01:21:17.694 --> 01:21:21.894
exactly, you cannot.

01:21:22.354 --> 01:21:24.474
It's an intractable problem.

01:21:25.214 --> 01:21:30.194
But you can estimate this probability as saying that, okay, the probability

01:21:30.194 --> 01:21:34.294
that actually the right strategy doesn't belong to the monitored strategies,

01:21:34.574 --> 01:21:39.334
you can estimate it by using, as you said in your question,

01:21:39.734 --> 01:21:45.934
this dumb strategy: if I perform randomly, I will get this outcome.

01:21:45.934 --> 01:21:54.254
And monitoring the relevance of this random strategy is an estimation of the

01:21:54.254 --> 01:21:57.854
probability that the true strategy is not among your monitored strategies. Mm-hmm.

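NOTE A minimal sketch, under stated assumptions, of the "dumb strategy" trick. Alongside the monitored strategies, a strategy that predicts every outcome with probability 1/n_outcomes is scored on the same observations, and its posterior share serves as an estimate of the probability that the true strategy is not among those being monitored. The prior weight prior_outside and the function name are hypothetical.
```python
import math
def p_outside_buffer(predictions, outcomes, n_outcomes, prior_outside=0.2):
    """predictions: {strategy_name: per-trial lists of outcome probabilities}.
    outcomes: observed outcome index on each trial."""
    prior_in = (1 - prior_outside) / len(predictions)
    log_post = {name: math.log(prior_in) for name in predictions}
    dumb = math.log(prior_outside)
    for t, o in enumerate(outcomes):
        for name, dists in predictions.items():
            log_post[name] += math.log(max(dists[t][o], 1e-12))  # guard against log(0)
        dumb += math.log(1.0 / n_outcomes)  # the random strategy predicts every outcome equally
    # Normalize in log space for numerical stability.
    m = max(max(log_post.values()), dumb)
    z = math.exp(dumb - m) + sum(math.exp(v - m) for v in log_post.values())
    return math.exp(dumb - m) / z
monitored = {"A": [[0.9, 0.1]] * 5, "B": [[0.8, 0.2]] * 5}
print(p_outside_buffer(monitored, [0] * 5, n_outcomes=2))  # outcomes the buffer expects: low
print(p_outside_buffer(monitored, [1] * 5, n_outcomes=2))  # outcomes nobody predicts: high
```
Chance-level prediction needs no knowledge of the unknown alternatives, which is what keeps this estimate tractable where the exact probability is not.
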
01:21:59.273 --> 01:22:02.813
But there's an alternative. It's a trick. It's a trick. I understand.

01:22:03.793 --> 01:22:08.213
And also it keeps things simple in some sense. Very simple. But I could also

01:22:08.213 --> 01:22:10.453
argue, look, let's take the cooking scenario.

01:22:11.393 --> 01:22:16.113
So here we have the policy space or the strategy space of cooking.

01:22:16.673 --> 01:22:22.773
And now certain sequences of actions have been more successful in the past

01:22:22.773 --> 01:22:24.753
in your kitchen activities than others.

01:22:24.753 --> 01:22:32.073
So now if I assemble a new policy, taking elements of these other sequences,

01:22:32.373 --> 01:22:36.393
by picking them out of an existing policy that has been tested,

01:22:36.853 --> 01:22:41.913
I can make an inference about their probability to have a certain impact on outcome.

01:22:41.913 --> 01:22:44.993
For instance, let's say you have a long sequence

01:22:44.993 --> 01:22:47.793
and you have an event sitting really,

01:22:47.793 --> 01:22:50.753
really far away from the end state of that sequence. I could

01:22:50.753 --> 01:22:53.833
say, well, the probability that that event is

01:22:53.833 --> 01:22:58.713
really having a big impact on outcome is probably low. However, if you have an event

01:22:58.713 --> 01:23:01.513
that is close to this endpoint, you could say, well, that probability is higher.

01:23:01.513 --> 01:23:06.453
I think it depends on the level of automatization. Okay. Even if you have a very

01:23:06.453 --> 01:23:10.993
complex strategy, if every time you start with the first action

01:23:11.093 --> 01:23:14.513
you get at the very end this reward, this outcome,

01:23:14.993 --> 01:23:21.213
then probably in your system, I mean, the link between the first action and the

01:23:21.213 --> 01:23:22.293
outcome would be very strong.

01:23:23.533 --> 01:23:31.473
Can't I exploit that information in assembling a new policy and making at least

01:23:31.473 --> 01:23:34.573
an estimate of its outcome?

01:23:35.933 --> 01:23:39.533
For instance, I can take elements of very successful policies,

01:23:39.633 --> 01:23:42.753
so the probability that the new policy will be successful is high,

01:23:42.873 --> 01:23:47.853
or I can take elements of policies that are really not that great.

01:23:48.133 --> 01:23:50.633
You see the point? Yeah, I see exactly your point.

01:23:51.633 --> 01:23:54.813
I think it's a very complex issue, especially.

01:23:58.373 --> 01:24:07.633
I think this process is reasonable, I mean tractable, if you can project

01:24:07.633 --> 01:24:12.173
your strategy on some topological space. Mm-hmm.

01:24:13.246 --> 01:24:21.626
It will work. For example, we know that many species have a very dedicated system

01:24:21.626 --> 01:24:22.706
for spatial navigation.

01:24:24.486 --> 01:24:30.646
And it's possible that within this topological system, this very specific system

01:24:30.646 --> 01:24:31.766
representing the space,

01:24:32.986 --> 01:24:41.066
because of the topology, you may somehow combine things according to this topological space.

01:24:41.386 --> 01:24:47.406
Right. But it could also be the case for audition, for other systems.

01:24:48.486 --> 01:24:54.746
But in general, I don't think there is a system outside this topological,

01:24:54.946 --> 01:24:56.866
I would say sensory space.

01:24:58.526 --> 01:25:04.846
That allows you to recombine strategies in a clever way. Okay.

01:25:05.566 --> 01:25:08.826
So this would be an empirical hypothesis we could test. Yeah.

01:25:08.826 --> 01:25:16.306
But then, so with your model, which gives a very specific prediction on how you can form,

01:25:16.386 --> 01:25:21.026
let's say, a new hypothesis so that you can deal with this task condition of

01:25:21.026 --> 01:25:24.946
staying or switching on the basis of exploration, because this is the problem

01:25:24.946 --> 01:25:26.026
you want to solve, right?

01:25:26.206 --> 01:25:30.826
You actually found an amazingly close match with human performance.

01:25:31.906 --> 01:25:34.846
So now, the humans

01:25:34.846 --> 01:25:37.566
were exposed to a task where they followed rules, but the rules were

01:25:37.566 --> 01:25:42.046
switched, so they had to sort of figure out what the new rule was. But how did

01:25:42.046 --> 01:25:46.486
you assess this consistency between the model and human performance? Which were

01:25:46.486 --> 01:25:51.966
the aspects of human performance that were most predictive, if you want, of

01:25:51.966 --> 01:25:55.606
the consistency with the model? So, yes.

01:25:57.786 --> 01:26:03.246
So first, yeah, it's a very important question.

01:26:03.946 --> 01:26:10.806
It's basically how you fit a model and how you compare model fits. Mm-hmm.

01:26:12.912 --> 01:26:18.372
It's a complex issue. At the end, we end up with this model.

01:26:18.452 --> 01:26:23.852
This model has a nice feature: it predicts some specific events,

01:26:25.452 --> 01:26:28.492
algorithmic events, which are predicted by the algorithm.

01:26:28.752 --> 01:26:33.392
That is, at some point, the model will create a new strategy from long-term

01:26:33.392 --> 01:26:36.012
memory. In this trial, the model predicts this.

01:26:37.072 --> 01:26:44.952
And of course, it implies a given profile in the response given by the model

01:26:44.952 --> 01:26:48.372
following this time point.

01:26:50.332 --> 01:26:55.152
It also predicts other types of events, very specific events,

01:26:55.952 --> 01:26:58.272
that is, at some point in a given trial,

01:27:00.192 --> 01:27:05.752
the algorithm switches out from exploration to return to exploitation by confirming

01:27:05.752 --> 01:27:07.932
the hypothetical strategy.

01:27:09.172 --> 01:27:12.792
And of course, it's associated with a given profile of response.

01:27:14.692 --> 01:27:22.252
The model also predicts another kind of event: when the hypothetical strategy needs to be rejected.

01:27:23.232 --> 01:27:27.832
So there are some specific algorithmic events,

01:27:28.132 --> 01:27:37.332
and we can check whether the way the model performs around these algorithmic events

01:27:38.712 --> 01:27:44.592
predicts how subjects perform around these events.

01:27:45.452 --> 01:27:51.932
Right. But it's important to know that these events are purely theoretical constructs.

01:27:52.772 --> 01:27:58.612
Right. They are not in the experiment. They are not manipulated by the experimenter. Right, exactly.

01:27:58.952 --> 01:28:03.812
Yes. The model, the algorithm, tells you: okay, in this trial, subjects should

01:28:03.812 --> 01:28:07.792
have set a new hypothetical strategy.

01:28:08.112 --> 01:28:11.872
Right, exactly. And the model behaves in this way around this event.

01:28:13.012 --> 01:28:20.192
So the way we check is whether subjects perform the way the model performs around this predicted trial.

01:28:20.832 --> 01:28:26.612
And we found that this was reasonably well the case. Right.

01:28:27.252 --> 01:28:31.312
And I think this is a very good test of the model.

01:28:31.392 --> 01:28:36.752
The model predicts what should happen around this event, and you observe

01:28:36.752 --> 01:28:44.212
it. Right. But now, what you found is that these hypothetical,

01:28:44.612 --> 01:28:49.392
no, theoretical constructs that you now use to interpret the performance

01:28:49.392 --> 01:28:56.012
of the human subjects also matched your fMRI signatures amazingly well.

01:28:56.492 --> 01:29:01.092
So what were the most salient effects that you found there?

01:29:02.678 --> 01:29:10.878
So for me, the most exciting and most salient effect concerned a specific

01:29:10.878 --> 01:29:15.218
type of algorithmic event: confirmation events.

01:29:16.398 --> 01:29:20.578
What is a confirmation event? A confirmation event is when the model,

01:29:20.878 --> 01:29:31.178
after creating a new strategy, decides to continue with it.

01:29:31.198 --> 01:29:34.478
It confirms it. It confirms the assumption, the hypothesis.

01:29:35.578 --> 01:29:40.198
And because it confirms it, it simply proceeds with it and continues

01:29:40.198 --> 01:29:41.978
to use it to behave.

01:29:42.578 --> 01:29:45.598
Which means that in the behavior, you cannot see this event.

01:29:45.878 --> 01:29:49.198
There is no marker, no signature in the behavior about this event.

01:29:50.958 --> 01:29:54.818
But of course, if this algorithm is really implemented in the brain,

01:29:54.818 --> 01:30:02.718
there should be an event in the brain that marks this time,

01:30:02.898 --> 01:30:06.658
this trial, when this strategy is confirmed. Mm-hmm.

01:30:07.467 --> 01:30:12.447
And we found actually a correlate of this event, of this algorithmic event,

01:30:12.647 --> 01:30:14.367
in the ventral striatum.

01:30:14.467 --> 01:30:19.307
That is, in a region that is well known to process rewards.

01:30:21.127 --> 01:30:25.687
And I think this is what can be nice with fMRI:

01:30:25.847 --> 01:30:31.087
you have a purely internal neural event that is predicted by the model,

01:30:31.347 --> 01:30:34.647
at least the time when it should happen,

01:30:34.647 --> 01:30:42.467
and that you cannot see in the behavior.

01:30:43.607 --> 01:30:48.287
But then your interpretation would be that this confirmation event triggers

01:30:48.287 --> 01:30:55.167
the ventral striatum to send, let's say, neuromodulatory signals that modulate memory

01:30:55.247 --> 01:30:58.567
in cortical areas that memorize the internal... Right, exactly.

01:30:58.767 --> 01:31:01.167
So this is your interpretation of the signal. Right, exactly.

01:31:01.167 --> 01:31:09.107
So basically, we know that the reliability of strategies is monitored within

01:31:09.107 --> 01:31:11.567
the ventromedial prefrontal cortex.

01:31:11.707 --> 01:31:15.207
And we know that this region projects to the ventral striatum. Right, exactly.

01:31:15.447 --> 01:31:18.787
So what is a confirmation event? It is when the probe actor,

01:31:18.987 --> 01:31:25.547
the hypothesis, passes from an unreliable to a reliable state.

01:31:25.787 --> 01:31:30.247
So it's a transition. It's a transition from unreliability to reliability.

01:31:31.167 --> 01:31:36.067
And my interpretation is that this transition is probably conveyed to the ventral

01:31:36.067 --> 01:31:39.607
striatum, and the ventral striatum uses it as,

01:31:39.747 --> 01:31:49.307
or transforms it into, a reinforcement signal that is dispatched to regions that memorize the strategy.

01:31:49.307 --> 01:31:53.627
But this also means that the subject is performing the task.

01:31:53.887 --> 01:31:58.267
So they just try something out. It's a hypothesis; they have no idea about the outcome.

01:31:59.387 --> 01:32:03.427
The action is successful. So there's positive reinforcement.

01:32:03.747 --> 01:32:04.907
Now we have an unexpected reward.

01:32:05.527 --> 01:32:12.147
So there we go with a reward signal. Is that roughly a correct interpretation as well?

01:32:15.489 --> 01:32:18.449
No, because we can show that

01:32:18.449 --> 01:32:21.849
this confirmation effect comes

01:32:21.849 --> 01:32:24.829
in addition to the effect of receiving a positive or

01:32:24.829 --> 01:32:30.249
negative reward. Okay, so it's an additional effect. Okay, with its own

01:32:30.249 --> 01:32:35.009
source, you're saying. So there would be an externally triggered event, which

01:32:35.009 --> 01:32:39.689
is whatever happens in the task, and an internal, more cognitively dependent

01:32:39.689 --> 01:32:43.689
effect? Yes. So basically, there is an external signal,

01:32:43.869 --> 01:32:47.709
the external reward that is, of course, processed in the ventral striatum

01:32:47.709 --> 01:32:49.809
as positive, negative, expected or not expected.

01:32:50.209 --> 01:32:53.309
And there is this internally driven feedback.

01:32:55.529 --> 01:33:00.849
It's an internal feedback. It's a cognitive feedback from the monitoring system,

01:33:01.149 --> 01:33:09.069
located in the anterior prefrontal regions, which provides this internal feedback,

01:33:09.229 --> 01:33:10.069
this cognitive feedback.

01:33:10.169 --> 01:33:15.389
I understand. But now, in some sense, to sort of close a little bit this part

01:33:15.389 --> 01:33:23.609
of the discussion, I could argue that, well, your tasks are cognitively encapsulated.

01:33:23.709 --> 01:33:27.489
That means, in some sense, you focus very much on this frontal area,

01:33:27.709 --> 01:33:30.769
almost in disconnection from everything else, right?

01:33:30.769 --> 01:33:36.989
So that means in some sense you are forced to think about hypothesis development

01:33:36.989 --> 01:33:40.629
and hypothesis testing as a purely internal cognitive act.

01:33:40.929 --> 01:33:44.429
But that's why I emphasized this operational aspect earlier.

01:33:44.509 --> 01:33:48.269
If I'm a behaving agent (and this is how we model problem solving and so

01:33:48.269 --> 01:33:52.489
on), I'm actually also testing hypotheses out in the world.

01:33:52.489 --> 01:33:58.129
And that in itself would allow me to build new policies from long-term memory,

01:33:58.209 --> 01:34:00.309
not by internal recombination,

01:34:00.569 --> 01:34:05.489
but by playing it out in the real world and building new memories following

01:34:05.489 --> 01:34:07.829
the standard procedures of memory.

01:34:07.989 --> 01:34:13.009
So without having to assume an additional layer… You mean in a totally unsupervised way?

01:34:13.329 --> 01:34:18.129
Well, without having to rely on internal cognitive hypothesis generation.

01:34:18.129 --> 01:34:22.649
I could just say, look, I'm here in the world, I want to explore stuff,

01:34:22.869 --> 01:34:29.629
so I'm going to allow different policies to play out or dominate my actions in some sequence.

01:34:29.749 --> 01:34:33.009
So now, as an end result, I have constructed a new policy.

01:34:33.149 --> 01:34:37.449
I've built a recombination, a recombinant of all of them, which you can now…

01:34:37.449 --> 01:34:43.189
Yeah, you mean, Paul, if you behave randomly, you build a new… Well,

01:34:43.209 --> 01:34:45.489
it would be a very extreme way to do it. Yeah, it's extreme.

01:34:45.649 --> 01:34:47.669
But yes. Um...

01:34:51.281 --> 01:34:59.721
What I think is that you always monitor your own behavior.

01:35:01.401 --> 01:35:06.381
The monitoring system that monitors your behavior is always active.

01:35:07.121 --> 01:35:11.241
It seems to be very important for the organism.

01:35:12.101 --> 01:35:17.181
So I think this system is always there. The question is whether you monitor

01:35:17.181 --> 01:35:19.181
alternative strategies all the time.

01:35:19.261 --> 01:35:21.681
And explicitly, right? And explicitly, yeah.

01:35:25.761 --> 01:35:31.261
Explicitly is a good question. I'm not sure that it's explicit for the subjects.

01:35:33.501 --> 01:35:41.501
It's an interesting question because we found that you are actually able to monitor

01:35:42.401 --> 01:35:48.581
two or three alternative strategies in addition to the strategy you are using to act.

01:35:50.641 --> 01:35:55.741
I'm not sure that subjects are aware, or explicitly aware, of having

01:35:55.741 --> 01:35:58.881
these three strategies in mind.

01:36:00.081 --> 01:36:03.941
It's a good question. I have no answer to that. But still,

01:36:04.181 --> 01:36:06.261
let us say that it's explicit.

01:36:08.441 --> 01:36:15.861
And I think you're right. We are not always monitoring alternative strategies. Right.

01:36:16.381 --> 01:36:20.181
Because it's probably quite effortful. Exactly right.

01:36:20.641 --> 01:36:24.081
Right, so this might be a method of last resort, because you

01:36:24.081 --> 01:36:28.541
also could have, let's say, a more situated form of monitoring where you just

01:36:28.541 --> 01:36:30.621
say, look, let's take the football example.

01:36:30.721 --> 01:36:34.001
You could say, okay, I tried this in the past, it half worked,

01:36:34.181 --> 01:36:37.321
so I use only part of this policy and I just try it out.

01:36:37.921 --> 01:36:42.361
Okay, you just try it out in the world and now the world responds to whatever you did.

01:36:42.961 --> 01:36:46.381
So I have been monitoring if you want, but in a situated fashion. Right.

01:36:49.472 --> 01:36:51.112
By performing an experiment effectively.

01:36:54.912 --> 01:37:01.592
I think what you are describing here is just learning.

01:37:03.352 --> 01:37:08.112
It's just learning. It's just monitoring. You know?

01:37:08.572 --> 01:37:12.552
But the funny thing is that it can solve the problem you were

01:37:12.552 --> 01:37:13.632
solving with your monitoring.

01:37:13.632 --> 01:37:19.492
Because I have now invented a new policy, building on known policies,

01:37:19.792 --> 01:37:24.832
but I have not relied on a complex internal memory-heavy process.

01:37:25.332 --> 01:37:29.232
I just played it out in the real world and indeed I learned and picked it up.

01:37:31.992 --> 01:37:40.952
Yes. Yeah, I think learning can be quite sophisticated. I mean,

01:37:40.992 --> 01:37:46.612
it's an embedded system that can be quite sophisticated.

01:37:49.212 --> 01:37:53.572
So as an alternative interpretation, you would leave that option open?

01:37:53.752 --> 01:37:57.692
No, I think these are two complementary systems.

01:37:59.292 --> 01:38:03.932
But to explain the behavior of your subjects? No, you cannot.

01:38:05.452 --> 01:38:18.832
I mean, we have evidence in our experiments that we cannot explain our subjects' behavior using just,

01:38:19.052 --> 01:38:22.652
I mean, if we eliminate the monitoring system.

01:38:22.952 --> 01:38:29.492
Okay. So would you claim that all action depends on monitoring and rules?

01:38:32.359 --> 01:38:41.599
I think all your behavior is continuously monitored by your medial prefrontal system.

01:38:42.119 --> 01:38:48.799
And I think it's true not only in humans, but in many animals.

01:38:49.099 --> 01:38:56.759
And the first level of development is to have this system, this monitoring system.

01:38:56.879 --> 01:39:01.659
You monitor your behavior. So, I mean, the archaic system is, let us say, a system

01:39:01.659 --> 01:39:06.899
that just continuously adjusts, like reinforcement learning, to external contingencies.

01:39:07.679 --> 01:39:13.479
Then the next step, a bit more evolved than this very first step,

01:39:13.599 --> 01:39:16.739
is you monitor your behavior.

01:39:17.139 --> 01:39:23.619
And then the second step is you start to be able to monitor alternatives at the same time.

01:39:23.619 --> 01:39:35.419
So now, to finish up our conversation: we now have this very deep understanding

01:39:35.419 --> 01:39:38.099
of decision-making and the frontal lobes.

01:39:38.219 --> 01:39:41.699
And this is based on a long track record of outstanding work,

01:39:41.819 --> 01:39:45.479
both experimental and theoretical, which I think makes it really so unique.

01:39:46.159 --> 01:39:52.159
So if we would like to follow in that tradition, what would be a chance law that we should follow?

01:39:52.259 --> 01:39:54.399
What's a chance law? To study the brain.

01:39:57.399 --> 01:39:57.859
Uh...

01:39:59.708 --> 01:40:02.728
My view is that

01:40:03.728 --> 01:40:16.948
you should do both experiments and modeling, and try to have both competencies within the same small team.

01:40:17.648 --> 01:40:22.568
Experiments are very important, because you can develop very nice models,

01:40:22.568 --> 01:40:25.408
very sophisticated, very beautiful models,

01:40:25.688 --> 01:40:34.748
mathematically beautiful models, and they are very satisfying to our intellect,

01:40:34.928 --> 01:40:38.428
but actually they don't explain what humans do.

01:40:39.628 --> 01:40:45.688
So doing experiments forces you to move forward and not to stay comfortably

01:40:45.688 --> 01:40:50.488
with very nice, beautiful models.

01:40:50.988 --> 01:40:55.668
And nature is not always perfect. Perfect.

01:40:55.748 --> 01:41:01.088
So, I mean, you need to develop models that are maybe less beautiful,

01:41:01.188 --> 01:41:04.348
but they are more efficient, more pragmatic.

01:41:05.068 --> 01:41:09.488
Okay. So the other thing is, so five years from now, I'm going to go visit you

01:41:09.488 --> 01:41:14.428
in Paris, and I'm going to confront you with a prediction you're going to make today.

01:41:15.088 --> 01:41:18.268
So I'm going to ask you, look, did this really work out?

01:41:18.688 --> 01:41:22.088
So what's this one prediction you're most passionate about today?

01:41:22.168 --> 01:41:29.728
Which one prediction would you make? Okay, I will try to think about

01:41:29.728 --> 01:41:33.888
a prediction that is testable. Okay.

01:41:37.878 --> 01:41:41.838
I have one prediction, a strong prediction from our data,

01:41:42.618 --> 01:41:49.778
is that humans cannot monitor more than three strategies at the same time,

01:41:49.818 --> 01:41:51.238
three or four strategies. Yes.

01:41:54.858 --> 01:42:00.638
I am interested in this prediction because this is what we found

01:42:00.638 --> 01:42:05.058
in our experiment, but maybe in other experiments it could be different.

01:42:05.058 --> 01:42:10.458
So I would like to know whether this capacity limit we found in our experiment

01:42:10.458 --> 01:42:17.398
is general, or whether it's just an anecdotal finding,

01:42:18.458 --> 01:42:24.218
and I think, yeah, it's quite simple and testable. Wonderful.

01:42:24.478 --> 01:42:27.838
So, Etienne Koechlin, thank you very much for this conversation. Thank you for

01:42:27.838 --> 01:42:31.178
your questions. I really enjoyed them, and I enjoyed discussing with you.

01:42:32.080 --> 01:42:37.200
Music.

01:42:36.778 --> 01:42:42.638
The CSN podcast was produced by the Convergent Science Network of Biomimetics

01:42:42.638 --> 01:42:49.378
and Biohybrid Systems, a project funded by the European Seventh Framework Programme for Research.

01:42:50.558 --> 01:42:55.918
For more interviews, recorded lectures or upcoming conferences in the field

01:42:55.918 --> 01:43:02.158
of biomimetics and biohybrid systems, go to csnnetwork.eu.

01:43:02.438 --> 01:43:04.318
And thank you for listening.

01:43:02.480 --> 01:43:10.000
Music.
