WEBVTT

00:00:03.497 --> 00:00:10.497
This is the Convergent Science Network podcast. Leading researchers in the domain

00:00:10.497 --> 00:00:16.777
of neuroscience, brain theory and technology are interviewed by Paul Verschure and Tony Prescott.

00:00:21.917 --> 00:00:27.617
Paul Verschure with the Convergent Science Network podcast here at our Barcelona

00:00:27.617 --> 00:00:33.917
Cognition Brain Technology Summer School 2018, together with my colleague Tony Prescott.

00:00:34.157 --> 00:00:40.597
And we have Lars Muckli here. And Lars, welcome to the podcast.

00:00:41.597 --> 00:00:47.797
So Lars, you spoke about internal models and counterfactual cognition predicting

00:00:47.797 --> 00:00:54.097
our environment. And as the title might suggest, prediction was very much at

00:00:54.097 --> 00:00:55.797
the center of your presentation.

00:00:57.157 --> 00:01:02.937
So why do you believe that prediction gives us sort of this strategic lever

00:01:02.937 --> 00:01:05.717
to understand how the brain works?

00:01:06.797 --> 00:01:12.417
I think a few years back, I kind of tried to conceptualize everything I know

00:01:12.417 --> 00:01:16.797
so far from visual processing, let's say 10 years ago.

00:01:16.797 --> 00:01:25.737
And I thought that at some point we have a generally accepted narrative of a

00:01:25.737 --> 00:01:30.697
visual hierarchy of increasingly complex features being represented,

00:01:31.137 --> 00:01:33.237
for example, in the visual system.

00:01:34.517 --> 00:01:41.737
And in this story, I felt that there is a particular capability missing.

00:01:41.737 --> 00:01:45.037
If, let's say,

00:01:45.037 --> 00:01:48.917
we are driving around in a car,

00:01:48.917 --> 00:01:52.337
this would be

00:01:52.337 --> 00:01:55.697
for our brains in this conventional narrative

00:01:55.697 --> 00:01:58.497
kind of an overdrive. There are lots of

00:01:58.497 --> 00:02:02.197
varying stimuli, there are lots of

00:02:02.197 --> 00:02:05.517
signals that need to be attended to, that

00:02:05.517 --> 00:02:08.877
are competing for attention. It is all integrated,

00:02:08.877 --> 00:02:15.797
bound together, these complex features, and yet it is possible to effortlessly

00:02:15.797 --> 00:02:25.217
navigate around and have thoughts about your holiday, about your next research

00:02:25.217 --> 00:02:28.697
grant, about complicated

00:02:30.917 --> 00:02:33.977
conceptual theories and thoughts.

00:02:33.977 --> 00:02:37.417
And so I thought, if we use

00:02:37.417 --> 00:02:43.437
the kind of narrative that is coming out of describing early visual cortex and

00:02:43.437 --> 00:02:48.417
then higher stages of visual cortex as the template of how the entire brain

00:02:48.417 --> 00:02:55.617
works, we have a very busy and buzzing neuronal system which is combining features.

00:02:55.957 --> 00:02:59.237
And we run out of brain space before

00:02:59.237 --> 00:03:02.437
we have an explanation of those really exciting

00:03:02.437 --> 00:03:10.537
things that we are doing. So I thought something isn't quite right, and I think

00:03:10.537 --> 00:03:16.977
the predictive processing framework brings to the table a narrative that is refreshingly

00:03:16.977 --> 00:03:20.837
different, by saying, well,

00:03:21.017 --> 00:03:24.677
it makes sense if you do something like driving in the car,

00:03:24.817 --> 00:03:28.717
where lots of those features, even though they are quite complex,

00:03:28.857 --> 00:03:35.437
and even though they have spatiotemporal dynamics that, for a bottom-up system, are

00:03:35.437 --> 00:03:38.617
complicated, you can see that a hierarchical system

00:03:38.617 --> 00:03:41.917
is well equipped to explain that away, to

00:03:41.917 --> 00:03:45.917
say, to focus only on the surprising light

00:03:45.917 --> 00:03:48.857
that comes up and signals that you're out of

00:03:48.857 --> 00:03:52.677
petrol, and to ignore the commercials

00:03:52.677 --> 00:03:55.857
that are running by and these kinds of things, because lots of

00:03:55.857 --> 00:04:00.317
that can be explained away by different

00:04:00.317 --> 00:04:03.537
precision explanations. You're

00:04:03.537 --> 00:04:06.517
expecting something to fly by, you're expecting commercials

00:04:06.517 --> 00:04:09.457
to try to attract your attention and

00:04:09.457 --> 00:04:12.457
you ignore it. So I think there's a lot that

00:04:12.457 --> 00:04:18.657
can be brought to the table by this global narrative. So, I mean, that

00:04:18.657 --> 00:04:23.077
feels also a bit like an idea of how you, let's say, optimize information processing

00:04:23.077 --> 00:04:27.497
by actually just looking at your errors, so that means the stuff you didn't expect.

00:04:27.497 --> 00:04:29.577
Yeah, and suppress the stuff you expect. Yeah.

00:04:30.937 --> 00:04:36.497
But was there also something in the data that you were working with at the time

00:04:36.497 --> 00:04:39.477
that was suggestive of moving in that direction?

00:04:39.497 --> 00:04:45.497
Was there sort of a parsimony in the data that you felt needed to be explained in those terms? Yeah.

00:04:45.637 --> 00:04:56.297
So 10 years ago, we worked in principle with very simple stimuli, looking at apparent motion,

00:04:56.477 --> 00:05:01.857
which is from one perspective, maybe the most simple visual illusion you can

00:05:01.857 --> 00:05:09.077
think of, because it only requires two dots that blink in a certain spatiotemporal

00:05:09.077 --> 00:05:14.637
pattern so that your visual system combines them into the illusion of a continuous

00:05:14.657 --> 00:05:22.817
movement from this one location to the other location, and if it's long distance,

00:05:23.617 --> 00:05:28.237
it may appear as one jump, but it's still one object.

00:05:29.603 --> 00:05:34.023
Commercials work with this, you see this all around, it's everywhere.

00:05:35.303 --> 00:05:44.583
But it's a very simple illusion that we looked at and that we used to show how

00:05:44.583 --> 00:05:49.523
the visual system integrates this information and feeds back to early visual cortex.

00:05:49.683 --> 00:05:53.023
So that was an early finding. And,

00:05:53.023 --> 00:06:02.623
let's say, I started with a surprising finding of some feedback in the intermediate

00:06:02.623 --> 00:06:10.003
space between these dots, which induce the apparent motion, but which is then filled by

00:06:10.903 --> 00:06:15.443
some activity, and we tried to make sense of this activity. So it was actually driven

00:06:15.443 --> 00:06:21.383
by an experimental result from fMRI that this intermediate space was filled up

00:06:21.383 --> 00:06:26.003
with activity. We then thought about how to explain that, which then brought

00:06:26.003 --> 00:06:29.643
us to several competing hypotheses. One was

00:06:31.143 --> 00:06:34.643
related to consciousness and the representation

00:06:34.643 --> 00:06:37.703
of an item, and the other was more

00:06:37.703 --> 00:06:40.823
related to the predictive coding framework, which we

00:06:40.823 --> 00:06:44.183
then followed up with several experiments. We

00:06:44.183 --> 00:06:47.283
then came up with a sampling strategy to test

00:06:47.283 --> 00:06:55.743
predictive processing, which led us to the conclusion that there is a predictive

00:06:55.743 --> 00:07:01.463
model that is projected down to early visual cortex. And that's how we started.

00:07:01.463 --> 00:07:06.243
And the story I told you about, you know, 10 or

00:07:08.474 --> 00:07:13.254
12 years ago or so, was when I then left Frankfurt and went to Glasgow.

00:07:13.794 --> 00:07:18.974
And I was thinking of new grand ideas that I put in grant applications.

00:07:19.254 --> 00:07:24.894
And coming from this very simple stimulus, I wanted to make it more realistic,

00:07:26.074 --> 00:07:33.554
using more realistic complex stimuli and using also more interactive subjects.

00:07:33.614 --> 00:07:37.494
So one extrapolation was including eye movements.

00:07:37.494 --> 00:07:44.434
Another one was using complex stimuli and looking for predictive processes with

00:07:44.434 --> 00:07:46.114
these more complex scenes.

00:07:47.854 --> 00:07:51.754
Yeah, so I'm interested in how

00:07:51.754 --> 00:07:56.374
and why you arrived at the idea of using apparent motion as the paradigm.

00:07:56.594 --> 00:08:04.354
As you say, you're motivated by this predictive framework and you began by describing

00:08:04.354 --> 00:08:10.814
some ideas from people like David Mumford and his active blackboard idea that

00:08:10.814 --> 00:08:15.994
there is a sort of representation of what you expect to see in the world in higher brain areas.

00:08:16.214 --> 00:08:20.654
And you also described a study from Hanson et al., which talked about,

00:08:20.694 --> 00:08:25.234
if we're watching snippets from a Hollywood movie, we build up this picture

00:08:25.234 --> 00:08:30.414
in our heads in the higher areas of the visual pathway of the meaning of the story and so on.

00:08:30.634 --> 00:08:35.554
And this feeds back and influences processing in the lower visual areas.

00:08:36.114 --> 00:08:39.454
But the task you choose to focus on,

00:08:39.454 --> 00:08:48.374
apparent motion, is of course very well known in psychology from centuries back.

00:08:48.374 --> 00:08:54.854
The Gestalt psychologists found this and focused on it and really talked about

00:08:54.854 --> 00:08:56.294
it as a way of thinking,

00:08:58.005 --> 00:09:02.705
without perhaps too much theoretical insight into it, but seeing it as such

00:09:02.705 --> 00:09:04.745
a dynamical process that's happening in the brain.

00:09:04.905 --> 00:09:07.785
Not so much necessarily a hierarchical process, though.

00:09:07.905 --> 00:09:13.025
So it is a bit of a leap to go from a Gestalt process like apparent motion to

00:09:13.025 --> 00:09:15.705
a hypothesis about hierarchy. Okay.

00:09:17.165 --> 00:09:21.245
Interestingly, I think when we started brain imaging, and that goes back to

00:09:21.245 --> 00:09:24.785
1996 when I joined Rainer Goebel's

00:09:24.885 --> 00:09:29.405
lab, apparent motion and other Gestalt laws were kind of the first start

00:09:29.405 --> 00:09:38.365
of how we wanted to look into the human brain to see where these Gestalt laws are represented.

00:09:38.925 --> 00:09:41.765
And motion was something that we started with.

00:09:42.365 --> 00:09:46.765
Actually, the second experiment was imagery of motion, and we were surprised to

00:09:46.765 --> 00:09:49.305
see that that worked and activated V5.

00:09:49.305 --> 00:09:53.545
And while we did these apparent motion experiments,

00:09:54.225 --> 00:10:01.585
one was about bistable apparent motion and the switches between perceiving the

00:10:01.585 --> 00:10:05.085
motion content and not perceiving the motion content was something that you

00:10:05.085 --> 00:10:07.265
can follow in the activity of V5.

00:10:08.305 --> 00:10:14.385
But then we found something in the data that actually V1 knows something about,

00:10:14.565 --> 00:10:18.645
gets informed about some of those integration processes.

00:10:19.305 --> 00:10:26.285
And it was with Niko Kriegeskorte at that time that we were late at night discussing

00:10:26.285 --> 00:10:34.085
what could be the kind of feedback along the trace in the apparent motion,

00:10:34.245 --> 00:10:36.365
how that could be informative.

00:10:36.365 --> 00:10:45.585
And we came then to David Mumford's kind of seminal papers of the 90s in which

00:10:45.585 --> 00:10:51.065
he motivated this active blackboard theory, not totally

00:10:52.145 --> 00:10:57.505
being explicit whether this active blackboard is a subcortical thalamic

00:10:57.505 --> 00:10:59.825
structure or primary visual cortex,

00:10:59.985 --> 00:11:07.505
but the idea that these expert areas kind of converge to their best guesses

00:11:07.505 --> 00:11:14.285
in a scene and scribble this onto a blackboard to kind of negotiate the scene

00:11:14.285 --> 00:11:19.305
and the surprises within the scene and so on was quite attractive.

00:11:19.305 --> 00:11:22.685
And we thought, okay, how can we test this?

00:11:22.685 --> 00:11:28.905
And I think that started the following experiments. And apparent motion is

00:11:28.905 --> 00:11:35.845
good in the sense that you can have the inducing stimuli far away from a region

00:11:35.845 --> 00:11:42.605
in the middle where you have the illusion. And since the early part of my brain imaging,

00:11:42.725 --> 00:11:45.445
I have always used retinotopic mapping, and

00:11:47.045 --> 00:11:56.365
the trick is that the spatial separation of components can be done very cleanly in fMRI,

00:11:57.165 --> 00:12:01.585
maybe better than in some other methods. So in EEG you always have everything

00:12:01.585 --> 00:12:10.265
together; you have a better temporal resolution, but the signal can't be separated so well.

00:12:10.565 --> 00:12:15.345
And in fMRI, one of the advantages is that you can do individual maps and then get

00:12:15.345 --> 00:12:18.985
the single signals out of those retinotopic components.

00:12:19.365 --> 00:12:27.065
And I think it's this tool that I wanted to explore and use more and more over

00:12:27.065 --> 00:12:33.745
the years up to our most recent finding where we can see that in more complex scenes,

00:12:33.745 --> 00:12:43.065
this occluded part of retinotopic space can show that you have something like

00:12:43.065 --> 00:12:46.825
a mental map or mental hypothesis drawn out.

00:12:52.609 --> 00:12:58.589
I think one of the interesting issues here is,

00:12:58.729 --> 00:13:05.329
and that came out in the talk, was how much of the kinds of constraints that

00:13:05.329 --> 00:13:11.169
the brain is using to do something like an apparent motion task are going to be

00:13:11.169 --> 00:13:14.649
implemented within a level of the hierarchy and how many of them are going to

00:13:14.649 --> 00:13:16.889
be at higher levels of the hierarchy.

00:13:17.289 --> 00:13:24.449
So I think most of your talk was about how the upper levels might influence

00:13:24.449 --> 00:13:26.049
the process in the lower levels.

00:13:26.749 --> 00:13:30.529
But you presumably would also recognize that there are some processes happening

00:13:30.529 --> 00:13:36.049
within V1 itself which are going to do things like Gestalt properties or really

00:13:36.049 --> 00:13:38.049
encourage things like completion.

00:13:38.529 --> 00:13:43.269
Yeah, I mean, this is a very interesting feature of using apparent

00:13:43.269 --> 00:13:52.789
motion, because the motion that we trigger is a very fast one and a long distance one.

00:13:53.049 --> 00:14:01.309
So it's 12 degrees of visual angle, and it goes at 60 degrees per second.

00:14:01.749 --> 00:14:09.909
Those are features that V1 cannot process, and higher visual areas like V5,

00:14:10.049 --> 00:14:13.569
if it's considered to be higher,

00:14:13.689 --> 00:14:22.189
but specialized visual extrastriate areas, are specialized to process, are tuned

00:14:22.189 --> 00:14:28.049
along these motion energies, high speed,

00:14:28.349 --> 00:14:30.289
long distances.

00:14:30.709 --> 00:14:39.549
And so it's a particularly good case in which we can look into V1's typical feed-forward features

00:14:39.829 --> 00:14:42.949
and see the addition

00:14:42.949 --> 00:14:46.589
of complex features that aren't

00:14:46.589 --> 00:14:49.729
typical of V1. If you want to say it in

00:14:49.729 --> 00:14:52.949
a different way, V1 has a certain language to

00:14:52.949 --> 00:15:02.809
speak, and we kind of test the multilinguality of V1, because

00:15:02.809 --> 00:15:07.989
other areas speak in a different language. And one of

00:15:07.989 --> 00:15:11.649
the interesting and still not totally resolved questions is,

00:15:11.649 --> 00:15:15.869
are these areas translating to V1?

00:15:16.009 --> 00:15:19.149
So is V5 translating its prediction

00:15:19.149 --> 00:15:24.409
and explaining to V1, in a spatially localized way at a very high speed,

00:15:24.569 --> 00:15:29.229
that it should expect to see a dot here, here, here, here, and here, with this and this orientation?

00:15:30.702 --> 00:15:40.362
Or is it more of a kind of envelope translation to say there's high energy motion

00:15:40.362 --> 00:15:42.182
in this part of the visual field,

00:15:42.362 --> 00:15:49.022
but your knowledge about the concrete incidence, orientation,

00:15:49.082 --> 00:15:54.062
and so on will combine to a better understanding of the situation.

00:15:54.062 --> 00:15:59.282
And that's, I think, the kind of model that I have in my mind.

00:15:59.982 --> 00:16:04.722
There isn't a full translation. I touched upon this with mapping the precision

00:16:04.722 --> 00:16:07.902
of the filling in in complex scenes.

00:16:08.102 --> 00:16:14.482
There are line drawings, but the precision is around four degrees.

00:16:15.062 --> 00:16:20.022
V1 has a precision of one degree. So it's more like saying there's a line somewhere

00:16:20.022 --> 00:16:23.862
around here, or there's a motion of that kind of speed somewhere around here.

00:16:24.062 --> 00:16:34.242
And this is what top-down can give as a prediction, and combined with the lateral information saying,

00:16:34.362 --> 00:16:39.222
oh, there was a dot that just disappeared here together with the energy prediction up there,

00:16:39.342 --> 00:16:42.382
I can now make a very precise prediction

00:16:43.082 --> 00:16:45.782
by combining those two constraints.

00:16:46.382 --> 00:16:50.902
But before we get there, because that's sort of after a number of experiments, right?

00:16:51.102 --> 00:16:57.522
Right. And I think that the first question is whether, given the data you have

00:16:57.522 --> 00:16:58.982
on this apparent motion paradigm,

00:16:59.522 --> 00:17:07.742
whether this predictive mind hypothesis or the active inference hypothesis,

00:17:08.082 --> 00:17:09.002
whatever you want to call it,

00:17:09.642 --> 00:17:12.582
is the most parsimonious explanation of the data that you have.

00:17:12.582 --> 00:17:16.102
Because in what you showed in your first experiment, it had apparent motion.

00:17:16.262 --> 00:17:20.522
So we have the two fields being stimulated in a certain order.

00:17:20.682 --> 00:17:22.322
So you have this idea that something is moving.

00:17:23.572 --> 00:17:28.752
And now you're going to flash a stimulus in the center area either synchronized

00:17:28.752 --> 00:17:34.292
with this apparent motion, right, or out of sync, out of phase with this apparent motion,

00:17:34.852 --> 00:17:39.012
where you look at a time difference of one frame, right? So this would

00:17:39.012 --> 00:17:42.992
be about 15 milliseconds or something like this, I don't know. So it's a very

00:17:42.992 --> 00:17:45.652
slight deviation, right, from the predicted path.

00:17:45.852 --> 00:17:54.532
And then what you do observe in your fMRI is that eight seconds after,

00:17:55.532 --> 00:17:59.032
stimulation, either congruent or incongruent,

00:17:59.172 --> 00:18:06.012
you see a slight deflection or deviation of the BOLD response in the area that

00:18:06.012 --> 00:18:09.692
you stimulated with this predicted or unpredicted stimulus,

00:18:09.692 --> 00:18:15.412
which is slightly higher; the BOLD response is slightly higher for the deviant

00:18:15.412 --> 00:18:21.352
stimulus as compared to the congruent stimulus. This is the main effect. And the deviation of

00:18:21.692 --> 00:18:27.752
the BOLD signal is of the order of about, what, 8 to 10 percent,

00:18:27.932 --> 00:18:33.932
something like this, right? And it's also slightly enhanced in both cases relative

00:18:33.932 --> 00:18:34.872
to the baseline response.

00:18:35.972 --> 00:18:42.792
So now, why do you believe that this predictive coding hypothesis

00:18:42.792 --> 00:18:46.992
interpretation is the most parsimonious explanation of the data?

00:18:50.677 --> 00:18:57.297
Andy Clark has a wonderful cover picture in his book on predictive processing,

00:18:57.597 --> 00:18:59.037
I think, or it's called differently.

00:18:59.097 --> 00:19:02.977
It's Surfing Uncertainty.

00:19:03.137 --> 00:19:08.137
And if you take the picture, we're in Barcelona. I've seen the beach this morning.

00:19:08.317 --> 00:19:17.197
The model is a little bit like this. You have this wave of motion energy, of prediction.

00:19:17.197 --> 00:19:24.297
It is translated to something that needs energy in V1, like a prediction.

00:19:25.217 --> 00:19:31.777
And then you place on this wave a surfer that surfs the wave and then is efficient

00:19:31.777 --> 00:19:36.677
and comes through and is detected. That's the dot that's in time.

00:19:37.477 --> 00:19:42.237
And the surfer that has a little bit of a timing problem won't make this and

00:19:42.237 --> 00:19:46.557
will fall off the wave. And that's the prediction error.

00:19:47.197 --> 00:19:52.157
And that makes a small difference. And the surprising result is that this one

00:19:52.157 --> 00:19:54.777
frame difference is detectable in the BOLD signal.

00:19:55.057 --> 00:20:00.177
And my most parsimonious explanation, the way I think, is,

00:20:00.257 --> 00:20:07.097
yeah, you see at this moment the mismatch between the prediction and the surfer,

00:20:07.137 --> 00:20:11.657
the dot that doesn't quite make it to be within this envelope.

00:20:11.917 --> 00:20:16.217
Now, as I said earlier, this envelope moves very fast, 60 degrees of visual angle per second.

00:20:16.217 --> 00:20:19.117
That's faster than any motion detector in V1.

00:20:20.117 --> 00:20:25.537
V1 has lots of lateral connections, but I doubt that they have the speed to

00:20:25.537 --> 00:20:28.257
process this. But I'm not opposed to this explanation.

00:20:28.477 --> 00:20:32.017
I've had several discussions. People are on both sides saying,

00:20:32.157 --> 00:20:35.037
well, there could be a contribution.

00:20:35.337 --> 00:20:39.757
And in the model that I just pictured before, I do think there is a contribution

00:20:39.757 --> 00:20:45.977
of lateral interaction: the feedback signal of a certain motion envelope,

00:20:46.217 --> 00:20:51.897
together with the neighborhood relation of the just-disappearing dot,

00:20:53.145 --> 00:20:57.285
gives the high precision of this. Well, I

00:20:57.285 --> 00:21:03.625
would say the long-range lateral interactions in V1 are iso-orientation,

00:21:03.625 --> 00:21:09.665
right? So neurons that like similar features are linked together over long distances,

00:21:09.665 --> 00:21:16.025
and that would of course be a perfect substrate to exploit to get apparent motion, because

00:21:16.665 --> 00:21:21.525
the activity will be very actively guided along neurons with a similar response

00:21:21.525 --> 00:21:24.305
tuning to move in a certain direction, right?

00:21:24.365 --> 00:21:28.485
So this would give you essentially a kind of entrainment response,

00:21:28.865 --> 00:21:33.245
right, to explain the apparent motion, which might be also more in line with actually

00:21:33.245 --> 00:21:35.005
the Gestalt ideas, right?

00:21:37.485 --> 00:21:42.865
So can you… For this model, that's

00:21:42.865 --> 00:21:47.465
why I like to use the image of the

00:21:47.465 --> 00:21:52.545
paper of mock

00:21:52.545 --> 00:22:00.025
I'm blanking on her name, but who did apparent motion with the gratings, and

00:22:00.025 --> 00:22:07.105
the filling in is inducing a new feature, which is a new orientation, which is

00:22:07.105 --> 00:22:12.485
not just a neighborhood relation; it's a smooth transition. Now,

00:22:14.193 --> 00:22:18.973
maybe the case can be made that the lateral interaction smoothly goes from one

00:22:18.973 --> 00:22:26.153
orientation over space and bleeds into the other orientation.

00:22:26.433 --> 00:22:32.033
But it's not such that the feature that's picked up on the apparent motion trace is

00:22:32.033 --> 00:22:36.233
just a replication of the neighboring feature in orientation.

00:22:37.313 --> 00:22:41.893
So to me, it looks more as a constructed feature.

00:22:43.033 --> 00:22:47.333
But that would just be a matter, as long as I take enough freedom to fiddle

00:22:47.333 --> 00:22:51.913
with the topography of the lateral interactions, I could also get this apparent feature, right?

00:22:53.033 --> 00:22:56.813
But another answer, of course, could be you could still say,

00:22:56.853 --> 00:23:01.293
look, you see, we agree, because the predictive model still holds,

00:23:01.453 --> 00:23:05.813
it's just that the substrate is also included already in V1 wiring.

00:23:05.813 --> 00:23:11.813
But for some reason, in your view, you want to see it like a hierarchical system, right?

00:23:11.873 --> 00:23:18.673
And without ascribing a big functional role to the local structure of V1 in this case.

00:23:18.813 --> 00:23:28.253
Well, you also know that 97, 98% of the synapses in the V1 volume originate inside V1.

00:23:29.053 --> 00:23:33.233
Only a tiny fraction comes from outside V1, right?

00:23:33.233 --> 00:23:40.093
So, do you believe that this tiny fraction of long-range projections coming

00:23:40.093 --> 00:23:46.753
out of V2 or out of your thalamus or other cortical areas are sufficient to

00:23:46.753 --> 00:23:48.173
carry your predictive model?

00:23:49.093 --> 00:23:53.673
So, there are two lines of evidence where we really looked at this interaction.

00:23:53.893 --> 00:23:56.033
One that I presented, one that I didn't present.

00:23:57.133 --> 00:24:02.453
And maybe to add the one that I didn't present, we used an EEG experiment.

00:24:03.233 --> 00:24:04.833
Again, with an apparent motion design.

00:24:05.913 --> 00:24:10.693
And we found two components that are

00:24:13.833 --> 00:24:16.933
motion energy components, as compared to

00:24:16.933 --> 00:24:19.933
flicker. And those components were around 100

00:24:19.933 --> 00:24:22.993
milliseconds and around 140 milliseconds after

00:24:22.993 --> 00:24:32.113
stimulus onset. And we then had the question: which one is from retinotopic regions

00:24:32.113 --> 00:24:38.193
and which one is from V5? What we did is apply apparent motion in

00:24:38.193 --> 00:24:41.453
the upper visual field and in the lower visual field,

00:24:41.533 --> 00:24:46.573
which induces an EEG component in the dorsal and in the ventral stream,

00:24:46.753 --> 00:24:48.093
so they have different orientation.

00:24:49.333 --> 00:24:53.273
And we subtracted the one where we had apparent motion in the upper visual field from

00:24:53.273 --> 00:25:00.933
the one in the lower visual field and found that the early component was subtracted away,

00:25:01.113 --> 00:25:09.333
meaning that the motion sensitive area V5, which in both cases has the same localization,

00:25:09.973 --> 00:25:13.093
is the early one, around 100 milliseconds, and the later one,

00:25:13.153 --> 00:25:16.153
the 140, is the upper and the lower of the retinotopic.

00:25:16.293 --> 00:25:19.713
So the retinotopic component comes 40 milliseconds later.

00:25:23.133 --> 00:25:29.533
So the other experiment I showed was the TMS experiment, in which we stimulate

00:25:29.533 --> 00:25:36.153
V5 50 milliseconds, 40 milliseconds before the onset of the flicker,

00:25:36.173 --> 00:25:37.773
which is on the apparent motion trace.

00:25:38.693 --> 00:25:44.153
And this TMS takes away the predictability effect on the apparent motion trace.

00:25:44.353 --> 00:25:52.593
So both of those indicators, I think, speak more to a communication between V5,

00:25:52.693 --> 00:26:02.713
which is tuned to process this high velocity motion across huge space.

00:26:03.753 --> 00:26:08.093
and the more localized features in V1.

00:26:08.273 --> 00:26:12.333
And so therefore I think it's a good paradigm, because

00:26:13.707 --> 00:26:18.687
different features are processed in different regions, and they

00:26:18.687 --> 00:26:22.787
are naturally optimized for this, and you can see this kind of interaction.

00:26:22.787 --> 00:26:23.807
But now, as an alternative,

00:26:24.587 --> 00:26:28.407
for your two experiments, I could also argue: look, in your first experiment

00:26:28.407 --> 00:26:35.807
you show V5 leading V1 in this response, right? And then I could argue that

00:26:35.807 --> 00:26:42.087
maybe you need a minimum volume of response to actually be detectable with your methods of fMRI.

00:26:42.987 --> 00:26:49.467
This kind of, given the heavy convergence to V1, you will only get that initial

00:26:49.467 --> 00:26:54.847
critical response that is detectable at the V5 level and after the V1.

00:26:54.927 --> 00:26:59.007
So it's more an artifact of your measurement technique than really reflecting

00:26:59.007 --> 00:27:01.487
the underlying dynamics. Can you exclude that?

00:27:02.767 --> 00:27:13.807
So it's true that we usually have these block designs, but in the EEG experiment, we measure these.

00:27:16.267 --> 00:27:22.467
Short, I mean there's 100 milliseconds after onset of the apparent motion energy

00:27:22.467 --> 00:27:28.607
component as compared to blinking for example and also in our experiment where

00:27:28.607 --> 00:27:33.867
we combine apparent motion with saccades it's such that it even works after

00:27:33.867 --> 00:27:35.467
the saccade in the new hemisphere.

00:27:36.887 --> 00:27:42.087
Which, if I understand you correctly, you're saying you need repetition

00:27:42.087 --> 00:27:45.647
over time to build up the V5 signal.

00:27:46.447 --> 00:27:48.447
That's convergence. That's plain convergence.

00:27:49.367 --> 00:27:54.987
That you just have a sufficient drive onto that volume of cells to become really

00:27:54.987 --> 00:27:58.067
a clean signal that you can extract with your method.

00:27:58.347 --> 00:28:03.387
As opposed to a more distributed, diffused signal in V1 that is more difficult to detect.

00:28:03.387 --> 00:28:12.267
Well, yeah, I mean, if you take V1 after the saccade, a blinking dot,

00:28:12.927 --> 00:28:15.867
there's the inducer, the last stimulus of the apparent motion,

00:28:15.867 --> 00:28:22.847
and a blinking dot next to it, the target. Those two are always perfect apparent motion

00:28:22.847 --> 00:28:27.267
if that were the only event. If V1 in the new hemisphere would process that

00:28:27.267 --> 00:28:34.387
on its own, they are as much related to one another; the in-time is as much related as the out-of-time.

00:28:34.467 --> 00:28:41.287
The only thing that makes these two stimulation conditions different is the

00:28:41.287 --> 00:28:46.287
history that was processed in V5 and in V1 in the other hemisphere.

00:28:46.687 --> 00:28:55.407
So it's bringing this history to this new situation that creates a difference.

00:28:58.673 --> 00:29:02.713
It might be true that it needs to build up over time, this hypothesis in V5,

00:29:02.893 --> 00:29:06.253
because apparent motion is a peculiar situation.

00:29:06.353 --> 00:29:09.553
If you have just two dots and you perceive them as apparent motion,

00:29:09.793 --> 00:29:13.353
there's no other way of describing that than as postdiction.

00:29:13.693 --> 00:29:19.573
Only after the second event can you understand that it was a motion,

00:29:19.573 --> 00:29:25.753
and every point in between would be integrated at a later time point.

00:29:25.753 --> 00:29:29.233
So it's like you're reversing the time frame.

00:29:31.013 --> 00:29:35.653
In our experiments, we have always had eight iterations of apparent motion.

00:29:35.913 --> 00:29:46.613
So that takes care of it: the visual system kind of has time to catch up with the

00:29:46.613 --> 00:29:50.713
delay in simulation to then run in time.

00:29:50.713 --> 00:29:56.193
The famous experiment is: if you're doing apparent motion and you take away one stimulus,

00:29:56.773 --> 00:29:59.393
so it's repetitive and then you take a

00:29:59.393 --> 00:30:02.473
stimulus away, you will still continue to see apparent motion, because your visual

00:30:02.473 --> 00:30:07.553
system is now so predictive that it fills in this, that it can catch up. It needs

00:30:07.553 --> 00:30:12.053
to do postdiction of the missing stimulus to realize. So your apparent motion

00:30:12.053 --> 00:30:18.713
is there, but the question is, do we really see a strong contribution to the phenomenon

00:30:18.833 --> 00:30:24.493
at the signal level of something we might want to call a predictive model, right?

00:30:24.633 --> 00:30:30.133
So then also for your second example, you could also see where you talk about your TMS experiment.

00:30:30.333 --> 00:30:38.173
So look, if I zap V5, I can sort of disrupt the error detection signal.

00:30:38.353 --> 00:30:42.233
This is how you would interpret it, right? And then the error detection signal,

00:30:42.273 --> 00:30:48.513
the deviant-triggered signal, ends up roughly the same magnitude as the congruent stimulus.

00:30:50.392 --> 00:30:52.552
But that's, of course, I could still

00:30:52.552 --> 00:30:55.732
argue, but that's at best a necessary condition of a predictive model.

00:30:55.872 --> 00:31:01.652
And it doesn't prove in any way that what comes from V5 is a predictive model

00:31:01.652 --> 00:31:06.712
that is processed into an error at the V1 level. Can you exclude that?

00:31:13.772 --> 00:31:21.292
I'm struggling to find a way in which we could prove that. Your distinction, how,

00:31:23.752 --> 00:31:27.052
so which aspect are you,

00:31:27.812 --> 00:31:35.712
questioning the content of the model so to say V5 speaks to V1 and okay,

00:31:36.432 --> 00:31:40.572
and the content of

00:31:40.572 --> 00:31:48.112
the message has something to do with the apparent motion. And are you now saying

00:31:48.112 --> 00:31:52.592
there could also be a contribution of lateral interaction in V1 that contributes

00:31:52.592 --> 00:31:57.812
to creation of this complex prediction that we test?

00:31:58.052 --> 00:32:03.712
Right. So this is an alternative explanation that so far you couldn't exclude yet.

00:32:03.992 --> 00:32:07.212
No, I'm not. Except that you're saying, well, but I have a TMS experiment.

00:32:07.692 --> 00:32:11.612
And then I'm saying, yeah, but wait, I could still have all this process playing

00:32:11.612 --> 00:32:16.492
out at the V1 level. We know anatomically V5 projects back to V1.

00:32:16.712 --> 00:32:20.792
So if I zap V5, it's not so strange. Something happens to the V1 level,

00:32:20.932 --> 00:32:24.992
but it doesn't tell us anything about whether this is a prediction being fed

00:32:24.992 --> 00:32:30.732
back or just there's an anatomical connection between the two areas and signal exchange.

00:32:31.232 --> 00:32:36.452
So I think Jeff Hawkins uses this kind of explanation of this hierarchical predictive

00:32:36.452 --> 00:32:41.212
memory prediction framework, which is as you know, you...

00:32:42.902 --> 00:32:48.182
At every level, the neuronal microunit

00:32:48.182 --> 00:32:56.902
is trying to explain its own activity given the surrounding stimulus history

00:32:56.902 --> 00:33:05.662
in the context of a message from a higher level top-down prediction.

00:33:05.662 --> 00:33:08.902
So and the example

00:33:08.902 --> 00:33:13.002
he's using is like if you have a melody then

00:33:13.002 --> 00:33:17.142
the higher up areas in

00:33:17.142 --> 00:33:23.742
auditory cortex would tell you what is expected to continue in this melody and

00:33:23.742 --> 00:33:30.582
given the local information of transients at this lower level and given the

00:33:30.582 --> 00:33:35.422
envelope context from higher up, these neurons kind of create what's

00:33:35.422 --> 00:33:36.962
most likely to happen next.

00:33:37.442 --> 00:33:41.882
And so it's within this kind of model that I would also, and there's actually

00:33:41.882 --> 00:33:47.422
a good model that does that for apparent motion or for motion behind occluders,

00:33:48.122 --> 00:33:53.722
where it does exactly this, that the top-down is the kind of envelope and the

00:33:53.722 --> 00:33:58.862
lateral is the additional information that then converges to a precise prediction.

00:33:58.862 --> 00:34:09.762
So, the example here being you have a motion of a certain energy disappearing behind an

00:34:10.702 --> 00:34:19.242
occluder, and the way it's modeled is there's a motion energy expected in this area,

00:34:20.847 --> 00:34:23.747
but you can't be very precise where it is.

00:34:24.207 --> 00:34:27.787
The disappearance of the dot gives an additional constraint.

00:34:28.147 --> 00:34:36.007
And these two pieces of information together make a very precise prediction of the trajectory

00:34:36.007 --> 00:34:38.507
of the dot. And that's the way I think about it.

00:34:38.647 --> 00:34:44.067
So I'm not excluding the lateral interaction. So you're saying then that the

00:34:44.067 --> 00:34:50.387
idea of a hierarchical predictive model is a functional concept that's not mimicked

00:34:50.387 --> 00:34:55.087
or in sort of an isomorphic way mapped to the anatomical hierarchy,

00:34:55.467 --> 00:35:00.527
because it might be implemented by some classic confluence of recurrent projections

00:35:00.527 --> 00:35:05.807
and local interactions in neural circuits. Exactly. This will be your point. Yes.

00:35:07.067 --> 00:35:10.087
All clear. Because also in that sense, you

00:35:10.087 --> 00:35:13.107
made the point, and it also resonates with

00:35:13.107 --> 00:35:15.927
this, the traditional view, and that's also where

00:35:15.927 --> 00:35:20.787
Hawkins, I think, sits, would be rather dogmatic in that sense, right? There's a

00:35:20.787 --> 00:35:26.267
top down prediction and this comes together at a lower level in the hierarchy

00:35:26.267 --> 00:35:30.927
where it leads to an error there's a real comparison taking place between the

00:35:30.927 --> 00:35:34.727
state that this lower level area believes is correct

00:35:34.887 --> 00:35:38.967
it gets a reference as you want from a top-down area and now a comparison happens

00:35:38.967 --> 00:35:45.607
and I have an error right and your data would not really reflect that in such a literal sense right,

00:35:46.527 --> 00:35:51.287
because essentially, in what you see in the BOLD signal, the

00:35:51.287 --> 00:35:55.787
matching, congruent stimulus also leads to a deflection of the signal, as

00:35:55.787 --> 00:35:59.327
does the incongruent stimulus, although the deflection is somewhat different,

00:35:59.327 --> 00:36:04.307
right? So that means it's not about the error processing. So what do you then

00:36:04.307 --> 00:36:05.527
think is being processed,

00:36:05.667 --> 00:36:12.247
is being generated in that local V1 circuit if it's not an error as the traditional

00:36:12.247 --> 00:36:13.547
model suite would predict?

00:36:15.742 --> 00:36:22.362
Well, I think this is, of course, a very good point, but it seems like a point

00:36:22.362 --> 00:36:28.762
about the labels we add to those coding differences, right?

00:36:28.862 --> 00:36:38.782
So if a prediction is violated and the violation of the prediction gives a different signal,

00:36:38.882 --> 00:36:43.282
or whether it's a confirmation of a prediction that gives another signal,

00:36:43.282 --> 00:36:52.762
It is the combination of input signal and expectations that are combined.

00:36:53.222 --> 00:37:04.122
And they can be subtracted, which is the prediction of the predictive coding proper way,

00:37:04.242 --> 00:37:08.862
or it could be multiplied and resonate.

00:37:08.862 --> 00:37:16.402
And it's very difficult to define the level at which we can resolve that debate

00:37:16.402 --> 00:37:25.782
because I'm currently not sure how we would resolve this because at the neuronal

00:37:25.782 --> 00:37:28.102
level, so what is the right level of description?

00:37:28.302 --> 00:37:37.682
We have the microcircuitry within V1 that is a combination of excitatory and

00:37:37.682 --> 00:37:39.502
inhibitory neurons,

00:37:39.502 --> 00:37:47.802
the error units that can also be a combination of excitatory or inhibitory neurons,

00:37:47.862 --> 00:37:49.742
so it will be very difficult to resolve that.

00:37:50.762 --> 00:37:56.882
But the narrative of the predictive processing framework is very clear about

00:37:56.882 --> 00:38:00.842
those units in principle. So prediction error matters.

00:38:01.802 --> 00:38:04.262
Reduction of prediction error or...

00:38:07.091 --> 00:38:11.291
is a currency that might be very useful.

00:38:12.571 --> 00:38:18.391
And also in the predictive coding framework, you have the resonating,

00:38:18.411 --> 00:38:25.431
the confirmation signal for a predicted signal that keeps an internal model running.

00:38:26.331 --> 00:38:33.811
So you have all those components, And I think it's, you know.

00:38:36.411 --> 00:38:40.211
It's going to be, do you have a suggestion how to solve that?

00:38:40.431 --> 00:38:45.171
I mean, do you have, is there some variance that is not explained or that would

00:38:45.171 --> 00:38:48.291
be more possible to explain by an alternative explanation?

00:38:48.631 --> 00:38:51.091
I'm not so sure. Yes. Okay.

00:38:51.611 --> 00:38:54.611
Temporal population codes. But before we get there,

00:38:54.831 --> 00:39:01.811
so I feel one way, if I would be a real cynic and observing your results,

00:39:02.031 --> 00:39:07.111
I could say, look, you have done a fabulous job dismantling the predictive modeling framework.

00:39:07.111 --> 00:39:11.251
Because the second thing is, also if you go back to the traditional idea

00:39:11.251 --> 00:39:12.591
of, you mentioned Mumford,

00:39:13.171 --> 00:39:17.191
and then we have a number of other people who have variations on that,

00:39:17.331 --> 00:39:22.151
and before that also Barlow was talking about it, you would expect that in the

00:39:22.151 --> 00:39:25.211
cortical circuit, and it was very literally mapped to the anatomy,

00:39:25.491 --> 00:39:30.051
that in a cortical, people would put several layers in the cortex that would

00:39:30.051 --> 00:39:34.911
then deal with the prediction and with the current state and how the error would be computed.

00:39:35.071 --> 00:39:38.711
So very precisely to cortical circuits, right?

00:39:38.831 --> 00:39:44.891
But even if you do your layer-specific analysis, you don't necessarily see such

00:39:44.891 --> 00:39:49.571
an asymmetry among the layers in their response.

00:39:49.731 --> 00:39:53.391
There seems to be a very gradual distribution over layers

00:39:54.408 --> 00:39:59.008
of the signal that you get in your task. In that case, it was more like an occlusion

00:39:59.008 --> 00:40:00.808
task, but I think it makes the same point.

00:40:01.008 --> 00:40:05.848
So in that sense, aren't you showing that what actually goes on in these circuits

00:40:06.048 --> 00:40:11.788
might reflect aspects of prediction, but it's not necessarily playing out in

00:40:11.788 --> 00:40:14.688
the way as anticipated in these more traditional models.

00:40:14.828 --> 00:40:21.788
Would you agree with that? I'm totally experimental and not dogmatic, right?

00:40:21.868 --> 00:40:24.468
So these are interesting questions.

00:40:24.768 --> 00:40:33.948
I want to see how traditional neuroscience looks at the stimulus response

00:40:33.948 --> 00:40:36.988
to an unpredicted random stimulus.

00:40:36.988 --> 00:40:47.768
And of course, in those experiments, you can never see the capability of predicted sequences and so on.

00:40:47.808 --> 00:40:53.328
So you need to do the experiments in which the stimulus history is predictive

00:40:53.328 --> 00:40:57.388
for a certain variation and so on.

00:40:57.388 --> 00:41:06.168
And I think that's what we have done and others and whether this creates a,

00:41:06.288 --> 00:41:13.808
I mean, the one point in which I deviate from the Rao and Ballard model, if you like,

00:41:13.928 --> 00:41:21.148
is that what we see is that the top-down prediction is creating a signal in

00:41:21.148 --> 00:41:23.248
where there is nothing, right?

00:41:23.248 --> 00:41:31.548
So predictive coding is a story in which you explain away as much data as possible,

00:41:31.688 --> 00:41:36.288
but now we show that in the non-stimulated region, where there's nothing to

00:41:36.288 --> 00:41:38.988
explain away, there's something created.

00:41:39.248 --> 00:41:45.148
There's a model, there's a prediction created, and energy is used to put a

00:41:45.148 --> 00:41:47.548
hypothesis there to then be tested.

00:41:47.548 --> 00:41:50.448
So in this respect

00:41:50.448 --> 00:41:53.948
it deviates. And I mentioned Floris

00:41:53.948 --> 00:41:57.248
de Lange showing similar results for illusory contours: they

00:41:57.248 --> 00:42:02.628
are created, they are put into a map. It's an active blackboard framework

00:42:02.628 --> 00:42:07.368
in which, you know, the chalk is picked up to draw something on the active

00:42:07.368 --> 00:42:10.988
blackboard and say, you know, it would make sense if there's something moving,

00:42:10.988 --> 00:42:15.368
and it would make sense if there's some information missing at these and these points.

00:42:15.548 --> 00:42:22.288
In a global sense, that might be very useful and minimize energy in this sense

00:42:22.288 --> 00:42:30.268
that it prepares the organism to respond to more or less surprising stimuli.

00:42:30.268 --> 00:42:41.868
But it's not all about a quick explaining away of all energy in the hierarchy, as

00:42:41.868 --> 00:42:43.408
it was shown by Rao and Ballard,

00:42:43.568 --> 00:42:53.728
which is just a very simplified model to illustrate some principles of predictive processing.

00:42:54.939 --> 00:43:01.119
Um, yeah, so combining with that, also what we, you also mentioned much more

00:43:01.119 --> 00:43:06.119
detailed physiology that you did with Larkum and other people inside the Human Brain Project.

00:43:06.479 --> 00:43:10.479
And also there, you went over those results rather quickly,

00:43:10.619 --> 00:43:16.559
but still at best they showed there is indeed interaction between higher and lower areas,

00:43:16.659 --> 00:43:21.579
but it was not necessarily a clean signature of anything that one might only interpret

00:43:21.579 --> 00:43:26.499
as a strong prediction or an error, or anything along these lines.

00:43:26.759 --> 00:43:32.299
It seems a more non-specific sensory processing that depends on the active dendrite.

00:43:32.619 --> 00:43:39.499
Yeah, so, I mean, this is ongoing data that is recorded and still analyzed.

00:43:42.159 --> 00:43:52.519
Like, well, I think in the classical V1 ice cube model and following some claims that, you know.

00:43:55.519 --> 00:44:02.459
You know, Jack Gallant would say, you know, it's almost explained,

00:44:02.699 --> 00:44:05.459
you know, 80% of the variance is explained in V1.

00:44:05.619 --> 00:44:11.819
We have a very good idea of our models about V1 and that is true actually for

00:44:11.819 --> 00:44:15.399
in certain restrictions.

00:44:15.399 --> 00:44:21.639
So it is in those models in which you have a stream of surprising stimuli,

00:44:21.799 --> 00:44:23.979
it doesn't explain ongoing activity,

00:44:24.299 --> 00:44:28.499
baseline activity, and these kinds of things, which are a huge contributor to

00:44:28.499 --> 00:44:35.559
the energy level that is boiled off in the V1 volume.

00:44:36.699 --> 00:44:40.799
So it's maybe only the waves on top of that that are then explained.

00:44:40.799 --> 00:44:48.739
Now, in a classical receptive field V1 explanation model, you don't have the

00:44:48.739 --> 00:44:54.839
description of these feedback signals that I described.

00:44:55.059 --> 00:45:00.479
Another one that I haven't described today, but we have done research is in

00:45:00.479 --> 00:45:05.139
blindfolded subjects when we play auditory scenes, you have some activity in

00:45:05.139 --> 00:45:09.619
V1 that is related to these auditory scenes.

00:45:10.799 --> 00:45:16.839
So the properties and the mechanisms in V1

00:45:18.066 --> 00:45:23.046
still have room for negotiations

00:45:23.046 --> 00:45:27.726
of expectations and predictions in whatever kind of mechanisms.

00:45:28.486 --> 00:45:35.666
And I think Christiaan Levelt is doing a wonderful job in decoding also the

00:45:35.666 --> 00:45:42.206
level of interneuronal activity that contributes to the detection and the explaining away.

00:45:42.506 --> 00:45:46.026
And I think he has found a subtype of

00:45:46.026 --> 00:45:57.546
inhibitory neurons that is very strong in exactly being silent during visual

00:45:57.546 --> 00:46:02.806
stimulation and being active in between stages.

00:46:03.006 --> 00:46:12.626
So it could be a role of working hard to explain things away that we still will find out.

00:46:12.626 --> 00:46:16.286
So there are many subunits that can still be

00:46:16.386 --> 00:46:22.126
contributing to this model and to this compartment of visual cortex

00:46:22.126 --> 00:46:24.986
that we think we understand best.

00:46:25.146 --> 00:46:30.546
Maybe V1 is thought to be one of those regions that are best studied and best understood.

00:46:30.706 --> 00:46:37.866
And yet there are, like, languages of feedback signals that we don't fully conceptualize.

00:46:39.546 --> 00:46:45.386
You mentioned one of the results that you found was a difference between paranoid

00:46:45.386 --> 00:46:49.326
schizophrenic patients and controls on this apparent motion task.

00:46:50.306 --> 00:46:57.586
So what do you read from that in terms of support for your hypothesis about top-down predictions?

00:46:57.766 --> 00:46:59.966
And also, what does it tell us about schizophrenia?

00:47:00.486 --> 00:47:08.606
So we actually didn't find a difference between schizophrenic subjects and controls.

00:47:09.546 --> 00:47:13.546
Not in the dimension that we expected it, in the predictability effect,

00:47:13.686 --> 00:47:16.806
because we could confirm the predictability effect.

00:47:16.946 --> 00:47:22.406
We saw overall a difference in the amount of apparent motion perception.

00:47:22.646 --> 00:47:34.246
If you like, the responses to detection of apparent motion were different in controls and in schizophrenia.

00:47:34.246 --> 00:47:37.986
I mean, there's a general result, isn't there, that schizophrenics are less

00:47:37.986 --> 00:47:39.546
susceptible to some visual illusions?

00:47:39.926 --> 00:47:45.226
Exactly, yeah, like the hollow face mask illusion, and the same is true for apparent motion.

00:47:46.666 --> 00:47:51.526
But what we didn't find is that the effect that we see,

00:47:51.566 --> 00:47:56.406
the advantage of a flickering stimulus that is consistent within an apparent motion

00:47:56.406 --> 00:48:05.446
context was more or less in schizophrenic subjects than in control subjects.

00:48:05.686 --> 00:48:17.106
So the original idea was that it could be that the signatures of creating a

00:48:17.106 --> 00:48:21.826
prediction error and the processing of a prediction error are altered in schizophrenic subjects.

00:48:21.826 --> 00:48:25.186
They are less tuned to process the prediction error,

00:48:25.326 --> 00:48:28.286
or higher tuned. Actually, there are both hypotheses.

00:48:29.866 --> 00:48:33.966
But we didn't find that, so we basically just replicated this.

00:48:34.066 --> 00:48:38.286
Now we are testing the same thing in autistic subjects, we're just starting

00:48:38.286 --> 00:48:39.606
to do that, same experiment.

00:48:43.255 --> 00:48:48.995
Some have suggested that it should precisely be the difference between autistic

00:48:48.995 --> 00:48:53.455
subjects and schizophrenic subjects, in which you have this low-level difference

00:48:53.455 --> 00:48:55.815
in visual predictability,

00:48:56.535 --> 00:48:59.955
that matters in autistic subjects but not in schizophrenic.

00:49:01.015 --> 00:49:10.735
So we will see that. But the prediction here would be that autistic people would

00:49:10.735 --> 00:49:15.515
be less tuned in explaining away the predicted stimulus.

00:49:15.715 --> 00:49:21.835
It would be as unexplained, as surprising as the unpredicted one.

00:49:23.195 --> 00:49:29.195
And, well, it shows a tuning for predictability that is important and essential.

00:49:29.195 --> 00:49:39.875
And we could think of it in the sense that if you experience lucid dreams,

00:49:40.535 --> 00:49:51.855
it's maybe a situation you're aware, you're conscious, but you don't respond to prediction errors.

00:49:51.855 --> 00:49:59.655
In the story, you don't wake up and realize you are in your bed, but

00:49:59.655 --> 00:50:05.415
you continue your dream, and knowing you are in your bed and knowing you are dreaming does not

00:50:07.175 --> 00:50:12.915
bring the dream to an end. And so that's maybe a situation that

00:50:12.915 --> 00:50:18.095
comes very close to an auditory hallucination in a schizophrenic subject.

00:50:18.095 --> 00:50:21.495
It is causing some prediction error. They're surprised,

00:50:21.675 --> 00:50:24.135
they're worried, but it's not resolved.

00:50:24.435 --> 00:50:29.255
It doesn't make the internal representation

00:50:29.255 --> 00:50:34.155
go away or be replaced and overwritten by an alternative.

00:50:34.275 --> 00:50:39.295
But now, in terms of the actual data, in schizophrenics, if

00:50:39.295 --> 00:50:40.615
we do the apparent motion task,

00:50:40.975 --> 00:50:46.835
how big is the deflection you would see in their BOLD response as compared to a healthy control?

00:50:47.195 --> 00:50:53.215
What is the exact difference? Uh, so, um, sorry for having been so brief,

00:50:53.255 --> 00:50:56.695
but in the, in the talk, I only presented behavioral results and we haven't

00:50:56.695 --> 00:50:58.675
done functional brain imaging with them.

00:50:58.735 --> 00:51:03.735
It's just a behavioral observation that they are not detecting the

00:51:05.230 --> 00:51:10.450
in-time stimulus as well as the... What would be your prediction with respect

00:51:10.450 --> 00:51:12.870
to what you would see if you would do fMRI on them?

00:51:13.490 --> 00:51:19.170
Yeah, that's a good one, right? In the autistic subjects, if we find behaviorally

00:51:20.390 --> 00:51:26.690
no difference between the predicted and the non-predicted flash stimulus,

00:51:26.970 --> 00:51:32.290
we would like to do the fMRI experiment using layer-specific fMRI to see whether

00:51:32.290 --> 00:51:37.850
then we have a continuous low-precision prediction in the superficial layer

00:51:37.850 --> 00:51:44.110
that doesn't discriminate and doesn't trigger an error signal. That could be one hypothesis.

00:51:44.590 --> 00:51:52.370
So another one would be that apparent motion is never creating any activity along

00:51:52.370 --> 00:51:57.430
the trace, so that the envelope is maybe processed feed-forwardly,

00:51:57.430 --> 00:51:59.570
but there's no feedback message.

00:51:59.570 --> 00:52:04.830
So everything that comes in hits the clear slate.

00:52:05.750 --> 00:52:10.250
So now, since I'm on this crusade to sort of demolish the predictive model,

00:52:11.070 --> 00:52:19.170
another piece of the data that feeds that assault is the data that you presented

00:52:19.170 --> 00:52:21.790
on the cerebellum, which is really interesting, right?

00:52:21.850 --> 00:52:25.030
Because when you opened up your analysis in your whole brain,

00:52:25.250 --> 00:52:28.090
you also look at how other structures would be involved in this.

00:52:28.090 --> 00:52:33.590
Now, soon the cerebellum starts to also show activity in this task, right?

00:52:33.730 --> 00:52:37.830
So now I could say, aha, you see,

00:52:38.070 --> 00:52:42.610
apparently this sort of clean isomorphic mapping of prediction hierarchies to

00:52:42.610 --> 00:52:51.150
cortex is insufficient to explain the paradigm, to explain the behavioral effect, right?

00:52:51.150 --> 00:52:58.730
So how do the results you get from the cerebellum not question this predictive

00:52:58.730 --> 00:53:01.130
hierarchy model that we started out with?

00:53:02.070 --> 00:53:08.850
So I had been a cortical chauvinist by accident, not by conviction.

00:53:09.170 --> 00:53:17.810
So we just started out scanning cortex because cerebellum didn't make it into

00:53:17.810 --> 00:53:19.710
our slab. We have changed that now.

00:53:20.010 --> 00:53:26.610
And I'm still on my learning curve to learn more about the cerebellum.

00:53:26.610 --> 00:53:34.150
As much as I understand of it, some of the hypotheses are that it is a machinery

00:53:34.150 --> 00:53:45.730
that has an architecture to create a temporal forward model that is used in,

00:53:45.910 --> 00:53:49.010
for example, in tracing behavior.

00:53:49.010 --> 00:53:52.330
So seeing, you know, tracing a curve.

00:53:52.670 --> 00:54:00.130
Monkeys can follow this curve smoothly,

00:54:00.410 --> 00:54:04.490
which is a visual motor integration task,

00:54:04.710 --> 00:54:16.890
but by taking away the cerebellum, this movement becomes very jittery and incoherent and so on.

00:54:16.890 --> 00:54:22.610
And there's a long line of different research, but so the cerebellum in a way

00:54:22.610 --> 00:54:26.950
is not directly connected to the outside world.

00:54:27.090 --> 00:54:35.910
It receives cortical input, processes this, and is the archetypical region for

00:54:35.910 --> 00:54:41.170
an efference copy processing, which is basically a predictive machinery.

00:54:41.170 --> 00:54:48.650
And so, as I've heard in a recent talk, if evolution came up with a perfect

00:54:48.650 --> 00:54:53.850
prediction machinery that is connected potentially to the entire brain,

00:54:55.111 --> 00:54:58.211
what would you use that machinery for?

00:54:58.751 --> 00:55:07.611
And so I think the conclusion is that lots of the forward modeling

00:55:08.531 --> 00:55:14.871
that is necessary for survival is done,

00:55:14.931 --> 00:55:18.511
is offloaded, or is done with the participation of the cerebellum.

00:55:18.571 --> 00:55:25.151
So it is, if you like, our predictive machinery that communicates then with

00:55:25.151 --> 00:55:31.291
the different sensory and motor output areas.

00:55:31.771 --> 00:55:33.071
Now, this was a funny one.

00:55:33.791 --> 00:55:39.251
The cerebellar projections are mainly going to frontal parts of the cortex, most densely.

00:55:39.491 --> 00:55:46.891
So then to get that signal back into V1, we can take some cascade of recurrent projections.

00:55:47.311 --> 00:55:53.171
I thought so. I think there are some direct connections

00:55:53.171 --> 00:55:56.291
to visual areas, so there are some visual

00:55:57.031 --> 00:56:01.531
cerebellum connections. I'm not entirely sure if they go to V1 or to extrastriate

00:56:01.531 --> 00:56:07.931
areas and so on, but I think there must be something like a visual sub-part of

00:56:07.931 --> 00:56:16.111
the cerebellum. Sure. Okay, so the last part of your talk

00:56:16.111 --> 00:56:21.791
went a little bit more in the direction of a more computational understanding

00:56:21.791 --> 00:56:26.711
of what this whole system might be doing, right?

00:56:26.791 --> 00:56:36.091
So the bottom line would be: cortex builds hierarchies of internal models that

00:56:36.091 --> 00:56:39.471
are coupled through prediction errors.

00:56:40.825 --> 00:56:43.945
And so that's

00:56:43.945 --> 00:56:47.225
the starting, that was a starting point, and that's been a great heuristic for

00:56:47.225 --> 00:56:50.565
you to perform some really fantastic experiments that

00:56:50.565 --> 00:56:53.665
also led to new insights. But new insights would

00:56:53.665 --> 00:56:59.125
also allow us to reformulate the model, right? So if you would have to sort

00:56:59.125 --> 00:57:02.905
of crystallize that current view, what would be your current summary

00:57:02.905 --> 00:57:09.285
of the model or the theory of neocortex? How should you think about neocortex

00:57:09.285 --> 00:57:11.965
given the data that you have in your hands?

00:57:15.765 --> 00:57:25.445
I think that, of course, one of the big struggles is to find the right level

00:57:25.445 --> 00:57:28.685
of abstraction in answering your question in the description,

00:57:28.905 --> 00:57:30.525
right? How many units do we need?

00:57:30.645 --> 00:57:34.665
What kind of abstraction is useful for what kind of understanding?

00:57:34.665 --> 00:57:41.205
Neuroscience is a data-rich science with

00:57:41.205 --> 00:57:47.545
relatively few theories, at least that's what some of the criticism

00:57:47.545 --> 00:57:50.625
says about neuroscience.

00:57:51.185 --> 00:57:56.805
And this is a question we haven't fully resolved: which level of description

00:57:56.805 --> 00:58:02.045
do we need? Do we need a spiking-neuron explanation? Do we need principles?

00:58:02.865 --> 00:58:10.505
And I think this predictive processing has some hypothesis,

00:58:10.725 --> 00:58:15.885
some units, which could be in single neurons or microcircuits,

00:58:16.065 --> 00:58:21.585
which is the description of prediction errors, for example, an important currency, I think.

00:58:21.585 --> 00:58:26.745
And also the kind of

00:58:28.951 --> 00:58:32.051
population code of internal models. So

00:58:32.051 --> 00:58:35.631
for more complex human

00:58:35.631 --> 00:58:38.451
tasks, it is extremely important to be

00:58:38.451 --> 00:58:42.111
able to simulate a counterfactual situation,

00:58:42.111 --> 00:58:46.231
to plan behavior, to plan an interaction,

00:58:46.231 --> 00:58:49.411
or to plan a career. We can think about,

00:58:49.411 --> 00:58:53.931
I mean, there are many very fascinating, you

00:58:53.931 --> 00:58:57.071
know, how can you explain that a

00:58:57.071 --> 00:59:03.371
biological system is capable of committing suicide? How did evolution make

00:59:03.371 --> 00:59:09.471
it possible that you can, you know, create an internal model that is somehow

00:59:09.471 --> 00:59:13.931
rewarding, that you simulate and say that makes sense, and then commit suicide, which

00:59:14.011 --> 00:59:19.191
in biology is extremely unbiological,

00:59:19.371 --> 00:59:19.691
right?

00:59:20.191 --> 00:59:31.511
So there are only a few components that you need to be able to explain how internal

00:59:31.511 --> 00:59:37.131
models can create a good description about external facts, right?

00:59:37.251 --> 00:59:40.511
So deep convolution works well in the visual system, for example,

00:59:40.511 --> 00:59:49.771
to now label visual scenes and learn the different objects by coming up with

00:59:49.771 --> 00:59:54.171
a condensed description,

00:59:54.371 --> 01:00:01.651
but just by learning an incredible amount of visual stimuli and trying to extract

01:00:01.651 --> 01:00:03.651
the most essential features.

01:00:04.011 --> 01:00:07.851
So I think there are some of those components,

01:00:08.311 --> 01:00:19.031
that are sufficient to come up with a hierarchy of extracted features by using

01:00:19.031 --> 01:00:24.211
prediction error minimization.

01:00:25.747 --> 01:00:29.367
But I think what's unresolved so far, what I want to look into future,

01:00:29.567 --> 01:00:36.667
if that was your question, is the kind of, how do you lift up these internal

01:00:36.667 --> 01:00:40.447
models that are then totally connected and well describing

01:00:40.447 --> 01:00:43.147
the environment to something that is alternative,

01:00:43.327 --> 01:00:52.167
that you can think, plan alternative to the currently existing outside world.

01:00:52.227 --> 01:00:55.387
I was looking for an alternative to the classical model.

01:00:55.747 --> 01:01:01.227
I don't really hear it yet, but Tony. Well, you mentioned the deep convolutional

01:01:01.227 --> 01:01:03.747
neural nets a couple of times in your talk and just now.

01:01:04.227 --> 01:01:09.907
And, you know, sort of Paul and I are both interested in putting models of the

01:01:09.907 --> 01:01:12.847
brain into robots and getting them to do tasks in real time.

01:01:13.847 --> 01:01:18.207
So, and I take from some of the things you're saying that maybe there's some

01:01:18.207 --> 01:01:22.307
useful mileage to be had in using these convolutional neural networks as

01:01:22.307 --> 01:01:27.407
an approximation to what the visual pathways might be doing.

01:01:27.707 --> 01:01:32.587
But, I mean, say we get that working in some sort of first pass,

01:01:32.787 --> 01:01:37.107
what would be the things that we would be missing out on and what would we want

01:01:37.107 --> 01:01:39.267
to add to make this a more realistic model?

01:01:39.807 --> 01:01:45.627
So I think the flexibility, the

01:01:45.627 --> 01:01:56.987
adaptation to new environments is something that is a potential of expanding

01:01:56.987 --> 01:02:05.267
deep encoding neural networks that are feed-forward networks. Expanding them to recurrent looping

01:02:05.927 --> 01:02:11.387
might be able to make them even more clever.

01:02:11.387 --> 01:02:21.727
Let's say, the context-dependent amplification and de-amplification is something

01:02:21.727 --> 01:02:24.147
that could potentially,

01:02:24.467 --> 01:02:32.107
once we find the right architecture, make deep encoding networks more flexible

01:02:32.107 --> 01:02:33.547
to adapt to new environments.

01:02:34.087 --> 01:02:37.367
So this is, I mean, you're more

01:02:37.367 --> 01:02:41.787
the experts in this, and I'm sure this is something that's widely debated.

01:02:42.147 --> 01:02:46.707
From our research, we just see the great difference between

01:02:48.649 --> 01:02:53.129
the big success story of deep encoding networks, convolutional networks,

01:02:54.149 --> 01:03:00.069
that are very similar to the hierarchy in the visual system.

01:03:00.249 --> 01:03:01.769
However, they're both feed-forward.

01:03:02.809 --> 01:03:05.989
So they're feed-forward processing architecture.

01:03:07.889 --> 01:03:14.369
And the only thing that's fed back is the error signal.

01:03:14.369 --> 01:03:30.249
But if we take our research seriously and add a feedback cascading network to the feedforward,

01:03:30.509 --> 01:03:35.509
then what we get, for example, is a network that can fill in occluded images,

01:03:35.789 --> 01:03:41.009
which is one of the examples that we have used in our lab, in which we see that

01:03:41.009 --> 01:03:49.189
humans have line drawings filled in by their visual cortex and our deep encoding networks,

01:03:50.049 --> 01:03:57.809
connected to an autoencoder, so a U-shaped network, is able to then make predictions

01:03:57.809 --> 01:04:00.289
about occluded objects,

01:04:02.134 --> 01:04:05.314
images that are currently not in sight.

01:04:05.634 --> 01:04:11.534
And so that could be an architecture that is also,

01:04:14.854 --> 01:04:19.334
if you like, an active blackboard of predictions, what is happening when something

01:04:19.334 --> 01:04:25.554
disappears over time, and enriches, so to say,

01:04:25.654 --> 01:04:28.194
the environment in which the network is working.

01:04:28.194 --> 01:04:33.974
But wouldn't that imply that we would need much more brain volume with our skull

01:04:33.974 --> 01:04:39.174
the size of a Skippy ball, because for those models, if you want to bring

01:04:39.174 --> 01:04:42.274
in any kind of invariance, scale, rotation, position,

01:04:42.614 --> 01:04:45.234
you have to duplicate your wires, right?

01:04:45.374 --> 01:04:48.714
So you would run out of wires very quickly. So would it scale?

01:04:48.934 --> 01:04:55.054
Would that approach really scale, you think? I mean what we have done for the

01:04:55.054 --> 01:04:58.494
U-shape network is just doubling the network, right? It's a,

01:04:59.434 --> 01:05:06.694
so after the convergence it diverges and has lateral connections to reconstruct

01:05:06.694 --> 01:05:15.174
the images, and that makes it, if you like, an active blackboard architecture.

01:05:17.234 --> 01:05:21.314
Nico has done it in a slightly

01:05:21.314 --> 01:05:24.714
different way. He used an object recognition task

01:05:24.714 --> 01:05:27.754
in a network that is either

01:05:27.754 --> 01:05:30.794
feedforward or also has lateral and

01:05:30.794 --> 01:05:33.994
feedback connections, this BLT network, and could

01:05:33.994 --> 01:05:37.574
show that this kind of network architecture

01:05:37.574 --> 01:05:46.654
is better at recognizing overlaid numbers, cluttered scenes, and so on. And that

01:05:46.654 --> 01:05:51.774
architecture came from kind of joint discussions we had about, you know, the question:

01:05:51.774 --> 01:05:57.154
what would feedback, you know,

01:05:58.066 --> 01:06:02.066
add to your feedforward networks? How could it improve that?

01:06:02.106 --> 01:06:06.526
How could we design experiments that challenge the current recognition system?

01:06:06.926 --> 01:06:11.726
And so cluttered scenes and overlaps was one of those examples that he tried.

01:06:12.846 --> 01:06:19.586
And I think, you know, the AlexNet kind of architecture had something like

01:06:19.586 --> 01:06:24.766
88% correct or more, or 95% correct classification.

01:06:24.766 --> 01:06:32.786
And then his BLT, bottom-up, lateral, and top-down network,

01:06:33.786 --> 01:06:39.766
was a few percent better, or a fraction of a percent, or something, significantly.

01:06:40.486 --> 01:06:46.226
So it's small, but you need to find new tasks and new challenges in which those

01:06:46.226 --> 01:06:52.966
networks can play out their potential, because object recognition

01:06:52.966 --> 01:06:56.506
seems to be a field where they are at ceiling now.

01:06:59.246 --> 01:07:03.866
So coming back to the point, it might be that they are now more flexible,

01:07:03.986 --> 01:07:09.866
maybe there are advantages in training rates or something. I'm not an expert.

01:07:10.726 --> 01:07:15.806
So we need to add in these sort of top-down cascades and we can get some improvement

01:07:15.806 --> 01:07:23.346
and presumably some sharpening of the representations in V1. I think one of the

01:07:23.346 --> 01:07:24.666
things you said in your talk,

01:07:24.846 --> 01:07:27.546
you know, you usually think of V1 as the bottom of the hierarchy,

01:07:27.626 --> 01:07:33.226
but in a sense, it's much higher up because the information ascends and comes back down.

01:07:33.286 --> 01:07:39.666
So you're reconstructing this high resolution version of the scene. But I mean, the...

01:07:41.024 --> 01:07:48.164
You also talked about how we can apply machine learning algorithms or support

01:07:48.164 --> 01:07:51.924
vector machines to read out from the brain to what the brain is actually seeing.
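
NOTE A minimal sketch of the "read out" idea, under loud assumptions: instead of a support vector machine it uses a simpler nearest-centroid classifier, and the voxel patterns are simulated. The labels ("face", "house"), the four-voxel templates and the noise level are hypothetical and only illustrate how a stimulus label can be decoded from an activity pattern.
```python
import random
def centroid(patterns):
    """Average a list of equal-length activity patterns, voxel by voxel."""
    return [sum(vals) / len(vals) for vals in zip(*patterns)]
def decode(pattern, centroids):
    """Return the label whose centroid is closest in squared Euclidean distance."""
    def dist(label):
        return sum((p - c) ** 2 for p, c in zip(pattern, centroids[label]))
    return min(centroids, key=dist)
rng = random.Random(0)  # fixed seed so the simulation is repeatable
# Hypothetical mean response of 4 voxels to each stimulus class (made-up numbers).
templates = {"face": [1.0, 0.2, 0.8, 0.1], "house": [0.1, 0.9, 0.2, 1.0]}
def simulate(label):
    """One noisy trial: the class template plus Gaussian noise per voxel."""
    return [v + rng.gauss(0, 0.2) for v in templates[label]]
train = {label: [simulate(label) for _ in range(20)] for label in templates}
centroids = {label: centroid(trials) for label, trials in train.items()}
test_trials = [(label, simulate(label)) for label in templates for _ in range(20)]
accuracy = sum(decode(p, centroids) == label for label, p in test_trials) / len(test_trials)
```
Here `accuracy` is the fraction of held-out simulated trials whose label is recovered; with real data the same train-then-test split is what keeps the readout honest.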

01:07:52.144 --> 01:07:59.204
So it seems to me that in some ways we're getting back to what really was an

01:07:59.204 --> 01:08:01.884
old idea about the visual system,

01:08:01.984 --> 01:08:07.164
that the brain sort of, or the visual brain represents its best

01:08:07.284 --> 01:08:11.504
guess at what's in the world in some almost literal sense.

01:08:11.804 --> 01:08:18.504
The whole idea of the movie screen in the head, which was dismissed by everybody as naive.

01:08:18.724 --> 01:08:22.964
But with these kind of approaches, with this notion that you're getting this

01:08:22.964 --> 01:08:26.564
high-resolution map which is tuned by your predictions,

01:08:27.144 --> 01:08:33.264
is there in some sense something like a representation in V1 of your best guess

01:08:33.264 --> 01:08:36.064
of the high-resolution scene as we know it?

01:08:39.384 --> 01:08:42.304
It's an interesting question, right? It's

01:08:42.304 --> 01:08:49.484
something, it does make sense, it's intuitive, and in another sense it seems

01:08:49.484 --> 01:08:55.644
a bit homunculus. It seems like the screen that, yeah, as you said, you

01:08:55.644 --> 01:08:59.644
can laugh about. But you opened the door. But I told you I did,

01:09:00.464 --> 01:09:02.224
I did, but this is totally improved.

01:09:03.864 --> 01:09:10.144
I know. So I think, yes, V1 now looks to us as if you

01:09:10.744 --> 01:09:18.044
have kind of mental line drawings put together in V1,

01:09:18.084 --> 01:09:20.844
and maybe that is a bit the language of V1.

01:09:20.984 --> 01:09:25.944
And I mean, it's an incredible story. I think if you look back, Peter, um.

01:09:30.052 --> 01:09:38.172
Patrick Cavanagh talks about this 10,000 years of neuroscience by describing

01:09:38.172 --> 01:09:44.212
the, if you look at the line drawings in caves,

01:09:45.172 --> 01:09:49.512
that is a form of communication that works.

01:09:49.792 --> 01:09:56.252
And that's surprising because line drawings work even though line drawings don't exist in the world.

01:09:56.252 --> 01:10:01.452
What we see or what's out there are textures and texture borders,

01:10:01.672 --> 01:10:08.532
but from a texture border to come to a line drawing is an abstraction that seems

01:10:08.532 --> 01:10:11.992
to work incredibly well for our visual system.

01:10:12.412 --> 01:10:16.892
And since we were totally surprised that if we asked the subjects to fill in

01:10:16.892 --> 01:10:19.092
the missing information, they came up with the same line drawing.

01:10:19.092 --> 01:10:23.192
And maybe there's a clue, because we find

01:10:23.192 --> 01:10:26.032
a communication that works. It's so

01:10:26.032 --> 01:10:29.272
well proven.

01:10:29.272 --> 01:10:32.832
So, you know, if you

01:10:32.832 --> 01:10:35.852
try to do a line drawing of an

01:10:35.852 --> 01:10:38.632
animal and you're not happy with it,

01:10:38.632 --> 01:10:41.392
you optimize it until you're happy and you think

01:10:41.392 --> 01:10:44.492
this is a good illustration of a

01:10:44.492 --> 01:10:47.892
deer or

01:10:47.892 --> 01:10:51.212
so, and then you present that to someone else, and

01:10:51.212 --> 01:10:57.032
the value, the quality, and the goodness of fit to their internal model is

01:10:57.032 --> 01:11:01.312
so immediate, because we have the same kind of visual system. And that opens up

01:11:01.312 --> 01:11:05.232
a communication. If we would open up this kind of communication with monkeys,

01:11:05.232 --> 01:11:09.552
you know, they could start drawing out their internal models. You know,

01:11:09.612 --> 01:11:10.272
that's interesting, right?

01:11:10.292 --> 01:11:13.112
This is a speculation we could have had in the 60s by saying,

01:11:13.172 --> 01:11:18.732
oh, the language of V1 are contrast boundary contours, right?

01:11:18.852 --> 01:11:22.572
So in some sense, what's interesting about this, as opposed to having sort of

01:11:22.572 --> 01:11:28.472
complex predictive hierarchy, maybe the majority of the machinery of vision just sits in V1.

01:11:28.632 --> 01:11:31.672
And what these hierarchies are doing is basically, if you want,

01:11:31.672 --> 01:11:34.892
just in a very coarse way, modulating this process in V1.

01:11:35.032 --> 01:11:38.852
So the whole hierarchy is now collapsing into V1, essentially.

01:11:39.432 --> 01:11:42.652
And all the rest does is say, well, let's ignore these bits, right?

01:11:42.772 --> 01:11:46.932
Or, okay, let's attend more to that part. Without telling it precisely what

01:11:46.932 --> 01:11:51.252
it should be seeing, maybe that's sort of a mistake people make.

01:11:51.312 --> 01:11:55.312
What about the predictive model is therefore maybe not telling you exactly,

01:11:55.452 --> 01:11:57.832
you should see this texture in this position.

01:11:58.152 --> 01:12:01.212
It may be the prediction just tells you, look, there's something interesting

01:12:01.212 --> 01:12:04.572
there. Check it out, right? Yeah. Can you buy that, no?

01:12:06.192 --> 01:12:09.232
Um, yeah, I mean, there's, like...

01:12:10.598 --> 01:12:13.658
What we are also experts in is face processing, right?

01:12:13.818 --> 01:12:21.978
So if you can… Face with pH or… Oh, face with… No, the emotion.

01:12:22.018 --> 01:12:27.158
We can read out the emotion and the gender out of faces and we can see kinship

01:12:27.158 --> 01:12:31.458
relations and all these kind of incredible tasks in faces.

01:12:31.538 --> 01:12:37.098
Now, if you imagine the face of persons that you know very well,

01:12:37.098 --> 01:12:40.398
your line drawings aren't

01:12:40.398 --> 01:12:43.398
very precise, and it needs a

01:12:43.398 --> 01:12:46.658
lot of training to be good at that, to do the

01:12:46.658 --> 01:12:58.938
happy face of your wife or, you know. But we have

01:12:58.938 --> 01:13:04.038
done an experiment in which we see that the encoding of emotions and gender

01:13:04.138 --> 01:13:05.938
discrimination makes differences in V1.

01:13:06.098 --> 01:13:12.398
So whether you have the task to recognize gender or to recognize the emotional expression.

01:13:12.998 --> 01:13:18.618
We mapped out retinotopic spaces around mouth and eyes, but you can see a feedback

01:13:18.618 --> 01:13:21.238
signal in the other parts that are still having a signature,

01:13:21.318 --> 01:13:23.718
whether it is happy or fearful and so on.

01:13:24.178 --> 01:13:30.318
So this, I think, doesn't speak to a feedback signal that is very precise,

01:13:30.478 --> 01:13:33.598
let's say, like a line drawing of a precise emotion of a face.

01:13:34.038 --> 01:13:41.958
But it's like a marker, a token that says, here's a face, should be happy,

01:13:42.178 --> 01:13:48.898
you know very well your daughter, and that is a prediction.

01:13:49.098 --> 01:13:54.498
Now, that is a good template, so that if then the stimulus comes up and there's

01:13:54.498 --> 01:14:00.558
a change in the facial expression, you have FFA and the rest of the face network to

01:14:00.558 --> 01:14:06.318
work out the differences between your prediction and the actual input. But the

01:14:07.735 --> 01:14:14.355
token, the marking of where something is expected, is negotiated with higher

01:14:14.355 --> 01:14:15.395
and earlier visual areas.

01:14:15.515 --> 01:14:20.675
So I think the idea that I triggered of a cave drawing, in a way,

01:14:21.315 --> 01:14:26.475
V1 is something like an active blackboard, which works like line drawings,

01:14:27.175 --> 01:14:30.095
and tokens, maybe tokens adjusted to that, where you say, oh,

01:14:30.135 --> 01:14:33.075
bad animal, and here is a face of something.

01:14:33.195 --> 01:14:37.555
By this token, this is a particular V1. No, no, not the content.

01:14:37.675 --> 01:14:42.995
More like, you know, we open up a channel that we speak to each other.

01:14:43.095 --> 01:14:47.315
So you need to be a channel of high spatial frequencies because you're now interested

01:14:47.315 --> 01:14:49.395
in the emotion around the mouth.

01:14:49.475 --> 01:14:52.275
And that is interesting in this part of V1.

01:14:52.455 --> 01:14:54.895
V1 doesn't know anything about those faces.

01:14:55.155 --> 01:14:59.395
But the token is in the communication and higher.

01:14:59.555 --> 01:15:03.655
I thought you put a token also inside V1. So that sounded confusing to me.

01:15:03.715 --> 01:15:05.455
Like this, I get it. Absolutely. Yeah.

01:15:05.855 --> 01:15:08.855
That makes sense, because then we have very parsimonious linking,

01:15:09.115 --> 01:15:15.615
but then it's not necessarily defined along this notion of hierarchies of forward models.

01:15:15.835 --> 01:15:18.475
It's a rather different kind of framework we're then in.

01:15:20.115 --> 01:15:22.995
Well, we're talking about the generative capacity of the brain

01:15:22.995 --> 01:15:25.855
now, and how that can, you know,

01:15:25.855 --> 01:15:29.875
be a tool for thought, you know, whether it's imagination or

01:15:29.875 --> 01:15:35.775
dreaming, all of these processes where there's no visual input. You can reconstruct

01:15:35.775 --> 01:15:41.195
activity throughout the visual hierarchy, including in V1. If you want to

01:15:41.195 --> 01:15:45.255
think about the details of, say, a line drawing, you can imagine it, and that might

01:15:45.255 --> 01:15:48.475
require or involve activity in V1,

01:15:48.615 --> 01:15:52.095
which then sets off cascades of activity elsewhere in the brain.

01:15:52.255 --> 01:15:57.355
So we don't have to invoke a homunculus to see why it's worth reconstructing

01:15:57.355 --> 01:16:00.955
the visual scene at the V1 level. Look, I appreciate that.

01:16:02.360 --> 01:16:07.820
Speaking up for Lars now. No, I'm not. This is very much my own idea about what

01:16:07.820 --> 01:16:11.600
we might, why V1 might operate in this way.

01:16:11.740 --> 01:16:15.320
Look, I agree with this. I don't have a problem with that.

01:16:15.500 --> 01:16:18.980
Okay. My remark with the criticism was a bit different.

01:16:19.080 --> 01:16:24.120
Like, we started with a rather explicitly defined notion of prediction hierarchies,

01:16:24.220 --> 01:16:27.520
which are all sort of convergent to something like a Kalman filter.

01:16:28.020 --> 01:16:32.640
It's very explicitly defined and it really dictates to you what should happen

01:16:32.640 --> 01:16:34.360
in these cascades of interactions.
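
NOTE A minimal sketch of the Kalman filter mentioned above, assuming the simplest case of a single, roughly constant scalar. It shows why the scheme is so explicit: every step computes a prediction error and weights it by a gain. The function name, the noise parameters `q` and `r`, and the readings are illustrative assumptions, not from the interview.
```python
def kalman_1d(measurements, x=0.0, p=1.0, q=1e-4, r=0.04):
    """Track a roughly constant scalar; x is the estimate, p its variance."""
    estimates = []
    for z in measurements:
        p = p + q                 # predict: uncertainty grows by process noise q
        error = z - x             # prediction error (innovation)
        k = p / (p + r)           # Kalman gain: how much to trust the error vs. the prior
        x = x + k * error         # correct the estimate with the weighted error
        p = (1.0 - k) * p         # the corrected estimate is more certain
        estimates.append(x)
    return estimates
readings = [1.1, 0.9, 1.05, 0.95, 1.0, 1.02, 0.98]  # hypothetical noisy measurements
track = kalman_1d(readings)
```
The gain `k` is the part a looser, confirmation-seeking scheme would drop: here every mismatch between prediction and input is used, in proportion to how uncertain the prior is.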

01:16:34.740 --> 01:16:38.480
And what I think we've seen at the end of the discussion, that there are cascades

01:16:38.480 --> 01:16:40.280
of interaction, but they might

01:16:40.280 --> 01:16:45.540
not follow this very restricted view on hierarchies of predictive filters.

01:16:46.240 --> 01:16:50.360
We might have to open up that perspective more. Like how Lars now describes

01:16:50.360 --> 01:16:54.100
the idea of having a higher level area that's a very abstract token type representation

01:16:54.100 --> 01:16:59.300
of something and it seeks information in preceding areas is not necessarily

01:16:59.300 --> 01:17:01.300
following this idea of prediction hierarchies.

01:17:01.320 --> 01:17:04.280
Maybe it just looks for confirmation, doesn't give a damn about errors, right?

01:17:04.780 --> 01:17:08.160
And that's how you can hallucinate things because you don't care about errors.

01:17:08.360 --> 01:17:12.280
You just care about confirmation about whatever stuff you believe in higher areas.

01:17:12.480 --> 01:17:16.760
So that was one of my challenges. And maybe the data and discussion is leading

01:17:16.760 --> 01:17:18.560
us to the point that we should open up the perspective.

01:17:18.720 --> 01:17:24.160
These are not just strictly defined prediction hierarchies. There's not enough data to support that.

01:17:24.300 --> 01:17:29.300
And maybe this model of token representations is maybe an alternative that

01:17:29.300 --> 01:17:32.080
is richer and maybe also closer to data.

01:17:32.140 --> 01:17:34.320
That was basically what I was saying.

01:17:34.680 --> 01:17:39.640
So are you happy with that, Tony? Or do you think I'm sort of misconstruing the discourse now?

01:17:39.760 --> 01:17:42.160
I think you're pushing it. Which I happily do most of the time.

01:17:42.280 --> 01:17:45.100
You're very much pushing it in a certain direction which I don't think Lars

01:17:45.100 --> 01:17:47.760
was necessarily agreeing with. But he was not mishearing.

01:17:48.520 --> 01:17:52.260
Maybe he's just tired. I think that's probably the case and he has a plane to

01:17:52.260 --> 01:17:54.000
catch. You've drowned him down.

01:17:54.920 --> 01:17:59.280
Good, you see? And Gern wins today again.

01:17:59.700 --> 01:18:05.420
So, Lars, now, before you can escape, there's two last hurdles you have to take.

01:18:05.880 --> 01:18:09.220
The first one is, though, this is not easy territory.

01:18:10.400 --> 01:18:13.660
It's really involved experimental methods.

01:18:14.240 --> 01:18:18.680
It's linked to theory. It's very precise. It's hard work, right?

01:18:18.740 --> 01:18:20.100
This stuff doesn't come for free.

01:18:20.800 --> 01:18:24.800
So, I'm sure you have plenty of scars, being in this domain.

01:18:25.280 --> 01:18:31.380
So if we would like to follow your route to try to understand the brain, what is Lars' law?

01:18:31.580 --> 01:18:35.400
What is Lars' law that we should follow to understand the brain?

01:18:38.010 --> 01:18:42.290
Okay, so,

01:18:42.290 --> 01:18:46.930
of course there is

01:18:46.930 --> 01:18:50.110
a kind of learning of

01:18:50.110 --> 01:18:53.550
the tools that we try

01:18:53.550 --> 01:18:57.570
to sharpen and to get better to explore.

01:18:57.570 --> 01:19:00.690
So what we bring to the table, and what

01:19:00.690 --> 01:19:03.350
every neuroscience lab brings to the table from a

01:19:03.350 --> 01:19:08.110
different, you know, background, is expertise

01:19:08.110 --> 01:19:11.250
with the tools, and making

01:19:11.250 --> 01:19:13.910
them sharper, being able to explore more and more

01:19:13.910 --> 01:19:19.470
cutting edge. For example, we do ultra-high-resolution, layer-specific

01:19:19.470 --> 01:19:26.870
fMRI, and use these techniques of brain reading for non-stimulated areas.

01:19:26.870 --> 01:19:31.510
This is something that we learned over time is possible, and there's

01:19:31.510 --> 01:19:33.210
a lot of methodological kind of questions.

01:19:33.910 --> 01:19:39.010
Now, once you have this kind of toolset that you think is very good to explore,

01:19:39.790 --> 01:19:46.830
with retinotopic mapping and individual subject analysis and all this, then it's important to,

01:19:49.670 --> 01:19:50.350
find

01:19:54.270 --> 01:19:59.850
relevant questions and explain them

01:20:00.030 --> 01:20:06.470
in such terminology that your corner field becomes bigger and wider and you

01:20:06.470 --> 01:20:11.990
find more and more overlap with other labs and…,

01:20:12.770 --> 01:20:15.890
This is a law. We have to be able to put it on.

01:20:16.250 --> 01:20:19.970
I don't know what you mean by law. I kind of understood what's my method.

01:20:20.170 --> 01:20:21.850
The law of effect, right?

01:20:25.191 --> 01:20:31.871
Or the law of how to do science. I'm talking more about how to do science.

01:20:32.151 --> 01:20:41.171
I think there's a lot that we start to learn by doing multidisciplinary,

01:20:41.991 --> 01:20:45.111
multiscale, multi-method science, which

01:20:45.111 --> 01:20:50.911
is a very difficult thing to do because everyone is leaving a little bit of

01:20:50.911 --> 01:20:56.791
their comfort zone and a little bit of terminology that is very well and precisely

01:20:56.791 --> 01:21:03.631
defined to open up to other fields in which different terminology is applied

01:21:03.631 --> 01:21:04.811
and different problems.

01:21:05.011 --> 01:21:11.111
But by leaving a little bit of this comfort zone, the more we do that,

01:21:11.211 --> 01:21:14.571
the more we find overlap, the bigger questions we can tackle.

01:21:15.311 --> 01:21:20.311
It's something that certainly we have all experienced in the Human Brain Project

01:21:20.311 --> 01:21:27.091
where we come together and start discussing our different approaches at which

01:21:27.091 --> 01:21:29.771
you see incredible work,

01:21:29.931 --> 01:21:34.831
but it is when the incredible works of different labs come together and find

01:21:34.831 --> 01:21:38.091
common language that you trigger an enormous amount of synergy.

01:21:38.551 --> 01:21:44.471
And that's something that's very hard work and it's very difficult and not so

01:21:44.471 --> 01:21:47.391
comfortable and it's something very, very important in science.

01:21:47.391 --> 01:21:51.091
I mean, I'm sometimes surprised when you, for whatever reason,

01:21:51.191 --> 01:21:54.671
step outside of your field and experience some other science,

01:21:54.891 --> 01:21:58.311
be it during a review process, during a conference.

01:21:58.371 --> 01:22:05.591
Usually a conference you wouldn't usually go to, and you realize how fragmented science is.

01:22:05.591 --> 01:22:13.291
And so it is important to converge, to find bigger stories, to not fight only

01:22:13.291 --> 01:22:15.811
the corner to sharpen your methods,

01:22:15.971 --> 01:22:20.611
but also to step outside and get the bigger picture, how it converges and how it fits together.

01:22:20.771 --> 01:22:27.011
And there's a lot of synergy for many parts of science where I find it extremely exciting. Mm-hmm.

01:22:27.640 --> 01:22:32.580
That's a bit in the tradition of Ortega y Gasset, right? Who spoke of counteracting

01:22:32.580 --> 01:22:34.660
the barbarism of specialization.

01:22:34.900 --> 01:22:37.740
So something he was saying, don't get sucked into the specialization,

01:22:37.820 --> 01:22:40.680
keep an open mind, right? Look at the other surrounding domains.

01:22:41.160 --> 01:22:47.680
Yes, that's part of it. I'm also a fan of using your methods very precisely.

01:22:47.820 --> 01:22:56.060
I think brain imaging has been a bit scarred by rapid exploitation of stories.

01:22:56.060 --> 01:22:58.800
And sometimes it has a bad image.

01:22:58.980 --> 01:23:04.720
People have seen and believed images because they were on the brain and they

01:23:04.720 --> 01:23:10.300
looked so scientific, and the stories connected to those were not always

01:23:10.300 --> 01:23:13.180
linked to very hard and precise science.

01:23:13.180 --> 01:23:17.560
And so I think it's very important to gain back some trust there.

01:23:17.680 --> 01:23:23.760
And, you know, I think we push the limits in this a lot in our lab. It's a lot of work.

01:23:23.860 --> 01:23:27.480
And I think others are doing this in their field as well.

01:23:27.480 --> 01:23:32.040
But then there's a lot of reward by opening up a little bit and seeing these

01:23:32.040 --> 01:23:37.500
discussions over dinner and lunchtime when everyone is suddenly starting to

01:23:37.500 --> 01:23:41.360
discuss other things outside of their scientific expertise.

01:23:42.440 --> 01:23:46.720
There's a lot of, as I said, synergy and new ideas and converging ideas, I think.

01:23:46.980 --> 01:23:49.380
Especially now that we're able to have imaging that might reflect activity

01:23:49.380 --> 01:23:50.800
of astrocytes instead of neurons.

01:23:53.320 --> 01:23:56.740
But then the last question is, you know,

01:23:56.740 --> 01:23:59.600
the last time that Tony bought me a beer or paid for

01:23:59.600 --> 01:24:02.680
a beer, it's a long time ago, and

01:24:02.680 --> 01:24:05.540
this has to do with the fact that he lives actually in for

01:24:05.540 --> 01:24:11.360
a long time so you have me virtually neighbors right you're in glasgow not sure

01:24:11.360 --> 01:24:14.880
you're over that but for that reason tony will come visit you in four years

01:24:14.880 --> 01:24:18.760
in Glasgow, assuming we are still there, and I'm sure you will be doing great

01:24:18.760 --> 01:24:24.980
work, and he's going to just come with a notebook to check whether you have confirmed or falsified

01:24:25.957 --> 01:24:28.657
a specific hypothesis that you're going to share with us today.

01:24:28.797 --> 01:24:34.377
So what's the hypothesis, the most critical hypothesis in your research program

01:24:34.377 --> 01:24:37.737
that you want to see tested in this four-year timeframe?

01:24:38.697 --> 01:24:42.317
From now on in four years, what I want to see tested… So you're going to have

01:24:42.317 --> 01:24:45.117
a shifting time window from now four years.

01:24:45.337 --> 01:24:47.257
I'm not going to give away my best ideas.

01:24:48.377 --> 01:24:49.957
You just have to make a prediction.

01:24:53.377 --> 01:24:57.617
What will happen in my field or in other fields I think we'll see a lot of,

01:24:58.757 --> 01:25:04.117
he's coming to your lab to check a specific so you're going to say and you will

01:25:04.117 --> 01:25:11.297
come as well and you will see that you'll get more than a beer we are very generous

01:25:11.297 --> 01:25:13.637
in Glasgow, we don't like Edinburgh,

01:25:15.157 --> 01:25:16.937
we don't have the same No,

01:25:18.057 --> 01:25:22.517
no So you're up for a prediction error there. Exactly.

01:25:25.457 --> 01:25:28.577
No, I don't have a good prediction in this.

01:25:28.717 --> 01:25:33.817
I mean, what I'm most intrigued by, I think maybe four years is not going to be enough,

01:25:33.917 --> 01:25:45.957
but I want to see the double coding of internal models that are about current situations.

01:25:46.817 --> 01:25:50.377
They are predictive about the scene and

01:25:50.377 --> 01:25:56.577
the mental models about something else being simultaneously processed in our

01:25:56.577 --> 01:26:01.877
brains which we know and we know in which areas more or less but we don't know

01:26:01.877 --> 01:26:08.117
how these codes coexist without interfering with one another. Now, having layer

01:26:09.429 --> 01:26:12.269
resolution fMRI, I believe we might

01:26:12.269 --> 01:26:15.749
be able to find different codes simultaneously in

01:26:15.749 --> 01:26:19.349
these different areas, in the different layers. And I

01:26:19.349 --> 01:26:22.709
have to disagree with the remark that you made earlier about the differences

01:26:22.709 --> 01:26:27.109
in the layers, that there aren't differences in the layers. I think there

01:26:27.109 --> 01:26:30.649
are. You know, I said that it's more gradual. It's more gradual, yeah. It's

01:26:30.649 --> 01:26:36.749
not as discrete as these models. Yeah. So we have some new data in which we

01:26:36.749 --> 01:26:39.009
have visual illusions that are

01:26:39.429 --> 01:26:42.469
explained away, like motion-induced blindness, in which you

01:26:42.469 --> 01:26:45.409
see a different code is still there, but it's

01:26:45.409 --> 01:26:48.329
not conscious. So there's more to come from this.

01:26:48.329 --> 01:26:51.389
But the double coding of

01:26:51.389 --> 01:26:55.069
mental imagery and visual predictions,

01:26:55.069 --> 01:27:01.109
I think, is something that I'm interested in. It must come at some level,

01:27:01.109 --> 01:27:06.909
and we might find some ways in which we can see these double codes, some

01:27:06.909 --> 01:27:11.789
for the internal models that are currently happening and then other ones that are counterfactual.

01:27:11.929 --> 01:27:17.989
And it's something that maybe human fMRI or human research is particularly well-tuned

01:27:17.989 --> 01:27:22.209
to because these kind of experiments are relatively difficult to do in animals.

01:27:22.449 --> 01:27:30.269
You would need to instruct them to do a visual process and simultaneously do a mental imagery task.

01:27:30.509 --> 01:27:35.629
It's not totally impossible, but that's something I think I want to have more

01:27:35.629 --> 01:27:38.549
knowledge about. How do you have these two codes simultaneously?

01:27:38.949 --> 01:27:42.149
That will be a long discussion over a lot of beers. Lars Muckli,

01:27:42.229 --> 01:27:43.429
thank you very much for this conversation.

01:27:44.029 --> 01:27:46.309
Thank you. Thanks for the invitation. It was a pleasure.

01:27:49.209 --> 01:27:54.889
The CSN podcast was produced by the Convergent Science Network of Biomimetics

01:27:54.889 --> 01:28:01.369
and Biohybrid Systems, a project funded by the European 7th Research Framework Program.

01:28:02.829 --> 01:28:08.149
For more interviews, recorded lectures, or upcoming conferences in the field

01:28:08.149 --> 01:28:14.389
of biomimetics and biohybrid systems, go to csnnetwork.eu.

01:28:14.320 --> 01:28:22.480
Music.

01:28:15.209 --> 01:28:16.569
And thank you for listening.