WEBVTT

00:00:00.017 --> 00:00:05.657
All right. So this is Tony Prescott and Paul Verschure talking with Michael

00:00:05.657 --> 00:00:14.377
Arbib after his presentation at the Barcelona Brain, Cognition and Technology Summer School.

00:00:15.677 --> 00:00:20.737
And we want to revisit some of the main themes that Michael has been talking

00:00:20.737 --> 00:00:25.577
about to us. So, Michael, would you like to give a short summary in a few words?

00:00:25.617 --> 00:00:29.917
What were a few of the key messages in your two lectures?

00:00:32.177 --> 00:00:39.097
Well, for me, the grounding interest has been how vision is related to action.

00:00:39.977 --> 00:00:45.977
And for that, I've been looking at two different approaches and trying to integrate them.

00:00:46.017 --> 00:00:49.437
One is what I call the schema-based approach, which is to try and take an overall

00:00:49.437 --> 00:00:54.937
behavior and think about what processes must interact in a parallel and distributed

00:00:54.937 --> 00:00:57.997
way with each other to explain that behavior.

00:00:58.437 --> 00:01:04.337
And then that's balanced by how could those schemas, as those units are called,

00:01:04.417 --> 00:01:07.777
play out over particular neural networks of the brain.

00:01:08.617 --> 00:01:13.857
And of course, sometimes the original schema model dies because it's not consistent

00:01:13.857 --> 00:01:17.057
with the available neurophysiology, but there's a loop then of explanation.

00:01:17.057 --> 00:01:25.477
And the other part is that at any time, we, I claim, cannot model every detail of the brain.

00:01:25.777 --> 00:01:31.077
So we're always making selections as to which brain regions we will implicate in our models.

00:01:31.437 --> 00:01:35.957
At what level of detail will we look at those particular brain regions?

00:01:36.217 --> 00:01:41.557
And then as time goes by, we learn which details have to be added, which can be ignored.

00:01:41.557 --> 00:01:47.657
So I looked first at the control of rapid eye movements, saccadic eye movements,

00:01:47.997 --> 00:01:54.957
and stressed that below the sort of standard brain, the cortical structures,

00:01:55.417 --> 00:02:00.437
there is the brain stem, the superior colliculus, which can take visual input

00:02:00.437 --> 00:02:01.997
and control these movements.

00:02:01.997 --> 00:02:05.377
But once we get into interesting things like, don't look now,

00:02:05.477 --> 00:02:10.437
but you can look later, or you just heard two noises, look towards the first,

00:02:10.537 --> 00:02:14.757
then towards the second, where you have to bring in memory and sequencing of

00:02:14.757 --> 00:02:16.817
actions, then you have to bring in cortical structures.

00:02:18.502 --> 00:02:21.122
And then you get this balance between the back of the brain,

00:02:21.182 --> 00:02:24.562
the parietal system that seems to be saying, what do I need to pay attention

00:02:24.562 --> 00:02:26.942
to that's relevant to my action?

00:02:27.062 --> 00:02:30.242
And the front of the brain that's saying, well, what actions should I do?

00:02:30.702 --> 00:02:34.122
And then we bring in another part of the brain called the basal ganglia that

00:02:34.122 --> 00:02:40.342
do the scheduling of these actions. So that was the framework there.

00:02:40.482 --> 00:02:45.122
And then I moved on to another system where, again, we have this interaction

00:02:45.122 --> 00:02:52.082
of prefrontal and parietal systems, namely the visual control of hand movements.

00:02:52.962 --> 00:02:58.362
Then I reported that my colleague Giacomo Rizzolatti and his group at Parma

00:02:58.362 --> 00:03:02.962
had made a discovery that within the premotor area involved in hand movements,

00:03:03.162 --> 00:03:05.042
there was a subset called mirror neurons,

00:03:05.282 --> 00:03:12.962
which had this amazing property that they were active not only during the animal's

00:03:12.962 --> 00:03:19.122
execution of particular hand movements, but also when he recognized others' hand movements.

00:03:19.862 --> 00:03:27.302
And then that suddenly got interesting when we turned to human brain imaging

00:03:27.302 --> 00:03:33.242
and said, well, we can't monitor individual mirror neurons in the human the way we can in the monkey,

00:03:33.322 --> 00:03:38.462
but at least we can look for a brain region that lights up in a way that indicates

00:03:38.462 --> 00:03:40.462
it might contain the mirror system.

00:03:40.742 --> 00:03:50.242
This plus anatomical data converged to say that the area of mirror neurons seemed

00:03:50.242 --> 00:03:54.662
to exist in the human brain in what had been thought of as a speech area.

00:03:55.822 --> 00:03:59.122
What has speech got to do with recognition of hand movements? Well,

00:03:59.222 --> 00:04:00.662
we know there is sign language.

00:04:01.062 --> 00:04:05.002
Language can exist in the manual domain as well.

00:04:05.122 --> 00:04:09.242
This gets us into what is called the gestural origins theory of language

00:04:09.242 --> 00:04:14.062
that maybe, although for most of us speech is predominant, we all use co-speech

00:04:14.062 --> 00:04:17.862
gestures to embellish our speech.

00:04:18.502 --> 00:04:23.862
And so, I outlined a fairly elaborate,

00:04:24.122 --> 00:04:29.042
I would say, network of models rather than a single model for what might have

00:04:29.042 --> 00:04:33.942
been the relevant evolutionary changes from our common ancestor with the monkey 20 million years ago,

00:04:33.982 --> 00:04:38.002
through our common ancestor with the chimpanzee 7 million years ago,

00:04:38.002 --> 00:04:44.902
to build a system where the mirror neurons were still a core system,

00:04:45.002 --> 00:04:46.782
but we'd also gone beyond the mirror.

00:04:46.882 --> 00:04:51.122
How do we get from just recognizing actions to imitating novel actions?

00:04:51.402 --> 00:04:55.662
How can our use of actions to practical effect on objects

00:04:56.879 --> 00:05:01.159
provide the basis for pantomime and beginning to use hand movements for communication?

00:05:01.459 --> 00:05:07.359
What social interactions yield a system of conventionalized gestures rather than ad hoc pantomime?

00:05:07.479 --> 00:05:12.459
How does speech come into the picture so that we can move from purely gestural

00:05:12.459 --> 00:05:17.559
proto-sign to a proto-language that is in the spoken domain?

00:05:18.059 --> 00:05:24.619
And so there we got fairly elaborately into the back and forth between what

00:05:24.619 --> 00:05:26.939
happens in terms of biological evolution,

00:05:27.259 --> 00:05:33.899
opening up new possibilities for brain activity and the cultural historical

00:05:33.899 --> 00:05:38.899
development of the human species which finds new ways of exploiting the brain

00:05:38.899 --> 00:05:41.119
that were not exploited before.

00:05:41.419 --> 00:05:46.739
One of the concluding suggestions was to emphasize the notion that it's probably

00:05:46.739 --> 00:05:53.079
not the case that our brain evolved to give us language in the sense of a big

00:05:53.079 --> 00:05:55.419
lexicon, lots of grammatical rules and so on,

00:05:55.459 --> 00:06:00.839
but it rather evolved to allow us over many tens of millennia to discover more

00:06:00.839 --> 00:06:06.379
and more aspects of what now constitute what we take for granted as part of human language.

00:06:07.790 --> 00:06:11.390
That's a great summary, Michael. It's a good thing I was listening.

00:06:13.830 --> 00:06:21.330
But what was interesting is, what's now the role of schema theory in this second part?

00:06:22.210 --> 00:06:27.210
Should we consider these as two separate proposals, or did the schema theory,

00:06:27.330 --> 00:06:30.770
the idea of schemas that you have been pushing for quite some time,

00:06:30.930 --> 00:06:36.610
give you leverage to look at this mirroring system and to think about language?

00:06:36.610 --> 00:06:39.770
So what's the relationship between these two?

00:06:40.270 --> 00:06:43.410
Okay, well, two parts. When we were looking at the control of hand movements,

00:06:43.510 --> 00:06:48.530
then in fact, the idea was that we made a preliminary analysis: what do you have

00:06:48.530 --> 00:06:51.950
to notice about an object to be able to interact with it?

00:06:52.030 --> 00:06:55.610
So there are perceptual schemas for just the shape of the object.

00:06:56.490 --> 00:07:00.090
Because the details of that are going to be very important to the shaping of

00:07:00.090 --> 00:07:04.710
the hand, recognizing the location of the object, very important to how we move

00:07:04.710 --> 00:07:06.410
the arm to get the hand into place.

00:07:07.150 --> 00:07:14.670
So we had, therefore, the shape of the model was in terms of what are the perceptual

00:07:14.670 --> 00:07:19.370
schemas to know how to interact with the object, but also what are the perceptual

00:07:19.370 --> 00:07:20.590
schemas to know about the object.

00:07:21.010 --> 00:07:26.790
If we recognize something as a coffee mug, then we can call on knowledge about

00:07:26.790 --> 00:07:28.470
the use of the handle to lift it.

00:07:28.850 --> 00:07:32.210
Whereas if we had a nonsense object, we wouldn't be able to call on those more

00:07:32.210 --> 00:07:37.630
meaningful schemas, and there are some correlates of where in the brain the processes might occur.

00:07:37.830 --> 00:07:44.310
So our big analysis of visual control of grasping is essentially at the schema level,

00:07:44.430 --> 00:07:50.250
and then we said we can now go in, because we have recording data from the monkey,

00:07:50.350 --> 00:07:53.570
where we can begin to say, where are those schemas computed?

00:07:53.830 --> 00:07:57.590
How do we have to modify our understanding of those schemas to see how different

00:07:57.590 --> 00:07:59.830
brain regions must interact to support it?

00:08:00.550 --> 00:08:04.150
So in the case of visual control of grasping and then bringing in the mirror

00:08:04.150 --> 00:08:09.210
neurons, well, that's where we had very solid neurophysiological data to then

00:08:09.210 --> 00:08:11.230
reflect back into the schema-level theory.

00:08:12.535 --> 00:08:17.115
That's part one. But part two is when we turn to language, we essentially have

00:08:17.115 --> 00:08:20.695
zero in the way of cellular data.

00:08:20.875 --> 00:08:27.575
We are at the level of analogies from the monkey brain to the human brain and

00:08:27.575 --> 00:08:29.935
homologies from the monkey brain to the human brain.

00:08:30.795 --> 00:08:35.355
But our other data are just people have a brain lesion, the system doesn't work

00:08:35.355 --> 00:08:39.235
so well, or we do some brain imaging and part of the brain lights up.

00:08:39.295 --> 00:08:42.935
But the trouble with part of the brain lights up is that it's just giving you

00:08:42.935 --> 00:08:48.575
information that one part of the brain is perhaps more active in task A than

00:08:48.575 --> 00:08:52.595
in task B, but it doesn't rule it out as being vital for task B.

00:08:52.595 --> 00:09:01.355
So what we now develop, as it were, our informed data analysis, is at the level of

00:09:01.355 --> 00:09:03.175
schemas. How are the different processes?

00:09:03.275 --> 00:09:06.435
How do I know what a word is? That's a schema-level description.

00:09:06.555 --> 00:09:10.695
It's not yet a neuron-level description. But what we're hoping to do in future

00:09:10.695 --> 00:09:14.435
is to say that because of what we've learned from the monkey brain,

00:09:14.655 --> 00:09:21.615
we can make informed hypotheses about the circuitry in the human brain for which

00:09:21.615 --> 00:09:25.195
we don't have detailed neurophysiological recordings to come up with better

00:09:25.195 --> 00:09:26.215
and better neural models.

00:09:26.215 --> 00:09:31.475
So in the end, we can render a consistent understanding at the schema level

00:09:31.475 --> 00:09:37.235
and the neural level that embeds our understanding of hand movements for practical

00:09:37.235 --> 00:09:42.195
ends where we can share a lot with other creatures with this refined use of

00:09:42.195 --> 00:09:43.955
language, which is particularly human.

00:09:44.335 --> 00:09:50.455
But would you equate a schema level with a computational level or an algorithmic

00:09:50.455 --> 00:09:56.035
level, or how should I relate these levels of description, these constructs? Okay.

00:09:56.155 --> 00:10:02.655
So if we look at computers, there is a machine language, which is the basic

00:10:02.655 --> 00:10:03.915
language of zeros and ones.

00:10:04.215 --> 00:10:08.395
And then above that, there will be something like an assembly language,

00:10:08.515 --> 00:10:11.955
which allows you to say, well, how can I think of these patterns of zeros and

00:10:11.955 --> 00:10:17.595
ones as recognizing letters or symbols or patches of pixels on a graphic screen?

00:10:17.595 --> 00:10:23.075
But the language that people program in is a level up from that,

00:10:23.155 --> 00:10:28.315
something like Java or C++, which is using relatively high-level constructs,

00:10:28.315 --> 00:10:32.515
and they don't know actually how that plays out over the hardware.

00:10:32.775 --> 00:10:37.135
And then for most of us, we're at an even higher level where we just have an

00:10:37.135 --> 00:10:40.555
app which somebody else has programmed at that level.

00:10:40.555 --> 00:10:48.555
So schemas are probably describing computation at that level from the high-level

00:10:48.555 --> 00:10:51.855
programming language up to the app hierarchical levels there,

00:10:52.015 --> 00:10:54.355
and then the neurons are the computations that

00:10:55.899 --> 00:10:59.119
correspond to the machine code in the computer.

00:10:59.239 --> 00:11:06.539
So it's an intermediate to high-level description of how computations occur in the brain. Right.

00:11:07.499 --> 00:11:14.139
But, Tony, so you said that you don't want to model the brain in all this detail.

00:11:14.539 --> 00:11:19.959
Oh, I do, but I know I won't. Okay. So what are your criteria for deciding which

00:11:19.959 --> 00:11:23.159
phenomena in the brain are important for informing the models?

00:11:23.159 --> 00:11:26.619
How do you apply those criteria in the process of modeling?

00:11:26.799 --> 00:11:34.259
Is it that you look for some key aspects of data that you try and fit with

00:11:34.259 --> 00:11:38.839
your model and then extend what you're looking at to try and bring in more phenomena?

00:11:39.199 --> 00:11:44.399
Right. So we now know a lot of details about the synaptic structure of the brain.

00:11:44.479 --> 00:11:49.279
We know a lot of details about how different molecules provide the ability of

00:11:49.279 --> 00:11:55.359
a synapse to take signals from one neuron to another and apply learning rules and so on.

00:11:55.439 --> 00:12:00.999
So one could make a model which basically loses itself in just the details of one synapse.

00:12:01.559 --> 00:12:06.779
Or one could be a little simpler about the synapse and blow a whole supercomputer

00:12:06.779 --> 00:12:09.059
on just a few interacting neurons.

00:12:09.859 --> 00:12:14.959
And for some people, that's the career path. For me, it really is starting from this cognitive level,

00:12:15.039 --> 00:12:18.999
visual control of hand movements, visual perception, control

00:12:18.999 --> 00:12:22.179
of action, language. And so

00:12:22.179 --> 00:12:24.959
there I'm taking a sort of

00:12:24.959 --> 00:12:29.439
survey approach where I say: what is known at the neurophysiological level of

00:12:29.439 --> 00:12:34.379
correlates? What is known at the psychological level? How can I make a preliminary

00:12:34.379 --> 00:12:40.499
model, perhaps just using pure schemas, to make sense of the psychological data?

00:12:40.499 --> 00:12:44.619
Now, how can I constrain that to meet the neurophysiological data? Now,

00:12:44.739 --> 00:12:48.599
how can I refine those schemas so that they not only yield the behavior and

00:12:48.599 --> 00:12:54.179
how the behavior is damaged by lesions, but also can give me explanations for

00:12:54.179 --> 00:12:55.859
how individual cells are firing?

00:12:56.139 --> 00:12:59.259
And then the literature just keeps pouring in.

00:13:01.021 --> 00:13:04.121
I'm filtering, in a perhaps not very intelligent way, saying: oh,

00:13:04.121 --> 00:13:06.841
here's a new paper that looks really important; I have to be

00:13:06.841 --> 00:13:09.701
able to either show my model can explain it

00:13:09.701 --> 00:13:12.781
or expand my model to be able to address those data.

00:13:12.781 --> 00:13:16.081
Here's something else that to another

00:13:16.081 --> 00:13:19.221
person might appear very important, but I'm finite,

00:13:19.221 --> 00:13:21.841
so I'll hope they'll address it and I'll have

00:13:21.841 --> 00:13:25.161
to leave that out. So it's opportunistic once

00:13:25.161 --> 00:13:28.081
the first set of big models is in

00:13:28.081 --> 00:13:31.321
place, I think. So you're interested first in these sorts of cognitive behavioral

00:13:31.321 --> 00:13:35.521
phenomena, and your goal is a decomposition of that

00:13:35.521 --> 00:13:39.381
task that you observe the person or monkey doing into computational

00:13:39.381 --> 00:13:44.861
elements that you call schemas, and you're not necessarily, at that first stage,

00:13:44.861 --> 00:13:49.181
particularly concerned about mapping the schemas onto the brain. Is that... No, I

00:13:49.181 --> 00:13:53.441
would say that, no, I'm very much engaged in mapping it onto the brain. But what

00:13:53.441 --> 00:13:57.841
I'm suggesting is, if you just look at the neurons and try to make sense of them,

00:13:58.961 --> 00:14:03.361
you may not succeed. So by starting with a hypothesis about what the schemas

00:14:03.361 --> 00:14:07.261
are, you're not saying how is this complex thing, being able to speak English,

00:14:08.101 --> 00:14:11.261
mapped to neurons. You're saying, here is:

00:14:11.261 --> 00:14:14.401
how do we recognize a particular auditory profile

00:14:14.401 --> 00:14:17.401
as a word, for example? Then that's a tractable problem,

00:14:17.401 --> 00:14:23.021
and recognizing the words of a vocabulary would be schemas within a

00:14:23.021 --> 00:14:28.161
language understanding system, for example. So the notion is that you go top-down

00:14:28.161 --> 00:14:32.661
from the psychology and the behavior to negotiate what seem to be the necessary

00:14:33.161 --> 00:14:34.621
intermediate-level processes.

00:14:35.321 --> 00:14:44.381
Then you use whatever data are available to say, but I'm not happy as a neurocomputational

00:14:44.381 --> 00:14:48.661
type to just say I've got schemas as abstract computational processes.

00:14:48.861 --> 00:14:53.041
That might be enough if I'm building a robot to say, okay, that's a good architecture for the robot.

00:14:53.281 --> 00:14:56.621
But if I really want to understand the human brain, as indeed I do,

00:14:56.621 --> 00:15:02.161
then I don't rest with the schema analysis if there are data available which

00:15:02.161 --> 00:15:08.021
will let me be more explicit about how plausible neural networks in the brain

00:15:08.021 --> 00:15:11.321
will actually implement those schemas. And that means the original schema-level

00:15:11.321 --> 00:15:15.721
model may get restructured to accommodate more neural level data.

00:15:15.901 --> 00:15:19.801
In a comment on another talk you mentioned that you weren't entirely happy with

00:15:19.801 --> 00:15:23.821
the split proposed by David Marr, which is algorithms and the implementations.

00:15:24.101 --> 00:15:28.241
Your suggestion was that if you want to understand the brain algorithms,

00:15:28.821 --> 00:15:31.361
then you might want to come through the implementation.

00:15:31.761 --> 00:15:35.701
So that's what we're talking about. It's looking at what we can see about the

00:15:35.701 --> 00:15:41.761
implementation in the decomposition of the brain and then say your schema-level

00:15:41.761 --> 00:15:46.081
system sounds like the algorithm, but it is informed by... Right.

00:15:46.181 --> 00:15:50.681
So I think the problem was that a lot of David Marr's writing and a lot of

00:15:50.681 --> 00:15:54.761
the way people quote his statement is the idea I can specify the problem,

00:15:54.921 --> 00:15:58.021
then I can come up with the algorithm, and then I can implement it.

00:15:58.735 --> 00:16:02.635
The point is that if we want to understand the brain, you've really got to look

00:16:02.635 --> 00:16:09.155
at a dynamic loop where, yes, you already understand the behavior and have the top level fixed,

00:16:09.375 --> 00:16:13.995
but the algorithm is going to depend so crucially on whether you're using neural

00:16:13.995 --> 00:16:17.435
nets or serial computers that that's an ongoing loop.

00:16:17.515 --> 00:16:21.075
I will have a schema model as an initial algorithm.

00:16:21.295 --> 00:16:24.715
Then I will see how well I can implement it. The feedback from that may change

00:16:24.715 --> 00:16:27.215
my schema level model so I have a loop of understanding.

00:16:27.215 --> 00:16:33.215
But in that approach, you also were using this concept of causal completeness

00:16:33.215 --> 00:16:37.935
to guide your choices with respect to the constraints you want to consider.

00:16:38.315 --> 00:16:45.855
So, but how complete can you actually be in reality and in defining these kinds of models?

00:16:46.115 --> 00:16:51.035
So, is causal completeness a hope or a reality of building? Oh, no, it's a reality.

00:16:51.275 --> 00:16:55.135
I mean, the point of a computational model is that it's causally complete in

00:16:55.135 --> 00:17:00.495
the sense that when you provide the appropriate input and you build on the appropriate

00:17:00.495 --> 00:17:04.455
memories, you get the observed behavior.

00:17:05.555 --> 00:17:12.935
But if I'm causally complete with respect to, let's say, an analysis of saccadic

00:17:12.935 --> 00:17:18.055
eye movements, that same model is not going to be causally complete with respect

00:17:18.055 --> 00:17:20.155
to arm movements, let alone language.

00:17:20.435 --> 00:17:24.035
So it's not going to be causally complete for every possible behavior.

00:17:24.195 --> 00:17:27.175
It's causally complete with respect to that behavior. Again,

00:17:27.335 --> 00:17:32.315
the level of description that you start with will determine that. If I'm looking at

00:17:32.315 --> 00:17:33.655
subtle learning effects,

00:17:33.995 --> 00:17:39.915
then I may have to go back and iterate the model to include data about synaptic

00:17:39.915 --> 00:17:46.235
plasticity to understand the timing of learning so that the causal completeness

00:17:46.235 --> 00:17:49.015
is not saying I have covered everything in the universe.

00:17:49.155 --> 00:17:53.415
The causal completeness is saying that where the experimentalist could just

00:17:53.415 --> 00:17:56.195
go in and say, I'm monitoring activity in a part of the brain,

00:17:56.275 --> 00:17:59.535
and it correlates with some particular behavior.

00:17:59.755 --> 00:18:04.335
He doesn't have to say how the sensory stimuli got to the point where they could cause that.

00:18:04.635 --> 00:18:09.855
He doesn't have to specify how that activity could get to the muscles to yield the overt behavior.

00:18:10.435 --> 00:18:13.335
He's just saying, here's a fascinating correlate. I have to say

00:18:14.903 --> 00:18:18.923
the model is causally complete in the sense that if I stimulate my model with

00:18:18.923 --> 00:18:20.843
a representation of the sensory stimuli,

00:18:21.123 --> 00:18:24.763
then my representation of those cells will fire in that way,

00:18:24.823 --> 00:18:30.663
and I have a network which will show how that emerges in the behavior.

00:18:30.843 --> 00:18:35.183
So it's causally complete with respect to the level of description.

00:18:35.183 --> 00:18:40.743
Right, and also then given the assumptions I have made about the primitive elements

00:18:40.743 --> 00:18:45.183
that are playing the key role in my model, that you say, okay,

00:18:45.243 --> 00:18:47.043
below that level, I don't need to go.

00:18:47.583 --> 00:18:53.183
And again, the point I made yesterday was that in some learning models,

00:18:53.403 --> 00:18:57.123
we have the idea that we have an input signal and a training signal,

00:18:57.143 --> 00:19:02.143
which tells you to remember that input or to change your response to that input.

00:19:02.343 --> 00:19:04.783
And a lot of those models are at the event level.

00:19:05.063 --> 00:19:09.223
So you say, here is the input at this event, here is the training signal at

00:19:09.223 --> 00:19:10.303
this event, what's the output?

00:19:10.303 --> 00:19:13.043
And there I was saying, in some cases in the real brain, though,

00:19:13.623 --> 00:19:22.423
the output might be planning a movement, and the actual work by the muscles

00:19:22.423 --> 00:19:25.403
follows later, and then the observable effect of that results.

00:19:25.483 --> 00:19:30.763
So the training signal might well be 200 milliseconds later than the brain activity.

00:19:30.923 --> 00:19:32.283
And then that forces me to say,

00:19:32.383 --> 00:19:37.223
what is going on in the brain that bridges across that fifth of a second?

00:19:37.223 --> 00:19:42.603
So there I was forced to look at details of synaptic function that were not

00:19:42.603 --> 00:19:46.523
engaged in the initial model, which was just event by event,

00:19:46.663 --> 00:19:49.023
rather than looking at the actual time course of action.

00:19:54.663 --> 00:20:00.343
Coming on to your later work, [inaudible] so can you

00:20:00.343 --> 00:20:05.903
just clarify why you moved into this area of the evolution of language from

00:20:05.903 --> 00:20:10.003
trying to understand how the brain implements some of these very important,

00:20:11.403 --> 00:20:16.343
actions and cognitions and perceptions that you've... Now, it's actually a different,

00:20:16.343 --> 00:20:18.203
it's not that I moved into it. But firstly,

00:20:18.803 --> 00:20:23.903
as a schoolboy, I was very much intrigued by the history of the English language.

00:20:24.823 --> 00:20:31.523
So that has always been an interest. Then in my mid-career at the University

00:20:31.523 --> 00:20:36.603
of Massachusetts in the 80s, I helped found the Cognitive Science Program,

00:20:37.083 --> 00:20:39.763
where I worked with the linguists.

00:20:39.763 --> 00:20:45.103
So therefore, the forging of connections between my work on visual control of

00:20:45.103 --> 00:20:49.323
action and their work on language became a very important topic,

00:20:49.383 --> 00:20:51.823
and we had about four PhD theses on that.

00:20:51.903 --> 00:20:54.403
Then I moved to the University of Southern California in 1986.

00:20:56.836 --> 00:21:01.756
I got busy with other things. And then when Rizzolatti's group discovered

00:21:01.756 --> 00:21:06.496
the mirror neurons in the early 90s, I was already working with that group on

00:21:06.496 --> 00:21:08.136
visual control of hand movements.

00:21:08.616 --> 00:21:15.536
And my group at USC did brain imaging, which established that the activity that

00:21:15.536 --> 00:21:19.976
looked like mirror neurons in the human brain was in Broca's area, a speech

00:21:19.976 --> 00:21:23.116
area, which we now understand more as a language area.

00:21:23.356 --> 00:21:28.916
That provided the path after a 10-year break back into the study of language

00:21:28.916 --> 00:21:34.976
because now there was a really strong connection that exploited what I had said before.

00:21:35.336 --> 00:21:42.696
In fact, from 1979 to 1985, I was publishing in the area of models of language.

00:21:43.116 --> 00:21:48.216
The work you're doing now is always anticipating the models.

00:21:48.356 --> 00:21:52.436
The modeling is pulling behind in some sense, or has it caught up with where

00:21:52.436 --> 00:21:56.556
you are theoretically in terms of your ideas about language evolution?

00:21:57.276 --> 00:22:00.996
Or is that the way it's always been, that you've always had the theories first,

00:22:01.476 --> 00:22:03.176
and then the model followed on?

00:22:04.416 --> 00:22:10.976
No, I think, again, it's a loop that at times you raise questions and form hypotheses,

00:22:11.996 --> 00:22:16.496
and then look for data to test the resulting model.

00:22:16.496 --> 00:22:21.676
At other times, you're confronted with a body of data and you're trying to make sense of it.

00:22:22.116 --> 00:22:29.176
In the case of language, I think the state of neurolinguistics is very fragmentary

00:22:29.176 --> 00:22:30.956
from a computational point of view.

00:22:30.956 --> 00:22:39.116
So here we are establishing a range of models that we hope will begin to fill in the landscape.

00:22:39.916 --> 00:22:43.376
But meanwhile, the overarching model I have is a conceptual model,

00:22:43.436 --> 00:22:44.556
not an implemented model.

00:22:45.556 --> 00:22:50.436
But it is already engaged in many conversations, talking to primatologists.

00:22:50.496 --> 00:22:53.656
What do we know about communication in monkeys and apes?

00:22:54.216 --> 00:22:59.156
How does that make one set of hypotheses more plausible than another at the conceptual level?

00:23:00.956 --> 00:23:04.516
Looking at people working with sign languages,

00:23:06.001 --> 00:23:11.641
how do we change our view of what language is because we realize it's not speech,

00:23:11.901 --> 00:23:13.461
it's a more general capability?

00:23:14.841 --> 00:23:18.281
And then just getting into debates about what is the nature of language.

00:23:18.621 --> 00:23:23.521
We have Noam Chomsky on the one hand looking at syntax as an abstract structure.

00:23:24.341 --> 00:23:27.981
There are other people who are looking at language as a flexible means of communication.

00:23:27.981 --> 00:23:33.921
So I find myself moving over there to saying, how do I capture certain aspects

00:23:33.921 --> 00:23:38.861
of that more action-oriented approach to language that I can now bring back

00:23:38.861 --> 00:23:42.401
to the brain in a way that I think is consistent with my other work on the brain?

00:23:42.581 --> 00:23:48.501
So there's a great deal of work going on to create this framework in which some

00:23:48.501 --> 00:23:53.801
models already exist, but in which we're also defining spaces for future modeling.

00:23:53.801 --> 00:23:57.001
But if I look at this from the outside,

00:23:57.221 --> 00:24:01.721
that does give the impression of a discontinuity from a very much action-oriented

00:24:01.721 --> 00:24:07.901
view with the schemas to this view of the mirror system and language that actually

00:24:07.901 --> 00:24:13.141
seems to start with the primitive element of gestures or any communicative actions.

00:24:13.141 --> 00:24:19.201
So it seems to me that there's a discontinuity between actions as in relation

00:24:19.201 --> 00:24:24.481
to objects in the world and now communicative gestures as the starting point

00:24:24.481 --> 00:24:27.441
of developing language. So how should I relate these two?

00:24:27.761 --> 00:24:33.341
Well, my joke is that my work on language evolution is to replace one big miracle

00:24:33.341 --> 00:24:34.941
by a series of small miracles.

00:24:34.941 --> 00:24:39.841
And so what I'm trying to say is that getting all the way from a monkey-like

00:24:39.841 --> 00:24:43.781
brain, that's a simplification, but let's say from whatever our common ancestor

00:24:43.781 --> 00:24:47.261
had 20 million years ago to today in one leap is too much.

00:24:47.301 --> 00:24:51.261
But if I can break it into, okay, getting from recognizing others' actions

00:24:52.551 --> 00:24:57.751
as ones similar to those they already have, to understanding others' actions as

00:24:57.751 --> 00:25:02.431
a means to imitating them, to being able to understand complex actions in terms

00:25:02.431 --> 00:25:04.351
of the structure of goals and movements,

00:25:04.631 --> 00:25:11.391
then these are reasonable miracles in which to address specific modeling, as we are now doing.

00:25:11.391 --> 00:25:20.271
And then again, the transition from the use of these actions for praxis to others

00:25:20.271 --> 00:25:26.131
imitating them to get praxis, to being able to build pantomime is another small miracle.

00:25:26.131 --> 00:25:31.171
And then once I get to pantomime, then the social ritualization of those into

00:25:31.171 --> 00:25:35.091
symbolic gestures is, again, a meaningful step.

00:25:35.311 --> 00:25:39.851
And then once I've got that use of arbitrary gestures for communication,

00:25:40.071 --> 00:25:44.371
bringing the vocal apparatus back into play is a reasonable thing.

00:25:44.371 --> 00:25:47.551
So in some sense, what I'm doing is, as I say, I'm breaking it into miracles

00:25:47.551 --> 00:25:52.111
that are small enough that, yes, they're discontinuities. Evolution is a discontinuity.

00:25:53.011 --> 00:25:58.491
We have forelimbs. Birds have wings. We have a common ancestor.

00:25:58.751 --> 00:26:00.831
So there's always going to be divergence points.

00:26:01.351 --> 00:26:05.891
But the issue is, how can we define that in a way that it becomes plausible

00:26:05.891 --> 00:26:11.051
that a relatively small suite of genetic changes could support that divergence?

00:26:11.051 --> 00:26:18.071
So, at the moment, I'm trying to consolidate data from many different disciplines

00:26:18.071 --> 00:26:24.371
to come up with what I think is a plausible set of bifurcations in our evolutionary history,

00:26:24.491 --> 00:26:26.911
and then to build before and after models.

00:26:26.911 --> 00:26:31.111
And say, this is what the brain was like before, this is what the brain was like after.

00:26:31.291 --> 00:26:32.991
And then hopefully, in the end,

00:26:34.051 --> 00:26:38.271
evolving work in genetics and molecular biology will catch up and say,

00:26:39.191 --> 00:26:45.531
we can begin to understand the genetic correlates of the changes in neural architecture,

00:26:45.711 --> 00:26:50.691
brain architecture, that you've posited to provide that set of stepping stones

00:26:50.691 --> 00:26:55.151
from the common ancestor of 20 million years ago to the language-ready brain of the human.

00:26:55.151 --> 00:27:04.011
So to get to a conclusion of our short interview, I have two generic questions.

00:27:04.451 --> 00:27:07.771
So you have been in this field now for a really long time.

00:27:07.851 --> 00:27:13.791
You trained with some of the heroes of the cybernetic age, Wiener and McCulloch.

00:27:14.991 --> 00:27:20.551
So in your long experience in this field, what's the lore of Arbib that we,

00:27:20.651 --> 00:27:25.011
sort of younger, representatives of the younger generation should take on board

00:27:25.011 --> 00:27:29.151
to actually help us understand the brain and cognition? Yeah.

00:27:31.081 --> 00:27:39.481
Well, I think schema theory has been a very minor theme in the field,

00:27:39.581 --> 00:27:44.141
and I have a certain pride of ownership, and I think it should be a bigger theme.

00:27:44.241 --> 00:27:48.581
The idea that there is a functional decomposition to be placed in conversation

00:27:48.581 --> 00:27:50.041
with a neural decomposition.

00:27:50.741 --> 00:27:53.661
Often we'll get something like, well, I want to look at vision,

00:27:53.701 --> 00:27:59.401
and oh, I want to look at stereo, and then I jump immediately to the neural networks.

00:27:59.401 --> 00:28:06.041
And the idea of thinking about stereo in terms of how it contributes to an overall

00:28:06.041 --> 00:28:13.221
set of interacting schemas for vision in the service of planning behavior gets lost.

00:28:13.221 --> 00:28:20.421
So I think my big lesson is I want people to think more about how to develop

00:28:20.421 --> 00:28:24.581
schema theory as a high-level, if you will, brain programming language,

00:28:24.701 --> 00:28:30.261
which can then be either played out on circuitry for the design of robots or

00:28:30.261 --> 00:28:33.581
put in conversation with lesion data,

00:28:33.721 --> 00:28:38.801
imaging data, neurophysiological data to try and move into very complex systems

00:28:38.801 --> 00:28:42.021
where approaching it from the level of,

00:28:42.021 --> 00:28:44.941
here's a lot of synapses or here's a lot of neurons is doomed to

00:28:44.941 --> 00:28:47.901
failure. And I think, as we go into

00:28:47.901 --> 00:28:52.021
this world of really large integrated

00:28:52.021 --> 00:28:55.081
systems with complex suites of behavior, this

00:28:55.081 --> 00:28:58.241
will become more and more necessary than it has

00:28:58.241 --> 00:29:01.081
been in the past. Okay, and then the concluding question

00:29:01.081 --> 00:29:06.141
for me would be, if we're going to meet up again five years from now, you know,

00:29:06.141 --> 00:29:10.741
in science it's all about prediction. So what's the prediction I can hold

00:29:10.741 --> 00:29:15.581
you to five years from now, when we can find out whether it was true or false?

00:29:15.581 --> 00:29:19.441
What is one prediction you want to stick your neck out for today?

00:29:22.592 --> 00:29:26.252
I don't really have one, in the sense of a limited prediction.

00:29:26.512 --> 00:29:30.452
It's rather what I've laid out, especially in my second talk here,

00:29:30.532 --> 00:29:38.992
has been a framework for the study of how the brain supports language that is integrated with,

00:29:39.152 --> 00:29:46.332
despite evolutionary divergences, the way in which sensory data are processed

00:29:46.332 --> 00:29:50.032
to perceive our relation with the world and move us on.

00:29:50.952 --> 00:29:59.472
It also has emphasized the way that neuroscience has to go from a focus on the

00:29:59.472 --> 00:30:04.232
isolated individual responding to sensory data with a course of action to more

00:30:04.232 --> 00:30:08.312
and more thought about social interactions, a trend that's already started.

00:30:08.992 --> 00:30:11.912
So my prediction is a very wishy-washy prediction.

00:30:11.912 --> 00:30:19.212
It is that five years from now, the framework that I have presented to this

00:30:19.212 --> 00:30:23.352
point will still be seen as correct in its overall structure,

00:30:23.492 --> 00:30:26.712
but there will be a lot of very specific models,

00:30:27.372 --> 00:30:31.652
embedded within that structure, which will change some of the details,

00:30:31.852 --> 00:30:35.012
but not change the overall conceptual framework. Very good.

00:30:35.452 --> 00:30:38.872
Michael Arbib, thank you very much for joining us, and we hope to see you

00:30:38.872 --> 00:30:40.572
back very soon. I look forward to it.