WEBVTT

00:00:03.597 --> 00:00:10.517
This is the Convergent Science Network podcast. Leading researchers in the domain

00:00:10.517 --> 00:00:16.797
of neuroscience, brain theory and technology are interviewed by Paul Verschure and Tony Prescott.

00:00:19.497 --> 00:00:23.137
This is Paul Verschure with the Convergent Science Network podcast.

00:00:24.757 --> 00:00:28.997
And today I have a conversation with Dana Ballard, who is also a speaker in

00:00:28.997 --> 00:00:30.337
our summer school here in Barcelona.

00:00:32.257 --> 00:00:36.297
And Dana, in your presentation, which focused very much on vision,

00:00:36.897 --> 00:00:41.457
you started out by making a pretty strong point about why, let's say,

00:00:41.597 --> 00:00:46.697
the standard notion of feed-forward saliency maps doesn't really do the job in vision.

00:00:47.397 --> 00:00:51.877
So what's the problem with sort of the standard notion of a saliency map and

00:00:51.877 --> 00:00:53.197
why is it not sufficient?

00:00:55.017 --> 00:00:58.317
Wow, Paul. Well, first, let me thank you for inviting me here.

00:00:58.397 --> 00:01:01.977
It's a lot of fun, and the establishment is very new and beautiful.

00:01:02.237 --> 00:01:07.917
But as for your question, the problem,

00:01:08.057 --> 00:01:11.617
people who aren't vision scientists take vision for granted,

00:01:11.637 --> 00:01:17.257
and every person has the sensation of vision of being a person on the street

00:01:17.257 --> 00:01:21.617
and being in the middle of a very beautiful three-dimensional movie.

00:01:21.617 --> 00:01:25.057
So the act of seeing is completely taken for granted.

00:01:25.097 --> 00:01:30.817
Whereas, as a scientist, if you start to examine the brain and how vision works,

00:01:31.117 --> 00:01:36.017
how the images are gathered and sent to the brain, then you get a completely different picture.

00:01:36.157 --> 00:01:40.297
And it's almost an untenable picture from the standpoint of everyday experience.

00:01:40.877 --> 00:01:45.977
And so the hardest thing to come to grips with, as I mentioned in my talk,

00:01:46.157 --> 00:01:51.937
is that the binocular visual system only has good resolution in a tiny,

00:01:51.997 --> 00:01:55.317
tiny place in the middle of the gaze vector. So it's one degree.

00:01:55.777 --> 00:02:01.037
So if a person holds his arm at arm's length, the width of the thumb is the

00:02:01.037 --> 00:02:03.737
place where you have very, very good resolution, where you can read.

00:02:04.037 --> 00:02:09.557
Outside of that, the resolution becomes so poor that you can't even read there.

00:02:09.797 --> 00:02:15.377
And as I mentioned in my lecture, people found this out by changing the

00:02:15.377 --> 00:02:17.237
letters outside of the gaze point.

00:02:17.477 --> 00:02:26.197
So people started to think, hmm, if this is true, where should the gaze go?

00:02:26.277 --> 00:02:30.617
If there's a tiny little sort of pistol of good resolution that you're pointing

00:02:30.617 --> 00:02:33.037
around the visual world,

00:02:33.037 --> 00:02:38.577
and I should also mention that the visual system is discrete, so that, although we're not

00:02:38.577 --> 00:02:43.997
aware of it, every third of a second we move our eyes to a different point,

00:02:43.997 --> 00:02:47.137
and so if this is true, where should we look?

00:02:47.868 --> 00:02:55.568
And so a substantial group of scientists thought that where you would look is saliency.

00:02:55.648 --> 00:02:59.728
So the pieces of the image that are most visually complicated,

00:02:59.988 --> 00:03:05.588
like if you're wearing a wristwatch or if you're holding some bracelet or just

00:03:05.588 --> 00:03:09.788
the eyes of another person, those are places where the image is busy.

00:03:10.068 --> 00:03:12.908
And the thought is that, well, those would be candidate places.

00:03:12.908 --> 00:03:15.928
And so the original thinking was, well, those are candidate places,

00:03:16.028 --> 00:03:20.008
and somehow the brain knows how to pick one of those from moment to moment.

00:03:20.108 --> 00:03:28.328
But I would say another camp, a camp of which I'm a member, is the idea that,

00:03:28.388 --> 00:03:32.768
well, that might not be true because it might be more agenda-driven.

00:03:32.768 --> 00:03:35.748
So this is very, very hard to get used to,

00:03:35.788 --> 00:03:41.908
but it could be that vision is really a succession of information-gathering

00:03:41.908 --> 00:03:47.028
experiences, where you point your eyes to get some quanta of information,

00:03:47.028 --> 00:03:50.708
and then somehow the brain knows

00:03:50.708 --> 00:03:54.808
how to integrate these quanta into some sensation, which you call seeing.

00:03:54.808 --> 00:03:57.708
And no one knows exactly how this works.

00:03:57.768 --> 00:04:00.628
If anybody figures it out, it's a Nobel Prize for sure.

00:04:00.988 --> 00:04:07.928
But at the moment, we're still sort of trying to guess some of the constraints

00:04:07.928 --> 00:04:11.008
that would lead us on the right path towards understanding this.

00:04:11.128 --> 00:04:14.288
And that's the saliency thing.

00:04:14.428 --> 00:04:18.868
So the saliency is really an initial try.

00:04:19.468 --> 00:04:24.288
And now the agenda faction thinks theirs is the second try. Okay,

00:04:24.428 --> 00:04:29.528
but now would you believe in some sort of compromise between these two schools of thought?

00:04:29.628 --> 00:04:33.388
Is that the solution, or do you really think it's going to be either one or the other?

00:04:34.806 --> 00:04:37.846
Okay, I know I'm in Europe and it's appropriate to compromise.

00:04:39.206 --> 00:04:43.926
So one issue where you might be able to compromise is the following:

00:04:44.666 --> 00:04:49.846
if vision is entirely agenda-driven, then the issue comes up:

00:04:49.906 --> 00:04:52.026
well, when would you ever change your agenda?

00:04:52.506 --> 00:04:57.626
So if you're driving along and you're paying attention to the car in front of

00:04:57.626 --> 00:05:03.646
you, what would make you pay attention to, say, a small child who runs in front

00:05:03.646 --> 00:05:06.926
of the car? What would make you pay attention?

00:05:07.246 --> 00:05:14.566
So a compromise would be to take busy places in the image, but then you get

00:05:14.566 --> 00:05:17.706
to pick and choose them according to your agenda.

00:05:18.266 --> 00:05:23.606
So in front of the car, you would set it up so that you're sensitive to motion.

00:05:23.806 --> 00:05:27.546
Any irregular movement, so in the car when you're moving along,

00:05:27.766 --> 00:05:31.346
there's lots of motion, but it's very expected motion.

00:05:31.646 --> 00:05:35.946
But if there's some unexpected motion. And you can quantify this on a computer.

00:05:36.826 --> 00:05:41.826
If there's some unexpected, let's call it salient motion, then you would allow

00:05:41.826 --> 00:05:44.006
yourself to be interrupted and deal with that.

00:05:44.226 --> 00:05:50.986
So the compromise would be that you have an agenda-driven saliency where you

00:05:50.986 --> 00:05:54.706
can modify what's interesting in the image depending on your…,

00:05:55.566 --> 00:06:05.246
So one experiment that you described was, let's say, your litter-gathering experiment, right?

00:06:05.306 --> 00:06:09.666
Where you actually tried to also dissect now what this more agenda-driven form

00:06:09.666 --> 00:06:14.346
of attention might look like, what its components could be and how they could

00:06:14.346 --> 00:06:16.006
possibly work together.

00:06:16.226 --> 00:06:18.946
So what was the key idea behind that experiment?

00:06:20.526 --> 00:06:31.046
Well, the key idea is that everyone would like to, every scientist, or such a

00:06:31.046 --> 00:06:35.166
huge group of scientists, would like to solve the last big mystery: how does the brain work?

00:06:35.486 --> 00:06:43.506
And so how does thinking work? And there's a very sharp fork in the road as

00:06:43.506 --> 00:06:48.066
to whether you think the system can be composed of primitives.

00:06:48.066 --> 00:06:54.266
So is there some way to break intelligence down so that you have these little

00:06:54.266 --> 00:06:59.406
quantum pieces, and then you can get very complicated behaviors by picking

00:06:59.406 --> 00:07:01.266
and choosing pieces like a puzzle?

00:07:02.029 --> 00:07:07.369
And one thought is that, well, from what we are learning about the brain,

00:07:07.909 --> 00:07:13.169
the number of pieces in your puzzle has to be small, say fewer than 10.

00:07:13.389 --> 00:07:16.949
And so this is another idea that's very hard to get used to,

00:07:16.989 --> 00:07:18.989
but people are starting to think along this line.

00:07:19.069 --> 00:07:23.749
You have a lot of puzzle pieces, but for the puzzle you want to make,

00:07:23.789 --> 00:07:25.929
you have to pick a small number at a time.

00:07:25.929 --> 00:07:32.889
And so, of course, in the laboratory, we tend to become more and more modest

00:07:32.889 --> 00:07:37.049
in the kinds of problems we tackle because of technical difficulties.

00:07:37.329 --> 00:07:44.409
And so in the problem you mentioned, we actually have human subjects in a virtual environment.

00:07:44.649 --> 00:07:48.869
They're walking down a sidewalk in the city, and there is litter there to pick up.

00:07:48.969 --> 00:07:51.929
They have that job, and then there are obstacles to go around,

00:07:52.089 --> 00:07:55.309
and then they have to stay on the sidewalk. That's three things to do at a time.

00:07:55.449 --> 00:08:01.409
And so we ask the question, well, can we keep track of the agenda during these

00:08:01.409 --> 00:08:05.749
three simultaneous tasks?

00:08:05.929 --> 00:08:11.289
Can we watch it and tell which one the brain's working on? And so we use the eye movements.

00:08:11.469 --> 00:08:19.949
We analyze the eye movement traces and report on what task is being worked on at that moment.

00:08:20.089 --> 00:08:24.029
And the hope is, of course, that if we can understand how to do three tasks,

00:08:24.249 --> 00:08:28.049
then lots and lots of tasks are just around the corner, as long as we can pick

00:08:28.049 --> 00:08:30.549
them in successions of three or five or ten.

00:08:30.629 --> 00:08:34.849
Right. But in this case, the task you try to decompose is always going back

00:08:34.849 --> 00:08:36.469
to an eye movement, right?

00:08:36.529 --> 00:08:40.449
It's either an eye movement in the service of picking up the litter or an eye

00:08:40.449 --> 00:08:42.209
movement in the service of avoiding an obstacle.

00:08:43.160 --> 00:08:46.220
Or is that too limited an interpretation? No, not at all.

00:08:46.280 --> 00:08:53.260
I mean, you have your five senses, and I always say vision is the most important.

00:08:54.400 --> 00:08:58.180
Other people say, no, no, taste is the most important, because if you lose that,

00:08:58.320 --> 00:09:00.640
you'll eat something, you'll eat poison and die.

00:09:00.640 --> 00:09:07.420
But if you want to get information from a considerable distance and you want

00:09:07.420 --> 00:09:13.260
to get very elaborate, rich sources of what's out there in the world, you can't beat vision.

00:09:13.420 --> 00:09:20.260
And so the number of people studying vision probably dwarfs the number of people

00:09:20.260 --> 00:09:21.640
studying the rest of the senses.

00:09:21.640 --> 00:09:26.900
And so, vision is so important that we feel if we understand that one,

00:09:27.100 --> 00:09:29.700
the other senses will fall.

00:09:30.720 --> 00:09:34.380
Other people are working the other way. They think, let's go simple first

00:09:34.380 --> 00:09:38.500
and try to understand one of the simpler senses, and then we can build up.

00:09:38.660 --> 00:09:43.280
But vision has certainly attracted the attention of many, many researchers.

00:09:43.840 --> 00:09:46.680
Right. But now in your task, in this decomposition of a task,

00:09:46.820 --> 00:09:49.900
so we have the litter gathering, avoiding the obstacles, staying on the path,

00:09:50.100 --> 00:09:54.140
right? Then you map that into, let's say, three behavioral modules that would

00:09:54.140 --> 00:09:56.440
drive the eye movements or the gaze.

00:09:57.300 --> 00:10:00.280
But now in some sense, that's like a one-to-one mapping of, let's say,

00:10:00.300 --> 00:10:03.480
properties of the virtual world because either it's litter, that's one module,

00:10:03.600 --> 00:10:07.860
or an obstacle, another module, or it's the path, it's the other module.

00:10:08.000 --> 00:10:12.240
So if you have a mapping, let's say, one-to-one between behavioral modules and

00:10:12.240 --> 00:10:16.700
the relevant objects in the world, would you get an explosion of behavioral modules? Yes.

00:10:18.438 --> 00:10:21.898
Well, you would, of course, if you did it incorrectly.

00:10:22.318 --> 00:10:28.558
And so it's always embarrassing in science how you sweep certain problems under the rug.

00:10:28.738 --> 00:10:35.058
And so, as you point out, for our three tasks, there's a unique feature identifying the task.

00:10:35.278 --> 00:10:43.018
But in general, you want to categorize the world into objects or,

00:10:43.078 --> 00:10:47.238
as an American psychologist said, affordances.

00:10:47.738 --> 00:10:50.818
So you categorize things not based on

00:10:50.818 --> 00:10:53.778
their sort of native features per se, but rather on

00:10:53.778 --> 00:10:59.918
whether they are going to be useful. And so if you're in a dark alley and there are

00:10:59.918 --> 00:11:04.118
people coming towards you and you need some kind of weapon to protect yourself,

00:11:04.118 --> 00:11:09.338
it might be a piece of wood, it might be a pipe, it might be a stone.

00:11:09.338 --> 00:11:12.538
But what do those things have in common? They have some feature that

00:11:12.578 --> 00:11:13.698
would help you with this task.

00:11:13.858 --> 00:11:18.418
And so there are people in vision looking at ways to convert

00:11:19.718 --> 00:11:27.138
these elementary geometric and mass-property descriptions of objects into some

00:11:27.138 --> 00:11:30.618
sort of tool-like feature.

00:11:30.778 --> 00:11:33.738
So that problem is being worked on, but not by me.

00:11:33.898 --> 00:11:39.838
Well, actually, also in your experiment, you showed that the eye movement trajectory

00:11:39.838 --> 00:11:44.758
varied with whether the object had to be avoided or not. Yes.

00:11:44.958 --> 00:11:47.918
Right? Because then I think you also showed that in case it was,

00:11:47.978 --> 00:11:53.158
let's say, a to-be-avoided object in the task, the eye movements were more at

00:11:53.158 --> 00:11:57.318
the edges, so you wouldn't bump into it, while if it was more a viewing-related

00:11:57.958 --> 00:12:02.138
task, the eye movements went more towards the center of the object.

00:12:02.358 --> 00:12:06.358
So in that sense, I think this also shows how the affordance of that object

00:12:06.358 --> 00:12:08.918
varies depending on the task, which in turn translates into

00:12:09.312 --> 00:12:12.172
the eye movement behavior. Is that correct? That's true.

00:12:12.252 --> 00:12:17.552
And that sort of gets us into the sort of research that would unify what you

00:12:17.552 --> 00:12:20.672
and I are sometimes doing: this idea of embodied cognition.

00:12:21.152 --> 00:12:25.232
So it's actually not only the properties of the object, but it's the interaction

00:12:25.232 --> 00:12:27.812
of the user of the object and the object itself.

00:12:27.912 --> 00:12:33.752
It's sort of how the human is coupled to the world, which would lead us to perhaps

00:12:33.752 --> 00:12:37.312
the most shocking topic of all, which is that in eye movements,

00:12:37.552 --> 00:12:42.932
the original thinking was that when the eyes looked out at the world,

00:12:42.952 --> 00:12:47.172
they captured an image, and that image was somehow copied into the brain.

00:12:47.332 --> 00:12:52.232
And so people quickly debunked that because then there would have to be somebody

00:12:52.232 --> 00:12:56.152
in the brain looking at that image, and then that never stops.

00:12:56.532 --> 00:13:04.832
So the other end of the spectrum that is getting a lot of weight is that when

00:13:04.832 --> 00:13:06.652
you look at a certain place,

00:13:06.852 --> 00:13:11.972
you actually are not painting an image at all, but you're after this property that we talked about.

00:13:12.152 --> 00:13:17.052
You're after some feature of the spot you're looking at, some information you

00:13:17.052 --> 00:13:21.472
want. And like you mentioned, going around an object, you look at the edge because

00:13:21.472 --> 00:13:23.272
you want to rotate around the object.

00:13:23.352 --> 00:13:26.792
If you're picking up, you look for the center because you want to go right towards it.

00:13:27.032 --> 00:13:33.532
And so this is an idea that really takes a lot of time to get used to: the idea

00:13:33.532 --> 00:13:38.532
that every third of a second, when you move your eyes to a point,

00:13:38.612 --> 00:13:40.732
you're doing that for some visual test.

00:13:40.932 --> 00:13:44.512
If it's reading, you're trying to decode the word you're looking at in the text.

00:13:44.652 --> 00:13:47.972
But in the real world, if you're picking up objects or you have some tasks or

00:13:47.972 --> 00:13:53.952
you're cooking, driving a car, the tests get very interesting and different in each case.

00:13:54.032 --> 00:13:59.012
And one of the tasks is to see if somehow we can categorize all the different

00:13:59.012 --> 00:14:03.292
tests that you do. But now, so in...

00:14:03.947 --> 00:14:07.087
This would illustrate this notion of your agenda, if you want,

00:14:07.207 --> 00:14:12.047
dictating how you deal with objects in the world visually, how you actually

00:14:12.047 --> 00:14:14.127
extract information from these objects, right?

00:14:14.247 --> 00:14:17.467
But now, this could be interpreted in two ways. Either you could say like,

00:14:17.527 --> 00:14:21.667
well, the object affordance relationship, like the object, what you can do with

00:14:21.667 --> 00:14:26.927
it, gets defined in a rather different way. That's almost a different category.

00:14:27.087 --> 00:14:29.167
It's not, in some sense, internally an object.

00:14:29.747 --> 00:14:32.747
Or you could argue, well, actually, you really detect the same object.

00:14:32.747 --> 00:14:36.087
It's still the same sort of obstacle, with the same shape, and in both cases,

00:14:36.107 --> 00:14:40.427
you extract that, but it's more like a biasing of how you process that object.

00:14:40.907 --> 00:14:44.227
So, which of these two interpretations would you favor? Hmm.

00:14:45.307 --> 00:14:48.127
I mean, I think that's a tricky one.

00:14:48.287 --> 00:14:53.767
I mean, it really gets, I think we have to go back to agenda-driven, you know.

00:14:53.767 --> 00:14:58.727
So when you pick a particular agenda item to work on, when you pick the task,

00:14:58.827 --> 00:15:07.807
that task has the properties of the object you're interacting with written in some internal form.

00:15:08.007 --> 00:15:13.487
And so you just have to query the world to see if the object you're looking

00:15:13.487 --> 00:15:17.947
at, in fact, satisfies those properties. And so I'm trying to think.

00:15:18.487 --> 00:15:21.707
I know this is going to come down one side or the other of your question,

00:15:21.807 --> 00:15:23.727
but I'm going to let you pick one.

00:15:25.147 --> 00:15:29.987
Okay. Well, so my bet would be that you build up a scene,

00:15:31.477 --> 00:15:35.497
which will contain this object, but I think the way you bias it with respect

00:15:35.497 --> 00:15:36.577
to your action will vary.

00:15:36.797 --> 00:15:40.917
So it's not... On the other hand, you could argue, yes, okay, but if you would

00:15:40.917 --> 00:15:44.917
look at examples of, let's say, inattentional blindness, then the objects are

00:15:44.917 --> 00:15:48.697
actually out there, but you're not seeing them because it's not related to your agenda.

00:15:49.777 --> 00:15:54.337
So in that sense, it looks like it's a difficult problem to solve at this stage,

00:15:54.457 --> 00:15:58.617
but you were the one giving the talk, not me, so you're the one who has to solve it for me now.

00:15:59.457 --> 00:16:04.537
I see. But I think you opened the door to inattentional blindness,

00:16:04.677 --> 00:16:12.337
and perhaps we should just revisit the finding, from the rather astonishing variety

00:16:12.337 --> 00:16:15.657
of experiments done by Dan Simons and others,

00:16:15.897 --> 00:16:21.157
that people don't notice huge changes in their visual world.

00:16:21.157 --> 00:16:29.057
And one of the famous examples was some variant of a person approaching a counter,

00:16:29.237 --> 00:16:33.177
like you would if you're checking into a hotel and you're dealing with the clerk.

00:16:33.377 --> 00:16:37.437
But, of course, it's an experiment. So the clerk ducks under the table to get

00:16:37.437 --> 00:16:40.857
something, supposedly, and there's a person hiding behind there,

00:16:40.917 --> 00:16:42.717
and another person pops up.

00:16:42.717 --> 00:16:49.217
And if the person who replaces the original person is only vaguely similar,

00:16:49.417 --> 00:16:53.677
the person who came to the table would never, never notice the difference.

00:16:53.957 --> 00:16:59.837
And so the thinking, of course, from what we call our Cartesian view of the

00:16:59.837 --> 00:17:03.797
world, where everything's a picture and there are objects in it that we can sort of label,

00:17:03.917 --> 00:17:10.837
like an inverse paint-by-numbers world, that just defies explanation

00:17:10.837 --> 00:17:13.797
because we should notice the changes.

00:17:13.837 --> 00:17:19.837
But in an agenda-driven view of the world, well, we don't because we can be

00:17:19.837 --> 00:17:26.077
nice to the clerk and polite, but we don't expect the relationship to go on forever.

00:17:26.237 --> 00:17:32.477
And so we don't code a lot of details about what the person looks like, et cetera, et cetera.

00:17:32.577 --> 00:17:34.977
So the change blindness example is actually a

00:17:35.996 --> 00:17:37.556
quintessential example of

00:17:37.556 --> 00:17:43.396
agenda-driven vision at work. You just need the features to do the job.

00:17:43.616 --> 00:17:49.516
And so I would say that in our manufactured world,

00:17:49.616 --> 00:17:55.536
we give lots of things helpful hints, like putting them in cylinders and writing Coke on them.

00:17:55.656 --> 00:18:02.276
But for the most part, we come prepared to just exploit the functional features

00:18:02.276 --> 00:18:05.516
of something and get to the tool-use stage.

00:18:05.996 --> 00:18:10.796
So then, from that perspective, so in this agenda-driven view of perception,

00:18:11.696 --> 00:18:17.756
then your gaze behavior becomes one of, if you want, information-seeking or

00:18:17.756 --> 00:18:20.936
uncertainty reduction, the notion you used for that.

00:18:21.256 --> 00:18:24.636
So how does it relate exactly to the agenda-driven view?

00:18:28.776 --> 00:18:33.036
Well, so let's see, we'd have to try to summarize this question

00:18:34.528 --> 00:18:42.228
succinctly. And so one central issue in the brain and how the brain uses vision

00:18:42.228 --> 00:18:48.068
is, of course, that you have these agenda items, which we think of as programs,

00:18:48.428 --> 00:18:51.268
internal programs that neurons are in charge of.

00:18:51.388 --> 00:18:53.908
And the question is, how does that all happen?

00:18:54.128 --> 00:18:59.088
And we don't know even the beginnings of it. We have some clues.

00:18:59.348 --> 00:19:05.308
And one is that what the brain has to do is be its own programmer.

00:19:05.608 --> 00:19:09.188
So if you're working for a company, you can be a programmer and write programs for it.

00:19:09.428 --> 00:19:14.088
But if you, the person, have internal programs, how do you code them?

00:19:14.308 --> 00:19:19.908
And so the prevailing view is that somehow the neural system has a way of suggesting

00:19:19.908 --> 00:19:25.028
programs, and then you would score them as to how effective they are.

00:19:25.308 --> 00:19:31.348
And this internal scoring is believed to be carried by the molecule dopamine.

00:19:31.768 --> 00:19:35.208
So dopamine is like an internal currency.

00:19:35.808 --> 00:19:41.048
In the classroom, I call it the neuro, in honor of the euro.

00:19:41.268 --> 00:19:46.308
But it's just an internal pay scale for rating different programs.

00:19:46.408 --> 00:19:51.268
And the idea is that you, as a surviving person who wants to spread your genes around,

00:19:51.488 --> 00:19:57.308
are designed to earn the most neuros.

00:19:58.148 --> 00:20:01.288
And the neurons in your head, they can't see out.

00:20:01.408 --> 00:20:09.648
They just have to deal with what their internal structures are.

00:20:09.948 --> 00:20:17.128
And so they do things like these visual tests, and every visual test

00:20:17.288 --> 00:20:25.048
comes with its own rate of return, and the internal agenda-driven programs

00:20:25.048 --> 00:20:27.248
that you're running, they are all worth something,

00:20:27.848 --> 00:20:33.668
and some part of you is trying to pick the best ones.

00:20:33.888 --> 00:20:36.768
Now, what makes me earn neuros exactly?

00:20:37.848 --> 00:20:43.048
Oh, you come out of the box. You come out of the box as a neuro… accumulator.

00:20:44.268 --> 00:20:45.388
Neuro accumulator.

00:20:46.828 --> 00:20:52.508
It's rather interesting. So are you saying I'm out there really trying

00:20:52.508 --> 00:20:57.648
to collect drops of juice and sugar and sex, drugs, and rock and roll,

00:20:58.393 --> 00:21:01.573
or am I also earning neuros for doing other things?

00:21:03.333 --> 00:21:08.453
The answer is both. So your brain evolved in stages. And the part we associate

00:21:08.453 --> 00:21:10.993
with being some kind of a computer is the forebrain.

00:21:11.153 --> 00:21:14.773
And so that's the last to evolve. And it's very complicated. It has lots of parts

00:21:14.773 --> 00:21:17.113
that we could talk about, but maybe we shouldn't.

00:21:17.193 --> 00:21:23.533
But it sits on top of an earlier system that contains the chemical rewards.

00:21:23.533 --> 00:21:32.373
And the neurons that are communicating in this computer part that give reward,

00:21:32.593 --> 00:21:34.233
they're dopaminergic neurons.

00:21:34.473 --> 00:21:39.353
And so you have a vast system of wires that goes through all your modern forebrain.

00:21:39.393 --> 00:21:45.273
But the part down in the brainstem that's handing this out is right next to

00:21:45.273 --> 00:21:49.973
the part that has your basic rewards, your drives, the so-called four Fs:

00:21:49.973 --> 00:21:53.693
fight, flee, feed, and reproduction.

00:21:54.213 --> 00:21:58.813
And so basically the way your brain works, the forebrain works,

00:21:58.893 --> 00:22:04.013
even though you do these elaborate programs like algebra and physics and trading

00:22:04.013 --> 00:22:08.053
on the stock market, they all have to communicate somehow with the basic drives.

00:22:08.273 --> 00:22:13.273
So for the basic drives, somehow your forebrain

00:22:13.273 --> 00:22:20.913
has to do the elaborate translation of why what you're doing is worth this reward.

00:22:21.313 --> 00:22:25.833
Right, but now if we map that back to gaze behavior, right?

00:22:25.893 --> 00:22:34.533
So you mentioned that gaze is driven by, let's say, the need to reduce uncertainty about the world.

00:22:35.493 --> 00:22:41.313
So now what makes me earn neuros? When I identify the spots that give me the

00:22:41.313 --> 00:22:43.553
maximum uncertainty reduction?

00:22:43.993 --> 00:22:48.373
Or do I earn neuros for jumping to the spot where I get the juice?

00:22:49.356 --> 00:22:55.996
reward? Well, I mean, it depends; that can fall out in different ways in different contexts.

00:22:56.196 --> 00:23:01.756
But the fact that you've come this far means that, from the perspective that

00:23:01.756 --> 00:23:05.016
we share, I think, we've totally bought into this program.

00:23:05.356 --> 00:23:12.736
So some evidence from Kenji Doya and others suggests that another one of these

00:23:12.736 --> 00:23:16.096
molecules, serotonin, is responsible for risk.

00:23:16.336 --> 00:23:21.536
So if we think back to choosing one of these agenda-driven behaviors,

00:23:21.916 --> 00:23:26.516
and we don't really want to know what it is, we just want to know how to characterize

00:23:26.516 --> 00:23:31.456
it in the most basic sense. It's like a gamble.

00:23:31.636 --> 00:23:35.056
So how much reward and how much risk is it?

00:23:35.176 --> 00:23:40.676
And so the brain doesn't want to do these apples and oranges comparisons.

00:23:40.876 --> 00:23:43.316
It wants to reduce everything to a common denominator.

00:23:43.636 --> 00:23:47.696
So how many neuros? What's the risk? So your internal programs,

00:23:47.896 --> 00:23:49.856
the way they run, it's like making a bet.

00:23:50.156 --> 00:23:56.416
And so from that perspective, the eye movement system can help by reducing risk.

00:23:56.596 --> 00:24:01.596
So if you can reduce your risk, then your bet will become more of a sure bet,

00:24:01.676 --> 00:24:03.476
and you're going to get more reward.

00:24:04.036 --> 00:24:08.896
And if the internal thing is matched to the external world, then it'll be an

00:24:08.896 --> 00:24:11.416
accurate rendition of the value to you.

00:24:11.416 --> 00:24:16.016
Also, yesterday you showed a more theoretical experiment where you

00:24:16.016 --> 00:24:21.516
then tried to, let's say, extract the neuros that a viewer would be accumulating

00:24:21.516 --> 00:24:25.696
with a certain scan path through the world, given a task.

00:24:25.956 --> 00:24:29.556
So does that really make sense?

00:24:29.556 --> 00:24:35.876
So what are the regularities? If you map an eye movement pattern back to an inference

00:24:35.876 --> 00:24:43.216
on what the value would be or could be, how consistent is what you get out of that? Well,

00:24:44.116 --> 00:24:45.396
I mean, it...

00:24:46.799 --> 00:24:50.999
In the lab, what can we do? So, if we think we've got it...

00:24:51.019 --> 00:24:53.319
So practically, a couple of things first.

00:24:53.519 --> 00:25:01.659
Practically, people have shown that in the brain, the neurons are sensitive to dopamine reward.

00:25:01.939 --> 00:25:08.379
So there's miles and miles of evidence where people have recorded from the cells

00:25:08.379 --> 00:25:12.379
passing out reward and show they behave in a very consistent fashion.

00:25:12.379 --> 00:25:16.379
So the monkeys are doing tasks under different circumstances,

00:25:16.539 --> 00:25:19.719
and under these circumstances, they should get more or less reward.

00:25:19.919 --> 00:25:24.339
And their actual neural recordings are very consistent, showing that they get

00:25:24.339 --> 00:25:29.939
more, the cells are firing more furiously, passing out more reward when

00:25:29.939 --> 00:25:33.819
they should. So that says, okay, well, maybe this is on the right track.

00:25:33.899 --> 00:25:40.739
But in terms of what we're working on, one of the issues is

00:25:40.739 --> 00:25:48.039
that if you eat an apple, your body can convert that into calories for you.

00:25:48.119 --> 00:25:53.099
So your brain can be told the calories-to-neurons conversion.

00:25:53.359 --> 00:25:57.339
But if you write a scientific paper and get it accepted in a journal,

00:25:57.959 --> 00:26:01.059
how many neurons is that worth?

00:26:01.319 --> 00:26:06.279
And that's a delicate problem: the one handing out the reward and the one earning

00:26:06.279 --> 00:26:07.759
the reward are the same person.

00:26:07.979 --> 00:26:14.619
And so the question you raise of how to keep that in calibration is a very delicate and important one.

00:26:14.859 --> 00:26:20.659
And in the lab, we're not working with monkeys, we're working with human subjects

00:26:20.659 --> 00:26:21.719
in virtual environments.

00:26:21.719 --> 00:26:26.779
So we can change the environments in a way that would suggest the reward should

00:26:26.779 --> 00:26:31.019
behave in a certain way or the uncertainty should interact with the reward in a certain way.

00:26:31.139 --> 00:26:36.559
And then if those results come back consistently, then we think,

00:26:36.599 --> 00:26:37.979
oh, we're on the right track.

00:26:38.259 --> 00:26:42.959
Right. But now, one interpretation of the dopamine system,

00:26:43.339 --> 00:26:48.599
going back to Wolfram Schultz and others, is that dopamine responds to unpredicted reward.

00:26:49.491 --> 00:26:53.371
But in some sense, if you take this sort of uncertainty reduction interpretation

00:26:53.371 --> 00:27:01.431
of gaze behavior, then I'm looking at positions where I'm expecting something, right?

00:27:01.511 --> 00:27:05.311
So that means it's not an unexpected reward.

00:27:05.551 --> 00:27:08.851
If you hit the right spot and it helps to reduce uncertainty,

00:27:09.211 --> 00:27:11.971
it's an expected reward. So then dopamine should not fire.

00:27:13.311 --> 00:27:18.171
So what did I miss? You didn't miss anything,

00:27:18.311 --> 00:27:23.591
but you might have skidded over what we would think of as a fairly minor technical

00:27:23.591 --> 00:27:29.831
point, which is how exactly the information is coded in the brain.

00:27:30.071 --> 00:27:36.891
And basically, the brain's programs are run over and over and over again.

00:27:36.891 --> 00:27:40.431
For example, eye movements,

00:27:40.631 --> 00:27:47.311
these fast eye movements we talked about, are made at a rate of 150,000 gaze points per day.

00:27:47.651 --> 00:27:51.431
And so for all the tasks like cooking, making coffee and stuff,

00:27:51.531 --> 00:27:52.731
you've done it many, many times.

00:27:52.871 --> 00:27:57.071
And so before you run the program, you have a very good estimate of what you should get.

00:27:57.451 --> 00:28:02.231
And so it's cheaper for the brain. This is Wolfram Schultz's work,

00:28:02.311 --> 00:28:08.671
predominantly, that it's cheaper for the brain to record the difference between

00:28:08.671 --> 00:28:10.871
what you thought you'd get and what you actually get.

00:28:11.011 --> 00:28:14.931
And there's a lot of evidence for this, that this is the way the brain chooses,

00:28:15.031 --> 00:28:20.151
because it's just cheaper to keep track of when something isn't what you expected.

00:28:20.271 --> 00:28:23.191
Either way, you got more than expected or less than expected.

00:28:23.451 --> 00:28:28.691
And technically, some of the algorithms that we use work on that principle.

00:28:28.691 --> 00:28:31.351
They work on the difference coding principle.

00:28:31.571 --> 00:28:39.051
But it's really, I would say, a bit secondary because it's sort of an economical

00:28:39.051 --> 00:28:42.771
coding principle rather than the main lesson,

00:28:42.871 --> 00:28:48.311
which is that the brain has to keep a secondary reward model of things. Right.

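NOTE
A minimal sketch of the difference-coding principle described above, assuming a simple delta-rule (Rescorla-Wagner style) update rather than the lab's actual model: the expectation v moves a fraction alpha toward each outcome, and the signal is the prediction error r - v. After many repetitions of the same routine the error shrinks toward zero, so an expected reward produces almost no signal, while a withheld reward produces a large negative one. The function name, alpha and the reward values are illustrative assumptions.
```python
def prediction_errors(rewards, alpha=0.2, v0=0.0):
    """Return the reward prediction error on each run of a repeated task."""
    v = v0  # learned expectation of reward for this task
    errors = []
    for r in rewards:
        delta = r - v  # difference between what you got and what you expected
        errors.append(delta)
        v += alpha * delta  # move the expectation part of the way toward the outcome
    return errors
errs = prediction_errors([1.0] * 30)  # the same reward, run after run
surprise = prediction_errors([1.0] * 30 + [0.0])[-1]  # reward withheld after training
print(round(errs[0], 3), round(errs[-1], 3), round(surprise, 3))
```
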
00:28:48.546 --> 00:28:53.446
But then the other issue: if you take the law-of-effect approach

00:28:53.446 --> 00:28:58.486
of Thorndike, which this goes back to, which basically means you optimize your rewards,

00:28:59.126 --> 00:29:03.846
you reinforce the things that give you reward and you try to stamp out,

00:29:03.946 --> 00:29:05.966
as he called it, the things that do not give you reward.

00:29:06.426 --> 00:29:11.366
Then in your case, you might end up following gaze patterns that are tuned to

00:29:11.366 --> 00:29:14.406
a certain task in a certain environment because you optimize reward.

00:29:14.406 --> 00:29:19.046
And it might be maladapted to, let's say, changes in the task or to other tasks.

00:29:19.486 --> 00:29:24.526
And then you have to first stamp out that whole gaze pattern you have acquired

00:29:24.526 --> 00:29:27.726
and reacquire another one from scratch.

00:29:28.006 --> 00:29:32.066
So it might lead to inefficiencies if you talk about switching between different

00:29:32.066 --> 00:29:35.366
tasks. So how would your model deal with that?

00:29:35.586 --> 00:29:40.386
So let's say we go from picking up litter to catching birds or something like this.

00:29:41.246 --> 00:29:46.026
Well, listen, we have to tie this back to some things we talked about at the beginning, really.

00:29:48.426 --> 00:29:53.206
Actually, this gaze work, I should mention, is the work of Nathan Sprague.

00:29:53.246 --> 00:30:00.006
He's a former PhD student, and when he started to think

00:30:00.006 --> 00:30:03.906
about this problem, I gave him the Thorndike line. I thought

00:30:03.946 --> 00:30:07.926
just looking at the right place should get you the most reward.

00:30:07.926 --> 00:30:15.006
And Nathan came back and explained to me: no, if you want to account for the data, then

00:30:15.886 --> 00:30:21.306
reducing the reward-weighted uncertainty is the right way to go. So

00:30:21.306 --> 00:30:27.126
it's not reward per se; it's that the reduction in uncertainty from looking increases the

00:30:27.553 --> 00:30:30.973
odds of winning your bet in our agenda-driven world.

00:30:31.173 --> 00:30:37.733
And so he showed, basically, that the original Thorndike idea was unstable when you apply it to gaze.

00:30:37.893 --> 00:30:43.833
But if you did this product of reward and uncertainty, it was stable and actually

00:30:43.833 --> 00:30:46.273
could outperform the alternatives.

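NOTE
A toy comparison, not Nathan Sprague's actual model: two visual subtasks compete for each fixation. Choosing by reward alone always picks the higher-weighted task, so the other task is never checked. Choosing by the product of reward weight and uncertainty, where looking resets a task's uncertainty and unattended uncertainty keeps accumulating, interleaves fixations between the tasks. The weights, growth rates and the function allocate_fixations are hypothetical.
```python
def allocate_fixations(weights, growth, steps, use_uncertainty=True):
    """Count fixations per task under a greedy gaze-selection rule."""
    uncertainty = list(growth)  # assumed starting uncertainty: one step of growth
    counts = [0] * len(weights)
    for _ in range(steps):
        if use_uncertainty:
            scores = [w * u for w, u in zip(weights, uncertainty)]  # reward-weighted uncertainty
        else:
            scores = list(weights)  # reward alone, ignoring uncertainty
        target = max(range(len(scores)), key=scores.__getitem__)
        counts[target] += 1
        # looking resets a task's uncertainty; unattended tasks keep accumulating it
        uncertainty = [g if i == target else u + g
                       for i, (u, g) in enumerate(zip(uncertainty, growth))]
    return counts
print(allocate_fixations([2.0, 1.0], [0.5, 0.5], 30, use_uncertainty=False))  # [30, 0]
print(allocate_fixations([2.0, 1.0], [0.5, 0.5], 30, use_uncertainty=True))   # [20, 10]
```
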
00:30:46.353 --> 00:30:50.353
Okay. So that would lead to a somewhat different interpretation of,

00:30:50.373 --> 00:30:53.293
let's say, the cues that drive reward. It would not be just,

00:30:53.413 --> 00:30:59.133
let's say, a flat-out drop of juice or something like this or some image that looks very pleasing.

00:30:59.473 --> 00:31:03.913
It is really first the detection of an uncertainty reduction that in itself

00:31:03.913 --> 00:31:06.133
will be driving a reward signal.

00:31:06.273 --> 00:31:09.033
That would be the interpretation of this, right? Right. Okay.

00:31:09.173 --> 00:31:14.933
Perhaps we can add: there's a very famous tape from John

00:31:14.933 --> 00:31:20.493
Senders, who was really a pioneer in thinking about information properties of gaze.

00:31:20.493 --> 00:31:27.433
And he asked the question, what if you can't use your eyes?

00:31:27.613 --> 00:31:32.353
And he built a very special device that, while he was driving a car,

00:31:32.593 --> 00:31:36.253
would block his vision for different amounts of time.

00:31:36.293 --> 00:31:39.453
This huge clamshell came down and blocked his vision.

00:31:39.493 --> 00:31:43.913
And it was driven by a motor, so he could have his vision obscured for one second,

00:31:44.013 --> 00:31:45.813
two seconds, three seconds, four seconds.

00:31:45.813 --> 00:31:49.533
And about four or five seconds while driving a car on the highway,

00:31:49.753 --> 00:31:55.373
you get into terrible trouble, and you really have a very visceral sense that

00:31:55.373 --> 00:31:59.513
it's the uncertainty about where you are that's the driving force.

00:31:59.653 --> 00:32:04.973
And I think this was Nathan's kind of insight, too, is that when you have reward

00:32:04.973 --> 00:32:09.253
and uncertainty, in the case of gaze, it's the uncertainty that's dominating

00:32:09.253 --> 00:32:12.253
and that needs to be paid attention to. Right.

00:32:12.553 --> 00:32:17.193
So, but then, if we take this literal interpretation of the dopamine signal

00:32:17.193 --> 00:32:21.493
as unexpected reward, would you say, look,

00:32:22.215 --> 00:32:28.995
this idea of uncertainty reduction as a reward-driving signal:

00:32:29.235 --> 00:32:32.835
is it important for you that this gets mapped to dopamine, or would you also

00:32:32.835 --> 00:32:37.075
be happy if it were some other neuromodulatory system conveying

00:32:37.075 --> 00:32:39.955
that signal to the rest of the brain,

00:32:40.195 --> 00:32:43.015
or is it really specific to the dopamine system for you?

00:32:43.015 --> 00:32:48.295
No, I mean, as a modeler, of course, it's a scalar something.

00:32:48.535 --> 00:32:51.515
And so if it was something else, that would be fine. But of course,

00:32:51.695 --> 00:32:56.235
like, you know, the Nobel Prize has already been given out for the discovery

00:32:56.235 --> 00:32:59.055
of dopamine's role in this context.

00:32:59.275 --> 00:33:03.375
So you'd be swimming... We really don't know what the alternatives are,

00:33:03.635 --> 00:33:05.955
what a good alternative would be here.

00:33:06.155 --> 00:33:10.415
No, but that's good. So you would fight and die more for the general

00:33:10.415 --> 00:33:12.255
idea of the scalar value. Absolutely.

00:33:13.635 --> 00:33:18.335
It doesn't have to be that. But there's a lot of evidence for it.

00:33:18.335 --> 00:33:23.195
And all the addictions like nicotine and cocaine are linked to breaking into

00:33:23.195 --> 00:33:24.595
the dopamine storehouse.

00:33:24.875 --> 00:33:29.455
That's right. But how would, let's say, these addictions be informative on,

00:33:29.515 --> 00:33:30.935
let's say, uncertainty reduction?

00:33:32.235 --> 00:33:35.295
Right? I mean, sure, that relates to reward. That's clear.

00:33:35.515 --> 00:33:40.255
But your specific angle on it is the uncertainty reduction. Well, okay.

00:33:40.415 --> 00:33:44.595
But we have to keep in mind where we started here, which is that we started thinking

00:33:44.595 --> 00:33:49.595
about vision, and we started with this very esoteric and high-performance gaze

00:33:49.595 --> 00:33:53.415
system that can really move the eyes to different parts in the visual world

00:33:53.415 --> 00:33:56.015
at speeds up to 700 degrees per second.

00:33:56.195 --> 00:34:01.515
So the eye movement system is really remarkable and very different.

00:34:01.595 --> 00:34:05.715
And so at that end of the spectrum, with that part of the human machine,

00:34:05.875 --> 00:34:08.475
uncertainty comes into play.

00:34:08.775 --> 00:34:14.175
But when you're breaking into a house and taking the high-definition television

00:34:14.175 --> 00:34:17.635
so that you can sell it and have more cocaine and more dopamine,

00:34:17.855 --> 00:34:22.935
then you're dealing with the… There are uncertainties, but you're really after the reward.

00:34:23.135 --> 00:34:26.675
The reward is driving you, and overwhelmingly so.

00:34:26.915 --> 00:34:30.835
Yeah, okay. That's clear. So then you also showed experiments where you were

00:34:30.835 --> 00:34:36.275
generalizing this way of thinking to driving cars in virtual reality, right?

00:34:36.375 --> 00:34:42.335
Yes. So how did these experiments work, following another car and trying to

00:34:42.335 --> 00:34:45.335
see? So people have a task to follow another car at a certain speed.

00:34:45.395 --> 00:34:47.215
This car moves along some highway.

00:34:47.675 --> 00:34:53.075
And what you're looking at is then the switching between the car you're following

00:34:53.075 --> 00:34:57.255
and the speedometer because you have to stick to a certain speed.

00:34:58.075 --> 00:35:02.815
So how has this been informative on this notion of uncertainty reduction? Yeah.

00:35:03.355 --> 00:35:08.975
Well, of course, like we talked about with John Senders's experiment,

00:35:09.175 --> 00:35:15.435
driving the car, and if you don't pay... Of course, being a researcher,

00:35:15.735 --> 00:35:21.755
the style is to defend certain hypotheses that you're trying to run down.

00:35:21.755 --> 00:35:27.055
And so their thought was, well, driving is perhaps a little like walking down

00:35:27.055 --> 00:35:28.475
a sidewalk in the following senses.

00:35:28.535 --> 00:35:33.115
You have a limited agenda of things that you can do in this multitasking sense.

00:35:33.455 --> 00:35:38.615
And what could they be? And in the lab we picked, in the demonstration you're

00:35:38.615 --> 00:35:42.775
talking about, we picked something simple or relatively simple,

00:35:42.855 --> 00:35:45.975
following a car at a certain speed. So dual task.

00:35:46.195 --> 00:35:50.555
And then with our subjects in a virtual car (we have a car simulator),

00:35:50.555 --> 00:35:55.575
the gaze pattern goes back and forth from the car they're following to the speedometer.

00:35:56.575 --> 00:36:01.615
But of course, we know in real life that multitasking in driving is critical.

00:36:01.795 --> 00:36:07.875
And in the US, there's a terrible problem with teenagers and texting and talking

00:36:07.875 --> 00:36:09.175
on the phone during driving.

00:36:09.355 --> 00:36:16.055
And so that's a very demanding task to be doing, and it competes often,

00:36:16.175 --> 00:36:18.775
sometimes fatally, with the normal driving task.

00:36:18.775 --> 00:36:22.755
Even in simple situations like driving on a freeway and things like that,

00:36:22.815 --> 00:36:24.795
people will wander into the wrong lane.

00:36:25.035 --> 00:36:31.255
And finally, the television is trying to alert people, particularly young people,

00:36:31.355 --> 00:36:32.995
that they shouldn't do this.

00:36:33.075 --> 00:36:35.915
Because in the U.S., it's very hard to pass a law against anything.

00:36:36.215 --> 00:36:42.375
It's a free country, but this is definitely a place where we shouldn't be a free country.

00:36:42.435 --> 00:36:45.195
We shouldn't be texting and talking on the phone while driving the car.

00:36:45.195 --> 00:36:52.915
But what it points out, though, I think, rather cruelly, is this idea of uncertainty,

00:36:53.115 --> 00:36:58.535
because here's where you start multitasking on a phone, or typing something

00:36:58.535 --> 00:37:03.435
on an Android keyboard, sorry for Android, any old phone.

00:37:05.867 --> 00:37:12.627
Then that just steals your cycles, if you wish, computer cycles, and it steals your gaze.

00:37:12.707 --> 00:37:16.527
So your gaze is now staring at this keyboard when it should be looking out at driving.

00:37:16.687 --> 00:37:19.847
And so it's multitasking. So you're sneaking some gazes on the road,

00:37:19.927 --> 00:37:24.007
hopefully, but in the accidents, you're not doing it enough.

00:37:24.127 --> 00:37:27.407
And so you're just spending too much time on the keyboard, not enough time on the road.

00:37:27.527 --> 00:37:32.287
But if you think about it abstractly, it's just one of these task things.

00:37:32.287 --> 00:37:35.267
So you're texting or you're looking at the road.

00:37:35.387 --> 00:37:38.987
Your gaze is going back and forth and you're trying to make a decision about uncertainty.

00:37:39.207 --> 00:37:42.567
But since you're a teenager, you don't quite have the numbers right.

00:37:42.707 --> 00:37:46.247
And so you're not giving enough weight to looking at what you're doing.

00:37:46.507 --> 00:37:50.767
But in this driving task, what can you say about,

00:37:50.847 --> 00:37:57.747
let's say, the speed or the rate at which this uncertainty increases over time?

00:37:57.747 --> 00:38:00.747
Is that, let's say, linear with time,

00:38:00.747 --> 00:38:03.507
or is there some other relationship? Oh, we want to

00:38:03.507 --> 00:38:06.647
know, but that's a research question.

00:38:06.647 --> 00:38:09.687
I mean, we're trying to think of exactly

00:38:09.687 --> 00:38:15.287
what kinds of manipulations we would do to pin that down. We would be very,

00:38:15.287 --> 00:38:20.487
very, very satisfied if we could come up with some sort of model for that. But

00:38:20.487 --> 00:38:24.167
you have it, no? Because it's the probability of seeing an eye movement from the

00:38:24.167 --> 00:38:28.367
car you're tracking to the speedometer, or vice versa.

00:38:29.007 --> 00:38:32.407
It's like when a person looks at the speedometer, so now I'm not tracking the

00:38:32.407 --> 00:38:35.627
car, now uncertainty about the position of the car is building up.

00:38:36.287 --> 00:38:42.207
And that should, I guess, correlate with the probability of seeing a saccade back to the car.

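NOTE
One way to pose the question above, under an assumption not made in the interview: if the followed car's position drifts as a random walk while the driver looks at the speedometer, the variance of the driver's estimate grows linearly with the time since the last look. Other dynamics would give other curves; an unknown but constant relative velocity, for instance, gives variance that grows with the square of time. The sketch estimates the variance by simulation; sigma, the trial count and the function name are hypothetical.
```python
import random
def unattended_variance(t, sigma=1.0, trials=20000, seed=0):
    """Monte Carlo variance of a random-walk position after t unobserved steps."""
    rng = random.Random(seed)
    finals = []
    for _ in range(trials):
        x = 0.0
        for _ in range(t):
            x += rng.gauss(0.0, sigma)  # hypothetical per-step drift of the lead car
        finals.append(x)
    mean = sum(finals) / trials
    return sum((x - mean) ** 2 for x in finals) / trials
for t in (1, 2, 4, 8):
    print(t, round(unattended_variance(t), 2))  # roughly t * sigma**2
```
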
00:38:43.155 --> 00:38:47.575
No, no, no, you got it, you got it, Paul. I think that's a good suggestion.

00:38:47.675 --> 00:38:49.875
Certainly, we want to do something like that.

00:38:49.975 --> 00:38:55.875
The only thing that makes it tricky is we don't know from first principles how

00:38:55.875 --> 00:39:00.035
much reward we should give to the individual tasks.

00:39:00.475 --> 00:39:05.175
And so there's a confound: we have a product of reward and uncertainty.

00:39:05.635 --> 00:39:11.775
And so the tricky part is thinking of experiments that would decompose

00:39:11.775 --> 00:39:15.615
those factors, so we could pin down how much of the product is the reward and

00:39:15.615 --> 00:39:16.735
how much is the uncertainty.

00:39:17.075 --> 00:39:21.015
So that's the only thing that makes it ticklish, but that's what we wanted to do.

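NOTE
A minimal sketch of the confound described above, assuming fixations are chosen greedily by the product of task weight and uncertainty, with uncertainty growing linearly in the time since a task was last fixated. Halving a task's reward weight while doubling its uncertainty growth rate leaves every score, and hence the fixation sequence, unchanged, so gaze data alone cannot separate the two factors. The function fixation_sequence and all parameter values are hypothetical.
```python
def fixation_sequence(weights, growth, steps=12):
    """Greedy gaze: fixate the task with the largest weight * uncertainty."""
    since_look = [1] * len(weights)  # steps since each task was last fixated
    sequence = []
    for _ in range(steps):
        # assumed: uncertainty grows linearly, growth[i] * since_look[i]
        scores = [w * g * k for w, g, k in zip(weights, growth, since_look)]
        target = max(range(len(scores)), key=scores.__getitem__)
        sequence.append(target)
        since_look = [1 if i == target else k + 1 for i, k in enumerate(since_look)]
    return sequence
a = fixation_sequence(weights=[2.0, 1.0], growth=[0.5, 0.5])
b = fixation_sequence(weights=[1.0, 1.0], growth=[1.0, 0.5])
print(a == b)  # True
```
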
00:39:21.215 --> 00:39:26.555
But then you can just make it a game where the instruction varies from just

00:39:26.555 --> 00:39:32.035
worry about speed to just worry about the car and all combinations in between with some gradation,

00:39:32.135 --> 00:39:35.355
and then you plot the probability of seeing these eye movements from one

00:39:35.355 --> 00:39:38.775
to the other, and that would tell you then the uncertainty buildup, I guess.

00:39:38.975 --> 00:39:42.115
At this point, I'm thinking we should bring you to Texas,

00:39:42.715 --> 00:39:46.315
where I'm from, because clearly, clearly you

00:39:46.315 --> 00:39:49.555
can solve this problem for us. Okay, good,

00:39:49.555 --> 00:39:53.655
you've got a deal. But, um, so...

00:39:53.655 --> 00:39:56.395
So, but this is great, right? Because now we saw that

00:39:56.395 --> 00:40:04.075
you're looking at perception as an agenda-driven, active process, with

00:40:04.075 --> 00:40:06.995
eye movements as the measure. But then I could say, yeah, that's

00:40:06.995 --> 00:40:10.575
all really nice, Dana, but I think you're a bit too vision-centric, because

00:40:10.575 --> 00:40:14.335
I'm not flapping my ears around in a similar way, right?

00:40:14.595 --> 00:40:19.415
My ears have sort of almost omnidirectional access to the auditory world,

00:40:19.495 --> 00:40:20.835
to the sound pressure waves.

00:40:21.415 --> 00:40:27.555
So I don't have that problem. So I don't have to have the same kind of selectivity

00:40:27.555 --> 00:40:33.215
in sucking in the information, because I don't have something like an auditory fovea, right?

00:40:33.615 --> 00:40:35.815
But I have to say…

00:40:38.744 --> 00:40:42.884
So, your listeners may not know that you're a relatively young person.

00:40:43.104 --> 00:40:47.224
And so, if you were an old person, you would answer that question completely

00:40:47.224 --> 00:40:50.444
differently because… So, I lost my ticket to Texas now.

00:40:50.764 --> 00:40:53.964
No, no, no. We have old people. Old people can come there too.

00:40:54.204 --> 00:40:56.004
Okay, good. I mean young people.

00:40:56.424 --> 00:41:04.664
But I think you'll remember that in the auditory system, one nice illustration

00:41:04.664 --> 00:41:06.544
is the cocktail party effect.

00:41:06.544 --> 00:41:10.344
where, if you're at a crowded cocktail party and someone calls out your name,

00:41:10.504 --> 00:41:12.704
you immediately recognize it.

00:41:12.824 --> 00:41:17.564
And so with the auditory system, you really have something like the visual problem,

00:41:17.664 --> 00:41:25.544
in that, in a situation with lots and lots of background noise, you can tune

00:41:25.544 --> 00:41:29.884
in to one particular speaker and one particular conversation.

00:41:29.884 --> 00:41:35.264
And so it's not exactly the same as vision, but it shares a lot of the same

00:41:35.264 --> 00:41:39.264
abstract features, in that, in the visual world,

00:41:39.404 --> 00:41:44.224
you want to pick out a place where the object of interest is.

00:41:44.324 --> 00:41:49.764
In the auditory world, you want to pick a part of the auditory spectrum where

00:41:49.764 --> 00:41:52.524
the actual voice signal is being generated.

00:41:52.804 --> 00:41:57.704
And so at an abstract level, some of the problems are quite similar.

00:41:57.704 --> 00:42:04.264
And also, you have some ability, since you have two ears and the auditory signal

00:42:04.264 --> 00:42:08.144
is different depending on the location in space it's coming from,

00:42:08.244 --> 00:42:13.704
you have some ability to filter the signal by where it's coming from in space.

00:42:14.084 --> 00:42:16.784
And so, there are similarities.

00:42:17.244 --> 00:42:23.584
And so, in an agenda where there's multiple speakers, that would be like multitasking.

00:42:23.744 --> 00:42:28.204
Right. Okay, so that means, so you would say, look, maybe the implementation

00:42:28.204 --> 00:42:30.844
might be somewhat different because we're not flapping our ears around.

00:42:31.184 --> 00:42:34.664
But in both cases, you have an information bottleneck.

00:42:34.744 --> 00:42:39.824
It's like the central processing of information about the world is restricted, right?

00:42:39.904 --> 00:42:44.744
So there's a limited bandwidth and you have to pre-filter in some sense what you allow to enter.

00:42:44.964 --> 00:42:48.144
And that is done on the basis of this agenda that you're driving.

00:42:48.144 --> 00:42:55.144
And would you say this generalizes to olfaction or somatosensory information too? Right.

00:42:56.073 --> 00:43:00.473
Yes. I mean, I'm not an expert in any of the other senses,

00:43:00.533 --> 00:43:09.153
but I think it holds for the other ones, certainly the haptic sense of, for example, your skin.

00:43:09.573 --> 00:43:14.333
Everybody in Texas has had the sensation of ants crawling on them.

00:43:14.333 --> 00:43:20.393
And we do have fire ants where they're trained to bite simultaneously.

00:43:20.613 --> 00:43:26.113
And a lot of people are allergic and they need to carry an EpiPen with them.

00:43:26.213 --> 00:43:30.253
And so when they get bitten, they have to inject themselves right away.

00:43:31.353 --> 00:43:35.873
But there you have the same kind of issue. Let's leave the fire

00:43:35.873 --> 00:43:38.213
ant out of it and just take a regular ant.

00:43:38.333 --> 00:43:42.393
A regular ant can crawl all over your skin unnoticed.

00:43:42.393 --> 00:43:45.273
Or you may be able to sense where it

00:43:45.273 --> 00:43:47.953
is on some part of your skin. And so there you have the

00:43:47.953 --> 00:43:53.013
same issue: you have a space, the skin covering your body, and

00:43:53.013 --> 00:43:56.833
then there's a location on it where something of interest is happening. And, you

00:43:56.833 --> 00:44:00.993
know, some of these things become rather similar. Olfaction

00:44:00.993 --> 00:44:06.813
and taste, um, maybe we have to give them special, um, special

00:44:06.993 --> 00:44:09.793
exemptions. Right, exactly. No, but this

00:44:09.793 --> 00:44:12.653
is very good, right? Because this would imply

00:44:12.653 --> 00:44:17.173
that from this perspective of uncertainty reduction, you can really have

00:44:17.173 --> 00:44:20.853
a modality-independent view of this process. This is the power of what

00:44:20.853 --> 00:44:24.613
you're doing, so this is very good. Absolutely. So we're not stuck with vision

00:44:24.613 --> 00:44:31.093
alone. No, no, no. Very good. So now that you've solved the vision problem,

00:44:32.633 --> 00:44:36.233
then you thought, okay, now I've solved vision,

00:44:36.233 --> 00:44:40.553
let's do motor control, right? So you jumped into motor control. Why

00:44:40.553 --> 00:44:47.873
did that happen? Well, I moved to Texas, and when we moved the lab to Texas,

00:44:47.873 --> 00:44:55.933
we had to recreate it, and that's quite expensive. So the University of Texas at Austin

00:44:56.093 --> 00:45:03.813
gave us a very nice package that allowed us to recreate the lab in a new location.

00:45:04.893 --> 00:45:11.193
But in that move, I made a personal decision that we should start taking more

00:45:11.193 --> 00:45:15.993
risks rather than continue exactly on the same line that we were doing.

00:45:16.213 --> 00:45:21.373
And so we thought that we would take a look at moving the body,

00:45:21.433 --> 00:45:24.373
not just the eyes. What if you move the rest of the body?

00:45:25.413 --> 00:45:30.213
What would that look like computationally? So what we do is in this embodied cognition

00:45:30.213 --> 00:45:36.073
field, which is rather strange because you build these computer models of the human.

00:45:36.393 --> 00:45:41.753
And you try to respect a lot of the knowledge coming out of neuroscience and psychology.

00:45:41.853 --> 00:45:44.513
But within those boundaries, you try to

00:45:45.406 --> 00:45:50.886
find what the principles are, like in the wonderful summary you just did of what

00:45:50.886 --> 00:45:52.946
we're about in terms of getting abstract models.

00:45:53.546 --> 00:45:57.866
And so we thought we'd try to do that for motor control. And we've started.

00:45:57.886 --> 00:46:00.646
It's just all new and different.

00:46:00.906 --> 00:46:06.366
We have what's called mocap, a motion capture system, that allows us to capture

00:46:06.366 --> 00:46:11.646
the movement of an entire body making arbitrary poses and motions.

00:46:11.646 --> 00:46:17.566
And then we can convert that to a skeletal representation.

00:46:17.806 --> 00:46:23.166
So we have this elaborate computer software package developed at Stanford, OpenSim.

00:46:23.986 --> 00:46:29.726
And that allows you to have what we can loosely characterize as half a human.

00:46:29.826 --> 00:46:36.446
So a human has 600 muscles and what they call 300 degrees of freedom. Let's call them joints.

00:46:37.146 --> 00:46:42.846
And the software package has just half of that. So 300 possible muscles and

00:46:42.846 --> 00:46:47.026
150 possible joints or degrees of freedom, as we call them.

00:46:47.606 --> 00:46:55.826
And so we've been working with that to try to come up with abstract characterizations of movements.

00:46:56.026 --> 00:47:00.326
And we're making some headway, we think. Okay.

00:47:00.446 --> 00:47:05.646
But now, in looking at this motor control issue, you drew a parallel between,

00:47:05.766 --> 00:47:11.566
let's say, your standard robot, or humanoid robot, and a cat in this case.

00:47:11.806 --> 00:47:18.146
Right. So, what are, let's say, the obvious differences between these two?

00:47:18.326 --> 00:47:20.686
Like, what makes a robot so different from a cat?

00:47:23.546 --> 00:47:29.806
Well, it's a delicate thing to explain, but it centers around the notion of abstraction.

00:47:30.326 --> 00:47:34.606
And the easiest way to start is with Mozart.

00:47:34.966 --> 00:47:42.506
So if you take sheet music, everything Mozart ever wrote fits on one CD.

00:47:42.926 --> 00:47:48.666
But, of course, he was very prolific and he wrote hundreds of things.

00:47:48.666 --> 00:47:54.946
And so if you record all that music as sound rather than as sheet music,

00:47:55.086 --> 00:47:56.906
it'll take many, many, many CDs.

00:47:57.206 --> 00:48:01.106
And so the thought is, so what's the difference? It's because,

00:48:01.226 --> 00:48:06.806
let's think in terms of the piano: the piano has keys,

00:48:06.906 --> 00:48:10.966
but then each key plays a complicated note.

00:48:11.206 --> 00:48:15.906
And so playing the note per se gets all these high-frequency vibrations going.

00:48:16.086 --> 00:48:19.766
And that's expensive to reproduce, whereas for the note itself,

00:48:19.826 --> 00:48:22.266
it's very easy to write down the symbol that will do that.

00:48:22.426 --> 00:48:27.946
So in the same way, the cat, you mentioned the cat, and also humans,

00:48:28.026 --> 00:48:30.126
for that matter, have the spinal cord.

00:48:30.426 --> 00:48:34.446
And so, before you even get to the brain, inside the spinal cord are what are called

00:48:34.446 --> 00:48:39.286
pattern generators that, very loosely... Yes.

00:48:41.639 --> 00:48:46.319
Loosely, they could be put in correspondence with a piano. So basically your spinal

00:48:46.319 --> 00:48:52.539
cord is playing the role of the piano and then your brain then just has to have

00:48:52.539 --> 00:48:54.679
the sheet music and play the spinal cord.

00:48:54.859 --> 00:48:59.459
And so that's the insight behind the particular path we're going down and looking into.

00:48:59.699 --> 00:49:05.639
Whereas if we take a conventional robot, most conventional robots do not have this concept.

00:49:05.859 --> 00:49:10.879
And so basically there's no concept of sheet music; you are actually creating

00:49:10.879 --> 00:49:15.779
the notes with very, very high performance silicon computation.

00:49:16.419 --> 00:49:20.279
But then, so if you look at this encapsulation, because it's basically what you're saying, right?

00:49:20.339 --> 00:49:24.699
So you have a basic functionality encapsulated at your spinal cord level.

00:49:25.099 --> 00:49:28.039
Let's say I might have something like these force fields I'm controlling on

00:49:28.039 --> 00:49:31.339
my spinal cord, so I can move the effectors in some space.

00:49:31.799 --> 00:49:36.499
And then how many piano players are you considering above that?

00:49:36.579 --> 00:49:39.399
Is that one? Because, for instance,

00:49:39.399 --> 00:49:42.499
the brainstem might be pushing these keys, and then the frontal

00:49:42.499 --> 00:49:45.139
areas might be pushing the keys of the brainstem, et cetera, right? So

00:49:45.139 --> 00:49:49.379
how many layers of piano players would you consider? Well, roughly,

00:49:49.379 --> 00:49:52.619
I would say, let's make one more distinction

00:49:52.619 --> 00:49:59.419
before we get there. And so if you want a movement that's repetitive, that's

00:49:59.419 --> 00:50:05.459
part of all your movements. So one thing in the spinal cord, I should add

00:50:05.459 --> 00:50:10.739
one little elaboration that will make things simpler, is that rather than have a different...

00:50:10.859 --> 00:50:13.519
what's the size of the piano we could be arguing?

00:50:13.879 --> 00:50:18.819
And mathematically, there'd be a way of basically playing chords so that you

00:50:18.819 --> 00:50:21.019
can have some basic chords,

00:50:21.082 --> 00:50:24.202
keys, frequencies, and then combine them in many, many different ways.

00:50:24.342 --> 00:50:28.622
And so you can get a huge spectrum of movements rather cheaply in that regard.

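NOTE
A generic illustration of the chord idea, not a model of real spinal pattern generators: a few basis patterns (assumed here to be sinusoids at fixed frequencies) act as keys, and a movement is a weighted sum of them, so changing only the weights yields a different movement. The frequencies, weights and function names are hypothetical.
```python
import math
def primitive(freq_hz, t):
    """One 'key': a basic rhythmic pattern, assumed here to be a sinusoid."""
    return math.sin(2 * math.pi * freq_hz * t)
def movement(chord, times):
    """One 'chord': a weighted sum of primitives, read as a joint-angle trajectory."""
    return [sum(weight * primitive(f, t) for f, weight in chord.items()) for t in times]
times = [i / 100 for i in range(101)]  # one second sampled at 100 Hz
stride = movement({1.0: 0.5, 2.0: 0.1}, times)  # hypothetical weights for a gait-like rhythm
wave = movement({0.5: 0.8, 3.0: 0.05}, times)  # different keys and weights, different movement
print(len(stride), round(max(stride), 3), round(max(wave), 3))
```
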
00:50:29.002 --> 00:50:35.682
But another problem you face is that when you put on your backpack to go home

00:50:35.682 --> 00:50:39.222
as a student, then you've changed where your center of gravity is.

00:50:39.362 --> 00:50:42.682
So all the movements that are carefully designed to keep you from falling over

00:50:43.522 --> 00:50:45.862
won't work unless they get some help.

00:50:46.102 --> 00:50:50.722
And so you have a part of your brain that's much older than the forebrain,

00:50:50.722 --> 00:50:52.662
called the cerebellum.

00:50:52.702 --> 00:50:57.542
And it's been shown that the job of the cerebellum

00:50:57.542 --> 00:51:00.802
is just to put these temporary adaptations in place.

00:51:01.022 --> 00:51:06.662
And so one big thing you have to solve before we get to the question you raised is load balancing.

00:51:06.982 --> 00:51:10.662
So something about your body has changed. Maybe

00:51:10.662 --> 00:51:16.142
you're swimming in the ocean, or you've got a backpack on, or something strange

00:51:16.142 --> 00:51:17.342
has happened, and that has to be adjusted for.

00:51:17.602 --> 00:51:21.882
That's the cerebellum. So once the cerebellum is providing that capability,

00:51:22.342 --> 00:51:29.802
then we come to the task that you put to us, and that is that somehow, if

00:51:29.802 --> 00:51:34.502
you have a movement, like getting the kettle off the stove or whipping up an

00:51:34.502 --> 00:51:36.042
omelet, for those movements,

00:51:36.282 --> 00:51:42.682
some coordinates in the world have to be translated into coordinates in the body.

00:51:42.682 --> 00:51:45.402
And so the forebrain is really in charge of that.

00:51:45.522 --> 00:51:48.062
And we don't know how this is done.

00:51:48.222 --> 00:51:52.242
Nobody knows how this is done. But the idea is that

00:51:52.282 --> 00:51:56.742
the forebrain's job is somehow to do this translation.

00:51:56.962 --> 00:52:03.162
So somehow the newer forebrain has to take the task requirements and put them

00:52:03.162 --> 00:52:05.402
into some sheet music-like code.

00:52:05.662 --> 00:52:10.042
Right. And we know this isn't easy because the forebrain itself is organized

00:52:10.042 --> 00:52:15.102
into layers. So you start out with basic ideas and then you put more and more

00:52:15.102 --> 00:52:16.622
abstract layers on top of it.

00:52:16.682 --> 00:52:19.702
So the code for these movements is not going to be easy to find out.

00:52:19.722 --> 00:52:23.502
But at least we have some ideas on how it might be organized. Right.

00:52:24.198 --> 00:52:28.458
So the model would be a little bit like this: I have my piano keys in my spinal cord.

00:52:28.858 --> 00:52:33.258
Then I have, let's say, some sort of background modulator, which is the cerebellum,

00:52:33.258 --> 00:52:35.198
that keeps this all a little bit calibrated.

00:52:35.318 --> 00:52:38.338
And then you have your piano player in a forebrain structure pushing specific

00:52:38.338 --> 00:52:41.038
keys so you can actually walk the stairs, something like this.

00:52:41.058 --> 00:52:42.018
That would be the model. Yes.

00:52:42.558 --> 00:52:49.678
Yes. But then it's almost like you're an expert musician. So you can somehow

00:52:49.678 --> 00:52:55.918
chunk huge pieces of music into one common code that you remember.

00:52:56.138 --> 00:52:58.318
So you can kind of economize it.

00:52:58.658 --> 00:53:07.058
It's almost like any other kind of athlete, who has huge sections of body

00:53:07.058 --> 00:53:09.598
movements they can repeat from

00:53:09.598 --> 00:53:13.038
memory without thinking about it, because they're sort of coded by rote.

00:53:13.038 --> 00:53:19.698
And so the forebrain is sort of the generator of these kinds of more and more advanced codes.

00:53:20.018 --> 00:53:23.378
Yeah, but so now we have this multi-stage system.

00:53:25.018 --> 00:53:29.518
But if you now go back to the robotics example, there's actually a very

00:53:29.518 --> 00:53:32.698
standard procedure for controlling movement.

00:53:32.838 --> 00:53:39.298
Because you would say, okay, look, I want to move the endpoint to some XYZ position in space.

00:53:39.298 --> 00:53:42.018
And now I just need some

00:53:42.018 --> 00:53:45.438
sort of inverse model that

00:53:45.438 --> 00:53:48.418
maps that to the forces I have to apply to my joints,

00:53:48.418 --> 00:53:51.998
and then I move my endpoint in space. That, however, just

00:53:51.998 --> 00:53:55.258
turns out to be a tricky problem, right? But

00:53:55.258 --> 00:53:58.478
do you see this division between spinal

00:53:58.478 --> 00:54:01.238
cord, cerebellum, and forebrain, as we just sketched it,

00:54:01.238 --> 00:54:04.418
as mapping onto the division that engineers

00:54:04.418 --> 00:54:08.078
are using in the control of robots, or is

00:54:08.078 --> 00:54:11.658
that another kind of solution? It's another

00:54:11.658 --> 00:54:14.658
kind of solution. I think it's a huge, huge fork

00:54:14.658 --> 00:54:17.458
in the road. And I don't know if this would help,

00:54:17.458 --> 00:54:22.458
but we could think of trigonometry. So if we take the function, the sine of an

00:54:22.458 --> 00:54:26.418
angle: we have an angle, we want its sine. And so there are two ways in

00:54:26.418 --> 00:54:31.138
computers to do that. One is you could take small divisions

00:54:31.138 --> 00:54:36.258
of angle and pre-compute the sine for each one of those. So you can have a table,

00:54:36.398 --> 00:54:40.638
an elaborate table. You supply the angle.

00:54:41.078 --> 00:54:44.718
I get that angle, and I go to my table, and I look up: what's the sine?

00:54:45.443 --> 00:54:50.603
So that table takes up a lot of space, but it's very fast.

00:54:51.803 --> 00:54:55.583
And the other way to do it is I could use a series expansion.

00:54:55.843 --> 00:54:59.323
So for sine, what's the sine of theta?

00:54:59.483 --> 00:55:06.923
Well, it's theta minus theta cubed over factorial three plus theta to the fifth over

00:55:06.923 --> 00:55:09.643
factorial five, and so on. I think I'm remembering the formula correctly.

00:55:10.303 --> 00:55:15.883
But you can write out a formula for it. And then every time you want to know

00:55:15.883 --> 00:55:19.743
the sine of an angle, you could just run it through this summation,

00:55:19.903 --> 00:55:22.863
which you can roll out to whatever accuracy you want.

00:55:23.063 --> 00:55:28.103
So let's summarize the two approaches.

00:55:28.283 --> 00:55:32.963
One is you have your table that has the fast lookup. And then the other one

00:55:32.963 --> 00:55:34.763
is you compute it as needed.

00:55:34.903 --> 00:55:38.563
But the computation is much more extensive: one table lookup versus

00:55:38.663 --> 00:55:41.763
many, many terms to be computed and added up together.
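The two routes can be put side by side in a few lines. This is a sketch under stated assumptions: the table resolution (4096 entries per full turn) and the number of series terms (10) are arbitrary illustrative choices, the table is filled once using Python's `math.sin`, and the function names are hypothetical. The series is the Taylor series sin(theta) = theta - theta^3/3! + theta^5/5! - ..., evaluated after reducing theta to [-pi, pi].

```python
import math

TABLE_SIZE = 4096                      # illustrative resolution: entries per full turn
TABLE_STEP = 2 * math.pi / TABLE_SIZE  # radians between neighbouring entries

# Route 1: pay in memory. Pre-compute once (here with math.sin); afterwards
# each query is one index calculation and one lookup.
SINE_TABLE = [math.sin(i * TABLE_STEP) for i in range(TABLE_SIZE)]


def sin_lookup(theta):
    """Nearest-entry table lookup; the error is at most about TABLE_STEP / 2."""
    i = round((theta % (2 * math.pi)) / TABLE_STEP) % TABLE_SIZE
    return SINE_TABLE[i]


# Route 2: pay in processing. Nothing is stored; sum the series on demand.
def sin_series(theta, terms=10):
    """First `terms` terms of theta - theta^3/3! + theta^5/5! - ..."""
    theta = math.remainder(theta, 2 * math.pi)  # reduce to [-pi, pi] for fast convergence
    term, total = theta, 0.0
    for n in range(terms):
        total += term
        # Multiply by -theta^2 / ((2n+2)(2n+3)) to get the next term of the series.
        term *= -theta * theta / ((2 * n + 2) * (2 * n + 3))
    return total
```

The table answers with a single read but stores 4096 numbers; the series stores nothing but does about ten multiply-adds per query. That is the memory-versus-processing trade-off in the conversation.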

00:55:41.763 --> 00:55:46.183
And so you could think of the robotics approach,

00:55:46.343 --> 00:55:51.243
or at least one interpretation of it, as very much the series-expansion

00:55:51.243 --> 00:55:56.983
mode, the computationally intensive mode, and we can go down that road because

00:55:56.983 --> 00:55:58.243
the computers are so fast.

00:55:58.283 --> 00:56:01.063
We have a lot of cycles, so it's tempting to use them.
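For comparison, the "compute it as needed" robotics route (choose an XYZ endpoint, then run an inverse model to get joint commands) can be sketched for the simplest possible arm. Assumptions: a planar two-link arm with hypothetical unit link lengths, kinematics only (no forces, torques, or dynamics), and Newton iteration on the arm's Jacobian as one of several standard ways to invert the kinematics. Every new target reruns the whole loop, which is the series-expansion style of solution rather than a table lookup.

```python
import math

L1, L2 = 1.0, 1.0  # hypothetical link lengths of a planar two-link arm


def forward(q1, q2):
    """Forward kinematics: joint angles (rad) -> endpoint (x, y)."""
    x = L1 * math.cos(q1) + L2 * math.cos(q1 + q2)
    y = L1 * math.sin(q1) + L2 * math.sin(q1 + q2)
    return x, y


def inverse(x_goal, y_goal, q=(0.3, 0.3), iters=500, tol=1e-9):
    """Numerical inverse kinematics by Newton iteration on the 2x2 Jacobian.

    Recomputed from scratch for every target: the 'compute it as needed' route.
    """
    q1, q2 = q
    for _ in range(iters):
        x, y = forward(q1, q2)
        ex, ey = x_goal - x, y_goal - y
        if ex * ex + ey * ey < tol:
            break
        # Partial derivatives of forward() with respect to q1 and q2.
        j11 = -L1 * math.sin(q1) - L2 * math.sin(q1 + q2)
        j12 = -L2 * math.sin(q1 + q2)
        j21 = L1 * math.cos(q1) + L2 * math.cos(q1 + q2)
        j22 = L2 * math.cos(q1 + q2)
        det = j11 * j22 - j12 * j21  # equals L1 * L2 * sin(q2)
        if abs(det) < 1e-12:
            raise ValueError("arm is at a singular configuration")
        # Solve J @ dq = error using the closed-form 2x2 inverse.
        dq1 = (j22 * ex - j12 * ey) / det
        dq2 = (-j21 * ex + j11 * ey) / det
        q1, q2 = q1 + dq1, q2 + dq2
    return q1, q2


target = forward(0.5, 0.6)      # a reachable endpoint
q1, q2 = inverse(*target)       # joint angles that reach it
```

Starting the iteration from a fully stretched arm (`q2 = 0`) fails at once because the Jacobian is singular there. Handling such cases, joint limits, and the dynamics that turn joint angles into forces is part of why this is a tricky problem.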

00:56:01.063 --> 00:56:06.003
Whereas you can think of the human body as not having the cycles

00:56:06.003 --> 00:56:11.443
because the neural hardware is over a million times slower than silicon,

00:56:11.683 --> 00:56:15.443
so the human has to use these table-lookup approaches.

00:56:15.863 --> 00:56:22.363
And furthermore, the table lookups are sort of burned in over evolution and also development.

00:56:22.423 --> 00:56:26.603
Each human starts out not knowing how to make movements, and then you sort of

00:56:26.603 --> 00:56:30.003
rather painstakingly fall on the floor, learn to crawl, learn to raise your

00:56:30.003 --> 00:56:30.943
head, et cetera, et cetera.

00:56:31.063 --> 00:56:35.063
And so you burn them in, really over years; you're willing to take the

00:56:35.063 --> 00:56:38.443
time to develop the movements you'll need as an adult. So it's like you're trading

00:56:38.443 --> 00:56:40.543
off memory versus processing. Exactly.

00:56:40.783 --> 00:56:43.443
Exactly. It's exactly right. So in your...

00:56:43.931 --> 00:56:48.171
You would think that, in this division of the brain that you just made,

00:56:49.011 --> 00:56:52.951
it actually relies on memory to perform this task efficiently,

00:56:53.251 --> 00:56:56.091
as opposed to computing this all the time, right?

00:56:56.271 --> 00:57:00.711
And these ideas have been around a long time. They're not new.

00:57:00.911 --> 00:57:05.871
And Chris Atkeson had them early on, at MIT and then Georgia Tech,

00:57:05.891 --> 00:57:14.931
amongst others. But a lot of math has been developed to come up with more compact codes.

00:57:15.171 --> 00:57:20.071
So the codes for these tables are becoming cheaper and cheaper, so that it's

00:57:20.071 --> 00:57:26.091
becoming more plausible to conceptualize them as the way humans might be doing it.

00:57:26.191 --> 00:57:30.551
So what's the benchmark you would like to put out there, also for the roboticists,

00:57:30.551 --> 00:57:35.891
against which you would also compare your own solutions in this kind of motor control task?

00:57:36.151 --> 00:57:39.351
What would be a benchmark that you think is plausible and convincing?

00:57:41.631 --> 00:57:49.151
Well, I have a graduate student, Joseph Cooper, and his benchmark is he's making a racquetball player.

00:57:49.691 --> 00:57:56.551
And so, to get his PhD, he'll have to have a racquetball player that he can play against.

00:57:56.871 --> 00:58:00.431
In the physical world? In the virtual world.

00:58:00.471 --> 00:58:10.231
In the virtual world. So an avatar of him will play against another avatar.

00:58:10.431 --> 00:58:12.691
Right. And may the best avatar win.

00:58:12.991 --> 00:58:15.051
Okay. I agree with that.

00:58:16.071 --> 00:58:18.851
So Dana, I mean, you've been around in this business for a while,

00:58:19.071 --> 00:58:25.091
and made incredible progress on understanding perception and active vision.

00:58:25.931 --> 00:58:31.171
What would you see as Dana's law? What's Dana's law that we should adhere to

00:58:31.171 --> 00:58:32.711
in understanding the brain?

00:58:35.631 --> 00:58:45.271
Well, I think that as an academic, it's hard to have a law because no one obeys you.

00:58:45.851 --> 00:58:49.831
That's the nature of your calling. In an ideal world,

00:58:50.191 --> 00:58:55.731
everybody would obey Dana's law. That's right. Well, one thing I do think

00:58:55.731 --> 00:59:00.111
is sorely needed in understanding the brain, and we'd all like to understand

00:59:00.111 --> 00:59:04.291
the brain, is an idea that is

00:59:05.023 --> 00:59:07.823
totally essential to silicon computation, and that's abstraction.

00:59:08.663 --> 00:59:14.643
So we can't even think about computation without thinking of different layers of abstraction.

00:59:14.863 --> 00:59:19.803
So you have the operating system that runs your program. You have your program

00:59:19.803 --> 00:59:21.643
that's written in a high-level language.

00:59:21.903 --> 00:59:25.323
That program gets translated into machine language.

00:59:25.663 --> 00:59:30.663
That program gets translated into microcode that the particular machine architecture

00:59:30.663 --> 00:59:34.303
can understand, and that code runs on gates, right?

00:59:34.363 --> 00:59:38.223
The gates are composed of layers of silicon, and it keeps going and going.
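The stacking of layers can be shown in miniature. In this hypothetical sketch, where all function names are made up for illustration, NAND is the only primitive. Every other gate is defined from it, a one-bit full adder is defined from those gates, and integer addition is defined from the adder. Code at each layer uses only the layer directly below it, which is the kind of layering Ballard argues we need when describing the brain.

```python
def nand(a, b):
    """Bottom layer: the single primitive gate (bits are 0 or 1)."""
    return 1 - (a & b)


# Layer 2: the other gates, defined only in terms of NAND.
def not_(a):
    return nand(a, a)


def and_(a, b):
    return not_(nand(a, b))


def or_(a, b):
    return nand(not_(a), not_(b))


def xor(a, b):
    return and_(or_(a, b), nand(a, b))


# Layer 3: a one-bit full adder, built from layer-2 gates.
def full_adder(a, b, carry_in):
    total = xor(xor(a, b), carry_in)
    carry_out = or_(and_(a, b), and_(carry_in, xor(a, b)))
    return total, carry_out


# Layer 4: integer addition, which never mentions NAND at all.
def add(x, y, width=8):
    result, carry = 0, 0
    for i in range(width):
        bit, carry = full_adder((x >> i) & 1, (y >> i) & 1, carry)
        result |= bit << i
    return result  # wraps modulo 2**width, like fixed-width hardware
```

Someone calling `add(100, 27)` needs to know nothing about gates, just as a high-level program needs to know nothing about microcode.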

00:59:38.403 --> 00:59:43.083
And so if we didn't have this essential concept of layers of abstraction, we'd be stuck.

00:59:43.363 --> 00:59:50.623
And Allen Newell, really, he articulated this most elegantly in his book,

00:59:50.783 --> 00:59:52.403
Unified Theories of Cognition.

00:59:52.763 --> 00:59:56.263
But I think that's missing. And I think what we really need,

00:59:56.383 --> 00:59:58.443
and what the math developments,

00:59:58.663 --> 01:00:02.963
all the math modeling and machine learning, are helping us with,

01:00:03.083 --> 01:00:08.663
is those kinds of concepts in thinking about how the brain works, because it's obvious,

01:00:08.883 --> 01:00:15.443
or at least we think it's very, very necessary for the brain to succeed.

01:00:15.543 --> 01:00:19.723
It has to be somehow organized into layers of abstraction.

01:00:19.903 --> 01:00:23.203
And in this coarse sketch, so spinal cord, et cetera, et cetera,

01:00:23.243 --> 01:00:27.663
et cetera, we can come up with good guesses, but when we get to the forebrain, we're not done.

01:00:27.663 --> 01:00:32.283
And I think that's really where the work is: to go into the forebrain and

01:00:32.283 --> 01:00:36.803
try to figure out what are the useful layers of abstraction that the brain is using.

01:00:37.023 --> 01:00:39.283
So Dana's law is abstraction is good for you.

01:00:39.523 --> 01:00:45.163
That's right. Twice a day. Very good. So then to finish up, five years from

01:00:45.163 --> 01:00:49.103
now, probably earlier, but let's say five years from now, I'm going to come

01:00:49.103 --> 01:00:53.023
down to Texas and I'm going to remind you of a hypothesis you generate today.

01:00:53.223 --> 01:00:59.443
So what hypothesis do you feel most passionate about today, that I can

01:00:59.443 --> 01:01:03.803
ask you about five years from now and that will turn out to be verified? Right.

01:01:07.836 --> 01:01:15.456
Well, I'll pick one. One of many hypotheses. Right. And will this be like a beer bet?

01:01:16.996 --> 01:01:20.356
It's more serious than that. More serious? Like a dinner out?

01:01:20.536 --> 01:01:22.316
What are we talking here? That's nothing.

01:01:22.716 --> 01:01:25.456
Come on. We're talking about serious bets here. The housing market?

01:01:25.596 --> 01:01:26.736
Maybe it's recovered by now.

01:01:27.696 --> 01:01:29.176
Now you're talking. Yeah, okay.

01:01:30.616 --> 01:01:34.716
Here's one rather specialized hypothesis that's very important to me.

01:01:34.716 --> 01:01:40.196
And so we talked about the forebrain. And the main memory system of the

01:01:40.196 --> 01:01:41.156
forebrain is the cortex.

01:01:41.956 --> 01:01:46.516
And the neurons in the cortex, they communicate by sending spikes.

01:01:46.816 --> 01:01:51.696
And those spikes are sent at very low rates.

01:01:51.876 --> 01:01:56.456
So somewhere between 10 and 50 spikes per second.

01:01:56.536 --> 01:02:01.716
Those are the communication channels of nerve cells in the brain's main memory

01:02:01.716 --> 01:02:06.956
system. And one of the most popular current hypotheses is rate coding.

01:02:07.316 --> 01:02:13.916
So neurons are trying to send a number to the other neurons they communicate with.

01:02:14.056 --> 01:02:17.616
And one of the astonishing things about the brain's memory system is that

01:02:17.876 --> 01:02:23.116
each cell can talk faithfully to 10,000 other cells.

01:02:23.236 --> 01:02:25.996
And this is a feat that silicon can't come close to.

01:02:26.096 --> 01:02:30.376
And so the thought is that what these other cells want to know

01:02:30.376 --> 01:02:33.936
is what this number is, and they estimate it by counting spikes.

01:02:34.376 --> 01:02:41.236
And so I just think that for a variety of reasons, this is untenable because

01:02:41.236 --> 01:02:44.856
you just can't communicate fast enough to run programs.

01:02:45.116 --> 01:02:52.696
And the so-called rate-coding hypothesis is actually picking up a correlate of

01:02:52.696 --> 01:02:56.376
different kinds of codes that the neurons actually use.
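The speed argument against rate coding can be made concrete with a little arithmetic. This sketch assumes Poisson spiking, a common modelling assumption that is not stated in the interview. Under that assumption the spike count in a window of length T has mean and variance r*T, so a rate estimated by counting has a relative error of about 1/sqrt(r*T). The function names are hypothetical.

```python
import math


def rate_code_stats(rate_hz, window_s):
    """Expected spike count, and relative error of the counted rate, for Poisson spiking."""
    mean_count = rate_hz * window_s
    # Poisson: variance equals the mean, so std / mean = 1 / sqrt(mean).
    # (Only a rough guide when the expected count is near or below one.)
    rel_error = 1.0 / math.sqrt(mean_count)
    return mean_count, rel_error


def window_for_precision(rate_hz, rel_error):
    """Counting window (s) needed to reach a given relative error at a given rate."""
    return 1.0 / (rate_hz * rel_error ** 2)


for rate in (10.0, 50.0):                       # cortical rates mentioned in the interview
    count, err = rate_code_stats(rate, 0.025)   # one 40 Hz gamma cycle as the window
    need = window_for_precision(rate, 0.10)     # window for roughly 10% precision
    print(f"{rate:4.0f} Hz: {count:.2f} spikes per 25 ms, error {err:.0%}; "
          f"10% precision needs {need:.1f} s")
```

At 50 Hz a 25 ms window holds on average 1.25 spikes, and reaching roughly 10% precision takes about 2 s of counting; at 10 Hz it takes about 10 s. Those windows are far longer than the fraction of a second in which a single step of behaviour typically unfolds.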

01:02:56.576 --> 01:03:00.696
And so five years from now, this will be generally believed to be true.

01:03:00.916 --> 01:03:09.956
And a wonderful substitute is that the brain uses some kind of latency code.

01:03:10.176 --> 01:03:18.076
So each little agenda task has a clock, at what's called the gamma frequency,

01:03:18.076 --> 01:03:21.076
somewhere between 30 and 90 hertz.

01:03:21.396 --> 01:03:28.076
And each agenda task gets a frequency, and then the spikes can send a little

01:03:28.076 --> 01:03:33.776
analog number by delaying their spike with respect to this clock pulse.

01:03:33.896 --> 01:03:36.456
So if you're right on the clock pulse, you're a big number.

01:03:37.954 --> 01:03:42.914
If you're delayed, you're a smaller number. And so the actual spikes are actually numbers.

01:03:43.074 --> 01:03:48.634
And if you can do that, then all of a sudden the doors are open for a lot of fast communication.
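The latency code described here can be sketched as an encoder and decoder sharing a clock. The assumptions are mine, apart from the 30 to 90 Hz range given in the interview: a 40 Hz clock, values normalized to the interval (0, 1], and a linear mapping from value to delay, with zero delay (right on the clock pulse) meaning the largest value. Function and constant names are hypothetical.

```python
GAMMA_HZ = 40.0           # illustrative clock, inside the 30-90 Hz gamma band
CYCLE_S = 1.0 / GAMMA_HZ  # one clock period: 25 ms


def encode(value, cycle_start_s=0.0):
    """Spike time for `value` in (0, 1]: on the pulse means 1.0, later means smaller."""
    if not 0.0 < value <= 1.0:
        raise ValueError("value must be in (0, 1]")
    delay_s = (1.0 - value) * CYCLE_S  # linear value-to-delay map (an assumption)
    return cycle_start_s + delay_s


def decode(spike_time_s, cycle_start_s=0.0):
    """Read the value back from the spike's delay after the clock pulse."""
    delay_s = spike_time_s - cycle_start_s
    return 1.0 - delay_s / CYCLE_S


# One spike per cycle carries one graded number: three cycles, three values.
values = [0.9, 0.25, 0.6]
spikes = [encode(v, k * CYCLE_S) for k, v in enumerate(values)]
decoded = [decode(t, k * CYCLE_S) for k, t in enumerate(spikes)]
```

Here a single spike per 25 ms cycle is enough to transmit a value. How finely neurons can actually time their spikes, which sets the real precision of such a code, is outside what this sketch models.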

01:03:49.134 --> 01:03:54.474
So if you want a prediction, that's my prediction, that five years from now, a spike will be a number.

01:03:54.694 --> 01:03:58.294
Okay, very good. Thank you. So, Dana Ballard, thank you very much for this conversation.

01:03:58.714 --> 01:04:01.554
Oh, I was delighted to be here, Paul. You and I have been friends for a long

01:04:01.554 --> 01:04:03.294
time, and it was fun to do this, too.

01:04:03.294 --> 01:04:09.414
The CSN podcast was produced by the Convergent Science Network of Biomimetics

01:04:09.414 --> 01:04:16.034
and Biohybrid Systems, a project funded by the European Union's Seventh Framework Programme.

01:04:17.360 --> 01:04:44.094
Music.