WEBVTT

00:00:03.617 --> 00:00:06.517
This is the Convergent Science Network podcast.

00:00:08.537 --> 00:00:13.057
Leading researchers in the domain of neuroscience, brain theory and technology

00:00:13.057 --> 00:00:16.797
are interviewed by Paul Verschure and Tony Prescott.

00:00:19.797 --> 00:00:23.277
This is Paul Verschure with the Convergent Science Network podcast.

00:00:24.397 --> 00:00:29.077
And today I'm talking with Peter Gärdenfors, who's one of our speakers in our summer school.

00:00:30.017 --> 00:00:35.357
And Peter, you presented your work in generalizing a theoretical framework that's

00:00:35.357 --> 00:00:37.697
called conceptual spaces towards language.

00:00:39.557 --> 00:00:42.617
So what's this notion of a conceptual space exactly?

00:00:43.757 --> 00:00:47.497
It's a model of how you represent knowledge. I mean, I'm a cognitive scientist

00:00:47.497 --> 00:00:52.457
and I want to understand in some sense how our minds represent knowledge.

00:00:52.577 --> 00:00:53.917
And we have different traditions.

00:00:54.097 --> 00:00:58.557
I mean, you have the symbolic approach of the early AI and then you have the neural networks.

00:00:59.077 --> 00:01:04.717
My conceptual space is something in between, because I work with geometric structures,

00:01:04.837 --> 00:01:13.337
with metrics, distances, vectors, and those kinds of notions to model different kinds of knowledge.

00:01:13.857 --> 00:01:21.077
So, can you give me an example of that? Well, the normal example I use is the color space.

00:01:21.717 --> 00:01:26.117
We perceive colors, but we organize colors along three dimensions.

00:01:26.117 --> 00:01:31.317
I mean, there is the hue, the color circle going red, blue, green, yellow, and so on.

00:01:31.417 --> 00:01:34.817
And then there is the dark and light.

00:01:35.117 --> 00:01:38.817
And then there is the intensity going from gray. So there's the three-dimensional space.

00:01:39.177 --> 00:01:44.557
And that's how we perceive colors. And it's fairly well established in psychophysics

00:01:44.557 --> 00:01:48.277
that we have this spatial arrangement of colors.

00:01:49.337 --> 00:01:54.917
So, but now in some sense, I could say, well, if I talk about an object having

00:01:54.917 --> 00:01:56.817
a certain color, like an apple being red.

00:01:57.644 --> 00:02:01.304
Then this apple might have a number of properties. So it will be round,

00:02:01.484 --> 00:02:03.744
it will have a certain taste, a smell, a color, and so on.

00:02:03.984 --> 00:02:06.184
So how do I bring that together in a conceptual space?

00:02:06.564 --> 00:02:10.864
Well, one of my key notions is that of a domain. I mean, we have several domains

00:02:10.864 --> 00:02:12.184
that we organize the knowledge.

00:02:12.304 --> 00:02:16.364
Color is one, size is another, shape is a third, temperature, and so on.

00:02:16.484 --> 00:02:19.464
We have a lot of, some of them are based on perception, some of them are based

00:02:19.464 --> 00:02:23.124
on action, some are based on our social grounding in the society.

00:02:23.124 --> 00:02:29.344
So, there is this general notion of domain that is used to sort the information.

00:02:29.904 --> 00:02:34.024
So, you would not so much see it as, let's say, dimensions spanning this space,

00:02:34.204 --> 00:02:36.504
but a number of domains, or is it equivalent?

00:02:36.804 --> 00:02:41.084
Each domain could consist of a number of dimensions. The color domain is three-dimensional,

00:02:41.164 --> 00:02:44.884
and the taste domain is four- or five-dimensional.

00:02:45.444 --> 00:02:49.804
So, are these domains like a Kantian prior? You're just born with it?

00:02:51.784 --> 00:02:55.224
Partly. I mean, my theory is neo-Kantian, if

00:02:55.224 --> 00:02:58.044
you really want to bring in the philosophy. Kant said

00:02:58.044 --> 00:03:01.124
we are born with space and time. I say that

00:03:01.124 --> 00:03:06.184
we are disposed to have representations of space; we learn time much later. But

00:03:06.184 --> 00:03:10.124
then, of course, there are many domains that are dependent on us being in a culture

00:03:10.124 --> 00:03:15.264
and learning to discriminate new things and so on. So our set of domains is

00:03:15.264 --> 00:03:19.604
expanding as we grow up in a society, and they can

00:03:19.684 --> 00:03:22.484
change over time. So it's a mixture. If you want, it's a

00:03:22.484 --> 00:03:25.324
mixture. So is there really a

00:03:25.324 --> 00:03:31.364
defined set of prior domains? No, no, no, no. I mean, we are born with certain sensory

00:03:31.364 --> 00:03:36.364
organs that determine some of the basic domains, like color perception, like

00:03:36.364 --> 00:03:42.084
perception of heat and the tastes and so on. But on top of this we build

00:03:42.084 --> 00:03:43.384
a number of domains. I mean,

00:03:43.424 --> 00:03:46.284
we extract other types of information.

00:03:46.824 --> 00:03:52.584
For instance, I mean, I have been emphasizing our perception of forces.

00:03:52.724 --> 00:03:56.224
We are quite good at seeing forces in people's motions and so on.

00:03:56.284 --> 00:03:57.724
Right. Okay. So now we have our domains.

00:03:58.284 --> 00:04:02.044
And then you say, look, if I now see an apple, then I have a number of domains.

00:04:02.264 --> 00:04:08.444
I might have a visual domain and a haptic domain and olfactory domain and a taste domain.

00:04:08.864 --> 00:04:13.064
And now across these domains, apples are a set of data points

00:04:13.364 --> 00:04:14.824
in that multidimensional space.

00:04:15.224 --> 00:04:18.524
So I have a cluster of data points and I say all these data points together,

00:04:18.764 --> 00:04:21.464
this cloud of data points, this is now apple. Yep.

00:04:22.124 --> 00:04:25.444
It's a point in a very high dimensional space, but the space is organized along

00:04:25.444 --> 00:04:26.704
a number of domains. Right, exactly.

00:04:27.084 --> 00:04:31.664
But now, would there be a way to, let's say, reorganize that space?

00:04:31.844 --> 00:04:35.724
Like, for instance, you might sometimes discover that,

00:04:35.764 --> 00:04:37.304
let's say, Santa Claus doesn't exist.

00:04:37.724 --> 00:04:42.684
So suddenly one organizational element of my reality

00:04:42.684 --> 00:04:43.804
has sort of disappeared, and now I

00:04:43.804 --> 00:04:47.304
have to reorganize all my data. So how does it work in a conceptual space?

00:04:47.464 --> 00:04:51.084
Well, first of all, you can add new domains. You can learn about apples having

00:04:51.084 --> 00:04:53.064
nutritional value or something like that.

00:04:53.891 --> 00:04:57.691
What happens quite often in learning new things is that you change the importance

00:04:57.691 --> 00:05:01.591
of domains, which are the most important in making the classifications.

00:05:01.911 --> 00:05:08.651
So in biology, maybe the behavior of an animal was more important when you classified

00:05:08.651 --> 00:05:12.571
something as a fish, but then came the biologists and said, no,

00:05:12.671 --> 00:05:16.051
it's the skeletal structure or how you feed your kids or whatever.

00:05:16.251 --> 00:05:21.951
Other variables that are more important in zoological classification.

00:05:21.951 --> 00:05:24.791
Yeah, but I was more worried about

00:05:24.791 --> 00:05:27.591
the case where I lose a domain. Not about adding domains, but if

00:05:27.591 --> 00:05:30.831
I lose one. That's why I thought about Santa Claus. Yeah,

00:05:30.831 --> 00:05:34.411
but Santa Claus is not a domain. I mean, it's a fictional object.

00:05:34.411 --> 00:05:40.411
Yeah, I don't know any good examples of losing a domain. So, yeah, well, imagine that

00:05:40.411 --> 00:05:44.971
I believe that, let's say, physical objects move in the world

00:05:44.971 --> 00:05:49.391
because they have intentional states. Yeah. And then I discover that actually

00:05:49.391 --> 00:05:51.151
stones don't have intentional states.

00:05:51.271 --> 00:05:56.331
Now suddenly, the way I frame the world is changing. Okay, I would describe

00:05:56.331 --> 00:06:02.351
that as the domain that maybe does not disappear, but you assign it a very low value.

00:06:02.471 --> 00:06:06.751
It becomes a zero in your concept classification.

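NOTE
Editor's sketch, not part of the interview. One possible reading of
"assigning a domain a very low value" is a weight on that domain in a
distance measure. The function name, the two domain names and all numbers
are hypothetical, and the weighted Euclidean form is an assumption chosen
only for illustration.
```python
import math
def weighted_distance(a, b, weights):
    """Weighted Euclidean distance over named domains (illustrative assumption)."""
    return math.sqrt(sum(w * (a[d] - b[d]) ** 2 for d, w in weights.items()))
# Two hypothetical objects described on two made-up one-dimensional domains.
stone = {"size": 1.0, "intention": 0.0}
animal = {"size": 1.0, "intention": 1.0}
# With both domains weighted equally, the two objects are one unit apart.
before = weighted_distance(stone, animal, {"size": 1.0, "intention": 1.0})
# Setting the weight of a domain to zero removes its influence on classification.
after = weighted_distance(stone, animal, {"size": 1.0, "intention": 0.0})
```
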
00:06:07.331 --> 00:06:13.631
To take an example from chemistry, you had some idea of the caloric dimension

00:06:13.631 --> 00:06:16.411
that was used to classify chemical stuff.

00:06:16.731 --> 00:06:18.871
And then suddenly, there was

00:06:18.871 --> 00:06:22.191
a revolution in chemistry and the caloric dimension totally disappeared.

00:06:22.451 --> 00:06:24.531
I mean, you can find examples of that in the history of science.

00:06:24.671 --> 00:06:27.531
In the history of human perception or in human classification,

00:06:27.731 --> 00:06:30.911
it's maybe more difficult to find such changes.

00:06:31.151 --> 00:06:35.091
So now I have this cloud of points in apple space.

00:06:38.443 --> 00:06:43.383
If I understand your proposal, and if you want, the center of gravity of that

00:06:43.383 --> 00:06:46.283
cloud of data points is now my prototype of an apple, right?

00:06:46.463 --> 00:06:51.063
Yes, and that cloud is, I make the assumption that it's a convex region.

00:06:51.363 --> 00:06:57.103
I mean, a concept, apple, corresponds to a convex region, meaning that for any

00:06:57.103 --> 00:07:00.043
two points in the region, all points in between are also there.

00:07:00.183 --> 00:07:06.043
And that is very useful in helping us understand how we can learn concepts to

00:07:06.043 --> 00:07:08.043
have this notion of convexity.

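NOTE
Editor's sketch, not part of the interview. It illustrates two ideas just
mentioned: the prototype as the center of gravity of a cloud of points, and
convexity as "all points in between are also there". The exemplar points,
the radius and the choice of a ball as the region are hypothetical; a ball
is only one simple example of a convex region.
```python
def centroid(points):
    """Mean of a list of equal-length coordinate tuples (the center of gravity)."""
    n = len(points)
    return tuple(sum(p[i] for p in points) / n for i in range(len(points[0])))
def between(p, q, t):
    """Point a fraction t of the way from p to q, for t between 0 and 1."""
    return tuple(a + t * (b - a) for a, b in zip(p, q))
def in_region(x, center, radius):
    """Membership test for a ball around center."""
    return sum((a - c) ** 2 for a, c in zip(x, center)) <= radius ** 2
# Hypothetical exemplars in a made-up two-dimensional space.
exemplars = [(0.0, 0.0), (2.0, 0.0), (0.0, 2.0), (2.0, 2.0)]
prototype = centroid(exemplars)
# Convexity check: sample the segment between two members of the region.
p, q = (0.0, 1.0), (2.0, 1.0)
all_inside = all(in_region(between(p, q, k / 10), prototype, 1.5) for k in range(11))
```
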
00:07:08.043 --> 00:07:14.223
But it does mean that you clearly have a very empiricist view on prototypes

00:07:14.223 --> 00:07:17.523
and concepts that are really derived from, if you want, data points obtained

00:07:17.523 --> 00:07:19.883
from the world. Yes and no. It's empiricist.

00:07:20.043 --> 00:07:26.703
And once you have the domains, it's data-driven how you divide the spaces into regions.

00:07:27.023 --> 00:07:32.983
But for the domains, well, it's the neo-Kantian position that they are partly given, partly learned.

00:07:33.243 --> 00:07:37.143
Okay. But now, how should I think about the development of these?

00:07:37.143 --> 00:07:38.743
Imagine I want to add a new domain.

00:07:38.903 --> 00:07:43.903
So the discovery of a new domain could be the result of just adding data points

00:07:43.903 --> 00:07:50.083
to my conceptual space and discovering that I cannot organize these data points anymore.

00:07:50.183 --> 00:07:53.583
And I have to say, aha, I should now assume there's another domain.

00:07:54.043 --> 00:07:58.283
So how does that work? How do you go from single observations and their accumulation

00:07:58.283 --> 00:08:01.023
to this decision? Okay, new domain.

00:08:01.943 --> 00:08:06.503
That's normally tough for an individual. I mean, here in our society,

00:08:06.623 --> 00:08:10.503
we have scientists who say that we need this dimension, this domain,

00:08:10.583 --> 00:08:14.843
in order to understand the difference between this and that. I mean, so I...

00:08:17.847 --> 00:08:21.187
You can do it as an individual, but it's very tough.

00:08:21.367 --> 00:08:24.867
I mean, you're living in a culture where there are new ideas about what it means

00:08:24.867 --> 00:08:26.807
you need to understand something.

00:08:27.147 --> 00:08:31.867
Right. But do you see this in relation to, let's say,

00:08:31.907 --> 00:08:36.327
the standard notions of assimilation and accommodation that Jean Piaget would

00:08:36.327 --> 00:08:37.087
talk about in development?

00:08:37.927 --> 00:08:43.027
You can interpret it in that way. I mean, he never said anything about

00:08:43.027 --> 00:08:48.247
how these processes work. He didn't have any model of assimilation and accommodation.

00:08:48.887 --> 00:08:52.027
Assimilation would have been just collecting data points, and accommodation

00:08:52.027 --> 00:08:55.107
would be changing the space, I mean, the structure of the underlying space.

00:08:55.287 --> 00:09:00.767
So I can model these processes that way here. Okay, but now at the core of your,

00:09:00.867 --> 00:09:04.847
and this is also what you said earlier, at the core of your approach is really

00:09:04.847 --> 00:09:06.667
a notion of a metric, a topology,

00:09:06.907 --> 00:09:10.287
and also a quantification of similarity, right? Right.

00:09:10.387 --> 00:09:17.347
So can that really generalize to, let's say, all different levels of, let's say, concepts?

00:09:18.127 --> 00:09:25.127
No, that's a good question, because our concept classification is very much

00:09:25.127 --> 00:09:28.127
dependent on our perceptual mechanism. That's the basic classification.

00:09:28.407 --> 00:09:32.727
But then we can introduce more abstract concepts by different means,

00:09:32.927 --> 00:09:40.127
and in particular via language. I mean, you have mathematics and in law you

00:09:40.127 --> 00:09:42.187
introduce new concepts by definitions.

00:09:42.367 --> 00:09:47.547
And we use language to construct new concepts. And there you may get out of

00:09:47.547 --> 00:09:48.387
the geometric constructions.

00:09:48.967 --> 00:09:52.947
I mean, I don't know if I can apply my methods there. But they are dependent

00:09:52.947 --> 00:09:57.127
on being grounded in these more low-level concepts.

00:09:57.427 --> 00:10:01.647
You can't start from the bottom by defining concepts. You have to ground them

00:10:01.647 --> 00:10:04.527
in some kind of perceptual domain.

00:10:04.867 --> 00:10:09.707
Okay, so you're saying the conceptual space is exposed, if you want, to both feedforward

00:10:09.707 --> 00:10:15.067
or bottom-up influences, perception, sensation, and top-down influences

00:10:15.067 --> 00:10:18.107
like high-level symbolic systems related to language.

00:10:18.407 --> 00:10:22.227
And in some sense, what you're saying is that what is most clear for you at

00:10:22.227 --> 00:10:25.387
this point in time is how this feedforward component actually operates.

00:10:25.587 --> 00:10:29.467
And then the top-down element is sort of for the future. Yeah.

00:10:29.587 --> 00:10:36.407
Well, one way that I think is very useful for creating new concepts is by using metaphors.

00:10:36.747 --> 00:10:40.587
So, I mean, to take an example from science, when you introduce the notion

00:10:40.587 --> 00:10:45.387
of electricity, you have these notions of current and voltage and so on, new

00:10:45.387 --> 00:10:46.727
dimensions that you don't perceive.

00:10:47.007 --> 00:10:50.587
And in order to understand how they work, you can then compare electricity with

00:10:50.587 --> 00:10:52.127
water running or whatever.

00:10:52.247 --> 00:10:56.267
And you get some kind of grounding in the perceptual domain in that way.

00:10:56.267 --> 00:11:00.847
But the electrical dimensions, they live on their own in one sense.

00:11:00.887 --> 00:11:02.167
You can formulate them mathematically.

00:11:02.747 --> 00:11:05.107
If you go to domains like, let's say, machine vision.

00:11:05.711 --> 00:11:10.531
There are lots of methods around for data association. So these are basically different

00:11:10.531 --> 00:11:14.851
kinds of methods, adaptive filtering methods and what have you,

00:11:14.951 --> 00:11:20.171
that try to explain how you can bring data points together in high dimensional space.

00:11:20.591 --> 00:11:26.091
Is that a rather direct expression of your conceptual space notion or is there a difference?

00:11:26.391 --> 00:11:31.491
Well, you must distinguish between technical solutions for dimension reduction.

00:11:31.491 --> 00:11:35.831
I mean, you can have lots of data sets and you can use multidimensional scaling

00:11:35.831 --> 00:11:40.571
or principal component analysis and so on to reduce and pick out the most important dimensions.

00:11:40.871 --> 00:11:44.071
But that's a technical solution. Then you have the more biological solution

00:11:44.071 --> 00:11:49.831
where you really try to connect the dimensions to some kind of perceptual mechanism

00:11:49.831 --> 00:11:52.831
or to your motor systems or whatever.

00:11:53.111 --> 00:11:55.331
I mean, those are two different methods.

00:11:55.811 --> 00:12:00.591
Okay. All right. But they're not necessarily... You could imagine that any of

00:12:00.591 --> 00:12:05.811
these... Some of these methods would be rather directly implementing these notions of a conceptual space.

00:12:05.991 --> 00:12:08.971
Yeah, yeah. Okay, so there's no fundamental problem there.

00:12:09.031 --> 00:12:13.611
No, no, it's just a question of how biologically realistic you want to be in

00:12:13.611 --> 00:12:15.391
your implementations. Right, exactly.

00:12:15.711 --> 00:12:21.031
So now we have this idea of a conceptual space.

00:12:21.271 --> 00:12:23.691
And you've been working on that for quite a while. Yeah, yeah.

00:12:23.851 --> 00:12:28.431
And today in your presentation, you were emphasizing two, let's say,

00:12:28.491 --> 00:12:30.131
extensions of this framework.

00:12:30.131 --> 00:12:33.531
So on the one hand, you wanted to capture a notion of action,

00:12:33.691 --> 00:12:38.191
because so far you had looked more or less at static concepts,

00:12:38.211 --> 00:12:41.431
like apples, and on the other hand you want to go to language.

00:12:41.631 --> 00:12:49.131
So let's start with action. Well, I want to go to language as a kind of application of these two.

00:12:49.371 --> 00:12:53.131
I mean, I want to use conceptual spaces to model the cognitive representation

00:12:53.131 --> 00:12:57.871
of actions, and I want to use them to model the cognitive representation of events.

00:12:58.451 --> 00:13:03.391
How do we think of events? What is the structure? Just like we have discovered

00:13:03.391 --> 00:13:04.851
the three-dimensional color space.

00:13:05.071 --> 00:13:10.371
I want to understand what is the action space in our minds. Right. Okay.

00:13:10.571 --> 00:13:15.471
So in a sort of boneheaded way, which I usually tend to follow,

00:13:15.591 --> 00:13:17.971
I could say, well, no big deal.

00:13:18.111 --> 00:13:22.051
I mean, also action I might be able to classify in some high-dimensional space.

00:13:22.807 --> 00:13:25.967
Let's say different limbs involved, or whether it is

00:13:25.967 --> 00:13:28.907
just between agents, or

00:13:28.907 --> 00:13:31.947
it's a single agent, or an action on an object or whatever, right?

00:13:31.947 --> 00:13:35.387
So why is that straightforward generalization

00:13:35.387 --> 00:13:38.327
not enough to classify action? What's missing? Well,

00:13:38.327 --> 00:13:42.167
what kind of data do you start from? I mean, you can get a lot of data points

00:13:42.167 --> 00:13:46.687
about the human body moving, but that's a very high-dimensional set. You

00:13:46.687 --> 00:13:51.207
have to reduce the data somehow, and you can use, as I said, just technical

00:13:51.207 --> 00:13:57.267
ways of doing it. But you could also use data from psychology on how we perceive actions.

00:13:57.587 --> 00:14:03.327
And then there are experiments showing that we have a kind of hierarchical representation of our bodies.

00:14:03.487 --> 00:14:07.867
I mean, there is the main body, then we have arms and legs, and we have forearms

00:14:07.867 --> 00:14:11.587
and lower arms and upper arms, and we have hands and fingers.

00:14:11.687 --> 00:14:13.007
I mean, you have a hierarchical representation.

00:14:13.467 --> 00:14:22.067
And an action is a movement in this hierarchical structure. So it's a fairly

00:14:22.067 --> 00:14:25.787
high-dimensional vector, a dynamic vector, that represents it.

00:14:26.067 --> 00:14:31.447
And you need a more condensed way of understanding what an action is.

00:14:31.707 --> 00:14:35.067
I think that our brains need that kind of reduction.

00:14:35.547 --> 00:14:37.427
So it's interesting, right? Because that would mean that you're saying,

00:14:37.527 --> 00:14:41.727
well, the notion of conceptual space is like a core representational engine,

00:14:41.887 --> 00:14:43.707
which is modality independent.

00:14:44.387 --> 00:14:48.707
And then dependent on the perceptual filtering, I can now exploit this mechanism

00:14:48.707 --> 00:14:51.887
in different ways. So, this would be for action that you're saying,

00:14:52.027 --> 00:14:56.307
well, to map action into a conceptual space, it's basically how I filter action.

00:14:56.487 --> 00:15:01.287
So, I should maybe not look at action in static terms, but I should look

00:15:01.287 --> 00:15:04.547
more at its dynamical properties. So, how am I going to do that?

00:15:04.787 --> 00:15:10.087
I mean, if you see a person walking, take walking as a typical example here,

00:15:10.167 --> 00:15:13.487
you don't care about the clothes of the person, you don't care about whether

00:15:13.487 --> 00:15:17.407
it's hot or cold, you care about the movements of the person.

00:15:17.407 --> 00:15:20.267
So you abstract away a lot of the domains.

00:15:20.507 --> 00:15:25.207
And what remains in my mind is that you look at what are the forces that the

00:15:25.207 --> 00:15:27.867
person exerts on his or her body parts.

00:15:28.007 --> 00:15:32.087
So I see an action as a pattern of forces.

00:15:32.507 --> 00:15:40.307
That's my reduction of the domain for actions, to focus on the forces.

00:15:40.307 --> 00:15:43.867
Okay, so now, so also you showed in your talk that, for instance,

00:15:43.947 --> 00:15:48.707
humans actually need very few cues to extract motion,

00:15:48.967 --> 00:15:50.267
right? Like biological movement.

00:15:51.434 --> 00:15:54.854
And it's also been shown by others like Martin Giese and other people.

00:15:56.054 --> 00:15:59.494
So how does that help me to get to a notion of force?

00:16:00.194 --> 00:16:03.554
Well, I mean, first of all, these examples, I mean, it was Gunnar Johansson

00:16:03.554 --> 00:16:05.394
who started with this patch-light technique.

00:16:05.474 --> 00:16:08.974
You only put small lights on the joints of your body and that's enough

00:16:08.974 --> 00:16:12.094
information to identify actions.

00:16:12.174 --> 00:16:14.034
And you identify them extremely quickly.

00:16:14.174 --> 00:16:17.274
I mean, it takes 200 milliseconds to see what kind of action it is.

00:16:17.274 --> 00:16:21.334
It helps because it shows you that all the other features are not necessary.

00:16:21.334 --> 00:16:24.114
You don't have to see the surface of the person moving,

00:16:24.374 --> 00:16:28.294
you don't have to see the color. All the other features are gone, so you see the

00:16:28.294 --> 00:16:32.334
kinematics. And then, of course, you could say, okay, this kinematics, that is

00:16:33.714 --> 00:16:38.694
what you perceive in action. But I go one step further and say that, no, it's the

00:16:38.694 --> 00:16:42.874
changes of the kinematics. I mean, if you talk mathematics, it's the second derivatives

00:16:42.874 --> 00:16:46.354
that bring out the forces.

00:16:46.714 --> 00:16:52.834
I would say that when we identify an action, it's more that using the forces gives

00:16:52.834 --> 00:16:58.554
us a more coherent and simpler representation of actions than looking at kinematics.

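NOTE
Editor's sketch, not part of the interview. It shows how a second derivative
can be estimated from sampled positions with central finite differences. In
Newtonian mechanics that acceleration, multiplied by mass, gives the net
force; the speaker uses "force" more broadly than that. The function name and
the sample trajectory are hypothetical.
```python
def second_differences(xs, dt):
    """Central finite-difference estimate of the second derivative of xs."""
    return [(xs[i + 1] - 2 * xs[i] + xs[i - 1]) / dt ** 2 for i in range(1, len(xs) - 1)]
# Hypothetical joint position sampled once per second, following x(t) = t squared.
positions = [float(t * t) for t in range(6)]
# For x(t) = t squared the second derivative is constant.
accelerations = second_differences(positions, 1.0)
```
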
00:16:59.034 --> 00:17:04.674
Okay. This is interesting. I want to say, look, we have to have a direct perception

00:17:04.674 --> 00:17:08.234
of change in the kinematics, that's one thing. That's the second derivative.

00:17:09.314 --> 00:17:12.394
The direct perception of the change you call a force.

00:17:13.274 --> 00:17:16.034
But if I would look at that from, let's say, an engineering perspective,

00:17:16.294 --> 00:17:22.054
I could say, hey, wait one moment, but that would not map necessarily onto the

00:17:22.054 --> 00:17:26.974
forces I'm really exerting on the degrees of freedom of that walking human, right?

00:17:27.054 --> 00:17:29.954
So, how do I relate these two now?

00:17:30.154 --> 00:17:36.794
Okay, that's a good question because you have the perceptual inputs when seeing

00:17:36.794 --> 00:17:40.934
somebody walking and then you can extract the forces on the body limbs.

00:17:40.934 --> 00:17:46.674
On the other hand, you have the kinesthetic experiences of your muscle control,

00:17:46.874 --> 00:17:51.554
and you know that you have to stretch your arm in a certain way to push a door or whatever.

00:17:51.974 --> 00:17:58.474
And I'm making a tacit assumption here that these two systems map onto one another.

00:17:58.694 --> 00:18:04.054
This is basically like when I understand when you're talking,

00:18:04.254 --> 00:18:08.114
I'm somehow representing how you produce the sounds, but I also have to map

00:18:08.114 --> 00:18:10.134
them on how I produce the sound.

00:18:10.134 --> 00:18:14.994
It's the same kind of perception and motor action mapping. Right. Okay.

00:18:15.594 --> 00:18:20.474
But that would mean I would have great difficulties to understand the walking pattern of a bird.

00:18:21.466 --> 00:18:24.166
Because the bird has a rather different kind of body than I do,

00:18:24.266 --> 00:18:25.666
at least according to me.

00:18:25.946 --> 00:18:31.786
So now the interpretation, the grounding of the forces that you call them,

00:18:31.806 --> 00:18:32.946
I observe, will be difficult.

00:18:33.326 --> 00:18:35.666
So how do I circumvent that problem?

00:18:36.286 --> 00:18:41.746
Well, we understand the movements of other animals. We do understand ourselves.

00:18:41.926 --> 00:18:45.966
So there is this generalization problem. On the other hand, once your brain

00:18:45.966 --> 00:18:50.926
has learned to extract the second derivatives of movement, you can then see

00:18:50.926 --> 00:18:52.166
the patterns in movement.

00:18:52.366 --> 00:18:56.886
You can see that the patterns of sparrows flying are similar.

00:18:57.126 --> 00:19:03.486
They have much quicker wing flapping than if you look at an albatross flying.

00:19:03.626 --> 00:19:08.646
It's very slow and much more forceful wing flapping than in a sparrow.

00:19:08.806 --> 00:19:13.106
There are differences in the patterns, even in birds, and we can learn to identify them.

00:19:13.606 --> 00:19:16.906
But now, wouldn't there be another way to interpret this where we say,

00:19:17.006 --> 00:19:20.326
well, what we really learned to classify are these derivatives.

00:19:21.526 --> 00:19:27.526
And we can call them, let's say, kinematic dynamics, something like this, or kinematic change.

00:19:28.786 --> 00:19:30.766
And these changes will, in the

00:19:30.766 --> 00:19:35.246
end, be brought about causally by forces operating on degrees of freedom.

00:19:35.326 --> 00:19:38.166
But from a perceptual perspective, I don't care about these forces.

00:19:38.326 --> 00:19:42.446
I care about kinematic change. Okay. So what would be wrong with just saying,

00:19:42.486 --> 00:19:44.486
look, why don't we call this kinematic change for now?

00:19:46.846 --> 00:19:49.866
It's a bit of a tough question, because now we've been talking about biological

00:19:49.866 --> 00:19:53.026
movements as the only type of actions.

00:19:55.216 --> 00:20:00.676
And we humans are not very good. Well, we are good Newtonians in one way that

00:20:00.676 --> 00:20:02.616
we can do these second derivatives.

00:20:02.976 --> 00:20:08.356
But our minds are also full of other types of forces, like in social interaction.

00:20:08.676 --> 00:20:14.836
I mean, I know that somebody is my superior. He or she has a force, can control my actions.

00:20:15.736 --> 00:20:25.056
I know that I'm attracted to a certain woman, and we describe attraction as a force as well.

00:20:25.156 --> 00:20:33.056
So in our understanding of what causes actions, there are other types of forces

00:20:33.056 --> 00:20:35.696
than the traditional Newtonian forces.

00:20:35.976 --> 00:20:39.536
That's why I want to use forces rather than just the movements,

00:20:39.656 --> 00:20:42.796
the kinematics or the dynamics involved in biological motion.

00:20:43.236 --> 00:20:47.436
But that's interesting, right? Because then a prediction could be,

00:20:47.436 --> 00:20:52.036
you say, look, our perceptual systems, as applied to, let's say,

00:20:52.116 --> 00:20:54.856
many different phenomena, different levels of complexity in the world,

00:20:54.976 --> 00:20:57.256
are, if you want, assigning a notion of force.

00:20:57.496 --> 00:21:04.516
They are just inventing, let's say, a pseudo-causal relationship behind the

00:21:04.516 --> 00:21:08.956
change in the world we observe, which can be completely beside the real causes

00:21:08.956 --> 00:21:10.836
of what we observe. Sure, sure.

00:21:12.096 --> 00:21:14.796
That's a good interpretation because our brain is adding these,

00:21:14.916 --> 00:21:17.896
let's call them theoretical variables or hidden variables.

00:21:18.196 --> 00:21:23.316
But this is just like in visual perception. I mean, our eyes are adding the contours of objects.

00:21:23.536 --> 00:21:28.116
I mean, on the retina, there are no contours, but our brain is adding that in

00:21:28.116 --> 00:21:29.476
the interpretation of the world.

00:21:29.676 --> 00:21:34.776
So when we are looking at an action, we are adding the forces as a kind of contour,

00:21:34.916 --> 00:21:38.876
if you like, of the perception you have.

00:21:38.956 --> 00:21:41.416
It would be like a dynamic contour. It's a dynamic contour, exactly.

00:21:43.270 --> 00:21:46.750
So then that means we should not get too confused about this notion of force.

00:21:47.090 --> 00:21:50.570
No, no, you shouldn't restrict it to the physical force.

00:21:50.810 --> 00:21:53.650
But there's an interesting prediction there, right? Because it really means

00:21:53.650 --> 00:21:58.530
that this might be a way in which the brain is also imposing an interpretation of the world.

00:21:58.650 --> 00:22:04.090
And by virtue of that, that we are so good at recognizing biological motion with minimal cues.

00:22:04.470 --> 00:22:07.490
Yeah. Okay. No, I mean, since you mentioned Kant earlier, I mean,

00:22:07.510 --> 00:22:10.490
he was saying that we can't help but see causes and effects.

00:22:10.570 --> 00:22:16.930
And I say we can't help but see forces. I mean, that's part of doing the cause-effect relation.

00:22:17.190 --> 00:22:22.230
Exactly. Yeah. No, that's very good. So now we have an idea about how to deal

00:22:22.230 --> 00:22:24.770
with motion perception of action.

00:22:26.030 --> 00:22:29.210
But in some sense, it's more like a classification, right?

00:22:29.250 --> 00:22:30.490
Because now we have these changes

00:22:30.490 --> 00:22:34.130
in my posture, let's say. I change my posture. I call this walking.

00:22:34.470 --> 00:22:39.450
And it has a certain gait. And then you say, well, I can detect and classify

00:22:39.450 --> 00:22:44.770
it because I know how this gait changes over time, right?

00:22:44.830 --> 00:22:47.050
I lean on one leg first and the other and so on.

00:22:47.770 --> 00:22:51.970
But then I could say, yeah, but that's not action because to talk really about

00:22:51.970 --> 00:22:54.910
action, there must be an intentional component in this.

00:22:55.090 --> 00:22:57.610
This is just movement. So how do we get from movement to action?

00:22:57.970 --> 00:23:02.070
No, I don't. Okay, that's a philosophical point. I mean, some philosophers would

00:23:02.070 --> 00:23:06.690
say that an action involves an intention. I don't use that.

00:23:06.810 --> 00:23:10.430
I don't restrict actions, the notion of an action to intentional actions.

00:23:10.770 --> 00:23:17.870
I mean, no, I don't. I mean, so I have a more general, maybe the term is not

00:23:17.870 --> 00:23:19.510
appropriate, but that's how I use it.

00:23:19.710 --> 00:23:23.130
Okay. Now, look, I'm not actually going to bicker over it. I just want to understand

00:23:23.130 --> 00:23:26.270
whether you saw a division there between, let's say, movement pattern action,

00:23:26.370 --> 00:23:29.330
but basically you say, look, as long as we're changing the body,

00:23:29.490 --> 00:23:30.850
that's what I want to recognize.

00:23:31.150 --> 00:23:37.230
Yeah. Okay. So now we have a way to directly perceive features of movement so

00:23:37.230 --> 00:23:39.070
that we can map them in the conceptual space.

00:23:39.830 --> 00:23:45.390
Now, what are the domains of that conceptual space to organize representations of action?

00:23:46.871 --> 00:23:49.451
So what are the domains? The domains are the force patterns.

00:23:49.691 --> 00:23:50.871
I mean, these are the forces.

00:23:51.131 --> 00:23:55.471
They can be complicated because you have a body moving as a complicated structure

00:23:55.471 --> 00:23:58.091
of forces. So it's a pattern.

00:23:58.231 --> 00:24:02.951
But the domain is still this more general notion of force.

00:24:03.191 --> 00:24:08.211
And my claim is that when we classify actions, I shouldn't say it's only that

00:24:08.211 --> 00:24:10.751
because there could be constraints coming from other things.

00:24:11.411 --> 00:24:17.671
If I hammer something, I use an object. I can't be hammering with my nose.

00:24:17.751 --> 00:24:22.231
I can be hammering with my hand, but that's a metaphorical use of hammer.

00:24:22.391 --> 00:24:28.071
Hammer means that that kind of action is involving an instrument.

00:24:28.431 --> 00:24:32.311
It's constrained by having an object that functions as a hammer.

00:24:32.671 --> 00:24:35.911
Maybe not the best example, but something in that direction.

00:24:35.911 --> 00:24:38.811
But still, what are the domains?

00:24:39.351 --> 00:24:43.371
What's the dimensionality of this space? Is it the limbs I'm using?

00:24:43.591 --> 00:24:48.851
Is it the direction of movement in some Cartesian coordinate system?

00:24:49.471 --> 00:24:52.531
What are these core domains? Is it body types?

00:24:52.951 --> 00:24:57.051
Okay. We've been talking mainly about biological motion.

00:24:57.291 --> 00:25:04.791
A car has a sort of pattern for you to accelerate and decelerate the car and so on.

00:25:04.991 --> 00:25:11.891
Of course, in physics, there are generators of forces, motors or muscles or whatever.

00:25:12.471 --> 00:25:18.031
But I think that when we perceive actions, when we categorize action, we abstract from that.

00:25:18.231 --> 00:25:25.051
A robot walking has a totally different system of generating the forces than

00:25:25.051 --> 00:25:28.231
a human walking, but we would still classify it as the same action.

00:25:28.611 --> 00:25:33.731
My claim is that we can abstract away from the mechanism behind the

00:25:33.731 --> 00:25:35.411
forces and just focus on the forces.

00:25:35.751 --> 00:25:41.971
Okay. All right. So, but it still remains to be seen how that force space is exactly structured.

00:25:42.231 --> 00:25:46.531
Yeah. Yeah. Okay. So now we've actually… Well, it depends on the parts of the

00:25:46.531 --> 00:25:49.171
acting individual, the acting object.

00:25:49.351 --> 00:25:53.351
I mean, how many force generators there are and how they are related.

00:25:53.511 --> 00:25:56.371
Like our muscles in our body, for instance, which is complicated.

00:25:56.791 --> 00:26:01.551
But it's interesting, right? But if you take the car example and let's say ourselves

00:26:01.551 --> 00:26:05.771
motoring around, if you want, or navigating the world, indeed,

00:26:06.011 --> 00:26:09.831
at an abstract, more Cartesian level description, you say, okay,

00:26:09.851 --> 00:26:11.331
I'm changing my position in space.

00:26:11.631 --> 00:26:14.691
And that's the level where these forces act. And I don't really care whether

00:26:14.691 --> 00:26:16.871
the wheels are turning or the legs are moving.

00:26:17.231 --> 00:26:19.931
So this would make that point about abstraction. Yeah. Okay.

00:26:21.071 --> 00:26:25.131
So now we have an idea of conceptual space, how we can use this to

00:26:25.725 --> 00:26:30.225
understand action or describe, represent action, heavily relying on,

00:26:30.245 --> 00:26:35.765
let's say, a transformation of action in the real world to an internal sense of force, if you want.

00:26:36.545 --> 00:26:42.025
But now you also elaborated the same framework towards the notion of an event,

00:26:42.685 --> 00:26:45.825
which is rather critical in how we deal with the world because we don't only

00:26:45.825 --> 00:26:47.625
have dynamics, we also have an event.

00:26:47.785 --> 00:26:51.105
I have not been able to find very many cognitive theories of events.

00:26:51.245 --> 00:26:54.585
There are lots of philosophical theories, but if we think about the cognitive,

00:26:54.745 --> 00:26:58.445
I mean, the most naive description of an event is something happens to something.

00:26:59.025 --> 00:27:05.165
And then this something I call a patient. This is what is in the focus of the event.

00:27:05.685 --> 00:27:09.505
And then there is something that causes a change. And there is a result of this.

00:27:09.645 --> 00:27:12.445
I mean, well, in most cases, something happens.

00:27:12.885 --> 00:27:17.245
So I divide an event into two vectors.

00:27:17.425 --> 00:27:25.905
One describing the forces that apply to the patient, and the other vector describing

00:27:25.905 --> 00:27:27.325
the change in the patient.

00:27:27.465 --> 00:27:30.645
Sometimes the change is null, but it can be a state.

00:27:31.065 --> 00:27:35.985
So my basic model of an event are two vectors, one force vector,

00:27:36.105 --> 00:27:38.585
one result vector, acting on a patient.

00:27:38.865 --> 00:27:44.325
Very often there is an agent generating the force vector involved in the event, but that need not be.

00:27:44.445 --> 00:27:52.225
It can be just gravitation or some other non-object generating force. Right.

00:27:52.365 --> 00:27:57.365
But now, so this is nice, right? So we have the notion of event and we have

00:27:57.365 --> 00:28:01.065
decomposed it in, let's say, an agent and a patient,

00:28:01.145 --> 00:28:08.345
which let's say are some core entities who are sort of physically present, defining the event.

00:28:08.665 --> 00:28:11.645
And then we have exchanges between them, which are the forces.

00:28:11.705 --> 00:28:15.485
And these are the two vectors, right? So we have a cause and effect. Yeah. Um,

00:28:15.485 --> 00:28:18.205
so, but is that really the

00:28:18.205 --> 00:28:21.345
minimal description of an event? We might

00:28:21.345 --> 00:28:24.805
have events where there's no agent that is causing anything.

00:28:24.805 --> 00:28:27.965
Yeah, the minimal is a patient and

00:28:27.965 --> 00:28:34.365
a force and a result vector. This is my hypothesis. I mean,

00:28:34.365 --> 00:28:40.505
this is a theory, and it has some consequences in how we perceive

00:28:40.505 --> 00:28:45.805
actions, for instance. I mean, we'd be very surprised if something happens without a cause.

00:28:45.965 --> 00:28:48.085
I mean, that's the Kantian notion again.

00:28:48.265 --> 00:28:52.685
I mean, if we see something happen, we presume that there is some kind of action

00:28:52.685 --> 00:28:58.085
pattern going on behind the scenes. Right. But if nothing happens?

00:28:59.484 --> 00:29:03.344
We can still describe an event in which, in some sense, nothing happens.

00:29:03.644 --> 00:29:08.824
Well, there are two kinds of nothing happening. One is just that there is no

00:29:08.824 --> 00:29:12.924
force, and consequently there is no change either, no result vector.

00:29:13.084 --> 00:29:17.424
So it's just a state. And that's a special case of an event,

00:29:17.604 --> 00:29:19.144
a fairly boring case of an event.

00:29:19.364 --> 00:29:23.564
But then there are also events where there is a force and a counterforce that

00:29:23.564 --> 00:29:25.664
balance each other, so still nothing happens.

00:29:25.664 --> 00:29:31.224
So when I'm pushing a door and the door doesn't open, I mean,

00:29:31.244 --> 00:29:35.904
I'm exerting a force and the door exerts a counterforce.

00:29:36.744 --> 00:29:41.064
Nothing happens, but there is still an interplay of forces and there is still

00:29:41.064 --> 00:29:43.484
an action from my side of pushing the door.

00:29:43.584 --> 00:29:47.864
But the counterforce brings out no effect.

00:29:47.864 --> 00:29:52.344
But if the event is, let's say, more abstract, where we say,

00:29:52.484 --> 00:29:59.784
in 2012, the Summer Olympics happened in London, took place in London, right?

00:30:00.104 --> 00:30:04.544
So does that mean that I would have to decompose that in all sorts of microscopic

00:30:04.544 --> 00:30:08.384
elements where I can now again recover these force relationships?

00:30:08.384 --> 00:30:11.824
In this case, there's a lot of intentionality going

00:30:11.824 --> 00:30:14.564
on. I mean, lots of people are involved in this event, and their

00:30:14.564 --> 00:30:19.464
joint intentionality constitutes the creation of this Summer Olympics. It's a

00:30:19.464 --> 00:30:26.444
very complex event involving, I mean, a complex factor of mental causes, not

00:30:26.444 --> 00:30:31.884
physical causes, generating this event. I don't know how to analyze this in detail. I mean, I stick to

00:30:31.924 --> 00:30:34.304
the more concrete actions.

00:30:34.624 --> 00:30:39.284
Right, okay. No, but this is interesting, right? Because this issue of generalization

00:30:39.284 --> 00:30:41.424
is a challenge right now for the framework.

00:30:41.564 --> 00:30:45.244
Yeah, of course it is. I mean, going up to these more abstract types of events

00:30:45.244 --> 00:30:49.064
that we talk about that are generated by our societal structure,

00:30:49.244 --> 00:30:52.504
by our interactions with other people. I mean, I don't really have a good analysis.

00:30:52.704 --> 00:30:56.164
I mean, this is the kind of program I have. I hope to be able to extend it.

00:30:56.264 --> 00:31:01.584
But I start with the more basic concrete actions which involve physical forces and so on.

00:31:01.884 --> 00:31:06.084
No, but moreover, of course, it's sort of an easy game to sort of throw these

00:31:06.084 --> 00:31:08.144
pot shots at you and say, oh, here, I have an exception.

00:31:09.284 --> 00:31:12.384
So that's, I think, not really the point, right? It's really the point to try

00:31:12.384 --> 00:31:16.264
to understand where would be, let's say, principled transitions of the approach.

00:31:16.384 --> 00:31:20.344
That's why I thought Olympics might be sort of abstract enough that that could be challenging.

00:31:20.604 --> 00:31:27.484
It is. So we can now decompose, let's say, events in a patient.

00:31:27.484 --> 00:31:32.224
Why did you use the word patient for, let's say, the core object in the event?

00:31:33.125 --> 00:31:36.505
Well, that's something that undergoes the change, so to speak.

00:31:36.685 --> 00:31:41.005
I mean, sometimes the agent is identical with the patient. When I'm walking,

00:31:41.185 --> 00:31:44.665
I'm exerting a force on myself and I'm changing my own position.

00:31:44.965 --> 00:31:48.505
So there are cases where the agent and patient are identical.

00:31:49.745 --> 00:31:53.525
But in many cases, we can separate them, or in most cases.

00:31:53.685 --> 00:31:59.505
Okay, so is there any psychological or neuroscientific grounding for this decomposition of an event?

00:32:00.905 --> 00:32:06.485
That's a very good question. I don't know a very good answer to it yet.

00:32:06.545 --> 00:32:10.125
I mean, there are some people who are speculating about, I mean,

00:32:10.145 --> 00:32:14.925
we have two visual pathways in the brain.

00:32:15.005 --> 00:32:18.565
I mean, the more dorsal is going for the motion patterns, and you can think

00:32:18.565 --> 00:32:21.305
of that as picking out the kinematics.

00:32:21.525 --> 00:32:26.585
And maybe, I mean, maybe, I don't know if you can find some correlates of picking

00:32:26.585 --> 00:32:29.165
out the forces, but I have no idea.

00:32:29.165 --> 00:32:33.925
And then you have the more ventral pathway going to object identification,

00:32:34.165 --> 00:32:38.805
which is more involved in the static properties of all the objects.

00:32:38.965 --> 00:32:43.385
So you have a little bit of division of the forces and the objects.

00:32:43.605 --> 00:32:46.905
But I mean, that's on a very rough and general scale. I don't know if there

00:32:46.905 --> 00:32:49.905
is more detailed knowledge here.

00:32:50.225 --> 00:32:56.045
But now, so in the face of that challenge, in some sense you're using very physical

00:32:56.045 --> 00:33:01.005
metaphors, right? With force, cause, effect. And is that not limiting the framework?

00:33:01.145 --> 00:33:07.665
Because as soon as we talk about these more, let's say, metaphorical causes or about intentions,

00:33:10.602 --> 00:33:14.282
it might become confusing if we want to think about this as a cause.

00:33:15.022 --> 00:33:22.102
If I ask you, please give me this cup of coffee, Peter, to really start to think

00:33:22.102 --> 00:33:24.982
about that in terms of causes can also get very confusing.

00:33:25.062 --> 00:33:30.822
My larynx is doing stuff, sound pressure waves, hit your cochlea, et cetera.

00:33:31.342 --> 00:33:35.122
And then so, oh, but Peter grew up in this culture where these words mean certain

00:33:35.122 --> 00:33:42.462
things. But it sounds like a very, very unfruitful way to pursue the question.

00:33:42.882 --> 00:33:48.702
So wouldn't it be more useful to replace the notion of cause and effect with something a bit more neutral?

00:33:49.162 --> 00:33:53.242
Well, that's what I'm trying to do. I mean, by having these force vectors or

00:33:53.242 --> 00:33:57.442
result vectors, that's, in my opinion, more.

00:33:57.642 --> 00:34:02.142
It's a way of reducing causes and effects to something that I can describe in

00:34:02.142 --> 00:34:04.582
semi-mathematical models.

00:34:04.962 --> 00:34:12.242
Okay. But then force would then be broadened to something like a metaphorical force.

00:34:12.402 --> 00:34:16.922
Yeah, well, as I say, social forces, emotional forces would be included in this

00:34:16.922 --> 00:34:18.822
notion. But are there other concepts we could consider for this?

00:34:18.862 --> 00:34:23.022
Maybe it's a bit of a boring question, but it can lead to so much confusion,

00:34:23.122 --> 00:34:25.382
right? If we use these physical metaphors.

00:34:25.482 --> 00:34:29.682
Yeah. So I'm just wondering whether there are other candidate concepts for this

00:34:29.682 --> 00:34:31.422
we could consider or not really.

00:34:31.682 --> 00:34:36.302
I don't know. No, I mean, some people talk about powers rather than forces,

00:34:36.362 --> 00:34:39.942
social powers and so on. I don't know whether that's better or not.

00:34:40.102 --> 00:34:43.502
I mean, I don't care very much about the terminology here. I mean,

00:34:43.502 --> 00:34:45.722
I care about what kind of models I'm using.

00:34:45.862 --> 00:34:49.162
And as we talked about earlier, I mean, my use of action may not fit with the

00:34:49.162 --> 00:34:50.482
philosophers' uses of action.

00:34:50.862 --> 00:34:53.442
But that's not your main concern. That's not my main concern.

00:34:53.722 --> 00:34:57.562
Okay, very good. So, okay, so we got events sorted.

00:34:59.102 --> 00:35:03.742
And also, you would make the point that this decomposition of events would hold,

00:35:05.039 --> 00:35:10.179
in principle, at any level of description. So whether I talk about the cup falling

00:35:10.179 --> 00:35:13.619
from the table or me writing a paper,

00:35:14.019 --> 00:35:19.839
the basic decomposition of agent, force, patient, and then a resultant vector

00:35:19.839 --> 00:35:23.359
or force would hold. Yeah, that's a minimal description.

00:35:23.579 --> 00:35:25.899
Then many events involve more components.

00:35:26.159 --> 00:35:29.299
I mean, if I hit something with a hammer, I have the instrument that's

00:35:29.299 --> 00:35:32.299
between my force exertion and the force that happens to

00:35:32.299 --> 00:35:35.279
the object I'm hitting. And

00:35:35.279 --> 00:35:38.239
if I give something to

00:35:38.239 --> 00:35:41.359
you, I mean, there is a physical movement of the cup if

00:35:41.359 --> 00:35:46.419
I give you my coffee, but there is also the transfer of possession, I mean,

00:35:46.419 --> 00:35:53.039
which is a much more advanced and intentional part of the action. So

00:35:53.039 --> 00:35:56.499
there are different details, different components you can add in

00:35:56.499 --> 00:36:00.919
an event description. But the basic ones are the two vectors and the patient.

00:36:01.039 --> 00:36:04.659
I mean, and I would say that this is a cognitive theory, so that these parts

00:36:04.659 --> 00:36:06.759
appear in all our representations of events.

00:36:06.939 --> 00:36:09.819
Then you can go down on finer details and add more components,

00:36:09.959 --> 00:36:13.339
depending on what level of description you're aiming for.

00:36:13.859 --> 00:36:20.879
But now, if we decompose events in these terms, you could imagine that we could

00:36:20.879 --> 00:36:24.639
expose humans to, let's say, event descriptions. We put them in a scanner,

00:36:24.759 --> 00:36:28.299
fMRI, and we see which areas of the brain light up.

00:36:28.379 --> 00:36:32.159
And then hopefully you see four different areas where you can say,

00:36:32.199 --> 00:36:35.179
okay, agent and two forces. And is there anything like that?

00:36:35.419 --> 00:36:40.419
I think so. I mean, people have been looking at how verbs are represented in

00:36:40.419 --> 00:36:41.979
the brain and we are getting into verbs now.

00:36:42.359 --> 00:36:47.359
I would look for differentiations between these force vectors and the result

00:36:47.359 --> 00:36:51.639
vectors. If the brain lights up in different areas, depending on whether you

00:36:51.639 --> 00:36:54.039
talk about the causes or whether you talk about the effects.

00:36:54.399 --> 00:36:58.279
I don't know if we can find any clear results. I've taken part in a small experiment

00:36:58.279 --> 00:37:02.019
on this where there is some indication that we can make such a division.

00:37:02.319 --> 00:37:06.739
But there is a lot of research to do on this. I mean, what happens and does

00:37:06.739 --> 00:37:10.479
the brain distinguish between causes and effects in its analysis?

00:37:10.699 --> 00:37:13.919
That's, for me, a very interesting question, but I don't know anything about

00:37:13.919 --> 00:37:18.899
the answer. Well, actually, very basic learning mechanisms like classical conditioning

00:37:18.899 --> 00:37:24.079
have been interpreted in terms of the brain extracting cause-effect relationships from the world.

00:37:24.139 --> 00:37:28.779
Because you get exposed to, let's say, the tone and there comes the foot shock.

00:37:29.519 --> 00:37:35.139
And in some sense, you might interpret that as a causal relationship. That's true.

00:37:35.519 --> 00:37:38.679
So there have been relatively interesting theories about this.

00:37:38.699 --> 00:37:40.819
And it would be very consistent with what you're proposing.

00:37:40.899 --> 00:37:43.199
Okay, but that's an area I don't know very much about. Okay,

00:37:43.219 --> 00:37:46.519
yeah. So, indeed, but now you mentioned language because this,

00:37:46.559 --> 00:37:48.339
I think, is sort of, if you want,

00:37:50.185 --> 00:37:53.745
the long-term ambition of this program is also to account for language.

00:37:53.805 --> 00:37:58.105
Yeah, I mean, that's, I shouldn't say the main application, but the application

00:37:58.105 --> 00:38:00.825
I'm working on right now. But why do you call it an application?

00:38:01.425 --> 00:38:05.405
Because I use this model. I mean, it's a model. I can use it to analyze causes

00:38:05.405 --> 00:38:07.165
and effects, how we perceive causes and effects.

00:38:07.325 --> 00:38:12.565
I can use it to analyze how we use verbs. I can maybe use it to analyze social interaction.

00:38:12.705 --> 00:38:16.525
I don't know what. But at the moment, I'm focusing on how this maps onto our

00:38:16.525 --> 00:38:17.885
understanding of language.

00:38:18.305 --> 00:38:21.965
Okay. Okay, but before we go to language, there's an interesting implication of that, no?

00:38:22.045 --> 00:38:24.845
Because then you're saying, well, from a cognitive perspective,

00:38:25.325 --> 00:38:29.345
there's a core meaning system organized in conceptual spaces.

00:38:29.605 --> 00:38:34.305
And this meaning system is sort of modality independent, and now it can be expressed in language.

00:38:34.445 --> 00:38:38.425
It can be used in how we interpret the world through haptics,

00:38:38.425 --> 00:38:39.665
vision, blah, blah, blah.

00:38:39.765 --> 00:38:42.625
But this is like an independent meaning module.

00:38:42.905 --> 00:38:46.565
Yes, well, independent meaning, but it is a kind of core module.

00:38:46.565 --> 00:38:51.565
I mean, the event representation, I think, would be at the core of our understanding of the world.

00:38:51.685 --> 00:38:56.265
And it depends on, I mean, you can have it to analyze perception.

00:38:56.465 --> 00:39:02.305
You perceive an event, but you can also use it as a basis for formulating linguistic expression.

00:39:02.465 --> 00:39:05.325
You can use it for different things. And as you say, I mean,

00:39:05.345 --> 00:39:06.665
it's modality independent.

00:39:06.905 --> 00:39:11.685
I mean, I don't assume a separate module for language semantics and another

00:39:11.685 --> 00:39:15.265
module for representing perceptual concepts. I mean, for me,

00:39:15.345 --> 00:39:17.225
they are on the same system. But that's interesting.

00:39:17.525 --> 00:39:22.685
The really important consequence is I can use this meaning system to also make predictions.

00:39:22.965 --> 00:39:26.825
Oh, yes. And I can, because it's modality independent, I can make cross-modal predictions. Yes.

00:39:27.045 --> 00:39:31.465
Right? So I think that… And these cross-modal predictions show up in language

00:39:31.465 --> 00:39:33.685
in terms of metaphors. Right, exactly.

00:39:33.945 --> 00:39:39.865
And they show up in body movement, in how we gesture or how we mimic things or stuff like that.

00:39:39.925 --> 00:39:43.785
Okay, so given that we're both very enthusiastic about this program,

00:39:44.271 --> 00:39:47.331
how far did you get in accounting for language with it? Well,

00:39:47.331 --> 00:39:52.131
not to the details. I mean, I've got into some of the rough stuff.

00:39:52.311 --> 00:39:56.251
And if I start at the very basic level, I mean, all languages,

00:39:56.491 --> 00:39:59.831
I mean, the syntactic structures of languages are quite different,

00:40:00.071 --> 00:40:03.791
but all languages seem to have noun phrases and verb phrases.

00:40:03.951 --> 00:40:07.671
I mean, that's the basic thing you can have. And that division,

00:40:07.991 --> 00:40:12.091
I mean, linguists never explain why noun phrases and verb phrases exist.

00:40:12.091 --> 00:40:14.931
I mean, why are these the basic building components of language?

00:40:15.091 --> 00:40:16.271
They are taken for granted.

00:40:17.031 --> 00:40:22.931
Now, given the model of events that I have presented, noun phrases map onto

00:40:22.931 --> 00:40:25.331
agents and patients primarily.

00:40:26.031 --> 00:40:29.891
They might map onto instruments and other stuff. But basic uses of noun phrases

00:40:29.891 --> 00:40:31.571
is to denote agents and patients.

00:40:32.011 --> 00:40:38.351
And basic uses of verb phrases is to denote force vectors and result vectors.

00:40:38.351 --> 00:40:43.551
So this division between the two objects and the two vectors maps onto the division

00:40:43.551 --> 00:40:45.051
between noun phrases and verb phrases.

00:40:45.371 --> 00:40:47.671
Okay, but how clean is that mapping?

00:40:48.031 --> 00:40:53.471
I don't know how clean it is, but if it serves as a grounding for learning language.

00:40:55.191 --> 00:41:01.031
I don't think I can explain all features of the semantics of a human language

00:41:01.031 --> 00:41:02.951
because that's very rich.

00:41:02.951 --> 00:41:08.051
But if I can say something about the general principles of how the semantics

00:41:08.051 --> 00:41:11.951
is structured, I can use that to explain how children can learn a language.

00:41:12.311 --> 00:41:17.511
Because the data you get as a child is not very rich. You need some kind of constraints.

00:41:18.011 --> 00:41:23.171
And these event structures, if my model of the cognitive representations of

00:41:23.171 --> 00:41:29.411
events are correct, they give you a structure that constrains what words can refer to.

00:41:29.551 --> 00:41:32.931
I mean, I make this basic distinction between noun

00:41:32.951 --> 00:41:33.911
phrases and verb phrases.

00:41:34.391 --> 00:41:40.651
But now, of a verb, you said, look, a verb is essentially a convex region in a single domain.

00:41:41.091 --> 00:41:44.631
Yeah. Right? So that seems a rather strong statement.

00:41:44.931 --> 00:41:49.411
Yeah. I mean, before we get to that, I mean, let me start by saying that a verb

00:41:49.491 --> 00:41:55.191
either refers to the force vector or it refers to the result vector.

00:41:55.311 --> 00:41:57.931
That's right. You don't have a single verb. I mean, you can have compositions

00:41:57.931 --> 00:42:02.231
with prepositions and stuff like that where you get. But the root of a verb,

00:42:02.291 --> 00:42:06.051
is my hypothesis, refers to either the force vectors.

00:42:06.231 --> 00:42:09.111
And these verbs are what we call manner verbs, how you do things.

00:42:09.191 --> 00:42:10.651
You hit things, you pull and you push.

00:42:10.831 --> 00:42:14.091
And then you have verbs that refer to the result vector.

00:42:14.271 --> 00:42:19.691
And these are, well, results like painting or heating or whatever, opening.

00:42:21.311 --> 00:42:25.411
But wait, if it's a verb, it always requires an agent. Right.

00:42:27.149 --> 00:42:30.029
Um, but if it's an agent, it

00:42:30.029 --> 00:42:34.009
requires that... Um, but look,

00:42:34.009 --> 00:42:36.989
if we have the patient now, and the patient

00:42:36.989 --> 00:42:39.929
undergoes a change, let's say,

00:42:39.929 --> 00:42:42.869
you hit my finger with a hammer, my finger turns purple.

00:42:42.869 --> 00:42:46.349
The result vector is

00:42:46.349 --> 00:42:49.169
not necessarily an action as I can

00:42:49.169 --> 00:42:52.889
capture in a verb, no? No, right. Well, you say

00:42:52.889 --> 00:42:55.829
turns purple. I mean, you have turning here

00:42:55.829 --> 00:42:58.909
and then you have the color term. So what happens is

00:42:58.909 --> 00:43:01.629
that turn means a change or a movement. No, but

00:43:01.629 --> 00:43:04.709
you agree I could describe it in a way that there's no verb involved. I

00:43:04.709 --> 00:43:07.589
say now, yeah, my finger is blue.

00:43:07.589 --> 00:43:11.289
Well, the property change. Yeah, there's a property change.

00:43:11.289 --> 00:43:16.689
So how do I... Well, this is not in itself an action of the patient,

00:43:16.689 --> 00:43:20.729
right? It's a property change. Yeah, it's a property, but that's the result

00:43:20.729 --> 00:43:25.049
vector, and it's described, I mean, you can say turns red, turns purple,

00:43:25.229 --> 00:43:26.269
you can say becomes purple.

00:43:26.449 --> 00:43:32.049
We have those verbs that help us in saying that there is a change in a property,

00:43:32.189 --> 00:43:33.409
and purple is a property word.

00:43:33.689 --> 00:43:37.529
But you're saying it's going from some other color into the region of purple.

00:43:37.749 --> 00:43:42.869
So there is a change in the result. The result vector is going from some region

00:43:42.869 --> 00:43:45.469
into the purple region. That's the result vector.

00:43:45.789 --> 00:43:49.249
So what I was after, I was just trying to understand whether you're saying verbs

00:43:49.249 --> 00:43:53.229
describe change or verbs describe action.

00:43:53.369 --> 00:43:56.949
No, no. No, there are two kinds. There are verbs that describe actions.

00:43:57.109 --> 00:44:02.509
These are the manner verbs. And then there are verbs that describe changes.

00:44:02.849 --> 00:44:06.269
And I say there is a fairly tight division between them. There are some verbs

00:44:06.269 --> 00:44:09.749
that can be used for both topics, but in a particular sentence,

00:44:09.909 --> 00:44:13.609
they're used either to describe the manner, the force vector.

00:44:13.629 --> 00:44:18.609
Right, but I was trying to understand, let's say, the overlap between these two domains.

00:44:18.789 --> 00:44:22.829
Because you could say, look, you lift the hammer and now the patient ran away.

00:44:22.829 --> 00:44:29.849
So now I'm acting, so I'm not turning blue anymore. So how unique is that mapping

00:44:29.849 --> 00:44:34.929
of these two kinds of verbs to either the cause or the effect?

00:44:36.609 --> 00:44:43.489
I would say it's fairly... Of course, a particular action can have very

00:44:43.489 --> 00:44:47.809
many different outcomes, as you say, depending on the situation, on the context.

00:44:47.809 --> 00:44:52.469
But we don't have verbs that summarize causes and effects. I mean, this

00:44:52.469 --> 00:44:55.189
is my constraint on how we learn these verbs.

00:44:55.289 --> 00:45:00.129
They either express the cause, I mean, the force vector, or the effect of an event.

00:45:00.349 --> 00:45:03.009
So that's a kind of cognitive prediction here.

00:45:05.069 --> 00:45:11.109
For instance, you could be running towards me, and I'm running away. So we're both running.

00:45:11.329 --> 00:45:13.329
The agent and the patient are both running.

00:45:13.649 --> 00:45:17.709
So the verb we use to now describe the cause and effect is the same.

00:45:18.705 --> 00:45:22.745
Is that a problem? No, because you use the preposition away,

00:45:23.045 --> 00:45:25.785
which introduces a result.

00:45:26.785 --> 00:45:33.905
Run away is, you modify the manner by running away that introduces a change

00:45:33.905 --> 00:45:35.965
in space, which is a result vector.

00:45:36.245 --> 00:45:39.605
By having this preposition added to running, you're changing it from a manner

00:45:39.605 --> 00:45:40.845
vector to a result vector.

00:45:41.325 --> 00:45:43.845
Okay, but I could also say Peter's running and Paul is running.

00:45:44.145 --> 00:45:46.145
Yeah, but then you don't have a cause and effect.

00:45:46.145 --> 00:45:49.025
Okay, but then

00:45:49.025 --> 00:45:52.165
it's two events. But you did bring in the preposition now,

00:45:52.165 --> 00:45:55.465
so then it becomes, let's say, a conglomerate. Yeah. It's

00:45:55.465 --> 00:45:58.305
not only the verb anymore. No, no, no. I mean, there

00:45:58.305 --> 00:46:04.005
has to be at least one vector and one agent or patient. I mean, so

00:46:04.005 --> 00:46:07.365
the noun phrase and the verb phrase are the minimal elements of an event

00:46:07.365 --> 00:46:11.465
description. Okay, okay. So it's a phrase, it's not just a single word. No, no, no.

00:46:11.465 --> 00:46:15.785
I mean, then you add grammar and all kinds of

00:46:15.785 --> 00:46:16.885
compositions of descriptions.

00:46:17.285 --> 00:46:22.745
But what is denoted by these composite expressions are agents and patients.

00:46:23.085 --> 00:46:30.665
Okay, so now we have these two kinds of verbs, and then from that you came with

00:46:30.665 --> 00:46:34.385
the prediction, if you want, because, okay, if I look at that from the perspective

00:46:34.385 --> 00:46:35.385
of the conceptual space.

00:46:36.065 --> 00:46:42.945
It means that if every verb is indeed in a convex region, it cannot sort

00:46:42.945 --> 00:46:45.285
of be in two disjoint regions at the same time.

00:46:45.405 --> 00:46:50.545
And therefore, let's say cause and effect verbs, there are no occurrences of

00:46:50.545 --> 00:46:52.585
verbs that are both cause and effect.

00:46:52.765 --> 00:46:56.385
No. And as a matter of fact, you can make it even stronger because among the

00:46:56.385 --> 00:46:59.325
result verbs, they only refer to a single domain.

00:46:59.565 --> 00:47:01.805
So when you're heating something, you're changing the temperature.

00:47:01.825 --> 00:47:04.245
When you're moving something, you're changing the spatial location.

00:47:04.585 --> 00:47:07.365
And when you're painting something, you're changing the color domain.

00:47:07.605 --> 00:47:12.565
I mean, I previously had a theory about adjectives that says that adjectives

00:47:12.565 --> 00:47:15.345
refer to regions in single domains.

00:47:15.525 --> 00:47:19.045
So you have the color words, color adjectives that refer to the color space.

00:47:19.145 --> 00:47:22.985
You have the temperature words, hot and cold, that refer to temperature, and so on.

00:47:23.145 --> 00:47:27.145
Now, I'm generalizing this single domain hypothesis also to verbs.

00:47:27.465 --> 00:47:31.445
And there are lots of potential counterexamples.

00:47:31.485 --> 00:47:36.185
I don't know how well this hypothesis will hold up. But if it does,

00:47:36.265 --> 00:47:40.745
it shows a very nice parallel between the structure of adjectives and the structure of verbs.

00:47:40.885 --> 00:47:43.265
Right. Which would be nice from a cognitive point of view.

00:47:43.625 --> 00:47:47.405
From the point of view of cognitive economy, I should say.

00:47:47.685 --> 00:47:51.325
On the other hand, you have some wiggle space, right? Because you have a free

00:47:51.325 --> 00:47:55.485
parameter in your model, which is the domain itself. Okay, so that...

00:47:56.029 --> 00:47:59.889
So the point is, even if you find counterexamples, they raise interesting questions

00:47:59.889 --> 00:48:01.889
about really the structuring of these domains.

00:48:02.249 --> 00:48:07.809
Well, I'm saying at least that the same domains apply to adjectives as apply to verbs.

00:48:08.009 --> 00:48:14.829
And we find these nice mappings between movement verbs and position adjectives and stuff like that.

00:48:15.049 --> 00:48:21.069
But now one problem I had with that prediction is that I could also argue,

00:48:21.169 --> 00:48:22.489
well, look, yeah, sure thing.

00:48:22.489 --> 00:48:27.469
This is not a big surprise because, let's say, in general, words used in a language

00:48:27.469 --> 00:48:33.989
that are ambiguous and confusing will just die out because they don't help in pragmatic terms.

00:48:34.529 --> 00:48:41.349
So in that sense, could I say, well, this is like, this is a truism.

00:48:41.689 --> 00:48:46.549
Because, of course, words in our language are only there because they're not ambiguous. Yeah.

00:48:46.669 --> 00:48:51.809
I would say it's, well, you can think of it as a truism, but the reason is that

00:48:51.809 --> 00:48:55.809
if you have too complicated words, I mean, words with too complicated semantics,

00:48:56.189 --> 00:48:59.109
they would be very difficult to learn.

00:48:59.569 --> 00:49:05.009
And having them sorted so that each refers to a single domain makes them easier to learn.

00:49:05.169 --> 00:49:09.629
I think that this is a good constraint that explains how children can pick up

00:49:09.629 --> 00:49:10.989
language as quickly as they do.

00:49:11.189 --> 00:49:14.689
That's one of the constraints, and it's not the only one. But your prediction

00:49:14.689 --> 00:49:18.249
is that languages have evolved the way they did also to facilitate learnability.

00:49:18.629 --> 00:49:25.029
Oh, yes. Oh, yes. And, I mean, there are these famous Chomsky arguments of the

00:49:25.029 --> 00:49:28.289
poverty of the data. We don't get enough data.

00:49:28.389 --> 00:49:30.489
We have to have innate structures.

00:49:30.929 --> 00:49:35.129
I don't think that's a very strong argument because he only considers syntax in his studies.

00:49:35.169 --> 00:49:39.249
If you look at the interplay between perception and language,

00:49:39.449 --> 00:49:44.049
as I've been doing in my studies of actions and events, I mean,

00:49:44.049 --> 00:49:46.829
you can generate a lot more constraints on learning.

00:49:46.929 --> 00:49:52.589
And that's one of my aims, to identify these constraints on how the

00:49:52.589 --> 00:49:54.969
semantics of words are structured.

00:49:55.349 --> 00:49:58.229
Yeah, but it's interesting about this argument about the poverty of the stimulus

00:49:58.229 --> 00:50:00.209
originating in the 1950s.

00:50:00.962 --> 00:50:04.102
That in some sense the data is still out there.

00:50:04.262 --> 00:50:11.022
I mean, the data has not been conclusively summarized in some way that indeed

00:50:11.022 --> 00:50:17.662
the sensory stimuli the growing child is exposed to are that restricted or not

00:50:17.662 --> 00:50:20.502
structured enough to allow learnability.

00:50:20.802 --> 00:50:24.182
But what kind of data do you have to support your position here?

00:50:25.162 --> 00:50:31.522
No, I mean, the data I have comes from, to the extent I can confirm my models here,

00:50:31.702 --> 00:50:36.622
that I have to look at, for instance, the verbs, to see whether this classification

00:50:36.622 --> 00:50:41.662
between manner verbs and result verbs holds water, and that's a hot topic in

00:50:41.662 --> 00:50:43.782
linguistics at the moment.

00:50:44.102 --> 00:50:45.822
And if it does, then we have some

00:50:45.822 --> 00:50:50.682
data showing that there is this kind of division of meanings in verbs.

00:50:50.962 --> 00:50:55.462
And that division of meanings would help me, would generate a constraint that

00:50:55.462 --> 00:50:59.222
can help me in explaining why kids can learn language so easily.

00:50:59.682 --> 00:51:03.982
Okay, but that would mean that the structuring of the conceptual space of language

00:51:04.502 --> 00:51:07.922
should then coincide with the progression of language learning.

00:51:08.962 --> 00:51:12.462
Yeah, I mean, that's a give-and-take relation, yeah.

00:51:12.622 --> 00:51:15.522
So how, but how do you really see that feedback mechanism? Because there's some,

00:51:15.582 --> 00:51:17.302
there must be some bootstrapping in this, right?

00:51:17.362 --> 00:51:21.062
Because the argument is always, look, if it's just feed forward and you grab

00:51:21.062 --> 00:51:22.362
it from the world, it's not enough.

00:51:22.542 --> 00:51:27.082
No. So what are these key rules that would help the bootstrapping in this?

00:51:27.142 --> 00:51:28.262
Do you have an idea about that?

00:51:28.342 --> 00:51:33.602
I have thought a little bit about the ordering in which you learn domains as a child.

00:51:33.802 --> 00:51:37.562
I mean, you learn the shape of objects quite early. You interact with things

00:51:37.562 --> 00:51:39.602
with your hands and your mouth and so on.

00:51:40.242 --> 00:51:43.182
Color comes perhaps later. You learn about spatial relations.

00:51:43.582 --> 00:51:46.882
But then there are lots of spaces where you only learn things later.

00:51:47.122 --> 00:51:52.282
And I looked a little bit at child data on when certain domains are learned.

00:51:52.722 --> 00:51:56.082
So color and shape come very early. But for instance, all the...

00:51:56.570 --> 00:52:03.630
All the domains related to knowledge. I mean, knowing, believing, and lying, and so on.

00:52:03.870 --> 00:52:07.070
Children normally don't learn them until they're about three or four years old.

00:52:07.230 --> 00:52:10.790
So that's a more abstract domain that comes later.

00:52:11.150 --> 00:52:18.310
Or take economic terms. I mean, like a loan or inflation, to take a tough word.

00:52:18.470 --> 00:52:21.250
I mean, you can't teach a child that. Bail out.

00:52:21.390 --> 00:52:27.210
Because for kids, money is coins and bills. And they don't know about this

00:52:27.210 --> 00:52:28.950
abstract space of economic relations.

00:52:29.330 --> 00:52:34.550
That's something that they learn much later. So my idea of some kind of progression

00:52:34.550 --> 00:52:39.550
of domains would be another constraint on language learning.

00:52:39.810 --> 00:52:44.530
Some domains come earlier in their development, and then via learning a language,

00:52:44.590 --> 00:52:47.930
via being part of a culture, you learn further, more abstract domains.

00:52:48.230 --> 00:52:51.330
But then your prediction is that there must be a hierarchy of conceptual spaces.

00:52:51.730 --> 00:52:57.530
A hierarchy of domains, yes. Of domains. Yeah. Okay. And would that hierarchy,

00:52:57.670 --> 00:53:01.790
let's say, also be an explosion in dimensionality or not necessarily?

00:53:02.130 --> 00:53:07.250
Well, each new domain adds dimensions, so it's bringing out more dimensions.

00:53:07.390 --> 00:53:12.550
Yeah, but it could also collapse dimensions of preceding domains, right? It could, yeah.

00:53:12.990 --> 00:53:18.410
No, I think in general, our minds are growing in the number of dimensions we

00:53:18.410 --> 00:53:19.950
are using. Okay, all right.

00:53:20.050 --> 00:53:28.810
So then you had a number of predictions, if you want, about verbs, right?

00:53:28.890 --> 00:53:34.270
For instance, that similarities in verb meaning, or the subcategories of verbs,

00:53:34.490 --> 00:53:38.050
or the subregions in the notion of walking.

00:53:39.169 --> 00:53:44.229
So what are these predictions exactly? Well, we categorize.

00:53:44.289 --> 00:53:48.289
I mean, one idea behind conceptual spaces is that similarity plays a great

00:53:48.289 --> 00:53:50.869
role in categorization.

00:53:51.209 --> 00:53:57.449
So we categorize colors as yellow because they are similar in our perception.

00:53:57.669 --> 00:54:01.409
And we categorize actions because they are similar in our perception.

00:54:01.409 --> 00:54:06.449
So I would say that running is very much similar to walking,

00:54:06.629 --> 00:54:12.549
or more similar to walking than running is to flying, for instance.

00:54:12.649 --> 00:54:14.129
Yeah, that's a good example.

00:54:14.469 --> 00:54:19.729
And we use these similarities. I mean, I don't have a measure of distance between

00:54:19.729 --> 00:54:24.109
force patterns, but somehow the force pattern of running is more similar,

00:54:24.229 --> 00:54:27.269
closer to that of walking than it is to flying.

00:54:27.269 --> 00:54:33.669
And we use these similarities when we judge the similarities between verb meanings.

00:54:34.049 --> 00:54:37.269
And also, if you can generalize, I mean, take walk again.

00:54:37.469 --> 00:54:41.129
There are a number of subdivisions of walking. You can be limping,

00:54:41.169 --> 00:54:43.049
you can be marching, you can be strutting.

00:54:43.669 --> 00:54:46.509
I mean, there are lots of kinds of walking.

00:54:46.709 --> 00:54:51.169
But they all fall under the general notion of the walking domain of possible

00:54:51.169 --> 00:54:54.049
walking patterns, force patterns.

00:54:54.389 --> 00:54:59.709
But the subdivisions then are sub-regions of this action space.

00:54:59.949 --> 00:55:03.909
Right, and that would then coincide with, say, our ability to distinguish these

00:55:03.909 --> 00:55:06.949
movement patterns. Yeah, this is just like we have subdivisions of nouns.

00:55:06.949 --> 00:55:09.149
You can have an animal, a dog, a terrier.

00:55:09.349 --> 00:55:13.069
You have this hierarchy. We have a similar hierarchy of action representations.

00:55:17.842 --> 00:55:23.042
But you also had, let's say, a more detailed decomposition of,

00:55:23.082 --> 00:55:27.362
let's say, how verbs translate to action in the world.

00:55:27.462 --> 00:55:33.042
Because if we take the example of push that you talked about,

00:55:33.222 --> 00:55:39.242
to push an object in itself is not necessarily the complete story, right?

00:55:39.282 --> 00:55:43.342
Because in the result vector, different things can happen, right?

00:55:43.342 --> 00:55:48.502
So does that mean I also have to think about then subcategories or let's say

00:55:48.502 --> 00:55:54.322
decomposition of push into different kinds of resultant vectors or not?

00:55:54.482 --> 00:55:58.542
You must make a distinction between what the verb represents and the event you're

00:55:58.542 --> 00:56:00.942
describing because the verb represents a force vector.

00:56:01.002 --> 00:56:06.302
Pushing is a force vector normally horizontally towards an object.

00:56:06.522 --> 00:56:13.082
But then the object may be stuck or may be heavy, so nothing happens.

00:56:13.342 --> 00:56:16.582
But that's part of the event. I mean, that's the result vector. But

00:56:16.582 --> 00:56:20.182
the push, as the verb,

00:56:20.182 --> 00:56:23.122
only concerns the force vector, and the

00:56:23.122 --> 00:56:26.942
result may be a complicated story depending on what the world looks like. But

00:56:26.942 --> 00:56:30.482
how do I segment this now? Because imagine you push me and I'm pushing you

00:56:30.482 --> 00:56:35.982
back, so I'm your patient and you're my patient. Yeah, then there are two events.

00:56:35.982 --> 00:56:42.422
Okay, exactly, right. So the segmentation of, let's say,

00:56:44.777 --> 00:56:50.637
real-world events, this segmentation would fall along the lines of the forces

00:56:50.637 --> 00:56:51.457
and how they're exchanged.

00:56:52.257 --> 00:56:56.997
But would that decomposition be unambiguous? No, not necessarily.

00:56:57.177 --> 00:56:58.357
Does it need to be unambiguous?

00:56:59.497 --> 00:57:04.077
You can have different levels of description of events.

00:57:04.417 --> 00:57:08.277
I mean, if you take wrestling, there is a continuous exchange of forces,

00:57:08.297 --> 00:57:10.817
but we still generalize this

00:57:10.897 --> 00:57:18.957
general pattern of two people pushing against or pulling against each other

00:57:18.957 --> 00:57:21.357
as a general type of interaction.

00:57:21.817 --> 00:57:26.037
So that's a high-level description. Then, of course, I can have a more detailed

00:57:26.037 --> 00:57:28.617
description of a wrestling match.

00:57:28.777 --> 00:57:33.877
I can have, I'm bending your knee or bending your arm or pushing there or lifting there or whatever.

00:57:34.117 --> 00:57:37.337
I mean, there are lots of sub-events in this global event.

00:57:37.337 --> 00:57:42.017
But is the prediction of this possibly that if the forces cancel each other

00:57:42.017 --> 00:57:45.157
out, or if the roles of patients and agents cancel each other out,

00:57:45.297 --> 00:57:51.657
that that is a necessary reason to step to a next level of abstraction? No, no, no.

00:57:51.737 --> 00:57:56.437
I mean, it's just as when you're describing a scene around you,

00:57:56.477 --> 00:57:57.737
you can say, well, I'm in a room.

00:57:57.857 --> 00:58:01.137
But I can also say I'm in a room with this and that pieces of furniture.

00:58:01.337 --> 00:58:04.897
I mean, I can go down and talk about the details.

00:58:05.057 --> 00:58:09.457
I mean, it's a level of attention we have on a particular situation.

00:58:09.457 --> 00:58:12.297
Okay, so now you came

00:58:12.297 --> 00:58:15.297
a long way with the conceptual spaces approach to understanding

00:58:15.297 --> 00:58:18.317
language, right? So in some sense you're saying nouns are categories.

00:58:18.317 --> 00:58:25.217
Yeah. Right, adjectives are properties, prepositions are force and spatial

00:58:25.217 --> 00:58:31.357
relations, sort of relationships. Yeah. Verbs now are forces and results, cause

00:58:31.357 --> 00:58:36.617
and effect vectors. Yeah. Right, and then adverbs are like modifying vectors.

00:58:37.077 --> 00:58:41.077
Yeah. What would that exactly mean? Well, if I push something,

00:58:41.357 --> 00:58:45.557
I mean, I'm exerting a vector, but I'm not saying very much about how strongly I push.

00:58:45.717 --> 00:58:50.837
I mean, the strength. If I say I push strongly, I'm saying that the vector I'm

00:58:50.837 --> 00:58:57.137
exerting belongs to the, well, with stronger forces, I exert more force.

00:58:57.297 --> 00:59:01.237
So it's like a multiplier or a modifier

00:59:01.237 --> 00:59:05.677
of a vector. Or I can change the direction. I say I push downwards.

00:59:06.217 --> 00:59:11.337
That means I'm changing the direction of the pushing.

00:59:11.637 --> 00:59:17.517
Okay. So that's a very complete, if you want, decomposition of this very basic

00:59:17.517 --> 00:59:20.297
notion of an event in the atoms of language.

00:59:21.517 --> 00:59:27.497
And then your proposal is that sentences as such, as a unit, capture the event.

00:59:27.657 --> 00:59:32.097
Yep. But now to go from these linguistic atoms to a sentence, we need grammar.

00:59:33.066 --> 00:59:36.786
Not necessarily. I mean, we can communicate without grammar quite a lot.

00:59:36.946 --> 00:59:40.506
I mean, there is this thing that's called proto-language, that is just meaning

00:59:40.506 --> 00:59:41.986
components without any grammar.

00:59:42.126 --> 00:59:47.586
I think grammar is needed to disambiguate a lot of things here.

00:59:47.586 --> 00:59:56.506
I mean, in English, for instance, there are a lot of words that can mean either a verb or an object.

00:59:56.726 --> 00:59:59.666
Like hammer. I mean, hammer is a verb or it can be an object.

01:00:00.066 --> 01:00:03.946
And the word in isolation doesn't tell you what it is.

01:00:04.306 --> 01:00:11.866
But if you put it in a context and add syntactic markers, tenses or third-person

01:00:11.866 --> 01:00:16.846
markers or whatever, then you can see whether it's a noun or a verb.

01:00:17.586 --> 01:00:23.746
So syntax for me is a tool to disambiguate the uses of words.

01:00:23.946 --> 01:00:29.326
And then syntax is another constraint on how to learn a language because syntactic

01:00:29.326 --> 01:00:35.926
markers will help a child to see whether we're talking about an action or we're

01:00:35.926 --> 01:00:36.886
talking about an object.

01:00:37.686 --> 01:00:43.526
Yeah, but what I find surprising about this, and that's probably because you're

01:00:43.526 --> 01:00:47.306
more aware of the intricacies of language, but I would expect that you would

01:00:47.306 --> 01:00:51.086
have said something like, well, I have my basic decomposition of an event. I have an agent.

01:00:51.326 --> 01:00:54.586
I have my two vectors and I have a patient. It's a cause and effect.

01:00:55.086 --> 01:00:59.566
And it's that order of an agent cause and effect on the patient.

01:00:59.646 --> 01:01:01.426
It's that order that translates to syntax.

01:01:02.146 --> 01:01:09.806
No, not necessarily. Why not? Because there is an event or an element in between.

01:01:10.046 --> 01:01:15.686
There is an event. But there is also you looking at the event and you focusing on something.

01:01:15.946 --> 01:01:20.746
I mean, you're looking at what the agent is doing or you're looking at what the patient is doing.

01:01:20.806 --> 01:01:24.386
That's two different foci on the event.

01:01:24.506 --> 01:01:29.346
And that attention focusing shows up in language.

01:01:29.646 --> 01:01:37.946
So in a large class of languages, the subject turns out to be the thing you're focusing on.

01:01:37.946 --> 01:01:43.366
So if I say that I'm hitting you, I'm focusing on me doing the action,

01:01:43.486 --> 01:01:46.066
but if I'm saying you are being hit...

01:01:46.677 --> 01:01:51.297
I'm focusing on what's happening to you. I'm making you the subject.

01:01:51.817 --> 01:01:55.957
And in both cases, I'm the agent. You're the patient.

01:01:56.317 --> 01:02:02.017
So what becomes the subject of a sentence is determined by what is the focus

01:02:02.017 --> 01:02:03.817
of attention on the event.

01:02:04.217 --> 01:02:07.117
Yeah, but exactly this example, I think, is a good one. I mean,

01:02:07.137 --> 01:02:10.877
you might have reached a stage in this discussion that you want to hit me.

01:02:10.977 --> 01:02:15.857
But still we say Peter is hitting Paul. So first we have the agent,

01:02:15.937 --> 01:02:17.997
then we have the cause, and then we have the patient.

01:02:18.357 --> 01:02:20.717
And if it's the other way around, we say Paul is hitting Peter.

01:02:20.897 --> 01:02:25.917
So it's that ordering in the language that is rather directly correlated with

01:02:25.917 --> 01:02:27.397
the ordering of your event decomposition.

01:02:27.857 --> 01:02:31.697
No, not necessarily. First of all, there are all kinds of orderings between

01:02:31.697 --> 01:02:34.897
subject, object, and verb in languages, so that doesn't hold.

01:02:34.997 --> 01:02:38.477
But I think the most important thing is just what is your focus of attention?

01:02:38.717 --> 01:02:43.637
That's what determines the ordering, because the thing most focused on comes first.

01:02:43.957 --> 01:02:48.797
Yeah, but isn't that now a tricky direction to take? Because now you bring in

01:02:48.797 --> 01:02:51.237
for the first time, let's say, a point of view.

01:02:51.697 --> 01:02:56.437
And you say, depending on the point of view, you reorganize the conceptual space,

01:02:56.657 --> 01:02:59.457
or you reorganize the event structure.

01:02:59.717 --> 01:03:02.877
But it shouldn't really matter, because independent of the point of view,

01:03:03.097 --> 01:03:07.017
the event structure always has the agent, the cause, the patient, and the effect.

01:03:07.077 --> 01:03:10.677
No, you don't reorganize it. You focus on it, and then you turn this focusing

01:03:10.677 --> 01:03:11.977
into a linguistic expression.

01:03:12.417 --> 01:03:19.177
So this is an intermediary step between your event perception and how you describe it.

01:03:19.317 --> 01:03:23.857
And between your perception, there is the focusing of attention.

01:03:24.077 --> 01:03:30.197
And that determines partly how you assign syntactic roles to the parts of the

01:03:30.197 --> 01:03:31.777
event. Okay, so this is interesting, right?

01:03:31.857 --> 01:03:35.117
So I find this interesting, because on the one hand,

01:03:35.137 --> 01:03:40.037
you're saying, look, from, if you want, a cognitive representational perspective,

01:03:40.257 --> 01:03:43.597
I can tell you how an event is represented, is decomposed.

01:03:44.157 --> 01:03:46.977
So there we go. We have the agent and the cause, the patient and the effect.

01:03:47.657 --> 01:03:52.337
Then you say those components, I can map on the components of language.

01:03:52.577 --> 01:03:56.557
Yep. Okay. Yep. So we're still doing fine. So this is like we have our conceptual

01:03:56.557 --> 01:03:59.337
space, the domains, and there we go.

01:03:59.597 --> 01:04:02.177
And we map this to, let's say, components of language.

01:04:03.157 --> 01:04:08.017
And then now we want to make a sentence. And you're saying the whole sentence

01:04:08.017 --> 01:04:10.837
describes the whole event. No, I'm not saying that.

01:04:11.057 --> 01:04:14.697
I'm saying the sentence describes an aspect of an event. You have to pick out

01:04:14.697 --> 01:04:16.417
a certain aspect of an event.

01:04:16.517 --> 01:04:20.497
And you can start very basically with a subject and a verb.

01:04:20.637 --> 01:04:23.357
But then you can add prepositional phrases. You can add modifiers.

01:04:23.617 --> 01:04:28.157
You can add, I mean, I can say I hit you, but I can say I hit you strongly with a hammer.

01:04:28.277 --> 01:04:32.677
I hit you strongly with a hammer so your eyes became blue and so on.

01:04:32.717 --> 01:04:39.197
I can expand the description of the event by adding more and more parts of the event.

01:04:39.377 --> 01:04:42.677
So that means that I have now a transformation. I have to look at some transformation

01:04:42.677 --> 01:04:49.397
that goes from, let's say, a neutral mapping into a conceptual space of an event,

01:04:49.437 --> 01:04:50.637
as it is in the outside world.

01:04:51.037 --> 01:04:54.057
Then I want to translate this to language, and for that, I have to make a transformation.

01:04:54.357 --> 01:04:56.097
And this transformation depends on the point of view.

01:04:56.437 --> 01:05:00.517
Well, yeah, you make a selection, basically. But in making this selection,

01:05:00.897 --> 01:05:04.677
I still map it into your basic event structure. Yep.

01:05:05.240 --> 01:05:09.700
No? Yes. And to map that basic event structure in language, I still face the

01:05:09.700 --> 01:05:14.360
same problem, which means there's an agent, a cause, a patient, and an effect.

01:05:14.420 --> 01:05:17.040
I mean, the event structure will constrain language.

01:05:17.260 --> 01:05:20.520
It will not determine what you say, but it will constrain the structure of your language.

01:05:20.880 --> 01:05:23.920
But then where's the magic coming in for the syntax?

01:05:24.960 --> 01:05:28.820
Because still, this is how I have to map it to my sentence. Yeah.

01:05:29.520 --> 01:05:36.420
But, I mean, of course, as I said, that you have to have some way of disambiguating the role.

01:05:36.580 --> 01:05:41.880
So you have to decide, know whether something is a verb or a noun,

01:05:41.980 --> 01:05:46.980
for instance, or whether something is talking about spatial relations or whatever.

01:05:47.640 --> 01:05:54.060
You can do a lot with proto-language without syntax. And when children are learning

01:05:54.060 --> 01:05:55.700
language, they start with the proto-language.

01:05:55.860 --> 01:05:59.280
The grammar comes, most of the grammar comes much later.

01:05:59.580 --> 01:06:03.420
Right. And if you start communicating in another foreign language,

01:06:03.480 --> 01:06:07.240
you start without the syntax. I mean, you have a vocabulary and you try your best.

01:06:08.020 --> 01:06:11.980
So, I mean, this is the proto-language structure of communication.

01:06:12.260 --> 01:06:17.980
For me, syntax is just a tool for helping us get a better understanding of

01:06:17.980 --> 01:06:22.280
how you made the transformation from the event description to the linguistic expression.

01:06:23.120 --> 01:06:27.180
So, Peter, to wrap up, I have two questions.

01:06:27.180 --> 01:06:32.600
So, I mean, you've gone around in this sort of very cognitive view on the mind,

01:06:32.680 --> 01:06:34.140
if you want, and its different capabilities,

01:06:34.380 --> 01:06:41.780
proposing this notion of conceptual spaces for a while now, looking at its consequences,

01:06:41.920 --> 01:06:43.700
and that seems to be panning out rather well.

01:06:44.080 --> 01:06:49.000
So what would be Peter's law that we have to adhere to in trying to understand the mind?

01:06:51.260 --> 01:06:56.120
Well, one important thing is that our mind divides information into domains.

01:06:56.120 --> 01:07:03.500
I don't have a crisp definition of what is a domain, but I can give you examples.

01:07:04.180 --> 01:07:09.240
And this domain structure shows up in a lot of circumstances.

01:07:09.900 --> 01:07:13.720
So that's one of the big messages I have. Okay.

01:07:14.100 --> 01:07:18.140
And then the second one is five years from now I'm going to go visit you in

01:07:18.140 --> 01:07:22.860
Lund and say, okay, you gave me this prediction five years ago and now I want

01:07:22.860 --> 01:07:26.300
to know whether it really happened. So what's the one prediction you feel most

01:07:26.300 --> 01:07:27.160
passionate about today?

01:07:29.402 --> 01:07:34.982
A prediction, testable prediction that you're going to see tested and validated

01:07:34.982 --> 01:07:37.702
five years from now. I have a lot of smaller predictions.

01:07:37.902 --> 01:07:42.082
I mean, I have predictions about how the semantics of children develop.

01:07:42.282 --> 01:07:45.582
I mean, I can tell you about the order of adjectives that we learn, for instance.

01:07:45.722 --> 01:07:50.662
I have predictions about what will be easy to teach to a robot and more difficult

01:07:50.662 --> 01:07:51.922
to teach in terms of verbs.

01:07:52.642 --> 01:07:54.522
Give me the most exciting one.

01:07:56.302 --> 01:08:00.482
With the highest chance of failure. And I haven't thought about formulating.

01:08:00.642 --> 01:08:06.482
Okay. I mean, yeah. Okay. The highest rate of failure is that nouns in general

01:08:06.482 --> 01:08:07.662
involve several domains.

01:08:07.902 --> 01:08:12.322
And when we talk about dogs, they have sounds and smells and sizes and shapes and whatnot.

01:08:12.802 --> 01:08:20.022
Other words, adjectives, prepositions, verbs normally refer to single domains.

01:08:20.022 --> 01:08:27.902
So, if I can validate this hypothesis, I will have a fairly strong toolbox to

01:08:27.902 --> 01:08:31.682
generate further, more detailed predictions about how language works.

01:08:31.702 --> 01:08:37.462
I'm going to have this kind of program of extending this single domain hypothesis

01:08:37.462 --> 01:08:38.822
to different word classes.

01:08:38.982 --> 01:08:40.962
That's a fairly general hypothesis. Exactly.

01:08:41.442 --> 01:08:45.742
That's a good one. And if that works out, I… You'll be very pleased.

01:08:45.802 --> 01:08:46.842
I will be very pleased, yes.

01:08:47.022 --> 01:08:50.282
Okay. Peter Gardefors, thank you very much for this conversation. Thank you very much.

01:08:52.682 --> 01:08:58.562
The CSN podcast was produced by the Convergent Science Network of Biomimetics

01:08:58.562 --> 01:09:05.162
and Biohybrid Systems, a project funded by the European Seventh Research Framework Programme.

01:09:05.200 --> 01:09:33.296
Music.