WEBVTT

00:00:03.497 --> 00:00:10.497
This is the Convergent Science Network podcast. Leading researchers in the domain

00:00:10.497 --> 00:00:16.777
of neuroscience, brain theory and technology are interviewed by Paul Verschure and Tony Prescott.

00:00:21.897 --> 00:00:29.597
This is Paul Verschure with the Convergent Science Network podcast and I'm here with Daniel Polani.

00:00:30.017 --> 00:00:34.457
Daniel spoke about his work here at BCBT Summer School 2016.

00:00:36.017 --> 00:00:42.577
And Daniel, you very much emphasized the more, let's say, information-oriented

00:00:42.577 --> 00:00:46.117
perspective on cognition, and in particular embodied cognition.

00:00:47.277 --> 00:00:53.477
So, how did you end up taking that specific perspective and trying to understand

00:00:53.477 --> 00:00:55.137
this complex phenomenon?

00:00:56.497 --> 00:01:02.337
One key issue that bothered me for a long time is the question,

00:01:02.477 --> 00:01:10.797
how is evolution directed in a way that moves it forward sufficiently fast?

00:01:11.057 --> 00:01:15.137
So one question I was interested in many years ago was evolution of sensors.

00:01:15.477 --> 00:01:23.017
How does an organism or a species learn evolutionarily that a particular sensoric

00:01:23.017 --> 00:01:28.617
channel has information that's relevant for its survival, or could be relevant for its survival.

00:01:28.837 --> 00:01:34.657
Which means that because we can assume that evolution is very local in terms

00:01:34.657 --> 00:01:37.077
of exploring the solution space,

00:01:37.457 --> 00:01:44.077
that there are indicators which are of local nature that nevertheless,

00:01:44.777 --> 00:01:49.757
give a drift to the evolutionary process, which advance it towards exploration

00:01:49.757 --> 00:01:52.897
of further and deeper informational sources.

00:01:52.897 --> 00:01:58.477
And the way to quantify that, to characterize that, for that you wanted to be

00:01:58.477 --> 00:02:02.637
able to say how much or what this information is about.

00:02:03.137 --> 00:02:11.197
And for this we had to understand how to reinterpret information theory, as basically handed down by Shannon,

00:02:12.372 --> 00:02:16.252
in a way that allows us to incorporate the concept of relevance.

00:02:16.852 --> 00:02:18.492
So relevant information.

00:02:19.132 --> 00:02:23.172
And a very important article came out in 1999.

00:02:23.272 --> 00:02:27.172
It was the information bottleneck article, which basically made the argument

00:02:27.172 --> 00:02:30.792
that you can actually color, you can actually tag information.

00:02:31.012 --> 00:02:34.692
In other words, you can make a distinction between information that's relevant

00:02:34.692 --> 00:02:36.812
and information that you may have,

00:02:36.812 --> 00:02:40.392
but you don't want. But now, and

00:02:40.392 --> 00:02:43.792
you take a very specific view

00:02:43.792 --> 00:02:46.912
on evolution in that sense, of both sensing and

00:02:46.912 --> 00:02:52.832
cognition, because you consider sensors as being highly optimized systems

00:02:52.832 --> 00:02:59.712
that work close to, let's say, their physical limits. Is that critical

00:02:59.712 --> 00:03:04.032
to this approach that you take, that sensors are really that highly optimized, or is that arbitrary?

00:03:04.972 --> 00:03:10.152
In principle, it's arbitrary, but there are indicators that this gives us a

00:03:10.152 --> 00:03:15.392
hint on the fact that information really is a major driver for evolution.

00:03:15.672 --> 00:03:21.212
So I would rather say that originally it was a motivator to say,

00:03:21.252 --> 00:03:23.072
yes, information is important for our nature.

00:03:23.072 --> 00:03:29.472
But as we started looking at the information perspective itself, we suddenly

00:03:29.472 --> 00:03:34.592
saw that the informational perspective, if you assume that evolution uses it as

00:03:34.592 --> 00:03:37.712
a proxy for directing the genesis of

00:03:39.072 --> 00:03:42.072
information processing or behaviors, etc.,

00:03:42.312 --> 00:03:50.972
may actually desire, or not desire, but drive, push towards increasingly

00:03:51.252 --> 00:03:55.112
high-resolution sensorics.

00:03:55.192 --> 00:03:57.272
So in other words, it's basically a mutual

00:03:58.986 --> 00:04:03.066
feedback loop, mutual positive feedback. But now,

00:04:03.066 --> 00:04:06.086
okay, if we consider that sensors are

00:04:06.086 --> 00:04:10.846
highly optimized, they're highly optimized relative to what? Excellent

00:04:10.846 --> 00:04:16.686
question. The original idea was that they are highly optimized relative to some

00:04:16.686 --> 00:04:21.886
hypothesized goals. That was what we originally looked at. So the original relevant

00:04:21.886 --> 00:04:28.466
information was just about that. Some indicators came up that indicate: no, it's not enough,

00:04:28.726 --> 00:04:34.826
you get actually a high level of other possible goals that become accessible.

00:04:35.206 --> 00:04:39.726
And of course, if your niche changes, or your agent gets a bigger brain,

00:04:39.906 --> 00:04:46.106
slowly, or you have additional goals in your life, it turns out that suddenly,

00:04:46.246 --> 00:04:48.206
with the same senses, you can solve other problems.

00:04:48.606 --> 00:04:52.686
So in other words, you get the opportunity to drift from goal to goal.

00:04:53.126 --> 00:04:55.866
So several questions emerge, where do the goals come from?

00:04:56.506 --> 00:05:00.146
The second question is, how is this drift possible at all?

00:05:00.406 --> 00:05:04.066
Because often, if you're highly specialized, you can't change anything without

00:05:04.066 --> 00:05:05.346
losing performance or else.

00:05:05.466 --> 00:05:10.686
The argument that we propose now, or that the information-theoretic view has given

00:05:10.686 --> 00:05:16.366
us is sometimes you get more information than you bargained for.

00:05:16.506 --> 00:05:20.606
This extra information gives you what people would call some kind of

00:05:21.654 --> 00:05:24.674
openness for evolution, or permits you more

00:05:24.674 --> 00:05:27.614
adaptiveness in evolution than you

00:05:27.614 --> 00:05:30.494
would naively expect. But now, we

00:05:30.494 --> 00:05:33.294
can distinguish different sensor systems, right? So, for

00:05:33.294 --> 00:05:37.294
instance, the oldest sensory systems are probably mechanical systems,

00:05:37.294 --> 00:05:42.754
so they just detect forces. Then we would have sensor systems that deal with

00:05:42.754 --> 00:05:47.014
molecules, molecular structure, which is chemical sensing. Then you have sensory

00:05:47.014 --> 00:05:50.094
systems for sound pressure waves, which might build again on mechanical

00:05:50.094 --> 00:05:51.634
sensing, that's your auditory system,

00:05:52.154 --> 00:05:56.454
or you might look at a lateral line of fish that would pick up turbulence in flow.

00:05:56.834 --> 00:05:59.554
And then lastly, you would have sensor systems that detect photons,

00:05:59.674 --> 00:06:01.074
and there you have vision, right?

00:06:01.134 --> 00:06:06.214
So in that sense, these biological systems you look at, and that you try to

00:06:06.214 --> 00:06:08.394
understand at the first step,

00:06:09.174 --> 00:06:12.654
the sensors which interface to these different environments in which they

00:06:12.654 --> 00:06:15.194
can exist, you could argue, well,

00:06:15.954 --> 00:06:19.074
but maybe these different environments have also different informational requirements

00:06:19.074 --> 00:06:23.814
and in that sense a single notion of optimization will sort of fall flat rather

00:06:23.814 --> 00:06:28.554
quickly because there's diversity of subdomains in which sensors have to adapt.

00:06:28.774 --> 00:06:34.614
So do you believe we can have like a generic informational optimization criterion

00:06:34.614 --> 00:06:37.914
to look at these sensor systems or we should already link it to the specific

00:06:37.914 --> 00:06:41.634
subdomain in which they act if we go from mechanical sensing to vision?

00:06:42.514 --> 00:06:47.054
I should expect that, of course, the ecological niche plays a role.

00:06:47.354 --> 00:06:51.974
And you will basically select for various, there will be different attractors.

00:06:51.974 --> 00:06:55.874
If you look at how a sensor and its environment interact, you have different

00:06:55.874 --> 00:06:58.074
attractors, different solutions for the same problem.

00:06:58.294 --> 00:07:00.474
A classic example is, of course, bats.

00:07:01.074 --> 00:07:06.354
Bats use essentially the auditory sense in a vision form, in a vision modality.

00:07:07.070 --> 00:07:12.690
This is a very interesting development because it's vision, but it's also active

00:07:12.690 --> 00:07:16.650
sensing, very, very active sensing; without sound generation it doesn't work.

00:07:16.910 --> 00:07:21.230
But on the other hand, you have, for example, owls, and they just improved their night vision.

00:07:21.550 --> 00:07:27.110
So it's not a unique solution, and maybe there is a historical accident that

00:07:27.110 --> 00:07:28.190
gets you in a particular direction.

00:07:28.190 --> 00:07:32.890
One interesting case is, of course, snakes can detect infrared,

00:07:33.090 --> 00:07:35.450
and they actually use a skin sense,

00:07:35.670 --> 00:07:40.270
which has been basically anatomically formed in such a way that it operates

00:07:40.270 --> 00:07:43.210
as a camera obscura for infrared.

00:07:43.450 --> 00:07:47.950
Now, the funny thing about that is everything about this hardware that they

00:07:47.950 --> 00:07:50.250
use essentially is equivalent to what we have.

00:07:50.450 --> 00:07:55.870
So in other words, the main difference in the snake's skin sense is,

00:07:56.070 --> 00:08:00.010
apart from the wiring in the brain, essentially the

00:08:00.010 --> 00:08:03.110
anatomy, not the actual nerve

00:08:03.110 --> 00:08:05.890
quality or skin property or so;

00:08:05.890 --> 00:08:09.290
it's really the anatomy, mostly. But now,

00:08:09.290 --> 00:08:12.430
if you say that sensors tend to

00:08:12.430 --> 00:08:17.210
operate at some physical limit, right, they're optimized, and indeed we look at

00:08:17.210 --> 00:08:21.810
these strange kinds of sensor systems, like infrared in the snake, that sort of

00:08:21.810 --> 00:08:28.710
is capitalizing on different sensory systems. In addition, of course, something like

00:08:28.710 --> 00:08:30.250
an infrared detector only

00:08:31.340 --> 00:08:35.460
is effective as a sensor if there's actual signal transduction related to it.

00:08:35.640 --> 00:08:41.540
So can we really talk about the evolution of a sensor in isolation without taking

00:08:41.540 --> 00:08:45.380
into account the morphology in which it's embedded and also the signal transduction

00:08:45.380 --> 00:08:49.040
mechanisms that it has to deploy or it has to be interfaced to?

00:08:49.220 --> 00:08:52.560
Of course, I completely agree. I mean, you have multiple systems interacting.

00:08:52.940 --> 00:08:57.200
And it may be that in the evolutionary history you had different balances,

00:08:57.200 --> 00:09:02.800
different drives in different directions, and there may be not a single attractor

00:09:02.800 --> 00:09:06.700
into which you converge, starting from the same original species.

00:09:07.220 --> 00:09:14.300
So I don't expect that. But again, this becomes then a process of actually modeling

00:09:14.300 --> 00:09:15.880
the particular evolutionary pathway.

00:09:18.180 --> 00:09:23.140
What the informational view gives you is not a statement of what this pathway will look

00:09:23.140 --> 00:09:26.520
like, because that requires a lot of assumptions.

00:09:26.780 --> 00:09:30.360
What it says is, however, what possible niches could look like.

00:09:30.440 --> 00:09:34.960
So what are places where information is hidden that could be discovered?

00:09:35.160 --> 00:09:39.700
And it gives us a hint why you actually can find this niche.

00:09:39.880 --> 00:09:43.680
The big question in evolution is not that there are solutions which work.

00:09:43.880 --> 00:09:48.940
The big question in evolution is, how do you get there? How does evolution actually discover the niche?

00:09:49.180 --> 00:09:54.260
And the argument is, if you have some information that indicates something interesting

00:09:54.260 --> 00:09:58.620
is there, there is, that would be the hypothesis,

00:09:58.960 --> 00:10:06.100
an innate drive, or an innate, relatively generic mechanism that always assumes

00:10:06.100 --> 00:10:07.940
that when there's better information available,

00:10:08.340 --> 00:10:11.480
then it's worth refining it, wherever it comes from.

00:10:11.540 --> 00:10:16.860
And then you have basically something that accelerates adaptation by pure,

00:10:18.719 --> 00:10:21.739
if you like, experience that this actually works in our world.

00:10:21.959 --> 00:10:26.899
In other words, information gives you a local gradient or a local

00:10:26.899 --> 00:10:30.039
indicator where good solutions may be found.

00:10:30.459 --> 00:10:35.819
Right. But this is an important issue, right? Because it's also sometimes because

00:10:35.819 --> 00:10:41.179
of the processing that goes on behind the sensor that you actually can reach

00:10:41.179 --> 00:10:44.139
the actual hyperacuity of the sensor itself.

00:10:44.559 --> 00:10:48.619
So that means the optimization of that sensor sheet, in some sense,

00:10:48.659 --> 00:10:53.119
is not informing you about the kind of information processing that is possible.

00:10:53.239 --> 00:10:58.119
You start, let's say, to integrate across multiple senses. A typical example is chemical sensing,

00:10:58.379 --> 00:11:04.219
right, where the sensitivity of single chemoreceptors on the male moth antenna

00:11:04.219 --> 00:11:10.499
is orders of magnitude weaker than the detectability,

00:11:10.799 --> 00:11:15.639
the ability that the animal has with respect to heart rate responses to pheromones, right?

00:11:15.639 --> 00:11:20.639
So, you will see heart rate changes to homeopathic levels of pheromones in the

00:11:20.639 --> 00:11:25.179
air that you will never be able to pick up with a single sensor that they have on their antenna.

00:11:25.559 --> 00:11:30.579
So, that means, in some sense, I would argue that to really think about sensor

00:11:30.579 --> 00:11:35.599
systems in isolation could also then be misleading you in terms of understanding

00:11:35.599 --> 00:11:41.979
what the informational capabilities are of the system that is integrating that sensor.

00:11:41.979 --> 00:11:46.259
So, isn't there a risk in the analysis you presented that you say,

00:11:46.299 --> 00:11:50.679
well, if we have a sensor and the sensor gives you the upper limit of what can be achieved.

00:11:50.939 --> 00:11:54.719
And we know for many biological systems, this is maybe not really the case.

00:11:54.739 --> 00:11:59.839
They can go beyond what the sensor as such in isolation will be able to deliver.

00:12:00.399 --> 00:12:04.179
Well, if you look at that, you'd have to look at integration over time,

00:12:04.299 --> 00:12:05.939
for example, and take memory into account.

00:12:06.359 --> 00:12:08.739
Right. So, of course...

00:12:10.717 --> 00:12:18.017
When we say a sensor limits the information, it is in the sense of the data processing

00:12:18.017 --> 00:12:22.577
inequality: you can't get more information through a sensor than the sensor permits.

00:12:22.737 --> 00:12:28.097
But of course, if you can integrate over time, then the effective bandwidth is much higher.

00:12:28.757 --> 00:12:32.097
You can accumulate evidence, there's no question about that.

00:12:32.197 --> 00:12:35.617
And if you do it for long enough, then you at some point have enough information

00:12:35.617 --> 00:12:36.857
to make your decision for itself.

00:12:37.597 --> 00:12:41.857
Or you can integrate information which otherwise is worthless and suddenly other

00:12:41.857 --> 00:12:44.257
information comes in and suddenly this information becomes valuable.

00:12:44.497 --> 00:12:50.257
In fact, in the case of the snakes, the infrared and optical information is

00:12:50.257 --> 00:12:55.677
integrated in the optic tectum, and it's quite intricate how they interact and

00:12:55.677 --> 00:12:58.797
how the snake decides there's an object of interest there.

00:13:01.377 --> 00:13:06.317
So this now becomes relevant for the second point that you made,

00:13:06.457 --> 00:13:10.237
also in your talk, where you say, well, sensors are optimized, so that means also

00:13:10.237 --> 00:13:11.397
from an energetic perspective,

00:13:12.057 --> 00:13:15.017
they are optimized to give you certain kinds of

00:13:15.017 --> 00:13:20.717
information efficiently, while the processing and the cognition that runs behind

00:13:20.717 --> 00:13:26.157
it is expensive, right, because metabolically the brain takes a disproportionate

00:13:26.157 --> 00:13:34.617
amount of your energy budget as an organism. So you see,

00:13:34.697 --> 00:13:37.557
from an informational perspective, this

00:13:37.557 --> 00:13:40.837
energetic optimization of the processing

00:13:40.837 --> 00:13:43.937
that brains perform as the main challenge for evolution?

00:13:43.937 --> 00:13:48.837
Is that the consequence of what you're saying? It's a bit more complicated

00:13:48.837 --> 00:13:53.197
than that. There are different time scales at play here, because if I have

00:13:53.197 --> 00:13:57.717
a brain that's big, it will eat that energy, and I can't just say, okay, as

00:13:57.717 --> 00:14:03.197
an adult I will just shrink my brain by 50 percent, who needs it? It actually does happen in some animals.

00:14:03.677 --> 00:14:06.077
There's what is called a fish that eats its own brain.

00:14:06.837 --> 00:14:09.857
And that animal basically finds a rock.

00:14:10.906 --> 00:14:14.486
So it roams until it finds a rock, and when it finds it, it will never leave the rock again.

00:14:15.066 --> 00:14:19.366
And when that happens, it actually basically consumes its own brain.

00:14:19.486 --> 00:14:20.966
So it does happen, but it's not typical.

00:14:21.246 --> 00:14:25.926
So the way to look at it, in my opinion, is from a point of view,

00:14:26.026 --> 00:14:32.486
can you profit from a bigger brain, say, on a longer timescale,

00:14:32.526 --> 00:14:34.326
longer means over several generations?

00:14:34.326 --> 00:14:37.386
And if you can, then yes, you maintain

00:14:37.386 --> 00:14:42.786
that, otherwise you just reduce the investment into bigger brains.

00:14:43.066 --> 00:14:47.426
You should also not forget that there's of course Darwin's brilliant idea of sexual selection.

00:14:47.566 --> 00:14:53.166
So maybe the highly social animals will be more selective towards more intelligent

00:14:53.166 --> 00:14:58.166
sexual partners, and thereby the brains will be driven to be bigger.

00:14:58.246 --> 00:15:01.706
On the other hand, that has to be sustainable,

00:15:01.886 --> 00:15:05.346
and that only works if the brain actually does something sensible. Right.

00:15:05.346 --> 00:15:08.146
But now, so now you see

00:15:08.146 --> 00:15:11.326
what I'm driving at, right? Because I was

00:15:11.326 --> 00:15:14.206
making the point that the

00:15:14.206 --> 00:15:17.186
perceptual or the sensory capabilities of the

00:15:17.186 --> 00:15:20.066
organism can go beyond what

00:15:20.066 --> 00:15:25.046
the sensor can deliver by virtue of doing the processing, right? So now I could

00:15:25.046 --> 00:15:30.686
argue, well, maybe this solution was identified or was converged on during evolution

00:15:30.686 --> 00:15:35.226
because to really optimize the sensor would actually metabolically be way more

00:15:35.226 --> 00:15:39.586
expensive than to put that effort in the processing.

00:15:39.986 --> 00:15:42.186
Like integration in time, use of memory.

00:15:42.426 --> 00:15:45.786
And it's in that combination that they can actually get a virtual sensor,

00:15:45.906 --> 00:15:49.746
if you want, or an effective sensor that gives me the information I need.

00:15:49.746 --> 00:15:57.426
I basically would be loath to separate them so strictly, because sensing and

00:15:57.426 --> 00:16:04.166
processing are... I can imagine situations where it's almost impossible to distinguish them.

00:16:06.513 --> 00:16:10.433
So in some cases, relatively simple to distinguish, in some cases not.

00:16:11.853 --> 00:16:17.193
We have an example, I didn't show that in the talk because of lack of time,

00:16:17.273 --> 00:16:22.773
but we have an example where we can choose whether we prefer to use memory or

00:16:22.773 --> 00:16:28.093
use the sensor to achieve a certain utility value.

00:16:28.333 --> 00:16:31.393
And you can shift that around and say if the sensing is cheaper,

00:16:31.513 --> 00:16:36.673
then you shift it towards sensing and you use less memory, and vice versa.

00:16:37.133 --> 00:16:40.853
And you can look at it yourself. If you look at your Google Maps,

00:16:41.053 --> 00:16:45.413
when you find a route and look at the map all the time, then essentially you say,

00:16:45.533 --> 00:16:48.053
sensing is cheaper than remembering the path.

00:16:48.453 --> 00:16:54.833
But of course, this is very inconvenient when you do the path a lot of times,

00:16:54.933 --> 00:16:58.333
and it may be actually cheaper for you to keep it in mind, not having to look

00:16:58.333 --> 00:17:01.773
and stop at every corner to watch the map externally.

00:17:01.773 --> 00:17:04.713
So I think I do agree

00:17:04.713 --> 00:17:07.853
with you that processing can maybe be

00:17:07.853 --> 00:17:10.913
viewed as an extension of sensing. Perhaps that's

00:17:10.913 --> 00:17:13.893
the way it emerged, that basically

00:17:13.893 --> 00:17:19.113
the system discovered that, well, treating your own brain as a meta-sensor

00:17:19.113 --> 00:17:25.733
is a good thing. On the other hand, I'm not clear whether

00:17:25.733 --> 00:17:31.033
we can really separate that. Yeah, but still, for your framework, this is,

00:17:31.573 --> 00:17:36.393
let's call it the challenge or a constraint we have to again look at,

00:17:36.533 --> 00:17:38.333
maybe after we went through the framework.

00:17:39.473 --> 00:17:45.253
Because it seems to say, well I must be optimal in informational terms because

00:17:45.253 --> 00:17:48.273
that will help me to reduce the cost of the processing.

00:17:51.433 --> 00:17:53.313
I would say

00:17:54.954 --> 00:17:58.494
that the cost of the processing must be sustainable.

00:17:58.694 --> 00:18:06.334
It's like having a company that permits itself to have a certain amount of administration level.

00:18:06.994 --> 00:18:11.734
And for example, if you have a big administration, you have to make enough money

00:18:11.734 --> 00:18:13.434
to keep these people working.

00:18:14.274 --> 00:18:19.054
It doesn't mean that you immediately reduce the administration when you don't have

00:18:19.054 --> 00:18:24.294
enough money, or that you will just not survive on the long term if you have

00:18:24.294 --> 00:18:28.154
a big administration that you can handle or that can handle your stuff efficiently.

00:18:28.654 --> 00:18:34.214
So the parsimony principle is, of course, that if I have something to process,

00:18:35.194 --> 00:18:41.614
then I don't want to waste too much effort because I may have other things to process.

00:18:42.314 --> 00:18:46.214
If I have an emergency service, of course, that emergency service has a certain

00:18:46.214 --> 00:18:52.034
bandwidth, public, which is required, and it has to be activated at any time.

00:18:53.134 --> 00:18:57.254
And if the emergency service requires 10 bits per second for some reason,

00:18:57.394 --> 00:19:01.214
these 10 bits must be available to me as an organism, but of course,

00:19:01.254 --> 00:19:07.574
I may do different things, and for boring things, say grazing or just walking around,

00:19:07.734 --> 00:19:12.174
I don't want to lose a lot of processing power because I may need it for other

00:19:12.174 --> 00:19:18.294
things. So the argument is a mixture of energy processing, a mixture of other

00:19:18.294 --> 00:19:21.414
resources that may be required for other tasks.

00:19:22.394 --> 00:19:25.294
It is, however, still a parsimony.

00:19:26.554 --> 00:19:34.414
Right. Okay. But then, so that then brings us to sort of the linking of the

00:19:34.414 --> 00:19:36.394
sensing and the action and decisions, right?

00:19:36.394 --> 00:19:41.574
And this is where I think you really put the brunt of your effort to try to

00:19:41.574 --> 00:19:48.834
understand what should be the properties of what's called a decision-making component. So...

00:19:51.164 --> 00:19:58.724
So how do you see, on the one hand, the core constraints that this decision-making system is facing?

00:19:59.444 --> 00:20:04.604
And how do you see what are the main principles that allow a decision-making

00:20:04.604 --> 00:20:06.384
system to satisfy those constraints?

00:20:07.904 --> 00:20:11.804
Well, the core constraint is, of course, the way the agent is embodied in the

00:20:11.804 --> 00:20:15.824
world. This imprints a signature on the information flow.

00:20:16.344 --> 00:20:22.144
The way what you do in the world impinges on the world, and the way you perceive the

00:20:22.144 --> 00:20:25.944
impingement again determines what you can possibly do.

00:20:26.164 --> 00:20:29.984
Because that's not something that is your choice. It's given by physics,

00:20:30.064 --> 00:20:32.544
by your physicality, by your body.

00:20:33.684 --> 00:20:37.064
So this is the main constraint that determines what happens.

00:20:37.184 --> 00:20:41.524
The other constraint is, and that's much freer, and that's of course a postulate,

00:20:41.664 --> 00:20:45.284
which may be wrong, of course, that the brain,

00:20:45.384 --> 00:20:51.144
at least in our models, is essentially free to organize this information.

00:20:51.424 --> 00:20:52.924
This is, of course, not real.

00:20:53.384 --> 00:20:59.184
In reality, there are other constraints. But what we would like to know is what

00:20:59.184 --> 00:21:01.284
other constraints are natural.

00:21:01.584 --> 00:21:06.084
So one example is, for example, this goal-relevant information where we specifically

00:21:06.084 --> 00:21:11.304
split the simple decision-making and the long-term goal, say,

00:21:11.464 --> 00:21:14.764
to study the emergence of sub-goals.

00:21:14.864 --> 00:21:19.024
Of course, we put in an assumption here that there's a long-term memory that

00:21:19.024 --> 00:21:24.164
stores the goal we want to go to ultimately. This is an assumption; as Sander

00:21:24.164 --> 00:21:29.404
van Dijk, one of my former PhD students, said, it's embrainment: it's

00:21:30.532 --> 00:21:35.512
not the body that we are fixing here, but actually how the brain is constrained internally.

00:21:36.552 --> 00:21:42.652
Ideally, in our studies, we want to limit the assumptions about embrainment as

00:21:42.652 --> 00:21:46.612
far as possible, because we would like to have an answer to what type of brain

00:21:46.612 --> 00:21:50.632
structures do you want to process this information efficiently?

00:21:51.772 --> 00:21:56.992
So in other words, you ask, I want to process information, okay, a certain amount.

00:21:57.112 --> 00:22:01.892
And the question is, are there special sub-manifolds of solutions which prefer

00:22:01.892 --> 00:22:07.792
certain brain organizations for making the processing or the adaptation faster or more efficient?

00:22:08.092 --> 00:22:10.032
This is a question that is ongoing research.

00:22:10.392 --> 00:22:14.252
We don't have a clear answer. But clearly, when you make assumptions about

00:22:14.252 --> 00:22:16.192
embrainment, if I may use this word,

00:22:18.192 --> 00:22:21.412
then we get things like, for example, emergence of sub-goals,

00:22:21.452 --> 00:22:27.312
just by virtue of saying, okay, there's somewhere where I'm storing slowly changing long-term goals.

00:22:27.632 --> 00:22:32.192
And then this concept of sub-goal emerges naturally from the interaction in the world.

00:22:32.372 --> 00:22:36.392
So, making judicious assumptions about how the brain is structured,

00:22:37.172 --> 00:22:43.952
how the world is structured, can give you very natural hypotheses about emergence of natural phenomena.

00:22:44.412 --> 00:22:48.692
Again, yeah. But wouldn't it be fair to say that it's more like enmindment?

00:22:48.852 --> 00:22:55.452
Because you cannot really make normative statements on structure; at best you

00:22:55.452 --> 00:23:03.952
can make normative statements on function, on information flows. Just for clarity, I'm not very

00:23:05.672 --> 00:23:11.952
ideological on that. It is not a statement about how the brain actually looks;

00:23:11.952 --> 00:23:18.432
it's a statement about how a possible organization of information processing

00:23:18.432 --> 00:23:21.372
may look for certain purposes.

00:23:22.372 --> 00:23:25.932
We have assumptions about how information is being processed,

00:23:26.052 --> 00:23:27.692
but these assumptions are

00:23:29.019 --> 00:23:32.699
at this stage not really well founded, and

00:23:32.699 --> 00:23:35.759
they are based on plausibility and whether

00:23:35.759 --> 00:23:38.639
the phenomena that emerge are something you

00:23:38.639 --> 00:23:41.359
actually see. Yeah, but in that sense you really want to

00:23:41.359 --> 00:23:44.479
get to a normative formal framework, right?

00:23:44.479 --> 00:23:47.279
Because that was also one of the points you made, that

00:23:47.279 --> 00:23:50.179
the biology is ambiguous, we have no clear understanding

00:23:50.179 --> 00:23:53.199
how this works as a structure; robotics is

00:23:53.199 --> 00:23:55.999
arbitrary, people have many different solutions for the same

00:23:55.999 --> 00:23:59.599
thing, we don't know whether there's any common principle. So

00:23:59.599 --> 00:24:02.759
what we really need is, let's say,

00:24:02.759 --> 00:24:05.639
a normative framework that says, okay, these are the

00:24:05.639 --> 00:24:08.919
decisive criteria that all these systems have to follow in order

00:24:08.919 --> 00:24:12.079
to be informationally optimal. Yes. Okay,

00:24:12.079 --> 00:24:15.159
and that's why I was hassling you earlier on

00:24:15.159 --> 00:24:18.079
this notion of informational optimality, because that's of course a very

00:24:18.079 --> 00:24:24.099
important guiding principle and assumption of this whole framework. Yes. Right.

00:24:24.099 --> 00:24:30.119
Yeah. So now, you linked your framework, this informational

00:24:30.119 --> 00:24:33.839
framework that you're advancing, to the Carnot cycle,

00:24:34.059 --> 00:24:36.459
which essentially describes energy.

00:24:37.379 --> 00:24:41.339
It's sort of the expansion of a chamber that allows you to do work.

00:24:41.619 --> 00:24:46.779
So why do you think that's a good metaphor to look at decision-making and information

00:24:46.779 --> 00:24:48.439
processing in biological systems?

00:24:48.779 --> 00:24:57.359
I was kind of going quite bravely into an aggressive metaphor here.

00:24:57.659 --> 00:25:03.619
But of course, there are actual attempts to link information processing and physics.

00:25:03.979 --> 00:25:09.099
And we have seen quite a bit of progress in recent years, actually.

00:25:09.759 --> 00:25:13.079
For example, by David Wolpert, who has generalized Landauer's principle,

00:25:13.259 --> 00:25:16.459
and there are a couple of other really interesting pieces of work in this area.

00:25:16.919 --> 00:25:23.239
And the question is, on a very low level of physics, there seems to be an intricate

00:25:23.239 --> 00:25:26.999
relation between information processing, energy consumption,

00:25:27.339 --> 00:25:29.019
energy production, entropy production.

00:25:29.739 --> 00:25:34.759
Of course, these levels are very, very far away from where an organism sits

00:25:34.759 --> 00:25:39.479
in its information processing level. However, in between

00:25:40.371 --> 00:25:44.571
there are many levels, and we have to take into account that in every level,

00:25:44.611 --> 00:25:46.231
there is information processing happening.

00:25:46.351 --> 00:25:50.571
When a cell organizes its organelles, the organelles organize the ATP consumption.

00:25:51.111 --> 00:25:53.611
This is organization. This is information processing.

00:25:54.271 --> 00:25:57.951
And I would now go on an extreme speculation.

00:25:58.191 --> 00:26:02.751
I may be entirely wrong, so don't take my word for it.

00:26:02.751 --> 00:26:09.011
But what I would say is that as you progress to arrange your information to

00:26:09.011 --> 00:26:10.211
higher and higher hierarchies,

00:26:10.211 --> 00:26:15.011
you have a kind of loss function or loss component: basically, every hierarchy

00:26:15.011 --> 00:26:18.951
level loses you a factor of 100 of the free energy that you have.

00:26:18.951 --> 00:26:21.731
And what remains is kind of your investment for the next level.

00:26:22.231 --> 00:26:28.871
And as you go up and up and up, only very little remains for you to actually

00:26:28.871 --> 00:26:33.611
operate on freely, and free in the sense that you can make new discoveries that

00:26:33.611 --> 00:26:35.911
accumulate this information. Most of the stuff is administration.

00:26:36.371 --> 00:26:41.571
So administering ATP, where ATP goes, where your cellular motors are driving

00:26:41.571 --> 00:26:45.831
stuff out, getting trash out, getting nutrients in.

00:26:46.831 --> 00:26:50.731
This is information processing except that you don't know what's happening.

00:26:50.891 --> 00:26:53.391
It just happens under the hood.

00:26:53.571 --> 00:26:58.971
But I do think that in principle a complete theory would encompass the lowest

00:26:58.971 --> 00:27:04.191
level unbreakable barriers of Landauer, essentially, and friends,

00:27:05.668 --> 00:27:10.988
to the highest level where essentially information is almost detached from the physicality,

00:27:11.588 --> 00:27:14.288
in a way. No, no,

00:27:14.288 --> 00:27:18.068
of course it's not detached, there is a link, but I think this link becomes more

00:27:18.068 --> 00:27:21.288
and more tenuous with every level. So at this stage we're very far

00:27:21.288 --> 00:27:25.888
from actually seeing how the high-level information processing constraints are

00:27:25.888 --> 00:27:30.928
linked with the low-level physical ones. So, two parts, with two answers. One answer

00:27:30.928 --> 00:27:32.688
is: an aggressive metaphor, nothing

00:27:32.788 --> 00:27:37.888
else. The other part is: no, it's actually not just a metaphor, it's real, but

00:27:37.888 --> 00:27:44.468
that kind of cycle is really far down the scale and cognition is very far. Right, but that's an important

00:27:45.068 --> 00:27:48.948
point about this, right? Because it also means that you are willing to commit

00:27:48.948 --> 00:27:52.928
the notion of information processing to actual physical properties of the

00:27:52.928 --> 00:27:57.228
system, because in the end what we're talking about is quantifying the entropy

00:27:57.228 --> 00:27:59.388
in the system, whether the entropy is increasing or decreasing.

00:27:59.708 --> 00:28:03.728
If you say, look, this is information processing, it means there's a change in entropy.

00:28:04.888 --> 00:28:08.768
Essentially, it is really important that we don't get confused about what we mean by information then.

00:28:09.108 --> 00:28:13.928
We need to be careful. You don't necessarily reduce entropy in the system itself

00:28:13.928 --> 00:28:15.688
by doing information processing.

00:28:16.108 --> 00:28:19.268
You have to take into account the environment.

00:28:19.968 --> 00:28:24.728
So in the environment, you can choose sub-environments where you reduce entropy,

00:28:24.908 --> 00:28:26.588
for example, by information processing.

00:28:27.048 --> 00:28:33.548
You definitely increase other types of entropy because you simply generate trash, if you want.

00:28:33.748 --> 00:28:36.108
And your system may even remain unchanged.

00:28:36.328 --> 00:28:40.888
You can have a completely reactive robot that essentially pushes all the boxes

00:28:40.888 --> 00:28:43.088
in your room to the walls.

00:28:43.368 --> 00:28:49.408
But that system reduces the entropy of the boxes distributed over the room.

00:28:49.968 --> 00:28:53.608
Itself, it's completely reactive, so there's no internal change of entropy.

00:28:54.548 --> 00:28:59.088
And of course, the total entropy of the universe increases, because there's heat

00:28:59.088 --> 00:29:04.468
and the atoms, the gas of the air, move faster, whatever. So in other words,

00:29:06.280 --> 00:29:09.740
information processing does not mean, for the agent itself, a reduction of entropy.

00:29:10.480 --> 00:29:14.800
No, it relates to the observer describing the agent in its environment.

00:29:15.260 --> 00:29:19.180
Yes, if you include the environment, then yes. And also it depends on what you

00:29:19.180 --> 00:29:19.880
consider the environment.

00:29:20.500 --> 00:29:27.740
So it really depends on what you look at. And on top of that, it would even depend

00:29:27.740 --> 00:29:30.400
on which state variables of the agent and environment you consider.

00:29:30.560 --> 00:29:35.500
Absolutely, yes. So this is really important to understand. It's a relative perspective.

00:29:36.480 --> 00:29:41.580
It's observer-dependent. It's always observer-dependent, except if you go to

00:29:41.580 --> 00:29:43.860
the full wave function, if you like, the full state,

00:29:44.660 --> 00:29:50.180
in which case the problem is, in my opinion, not yet satisfactorily addressed,

00:29:50.420 --> 00:29:51.840
but perhaps it will happen.

00:29:52.080 --> 00:29:56.100
Right. So then to illustrate, to introduce your framework,

00:29:56.240 --> 00:30:00.460
you start to make a distinction between open- and closed-loop systems, and to

00:30:00.460 --> 00:30:07.220
try to build a more formal perspective on how they would be differentiated better or worse, right?

00:30:07.540 --> 00:30:14.240
So why do you think now open and closed loop, that comparison is helpful to

00:30:14.240 --> 00:30:17.940
introduce this information theoretic framework that you're advancing?

00:30:18.420 --> 00:30:21.920
We could have looked at, let's say, sensory processing as we discussed earlier, right?

00:30:21.920 --> 00:30:28.460
Okay, I mean, essentially, this idea of Touchette and Lloyd of considering these

00:30:28.460 --> 00:30:32.800
two cases, which are in a way the extreme cases: an agent that is basically

00:30:32.800 --> 00:30:36.320
nothing else but a blind process that doesn't take in any information.

00:30:36.320 --> 00:30:39.140
It's basically a modulation of physics.

00:30:40.237 --> 00:30:43.977
Consider a modulation of physics, which has a particular property that itself,

00:30:44.217 --> 00:30:45.697
it does not take in any information.

00:30:46.277 --> 00:30:52.537
Closed loop means it does take in this information, and that gives it extra power.

00:30:52.757 --> 00:30:55.217
It makes it a more complicated process.

00:30:55.557 --> 00:30:59.457
But it turns out that you can, and that's where it gets interesting,

00:31:00.697 --> 00:31:07.277
bound the extra entropic influence of this closed-loop agent by how much it takes in.

00:31:07.277 --> 00:31:11.797
And this is, in my opinion, very cool, because you see for the first time,

00:31:11.797 --> 00:31:14.597
in a way, well, not for the first time, it has been seen before,

00:31:14.677 --> 00:31:17.157
but in a precise sense for the first time,

00:31:17.257 --> 00:31:26.637
that cognition or cognitive performance is subject to information processing principles.

00:31:26.637 --> 00:31:34.557
You can't just make decisions of a certain quality without having a certain

00:31:34.557 --> 00:31:39.797
informational invariance or minimal balance respected.

00:31:40.077 --> 00:31:46.317
And that is cool because you essentially say cognition is not some kind of abstract

00:31:46.317 --> 00:31:50.417
platonic thing that just happens somewhere and you can make anything happen.

00:31:50.577 --> 00:31:54.117
No, you can't make anything happen; there are certain constraints on

00:31:54.117 --> 00:31:56.897
what you can make happen under certain circumstances,

00:31:57.117 --> 00:32:03.517
which I think is a really important step because it says, unlike what AI usually

00:32:03.517 --> 00:32:07.377
does, which treats intelligent decision making as happening in an abstract world

00:32:07.377 --> 00:32:10.417
that's devoid of any constraint,

00:32:10.777 --> 00:32:16.977
where you can essentially think of anything, you are constrained by very, very tangible

00:32:18.897 --> 00:32:19.597
aspects of the world.

00:32:20.217 --> 00:32:24.317
But now, so what you're saying is, look, if you consider the open loop case

00:32:24.317 --> 00:32:28.897
plus the information added by being able to sense, you give an upper bound of

00:32:28.897 --> 00:32:31.677
what the closed loop controller can do. Yes. Right?

00:32:32.797 --> 00:32:38.937
But I could say that's a toy example, because within the context of the niche

00:32:38.937 --> 00:32:43.597
that you earlier emphasized as being also relevant, if I have an open-loop controller

00:32:43.597 --> 00:32:46.517
in an environment with predators, I'm dead in no time.

00:32:46.937 --> 00:32:52.137
Right? So then you can go, okay, but maybe the upper bound would require the

00:32:52.137 --> 00:32:57.897
open loop plus the information coming from your sensors plus some minimum memory system.

00:32:58.357 --> 00:33:02.917
Sure. Right? To satisfy, let's say, some lower bound of survivability. Of course.

00:33:03.557 --> 00:33:07.537
Does that matter to you or that doesn't matter? No, no. Memory is the next step.

00:33:07.777 --> 00:33:12.277
Okay. We did not consider memory for a very simple reason because,

00:33:12.517 --> 00:33:16.077
well, Touchette and Lloyd did not consider memory. But of course,

00:33:16.097 --> 00:33:17.377
memory is the next natural step.

00:33:18.117 --> 00:33:24.637
In fact, there are some attempts to consider memory as a kind of constructed

00:33:24.637 --> 00:33:32.337
reality that is constructed in such a way that it keeps the things active and alive which are

00:33:33.522 --> 00:33:37.502
relevant and which require history to accumulate, for example. Yeah.

00:33:37.802 --> 00:33:42.782
So if you want to have a thresholding process, you have a memory that is able

00:33:42.782 --> 00:33:46.102
to count or to measure, oh, yes, I have seen enough, and please,

00:33:46.262 --> 00:33:48.502
now we can make a decision.

00:33:48.722 --> 00:33:52.342
So yes, of course, memory is the natural next step.

00:33:52.602 --> 00:33:57.622
But memory is a strange thing, in my opinion, because it's half environment

00:33:57.622 --> 00:34:01.302
and half agent in a way. It's a quite hybrid thing.

00:34:02.562 --> 00:34:05.502
I agree. No, you're right. So then,

00:34:05.502 --> 00:34:08.522
okay, so here we have this example. But

00:34:08.522 --> 00:34:13.422
now we can start to define, let's say, constraints on this agent-environment

00:34:13.422 --> 00:34:17.422
interaction. But now, in the open- and closed-loop case you discuss, basically we have a

00:34:17.422 --> 00:34:22.582
world state, which is considered as discrete states, if I got it right, and this

00:34:22.582 --> 00:34:27.682
is something else we can worry about, and then we have sensory states and we have actions, okay?

00:34:28.402 --> 00:34:32.842
So we have a three-state system. And so world, sensor, action,

00:34:32.962 --> 00:34:34.602
and they're coupled to each other.

00:34:36.362 --> 00:34:42.002
What in that makes the agent then? That is a very sharp question.

00:34:42.142 --> 00:34:50.142
So if you remember my diagram about world, sensors, memory, actors,

00:34:50.282 --> 00:34:55.182
memory, and so on, I emphasized the symmetry between world and memory.

00:34:55.182 --> 00:34:57.902
And when you ask what makes the

00:34:57.902 --> 00:35:00.902
agent there, the question is very subtle. If you look just

00:35:00.902 --> 00:35:04.002
at the graph, you don't see it. The graph

00:35:04.002 --> 00:35:07.202
itself does not make a distinction; there's no way of distinguishing world and

00:35:07.202 --> 00:35:11.982
memory. My personal opinion is, and that's completely speculative, I can't prove

00:35:11.982 --> 00:35:16.822
it at this stage, the main difference between world and memory is the fact that the

00:35:16.822 --> 00:35:22.042
world arrows are highly constrained. There is very little that can happen there,

00:35:22.042 --> 00:35:25.862
and the information density is low. The world is simple.

00:35:26.282 --> 00:35:32.682
Memory is where you can rewire in principle, at least arbitrarily.

00:35:32.862 --> 00:35:37.402
So you could have a maximally compressed information processing.

00:35:38.722 --> 00:35:44.242
Which is essentially, when we talk about free will, I think that's what hides behind it.

00:35:44.662 --> 00:35:49.302
The fact that in principle you can have via evolution or adaptation or whatever,

00:35:49.442 --> 00:35:53.962
a very, very complicated rewiring of the memory, which is virtually arbitrary.

00:35:54.222 --> 00:35:58.322
Think of a computer, a keyboard, you can choose any keyboard you want,

00:35:58.382 --> 00:36:00.282
you can rewire it as you want.

00:36:00.402 --> 00:36:04.882
But what you can't choose is how to organize the pixels on the screen so that

00:36:04.882 --> 00:36:06.082
your eyes will recognize it.

00:36:06.322 --> 00:36:09.882
So in other words, there you have a very strict constraint about geometry,

00:36:10.162 --> 00:36:14.542
but in the way your actuators will operate with that, and basically your memory

00:36:14.542 --> 00:36:16.722
can operate with it, you have many choices.

00:36:17.682 --> 00:36:22.882
And I believe that, if I give you a system which contains a world

00:36:22.882 --> 00:36:26.042
and an agent, I think the agent will be that part of the system

00:36:27.347 --> 00:36:32.007
where the constraints are basically unconstrainedly complicated.

00:36:32.387 --> 00:36:36.787
So the world would be the part which has lots of compressible structure.

00:36:37.407 --> 00:36:42.307
It's a very vague statement, I know. But let's resonate with it anyway.

00:36:42.647 --> 00:36:51.607
So already the diagram that you sketch is in some sense a tiny fraction of the

00:36:51.607 --> 00:36:56.467
total set of possible states, because in some sense in your diagram, you go back in time.

00:36:56.467 --> 00:37:02.507
You say, okay, I'm at T0, and now I can show you back into the past where I came from.

00:37:02.527 --> 00:37:06.007
And that's this trajectory of world states, sensor states, action, et cetera.

00:37:06.087 --> 00:37:11.907
Because at any point in time, there's a plurality of world states, right?

00:37:11.987 --> 00:37:14.567
Sure, yeah. There's a plurality of sensor states.

00:37:15.267 --> 00:37:18.907
There's a huge action potential. That's true. Right?

00:37:19.927 --> 00:37:23.667
And then it all collapses into one world state, one sensory state,

00:37:23.747 --> 00:37:27.107
one action. And then we have our next world state, right?

00:37:27.307 --> 00:37:34.787
So that means we have to constrain that highly variable set of states of different kinds.

00:37:34.787 --> 00:37:41.747
So my question is, aren't we lacking the key state that makes the agent an agent,

00:37:41.887 --> 00:37:43.747
and that is an internal drive state,

00:37:43.947 --> 00:37:50.107
that the agent is in: I'm ready to explore to serve my survival or my informational

00:37:50.107 --> 00:37:57.127
needs, or I'm ready to consume a resource because I have to take care of my energetic needs.

00:37:57.127 --> 00:38:03.487
So, isn't that internal state, defined by the survivability of the agent,

00:38:03.547 --> 00:38:07.827
a key constraint on this plurality of world, sensor and action states?

00:38:08.407 --> 00:38:14.047
Well, the model is quite general. So, in fact, what M is,

00:38:14.870 --> 00:38:17.870
I didn't say. So it's not a problem to plug into

00:38:17.870 --> 00:38:20.650
M, or to internally consider M as

00:38:20.650 --> 00:38:23.590
containing, such a, let's call it, pseudo-goal

00:38:23.590 --> 00:38:28.470
or pseudo-teleological parameter. That's

00:38:28.470 --> 00:38:33.650
not a problem. So we could split it. The assumption that M is one coherent

00:38:33.650 --> 00:38:39.130
blob is the most generic assumption when you have just one agent. So you could

00:38:39.130 --> 00:38:43.350
put it in. Wait, why would you put it in M? It should actually already mediate

00:38:43.350 --> 00:38:45.270
between the sensory state and the action state,

00:38:45.490 --> 00:38:47.750
right? In the earlier example.

00:38:48.750 --> 00:38:53.070
Yes. In the earlier example, we had just a reaction. But essentially,

00:38:53.310 --> 00:38:57.750
in the later example, S doesn't talk to A without M.

00:38:58.110 --> 00:39:03.630
Yeah, exactly. So you collapse it all in M, essentially. Everything is in M, yes. Okay.

00:39:04.930 --> 00:39:11.370
So then, okay, so we have a scheme now. We can think about how behavior comes

00:39:11.370 --> 00:39:16.110
about and how behavior in turn depends on sensory states in the more advanced

00:39:16.110 --> 00:39:17.690
version, how it also depends on memory.

00:39:17.890 --> 00:39:23.290
Okay, good. So we got that. But now what you really want to understand is,

00:39:23.350 --> 00:39:24.670
okay, what should my actions be?

00:39:24.790 --> 00:39:29.690
And then you say, well, my normative perspective on that is that your actions

00:39:29.690 --> 00:39:34.990
should serve your informational needs because the controller wants to optimize

00:39:34.990 --> 00:39:39.210
its information processing because that's the most expensive thing it's facing. Correct?

00:39:39.990 --> 00:39:46.090
It's a mixture. I mean, we looked at Lagrangians. So we looked at a mixture

00:39:46.090 --> 00:39:52.330
of goals or goal rewards versus informational needs.

00:39:52.630 --> 00:39:56.990
It can go just for informational needs, but in that case, you can just do nothing, for example.

00:39:57.270 --> 00:40:00.490
In the case of empowerment, it doesn't care about informational needs.

00:40:00.710 --> 00:40:04.570
In fact, it's completely orthogonal to that. It says, this is my goal.

00:40:04.690 --> 00:40:07.250
It produces a goal from this prediction.

00:40:07.430 --> 00:40:12.630
I have not made a statement on how to balance informational needs and empowerment.

00:40:13.582 --> 00:40:16.862
It's possible to do that. We have some work in that direction.

00:40:16.962 --> 00:40:21.122
Then you get some salient strategies emerging. This is work by Tom Anthony.

00:40:22.442 --> 00:40:27.222
But the interesting point is really at this stage that we want first to understand

00:40:27.222 --> 00:40:33.682
the ingredients before we try to build a synthesis of things which we individually

00:40:33.682 --> 00:40:35.402
don't understand. Right. Okay.

00:40:35.702 --> 00:40:38.422
Fair enough. So basically what you're saying is, look,

00:40:39.522 --> 00:40:42.782
within the informational perspective, right, I

00:40:42.782 --> 00:40:45.562
can make a normative statement of how I should bring these

00:40:45.562 --> 00:40:48.302
things together and how I can use my memory to do that. Yeah, it's

00:40:48.302 --> 00:40:51.202
just one view, yes, on

00:40:51.202 --> 00:40:54.222
this system. Right. And there, um, you

00:40:54.222 --> 00:40:57.042
made the point that there might not be a free lunch, but

00:40:57.042 --> 00:41:00.042
sometimes there's free beer. So what

00:41:00.042 --> 00:41:04.002
does that mean relative to optimization of information? That

00:41:04.002 --> 00:41:06.922
was essentially an allusion to the embodiment example,

00:41:06.922 --> 00:41:09.802
the twisted world example, where you have basically

00:41:09.802 --> 00:41:12.502
the agent that, if the world is

00:41:12.502 --> 00:41:18.562
simply organized, so there is a labeling of conditions which is consistent over

00:41:18.562 --> 00:41:23.442
the world, then the agent can solve certain natural problems very, very easily,

00:41:23.442 --> 00:41:28.322
with little information processing. When you do the relabeling, which in terms

00:41:28.322 --> 00:41:30.482
of what we would call traditional

00:41:31.142 --> 00:41:33.882
AI is completely equivalent, people would say it doesn't

00:41:33.882 --> 00:41:36.942
make a difference. It turns out, for an embodied agent, if

00:41:36.942 --> 00:41:42.042
you take the embodiment seriously, so the actions north and east are essentially

00:41:42.042 --> 00:41:47.142
a local coordinate system of the agent, which the agent takes with it, if that's

00:41:47.142 --> 00:41:54.322
completely skewed, so to say, and twisted around the world, the agent will

00:41:55.742 --> 00:41:59.822
have a much harder time performing the same task. In other words,

00:42:01.213 --> 00:42:04.133
your world, if it's well designed, if your embodiment is

00:42:04.133 --> 00:42:07.293
well designed or your world is nice to you, that's the

00:42:07.293 --> 00:42:10.493
way I like to say it, then your cognitive cost

00:42:10.493 --> 00:42:15.053
is so low that you can easily solve a task that actually looks quite difficult.

00:42:15.053 --> 00:42:19.453
And that is a point that, of course, the group of Pfeifer and many others have

00:42:19.453 --> 00:42:25.873
made for a long time, but I believe information theory gives us a window

00:42:25.873 --> 00:42:28.593
into why it's such an advantage.

00:42:28.793 --> 00:42:32.033
It really reduces the cognitive load we need to solve the task.

00:42:33.013 --> 00:42:40.253
Yeah, but I could argue: isn't that almost a trivial statement? Because if

00:42:40.253 --> 00:42:43.093
you get information for free then it's easier.

00:42:43.293 --> 00:42:48.833
Well, you don't get this explicitly for free. The information is implicit and

00:42:48.833 --> 00:42:51.753
the physicists would call it a gauge symmetry.

00:42:52.133 --> 00:42:57.373
So an agent that has its actions, basically, when you move the agent around the

00:42:57.373 --> 00:43:00.833
world, these actions keep a certain type of meaning.

00:43:01.053 --> 00:43:06.833
If that meaning is completely perturbed or shuffled around by the movement

00:43:06.833 --> 00:43:11.253
of the agent, then this meaning doesn't help you.

00:43:12.213 --> 00:43:16.253
This is what we have in the twisted world. In that case, the agent basically

00:43:16.253 --> 00:43:21.993
is moved around, but the actions north, east, south, west lose their meaning in any other location.

00:43:22.133 --> 00:43:26.513
They mean different things. On the other hand, if you have basically anything

00:43:26.513 --> 00:43:30.153
that takes its action with it, it's like a local gauge symmetry,

00:43:30.373 --> 00:43:36.833
basically saying, I'm taking this property of north means roughly the same thing in this world.

00:43:36.933 --> 00:43:40.533
Now, in which sense does it mean the same thing?

00:43:41.053 --> 00:43:46.493
We haven't properly defined that. But what we did say, we measured the information impact.

00:43:46.993 --> 00:43:52.873
That is very visible. So it's not saying, this is the number of bits that the

00:43:52.873 --> 00:43:54.033
world actually gives you.

00:43:54.133 --> 00:43:58.273
We say, if the world has this kind of symmetry or pseudo-symmetry,

00:43:59.359 --> 00:44:04.499
then you gain in terms of cognitive load. Your cognitive burden is reduced.

00:44:05.079 --> 00:44:09.699
Right. My hope would be that there would be at some point a way of actually

00:44:09.699 --> 00:44:16.919
writing down, saying this is how much your world actually tells you up front. Right, exactly.

00:44:18.219 --> 00:44:26.799
So if we forget about the embodiment for a second, what is the specific informational

00:44:26.799 --> 00:44:28.959
quantity that's being now optimized?

00:44:29.359 --> 00:44:32.239
Is it like the description length of the information I deal with?

00:44:32.379 --> 00:44:38.459
Is it, let's say, the information gain I have per step size?

00:44:39.059 --> 00:44:43.479
In our examples, it was what we call relevant information, the information you

00:44:43.479 --> 00:44:48.839
need to take an action at a certain average utility level.

00:44:50.699 --> 00:44:55.039
So that's basically a pull quantity. It tells you how much information do I

00:44:55.039 --> 00:44:57.759
need from the environment to perform actions at that level.

00:44:58.919 --> 00:45:02.739
Description length is something more complicated than that. We don't look at that at all.

00:45:04.279 --> 00:45:07.139
This will also probably be more related to learning itself,

00:45:08.079 --> 00:45:12.999
rather than to what we would like to call metabolic information. It's

00:45:12.999 --> 00:45:14.339
information that you just process,

00:45:15.019 --> 00:45:19.639
basically on every step. It's like you bring out your trash, you repost,

00:45:19.759 --> 00:45:23.319
you process it, and this is what you do every day. It's basically how much time

00:45:23.319 --> 00:45:25.799
do I have to allocate, or how many resources, information resources,

00:45:25.799 --> 00:45:29.959
to allocate for just maintaining the status quo. Okay.

00:45:30.739 --> 00:45:34.199
So then the criterion is really utility, right? I have to sort of optimize it

00:45:34.199 --> 00:45:34.859
to reflect some utility.

00:45:35.139 --> 00:45:42.939
But then how well does this scale if the number of possible goal states increases

00:45:42.939 --> 00:45:46.939
and also when the potential goal states can be contradictory?

00:45:48.319 --> 00:45:52.479
Excellent question. The scaling is something we are starting to address.

00:45:52.699 --> 00:45:56.299
It's a problem in empowerment. The component has some aspects,

00:45:56.439 --> 00:45:57.739
gets some aspects of that.

00:45:57.839 --> 00:46:03.339
We have various tricks, at least approximative algorithms, which we are using.

00:46:03.639 --> 00:46:07.799
It's probably of the method to be doing the most well-developed one,

00:46:07.939 --> 00:46:08.979
because it turned out to be.

00:46:10.072 --> 00:46:12.952
Now, you also talked about contradictory goals. It's an excellent question.

00:46:13.832 --> 00:46:19.772
And I'll give you an example. If you are in Barcelona and you

00:46:19.772 --> 00:46:24.912
have to decide whether you go to Lisbon or you go to Granada.

00:46:25.892 --> 00:46:30.632
Then I would say that, first of all, you can go to Madrid and then decide.

00:46:30.892 --> 00:46:35.652
So in other words, it's part of the route, which you can take without committing to a particular goal.

00:46:36.432 --> 00:46:40.052
Then at some point you have to commit, and then you have to split and make the decision.

00:46:40.512 --> 00:46:47.772
And I do think that organisms, if we believe our goal-relevant information formalism,

00:46:48.292 --> 00:46:51.592
would profit from doing so for various reasons.

00:46:51.672 --> 00:46:54.692
Number one, you don't have to know so much.

00:46:54.772 --> 00:46:59.332
You can concentrate on moving on highways only. For example, the road to Madrid.

00:46:59.912 --> 00:47:05.252
You don't have to learn all the side roads that you would have to take to Granada or to Lisbon.

00:47:05.632 --> 00:47:08.592
First you go to Madrid, and only then you worry about the next step.

00:47:09.092 --> 00:47:14.152
Second, organisms, I think, profit from not committing to a decision too early.

00:47:14.732 --> 00:47:19.692
So if you can avoid committing, it's actually an advantage because it means

00:47:19.692 --> 00:47:24.992
that you can still reorganize, re-decide if another goal pops up.

00:47:25.412 --> 00:47:30.532
So my argument would be contradictory goals, as long as you don't have to make

00:47:30.532 --> 00:47:36.332
the decision now, don't hurt you necessarily. You go just to the intermediary

00:47:36.332 --> 00:47:39.072
goal, sub-goal that supports the goal.

00:47:40.252 --> 00:47:45.172
Okay. But now there might be other constraints on that, right? Like cost.

00:47:46.092 --> 00:47:49.652
It's not only that goals can be contradictory, like I want to go east or west.

00:47:49.992 --> 00:47:54.992
It can also be I can get a higher reward, 10% more than baseline,

00:47:55.272 --> 00:47:58.072
for 20% more metabolic cost.

00:47:58.352 --> 00:48:02.412
Now that might be a bad deal. But if I just go for my utility,

00:48:02.772 --> 00:48:09.512
it might be an acceptable increase, or it might be I have to take a certain risk of damage

00:48:11.027 --> 00:48:17.207
in order to obtain a certain reward. So it's not only that they are opposing

00:48:17.207 --> 00:48:19.467
within the task domain, if you want,

00:48:20.127 --> 00:48:23.827
but they can also be opposing on different dimensions of survivability,

00:48:23.947 --> 00:48:26.207
to call it that. This is an excellent question.

00:48:26.807 --> 00:48:31.287
I don't think there's one answer to that.

00:48:31.427 --> 00:48:36.167
Risk-taking is an interesting point. I would say,

00:48:36.347 --> 00:48:45.947
if I had to make a blunt statement, that you only take risks if you really think

00:48:45.947 --> 00:48:49.967
that your chances in the future to get that goal are not that high.

00:48:50.187 --> 00:48:54.887
So risk-taking will be higher if you are in a bad position. It will be lower if you are not.

00:48:55.207 --> 00:48:58.727
That's very natural. I think that would also, if you write it down properly,

00:48:58.887 --> 00:49:02.247
emerge from a mixed evolutionary reinforcement framework.

00:49:02.607 --> 00:49:09.747
I would expect an agent that is very confident of continuous and steady growth of power not to risk it.

00:49:10.047 --> 00:49:15.487
In fact, if you look at utility curves, they look typically risk

00:49:15.487 --> 00:49:19.367
averse when you are in positive and high positive values, but they are risk

00:49:19.367 --> 00:49:20.207
friendly when you're not.

00:49:20.967 --> 00:49:26.547
The other point, and this is very interesting, is what is more important,

00:49:26.747 --> 00:49:28.807
the goal or information saving?

00:49:29.187 --> 00:49:32.047
Now, I would answer it this way.

00:49:32.487 --> 00:49:36.507
If you are somebody who is trying to

00:49:39.186 --> 00:49:42.186
run through a door because he wants to catch the train,

00:49:42.186 --> 00:49:45.246
so he runs to

00:49:45.246 --> 00:49:47.886
the door and just, you know, tries to get in the middle of the

00:49:47.886 --> 00:49:51.806
door so he has enough space, and doesn't bother about getting stuck a little or

00:49:51.806 --> 00:49:56.286
something like that, and tries to get the train. That's it. So he probably

00:49:56.286 --> 00:49:59.346
will go for information saving. Yes, he wants to be fast, but he doesn't have the

00:49:59.346 --> 00:50:03.926
time and hasn't practiced it. But imagine a sport, an Olympic sport, which is

00:50:03.926 --> 00:50:05.906
running through doors in the shortest time

00:50:05.986 --> 00:50:09.026
possible, reaching a train that leaves in exactly 15

00:50:09.026 --> 00:50:11.946
seconds, and people train for 10 years for

00:50:11.946 --> 00:50:14.866
the Olympics of running through doors and catching the train.

00:50:14.866 --> 00:50:17.826
I bet you

00:50:17.826 --> 00:50:21.046
these guys will not care about the information cost. They will

00:50:21.046 --> 00:50:24.266
basically run through the door in a way that will optimize their

00:50:24.266 --> 00:50:27.566
throughput. So they will run through the door, or whatever, gripping the

00:50:27.566 --> 00:50:30.546
handle with a hand so they can swing

00:50:30.546 --> 00:50:33.306
around in a very specific way, and train probably to

00:50:33.306 --> 00:50:36.586
leave exactly at the angle of 63.5 degrees so that they

00:50:36.586 --> 00:50:39.746
will be exactly propagated into the just

00:50:39.746 --> 00:50:43.306
closing door of the train. In other words, yes, if

00:50:43.306 --> 00:50:48.406
I do better infinity, then I don't care about saving information, right? But if

00:50:48.406 --> 00:50:52.926
I'm in my default behavior, that's one of many possible actions I do, and that's

00:50:52.926 --> 00:50:56.626
the typical behavior for living organisms, they're not playing Olympics usually,

00:50:56.626 --> 00:51:00.206
then I take the one that's cheaper.

00:51:01.286 --> 00:51:09.086
Right. Okay, so we have constraints now on how to optimize information processing.

00:51:09.366 --> 00:51:14.966
And already you indicated that embodiment itself can be a source of constraints

00:51:14.966 --> 00:51:20.846
that help you to optimize information processing within the decision-making system.

00:51:22.026 --> 00:51:27.366
But now there's sort of a hidden assumption there that the world is actually

00:51:27.366 --> 00:51:29.586
Markovian so far.

00:51:30.814 --> 00:51:35.354
Right? Not exactly, no, no it's not.

00:51:35.554 --> 00:51:40.354
So we have done that for systems which in principle are Markovian.

00:51:40.494 --> 00:51:43.974
If you compute relevant information according to the formulas we've shown,

00:51:44.114 --> 00:51:48.394
what we do is the sensor is actually sub-Markovian.

00:51:48.474 --> 00:51:53.914
So you choose a sensor that just picks out the information it wants and that

00:51:53.914 --> 00:51:55.614
creates a non-Markovian world.

00:51:56.314 --> 00:51:59.654
However it doesn't care. So the information in the models we have looked at,

00:51:59.714 --> 00:52:05.834
the sensor essentially will be less information-carrying than the world,

00:52:05.994 --> 00:52:11.074
but it has the freedom of choosing information it wants.

00:52:11.914 --> 00:52:14.434
But you know the upper bound for your information stays constant,

00:52:14.714 --> 00:52:17.194
right? The upper bound is a full… Because the world is predictable.

00:52:17.914 --> 00:52:22.374
In principle, yes. However, you can do the same thing with a limited sensor.

00:52:22.994 --> 00:52:28.994
And what you get there is typically that your relevant information goes up, not down.

00:52:29.514 --> 00:52:35.934
A surprising result from Christoph Salge: you get more relevant information

00:52:35.934 --> 00:52:40.014
if your sensor is incomplete, because essentially your quality of information is worse.

00:52:40.374 --> 00:52:44.374
So you can't choose the information you want; you get a worse set of information.

00:52:44.514 --> 00:52:50.454
So you have to look at a bigger part of it to get the same quality of information.

00:52:50.454 --> 00:52:54.154
But that would mean that, let's say, locally you have less information,

00:52:54.314 --> 00:52:57.654
but let's say collectively over all your sensors, you gain information.

00:52:59.314 --> 00:53:02.954
Not necessarily. You gain what we call piggyback information.

00:53:03.414 --> 00:53:07.334
Piggyback information is information that's not useful for the original goal.

00:53:07.794 --> 00:53:12.514
But you have to collect it to be able to reach your original goal at the desired level.

00:53:13.503 --> 00:53:18.203
It's kind of like when you order a laptop from a company:

00:53:18.283 --> 00:53:24.463
you don't get just the laptop, you get a big box with lots of styrofoam, which you don't want, but you get it.

00:53:24.703 --> 00:53:28.683
And it's a bit like that. So this extra information comes with it.

00:53:28.783 --> 00:53:32.223
But the extra information is correlated with the goal relevant information.

00:53:32.563 --> 00:53:37.383
It's not uncorrelated. It is correlated, but

00:53:37.483 --> 00:53:44.103
you could throw it away and just keep the core valuable information.

00:53:44.323 --> 00:53:50.223
But the problem is, throwing it away is a waste. You have processed it already. It's already there.

00:53:50.443 --> 00:53:56.523
So can you do something with it? And we claim that it gives you an opportunity for exaptation.

00:53:56.563 --> 00:54:02.263
So I'm using it for purposes other than the original goal.

00:54:02.263 --> 00:54:08.423
So open-ended evolution and that it may be a driver for pushing sensors to the

00:54:08.423 --> 00:54:13.563
maximum refinement without requiring this to be an explicit

00:54:14.843 --> 00:54:20.463
evolutionary pressure, which would answer why you may have very good sensors

00:54:20.463 --> 00:54:24.663
although there's not an obvious reason why you'd have to have a maximum resolution.

00:54:25.883 --> 00:54:31.923
Right, so I get that. But why I brought up this idea of the predictable world

00:54:31.923 --> 00:54:34.363
or this Markovian assumption is that,

00:54:35.103 --> 00:54:40.643
in terms of a normative framework where you want to dictate in some sense the

00:54:40.643 --> 00:54:43.243
principles along which the system has to optimize itself,

00:54:43.583 --> 00:54:47.323
maybe in a Markovian world those principles are collectively different than

00:54:47.323 --> 00:54:51.723
in a non-Markovian world because in a non-Markovian world I am forced to explore.

00:54:52.523 --> 00:54:57.783
Okay, yeah. To a larger extent and following maybe different procedures than

00:54:57.783 --> 00:55:00.063
I can do in a predictable world, right?

00:55:00.363 --> 00:55:07.723
Now, if I'm forced to explore, this might compromise my optimization norms for

00:55:07.723 --> 00:55:09.043
my information processing system.

00:55:09.263 --> 00:55:14.523
Absolutely. So how do you see that trade-off between then the ability to explore

00:55:14.523 --> 00:55:20.603
in an unknown world or partially unknown world while optimizing my information processing?

00:55:21.243 --> 00:55:24.603
I would like to make a comparison with business.

00:55:24.823 --> 00:55:29.503
In business, you have two components. You have basically the normal money flow.

00:55:30.363 --> 00:55:35.023
And you have the investment. And investment is what we would call exploration.

00:55:35.763 --> 00:55:41.123
If you have lots of extra resources, you can invest a lot in trying to basically

00:55:41.123 --> 00:55:44.103
reach out to new markets.

00:55:44.543 --> 00:55:48.403
But sometimes you don't have the reserves, and then you just live on your metabolism.

00:55:48.623 --> 00:55:50.883
You don't invest anything into exploring.

00:55:51.243 --> 00:55:56.223
So I don't think there's a single answer to that, because exploration is not

00:55:56.223 --> 00:56:02.123
per se a value. It's a value for two reasons. The world does change,

00:56:02.243 --> 00:56:05.123
and we may want to be ready for this change.

00:56:07.355 --> 00:56:11.655
And also because we just can.

00:56:12.015 --> 00:56:15.375
We have the resources, the extra resources that we can use.

00:56:15.575 --> 00:56:20.935
Like Christopher Columbus used when Spain finished the war,

00:56:21.795 --> 00:56:26.775
or the Reconquista. They had extra ambition, extra money. Portugal was already

00:56:26.775 --> 00:56:29.175
starting to explore navigational routes.

00:56:29.295 --> 00:56:33.555
And the Spanish thought, okay, we have the extra money, let's invest it.

00:56:33.555 --> 00:56:40.995
But I think if you operate on the verge of survival, you don't explore. You just try to survive.

00:56:42.035 --> 00:56:50.035
Or if you're in a very, very well-exploited niche; there are some examples of that.

00:56:50.575 --> 00:56:58.475
One issue I see there is that your exploration norm might actually work orthogonally

00:56:58.475 --> 00:57:00.895
to your exploitation norm.

00:57:00.895 --> 00:57:04.435
And this is, of course, a conflict that you have to resolve in some way if you

00:57:04.435 --> 00:57:06.095
talk about the controller, right?

00:57:06.735 --> 00:57:09.755
So, of course, you can then say, well, under survival pressure,

00:57:10.155 --> 00:57:11.575
exploration will be minimized.

00:57:11.955 --> 00:57:15.995
But at baseline, let's say, you might want to explore so you actually know how

00:57:15.995 --> 00:57:19.335
to escape in the future more efficiently or what have you, right?

00:57:19.335 --> 00:57:27.335
So, in some sense, it also means you might want to break stable states in your

00:57:27.335 --> 00:57:30.795
optimized information processing in order to, let's say,

00:57:30.895 --> 00:57:36.035
identify new models by which you can describe your environment.

00:57:36.355 --> 00:57:40.815
Which in some sense relates to the question I had this morning about three-year-olds

00:57:40.815 --> 00:57:44.075
having absolutely no memory about the period before that time.

00:57:44.495 --> 00:57:47.455
Because there's some state transition there in human memory,

00:57:47.535 --> 00:57:52.955
right? So the information processing optimization might face a similar catastrophic

00:57:52.955 --> 00:57:58.155
forgetting phase that is absolutely required to abduct into a new level of operation.

00:57:59.035 --> 00:58:05.835
I do think that this is an interesting question. I view the process of doing

00:58:05.835 --> 00:58:10.195
stuff as you have a point, this is where you are, say, and the question is,

00:58:10.215 --> 00:58:13.815
do you have knowledge about the environment of where you could move to?

00:58:14.495 --> 00:58:18.215
This lateral or virtual

00:58:18.215 --> 00:58:21.255
knowledge of what could happen if I

00:58:21.255 --> 00:58:25.015
would move there, I think, is very important for whether you want to explore

00:58:25.015 --> 00:58:30.395
or not. So if the other options are pretty good, there's no reason not to send

00:58:30.395 --> 00:58:33.315
out some species or spend some

00:58:33.315 --> 00:58:38.455
time in nearby solutions. If you know the solution is just the best, then

00:58:39.135 --> 00:58:43.995
exploration becomes a problem. It's like you have a company which has an incredibly successful product,

00:58:44.655 --> 00:58:47.495
and it's very clear that any modification of the product will

00:58:48.621 --> 00:58:54.521
not be good. And that does happen, and this company has a big problem actually finding a new niche.

00:58:55.261 --> 00:59:00.181
They have this problem. So why am I taking the company? Because in the company

00:59:00.181 --> 00:59:03.681
there are sentient beings who control it, and they can actually do a forward

00:59:03.681 --> 00:59:08.621
model. Evolution, as far as we know, is limited in forward models. Perhaps there's

00:59:08.621 --> 00:59:13.361
some local forward modeling it can do in sexual selection, but it's very limited.

00:59:14.561 --> 00:59:19.141
So it would really depend. If the optimum is sharply defined,

00:59:19.401 --> 00:59:21.201
I think you have a problem with exploration.

00:59:21.621 --> 00:59:25.441
Exploration can only happen in big steps, and I don't see, for example,

00:59:25.441 --> 00:59:27.721
a local algorithm like evolution doing that.

00:59:28.541 --> 00:59:30.361
Humans do something else.

00:59:31.361 --> 00:59:37.121
When humans explore, they, because of the superior wiring or the more intricate

00:59:37.121 --> 00:59:39.161
wiring of what we call M, the memory,

00:59:39.781 --> 00:59:44.441
they can modify the topology of a search space, and suddenly things that are

00:59:44.441 --> 00:59:52.061
far away become close. So the moment I have Newtonian dynamics, suddenly the concept of

00:59:53.041 --> 00:59:59.521
ballistic flight is something that's close, where otherwise there would be just a local

00:59:59.521 --> 01:00:03.681
trying to get something somehow right. Now suddenly I can make a prediction:

01:00:03.801 --> 01:00:08.401
yes, I can throw a stone to the moon in principle if I do it with the right type of energy.

01:00:09.081 --> 01:00:16.441
In other words, this is a qualitative transition in this concept space,

01:00:16.701 --> 01:00:19.141
which we cannot see in evolution.

01:00:19.301 --> 01:00:23.301
Evolution doesn't have it. So in evolution, everything must be somehow locally visible.

01:00:23.521 --> 01:00:26.721
There must be local hints that an exploration

01:00:28.213 --> 01:00:31.253
can be successful. If there are no local hints, then the poor

01:00:31.253 --> 01:00:34.013
species will be going extinct if something

01:00:34.013 --> 01:00:36.933
changes. We can see that in cycles of,

01:00:36.933 --> 01:00:40.173
say, parasite-host cycles

01:00:40.173 --> 01:00:42.933
that are very, very strongly linked, or

01:00:42.933 --> 01:00:46.913
orchids and hummingbirds, which

01:00:46.913 --> 01:00:49.773
are very, very tightly linked, and they can't actually

01:00:49.773 --> 01:00:52.633
separate. Take out one member of the

01:00:52.633 --> 01:00:55.733
ecological web and the other will disappear.

01:00:55.733 --> 01:00:58.793
No, no, very unlikely they

01:00:58.793 --> 01:01:02.333
will end up. Right. So then, okay,

01:01:02.333 --> 01:01:05.573
so now we have this informational perspective on, if

01:01:05.573 --> 01:01:09.873
you want, optimal decision making. But now you also brought in this notion of

01:01:09.873 --> 01:01:13.153
empowerment, which sort of is a complement to this informational perspective.

01:01:13.153 --> 01:01:17.073
So what is unique about this empowerment notion, and where did it come

01:01:17.073 --> 01:01:21.953
from? The original idea, it's very funny to say, came originally

01:01:21.953 --> 01:01:24.553
from this robot football project.

01:01:24.993 --> 01:01:32.473
We had the wish to model agents that go to the ball and kick it without having to tell them so.

01:01:32.593 --> 01:01:36.333
So we wanted to avoid an external reward function.

01:01:37.533 --> 01:01:44.153
So the idea became more formal when we introduced the perception action loop

01:01:44.153 --> 01:01:45.293
and the informational view.

01:01:45.413 --> 01:01:49.953
It was clear that this model would very naturally represent

01:01:49.953 --> 01:01:53.353
this idea by taking a set of potential

01:01:55.013 --> 01:02:00.573
actions in the future and asking how much change they could possibly invoke in the environment,

01:02:01.393 --> 01:02:06.273
in other words, how much you can influence the environment. And it was very clear

01:02:06.273 --> 01:02:10.053
also that you need to see the influence. If you can't see it, it doesn't count. And

01:02:10.053 --> 01:02:11.533
then it was very natural to take,

01:02:12.673 --> 01:02:17.093
basically from social sciences, the concept of empowerment. Empowerment means

01:02:17.093 --> 01:02:23.253
that people, disenfranchised people, for example, realize they can change the situation they are in,

01:02:23.913 --> 01:02:26.233
and they can also perceive that change.

01:02:27.113 --> 01:02:29.713
The perception is also important. It's a subjective measure.

01:02:30.513 --> 01:02:36.553
And it turned out that this measure is surprisingly effective.

01:02:37.233 --> 01:02:42.153
We tried it in various scenarios, something like 11 or 12 different scenarios,

01:02:42.153 --> 01:02:47.693
and it seems to really produce behavior, self-motivated behavior,

01:02:47.813 --> 01:02:51.133
which does not require an external reward function. It basically produces goals.

01:02:51.493 --> 01:02:55.413
You give it a dynamics, and it gives you goals, more or less.
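
NOTE
Editorial sketch, not spoken in the interview. It illustrates the idea described above under strong simplifying assumptions: a hypothetical 5x5 grid world, deterministic dynamics, and a sensor that sees the full state. In that special case, empowerment (the channel capacity from action sequences to the later sensor state) reduces to the base-2 logarithm of the number of distinct reachable states. All names (SIZE, ACTIONS, step, empowerment, greedy_move) are made up for illustration.

```python
from itertools import product
from math import log2

# Hypothetical toy world: a 5x5 grid whose border blocks movement.
SIZE = 5
ACTIONS = {"N": (0, -1), "S": (0, 1), "E": (1, 0), "W": (-1, 0), "stay": (0, 0)}


def step(state, action):
    """Deterministic dynamics: move one cell unless the border blocks it."""
    x, y = state
    dx, dy = ACTIONS[action]
    nx, ny = x + dx, y + dy
    if 0 <= nx < SIZE and 0 <= ny < SIZE:
        return (nx, ny)
    return state


def empowerment(state, horizon):
    """n-step empowerment in bits for this deterministic, fully observed world.

    Assumption: with deterministic dynamics and a full-state sensor, the
    channel capacity equals log2 of the number of distinct final states.
    """
    reachable = set()
    for seq in product(ACTIONS, repeat=horizon):
        s = state
        for a in seq:
            s = step(s, a)
        reachable.add(s)
    return log2(len(reachable))


def greedy_move(state, horizon):
    """Choose the action leading to the successor with the highest empowerment."""
    return max(ACTIONS, key=lambda a: empowerment(step(state, a), horizon))
```

NOTE
In this sketch no reward is specified anywhere: the agent is only given the dynamics, and the preference for open positions over corners falls out of counting reachable states. Stochastic dynamics or limited sensors would need a proper channel-capacity computation instead of a count.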

01:02:57.533 --> 01:03:02.133
The idea behind it is, if you have an organism, how does this organism choose its goals?

01:03:02.193 --> 01:03:06.233
Of course, there are some fundamental goals, like finding enough food and mates

01:03:06.233 --> 01:03:09.313
and so on. These are fundamental goals. And what you do comes from that.

01:03:10.133 --> 01:03:15.393
And it seems to be empowerment. So maximize your options, because that maximizes

01:03:17.060 --> 01:03:21.240
the states you can reach in the next step. If some states go away or your niche

01:03:21.240 --> 01:03:26.620
gets smaller, it increases your chance of getting out of this situation. And that

01:03:26.620 --> 01:03:32.460
seems to work surprisingly well. So the motivation was: can we understand, from

01:03:32.460 --> 01:03:35.280
an evolutionary point of view, with very

01:03:36.100 --> 01:03:43.760
limited assumptions, how organisms can generically create their own goals when

01:03:43.760 --> 01:03:49.620
there is not a very clearly defined goal like eating or fleeing a predator or something like that?

01:03:49.860 --> 01:03:53.640
So that means if we take as an example tool use, right?

01:03:53.660 --> 01:03:57.080
So I encounter an object and empowerment would then tell me,

01:03:57.160 --> 01:04:01.280
well, you can learn that this object will have a certain affordance.

01:04:01.300 --> 01:04:05.820
That means relative to your morphology and your goals, you can achieve a certain

01:04:05.820 --> 01:04:07.880
objective with that, right?

01:04:07.940 --> 01:04:11.600
So that's what you do. You don't necessarily learn just the local properties of that object.

01:04:11.600 --> 01:04:21.420
You learn how to embed it within your own affordance repertoire. Yes. But now, so empowerment then

01:04:22.820 --> 01:04:29.200
allows you to incorporate objects. But how well does that indeed, again, scale,

01:04:29.200 --> 01:04:34.580
given the same issues also in the informational sense? Because depending on how you process this,

01:04:34.620 --> 01:04:39.700
how you represent this, how you segment across different objects, you might have capacity

01:04:39.880 --> 01:04:44.100
limitations. Yes. So how does it scale? Well,

01:04:44.100 --> 01:04:47.120
empowerment itself, as it's defined, doesn't

01:04:47.120 --> 01:04:51.620
care about cost. So it's really, as you said, orthogonal to the other view. You

01:04:51.620 --> 01:04:57.220
can combine it. You can put a cost, or a kind of cost limitation, on how many action

01:04:57.220 --> 01:05:00.380
sequences or potential actions you want to consider, and when you do that

01:05:00.380 --> 01:05:04.760
we get interesting results, namely dominant strategies,

01:05:04.760 --> 01:05:08.180
which are particularly effective in changing the world.

01:05:08.340 --> 01:05:16.580
So you won't probably remember some kind of weird wobbly movement that happens to move somewhere.

01:05:17.060 --> 01:05:23.460
You will remember really a clear, well-directed, well-established movement that

01:05:23.460 --> 01:05:26.680
clearly changes the world to one state rather than to another.

01:05:27.140 --> 01:05:31.620
So that's something that actually this limitation gives you.

01:05:31.620 --> 01:05:36.360
We also have various tricks and algorithms and approximations for how to calculate

01:05:36.360 --> 01:05:37.600
empowerment, also in the continuum.

01:05:37.880 --> 01:05:42.160
And this is being developed because now that it's established that empowerment

01:05:42.160 --> 01:05:48.840
really does a lot of cool behaviors, it's worth investing and actually scaling it up.

01:05:49.360 --> 01:05:54.320
And some tricks allow us, for example, to push empowerment forward many hundred

01:05:54.320 --> 01:05:57.460
steps or something. There are quite a few tricks. They are really,

01:05:57.560 --> 01:06:01.780
really drastic approximations, but they give you qualitatively,

01:06:02.657 --> 01:06:05.577
again, sensible results. And this

01:06:05.577 --> 01:06:08.737
is what an organism needs. An organism will not optimize this function

01:06:08.737 --> 01:06:11.537
to the very best. It wants some kind of thing that

01:06:11.537 --> 01:06:17.457
works, that's good enough. It's good enough, yes. But now, so for empowerment also,

01:06:17.457 --> 01:06:22.157
the way you conceptualize this, it's like injecting information into the world

01:06:22.157 --> 01:06:27.437
and recovering the result, right? And then in some sense you can frame it again

01:06:27.437 --> 01:06:30.897
in a compatible framework of information processing.

01:06:32.237 --> 01:06:38.757
But now the exploration that I have to engage in to understand what this object

01:06:38.757 --> 01:06:43.617
might contribute to my goals will take a certain amount of time.

01:06:44.137 --> 01:06:49.797
So how rapidly does such a process converge and how does it also depend on the

01:06:49.797 --> 01:06:51.777
degrees of freedom that the object would afford?

01:06:52.377 --> 01:06:56.917
Learning is not yet part of the model except for this one example that I showed

01:06:56.917 --> 01:07:02.037
you with a pendulum, where it actually learns the forward model.

01:07:02.317 --> 01:07:07.497
When you say a concrete goal, we have not yet linked empowerment to concrete goals,

01:07:07.497 --> 01:07:09.737
which is something that needs to be done, of course.

01:07:10.137 --> 01:07:13.717
But in real life it's also similar. Imagine you play some new game that you

01:07:13.717 --> 01:07:15.377
just learned the rules of.

01:07:15.637 --> 01:07:18.897
I won't mention Go, you probably know how to play it. I'll mention Focus,

01:07:19.057 --> 01:07:20.337
which is a really nice game.

01:07:21.317 --> 01:07:25.877
We once did it many years ago as an exercise for our students.

01:07:26.197 --> 01:07:29.857
And the point about the game is that none of the students knew the game.

01:07:30.697 --> 01:07:33.737
We didn't know the game. There are no libraries, opening libraries.

01:07:34.097 --> 01:07:37.397
So we really had to learn this game from scratch.

01:07:38.677 --> 01:07:42.337
And it was very interesting. In the beginning, it just looked like random walks.

01:07:42.557 --> 01:07:44.217
You do something, something happens.

01:07:44.957 --> 01:07:49.537
And after four or five games, you as a human player start to see structures.

01:07:49.777 --> 01:07:52.577
You start to see, oh, this does this, this does that.

01:07:52.777 --> 01:07:55.777
And you start to pick up the

01:07:55.777 --> 01:07:59.077
salient points. And this is where empowerment would come in. It would basically

01:07:59.077 --> 01:08:02.837
say, these are the salient points of the world. Of course, it doesn't solve the

01:08:02.837 --> 01:08:07.677
problem of actually beating your opponent, but it structures it. It tells you,

01:08:07.677 --> 01:08:11.777
okay, these are the points from where I can try to see whether I can beat my

01:08:11.777 --> 01:08:15.077
opponent. So the argument would be, it creates a broad

01:08:16.952 --> 01:08:23.772
road map or milestones which tell me this is where I want to be if I want

01:08:23.772 --> 01:08:26.712
to have control over these and these states of the game.

01:08:27.012 --> 01:08:31.432
And then you can ask, can I get to the goal there? So the argument is always, you are local.

01:08:31.712 --> 01:08:35.332
Your understanding is always local. Do these

01:08:38.452 --> 01:08:44.672
landmarks in my mental map give me a hint how I'm getting closer to whatever

01:08:44.672 --> 01:08:46.772
goal I may want to achieve?

01:08:46.952 --> 01:08:52.892
And this is, in my opinion, how we are able to learn very abstract games or

01:08:52.892 --> 01:08:59.032
math or things like that, that we get these landmarks where we go to,

01:08:59.152 --> 01:09:04.412
and then we start mapping out where from these landmarks are sub-landmarks,

01:09:04.612 --> 01:09:09.332
and which sub-landmarks are conceptually closer to where we want to go to.

01:09:10.452 --> 01:09:16.352
It's purely hypothetical, but I think that's the way to probably look at empowerment.

01:09:16.352 --> 01:09:20.172
Empowerment itself finds the landmarks, the main ones, doesn't find the goals,

01:09:20.292 --> 01:09:22.052
or it creates the landmarks as a goal.

01:09:22.172 --> 01:09:26.972
But of course, when you have a specific goal, it just may give you a way of getting there.

01:09:27.812 --> 01:09:30.472
But now, if we look at these two frameworks, right, we have this

01:09:30.472 --> 01:09:34.272
information-theoretical framework, where we talk about optimizing information processing, linking

01:09:34.272 --> 01:09:37.352
sensory states to actions, and then

01:09:37.352 --> 01:09:40.932
we have this more embodied, action-oriented empowerment

01:09:40.932 --> 01:09:46.492
view, right? And they're orthogonal. But empowerment is something that also challenges

01:09:46.492 --> 01:09:50.392
the informational view, because empowerment is also telling us, well, there's a lot

01:09:50.392 --> 01:09:55.072
of information really in the embodiment, in the action, in the world, that is offloading

01:09:55.072 --> 01:09:58.272
the informational processing that is going on.

01:09:58.452 --> 01:10:03.732
So maybe this whole emphasis on this very centralized view on behavior,

01:10:03.812 --> 01:10:07.952
where it all has to happen in this cognitive engine that is optimizing information,

01:10:08.312 --> 01:10:13.672
maybe this is so far at an extreme of our search space, right?

01:10:13.732 --> 01:10:20.212
If we talk about embodied action, that maybe your empowerment notion will sort of invalidate it.

01:10:21.032 --> 01:10:25.612
How do you see that? Well, it's an interesting question. Why do we have agents at all?

01:10:25.832 --> 01:10:29.032
Why is there a concept of an agent in physics?

01:10:29.652 --> 01:10:38.072
Why did they emerge? In my opinion, it's because in a way, some type of information

01:10:38.072 --> 01:10:39.472
likes to be accumulated.

01:10:40.132 --> 01:10:43.572
So essentially, like wants to like, if you like.

01:10:45.072 --> 01:10:48.752
An organism, why does it want to procreate?

01:10:50.702 --> 01:10:54.222
Frankly, I think the reason is because there are some processes that are basically

01:10:54.222 --> 01:10:58.382
parasitic, or parasites in a way, on the physical world.

01:10:58.742 --> 01:11:02.182
And these parasites, they like to continue doing so.

01:11:04.082 --> 01:11:09.302
Because that's what a parasite does. It wants to propagate its unique way of life.

01:11:09.462 --> 01:11:13.642
The physics doesn't care. So it's not antagonistic either.

01:11:14.782 --> 01:11:18.702
If a parasite parasites another organism, that other organism may not like it.

01:11:18.702 --> 01:11:21.442
And so that will try to get rid of it, of course.

01:11:21.502 --> 01:11:24.322
So then it becomes antagonistic at the meta level.

01:11:25.982 --> 01:11:30.462
But I don't think that you can make a unique statement on, oh,

01:11:30.462 --> 01:11:37.162
empowerment is the one that's giving you the one perspective, the other one is giving the other.

01:11:37.302 --> 01:11:42.102
They operate and they may have different parameters, different time scales.

01:11:43.962 --> 01:11:47.742
For example, the increase of your bandwidth of processing is slow.

01:11:47.742 --> 01:11:50.582
You can't just make your brain twice as big.

01:11:52.322 --> 01:11:55.662
Unlikely, perhaps with CRISPR we can at some point try that.

01:11:55.902 --> 01:11:59.502
I think it would be ethically questionable, but in principle one could try.

01:12:01.802 --> 01:12:05.382
What's fast? Empowerment is relatively fast. You need a forward model.

01:12:06.482 --> 01:12:11.602
If you look at information preservation and information saving, that's something we do

01:12:12.382 --> 01:12:16.222
subconsciously. When we learn something, we use up a lot of bandwidth.

01:12:16.262 --> 01:12:20.182
Once we know how to do it, we use very little bandwidth because it's probably

01:12:20.182 --> 01:12:25.342
rewired, reorganized in such a way that it will eat up less information.

01:12:25.602 --> 01:12:30.202
So I would say learning to grab a complicated object or handle a complicated

01:12:30.202 --> 01:12:34.562
object will be very bandwidth intensive, takes a long time.

01:12:34.862 --> 01:12:37.842
And it's translated somehow, it's rewired in a

01:12:37.842 --> 01:12:40.662
way that internally will use less information.

01:12:40.662 --> 01:12:44.022
So in other words, this process happens

01:12:44.022 --> 01:12:46.842
all the time, and what the time constants are

01:12:46.842 --> 01:12:49.682
and what the weights are, it's not something I

01:12:49.682 --> 01:12:56.282
would be able to speculate on. Right. But now, if we try to map

01:12:56.282 --> 01:13:03.862
these concepts to physical systems like the brain, would it imply that

01:13:03.862 --> 01:13:09.482
the brain tries to optimize its mean activity level? Is it really that isomorphic,

01:13:10.522 --> 01:13:13.042
or not? No, no, no, no.

01:13:13.282 --> 01:13:16.702
I think the way to think of it is slightly different.

01:13:18.178 --> 01:13:23.878
If I have two brains, one uses a lot of information processing to do something,

01:13:23.918 --> 01:13:24.858
the other one uses very little.

01:13:25.898 --> 01:13:32.818
The other one has an optimized way of doing it. It's clear that the other one has an advantage. Why?

01:13:33.678 --> 01:13:37.638
Because it can handle other tasks too. It can learn additional tasks.

01:13:37.958 --> 01:13:41.538
It can concentrate on other tasks. It can react to danger faster.

01:13:42.838 --> 01:13:48.938
So it has lots of advantages indirectly. So the two brains may be, for example, two twins.

01:13:49.038 --> 01:13:53.598
One twin learned how to play tennis many years ago. The second twin is just

01:13:53.598 --> 01:13:55.538
learning it. They play against each other.

01:13:55.678 --> 01:14:00.398
Well, who's going to win? Of course, the one which spends less time thinking about his moves.

01:14:01.338 --> 01:14:08.978
Very simply so. That's because he has had the opportunity to compress and squeeze

01:14:09.798 --> 01:14:11.778
and optimize his information flow.

01:14:11.998 --> 01:14:17.558
And he can perhaps even be talking on the phone and upsetting his twin brother

01:14:17.558 --> 01:14:24.278
this way, while the twin brother is sweating and trying to just catch the ball.

01:14:25.098 --> 01:14:31.198
So in other words, the advantage is not necessarily energetic.

01:14:31.478 --> 01:14:34.038
It's an advantage in many dimensions.

01:14:34.898 --> 01:14:39.298
Information theory is basically just saying, with these resources,

01:14:39.638 --> 01:14:45.978
if you have them, that's how much you can process, and that's how well you can process. Right.

01:14:46.558 --> 01:14:52.038
So Daniel, you also, in parallel to your theoretical work, you're now also the

01:14:52.038 --> 01:14:54.638
president-elect of the RoboCup organization,

01:14:55.138 --> 01:15:00.978
which of course is an understandable concern because this is also very much

01:15:00.978 --> 01:15:03.118
about testing a lot of these ideas in the real world.

01:15:03.238 --> 01:15:11.518
But why do you invest effort and time in sort of advancing this notion of RoboCup?

01:15:11.538 --> 01:15:12.738
Why is that so important to you?

01:15:13.238 --> 01:15:16.898
I do believe that we have several advantages by having that.

01:15:16.958 --> 01:15:18.878
First of all, we have a direct comparison.

01:15:19.498 --> 01:15:24.238
You can essentially come up with all kinds of algorithms which work in a lab

01:15:24.238 --> 01:15:25.738
under very controlled conditions.

01:15:26.038 --> 01:15:30.798
But at the end of the day, they will have to work in the field.

01:15:31.498 --> 01:15:34.718
And you can compare, does it actually work? I can predict,

01:15:34.838 --> 01:15:38.078
say that it works if I have these and these and these and these constraints.

01:15:38.618 --> 01:15:42.198
But on the field, there's no excuse. Either it works or it doesn't.

01:15:42.198 --> 01:15:43.238
And you see it immediately.

01:15:43.458 --> 01:15:49.178
You see, oh, this guy has, this group has an excellent vision system or this

01:15:49.178 --> 01:15:53.018
group has a very good walking system and then you can learn.

01:15:53.158 --> 01:15:57.538
And even if you don't copy their system exactly, you can pick up

01:15:58.751 --> 01:16:02.371
ideas at various levels of abstraction, either concrete code,

01:16:02.511 --> 01:16:06.751
if that's being published, for example, which happens in some groups and leagues,

01:16:06.931 --> 01:16:12.551
or else by seeing, oh, this is a concept that essentially everybody else is now using.

01:16:13.551 --> 01:16:17.571
Let's take a very simple example. Omnidirectional drive used to be a concept like that.

01:16:17.611 --> 01:16:22.911
At the beginning, it was not obvious for the mid-size league, and later everybody introduced it.

01:16:23.471 --> 01:16:26.251
It's a very simple example, but there are more intricate ones.

01:16:26.251 --> 01:16:33.091
The second thing is, I do believe that a lot of interesting questions emerge.

01:16:34.051 --> 01:16:39.631
You have a relatively clearly defined task, a relatively clearly defined world,

01:16:39.651 --> 01:16:40.771
and yet it's very complicated.

01:16:41.071 --> 01:16:43.451
You have to get several things to work at once.

01:16:43.771 --> 01:16:49.171
And it's very motivating to think about, okay, what do I need in principle if

01:16:49.171 --> 01:16:53.351
I want to have such a machine to learn something like that from scratch?

01:16:53.351 --> 01:16:58.851
Of course, many teams write code to win the competition, so they have to be

01:16:58.851 --> 01:17:01.011
very specific about how to do that.

01:17:01.391 --> 01:17:05.291
On the other hand, I think it's a great motivator to think, if I have a robot

01:17:05.291 --> 01:17:09.611
that is not just running forward, I mean, if we look at walking robots, that's what they do.

01:17:10.311 --> 01:17:14.811
A football robot cannot just walk forward. It has to understand what a sidekick

01:17:14.811 --> 01:17:16.551
is and when to do it, when to use it.

01:17:16.671 --> 01:17:19.911
Okay, this is all done by hand, but in principle, the challenge is, what at all?

01:17:19.911 --> 01:17:22.911
When you're a footballer, you do

01:17:22.911 --> 01:17:26.131
that by instinct. You also train a

01:17:26.131 --> 01:17:29.051
lot, but there are many things that you just do in the moment,

01:17:29.051 --> 01:17:33.831
when the opportunity arises. And I do think that this context switching that

01:17:33.831 --> 01:17:38.391
happens all the time is one of the major challenges of AI. So I think if we can

01:17:38.391 --> 01:17:43.491
address that in a proper way, so we can move away from hand-crafted behavioral

01:17:43.491 --> 01:17:46.331
rules to a more automatic,

01:17:47.231 --> 01:17:50.591
autonomous decision of how to switch contexts from,

01:17:50.651 --> 01:17:53.551
say, walking to stopping to kicking to whatever,

01:17:54.131 --> 01:17:56.471
we will have made a big step ahead in AI.

01:17:56.711 --> 01:18:01.231
And finally, my personal view, that's a very, very personal view,

01:18:01.331 --> 01:18:07.711
it's not official or anything, I believe that we need new materials, new algorithms,

01:18:09.103 --> 01:18:15.863
and new types of embodiment for robots to be actually able to achieve such

01:18:15.863 --> 01:18:21.683
a level of competence, where the big goal is, in 2050, to actually play and possibly

01:18:21.683 --> 01:18:22.803
win against the world champion,

01:18:23.763 --> 01:18:29.343
and that it will push this envelope much more strongly than if we say, yes, we

01:18:29.343 --> 01:18:34.723
need soft materials, but, yeah, at some point, when it's ready. Rather, having

01:18:34.723 --> 01:18:40.243
this perspective gives you an incentive to actually try these materials a bit earlier, of course.

01:18:40.563 --> 01:18:44.043
But when will the first robot team win the Champions League?

01:18:44.923 --> 01:18:50.103
Well, the Champions League is beyond 2050.

01:18:50.243 --> 01:18:54.123
In 2050, the goal has been declared, playing against the world champion,

01:18:54.383 --> 01:18:56.143
the human world champion, and win.

01:18:56.363 --> 01:19:01.623
It's a very ambitious goal, but let's put it this way, when it was declared

01:19:01.623 --> 01:19:05.223
in 1997, people really didn't believe that's even possible.

01:19:05.903 --> 01:19:09.783
There were hardly any humanoid robots in labs. There were probably a handful

01:19:09.783 --> 01:19:12.803
of labs in the world that could afford a humanoid robot. And today,

01:19:13.663 --> 01:19:14.503
humanoid robots are everywhere.

01:19:15.543 --> 01:19:24.723
So, even that already was a huge push ahead in terms of making robotic science more democratic,

01:19:25.503 --> 01:19:31.263
more popular, and actually sell people that, yes, it's possible to make a humanoid

01:19:31.263 --> 01:19:33.663
robot that walks and doesn't fall down all the time.

01:19:33.903 --> 01:19:40.003
But do you think the main challenge is in the biomechanics or in the cognition and the motor control?

01:19:41.423 --> 01:19:45.823
Everywhere. I think biomechanics is a big issue. Energy is a big issue.

01:19:46.003 --> 01:19:52.143
I think having enough energy so that the robot can run for 45 minutes uninterrupted is massive.

01:19:53.263 --> 01:19:58.123
So biomechanics, energy is a major issue. But I think the cognition is also

01:19:58.123 --> 01:20:05.323
a major issue because you will have to contend with players like say Messi or

01:20:05.323 --> 01:20:07.743
that really are flexible in their thinking.

01:20:08.743 --> 01:20:14.883
You can optimize something for a particular situation, so that you can shoot a penalty

01:20:14.883 --> 01:20:19.043
shot without fail, essentially, assuming the hardware doesn't break.

01:20:19.963 --> 01:20:23.323
So you could be better than humans on a penalty, for example.

01:20:23.583 --> 01:20:27.723
It might be possible that we would beat humans on very specific subtasks.

01:20:27.983 --> 01:20:31.963
But in a generic game situation, to make the right decision,

01:20:31.963 --> 01:20:38.883
hopping up, risking life and then to kick a ball above your head into the goal.

01:20:38.963 --> 01:20:40.423
That's something that humans do.

01:20:41.773 --> 01:20:44.773
And good. Let's call it a chilena here,

01:20:44.773 --> 01:20:48.133
right? Yeah, yeah, yeah, exactly. And then essentially doing

01:20:48.133 --> 01:20:51.293
that as a human player requires a

01:20:51.293 --> 01:20:54.493
lot of guts and nerve and instinct, and

01:20:54.493 --> 01:20:58.713
not every player can do that. So it's very clear that it's a very special skill.

01:20:58.713 --> 01:21:05.573
But now, can a robot football player in the current competitions get a red card? There

01:21:05.573 --> 01:21:09.993
are ways of getting fouls, but right now the fouls are relatively mild.

01:21:10.153 --> 01:21:12.013
It's basically blocking the goal and things like that.

01:21:12.993 --> 01:21:16.833
In the future, and for example in the simulation league, they have already introduced

01:21:16.833 --> 01:21:22.093
fouls. So if a player... I'm not exactly sure how they implement it, but it's automatic.

01:21:22.433 --> 01:21:30.013
If a player kicks another player without going for the ball, I think then it's a foul.

01:21:30.313 --> 01:21:32.553
There are certain rules that are already implemented.

01:21:33.233 --> 01:21:36.353
And RoboCup is a bit like the Robot Olympics if you want, right?

01:21:36.393 --> 01:21:38.353
Even though there's a separate competition also with that name.

01:21:38.353 --> 01:21:44.153
And the Olympics also continuously change the disciplines that are participating.

01:21:44.473 --> 01:21:47.873
So do you see also a RoboCup that might be changing that will further expand

01:21:47.873 --> 01:21:51.393
into other domains? Maybe soon we have robot basketball or robot tennis?

01:21:52.313 --> 01:21:57.453
There are changes. First of all, in the main leagues they are actually,

01:21:57.573 --> 01:21:58.913
they become harder and harder.

01:21:59.233 --> 01:22:03.593
That's why if you watch the games, sometimes the games look less interesting

01:22:03.593 --> 01:22:07.093
as time goes by because they become much harder to follow.

01:22:07.233 --> 01:22:10.313
So in football, for example, they took

01:22:10.313 --> 01:22:13.153
away the colors of the goals, they took away lots of

01:22:13.153 --> 01:22:16.073
structure from the field, the ball has no color

01:22:16.073 --> 01:22:20.293
anymore. So it's really a very challenging problem.

01:22:20.293 --> 01:22:22.973
Other leagues come and go. So there are leagues

01:22:22.973 --> 01:22:27.053
that emerge. There was the four-legged league,

01:22:27.053 --> 01:22:34.933
that was the one with the Sony AIBO robots. That league was introduced, I think

01:22:34.933 --> 01:22:40.253
the first demo games were in '98, and it disappeared later when it was superseded

01:22:40.253 --> 01:22:43.133
by the Standard Platform League,

01:22:43.293 --> 01:22:46.393
which is basically... that's where you see the NAO robots.

01:22:46.733 --> 01:22:51.773
So in this case, you have a development of the leagues or a disappearance of leagues

01:22:51.773 --> 01:22:55.073
or the emergence of new leagues. Like we have a logistics league.

01:22:55.213 --> 01:23:01.873
We have an @Home league, which is concerned with making robots more flexible,

01:23:02.053 --> 01:23:07.933
so flexible they can deal with problems of home robotics, which is,

01:23:07.953 --> 01:23:09.233
of course, a huge challenge.

01:23:09.273 --> 01:23:15.093
It's much easier to develop a robot for industrial settings than for homes,

01:23:15.173 --> 01:23:17.933
because homes are notoriously unpredictable.

01:23:19.113 --> 01:23:22.633
Would you feel that something like a robot war league would be fitting?

01:23:23.133 --> 01:23:29.793
Well, first of all, I must say that war is not a very nice term for a

01:23:30.511 --> 01:23:35.591
set of tests. And the second thing is, it's destructive.

01:23:36.291 --> 01:23:39.851
I find that a little bit unsettling.

01:23:40.451 --> 01:23:45.211
And also it's an issue of ethics in general. I don't think that we want autonomous

01:23:45.211 --> 01:23:48.291
robots to know how to destroy other entities.

01:23:48.671 --> 01:23:51.291
I think that's where it gets problematic.

01:23:51.731 --> 01:23:56.831
But the other thing is that Robot Wars exists, and it's a remote-controlled

01:23:56.831 --> 01:24:02.711
league, so it's not autonomous. So in other words, RoboCup is fully about autonomy. You want autonomy.

01:24:03.111 --> 01:24:09.331
Why I bring it up is that maybe the real challenge here is about building moral robots.

01:24:09.671 --> 01:24:14.931
That's our real challenge. Because I think on the midterm already,

01:24:15.151 --> 01:24:21.131
we really have to master how to build robots that are autonomously and truly,

01:24:21.251 --> 01:24:23.391
let's say, assistive and moral in their behavior.

01:24:23.651 --> 01:24:25.671
Because to build a destructive robot is easy.

01:24:26.351 --> 01:24:29.751
But our challenge is how do we control this and how

01:24:29.751 --> 01:24:32.451
do we make it transparent. And what I'm worried about is that

01:24:32.451 --> 01:24:35.991
right now the robot league, the robot war competitions, are

01:24:35.991 --> 01:24:41.911
a bit sort of in the public media, while military organizations are of course looking

01:24:41.911 --> 01:24:47.491
into these things, and it's completely outside the realms of transparent analysis

01:24:47.491 --> 01:24:53.111
and debate, and I think that's an even bigger risk. So I was wondering

01:24:53.231 --> 01:24:57.971
whether it would not make sense to maybe have a league where robots can do damage,

01:24:58.171 --> 01:25:00.451
but they manage to not do it autonomously.

01:25:01.111 --> 01:25:06.331
Because this is what we have to master, and we have to drive this debate as researchers.

01:25:06.471 --> 01:25:12.451
We cannot leave it to non-academic institutions to do this behind closed doors.

01:25:13.051 --> 01:25:15.971
No one knows where it's going. And once it hits the streets,

01:25:16.191 --> 01:25:18.171
we have no frameworks to deal with it.

01:25:18.311 --> 01:25:20.951
So that's why I was wondering whether it would not be an idea to at

01:25:20.951 --> 01:25:24.111
least start to think about how to also incorporate this, even though it is a

01:25:24.111 --> 01:25:29.071
painful issue, it is an insulting issue sometimes, but we cannot close our eyes

01:25:29.071 --> 01:25:35.091
to it. I would say, and it's really nice that you bring it up, one of the

01:25:35.091 --> 01:25:40.891
statements I made before I was basically selected as the coming

01:25:42.271 --> 01:25:48.191
president is that I think that robot ethics is a major point that should be discussed

01:25:48.191 --> 01:25:53.151
in the... I think it's an excellent opportunity, because even in the football game,

01:25:53.271 --> 01:25:58.591
how much damage am I ready to do to my fellow player if I want to win?

01:25:59.131 --> 01:26:04.351
So it's already appearing there. It's very clear that the concept of fair game is already there.

01:26:05.131 --> 01:26:09.611
I think that robot ethics has the same problems as human ethics.

01:26:09.851 --> 01:26:14.191
How do you prevent them? This is an example I wrote many years ago and repeated,

01:26:14.351 --> 01:26:19.131
and unfortunately, reality has caught up with me.

01:26:19.571 --> 01:26:23.891
With my example, I was saying, how do you ensure that a passenger airline pilot

01:26:23.891 --> 01:26:26.471
does not take the plane and crash it somewhere?

01:26:27.649 --> 01:26:30.769
This was an example I actually brought up. And you don't.

01:26:31.049 --> 01:26:34.869
You don't know. You can't see into the head. You believe that socialization

01:26:34.869 --> 01:26:40.229
helps, that you know the person, that you trust the person, and you believe

01:26:40.229 --> 01:26:44.109
that they have a self-preservation need and so on.

01:26:44.229 --> 01:26:48.469
So there's a whole set of safeguards which we assume.

01:26:48.829 --> 01:26:51.169
But when you take them away, when

01:26:51.169 --> 01:26:54.849
they disappear, when people don't pay attention, bad things can happen.

01:26:55.369 --> 01:27:00.609
So I don't think that robots will be exempt from that. However, I think I see

01:27:00.609 --> 01:27:05.589
a way forward for making robots more ethical, and that's very simple. It's actually

01:27:05.589 --> 01:27:09.189
the same thing that humans need to do to be more ethical, namely,

01:27:09.789 --> 01:27:15.029
basically a generalization of the concept of empathy. And just as an

01:27:15.029 --> 01:27:19.129
example of what that could look like, and again, this is just pure motivation,

01:27:21.049 --> 01:27:25.989
there's nothing well developed, it's just a first glimpse: Christian Guckelsberger

01:27:25.989 --> 01:27:29.669
from Goldsmiths and Christoph Salge from Herts,

01:27:29.809 --> 01:27:33.469
they have developed a model of basically NPCs.

01:27:33.469 --> 01:27:39.649
So you have players in a video game and they are accompanied by a pet or a companion,

01:27:39.649 --> 01:27:45.389
and the problem is these companions are usually quite stupid. They act really

01:27:45.389 --> 01:27:52.169
in a stupid way. And one thing that they did was using empowerment, cross-empowerment,

01:27:52.869 --> 01:27:58.449
as a value function for the pet, for the companion.

01:27:58.669 --> 01:28:03.649
So the companion tries to maximize the empowerment of its master and its own

01:28:03.649 --> 01:28:06.269
too. So it tries not to die, of course.

01:28:06.489 --> 01:28:10.389
And when you do that, it's very interesting. For example, it will shoot an enemy

01:28:10.389 --> 01:28:12.169
that endangers the master.

01:28:13.069 --> 01:28:17.029
It will behave in various ways in a sensible way.

01:28:17.369 --> 01:28:21.669
One thing, for example, that shows how almost human-like it reacts:

01:28:22.489 --> 01:28:27.549
empowerment looks at and takes into account all possible actions, and so the pet

01:28:28.309 --> 01:28:34.009
by default assumes that the human could be a psycho and kill the pet. So

01:28:34.009 --> 01:28:38.429
the pet also has a sense of self-preservation, so it will accompany him and help

01:28:38.429 --> 01:28:44.349
the human, but it will stay out of his shooting direction, unless you turn on

01:28:44.349 --> 01:28:47.289
a trust flag, in which case,

01:28:48.729 --> 01:28:55.149
basically, the pet will trust the human, right, and will enter his shooting direction.

01:28:56.457 --> 01:29:00.897
So here we have empathy, of course, because the pet has a model of the human

01:29:00.897 --> 01:29:02.397
and what the human could be doing.

01:29:02.717 --> 01:29:06.677
And if that model is wrong, then yes, I have a problem.

01:29:06.717 --> 01:29:15.477
But at least it's very clear where empathy and trustworthiness and probably ethics come from.

01:29:15.637 --> 01:29:21.637
It's basically knowing what the other one needs and not acting too much against. Right.

01:29:23.357 --> 01:29:28.077
My prediction there is that what we're going to see is that to build machines that

01:29:28.077 --> 01:29:33.537
know how to handle trust will be orders of magnitude more difficult than to get the biomechanics right.

01:29:35.057 --> 01:29:41.997
I'm not sure about that. I think trust is a state of mind, so it's a belief

01:29:41.997 --> 01:29:47.057
state. It's basically saying, what do I believe the other one will do? It's

01:29:47.057 --> 01:29:49.557
related to game theory, common knowledge, advantage.

01:29:50.077 --> 01:29:56.737
For example, if I am a player and I have an advantage having a pet,

01:29:56.917 --> 01:30:03.457
my pet knows that I have an advantage having it and so on, then the trust can emerge on its own.

01:30:03.577 --> 01:30:08.737
If I'm a new player, basically a newbie, perhaps the pet will say,

01:30:08.797 --> 01:30:13.457
no, this guy doesn't know that yet, so I'll wait and see how he behaves before I trust him.

01:30:13.517 --> 01:30:17.017
And in real life that happens too. You have a team of people and

01:30:17.017 --> 01:30:19.737
one person is new to the business, a new

01:30:19.737 --> 01:30:24.617
boss. You don't trust him immediately. You see how he behaves, and it turns out, okay,

01:30:24.617 --> 01:30:28.077
he knows what he's doing, he has a good model of the future, and then you start

01:30:28.077 --> 01:30:34.397
trusting. So in other words, I don't think that this is... I mean, the practice will

01:30:34.397 --> 01:30:38.537
be very difficult, of course, as all learning is, but I think conceptually,

01:30:39.650 --> 01:30:44.390
I don't think we are that far away from that, conceptually. The practice is a different story.

01:30:45.510 --> 01:30:52.050
With biomechanics, we are really far away. So, I don't think I agree with it. That's fine.

01:30:53.250 --> 01:30:57.450
But I think the other issue that's really important here, why I think this should

01:30:57.450 --> 01:31:01.530
be included in RoboCup, is that if we look at the history of our science,

01:31:01.650 --> 01:31:03.650
we know there's no value-free science.

01:31:03.990 --> 01:31:09.390
And the mistake science has made over the ages is to develop technologies and

01:31:09.390 --> 01:31:14.130
knowledge and then leave it to others to figure out what the ethics of it is.

01:31:14.770 --> 01:31:20.170
This is not working. So if we, as the researchers behind these kinds of machines,

01:31:20.390 --> 01:31:26.050
are not actively engaged in that debate, we will not have normative frameworks when we need them.

01:31:26.450 --> 01:31:29.670
Because, for instance, we go to bioethics or general ethics of human behavior,

01:31:29.870 --> 01:31:31.730
we have no normative frameworks. We're stuck.

01:31:32.010 --> 01:31:36.870
And I feel that just for that reason and also given this historical consideration

01:31:36.870 --> 01:31:39.950
that science is not value-free.

01:31:40.490 --> 01:31:43.450
So it's also us, the scientists, who now have to engage with that.

01:31:43.550 --> 01:31:45.950
So for that, I think it's important that it gets included.

01:31:46.350 --> 01:31:49.210
But now, so you have a broad set of interests.

01:31:49.370 --> 01:31:54.090
You're driving this whole RoboCup community now forward into the future.

01:31:54.250 --> 01:31:57.950
2050, you're going to beat the world champion. It's great to have predictions

01:31:57.950 --> 01:32:01.910
that only need to be tested by the time we are in a retirement home somewhere.

01:32:02.070 --> 01:32:06.010
So no one can blame us for making predictions that fail. But now,

01:32:06.090 --> 01:32:10.070
if we would like to follow in a tradition that you represent,

01:32:10.470 --> 01:32:14.390
what would be the Polanyi law that we have to adhere to?

01:32:15.488 --> 01:32:18.848
Well, I wouldn't formulate that as Polanyi law. I think it's a very simple law,

01:32:18.968 --> 01:32:22.108
that is much older. That's the golden rule.

01:32:23.208 --> 01:32:26.248
Don't do to others what you don't want to be done to you.

01:32:26.968 --> 01:32:32.688
But actually, the rule needs to be generalized. Because a machine that can offload

01:32:32.688 --> 01:32:37.728
its memory onto a big computer has a completely different view on survival than a human.

01:32:37.988 --> 01:32:41.068
A human turned off will not be turned on again.

01:32:41.368 --> 01:32:44.488
A machine turned off can very well be turned on again.

01:32:45.608 --> 01:32:49.008
And so I would say the golden rule needs to be generalized.

01:32:50.168 --> 01:32:57.948
Don't do to somebody else what they themselves, according to the best model

01:32:57.948 --> 01:33:02.408
that you can have from them, would not like to be done to them. Okay.

01:33:03.388 --> 01:33:06.128
So it's a law of empathy, essentially. Yeah.

01:33:07.128 --> 01:33:11.128
And then three years from now, we're going to go visit you with Anna,

01:33:11.208 --> 01:33:12.868
who's waving at us there behind the glass.

01:33:12.868 --> 01:33:15.708
And we're going to check

01:33:15.708 --> 01:33:18.768
whether the prediction you're going to make today was confirmed or

01:33:18.768 --> 01:33:23.588
not. So what's the one non-trivial prediction you would like to see tested

01:33:23.588 --> 01:33:28.248
three years from now, that you're going to see confirmed? Three years is a short

01:33:28.248 --> 01:33:36.648
time. Can we increase the period? Four. That's too small. Ten? No, come on, let's do

01:33:36.648 --> 01:33:38.668
it like this. Compromise: three and ten.

01:33:39.548 --> 01:33:47.488
Oh, that's not a compromise. I've made it hard. Well, I think I'm happy to make predictions.

01:33:47.588 --> 01:33:50.788
I'm not so happy putting times on it, I'll tell you. Okay.

01:33:51.028 --> 01:33:56.528
Because I think that discoveries are power law distributed.

01:33:56.868 --> 01:34:02.008
So it's like avalanches. You know they will happen. Or earthquakes: they will happen.

01:34:02.108 --> 01:34:04.028
You don't know when and how big.

01:34:04.348 --> 01:34:11.828
So the prediction that I think is that we will have to completely...

01:34:14.498 --> 01:34:21.878
Not completely, but significantly expand our understanding of how to create

01:34:21.878 --> 01:34:29.758
contexts or switch contexts if we want proper AI to emerge, rather than highly specialized,

01:34:30.138 --> 01:34:35.318
highly optimized, one domain optimized.

01:34:35.678 --> 01:34:42.598
That is something that I'm sure of. I would not put a number on it. Probably, say,

01:34:45.018 --> 01:34:47.898
no, I'd rather not give a number to

01:34:47.898 --> 01:34:51.398
that, because this is not... That may happen. So it's more

01:34:51.398 --> 01:34:54.498
an aspiration of, let's say, if you

01:34:54.498 --> 01:34:58.538
want, the general intelligence of this context-independent RoboCup, that can play

01:34:58.538 --> 01:35:02.978
football and basketball. For example. Or even a RoboCup that can play

01:35:02.978 --> 01:35:08.058
football, but where I don't have to encode how to switch context

01:35:08.058 --> 01:35:12.078
from a stance to a kick to a defense, et cetera.

01:35:12.238 --> 01:35:17.198
If I don't have to do that myself anymore, I would say we made a huge step ahead.

01:35:17.778 --> 01:35:22.538
Not sure whether it's enough for what we would call AI to be complete,

01:35:22.638 --> 01:35:25.518
but I would say without this, it will not happen.

01:35:25.758 --> 01:35:30.078
Okay, great. Daniel Polanyi, thank you very much for this conversation. Thank you.

01:35:35.498 --> 01:35:41.458
The CSN podcast was produced by the Convergent Science Network of Biomimetics

01:35:41.458 --> 01:35:47.838
and Biohybrid Systems, a project funded by the European Seventh Research Framework Program.

01:35:49.398 --> 01:35:54.718
For more interviews, recorded lectures, or upcoming conferences in the field

01:35:54.718 --> 01:36:00.958
of biomimetics and biohybrid systems, go to csnnetwork.eu.

01:36:00.880 --> 01:36:09.520
Music.

01:36:01.278 --> 01:36:03.158
And thank you for listening.