Today I interview artificial intelligence researcher Eliezer Yudkowsky. Among other things, we discuss:
- Eliezer’s journey from ‘traditional rationality’ to ‘technical rationality’
- Cognitive biases
- Artificial Intelligence (AI)
Download CPBD episode 081 with Eliezer Yudkowsky. Total time is 1:06:58.
Eliezer Yudkowsky links:
- Eliezer’s home page
- Machine Intelligence Research Institute
- Less Wrong
- Harry Potter and the Methods of Rationality
Links for things we discussed:
- E.T. Jaynes
- Eliezer’s “coming of age” as a rationalist
- Martin Gardner, James Randi
- Richard Feynman
- Heinlein, Asimov
- Hayakawa, Language in Thought and Action
- Hollywood Rationality
- Cognitive biases
- Empiricism, falsifiability
- Eliezer’s tutorial on Bayes’ Theorem, and my rewriting of it
- Analysis of the story about two babies in a carriage
- Eliezer, “My Best and Worst Mistake”
- Jaynes, Probability Theory: The Logic of Science
- Frequentist inference
- Daniel Kahneman and Amos Tversky, “Judgment Under Uncertainty” (the paper)
- Emil Gilliam
- Kahneman & Tversky, Judgment Under Uncertainty (the book)
- Eliezer, “Making Beliefs Pay Rent”
- Eliezer, “Mysterious Answers to Mysterious Questions”
- elan vital
- Eliezer, “Reductionism”
- Artificial intelligence
- I.J. Good
- Universal grammar
- Dyson sphere
- Friendly artificial intelligence
- Exploitation-Exploration Trade-Off
- Marginal return on investment
LUKE: Eliezer Yudkowsky is a research fellow at the Machine Intelligence Research Institute, and a popular writer at the website Less Wrong, a community devoted to refining the art of human rationality. Eliezer, welcome to the show!
ELIEZER: Pleased to be here.
LUKE: Eliezer, you’re perhaps best known for your extensive writing on how to improve human rationality, how not to fool ourselves, how to overcome cognitive biases, how to update our beliefs to match the evidence, and so on.
When people first discover your writing on LessWrong.com or wherever, it can be intimidating because they think, “Whoa, this guy is 10 levels ahead of me. I don’t have time to figure all this stuff out.”
But of course you didn’t come out of your mother’s womb quoting E.T. Jaynes on probability theory. You got where you are today as a result of a journey, just like the rest of us. So, I’d love to ask you about that journey, and I guess we can start at the beginning. You were raised Jewish, right?
ELIEZER: Well that’s what I used to think, and then at one point I was watching a space shuttle launch on TV and getting tears in my eyes and realizing that I didn’t really get tears in my eyes for anything Judaism-related. That was when I realized that my childhood religion that I’d sort of grown away from over time, but still had the power to bring tears to my eyes, wasn’t Judaism so much as space travel.
LUKE: Interesting. Well you certainly had a religious home that you were raised in, right?
ELIEZER: I did. I mean, I’m sure that there are other people who have had vastly more unpleasant religious upbringings, but nonetheless I did not enjoy it.
LUKE: How did you get from the way that your parents were raising you and the values that they gave you, whatever those were, to what you call on your website, “Traditional Rationality”?
ELIEZER: So, first of all: My father was a physicist and my mother was a psychiatrist. They were what I believe is called Modern Orthodox Jews, which… you still follow all the silly rules but you’re also allowed to believe in science.
So, my father was a skeptic of the Martin Gardner sort. Never applied it to his religion of course, but brought me up reading Martin Gardner. Big fan of James Randi and so on; I believe he was once on a radio show with James Randi, in fact.
So, I was more or less brought up within the nontechnical rationalist’s tradition, and of course growing up in a household full of science fiction books including Heinlein and Asimov had some of the same effect. Reading Richard Feynman as a kid had some of the same effect.
“Language in Thought and Action” by S.I. Hayakawa, standard Richard Feynman and Martin Gardner books… So, that’s sort of what I was brought up in actually, was sort of the non-technical rationality tradition.
LUKE: So, what is it that characterizes what you call, “Traditional Rationality”?
ELIEZER: I generally divide sort of three lines. There’s Hollywood Rationality, which is Spock, which is all wrong. Sort of like the public image of rationality, which is unfeeling or clever verbal thinking, or putting lots of significant digits on things that don’t need them.
Then there’s the traditional art, handed down from the ancient rationalists like Richard Feynman and so on.
Then there’s the rationality you get into once you start studying probability theory and decision theory, and all the known biases from cognitive psychology and so on.
So, traditional rationality is distinguished by not having been translated into probability theory and being passed down from supervisor to grad student to father and so on. For example, like empiricism: you should do your experiments yourself. Get your hands dirty. Know how to do your own experiments.
Not just empiricism in the sense of, “Go out and look at things,” but also the sort of home-spun, “do your own experiments” sense of virtue that comes along with it. Falsifiability: stick your neck out. Make bold predictions that no one else is making, that are surprising and that can easily be proven wrong. That way if you’re right it’s impressive, and in order to make an impressive correct prediction you have got to risk some stakes.
You have an obligation to provide justification when you say something interesting, and all these other things that actually turn out to have much more precise incarnations in probability theory, like the notion of falsification, takes on a whole different meaning once you know about Bayes’s Theorem, about which I believe you’ve recently written on your blog. [laughs]
LUKE: Well, or just copy and pasted what you already wrote about it, really. [laughs]
ELIEZER: Well, you re-wrote most of that, and that’s actually one of the web pages that I’ve had on my table to redo for quite a while, but haven’t actually gotten around to doing.
LUKE: Well, so how did you get from traditional rationality: falsifiability, empiricism, that kind of thing to this more precise type of rationality that takes some lessons from cognitive science and also probability theory and decision theory? How did that journey happen?
ELIEZER: The coolest way to put it would be that it started with a badly phrased math story problem that someone gave me. And that story problem runs like this: You meet a mathematician on the street, and the mathematician is pushing along two babies in a carriage. They’re so swaddled up that you can’t tell what gender the babies are, and the mathematician says, “You know, at least one of my children is a boy. What is the probability that they are both boys?”
Now, this as it happened is the incorrectly phrased form of the story problem, but the way it’s supposed to work is that you ask the mathematician, “Is at least one of your children a boy?” If the mathematician says yes to that, then the probability that they’re both boys is one-third. On the other hand, the person who told me the story actually botched it, they misphrased it. They had the mathematician spontaneously saying, “At least one of my children is a boy.”
And in that case, you might tend to presume that if one child was a boy, or one child was a girl, there would be a 50 percent probability of them saying at least one child’s a boy, or at least one child is a girl. In which case, by Bayes’s Theorem, though I didn’t know it was called that, the probability that they’re both boys is one-half.
I answered that, and someone said, “Well, sure. That’s the Bayesian answer, but I’m not a Bayesian.” OK. So, who are these Bayesian people who get the correct answer as opposed to the wrong answer you just gave me?
ELIEZER: And I started looking it up. [laughs] I was aghast at the thought that there were such things as non-Bayesians…
ELIEZER: …and yet so it seemed to be, somehow.
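The two answers in this story can be checked by direct enumeration. The following is a minimal sketch, not anything presented in the interview: the function names are hypothetical, and the "volunteered" case assumes the mathematician picks one of the two children at random and reports that child's sex, which is one way to model the 50 percent assumption described above.

```python
from fractions import Fraction
from itertools import product

# All equally likely (first child, second child) combinations.
FAMILIES = list(product("BG", repeat=2))
PRIOR = Fraction(1, len(FAMILIES))


def p_both_boys_when_asked():
    """You ask "Is at least one of your children a boy?" and hear "yes"."""
    # The answer "yes" has likelihood 1 for any family with a boy, 0 otherwise.
    weights = {f: PRIOR * (1 if "B" in f else 0) for f in FAMILIES}
    return weights[("B", "B")] / sum(weights.values())


def p_both_boys_when_volunteered():
    """The parent spontaneously says "at least one of my children is a boy".

    Assumption: the parent reports the sex of a randomly chosen child,
    so a mixed family says "boy" only half the time.
    """
    weights = {f: PRIOR * Fraction(f.count("B"), 2) for f in FAMILIES}
    return weights[("B", "B")] / sum(weights.values())


print(p_both_boys_when_asked())        # 1/3
print(p_both_boys_when_volunteered())  # 1/2
```

The only difference between the two functions is the likelihood assigned to the statement under each family, which is why the phrasing of the story changes the answer.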
LUKE: [laughs] Well, on the website Less Wrong, you tell another story as well that perhaps was a key moment in your journey to this more Bayesian way of being rational, and that was a story about your “best and worst mistake.” Would you mind telling us that story?
ELIEZER: To put it in a nutshell, my best and worst mistake was thinking that intelligence was a big, complicated kluge with no simple principles behind it. The reason that it’s my best mistake, as mistakes go, is that this belief that there were no simple answers caused me to go out and study neuroscience, cognitive psychology, and various A.I. machine learning stuff, and this whole big grab bag of information that was actually very useful to know.
As mistakes go, this mistake motivated me to go out and learn a whole lot of different things. Which is certainly a very good sort of mistake to make if you view it from that perspective. But for other and even more complicated reasons that we may or may not end up getting into later in this particular interview, I later realized that A.I. work was going to have to meet a higher standard of precision than I’d been visualizing.
And around the same time I came to that realization, was when I was reading E.T. Jaynes’s “Probability Theory: The Logic of Science” for the first time. And E.T. Jaynes really emphasized the point that if you did things the frequentist way or the various other non-Bayesian ways, if you treated statistics as this big grab bag of tools that you’d throw at the problem, then you could compute the same problem three different ways and come to three different answers.
But in probability theory, said Jaynes, probability theory is math, the results are theorems. You must never compute the same problem two different ways that arrive at two different answers, for that will be a mathematical inconsistency. And this was sort of the moment at which I realized for the first time that there were rules. There were laws of thought.
There were not simply various cool ways you could try to compute a problem. There’s actually a correct way to do it. And the correct way might be computationally impossible, but even if you couldn’t compute the correct answer, it didn’t change the fact that there was a correct answer.
There were rules. Intelligence was lawful and what I had seen as disorganization and chaos was actually just my own confusion and ignorance being projected out into the world.
LUKE: Now, you talked about how artificial intelligence, in trying to program that, you’d have to be very precise and that you’d have to compute the right answer by being very precise and following the correct rules. But you also talked about there being correct rules of thought. What do you mean by that?
ELIEZER: Well, if I get a test result, let’s say I’m being tested for… I don’t know, low blood sugar or something. And I get back a little stick of paper showing red or something like that. And that red is twice as likely to appear in the case of low blood sugar as in the case of normal blood sugar.
Then the odds that I had low blood sugar have just doubled. That is, if they were previously one to two, they are now one to one. If they were previously three to one against, they are now three to two against, etc. So, that’s the exactly correct answer.
Whenever you see a piece of evidence, there’s a certain exact amount that that evidence should shift your beliefs. No more, no less. If you look at things in terms of traditional rationality, it’s all phrased in terms of, are you allowed to believe this? Can you get away with believing that? Are you forced to believe this?
It’s all phrased qualitatively. And because of that qualitative phrasing, there’s an awful lot of wiggle room and room for argument. And if you look at the underlying math that should theoretically be underlying all of this, it’s precise.
So, in response to every iota of evidence or every iota of argument, things should shift by a certain exact amount. And if you shift any more than that or any less than that, you’re getting it wrong.
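The "exact amount" here is the odds form of Bayes' Theorem: posterior odds equal prior odds times the likelihood ratio. A minimal sketch of the test-strip example follows; the function names are hypothetical, and the likelihood ratio of 2 is the "twice as likely" figure from the example.

```python
from fractions import Fraction


def update_odds(prior_odds, likelihood_ratio):
    """Odds form of Bayes' Theorem: posterior odds = prior odds * likelihood ratio."""
    return prior_odds * likelihood_ratio


def odds_to_probability(odds):
    """Convert odds in favor (as a single ratio) into a probability."""
    return odds / (1 + odds)


# The red strip is twice as likely under low blood sugar as without it.
LIKELIHOOD_RATIO = Fraction(2, 1)

# Prior odds in favor: 1:2, and 1:3 (that is, "three to one against").
for prior in (Fraction(1, 2), Fraction(1, 3)):
    posterior = update_odds(prior, LIKELIHOOD_RATIO)
    print(
        f"odds {prior} -> {posterior}; "
        f"probability {odds_to_probability(prior)} -> {odds_to_probability(posterior)}"
    )
```

Odds of 1:3 in favor become 2:3 in favor, which is the "three to two against" in the example; the same evidence moves different priors to different posteriors, but always by the same multiplicative factor.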
LUKE: And Eliezer, getting back to your story. How did you get from studying about Bayes’ Theorem and understanding what “Bayesian” is, to what you sometimes call “Bayescraft”, which is kind of the highest level of rationality and very precise thinking, that there are other things involved in there besides Bayes’ Theorem?
ELIEZER: Well, I sort of think of that as being sort of a human version. There would be a higher version of that which is: write an A.I.! [laughs]
ELIEZER: So, I don’t know if I’d call it the highest level.
ELIEZER: But it’s the highest level I ever got to, that’s for sure. So, I guess it was just sort of obvious on an intuitive level in some sense. As soon as I saw Bayes’ Theorem, I was like, “Ooh! Here’s the fundamental equation of rationality.”
ELIEZER: It just immediately clicked for some reason. And I guess that’s because my mathematical talent has always been sort of very heavily weighted toward the intuitive, the visualize, to understand what it means.
I know someone and I work with someone who is much more adept at manipulating proofs and theorems than I am. He can prove some things faster than I can understand them. And I used to feel very intimidated and unworthy, until I realized that sometimes he would go through this lightning-fast proof and I would look at this final result and say, “That’s wrong.” And it wouldn’t always be wrong but it often would be wrong.
And that was when I realized that different people have different math talents. So, in my case I just looked at probability theory, and because it happens to suit my native math talent, I understood very quickly what it meant, what it was saying to me.
In terms of like how did I get here from there, another sort of major episode was discovering the field of heuristics and biases. I came across an online paper that was just sort of very briefly in passing …. Actually it wasn’t an online paper, it was an online PowerPoint presentation, that just very briefly in passing, mentioned some of Kahneman and Tversky’s results. Kahneman and Tversky are the two founders of the field of heuristics and biases.
And I was so shocked by what I was reading that I emailed the author to ask, “Is this a real result?”, because it didn’t come with any citations, it was just this presentation. And they said yes, and they emailed me back the original Judgment Under Uncertainty paper from 1974.
And then I said, “Well, that looks interesting. I should learn about that eventually.” Put it on hold.
And a friend of mine named Emil Gilliam essentially got tired of my having that on hold and bought me the book to make sure I’d read it. The Judgment Under Uncertainty edited volume, the book. And he probably scored quite a number of points that way. [laughs]
So, I read through the edited volume and it was really fascinating. It was the manual of known bugs in human reasoning, essentially what it was.
ELIEZER: And I hadn’t quite realized that that was a whole field of science that was just all the known bugs in human reasoning.
LUKE: And could you give us one of your favorite examples of these bugs of human reasoning?
ELIEZER: Well, for example, Kahneman and Tversky went to the Second International Conference on Forecasting in 1982. These were professional forecasters. Foretelling the future was their job.
And they asked one group of forecasters about the probability of a complete breakdown of diplomatic relations between the United States and the Soviet Union sometime in 1983.
Now they asked a separate group about the probability of a Soviet invasion of Poland followed by a complete breakdown of diplomatic relations between the USA and the Soviet Union some time in 1983. And group two responded with higher probabilities. And the reason why this doesn’t make sense is that any case where Russia invades Poland and diplomatic relations break down is necessarily a case where diplomatic relations break down.
You cannot assign higher probability to the compound event ‘A and B’ than the single event ‘A, whether or not B happens.’
ELIEZER: But that was what the forecasters did. There’s a number of ways looking at the reasons why they did this. The most important thing to realize about that is that adding more details onto a prediction automatically makes that prediction less probable by the laws of probability theory. But it can also make it sound more plausible to human beings.
So, what this is telling you about is something of the probability theory underlying Occam’s Razor and the human psychology that causes us not to implement Occam’s Razor. So, you see someone believing these enormous, complicated stories with no evidence behind them. It can really help to understand this, to come to terms with this horrifying reality of human madness.
To have studied some of the cognitive psychology by which you understand how and why. Oh well, sure! As they make the story more and more complicated, it becomes less and less probable and sounds more and more plausible.
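The conjunction rule behind this example can be written in one line. As a sketch in standard notation (the symbols are not from the interview), let A be "diplomatic relations break down" and B be "the Soviet Union invades Poland":

```latex
P(A \wedge B) \;=\; P(A)\,P(B \mid A) \;\le\; P(A),
\qquad \text{since } 0 \le P(B \mid A) \le 1 .
```

Each added detail multiplies in another factor that is at most 1, so a more detailed story can never be more probable than the same story with the detail removed.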
LUKE: Yeah, that whole field of cognitive heuristics is absolutely fascinating and we’re still getting new results every year.
One of the things, Eliezer, that you write about on Less Wrong is this series on “Making Beliefs Pay Rent.” Could you explain what that means?
ELIEZER: To give a negative example, this is how it should not work. A negative example would be, you go into your English class and the English professor tells you that Mildred Miram is a post-Utopian. And she says, “Well, her works exhibit colonial alienation.” Well, what is colonial alienation? “Well, it’s what post-Utopians exhibit.”
And it seems like you have these beliefs that are connected to other beliefs. And that this notion, of her being post-Utopian, is actually yielding this successful prediction that her works will exhibit colonial alienation. And yet, your belief network actually has this little collection of nodes that are connected only to each other, and never interact with sensory experience at all.
On the other hand, suppose I believe that gravity pulls downward at 9.8 meters per second, per second, and I believe that a certain building that I’m on is, say, 120 meters high. By having a sort of abstract belief about ‘this is how much gravity is,’ and ‘this building is 120 meters tall,’ you can get from there to the anticipation of sensory experience.
And if I drop this bowling ball off the tower, the bowling ball is going to crash into the ground five seconds later. Or to be even more precise, if I saw the clock second hand on the One numeral when I dropped the ball, the hand will be on the Two numeral, five seconds later when I hear the crash.
And then you’ve taken your sort of abstract beliefs, with words like ‘9.8 meters per second, per second’ and concepts like acceleration and even integral calculus. And the building is around 120 meters tall; the belief that a building is 120 meters tall is not directly a sensory experience. But by connecting your beliefs together, you can get directly to what I anticipate happening next.
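The step from the two abstract beliefs to the anticipated five seconds is a short calculation. A minimal sketch, assuming the ball is dropped from rest and ignoring air resistance (neither assumption is stated in the interview); the function name is hypothetical:

```python
import math

G = 9.8  # gravitational acceleration in meters per second squared, as in the example


def fall_time(height_m, g=G):
    """Time in seconds to fall height_m from rest, ignoring air resistance.

    From h = (1/2) * g * t**2, solved for t.
    """
    return math.sqrt(2 * height_m / g)


print(f"{fall_time(120):.2f} seconds")  # about 4.95, i.e. roughly five seconds
```

Under these assumptions a 120 meter drop takes just under five seconds, which matches the anticipated experience of hearing the crash as the second hand reaches the next numeral.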
Or another example: let’s say someone tells you that they have a dragon in their garage. And you say, “OK, let’s go look at the dragon.” They say, “It’s an invisible dragon.”
You say, “OK, let’s go and listen to the dragon.” And they say, “It’s an inaudible dragon.” And you say, “Well I’d like to toss a bag of flour in the air and see if the dragon’s invisible form is outlined within the flour.” And they say, “Well the dragon is permeable to flour.”
Now, when Carl Sagan originally told this story, he was telling it to say, if your beliefs have no effect on the real world then you’re allowed to have them but please keep them out of my politics. Or you can tell the story to emphasize the idea that false hypotheses need to do sort of fast footwork and complicate themselves to avoid falsification.
But when I tell that story, I tell it with the moral that, this person who says they have a dragon in their garage, clearly has a good model of the world hidden somewhere in their brain. Because they can anticipate, in advance, exactly which experiences they’ll need to come up with excuses for. He’ll know in advance that when you look into his garage you’re not going to see a dragon there. And the moral I take from that is: don’t ask what facts do I believe? Ask: what experiences do I anticipate?
LUKE: And so, making beliefs pay rent is to say, look at your beliefs and make sure that they actually give you some anticipated experiences. Because if they don’t, then maybe they’re just kind of free hanging and only connected to other certain beliefs, and not actually connected to anything you could ever experience.
ELIEZER: Or, for that matter, maybe they’re carefully set up to give exactly the same answers… Maybe they’re surrounded by a sort of protective layer of excuses which prevents them from making any predictions, which is an even worse sign in a way.
LUKE: Yeah. I think making beliefs pay rent is a good image for this idea that, your beliefs should render predictions. Because otherwise, what do your beliefs really mean? Do you really even believe it if it wouldn’t change your anticipated experiences at all?
ELIEZER: Well, the problem is, people can and do believe it. It is perfectly possible to have a collection of nodes in your head that doesn’t link to sensory experiences, that doesn’t link to reality. Where you can’t set up a truth condition of saying, how would the quarks in the universe have to be configured for Mildred Miram to be a post-Utopian or not?
It’s clear what configuration of quarks… makes a statement, this building is 120 meters tall, true or, alternatively, false. Some collections of quarks will make that statement true, others will make it false. If the statement, “Mildred Miram is a post-Utopian,” doesn’t constrain the sensory experience in any way, it’s not going to be conjugate to any collection of quarks or any collection of causes and effects.
It’s not going to have a truth condition, it’s not going to be meaningful. Nonetheless, if you don’t answer that way on the test the professor will mark you down. And if you publish a few papers about that and become famous for them, you might be very passionate about this belief.
Even though it’s sort of an abuse of your brain’s belief representation to contain content that doesn’t mean anything, or correspond to anything. It’s a part of the map for which there is no territory. Not just that the territory doesn’t match the map, but that this was a section of the map which does not match any territory, and could not match any territory, no matter how the territory looked.
LUKE: And the map and the territory is this analogy you use, where you’re trying to come up with as true a model of the world as you can, so that the map in your head matches up to the territory in reality. And when you allow yourself to have these points on the map, that don’t actually control your anticipated experiences, then those points on the map might just not ever connect to anything in the territory.
ELIEZER: Right, and not just that they’re false, but that they might not mean anything. There might not be any way the world could be that would make them true or false.
LUKE: So, this is one of the tools that you write about to improve our rationality, and improve our ability to make our mental maps match the territory in the world outside. Another tool that you provide is this warning against giving mysterious answers to mysterious questions. What does that look like?
ELIEZER: If you rewind really far back to the day of the ancient Greeks, then, wouldn’t this have been interesting to look at yourself in some half polished mirror and just have no idea what you were looking at? You’re made out of stuff that moves and that corresponds to your will. Well, why does your hand move when you want it to move, whereas a piece of clay that you make into the shape of a hand doesn’t move at all?
So, right up until the 19th century, even, it was thought that there was this, sort of, animating spirit, elan vital. Elan vital was not a term invented by the ancient Greeks, and I’m not sure that they even had, sort of, a notion that refined yet, but by the time you got to the 19th century, there was this confusion, “Why does my flesh obey my orders and not clay?” They thought that there was animating spirit running through the flesh, and that distinguished the flesh from the clay. It was the stuff of life that was responsible for the obvious difference in kind between animate matter and inanimate matter.
Many people still believe in this, of course. They don’t quite call it by that name, but one way or another, they believe it. But the thing about saying that what is responsible for life is elan vital: even after you say it, you can’t make any new predictions.
So, that’s the first thing to notice. After you say it, it doesn’t act as an anticipation controller. It acts as a curiosity stopper. You say, “Why?” and the answer is, “Elan vital.” And then you’re supposed to stop thinking.
The notion of elan vital didn’t have any moving parts inside. There wasn’t a complex mechanism that was supposed to explain life, it was just supposed to be a sort of simple fluid, a simple substance, that was responsible for it. It was a black box, and you weren’t supposed to open up the black box and look inside. That’s the second attribute of mysterious answers to mysterious questions.
LUKE: Right, and so it’s a mysterious answer to a mysterious question, we haven’t made it anywhere.
ELIEZER: This was something that I got out of E.T. Jaynes: if I’m ignorant about a phenomenon, that is a fact about my state of mind, not a fact about the phenomenon itself. And elan vital sort of took all the ignorance that people had of how life worked and made it into a substance in the outside world.
LUKE: Elan Vital is an example of a mysterious answer to a mysterious question. What are some other ones that are more recent and more common today?
ELIEZER: Well the third sign of a mysterious answer is that the people who offer the mysterious answer are sort of proud of their ignorance. They speak very proudly of how the phenomenon defeats ordinary science or is unlike merely mundane phenomena. They put their ignorance into a separate magisterium and make it holy. That’s why even after the mysterious answer is given, the phenomenon is still a mystery and still possesses the same quality of wonderful inexplicability that it had at the start.
So, a modern example of something like that would be, I would hold, emergence. Suppose I were to tell you that intelligence is an emergent phenomenon within the brain. And if I told you intelligence is a magical phenomenon within the brain. Or if I simply told you intelligence is a phenomenon within the brain. So if I just substituted the word magic for emergent or I just deleted the word entirely, you’d make exactly the same predictions either way.
LUKE: Well and a very common one of course is God as a mysterious answer to a mysterious question, depending on how it’s put forward.
ELIEZER: That one’s almost too easy even. I actually do think that the God hypothesis is meaningful and false. I mean if you tell it to a kid, they have a pretty good idea of what you mean by God.
They’re able to make experimental predictions about what God would be expected to do and those predictions don’t come true. Now adults have elaborate excuses to guard the hypothesis from falsification, but children know perfectly well what it means and the hypothesis is meaningful and wrong.
LUKE: So Eliezer, related to all of this is the concept of reductionism. Could you explain what that is?
ELIEZER: There is of course Hollywood reductionism which is believing that since atoms are uninteresting and the universe is made of atoms the universe is uninteresting, Q.E.D. So that’s Hollywood reductionism. What does reductionism mean if you actually have some idea of what you’re talking about?
So, I once met a fellow who’d been a Navy gunner and this fellow was under the impression that things which move slowly were governed by the Newtonian mechanics and things that move at high rates of speed, they’re governed by special relativity and general relativity.
And I attempted to explain to this person, “No, everything in the universe is governed by special relativity and general relativity at all times, in all places.” And he was like, “No, for low velocity things that will give you the wrong answer.” I was like, “No, it will give you the exactly correct answer, but it might take too long to calculate so you might want to quickly get a Newtonian answer.”
But, no, he thought that there were actually sort of different laws that governed at different speeds. So, he didn’t really understand the notion of a universe that was governed by unified physical laws, that the universe is a single, low level, unified, mathematical process and that something like Newtonian mechanics was an approximation that would quickly give you an answer that was almost right but in its details wrong.
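The sense in which Newtonian mechanics is "almost right" at low speeds can be made concrete by comparing the two formulas for momentum. This is a minimal sketch under assumptions not in the interview: it uses special relativity only, the function names are hypothetical, and the example speeds (300 meters per second and half the speed of light) are chosen purely for illustration.

```python
import math

C = 299_792_458.0  # speed of light in meters per second


def lorentz_factor(v):
    """gamma = 1 / sqrt(1 - v^2 / c^2), valid for speeds below c."""
    return 1.0 / math.sqrt(1.0 - (v / C) ** 2)


def newtonian_relative_error(v):
    """Fraction by which Newtonian momentum (m * v) undershoots the
    relativistic momentum (gamma * m * v); the mass cancels out."""
    return lorentz_factor(v) - 1.0


for v in (300.0, 0.5 * C):
    print(f"v = {v:.3e} m/s, relative error = {newtonian_relative_error(v):.3e}")
```

The same relativistic formula applies at both speeds; at 300 meters per second the Newtonian shortcut differs from it by less than one part in a billion, while at half the speed of light the gap is more than fifteen percent.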
This is sort of a run up to the concept of reductionism. So, let’s say you look at an engineer’s model of a 747. That engineer’s model of a 747 is not going to talk about quarks. It will talk about wings and airflow and it will model the air as a fluid but there will not be elements in the computer program modeling the 747 that correspond to individual quarks.
And the idea behind reductionism is that our multi-level map of the universe corresponds to a single-level territory. So, we have different beliefs about objects of different scales. We have different beliefs about quarks, about molecules, about cells, about tissues, about people, about societies. We learn different rules pertaining to each of these things, and we learn rules about how to translate our knowledge at one level into our knowledge at another level. But, this multilevel map, there isn’t, in the world out there, separate levels.
In the world out there, there’s a 747 that is incarnate in the quarks, and the fact that the 747 has wings is a high level map. It’s a high level fact in a multilevel map, a multilevel model that we have of the 747. But in the laws of physics themselves, there are just quarks and the interactions between them.
Now, the notion that the 747 has wings is meaningful. It has a truth condition. There is something about the 747 that makes the belief that it has wings true, or alternatively, false. There are some configurations of quarks that make that true. There are some configurations of quarks that make that false.
If I say “I’m angry at you.” Or, “I’m fond of you, my friend.” Then there are ways that I, as a collection of quarks, can be, that will make these facts true or alternatively, false.
So, everything that we talk about at the higher levels is still meaningful, it still has truth conditions. But this does not change the fact the universe itself is a single mathematically unified, low level process, governed by universal, exceptionless, physical laws, as best as anyone can determine.
LUKE: The person who doesn’t believe that, maybe a dualist or something like that, would say, “Well, Eliezer, how do you know that it’s quarks all the way down? How do you know that?”
ELIEZER: It’s a simple hypothesis, extensively supported by the evidence. And, first, there’s a question as to whether “non-reductionist” hypotheses are even properly meaningful. Like, what would it mean to live in a universe where the fact that the 747 had wings was a separate fact apart from its quarks? What would that universe look like that would make non-reductionism true? Is non-reductionism a coherent theory or is it just a sort of logical confusion?
And the other aspect is, well, we went out and looked at the universe, and we found that it was a single, low level physical process, as best as anyone could ever see by anything that actually showed up in replicated experiments. And there were a lot of people trying to say things that didn’t fit with this picture, because, to a human, the idea of a rule with literally no exceptions, it doesn’t sound right.
Like, you might have a rule about dividing up the meat fairly that you took in the hunt. But, if you put a gun to everyone’s head and said, “This one time don’t divide up the meat fairly, or we’ll shoot everyone in the tribe.” They would make an exception to the rules just that once. So, the idea of a universe with universal exceptionless laws, is something that belongs to the language of math, more than the language that humans would tend to naturally speak in.
People are always inventing exceptions to the laws. And science is always shooting down those proposed exceptions. And the fact that this happens over and over again, at some point you sort of, pick up the hint. You realize what it is that the universe is trying to tell you here, anthropomorphically speaking.
You realize that after the last 300 exceptions got shot down, that the 301st exception is probably, once again, humans just not getting with the concept of universal law.
LUKE: Yeah. So, that’s what you would say about proposed exceptions today like consciousness, where people say the intrinsic subjectivity of consciousness is just, I just can’t imagine how that could be reduced to quarks or turn out to be just a particular configuration of quarks, and you’re going to say, “Well, we said that about 5000 other things before, and all of those have been shot down and turned out to be just quarks.” Is that right?
ELIEZER: The problem is that people aren’t immortal. They didn’t actually live through it. They learn about astronomy and chemistry and biology in school and it seems to them that these have always been the proper meat of science. They never were part of the separate magisterium, they never were revered, they never were sacred, they never were mysterious.
So people who are going off and doing astrology, or people who believed in Vitalism, they must have just been stupid. Because they took something that is self evidently non-mysterious, in the domain of science, like biology, and they tried to make a big deal out of it. They say, consciousness, no, consciousness, that really is mysterious.
Then they’ll say something like, it’s been mysterious for centuries and that of course is exactly what they would have said about life in the nineteenth century. Everything is a mystery, right from the dawn of human experience, right up until someone solves it.
I think that people like that are just not quite taking the lessons from history that they would’ve taken if they’d lived through it themselves. I say this as someone who used to believe that consciousness was mysterious. And once I actually started to realize that it wasn’t going to be like that, that was when I woke up and said: I can’t believe I did that again!
Why didn’t I learn anything from my history book! That was when I realized not just the general form of the lesson but also: wow, those people weren’t stupid. Or at least, I wasn’t any smarter than they were.
LUKE: Yes, we all have these human brains.
ELIEZER: The great shock that happens over and over again is that something seems really amazingly impenetrably mysterious, like no one could ever manage to explain that, and then they explain it anyway.
LUKE: [laughs] It keeps happening. The universe is trying to tell us something.
ELIEZER: No it’s not. [laughs] The universe does not actually have any will or spirit with which to want to tell us things. But we might want to learn from it. [laughs]
LUKE: Maybe. So Eliezer, you work in artificial intelligence and a particular area of artificial intelligence, called Friendly AI, could you explain real briefly what the need for that is, and what type of work you’re doing?
ELIEZER: OK, making a very long story very short! [laughs] About 50-100 thousand years ago maybe, human beings evolved a new bit of software that actually made the human beings in the first place. No one quite knows what the last layer of icing on the chimpanzee cake was exactly. But we woke up and a few tens of thousands of years later we even invented writing. And then we invented science and a few centuries after that we invented computers.
The moral of the story is, intelligence is powerful. People say things like, intelligence is no match for a gun, like guns had grown on trees. Or you try to explain to them that an AI having smarter than human intelligence might be a bit of a big thing. They’ll say something like, “What if the AI doesn’t have any money?” Like humans had gotten money from the trees and had just found supermarkets already constructed there in the savannah.
People just have this stereotype of intelligence as this useless thing that professors have without quite realizing that this is why they are not trying to gather nuts and roots in order to make it through the day.
I.J. Good pointed out that if you have a sufficiently smart AI, it can do that thing humans do where they try to build AIs and it can potentially make itself smarter. Then, having made itself smarter, it might be able to make itself even smarter.
If the AI got to the point where each self improvement on average triggered more than one self improvement of similar magnitude, if its metaphorical effective neutron multiplication factor (the term used to describe nuclear criticality) went over one, then you get what I.J. Good termed an intelligence explosion, leading to what I.J. Good termed an ultraintelligent machine.
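The criticality analogy can be made concrete with a toy model. The sketch below is only an illustration, not anything presented in the interview: it assumes a single initial improvement and a fixed average multiplication factor `k` (a hypothetical parameter), and it counts the expected cumulative number of improvements after a given number of rounds.

```python
def total_improvements(k: float, generations: int) -> float:
    """Expected cumulative improvements after `generations` rounds.

    Toy assumption: we start with one improvement, and each improvement
    triggers `k` further improvements on average in the next round.
    """
    total = 0.0
    current = 1.0  # expected number of improvements in the current round
    for _ in range(generations):
        total += current
        current *= k  # each improvement spawns k more on average
    return total


if __name__ == "__main__":
    for k in (0.5, 0.9, 1.0, 1.1):
        print(f"k={k}: {total_improvements(k, 50):.1f} improvements after 50 rounds")
```

In this toy model the total stays bounded (below 1 / (1 - k)) whenever `k` is under one, and keeps growing without limit once `k` exceeds one, which is the sense of "going critical" used above.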
At this point, people sort of start asking, “Well, what would a super-intelligent AI want?” And this is the wrong question, because humans come from the factory with a number of built-in drives, and also a built-in capacity to pick up a morality from their environment and upbringing. Which nonetheless, has to match up with the built in drives or it won’t be acquired.
The same way that we have a language acquisition capacity that doesn’t work on arbitrary grammars, but only a certain sort of human built in syntax. So, that sort of corresponds to our experience. So, we expect all minds – because they’re all minds that we’ve had the experience with – we expect them all to, you know, sort of, on the one hand, have a certain amount of innate selfishness as a drive, self concern as a drive, and yet, to respond positively to positive gestures.
So, they’re thinking, “Well, you know, we’ll build AIs, and the AIs will, of course, want some resources for themselves, but if we’re nice to them, they’ll probably be nice to us. And on the other hand, if we’re cruel to them, and we try to enslave them, then they’ll resent that, and they’ll feel rebellious, and they’ll try to break free.” And if you think I’m making all this up, I suggest watching the prequel to the Matrix movies, or just, like, reading any bad science fiction ever about AIs.
LUKE: [laughter] Those are remarkably human, Homo sapiens-related AIs.
ELIEZER: They are human. These are script writers who do not comprehend the concept of anthropomorphism, so they just write the AIs as if they’re any other character, meaning any other human character.
But, all human minds are a single dot within the space of all possible mind designs. You imagine this gigantic sphere, larger than a Dyson sphere, though maybe of course you can’t visualize that. So, just imagine this huge sphere, stretching far off into the night sky, and that would be the space of possible mind designs. Then imagine a tiny little dot, the size of a period, so small you can’t see it, and that’s where all the humans are in mind design space.
But since we don’t usually spend a whole lot of time talking with squirrels, we think that that little tiny dot is the whole space.
LUKE: Yeah, and I would imagine, even if we take animal minds on earth, that’s a larger dot in mind design space than humans, but it’s probably still very small in the space of all possible mind designs.
ELIEZER: Right, because it’s mostly the mammalian lineage that has really complicated mind designs at all, and natural selection is very tightly constrained in what it designs and manufactures. For example, the wheel, the freely rotating wheel, has been invented a grand total of three times by natural selection, that is known to us over the history of Earth. Because, a freely rotating wheel is a very hard thing to evolve incrementally.
Cynthia Kenyon once said, “A programmer can do things in an hour that natural selection cannot do in a million years.” You can just sort of use your abstract intelligence to make multiple changes that work together simultaneously and just jump right through the search space in a way that natural selection can’t manage.
I think there’s this notion that creationists are bad, which is true. People who believe in natural selection are the good guys, which is also true. Therefore, if you say good things about natural selection and praise how amazingly effective it is, you are also a good guy and on the side of science, which is false. If you actually read scientific literature on evolutionary biology, it tends to be heavily guarded with warnings not to anthropomorphize natural selection, evolution and not to think that it would do the same sort of thing you would do in its shoes.
Remember, the amazing thing about natural selection is not how well it works, but that it works at all without a brain. This is a very counter-intuitive idea, that you can actually get complicated designs without there being any brain to design them. It doesn’t mean that doing it without a brain is actually better.
That is false praise, and remember, it’s not that you get bonus points for saying that because you’re on the side of evolutionary biology; you are merely contradicting modern science’s understanding of evolutionary biology and making mistakes. You know that thing a computer programmer does where they sit down for an hour and generate a new piece of code, you know, containing hundreds of inter-dependent parts? Natural selection does not work that quickly because it does not have a brain. The amazing thing is not how well it works but that it works at all.
And the analogy I sometimes use to explain the human brain is that it’s like the first replicator. The first replicator ever to exist. The one that just sort of popped into existence by accident in some tidal pool.
Now, it’s counter intuitive that you can have an accidental replicator at all. That first replicator, the one that happened by accident, that replicator that wasn’t produced by natural selection, it had to happen to get natural selection started at all, but at the same time it probably wasn’t a very good replicator. You know, it would be eaten in an instant by a modern bacterium.
If you talked about what a wonderful replicator that first replicator might have been, in order to praise science and score more points against the creationists who deny that you can get an accidental replicator, you’d be totally missing that what made this first replicator so wonderful is not how well it replicated but that it could happen at all by accident. It’s the same way with the human brain.
Now by relation to the whole future stretching out ahead of us it’s probably going to be one of the strangest intelligent brains ever to exist. Because it was produced entirely by natural selection and not at all by intelligent design. Now you do need some brains like that back in the dawn of time in order to get the recursive self improvement minds-manufacturing-minds process started.
But if you were to talk about that brain as if it were some kind of super amazing incredible thinker, you’d be entirely missing the point. The human brain is the lowest level of intelligence that suffices to build computer chips. If there were any lower level of intelligence at which we could still build computer chips, we’d be having this conversation at that level of intelligence instead. We’re literally as dumb as you can get and still, you know, build AIs, given that we actually can build AIs, which I do think we can but which hasn’t quite been determined yet.
So, that’s sort of the background concept of the intelligence explosion, which is one of the many things that people actually mean when they say singularity. But I do prefer to use the term intelligence explosion because it is more precise and actually means something, unlike the word singularity, to which a sad thing has happened: it is now being used to mean all sorts of different things and is no longer a very precise term.
LUKE: Yeah, yeah. So, you explained how there are all these different possible ways that a mind could be designed. How does that fit into your work on friendly AI?
ELIEZER: Because the question isn’t what will super intelligent AIs want. The question is, you know, there are different kinds of possible super intelligent AIs and they want different things depending on how their goal systems, their preferences, are written. And the sufficiently intelligent agents will preserve their utility function in rewriting themselves or their distant parts or successors or whatever you want to call it. You know, as they are absorbing more material and transforming it into more rational agent, they will transform it into more rational agent with the same preferences.
For example: Gandhi doesn’t want to kill people. You offer Gandhi a pill that makes him want to kill. If Gandhi actually knows that this is what the pill does, Gandhi will refuse the pill because if he takes the pill he will kill people and Gandhi doesn’t want people to die.
So, that’s the sort of brief gloss of the heuristic, non-technical, intuitive argument for why most sufficiently intelligent rational agents that precisely comprehend the effects of changes to their own source code will not deliberately change their source code in a way that alters their utility function, or whatever preferences they have.
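The Gandhi example can be sketched as a tiny decision rule. This is only a toy illustration under stated assumptions, not the formal result discussed here (which is described as unsolved): the function names, the outcome dictionaries, and the numbers are all hypothetical. The one idea it captures is that a proposed self-modification is scored with the agent's current utility function, not with the utility function the agent would have afterward.

```python
from typing import Callable, Dict

Outcome = Dict[str, float]


def accepts_modification(
    current_utility: Callable[[Outcome], float],
    outcome_if_unchanged: Outcome,
    outcome_if_modified: Outcome,
) -> bool:
    """Accept a self-modification only if the predicted outcome scores
    higher under the agent's *current* utility function."""
    return current_utility(outcome_if_modified) > current_utility(outcome_if_unchanged)


def gandhi_utility(outcome: Outcome) -> float:
    # Hypothetical utility: Gandhi only cares that fewer people die.
    return -outcome["deaths"]


# Hypothetical predicted outcomes with and without the pill.
without_pill = {"deaths": 0.0, "post_pill_satisfaction": 0.0}
with_pill = {"deaths": 10.0, "post_pill_satisfaction": 100.0}

takes_pill = accepts_modification(gandhi_utility, without_pill, with_pill)
print("Gandhi takes the pill:", takes_pill)
```

The modified agent would be satisfied with the result, but that satisfaction never enters the calculation, because the choice is made before the modification and by the current preferences.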
I would like to be able to make that argument precise. I would like to be able to design an AI, and know that that AI was going to modify itself in such a fashion as to preserve the utility function as originally written.
I don’t know how to do that. No one knows how to do that, as far as I know. That is, even if you give me infinite computing power, I cannot give you any formal specification.
I cannot give you any mathematical insight into how to have an AI self modify, including modifying the part of itself that does the self modification. Because all the decision theory tools we have for that will go into an infinite loop and explode at the point where they talk about the AI modifying the part of itself that does the self modifying.
So, in order to have humanity’s future light cone be what we would regard as worthwhile, it is probably necessary to solve this problem.
And that is what the Machine Intelligence Research Institute is set up to do, and in particular, my mid-range long term job description is to come up with a reflective decision theory, that does not go into an infinite loop and explode when it talks about modifying itself. And that will be able to give formal mathematical insight into the notion of self modifying in a way that preserves the preferences/utility function, etc.
LUKE: Yeah, and so the worry is that, if there are different types of possible minds that could arise, as different people are trying to design artificial intelligences, there are a huge number of them that would end up being really destructive, or even malicious toward the things that we care about.
And so, it’s sort of a race to develop a self-improving artificial intelligence that will make the world a better place, instead of destroying everything that’s valuable. Is that kind of the picture of how you would paint the future in some of these important efforts?
ELIEZER: Pretty much. People sort of automatically map the AIs they are imagining on to humans, they take humans as their point of departure, they imagine AIs as though they’re modified humans. And, because of this, they tend to think that it is much easier to get “human-ish” values, “moral-ish” AIs. AIs that we can coexist with.
They imagine that that’s much easier to get to a future that we regard as worthwhile, whereas, after you spend a few years investigating the issue – or depending on how fast you are on the uptake, possibly less than that – you realize that human values are very complicated, don’t happen by chance, they don’t happen in generic minds. They’re not built into the basics of rational agents, and if you get even a small distance away from this very complex information making up human value, the worthwhileness of the universe drops off very rapidly past that point.
For example: boredom. If you look at the rational agent formalisms, there is a concept analogous to exploration curiosity, and it’s called the exploitation exploration trade off. Now, the way humans do it is, we get curious, we explore. Now, there’s innate drives for all these things. But that’s the way natural selection stumbled across it.
You look at the pure math, it says, figure out how much resources you want to spend on exploring, do a bunch of exploring, use all your remaining resources on exploiting the most valuable thing you’ve discovered, over and over and over again.
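The strategy described above resembles what bandit textbooks call "explore-then-commit". Here is a minimal sketch of it, assuming a hypothetical set of options ("arms") with fixed average payoffs and Gaussian noise; the function name, parameters, and numbers are illustrative and not from the interview.

```python
import random
from typing import List, Tuple


def explore_then_commit(
    true_means: List[float],
    budget: int,
    explore_pulls_per_arm: int,
    noise: float = 1.0,
    seed: int = 0,
) -> Tuple[int, List[int]]:
    """Spend a fixed share of the budget exploring, then exploit forever.

    Returns the arm chosen for exploitation and the full history of pulls.
    """
    n_arms = len(true_means)
    explore_cost = n_arms * explore_pulls_per_arm
    if explore_cost > budget:
        raise ValueError("exploration alone would exceed the budget")

    rng = random.Random(seed)
    totals = [0.0] * n_arms
    history: List[int] = []

    # Exploration phase: sample every arm the same number of times.
    for arm in range(n_arms):
        for _ in range(explore_pulls_per_arm):
            totals[arm] += rng.gauss(true_means[arm], noise)
            history.append(arm)

    # Exploitation phase: commit to the best-looking arm, over and over.
    best = max(range(n_arms), key=lambda a: totals[a])
    history.extend([best] * (budget - explore_cost))
    return best, history


best_arm, pulls = explore_then_commit([0.1, 0.9, 0.5], budget=100,
                                      explore_pulls_per_arm=5, noise=0.0)
print("committed to arm", best_arm, "for", pulls.count(best_arm), "of", len(pulls), "pulls")
```

Nothing in this procedure ever gets bored: once the exploration budget is spent, every remaining pull goes to the same arm.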
So, if you lost the human notion of boredom and curiosity, but you preserve all the rest of human values, then it would be like… Imagine the AI that has everything but boredom. It goes out to the stars, takes apart the stars for raw materials, and it builds whole civilizations full of minds experiencing the most exciting thing ever, over and over and over and over and over again.
The whole universe is just tiled with that, and that single moment is something that we would find very worthwhile and exciting to happen once. But it lost the single aspect of value that we would name boredom and went instead to the more pure math of exploration-exploitation where you spend some initial resources finding the best possible moment to live in and you devote the rest of your resources to exploiting that one moment over and over again.
And so you lose a single dimension, and the degree to which the universe is an interesting, worthwhile place and the whole human journey was worth it, from our perspective, drops off very rapidly.
This is what I call the fragility of value thesis. It is counter intuitive.
Your brain comes up with all sorts of clever ways to argue the AI into doing what you want the AI to do. Because that is what the human brain is programmed to do. It’s programmed with political rationalization for doing things your way, as argued to people who started from other goals.
But, the AI is not motivated to argue the same way that you are motivated to argue to the AI and so if you come up with a clever argument for why fully general Bayesian decision processes should do boredom the human way instead of the exploration-exploitation trade off way that you can find in artificial intelligence textbooks, it is probably not going to work on an arbitrary AI. Now you can build the AI, to value things the human way, but that would be like a special AI within the space of all possible mind design.
Most of them by default, unless they are specially configured, are going to be doing exploration-exploitation the way it’s done in the AI textbooks. So, that’s the problem: you have to hit a very narrow target in mind design space in order to end up with a worthwhile future. And if you miss that target by even a little, the worthwhileness of the universe from our perspective drops off very rapidly.
By our perspective I don’t mean like, “Oh well, the universe is full of strange and wondrous interactions between all sorts of mysterious different AIs that we couldn’t grasp, in an ineffable civilization. But you, mere human, you didn’t comprehend it…” No, I don’t mean that. I mean you take away boredom and they are all exploiting the same experience over and over and over again. That’s what I mean: it gets uninteresting, is the problem.
You have to hit a very narrow point in the design space in order to not end up with all the galaxies being turned into paper clips, or into little tiny counters showing very high values of pleasure. Or something exploiting the same high utility configuration over and over and over again. It is hard to get a worthwhile galaxy, and it requires solving some rather difficult AI problems.
LUKE: So, we have a lot of things that we think are important and that we worry about like global warming, and how to feed the poor and how to get peace in the Middle East and how to cure certain diseases.
How important do you think this problem of friendly artificial intelligence is in the big picture?
ELIEZER: The problem of friendly artificial intelligence is the big picture. That is the question of what happens to hundreds of millions of galaxies over the next billions of years. Global warming is not the big picture.
But, if I can be a bit more abstract, there is a concept of marginal return on investment. And there is going to be some optimal balance of investments for the human species at this point in its history.
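The idea of marginal return on investment can be illustrated with a toy allocation. The sketch below rests on assumptions that are not from the interview: each cause is given a hypothetical return curve `scale * sqrt(units)` (so returns diminish as more is invested), and resources are handed out one unit at a time to whichever cause currently offers the largest marginal return.

```python
import math
from typing import List


def allocate(budget_units: int, scales: List[float]) -> List[int]:
    """Greedy allocation by marginal return.

    Hypothetical model: a cause with `x` units invested returns
    scales[i] * sqrt(x). Each unit goes to the cause whose next unit
    adds the most.
    """
    alloc = [0] * len(scales)
    for _ in range(budget_units):
        best = max(
            range(len(scales)),
            key=lambda i: scales[i] * (math.sqrt(alloc[i] + 1) - math.sqrt(alloc[i])),
        )
        alloc[best] += 1
    return alloc


equal_causes = allocate(100, [1.0, 1.0])
unequal_causes = allocate(100, [3.0, 1.0])
print("equal causes:", equal_causes)
print("one cause three times as productive:", unequal_causes)
```

Under these assumptions the more productive cause receives most of the budget but not all of it, because its marginal return falls as it is funded; the next unit always goes wherever the marginal return is currently highest.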
Now if we were talking about 2500 years ago, would it have been rational for humanity at that time to try to directly invest all its free resources in doing AI theory and to do no farming? No, because we would have starved. Right now, however, we are getting a bit close to the expected time when the creation of AI is already plausible.
Someone could be working on it in some basement that I don’t know about. It is already plausible, and it is going to get more and more probable over time, because it’s hard to see what would make it any less probable over time. I would be quite surprised to hear that a hundred years later AI had still not been invented, and indeed I would be a bit surprised, in the sense of having my expectations violated, to hear that AI had still not been invented 50 years from now.
So, there is going to be some marginal balance of investments for the human species. If we were smart, if we were sane, if we were doing what a species ought to do, we would not be putting all our effort into direct research on friendly AI and zero effort into farming. But, nonetheless, look at the current situation, where humanity is spending more on marketing lipstick in New York than on ensuring the future of the next several billion years and 100 million galaxies (not to mention, for those with more selfish and short-sighted vision, their own survival in the next few decades). The tiny fraction of resources that we are currently spending on this problem is not that defensible; it isn’t sane. And philanthropy has always been insane. There is no efficient market in philanthropy; there is no such market in expected utilons.
Even within a certain class of charitable interventions, like “I am trying to save lives in Africa,” you will find charities that are 1000 times as efficient as other charities. Can you imagine having a stock that predictably delivered 1000 times the return of other stocks? This will not happen in an efficient market. And that is because people care about money in a way that they do not quite care about maximizing the return on marginal investment in expected utilons when they do philanthropy.
So, you can look at the resource balance: how much we are currently investing in global warming, how much we are currently investing in marketing lipstick in New York, and how much we are currently investing in trying to solve the friendly AI problem. It is very clear that the next marginal philanthropic investments should be going into friendly AI.
We are rationalists over there. We are not doing this because we wandered into it at random. We are doing this because there has to be one cause in the world that has the single highest marginal return on investment in expected utilons. And friendly AI is it. And if it were not, we would be off doing something else. Right now this is where the maximum marginal return on investment is.
LUKE: So, donate to the Machine Intelligence Research Institute.
ELIEZER: Right, or whatever fraction of your money you want to spend on maximizing marginal return on expected utilons turn that to the Machine Intelligence Research Institute.
LUKE: Well Eliezer one last question. I know you have been talking about writing a book for quite a while and a lot of people will be curious to know how that’s coming along.
ELIEZER: So, I am about finished with the first draft. The book seems to have split into two books. One is called How to Actually Change Your Mind and it is about all the biases that stop us from changing our minds. And all these little mental skills that we invent in ourselves to prevent ourselves from changing our minds and the counter skills that you need in order to defeat this self-defeating tendency and manage to actually change your mind.
It may not sound like an important problem, but if you consider that people who win Nobel prizes typically do so for managing to change their minds only once, and many of them go on to be negatively famous for being unable to change their minds again, you can see that the vision of people being able to change their minds on a routine basis like once a week or something, is actually the terrifying Utopian vision that I am sure this book will not actually bring to pass. But, it may none the less manage to decrease some of the sand in the gears of thought.
LUKE: Well it sounds excellent to me and what’s the second book that this has become?
ELIEZER: That’s all the basics of rationality that ought to be taught in grade school and are actually just taught piecemeal in various post-graduate courses.
What is truth? What is evidence? Probability is in the mind. What does it mean to say that a hypothesis is simple? How do you do induction?
Reductionism. What does it mean to be in a universe where complex things are made of simple parts? Just covering all the basics really.
LUKE: Yes that sounds great too. I hope I get to see them sometime soon. Eliezer it’s been a pleasure speaking with you. Thanks for coming on the show!
ELIEZER: Thanks for having me.