Reading Yudkowsky, part 47

by Luke Muehlhauser on June 21, 2011 in Eliezer Yudkowsky, Resources, Reviews

AI researcher Eliezer Yudkowsky is something of an expert on human rationality, and on teaching it to others. His hundreds of posts at Less Wrong are a treasure trove for those who want to improve their own rationality. As such, I’m reading all of them, chronologically.

I suspect some of my readers want to “level up” their rationality, too. So I’m keeping a diary of my Yudkowsky reading. Feel free to follow along.

His 391st post is Heading Toward Morality:

As people were complaining before about not seeing where the quantum physics sequence was going, I shall go ahead and tell you where I’m heading now.

Having dissolved the confusion surrounding the word “could”, the trajectory is now heading toward should.

In fact, I’ve been heading there for a while.  Remember the whole sequence on fake utility functions?  Back in… well… November 2007?

I sometimes think of there being a train that goes to the Friendly AI station; but it makes several stops before it gets there; and at each stop, a large fraction of the remaining passengers get off.

One of those stops is the one I spent a month leading up to in November 2007, the sequence chronicled in Fake Fake Utility Functions and concluded in Fake Utility Functions.

That’s the stop where someone thinks of the One Great Moral Principle That Is All We Need To Give AIs.

To deliver that one warning, I had to go through all sorts of topics – which topics one might find useful even if not working on Friendly AI.  I warned against Affective Death Spirals, which required recursing on the affect heuristic and halo effect, so that your good feeling about one particular moral principle wouldn’t spiral out of control.  I did that whole sequence on evolution; and discursed on the human ability to make almost any goal appear to support almost any policy; I went into evolutionary psychology to argue for why we shouldn’t expect human terminal values to reduce to any simple principle, even happiness, explaining the concept of “expected utility” along the way…

…and talked about genies and more; but you can read the Fake Utility sequence for that.

So that’s just the warning against trying to oversimplify human morality into One Great Moral Principle.

If you want to actually dissolve the confusion that surrounds the word “should” – which is the next stop on the train – then that takes a much longer introduction.  Not just one November.

I went through the sequence on words and definitions so that I would be able to later say things like “The next project is to Taboo the word ‘should’ and replace it with its substance”, or “Sorry, saying that morality is self-interest ‘by definition’ isn’t going to cut it here”.

And also the words-and-definitions sequence was the simplest example I knew to introduce the notion of How An Algorithm Feels From Inside, which is one of the great master keys to dissolving wrong questions.  Though it seems to us that our cognitive representations are the very substance of the world, they have a character that comes from cognition and often cuts crosswise to a universe made of quarks.  E.g. probability; if we are uncertain of a phenomenon, that is a fact about our state of mind, not an intrinsic character of the phenomenon.

Then the reductionism sequence: that a universe made only of quarks, does not mean that things of value are lost or even degraded to mundanity.  And the notion of how the sum can seem unlike the parts, and yet be as much the parts as our hands are fingers.

Followed by a new example, one step up in difficulty from words and their seemingly intrinsic meanings:  “Free will” and seemingly intrinsic could-ness.

But before that point, it was useful to introduce quantum physics.  Not just to get to timeless physics and dissolve the “determinism” part of the “free will” confusion.  But also, more fundamentally, to break belief in an intuitive universe that looks just like our brain’s cognitive representations.  And present examples of the dissolution of even such fundamental intuitions as those concerning personal identity.  And to illustrate the idea that you are within physics, within causality, and that strange things will go wrong in your mind if ever you forget it.

And yet…

We aren’t yet at the point where I can explain morality.

Jesus. Eliezer’s mind is a complicated place.

The Outside View’s Domain is written as a modified Platonic dialogue. Surface Analogies and Deep Causes ends:

As for Inside View vs. Outside View, I think that the lesson of history is just that reasoning from surface resemblances starts to come apart at the seams when you try to stretch it over gaps larger than Christmas shopping – over gaps larger than different draws from the same causal-structural generator.  And reasoning by surface resemblance fails with especial reliability, in cases where there is the slightest motivation in the underconstrained choice of a reference class.

Optimization and the Singularity is a quick preview of some of Eliezer’s thinking on the Singularity.

The Psychological Unity of Humankind is next:

Donald E. Brown’s list of human universals is a list of psychological properties which are found so commonly that anthropologists don’t report them.  If a newly discovered tribe turns out to have a sense of humor, tell stories, perform marriage rituals, make promises, keep secrets, and become sexually jealous… well, it doesn’t really seem worth reporting any more.  You might record the specific tales they tell.  But that they tell stories doesn’t seem any more surprising than their breathing oxygen.

In every known culture, humans seem to experience joy, sadness, fear, disgust, anger, and surprise. In every known culture, these emotions are indicated by the same facial expressions.

This may seem too natural to be worth mentioning, but try to take a step back and see it as a startling confirmation of evolutionary biology.  You’ve got complex neural wiring that controls the facial muscles, and even more complex neural wiring that implements the emotions themselves.  The facial expressions, at least, would seem to be somewhat arbitrary – not forced to be what they are by any obvious selection pressure.  But no known human tribe has been reproductively isolated long enough to stop smiling.

When something is universal enough in our everyday lives, we take it for granted; we assume it without thought, without deliberation.  We don’t ask whether it will be there – we just act as if it will be. When you enter a new room, do you check it for oxygen?  When you meet another intelligent mind, do you ask whether it might not have an emotion of joy?

The Design Space of Minds points out that human minds represent a tiny dot in the space of possible mind designs, that posthuman minds occupy a much larger design space, and that artificially intelligent minds occupy an even larger space of possible mind designs. So predicting what the future will be like after the Singularity is pretty hard.

No Universally Compelling Arguments reminds us:

…compulsion is not a property of arguments, it is a property of minds that process arguments.

So the reason I’m arguing against the ghost, isn’t just to make the point that (1) Friendly AI has to be explicitly programmed and (2) the laws of physics do not forbid Friendly AI. (Though of course I take a certain interest in establishing this.)

I also wish to establish the notion of a mind as a causal, lawful, physical system in which there is no irreducible central ghost that looks over the neurons / code and decides whether they are good suggestions.

But here is the take-home point, I think:

Many philosophers are convinced that because you can in-principle construct a prior that updates to any given conclusion on a stream of evidence, therefore, Bayesian reasoning must be “arbitrary”, and the whole schema of Bayesianism flawed, because it relies on “unjustifiable” assumptions, and indeed “unscientific”, because you cannot force any possible journal editor in mindspace to agree with you.

And this (I then replied) relies on the notion that by unwinding all arguments and their justifications, you can obtain an ideal philosophy student of perfect emptiness, to be convinced by a line of reasoning that begins from absolutely no assumptions.

But who is this ideal philosopher of perfect emptiness?  Why, it is just the irreducible core of the ghost!

And that is why (I went on to say) the result of trying to remove all assumptions from a mind, and unwind to the perfect absence of any prior, is not an ideal philosopher of perfect emptiness, but a rock.  What is left of a mind after you remove the source code?  Not the ghost who looks over the source code, but simply… no ghost.
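The claim being answered above – that a prior can be constructed which updates to any given conclusion – is easy to see concretely. Here is a minimal numerical sketch (my own example, not from the post): two agents apply the same Bayes' rule to the same evidence, but their different priors leave them at very different posteriors.

```python
def posterior(prior_biased, flips):
    """Posterior probability that a coin is heads-biased, after seeing flips.

    Two-hypothesis model: "biased" lands heads with p=0.9,
    "fair" lands heads with p=0.5. Updates by Bayes' rule.
    """
    p_biased, p_fair = prior_biased, 1 - prior_biased
    for flip in flips:
        p_biased *= 0.9 if flip == "H" else 0.1
        p_fair *= 0.5
    return p_biased / (p_biased + p_fair)

evidence = ["H"] * 5  # both agents see the same five heads

print(posterior(0.5, evidence))   # agnostic prior: now fairly confident of bias
print(posterior(1e-6, evidence))  # tiny prior: same evidence, still very skeptical
```

The updating rule is identical in both calls; only the prior differs, and so do the conclusions. Yudkowsky's reply is not that this is false, but that the demand for a prior-free starting point is a demand for the "ideal philosopher of perfect emptiness" – i.e., a rock.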

2-Place and 1-Place Words explains the mind projection fallacy in terms of treating a function of two arguments as though it were a function of one argument.
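A minimal sketch of that point (the data and helper names here are mine, loosely following the post's Sexiness(Admirer, Entity) example): currying away one argument of a two-place function yields a one-place function, and the mind projection fallacy is mistaking that one-place function for a property of the remaining argument alone.

```python
from functools import partial

def sexiness(admirer, entity):
    """2-place: attraction depends on both the admirer and the admired."""
    return entity in admirer["tastes"]

fred = {"tastes": {"angelina"}}

# Fred silently curries himself into the function, producing a 1-place
# function -- then mistakes its output for an intrinsic fact about the entity.
fred_sexiness = partial(sexiness, fred)

print(fred_sexiness("angelina"))  # true only relative to Fred, not intrinsically
```

The fallacy, on this framing, is not in constructing `fred_sexiness` – that is fine – but in forgetting that the admirer argument was ever there.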

The Opposite Sex, thankfully, reassures us that Eliezer apparently is stupid with regard to some things:

Understanding the opposite sex is hard. Not as hard as understanding an AI, but it’s still attempting empathy across a brainware gap: trying to use your brain to understand something that is not like your brain.

Despite everything I’ve read on evolutionary psychology, and despite having set out to build an AI, and despite every fictional novel I’d read that tried to put me into the life-experience of a woman, when I tried to use that “knowledge” to guide my interactions with my girlfriend, it still didn’t work right.

Another reflection:

A common pattern in failed attempts to cross the gap of sex and gender, is men who see women as defective men, or women who see men as defective women.  For example, if you think that women don’t take the initiative enough in sex, or that men are afraid of intimacy, then you think that your own brainware is the law of the universe and that anything which departs from it is a disturbance in that essential ghost.  The human species has two sexes, a male sex and a female sex.  Not a right sex and a wrong sex.



Ex Hypothesi June 21, 2011 at 11:08 am

“A common pattern in failed attempts to cross the gap of sex and gender, is men who see women as defective men, or women who see men as defective women. For example, if you think that women don’t take the initiative enough in sex, or that men are afraid of intimacy, then you think that your own brainware is the law of the universe and that anything which departs from it is a disturbance in that essential ghost. The human species has two sexes, a male sex and a female sex. Not a right sex and a wrong sex.”

What’s the principled difference between this case and the difference between the “brainware” of Ted Bundy and the rest of us besides the fact that the former doesn’t happen very much? That is, why can’t we say: “The human species is comprised of individuals who have Ted Bundy brainware and non-Ted Bundy brainware. Not a right brainware and a wrong brainware.”

Furthermore (as my question makes evident): Y’s claim that “The human species has two sexes, a male sex and a female sex”, in order for it to do the work he’s intending it to do in the paragraph, is a thoroughgoing normative claim.

