Reading Yudkowsky, part 24

by Luke Muehlhauser on March 30, 2011 in Eliezer Yudkowsky, Resources, Reviews

AI researcher Eliezer Yudkowsky is something of an expert at human rationality, and at teaching it to others. His hundreds of posts at Less Wrong are a treasure trove for those who want to improve their own rationality. As such, I’m reading all of them, chronologically.

I suspect some of my readers want to “level up” their rationality, too. So I’m keeping a diary of my Yudkowsky reading. Feel free to follow along.

His 180th post is Affective Death Spirals, which argues that a huge number of errors in reasoning are caused by our attraction to thoughts that feel really good:

With intense positive affect attached to the Great Thingy, the resonance touches everywhere.  A believing Communist sees the wisdom of Marx in every hamburger bought at McDonalds; in every promotion they’re denied that would have gone to them in a true worker’s paradise; in every election that doesn’t go to their taste, in every newspaper article “slanted in the wrong direction”.  Every time they use the Great Idea to interpret another event, the Great Idea is confirmed all the more.  It feels better – positive reinforcement – and of course, when something feels good, that, alas, makes us want to believe it all the more.

When the Great Thingy feels good enough to make you seek out new opportunities to feel even better about the Great Thingy, applying it to interpret new events every day, the resonance of positive affect is like a chamber full of mousetraps loaded with ping-pong balls.

You could call it a “happy attractor”, “overly positive feedback”, a “praise locked loop”, or “funpaper”.  Personally I prefer the term “affective death spiral”.

So how do you escape an affective death spiral? For naturalists, for example, the Great Thingy might be Science, and we are easy targets for an affective death spiral about Science. Resist the Happy Death Spiral offers this advice:

But then how can we resist the happy death spiral with respect to Science itself?  The happy death spiral starts when you believe something is so wonderful that the halo effect leads you to find more and more nice things to say about it, making you see it as even more wonderful, and so on, spiraling up into the abyss.  What if Science is in fact so beneficial that we cannot acknowledge its true glory and retain our sanity?  Sounds like a nice thing to say, doesn’t it?  Oh no it’s starting ruuunnnnn…

The happy death spiral is only a big emotional problem because of the overly positive feedback, the ability for the process to go critical.  You may not be able to eliminate the halo effect entirely, but you can apply enough critical reasoning to keep the halos subcritical – make sure that the resonance dies out rather than exploding.

You might even say that the whole problem starts with people not bothering to critically examine every additional burdensome detail – demanding sufficient evidence to compensate for complexity, searching for flaws as well as support, invoking curiosity – once they’ve accepted some core premise.  Without the conjunction fallacy, there might still be a halo effect, but there wouldn’t be a happy death spiral.

Even on the nicest Nice Thingies in the known universe, a perfect rationalist who demanded exactly the necessary evidence for every additional (positive) claim, would experience no affective resonance. You can’t do this, but you can stay close enough to rational to keep your happiness from spiraling out of control.

…To summarize, you do avoid a Happy Death Spiral by (1) splitting the Great Idea into parts (2) treating every additional detail as burdensome (3) thinking about the specifics of the causal chain instead of the good or bad feelings (4) not rehearsing evidence (5) not adding happiness from claims that “you can’t prove are wrong”; but not by (6) refusing to admire anything too much (7) conducting a biased search for negative points until you feel unhappy again (8) forcibly shoving an idea into a safe box.
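To make the “burdensome details” point concrete, here is a minimal sketch of my own (it is not from Yudkowsky’s posts). It assumes, purely for illustration, that the added details are independent and that each one gets a probability of 0.9; the function name is hypothetical.

```python
from math import prod

def conjunction_probability(detail_probabilities):
    """Probability that every detail is true, assuming the details are independent."""
    return prod(detail_probabilities)

# Illustrative numbers only: each extra detail is individually quite plausible (0.9).
for n in (1, 5, 10, 20):
    p = conjunction_probability([0.9] * n)
    print(f"{n:2d} details -> probability all are true: {p:.3f}")
```

Under these assumptions, each extra detail can only lower the joint probability, never raise it. That is why a more detailed story needs more evidence, even when the extra detail makes it sound more convincing.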

Uncritical Supercriticality continues:

Yesterday, I suggested that one key to resisting an affective death spiral is the principle of “burdensome details” – just remembering to question the specific details of each additional nice claim about the Great Idea.  (It’s not trivial advice.  People often don’t remember to do this when they’re listening to a futurist sketching amazingly detailed projections about the wonders of tomorrow, let alone when they’re thinking about their favorite idea ever.)  This wouldn’t get rid of the halo effect, but  it would hopefully reduce the resonance to below criticality, so that one nice-sounding claim triggers less than 1.0 additional nice-sounding claims, on average.

The diametric opposite of this advice, which sends the halo effect supercritical, is when it feels wrong to argue against any positive claim about the Great Idea.  Politics is the mind-killer.  Arguments are soldiers.  Once you know which side you’re on, you must support all favorable claims, and argue against all unfavorable claims.  Otherwise it’s like giving aid and comfort to the enemy, or stabbing your friends in the back.

…the affective death spiral turns much deadlier after criticism becomes a sin, or a gaffe, or a crime.  There are things in this world that are worth praising greatly, and you can’t flatly say that praise beyond a certain point is forbidden.  But there is never an Idea so true that it’s wrong to criticize any argument that supports it.  Never.  Never ever never for ever.  That is flat.  The vast majority of possible beliefs in a nontrivial answer space are false, and likewise, the vast majority of possible supporting arguments for a true belief are also false, and not even the happiest idea can change that.
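The “less than 1.0 additional nice-sounding claims” line in the first paragraph quoted above borrows from chain-reaction arithmetic. As a rough sketch of my own (hypothetical function name, and assuming a simple model in which every claim triggers the same average number r of further claims), the expected total number of claims set off by one seed claim is the geometric series 1 + r + r^2 + …, which sums to 1/(1 - r) when r < 1 and grows without bound otherwise.

```python
import math

def expected_total_claims(r):
    """Expected total claims (including the first) when each claim
    triggers r further claims on average. Assumes a simple branching model."""
    if r < 0:
        raise ValueError("r must be non-negative")
    if r >= 1.0:
        # Critical or supercritical: the expected total diverges.
        return math.inf
    return 1.0 / (1.0 - r)

for r in (0.5, 0.9, 0.99, 1.0):
    print(f"r = {r}: expected total claims = {expected_total_claims(r)}")
```

In this toy model the total stays small while r is well below 1 and blows up as r approaches 1, which seems to be the sense in which keeping the halos “subcritical” makes the resonance die out.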

Fake Fake Utility Functions opens with a highly entertaining account of Yudkowsky’s writing process. First, he wanted to write a post about X. But then he realized he had to explain Y first. But the post on Y got too long, so he had to split it into two posts. And then he realized that to explain X, he also had to explain Z, which required a whole series of posts. And so on. He had to write about 30 posts before he was able to explain Fake Utility Functions with just one inferential step for his audience.

And then finally he could write his post Fake Utility Functions, which opens:

Every now and then, you run across someone who has discovered the One Great Moral Principle, of which all other values are a mere derivative consequence.

I run across more of these people than you do.  Only in my case, it’s people who know the amazingly simple utility function that is all you need to program into an artificial superintelligence and then everything will turn out fine.

…But a utility function doesn’t have to be simple.  It can contain an arbitrary number of terms.  We have every reason to believe that insofar as humans can be said to have values, there are lots of them – high Kolmogorov complexity.

…Leave out just one of these values from a superintelligence, and even if you successfully include every other value, you could end up with a hyperexistential catastrophe, a fate worse than death.  If there’s a superintelligence that wants everything for us that we want for ourselves, except the human values relating to controlling your own life and achieving your own goals, that’s one of the oldest dystopias in the book.

In the end, the only process that reliably regenerates all the local decisions you would make given your morality, is your morality.  Anything else – any attempt to substitute instrumental means for terminal ends – ends up losing purpose and requiring an infinite number of patches because the system doesn’t contain the source of the instructions you’re giving it.  You shouldn’t expect to be able to compress a human morality down to a simple utility function, any more than you should expect to compress a large computer file down to 10 bits.

I heartily agree!

3 comments

Garren March 30, 2011 at 9:01 am

Is the goal of moral philosophy to preserve every individual’s current moral intuitions, or to seek more basic and ‘compressible’ principles? I can read Yudkowsky as warning of the danger of premature conclusions about the latter, or as a declaration that we may as well stop trying for the latter.


cl March 30, 2011 at 5:23 pm

Every time they use the Great Idea to interpret another event, the Great Idea is confirmed all the more. It feels better – positive reinforcement – and of course, when something feels good, that, alas, makes us want to believe it all the more.

I agree with his basic point here. The thing that’s hard for me to understand is, given this logic, why am I not an atheist? No reason to fear the unknown, the absence of judgment, an equal fate for all… those things feel way better to me. I would love to be able to just give up my chips and call it a day, to stop worrying about whether or not I’m acting correctly at any given moment. I would love to be able to just let go and “ride the wave” of life, so to speak. It’s just that, when I think honestly about the causal chain, I can’t do it, no matter how good I suspect it might feel.

What if Science is in fact so beneficial that we cannot acknowledge its true glory and retain our sanity? Sounds like a nice thing to say, doesn’t it? Oh no it’s starting ruuunnnnn…

LOL! No comment.


MarkD March 30, 2011 at 11:43 pm

I caught up with the EY post on Kolmogorov and was blissfully happy to see the Levin normalization mentioned in the comments as well as the relationships to VC dimension and PAC learning…

I’m starting to like this guy (or at least his commentators).

