AI researcher Eliezer Yudkowsky is something of an expert at human rationality, and at teaching it to others. His hundreds of posts at Less Wrong are a treasure trove for those who want to improve their own rationality. As such, I’m reading all of them, chronologically.
I suspect some of my readers want to “level up” their rationality, too. So I’m keeping a diary of my Yudkowsky reading. Feel free to follow along.
His 442nd post is Morality as Fixed Computation, an attempt to “boil down” his theory of morality and be clear about it, but I remain as baffled as ever.
Update: I wrote this post months ago. I’ve since gotten clearer on Eliezer’s meta-ethics thanks to this discussion.
Inseparably Right; or, Joy in the Merely Good does a bit better:
There is no pure ghostly essence of goodness apart from things like truth, happiness and sentient life.
What do you value? At a guess, you value the life of your friends and your family and your Significant Other and yourself, all in different ways. You would probably say that you value human life in general, and I would take your word for it, though Robin Hanson might ask how you’ve acted on this supposed preference. If you’re reading this blog you probably attach some value to truth for the sake of truth. If you’ve ever learned to play a musical instrument, or paint a picture, or if you’ve ever solved a math problem for the fun of it, then you probably attach real value to good art. You value your freedom, the control that you possess over your own life; and if you’ve ever really helped someone you probably enjoyed it. You might not think of playing a video game as a great sacrifice of dutiful morality, but I for one would not wish to see the joy of complex challenge perish from the universe. You may not think of telling jokes as a matter of interpersonal morality, but I would consider the human sense of humor as part of the gift we give to tomorrow.
And you value many more things than these.
Your brain assesses these things I have said, or others, or more, depending on the specific event, and finally affixes a little internal representational label that we recognize and call “good”.
Okay, but what if you ask “What should I recognize as good?”
Every time you say should, it includes an implicit criterion of choice; there is no should-ness that can be abstracted away from any criterion.
…Don’t look to some surprising unusual twist of logic for your justification. Look to the living child, successfully dragged off the train tracks. There you will find your justification. What ever should be more important than that?
Sorting Pebbles into Correct Heaps is a parable about optimization, and then Eliezer responds to the usual objection that moral subjectivism cannot account for genuine moral disagreement, in Moral Error and Moral Disagreement. What does Eliezer mean by saying morality is a fixed computation? Well, he means it’s an Abstracted Idealized Dynamic:
To sum up:
- Morality, like computation, involves latent development of answers;
- Morality, like computation, permits expected agreement of unknown latent answers;
- Morality, like computation, reasons about abstract results apart from any particular physical implementation;
- Morality, like computation, unfolds from bounded initial state into something potentially much larger;
- Morality, like computation, can be viewed as an idealized dynamic that would operate on the true state of the physical world – permitting us to speak about idealized answers of which we are physically uncertain;
- Morality, like computation, lets us speak of such un-physical stuff as “error”, by comparing a physical outcome to an abstract outcome – presumably in a case where there was previously reason to believe or desire that the physical process was isomorphic to the abstract process, yet this was not actually the case.
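The computational half of that last analogy can be made concrete with a toy sketch (my own illustration, not Yudkowsky's): an abstract function specifies the idealized answer, a "physical" implementation is meant to be isomorphic to it but may diverge, and "error" only becomes speakable by comparing the two.

```python
# Illustrative sketch (mine, not from the post): "error" as a mismatch
# between an abstract computation and the physical process meant to
# implement it.

def abstract_sum(xs):
    """The idealized dynamic: what the answer *should* be."""
    return sum(xs)

def physical_sum(xs):
    """A concrete implementation intended to be isomorphic to the
    abstract process -- but a bug drops the last element."""
    total = 0
    for x in xs[:-1]:  # bug: off-by-one
        total += x
    return total

data = [1, 2, 3, 4]
ideal = abstract_sum(data)    # the abstract outcome: 10
actual = physical_sum(data)   # the physical outcome: 6
error = ideal - actual        # "error" exists only relative to the
                              # abstracted idealized answer
print(ideal, actual, error)   # 10 6 4
```

Nothing in the physical process alone marks its output as wrong; the "error" lives in the comparison against the abstraction, which is the point of the analogy.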
Well okay then. Arbitrary examines the nature of arbitrariness, and then Eliezer asks: Is Fairness Arbitrary? Which leads to a central question we probably all want to know the answer to… The Bedrock of Morality: Arbitrary?
You Provably Can’t Trust Yourself shows Yudkowsky trying to figure out why his earlier explanations of morality didn’t make sense to people. No License to Be Human is a renewed attempt to explain his moral theory, this time in terms of Peano Arithmetic. It sure feels like Eliezer is vastly over-complicating his theory.
What’s really interesting is that, judging from the community’s comments, a huge number of people were following Yudkowsky – and very often agreeing with him – all the way through his explanations of rationality, Bayesianism, philosophy of language, and even quantum mechanics, but when he turned to meta-ethics, he lost them. How interesting. Is morality more confusing than quantum mechanics? Is it more difficult to argue about… in language… than philosophy of language? I can think of some reasons why it might be so. My own experience is that morality is one of the hardest topics to discuss profitably.
Unnatural Categories is more accessible:
“Tell me why you want to know,” says the rationalist, “and I’ll tell you the answer.” If you want to know whether your seismograph, located nearby, will register an acoustic wave, then the experimental prediction is “Yes”; so, for seismographic purposes, the tree should be considered to make a sound. If instead you’re asking some question about firing patterns in a human auditory cortex – for whatever reason – then the answer is that no such patterns will be changed when the tree falls.
What is a poison? Hemlock is a “poison”; so is cyanide; so is viper venom. Carrots, water, and oxygen are “not poison”. But what determines this classification? You would be hard pressed, just by looking at hemlock and cyanide and carrots and water, to tell what sort of difference is at work. You would have to administer the substances to a human – preferably one signed up for cryonics – and see which ones proved fatal. (And at that, the definition is still subtler than it appears: a ton of carrots, dropped on someone’s head, will also prove fatal. You’re really asking about fatality from metabolic disruption, after administering doses small enough to avoid mechanical damage and blockage, at room temperature, at low velocity.)
…Much of the way that we classify things – never mind events – is non-local, entwined with the consequential structure of the world. All the things we would call a chair are all the things that were made for us to sit on.
…I’ve chosen the phrase “unnatural category” to describe a category whose boundary you draw in a way that sensitively depends on the exact values built into your utility function. The most unnatural categories are typically these values themselves! What is “true happiness”? This is entirely a moral question, because what it really means is “What is valuable happiness?” or “What is the most valuable kind of happiness?” Is having your pleasure center permanently stimulated by electrodes, “true happiness”? Your answer to that will tend to center on whether you think this kind of pleasure is a good thing. “Happiness”, then, is a highly unnatural category – there are things that locally bear a strong resemblance to “happiness”, but which are excluded because we judge them as being of low utility, and “happiness” is supposed to be of high utility.
Most terminal values turn out to be unnatural categories, sooner or later. This is why it’s such a tremendous difficulty to decide whether turning off Terri Schiavo’s life support is “murder”.
The unnatural-categories idea is extended in Magical Categories, which is also the first post (that I recall) to explain why one proposed solution to the problem of Friendly AI fails.
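The "unnatural category" notion can be caricatured in toy code (my own illustration, with a made-up utility function, not anything from the posts): whether a pleasure-state counts as "true happiness" depends not on its local features but on a value judgment about it, so moving the values moves the category boundary.

```python
# Toy illustration (mine, not from the post): an "unnatural category"
# whose boundary depends sensitively on a utility function rather than
# on locally observable features.

def utility(state):
    """A toy value function: wireheaded pleasure is judged low-value
    even though it locally resembles ordinary happiness."""
    penalty = 0.9 if state["wireheaded"] else 0.0
    return state["pleasure"] * (1.0 - penalty)

def is_true_happiness(state, threshold=0.5):
    """Category membership is decided by the utility function --
    change the values, and the boundary moves with them."""
    return utility(state) >= threshold

ordinary = {"pleasure": 0.8, "wireheaded": False}
electrode = {"pleasure": 0.8, "wireheaded": True}  # same local pleasure

print(is_true_happiness(ordinary))   # True
print(is_true_happiness(electrode))  # False: excluded by a value judgment
```

The two states are locally identical in "pleasure", yet land on opposite sides of the category boundary, which is exactly what makes the category unnatural in Yudkowsky's sense.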
Next is Three Fallacies of Teleology:
These then are three fallacies of teleology: Backward causality, anthropomorphism, and teleological capture.
Dreams of AI Design outlines why AI is so hard.