Reading Yudkowsky, part 22

by Luke Muehlhauser on March 23, 2011 in Eliezer Yudkowsky, Resources, Reviews

AI researcher Eliezer Yudkowsky is something of an expert at human rationality, and at teaching it to others. His hundreds of posts at Less Wrong are a treasure trove for those who want to improve their own rationality. As such, I’m reading all of them, chronologically.

I suspect some of my readers want to “level up” their rationality, too. So I’m keeping a diary of my Yudkowsky reading. Feel free to follow along.

His 159th post is Evolutionary Psychology:

A man and a woman meet in a bar.  The man is attracted to her clear complexion and firm breasts, which would have been fertility cues in the ancestral environment, but which in this case result from makeup and a bra.  This does not bother the man; he just likes the way she looks.  His clear-complexion-detecting neural circuitry does not know that its purpose is to detect fertility, any more than the atoms in his hand contain tiny little XML tags reading “<purpose>pick things up</purpose>”.  The woman is attracted to his confident smile and firm manner, cues to high status, which in the ancestral environment would have signified the ability to provide resources for children.  She plans to use birth control, but her confident-smile-detectors don’t know this any more than a toaster knows its designer intended it to make toast.  She’s not concerned philosophically with the meaning of this rebellion, because her brain is a creationist and denies vehemently that evolution exists.  He’s not concerned philosophically with the meaning of this rebellion, because he just wants to get laid.  They go to a hotel, and undress.  He puts on a condom, because he doesn’t want kids, just the dopamine-noradrenaline rush of sex, which reliably produced offspring 50,000 years ago when it was an invariant feature of the ancestral environment that condoms did not exist.  They have sex, and shower, and go their separate ways.  The main objective consequence is to keep the bar and the hotel and condom-manufacturer in business; which was not the cognitive purpose in their minds, and has virtually nothing to do with the key statistical regularities of reproduction 50,000 years ago which explain how they got the genes that built their brains that executed all this behavior.

The followups are Protein Reinforcement and DNA Consequentialism and Thou Art Godshatter, which explain why evolution didn’t program us to understand genetic fitness, even though that might have been beneficial.

Terminal Values and Instrumental Values discusses the means-ends distinction:

I rarely notice people losing track of plans they devised themselves.  People usually don’t drive to the supermarket if they know the chocolate is gone.  But I’ve also noticed that when people begin explicitly talking about goal systems instead of just wanting things, mentioning “goals” instead of using them, they oft become confused.  Humans are experts at planning, not experts on planning, or there’d be a lot more AI developers in the world.

In particular, I’ve noticed people get confused when – in abstract philosophical discussions rather than everyday life – they consider the distinction between means and ends; more formally, between “instrumental values” and “terminal values”.

[English causes confusion.] So forget English.  We can set up a mathematical description of a decision system in which terminal values and instrumental values are separate and incompatible types – like integers and floating-point numbers, in a programming language with no automatic conversion between them.

He then goes on to give a programmer’s rendition of value-as-a-means and value-as-an-end.
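
To make the type distinction concrete, here is a minimal sketch of my own – in Python, with illustrative names and numbers; it is not the code from Eliezer’s post – in which a terminal value carries utility directly, while an instrumental value has no utility of its own and only borrows it from the terminal value it is expected to produce:

    # A minimal sketch (not Yudkowsky's formalism) of terminal and
    # instrumental values as separate, incompatible types.
    from dataclasses import dataclass

    @dataclass(frozen=True)
    class TerminalValue:
        """An outcome valued for its own sake."""
        outcome: str
        utility: float  # how much we want this outcome, full stop

    @dataclass(frozen=True)
    class InstrumentalValue:
        """An action valued only for the outcome it is expected to lead to."""
        action: str
        leads_to: TerminalValue
        probability: float  # chance the action actually produces the outcome

        def expected_utility(self) -> float:
            # No utility of its own; it borrows it from the terminal value.
            return self.probability * self.leads_to.utility

    chocolate = TerminalValue(outcome="eat chocolate", utility=10.0)
    drive = InstrumentalValue(action="drive to the supermarket",
                              leads_to=chocolate, probability=0.9)
    # Prints 9.0; if the chocolate is gone, the probability (and with it
    # the expected utility of driving) drops to zero.
    print(drive.expected_utility())

There is no automatic conversion between the two types: the “utility” of driving to the supermarket is not a number the action owns, but a fact about the terminal value it serves.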

In Evolving to Extinction, Eliezer reminds us:

It is a very common misconception that an evolution works for the good of its species.  Can you remember hearing someone talk about two rabbits breeding eight rabbits and thereby “contributing to the survival of their species”?  A modern evolutionary biologist would never say such a thing; they’d sooner breed with a rabbit.

The explanation continues with discussions of gender ratios, the bystander effect, and much more.

No Evolutions for Corporations or Nanodevices introduces us to perhaps the most important equation in evolutionary biology. It ends:

To sum up, if you have all of the following properties:

  • Entities that replicate
  • Substantial variation in their characteristics
  • Substantial variation in their reproduction
  • Persistent correlation between the characteristics and reproduction
  • High-fidelity long-range heritability in characteristics
  • Frequent birth of a significant fraction of the breeding population
  • And all this remains true through many iterations

Then you will have significant cumulative selection pressures, enough to produce complex adaptations by the force of evolution.
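
(The equation the post builds on is, if I’m recalling it correctly, the Price equation, which in its standard form reads

    \Delta \bar{z} = \frac{\operatorname{Cov}(w_i, z_i)}{\bar{w}} + \frac{\operatorname{E}(w_i \, \Delta z_i)}{\bar{w}}

where z_i is the value of some heritable characteristic in individual i, w_i is that individual’s number of offspring, bars denote population averages, and \Delta z_i is the difference between the average characteristic of i’s offspring and i’s own. The covariance term is the change due to selection, the expectation term is transmission bias, and, roughly, the bulleted conditions above are what keep the covariance term large and consistent over many generations.)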

Which leads to The Simple Math of Everything:

I am not a professional evolutionary biologist.  I only know a few equations, very simple ones by comparison to what can be found in any textbook on evolutionary theory with math, and on one memorable occasion I used one incorrectly.  For me to publish an article in a highly technical ev-bio journal would be as impossible as corporations evolving.  And yet when I’m dealing with almost anyone who’s not a professional evolutionary biologist…

It seems to me that there’s a substantial advantage in knowing the drop-dead basic fundamental embarrassingly simple mathematics in as many different subjects as you can manage.  Not, necessarily, the high-falutin’ complicated damn math that appears in the latest journal articles.  Not unless you plan to become a professional in the field.  But for people who can read calculus, and sometimes just plain algebra, the drop-dead basic mathematics of a field may not take that long to learn.  And it’s likely to change your outlook on life more than the math-free popularizations or the highly technical math.

I would love to read a book called The Simple Math of Everything, but I don’t think it exists. It would have to be an anthology written by experts in each field, I suppose.
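
For a taste of what “drop-dead basic” means in the evolutionary case (my example, not Eliezer’s): in the simplest haploid model of selection, an allele at frequency p whose carriers have relative fitness 1+s reaches frequency

    p' = \frac{p(1+s)}{1 + ps}

in the next generation. Iterate that by hand a few times with, say, s = 0.03 and you can feel how weak selection pressures compound over generations – something the math-free popularizations never quite convey.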

Conjuring an Evolution to Save You discusses an analogy between animal breeding and the fall of Enron.

Artificial Addition is a follow-up to one of Yudkowsky’s central essays, The Simple Truth:

Suppose that human beings had absolutely no idea how they performed arithmetic.  Imagine that human beings had evolved, rather than having learned, the ability to count sheep and add sheep.  People using this built-in ability have no idea how it works, the way Aristotle had no idea how his visual cortex supported his ability to see things.

The story ends with two morals:

  • First, the danger of believing assertions you can’t regenerate from your own knowledge.
  • Second, the danger of trying to dance around basic confusions.

And then Eliezer gives an example from the history of AI research.

Truly Part of You offers some cogent suggestions on how to make knowledge truly part of you, and not just a kind of parrot-knowledge that allows you to repeat the correct answer without deeply understanding its meaning.

Not for the Sake of Happiness (Alone) argues that happiness isn’t our only end; it isn’t the only thing we desire. I agree.

Leaky Generalizations reminds us:

Do humans have ten fingers?  Most of us do, but plenty of people have lost a finger and nonetheless qualify as “human”.

Unless you descend to a level of description far below any macroscopic object – below societies, below people, below fingers, below tendon and bone, below cells, all the way down to particles and fields where the laws are truly universal – then practically every generalization you use in the real world will be leaky.

Yudkowsky then tells a story about a lock to illustrate his point that:

Instrumental values often have no specification which is both compact and local.

The Hidden Complexity of Wishes argues that the Open Source Wish Project is probably futile.

12 comments:

Alexander Kruel March 23, 2011 at 6:21 am

Not for the Sake of Happiness (Alone) argues that happiness isn’t our only end; it isn’t the only thing we desire.

At the end of the day, who cares, as long as one is happy? Have you ever heard anyone proclaiming that they are really happy but unsatisfied? I think that our resentment of wireheading might be biased. Sure, we don’t want to turn into pleasure-maximizing devices. Sure??? I don’t think so; after all, we want to win, and how do we measure winning if not by the amount of happiness we experience? When we try to maximally satisfy our evolutionary template, what we are really trying to do is get the most happiness out of the human machine. Why not disregard the human machine completely and turn the universe into happiness? I have no answer. One objection would be that one could choose decision-utility instead of experience-utility. But how isn’t decision-utility completely arbitrary? As Hume said, “’Tis not contrary to reason to prefer the destruction of the whole world to the scratching of my finger.”

Polymeron March 23, 2011 at 9:16 am

Pleasure is a physical, immediate, positive feedback mechanism. It is not context dependent. As a biologist, I can generate pleasure without modifying any other factors. Pain is the same, only negative. Neither impresses me.

Happiness is a gradual, positive feedback mechanism that is context-dependent. It seems evident to me – and correct me if you think this intuition is wrong (I’m not basing it on any particular research, only my own observation) – that you experience happiness when you believe your desires (most and greatest) are being fulfilled, or are getting closer to being fulfilled. I cannot generate happiness artificially without modifying one’s desires. Because of this context dependency, I have come to have much more respect for happiness than I used to have.
(Note that “artificial happiness” – the modification of one’s desires after the fact to avoid misery – is a natural process.)

In any case, if happiness is dependent on belief about desire fulfillment, it seems obvious that you should want to cut out the middle man and just fulfill desires, regardless of what one believes about them. Happiness only reflects desire fulfillment, but if we could generate it artificially it would miss the point completely.

Now the question is, how do we feel about molding desires? If I had an AI I taught to fulfill human desires, it would learn to change people’s desires to something it can easily fulfill (like continuous pleasure). If I told it not to change desires, it would probably not do anything, because most actions have the potential to change desires. We try to mold people’s desires all the time, after all.

I still can’t answer that one. People having grand desires is something I value, but I have no justification for this. Should I?

Alexander Kruel March 23, 2011 at 10:30 am

I cannot generate happiness artificially without modifying one’s desires. Because of this context dependency, I have come to have much more respect for happiness than I used to have.

I agree that happiness is satisfaction is goal fulfilment. But we also value happiness in and of itself. What do we really want, what is more important? Would you rather choose a world where every being was maximally happy or one in which all preferences are maximized? Note that many terminal goals are either mutually exclusive or their equilibrium is a Pareto-suboptimal solution. This means that to choose the achievement of goals over maximizing happiness means to measure success in terms of decision-utility instead of experience-utility. But can this be correct? Here is a simple thought experiment. Eve wants to marry Adam, it would make her and Adam happy. Eve’s mother however does not like Adam and it would reduce her overall happiness if her daughter married Adam. Eve’s girlfriend is also in love with Adam and it would reduce her overall happiness if Eve married Adam. Yet all 4 agents have something in common, they are only happy if everyone else is happy too, they want to maximize happiness. One could go on and calculate the best possible solution, the outcome that would maximize happiness given what each person desires. But this outcome would be suboptimal as opposing desires have to be weighed. The question is, what do people actually value in this scenario? It seems that what everyone really wants is to be happy, to maximize the overall happiness of the system. This would be possible if one was able to alter the very desires, one could create an optimal solution. Eve’s girlfriend could stop loving Adam because he doesn’t love her back and Eve’s mother could stop hating Adam because she loves her daughter. What arguments against such an outcome are reasonable? Nobody would be worse off, everyone would be maximally happy. All that would have happened would be a tradeoff, a desire in return for maximal happiness. We simply have to ask ourselves, what is it that we want? Do we value a desire in and of itself or the payoff of its fulfilment, that is happiness? Do we really want to live in a world where we achieved some goal that is objectively irrelevant and be somewhat happy or do we rather want to be maximally happy regardless of a goal that won’t matter to us in retrospect anyway?

Polymeron March 23, 2011 at 11:23 am

I agree that happiness is satisfaction is goal fulfilment. But we also value happiness in and of itself.

Do we? I’m not so sure. I can think of many scenarios where a person is happy, and yet I find it sad and disheartening. Even if that person is me.

Would you rather choose a world where every being was maximally happy or one in which all preferences are maximized?

By definition, a world in which all preferences are maximized is a world where my own preferences are maximized. I can’t not prefer it, by definition!

Note that many terminal goals are either mutually exclusive or their equilibrium is a Pareto-suboptimal solution. This means that to choose the achievement of goals over maximizing happiness means to measure success in terms of decision-utility instead of experience-utility. But can this be correct? Here is a simple thought experiment. Eve wants to marry Adam, it would make her and Adam happy. Eve’s mother however does not like Adam and it would reduce her overall happiness if her daughter married Adam. Eve’s girlfriend is also in love with Adam and it would reduce her overall happiness if Eve married Adam. Yet all 4 agents have something in common, they are only happy if everyone else is happy too, they want to maximize happiness. One could go on and calculate the best possible solution, the outcome that would maximize happiness given what each person desires. But this outcome would be suboptimal as opposing desires have to be weighed. The question is, what do people actually value in this scenario? It seems that what everyone really wants is to be happy, to maximize the overall happiness of the system. This would be possible if one was able to alter the very desires, one could create an optimal solution. Eve’s girlfriend could stop loving Adam because he doesn’t love her back and Eve’s mother could stop hating Adam because she loves her daughter. What arguments against such an outcome are reasonable? Nobody would be worse off, everyone would be maximally happy. All that would have happened would be a tradeoff, a desire in return for maximal happiness. We simply have to ask ourselves, what is it that we want? Do we value a desire in and of itself or the payoff of its fulfilment, that is happiness? Do we really want to live in a world where we achieved some goal that is objectively irrelevant and be somewhat happy or do we rather want to be maximally happy regardless of a goal that won’t matter to us in retrospect anyway?  

I think you’re throwing too many vague terms into this pot and stirring them together.
Happiness and desire fulfillment are tightly linked, which is precisely the source of the confusion. I cannot prefer being happy while none of my goals are actually fulfilled (say, because I am deluded into thinking otherwise) over being miserable with all my goals fulfilled. Unless I value my happiness as an end unto itself, more than all the other goals. But in that case, that is an arbitrary desire in and of itself, no different from all other desires.

In any case, I’d like to point out that in a system where everyone’s beliefs are completely true, the two would be identical. If your goals were being fulfilled, you would know it, and be happy. If they were not, you would be miserable. Happiness and desire fulfillment would completely correlate. Maximizing happiness over desire fulfillment necessarily requires reducing the accuracy of beliefs. So does the other way around – you could fulfill people’s desires but lie to them about it. I don’t see either as justified.

But this is indeed a complicated issue. Our desires – and therefore, what would make us happy – constantly shift. Furthermore, we have desires about desires. I wouldn’t want to desire heroin, for instance. I wouldn’t want to greatly desire an unattainable goal. But these desires about desires are also arbitrary, and subject to change.
Would it be “good” to mold desires so as to make them Pareto-optimal? Many desires require entropy. Are they “bad” by definition?

I am coming more and more to the conclusion that morality is the aggregated result of fairly arbitrary drivers. I don’t like that conclusion and don’t know what to do with it, but I can’t escape it either. I guess when all is said and done, I’m left with my own set of arbitrary desires. There’s no alternative.

Alexander Kruel March 23, 2011 at 12:46 pm

Unless I value my happiness as an end unto itself, more than all the other goals.

That is exactly what I tried to argue. All our desires are really instrumental goals. The one true terminal goal is to maximize happiness. In other words, we desire that which makes us happy, we desire happiness. In other words, our values are not complex. Would you choose the fulfilment of your desires if you knew that you could be more happy by simulating the fulfilment of your desires to an extent that cannot be reached otherwise? Maybe you would, but would you be choosing the outcome with the largest expected utility? If yes, then you are not measuring experience-utility but simply assigning arbitrary amounts of utility to a certain decision. But there is a problem with that. In the case of happiness maximization as a terminal goal you have evidence in favor of its value: happiness is desirable. Any other desire might be misinformed or simply biased.

But I am just throwing in some thoughts here. I have never read anything about this topic, not even the LW sequences. There is so much more to consider. For example, does it matter if you or something else does experience happiness? This invokes many questions about self, identifying with future versions of yourself and discount rates. Does it matter if there is one universe spanning consciousness or as many as possible that experience happiness? Can the utility you assign to the achievement of a goal after which your life ends outweigh the same amount of utility in the form of happiness that you experience over time? I’ll just leave the discussion at this point.

Polymeron March 24, 2011 at 1:11 am

The one true terminal goal is to maximize happiness.

I believe this is demonstrably false. Or perhaps I don’t understand what you mean by “one true” terminal goal. What makes happiness a truer goal than, say, world peace?

In other words, we desire that which makes us happy, we desire happiness.

As well as a lot of other things.

In other words, our values are not complex. Would you choose the fulfilment of your desires if you knew that you could be more happy by simulating the fulfilment of your desires to an extent that cannot be reached otherwise?

Emphatically, yes. I would choose world peace with my death (zero happiness experienced) over walking around believing that world peace was a reality and being super happy about that.
Our values are complex.

Maybe you would, but would you be choosing the outcome with the largest expected utility? If yes, then you are not measuring experience-utility but simply assigning arbitrary amounts of utility to a certain decision. But there is a problem with that. In the case of happiness maximization as a terminal goal you have evidence in favor of its value: happiness is desirable. Any other desire might be misinformed or simply biased.

I don’t see what makes experience-utility any less arbitrary a measure than others. You have evidence of the value of other desires as well. If I tell you that I’m willing to pay $10,000 for world peace even if I don’t know about it at all, then I have demonstrated it is of value to me. That may be “biased”, but so is desiring my own happiness!

But I am just throwing in some thoughts here. I have never read anything about this topic, not even the LW sequences. There is so much more to consider. For example, does it matter if you or something else does experience happiness? This invokes many questions about self, identifying with future versions of yourself and discount rates. Does it matter if there is one universe spanning consciousness or as many as possible that experience happiness? Can the utility you assign to the achievement of a goal after which your life ends outweigh the same amount of utility in the form of happiness that you experience over time? I’ll just leave the discussion at this point.  

I think these are all very interesting questions! I don’t have answers for them either.
:|

I recommend Thou Art Godshatter on the source for our desires, if you haven’t yet read it (it is after all one of the posts Luke mentions here). I read this a few weeks ago, and it underscored something I already realized: Our desires are ultimately arbitrary functions of evolutionary processes. That doesn’t make them less real, of course…
If you disagree with this view, I’d be interested to know why.

Polymeron March 24, 2011 at 4:16 am

As an addendum, I will concede that world peace has only instrumental, not terminal, value to me; but this is because I value human life. And I value it for several things, like creativity and the accumulation of knowledge. These are two things I value regardless of happiness. So, I do have some terminal values other than happiness (mine and others’), and I suspect most of us do.

Alexander Kruel March 24, 2011 at 5:03 am

If you disagree with this view, I’d be interested to know why.

I don’t disagree. I am just saying that we are able to change our minds, overcome bias. We do value satisfaction, happiness etc., we just choose different means to achieve those experiences. How do we measure the value of those means if not by how we feel about them? Is it reasonable to disregard the payoff in the form of positive experiences if it was given to us by other means, e.g. directly? Why not just become maximally happy and satisfied for no particular reason? There is no difference, except that artificially induced happiness will be easier to achieve and ultimately much larger if you turn the whole universe into happiness-experiencing machinery rather than some suboptimal equilibrium of various means to achieve happiness. There exists a certain kind of structure that can be maximally happy given the laws of physics, and it is unlikely that it is you or me maximizing our preferences in a universe full of other agents that try to maximize their preferences as well.

Don’t get me wrong, my desires are complex too. Blissful ignorance seems undesirable to me too. But I ask myself, might those feelings be biased? If I knew a lot more, would I still choose to maximize the full complexity of my desires or just choose to be happy instead? In other words, are my desires really complex or am I just confused?

There are basically two kinds of suicides: one assigns infinite utility to the act of suicide, and the other calculates the expected utility of living to be negative. Both of them might have complex values, yet both of them seem to be biased. Wouldn’t they be better off being happy regardless of their current arbitrary utility calculations? After all, we can be mistaken about our beliefs, our values, but we cannot be mistaken about happiness if we want to be happy. If rational agents are utility maximizers, why don’t we choose the objective value of happiness if we want to be rational? The payoff is the same, it is happiness. Yet you can only fully maximize happiness if you choose the experience of happiness as a terminal goal. If you choose to maximize a certain preference, or achieve a certain goal, then you’ll experience happiness and satisfaction in doing so, but how does it compare to the amount of satisfaction and happiness you could experience by changing your preferences and turning yourself into something whose sole purpose is to maximize the experience of happiness over time?

Polymeron March 24, 2011 at 5:34 am

This is precisely why I say that we are not maximizing happiness.

I have a desire for complexity. That is why the happiness-experiencing universe does not appeal to me; in that universe, everyone has very simple desires. Nothing interesting is going on.

Is this desire for complexity and interest arbitrary? Yes. Would we be better off without this desire? That depends on your definition of “better off”. If you define “good” as “maximum happiness”, then yes. But I think that definition would be wrong.

At this point I see no reason to believe that any desires or states are of intrinsic value. I only see emergent value, from the dispositions we just happen to have. These constantly shift and influence one another, in an intricate dance that, when viewed as a system, often displays both direction and noise.
No wonder morality has never come close to being universally agreed upon.

Alexander Kruel March 24, 2011 at 6:01 am

I have a desire for complexity.

Hyperbolic discounting is called a bias. Yet biases are largely a result of our evolution, just as our desire for complexity is. You read LessWrong and are told to change your mind, overcome bias and disregard discounting as a biased desire that leads to a suboptimal outcome. The question we have to answer is why we don’t go a step further and disregard human nature altogether in favor of something that can value the possibilities of reality maximally. Where do we draw the line?

Polymeron March 24, 2011 at 6:15 am

That is an excellent question.

Rationalists draw the line at desires – rationality is a means for knowing what to believe, and how to achieve one’s desires. I haven’t seen it applied to finding out which desires you should have, if any. One could claim that this boundary is arbitrary and, in itself, biased.

I don’t think I have an answer to that. It would be a lot easier if intrinsic value could be shown to exist… :(

Alexander Kruel March 24, 2011 at 9:11 am

As an addendum, the naturalistic fallacy:

[...] a naturalistic fallacy is committed whenever a philosopher attempts to prove a claim about ethics by appealing to a definition of the term “good” in terms of one or more natural properties (such as “pleasant”, “more evolved”, “desired”, etc.)

Is the naturalistic fallacy a bias that should be overcome? When Yudkowsky in one paragraph writes “[...] if intelligence were foolish enough to allow the idiot god continued reign,” doesn’t he reveal his general aversion to evolutionary implementations? He further writes the following:

The Mote in God’s Eye by Niven and Pournelle depicts an intelligent species that stayed biological a little too long, slowly becoming truly enslaved by evolution, gradually turning into true fitness maximizers obsessed with outreproducing each other. But thankfully that’s not what happened. Not here on Earth. At least not yet.

To me this sounds like a general disparagement of values that are the result of evolution. Yet he goes on to cherry-pick the current set of desires without ever asking whether our current set of desires is actually desirable. Does asking that question even make sense? That depends on the answer to another question: what constitutes winning? If rationality is about winning, and winning means to achieve goals given to us by evolution, to satisfy our desires that have been implemented by evolution, then it needs to be able to discern goals and desires from biases and fallacies.

Let’s assume that winning means to satisfy our evolutionary template, all of our complex values, desires and goals. What is it that rationality is doing by helping us to win? How does it measure success? If I give in to akrasia, how did I fail to satisfy an evolutionary desire? If I procrastinate, how does rationality measure that I am acting irrationally? What is the unit in which success is measured?

Let’s further assume that the unit by which rational conduct, i.e. winning, is measured is satisfaction. How then is it irrational to maximize satisfaction by choosing a set of desires that is most suitable as a substrate for experiencing satisfaction and the easiest to satisfy? One might object that one of our desires is to desire complexity. But if rationality is prescriptive and can say No to procrastination and Yes to donating money to the Machine Intelligence Research Institute, then it is already telling us to disregard most of our desires to realize its own idea of what we ought to do. Why then is it irrational to say No to complex values and Yes to whatever maximizes what rationality is measuring?
