Lately, I’ve taken an interest in the problem of Friendly AI.
In brief: One plausible scenario for the next few centuries is that we build an AI as smart as we are at designing AI, which means it could improve its own intelligence very quickly, which means it would soon become a superintelligent machine, with almost unlimited power to accomplish whatever it desires.
So we need to consider carefully how to program its desires, so that it doesn’t kill us all.
Presumably, this is a matter of ethics. Which ethical theory should we use to program the goal system of a future superintelligence? (Nick Bostrom calls such a singular world-dominating power a “singleton.”)
Andy Walters argues that using desirism to program the goal system of a singleton will likely lead to human extinction.
The argument: Desirism does not admit the existence of categorical imperatives – rules that are universally binding. It only says that agents have reasons for action to engage in reward and punishment, praise and condemnation, to change the desires of others. (Because the desires of others greatly affect the fulfillment or thwarting of our own desires.) But a superintelligent machine’s desires will not be susceptible to these social tools as ours are, and so it will unstoppably fulfill whatever its desires are (within the limits of physics and its intelligence).
Andy Walters proposes Korsgaardian deontological moral theory as a potential solution to the problem.
For now, I’ll share only a few brief thoughts:
One. Walters is correct that if humans were (briefly) cohabiting the planet with a superintelligence, desirism says there is no ground for a categorical condemnation of the human-extinction-causing desires of the singleton. But as with the case of Scrooge, I ask: Why does this matter? If I could go to Scrooge and prove to him that he was categorically wrong, he wouldn’t then say: “Huh. Wow. I guess you’re right! By the laws of the universe, my disregard for the poor is just absolutely wrong! Well, you’ve convinced me. I’ll start being more generous to the poor.”
No. That’s not how human psychology works. Even if there were categorical reasons in this universe (and I don’t think there are), we would still have to use the same social tools we do now in order to influence other people’s behavior. In the thick of things, the existence of categorical imperatives would not save us from Scrooge, and they would not save us from a singleton, either.
Two. But the future is relatively “fixed” once the superintelligence takes over, anyway. So what we’re really talking about is this matter of how to design the singleton’s goal system in the first place.
Now, you might be tempted to phrase this question as “What is the morally right way to design the singleton’s goal system?” or “What is the morally best way to design the singleton’s goal system?”
Now, I think moral language has a useful function, which is why I’m elaborating a theory of revised moral discourse in a podcast with Alonzo Fyfe. (Unfortunately, we haven’t gotten to the parts about morality or language yet – but they are forthcoming.) But I think that when dealing with tough problems, or working through moral debate, it is best to play a game of Taboo with your discourse, so that you’re not allowed to use moral terms. Replace the symbol with the substance, so that you’re sure you know what you’re talking about. I recommend this because moral discourse is so confused. Unlike talk of electrons, talk of morality uses its terms in a great variety of ways, which is why it can be important to say what you’re trying to say without using moral terms at all – and if you can’t do so, then you probably don’t know what you’re talking about.
Something that is “morally good” in the language of desirism does not share all the properties of something that is “morally good” in the language of moral functionalism (a variety of realist moral naturalism). So, imagine a universe (UniverseOne) where desirism is true and all other moral theories are false. Next, imagine a universe (UniverseTwo) where moral functionalism is true and all other ethical theories are false. And, imagine that the most intelligent species of each universe has reached the point of needing to design the goal system of a singleton that will determine the future of its respective universe. Finally, imagine that the beings in UniverseOne have come up with AlgorithmX that is morally best according to the moral theory that is true in their universe (desirism). And, by sheer coincidence, the beings in UniverseTwo have discovered that the same AlgorithmX is morally best according to the moral theory that is true in their universe (moral functionalism).
What I mean when I say that “morally good” does not share the same properties in both universes is that in the above scenario, it could be the case that the beings of UniverseOne do not have most reason to implement AlgorithmX even though the beings in UniverseTwo do have most reason to implement AlgorithmX and even though AlgorithmX is “morally good” in both universes. Or, perhaps the difference lies elsewhere. The point is that we need to be careful when talking about such things and, as needed, replace the symbol “morally good” with its substance, so we don’t confuse ourselves.
Thus, let’s consider the question of how to design the goal system of a superintelligent machine without using moral terms.
What does desirism say about this problem? It says that humans have extremely strong reasons for action to design the goal system of a superintelligence such that it does not lead to the sudden extinction of the human species. It also says that people have strong reasons to condemn those who are trying to develop a self-improving artificial intelligence without first seriously considering all the ways in which such action could lead to the extinction of the human species. It also says that people have strong reasons to praise those who are trying to think clearly about such things, and those who donate to the Machine Intelligence Research Institute and to the Future of Humanity Institute at Oxford, the two organizations that are working the most on this problem.
But desirism says more than this – something that some people would count as an “advantage” over what is currently the most developed plan for developing the goal system of a superintelligent machine: Coherent Extrapolated Volition.
Coherent Extrapolated Volition is a plan for building a ‘Seed AI’ that is capable of extrapolating human values to estimate what we would value “if we knew more, thought faster, were more the people we wished we were, had grown up farther together; where the extrapolation converges rather than diverges, where our wishes cohere rather than interfere; extrapolated as we wish that extrapolated, interpreted as we wish that interpreted.”
This is perhaps more promising than designing the singleton’s goal system according to the most progressive moral values of our time. After all, imagine what would have happened had we needed to design a singleton’s goal system with the most progressive values of 500 years ago! Not a pretty thought.
But here’s a worry: Let’s say we succeeded with Coherent Extrapolated Volition. The superintelligent machine arises, and transforms Earth into a utopia. Meanwhile, the singleton builds a Dyson sphere around our sun to gather a significant fraction of its energy so that it can send self-replicating Von Neumann probes to other solar systems and other galaxies.
Now, let’s say there are 10 other solar systems in our galaxy with advanced forms of life. Unluckily for them, the singleton’s goal system has been designed from an extrapolation of the desires and values of a particular species of primate on a water-based planet thousands or millions of light-years away. The self-replicating probes harvest the minerals of their solar systems to build Dyson spheres around their stars, and begin implementing what is a “utopia” according to the extrapolated desires of that primate species back on Earth.
Which could very well be a living hell for creatures that evolved far away from that primate species, and perhaps developed to have very different values – perhaps even very different extrapolated values. To these far-flung forms of intelligent life, the arrival of our singleton’s probes would not be the arrival of utopia, but hell. Or extinction.
Desirism, on the other hand, recognizes the possible existence and significance of these far-flung reasons for action in distant solar systems. Perhaps designing a singleton’s goal system with this in mind would lead us to design a singleton that would create a utopia for humans and post-humans in this solar system, but would also do the science of figuring out the desires (or extrapolated desires) of distant alien species and create utopias for them as well.
(Or, perhaps this conclusion falls out of Coherent Extrapolated Volition as well; perhaps our extrapolated human values entail such a concern for distantly evolved value systems. As of yet, this matter is unclear.)
Three. I will note that many people agree that a rule-checked motivational system for a superintelligent machine is best. Last time I checked, Robin Hanson holds this position, and Ben Goertzel and Mark Waser have written articles on this, too (see their many articles here).
Others, for example Wendell Wallach and Colin Allen, are more pessimistic about the chances that any “top-down” plan (including desire-based theories of the good or Kantian theories of the right) for a singleton’s motivational system will turn out well. Their section on this subject in Moral Machines (2009) is perhaps the clearest writing on the problem yet published, and anyone interested in this topic should read it.