Every year, machines surpass human ability in new ways. One day, we may design a machine that surpasses human ability at designing intelligent machines. This machine could improve its own intelligence, which would make it even better at improving its own intelligence, which would make it massively better at improving its own intelligence, and… you get the point. Within years, or months, or hours, we would have a machine so intelligent that it could quickly dominate the galaxy for its own purposes.
The trick is to program the first such machine with good purposes. That is what will determine the fate of our galaxy for the next several billion years. That is the problem of building Friendly AI, and it makes the problem of global warming look small by comparison.
The bibliography below lists sources on the subject of Friendly AI. I could not build a bibliography for “artificial morality” or “robot ethics” in general, because the field is too vast. Instead, I focus on artificial moral agency in the context of a technological singularity: that is, the problem of Friendly AI.
Last updated on February 18, 2011.
For the public
- Kaste (NPR), “The Singularity: Humanity’s Last Invention?” (audio)
- Yudkowsky, “The Challenge of Friendly AI” (video)
- Farber (ZDNet), “Can Friendly AI save humans from irrelevance or extinction?” (audio)
Easy reading
- Turney, “Controlling super-intelligent machines” (1991)
- Wikipedia, “Friendly artificial intelligence”
- Yudkowsky, “What is Friendly Intelligence?”
- Sotala, “14 objections against [Friendly AI] answered”
- Legg, “Friendly AI is Bunk” (2006)
- Omohundro, “AI and the Future of Human Morality” (2008) [text variation]
- Wallach & Allen, Moral Machines: Teaching Robots Right from Wrong (2009)
For academics
- Yudkowsky, Creating Friendly AI (2001)
- Goertzel, “Thoughts on AI Morality” (2002)
- Hibbard, Super-Intelligent Machines (2002)
- Hibbard, “Critique of the SIAI Guidelines on Friendly AI” (2003)
- Bostrom, “Ethical Issues in Advanced Artificial Intelligence” (2003)
- Goertzel, “The All-Seeing (A)I” (2004)
- Yudkowsky, “Coherent Extrapolated Volition” (2004)
- Goertzel, “Encouraging a Positive Transcension” (2004)
- Dawrst, “Thoughts on Friendly AI”
- Yudkowsky, “Artificial Intelligence as a Positive and Negative Factor in Global Risk” (2006)
- Armstrong, “Chaining God: A qualitative approach to AI, trust and moral systems” (2007)
- Hall, Beyond AI (2007)
- Omohundro, “The Nature of Self-Improving Artificial Intelligence” (2007)
- Bugaj & Goertzel, “Five Ethical Imperatives and Their Implications for Human-AGI Interaction” (2007)
- Omohundro, “Basic AI drives” (2008)
- Hall, “Engineering Utopia” (2008)
- Waser, “Discovering the foundations of a universal system of ethics as a road to safe artificial intelligence” (2008)
- Freeman, “Using Compassion and Respect to Motivate an Artificial Intelligence” (2009)
- Waser, “A safe ethical system for intelligent machines” (2009)
- Tarleton, “Coherent Extrapolated Volition: A Meta-Level Approach to Machine Ethics” (2010)
- Goertzel, “The Singularity Institute’s Scary Idea (and Why I Don’t Buy It)” (2010)
- Shulman, Tarleton, & Jonsson, “Which Consequentialism? Machine Ethics and Moral Divergence” (2010)
- Sotala, “From mostly harmless to civilization-threatening: pathways to dangerous artificial intelligences” (2010)
- Waser, “Designing a safe motivational system for intelligent machines” (2010)
- Waser, “Deriving a safe ethical architecture for intelligent machines” (2010)
- Waser, “A game-theoretically optimal basis for safe and ethical artificial intelligence” (2010)
- Chalmers, “The Singularity: A Philosophical Analysis” (2010)
- Goertzel, “Coherent Aggregated Volition” (2010)
- Goertzel, “GOLEM: Toward an AGI Meta-Architecture Enabling Both Goal Preservation and Radical Self-Improvement” (2010)
- Muehlhauser, Ethics and Superintelligence (forthcoming)
I’ve placed in bold the works that may be most useful, both because they make major contributions and because they are fairly readable to people not trained in AI. (For example, Creating Friendly AI is an important contribution, but I find it much harder to read than works written in the usual style of Anglophone science and philosophy journals.)
Comments (16)
Self-replicating machines will be the end of us. Anyone ever read Fred Saberhagen’s Berserker series?
Skywatch is coming! :-0
Walter
Luke,
Where are Isaac Asimov’s Three Laws of Robotics?
Bill Maher
“Computing speed doubles every two subjective years of work. Two years after Artificial Intelligences reach human equivalence, their speed doubles. One year later, their speed doubles again. Six months – three months – 1.5 months … Singularity.”
That was true in 1996. It no longer is.
Charles
Bill: they may be found in the fiction section. :) They aren’t useful for creating real artificial benevolent optimization for a variety of reasons; for instance, they sound simple, but by using words like “human” and “harm”, they essentially contain pointers to the entire complicated human axiology, and representing human values computationally is one of the central Friendly AI problems (which is a whole lot harder than you’d expect if you’re imagining telling fully-formed anthropomorphic robots what to do in English and having them automatically know and care what you mean).
Charles: I don’t recall which of the above documents quotes that passage, but aside from no longer being true, it’s no longer particularly relevant either. First, in the current terminology, I think the “singularity” is the point when an AI reaches human equivalence, and there’s no expectation that AI-directed AI research would follow the same trends as human-directed research. Second, reliable trends in computing power are relevant to FAI only in that they make it more urgent: the more computing power is available, the easier it may be to make a self-improving AI by imprecise and (relatively) brute-force approaches.
Adam Atlas
Relevant:
http://www.youtube.com/watch?v=G6CVj5IQkzk
Zak
In Chalmers’s paper, the most interesting point for me was the following:
Again, the argument can be resisted…perhaps by arguing that evolution produced intelligence by means of processes that we cannot mechanically replicate. The latter line might be taken by holding that evolution…needed an enormously complex history that we could never artificially duplicate, or needed an enormous amount of luck.
Arguments against simulation (Dreyfus) or that deny simulation is possible (Penrose) all fail for me because they are essentially hardware issues and there are simulated solutions to those issues (including quantum problems), but there remains a problem of fundamental complexity that also impacts the moral/ethical boundary. If the only path to the kinds of flexibility, learning capacity, and self-awareness is through a rich evolutionary history (because we can’t manage the complexity ourselves), then we must focus on “leak-proof” simulated environments in understanding AI+ (Chalmers, p. 31). That, in itself, is problematic because it then assumes that we can create a sufficiently rich leak-proof world for AI+ to emerge in.
MarkD
So… what additional steps are needed to make a sexbot?
Scott
Adam,
It was intended to be a joke and was a reference to the I, Robot picture. :)
Bill Maher
I’m not persuaded that intelligence is so greatly expandable as e.g. Chalmers argues. The argument that AI+ leads to AI++ is just hokey.
cd
I think there are two problems with AI that usually go unmentioned:
1. How are we going to imbue machines with a sense of purpose?
2. How will these machines attain autonomy?
The first problem is, to me, insurmountable: our survival instinct has been modelled by evolution over billions of years. I doubt we can artificially reproduce that. I doubt even superintelligent machines could reproduce that: why would they want to?
The second problem is easier, but only if we are stupid enough to let the machines reproduce ad libitum. Even a few thousand machines would not be too hard to get rid of if they started getting funny ideas.
piero
You want friendly-with-benefits AI?
Zeb
Love and Sex with Robots.
Luke Muehlhauser
You might want to revisit or eliminate your distinction between peer-reviewed and not peer-reviewed. All of my references as well as some by Omohundro and Sotala were peer-reviewed submissions to conferences. I’d also be curious about the “peer review” that submissions to one’s own web journal (where 11 of the 25 articles since 2007 are by two of the editors) get.
Mark Waser
Mark Waser,
Done.
Luke Muehlhauser
From the Turney paper:
“A SIM manipulated by the experience of pleasure, however, may feel resentment, like a drug addict manipulated by a dealer.”
Seriously? Game theory and evolutionary biology are necessary in an introduction to AI, not only for students but for Turney.
Brian
Hi Luke, I’ve been reading FAI stuff casually for a while now and am familiar with the arguments (at least on a blog post/pop sci level). However, unless I’ve missed it, I haven’t seen either you or Yudkowsky directly address in writing the points Mark Waser makes about his Rational Universal Benevolence as an answer to the FAI problem. Yet judging by this bibliography you are aware of his work. Could you point me to anything I’ve missed, or if there isn’t anything, write a blog post about it please? It’s just that it’s bugging me that I can’t see anything obviously wrong with his argument, yet he seems to have been kicked out of Less Wrong without a proper hearing! You may have seen that he has a new blog post on it – http://becominggaia.wordpress.com/2012/01/22/value-is-simple-and-robust/
Greg Colbourn