Friendly AI: a bibliography

by Luke Muehlhauser on February 4, 2011 in Friendly AI, Resources

Every year, machines surpass human ability in new ways. One day, we may design a machine that surpasses human ability at designing intelligent machines. This machine could improve its own intelligence, which would make it even better at improving its own intelligence, which would make it massively better at improving its own intelligence, and… you get the point. Within years, or months, or hours, we would have a machine so intelligent that it could quickly dominate the galaxy for its own purposes.

The trick is to program the first such machine with good purposes. That is what will determine the fate of our galaxy for the next several billion years. That is the problem of building Friendly AI, and it makes the problem of global warming look small by comparison.

The bibliography below lists sources on the subject of Friendly AI. I could not build a bibliography for “artificial morality” or “robot ethics” in general, because the field is too vast. Instead, I focus on artificial moral agency in the context of a technological singularity: that is, the problem of Friendly AI.

Last updated on 02/18/2011.

For the public

Easy reading

For academics

I’ve placed in bold the works that may be most useful, both because they make major contributions and because they are fairly readable to people not trained in AI. (For example, Creating Friendly AI is an important contribution, but I find it much harder to read than works written in the usual style of Anglophone science and philosophy journals.)



Walter February 4, 2011 at 5:34 am

Self-replicating machines will be the end of us. Anyone ever read Fred Saberhagen’s Berserker series?

Skywatch is coming! :-0


Bill Maher February 4, 2011 at 7:50 am


Where are Isaac Asimov’s Three Laws of Robotics?


Charles February 4, 2011 at 8:33 am

“Computing speed doubles every two subjective years of work. Two years after Artificial Intelligences reach human equivalence, their speed doubles. One year later, their speed doubles again. Six months – three months – 1.5 months … Singularity.”

That was true in 1996. It no longer is.
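(The quoted scenario is just a geometric series that converges in finite time. A toy sketch in Python, with a function name of my own choosing, makes the arithmetic explicit; it models the 1996 argument's assumption, not any claim about real hardware trends:)

```python
# Toy model of the quoted scenario: speed doubles after every two
# subjective years of work, so each successive doubling takes half as
# much wall-clock time as the one before.
def years_until_singularity(first_doubling=2.0, doublings=60):
    """Sum the wall-clock durations of successive doublings (a geometric series)."""
    total, interval = 0.0, first_doubling
    for _ in range(doublings):
        total += interval
        interval /= 2.0  # the next doubling takes half as long
    return total

print(years_until_singularity())  # converges toward 2 + 1 + 0.5 + ... = 4 years
```

So even taken at face value, the scenario predicts a finite total: about four wall-clock years from human equivalence to the limit.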


Adam Atlas February 4, 2011 at 10:17 am

Bill: they may be found in the fiction section. :) They aren’t useful for creating real artificial benevolent optimization for a variety of reasons; for instance, they sound simple, but by using words like “human” and “harm”, they essentially contain pointers to the entire complicated human axiology, and representing human values computationally is one of the central Friendly AI problems (which is a whole lot harder than you’d expect if you’re imagining telling fully-formed anthropomorphic robots what to do in English and having them automatically know and care what you mean).

Charles: I don’t recall which of the above documents quotes that passage, but aside from no longer being true, it’s no longer particularly relevant either. First, in the current terminology, I think the “singularity” is the point when an AI reaches human equivalence, and there’s no expectation that AI-directed AI research would follow the same trends as human-directed research. Second, reliable trends in computing power are only relevant to FAI in that they make it more urgent — the more computing power is available, the easier it may be to make a self-improving AI by imprecise and (relatively) brute-force approaches.


Zak February 4, 2011 at 11:45 am
MarkD February 4, 2011 at 12:02 pm

In Chalmers’s paper, the most interesting point for me was the following:

Again, the argument can be resisted…perhaps by arguing that evolution produced intelligence by means of processes that we cannot mechanically replicate. The latter line might be taken by holding that evolution…needed an enormously complex history that we could never artificially duplicate, or needed an enormous amount of luck.

Arguments against simulation (Dreyfus) or that deny simulation is possible (Penrose) all fail for me because they are essentially hardware issues and there are simulated solutions to those issues (including quantum problems), but there remains a problem of fundamental complexity that also impacts the moral/ethical boundary. If the only path to the kinds of flexibility, learning capacity, and self-awareness is through a rich evolutionary history (because we can’t manage the complexity ourselves), then we must focus on “leak-proof” simulated environments in understanding AI+ (Chalmers, p. 31). That, in itself, is problematic because it then assumes that we can create a sufficiently rich leak-proof world for AI+ to emerge in.


Scott February 4, 2011 at 3:37 pm

So. . .what additional steps are needed to make a sexbot?


Bill Maher February 4, 2011 at 4:51 pm


It was intended to be a joke and was a reference to the I, Robot picture. :)


cd February 4, 2011 at 5:05 pm

I’m not persuaded that intelligence is so greatly expandable as e.g. Chalmers argues. The argument that AI+ leads to AI++ is just hokey.


piero February 4, 2011 at 7:39 pm

I think there are two problems with AI that usually go unmentioned:
1. How are we going to imbue machines with a sense of purpose?
2. How will these machines attain autonomy?

The first problem is, to me, insurmountable: our survival instinct has been shaped by evolution over billions of years. I doubt we can artificially reproduce that. I doubt even superintelligent machines could reproduce that: why would they want to?

The second problem is easier, but only if we are stupid enough to let the machines reproduce ad libitum. Even a few thousand machines would not be too hard to get rid of if they started getting funny ideas.


Zeb February 5, 2011 at 2:15 pm

So. . .what additional steps are needed to make a sexbot?  

You want friendly-with-benefits AI?


Luke Muehlhauser February 5, 2011 at 2:45 pm


Mark Waser February 15, 2011 at 1:36 pm

You might want to revisit or eliminate your distinction between peer-reviewed and not peer-reviewed. All of my references as well as some by Omohundro and Sotala were peer-reviewed submissions to conferences. I’d also be curious about the “peer review” that submissions to one’s own web journal (where 11 of the 25 articles since 2007 are by two of the editors) get.


Luke Muehlhauser February 15, 2011 at 8:19 pm

Mark Waser,



Brian September 23, 2011 at 12:13 pm

From the Turney paper:

“A SIM manipulated by the experience of pleasure, however, may feel resentment, like a drug addict manipulated by a dealer.”

Seriously? Game theory and evolutionary biology are necessary in an introduction to AI, not only for students but for Turney.


Greg Colbourn January 24, 2012 at 4:58 am

Hi Luke, I’ve been reading FAI stuff casually for a while now and am familiar with the arguments (at least on a blog post/pop sci level). However, unless I’ve missed it, I haven’t seen either you or Yudkowsky directly address in writing the points Mark Waser makes about his Rational Universal Benevolence as an answer to the FAI problem. Yet judging by this bibliography you are aware of his work. Could you point me to anything I’ve missed, or if there isn’t anything, write a blog post about it please? It’s just that it’s bugging me that I can’t see anything obviously wrong with his argument, yet he seems to have been kicked out of Less Wrong without a proper hearing! You may have seen that he has a new blog post on it –

