Ethics and Superintelligence

by Luke Muehlhauser on February 13, 2011 in Ethics, Friendly AI, Machine Ethics, Resources

To educate a man in mind and not in morals is to educate a menace to society.

- Theodore Roosevelt

Here you can download the latest draft of my upcoming book Ethics and Superintelligence (03-11-2011 draft).

I’m also posting early versions of tiny sections of the book at Less Wrong: 1, 2, … (more to come)

Table of contents

  1. The dawn of superintelligence
  2. Utility-based plans for SAMA design
  3. Rule-based plans for SAMA design
  4. Other plans for SAMA design
  5. Coherent extrapolated volition
  6. How to solve philosophical problems quickly

Bibliography (works cited in the book)

Allen, C. (2002). Calculated morality: Ethical computing in the limit. In I. Smit & G. Lasker (Eds.), Cognitive, emotive and ethical aspects of decision making and human action, vol I. Baden/IIAS.

Allen, C., Varner, G., and Zinser, J. (2000). Prolegomena to any future artificial moral agent. Journal of Experimental & Theoretical Artificial Intelligence, 12: 251-261.

Allen, C., Wallach, W., and Smit, I. (2006). Why machine ethics? IEEE Intelligent Systems, 21(4): 12-17.

Anderson, M. and Anderson, S., eds. (2006). IEEE Intelligent Systems, 21(4).

Anderson, M. and Anderson, S. (2007). Machine ethics: Creating an ethical intelligent agent. AI Magazine, 28(4): 15-26.

Anderson, M. and Anderson, S. (2008). EthEl: Toward a Principled Ethical Eldercare Robot. Proceedings of the AAAI Fall 2008 Symposium on AI in Eldercare: New Solutions to Old Problems. Arlington, Virginia.

Anderson, M., Anderson, S. and Armen, C. (2006). MedEthEx: A Prototype Medical Ethics Advisor. Proceedings of the Eighteenth Conference on Innovative Applications of Artificial Intelligence. Boston, Massachusetts.

Arkin, R. (2009). Governing Lethal Behavior in Autonomous Robots. Chapman and Hall.

Arnall, A. (2003). Future Technologies Today’s Choices: A Report for the Greenpeace Environmental Trust. Greenpeace.

Bainbridge, W. (2005). Managing nano-bio-info-cogno innovations: Converging technologies in society. Springer.

Baum, S., Goertzel, B., and Goertzel, T. (forthcoming). How long until human-level AI? Results from an expert assessment. Technological Forecasting and Social Change.

Bostrom, N. (1998). How long before superintelligence? International Journal of Future Studies, 2.

Bostrom, N. (2002). Existential Risks: Analyzing Human Extinction Scenarios and Related Hazards. Journal of Evolution and Technology 9.

Bostrom, N. (2003). Ethical issues in advanced artificial intelligence. In Smit I, Lasker G and Wallach W (eds), Cognitive, emotive and ethical aspects of decision making in humans and in artificial intelligence, vol II. IIAS, Windsor.

Bostrom, N. (2006). What is a Singleton? Linguistic and Philosophical Investigations, 5(2): 48-54.

Bostrom, N. and Cirkovic, M. (2008). Global Catastrophic Risks. Oxford University Press.

Brown, J.S. and Duguid, P. (2000). A response to Bill Joy and the doom-and-gloom technofuturists. Industry Standard, April 24: 196.

Brown, J.S., and Duguid, P. (2001). Don’t count society out: a reply to Bill Joy. In: P.J. Denning (ed.) The invisible future, 117–144.

Bringsjord, S., Taylor, J., Gilbert, E., van Hueveln, B., Arkoudas, K., Clark, M., and Housten, T. (2010). Piagetian Roboethics via Category Theory: Moving Beyond Mere Formal Operations to Engineer Robots Whose Decisions are Guaranteed to be Ethically Correct.

Butler, S. (1863). Darwin among the machines. The Press (Christchurch, New Zealand), June 13.

Bynum, T. (2000). The Foundation of Computer Ethics. Computers and Society, 30(2): 6-13.

Bynum, T. (2004). Ethical Challenges to Citizens of the ‘Automatic Age’: Norbert Wiener on the Information Society. Journal of Information, Communication and Ethics in Society, 2(2): 65-74.

Bynum, T. (2005). Norbert Wiener’s Vision: the Impact of the ‘Automatic Age’ on our Moral Lives. In R. Cavalier (ed.), The Impact of the Internet on our Moral Lives: 11-25. SUNY Press.

Bynum, T. (2008a). Norbert Wiener and the Rise of Information Ethics. In J. van den Hoven and J. Weckert (eds.), Moral Philosophy and Information Technology: 8-25. Cambridge University Press.

Bynum, T. (2008b). Computer and Information Ethics. The Stanford Encyclopedia of Philosophy (Spring 2011 Edition), Edward N. Zalta (ed.).

Campbell, M., Hoane, A., and Hsu, F. (2002). Deep Blue. Artificial Intelligence, 134: 57-83.

Capurro, R., Hausmanninger, T., Weber, K., Weil, F., Cerqui, D., Weber, J., Weber, K. (2006). International Review of Information Ethics, Vol. 6: Ethics in Robots.

Chalmers, D. (2010). The Singularity: A Philosophical Analysis. Journal of Consciousness Studies, 17: 7-65.

Clarke, S. (2005). Future technologies, dystopic futures and the precautionary principle. Ethics and Information Technology, 7(3): 121-126.

Danielson, P. (1992). Artificial morality: Virtuous robots for virtual games. Routledge.

Decker, M. (2004). The role of ethics in interdisciplinary technology assessment. Poiesis & Praxis: International Journal of Technology Assessment and Ethics of Science, 2(2-3): 139-156.

Dietrich, E. (2007). After the Humans are Gone. Philosophy Now, v. 61, May/June: 16-19.

Dietrich, E. (2011). Homo sapiens 2.0: Building the better robots of our nature. In M. Anderson and S. Anderson, (eds.), Machine Ethics. Cambridge University Press.

Floridi, L. (1999). Philosophy and Computing: An Introduction. Routledge.

Ganascia, J. (2007a). Ethical System Formalization using Non-Monotonic Logics. Proceedings of the Cognitive Science conference (CogSci2007). Nashville.

Ganascia J. (2007b). Modeling Ethical Rules of Lying with Answer Set Programming. Ethics and Information Technology, 9: 39-47.

de Garis, H. (2005). The Artilect War: Cosmists Vs. Terrans: A Bitter Controversy Concerning Whether Humanity Should Build Godlike Massively Intelligent Machines. ETC Publications.

Goertzel, B. and Pennachin, C., eds. (2010). Artificial General Intelligence. Springer.

Good, I. J. (1965). Speculations concerning the first ultraintelligent machine. Advances in Computers, 6: 31-88.

Hall, J. (2005). Nanofuture: What’s next for nanotechnology. Prometheus.

Hanson, R. (1994). If uploads come first: The crack of a future dawn. Extropy, 6(2): 10–15.

Hanson, R. (forthcoming). Economic growth given machine intelligence. Journal of Artificial Intelligence Research.

Hansson, S. (1997). The Limits of Precaution. Foundations of Science, 2: 293–306.

Hansson, S. (1999). Adjusting Scientific Practices to the Precautionary Principle. Human and Ecological Risk Assessment, 5: 909–921.

Hibbard, B. (2001). Super-intelligent machines. Computer Graphics 35(1), 11-13.

Honarvar, A. and Ghasem-Aghaee, N. (2009). An artificial neural network approach for creating an ethical artificial agent. Proceedings of the 8th IEEE international conference on Computational intelligence in robotics and automation: 290-295.

Jensen, K. (2002). The moral foundation of the precautionary principle. Journal of Agricultural and Environmental Ethics, 15(1): 39-55.

Joy, B. (2000). Why the future doesn’t need us. Wired 8.04.

Keiper A. (2007). Nanoethics as a discipline? New Atlantis (Spring): 55–67.

Keynes, J. M. (1933). Essays in persuasion. Macmillan.

King, R. et al. (2009). The automation of science. Science, 324: 85-89.

King, R. (2011). Rise of the robo scientists. Scientific American, January 2011.

Kurzweil. R. (2005). The Singularity is Near. Viking.

Legg, S. (2008). Machine super-intelligence. PhD Thesis. IDSIA.

Leontief, W. W. (1982). The distribution of work and income. Scientific American, 192: 188–204.

Lokhorst, G. (2011). Computational meta-ethics: Towards the meta-ethical robot. Minds and Machines.

Loosemore, R., and Goertzel, B. (2011). Why an intelligence explosion is probable. H+ Magazine, March 7, 2011.

MacKenzie, D. (1995). The Automation of Proof: A Historical and Sociological Exploration. IEEE Annals, 17(3): 7-29.

Manson N. (2002). Formulating the precautionary principle. Environmental Ethics 24: 263–274.

Markoff, J. (2011). “Computer Wins on ‘Jeopardy!’; Trivial, it’s Not.” New York Times, February 17th 2011, A1.

McLaren, B. (2003). Extensionally Defining Principles and Cases in Ethics: an AI Model. Artificial Intelligence Journal, 150: 145-181.

McLaren, B. (2005). Lessons in Machine Ethics from the Perspective of Two Computational Models of Ethical Reasoning. AAAI Technical Report FS-05-06: 70-77.

Messerly, J. (2003). I’m glad the future doesn’t need us: a critique of Joy’s pessimistic futurism. ACM SIGCAS Computers and Society, 33(2).

Metzinger, T. (2004). Being No One. MIT Press.

Metzinger, T. (2009). The Ego Tunnel. Basic Books.

Mnyusiwalla, A., Daar, A., and Singer, P. (2003). ‘Mind the gap’: science and ethics in nanotechnology. Nanotechnology, 14(3): R9-R13.

Moor, J. (2005). Why we need better ethics for emerging technologies. Ethics and Information Technology, 7: 111-119.

Moravec, H. (1999). Robot: Mere Machine to Transcendent Mind. Oxford University Press.

Nielsen, M. (2011). What should a reasonable person believe about the Singularity?

Nilsson, N. (1985). Artificial intelligence, employment, and income. Human Systems Management, 5: 123–125.

Nilsson, N. (2009). The Quest for Artificial Intelligence. Cambridge University Press.

Nordmann, A. (2007). If and then: a critique of speculative nanoethics. NanoEthics 1: 31–46.

Nordmann, A. and Rip, A. (2009). Mind the gap revisited. Nature Nanotechnology, 4: 273-274.

Norris, R. and Kristensen, H. (2009). U.S. Nuclear Warheads, 1945-2009. Bulletin of the Atomic Scientists, 65(4): 72-81.

Oosterlaken, I. (2009). Design for development: A capability approach. Design Issues, 25(4): 91–102.

Phoenix, C. and Treder, M. (2003). Applying the precautionary principle to nanotechnology.

Powers, T. (2005). Deontological Machine Ethics. AAAI Technical Report FS-05-06: 79-86.

Roache, R. (2008). Ethics, speculation, and values. NanoEthics, 2: 317-327.

Rose, J., Huhns, M., Roy, S., and Turkett, W. (2002). An agent architecture for long-term robustness. Proceedings of the first international joint conference on Autonomous agents and multiagent systems: part 3. ACM.

Russell, S., and Norvig, P. (2009). Artificial Intelligence: A Modern Approach, 3rd ed. Prentice Hall.

Sawyer, R. (2007). Robot ethics. Science, 318(5853): 1037.

Simon, H. A. (1977). The new science of management decision. Prentice Hall.

Smart, J. (2009) Evo Devo Universe? A Framework for Speculations on Cosmic Culture. In Dick, S., and Lupisella, M. (eds.) Cosmos & Culture. NASA Press.

Turing, A. (1950). Computing machinery and intelligence. Mind, 59: 433-460.

Turing, A. (1951/2004). Intelligent machinery, a heretical theory. In Copeland (ed.), The Essential Turing, 2004, Oxford: Oxford University Press. Originally presented in 1951 as a lecture for the ’51 society in Manchester.

Vinge, V. (1993). The coming technological singularity: How to survive in the post-human era. Whole Earth Review, winter 1993. New Whole Earth.

Wallach, W., Allen, C., and Smit, I. (2008). Machine morality: Bottom-up and top-down approaches for modelling human moral faculties. AI and Society, 22(4), 565–582.

Wallach, W., Franklin, S., and Allen, C. (2010). A Conceptual and Computational Model of Moral Decision Making in Human and Artificial Agents. Topics in Cognitive Science, 2(3): 454-485.

Wallach, W. (2010). Robot minds and human ethics: the need for a comprehensive model of moral decision making. Ethics and Information Technology, 12: 243-250.

Weckert, J. and Moor, J. (2006). The precautionary principle in nanotechnology. International Journal of Applied Philosophy, 20(2): 191-204.

Weisman, A. (2007). The World Without Us. Thomas Dunne Books.

Wiegel, V. (2006). Building Blocks for Artificial Moral Agents. Proceedings of EthicalALife06 Workshop.

Wiegel, V. and van den Berg, J. (2009). Combining Moral Theory, Modal Logic and MAS to Create Well-Behaving Artificial Agents. International Journal of Social Robotics, 1: 233-242.

Yudkowsky, E. (2001a). Creating Friendly AI 1.0. Machine Intelligence Research Institute.

Yudkowsky, E. (2001b). What is friendly AI? Machine Intelligence Research Institute.

Yudkowsky, E. (2004). Coherent extrapolated volition. Machine Intelligence Research Institute.

Yudkowsky, E. (2007). Three major singularity schools.

Yudkowsky, E. (2008). Artificial intelligence as a positive and negative factor in global risk. In Bostrom, N. and Cirkovic, M. (eds.), Global Catastrophic Risks. Oxford University Press.

Yudkowsky, E. (2010). Timeless decision theory. Machine Intelligence Research Institute.


21 comments

The Nardini Incident February 13, 2011 at 7:01 am


Are you still writing a book on the New Atheists?


Silas February 13, 2011 at 9:12 am

My advice is: don’t start anything new. Try to finish what you have started. It is very ironic that you chose to write about how to beat procrastination… when you’re the ultimate procrastinator. There are dozens of blog series you have started and never finished.


Reginald Selkirk February 13, 2011 at 9:47 am

1: The technological singularity is coming soon.

Good luck with that one.


MauricXe February 13, 2011 at 10:12 am

I’m gonna have to agree with Silas. Personally, I’m still waiting for your explanation of how special relativity defeats, or poses problems for, the Kalam.

I also echo Reginald Selkirk’s skepticism.


DaVead February 13, 2011 at 10:21 am

There’s something I still don’t understand. The singularity won’t happen on its own. Why not just stop the development of A.I.?

It’s like engineers continually adding floors to the tallest building in the world, and then telling the public that because their rate of construction is accelerating, they need more resources and money to make sure the building doesn’t topple over and kill people. Or it’s like physicists building particle accelerators in which to simulate black holes that could potentially annihilate the planet, and then pleading with people for recognition and funding to help lower the chances of disaster. Are A.I. researchers not essentially saying, “We’re developing cool technologies that could potentially threaten the fate of the galaxy, so support us so we can make even cooler technologies to (maybe) prevent this hazard.” ? Maybe there are just some places we shouldn’t go, and maybe advanced A.I. is one of them.

Why not make A.I. smart enough for a set amount of tasks that do not require machines with self-perpetuating levels of sophistication, and then say enough is enough? In my opinion, if the A.I. researchers don’t do this, then they are to blame, regardless of whether or not they can make “friendly” A.I.


Luke Muehlhauser February 13, 2011 at 11:37 am

The Nardini Incident,

No, I gave it up. I lost interest.


Luke Muehlhauser February 13, 2011 at 11:38 am


Some projects are more worthy than others. Doing what matters is more important than finishing things that matter less.


The Nardini Incident February 13, 2011 at 12:04 pm


Agreed. One ought to do what is more important than finishing up less important matters. If one discovers, in the midst of Project A, that Project B seems more worthwhile then one ought to abandon Project A. What counts is accomplishing important projects, not having finished several projects one is indifferent to. Good luck with the new book.


Silas February 13, 2011 at 1:16 pm

Yeah, so let’s see if this matters. I’ve read this blog since the start and odds are it doesn’t.


Steven R. February 13, 2011 at 9:00 pm

Yeah, so let’s see if this matters. I’ve read this blog since the start and odds are it doesn’t.  

I find the topic to be rather interesting. I mean, even if the problem isn’t as urgent as Luke thinks (and, being someone who keeps on observing how fast technology has progressed, I tend to agree with Luke that this may be coming faster than we suspect), this may well be a problem in the future, and the sooner we begin preparing ourselves for it the better. Plus, the subject is fascinating, and learning which objective morals exist, if any, and their possible relation to other beings (be it extraterrestrials or singularities) is an important matter.

Why continue progressing? Because, if done properly, we have just come to the greatest revolution which would open the doors of human understanding to considerable new heights. We’d no longer be simply limited to our own selves and now be capable of doing so much more.


MarkD February 13, 2011 at 11:24 pm

I get easily distracted, but recommend the following strategy: keep around three writing projects in play at once and allow yourself to switch between them but, critically, get between 500 and 1,000 words per day on any one of them. Put it in your various scheduling devices. A book or technical paper pops out every 6 months or so at that rate.

And now I see my new novel, Turning Ball, just popped on my schedule and at 11:30 I must write…grumble.


Fleisch February 14, 2011 at 1:34 am

@DaVead: The problem is that “we” don’t have any license or monopoly on building A.I. If, say, the American military, or the Chinese, or some geniuses in a basement of the Googleplex are building an A.I., it’s practically impossible for us (as in “the people at MIRI”) to stop them. Especially as computers get better and better, just whacking an A.I. together will become ever easier.


Fleisch February 14, 2011 at 9:12 am

(I am not actually affiliated with the MIRI. I talked about the MIRI to point out how few people actually recognize a problem with building AI. You always have to be careful when using the word “we”, because it might delude you into thinking that whoever you thought of form an actual group, especially one that agrees with you.)


Mark Waser February 16, 2011 at 5:28 am

I’m not quite sure why you are so focussed on a singleton. There are many reasons why a singleton is “only” an ultimate end state in the same sense that death is the ultimate end state of life and many more reasons why a singleton is *much* more dangerous than well-designed alternatives. It certainly makes sense to discuss a singleton as an option and its pros and cons but it seems that you are ignoring what I perceive to be the majority of the solution space (and eliminating many acceptable solutions in the process).


Luke Muehlhauser February 16, 2011 at 6:30 am

Mark Waser,

Yes, this is of course a decision I will explain and argue for in the book.


Mark Waser February 16, 2011 at 6:44 am

I hope you’ll blog about it first. Please? ;-)


Luke Muehlhauser February 16, 2011 at 8:19 am


At the least, you’ll see the draft of that section go up in the Less Wrong discussion area long before a book is published.


Jason Sewell February 28, 2011 at 7:43 pm


Thrilled to see your change of focus. I thoroughly enjoyed your Atheist writing, references, and podcasts, but I’d been hoping that you’d move into an area like this. Have you read Bostrom’s Simulation Argument? I didn’t see it listed as a reference.


Luke Muehlhauser February 28, 2011 at 8:57 pm

Jason Sewell,

I’m familiar with the argument but haven’t read the papers. Is it relevant to the problem of Friendly AI?


Song Moo November 8, 2011 at 4:18 pm


Is this still being worked on?


Luke Muehlhauser November 8, 2011 at 11:07 pm

Song Moo:

Yes, by Nick Bostrom. :)

