Goertzel Tells Us Not To Fear The Machine

Aaron Saenz

Nov 14, 2010

Below is a guest post written by Dr. Ben Goertzel, CEO of AI software company Novamente LLC and bioinformatics company Biomind LLC; leader of the open-source OpenCog Artificial General Intelligence software project; Chairman of Humanity+; Chief Technology Officer of biopharma firm Genescient Corp.; Director of Engineering of digital media firm Vzillion Inc.; Advisor to the Singularity University and Singularity Institute; Research Professor in the Fujian Key Lab for Brain-Like Intelligent Systems at Xiamen University, China; and General Chair of the Artificial General Intelligence conference series. This post was originally published on Dr. Goertzel's own blog.

I recently wrote a blog post about my own AI project, but it attracted a bunch of adversarial comments from folks influenced by the Singularity Institute for AI's (rather different) perspective on the best approach to AI R&D. I responded to some of these comments there.

(Quick note for those who don't know: the Singularity Institute for AI is not affiliated with Singularity University, though there are some overlaps ... Ray Kurzweil is an Advisor to the former and the founder of the latter; and I am an Advisor to both.)

Following that discussion, a bunch of people have emailed me in the last couple weeks asking me to write something clearly and specifically addressing my views on SIAI's perspective on the future of AI. I don't want to spend a lot of time on this but I decided to bow to popular demand and write a blog post...

Of course, there are a lot of perspectives in the world that I don't agree with, and I don't intend to write blog posts explaining the reasons for my disagreement with all of them! But since I've had some involvement with SIAI in the past, I guess it's sort of a special case.

First of all I want to clarify I'm not in disagreement with the existence of SIAI as an institution, nor with the majority of their activities -- only with certain positions habitually held by some SIAI researchers, and by the community of individuals heavily involved with SIAI. And specifically with a particular line of thinking that I'll refer to here as "SIAI's Scary Idea."

Roughly, the Scary Idea posits that: If I or anybody else actively trying to build advanced AGI succeeds, we're highly likely to cause an involuntary end to the human race.

Brief Digression: My History with SIAI

Before getting started with the meat of the post, I'll give a few more personal comments, to fill in some history for those readers who don't know it, or who know only parts. Readers who are easily bored may wish to skip to the next section.

SIAI has been quite good to me, overall. I've enjoyed all the Singularity Summits, which they've hosted, very much; I think they've played a major role in the advancement of society's thinking about the future, and I've felt privileged to speak at them. And I applaud SIAI for consistently being open to Summit speakers whose views are strongly divergent from those commonly held in the SIAI community.

Also, in 2008, SIAI and my company Novamente LLC seed-funded the OpenCog open-source AGI project (based on software code spun out from Novamente). The SIAI/OpenCog relationship diminished substantially when Tyler Emerson passed the leadership of SIAI along to Michael Vassar, but it was instrumental in getting OpenCog off the ground. I've also enjoyed working with Michael Vassar on the Board of Humanity+, of which I'm Chair and he's a Board member.

When SIAI was helping fund OpenCog, I took the title of "Director of Research" of SIAI, but I never actually directed any research there apart from OpenCog. The other SIAI research was always directed by others, which was fine with me. There were occasional discussions about operating in a more unified manner, but it didn't happen. All this is perfectly ordinary in a small start-up type organization.

Once SIAI decided OpenCog was no longer within its focus, after a bit of delay I decided it didn't make sense for me to hold the Director of Research title anymore, since as things were evolving, I wasn't directing any SIAI research. I remain as an Advisor to SIAI, which is going great.

Now, on to the meat of the post….

SIAI's Scary Idea (Which I Don't Agree With)

SIAI's leaders and community members have a lot of beliefs and opinions, many of which I share and many not, but the key difference between our perspectives lies in what I'll call SIAI's "Scary Idea", which is the idea that: progressing toward advanced AGI without a design for "provably non-dangerous AGI" (or something closely analogous, often called "Friendly AI" in SIAI lingo) is highly likely to lead to an involuntary end for the human race.

(SIAI's Scary Idea has been worded in many different ways by many different people, and I tried in the above paragraph to word it in a way that captures the idea fairly, if approximately, and won't piss off too many people.)

Of course it's rarely clarified what "provably" really means. A mathematical proof can only be applied to the real world in the context of some assumptions, so maybe "provably non-dangerous AGI" means "an AGI whose safety is implied by mathematical arguments together with assumptions that are believed reasonable by some responsible party"? (where the responsible party is perhaps "the overwhelming majority of scientists" … or SIAI itself?) I'll say a little more about this below.

Please note that, although I don't agree with the Scary Idea, I do agree that the development of advanced AGI has significant risks associated with it. There are also dramatic potential benefits associated with it, including the potential of protection against risks from other technologies (like nanotech, biotech, narrow AI, etc.). So the development of AGI has difficult cost-benefit balances associated with it -- just like the development of many other technologies.

I also agree with Nick Bostrom and a host of SF writers and many others that AGI is a potential "existential risk" -- i.e. that in the worst case, AGI could wipe out humanity entirely. I think nanotech and biotech and narrow AI could also do so, along with a bunch of other things.

I certainly don't want to see the human race wiped out! I personally would like to transcend the legacy human condition and become a transhuman superbeing … and I would like everyone else to have the chance to do so, if they want to. But even though I think this kind of transcendence will be possible, and will be desirable to many, I wouldn't like to see anyone forced to transcend in this way. I would like to see the good old fashioned human race continue, if there are humans who want to maintain their good old fashioned humanity, even if other options are available.

But SIAI's Scary Idea goes way beyond the mere statement that there are risks as well as benefits associated with advanced AGI, and that AGI is a potential existential risk.

Finally, I note that most of the other knowledgeable futurist scientists and philosophers, who have come into close contact with SIAI's perspective, also don't accept the Scary Idea. Examples include Robin Hanson, Nick Bostrom and Ray Kurzweil.

There's nothing wrong with having radical ideas that one's respected peers mostly don't accept. I totally get that: My own approach to AGI is somewhat radical, and most of my friends in the AGI research community, while they respect my work and see its potential, aren't quite as enthused about it as I am. Radical positive changes are often brought about by people who clearly understand certain radical ideas well before anyone else "sees the light." However, my own radical ideas are not telling whole research fields that if they succeed they're bound to kill everybody ... so it's a somewhat different situation.

What is the Argument for the Scary Idea?

Although an intense interest in rationalism is one of the hallmarks of the SIAI community, still I have not yet seen a clear logical argument for the Scary Idea laid out anywhere. (If I'm wrong, please send me the link, and I'll revise this post accordingly. Be aware that I've already at least skimmed everything Eliezer Yudkowsky has written on related topics.)

So if one wants a clear argument for the Scary Idea, one basically has to construct it oneself.

As far as I can tell from discussions and the available online material, some main ingredients of people's reasons for believing the Scary Idea are ideas like:

1. If one pulled a random mind from the space of all possible minds, the odds of it being friendly to humans (as opposed to, e.g., utterly ignoring us, and being willing to repurpose our molecules for its own ends) are very low.
2. Human value is fragile as well as complex, so if you create an AGI with a roughly-human-like value system, then this may not be good enough, and it is likely to rapidly diverge into something with little or no respect for human values.
3. "Hard takeoffs" (in which AGIs recursively self-improve and massively increase their intelligence) are fairly likely once AGI reaches a certain level of intelligence; and humans will have little hope of stopping these events.
4. A hard takeoff, unless it starts from an AGI designed in a "provably Friendly" way, is highly likely to lead to an AGI system that doesn't respect the rights of humans to exist.

I emphasize that I am not quoting any particular thinker associated with SIAI here. I'm merely summarizing, in my own words, ideas that I've heard and read very often from various individuals associated with SIAI.

If you put the above points all together, you come up with a heuristic argument for the Scary Idea. Roughly, the argument goes something like: If someone builds an advanced AGI without a provably Friendly architecture, probably it will have a hard takeoff, and then probably this will lead to a superhuman AGI system with an architecture drawn from the vast majority of mind-architectures that are not sufficiently harmonious with the complex, fragile human value system to make humans happy and keep humans around.

The line of argument makes sense, if you accept the premises.

But, I don't.

I think the first of the above points is reasonably plausible, though I'm not by any means convinced. I think the relation between breadth of intelligence and depth of empathy is a subtle issue which none of us fully understands (yet). It's possible that with sufficient real-world intelligence tends to come a sense of connectedness with the universe that militates against squashing other sentiences. But I'm not terribly certain of this, any more than I'm terribly certain of its opposite.

I agree much less with the final three points listed above. And I haven't seen any careful logical arguments for these points.

I doubt human value is particularly fragile. Human value has evolved and morphed over time and will continue to do so. It already takes multiple different forms. It will likely evolve in future in coordination with AGI and other technology. I think it's fairly robust.

I think a hard takeoff is possible, though I don't know how to estimate the odds of one occurring with any high confidence. I think it's very unlikely to occur until we have an AGI system that has very obviously demonstrated general intelligence at the level of a highly intelligent human. And I think the path to this "hard takeoff enabling" level of general intelligence is going to be somewhat gradual, not extremely sudden.

I don't have any strong sense of the probability of a hard takeoff, from an apparently but not provably human-friendly AGI, leading to an outcome likable to humans. I suspect this probability depends on many features of the AGI, which we will identify over the next years & decades via theorizing based on the results of experimentation with early-stage AGIs.

Yes, you may argue: the Scary Idea hasn't been rigorously shown to be true… but what if it IS true?

OK but ... pointing out that something scary is possible, is a very different thing from having an argument that it's likely.

The Scary Idea is certainly something to keep in mind, but there are also many other risks to keep in mind, some much more definite and palpable. Personally, I'm a lot more worried about nasty humans taking early-stage AGIs and using them for massive destruction, than about speculative risks associated with little-understood events like hard takeoffs.

Is Provably Safe or "Friendly" AGI A Feasible Idea?

The Scary Idea posits that if someone creates advanced AGI that isn't somehow provably safe, it's almost sure to kill us all.

But not only am I unconvinced of this, I'm also quite unconvinced that "provably safe" AGI is even feasible.

The idea of provably safe AGI is typically presented as something that would exist within mathematical computation theory or some variant thereof. So that's one obvious limitation of the idea: mathematical computers don't exist in the real world, and real-world physical computers must be interpreted in terms of the laws of physics, and humans' best understanding of the "laws" of physics seems to radically change from time to time. So even if there were a design for provably safe real-world AGI, based on current physics, the relevance of the proof might go out the window when physics next gets revised.

Also, there are always possibilities like: the alien race that is watching us and waiting for us to achieve an IQ of 333, at which point it will swoop down upon us and eat us, or merge with us. We can't rule this out via any formal proof, and we can't meaningfully estimate the odds of it either. Yes, this sounds science-fictional and outlandish; but is it really more outlandish and speculative than the Scary Idea?

A possibility that strikes me as highly likely is that, once we have created advanced AGI and have linked our brains with it collectively, most of our old legacy human ideas (including physical law, aliens, and Friendly AI) will seem extremely limited and ridiculous.

Another issue is that the goal of "Friendliness to humans" or "safety" or whatever you want to call it, is rather nebulous and difficult to pin down. Science fiction has explored this theme extensively. So even if we could prove something about "smart AGI systems with a certain architecture that are guaranteed to achieve goal G," it might be infeasible to apply this to make AGI systems that are safe in the real world -- simply because we don't know how to boil down the everyday intuitive notions of "safety" or "Friendliness" into a mathematically precise goal G like the one the proof refers to.

This is related to the point Eliezer Yudkowsky makes that "value is complex" -- actually, human value is not only complex, it's nebulous and fuzzy and ever-shifting, and humans largely grok it by implicit procedural, empathic and episodic knowledge rather than explicit declarative or linguistic knowledge. Transmitting human values to an AGI is likely to be best done via interacting with the AGI in real life, but this is not the sort of process that readily lends itself to guarantees or formalization.

Eliezer has suggested a speculative way of getting human values into AGI systems called Coherent Extrapolated Volition, but I think this is a very science-fictional and incredibly infeasible idea (though a great SF notion). I've discussed it and proposed some possibly more realistic alternatives in a previous blog post (e.g. a notion called Coherent Aggregated Volition). But my proposed alternatives aren't guaranteed-to-succeed nor neatly formalized.

But setting those worries aside, is the computation-theoretic version of provably safe AI even possible? Could one design an AGI system and prove in advance that, given certain reasonable assumptions about physics and its environment, it would never veer too far from its initial goal (e.g. a formalized version of the goal of treating humans safely, or whatever)?

I very much doubt one can do so, except via designing a fictitious AGI that can't really be implemented because it uses an infeasible amount of computational resources. My GOLEM design, sketched in this article, seems to me a possible path to a provably safe AGI -- but it's too computationally wasteful to be practically feasible.

I strongly suspect that to achieve high levels of general intelligence using realistically limited computational resources, one is going to need to build systems with a nontrivial degree of fundamental unpredictability to them. This is what neuroscience suggests, it's what my concrete AGI design work suggests, and it's what my theoretical work on GOLEM and related ideas suggests. And none of the public output of SIAI researchers or enthusiasts has given me any reason to believe otherwise, yet.

Practical Implications

The above discussion of SIAI's Scary Idea may just sound like fun science-fictional speculation -- but the reason I'm writing this blog post is that when I posted a recent blog post about my current AGI project, the comments field got swamped with SIAI-influenced people saying stuff in the vein of: Creating an AGI without a proof of Friendliness is essentially equivalent to killing all people! So I really hope your OpenCog work fails, so you don't kill everybody!!!

(One amusing/alarming quote from a commentator (probably not someone directly affiliated with SIAI) was "if you go ahead with an AGI when you're not 100% sure that it's safe, you're committing the Holocaust." But it wasn't just one extreme commentator, it was a bunch … and then a bunch of others commenting to me privately via email.)

If one fully accepts SIAI's Scary Idea, then one should not work on practical AGI projects, nor should one publish papers on the theory of how to build AGI systems. Instead, one should spend one's time trying to figure out an AGI design that is somehow provable-in-advance to be a Good Guy. For this reason, SIAI's research group is not currently trying to do any practical AGI work.

Actually, so far as I know, my "GOLEM" AGI design (mentioned above) is closer to a "provably Friendly AI" than anything the SIAI research team has come up with. At least, it's closer than anything they have made public.

However, GOLEM is not something that could be practically implemented in the near future. It's horribly computationally inefficient, compared to a real-world AGI design like the OpenCog system I'm now working on (with many others -- actually I'm doing very little programming these days, so happily the project is moving forward with the help of others on the software design and coding side, while I contribute at the algorithm, math, design, theory, management and fundraising levels).

I agree that AGI ethics is a Very Important Problem. But I doubt the problem is most effectively addressed by theory alone. I think the way to come to a useful real-world understanding of AGI ethics is going to be to:

1. build some early-stage AGI systems, e.g. artificial toddlers, scientists' helpers, video game characters, robot maids and butlers, etc.
2. study these early-stage AGI systems empirically, with a focus on their ethics as well as their cognition
3. in the usual manner of science, attempt to arrive at a solid theory of AGI intelligence and ethics based on a combination of conceptual and experimental-data considerations
4. have humanity collectively plot the next steps from there, based on the theory we find: maybe we go ahead and create a superhuman AI capable of hard takeoff, maybe we pause AGI development because of the risks, maybe we build an "AGI Nanny" to watch over the human race and prevent AGI or other technologies from going awry. Whatever choice we make then, it will be made based on far better knowledge than we have right now.

So what's wrong with this approach?

Nothing, really -- if you hold the views of most AI researchers or futurists. There are plenty of disagreements about the right path to AGI, but wide and implicit agreement that something like the above path is sensible.

But, if you adhere to SIAI's Scary Idea, there's a big problem with this approach -- because, according to the Scary Idea, there's too huge of a risk that these early-stage AGI systems are going to experience a hard takeoff and self-modify into something that will destroy us all.

But I just don't buy the Scary Idea.

I do see a real risk that, if we proceed in the manner I'm advocating, some nasty people will take the early-stage AGIs and either use them for bad ends, or proceed to hastily create a superhuman AGI that then does bad things of its own volition. These are real risks that must be thought about hard, and protected against as necessary. But they are different from the Scary Idea. And they are not so different from the risks implicit in a host of other advanced technologies.

Conclusion

So, there we go.

I think SIAI is performing a useful service by helping bring these sorts of ideas to the attention of the futurist community (alongside the other services they're performing, like the wonderful Singularity Summits). But, that said, I think the Scary Idea is potentially a harmful one. At least, it WOULD be a harmful one, if more people believed it; so I'm glad it's currently restricted to a rather small subset of the futurist community.

Many people die each day, and many others are miserable for various reasons -- and all sorts of other advanced and potentially dangerous technologies are currently under active development. My own view is that unaided human minds may well be unable to deal with the complexity and risk of the world that human technology is unleashing. I actually suspect that our best hope for survival and growth through the 21st century is to create advanced AGIs to help us on our way -- to cure disease, to develop nanotech and better AGI and invent new technologies; and to help us keep nasty people from doing destructive things with advanced technology.

I think that to avoid actively developing AGI, out of speculative concerns like the Scary Idea, would be an extremely bad idea.

That is, rather than "if you go ahead with an AGI when you're not 100% sure that it's safe, you're committing the Holocaust," I suppose my view is closer to "if you avoid creating beneficial AGI because of speculative concerns, then you're killing my grandma"!! (Because advanced AGI will surely be able to help us cure human diseases and vastly extend and improve human life.)

So perhaps I could adopt the slogan: "You don't have to kill my grandma to avoid the Holocaust!" … but really, folks… Well, you get the point….

Humanity is on a risky course altogether, but no matter what I decide to do with my life and career (and no matter what Bill Joy or Jaron Lanier or Bill McKibben, etc., write), the race is not going to voluntarily halt technological progress. It's just not happening.

We just need to accept the risk, embrace the thrill of the amazing time we were born into, and try our best to develop near-inevitable technologies like AGI in a responsible and ethical way.

And to me, responsible AGI development doesn't mean fixating on speculative possible dangers and halting development until ill-defined, likely-unsolvable theoretical/philosophical issues are worked out to everybody's (or some elite group's) satisfaction.

Rather, it means proceeding with the work carefully and openly, learning what we can as we move along -- and letting experiment and theory grow together ... as they have been doing quite successfully for the last few centuries, at a fantastically accelerating pace.

And so it goes.
