Charbel-Raphaël

Charbel-Raphael Segerie

https://crsegerie.github.io/ 

Living in Paris


Comments

Thanks for this comment, but I think this might be a bit overconfident.

constantly fighting off the mitigations that humans are using to try to detect them and shut them down.

Yes, I have no doubt that if humans implement some kind of defense, this will slow down ARA a lot. But:

  • 1) It’s not even clear people are going to try to react in the first place. As I said, most AI development is positive, and if you implement regulations to fight bad ARA, you also hinder the whole ecosystem. It’s not clear to me that we are going to do anything about open source. You would need a big warning shot beforehand, and it’s not clear to me that one happens before a catastrophic level. People will clearly react to some kinds of ARA (like ChaosGPT), but there may be some ARAs they won’t react to at all.
  • 2) It’s not clear this defense (say, Know Your Customer requirements for providers) would be effective enough to clean up the whole mess. If the AI is able to hide successfully on laptops and cooperate with some humans, it is going to be really hard to shut down. We would have to live with this endemic virus. The only way around this is cleaning up the virus with some sort of pivotal act, and I really don’t like that.

  While doing all that, in order to stay relevant, they'll need to recursively self-improve at the same rate at which leading AI labs are making progress, but with far fewer computational resources.

"at the same rate" not necessarily. If we don't solve alignment and we implement a pause on AI development in labs, the ARA AI may still continue to develop. The real crux is how much time the ARA AI needs to evolve into something scary.

Superintelligences could do all of this, and ARA of superintelligences would be pretty terrible. But for models in the broad human or slightly-superhuman ballpark, ARA seems overrated, compared with threat models that involve subverting key human institutions.

This doesn't tell us much. From my side, I think superintelligence is not going to be neglected, and big labs are already taking it seriously. I’m still unsure about ARA.

Remember, while the ARA models are trying to survive, there will be millions of other (potentially misaligned) models being deployed deliberately by humans, including on very sensitive tasks (like recursive self-improvement). These seem much more concerning.

This is not the central point. The central point is:

  • At some point, ARA becomes unshutdownable unless you try hard with a pivotal cleaning act. We may be stuck with a ChaosGPT forever, which is not existential, but pretty annoying. People are going to die.
  • The ARA evolves over time. Maybe this evolution is very slow, maybe fast; maybe it plateaus, maybe it doesn’t. I don’t know.
  • This may take an indefinite number of years, but it can still be a problem.

the "natural selection favors AIs over humans" argument is a fairly weak one; you can find some comments I've made about this by searching my twitter.

I’m pretty surprised by this. I’ve tried googling and have not found anything.

Overall, I think this still deserves more research

Why not! There are many, many questions that weren’t discussed here because I wanted to focus on the core part of the argument. But I agree details and scenarios are important, even if I think they shouldn’t change the basic picture depicted in the OP too much.

Here are some important questions that were deliberately omitted from the Q&A, for the sake of not including things that fluctuate too much in my head:

  1. Would we react before the point of no return?
  2. Where should we place the red line? Should this red line apply to labs?
  3. Is this going to be exponential? Do we care?
  4. What would it look like if we used a counter-agent that was human-aligned?
  5. What can we concretely do about it now? Is KYC something we should advocate for?
  6. Don’t you think an AI capable of ARA would be superintelligent and take over anyway?
  7. What are the short-term bad consequences of early ARA? What does the transition scenario look like?
  8. Is it even possible to coordinate worldwide if we agree that we should?
  9. How much human involvement will be needed to bootstrap the first ARAs?

We plan to write more about these with @Épiphanie Gédéon  in the future, but first it's necessary to discuss the basic picture a bit more.

Thanks for writing this.

I like your writing style; it inspired me to read a few more things.

[We don't think this long-term vision is a core part of constructability, which is why we didn't put it in the main post.]

We asked ourselves what we should do if constructability works in the long run.

We are unsure, but here are several possibilities.

Constructability could lead to different outcomes depending on how well it works, from most to least ambitious:

  1. Using GPT-6 to implement GPT-7-white-box (foom?)
  2. Using GPT-6 to implement GPT-6-white-box
  3. Using GPT-6 to implement GPT-4-white-box
  4. Using GPT-6 to implement Alexa++, a humanoid housekeeper robot that cannot learn
  5. Using GPT-6 to implement AlexNet-white-box
  6. Using GPT-6 to implement a transparent expert system that filters CVs without using protected features (a toy sketch of this follows the list)
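
To make the least ambitious item concrete, here is a toy sketch (our illustration, not something from the post, with invented field names and thresholds) of what a fully plain-coded, auditable CV filter could look like: every rule is explicit and traceable, and protected attributes never even enter the data structure, so no rule can condition on them.

```python
# Toy sketch of a fully plain-coded, auditable CV filter.
# Field names and thresholds are invented for illustration only.

from dataclasses import dataclass

@dataclass
class CV:
    years_of_experience: float
    required_skills_matched: int   # how many of the job's required skills appear in the CV
    required_skills_total: int
    has_relevant_degree: bool
    # Protected features (e.g. gender, age, nationality) are deliberately absent
    # from this data structure, so no rule can ever condition on them.

def passes_filter(cv: CV) -> tuple[bool, list[str]]:
    """Return a decision plus a human-readable trace of every rule that fired."""
    trace = []
    skill_ratio = cv.required_skills_matched / max(cv.required_skills_total, 1)

    if skill_ratio < 0.5:
        trace.append(f"rejected: only {skill_ratio:.0%} of required skills matched")
        return False, trace
    trace.append(f"ok: {skill_ratio:.0%} of required skills matched")

    if cv.years_of_experience < 2 and not cv.has_relevant_degree:
        trace.append("rejected: <2 years of experience and no relevant degree")
        return False, trace
    trace.append("ok: experience/degree requirement satisfied")

    return True, trace

decision, trace = passes_filter(CV(3.0, 4, 5, has_relevant_degree=False))
print(decision, trace)
```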

Comprehensive AI services path

We aim to reach the level of Alexa++, which would already be very useful: no more breaking your back to pick up potatoes. Compared to the robot Figure01, which could kill you if your neighbor jailbreaks it, our robot seems safer: it would not have the capacity to kill, only to put the plates in the dishwasher, in the same way that today’s Alexa cannot insult you.

Fully autonomous AGI, even if transparent, is too dangerous. We think that aiming for something like Comprehensive AI Services would be safer. Our plan would be part of this, allowing for the creation of many small, capable AIs that can be composed together (for instance, a humanoid housekeeper with one function to do the dishes, one function to walk the dog, …).
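
As a loose illustration of this composition idea (an assumption about how it could be wired, not a specification), the housekeeper could be a thin routing layer over a fixed whitelist of narrow skills, with no mechanism for acquiring new capabilities at runtime. All names below are invented:

```python
# Hedged sketch of the "many small capable AIs composed together" idea.
# Each skill is a separate, narrowly scoped function; the composition layer can
# only route between a fixed whitelist of skills and cannot learn new ones.

from typing import Callable

def do_dishes(instruction: str) -> str:
    return "loading plates into the dishwasher"   # narrow skill, no general capability

def walk_dog(instruction: str) -> str:
    return "walking the dog on the usual route"

# The skill table is fixed at build time: the housekeeper cannot extend it.
SKILLS: dict[str, Callable[[str], str]] = {
    "dishes": do_dishes,
    "dog": walk_dog,
}

def housekeeper(request: str) -> str:
    """Route a request to exactly one whitelisted narrow skill, or refuse."""
    for keyword, skill in SKILLS.items():
        if keyword in request.lower():
            return skill(request)
    return "refused: no whitelisted skill matches this request"

print(housekeeper("Please take care of the dishes"))
print(housekeeper("Order me some knives"))   # falls outside the whitelist -> refused
```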

Alexa++ is not an AGI, but it is already fine. It even knows how to do a backflip, Boston Dynamics style. Not enough for a pivotal act, but so stylish. We can probably have a nice world without AGI in the wild.

The Liberation path

Another possible moonshot theory of impact would be to replace GPT-7 with GPT-7-plain-code. Maybe there is a "liberation speed" n at which we can use GPT-n to directly code GPT-p with p > n. That would be super cool, because it would free us from deep learning.

[Figure: Different long-term paths that we see with constructability.]

Guided meditation path

You are not really enlightened if you are not able to code yourself. 

Maybe we don't need to use something as powerful as GPT-7 to begin this journey.

We think that with significant human guidance, and by iterating many, many times, we could gradually meander towards a progressive deconstruction of GPT-5.

We could use current models as a reference to create slightly more transparent and understandable models, and use them as reference again and again until we arrive at a fully plain-coded model.
  • Going from GPT-5 to GPT-2-hybrid seems possible to us.
  • Improving GPT-2-hybrid to GPT-3-hybrid may be possible with the help of GPT-5?
  • ...
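
Here is a very rough sketch of the loop we have in mind, with the genuinely hard steps left as labelled stubs; distill_to_more_transparent and behaviour_matches below are placeholders for open research problems, not existing tools:

```python
# Rough sketch of the "guided meditation" loop. Everything here is hypothetical:
# the stubs stand in for open research problems, not existing tools.

def distill_to_more_transparent(reference_model, human_guidance: str):
    """Stub: produce a slightly more transparent/hybrid model imitating `reference_model`."""
    return f"more-transparent({reference_model})"

def behaviour_matches(candidate, reference_model, eval_suite) -> bool:
    """Stub: check that the candidate reproduces the reference's behaviour on an eval suite."""
    return True

def guided_meditation(initial_model, steps: int, eval_suite=None):
    reference = initial_model
    for step in range(steps):
        candidate = distill_to_more_transparent(reference, human_guidance=f"iteration {step}")
        if not behaviour_matches(candidate, reference, eval_suite):
            break                     # stop and get more human guidance
        reference = candidate         # the more transparent model becomes the new reference
    return reference

print(guided_meditation("GPT-5", steps=3))
# -> "more-transparent(more-transparent(more-transparent(GPT-5)))"
```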

If successful, this path could unlock the development of future AIs using constructability instead of deep learning. If constructability done right is more data efficient than deep learning, it could simply replace deep learning and become the dominant paradigm. This would be a much better endgame position for humans to control and develop future advanced AIs.

Path | Feasibility | Safety
Comprehensive AI Services | Very feasible | Very safe, but unstable in the very long run
Liberation | Feasible | Unsafe, but could enable a pivotal act that makes things stable in the long run
Guided Meditation | Very hard | Fairly safe, and could unlock a safer technology than deep learning, resulting in a better end-game position for humanity

You might be interested in reading this. I think you are reasoning within an incorrect framing.

I have tried Camille's in-person workshop in the past and was very happy with it. I highly recommend it. It helped me discover many unknown unknowns.

Answer by Charbel-Raphaël

Deleted paragraph from the post that might answer the question:

Surprisingly, the same study found that even if there were an escalation of warning shots that ended up killing 100k people or causing more than $10 billion in damage (definition), skeptics would only update their estimate from 0.10% to 0.25% [1]. There is a lot of inertia, we are not even sure this kind of “strong” warning shot would happen, and I suspect such a big warning shot could occur beyond the point of no return, because this type of warning shot requires autonomous replication and adaptation abilities in the wild.

  1. ^

    It may be because they expect a strong public reaction. But even if there were a 10-year global pause, what would happen after the pause? This explanation does not convince me. Did governments prepare for the next Covid?
