habryka

Running Lightcone Infrastructure, which runs LessWrong. You can reach me at habryka@lesswrong.com

Sequences

A Moderate Update to your Artificial Priors
A Moderate Update to your Organic Priors
Concepts in formal epistemology

Wiki Contributions


Comments

habryka · 15h · 40

I did that! (I am the primary admin of the site.) I copied your comment here just before I took down your duplicate post, to make sure it doesn't get lost.

habryka · 16h · 50

@henry (who seems to know Nicky) said on a duplicate link post of this: 

This is an accessible introduction to AI Safety, written by Nicky Case and the teens at Hack Club. So far, part 1/3 is completed, which covers a rough timeline of AI advancement up to this point, and what might come next.

If you've got feedback as to how this can be made more understandable, that'd be appreciated! Reach out to Nicky, or to me and I'll get the message to her.

habryka · 17h · 70

@jefftk comments on the HN thread on this:

How many people would, if they suddenly died, be reported as a "Boeing whistleblower"? The lower this number is, the more surprising the death.

Another HN commenter says (in a different thread): 

It’s a nice little math problem.

Let's say both of the whistleblowers were age 50. The probability of a 50-year-old man dying in a year is 0.6%. So the probability of 2 or more of N such people dying in a year is 1 - (the probability of exactly zero dying + the probability of exactly one dying), i.e. 1 - (A + B).

A is (1 - 0.006)^N and B is 0.006 · N · (1 - 0.006)^(N-1). At N = 60, A is about 70% and B is about 25%, making the result statistically insignificant.

But they died in the same 2 month period, so that 0.006 should be 0.001. If you rerun the same calculation, it’s 356.
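For concreteness, here is a minimal sketch of that back-of-the-envelope calculation (my own reproduction, not the commenter's code), assuming N = 60 people who could plausibly be reported as a "Boeing whistleblower", the quoted 0.6% annual (roughly 0.1% per two months) death probability, and independent deaths:

```python
from math import comb

def p_two_or_more(n, p):
    """Probability that at least 2 of n people die, given per-person death probability p."""
    p_zero = (1 - p) ** n                        # A: nobody dies
    p_one = comb(n, 1) * p * (1 - p) ** (n - 1)  # B: exactly one dies
    return 1 - (p_zero + p_one)

print(p_two_or_more(60, 0.006))  # ~0.05  -- two or more deaths within a year
print(p_two_or_more(60, 0.001))  # ~0.002 -- two or more deaths within the same two months
```

The exact figures depend heavily on the assumed N and base rate, which is why jefftk's question about how many people would count as a "Boeing whistleblower" does most of the work here.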

habryka · 17h · 3415

Does anyone have any takes on the two Boeing whistleblowers who died under somewhat suspicious circumstances? I haven't followed this in detail, and my guess is it's basically just random chance, but it sure would be a huge deal if a publicly traded company were now performing assassinations of U.S. citizens.

Curious whether anyone has looked into this, or has thought much about baseline risk of assassinations or other forms of violence from economic actors.

habryka · 21h · Ω17324

Can you confirm or deny whether you signed any NDA related to you leaving OpenAI? 

(I would guess that a "no comment", a lack of response, or something to that effect implies a "yes" with reasonably high probability. Also, when deciding how to respond here, you might be interested in this link: the National Labor Relations Board has ruled that NDAs offered during severance agreements which cover the existence of the NDA itself are unlawful.)

habryka · 21h · Ω10143

Thank you for your work there. Curious what specifically prompted you to post this now; presumably your leaving OpenAI and wanting to communicate that somehow?

habryka · 2d · 3017

Sorry, what? I thought the fear was that we don't know how to make helpful AI at all. (And that people who think they're being helped by seductively helpful-sounding LLM assistants are being misled by surface appearances; the shoggoth underneath has its own desires that we won't like when it's powerful enough to pursue them autonomously.) In contrast, this almost makes it sound like you think it is plausible to align AI to its user's intent, but that this would be bad if the users aren't one of "us"—you know, the good alignment researchers who want to use AI to take over the universe, totally unlike those evil capabilities researchers who want to use AI to produce economically valuable goods and services.

My steelman of this (though to be clear I think your comment makes good points): 

There is a large difference between a system being helpful and a system being aligned. Ultimately, AI existential risk is a coordination problem: I expect catastrophic consequences because a bunch of people want to build AGI without making it safe. Therefore, making technologies that, in a naive and short-term sense, just help AGI developers build whatever they want to build will have bad consequences. If I trusted everyone to use their intelligence only for good things, we wouldn't have anthropogenic existential risk on our hands.

Some of those technologies might also end up useful for getting the AI to be more properly aligned, or for helping with work that reduces the risk of AI catastrophe in some other way, though my current sense is that that kind of work is pretty different and doesn't benefit remotely as much from generically locally-helpful AI.

In general, I feel pretty sad about conflating "alignment" with "short-term intent alignment". I think the two problems are related but have crucial differences; I don't think the latter generalizes that well to the former (for all the usual sycophancy/treacherous-turn reasons), and indeed progress on the latter IMO mostly makes the world marginally worse, because the thing it is most likely to be used for is developing existentially dangerous AI systems faster.

Edit: Another really important dimension to model here is not just the effect of this kind of research on what individual researchers will do, but also the effect it will have on what the market wants to invest in. My standard story of doom is centrally rooted in there being very strong short-term individual economic incentives to build more capable AGI, enabling people to make billions to trillions of dollars, while the downside risk is a distributed negative externality that is not at all priced into the costs of AI development. Developing applications of AI that make a lot of money without accounting for the negative extinction externalities can therefore be quite bad for the world.
