Quick Takes

robo

Our current big stupid: not preparing for 40% agreement

Epistemic status: lukewarm take from the gut (not brain) that feels rightish

The "Big Stupid" of the AI doomers 2013-2023 was that AI nerds' solution to the problem "How do we stop people from building dangerous AIs?" was "research how to build AIs".  Methods normal people would consider to stop people from building dangerous AIs, like asking governments to make it illegal to build dangerous AIs, were considered gauche.  When the public turned out to be somewhat receptive to the idea of regulating ... (read more)

TsviBT
Well I asked this https://www.lesswrong.com/posts/X9Z9vdG7kEFTBkA6h/what-could-a-policy-banning-agi-look-like but roughly no one was interested--I had to learn about "born secret" https://en.wikipedia.org/wiki/Born_secret from Eric Weinstein in a youtube video. FYI, while restricting compute manufacture is I would guess net helpful, it's far from a solution. People can make plenty of conceptual progress given current levels of compute https://www.lesswrong.com/posts/sTDfraZab47KiRMmT/views-on-when-agi-comes-and-on-strategy-to-reduce . It's not a way out, either. There are ways possibly-out. But approximately no one is interested in them.
robo

I think this is the sort of conversation we should be having!  [Side note: I think restricting compute is more effective than restricting research because you don't need 100% buy-in.

  1. It's easier to prevent people from manufacturing semiconductors than to keep people from learning ideas that fit on a napkin.
  2. It's easier to prevent scientists in Eaccistan from having GPUs than to prevent scientists in Eaccistan from thinking.

The analogy to nuclear weapons is, I think, a good one.  The science behind nuclear weapons is well known -- what keeps them fro... (read more)

gilch
Not quite. It was to research how to build friendly AIs. We haven't succeeded yet. What research progress we have made points to the problem being harder than initially thought, and capabilities turned out to be easier than most of us expected as well.

Considered by whom? Rationalists? The public? The public would not have been so supportive before ChatGPT, because most everybody didn't expect general AI so soon, if they thought about the topic at all. It wasn't an option at the time. Talking about this at all was weird, or at least niche, certainly not something one could reasonably expect politicians to care about. That has changed, but only recently. I don't particularly disagree with your prescription in the short term, just your history. That said, politics isn't exactly our strong suit.

But even if we get a pause, this only buys us some time. In the long(er) term, I think either the Singularity or some kind of existential catastrophe is inevitable. Those are the attractor states. Our current economic growth isn't sustainable without technological progress to go with it. Without that, we're looking at civilizational collapse. But with that, we're looking at ever widening blast radii for accidents or misuse of more and more powerful technology. Either we get smarter about managing our collective problems, or they will eventually kill us.

Friendly AI looked like the way to do that. If we solve that one problem, even without world cooperation, it solves all the others for us. It's probably not the only way, but it's not clear the alternatives are any easier. What would you suggest? I can think of three alternatives. First, the most mundane (but perhaps most difficult), would be an adequate world government. This would be an institution that could easily solve climate change, ban nuclear weapons (and wars in general), etc. Even modern stable democracies are mostly not competent enough. Autocracies are an obstacle, and some of them have nukes. We are not on tra

Very Spicy Take

Epistemic Note: 
Many highly respected community members with substantially greater decision-making experience (and LessWrong karma) presumably disagree strongly with my conclusion.

Premise 1: 
It is becoming increasingly clear that OpenAI is not appropriately prioritizing safety over advancing capabilities research.

Premise 2:
This was the default outcome. 

Instances in history in which private companies (or any individual humans) have intentionally turned down huge profits and power are the exception, not the rule. 

Premise 3:... (read more)


I don't know the answer, but it would be fun to have a twitter comment with a zillion likes asking Sam Altman this question.  Maybe someone should make one?

Rebecca
OpenAI wasn’t a private company (ie for-profit) at the time of the OP grant though.
Ebenezer Dukakis
So basically, I think it is a bad idea and you think we can't do it anyway. In that case let's stop calling for it, and call for something more compassionate and realistic like a public apology. I'll bet an apology would be a more effective way to pressure OpenAI to clean up its act anyways. Which is a better headline -- "OpenAI cofounder apologizes for their role in creating OpenAI", or some sort of internal EA movement drama? If we can generate a steady stream of negative headlines about OpenAI, there's a chance that Sam is declared too much of a PR and regulatory liability. I don't think it's a particularly good plan, but I haven't heard a better one.

I promise I won't just continue to re-post a bunch of papers, but this one seems relevant to many around these parts. In particular @Elizabeth (also, sorry if you dislike being at-ed like that).

Associations of dietary patterns with brain health from behavioral, neuroimaging, biochemical and genetic analyses

Food preferences significantly influence dietary choices, yet understanding natural dietary patterns in populations remains limited. Here we identify four dietary subtypes by applying data-driven approaches to food-liking data from 181,990 UK Biobank pa

... (read more)

Regarding the situation at OpenAI, I think it's important to keep a few historical facts in mind:

  1. The AI alignment community has long stated that an ideal FAI project would have a lead over competing projects. See e.g. this post:

Requisite resource levels: The project must have adequate resources to compete at the frontier of AGI development, including whatever mix of computational resources, intellectual labor, and closed insights are required to produce a 1+ year lead over less cautious competing projects.

  2. The scaling hypothesis wasn't obviously t
... (read more)

A Theory of Usable Information Under Computational Constraints

We propose a new framework for reasoning about information in complex systems. Our foundation is based on a variational extension of Shannon's information theory that takes into account the modeling power and computational constraints of the observer. The resulting *predictive V-information* encompasses mutual information and other notions of informativeness such as the coefficient of determination. Unlike Shannon's mutual information and in violation of the data processing inequality, V-

... (read more)
Alexander Gietelink Oldenziel
Can somebody explain to me what's happening in this paper?

My reading is their definition of conditional predictive entropy is the naive generalization of Shannon's conditional entropy given that the way that you condition on some data is restricted to only being able to implement functions of a particular class. And the corresponding generalization of mutual information becomes a measure of how much more predictable does some piece of information become (Y) given evidence (X) compared to no evidence.

For example, the goal of public key cryptography cannot be to make the mutual information between a plaintext, and ... (read more)
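If that reading is right, the paper's central definitions would look something like the following (my paraphrase of the setup, so possibly off in details): the predictive V-entropy is the best expected log-loss achievable by predictors restricted to a class V, and predictive V-information is the resulting gain in predictability of Y from conditioning on X:

```latex
H_V(Y \mid X) = \inf_{f \in V} \; \mathbb{E}_{x, y \sim X, Y}\left[ -\log f[x](y) \right]
\qquad
I_V(X \to Y) = H_V(Y \mid \varnothing) - H_V(Y \mid X)
```

When V is the class of all functions this recovers Shannon mutual information; restricting V (e.g. to polynomial-time predictors) is exactly what allows the data processing inequality to fail, as in the public-key cryptography example.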

[comment deleted]

My timelines are lengthening. 

I've long been a skeptic of scaling LLMs to AGI *. I fundamentally don't understand how this is even possible. It must be said that very smart people give this view credence: davidad, dmurfet. On the other side are Vanessa Kosoy and Steven Byrnes. When pushed, proponents don't actually defend the position that a large enough transformer will create nanotech or even obsolete their job. They usually mumble something about scaffolding.

I won't get into this debate here but I do want to note that my timelines have lengthe... (read more)

Alexander Gietelink Oldenziel
In my mainline model there are only a few innovations needed, perhaps only a single big one, to produce an AGI which, just like the Turing machine sits at the top of the Chomsky hierarchy, will be basically the optimal architecture given resource constraints. There are probably some minor improvements to do with bridging the gap between the theoretically optimal architecture and the actual architecture, or parts of the algorithm that can be indefinitely improved but with diminishing returns (these probably exist due to Levin, and possibly matrix multiplication is one of these). On the whole I expect AI research to be very chunky. Indeed, we've seen that there was really just one big idea behind all current AI progress: scaling, specifically scaling GPUs on maximally large undifferentiated datasets. There were some minor technical innovations needed to pull this off, but on the whole that was the clincher. Of course, I don't know. Nobody knows. But I find this the most plausible guess based on what we know about intelligence, learning, theoretical computer science and science in general.

There are two kinds of relevant hypothetical innovations: those that enable chatbot-led autonomous research, and those that enable superintelligence. It's plausible that there is no need for (more of) the former, so that mere scaling through human efforts will lead to such chatbots in a few years regardless. (I think it's essentially inevitable that there is currently enough compute that with appropriate innovations we can get such autonomous human-scale-genius chatbots, but it's unclear if these innovations are necessary or easy to discover.) If autonomou... (read more)

Alexander Gietelink Oldenziel
My timelines were not 2026. In fact, I made bets against doomers 2-3 years ago, one will resolve by next year. I agree iterative improvements are significant. This falls under "naive extrapolation of scaling laws". By nanotech I mean something akin to drexlerian nanotech or something similarly transformative in the vicinity. I think it is plausible that a true ASI will be able to make rapid progress (perhaps on the order of a few years or a decade) on nanotech. I suspect that people that don't take this as a serious possibility haven't really thought through what AGI/ASI means + what the limits and drivers of science and tech really are; I suspect they are simply falling prey to status-quo bias.

I'm surprised at people who seem to be updating only now about OpenAI being very irresponsible, rather than updating when they created a giant public competitive market for chatbots (which contains plenty of labs that don't care about alignment at all), thereby reducing how long everyone has to solve alignment. I still parse that move as devastating the commons in order to make a quick buck.


I disagree. This whole saga has introduced the Effective Altruism movement to people at labs that hadn't thought about alignment.

From my understanding, OpenAI isn't anywhere close to breaking even from ChatGPT, and I can't think of any way a chatbot could actually be monetized.

Ebenezer Dukakis
In the spirit of trying to understand what actually went wrong here -- IIRC, OpenAI didn't expect ChatGPT to blow up the way it did. Seems like they were playing a strategy of "release cool demos" as opposed to "create a giant competitive market".
Garrett Baker
Who is updating? I haven't seen anyone change their mind yet.

Sometimes I forget to take a dose of methylphenidate. As my previous dose fades away, I start to feel much worse than baseline. I then think "Oh no, I'm feeling so bad, I will not be able to work at all."

But then I remember that I forgot to take a dose of methylphenidate and instantly I feel a lot better.

Usually, one of the worst things when I'm feeling down is that I don't know why. But now, I'm in this very peculiar situation where putting or not putting some particular object into my mouth is the actual cause. It's hard to imagine something more tangibl... (read more)

Lorxus

Wait, some of y'all were still holding your breaths for OpenAI to be net-positive in solving alignment?

After the whole "initially having to be reminded alignment is A Thing"? And going back on its word to go for-profit? And spinning up a weird and opaque corporate structure? And people being worried about Altman being power-seeking? And everything to do with the OAI board debacle? And OAI Very Seriously proposing what (still) looks to me to be like a souped-up version of Baby Alignment Researcher's Master Plan B (where A involves solving physics and C invo... (read more)

Akash

My current perspective is that criticism of AGI labs is an under-incentivized public good. I suspect there's a disproportionate amount of value that people could have by evaluating lab plans, publicly criticizing labs when they break commitments or make poor arguments, talking to journalists/policymakers about their concerns, etc.

Some quick thoughts:

  • Soft power – I think people underestimate how strong the "soft power" of labs is, particularly in the Bay Area.
  • Jobs – A large fraction of people getting involved in AI safety are interested in the
... (read more)
Zach Stein-Perlman
Sorry for brevity, I'm busy right now.

  1. Noticing good stuff labs do, not just criticizing them, is often helpful. I wish you thought of this work more as "evaluation" than "criticism."
  2. It's often important for evaluation to be quite truth-tracking. Criticism isn't obviously good by default.

Edit:

  3. I'm pretty sure OP likes good criticism of the labs; no comment on how OP is perceived. And I think I don't understand your "good judgment" point. Feedback I've gotten on AI Lab Watch from senior AI safety people has been overwhelmingly positive, and of course there's a selection effect in what I hear, but I'm quite sure most of them support such efforts.
  4. Conjecture (not exclusively) has done things that frustrated me, including in dimensions like being "'unilateralist,' 'not serious,' and 'untrustworthy.'" I think most criticism of Conjecture-related advocacy is legitimate and not just because people are opposed to criticizing labs.
  5. I do agree on "soft power" and some of "jobs." People often don't criticize the labs publicly because they're worried about negative effects on them, their org, or people associated with them.
Akash

RE 1 & 2:

Agreed— my main point here is that the marketplace of ideas undervalues criticism.

I think one perspective could be “we should all just aim to do objective truth-seeking”, and as stated I agree with it.

The main issue with that frame, imo, is that it’s very easy to forget that the epistemic environment can be tilted in favor of certain perspectives.

EG I think it can be useful for “objective truth-seeking efforts” to be aware of some of the culture/status games that underincentivize criticism of labs & amplify lab-friendly perspectives.

RE 3:

Go... (read more)

simeon_c

Idea: Daniel Kokotajlo probably lost quite a bit of money by not signing an OpenAI NDA before leaving, which I consider a public service at this point. Could some of the funders of the AI safety landscape give some money or social reward for this?

I guess reimbursing everything Daniel lost might be a bit too much for funders but providing some money, both to reward the act and incentivize future safety people to not sign NDAs would have a very high value. 

habryka
Yeah, at the time I didn't know how shady some of the contracts here were. I do think funding a legal defense is a marginally better use of funds (though my guess is funding both is worth it).
Yonatan Cale
@habryka , Would you reply to this comment if there's an opportunity to donate to either? Me and another person are interested, and others could follow this comment too if they wanted to (only if it's easy for you, I don't want to add an annoying task to your plate)
habryka

Sure, I'll try to post here if I know of a clear opportunity to donate to either. 

William_S

I worked at OpenAI for three years, from 2021-2024, on the Alignment team, which eventually became the Superalignment team. I worked on scalable oversight, as part of the team developing critiques as a technique for using language models to spot mistakes in other language models. I then worked to refine an idea from Nick Cammarata into a method for using language models to generate explanations for features in language models. I was then promoted to managing a team of 4 people which worked on trying to understand language model features in context, leading to t... (read more)


They would not know if others have signed the SAME NDAs without trading information about their own NDAs, which is forbidden.

tlevin
Kelsey Piper now reports: "I have seen the extremely restrictive off-boarding agreement that contains nondisclosure and non-disparagement provisions former OpenAI employees are subject to. It forbids them, for the rest of their lives, from criticizing their former employer. Even acknowledging that the NDA exists is a violation of it."
wassname
Interesting! For most of us, this is outside our area of competence, so appreciate your input.

From my perspective, the only thing that keeps the OpenAI situation from being all kinds of terrible is that I continue to think they're not close to human-level AGI, so it probably doesn't matter all that much.

This is also my take on AI doom in general; my P(doom|AGI soon) is quite high (>50% for sure), but my P(AGI soon) is low. In fact it decreased in the last 12 months.

TurnTrout

Apparently[1] there was recently some discussion of Survival Instinct in Offline Reinforcement Learning (NeurIPS 2023). The results are very interesting: 

On many benchmark datasets, offline RL can produce well-performing and safe policies even when trained with "wrong" reward labels, such as those that are zero everywhere or are negatives of the true rewards. This phenomenon cannot be easily explained by offline RL's return maximization objective. Moreover, it gives offline RL a degree of robustness that is uncharacteristic of its online RL count

... (read more)
Algon

Because future rewards are discounted

Don't you mean future values? Also, AFAICT, the only thing going on here that separates online from offline RL is that offline RL algorithms shape the initial value function to give conservative behaviour. And so you get conservative behaviour.

Tao Lin
One lesson you could take away from this is "pay attention to the data, not the process" - this happened because the data had longer successes than failures. If successes were more numerous than failures, many algorithms would have imitated those as well with null reward.
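That data-centric point can be seen in a toy example (my own construction, not from the paper): a pessimistic offline value-learning setup where every logged reward is zero still reproduces the dataset's survival behavior, because out-of-distribution actions start at a low value and nothing in the data ever corrects them.

```python
# Toy corridor MDP: action 0 = "stay alive", action 1 = "terminate".
# The logged data only ever takes action 0, and every logged reward is
# zero -- a "wrong" reward label in the paper's sense.

N_STATES, GAMMA = 5, 0.9

# Dataset of (state, action, reward, next_state): only survival transitions.
dataset = [(s, 0, 0.0, (s + 1) % N_STATES) for s in range(N_STATES)] * 20

# Pessimistic initialization: unseen (state, action) pairs start at -1 and
# are never updated, because no data covers them; covered pairs start at 0.
Q = [[-1.0, -1.0] for _ in range(N_STATES)]
for s, a, _, _ in dataset:
    Q[s][a] = 0.0

# Fitted Q iteration over the logged transitions only.
for _ in range(100):
    for s, a, r, s2 in dataset:
        Q[s][a] = r + GAMMA * max(Q[s2])

policy = [max(range(2), key=lambda a: Q[s][a]) for s in range(N_STATES)]
print(policy)  # [0, 0, 0, 0, 0] -- the policy imitates the data's survival behavior
```

Despite a null reward signal, the greedy policy "stays alive" in every state purely because that is what the data contains, which is the conservative-shaping mechanism the comments above describe.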
tailcalled
The paper sounds fine quality-wise to me, I just find it implausible that it's relevant for important alignment work, since the proposed mechanism is mainly an aversion to building new capabilities.

Several dozen people now presumably have Lumina in their mouths. Can we not simply crowdsource some assays of their saliva? I would chip money in to this. Key questions around ethanol levels, aldehyde levels, antibacterial levels, and whether the organism itself stays colonized at useful levels.

Lorxus
Surely so! Hit me up if you ever end up doing this - I'm likely getting the Lumina treatment in a couple months.
wassname
A before and after would be even better!
Lorxus

Any recommendations on how I should do that? You may assume that I know what a gas chromatograph is and what a Petri dish is and why you might want to use either or both of those for data collection, but not that I have any idea of how to most cost-effectively access either one as some rando who doesn't even have a MA in Chemistry.

  1. We inhabit this real material world, the one which we perceive all around us (and which somehow gives rise to perceptive and self-conscious beings like us).
  2. Though not all of our perceptions conform to a real material world. We may be fooled by things like illusions or hallucinations or dreams that mimic perceptions of this world but are actually all in our minds.
  3. Indeed if you examine your perceptions closely, you'll see that none of them actually give you representations of the material world, but merely reactions to it.
  4. In fact, since the only evidence we
... (read more)
Richard_Kennaway
Dragging files around in a GUI is a familiar action that does known things with known consequences. Somewhere on the hard disc (or SSD, or somewhere in the cloud, etc.) there is indeed a "file" which has indeed been "moved" into a "folder", and taking off those quotation marks only requires some background knowledge (which in fact I have) of the lower-level things that are going on and which the GUI presents to me through this visual metaphor.

Some explanations work better than others. The idea that there is stuff out there that gives rise to my perceptions, and which I can act on with predictable results, seems to me the obvious explanation that any other contender will have to do a great deal of work to topple from the plinth.

The various philosophical arguments over doctrines such as "idealism", "realism", and so on are more like a musical recreation (see my other comment) than anything to take seriously as a search for truth. They are hardly the sort of thing that can be right or wrong, and to the extent that they are, they are all wrong. Ok, that's my personal view of a lot of philosophy, but I'm not the only one.
David Gross
It sounds like you want to say things like "coherence and persistent similarity of structure in perceptions demonstrates that perceptions are representations of things external to the perceptions themselves" or "the idea that there is stuff out there seems the obvious explanation" or "explanations that work better than others are the best alternatives in the search for truth" and yet you also want to say "pish, philosophy is rubbish; I don't need to defend an opinion about realism or idealism or any of that nonsense". In fact what you're doing isn't some alternative to philosophy, but a variety of it.

Some philosophy is rubbish. Quite a lot, I believe. And with a statement such as "perceptions are caused by things external to the perceptions themselves", which I find unremarkable in itself as a prima facie obvious hypothesis to run with, there is a tendency for philosophers to go off the rails immediately by inventing precise definitions of words such as "perceptions", "are", and "caused", and elaborating all manner of quibbles and paradoxes. Hence the whole tedious catalogue of realisms.

Science did not get anywhere by speculating on whether there are four or five elements and arguing about their natures.

On an apparent missing mood - FOMO on all the vast amounts of automated AI safety R&D that could (almost already) be produced safely 

Automated AI safety R&D could result in vast amounts of work produced quickly. E.g. from Some thoughts on automating alignment research (under certain assumptions detailed in the post): 

each month of lead that the leader started out with would correspond to 15,000 human researchers working for 15 months.

Despite this promise, we seem not to have much knowledge of when such automated AI safety R&D might happ... (read more)

Bogdan Ionut Cirstea
Seems like probably the modal scenario to me too, but even limited exceptions like the one you mention seem to me like they could be very important to deploy at scale ASAP, especially if they could be deployed using non-x-risky systems (e.g. like current ones, very bad at DC evals). This seems good w.r.t. automated AI safety potentially 'piggybacking', but bad for differential progress. Sure, though wouldn't this suggest at least focusing hard on (measuring / eliciting) what might not come at the same time? 
ryan_greenblatt
Why think this is important to measure or that this already isn't happening? E.g., on the current model organism related project I'm working on, I automate inspecting reasoning traces in various ways. But I don't feel like there is any particularly interesting thing going on here which is important to track (e.g. this tip isn't more important than other tips for doing LLM research better).

Intuitively, I'm thinking of all this as something like a race between [capabilities enabling] safety and [capabilities enabling dangerous] capabilities (related: https://aligned.substack.com/i/139945470/targeting-ooms-superhuman-models); so from this perspective, maintaining as large a safety buffer as possible (especially if not x-risky) seems great. There could also be something like a natural endpoint to this 'race', corresponding to being able to automate all human-level AI safety R&D safely (and then using this to produce a scalable solution to a... (read more)

keltan

Note to self, write a post about the novel akrasia solutions I thought up before becoming a rationalist.

  • Figuring out how to want to want to do things
  • Personalised advertising of Things I Wanted to Want to Do
  • What I do when all else fails
trevor
Have you tried whiteboarding-related techniques? I think that suddenly starting to use written media (even journals), in an environment without much or any guidance, is like pressing too hard on the gas; you're gaining incredible power and going from zero to one on things faster than you ever have before.  Depending on their environment and what they're interested in starting out, some people might learn (or be shown) how to steer quickly, whereas others might accumulate/scaffold really lopsided optimization power and crash and burn (e.g. getting involved in tons of stuff at once that upon reflection was way too much for someone just starting out).
keltan

This seems incredibly interesting to me. Googling “White-boarding techniques” only gives me results about digitally shared idea spaces. Is this what you’re referring to? I’d love to hear more on this topic.

keltan
Maybe I could even write a sequence on this?

Unfortunately, it looks like non-disparagement clauses aren't unheard of in general releases:

http://www.shpclaw.com/Schwartz-Resources/severance-and-release-agreements-six-6-common-traps-and-a-rhetorical-question

Release Agreements commonly include a “non-disparagement” clause – in which the employee agrees not to disparage “the Company.”

https://joshmcguirelaw.com/civil-litigation/adventures-in-lazy-lawyering-the-broad-general-release

The release had a very broad definition of the company (including officers, directors, shareholders, etc.), but a fairly reas

... (read more)
Wei Dai502

AI labs are starting to build AIs with capabilities that are hard for humans to oversee, such as answering questions based on large contexts (1M+ tokens), but they are still not deploying "scalable oversight" techniques such as IDA and Debate. (Gemini 1.5 report says RLHF was used.) Is this more good news or bad news?

Good: Perhaps RLHF is still working well enough, meaning that the resulting AI is following human preferences even out of training distribution. In other words, they probably did RLHF on large contexts in narrow distributions, with human rater... (read more)


Bad: AI developers haven't taken alignment seriously enough to have invested enough in scalable oversight, and/or those techniques are unworkable or too costly, causing them to be unavailable.

Turns out at least one scalable alignment team has been struggling for resources. From Jan Leike (formerly co-head of Superalignment at OpenAI):

Over the past few months my team has been sailing against the wind. Sometimes we were struggling for compute and it was getting harder and harder to get this crucial research done.

Even worse, apparently the whole Supera... (read more)

ryan_greenblatt
I'm skeptical that increased scale makes hacking the reward model worse. Of course, it could (and likely will/does) make hacking human labelers more of a problem, but this isn't what the comment appears to be saying. Note that the reward model is of the same scale as the base model, so the relative scale should be the same. This also contradicts results from an earlier paper by Leo Gao. I think this paper is considerably more reliable than the comment overall, so I'm inclined to believe the paper or think that I'm misunderstanding the comment. Additionally, from first principles I think that RLHF sample efficiency should just increase with scale (at least with well tuned hyperparameters) and I think I've heard various things that confirm this.
ryan_greenblatt
Oops, fixed.