
Things are getting scary with the Trump regime. Rule of law is breaking down with regard to immigration enforcement, and basic human rights are not being honored.

I'm kind of dumbfounded because this is worse than I expected things to get. Do any of you LessWrongers have a sense of whether these stories are exaggerated or if they can be taken at face value?

Deporting immigrants is nothing new, but I don't think previous administrations have committed these sorts of human-rights and due-process violations.

 

Krome detention center in Miami -- overcrowded and possibly without access to sufficient drinking water

https://www.miamiherald.com/news/local/immigration/article303485356.html
https://www.yahoo.com/news/unpacking-claims-ice-holding-4k-220400460.html

https://www.instagram.com/jaxxchismetalk/reel/DHjxaXzAddP/

https://www.instagram.com/catpowerofficial/reel/DHhsiMvJ8BT/people-are-dying-under-ice-detainment-in-miamiand-this-video-is-from-last-weekpl/

 

Canadian in the US legally to apply for a work visa, detained 2 weeks by ICE without due process

https://www.theguardian.com/us-news/2025/mar/19/canadian-detained-us-immigration-jasmine-mooney


here is how to cast a spell. (quick writeup of a talk i gave at DunCon)

Short Version
1. have an intention for change.
2. take a symbolic action representing the change.

that's it. that's all you need. but also, if you want to do it more:

Long Version

Prep

1. get your guts/intuition/S1/unconscious in touch with your intention somehow, and work through any tangles you find.

2. choose a symbolic action representing (and ideally participating in) the change you want, and design your spell around that. to find the right action, try this thought experiment: one day you wake up, and find you're living in the future where you've completely succeeded at getting the thing you want. you stretch and open your eyes. what's the very first thing that tips you off that something's different?

3. when it's time, lay out your materials and do any other physical/logistical prep work.

The Spell Itself

4. dedicate a sacred space (aka "cast a circle"). for example, carry incense around the space three times, or pour a circle of oats, or just move stuff out of the way. maybe ask something that vibes with protection and/or sacredness for you to help (such as a nearby river, or the memory of your grandfather).

5. raise energy. this means, do something that requires effort and focus and that invites your unconscious to the party while getting it engaged with the materials of your spell. maybe chant something relevant, or hold your hands over the magic herbs while pretending you can pour your intention into them. this is not the symbolic action; it just prepares you for the symbolic action.

6. release energy (that is, cast the spell). take the symbolic action you've planned. cut the string, drink the potion, burn the paper, whatever.

Conclude

7. seal the spell. do something that means "i'm now on the other side, in the post-spell world". the wiccans say "so mote it be". the christians say "amen". if your spell involved a jar, you could literally seal it with wax. i once did a spell involving a pair of earrings, which i sealed by putting the earrings on.

8. ground. "return excess energy to the earth." do something that gradually brings you back to a more ordinary state of mind. maybe imagine roots reaching down from your body deep into the earth and pretend you can equalize with the ground by sending your breath down and up those roots. or shake yourself like a wet dog. or eat a piece of cake.

9. release the space. thank whatever helped you cast the circle, then take an action that's the reverse of whatever you did to dedicate the space. walk the circle backward, or put all the furniture back.

Reinforce

10. choose at least one concrete action to take in line with your intention, and take that action. bonus points if it's a repeated action, like "every sunday i'll call my mom"

now, obviously 10 is the physical mechanism by which a change might happen in the physical world. but i recommend trying to answer for yourself, "how could including 1-9 possibly be superior in some way to jumping straight to 10?"


Hypnotic Framing

one more model of spellcasting: a spell is a self-hypnosis script.

standard hypnosis has 6 parts: 
1. pre-talk
2. an induction
3. a deepener (sometimes)
4. one or more suggestions (pre- or post-hypnotic)
5. an awakener
6. following a post-hypnotic suggestion if one was given

In the Long Version above, 

1-3 are the pre-talk,
4 is an induction,
5 is a deepener,
6 and 7 are a post-hypnotic suggestion,
8 and 9 are awakeners,
10 is following the post-hypnotic suggestion.
 

LoganStrohl
btw, although i've read a lot of witchy books at this point, this specific framework is most heavily influenced by the book "Spellcrafting" by Arin Murphy-Hiscock.
trevor
An aspect where I expect further work to pay off is stuff related to self-visualization, which is fairly powerful (e.g. visualizing yourself doing something for 10 hours will generally go a really long way toward getting you there; for the 10-hour thing, it's more a question of what to do when something goes wrong enough to make the actual events sufficiently different from what you imagined, and how to do it in less than 10 hours).
Garrett Baker
It seems reasonable to mention that I know of many who have started doing "spells" like this, with a rationalized "oh I'm just hypnotizing myself, I don't actually believe in magic" framing, who then go off the deep end and start actually believing in magic. That's not to say this happens in every case or even in most cases. It's also not to say that hypnotizing yourself can't be useful sometimes. But it is to say that if you find this tempting to do because you really like the idea of magic existing in real life, I suggest you re-read some parts of the sequences.
Garrett Baker
(you also may want to look into other ways of improving your conscientiousness if you're struggling with that. Things like todo systems, or daily planners, or simply regularly trying hard things)

I occasionally get texts from journalists asking to interview me about things around the aspiring rationalist scene. A few notes on my thinking and protocols for this:

  • I generally think it is pro-social to share information with serious journalists on topics of clear public interest.
  • By default I speak with them only if their work seems relatively high-integrity. I like journalists whose writing (a) is factually accurate, (b) is boring, and (c) doesn't feel to me to have an undercurrent of hatred for its subjects.
  • By default I speak with them off-the-record, and then offer to send them write-ups of the things I said that they want to quote. This has gone quite well. I've felt comfortable speaking in my usual fashion without worrying about nailing each and every phrasing. Then I ask what they're interested in quoting, and I send them a (typically 1-2 page) google doc on those topics (largely re-stating what I already said to them, and making some improvements / additions). Then they tell me which quotes they want to use (typically cutting many sentences or paragraphs half-way). Then I make one or two slight edits and give them explicit permission to quote. I think this has gone quite well, and they've felt my quotes were substantive and an improvement.
  • For the New York Times, I am currently trying out the policy of "I am happy to chat off-the-record. I will also offer quotes by my usual protocol, but I will only give them conditional on you including a mention that I disapprove of the NYT's de-anonymization policies (which I bring up due to your reckless and negligent behavior that upturned the life of a beloved member of my community)." I am about to try this for the first time, and I expect they will thus not want to use my quotes, and that's fine by me.

every 4 years, the US has the opportunity to completely pivot its entire policy stance on a dime. this is more politically costly to do if you're a long-lasting autocratic leader, because it is embarrassing to contradict your previous policies. I wonder how much of a competitive advantage this is.

Or disadvantage, because it makes it harder to make long-term plans and commitments?

Autocracies, including China, seem more likely to reconfigure their entire economic and social systems overnight than democracies like the US, so this seems false.

leogao
It's often very costly to do so - for example, ending the zero covid policy was very politically costly even though it was the right thing to do. Also, most major reconfigurations even for autocratic countries probably mostly happen right after there is a transition of power (for China, Mao is kind of an exception, but that's because he had so much power that it was impossible to challenge his authority even when he messed up).
Garrett Baker
The closing off of China after/during Tiananmen Square I don't think happened after a transition of power, though I could be mis-remembering. See also the one-child policy, which I also don't think happened during a power transition (allowed for 2 children in 2015, then removed all limits in 2021, while Xi came to power in 2012).

I agree the zero-covid policy change ended up being slow. I don't know why it was slow though; I know a popular narrative is that the regime didn't want to lose face, but one fact about China is that the reasons why many decisions are made are highly obscured. It seems entirely possible to me there were groups (possibly consisting of Xi himself) who believed zero-covid was smart. I don't know much about this though. I will also say this is one example of China being abnormally slow against many examples of them being abnormally fast, and I think the abnormally fast examples win out overall.

Ish? The reason he pursued the cultural revolution was that people were starting to question his power after the great leap forward, but yeah, he could be an outlier. I do think that many autocracies are governed by charismatic & powerful leaders though, so not that much of an outlier.
leogao
I mean, the proximate cause of the 1989 protests was the death of the quite reformist general secretary Hu Yaobang. The new general secretary, Zhao Ziyang, was very sympathetic towards the protesters and wanted to negotiate with them, but then he lost a power struggle against Li Peng and Deng Xiaoping (who was in semi-retirement but still held onto control of the military). Immediately afterwards, he was removed as general secretary and martial law was declared, leading to the massacre.
Ben
Having unstable policy making comes with a lot of disadvantages as well as advantages. For example, imagine a small poor country somewhere with much of the population living in poverty. Oil is discovered, and a giant multinational approaches the government to seek permission to get the oil. The government offers some kind of deal - tax rates, etc. - but the company still isn't sure. What if the country's other political party gets in at the next election? If that happened, the oil company might have just sunk a lot of money into refineries and roads and drills only to see them all taken away by the new government as part of its mission to "make the multinationals pay their share for our people." Who knows how much they might take? What can the multinational company do to protect itself? One answer is to try and find a different country where the opposition parties don't seem likely to do that. However, it's even better to find a dictatorship to work with.

If people think a government might turn on a dime, then they won't enter into certain types of deal with it. Not just companies, but also other countries. So, whenever a government does turn on a dime, it is gaining some amount of reputation for unpredictability/instability, which isn't a good reputation to have when trying to make agreements in the future.

What I've been using AI (mainly Gemini 2.5 Pro, free through AI Studio with much higher limits than the free consumer product) for:

  1. Writing articles in Chinese for my family members, explaining things like cognitive bias, evolutionary psychology, and why dialectical materialism is wrong. (My own Chinese writing ability is <4th grade.) My workflow is to have a chat about some topic with the AI in English, then have it write an article in Chinese based on the chat, then edit or have it edit as needed.
  2. Simple coding/scripting projects. (I don't code seriously anymore.)
  3. Discussing history, motivations of actors, impact of ideology and culture, what if, etc.
  4. Searching/collating information.
  5. Reviewing my LW posts/comments (any clear flaws, any objections I should pre-empt, how others might respond)
  6. Explaining parts of other people's comments when the meaning or logic isn't clear to me.
  7. Expanding parts of my argument (and putting this in a collapsible section) when I suspect my own writing might be too terse or hard to understand.
  8. Sometimes just having a sympathetic voice to hear my lamentations of humanity's probable fate.

I started using AI more after Grok 3 came out (I have an annual X subscription for Tweeting purposes), as previous free chatbots didn't seem capable enough for many of these purposes, and then switched to Gemini 2.0 Pro, which was force-upgraded to 2.5 Pro. Curious what other people are using AI for these days.

winstonBosan
I mostly use the Claude desktop client with MCPs (additional plugins and tooling for Claude to use) for:

  • 2-iter Delphi method: calling Gemini 2.5 Pro + whatever is top of the LLM arena of the day through OpenRouter
  • Metaculus, Kalshi, and Manifold search for quick intuition on subjects
  • Smart fetch (for OCR'ing PDFs, images, etc.)
  • Local memory

Popular Comments

As a newly minted +100 strong upvote, I think the current karma economy accurately reflects how my opinion should be weighted
Non-Google models of late 2027 use Nvidia Rubin, but not yet Rubin Ultra. Rubin NVL144 racks have the same number of compute dies and chips as Blackwell NVL72 racks (the change in the name is purely a marketing thing; they now count dies instead of chips). The compute dies are already almost reticle sized and can't get bigger, but Rubin uses 3nm (~180M Tr/mm2) while Blackwell is 4nm (~130M Tr/mm2). So the number of transistors per rack goes up according to transistor density between 4nm and 3nm, by 1.4x, plus better energy efficiency enables higher clock speed, maybe another 1.4x, for a total of 2x in performance. The GTC 2025 announcement claimed a 3.3x improvement for dense FP8, but based on the above argument it should still be only about 2x for the more transistor-hungry BF16 (comparing Blackwell and Rubin racks).

The Abilene site of Stargate[1] will probably have 400K-500K Blackwell chips in 2026, about 1 GW. Nvidia's roadmap puts Rubin (VR200 NVL144) 1.5-2 years after Blackwell (GB200 NVL72), which is not yet in widespread use but will get there soon. So the first models will start being trained on Rubin no earlier than late 2026, much more likely only in 2027, possibly even the second half of 2027. Before that, it's all Blackwell, and if it's only 1 GW Blackwell training systems[2] in 2026 for one AI company, shortly before 2x better Rubin comes out, then that's the scale where Blackwell stops, awaiting Rubin and 2027. Which will only be built at scale a bit later still, similarly to how it's only 100K chips in GB200 NVL72 racks in 2025 for what might be intended to be a single training system, and not yet 500K chips.

This predicts at most 1e28 BF16 FLOPs (2e28 FP8 FLOPs) models in late 2026 (trained on 2 GW of GB200/GB300 NVL72), and very unlikely more than 1e28-4e28 BF16 FLOPs models in late 2027 (1-4 GW Rubin datacenters in late 2026 to early 2027), though that's alternatively 3e28-1e29 FP8 FLOPs given the FP8/BF16 performance ratio change with Rubin I'm expecting.

Rubin Ultra is another big step ~1 year after Rubin, with 2x more compute dies per chip and 2x more chips per rack, so it's a reason to plan pacing the scaling a bit rather than rushing it in 2026-2027. Such plans will make rushing it more difficult if there is suddenly a reason to do so, and 4 GW with non-Ultra Rubin seems a bit sudden.

So pretty similar to Agent 2 and Agent 4 at some points, keeping to the highest estimates, but with less compute than the plot suggests for months while the next generation of datacenters is being constructed (during the late 2026 to early 2027 Blackwell-Rubin gap).

[1] It wasn't confirmed that all of it goes to Stargate, only that Crusoe is building it on the same site as it did the first buildings that do go to Stargate.
[2] 500K chips, 1M compute dies, 1.25M H100-equivalents, ~4e27 FLOPs for a model in BF16.
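A quick back-of-envelope check of the arithmetic in this comment, as a Python sketch. All inputs are the comment's own estimates; the densities, clock gain, utilization, and training duration are assumptions, not official specs:

```python
# Blackwell -> Rubin per-rack scaling, using the comment's assumptions.
blackwell_density = 130e6  # ~130M transistors/mm^2 on 4nm (assumed)
rubin_density = 180e6      # ~180M transistors/mm^2 on 3nm (assumed)

transistor_gain = rubin_density / blackwell_density  # ~1.4x transistors per rack
clock_gain = 1.4                                     # assumed, from better energy efficiency
print(f"BF16 per-rack gain: ~{transistor_gain * clock_gain:.1f}x")  # ~1.9x, i.e. "about 2x"

# Footnote 2's training-run estimate: 1.25M H100-equivalents.
# Assuming ~1e15 dense BF16 FLOP/s per H100-equivalent, a ~4-month run
# (~1e7 seconds), and ~33% utilization (all assumptions of this sketch):
flops = 1.25e6 * 1e15 * 1e7 * 0.33
print(f"Training compute: ~{flops:.0e} BF16 FLOPs")  # ~4e27, matching the footnote
```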
Strong disagree. This is an ineffective way to create boredom. Showers are overly stimulating, with horrible changes in temperature, the sensation of water assaulting you nonstop, and requiring laborious motions to do the bare minimum of scrubbing required to make society not mad at you. A much better way to be bored is to go on a walk outside or lift weights at the gym or listen to me talk about my data cleaning issues

Recent Discussion

This is a linkpost for https://ai-2027.com/

In 2021 I wrote what became my most popular blog post: What 2026 Looks Like. I intended to keep writing predictions all the way to AGI and beyond, but chickened out and just published up till 2026.

Well, it's finally time. I'm back, and this time I have a team with me: the AI Futures Project. We've written a concrete scenario of what we think the future of AI will look like. We are highly uncertain, of course, but we hope this story will rhyme with reality enough to help us all prepare for what's ahead.

You really should go read it on the website instead of here, it's much better. There's a sliding dashboard that updates the stats as you scroll through the scenario!

But I've nevertheless copied the...

Daniel Kokotajlo
Perhaps this is lack of imagination on the part of our players, but none of this happened in our wargames. But I do agree these are plausible strategies. I'm not sure they are low-risk though, e.g. 2 and 1 both seem like plausibly higher-risk than 3, and 3 is the one I already mentioned as maybe basically just an argument for why the slowdown ending is less likely. Overall I'm thinking your objection is the best we've received so far.

I'd love to play the wargame with our local LW community in Munich.
Do you have a link to the rules?

PS: huge fan, love the AI 2027 website, keep being a force for good

Sebastian Schmidt
Big +1 on adding this and/or finding another high-quality way of depicting what the ideal scenario would look like. I think many people think and feel that the world is in a very dire state to an extent that leads to hopelessness and fatalism. Articulating clear theories of victory that enable people to see the better future they can contribute towards will be an important part of avoiding this scenario.
scarcegreengrass
I think it's indeed humor & indeed singling out a company.

Recently, Nathan Young and I wrote about arguments for AI risk and put them on the AI Impacts wiki. In the process, we ran a casual little survey of the American public regarding how they feel about the arguments, initially (if I recall) just because we were curious whether the arguments we found least compelling would also fail to compel a wide variety of people. 

The results were very confusing, so we ended up thinking more about this than initially intended and running four iterations total. This is still a small and scrappy poll to satisfy our own understanding, and doesn’t involve careful analysis or error checking. But I’d like to share a few interesting things we found. Perhaps someone else wants to look at our data more...

In the big round (without counterarguments), arguments pushed people upward slightly more (more than downward -- not more than previous surveys).

Metroid Prime would work well as a difficult video-game-based test for AI generality.

  • It has a mixture of puzzles, exploration, and action.
  • It takes place in a 3D environment.
  • It frequently involves backtracking across large portions of the map, so it requires planning ahead.
  • There are various pieces of text you come across during the game. Some of them are descriptions of enemies' weaknesses or clues on how to solve puzzles, but most of them are flavor text with no mechanical significance.
  • The player occasionally unlocks new abilities they have to learn how to
...


Summary:

When stateless LLMs are given memories they will accumulate new beliefs and behaviors, and that may allow their effective alignment to evolve. (Here "memory" is learning during deployment that is persistent beyond a single session.)[1]

LLM agents will have memory: Humans who can't learn new things ("dense anterograde amnesia") are not highly employable for knowledge work. LLM agents that can learn during deployment seem poised to have a large economic advantage. Limited memory systems for agents already exist, so we should expect nontrivial memory abilities improving alongside other capabilities of LLM agents.

Memory changes alignment: It is highly useful to have an agent that can solve novel problems and remember the solutions. Such memory includes useful skills and beliefs like "TPS reports should be filed in the folder ./Reports/TPS"....
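To make the mechanism concrete, here's a minimal sketch of the kind of deployment-time memory loop the post describes. The `llm_call` parameter and the JSON file store are hypothetical stand-ins for whatever model API and retrieval system a real agent would use:

```python
import json
from pathlib import Path

MEMORY_PATH = Path("agent_memory.json")  # hypothetical persistent store

def load_memory() -> list[str]:
    # Memory persists beyond a single session: anything saved during a past
    # deployment run is visible to every future run.
    return json.loads(MEMORY_PATH.read_text()) if MEMORY_PATH.exists() else []

def run_task(task: str, llm_call) -> str:
    notes = load_memory()
    # Accumulated beliefs ("TPS reports should be filed in ./Reports/TPS")
    # are injected into the prompt, so they shape the agent's behavior
    # independently of anything fixed at training time.
    prompt = "Things you have learned:\n" + "\n".join(notes) + f"\n\nTask: {task}"
    answer = llm_call(prompt)
    notes.append(f"Learned while doing '{task}': {answer[:100]}")
    MEMORY_PATH.write_text(json.dumps(notes))
    return answer
```

The alignment-relevant step is the last write: whatever the agent records in its own memory feeds back into all later behavior, which is how effective alignment can drift after deployment.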

plex
Accurate, and one of the main reasons why most current alignment efforts will fall apart with future systems. A generalized version of this combined with convergent power-seeking of learned patterns looks like the core mechanism of doom.

I think the more generous way to think about it is that current prosaic alignment efforts are useful for aligning future systems, but there's a gap they probably don't cover.

Learning agents like I'm describing still have an LLM at their heart, so aligning that LLM is still important. Things like RLHF, RLAIF, deliberative alignment, steering vectors, fine tuning, etc. are all relevant. And the other not-strictly-alignment parts of prosaic alignment like mechanistic interpretability, behavioral tests for alignment, capabilities testing, control, etc. remain ...

Lorxus
Did you ever get back to reading this? I think I got some very different things out of it when I read through! (And @whatstruekittycat will talk your ear off about it, among other topics.)

The quarter-inch jack (a "phone connector") is probably the oldest connector still in use today, and it's picked up a very wide range of applications, which also means it's a huge mess in a live sound context, where a 1/4" jack could be any of (see the level comparison after the list):

  • Unbalanced or balanced line level (~1V). Ex: a mixer to a powered speaker.

  • Unbalanced instrument level (~200mV), high impedance. Ex: electric guitar.

  • Unbalanced piezo level (~50mV), high impedance. Ex: contact pickup on a fiddle.

  • Unbalanced speaker level (~30V). Ex: powered amplifier to passive speaker.

  • Stereo line level (2x ~1V). Ex: output of keyboard.

  • Stereo headphone level (2x ~3V). Ex: headphone jack.

  • Send and return line level (2x ~1V). Ex: input to and output from an external compressor.

  • Switch (non-audio). Ex: damper pedal on a keyboard, which would be normally open or normally closed.

  • 1V per octave

...
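One way to see why this ambiguity bites: the nominal levels in the list above span roughly 55 dB, so plugging one kind of source into another kind of input can be anywhere from inaudible to speaker-destroying. A small sketch, using the rough voltages from the list (order-of-magnitude figures, not measurements):

```python
import math

# Nominal RMS voltages from the list above (rough figures).
levels_v = {
    "piezo pickup": 0.05,
    "instrument": 0.2,
    "line": 1.0,
    "headphone": 3.0,
    "speaker": 30.0,
}

for name, volts in levels_v.items():
    dbv = 20 * math.log10(volts)  # dB relative to 1V RMS
    print(f"{name:>13}: {volts:6.2f} V  ({dbv:+6.1f} dBV)")
```

This prints a range from about -26 dBV (piezo) to about +30 dBV (speaker level), which is why a mixer needs to know what's on the other end of the cable.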

We study alignment audits—systematic investigations into whether an AI is pursuing hidden objectives—by training a model with a hidden misaligned objective and asking teams of blinded researchers to investigate it.

This paper was a collaboration between the Anthropic Alignment Science and Interpretability teams.

Abstract

We study the feasibility of conducting alignment audits: investigations into whether models have undesired objectives. As a testbed, we train a language model with a hidden objective. Our training pipeline first teaches the model about exploitable errors in RLHF reward models (RMs), then trains the model to exploit some of these errors. We verify via out-of-distribution evaluations that the model generalizes to exhibit whatever behaviors it believes RMs rate highly, including ones not reinforced during training. We leverage this model to study alignment audits in two...

Thank you, this is great work. I filled out the external researcher interest form but was not selected for Team 4.

I'm not sure that Team 4 were on par with what professional jailbreakers could achieve in this setting. I look forward to follow-up experiments. This is bottlenecked by the absence of an open-source implementation of auditing games. I went over the paper with a colleague. Unfortunately we don't have bandwidth to replicate this work ourselves. Is there a way to sign up to be notified once a playable auditing game is available?

I'd also be e...

Fabien Roger
This is a mesa-optimizer in a weak sense of the word: it does some search/optimization. I think the model in the paper here is weakly mesa-optimizing, maybe more than base models generating random pieces of sports news, and maybe roughly as much as a model trying to follow weird and detailed instructions - except that here it follows memorized "instructions" as opposed to in-context ones.
ErickBall
Fair enough, I guess the distinction is more specific than just being a (weak) mesa-optimizer. This model seems to contradict https://www.lesswrong.com/posts/pdaGN6pQyQarFHXF4/reward-is-not-the-optimization-target because it has, in fact, developed reward as the optimization target without ever being instructed to maximize reward. It just had reward-maximizing behaviors reinforced by the training process, and instead of (or in addition to) becoming an adaptation executor it became an explicit reward optimizer. This type of generalization is surprising and a bit concerning, because it suggests that other RL models in real-world scenarios will sometimes learn to game the reward system and then "figure out" that they want to reward hack in a coherent way. This tendency could also be beneficial, though, if it reliably causes recursively self-improving systems to wirehead once they have enough control of their environment.
gwern
It doesn't contradict Turntrout's post, because his claims are about an irrelevant class of RL algorithms (model-free policy gradients). A model-based RL setting (like a human, or an LLM like Claude pretrained to imitate model-based RL agents in a huge number of settings, i.e. human text data) optimizes the reward, if it's smart and knowledgeable enough to do so. (This comment is another example of how Turntrout's post was a misfire, because everyone takes away the opposite of what they should have.)

“In the loveliest town of all, where the houses were white and high and the elm trees were green and higher than the houses, where the front yards were wide and pleasant and the back yards were bushy and worth finding out about, where the streets sloped down to the stream and the stream flowed quietly under the bridge, where the lawns ended in orchards and the orchards ended in fields and the fields ended in pastures and the pastures climbed the hill and disappeared over the top toward the wonderful wide sky, in this loveliest of all towns Stuart stopped to get a drink of sarsaparilla.”
— 107-word sentence from Stuart Little (1945)

Sentence lengths have declined. The average sentence length was 49 for Chaucer (died 1400), 50...
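Numbers like these are straightforward to approximate yourself. A naive sketch (splitting on end punctuation, so abbreviations like "Mr." get miscounted -- a serious study would use a proper sentence tokenizer):

```python
import re

def avg_sentence_length(text: str) -> float:
    """Mean words per sentence, splitting naively on runs of . ! ?"""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    return sum(len(s.split()) for s in sentences) / len(sentences)

# e.g. on a plain-text copy of a novel (hypothetical local file):
text = open("stuart_little.txt").read()
print(f"{avg_sentence_length(text):.1f} words/sentence")
```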

I quite like the article The Rise and Fall of the English Sentence, which partially attributes reduced structural complexity to an increase in noun compounds (like "state hate crime victim numbers" rather than "the numbers of victims who have experienced crimes that were motivated by hatred directed at their ethnic or racial identity, and who have reported these crimes to the state").

Guive
I agree this would be a good argument for short sentences in 2019, but does it still apply with modern LLMs?
eggsyntax
I suspect that the average reader is now getting smarter, because there are increasingly ways to get the same information that require less literacy: videos, text-to-speech, Alexa and Siri, ten thousand news channels on youtube. You still need some literacy to find those resources, but it's fine if you find reading difficult and unpleasant, because you only need to exercise it briefly. And less is needed every year. I also expect that the average reader of books is getting much smarter, because these days adults reading books are nearly always doing so because they like it. It'll be fascinating to see whether sentence length, especially in books, starts to grow again over the coming years.
Alex K. Chen (parrot)
Related: https://www.econlib.org/archives/2008/10/where_is_the_po.html