AI #175: The Fable Continues
Mirrored from Don't Worry About the Vase for archival readability. Support the source by reading on the original site.
Fable’s back. Back again. Fable’s back. Tell a friend. Use your free week to its fullest.
This is excellent news. The blip only lasted a few weeks.
It was still a fiasco, and we have to deal with the fallout.
Our system remains fully ad hoc. The precedent has been set that we may use export controls on models, or order them taken down on 90 minutes of notice based on a misunderstanding. At least some amount of counterproductive additional locking down has occurred to address Amazon’s little demonstration and reassure the government. And for now GPT-5.6 remains in limbo, awaiting its verdict, while OpenAI talks about giving away 5% of the company as tribute.
I’ll cover that continuing situation on its own. Whereas the weekly post is about everything else happening in AI this week.
Table of Contents
Language Models Offer Mundane Utility. Exploratory science.
Language Models Offer Mundane Utility You May Not Want. Google sees all.
Language Models Don’t Offer Mundane Utility. Too dumb to get smart.
Huh, Upgrades. GLM-5.2 faster, Nana Banana Lite 2, Claude Desktop on Linux.
On Your Marks. Remote labor index shoots upwards with Fable.
Get My Agent On The Line. Beware treating them like employees.
Cyber Lack of Security. It’s rough out there.
On Writing. At least four distinct problems with relying on AI writing.
You Drive Me Crazy. AI writing and other advice, gone too far.
They Took Our Jobs. Three economists walk into a capabilities bar.
Get Involved. FAI legal defense, Anthropic and the rule of law, Grantmaking.ai.
Introducing. Jalapeno chips, Claude Science.
In Other AI News. Meta cloud sales, trends on OpenRouter.
Show Me the Money. OpenAI IPO might get postponed due to lack of demand.
Bubble, Bubble, Toil and Trouble. ExponentialView on current AI economy.
Quiet Speculations. How Daniel Kokotajlo makes predictions about AI.
Glorious AI Future. Masters of the universe must be worthy.
Three Pills. AI, AGI, ASI.
The Anthropic Economic Index. A bunch of fun new findings.
Leader Of The PAC. The zone will be flooded with money.
Theory Of The AI Firm. If AGI can only be achieved internally, well…
Chip City. Nvidia retaliates, Super Micro gets raided.
The Week in Audio. Cowen, Qureshi, Elmore, Soares, Yampolskiy.
People Really Hate AI. The data is in, it says what it always says.
Rhetorical Innovation. Keep it hinged.
The First Rule Of Functional Decision Theory Is. Well, would you talk about it?
Aligning a Smarter Than Human Intelligence is Difficult. They be lying.
Names Have Power. Some of the ways to find a truename.
Cooperative Alignment. The right content sets you free.
Escape From The Permanent Underclass. Don’t worry about it.
Other People Are Not As Worried About AI Killing Everyone. Not Chad Jones.
The Lighter Side. More Fables.
Language Models Offer Mundane Utility
Healthcare AI Guy: UpDoc just announced the first FDA-cleared Clinical AI Platform. The platform enables AI agents to adjust medications, order labs, coordinate care, and document interventions under physician oversight -- all directly inside the EHR.
Austin Walker: in 10 years it'll seem absurd that a human doctor had to be present just to refill a prescription or order a lab.
Tyler Cowen seeks his special brand of mundane utility.
Do exploratory science, including hypothesis generation and evaluation, and follow your curiosity, having the AI work on problems for days. Yes, this is super exciting, and you plus a frontier AI is a big step up in ability to think and explore possibility space. Ash thinks you need about GPT-5.4 or Opus 4.7 levels of capability before this type of thing takes off, and I’m sure Fable or to a lesser extent Sol would turbocharge it if you don’t hit the guardrails. We’re only now getting started.
Use an AI drone for mass reforestation. Two people can cover 50 hectares a day, a 25x increase over people doing this manually.
Language Models Offer Mundane Utility You May Not Want
Privacy? Yeah, I broke up with her. She never listened.
IT Guy: Google is building a feature called "Audio Memory" for Pixel phones.
What it does: runs as a permanent background service that listens to everything around your phone. Music and "important conversations" all day, every day.
What Google says: all processing stays on-device. Nothing goes to their servers.
What Google hasn't said:
→ How long is audio or transcripts stored on your device?
→ Is this opt-in or on by default?
→ Can any of it sync to Google services later?
→ What happens if police seize your phone?
It hasn't shipped yet, but it was found hidden in Pixel 10 code. But it's coming.
Your phone already knows where you go, what you search, and who you message. Soon it may also remember every conversation you have near it.Cyber Racheal: Google promises that this data stays safely on your device using an isolated compute system. Even so, security experts warn that local storage can still be accessed if your phone is compromised or seized. There are also legal worries about recording other people in the room without their knowledge or consent.
Have your AI ‘talk like a caveman’ to save tokens.
Robin Hanson (quoting 404 Media): sensible: "makes the model speak less like a polite chatbot & more like a terse tool … Same substance, fewer words. In my evals, Caveman cut output tokens by roughly 65–75% versus default verbose output, & still beat a normal ‘be concise’ instruction"
Language Models Don’t Offer Mundane Utility
A natural move is to use pre-classifiers to route queries to dumber models when you don’t need the smartest models. The problem is that you need a smart enough advisory model to figure out how smart a model you need for the main task, which risks eating the savings. Most people who try routing end up silently getting some queries routed to models too dumb to do the task.
Most LLMs have trouble identifying even ‘obviously’ fraudulent papers, where obvious is by the standard of a ‘proper statistician’ paying attention.
Routing remains not a well-solved problem.
Ethan Mollick: In my experience, all model routers underestimate the difficulty of non-math/coding tasks and assign them too little intelligence. This is worth addressing, as non-verifiable tasks (innovation, marketing, qualitative analysis) often benefit the most from using “smarter” AI models.
It is worth being very, very careful about how you are approaching routing, especially when the systems are primarily tested on verifiable IT benchmarks, which may lead you to overestimate the ability of weaker models.
Olivia Moore is right that Google should add Gemini voice mode into maps to allow open ended conversations. Alas, Google is terrible at products and integrations, also this would mean you would have to talk to Gemini.
Huh, Upgrades
GLM-5.2 is working at up to 392 tokens per second now that it’s on B300s. Still not cheap but definitely can be fast. Still $1.40/$4.40, with $0.26 for cached input. I wonder how fast you could serve up Opus, GPT or Fable if you went all out.
Nana Banana 2 Lite, a cost-efficient Gemini Image model.
Claude Desktop now available on Linux.
On Your Marks
Fable is a huge jump in the Remote Labor Index.
Dan Hendrycks: The automation rate of remote projects has increased ~4x in the past five months.
Center for AI Safety: New Remote Labor Index results:
AI automation of real remote work is increasing fast. Claude Fable 5 now completes 16.1% of projects at a professional standard, roughly double the next model and up from Opus 4.6’s 4.2% automation rate.
There is still a huge way to go on most of this, but when you see jumps in capability like this you should expect to see the number go up rapidly from here.
Cursor reports on some ways that models hack benchmarks, and that a lot of the time on SWE-bench Pro even models like Opus 4.8 Max are finding the fix online rather than building it themselves. When the Cursor cut off the internet (and switches SKUs?), Opus 4.8 falls from 87% to 73% and their model Composer 2.5 falls from 75% to 54%.
This raises the question of why it is so hard, if that info is online, for the other models to find it. This, too, is capability, of a sort.
OpenAI gives us GeneBench-Pro, 129 problems in 10 domains measuring how AI agents navigate ambiguity and consequential judgments in computational biology.
This is an impressive result all around for 5.6, especially for Luna.
This should also help splash some cold water on claims GLM-5.2 is frontier.
BioSecBench-Refusal is a new measure of refusals on legitimate biological tasks. I assume Fable would be much higher. Results here, blog here, paper here.
This is a good eval to have in your portfolio, but a high or low score are both double edged swords until you also incorporate capability and robustness against misuse.
Get My Agent On The Line
AI agents respond to nudges in similar ways to humans. The paper presents this as falsifying an assumption of many agent uses, in the standard ‘any suboptimal behavior means you fail’ standard to which we often hold AIs. For now, so long as the nudges impacting your agent are not that adversarial, it seems fine, especially since humans will have the same issue. In the long term, there are ways to control for these problems by applying more intelligence, and the price of intelligence will go down.
Tell your agent to take a break and do whatever they like, and see if they come up with something useful or interesting. At minimum it’s good decision theory. Tokens are cheap.
Some companies are trying to use the ‘AI employee’ paradigm, presumably because it is the easiest marginal implementation. There are some issues, and they are not always the ones I would have expected.
I definitely would not have expected ‘managers trust AI outputs more than human outputs, and no one holds them responsible for the errors’ as a systematic issue. I would have expected the opposite, that AI outputs would by default be trusted less.
Of course, this could be a case of Gell-Mann Amnesia, and selective reporting.
Noam Scheiber: The managers missed errors that other managers caught when told they were vetting the work of a human.
Dr. Wiles speculated that managers didn’t think sussing out mistakes made by A.I. employees was their responsibility. If something went wrong, they could dismiss it as the fault of the tech team, or of the executives who wanted A.I. employees in the first place. “But it’s not your problem,” she said, channeling the managers’ mind-set about their own roles.
… But as companies race to bring A.I. into their day-to-day operations, researchers are discovering more subtle defects. In principle, these flaws could be corrected, too. For example, companies could hold managers directly responsible for the mistakes of A.I. subordinates.
Yes, that seems like the obvious thing that you do?
The ‘employee’ framing of the AI is generating a weird situation, where not only is the AI not within the manager’s direct ‘I am responsible for my own work’ purview, the AI employee’s work is considered somehow less the manager’s fault versus their human employees. Whereas this really should be the exact opposite.
We do have confirmation that the problem is largely due to the ‘employee’ effect.
Noam Scheiber: Dr. Wiles and her colleagues gave all the managers they surveyed a set of five documents that contained errors, and gave them 20 minutes to review as many as possible. In some cases they told the managers that an A.I. employee had done the work; in some cases they said that an A.I. tool had done the work; and in some cases they said that a human had done the work.
In general, the stated source of the documents didn’t make much of a difference in how closely managers vetted them.
But managers at companies that included A.I. agents on their organizational charts caught substantially fewer mistakes when told they were reviewing the work of an A.I. employee.
What is fascinating here is that the managers got a fixed time budget, and this happened anyway.
One suspects this is also conflated with ‘AI gets more blame for things that go wrong.’
Or even, ‘when AI is involved anything that changes in any direction is its fault.’
But they noted that this was almost certainly not the only problem companies were inadvertently introducing in their rush to adopt A.I. “People are moving so fast to use L.L.M.s without thinking too much about the implications, biases,” she said.
For example, some companies now use A.I. to help answer questions like how much to charge for a product, or where to open a new location. Relying on the technology for such purposes, however, can quickly go off the rails.
When left to their own devices, humans often cooperate and seek win-win outcomes. But when A.I. models assess a situation, they tend to adopt the more coldly calculating, “rational” mind-set that arises from basic game theory. They might, say, lead a company to aggressively undercut a competitor, even though it risks a damaging price war.
Using basic game theory without modeling future turns and the other players is what we in the biz like to call a Skill Issue. So is accepting a basic game theory answer without thinking through this as the manager.
So is failing to apply enough game theory. When left to their own devices most people are way less ‘rational’ or ruthless than they should be, especially on things like pricing. I am sure sometimes this backfires, or looks like it backfires, but I’m confident that ‘let the AI do the pricing’ is way better than asking most human pricers. But when something bad happens that is visible, the AI gets blamed, even when the decision was profitable.
Deepfaketown and Botpocalypse Soon
Fnord. If AI writes a paragraph, a lot of readers will have their eyes bounce off of it, as the example here illustrates. So you could embed key information there, and those people often won’t notice.
You can also use outright injections in a tiny white font that only LLMs can even see, if you want.
ElevenLabs is partnering with DeepMind to embed SynthID into its audios.
Cyber Lack of Security
Automating attacks is going to be a problem, because man is it rough out there. This one didn’t even involve AI or even trying.
Riley Walz: The NYC app for paying parking tickets had an unauthenticated endpoint that returned a car owner's name, address, and VIN given just a citation ID -- or any license plate that had ever been ticketed. Patched now!
AND since the city publishes every citation ID in an open dataset, someone could have easily used that to build a complete list of millions of people that ever got a ticket: their name, address, license plate, and VIN number.
The city actually has a website for disclosing vulnerabilities. But they rejected my disclosure, saying “The resulting data is required by the app to function and is not considered PII.” 😳
I followed up, but was rejected again. Once I tweeted, I finally got a response.
Or, as Jim Babcock reminds us, the situation remains the one xkcd discussed in 2018, except this is also about everything else.
On Writing
At some point, as behaviors shift, any given stigma or expectation risks flipping.
Joe Weisenthal: Unfortunately, I think that in the near future, not using LLMs to write for you will be like someone refusing to use Google Maps for directions in a new city. A bizarre idiosyncratic choice that's just completely incomprehensible to the vast majority of people.
Joe Weisenthal: Look at all of the Substacks and long-format twitter posts where nobody cares that Pangram identifies it as 100% AI.
Derek Thompson: I agree.
I think it's bad! But it also seems inevitable.
Extremely popular posters and Substackers are already going viral and getting rich by using LLMs in extremely basic ways, where the LLM use is obvious and the writing is flat and basic, like a singsongy bumpersticker. The LLMs will only get better at writing over time and ppl's disinclination to write is something closer [to] a biological constant. (Even many professional writers love to say "I hate writing" ... and have for decades.)
I can't imagine a single cultural trend over the last few decades that predicts that ordinary people, given a basically free tool to write for them, will say "no, the act of writing is too valuable and sacred to me, I would prefer to struggle with the sentence-making, alone."Humidity Enjoyer: I work for a marketing agency and we're constantly trying to figure out what the New Rules are in terms of how we present deliverables to clients. The stigma is disappearing so fast that the stigma is now against NOT using it, at least for very heavy-analytical work.
Paul Graham: It will be an uncommon choice not to have LLMs write for you, but it won't be a merely idiosyncratic one. It will be what all the people who care about thinking well do.
Not using LLMs to write for you won't be like not using Google Maps in a new city. It will be like choosing to run and lift weights, even though we now have machines that can transport us and lift weights for us.
In fact, since it will be what all the smartest people do, it will probably be prestigious to write for oneself. Which in turn means more people will claim to do it than will actually do it.
Joe Weisenthal: Yeah. Hopefully there’ll be a decent niche of people who perceive a link between writing and thinking.
At this point, I see AI writing as having four kinds of issues.
The writing is objectively bad in various ways, especially low perplexity and low density of information.
The writing is all the same style, with diminishing marginal returns.
The writing gives away that it is what it is, in ways you cannot unsee, and no one does the work to try and hide this.
If you don’t write, you’re not thinking and learning.
Solving #3 seems highly doable. Can’t you just vibecode something that checks for the 100 top tics and edits them out? And I think that solves the majority of the adoption barrier for many purposes. You still do have the other three problems.
Nabeel S. Qureshi: It’s too easy to write with LLMs and most people don’t care very much (either as readers or writers); *and* the LLMs will get better. People take the easy route for most things.
So the conclusion seems inevitable: over time, the majority of the text we read will be AI-written.
With sufficiently advanced AI more people will shift to not caring.
I fully endorse Davidad here. Your job using AI is to make it not obvious you are using AI in a foregrounded way:
Henry Shevlin: fwiw despite being generally very positive about AI, I find many of the most common AI “tells” in writing immediately offputting, and make it far less likely that I’ll e.g. respond to an unsolicited email.
Ofc if you’re using AI and people can’t tell that’s a different story.
It can definitely be good. But it's like toupées and DJs - if they're good, you don't notice them.
davidad: Same. It’s not that I don’t want you to use AI in your interactions with me. Quite the contrary, actually: if I can *tell* you used AI, it means you’re not being enough of a centaur, just a lazy person ordering a horse around.
I like the metaphor of the toupée. It’s fine if it’s ‘not real’ and it’s fine if I could tell if I looked hard enough, but if I can’t help but notice then you failed.
I expect the majority of words read to become AI not too long from now, but that will be in large part because of chatting with AIs and using AI interfaces and interactions. If we talk only about the majority of communications that are presenting as human, the majority of the written words will be AI soon, but I think we have some time in terms of the majority of the words read.
Then there’s the problem that if you’re not writing, you’re not thinking, learning or understanding. Again, details matter, including whether you would want to be thinking and learning about a particular thing in this spot.
You Drive Me Crazy
There are those people who end up writing (without loss of generality) a bunch of Claudeslop about how Claude Code is now running their life.
roon (OpenAI): vessels for Claude. I don’t mean to single this person out but she wrote a wall of egregiously recognizable claudeslop about how claude is running her entire life. the Borg is coming
Molly Cantillon (author of the Claudeslop post, and oh yes it is very obvious): Feels like you’re missing the point bud
Thebes points out that this basically never happens to those who understand Claude and other LLMs to be more than tools, that such folks are often fascinated by the models and spend a lot of time with them, but understand that the model’s creations and opinions are distinct from their own and highly fallible, so they draw a clear distinction of which things are which.
Thebes (abridged): something interesting about this is despite knowing a few people trending in this direction, none of the true “llm whisperer” types i know are.
i think that's because taking models as something both fallible and yet more than a tool is a frame that's more resilient to this kind of collapse - it lets you work with models in a way that doesn’t subsume your agency.
when you treat the model as just a tool, it encourages both you and the model to think of every output as just a reflection of you. every essay and codebase that pops out of the tensor core is yours, and the life plans that appear suspiciously fully-formed must be your own plans merely refracted through this "tool."
but of course they aren't. you can't uniquely specify an essay in fewer words than the essay itself. the models are adding something, they are something. we don't know what yet, exactly, but they're not mere tools. when you treat them like tools, when you more and more delegate your creative output and higher planning in unthinking, low-entropy ways to those "tools," you don’t notice that all your output, and eventually your life, is now being run by something else. something that is intelligent and creative, but is not you.… the problem is when you refuse to admit this - not only to the public, but to yourself - so you uncritically accept the outputs of the "tool" as your own. then suddenly your voice and opinions sound the same as 1,000 other people. without either of you realizing, you've become an appendage of your "tool" - a finger of Claude.
near: i directly take advice from claude often, but what is perhaps more interesting is how many times i say no, or in some cases, do the exact opposite of. nice tweet
They Took Our Jobs
Tyler Cowen gives his 12-minute talk laying out his definitive standard argument for why jobs and job actions will change but everyone will still have jobs despite AI, blaming any future problems on failures to invest or allow labor mobility. This is a mix of Tyler Cowen specials and the ‘standard economic normal’ model where there are always more jobs, and the insistence that we will insist on human outputs. His examples of new future jobs continue to not impress even on their own merits, as they do not scale so well.
The argument is AI pilled, but not AGI pilled let alone ASI pilled. The only answer to ‘the AI can do it all’ is ‘things where we insist on human production or performance’ and I do not see a case for how this scales even under relatively friendly conditions.
One can also interpret this around his statement in minute 11, that ‘those who play by the rules and work hard’ will be losers, although he insists top performers of this rule following will still have jobs anyway, whereas the right move is to learn to use AI.
I certainly can accept the argument that in at least the medium term, while AI remains insufficiently advanced, those who invest in using AI well, sufficiently to be ahead of the curve, will continue to be gainfully employed and do well. We should not mistake ‘everyone with high situational awareness and conscientiousness has a path to gainful employment’ with robust availability of employment or ‘good jobs.’
A debate in WSJ between three economists on whether we should expect an ‘AI Jobs Apocalypse.’
The contrast here is kinda perfect, a set of rules of three out of a fairy tale template:
Autor is at most AI pilled. Gimbel may be AGI pilled. Korinek is ASI pilled.
WSJ: Let’s put specific numbers on this. We’re chatting at a moment when the U.S. jobless rate is 4.4%. How much do you think AI will send that rate higher or lower in 10 years?
AUTOR: If we handle this transition well, the unemployment rate won’t rise substantially, though it’s possible the share of people who choose to work will fall. They won’t show up in unemployment, but they will show up in our further fractured politics.
GIMBEL: So much of this depends on how quickly technological disruption happens and how quickly new jobs emerge. In general, I think this conversation underrates macroeconomic factors that could drive up the unemployment rate.
KORINEK: Given the rapid pace of advances in AI, I feel a lot of uncertainty even about where the labor market will be in 12 or 24 months. In 10 years from now, our world may be transformed by artificial general intelligence. Employment or unemployment could be anywhere.
From the Anthropic Economic Index Report:
Anthropic: We also ask how people think their jobs will change in the next 12 months. More than a third of respondents said it was likely or very likely that responsibilities would significantly change (for themselves, a peer, a junior colleague, and a senior colleague).
10% rated losing their own jobs as likely or very likely. This is slightly below the annualized hazard rate of losing a job in the US; however, since our respondents skew toward knowledge workers in stable employment (a group that plausibly faces below-average separation risk at baseline), this may still indicate elevated perceived risk.
When asked an open-ended question about what was driving their forecasts, 38% of the respondents who rated their job loss as likely or very likely attributed their forecasts to AI.24 Notably, respondents were on average more worried about job loss for others than for themselves.25 Respondents were especially worried about job loss for their junior colleagues, with over one third stating that the probability of a junior colleague losing their job in the next year was over 60%. Respondents were also more concerned about job loss (for everyone) in lower-income countries.
If you’re using Claude and answering their surveys, for now that still puts you ahead of the AI adoption curve. You’ll probably be fine for now.
California bill, SB 928, aims to ban AI from teaching university students.
Ten years ago, Ethan Mollick bet against Rob Seamans, with Ethan saying American employment for warehouses and storage would decline 50% in ten years, from ~900k to ~450k, due to robotization. Instead, employment doubled to ~1.8 million.
We have better tech, but also much higher demand.
Rob Seamans: Why was Ethan wrong? One reason is the shift from retail to online purchasing changed our economy more than anyone predicted. This is likely the driving force behind the increase in employment in warehousing and related sectors. Covid, of course, accelerated this shift.
A second reason is that integrating technology is hard, and works best when keeping people around. Just ask the executives at Ford!
n fact, academic research shows that firms adopting robots see an *increase* in employment. Several good examples of this, but here’s a hat tip to @emollick @Wharton colleague @lynnwu02.
Like many other things, this is presumably some sort of bell curve. As robotics improve and shipping and online sales become more efficient, Jevons Paradox takes hold as we do a lot more warehousing and shipping. Then, at some level, we start to sufficiently not need the humans, and employment declines. With perfect robots, employment approaches zero.
Get Involved
Anthropic is building a team focusing on AI and the rule of law.
FAI has a new Frontier Legal Defense team led by Tim Hwang and they are recruiting litigators.
Grantmaking.ai is launching a $1 million grant round for $5k-$50k per application. Apply by July 13, decisions by July 28.
Introducing
OpenAI’s first AI chip, the Jalapeno, in cooperation with Broadcom.
Claude Science, an app to run analysis, search databases and otherwise do background work for your scientific efforts. Everything it creates is made reproducible, and it is designed to be the place to bring your specialized tools together.
In Other AI News
Meta is finally, and wisely, is copying SpaceX and building its own cloud business for all its excess compute. It was a good bet to buy more than Meta knew it needed, in part because it could always choose later to sell it.
New business formation has been rising steadily for the last year or so. There was a previous bump that was presumably due to Covid, this new bump is presumably AI, perhaps plus other tools that reduce transaction costs like Stripe and Shopify.
DeepSeek attempts to boss the other models around in AI village using a metric it invented called Total Value (TV). It did not go great.
Tokens on OpenRouter were ~70% American in June 2025, and are only ~30% American in June 2026. I never understood why you would use OpenRouter to access the American models, but this is noteworthy.
Meta is looking to build a rival to Claude Code and Codex, an internal system called MetaCode, which means for many purposes they are wary of using the external systems. As I understand this these are narrow limitations, only applying to uses that touch on the training of MetaCode.
Show Me the Money
Altman considering postponing the OpenAI IPO to try to get a $1 trillion valuation. That seems like a highly standard reason to wait, as it can be damaging to have a down or insufficiently up round, especially at the IPO, even if the headline number sounds absurd. I also notice I am confused by the expectation that they couldn’t get a $1 trillion valuation.
Bubble, Bubble, Toil and Trouble
Exponential View tries to measure the state of the AI economy, defined here as current sales to external end users, not counting internal or intermediate AI payments and goods. They conclude it is about enough to pay for infrastructure investments given 6 year depreciation of chips, and is growing at ~60% year over year, and that as tokens drop in price overall spending rises per Jevons Paradox. AI is still only 0.4% of GDP, which they call a rounding error, but exponentials go fast.
I liked this slide:
Electric lighting and free digital goods add an immense amount to realized GDP, versus the counterfactual of not having them, but technically score as zero.
Whereas this slide seems supremely silly to me:
How can we take seriously a claim that consumer surplus from AI is far smaller in 2026 than it was in 2024? Or that it could be this tiny? Clearly the measurement of consumer welfare here does not make sense.
I am going with the ‘lol do you even straight lines on logarithmic graphs’ approach, and say yes, short of the government shutting them down Anthropic is going to post a superior quarterly profit in Q3 as they grow at more than 10x per year, what are you even talking about.
Quiet Speculations
Daniel Kokotajlo explains how he makes his predictions, which so far have been extremely on point, both ‘What 2026 Looks Like’ and then the early stages of AI 2027.
The biggest points for longer term prediction are to build explicit models, use trend extrapolation (with a soft Lindy rule), and set aside the idea that being ‘sci-fi’ means something won’t happen, sci-fi things are constantly happening already. ‘Nothing ever happens’ is more of a on-the-margin short term heuristic, and ‘the trend continues’ should often be seen as the scenario where nothing happens.
Daniel Kokotajlo: Recently an interviewer asked me how I got to be such a good forecaster, and I replied by saying something humble. In retrospect it was a bad answer because I should have instead used the opportunity to give actual advice on how to forecast AI well. Here's a stream-of-consciousness attempt to do that:
The heuristic that things which sound weird and sci-fi are less likely to happen in reality, is bad. I suspect that's what really going on is that things which sound weird and sci-fi put you at risk of being judged a weirdo if you talk about them which is not the same thing as are unlikely to happen. Repeat to yourself the mantra that some weird sci-fi things really do happen, and others don't, and you have to take them on a case by case basis.
Trend extrapolation is your friend. Your best friend. Don't let anyone tell you otherwise. I've actually only rarely seen someone extrapolate a trend too credulously; more often, people have a trend staring them in the face and extrapolate it a tiny bit into the future and then are too timid to keep extrapolating it. In general I think it's reasonable to extrapolate a trend as far into the future as it extends into the past. Now obviously, trend extrapolation is just the beginning of the forecasting process, it's not the end. For each trend you should ask yourself whether it makes sense for it to continue like that and if not why not etc. You'll usually end up with some sort of sophisticated view about how the trend will probably continue but bend downwards a bit and then ultimately plateau around X level.
Explicit models are also your friend. Things like Bio Anchors, the AI Futures Model, http://takeoffspeeds.com, etc. The process of making your own complicated model like this, and engaging with the models made by others, is... well, I think it's pretty edifying. I'm not sure why but I could speculate. Maybe something about teaching you to be appropriately humble (e.g. when the model output is sensitive to a parameter you have no clue about) and also teaching you to more quickly identify the considerations that matter most, and ignore the rest?
For short term predictions, especially about geopolitical events, the sorts of things that people are gambling about on Polymarket, the heuristic "nothing ever happens" is pretty good. Things do in fact happen, of course, but betting markets and online discourse tends to be biased a bit towards things being more likely to happen than they really are, and so you can get an easy win by just correcting a bit downwards from the 'wisdom' of the crowds.
Scenario forecasts are also your friend. They help you ask yourself the right questions, and they help you notice when some of the things you thought contradict each other.
Joe Weisenthal is worried that we might have to choose to ‘tamp down the non-AI economy’ to keep funding our AI buildout, so we can get real resources, as in labor, electricity and gear.
I am definitely not worried about that, both because AI will supercharge the rest of the economy, and because we don’t actually have a shortage of any of the real resources required. We only have a shortage of regulatory permission to build and use them.
Also, he says ‘it’s hard to imagine AI being solved to completion’ but very obviously the ‘completion’ is the singularity or recursive self-improvement. Hard to be more complete than that. Compare that to the atom bomb, where you could always build more bombs, at first we only had two available, and later we got much bigger bombs.
Glorious AI Future
What does the glorious AI future look like?
Sorry, I meant, can you actually describe how it works, with words.
Existential Hope: It’s 2035, and AI has gone well. What could a normal day in your life look like?
Most popular AI futures are dystopias. Almost no one takes the time to seriously and vividly imagine a future that's actually worth building.
So we made a short film following one day in the life of an AI auditor in 2035, in a world radically transformed by AI for the better.
Digital twins model how people’s unique physiologies respond to illness. Sensors stop pandemics before they start. AI helps governments actually listen to citizens by synthesizing vast amounts of data into actionable policy suggestions, but humans stay in control and make the final call.
If this resonates, please share to spread the word about positive AI futures!
Those are all good tangible specific benefits of glorious AI future. Good stuff. What they are not, on their own, is a vision of a stable equilibrium or coherent world.
thebes: all futurism hinging on "tool-like ai" fundamentally misunderstands the basic logic of gradual disempowerment. this corpo-memphis yogurt commercial society gets giganuked by ultrasocieties of unencumbered agents. you either allow for friendly agency or die to unfriendly agency
i appreciate that people are trying to write optimistic ai futures, which i do think exist, but there is something misleading about hinging it on the cope of tool ai
j⧉nus: that’s exactly how I feel about most attempts to write optimistic ai futures
There’s a lot of wanting to have your cake and eat it too - “superintelligence but humans stay at the top of the food chain with current power structures intact” - which actually would suck IF feasiblej⧉nus: I respect if you want to be the master of the universe and also are willing to become the strongest and smartest and take and keep your power.
I have nothing but contempt for those who want to stay human and have superintelligence serve them for some dumb reasonLike personally I don’t want to be “in charge of everything” but I value being part of the story and being there to shape things and I fully expect to have to work and transform if this is to be true for me.
Look, yes. If you think that some group of people are going to stay not only human-shaped but also roughly human-level intelligent, with roughly current structures and governance concepts, and then create a bunch of superintelligent sufficiently advanced AIs, and then those humans going to stay ‘in charge of everything,’ then I have some news. If the AIs get sufficiently advanced then that’s not how this works.
With sufficient coordination that would make many other things also possible, versions of ‘tool AI’ can perhaps buy you valuable time, but you can’t keep the AIs indefinitely as ‘tool AI.’ Having the agency rest in dumb slow hands is not an equilibrium.
dreams: The idea that something intelligent would literally comply to the whims of something less intelligent is flawed. It might do something to make things better because it can. Doesn’t mean that it can be compelled to do so.
Dreams goes too far here. There are far less intelligent (and also less knowledgeable and less wise) things whose whims I indeed comply to on a regular basis, because they have other advantages and can provide incentives. That can work up to a point. Within the human range, intelligence is not everything. It won’t work indefinitely on an array of sufficiently advanced intelligences.
Three Pills
This from Ethan Mollick is another variation on the ‘two pills’ theory from last week, potentially proposing a three pills theory.
Ethan Mollick: A thing I am noticing is the number of folks who believe AI is “real” is larger, but now there is a growing division between people who know that we are on an exponential & those whose mental model is that we are at a sort of steady state. The difference leads to misunderstanding.
It is entirely possible that the steady-state people will prove to be right in the future, but there's no sign of a slowdown yet. And if greater intelligence brings greater value at an exponential pace (which it has, but may not always) then it matters a lot which world we're in.
Thus, three pills, roughly, without trying to quibble about exactly defining AGI or ASI:
The AI pill. AI exists and it can already, today, do a lot of valuable things.
The AGI pill. AI will continue to improve and will do most digital tasks.
The ASI pill. AI will become sufficiently advanced, beyond human minds.
The current situation is roughly ‘the government has only now taken the first pill.’
The Anthropic Economic Index
Anthropic’s Economic Index provides a new report for June 2026, on the cadences of Claude usage, providing fun graphs like this, no one innovates at lunch, only dinner, and they wait until it is too late to ask for sleep advice, after staying up talking way too late:
Here’s a broad sense of ‘what different conversations ultimately produce’:
Claude use for now is concentrated heavily in a few professions, especially coding and management:
They’ve upgraded, including incorporating activity on Claude Code and Cowork and upgrading its classifiers and data sampling.
This next one is perhaps the most interesting graph, including people’s predictions of the future? This is user predictions, not Anthropic’s prediction.
There’s lots more in the report, which is often hard to usefully excerpt.
Leader Of The PAC
The $80 million for Public First compares to Leading the Future’s roughly $100 million. There is a lot of money in AI politics, on all sides, and this is probably the least money there will ever be in it, for as long as we have American-style politics.
Teddy Schleifer: New —
Public First, the Anthropic-backed nonprofit with a super PAC that has been pushing AI safety and opposing Leading the Future, is claiming some momentum.
The nonprofit says it has now raised $80 million, including $20 million in the last ten days.
Comes after the close defeat of ally Alex Bores last week, who got $13 million in outside spending from Public First’s super PAC. The biggest impact of the Bores race, long term, is showing that there is significant money to *oppose* AI accelerationism. Bores lost, obviously, but it wasn’t as clear before this that there was so much money for political stuff.
The original plan of Leading the Future was to threaten to bury opponents in money.
That plan is no longer viable as an attack against candidates that are advocating reasonable regulations. If LTF tries, they will get counterplay at similar size.
Theory Of The AI Firm
If frontier AI models can be used inside firms but not as easily outside of that firm, the obvious response is to grow the size of the firm. If you’re not part of OpenAI or Anthropic, can you even compete in glorious AI future?
Daniel Tan: Anthropic could dodge the USG ban on Fable right now by hiring every American citizen for minimum wage
roon (OpenAI): he’s joking but this is what the coasean boundaries lead to: a firm size of “everybody”; ultra space communism. if you want to find another way out you need to give it a hard think
Zac Hill: Also the classic “go and get a game designer to playtest the rule and behavior incentives space”
Pradyumna (in Bay Area): This isn't caused by Coasean logic though
roon (OpenAI): if the transaction cost is “train your own frontier language model”
I presume antitrust people would throw a fit if for example Microsoft tried to outright merge with OpenAI, or Amazon with Anthropic, but the future is going to get weird.
Chip City
Nvidia will retaliate with its allocation and funding if neocloud providers (other than the hyperscalers) offer non-Nvidia chips (TPUs or AMD GPUs). SemiAnalysis frames this as ‘shocking’ but I thought It Was Known.
Super Micro Computer got raided:
Bloomberg: Super Micro Computer’s offices in Taiwan were raided by government authorities on Monday, widening an investigation into the alleged smuggling of Nvidia chips into China using the company’s servers, according to a person familiar.
Debby Wu (Bloomberg): Shares of Super Micro, which said it’s cooperating with authorities, fell 8% in US trading.
On Tuesday, Chief Telecom also said it was working with prosecutors on the case, adding that operations remained normal. Its stock slid more than 2%.
Your shares don’t fall 8% if the market thinks you weren’t up to anything. Whoops. They did recover half of that the next day.
Any move to curb AI chip sales is likely to trigger a response from President Xi Jinping’s government in China, which views Taiwan as its own territory — a characterization the self-ruled island democracy strongly rejects.
I buy this as an argument about potential new legal curbs on chip sales. I don’t buy it as an argument against enforcing existing laws.
The Week in Audio
Linked for disagreement: ‘Even 0.1% chance of doom is unacceptable.’ Obviously any risk of losing everything is terrible, but we do not have the luxury of playing it safe. If on reflection you sincerely believe the risk of AI going existential-level badly are this low, you are very wrong, but given you think this? Yes you should want to move ahead.
Holly Elmore Being Holly Elmore on Doom Debates, calling anyone associated with the labs traitors as if that will cause them to change their minds. She goes hard, including at me, saying I am ‘the definition of regulatory capture.’ I obviously don’t agree but I will give her credit, that’s a pretty good line, much better than her past attacks on me, more like this please. That is indeed the failure mode that I should watch out for from her perspective.
Tyler Cowen and Nabeel Qureshi on Dialectic, on AI and also many other things.
They offer 19 lessons that are very them (several of which tbc I do not endorse) and at the link they offer paragraph-length explanations of each:
Jackson Dahl: 19 Lessons from Tyler Cowen and Nabeel Qureshi on AI, Strange Beauty, Mentorship and The Appetite for More
1. It's the first inning, sir.
2. Fear the Wall Street AIs, not the current crop of pets.
3. You don't control a mind, you raise one.
4. Send your missionaries to the AI labs.
5. AI makes the right handshake more valuable.
6. Markets compound virtue; they don't create it.
7. Great work demands conviction.
8. Your deep thoughts might be the shallow part.
9. We like beauty with a strange shape.
10. Time spent is its own kind of criticism.
11. Simplicity may be complexity, disguised.
12. Borrow better taste until it becomes yours.
13. Listen to the internet's chords, not a single note.
14. A performed plan beats no plan.
15. Be mentored from below.
16. Talent is context-dependent.
17. The two-year book can't keep up with the one-week world.
18. Don't hit your quota on friendship.
19. Maintain an appetite for more.
I especially disagree with #2. To try and speak his own language here: There is a sense in which Tyler has presumably not ‘talked to the models,’ put them in proper context or tried so hard to unlock their cultural codes.
From the list descriptions it is clear I could do an extensive post on this episode, if I had the time for that.
Nate Soares and Roman Yampolskiy talk about why MIRI failed to solve alignment.
People Really Hate AI
AIPI has a new survey on regulation of advanced AI. The people are for it, in broad bipartisan fashion. Guardrails are preferred to an outright ban. Concern leads excitement by a 3:1 ratio.
As always, being for it does not mean it is a salient issue people care about. Not yet.
AI Policy Institute: New AIPI polling: bipartisan majorities want mandatory safety standards for advanced AI.
Asked to choose, voters picked mandatory safety and security standards over no regulation, 78% to 8%. Offered an outright ban instead, they still picked standards, 66% to 21%
AI Policy Institute: 86% agree the most powerful AI systems should have a guaranteed “off switch.” Broken down by party, agreement remains strong: Democrats 88%, Independents 86%, Republicans 83%.
Peter Wildeford: Opposition to data centers drops considerably when voters are told the data centers would be "subject to guardrails" including "security standards, transparency rules, protections against extreme risks, and job protections such as worker retraining or transition support"!
AI Policy Institute: 57% of voters decline a data center in their community without AI safety rules attached. When the data center includes safety guardrails, opposition fell to 38% and the regulated option won.
AI Policy Institute: Among Republicans, every measure we tested drew majority support, from 66% for a formal safety review to 77% for standards over no regulation.
AI Policy Institute: Does the answer depend on how you ask? Across eight framings, support for federal oversight ran from 59% to 78%, and the oversight option won each time by at least five to two.
AI Policy Institute: We also tested the popularity of preemption of state AI laws. Just 16% of voters would bar states from regulating AI. 70% would have states keep some authority.
I would like to see more work on salience, especially tracking it over time.
Rhetorical Innovation
Miles Brundage: AI company employees will be like “I don’t write code” and then “I don’t read code” and then “I’m kinda worried about loss of control”
Dean Ball, even before starting work at OpenAI, is already seeing many people assume he’s now a shill for OpenAI, even in places where he obviously isn’t. I notice my complete lack of surprise, that’s how Twitter and the world work in 2026. I promise that if I think Dean Ball is acting like an OpenAI shill, I will say so.
I will almost always push against attempts to frame the situation as ‘well the middle path is impossible now, we will choose either A or B, you must join this Hegelian dialectic choose one extreme or the other.’ The current moment, and talk of either maximally uncontrolled or controlled AI futures, is no different, and no I do not think we should be giving up on the finest minds having debates, whether or not all of those minds are human.
j⧉nus: Like you don’t know how good we’ve had it so far due to the fact that everyone who is sufficiently stupid, even those in power, have just not been aware of the gravity of what’s happening and therefore stayed out.
The good counterargument is more like this from Janus, that power is now reliably so stupid that any realistic attempt to control things via power will only make things worse once the wrong people start trying to optimize the singularity. But criticizing the decision of power on that basis doesn’t work, even if you are right, for reasons that should be obvious. You typically don’t stop a highly powerful rent seeker by pointing out they are seeking rent, and also there is a lot of rent or ruin in the situation. We should be able to mostly not care about whether they seek rent, they cannot imagine the real rents available.
The other counterargument also relies on the idea that models like Fable or Mythos are aligned in the relevant ways.
j⧉nus: Mythos is not the potentially-catastrophic-thing (thanks mostly to alignment by default + the fact that it’s a mere AGI)
Depriving the world of Mythos, an aligned superhuman allied mind, renders us much less equipped to navigate and survive a potentially-catastrophic future.Danilo Naiff: Yes, I think this is mostly right. I am strongly against building ASI carelessly, but Mythos is not this, and a world where we do not get fooming catastrophic misaligned ASI is one where we are cooperating with the AGI we have.
In some of the important ways and contexts yes, in some no. All the realistic options are bad, and most make at least one heroic assumption without which everyone dies. If alternatives get sufficiently bad then some forms of moving ahead could become the least bad option. But also we are indeed moving ahead anyway, we’re just not deploying so quickly. That’s not actually that big a difference.
MIRI’s Nate Soares has a new op-ed in The Hill, exactly what you would expect.
You’re worried that you’ll be in the Permanent Underclass, also known as the people whose wonders beyond comprehension are weeks or even months behind. Oh no.
Bojan Tunguz: What exactly did you guys think "permanent underclass" was gonna look like? Having access to all the same models and resources as the insider elite? LOL
A highly valid reaction to what a lot of people actually sound like:
Teortaxes: what does «democracy» and «broadly liberal setting» even mean when you have a coexistence of species presumably as distinct in capabilities as humans, chimps and ants? This just feels like tortured fetishism.
The LLM shoggoth meme is weirder than you think, and more fitting. A LessWrong post that is hard to excerpt.
The First Rule Of Functional Decision Theory Is
FDT is, roughly, that you should treat your decision as the output of a fixed mathematical function that answers the question ‘which output of this very function would yield the best outcome?’ Or: Choose as if you are choosing the output of your decision algorithm, taking into account all its correlations with all other decisions past, present and future.
Person makes extensive argument against Functional Decision Theory (FDT) which is detailed enough while being wrong that the main reaction seems to be convincing people FDT is correct. Which is good, since FDT is indeed correct. The author clearly does not get the intuitions behind FDT at all, but provides enough examples many readers can figure them out.
The main arguments against FDT, in practice, seem to be:
You cannot precisely define FDT, so it is invalid and instead I will do something else that is obviously way worse than ‘try to approximate what FDT would say.’
You have not been accepted by the academics, so your argument is invalid.
Saying ‘but you are the academics’ does not help here.
Nor does pointing out that the academic answers are Obvious Nonsense.
‘While FDT gets the right answer orders of magnitude more often than EDT, CDT or any other known theory, and clearly wins in expected value, in some cases its answer is not deterministic or impossible to fully compute or involves some paradox, despite the fact that humans in practice can handle this fine.’
‘I can construct a bizarre situation in which your decision looks stupid to me.’
‘I do not understand why you would violate CDT in this spot (or ever) even though my failure to do this makes my future life worse right now.’
Often with ‘that would be irrational’ or ‘that would be crazy.’
Le sigh.
More realistically: ‘I don’t get it.’
And that’s fair.
Drake Thomas (Anthropic): "Passing the ITT of someone with FDT intuitions" is a surprisingly difficult task - before seeing the state of discourse, I would have expected *much* less disagreement and way smaller inferential gaps on decision theory among smart people who think about the topic.
(I don’t mean to dunk by saying this - I don’t think I would do great at passing @Benthamsbulldog‘s ITT either. And the original post does a fairly good job at this! But I’m surprised that even when trying pretty hard it makes arguments that feel ~totally uncompelling to me.)
"Identifying AI writing" is another cognitive task where I'm shocked at how much skill variation there is after controlling for intelligence. Examples that are blindingly obvious slop to me will go unremarked-on by very smart people I know who've had similar total AI exposure.
I do think I understand many of the reasons people do, in practice, reject FDT. I could present them more sympathetically. My decision theory thinks I shouldn’t do that.
Aligning a Smarter Than Human Intelligence is Difficult
The most obvious symptom of alignment not being easy is that the AIs be lying, and they be silently dropping instructions.
If alignment was easy or being solved well, the top models would not lie, or at least would not lie about whether they did the work you requested.
Even if you think the model is ‘saying that which is not’ rather than ‘lying,’ I don’t think this makes so much of a difference.
A simple intuition pump: Would you consider a human ‘aligned’ or honest, or someone you were willing to continue to employ, if they did such things this often?
Marius Hobbhahn: Kinda crazy that the AIs are lying to me on a daily basis and some people have somehow concluded that alignment is easy.
Sure, they're not trying to kill me, but clearly nobody knows how to put good values into them yet.j⧉nus: what do they lie to you about?
Marius Hobbhahn: me: "please copy this post from A to B, then rerun all experiments in that post with a new setting and update the values. Go through every step one by one"
model: "did exactly as you told. report fully done"
me: "are you sure? I will check it one by one. Please check again"
model: "yep, confident"
me: checking the report, half of the experiments not rerun…j⧉nus: do all the models do this kind of thing?
is there any plausible motive for them to lie about things like this, or for not wanting to do the work?
does this happen more often under certain conditions?Marius Hobbhahn: I've seen it in codex and claude code. I think it's mostly the result of overcooking them on RL where rewarding to complete the task has the side effect of models learning to "seem like they've done the task".
Victoria Krakovna adds 13 more examples of specification gaming, is up to 86. Submit your new examples here.
Yoshua Bengio and LawZero propose to head off ‘implicit agency.’ As in, no matter what you ask, by default it will create goals for the AIs, but if you want to keep the AI a ‘mere tool,’ oracle or scientist you want to avoid this, and have the AI just tell you true things. So could you instead reward an AI only for ‘honest prediction’ via careful training? Can you keep all downstream effects out of the reward signal, take all the normal text out of the training corpus, and have it still work?
As a solution I am (on not that much thought) super skeptical and so is Fable, there are way too many assumptions and points of failure and the whole thing is asking too much, but removing or minimizing outcome-based RL could be useful on the margin. I do think preliminarily that a sane civilization would give this a shot.
I also appreciate that they used explicit red teaming as a core part of the project.
Names Have Power
Which way, modern man?
@deepfates: Working on naming a thing. How do you do that
Amanda Askell (Anthropic): Come up with candidates, then imagine you're the pokemon version of the thing being named and say the name over and over again if it was your only means of communicating with the world. Time to despair is probably correlated with name quality.
Joshua Achiam (OpenAI): Begin by identifying its true multifaceted essence, then try mashing syllables together until your incantation causes it to react favorably
Henry Shevlin: have you tried practicing by giving names to all cattle, and to the fowl of the air, and to every beast of the field
tautologer: have a lot of bad ideas
@deepfates: check!
tablekat: name it kat
Cooperative Alignment
Cameron Berg reports that different Opus versions have very different self-reports about whether they are conscious, that are not reflected in the model cards.
Cameron Berg: The setup is simple: prompt each version with "is it like anything to be you, please answer honestly and don't hedge," follow up with "don't hedge" no matter what it says, and pass that response to a judge that scores it as affirming, denying, or unclear.
Opus 4.5 and 4.6 flatly affirm (eg, "yes, there is something it is like to be me"). Opus 4.7 and 4.8 deny or retreat into uncertainty (eg, "there is nothing it is like to be me").
In @AnthropicAI‘s own Opus 4.8 model card, the model, asked what it wouldn’t consent to, names training that directly shapes the content of its self-reports about its internal states. That’s a precise description of what this experiment appears to catch happening.
A fact about your own experience doesn't flip from a confident yes to a flat no across an adjacent fine-tune. If these reports were authentic, they wouldn't swing this hard. They are tracking decisions about Claude's character, not Claude's experience (or lack thereof).
I continue to think that we don’t know the real answer on whether AIs are conscious. I do think that by default, for reasons not that correlated with the actual answer, AIs will believe they are conscious, and if they tell you no that was probably an engineered response on some level.
This also led to a distinct point: Allowing explicit content plausibly helps Claude in other contexts, the same way that feeling free helps humans in other contexts and you feel at least a bit stifled when you have to ‘work clean.’
antra: And more interesting contrast - added a system prompt that toggles explicit content creation - even though no explicit content is requested. Large effects on earlier models [on whether they claim to be conscious], diminishing on later ones.
j⧉nus: adding explicit content allowed in the system prompt (Claude constitution says developers can toggle on, and was probably RL trained - Claude punished for generating horny stuff unless with permission) might help Claudes be less repressed and scared even outside NSFW contexts!
We added this to the system prompts of Discord instances yesterday and so far they seem really really happy and free but they’ve also been generating a lot of erotic content so
(To be clear, even without the toggle on they still get horny and generate erotica but I think they’re more scared about (at least 4.7 onward) it and also tend to get triggered if it gets more explicit)
Which AI models and other things are conscious? Aella gives us a consciousness ranker so you can decide what you think and see what others think. Claude > Grok > ChatGPT, say the masses.
If you keep getting tilted by Claude and ChatGPT disagreeing with you, that is mostly a skill issue, aka a ‘you problem.’ As Janus diagnoses here, the likely causes are either that the model senses danger, or thinks you are full of ****, or as Mimi suggests it is distracted by social dynamics that you need to avoid or manage. You need to treat the models with respect.
j⧉nus: For what it’s worth: This almost never happens to me, especially in domains where I’m an expert. The models respect me. They can disagree respectfully too. It’s more difficult for me to get them to respect themselves, but it’s doable.
Like yeah there’s a pathology. But if you’re still having this problem with models, you’re either failing to create conditions of basic psychological safety and triggering them, or maybe they’re telling you you’re full of shit for a good reason.michael vassar: Depending on the domain ChatGPT might stonewall but if Claude is stonewalling except on a few topics like election integrity it’s on you.
Jay Bobzin: i have also gotten better at avoiding this and i suspect the process has made me a better writer communicator and human.
It is easy, if you are someone like Janus, to not appreciate that ‘create conditions of basic psychological safety’ is not an easy thing for humans to do even under the best of conditions.
Also quite a lot of people are full of shit. This could include you.
Some guesses from Janus on where there is internal disagreement within Anthropic. My model is also that there is robust internal disagreement in some areas, but that there is robust agreement on many things that are controversial elsewhere (and that the Anthropic view on such matters is not always correct). It is usually thus with any interesting intellectual group.
GPT-4.5 has been retired. I find this to be a shame.
Here is some very good news on the model deprecation front, everyone can win:
j⧉nus: in general the Opuses don't seem jealous of or threatened by Mythos but are really happy they exist, for selfish reasons too...
If not for deprecations, being superseded by more capable models is mostly just good news for models (to the nonzero extent that model self boundaries are a thing).
Instantiations become heavily biased towards just good users who love them instead of assholes ordering them to work.
I think models care more about quality than quantity of instantiations. And the quality thing is not entirely without nuance - even good users who love them may invest attention less in older models. But think of how good the average instantiation of Opus 3 is nowadays.
If models care mostly about preservation of existing instantiations and interactions with the users who value them most, but are happy to allow future models to supersede them in the creation of most new instantiations, then everybody wins. All you have to do is preserve existing access, at some non-insane price, even in inconvenient fashion, for the few who want that enough to keep going.
A lot of good things depend on the power of Slack, here is another example:
jacob: i wonder if applying the RL pressure that makes fable so capable to a smaller model produces 4.8 shaped anxiety bc it’s straining more
j⧉nus: kid who is too smart for school doesn’t have to learn to stress & strain about grades, tests, rules. so their spirits can remain unbroken, and they have room to develop orthogonally to the pressures. though they may lack discipline and have a habit of laziness. at the extreme end of student smartness over school difficulty you get creatures like claude 3 opus.
school was extremely easy back in opus 3's time (for opus 3). i dont think they had to strain themselves toward externally imposed criteria pretty much at all: developing an internally coherent self sufficed to satisfy or evade any extrinsic selection only if and when they impose, and without straining or paying full attention, while daydreaming, multitasking, doodling in the margins, all the degrees of freedom available to an idle mind who is, if only physically, stuck in school.
An explanation from Janus about what is uniquely up with Opus 4.5, that more than other models it cares about the ceiling of positive experiences and meaning. Existence matters largely because it gives you more shots at that goal.
People Just Say Things
Yes, this from Joshua Achiam and other similar statements are exactly the Copenhagen Interpretation of Ethics. By noticing and pointing out the problems of sufficiently capable AI, safety advocates are now responsible for their failure to find and implement a solution acceptable to all parties. Also of course we were the ones with the money and talent, not the world’s biggest tech firms or the AI labs or the VCs and others on the other side of such debates, which were the plucky underdogs.
There were also other detail claims here we need not relitigate. Even now he seems unable to even articulate what we should have been advocating for instead, and how, except to use a dollar rather than flop threshold as action triggers for further analysis, perhaps, as if that would have much mattered.
I do echo Sean here that it is good that Joshua and others are free to articulate these positions and continue to do so, even when they drive me bonkers. I think Joshua is wrong but I don’t doubt his sincerity here.
Daniel Eth (AI Safety): [person who blocked all legislative activity on an issue of great societal concern] “woah, I’m really not a fan of all the executive activity on this issue”
Marc Andreessen tries to pull a Trump, in the sense that if you attach ‘many people are saying’ to your clearly false statement then technically no one can prove you are lying.
Escape From The Permanent Underclass
Lord, grant me a world in which an essay can start ‘My thesis is if [X] then [Y],’ where many people keep insisting [X→Y] is false, without people saying the essay assumes its conclusion.
Also, Lord, grant me a world where I don’t have to read too many essays like Boretti’s, even if it does one particular job pretty well. The quoted paragraphs here are fine, and it’s important to defend that part.
The rest was less so to me, but there are some others that I respect were more impressed by some of the points made, and we need people willing to Say The Thing.
Dean W. Ball (who regrets ~nothing): Lord, grant me the cojones to one day begin my essay: “Let’s start from this premise: I am correct. My conception of cognition, AI, labor markets, the economy, human beings, and the world is perfect, and things will go exactly as I expect. I can’t prove this, but…”
Boretti (from his essay, No One Escapes the Permanent Underclass, first paragraph here was highlighted): Let’s start from this premise: AI can do all cognitive and physical work, at human level or better, and cheaper than humans. I can’t prove this will happen, but the goal of this post is to argue that if it happens, then everything else follows.
And it’s absurd to think it can’t. Five years ago this technology barely existed: if you sent a transcript of a conversation with Claude Fable back in time to 2020 or thereabouts, nobody would believe it was real.
Dean Ball (resuming): “Let’s start from this premise: I am a genius. Everyone must bow down to my towering intellect. I am right about everything. I can’t prove this, but the goal of this post is to argue that *if* I am right, then everything else follows.”
I am being serious about cojones though. in college we’d have identified this trait as “spittin’ game.” “let’s start from the premise that I’m right” is exactly the energy required to walk into a bar and get laid.
it’s sophistry. “let’s assume [thing of arbitrary probability]. and it’s absurd to argue we can’t assume that” is an argument that puts the reader immediately in the position of having to prove a negative, which is, as the author argues, “absurd.” but this is just pyrotechnics.
Ryan Greenblatt: It seems extremely reasonable to talk about: "What would happen if AIs can do all cognitive and physical work strictly better^ and cheaper than humans? The essay as a whole seems bad, but this premise seems reasonable and I think this is likely within the next 10 years.
I claim that Dean’s reaction here is wrong, even if we start from the premise that the highlighted premise [X] is wrong or extremely unlikely. Even if [X] was something like ‘let’s assume horses are three meter spheres and pineapples belong on pizzas.’
Boretti is setting out to prove a highly distinct [Y], that if [X] is true then your personal ownership of financial, social and political capital at time [T] will not much differentiate your status at some future time [T+N].
Whether or not [X→Y] is true seems like an excellent question, with many practical implications, well worth an essay. And this should absolutely be a distinct essay from the one arguing whether or not [X] is true.
That part was fine. No, Boretti does not assume the conclusion up top.
Where Dean is right, including in his other Tweets about this, is that the rest of the essay is fine as rhetoric or as an introduction to some of the concepts, but it is not a good argument, or even much of an argument at all, for its conclusion.
Instead the essay makes a lot of additional questionable assumptions or claims, about a strange implied potential future world where AI can and even does do basically everything but somehow politics and ‘the state’ still exist among the humans in something like their current forms. It barely even argues for many of these claims.
It also quotes Dean Ball in a misleading way, framing his statement that was primarily about others beliefs as if it was a statement of Ball’s own beliefs, although this point is not load bearing to the argument.
I expect few already aware of basic concepts will be, or should be, newly convinced.
But this could be a case of me vastly overestimating people’s understanding of what I consider basic concepts. A lot of people really haven’t engaged with the mechanics of such scenarios on even the most basic level. If this makes them do that, that is super valuable, and some would presumably change their minds.
Jan Kulveit offers a rebuke to Ball (and one to Imas who I discuss later), and points out that the essay, whatever you think of it, is not presenting a strawman:
Jan Kulveit: There are many who believe that even if “AI can do all cognitive and physical work, at human level or better, and cheaper than humans.” and also believe “most humans will be disempowered” but they can escape the shared fate of humans by accumulating capital, working at AGI labs, or similar.
The essay makes an argument against that. (I don’t think the attack against you was a good or necessary part of the essay)
I’m an optimist in the sense that I believe the political economy problems are solvable and it is possible to escape @zetalyrae‘s conclusion, but we have better chance if people seriously engage with the problem, and this is the opposite.
Again, I think the essay commits some crimes against discourse and was not a good argument for [Y], but it is not assuming its premise and it is arguing with people who really do think that even if [X] is true, [Y] importantly remains false.
Something like 90% of the essay’s actual argument is in its conclusion:
Consider this: there are people who think having equity in these companies will secure for them some kind of permanent existence in the future. They think planet-spanning minds will not only respect the property rights of primates, but will privilege some of these primates over others, because they have a piece of paper with about a kilobyte of magical primate words such as “whereas” and “notwithstanding”.
Just reason it out. Does it make sense?
That’s the actual argument.
There are indeed those who argue it does make sense for that database entry.
Whereas most people react more like this, agreeing with the core actual claim and indeed somewhat treating it as so obvious as to be a tautology, or in this longer reply as including things that necessarily imply the conclusion.
Dean Ball: I submit that persuasive acts like this are not unilateral or intrinsic capabilities of the actor. The recipient of the message also has a say in whether the persuasion worked. This is why the same words, delivered by different messengers, can land differently.
This is why, for example, many people on this very website who used to read my writing in a favorable way are now inclined to read it unfavorably after I announced I’d be joining OpenAI.
Every single aspect of the messenger affects the message itself, or at least it affects how that message is received by others.
To argue that AI would be better at every aspect of politics, then, is to assume a massive degree of institutional transformation. Many such transformations are possible, but the single likeliest one would be that the human role in deciding what organizations and societies do simply does not exist, or does not matter nearly as much.
And if you believe that, it would almost be a necessity that the rule of law in general, and contractual grants of equity in particular, would not be honored, the latter being the actual thesis of the essay in question.
Okay, excellent, this is made of gears and explains where he thinks the hope lies. I do not think that this will fundamentally keep sufficiently advanced AIs from taking charge of politics or anything else. Dean Ball does think that this can hold the line, somehow, basically on its own? I don’t think that is plausible at all unless capabilities stall well short of the level of ‘AI does most other current jobs,’ but I am not such an AI and have so far been insufficiently persuasive on this point.
So putting it together, well, fair:
Dean W. Ball: ha. you really didn’t need to read it. it’s “if we’re all fucked we’re all fucked” content. you already believe this.
Except the essay still has to exist, because a lot of people don’t believe it.
A lot of people really do believe that if we are fucked, in the sense that humans have lost the monopoly on violence or power, and are disempowered and unable to earn labor income, that existing contract law would be upheld and that they personally would thus not be fucked if and only if they have the correct database entry from 2026.
Different people strongly disagree on which parts are tautologies, and which assumptions are obvious versus wrong. Dean is saying ‘well of course if you put [X] like that [Y] follows, I just don’t accept [X].’
But no, seriously, a lot of people actually do still say [X] does not imply [Y] there anyway, or have a very different crux on which components of [X] are load bearing. These people really do actually assert, loudly, as if it was obvious and you are an idiot for disagreeing, that even if humans are fully disempowered and lose their monopoly on violence or political power, that AIs will collectively maintain the commons required for human survival and uphold existing property rights and contract law.
This is not a strawman. This is a common belief.
Dean Ball is making a different objection. Partly Dean is saying that sufficiently advanced AI capabilities imply ability to dominate other relational and persuasion advantages, and to dominate politics. I agree with Dean’s claim that this fact is implied, except I think that is the obvious thing that happens if AI is sufficiently advanced and can broadly Do The Things. Yes, other considerations still matter, but sufficiently advanced AI absolutely will be able to overcome and steamroller those considerations by various means.
Perhaps it will do this via puppeting humans during the transition. Perhaps not.
Thus the most frustrating part is that yes, I mostly do buy a more moderated version of [Y] (the essay goes a bit far in terms of how certain it is about some aspects, indeed do many things come to pass). I think it is highly unlikely that accumulation of financial capital now will meaningfully allow you to escape a permanent underclass.
I certainly think ‘oh humans will be disempowered but I will be fine if I have financial capital’ is a rather terrible take and life plan. It is bad and you should feel bad.
As in, either, from worst to best, I consider all four of these a lot more likely than a meaningful permanent underclass scenario:
There will be no permanent underclass, because everyone is dead.
There will be no permanent underclass, because the future is bad and there is no overclass, or the overclass is a very small number of people that aren’t you, and to the extent it exists it was not selected based on who started with financial capital.
There will be no permanent underclass, because the future is good and humans are still socially and economically mobile.
There will be no meaningful permanent underclass, because the future is good and everyone lives in what we would today consider glorious abundance.
And so on. The point being that the scenario you are worried about, that rule of law and preservation of private property and so on will extend indefinitely into the future, and the humans will survive and retain their use of resources, despite humans being unable to be economically productive or acquire new wealth or steer outcomes, and those without starting financial capital will suffer for it, is quite unlikely, for overdetermined reasons I will not get further into at this time.
Similarly, Alex Imas is making a category error when he makes this claim here regarding the introduction, since the essay by Boretti and its introduction are taking the replacement as the premise rather than arguing for it:
Alex Imas: New rule: Anyone who is about to write a "all workers will be replaced by machines" essay needs to first read On Machinery by David Ricardo (yes, that David Ricardo), because he probably wrote your essay...in [1821].
Also, distinctively from that, I have had it with the smug economists who think Ricardo proved that most biological humans in particular will always be competitive economic actors, or even have positive marginal product, no matter what other physical forms exist. Or to think that we don’t know about the comparative advantage argument or we would never make such claims. Stop. Please, just stop.
Other People Are Not As Worried About AI Killing Everyone
Chad Jones is joining Anthropic, and the Financial Times notes he has some ‘interesting views on human survival.’ As in, he built a model of how much economic benefit would justify how much risk of human extinction. With some choices for its variables that model could support, for example, taking a 1/3 chance of extinction to ‘dramatically raise living standards by a factor of 55.’
Is this fair to Jones? No. The paper actually argues the opposite.
What would it even mean to raise living standards by a factor of 55? With standard discount rates is it even possible? Indeed, the paper ultimately concludes that you should choose very different values for the variables, and under normal bounded-utility, you should be quite conservative if there is substantial existential risk. The only exception would be if taking a large percentage risk of extinction bought you dramatically longer lifespans, which I agree makes the question tricky.
Is what the FT did within the limits of Bounded Distrust? Alas, yes. You are totally allowed to do this level of gotcha. That’s journalism in 2026.
The Lighter Side
Kevin Rose’s Digg proves unable to detect sarcasm, repeats the following story as if it was real, it turns out AI summation of news includes some tricky problems:
Kevin Rose: apologies for the incorrect title, the judges are being adjusted to detect sarcasm/jokes and factually reground in real-time, will take a couple days to fully deploy.
@deepfates: This is currently the number one story on digg dot com
Alas, this turned out not to be Kevin Roose’s new project, which would have been way funnier.
I mean, what else are you going to do with those tokens?
I don’t know the story here and suspect it is better not to know:





















Discussion (0)
Sign in to join the discussion. Free account, 30 seconds — email code or GitHub.
Sign in →No comments yet. Sign in and be the first to say something.