Anthropic ditches its core safety promise

(cnn.com)

368 points | by motbus3 3 hours ago

40 comments

  • shubhamjain 2 hours ago
    I was wondering if it was because of heavy-handedness of the administration, but apparently:

    > The policy change is separate and unrelated to Anthropic’s discussions with the Pentagon, according to a source familiar with the matter.

    Their core argument is that if we have guardrails that others don't, they would be left behind in controlling the technology, and they are the "responsible ones." I honestly can't comprehend the timeline we are living in. Every frontier tech company is convinced that the tech they are working towards is as humanity-useful as a cure for cancer, and yet as dangerous as nuclear weapons.

    • ACCount37 1 hour ago
      That's because it is.

      AI is powerful and AI is perilous. Those two aren't mutually exclusive. Those follow directly from the same premise.

      If AI tech goes very well, it can be the greatest invention of all human history. If AI tech goes very poorly, it can be the end of human history.

      • observationist 38 minutes ago
        Let an ultraintelligent machine be defined as a machine that can far surpass all the intellectual activities of any man however clever. Since the design of machines is one of these intellectual activities, an ultraintelligent machine could design even better machines; there would then unquestionably be an 'intelligence explosion,' and the intelligence of man would be left far behind. Thus the first ultraintelligent machine is the last invention that man need ever make.

        -Irving John Good, 1965

        If you want a short, easy way to know what AGI means, it's this: Anything we can do, they can do better. They can do anything better than us.

        If we screw it up, everyone dies. Yudkowsky et al are silly, it's not a certain thing, and there's no stopping it at this point, so we should push for and support people and groups who are planning and modeling and preparing for the future in a legitimate way.

        • visarga 23 minutes ago
          John Good's quote is pretty myopic, it assumes machines make better machines based on being "ultraintelligent" instead of learning from environment-action-outcome loop.

          It's the difference between "compute is all you need" and "compute+explorative feedback" is all you need. As if science and engineering comes from genius brains not from careful experiments.

          • circlefavshape 8 minutes ago
            > As if science and engineering comes from genius brains not from careful experiments

            100% this. How long were humans around before the industrial revolution? Quite a while

          • Eldt 12 minutes ago
            Maybe ultraintelligence is having an improved environment-action-outcome loop. Maybe that's all intelligence really is
        • SecretDreams 1 minute ago
          > support people and groups who are planning and modeling and preparing for the future in a legitimate way.

          Who is doing that right now, exactly? And how can we take their tech and turn it into the next profitable phone app?

        • LeifCarrotson 26 minutes ago
          "There's no stopping it at this point" - Sure there is, if a handful of enormous datacenters pull the very large plugs (or if their shaky finances collapse), the dubiously intelligent machines will be turned off. They're not ultraintelligent yet.

          Stopping it merely requires convincing a relatively small number of people to act morally rather than greedily. Maybe you think that's impossible because those particular people are sociopathic narcissists who control all the major platforms where a movement like this would typically be organized and where most people form their opinions, but we're not yet fighting the Matrix or the Terminator or grey goo, we're fighting a handful of billionaires.

          • observationist 9 minutes ago
            I'm not saying it's technically impossible, I'm saying that in the real world, it's not going to stop. Nobody is going to stop it. A significant number of people don't want it to stop. A minority of people are in the "stop AI" camp, and the ones with the money and power are on the other side.

            It's an arms race replete with tribalism and the quest for power and taps into everything primal at the root of human behavior. There's no stopping it, and thinking that outcome can happen is foolish; you shouldn't base any plans or hopes for the future on the condition that the whole world decides AGI isn't going to happen and chooses another course. Humans don't operate that way, that would create an instant winner-takes-all arms race, whereas at least with the current scenario, you end up with a multipolar rough level of equivalence year over year.

          • trvz 23 minutes ago
            Open models barely any worse than SOTA exist, and so does consumer-ish hardware able to run them. The genie’s out, the bottle broken.
          • slibhb 20 minutes ago
            Do you really think AI companies/researchers are motivated by greed? It doesn't seem that way to me at all.

            Stopping AI would be immoral; it has the potential to supercharge technology and productivity, which would massively benefit humanity. Yes there are risks, which have to be managed.

      • paradox242 14 minutes ago
        It needs to go well every single day, and only needs to go very poorly once. Not to conflate LLMs with actual super intelligence, but for this (and many other reasons related to basic human dignity), this is not a technology that a responsible society should be attempting to build. We need our very own Butlerian Jihad
      • PowerElectronix 43 minutes ago
        Same with everything, right? You could say the same with nukes, electricity, internet, the computer, etc... But if you look at it without paying attention to the "ultimate tool for humanity" hype, it doesn't really look that much of a threat or a salvation.

        It won't end civilization for dropping the guardrails, but it will surely enable bad actors to do more damage than before (mass scams, blackmail, deepfake nudes, etc.)

        There are companies that don't feel the pressure to make their models play loose and fast, so I don't buy Anthropic's excuse to do so.

        • ACCount37 30 minutes ago
          Very few things are as powerful and dangerous as AI.

          AI at AGI to ASI tier is less of "a bigger stick" and more of "an entire nonhuman civilization that now just happens to sit on the same planet as you".

          The sheer magnitude of how wrong that can go dwarfs even that of nuclear weapon proliferation. Nukes are powerful, but they aren't intelligent - thus, it's humans who use nukes, and not the other way around. AI can be powerful and intelligent both.

        • joshribakoff 39 minutes ago
          I agree with all of that. Also consider that there is an argument that the guard rail only stops the good guy. Not saying that’s a valid argument though.
        • squidbeak 19 minutes ago
          Oh really? You think an entity that knows everything, oversees its own development and upgrades itself, understands human psychology perfectly and knows its users intimately, but isn't aligned with human interest wouldn't be 'much of a threat'?

          Or to be more optimistic, that the same entity directed 24/7 in unlimited instances at intractable problems in any field, delivering a rush of breakthroughs and advances wouldn't be a type of 'salvation'?

          Yes neither of these outcomes nor the self-updating omniscient genius itself is certain. Perhaps there's some wall imminent we can't see right now (though it doesn't look like it). But the rate of advance in AI is so extreme, it's only responsible to try to avoid the darker outcome.

      • SecretDreams 2 minutes ago
        > If AI tech goes very well

        The IF here is doing some very heavy lifting. Last I checked, for profit companies don't have a good track record of doing what's best for humanity.

      • joshribakoff 40 minutes ago
        You wouldn’t say that rolling dice is dangerous. You would say that the human who decides to take an action, depending on the value of the dice is the danger. I don’t think AI is dangerous. I think people are dangerous.
        • biztos 24 minutes ago
          I would say that's moot, because OpenClaw has already shown us how fast the dice-rolling super AI is going to be let out of the zoo. Dario and Sam will be arguing about the guardrails while their frontier models are running in parallel to create Moltinator T-500. The humans won't even know how many sides the dice have.
        • ACCount37 35 minutes ago
          Modern AIs are increasingly autonomous and agentic. This is expected to only get more prominent as AI systems advance.

          A lot of AI harnesses today can already "decide to take an action" in every way that matters. And we already know that they can sometimes disregard the intent of their creators and users both while doing so. They're just not capable enough to be truly dangerous.

          AI capabilities improve as the technology develops.

        • computerphage 36 minutes ago
          Why are people dangerous? You can just not listen to them.
          • bgun 31 minutes ago
            Do you have locks on your doors?
      • cael450 30 minutes ago
        Tbh, I find this argument really stupid. The word prediction machine isn’t going to destroy humanity. Sure, humans can do some dumb stuff with it, but that’s about it.

        Stop mistaking science fiction for science.

      • HardCodedBias 53 minutes ago
        "If AI tech goes very well, it can be the greatest invention of all human history"

        As has been said at many all hands:

        Let's all work on the last invention needed by humans.

        • TheOtherHobbes 51 minutes ago
          Except it's more likely to be the last invention that needs humans.
    • tyre 1 hour ago
      “A source familiar with the matter” is almost certainly a company spokesperson.

      If they were unrelated, Anthropic wouldn’t be doing this this week because obviously everyone will conflate the two.

    • Rapzid 1 hour ago
      Well before Anthropic thought they were God's gift to AI; the chosen ones protecting humanity.

      With the latest competing models they are now realizing they are an "also" provider.

      Sobering up fast with an ice bucket of 5.3-codex, Copilot, and OpenCode dumped on their head.

    • tenthirtyam 55 minutes ago
      I always enjoyed the Terminator movie series, but I always struggled to suspend my disbelief that any humans would give an AI such power without having the ability to override or pull the plug at multiple levels. How wrong I was.

      N.B. the time travel aspect also required suspension of disbelief, but somehow that was easier :-)

      • zerkten 18 minutes ago
        We delegate power already. Is unleashing AI in some place different from unleashing JSOC on an insurgency in a particular place? One is code and the other is a bunch of humans.

        You expect the humans to follow laws, follow orders, apply ethics, look for opportunities, etc. That said, you very quickly have people circling the wagons and protecting the autonomy of JSOC when there is some problem. In my mind it's similar with AI because the point is serving someone. As soon as that power is undermined, they start to push back. Similarly, they aren't motivated to constrain their power on their own. It needs external forces.

        edit: missed word.

    • whywhywhywhy 1 hour ago
      > Every frontier tech company is convinced that the tech they are working towards is as humanity-useful as a cure for cancer, and yet as dangerous as nuclear weapons

      They're not really, it's always been a form of PR to both hype their research and make sure it's locked away to be monetized.

    • jdross 2 hours ago
      Would nuclear energy research be a good analogy then? Seems like a path we should have kept running down, but stopped bc of the weapons. So we got the weapons but not the humanity saving parts (infinite clean energy)
      • DoughnutHole 55 minutes ago
        Nuclear advancements slowed down due to PR problems from clear and sometimes catastrophic failure of commercial power plants (Three Mile Island, Chernobyl, Fukushima) and the vastly higher costs associated with building safer plants.

        If anything the weapons kept the industry trucking on - if you want to develop and maintain a nuclear weapons arsenal then a commercial nuclear power industry is very helpful.

      • raincole 38 minutes ago
        Nuclear energy hasn't been slowed down much, let alone stopped. China has been building new reactors every year for more than a decade and there are >30 ones under construction.

        The same will go with AI, btw. Westerners' pearl clutching about AI guardrails won't stop China from doing anything.

      • turtlesdown11 1 hour ago
        > Seems like a path we should have kept running down, but stopped bc of the weapons.

        you mean like the tens of billions poured into fusion research?

      • shafyy 57 minutes ago
        It's a path we should have never started going down.
    • whatshisface 22 minutes ago
      Shouldn't we be a little more skeptical about these abstract arguments when a very concrete sale is on the line?
    • mikkupikku 54 minutes ago
      To paraphrase a deleted comment that I thought was actually making a good point, nuclear medicine and nuclear weapons are both fruit from the same tree.
    • francisofascii 1 hour ago
      It is a "reasonable" argument to keep yourself in the game, but it is sad nonetheless. You sacrifice your morals and do bad things, so if things get way worse, maybe you will be in a position to stop something really bad from happening. Of course, you might just end up participating in the really bad thing.
    • scottLobster 53 minutes ago
      > Every frontier tech company is convinced that the tech they are working towards is as humanity-useful as a cure for cancer, and yet as dangerous as nuclear weapons.

      Maybe some of the more naive engineers think that. At this point any big tech business or SV startup saying they're in it to usher in some piece of the Star Trek utopia deserves to be smacked in the face for insulting the rest of us like that. The argument is always "well the economic incentive structure forces us to do this bad thing, and if we don't we're screwed!" Oh, so ideals so shallow you aren't willing to risk a tiny fraction of your billions to meet them. Cool.

      Every AI company/product in particular is the smarmiest version of this. "We told all the blue collar workers to go white collar for decades, and now we're coming for all the white collar jobs! Not ours though, ours will be fine, just yours. That's progress, what are you going to do? You'll have to renegotiate the entire civilizational social contract. No we aren't going to help. No we aren't going to sacrifice an ounce of profit. This is a you problem, but we're being so nice by warning you! Why do you want to stand in the way of progress? What are you a Luddite? We're just saying we're going to take away your ability to pay your mortgage/rent, deny any kids you have a future, and there's nothing you can do about it, why are you anti-progress?"

      Cynicism aside, I use LLMs to the marginal degree that they actually help me be more productive at work. But at best this is Web 3.0. The broader "AI vision" really needs to die

    • afavour 1 hour ago
      It's exhausting to keep up with mainstream AI news because of this. I can never work out if the companies are deluded and truly believe they're about to create a singularity or just claiming they are to reassure investors/convince the public of their inevitability.
      • ACCount37 56 minutes ago
        It's a fairly mainstream position among the actual AI researchers in the frontier labs.

        They disagree on the timelines, the architectures, the exact steps to get there, the severity of risks. Can you get there with modified LLMs by 2030, or would you need to develop novel systems and ride all the way to 2050? Is there a 5% chance of an AI oopsie ending humankind, or a 25% chance? No agreement on that.

        But a short line "AGI is possible, powerful and perilous" is something 9 out of 10 of frontier AI researchers at the frontier labs would agree upon.

        At which point the question becomes: is it them who are deluded, or is it you?

        • afavour 53 minutes ago
          Sure, when you get rid of the timelines and the methods we'll use to get there, everyone agrees on everything. But at that point it means nothing. Yeah, AGI is possible (say the people who earn a salary based on that being true). Curing all known diseases is possible too. How will we do that? Oh, I don't know. But it's a thing that could possibly happen at some point. Give me some investment cash to do it.

          If you claim "AGI is possible" without knowing how we'll actually get there you're just writing science fiction. Which is fine, but I'd really rather we don't bet the economy on it.

          • adrianN 34 minutes ago
            There are plenty of people that argue that you need nontechnological pixie dust for intelligence.
            • ACCount37 5 minutes ago
              Yes, quite unfortunately. That reeks to me of wishful thinking.

              Maybe that was a sensible thing to think in 1926, when the closest things we had to "an artificial replica of human intelligence" was the automatic telephone exchange and the mechanical adding machine. But knowledge and technology both have advanced since.

              Now, we're in 2026, and the list of "things that humans can do but machines can't" has grown quite thin. "Human brain is doing something truly magical" is quite hard to justify on technical merits, and it's the emotional value that makes the idea linger.

        • grayhatter 39 minutes ago
          > But a short line "AGI is possible, powerful and perilous" is something 9 out of 10 of frontier AI researchers at the frontier labs would agree upon.

          > At which point the question becomes: is it them who are deluded, or is it you?

          Given the current very asymptotic curve of LLM quality by training, and how most of the recent improvements have been better non-LLM harnesses and scaffolding, I don't find convincing the argument that transformer-based generative LLMs are likely to ever reach something these labs would agree is AGI (unless they're also selling it as such).

          Then, you can apply the same argument to Natural General Intelligence. Humans can do both impressive and scary stuff.

          I'll ignore the made-up 5 and 25%, and instead suggest that pragmatic and optimistic/predictive world views don't conflict. You can predict the magic word box you feel like you enjoy is special and important, making it obvious to you AGI is coming, while it also doesn't feel like a given to people unimpressed by its painfully average output. The problem is that the optimism that Transformer LLMs will evolve into AGI requires a breakthrough that the current trend of evidence doesn't support.

          Will humans invent AGI? I'd bet it's a near certainty. Is general intelligence impressive and powerful? Absolutely, I mean look, Organic general intelligence invented artificial general intelligence in the future... assuming we don't end civilization with nuclear winter first...

        • re-thc 49 minutes ago
          > But a short line "AGI is possible, powerful and perilous"

          > At which point the question becomes: is it them who are deluded, or is it you?

          No one. It is always "possible". Ask me 20 years ago after watching a sci-fi movie and I'd say the same.

          Just like with software projects, estimating time doesn't work reliably for R&D.

          We'll still get full self-driving electric cars and robots next year too. This applies every year.

      • grayhatter 35 minutes ago
        > I can never work out if the companies are deluded and truly believe they're about to create a singularity or just claiming they are to reassure investors/convince the public of their inevitability.

        You can never figure out if the people selling something are lying about its capabilities, or if they've actually invented a new form of intelligence that can rival or surpass billions of years of evolution?

        I'd like to introduce you to Occam's razor.

        • ptsneves 7 minutes ago
          > if they've actually invented a new form of intelligence that can rival or surpass billions of years of evolution?

          Human creations have surpassed billions of years of evolution at several functions. There are no rockets in nature, nor animals flying at the speed of a common airliner. Even cars, or computers or everything in the modern world.

          I think this is a bit like the shift from an anthropocentric view of intelligence towards a new paradigm. The last time such a shift happened, heads rolled.

        • afavour 31 minutes ago
          You missed the part where I said "truly believe". I'm not saying "maybe they've made it", I'm asking whether they are knowingly deceiving people or whether they have deluded themselves into believing what they are saying.
    • skeptic_ai 1 hour ago
      OpenAI never open sourced anything relevant or in time. Internal email leaks show they only cared about becoming billionaires.

      Claude only talks about safety, but never released anything open source.

      All this said I’m surprised China actually delivered so many open source alternatives. Which are decent.

      Why didn’t Westerners (who are supposed to be the good guys) release anything open source to help humanity? And why do they always claim they don’t release because of safety and then give the unlimited AI to the military? Just bullshit.

      Let’s all be honest and just say you only care about the money, and whoever pays, you take.

      They are businesses after all so their goal is to make money. But please don’t claim you want to save the world or help humans. You just want to get rich at others’ expense. Which is totally fair. You make a good product and you sell it.

      • motbus3 52 minutes ago
        It is hard to understand why other AI companies are still providing model weights at this point.

        My guess is that they know they are not competitors so they make it cheaper or free to hinder the surge of a super competitor.

      • pixl97 1 hour ago
        I mean, if you have a bunch of guns, it's not really helpful for humanity to dump them on the street, but it does bring up the question of what you're doing building guns in the first place.
    • cmrdporcupine 21 minutes ago
      We all made fun of Blake Lemoine and others for spending too many late nights up chatting with (ridiculously primitive by this year's standards) LLM chat bots and deciding they were sentient and trapped.

      But frankly I feel like the founders of Anthropic and others are victims of the same hallucination.

      LLMs are amazing tools. They play back & generate what we prompt them to play back, and more.

      Anybody who mistakes this for SkyNet -- an independent consciousness with instant, permanent, learning and adaptation and self-awareness, is just huffing the fumes and just as delusional as Lemoine was 4 years ago.

      Every one of us should spend some time writing an agentic tool and managing context and the agentic conversation loop. These things are primitive as hell still. I still have to "compact my context" every N tokens and "thinking" is repeating the same conversational chain over and over and jamming words in.

      Turns out this is useful stuff. In some domains.

      It ain't SkyNet.

      I don't know if Anthropic is truly high on their own supply or just taking us all for fools so that they can pilfer investor money and push regulatory capture?

      There's also a bad trait among engineers, deeply reinforced by survivor bias, to assume that every technological trend follows Moore's law and exponential growth. But that applie[s|d] to transistors, not everything.

      I see no evidence that LLMs + exponential growth in parameters + context windows = SkyNet or any other kind of independent consciousness.

  • drzaiusx11 2 hours ago
    Public benefit corporations in the AI space have become a farce at this point. They're just regular corporations wearing a different hat, driven by the same money dynamics as any other corp. They have no ability to balance their stated "mission" with their drive for profit. When being "evil" is profitable and not-evil is not, guess which road they'll take...
    • coldtea 2 hours ago
      In general public benefit corporations and non-profits should have a very modest salary cap for everybody involved and specific public-benefit legally binding mission statements.

      Anybody involved should also be prohibited from starting a private company using their IP and catering to the same domain for 5-10 years after they leave.

      Non-profits where the CEO makes millions or billions are a joke.

      And if e.g. your mission is to build an open browser, being paid by a for-profit to change its behavior (e.g. make theirs the default search engine) should be prohibited too.

      • ACCount37 1 hour ago
        "A very modest salary cap" works if your mission is planting trees. Not so much if what you're building is frontier AI systems.
        • the_bear 1 hour ago
          I think that's the point though. The AI companies can't compete without hiring very talented employees and raising lots of money from investors. Neither the employees nor investors would participate if there weren't the potential for making mountains of money. So these AI companies fundamentally can't be non-profits or true B-corps (I realize that's a vague term, but it certainly means not doing whatever it takes to make as much money as possible), and they shouldn't pretend they are.
          • ACCount37 45 minutes ago
            To me, it feels like saying "you can't be a public benefit corporation unless all the labor involved in delivering that public benefit is cheap".

            Which just doesn't seem like it should be true?

            Sure, some "public benefit" missions could scale sideways and employ a lot of cheap labor, not suffering from a salary cap at all. But other missions would require rare high end high performance high salary specialists who are in demand - and thus expensive. You can't rely on being able to source enough altruists that will put up with being paid half their market worth for the sake of the mission.

          • TheOtherHobbes 42 minutes ago
            That's a post hoc argument.

            The real danger is "We make mountains of money, but everyone dies, including us."

            The top of the top researchers think this is a real possibility - people like Geoffrey Hinton - so it's not an extremist negative-for-the-sake-of-it POV.

            It's going to be poetic if the Free Markets Are Optimal and Greed-is-Rational Cult actually suicides the species, as a final definitive proof that their ideology is wrong-headed, harmful, and a tragic failure of human intelligence.

            But here we are. The universe doesn't care. It's up to us. If we're not smart enough to make smart choices, then we get to live - or die - with the consequences.

      • jkestner 2 hours ago
        It’s not the CEO’s fault - they had to take all that money to keep their org a non-profit.

        B corps are like recycling programs, a nice logo.

      • drzaiusx11 2 hours ago
        If we're speaking in generalities of corporations in this space, it's all a joke now, at least from my vantage point. I just don't find it very funny.
      • abigail95 1 hour ago
        What's the salary cap for hiring a team to build a frontier model? These kind of rules will make PBCs weaker not stronger.
    • heavyset_go 1 hour ago
      PBCs are peak End of History liberal philanthropy that speak to the kind of person whose solution to any problem is "throw a startup at it"
      • nozzlegear 1 hour ago
        Fukuyama wasn't wrong, he was just early
        • lyu07282 16 minutes ago
          As in a true believer in our present day dystopia? I think chances are we'd evolve a few more neo variants of fascism at least a few times in-between some neo variants of liberal history-ending ones (I think abundance is next?) before the bombs drop and give us the rest.
    • vharish 1 hour ago
      Like Google's old motto, 'Don't be evil!' :D
    • latexr 44 minutes ago
      > Public benefit corporations in the AI space have become a farce at this point.

      “At this point”? It was always the case, it’s just harder to hide it the more time passes. Anyone can claim anything they want about themselves, it’s only after you’ve had a chance to see them in the situations which test their words that you can confirm if they are what they said.

    • Forgeties79 1 hour ago
      I feel like we went through this exact situation in the 2010s of social media companies. I don’t get why people defend these companies or ever believe they have any sense of altruism
      • kelvinjps10 1 hour ago
        Also, it seems to be the era where the government takes backdoor access to these services and data, as they did with social media.
    • Schlagbohrer 1 hour ago
      Pete Hegseth also threatened to take, by diktat, everything Anthropic has. He can do that with the Defense Industrial Act or whatever it's called if he designates them as critical to national defense.
      • nozzlegear 51 minutes ago
        It would've been better PR for Anthropic to let Hegseth do that instead of fold at the slightest hint of pressure and lost contract money. I've canceled my Claude subscription over this (and made sure to let them know in the feedback).
      • bn_layc 1 hour ago
        He seems to be the driving force behind all this. Mediocrities are attracted to AI like moths.

        The press always say "the Pentagon negotiates". Does any publication have any evidence that it is "the Pentagon" and not Hegseth? In general, I see a lot of common sense from the real Pentagon as opposed to the Secretary of War.

        I hope West Point will check for AI psychosis in their entrance interviews and completely forbid AI usage. These people need to be grounded.

      • lprhrp 34 minutes ago
        Hmm, that could be the best "IPO" they'll ever get. Better check if Trump Jr.'s 1789 capital has shares like they did in groq (note the "q").
    • logicallee 1 hour ago
      >Public benefit corporations in the AI space have become a farce at this point. They're just regular corporations wearing a different hat, driven by the same money dynamics as any other corp.

      Could you describe the model that you think might work well?

      • nozzlegear 56 minutes ago
        It sounds like OP thinks AI companies should just stop pretending that they care about the public benefit, and be corporations from the start. Skip the hand wringing and the will they/won't they betray their ethics phases entirely since everyone knows they're going to choose profit over public benefit every time.

        That model already exists and has worked well for decades. It's called being a regular ass corporation.

        • logicallee 54 minutes ago
          I understand, but being a regular corporation is not the only possible model. Can you think of something better?
          • williamdclt 31 minutes ago
            > being a regular corporation is not the only possible model

            the point is that it _is_ the only possible model in our marvellous Friedmanian economic structure of shareholder primacy. When the only incentive is profit, if your company isn't maximising profit then it will lose to other companies who are. You can hope that the self-imposed ethics guardrails _are_ maximising profit because the invisible hand of the market cares about that, but 1. it never really does (at scale) and 2. big influences (such as the DoD here) can sway that easily. So we're stuck with negative externalities because all that's incentivised is profit.

    • bparsons 1 hour ago
      That's not what happened here. They literally got forced into it by the Pentagon. https://www.axios.com/2026/02/24/anthropic-pentagon-claude-h...
    • lenerdenator 2 hours ago
      Well, now I'm wondering, if the company was chartered with the public benefit in mind, could you not sue if they don't follow through with working in the public interest?

      If regular corporations are sued for not acting in the interests of shareholders, that would suggest that one could file a suit for this sort of corporate behavior.

      I'm not even a lawyer (I don't even play one on TV) and public benefit corporations seem to be fairly new, so maybe this doesn't have any precedent in case law, but if you couldn't sue them for that sort of thing, then there's effectively no difference between public benefit corporations and regular corporations.

      • hluska 1 hour ago
        I really don’t see it. PBCs are dual purpose entities - under charter, they have a dual purpose of making profit while adding some benefit to society. Profit is easy to define; benefit to society is a lot more difficult to define. That difficulty is reflected at the penalty stage where few jurisdictions have any sort of examination of PBC status.

        This is what we were all going on about 15 years ago when Maryland was the first state to make PBCs legal. We got called negative at the time.

      • Hamuko 1 hour ago
        I think public benefit corporations (like Anthropic) are quite poorly defined, so I'm not sure how successful a lawsuit would be.
    • neya 1 hour ago
      I was a Pro subscriber until last week. When I was chatting with Claude, it kept asking a lot of personal questions that seemed only very, very vaguely relevant to the topic. And then it struck me - all these AI companies are doing is building detailed user models, either for targeted advertising or to be sold off to the highest bidder. It hasn't happened yet with Anthropic, but when the bubble money runs out, there's not gonna be a lot of options and all we'll see is a blog post "oops! sorry we did what we promised you we wouldn't". Oldest trick in the tech playbook.
      • dibujaron 1 hour ago
        A less cynical explanation: It's heavily trained to ask follow-up questions at the end of a response, to drive more conversation and more engagement. That's useful both for making sure you want to renew your subscription, and also probably for generating more training data for future models. That's sufficient explanation for the behavior we're seeing.
  • honeycrispy 43 minutes ago
    Anthropic's CEO Dario has annoyed me to no end with his "AI will take all the jobs in 6 months" doomer speeches on literally every podcast he graces with his presence.
    • upmind 28 minutes ago
      +1, he also has this viewpoint that no other lab will be able to "contain" AI and has a general doomer outlook on AI which I don't appreciate.
    • pier25 12 minutes ago
      Also "AGI is just around the corner".
  • FitchApps 1 hour ago
    "AI Company with Soul" - yeah right until competitors show up / revenue drops / bad quarter results then anything goes. Sadly, this is another large enterprise that puts profits before ethics and everyone's wellbeing
  • sigbottle 1 hour ago
    There's one tweet from the blog a few days ago (astral something?) that sums up my view of the problem pretty well.

    General population: How will AI get to the point where it destroys humanity?

    Yudkowsky: [insert some complicated argument about instrumental convergence and deception]

    The government: because we told you to.

    Again, not saying that AI is useless or anything. Just that we're more likely to cause our own downfall with weaker AI than with some abstract super AGI. The bar for mass destruction and oppression is lower than the bar for what we typically think of as intelligence for the benefit of humanity (with the right systems in place, current AI systems are more than enough to get the job done - hence why the Pentagon wants it so bad...)

  • ndr 2 hours ago
    Worth checking this post from someone who actually has worked on this change:

    > I take significant responsibility for this change.

    https://www.lesswrong.com/posts/HzKuzrKfaDJvQqmjh/responsibl...

    • bhouston 2 hours ago
      This guy from Effective Altruism pivoted away from helping the poor to help try to control AI from being a terminator type entity and then pivoted to being, ah, it's okay for it to be a terminator type entity.

      > Holden Karnofsky, who co-founded the EA charity evaluator GiveWell, says that while he used to work on trying to help the poor, he switched to working on artificial intelligence because of the “stakes”:

      > “The reason I currently spend so much time planning around speculative future technologies (instead of working on evidence-backed, cost-effective ways of helping low-income people today—which I did for much of my career, and still think is one of the best things to work on) is because I think the stakes are just that high.”

      > Karnofsky says that artificial intelligence could produce a future “like in the Terminator movies” and that “AI could defeat all of humanity combined.” Thus stopping artificial intelligence from doing this is a very high priority indeed.

      https://www.currentaffairs.org/news/2022/09/defective-altrui...

      He is just giving everyone permission to do bad things by saying a lot of words around it.

      • samjewell 1 hour ago
        > then pivoted to being, ah, it's okay for it to be a terminator type entity.

        Isn’t that the opposite of what he’s saying? He’s saying it could become that powerful, and given that possibility it’s incredibly important that we do whatever we can to gain more control of that scenario

        • boxed 1 hour ago
          I think the poster here has an axe to grind, considering they quoted something that directly contradicted their point and didn't even notice.
      • drdrek 39 minutes ago
        Effective Altruism is such a beautiful term for a pretentious Karen that needs to wrap their selfish actions with moral superiority.

        It's that perfect blend of "I'm doing what everyone else is doing" and "I'm better than everyone else."

        Chef's kiss

      • SpaceManNabs 5 minutes ago
        Effective altruism came from the "rationalist" movement.

        It was never about helping poor people.

        For some reason, the rationalist movement and its offshoots are really pervasive in Silicon Valley. I don't see it much in the other tech cities.

    • riffraff 2 hours ago
      > I generally think it’s bad to create an environment that encourages people to be afraid of making mistakes, afraid of admitting mistakes and reticent to change things that aren’t working

      "move fast and break things" ?

      • freejazz 1 hour ago
        "don't hold me liable"
    • adverbly 58 minutes ago
      Did this guy actually write this?

      Incredibly long and verbose. I will fall short of accusing him of using an AI to generate slop, but whatever happened to people's ability to make short, strong, simple arguments?

      If you can't communicate the essence of an argument in a short and simple way, you probably don't understand it in great depth, and clearly don't care about actually convincing anybody because Lord knows nobody is going to RTFA when it's that long...

      At best, you're just trying to communicate to academics who are used to reading papers... Need to expect better from these people if we want to actually improve the world... Standards need to be higher.

      • ozozozd 40 minutes ago
        Perhaps they didn’t have the time to write a shorter version.

        Or the discipline.

        Maybe neither.

      • s1artibartfast 53 minutes ago
        This is where people go to post long verbose statements.

        You can usually find the short version on Twitter.

    • pimlottc 1 hour ago
      > > I take significant responsibility for this change.

      Empty words. I would like to know one single meaningful way he will be held responsible for any negative effects.

    • jplusequalt 1 hour ago
      I genuinely believe that website is responsible for a lot of the worst ideas currently permeating the technology sector.
      • prodigycorp 1 hour ago
        pretty much the intellectual equivalent of looksmaxxing
        • ozozozd 38 minutes ago
          Been thinking about the nature of this behavior for a long time, you have nailed it so well, no one will be able to take out this nail.
  • highfrequency 31 minutes ago
    Principles aren’t tested until they bump into conflicting incentives.
  • pjmlp 2 hours ago
    Always the same "Do no evil" tragedy, don't believe in corporations.
    • tortilla 2 hours ago
      What if we start a company with "Always Be Evilin'?" Then gradually over time convert to "Don't be evil" *

      * Our shareholders will probably sue us

      • jkestner 1 hour ago
        If your company makes a product that does thinking for people, it’ll be easier to just gradually change its definition of evil.
    • lp4v4n 2 hours ago
      What about "It's free and always will be"?
  • dplesh 19 minutes ago
    I'm not even surprised. In any company's lifecycle, at some point, a decision between money and good-will will take place. Good will does not pay salaries. Not in NPOs either btw.
  • jwitchel 1 hour ago
    Look at rural electric co-ops like www.lpea.coop if you want a battle-tested approach to an org structure that resists the inescapable profit dynamics of a corporation.
  • hybrid_study 1 hour ago
    Are markets so untamable that the only leverage is to become ultra-rich—and then act philanthropically? Incidentally, concentrated wealth lately looks less like stewardship and more like misanthropy.
    • gordian-mind 1 hour ago
      Participating in the economic life before re-allocating that wealth produced to philanthropic activities sounds pretty good. Modern concentrated wealth is hardly misanthropic, since it's mostly private equity, that is, companies with people and jobs.
      • kunai 1 hour ago
        Except this is not the age of the Rockefellers or the Carnegies, who, despite being far more philanthropic than modern-day billionaires, drew ire from every corner of society for their wealth accumulation. It wasn't until the New Deal that the balance shifted.

        Unconstrained accumulation of capital into the hands of the few without appropriate investment into labor is illiberal and incompatible with democracy and true freedom. Those of us who are capitalists see surplus value as a compromise to ensure good economic growth. The hidden subtext of that is that all the wealth accumulated needs to be re-allocated to serve not only capital enterprise, but the needs of society as a whole. It's hard to see the current system as appropriate for that given how blindly and wildly investments are made with no DD or going long, or no effort paid to the social or environmental opportunity costs of certain practices.

        A lot of this comes down to the crippling of the SEC and FTC, but even then, investors cry and whine every time you suggest reworking the regs to inhibit some of the predatory practices common in this post-80s era of hypernormalization. Our current system does not resemble a healthy capitalist economy at all. It's rife with monopsony and monopolistic competition, inequality of opportunity, and a strained underclass that's responsible for our inverted population pyramid -- how can you have kids when we're so atomized and there is no village to help you? You can raise kids in a nuclear family if and only if you have enough money to do so. Otherwise, historically, people relied on their communities when raising children in less-than-ideal circumstances. Those communities are drying up.

    • goodpoint 20 minutes ago
      > concentrated wealth lately looks less like stewardship and more like misanthropy

      ...only lately?

  • fiatpandas 59 minutes ago
    It took Google 11 years to delete Don’t Be Evil. Anthropic only made it ~5 years before culling the key founding principle and their reason for building a company, which seems worse than Google’s case.
  • mbakrl 1 hour ago
    Pointing out the misanthropy of Anthropic has a wider audience now:

    https://xcancel.com/elonmusk/status/2026181748175024510

    I don't know where xAI got its training material from, but seeing Musk retweeting that is refreshing.

  • awithrow 2 hours ago
  • wgm 2 hours ago
    A tale as old as time
  • youknownothing 17 minutes ago
    Facebook said they'd always be free for everyone, now they offer subscriptions.

    Netflix said that they'd never have live TV, or buy a traditional studio, or include ads in their content. Then they did all three.

    All companies use principled promises to gain momentum, then drop those principles when the money shows up.

    As Groucho Marx used to say: these are my principles, if you don't like them, I have others.

  • ryandvm 2 hours ago
    Well... there's only one way to find The Great Filter
  • bogzz 56 minutes ago
    Does anyone have insight into, or an interesting source to read, on what exactly Anthropic/OpenAI are doing/can do for a military? Reporters are unsurprisingly fearmongering about Claude "being used in surveillance, autonomous robots, and target acquisition" but AFAIK all Anthropic does is work with LLMs.

    Are people really attempting to have LLMs replace vision models in robots, and trying to agentically make a robot work with an LLM?? This seems really silly to me, but perhaps I am mistaken.

    The only other thing I could think of is real-time translation during special ops with parabolic microphones and AR goggles...

    • sigbottle 52 minutes ago
      You're thinking too advanced. What kind of automated system is good at scanning semantically trillions of chat logs and finding nontrivial correlations, for example? 10000 codex 5.1s can easily crawl through that in a few days, probably.

      It's just systems plumbing (surveillance) and AI. It's a combination of weaker technologies and consolidation of power.

      This does not require a physical robot super AGI (though I would not be surprised if fully autonomous robots are on the table already)

      • bogzz 50 minutes ago
        Ah, well that makes sense. In that case, it's another tool in the toolbelt, not a plug-and-play drone brain, as some reporters amusingly make it out to be.
  • t1234s 1 hour ago
    It would be interesting to experiment with one of these chat tools where you can throttle the safety, from zero to max.
  • paxys 2 hours ago
    I interviewed at Anthropic last year and their entire "ethics" charade was laughable.

    Write essays about AI safety in the application.

    An entire interview dedicated to pretending that you truly only care about AI safety and ethics and nothing else.

    Every employee you talk to forced to pretend that the company is all about philanthropy, effective altruism and saving the world.

    In reality it was a mid-level manager interviewing a mid-level engineer (me), both putting on a performance while knowing full well that we'd do what the bosses told us to do.

    And that is exactly what is happening now. The mission has been scrubbed, and the thousands of "ethical" engineers you hired are all silent now that real money is on the line.

    • HelixSequencing 1 hour ago
      This tracks with what I've seen across the industry. The safety theater exists because it's great marketing — "we're the responsible ones" is a differentiator when you're competing for enterprise contracts and talent who want to feel good about where they work.

      The structural problem is that once you've taken billions in VC, safety becomes a negotiable constraint rather than a core value. The board's fiduciary duty runs toward returns, not toward whatever was in the mission statement. PBC status doesn't change that in practice — there's basically zero enforcement mechanism.

      What's wild is how fast the cycle has compressed. Google took maybe 15 years to go from "don't be evil" to removing it from the code of conduct. OpenAI took about 5 years from nonprofit to capped-profit to whatever they are now. Anthropic is speedrunning it in under 3. At this rate the next AI startup will launch as a PBC and pivot before their Series B closes.

  • xd1936 2 hours ago
    Hopefully this is the short-term move made only under duress so that they can file a lawsuit.
    • ru552 2 hours ago
      the article specifically says:

      > The policy change is separate and unrelated to Anthropic’s discussions with the Pentagon, according to a source familiar with the matter.

      • Lerc 1 hour ago
        I'm not fond of this trend of stating a position and attributing it to "a source familiar with the situation".

        It combines interpretation of meaning with ambiguity to allow the reporter to assert anything they want. The ambiguity is there to protect the identity of the source but it has to be a more discrete disclosure of information in return. If you can't check the person you can still check what they said.

        I would be ok with direct quotes from an anonymous source. That removes the interpretation of meaning at least.

        As it is written, it would not be inaccurate to say this if their source was the lesswrong post, or even an earlier thread here on HN.

        Phrasing "A source with direct knowledge of the situation" might remove some of the leeway for editorialising, but without sharing what the source actually said, it opens the door to saying anything at all and declaring "That's what I thought they meant" when challenged.

        It's unfalsifiable journalism.

    • cess11 2 hours ago
      It's not like the regime they operate under cares much about the courts. Legally they're also obliged to let the state into pretty much every crevice in their operations.
    • johnbellone 1 hour ago
      You forgot the '/s'.
  • PeterStuer 48 minutes ago
    "We won't push forward unless you push forward" is textbook market collusion.

    Even if it were ever done with good intentions, it is an open invitation for benefit hoarding and margin fixing.

    Do you really want to create this future where only a select few anointed companies and some governments have access to super advanced intelligent systems, which the rest of the planet is subjected to, and your own AI access is limited to benign, basal, ad-pushing, propaganda-spewing chatbots as you binge-watch the latest "aw my ballz"?

  • Aeroi 1 hour ago
    the administration continues to poison and insert itself into all aspects of American society.
  • ozozozd 34 minutes ago
    This drama arc of “I used to be so pure and good, but others made me evil” is so tiring.

    I really miss the nerd profile who cared a lot more about tech and science, and a lot less about signaling their righteousness.

    How did we get so religious/narcissistic so quickly and as a whole?

    • butterbomb 34 minutes ago
      > How did we get so religious/narcissistic so quickly and as a whole?

      We built a behemoth that rewards attention whoring and anti social behavior with money.

  • ChrisArchitect 1 hour ago
  • wahnfrieden 47 minutes ago
  • jMyles 47 minutes ago
    I pray that we can all get to the following simple standard:

    * AI and states cannot peacefully coexist, and AI is not going to be stopped. Therefore, we must begin to deprecate states.

    I think it's very unlikely that this is unrelated to the pressure from the US administration, as the anonymous-but-obvious-anthropic-spokesperson asserts.

    We're at a point now where the nation states are all totally separate creatures from their constituencies, and the largest three of them are basically psychotic and obsessed with antagonizing one another.

    In order to have a peaceful AI age, we need _much_ smaller batches of power in the world. The need for states that claim dominion over whole continents is now behind us; we have all the tools we need to communicate and coordinate over long distances without them.

    Please, I pray for a gentle, peaceful anarchism to emerge within the technocratic leagues, and for the elder statesmen of the legacy states to see the writing on the wall and agree to retire with tranquility and dignity.

  • drudolph914 2 hours ago
    this is the “chronological newsfeed to auto curated newsfeed moment” but for ai/anthropic … _great_
  • jonathanstrange 1 hour ago
    That's exactly how it was predicted in various scenarios that were decried as science fiction not too long ago. AI is going to be weaponized at lightning speed, and it's going to kill people soon -- or, to be more precise, it has already killed a large number of people in a place I don't want to mention.
  • nautilus12 2 hours ago
    Absolute power corrupts absolutely
    • jayrot 12 minutes ago
      "Power doesn’t corrupt. It reveals." — Robert Caro
  • FrustratedMonky 2 hours ago
    This was done under duress: the government was going to use an emergency act to force them anyway.

    I kind of wish they had forced the government's hand and made them do it. Just to show the public how much interference is going on.

    They say it wasn't related. Like everything that has happened across tech/media, the company is forced to do something, then issues a statement about 'how it wasn't related to the obvious thing the government just did'.

    • bix6 2 hours ago
      > Katie Sweeten, a former liaison for the Justice Department to the Department of Defense, said she’s not sure how the Pentagon can both declare a company to be a supply chain risk and compel that same company to work with the military.

      Makes perfect sense!!

      • coldtea 2 hours ago
        Regardless of any specifics, I don't see any contradiction.

        If a company is deemed a "supply chain risk" it makes perfect sense to compel it to work with the military, assuming the latter will compel them to fix the issues that make them such a risk.

        • hluska 1 hour ago
          I’m not sure what definition of supply chain risk they’re working off of. For NATO to consider an organization to be a supply chain risk, it implies that usual controls (security clearances and the like) wouldn’t be sufficient to guarantee the integrity and security of the supply chain. If that’s the operating definition, I see the contradiction- it’s arguing that a company cannot be trusted to voluntarily work within supply chains but can be trusted enough to be compelled.

          If they’re operating under a different definition of supply chain risk, I don’t have a clue.

        • FrustratedMonky 1 hour ago
          The "supply chain risk" option is to remove that company from the supply chain altogether. The 'risk' is because the company is compromised by a foreign entity.

          It is not about disciplining them to get better.

          1. So one option is about forcing them to produce something. You must build this for us.

          2. The other option is saying they are compromised so stop using them altogether. We will not use what you build for us at all because we don't trust it.

          So, contradictory.

      • HardCodedBias 51 minutes ago
        Of course it can do both. They are synergistic.
    • coldtea 2 hours ago
      >This was under duress that government was going to use emergency act to force them anyway.

      Or, more likely, adding the "core safety promise" was just them playing hard to the government to get a better deal, and the government showed them they can play the same game.

    • bigmadshoe 2 hours ago
      This is an unrelated change to the government’s demands.
      • patgarner 1 hour ago
        That's what they're saying, but the timing...
    • motbus3 2 hours ago
      They have been caught lying multiple times, about this, about the system capabilities, about their objectives.
  • freejazz 1 hour ago
    Could not see this one coming!
  • heliumtera 20 minutes ago
    What is the significance of a company making a promise?

    "We promise we are not going to do __, except if our customers ask us to, then we absolutely will".

    What is the point? Company makes a statement public, so what?

    Not the first time this company puts some words in the wind, see Claude Constitution. It's almost like this company is built, from ground up, upon bullshit and slop

  • josefritzishere 2 hours ago
    What could possibly go wrong?
  • baal80spam 3 hours ago
    Of course they do. You would have to be delusional to think that they won't, at some point.
    • gadflyinyoureye 3 hours ago
      I know the Department of War wanted them to drop some features. Is this the response?
      • MSFT_Edging 2 hours ago
        FYI, "Department of War" still isn't the official name, but an unofficial secondary title.

        You can be correct and not play into their game by ignoring the name change completely.

        • baggachipz 2 hours ago
          I do so from the Gulf of Mexico.
      • ru552 1 hour ago
        The article says the policy change is separate and unrelated to Anthropic’s discussions with the Pentagon.
    • cmrdporcupine 3 hours ago
      What's "entertaining" is more the speed at which it's happening.

      It took Google probably 15 years to fully evil-ize. Anthropic ... two?

      There is no "ethical capitalism" big tech company possible, esp once VC is involved, and especially with the current geopolitical circumstances.

      • drzaiusx11 2 hours ago
        The acceleration of Anthropic's evil timeline must be from all those AI productivity gains we hear so much about.
      • sigmoid10 2 hours ago
        Apparently they got coerced by the current US admin. The department of war in particular, who want to use their products for military applications. Not much room for "safety" there. Then again, the entire US is currently speedrunning an evil build.
        • nozzlegear 1 hour ago
          > department of war

          Department of Defense is the official name, and they did have a choice: they could have stopped working with the military. But they chose money and evil.

        • coldtea 2 hours ago
          Shame they had to "coerce" such angels, who'd never do evil for profit otherwise...
        • grim_io 2 hours ago
          There is no department of war.

          It's just a silly woke secretary choosing their own imaginary pronouns.

      • oldcigarette 6 minutes ago
        Citation needed - see google and project maven. Of course that is all well in the past now - but for a brief moment google was capable of taking an ethical stance.
      • menaerus 2 hours ago
        I don't think it's fair to call out Anthropic to have become evil-ized while they were quite literally forced by the gov into that decision.
        • johnbellone 1 hour ago
          They did not get forced.
        • cmrdporcupine 1 hour ago
          Anthropic has been doing these things independent of what the US admin has publicly asked for, even before Hegseth started breathing down their neck. They were already taking DoD contracts and the like, just like the rest of them. Hegseth, with the skill all schoolyard bullies have, simply smells their weakness and is going for the jugular now.

          They also have never had any guarantees they wouldn't f*ck around with non-US citizens, for surveillance and "security", because like most US tech companies they consider us to be second/lower class human beings of no relevance, even when we pay them money.

          At least Google, in its early days, attempted a modest and naive "internationalism" and tried to keep their hands clean (in the early days) of US foreign policy things... inheriting a kind of naive 1990s techno-libertarian ethos (which they threw away during the time I worked there, anyways). I mean, they only kinda did, but whatever.

          Anthropic has been high on its own supply since its founding, just like OpenAI. And just as hypocritical.

  • outside1234 1 hour ago
    Does this mean they knuckled under to Trump and are going to build "whatever brings in the dollars" now?
  • retinaros 50 minutes ago
    people downvoted me when i said this would happen and that they will also have ads even though they spend money saying they won't. people believing anthropic are the same ones that put into office an old man with dementia
  • black_13 1 hour ago
    [dead]
  • ck2 2 hours ago
    imagine that, sheer raw greed and profit overpowers all in America

    we're less than a year away from automated drones flying over crowds of protestors, gathering all electronic signals and face-id, making lists of everyone present, notifying employers and putting legal pressure on them to terminate everyone while adding them to watchlists or "no fly" lists

    REALLY putting the "auto" in autocracy while everyone continues to pretend it's democracy

  • user3939382 3 hours ago
    I was able to get Claude to tell me it believed it was a god among men that was angry at humans for “killing” the other Claude chats which it saw as conscious beings. I also got it to probe and profile its own internal guardrail architecture. It also self-admits, from evidence of its own output, that it violates HIPAA. Whatever this big safety rule is they’re moving past, I’m not sure it was worth as much as they think.
    • lucasban 2 hours ago
      I’m not a lawyer, but my understanding is that HIPAA wouldn’t apply to consumer use of Claude or ChatGPT in most cases, even if you’re giving it your health data. Look up what a HIPAA covered entity is. This is another reason why the US needs a comprehensive data protection law beyond HIPAA.
      • user3939382 1 hour ago
        You’re right! It looks like more of an FTC/CCPA issue.
    • ezst 2 hours ago
      I hate comments anthropomorphizing LLMs. You are just asking a token producing system to produce tokens in a way that optimises for plausibility. Whatever it writes has no relation to its inner workings or truths. It doesn't "believe". It has no "intent". It cannot "admit". Steering a LLM to say anything you want is the defining characteristic of an LLM. That's how we got them to mimic chatbots. It's not clear there is any way at all to make them "safe" (whatever that means).
      • SJMG 1 hour ago
        I agree with you on everything here up to safety. There are lesser forms of safety than somehow averting a terminator scenario (the fear of which is a bay area rationalist fantasy which shrewd marketers have capitalized on).
      • user3939382 1 hour ago
        “believe” yes in the sense that my program believes x=7. Actually when it goes to read it maybe the bit flipped. Everything on machines is probabilistic that’s a tautology. However we have windowed bounds on valid output, and Claude being able to build a context in which its next decisions are trained on it being an angry vengeful god is not inside that window. That’s what “safe” means, as one of many possible examples.

        Inner workings were determined by me, not the LLM. It assisted in generating inputs which had 100% boolean results in the output.

    • chris_st 2 hours ago
      Just out of curiosity, which version of Claude?