I've been using ChatGPT fairly regularly for about a year. Mostly as an editor/brainstorming-partner/copy-reviewer.
Lots of things have changed in that year, but the things that haven't are:
* So, so many em-dashes. All over the place. (I've tried various ways to get it to stop. None of them have worked long term).
* Random emojis.
* Affirmations at the start of messages. ("That's a great idea!") With a brief pause when 5 launched. But it's back and worse than ever now.
* Weird adjectives it gets stuck on like "deep experience".
* Randomly bolded words.
Honestly, it's kind of helpful because it makes it really easy to recognize content that people have copied and pasted out of ChatGPT. But apart from that, it's wild to me that a $500bn company hasn't managed to fix those persistent challenges over the course of a year.
You can customize it to get rid of all that. I set it to the "Robot" personality and a custom instruction to "No fluff and politeness. Be short and get straight to the point. Don't overuse bold font for emphasis."
Obviously nothing solid to back this up, but I kind of feel like I was seeing emojis all over github READMEs on JS projects for quite a while before AI picked it up. I feel like it may have been something that bled over from Twitch streaming communities.
Agree, this stuff was trending up very fast before AI.
Could be my own changing perspective, but what I think is interesting is how the signal it sends keeps changing. At first, emoji-heavy was actually kind of positive: maybe the project doesn't need a webpage, but you took some time and interest in your README.md. Then it was negative: having emojis became a strong indicator that the whole README was going to be very low information density, more emotive than referential[1] (which is fine for bloggery but not for technical writing).
Now there's no signal, but you also can't say it's exactly neutral. Emojis in docs will alienate some readers, maybe due to association with commercial stuff and marketing where it's pretty normalized. But skipping emojis alienates other readers, who might be smart and serious, but nevertheless are the type that would prefer WATCHME.youtube instead of README.md. There's probably something about all this that's related to "costly signaling"[2].
[1] https://en.wikipedia.org/wiki/Jakobson%27s_functions_of_lang... [2] https://en.wikipedia.org/wiki/Costly_signaling_theory_in_evo...
There’s a pattern to emoji use in docs, especially when combined with one or more other common LLM-generated documentation patterns, that makes it plainly obvious that you’re about to read slop.
Even when I create the first draft of a project’s README with an LLM, part of the final pass is removing those slop-associated patterns to clarify to the reader that they’re not reading unfiltered LLM output.
It drives me crazy. It happens with Claude models too. I even created an instruction to avoid them in a CLAUDE.md, and the miserable thing from time to time still does it.
Or... How can you detect the usage of Claude models in a writeup? Look for the word comprehensive, especially if it's used multiple times throughout the article.
I notice this less with GPT-5 and GPT-5-Codex but it has a new problem: it'll write a sentence that mostly makes sense but has one or two strange word choices that nobody would use in that situation. It tends to use a lot of very dense jargon that makes it hard to read, spitting out references to various algorithms and concepts in places that don't actually make sense for them to be. Also yesterday Codex refused a task from me because it would be too much work, which I thought was pretty ridiculous - it wasn't actually that much work, a couple hundred lines max.
> refused a task from me because it would be too much work
Was this after many iterations? Try letting it get some "sleep". Hear me out...
I haven't used Codex, so maybe not relevant, but with Claude I always notice a slow degradation in quality, refusals, and "<implementation here>" placeholders with iterations within the same context window. One time, after making a mistake, it apologized and said something like "that's what I get for writing code at 2am". Statistically, this makes sense: long conversations between developers would go into the night; they get tired, and their code gets sparser and crappier.
So, I told it "Ok, let's get some sleep and do this tomorrow.", then the very next message (since the LLM has no concept of time), "Good morning! Let's do this!" and bam, it output a completely functional, giant block of code.
I don't think this is true. The LLMs use this construction noticeably more frequently than normal people, and I too feel the annoyance when they do, but if you look around I think you'll find it's pretty common in many registers of natural human English.
Yes, this is absolutely part of it, and I think an underappreciated harm of LLMs is the homogeneity. Even to the extent that their writing style is adequate, it is homogeneous in a way that quickly becomes grating when you encounter LLM-generated text several times a day. That said, I think it's fair to judge LLM writing style not to be adequate for most purposes, partly because a decent human writer does a better job of consciously keeping their prose interesting by varying their wording and so forth.
Not sure what the downvotes are for -- it's trivial to find examples of this construction from before 2023, or even decades ago. I'm not disagreeing that LLMs overuse this construction (tbh it was already something of a "writing smell" for me before LLMs started doing it, because it's often a sign of a weakly motivated argument).
Absolutely this. I feel like I'm having an immune response to my own language. These patterns irk me in a weird way. Lack of variance is jarring perhaps? Everyone sounding more robotic than usual? Mode-collapse of normal language.
> Affirmations at the start of messages. ("That's a great idea!") With a brief pause when 5 launched. But it's back and worse than ever now.
What a great point! I also can’t stand it. I get it’s basically a meme to point it out - even South Park has mocked it - but I just cannot stand it.
In all seriousness it’s so annoying. It is a tool, not my friend, and considering we are already coming from a place of skepticism with many of the responses, buttering me up does not do anything but make me even more skeptical and trust it less. I don’t want to be told how smart I am or how much a machine “empathizes” with my problem. I want it to give me a solution that I can easily verify, that’s it.
Stop wasting my tokens and time with fake friendship!
Drives me nuts too. All the stuff like "OK, let me do..." or "I agree...". Stop talking like a person.
I want the star trek experience. The computer just says "working" and then gives you the answer without any chit-chat. And it doesn't refer to itself as if it's a person.
What we have now is HAL 9000 before it went insane.
This comes across as an unnecessary oversimplification in service of handwaving away a valid concern about AI and its already-observed, expanding impact on our society. At the very least you should explain what you mean exactly.
Alcoholism can also be a symptom of a larger issue. Should we not at least discuss alcohol’s effects and what access looks like when deciding the solution?
> Stop wasting my tokens and time with fake friendship!
They could hide it so that it doesn't annoy you, but I think it's not a waste of tokens. It's there so the tokens that follow are more likely to align with what you asked for. It's harder for it to then say "This is a lot of work, we'll just do a placeholder for now" or give otherwise "lazy" responses, or to continue saying a wrong thing that you've corrected it about.
I bet it also probably makes it more likely to gaslight you when you're asking something it's just not capable of, though.
Also pretty sure it is a feature because the general population wants to have pleasant interactions with their ChatGPT and OpenAI's user feedback research will have told them this helps.
I know some non-developer-type people who mostly talk to ChatGPT about stuff like
- how to cope with the sadness of losing their cat
- ranting about the annoying habits of their friends
- finding all the nice places to eat in a city
etc.
They do not want that "robot" personality and they are the majority.
I also recall reading a while back that it's also a dopamine trigger. If you make people feel better using your app, they keep coming back for another fix. At least until they realize the hollow nature of the affirmations and start getting negative feelings about it. Such a fine line.
There will be an intersection where continued refinements in masking the telltale signs of AI meet ever more powerful models, and it becomes very time-consuming, expensive, and difficult to tell human-generated content from AI-generated content.
We are already at a point where we can trick a large share of the population, and it will without a doubt close the gap even further, to where we question anything and everything.
Beyond forensics, which require large capital investment and operating costs, the ability to detect AI vs human content will be limited in terms of access. It won't be that we can't detect AI content anymore; it's that most people cannot afford the service to detect it, and so they lose interest.
This has the side effect of making live performances by humans scarce and invaluable.
I don't know if I'm not getting the idea right, but I'm pretty sure people refer to AI outputs as "slop" not (only) due to repetitiveness. According to some sources:
[1] Wikipedia: https://en.wikipedia.org/wiki/AI_slop
> AI slop is digital content made with generative artificial intelligence, specifically when perceived to show a lack of effort, quality or deeper meaning, and an overwhelming volume of production.[1][4][5] Coined in the 2020s, the term has a pejorative connotation similar to spam.[4]
[2] Urban Dictionary: https://www.urbandictionary.com/define.php?term=AI+slop
> Low-quality randomly generated AI content (images, accounts, text, etc) that has been flooding social media sites among other pages.
Yes, I know those may not be the best primary sources, but I'd say the main shared meaning of the word is lack of quality and effort, not repetitiveness itself.
Yeah, what this actually achieves if anything is making it harder to quickly recognize slop for what it is, so readers are more likely to give it the benefit of the doubt and keep their eyeballs on it for longer. Which I suppose is desirable if you're in the slop-mongering business (e.g. doing SEO spam or other such methods of flooding the commons with sewage for the sake of profit).
Fits into a broad pattern of deceptive LLM terminology, for example "Deep Research": a humble and honest moniker would be "Reflection" or "Recursive self-prompting".
Yep, and their only reference to the word points to a survey that does not mention slop even once (A survey on LLM-generated text detection: Necessity, methods, and future directions. Computational Linguistics, 51(1):275–338, 2025. https://arxiv.org/abs/2310.14724)
That's sloppy (hehe): if you are going to redefine a common word for the first time (i.e. references are not possible), at least do it explicitly.
> I'm pretty sure people refer to AI outputs as "slop" not (only) due to repetitiveness.
Yeah, slop is low effort use of AI output ("ChatGPT, write me a blog post about using AI in industry X. Copy. Paste. Publish."). If anything this should be called Stealthslop, and when slop is harder to detect we'll all waste more time on it.
The LLM erotic roleplaying community's usage of "slop" aligns with the definition in this paper, so it's not without precedent. Several novel sampling methods have originated from that community trying to address this specific issue.
Yup. You see this in how the very first projects to get a new sampler were oobabooga text gen webui and sillytavern, circa early 2023 with min_p. Same with diffusion models: the first projects to get new denoising algorithms are ComfyUI, Automatic1111, etc.
It isn't AI-centric; it's derived from poor-quality wet food, often given to pigs, or used to describe prison food. It's the origin of the term 'sloppy'.
Colloquially it means ‘poor quality’ and always has done. So buzzfeed is journalism slop, just like poor quality AI content is AI slop.
In practice, it's used for anything the speaker doesn't approve of, regardless of quality. When someone uses it, it basically tells me, I don't have anything critical to say, beyond I don't like a thing.
Interesting work but this strikes me as a somewhat quixotic fight against inevitable tendencies of statistical models. Reinforcement learning has a single goal, an agreeable mean. Reinforcement learning stops when the LLM produces agreeable responses more often than not; the only way you can achieve absolute certainty here is if you tune it for an infinite amount of time. I also don't see how this method couldn't be subsumed by a simpler method like dynamic temperature adjustment. Transformers are fully capable of generating unpredictable yet semantic text based on a single hyperparameter. Maybe it would make more sense to simply experiment with different temperature settings. Usually it's a fixed value.
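For what it's worth, here's a minimal sketch of what temperature-based sampling, and one possible "dynamic" variant that scales temperature with the entropy of the distribution, could look like. The entropy-based rule and all the numbers are illustrative assumptions, not any particular method from the paper:

```python
import numpy as np

def sample_with_temperature(logits, temperature=1.0, rng=None):
    """Sample a token id from raw logits after temperature scaling."""
    rng = rng or np.random.default_rng()
    scaled = np.asarray(logits, dtype=np.float64) / max(temperature, 1e-6)
    scaled -= scaled.max()                       # numerical stability
    probs = np.exp(scaled) / np.exp(scaled).sum()
    return rng.choice(len(probs), p=probs)

def dynamic_temperature(logits, t_min=0.3, t_max=1.5):
    """One possible 'dynamic' rule: run hotter when the model is uncertain
    (high entropy), colder when it is confident (low entropy)."""
    logits = np.asarray(logits, dtype=np.float64)
    p = np.exp(logits - logits.max())
    p /= p.sum()
    entropy = -(p * np.log(p + 1e-12)).sum()
    max_entropy = np.log(len(p))
    return t_min + (t_max - t_min) * (entropy / max_entropy)

logits = [2.0, 1.5, 0.3, -1.0]                   # toy next-token logits
t = dynamic_temperature(logits)
print("temperature:", round(t, 2), "sampled id:", sample_with_temperature(logits, t))
```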
Parsing an LLM as the measure of a series of QC metrics, which isolate for preference strings, whether lexical weights or parameters: can this create the rules for formal understanding, or for correlating libraries of Babel?
Searle's paper[1] calls these questions, script, or a story.
[1]: https://web.archive.org/web/20071210043312/http://members.ao...
This is the epitome of patching symptoms rather than treating the disease. Even if you suppress the obvious syntactic slop like 'it's not X but Y', you have no reason to believe you've fixed mode-collapse on higher more important levels like semantics and creativity. (For example, Claude LLMs have always struck me as mode-collapsed on a semantic level: they don't have the blatant verbal tics of 4o but somehow they still 'go in circles'.) Which will potentially severely hinder the truly high-value applications of LLMs to creative applications like frontier research. To the extent that this succeeds in hiding the brain damage in contemporary LLMs, it arguably is a cure worse than the disease.
Those higher level kinds of mode collapse are hard to quantify in an automated way. To fix that, you would need interventions upstream, at pre & post training.
This approach is targeted to the kinds of mode collapse that we can meaningfully measure and fix after the fact, which is constrained to these verbal tics. Which doesn't fix higher level mode collapse on semantics & creativity that you're identifying -- but I think fixing the verbal tics is still important and useful.
> but I think fixing the verbal tics is still important and useful.
I don't. I think they're useful for flagging the existence of mode-collapse and also providing convenient tracers for AI-written prose. Erasing only the verbal tics with the equivalent of 's/ - /; /g' (look ma! no more 4o em dashes!) is about the worst solution you could come up with and if adopted would lead to a kind of global gaslighting. The equivalent of a vaccine for COVID which only suppresses coughing but doesn't change R, or fixing a compiler warning by disabling the check.
If you wanted to do useful research here, you'd be doing the opposite. You'd be figuring out how to make the verbal expressions even more sensitive to the underlying mode-collapse, to help research into fixing it and raising awareness. (This would be useful even on the released models, to more precisely quantify their overall mode-collapse, which is poorly captured by existing creative writing benchmarks, I think, and one reason I've had a hard time believing things like Eqbench rankings.)
The repetitive pattern detection approach described here is fascinating from an implementation perspective. We encountered similar challenges when building our interview feedback system - specifically around detecting and eliminating repetitive filler phrases that added no value ("um", "like", "you know").
What worked well for us was implementing a two-stage pipeline: first using a sliding window (n=3) to detect repeated n-grams, then applying cosine similarity with a threshold of 0.85 to catch semantic duplicates. This reduced redundant content by ~40% while preserving meaningful repetition (e.g. when candidates deliberately emphasize key points).
One challenge we haven't fully solved: distinguishing between harmful repetition and intentional rhetorical devices. Have others found effective heuristics for this? We're currently experimenting with attention patterns in the transformer layers to identify deliberate vs. unintentional repetition, but results are mixed.
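For anyone curious, a minimal sketch of what such a two-stage pass could look like. The n=3 window and the 0.85 threshold come from the comment above; the use of TF-IDF vectors (rather than whatever embeddings were actually used), the function names, and the toy inputs are assumptions for illustration:

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

def repeated_ngrams(tokens, n=3):
    """Stage 1: flag exact n-grams that occur more than once (sliding window)."""
    seen, repeats = set(), set()
    for i in range(len(tokens) - n + 1):
        gram = tuple(tokens[i:i + n])
        (repeats if gram in seen else seen).add(gram)
    return repeats

def semantic_duplicates(sentences, threshold=0.85):
    """Stage 2: flag sentence pairs that are near-duplicates of each other."""
    if len(sentences) < 2:
        return []
    vecs = TfidfVectorizer().fit_transform(sentences)
    sims = cosine_similarity(vecs)
    return [(i, j) for i in range(len(sentences))
            for j in range(i + 1, len(sentences))
            if sims[i, j] >= threshold]

text = "you know we shipped it you know we shipped it on time"
sentences = ["We shipped the project on time.",
             "We shipped the project right on time.",
             "Testing started a week later."]
print(repeated_ngrams(text.split(), n=3))   # repeated trigrams
print(semantic_duplicates(sentences))       # near-duplicate sentence pairs
```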
This seems to be fundamentally based on n-grams and manually built regexes. "Slop", or more narrowly annoying -isms and model stereotypes, is not just repetitive n-gram sequences, mode collapse manifests itself semantically. Sometimes repetition/stereotyping is desirable (you need semantics to understand if it's the case), and sometimes undesirable repetition is undetectable by n-grams and regexes, especially in languages that rely on word formation. Fixing the mode collapse probably needs a sufficiently powerful reference model of semantic diversity, which doesn't currently exist.
That’s not what “slop” means. Slop is output produced by generative AI without regards to its quality, not the telltale tics that current models tend to exhibit.
It's a new term so the meaning hasn't had a chance to settle. It's generally considered to be a negative term, so there's motivation for people to expand the definition to include things that they don't like. It is much easier to subvert a category than it is to make an argument for an individual item.
Imagine if people accept that falling rocks kill hundreds of people every year, and you wanted to convince them that falling cheese also kills plenty of people.
It would be much easier to imply that cheese, often coming in large roundish lumps, counts as a type of rock. It stretches the definition a bit but it's still much easier to argue than the actual falling cheese argument that is your actual agenda.
When the definition is new it is more malleable. Sometimes you might need a qualifier to declare it is different but imply it is essentially like the other thing. It's just a dairy-rock, or just enhanced-interrogation.
I've seen it used enough that it's clear to me that the implied definition is "low-quality and/or low effort AI-generated content", and the actual usage is "AI generated content that I don't like". But both of those definitions very clearly refer to the piece of content as a whole, rather than specific parts of the content.
Slop is what you make, when I don't morally approve, or value critical nuance. WWE is slop, soap operas are slop, romance novels are slop, scifi is slop, etc.
Instead of "surgically adjusting" logits within an existing model, couldn't you just build the slop detector into the loss function during the initial training stage?
I honestly can't always distinguish AI slop from the formulaic corp-speak used in emails and memos and brochure websites and other marketing. I'm guessing that must be a large component of the training material.
I don't think that's a coincidence. Right now a lot of the business proposition for LLM bots is selling it to corporations as the ultimate corporate yes-man.
ScholarlyArticle: "Antislop: A Comprehensive Framework for Identifying and Eliminating Repetitive Patterns in Language Models" (2025) https://arxiv.org/abs/2510.15061 :
> Abstract: [...] Our approach combines three innovations: (1) The Antislop Sampler, which uses backtracking to suppress unwanted strings at inference time without destroying vocabulary; (2) An automated pipeline that profiles model-specific slop against human baselines and generates training data; (3) Final Token Preference Optimization (FTPO), a novel fine-tuning method that operates on individual tokens, surgically adjusting logits wherever a banned pattern has appeared in an inference trace.
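To make innovation (1) concrete, here is a toy sketch of the general backtracking idea: when a banned string shows up in the output, rewind to where it started, forbid that continuation at that position, and resample. The tiny dictionary "model", the banned phrase, and the function names are invented for illustration; this is not the paper's implementation or API:

```python
import random

# Toy last-word "model": maps the previous word to a next-word distribution.
# A real implementation would adjust an LLM's logits instead of a dict.
MODEL = {
    "answer": {"is": 1.0},
    "is":     {"delve": 0.8, "simple": 0.2},   # "delve" stands in for a slop word
    "delve":  {"deeper": 1.0},
    "simple": {"<eos>": 1.0},
    "deeper": {"<eos>": 1.0},
}

BANNED_PHRASES = {("delve", "deeper")}          # strings we never want to emit

def sample_next(word, forbidden=()):
    """Sample a continuation of `word`, excluding any forbidden words."""
    dist = {w: p for w, p in MODEL.get(word, {"<eos>": 1.0}).items()
            if w not in forbidden}
    if not dist:
        return "<eos>"
    words, weights = zip(*dist.items())
    return random.choices(words, weights=weights)[0]

def antislop_sample(prompt, max_tokens=10):
    out = prompt.split()
    forbidden_at = {}                           # position -> words banned there
    while len(out) < max_tokens:
        w = sample_next(out[-1], forbidden_at.get(len(out), set()))
        if w == "<eos>":
            break
        out.append(w)
        # If a banned phrase just completed, backtrack to where it started,
        # forbid that continuation at that position, and resample from there.
        for phrase in BANNED_PHRASES:
            n = len(phrase)
            if tuple(out[-n:]) == phrase:
                start = len(out) - n
                forbidden_at.setdefault(start, set()).add(phrase[0])
                out = out[:start]
                break
    return " ".join(out)

print(antislop_sample("the answer"))   # e.g. "the answer is simple"
```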
> So, so many em-dashes. All over the place.
Anecdotally, I use them less often these days, because of the association with AI.
> Even when I create the first draft of a project's README with an LLM, part of the final pass is removing those slop-associated patterns
Why?!
> One time, after making a mistake, it apologized and said something like "that's what I get for writing code at 2am".
Human behavior is deeeeep in the statistics.
> I want the star trek experience. The computer just says "working" and then gives you the answer without any chit-chat.
If AI wants to be useful (it's not going to be, atm), real people need to cull all the banalities that facebook, reddit & forums have generated.
Because what you're noticing is things we typically elide over in discussions with actual humans.
> Affirmations at the start of messages. ("That's a great idea!")
I assume the beginning of the answer is given to a cheaper, faster model, so that the slower, more expensive one can have time to think.
It keeps the conversation lively and natural for most people.
Would be interesting to test if it's true, by disabling it with a system prompt and measuring whether the time to the first word gets slower.
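A rough sketch of such a test using the OpenAI Python SDK's streaming API, measuring time to the first streamed token with and without a preamble-suppressing system prompt. The model name, the prompts, and the assumption that a system prompt reliably suppresses the opener are all placeholders to verify:

```python
import time
from openai import OpenAI

client = OpenAI()  # expects OPENAI_API_KEY in the environment

def time_to_first_token(system_prompt, user_prompt, model="gpt-4o-mini"):
    """Stream a completion and return seconds until the first content token."""
    start = time.perf_counter()
    stream = client.chat.completions.create(
        model=model,
        messages=[{"role": "system", "content": system_prompt},
                  {"role": "user", "content": user_prompt}],
        stream=True,
    )
    for chunk in stream:
        delta = chunk.choices[0].delta.content
        if delta:  # first non-empty token
            return time.perf_counter() - start
    return None

question = "What's a good name for a pet hedgehog?"
default = time_to_first_token("You are a helpful assistant.", question)
terse = time_to_first_token(
    "Answer directly. No greetings, compliments, or preamble.", question)
print("default:", default, "terse:", terse)
```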
Maybe it's intentional, like the "shiny" tone applied to "photorealistic" images of real people.
> This has the side effect of making live performances by humans scarce and invaluable.
RIP take-home coding assignments.
Schools will need to reinvent themselves in some ways.
If an impersonation of an opera singer can't be distinguished from the real thing, what would be the point of the real thing?
> Ethics Statement
> Potential harms include: [...] (ii) attempts to evade AI-text detection.
And it's not clear to me how their mitigations would avoid fooling users (as opposed to algorithmic detection attempts).
Oof—gotcha here's how I'd handle that
Clutch choice—here's a few refinements
Sweet—let me just…
Ok, here's the receipts
I love your passion! Let's try to keep it civil ok?
(Thinking) the user still appears annoyed
---
I think this annoys them also and yet they can’t change it? Or are they not dogfooding?
https://www.reddit.com/r/LocalLLaMA/comments/1lv2t7n/not_x_b...
From https://news.ycombinator.com/item?id=45546037#45585680 , an additional potential method:
>> Could build a simple heuristic: if similar memory content gets created/updated N times within short timeframe, flag it as potential loop
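A minimal sketch of that heuristic; the N=3 count, the 10-minute window, the difflib similarity measure, and the 0.9 threshold are arbitrary placeholders:

```python
import time
from difflib import SequenceMatcher

class MemoryLoopDetector:
    """Flag a potential loop if similar memory content is written N times
    within a short window."""
    def __init__(self, n=3, window_seconds=600, similarity=0.9):
        self.n = n
        self.window = window_seconds
        self.similarity = similarity
        self.writes = []                      # list of (timestamp, content)

    def record(self, content, now=None):
        now = now if now is not None else time.time()
        # Keep only writes inside the time window.
        self.writes = [(t, c) for t, c in self.writes if now - t <= self.window]
        similar = sum(
            1 for _, c in self.writes
            if SequenceMatcher(None, c, content).ratio() >= self.similarity
        )
        self.writes.append((now, content))
        return similar + 1 >= self.n          # True -> potential loop

detector = MemoryLoopDetector()
for i in range(4):
    flagged = detector.record("User prefers concise answers without emoji.")
    print(i, flagged)   # starts flagging once the count reaches n
```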