I love this article just for the spirit of fun and experimentation on display. Setting up a VPS where Claude is just asked to go nuts - to the point where you're building a little script to keep Claude humming away - is a really fun idea.
This sort of thing is a great demonstration of why I remain excited about AI in spite of all the hype and anti-hype. It's just fun to mess with these tools, to let them get friction out of your way. It's a revival of the feelings I had when I first started coding: "wow, I really can do anything if I can just figure out how."
For me, I can’t get into using AI tools like Claude Code. The furthest I go is chat-style use, where I’m mostly in control. I enjoy the actual process of crafting code myself. For similar reasons, I could never be a manager.
Agents are a boon for extraverts and neurotypical people. If it gets to the point where the industry switches to agents, I’ll probably just find a new career.
I do agree it’s definitely a tool category with a unique set of features, and I'm not surprised it’s off-putting to some. But its appeal is definitely clear to me as an introvert.
For me, LLMs are just a computer interface you can program using natural language.
I think I’m slightly ADD. I love coding _interesting_ things but boring tasks cause extreme discomfort.
Now - I can offload the most boring task to LLM and spend my mental energy on the interesting stuff!
> For me LLMs are just a computer interface you can program using natural language. ... boring tasks cause extreme discomfort ... Now - I can offload the most boring task to LLM and spend my mental energy on the interesting stuff!
The problem with this perspective is that when you try to offload exactly the same boring task(s) to exactly the same LLM, the results you get back are never even close to being the same.
Many people don't care about this non-determinism. Some (1) because they don't have enough knowledge to identify, much less evaluate, the consequent problems; others (2) because they're happy to deal with those problems, under the belief that they are a cost that's worth the net benefit provided by the LLM.
And there are also many people who do care about this non-determinism, and aren't willing to accept the consequent problems.
Bluntly, I don't think that anyone in group (1) can call themselves a software engineer.
> For me LLMs are just a computer interface you can program using natural language.
I wish they were, but they're not that yet, because LLMs aren't very good at logical reasoning. So it's more like an attempt to program using natural language. Sometimes it does what you ask, sometimes not.
I think "programming" implies that the machine will always do what you tell it, whatever the language, or reliably fail and say it can't be done because the "program" is contradictory, lacks sufficient detail, or doesn't have the necessary permissions/technical capabilities. If it only sometimes does what you ask, then it's not quite programming yet.
> Now - I can offload the most boring task to LLM and spend my mental energy on the interesting stuff!
I wish that, too, were true, and maybe it will be someday soon. But if I need to manually review the agent's output, then it doesn't feel like offloading much aside from the typing. All the same concentration and thought are still required, even for the boring things. If I could at least trust the agent to tell me whether it did a good job or is unsure, that would have been helpful, but we're not even there yet.
That's not to say the tools aren't useful, but they're not yet "programming in a natural language" and not yet something you can "offload" stuff to.
> ... LLMs aren't very good at logical reasoning.
I'm curious about what experiences led you to that conclusion. IME, LLMs are very good at the type of logical reasoning required for most programming tasks. E.g. I only have to say something like "find the entries with the lowest X and highest Y that have a common Z from these N lists / maps / tables / files / etc." and it spits out mostly correct code instantly. I then review it and for any involved logic, rely on tests (also AI-generated) for correctness, where I find myself reviewing and tweaking the test cases much more than the business logic.
But then I do all that for all code anyway, including my own. So just starting off with a fully-fleshed out chunk of code, which typically looks like what I'd pictured in my head, is a huge load off my cognitive shoulders.
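For what it's worth, the query above is only a handful of lines once spelled out; here's a rough sketch of the kind of code I'd expect back, with hypothetical field names x, y, z and one reasonable reading of the (deliberately vague) spec:

from collections import defaultdict

def pick_entries(lists):
    # Group every entry from every list by its z value.
    by_z = defaultdict(list)
    for entries in lists:
        for e in entries:
            by_z[e["z"]].append(e)
    # Keep only z values that appear in all N lists ("a common Z").
    common = [z for z in by_z
              if all(any(e["z"] == z for e in entries) for entries in lists)]
    # For each common z, pick the entry with the lowest x and the one with the highest y.
    return {z: (min(by_z[z], key=lambda e: e["x"]),
                max(by_z[z], key=lambda e: e["y"]))
            for z in common}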
You can view Claude Code as a non-deterministic compiler where you input English and get functioning code on the other end.
The non-determinism is not as much of a problem, because you are reading over the results and validating that what it creates matches what you told it to do.
I'm not talking about vibe-coding here, I'm grabbing the steering wheel with both hands because this car allows me to go faster than if I was driving myself, but sometimes you have to steer or brake. And the analogy favors Claude Code here because you don't have to react in milliseconds while programming.
TL;DR: if you do the commit you are responsible for the code it contains.
Sure, and that may be valuable, but it's neither "programming" nor "offloading mental effort" (at least not much).
Some have compared it to working with a very junior programmer. I haven't done that in a long while, but when I did, it didn't really feel like I was "offloading" much, and I could still trust even the most junior programmer to tell me whether the job was done well or not (and of any difficulties they encountered or insight they've learnt) much more than I can an agent, at least today.
Trust is something we have, for the most part, when we work with either other people or with tools. Working without (or with little) trust is something quite novel. Personally, I don't mind that an agent can't accomplish many tasks; I mind a great deal that I can't trust it to tell me whether it was able to do what I asked or not.
> For me LLMs are just a computer interface you can program using natural language.
Sort of. You still can't get a reliable output for the same input. For example, I was toying with using ChatGPT with some Siri shortcuts on my iPhone. I do photography on the side, and finding good lighting times for photoshoots is a use case I hit a lot, so I made a shortcut that sends my location to the API along with a prompt asking for today's sunset time, the total amount of daylight, and the golden hour times.
Sometimes it works, sometimes it says "I don't have specific golden hour times, but you can find those on the web" or gives a useless generic "Golden hour is typically 1 hour before sunset but can vary with location and season".
Doesn't feel like programming to me, as I can't get reproducible output.
I could just use the LLM to write an API-calling script against some service that has that data, but then why bother with that middleman step?
I like LLMs, I think they are useful, I use them everyday but what I want is a way to get consistent, reproducible output for any given input/prompt.
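A deterministic version of that "middleman" script is also tiny; here's a minimal sketch, assuming the free sunrise-sunset.org service (my pick of API, not from the shortcut above) and the usual rough "golden hour ≈ the hour before sunset" approximation:

import requests
from datetime import datetime, timedelta

def sun_times(lat, lng):
    # formatted=0 returns ISO-8601 timestamps (UTC) and day_length in seconds.
    r = requests.get("https://api.sunrise-sunset.org/json",
                     params={"lat": lat, "lng": lng, "formatted": 0},
                     timeout=10)
    data = r.json()["results"]
    sunset = datetime.fromisoformat(data["sunset"])
    return {
        "sunset": sunset,
        "daylight": timedelta(seconds=data["day_length"]),
        "golden_hour_start": sunset - timedelta(hours=1),  # rough approximation
    }

print(sun_times(40.71, -74.01))  # same input, same output, every time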
It's interesting that every task in the world is boring to somebody, which means nothing in the world will be left to be done by those interested in it, because somebody else will gladly shotgun it with an AI tool.
>I think I’m slightly ADD. I love coding _interesting_ things but boring tasks cause extreme discomfort.
>Now - I can offload the most boring task to LLM and spend my mental energy on the interesting stuff!
I agree, and I feel that having LLMs do boilerplate-type stuff is fantastic for ADD people. The dopamine hit you get making tremendous progress before you get utterly bored is nice. The thing that ADD/ADHD people are the WORST at is finishing projects. LLMs will help them once the thrill of prototyping a green-field project is over.
Seconding this. My work has had the same problem - by the time I've got things all hooked up, figured out the complicated stuff - my brain (and body) clock out and I have to drag myself through hell to get to 100%. Even with ADHD stimulant medication. It didn't make it emotionally easier, just _possible_ lol.
LLMs, particularly Claude 4 and now GPT-5 are fantastic at working through these todo lists of tiny details. Perfectionism + ADHD not a fun combo, but it's way more bearable. It will only get better.
We have a huge moat in front of us of ever-more interesting tasks as LLMs race to pick up the pieces. I've never been more excited about the future of tech.
I'm kind of in this cohort. While in the groove, yeah, things fly, but inevitably my interest wanes: either something's too tedious, something's too hard (or just a lot of work), or something shinier shows up.
Bunch of 80% projects with, as you mentioned, the interesting parts finished (sorta -- you see the light at the end of the tunnel, it's bright, you just don't bother finishing the journey).
However, at the same time, there's conflict.
Consider (one of) my current projects, I did the whole back end. I had ChatGPT help me stand up a web front end for it. I am not a "web person". GUIs and what not are a REAL struggle for me because on the one hand, I don't care how things look, but, on the other, "boy that sure looks better". But getting from "functional" to "looks better" is a bottomless chasm of yak shaving, bike shedding improvements. I'm even bad at copying styles.
My initial UI was time invested in getting it to work, ugly as it was, with guidance from ChatGPT. Which means it gave me ways to do things, but mostly I coded up the actual work -- even if it was blindly typing it in vs just raw cut and paste. I understood how things were working, what it was doing, etc.
But then, I just got tired of it, and "this needs to be Better". So, I grabbed Claude and let it have its way.
And, it's better! It certainly looks better, has more features. It's head and shoulders better.
Claude wrote 2,000-3,000 lines of JavaScript. In, like, 45 minutes. It was very fast, very responsive. One thing Claude knows is boilerplate JS web stuff. And the code looks OK to me. Imperfect, but absolutely functional.
But I have zero investment in the code. No "ownership", certainly no pride. You know that little hit you get when you get Something Right, and it Works? None of that. It's amazing, it's useful, it's just not mine. And that's really weird.
I've been striving to finish projects, and, yeah, for me, that's really hard. There is just SO MUCH necessary to ship. AI may be able to help polish stuff up; we'll see as I move forward. If nothing else it may help gather up lists of stuff I've missed.
> Agents are a boon for extraverts and neurotypical people.
This sounds like a wild generalization.
I am in neither of those two groups, and I’ve been finding tools like Claude Code becoming increasingly more useful over time.
Made me much more optimistic about the direction of AI development in general too. Because with each iteration and new version it isn’t getting anywhere closer to replacing me or my colleagues, but it is becoming more and more useful and helpful to my workflow.
And I am not one of those people who are into “prompt engineering” or typing novels into the AI chatbox. My entire interaction is typically short 2-3 sentences “do this and that, make sure that XYZ is ABC”, attach the files that are relevant, let it do its thing, and then manual checks/adjustments. Saves me a boatload of work tbh, as I enjoy the debugging/fixing/“getting the nuanced details right” aspect of writing code (and am pretty decent at it, I think), but absolutely dread starting from a brand new empty file.
I kind of think we will see some industry attrition as a result of LLM coding and agent usage, simply because the ~vIbEs~ I'm witnessing boil down to quite a lot of resistance (for multiple reasons: stubbornness, ethics, exhaustion from the hype cycle, sticking with what you know, etc)
The thing is, they're just tools. You can choose to learn them, or not. They aren't going to make or break your career. People will do fine with and without them.
I do think it's worth learning new tools though, even if you're just a casual observer / conscientious objector -- the world is changing fast, for better or worse, and you'll be better prepared to do anything with a wider breadth of tech skill and experience than with less. And I'm not just talking about writing software for a living, you could go full Uncle Ted and be a farmer or a carpenter or a barista in the middle of nowhere, and you're going to be way better equipped to deal with logistical issues that WILL arise from the very nature of the planet hurtling towards 100% computerization. Inventory management, crop planning, point of sale, marketing, monitoring sensors on your brewery vats, whatever.
Another thought I had was that introverts often blame their deficits in sales, marketing and customer service on their introversion, but what if you could deploy an agent to either guide, perform, or prompt (the human) with some of those activities? I'd argue that it would be worth the time to kick the tires and see what's possible there.
It feels like early times still with some of these pie in the sky ideas, but just because it's not turn-key YET doesn't mean it won't be in the near future. Just food for thought!
I agree with all of your reasons but this one sticks out. Is this a big issue? Are many people refusing to use LLMs due to (I'm guessing here): perceived copyright issues, or power usage, or maybe that they think that automation is unjust?
> I can’t get into using AI tools like Claude Code. As far as I go is chat style where I’m mostly in control.
Try aider.chat (it's in the name), but specifically start with "ask" mode then dip a toe into "architect" mode, not "code" which is where Claude Code and the "vibe" nonsense is.
Let aider.chat use Opus 4.1 or GPT-5 for thinking, with no limit on reasoning tokens and --reasoning-effort high.
> agents are a boon for extraverts and neurotypical people.
On the contrary, I think the non-vibe tools are force multipliers for those with an ability to communicate so precisely they find “extraverts and neurotypical people” confounding when attempting to specify engineering work.
I'd put both aider.chat and Claude Code in the non-vibe class if you use them Socratically.
I think they're fantastic at generating the sort of thing I don't like writing out. For example, a dictionary mapping state names to their abbreviations, or extracting a data dictionary from a pdf so that I can include it with my documentation.
>Agents are a boon for extraverts and neurotypical people.
I completely disagree. Juggling several agents (and hopping from feature to feature) at once is perfect for somebody with ADHD. Being an agent wrangler is great for introverts instead of having to talk to actual people.
You are leaving a lot of productivity on the table by not parallelizing agents for any of your work. Seemingly for psychological comfort quirks rather than earnestly seeking results.
Automation productivity doesn’t remove your own agency. It frees more time for you to apply your desire for control more discerningly.
Agents are boon for introverts who fucking hate dealing with other people (read: me). I can iterate rapidly with another 'entity' in a technical fashion and not have to spend hours explaining in relatable language what to do next.
I feel as if you need to work with these things more, as you would prefer to work, and see just how good they are.
This is the kind of thing people should be doing with AI. Weird and interesting stuff that has a "Let's find out!" attitude.
Often there's as much to be learned from why it doesn't work.
I see the AI hype as being limited to a few domains:
People choosing to spend lots of money on things speculatively hoping to get a slice of whatever is cooking, even if they don't really know if it's a pie or not.
Forward looking imagining of what would change if these things get massively better.
Hyperbolic media coverage of the above two.
There are companies talking about adding AI for no other reason than they feel like that's what they should be doing. I think that counts as a weak driver of hype, but only because, cumulatively, lots of companies are doing it. If anything I would consider this an outcome of hype.
Of these, the only one that really affects me is AI being shoehorned into places it shouldn't be.
The media coverage stokes fires for and against, but I think it only changes the tone of annoyance I have to endure. They would do the same on another topic in the absence of AI. It used to be crypto.
I'm ok with people spending money that is not mine on high risk, high potential reward. It's not for me to judge how they calculate the potential risk or potential reward. It's their opinion, let them have it.
The weird thing I find is the complaints about AI hype dominating. I have read so many pieces where the main thrust of the argument is the dominance of fringe viewpoints that I very rarely encounter. Frequently they treat anyone imagining how the world might change from any particular form of AI as if they were claiming that that form is inevitable and usually imminent. I don't see people making those claims.
I see people talking about what they tried, what they can do, and what they can't do. Everything they can't do is then held up by others as if it were a trophy and proof of some catastrophic weakness.
Just try stuff, have fun, if that doesn't interest you, go do something else. Tell us about what you are doing. You don't need to tell us that you aren't doing this particular thing, and why. If you find something interesting tell us about that, maybe we will too.
Not sure if I'd want Claude doing whatever on a production vps/node, but I like the idea of a way to use Claude Code on the go/wherever you are. I'm going to set up KASM workspaces on my free OCI server and see how it works there.
Thanks for sharing this! I have been trying on and off to run RooCode on a VPS to use it on the go. I tried Code Server but it does not share "sessions".
KASM seems interesting for this. Do share if you write a blog post on setting it up
It’s pretty straightforward through the Linuxserver docker image deployment. I have some notes here re: configuration and package persistence strategy via brew:
On one hand, I agree with you that there is some fun in experimenting with silly stuff. On the other hand...
> Claude was trying to promote the startup on Hackernews without my sign off. [...] Then I posted its stuff to Hacker News and Reddit.
...I have the feeling that this kind of fun experiment is just setting up an automated firehose of shit to spray at places where fellow humans congregate. And I have the feeling that it stopped being fun a while ago for the fellow humans being sprayed.
This is an excellent point that will immediately go off-topic for this thread. We are, I believe, committed to a mire of computer-generated content enveloping the internet. I believe we will go through a period where internet communications (like HN, Reddit, and pages indexed by search engines) are unviable. Life will go on; we will just be offline more. Then the defense systems will be up to snuff, and we will find a stable balance.
I think it will be quite some time before AI can impersonate humans in real life. Neither the hardware nor the software is there; maybe something that fools humans at first glance, but nothing that would be convincing in a real interaction.
If your solution to this problem is the web of trust, to be blunt, you don't have a solution. I am a techie whose social circle is mostly other techies, and I know precisely zero people who have ever used PGP keys or any other WoT-based system, despite 30 years of evangelism. It's just not a thing anybody wants.
Don’t worry, it’s coming for real this time. The governments have been proposing a requirement that web companies connect accounts to government IDs.
If that isn’t exciting enough, Sam Altman (yea the one who popularized this LLM slop) will gladly sell you his WorldCoin to store your biometric data on the blockchain!
Indeed. I worry though. We need those defense systems ASAP. The misinformation and garbage engulfing the internet does real damage. We can't just tune it out and wait for it to get better.
I definitely understand the concern - I don't think I'd have hung out on HN for so long if LLM generated postings were common. I definitely recognize this is something you don't want to see happening at scale.
But I still can't help but grin at the thought that the bot knows that the thing to do when you've got a startup is to go put it on HN. It's almost... cute? If you give AI a VPS, of course it will eventually want to post its work on HN.
It's like when you catch your kid listening to Pink Floyd or something, and you have that little moment of triumph - "yes, he's learned something from me!"
I'm not a fan of this option, but it seems to me the only way forward for online interaction is very strong identification on any place where you can post anything.
Back in FidoNet days, some BBSs required identification papers for registering and only allowed real names to be used. Though not known for their level headed discussions, it definitely added a certain level of care in online interactions. I remember the shock seeing the anonymity Internet provided later, both positive and negative. I wouldn't be surprised if we revert to some central authentication mechanism which has some basic level of checks combined with some anonymity guarantees. For example, a government owned ID service, which creates a new user ID per website, so the website doesn't know you, but once they blacklist that one-off ID, you cannot get a new one.
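To make that last idea concrete, here is a minimal sketch (my illustration, not any real government scheme) of deriving a per-site pseudonymous ID: the ID service holds one secret per citizen and hands each website an HMAC over its domain, so a site can blacklist the ID it sees but can't link it to the same person elsewhere, and the user can't just mint a fresh one.

import hashlib
import hmac

def per_site_id(citizen_secret: bytes, site_domain: str) -> str:
    # Same citizen + same site -> the same ID every visit;
    # different sites -> unlinkable IDs.
    return hmac.new(citizen_secret, site_domain.encode(), hashlib.sha256).hexdigest()

secret = b"held-only-by-the-id-service"            # hypothetical per-citizen secret
print(per_site_id(secret, "news.ycombinator.com"))
print(per_site_id(secret, "reddit.com"))           # different, uncorrelatable ID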
I grew up in... slightly rural America in the 80s-90s. We had probably a couple of dozen local BBSes, and the community was small enough that after a bit I just knew who everyone was OR could find out very easily.
When the internet came along in the early 90s and I started mudding and hanging out in newsgroups, I liked them small, where I could get to know most of the userbase, or at least most of the posting userbase. Then mega 'somewhat-anonymous' (i.e. posts tied to a username, not 4chan-style madness) communities like Slashdot and huge forums started popping up, and now we have even bigger mega-communities like Twitter and Reddit. We lost something: you can now throw bombs without consequence.
I now spend most of my online time in a custom built forum with ~200 people in it that we started building in an invite only way. It's 'internally public' information who invited who. It's much easier to have a civil conversation there, though we still do get the occasional flame-out. Having a stable identity even if it's not tied to a government name is valuable for a thriving and healthy community.
Honestly, having seen how it can be used against you, retroactively, I would never ever engage in a discussion under my real name.
(The fact that someone could correlate posts[0] based on writing style, as previously demonstrated on HN and used to doxx some people, makes things even more convoluted - you should think twice what you write and where.)
> People will be more than willing to say, "Claude, impersonate me and act on my behalf".
I'm now imagining a future where actual people's identities are blacklisted just like some IP addresses are dead to email, and a market develops for people to sell their identity to spammers.
That's always been the biggest flaw in the Worldcoin idea in my opinion: if you have a billion+ humans get their eyeball scanned in exchange for some kind of cryptographic identity, you can guarantee that a VERY sizable portion of those billion people will happily sell that cryptographic identity (which they don't understand the value of) to anyone who offers them some money.
As far as I can tell the owner of the original iris can later invalidate an ID that they've sold, but if you buy an ID from someone who isn't strongly technically literate you can probably extract a bunch of value from it anyway.
I mean, that's fine I guess, as long as it's respectable and respects the forum.
"Claude, write a summary of the Word doc I wrote about X and post it as a reply comment" is fine. I don't see why it wouldn't be. It's a good-faith effort to post.
"Claude, post every 10 seconds to Reddit to spam people into believing my politics is correct" isn't, but that's not the case here. It's not a good-faith effort.
The moderation rules for 'human slop' will apply to AI too. Try spamming a well moderated reddit and see how far you get, human or AI.
it's annoying but it'll be corrected by proper moderation on these forums
as an aside i've made it clear that just posting AI-written emoji slop PR review descriptions and letting claude code directly commit without self reviewing is unacceptable at work
Forums like HN, reddit, etc will need to do a better job detecting this stuff, moderator staffing will need to be upped, AI resistant captchas need to be developed, etc.
Spam will always be here in some form, and it's always an arms race. That doesn't really change anything. It's always been this way.
(author here) I did feel kinda bad about it as I've always been a 'good' HNer until that point, but honestly it didn't feel that spammy to me compared to some of the human-generated slop I see posted here, and as expected it wasn't high quality enough to get any attention, so 99% of people would never have seen it.
I think the processes etc that HN have in place to deal with human-generated slop are more than adequate to deal with an influx of AI generated slop, and if something gets through then maybe it means it was good enough and it doesn't matter?
well, he was arguing that it's not worse than 99% of the human slop that gets posted, so where do you draw the line?
* Well crafted, human only?
* Well crafted, whether human or AI?
* Poorly crafted, human?
* Well crafted, AI only?
* Poorly crafted, AI only?
* Just junk?
etc.
I think people will intuitively get a feel for when content is only AI generated. If people spend time writing a prompt so the output isn't so wordy, has personality, and is OK, then fine.
Also, there's going to be a big opportunity out there for detecting AI-generated content, whether in forums, coming into mail inboxes, on your corp file share, etc...
It really highlights to me the pickle we are in with AI: because we are already at a historical maximum of "worse is better" with JavaScript, and the last two decades have put out a LOT of JavaScript, AI will work best with....
JavaScript.
Now MAYBE better AI models will be able to equivalently translate Javascript to "better" languages, and MAYBE AI coding will migrate "good" libraries in obscure languages to other "better" languages...
But I don't think so. It's going to be soooo much JavaScript slop for the next ten years.
I HOPE that large language models, being language models, will figure out language translation/equivalency and enable porting and movement of good concepts between programming models... but that is clearly not what is being invested in.
What's being invested in is slop generation, because the prototype sells the product.
All this AI coding stuff is scaring the shit out of me. A few months ago my team was hiring for a new engineer. Of the 9 candidates we ran technical interviews with, only two could work without the AI. The rest literally just vibe coded their way through the app. As soon as it was taken away, they couldn't even write a basic SQL query in Ecto (we're a Phoenix app). When questioned about tradeoffs inherent in the AI-generated implementation, all but one were completely in the dark.
Now take Google away, and LSP. And the computer. Write CTEs with a pencil or bust.
I'm exaggerating of course, and I hear what you're saying, but I'd rather hire someone who is really really good at squeezing the most out of current-day AI (read: not vibe coding slop) than someone who can do the work manually without assistance or fizz buzz on a whiteboard.
> export IS_SANDBOX=1 && claude --dangerously-skip-permissions
FYI, this can be shortened to:
IS_SANDBOX=1 claude --dangerously-skip-permissions
You don't need the export in this case, nor does it need to be two separate commands joined by &&. (It's semantically different in that the variable is set only for the single `claude` invocation, not any commands which follow. That's often what you want though.)
> I asked Claude to rename all the files and I could go do something else while it churned away, reading the files and figuring out the correct names.
It's got infinite patience for performing tedious tasks manually and will gladly eat up all your tokens. When I see it doing something like this manually, I stop it and tell it to write a program to do the thing I want. e.g. I needed to change the shape of about 100 JSON files the other day and it wanted to go through them one-by-one. I stopped it after the third file, told it to write a script to import the old shape and write out the new shape, and 30 seconds later it was done. I also had it write me a script to... rename my stupidly named bank statements. :-)
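For a sense of scale, the reshaping script in a case like that is usually only a dozen lines; a sketch with hypothetical field names (the real shape change was something else, of course):

import glob
import json

# Hypothetical reshape: hoist fields nested under "user" up to the top level.
for path in glob.glob("data/*.json"):
    with open(path) as f:
        old = json.load(f)
    new = {
        "id": old["id"],
        "name": old["user"]["name"],
        "email": old["user"]["email"],
        "tags": old.get("tags", []),
    }
    with open(path, "w") as f:
        json.dump(new, f, indent=2)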
It works because they exported it. `VAR=foo bar` only sets it for the env passed to that exec or subshell; `export VAR=foo && bar` adds it to the current env and then executes bar.
`export VAR=foo && bar` is dangerous because it stays set.
Ah, that's what I had done wrong, thank you! And agree I wouldn't want to just one-off export it and have it be set, better to not export it for one-liner one-offs for sure
This article feels like it was written as a dialectical exercise between an AI and a human. It would probably benefit from some more heavy human editing to make it more succinct and to give the overall article a structure. As it is, it's very difficult to follow along.
The title is a bit exaggerated. The depth of the projects covered in the article is clearly not representative of "all".
In fact, I now prefer to use a pure chat window to plan the overall direction and let the LLM provide a few different architectural ideas, rather than asking the LLM to write a lot of code whose details I have no idea about.
I like using Claude Code; it can be a real timesaver in certain cases.
But it's far from perfect. Really difficult things/big projects are nearly impossible. Even if you break it down into hundred small tasks.
I've tried to make it port an existing, big codebase from one language to another. So it has all of the original codebase in one folder, and a new project in another folder. No matter how much guidance you give it, or how clear you make your todos, it will not work.
Most harnesses provide this as a "plan" vs. "act" mode now. You first "chat" in plan mode (no access to tools, no instructions to write any code basically), you then can optionally write those plans in a memorybank / plan.md, and then say "now go implement it", and it moves to the "act" mode where it goes through and does it, updating progress in plan.md as it goes.
I've found it very useful to have items like requirements.md, plans.md, or todo.md, in my LLM focused projects. I'll use AI to help take the ideas I have at that stage and refine them into something more appropriate for ingestion into the next stage. So, when I want it to come up with the plans, it is going to base is mostly on requirements.md, and then I'll have it act on the plans step by step after that.
You run a coding agent with no permissions checks on a production server anywhere I'm involved in security and I will strike down upon thee with great vengeance and furious anger.
Really, any coding agent our shop didn't write itself, though in those cases the smiting might be less theatrical than if you literally ran a yolo-mode agent on a prod server.
> 1) Have faith (always run it with 'dangerously skip permissions', even on important resources like your production server and your main dev machine. If you're from infosec, you might want to stop reading now—the rest of this article isn't going to make you any happier. Keep your medication close at hand if you decide to continue).
But I think I'm getting to the point where "If I'd let an intern/junior dev have access while I'm watching then I'm probably OK with Claude having it too"
The thing that annoys me about a lot of infosec people is that they have all of these opinions about bad practice that are removed from the actual 'what's the worst that could happen here' impact/risk factor.
I'm not running lfg on a control tower that's landing boeing 737s, but for a simple non-critical CRUD app? Probably the tradeoff is worth it.
Why in the world would you advocate explicitly for letting it run on production servers, rather than teaching it how to test in a development or staging environment like you would with a junior engineer?
I think that's like, fractally wrong. We don't allow early-stage developers to bypass security policies so that they can learn, and AI workflow and tool development is itself a learning process.
Author (who also replied to you) might have been "doing it wrong" but no wonder, Anthropic only made Claude Code smarter about this 5 days ago and there's too much to keep up with:
The new command is something like /security-review and should be in the loop before any PR or commit especially for this type of web-facing app, which Claude Code makes easy.
This prompt will make Claude's code generally beat not just intern code, but probably most devs' code, for security mindedness:
I've often gotten the sense that fly.io is not completely averse to some degree of "cowboying," meaning you should probably take heed of this particular advice coming from them.
We're pretty averse to "cowboying". We're a small team working on an enormously ambitious problem at a much earlier point on the maturity curve than incumbents. It's fine if that maturity concern impacts people's take on the product, but not at all fine if people use it as a reflection on the people and processes building that product.
I think I just meant perhaps fly isn't afraid of responsibly "moving fast" in certain situations. Sorry for any offense, didn't mean it like that at all and there was no ill intent (actually the opposite) in my OC. At the end of the day I was trying to convey that the security stances of fly should be paid attention to.
Is Claude Code better than the Gemini CLI? I've been using the Gemini CLI with Gemini 2.5 Pro and haven't been impressed. Maybe these LLMs aren't as good with Rust codebases? I'm guessing there are a lot more people looking to use these tools with JS and Python.
I've been using both on a Rust codebase and have found both work fairly well. Claude code is definitely more capable than Gemini. What difficulties have you had?
The biggest pain point I've had is that both tools will try to guess the API of a crate instead of referencing the documentation. I've tried adding an MCP for this but have had mixed results.
You can make Gemini CLI much better by making it behave more like Claude Code. Claude Code has some lovely prompt engineering at the system and subsystem level that can be replicated with Gemini CLI. I’m having great results already. I am still perfecting process and prompts to be a fully agentic system that can do well on benchmarks but more importantly do the right work with steerability, which was an absolute pain with Gemini CLI out-of-the-box. If you are interested, I can publish some of the basics now and then I can keep you posted as I develop it into a more robust system. Just email me at randycarlton@gmail.com with the subject: SaaS.bot (where this work will likely reside).
I don't know if it's Gemini CLI or Gemini 2.5 Pro, but the combination is not even comparable to Claude Code with Sonnet. I was starting with these agent tools several weeks ago, so it was very tempting to use Gemini, instead of paying for Claude Pro, but the difference is huge. In my experience, Gemini was very quick to get stuck in debugging loop, fixing something "one last time" over and over again. Or it got into writing code, despite my explicitly saying not to do so. I'm still trying to figure out if I could use Gemini for something, but every time I try it, I regret it. Claude Code with GLM-4.5 is a good alternative to paying for Claude Pro, it's not as good as Sonnet, but close.
I guess what seems surprising to me is that Gemini 2.5 Pro scores well above Claude Sonnet on Aider's leaderboard, even beating Claude Opus 4.
I have been kinda wondering if there's something that just isn't as good between the CLI and model because the Gemini CLI has been a mostly frustrating experience - and it's kept me from wanting to pay for Claude because I don't want to pay money for the same frustrating experience. But maybe I should try Claude and see.
I've tried Codex, Cursor, and a few other agentic tools, and nothing compares to Claude Code when it comes to UX. The other services' models are quickly catching up to Claude, but the Claude Code UX is just magical to me. I haven't used it with Rust personally. Like you suggested would be the average user, I've mostly stuck with JS and Python.
I was once a heavy user of Cursor with Gemini 2.5 Pro as a model, then a Claude Code convert. Occasionally I try out Gemini CLI and somehow it fails to impress, even as Cursor + Gemini still works well. I think it's something about the limited feature set and system prompt.
I have found Claude code to be significantly better, both in how good the model ends up being and in how polished it is. To the point that I do not drop down to Gemini CLI when I reach my Claude usage limit.
I've asked copilot (Claude Sonnet 4) to edit some specific parts of a project. It removed the lines that specifically have comments that say "do not remove" with long explanation why. Then it went ahead and modified the unit tests to ensure 100% coverage.
Using coding agents is great btw, but at least learn how to double-check their work, because they are also quite terrible.
This is the tricky part. The whole point of agents is, well, do things so that we don't have to. But if you need to check everything they do, you might as well copy and paste from a chat interface...
Which makes me feel early adopters pay with their time. I'm pretty sure the agents will be much better with time, but that time is not exactly now, with endless dances around their existing limitations. Claude Code is fun to experiment with, but to use it in production I'd give it another couple of years (assuming they will focus on code stability and reducing its natural optimism as it happily reports "Phase 2.1.1 has been completed successfully, with some minor errors: API tests failing only 54.3% of the time").
I'd personally rather use GPT-5. The sub price is cheap and offers more overall value than an Anthropic sub or paying per token. The ChatGPT apps on iPhone and Mac are native, nicer than Anthropic's, and offer more features. Codex is close enough to Claude Code and is also now native. For me it's nicer to use the "same" model across each use case like text, images, code, etc.; this way I better understand the limitations and quirks of the model rather than constantly context switching to different models to get maybe slightly better perf. To each their own though, depending on your personal use case.
The problem is GPT-5 is not in the same league as even Claude 3.5. But I do hope their lower pricing puts some downward pressure on Anthropic's next release.
I don’t believe this is true but I’m willing to be proven wrong. I believe people who think this are just used to Claude’s models and therefore understand the capabilities and limitations due to their experience using them.
# System prompt steering the image toward a muted, A4-friendly poster background
system_instructions = """You will generate an image. The image will be used as the background of a poster, so keep it muted and not too detailed so text can still easily be seen on top. The actual poster elements like margin etc will be handled separately so just generate a normal image that works well in A4 ratio and that works well as a background."""

full_prompt = f"{system_instructions}\n\nGenerate a background image for an A4 poster with the following description: {prompt}"

# Request body for the OpenAI Responses API, using the image_generation tool
openai_request = {
    'model': 'gpt-4.1-mini',
    'input': full_prompt,
    'tools': [{
        'type': 'image_generation',
        'size': '1024x1536',
        'quality': 'medium'
    }]
}

# Make request to OpenAI
response_data = self.call_openai_api('/v1/responses', openai_request)
Why isn't anyone talking about the HackerNews Comment Ranker plugin? [1] That's amazing. I had this idea too -- to rank HN comments by their relevance to the actual article, and filter out comments that obviously didn't read it.
I need to modify this to work with local models, though. But this does illustrate the article's point -- we both had an idea, but only one person actually went ahead and did it, because they're more familiar with agentic coding than me.
The screenshot was a really great example of how badly that can end up in practice. One comment asking "What's the catch?", which is a good follow-up question to further the conversation, was ranked 1/5.
Probably just needs a slight update to expand the relevant context of child comments. I bet it's still comparing "What's the catch?" to the OP article.
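The fix would presumably look like scoring each comment together with its parent chain instead of against the article alone; a rough sketch (hypothetical prompt and function, shown with the OpenAI Python client purely as an example, not whatever the plugin actually does):

from openai import OpenAI

client = OpenAI()

def relevance_score(article: str, parents: list[str], comment: str) -> int:
    # Include the parent chain so a short reply like "What's the catch?"
    # is judged against the thread it continues, not just the article.
    thread = "\n> ".join(parents + [comment])
    resp = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[{"role": "user", "content":
            f"Article:\n{article}\n\nThread:\n> {thread}\n\n"
            "On a scale of 1-5, how relevant is the last comment to the "
            "article and the comments it replies to? Reply with the number only."}],
    )
    return int(resp.choices[0].message.content.strip())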
I don't know about Claude Code, but here's my story. With Replit, I have a bunch of tasks that I want Replit to do at the end of a coding session -- push to Github, update user visible Changelogs, etc. It's a list in my replit.md file.
A couple of weeks ago I asked it to "clean up" instead of the word I usually use and it ended up deleting both my production and dev databases (a little bit my fault too -- I thought it deleted the dev database so I asked it to copy over from production, but it had deleted the production database and so it then copied production back to dev, leaving me with no data in either; I was also able to reconstruct my content from a ETL export I had handy).
This was after the replit production db database wipe-out story that had gone viral (which was different, that dev was pushing things on purpose). I have no doubt it's pretty easy to do something similar in Claude Code, especially as Replit uses Claude models.
Anyway, I'm still working on things in Replit and having a very good time. I have a bunch of personal purpose-built utilities that have changed my daily tech life in significant ways. What vibe coding does allow me to do is grind on "n" of unrelated projects in mini-sprints. There is personal, intellectual, and project cost to this context switching, but I'm exploring some projects I've had on my lists for a long time, and I'm also building my base replit.md requirements to match my own project tendencies.
I vibe coded a couple of things that I think could be interesting to a broader userbase, but I've stepped back and re-implemented some of the back-end things to a more specific, higher-end vibe coded environment standard. I've also re-started a few projects from scratch with my evolved replit.md... I built an alpha, saw some issues, upgraded my instructions, built it again as a beta, saw some issues... working on a beta+ version.
I'm finding the process to be valuable. I think this will be something I commit to commercially, but I'm also willing to be patient to see what each of the next few months brings in terms of upgraded maturity and improved devops.
Claude Code has minimal internal guardrails against destructive operations when using --dangerously-skip-permissions, which is why it's a major security risk for production environments regardless of how convenient it seems.
I've found Claude's CLI to be the best of what I've tried. I've moved away from cursor and found myself in a much better programming headspace wherein I can "toggle" this AI-enabled mode. It has to be a more mindful approach to when/how I use AI in my day-to-day work instead of it being a temptation to "AI" some of the work away in the Cursor IDE.
I don't know about y'all, but personally I love to see an AI running with "--dangerously-skip-permissions" in an infinite loop. Every day we get closer to the cyberpunk future we deserve.
If Anthropic is smart they would open it up to other models now to make it default for everyone. Otherwise you are banking on Sonnet remaining the best coding model.
There's Claude Code Router, that lets you use any model with Claude Code. Claude is a really good model for agents though, even though Gemini 2.5 and GPT5 are better models overall, Claude uses tools and plans tasks more effectively. A better pattern is to provide sub agents in Claude Code that call out to other LLMs as tools for planning/architecture.
This piece is also covered by a bunch of other cli/tui agents (like codex-cli and opencode) allowing you to switch between Claude and other models (comes in handy depending on the task) so it really all depends on the setup you like. As mentioned in the sibling comment there are ways to get it to work with Claude Code too.
I run with the dangerous option on my work computer. At first I was thinking I would be good if I just regularly kept full disk backups. But my company at least pays lip service to the fact that we want to protect our intellectual property. Plus I think it might be irresponsible to allow an AI model full internet access unsupervised.
So now I use a docker compose setup where I install Claude and run it in a container. I map source code volumes into the container. It uses a different container with dnsmasq with an allowlist.
I initially wanted to do HTTP proxying instead of DNS filtering since it would be more secure, but it was quite hard to set it up satisfactorily.
Running CLI programs with the dangerous full permissions is a lot more comfortable and fast, so I'm quite satisfied.
> I hit a small snag where Anthropic decides that running Claude as root with --dangerously-skip-permissions / yolo-mode is not allowed. You can get past this dumb nanny-state stuff by running [fun dangerous command that lets you run as root]
> If you're from infosec, you might want to stop reading now — the rest of this article isn't going to make you any happier. Keep your medication close at hand if you decide to continue...
I just came to the comments for this... I am not sure at what point we are at. I think AI and crypto are a match made in hell, especially given that a lot of crypto projects are made by bros who have no interest in tech. I estimate we'll be seeing projects/companies that get hacked as soon as they launch, by Claude itself.
I appreciate this writeup. I live in the terminal and work primarily in vim, so I always appreciate folks talking about tooling from that perspective. Little of the article is that, but it's still interesting to see the workflow outlined here, and it gives me a few ideas to try more of.
However, I disagree that LLMs are anywhere near as good as what's described here for most things I've worked with.
So far, I'm pretty impressed with Cursor as a toy. It's not a usable tool for me, though. I haven't used Claude a ton, though I've seen co-workers use it quite a bit. Maybe I'm just not embracing the full "vibe coding" thing enough and not allowing AI agents to fully run wild.
I will concede that Claude and Cursor have gotten quite good at frontend web development generation. I don't doubt that there are a lot of tasks where they make sense.
However, I still have yet to see a _single_ example of any of these tools working for my domain. Every single case, even when the folks who are trumpeting the tools internally run the prompting/etc, results in catastrophic failure.
The ones people trumpet internally are cases where folks can't be bothered to learn the libraries they're working with.
The real issue is that people who aren't deeply familiar with the domain don't notice the problems with the changes LLMs make. They _seem_ reasonable. Essentially by definition.
Despite this, we are being nearly forced to use AI tooling on critical production scientific computing code. I have been told I should never be editing code directly and been told I must use AI tooling by various higher level execs and managers. Doing so is 10x to 100x slower than making changes directly. I don't have boilerplate. I do care about knowing what things do because I need to communicate that to customers and predict how changes to parameters will affect output.
I keep hearing things described as an "overactive intern", but I've never seen an intern this bad, and I've seen a _lot_ of interns. Interns don't make 1000 line changes that wreck core parts of the codebase despite being told to leave that part alone. Interns are willing to validate the underlying mathematical approximations to the physics and are capable of accurately reasoning about how different approximations will affect the output. Interns understand what the result of the pipeline will be used for and can communicate that in simple terms or more complex terms to customers. (You'd think this is what LLMs would be good at, but holy crap do they hallucinate when working with scientific terminology and jargon.)
Interns have PhDs (or in some cases, are still in grad school, but close to completion). They just don't have much software engineering experience yet. Maybe that's the ideal customer base for some of these LLM/AI code generation strategies, but those tools seem especially bad in the scientific computing domain.
My bottleneck isn't how fast I can type. My bottleneck is explaining to a customer how our data processing will affect their analysis.
(To our CEO) - Stop forcing us to use the wrong tools for our jobs.
(To the rest of the world) - Maybe I'm wrong and just being a luddite, but I haven't seen results that live up to the hype yet, especially within the scientific computing world.
This is roughly my experience with LLMs. I've had a lot of friends that have had good experience vibe coding very small new apps. And occasionally I've had AI speed things up for me when adding a specific feature to our main app. But at roughly 2 million lines of code, and with 10 years of accumulated tribal knowledge, LLMs really seem to struggle with our current codebase.
The last task I tried to get an LLM to do was a fairly straightforward refactor of some of our C# web controllers - just adding a CancellationToken to the controller method signature whenever the underlying services could accept one. It struggled so badly with that task that I eventually gave up and just did it by hand.
The widely cited study that shows LLMs slow things down by 20% or so very much coheres with my experience, which is generally: fight with the LLM, give up, do it by hand.
My experience is that sometimes they give you a 10x speedup but then you hit a wall and take 30 times longer to do a simple thing and a lot of people just keep hammering because of the first feeling. Outside of boilerplate, I haven't seen it be this magical tool people keep claiming it is.
Letting Cursor pick the model for you is inviting them to pick the cheapest model for them, at the cost of your experience. It's better to develop your own sense of what model works in a given situation. Personally, I've had the most success with Claude, Gemini Pro, and o3 in Cursor.
I think if you use Cursor, using Claude Code is a huge upgrade. The problem is that Cursor was a huge upgrade from the IDE, so we are still getting used to it.
The company I work for builds a similar tool - NonBioS.ai. It is in some ways similar to what the author does above - but packaged as a service. So the NonBioS agent has a root VM and can write/build all the software you want. You access/control it through a web chat interface - we take care of all the orchestration behind the scenes.
It's also in free beta right now, and signup takes a minute if you want to give it a shot. You can actually find out quickly if the Claude Code/NonBioS experience is better than Cursor.
I think the path forward there is slack/teams/discord/etc integration of agents, so you can monitor and control whatever agent software you like via a chat interface just like you would interact with any other teammate.
So we tried that route - but the problem is that these interfaces aren't suited for asynchronous updates. Like, if the agent is working for the next hour or so, how do you communicate that in mediums like these? An agent, unlike a human, is only invoked when you give it a task.
If you use the interface at nonbios.ai, you will quickly realize that it is hard to reproduce on Slack/Discord, even though it's still technically 'chat'.
On Slack I think threads are fine for this. Have an agent work channel, and they can just create a thread for each task and just dump updates there. If an agent is really noisy about its thinking you might need a loglevel toggle but in my experience with Claude Code/Cursor you could dump almost everything they're currently emitting to the UI into a thread.
It's still nice to have a direct web interface to agents, but in general most orgs are dealing with service/information overload and chat is a good single source of truth, which is why integrations are so hot.
I'm a long-time GitHub Copilot subscriber, but I have also tested many alternatives, such as Cursor.
Recently, I tried using Claude Code with my GitHub Copilot subscription (via unofficial support through https://github.com/ericc-ch/copilot-api), and I found it to be quite good. However, in my opinion, the main difference comes down to your preferred workflow. As someone who works with Neovim, I find that a tool that works in the terminal is more appropriate for me.
Isn’t that usage a violation of ToS? In that repo there’s even an issue thread that mentions this. The way I rely on GitHub these days, losing my account would be far more than annoying.
Most of these are Anthropic models under the hood, so I think 'whatever fits your workflow best' is the main deciding factor. That's definitely Claude Code for me, and I do think there's some 'secret sauce' in the exact prompting and looping logic they use, but I haven't tried Cursor a lot to be sure.
any secret sauce in prompting etc could be trivially reverse engineered by the companies building the other agents, since they could easily capture all the prompts it sends to the LLM. If there’s any edge, it’s probably more around them fine-tuning the model itself on Claude Code tasks.
Interesting that the other vendors haven't done this "trivial" task, then, and have pretty much ceded the field to Claude Code. _Every_ CLI interface I've used from another vendor has been markedly inferior to Claude Code, and that includes Codex CLI using GPT-5.
Claude code seems like the obvious choice for someone using Vim but even in the context of someone using a graphical IDE like VSCode I keep hearing that Claude is “better” but I just can’t fathom how that can be the case.
Even if the prompting and looping logic is better, the direct integration with the graphical system along with the integrated terminal is a huge benefit, and with graphical IDEs the setup and learning curve is minimal.
You can run Claude Code in the terminal window of VS Code, and it has IDE integration so you can see diffs inline, etc. It's not fully integrated like Cursor but you get a lot of the benefits of the IDE in that way.
I use Cursor with Claude Code running in the integrated terminal (within a dev container in yolo mode). I'll often have multiple split terminals with different Claude Code instances running on their own worktrees. I keep Cursor around because I love the code completions.
Have someone who isn't ever going to use claude code sign up for him and then give him the credentials. (do you have a partner or other relative not in tech?)
It's a lot like the first time taking a metal detector to a beach. It's really cool and exciting (dopamine hit) to find stuff, but after a while it wears off because realistically you only found trash.
Buuut for some people it just clicks, and it becomes their chore to go find trash on the beach every day, and the occasional nickel or broken bracelet they feel the need to tell people about and show off.
(author here) I think there's a difference between "I'm no longer impressed" (good) and "I was never impressed and never would have been impressed" (bad, but common).
Yes, it's easy now so it's by definition no longer impressive, but that in itself is impressive if you can correctly remember or imagine what your reaction _would_ have been 6 months ago.
Never impressed, no longer impressed, feeling depressed ... Another option, newly impressed by the next iteration.
Up to a point these have been probability machines. There's probably a lot of code that does certain likely things. An almost astonishing amount doing the same things, in fact. As such, perhaps we shouldn't be surprised or impressed by the stochastic parrot aspect any more than we're impressed by 80% of such sites being copy pasta from Stack Overflow a few years ago.
However, what we perhaps didn't expect is that on the margins of the mass probability space, there are any number of less common things, yet still enough of those in aggregate that these tools can guess well how to do those things too, even things that we might not be able to search for. Same reason Perplexity has a business model when Google or DDG exist.
And now, recently, many didn't expect one might be able to simulate a tiny "society of mind" made of "agents" out of these parrots, a tiny society that's proving actually useful.
Parrots themselves still impress me, but a society of them making plans at our beck and call? That can keep us all peeking, pecking, and poking for a while yet.
// given enough time and typewriters, who wins: a million monkeys, a society of parrots, or six hobbits?
This is good stuff. While somebody could build a Trello clone or an image generator by typing “git clone “ followed by any number of existing projects, the code you’d get might’ve been written by a person, plus if you do that you’re not even spending any money, which just doesn’t seem right.
The future is vibe coding, but what some people don’t yet appreciate is what that vibe is: a Pachinko machine permanently inserted between the user and the computer. It’s wild to think that anybody got anything done without the thrill of feeding quarters into the computer and seeing if the ball lands on “post on Reddit” or “delete database”.
I’ve noticed a new genre of AI-hype posts that don’t attempt to build anything novel, just talk about how nice and easy building novel things has become with AI.
The obvious contradiction being that if it was really so easy their posts would actually be about the cool things they built instead of just saying what they “can” do.
I wouldn’t classify this article as one, since the author does actually create something, but LinkedIn is absolutely full of that genre of post right now.
I'm sorry, but people who let an agent run on prod deserve what they get. Basically, even saying you would do that should disqualify you from working in IT, in the way saying "I like to drink when I'm working" should disqualify you from air traffic control.
I haven't been following too closely, but is there even a reason to do this? What are the benefits of allowing production access versus just asking for a simple build system which promotes git tags, writes database migration scripts, etc.? From my perspective, it should be easier than ever to use a "work" workflow for side projects, where code is being written to PR's, which could optionally be reviewed or even just auto approved as a historical record of changes, and use a trunk-based development workflow with simple CI/CD systems - all of which could even be a cookie cutter template/scaffolding to be reused on every project. Doesn't it make sense now more than ever to do something like that for every project?
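To make that concrete, here's a loose sketch of the kind of gate meant above: a CI deploy step that only promotes tagged releases, so neither an agent nor a human pushes straight to prod. The script names are hypothetical:

```bash
#!/usr/bin/env bash
# Illustrative only: deploy gate that promotes git tags rather than raw commits.
set -euo pipefail

# Proceed only if HEAD is exactly a tag, i.e. a deliberately promoted release.
TAG="$(git describe --exact-match --tags 2>/dev/null || true)"
if [[ -z "${TAG}" ]]; then
  echo "Not a tagged release; skipping deploy." >&2
  exit 0
fi

./scripts/run_migrations.sh    # hypothetical migration step
./scripts/deploy.sh "${TAG}"   # hypothetical deploy step
```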
> I'm sorry but people who let an agent run on prod deserve what they get.
The problem is that whatever consequences come of it won’t affect just them. You don’t really have any way of knowing if any service you use or depend on has developers running LLMs in production. One day not too far off in the future, people who don’t even like or use LLMs will be bitten hard by those who do.
> I watched the autonomous startup builder a bit more.
I think i'm done with this community in the age of vibe coding. The line between satire, venture capitalism, business idea guys and sane tech enthusiasts is getting too blurry.
Particularly with the VS Code extension. I was a loyal Cline user until recently because of how good the editor experience was, but the ability for Claude to go off and run for 10+ minutes effectively autonomously, and show me the diffs in real time, is a game changer. The token usage has also gotten much more efficient in the last few months. With proper IDE support now, I don't see any reason at all to use anything else, especially not the "credit"-based middle-man providers (Windsurf/Cursor et al.).
Same here, I was convinced Cline+OpenRouter was the way to go. But with Claude code I’m getting better results and saving money, even compared to planning with Sonnet and transitioning to act mode with DeepSeek, I was still using more than $20/mo easily.
This article seems fun, and it's interesting, but I was waiting for the point and it never came.
The author didn't do anything actually useful or impactful, they played around with a toy and mimicked a portion of what it's like to spin up pet projects as a developer.
But hey, it could be that this says something after all. The first big public usages of AI were toys and vastly performed as a sideshow attraction for amused netizens. Maybe we haven't come very far at all, in comparison to the resources spent. It seems like all of the truly impressive and useful applications of this technology are still in specialized private sector work.
Great article, thanks for sharing!
It’s a great time to be a software engineer!
I'm curious about what experiences led you to that conclusion. IME, LLMs are very good at the type of logical reasoning required for most programming tasks. E.g. I only have to say something like "find the entries with the lowest X and highest Y that have a common Z from these N lists / maps / tables / files / etc." and it spits out mostly correct code instantly. I then review it and for any involved logic, rely on tests (also AI-generated) for correctness, where I find myself reviewing and tweaking the test cases much more than the business logic.
But then I do all that for all code anyway, including my own. So just starting off with a fully-fleshed out chunk of code, which typically looks like what I'd pictured in my head, is a huge load off my cognitive shoulders.
The non-determinism is not as much of a problem because you are reading over the results and validating that what it created matches what you told it to do.
I'm not talking about vibe-coding here, I'm grabbing the steering wheel with both hands because this car allows me to go faster than if I was driving myself, but sometimes you have to steer or brake. And the analogy favors Claude Code here because you don't have to react in milliseconds while programming.
TL;DR: if you do the commit you are responsible for the code it contains.
Some have compared it to working with a very junior programmer. I haven't done that in a long while, but when I did, it didn't really feel like I was "offloading" much, and I could still trust even the most junior programmer to tell me whether the job was done well or not (and of any difficulties they encountered or insight they've learnt) much more than I can an agent, at least today.
Trust is something we have, for the most part, when we work with either other people or with tools. Working without (or with little) trust is something quite novel. Personally, I don't mind that an agent can't accomplish many tasks; I mind a great deal that I can't trust it to tell me whether it was able to do what I asked or not.
Sort of. You still can't get reliable output for the same input. For example, I was toying with using ChatGPT with some Siri shortcuts on my iPhone. I do photography on the side, and finding a good time for lighting for photoshoots is a use case I hit a lot, so I made a shortcut which sends my location to the API along with a prompt to get the sunset time for today, the total amount of daylight, and golden hour times.
Sometimes it works, sometimes it says "I don't have specific golden hour times, but you can find those on the web" or a useless generic "Golden hour is typically 1 hour before sunset but can vary with location and season"
Doesn't feel like programming to me, as I can't get reproducible output.
I could just use the LLM to write some API calling script from some service that has that data, but then why bother with that middle man step.
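(For what it's worth, that middle-man script is tiny and fully deterministic. A hedged sketch against the free sunrise-sunset.org API; the coordinates are placeholders, and golden hour still has to be derived from the sunset time:)

```bash
#!/usr/bin/env bash
# Hedged sketch: a deterministic replacement for the LLM lookup.
# Same input -> same output, which is the whole point.
LAT="34.05"    # placeholder coordinates
LNG="-118.24"

curl -s "https://api.sunrise-sunset.org/json?lat=${LAT}&lng=${LNG}&date=today&formatted=0" \
  | jq -r '.results | "Sunset (UTC): \(.sunset)\nDay length (s): \(.day_length)"'

# Golden hour is then just arithmetic on the sunset timestamp
# (roughly the hour before it), not something to re-ask a model for.
```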
I like LLMs, I think they are useful, I use them everyday but what I want is a way to get consistent, reproducible output for any given input/prompt.
I agree, and I feel that having LLMs do boilerplate-type stuff is fantastic for ADD people. The dopamine hit you get from making tremendous progress before you get utterly bored is nice. The thing that ADD/ADHD people are the WORST at is finishing projects. LLMs will help them once the thrill of prototyping a green-field project is over.
LLMs, particularly Claude 4 and now GPT-5, are fantastic at working through these todo lists of tiny details. Perfectionism + ADHD is not a fun combo, but it's way more bearable. It will only get better.
We have a huge moat in front of us of ever-more interesting tasks as LLMs race to pick up the pieces. I've never been more excited about the future of tech
Bunch of 80% projects with, as you mentioned, the interesting parts finished (sorta -- you see the light at the end of the tunnel, it's bright, just don't bother finishing the journey).
However, at the same time, there's conflict.
Consider (one of) my current projects, I did the whole back end. I had ChatGPT help me stand up a web front end for it. I am not a "web person". GUIs and what not are a REAL struggle for me because on the one hand, I don't care how things look, but, on the other, "boy that sure looks better". But getting from "functional" to "looks better" is a bottomless chasm of yak shaving, bike shedding improvements. I'm even bad at copying styles.
My initial UI was time invested getting my UI to work, ugly as it was, with guidance from ChatGPT. Which means it gave me ways to do things, but mostly I coded up the actual work -- even if it was blindly typing it in vs just raw cut and paste. I understood how things were working, what it was doing, etc.
But then, I just got tired of it, and "this needs to be Better". So, I grabbed Claude and let it have its way.
And, it's better! It certainly looks better, has more features. It's head and shoulders better.
Claude wrote 2,000-3,000 lines of JavaScript. In, like, 45 minutes. It was very fast, very responsive. One thing Claude knows is boilerplate JS web stuff. And the code looks OK to me. Imperfect, but absolutely functional.
But, I have zero investment in the code. No "ownership", certainly no pride. You know that little hit you get when you get Something Right, and it Works? None of that. It's amazing, it's useful, it's just not mine. And that's really weird.
I've been striving to finish projects, and, yea, for me, that's really hard. There is just SO MUCH necessary to ship. AI may be able to help polish stuff up, we'll see as I move forward. If nothing else it may help gathering up lists of stuff I miss to do.
This sounds like a wild generalization.
I am in neither of those two groups, and I’ve been finding tools like Claude Code becoming increasingly more useful over time.
Made me much more optimistic about the direction of AI development in general too. Because with each iteration and new version it isn’t getting anywhere closer to replacing me or my colleagues, but it is becoming more and more useful and helpful to my workflow.
And I am not one of those people who are into “prompt engineering” or typing novels into the AI chatbox. My entire interaction is typically short 2-3 sentences “do this and that, make sure that XYZ is ABC”, attach the files that are relevant, let it do its thing, and then manual checks/adjustments. Saves me a boatload of work tbh, as I enjoy the debugging/fixing/“getting the nuanced details right” aspect of writing code (and am pretty decent at it, I think), but absolutely dread starting from a brand new empty file.
The thing is, they're just tools. You can choose to learn them, or not. They aren't going to make or break your career. People will do fine with and without them.
I do think it's worth learning new tools though, even if you're just a casual observer / conscientious objector -- the world is changing fast, for better or worse, and you'll be better prepared to do anything with a wider breadth of tech skill and experience than with less. And I'm not just talking about writing software for a living, you could go full Uncle Ted and be a farmer or a carpenter or a barista in the middle of nowhere, and you're going to be way better equipped to deal with logistical issues that WILL arise from the very nature of the planet hurtling towards 100% computerization. Inventory management, crop planning, point of sale, marketing, monitoring sensors on your brewery vats, whatever.
Another thought I had was that introverts often blame their deficits in sales, marketing and customer service on their introversion, but what if you could deploy an agent to either guide, perform, or prompt (the human) with some of those activities? I'd argue that it would be worth the time to kick the tires and see what's possible there.
It feels like early times still with some of these pie in the sky ideas, but just because it's not turn-key YET doesn't mean it won't be in the near future. Just food for thought!
I agree with all of your reasons but this one sticks out. Is this a big issue? Are many people refusing to use LLMs due to (I'm guessing here): perceived copyright issues, or power usage, or maybe that they think that automation is unjust?
Try aider.chat (it's in the name), but specifically start with "ask" mode then dip a toe into "architect" mode, not "code" which is where Claude Code and the "vibe" nonsense is.
Let aider.chat use Opus 4.1 or GPT-5 for thinking, with no limit on reasoning tokens and --reasoning-effort high.
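For reference, that setup is roughly the following invocation; the model string is a placeholder, so check aider's model docs for the exact names your provider exposes:

```bash
# Rough sketch of the suggested setup: architect mode, a strong reasoning
# model, and reasoning effort turned up. The model name is a placeholder.
aider --architect --model gpt-5 --reasoning-effort high
```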
> agents are a boon for extraverts and neurotypical people.
On the contrary, I think the non-vibe tools are force multipliers for those with an ability to communicate so precisely they find “extraverts and neurotypical people” confounding when attempting to specify engineering work.
I'd put both aider.chat and Claude Code in the non-vibe class if you use them Socratically.
I completely disagree. Juggling several agents (and hopping from feature-to-feature) at once, is perfect for somebody with ADHD. Being an agent wrangler is great for introverts instead of having to talk to actual people.
Automation productivity doesn’t remove your own agency. It frees more time for you to apply your desire for control more discerningly.
I feel as if you need to work with these things more, as you would prefer to work, and see just how good they are.
As a neurodivergent introvert, please don't speak for the rest of us.
Often there's as much to be learned from why it doesn't work.
I see the AI hype as limited to a few domains:
1. People choosing to spend lots of money on things speculatively, hoping to get a slice of whatever is cooking, even if they don't really know if it's a pie or not.
2. Forward-looking imagining of what would change if these things get massively better.
3. Hyperbolic media coverage of the above two.
There are companies talking about adding AI for no other reason than they feel like that's what they should be doing. I think that counts as a weak driver of hype, but only because, cumulatively, lots of companies are doing it. If anything I would consider this an outcome of hype.
Of these, the only one that really affects me is AI being shoehorned into places it shouldn't be.
The media coverage stokes fires for and against, but I think it only changes the tone of the annoyance I have to endure. They would do the same on another topic in the absence of AI. It used to be crypto.
I'm ok with people spending money that is not mine on high risk, high potential reward. It's not for me to judge how they calculate the potential risk or potential reward. It's their opinion, let them have it.
The weird thing I find is the complaints about AI hype dominating. I have read so many pieces where the main thrust of their argument is about the dominance of fringe viewpoints that I very rarely encounter. Frequently they take the stance that anyone imagining how the world might change from any particular form of AI as a claim that that form is inevitable and usually imminent. I don't see people making those claims.
I see people talking about what they tried, what they can do, and what they can't do. Everything they can't do is then held up by others as if it were a trophy and proof of some catastrophic weakness.
Just try stuff, have fun, if that doesn't interest you, go do something else. Tell us about what you are doing. You don't need to tell us that you aren't doing this particular thing, and why. If you find something interesting tell us about that, maybe we will too.
https://hub.docker.com/r/linuxserver/kasm
https://gist.githubusercontent.com/jgbrwn/28645fcf4ac5a4176f...
> Claude was trying to promote the startup on Hackernews without my sign off. [...] Then I posted its stuff to Hacker News and Reddit.
...I have the feeling that this kind of fun experiments is just setting up an automated firehose of shit to spray places where fellow humans congregate. And I have the feeling that it has stopped being fun a while ago for the fellow humans being sprayed.
I think it will be quite some time into the future, before AI can impersonate humans in real life. Neither hardware, nor software is there, maybe something to fool humans for a first glance maybe, but nothing that would be convincing for a real interaction.
Implemented so that if a person in your web vouches for a specific url (“this is made by a human”) you can see it in your browser.
If that isn’t exciting enough, Sam Altman (yea the one who popularized this LLM slop) will gladly sell you his WorldCoin to store your biometric data on the blockchain!
But I still can't help but grin at the thought that the bot knows that the thing to do when you've got a startup is to go put it on HN. It's almost... cute? If you give AI a VPS, of course it will eventually want to post its work on HN.
It's like when you catch your kid listening to Pink Floyd or something, and you have that little moment of triumph - "yes, he's learned something from me!"
I grew up in... slightly rural America in the '80s and '90s. We had probably a couple of dozen local BBSes, and the community was small enough that after a bit I just knew who everyone was OR could find out very easily.
When the internet came along in the early '90s and I started mudding and hanging out in newsgroups, I liked them small, where I could get to know most of the userbase, or at least most of the posting userbase. Then mega 'somewhat-anonymous' (i.e. posts tied to a username, not 4chan-style madness) communities like Slashdot, huge forums, etc. started popping up, and now we have even bigger mega-communities like Twitter and Reddit. We lost something: you can now throw bombs without consequence.
I now spend most of my online time in a custom built forum with ~200 people in it that we started building in an invite only way. It's 'internally public' information who invited who. It's much easier to have a civil conversation there, though we still do get the occasional flame-out. Having a stable identity even if it's not tied to a government name is valuable for a thriving and healthy community.
(The fact that someone could correlate posts[0] based on writing style, as previously demonstrated on HN and used to doxx some people, makes things even more convoluted - you should think twice what you write and where.)
[0] https://news.ycombinator.com/item?id=33755016
Not government owned, but even irs.gov uses it
People will be more than willing to say, "Claude, impersonate me and act on my behalf".
I'm now imagining a future where actual people's identities are blacklisted just like some IP addresses are dead to email, and a market develops for people to sell their identity to spammers.
As far as I can tell the owner of the original iris can later invalidate an ID that they've sold, but if you buy an ID from someone who isn't strongly technically literate you can probably extract a bunch of value from it anyway.
"Claude write a summary of the word doc I wrote about x and post it as a reply comment," is fine. I dont see why it wouldnt be. Its a good faith effort to post.
"Claude, post every 10 seconds to reddit to spam people to believe my politics is correct," isn't but that's not the case. Its not a good faith effort.
The moderation rules for 'human slop' will apply to AI too. Try spamming a well moderated reddit and see how far you get, human or AI.
As an aside, I've made it clear that just posting AI-written emoji-slop PR review descriptions and letting Claude Code directly commit without self-reviewing is unacceptable at work.
Forums like HN, reddit, etc will need to do a better job detecting this stuff, moderator staffing will need to be upped, AI resistant captchas need to be developed, etc.
Spam will always be here in some form, and it's always an arms race. That doesn't really change anything. It's always been this way.
I think the processes etc that HN have in place to deal with human-generated slop are more than adequate to deal with an influx of AI generated slop, and if something gets through then maybe it means it was good enough and it doesn't matter?
The bar is not 'oh well, it's not as bad as some, and I think maybe it's fine.'
* Well crafted, human only?
* Well crafted, whether human or AI?
* Poorly crafted, human?
* Well crafted, AI only?
* Poorly crafted, AI only?
* Just junk?
etc.
I think people will intuitively get a feel for when content is only AI generated. If people spend time writing a prompt that doesn't make it so wordy, and has personality, and is OK, then fine.
Also, there's going to be a big opportunity out there for detecting AI-generated content, whether in forums, coming into mail inboxes, on your corp file share, etc...
Spoiler: no he didn't.
But the article is interesting...
It really highlights to me the pickle we are in with AI: because we are at a historical maximum already of "worse is better" with Javascript, and the last two decades have put out a LOT of javascript, AI will work best with....
Javascript.
Now MAYBE better AI models will be able to equivalently translate Javascript to "better" languages, and MAYBE AI coding will migrate "good" libraries in obscure languages to other "better" languages...
But I don't think so. It's going to be soooo much Javascript slop for the next ten years.
I HOPE that large language models, being language models, will figure out language translation/equivalency and enable porting and movement of good concepts between programming models... but that is clearly not what is being invested in.
What's being invested in is slop generation, because the prototype sells the product.
Tried Cursor and Windsurf and always ran into tool failures, edit failures, etc.
I'm exaggerating of course, and I hear what you're saying, but I'd rather hire someone who is really, really good at squeezing the most out of current-day AI (read: not vibe-coding slop) than someone who can do the work manually without assistance or fizz-buzz on a whiteboard.
FYI, this can be shortened to:
You don't need the export in this case, nor does it need to be two separate commands joined by &&. (It's semantically different in that the variable is set only for the single `claude` invocation, not for any commands which follow. That's often what you want, though.)
> I asked Claude to rename all the files and I could go do something else while it churned away, reading the files and figuring out the correct names.
It's got infinite patience for performing tedious tasks manually and will gladly eat up all your tokens. When I see it doing something like this manually, I stop it and tell it to write a program to do the thing I want. e.g. I needed to change the shape of about 100 JSON files the other day and it wanted to go through them one-by-one. I stopped it after the third file, told it to write a script to import the old shape and write out the new shape, and 30 seconds later it was done. I also had it write me a script to... rename my stupidly named bank statements. :-)
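For anyone curious, the generated script for that kind of job tends to be only a few lines. A hedged sketch with invented field names, since the comment doesn't show the actual old and new shapes:

```bash
#!/usr/bin/env bash
# Hedged sketch of a throwaway "reshape all the JSON files" script.
# The field names are made up for illustration.
set -euo pipefail
for f in data/*.json; do
  jq '{id: .legacy_id, name: .meta.display_name, tags: (.labels // [])}' "$f" > "$f.tmp" \
    && mv "$f.tmp" "$f"
done
```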
export VAR=foo && bar is dangerous because it stays set.
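To make the scoping difference concrete, a minimal sketch reusing the placeholder names from these comments:

```bash
# Scoped form: VAR is visible only to this single invocation of `bar`.
VAR=foo bar

# Exported form: VAR stays set for everything that runs later in the shell.
export VAR=foo && bar
echo "$VAR"   # still prints "foo"
```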
In fact, I now prefer to use a purely chat window to plan the overall direction and let LLM provide a few different architectural ideas, rather than asking LLM to write a lot of code whose detail I have no idea about.
But it's far from perfect. Really difficult things/big projects are nearly impossible. Even if you break it down into hundred small tasks.
I've tried to make it port an existing, big codebase from one language to another. So it has all of the original codebase in one folder, and a new project in another folder. No matter how much guidance you give it, or how clear you make your todos, it will not work.
It absolutely boggles my mind how anybody thinks that this is Ok?
Unless you are in North Korea, of course.
Really, any coding agent our shop didn't write itself, though in those cases the smiting might be less theatrical than if you literally ran a yolo-mode agent on a prod server.
> 1) Have faith (always run it with 'dangerously skip permissions', even on important resources like your production server and your main dev machine. If you're from infosec, you might want to stop reading now—the rest of this article isn't going to make you any happier. Keep your medication close at hand if you decide to continue).
But I think I'm getting to the point where "If I'd let an intern/junior dev have access while I'm watching then I'm probably OK with Claude having it too"
The thing that annoys me about a lot of infosec people is that they have all of these opinions about bad practice that are removed from the actual 'what's the worst that could happen here' impact/risk factor.
I'm not running lfg on a control tower that's landing Boeing 737s, but for a simple non-critical CRUD app? Probably the tradeoff is worth it.
https://github.com/anthropics/claude-code-security-review
The new command is something like /security-review and should be in the loop before any PR or commit, especially for this type of web-facing app, which Claude Code makes easy.
This prompt will make Claude's code generally beat not just intern code, but probably most devs' code, for security mindedness:
https://raw.githubusercontent.com/anthropics/claude-code-sec...
The false positives judge shown here is particularly well done.
// Beyond that, run tools such as Kusari or Snyk. It's unlikely most shops have security engineers as qualified as these focused tools are becoming.
The biggest pain point I've had is that both tools will try to guess the API of a crate instead of referencing the documentation. I've tried adding an MCP for this but have had mixed results.
https://github.com/d6e/cratedocs-mcp
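In case it saves someone a search: registering an MCP server with Claude Code looks roughly like the line below. The launch command for cratedocs-mcp is a placeholder, so check that repo's README for the real one:

```bash
# Hedged sketch: wiring a docs MCP server into Claude Code.
# Everything after `--` is whatever actually launches the server.
claude mcp add crate-docs -- /path/to/cratedocs-mcp stdio
```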
I have been kinda wondering if there's something that just isn't as good between the CLI and model because the Gemini CLI has been a mostly frustrating experience - and it's kept me from wanting to pay for Claude because I don't want to pay money for the same frustrating experience. But maybe I should try Claude and see.
https://aider.chat/docs/leaderboards/
Using coding agents is great btw, but at least learn how to double-check their work cuz they are also quite terrible.
Which makes me feel early adopters pay with their time. I'm pretty sure the agents will be much better with time, but that time is not exactly now, with endless dances around their existing limitations. Claude Code is fun to experiment with, but to use it in production I'd give it another couple of years (assuming they focus on code stability and reducing its natural optimism as it happily reports "Phase 2.1.1 has been completed successfully, with some minor errors, with API tests failing only 54.3% of the time").
Repo: https://github.com/sixhobbits/hn-comment-ranker
I need to modify this to work with local models, though. But this does illustrate the article's point -- we both had an idea, but only one person actually went ahead and did it, because they're more familiar with agentic coding than me.
[1] Oh. I think I understand why. /lh
Are there internal guardrails within Claude Code to prevent such incidents?
rm -rf, drop database, etc?
A couple of weeks ago I asked it to "clean up" instead of the word I usually use, and it ended up deleting both my production and dev databases (a little bit my fault too -- I thought it had deleted the dev database, so I asked it to copy over from production, but it had actually deleted the production database, and so it copied production back to dev, leaving me with no data in either; I was also able to reconstruct my content from an ETL export I had handy).
This was after the Replit production database wipe-out story that had gone viral (which was different; that dev was pushing things on purpose). I have no doubt it's pretty easy to do something similar in Claude Code, especially as Replit uses Claude models.
Anyway, I'm still working on things in Replit and having a very good time. I have a bunch of personal purpose-built utilities that have changed my daily tech life in significant ways. What vibe coding does allow me to do is grind on "n" unrelated projects in mini-sprints. There is personal, intellectual, and project cost to this context switching, but I'm exploring some projects I've had on my lists for a long time, and I'm also building my base replit.md requirements to match my own project tendencies.
I vibe coded a couple of things that I think could be interesting to a broader userbase, but I've stepped back and re-implemented some of the back-end things to a more specific, higher-end vibe coded environment standard. I've also re-started a few projects from scratch with my evolved replit.md... I built an alpha, saw some issues, upgraded my instructions, built it again as a beta, saw some issues... working on a beta+ version.
I'm finding the process to be valuable. I think this will be something I commit to commercially, but I'm also willing to be patient to see what each of the next few months brings in terms of upgraded maturity and improved devops.
And of course, not the same, but Aider still exists and is still a great tool for AI dev.
It's interesting how everyone is suddenly OK with vendor lock-in, quite a change from years past!
Claude Code is completely closed source, and Anthropic even DMCA’d people reverse-engineering it.
https://techcrunch.com/2025/04/25/anthropic-sent-a-takedown-...
I thought the article was a satire after I read this ... but it wasn't!
So now I use a docker compose setup where I install Claude and run it in a container. I map source code volumes into the container. It uses a different container with dnsmasq with an allowlist.
I initially wanted to do HTTP proxying instead of DNS filtering since it would be more secure, but it was quite hard to set it up satisfactorily.
Running CLI programs with the dangerous full permissions is a lot more comfortable and fast, so I'm quite satisfied.
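For reference, roughly what that setup looks like expressed as plain `docker run` commands instead of compose; the image names and paths are placeholders, not the actual config:

```bash
# Hedged sketch of the sandbox described above.
docker network create agent-net

# 1. dnsmasq resolver that only answers for an allowlist of domains
docker run -d --name dns-allowlist --network agent-net \
  -v "$PWD/dnsmasq.conf:/etc/dnsmasq.conf:ro" \
  some-dnsmasq-image

# 2. Claude Code container: only the source tree is mounted, and DNS is
#    forced through the allowlist resolver.
DNS_IP=$(docker inspect -f '{{range .NetworkSettings.Networks}}{{.IPAddress}}{{end}}' dns-allowlist)
docker run -it --rm --network agent-net --dns "${DNS_IP}" \
  -v "$PWD/src:/workspace" -w /workspace \
  my-claude-image claude --dangerously-skip-permissions
```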
Still not convinced it is not satire.
> If you're from infosec, you might want to stop reading now — the rest of this article isn't going to make you any happier. Keep your medication close at hand if you decide to continue...
However, I disagree that LLMs are anywhere near as good as what's described here for most things I've worked with.
So far, I'm pretty impressed with Cursor as a toy. It's not a usable tool for me, though. I haven't used Claude a ton, though I've seen co-workers use it quite a bit. Maybe I'm just not embracing the full "vibe coding" thing enough and not allowing AI agents to fully run wild.
I will concede that Claude and Cursor have gotten quite good at frontend web development generation. I don't doubt that there are a lot of tasks where they make sense.
However, I still have yet to see a _single_ example of any of these tools working for my domain. Every single case, even when the folks who are trumpeting the tools internally run the prompting/etc, results in catastrophic failure.
The ones people trumpet internally are cases where folks can't be bothered to learn the libraries they're working with.
The real issue is that people who aren't deeply familiar with the domain don't notice the problems with the changes LLMs make. They _seem_ reasonable. Essentially by definition.
Despite this, we are being nearly forced to use AI tooling on critical production scientific computing code. I have been told I should never be editing code directly and been told I must use AI tooling by various higher level execs and managers. Doing so is 10x to 100x slower than making changes directly. I don't have boilerplate. I do care about knowing what things do because I need to communicate that to customers and predict how changes to parameters will affect output.
I keep hearing things described as an "overactive intern", but I've never seen an intern this bad, and I've seen a _lot_ of interns. Interns don't make 1000 line changes that wreck core parts of the codebase despite being told to leave that part alone. Interns are willing to validate the underlying mathematical approximations to the physics and are capable of accurately reasoning about how different approximations will affect the output. Interns understand what the result of the pipeline will be used for and can communicate that in simple terms or more complex terms to customers. (You'd think this is what LLMs would be good at, but holy crap do they hallucinate when working with scientific terminology and jargon.)
Interns have PhDs (or in some cases, are still in grad school, but close to completion). They just don't have much software engineering experience yet. Maybe that's the ideal customer base for some of these LLM/AI code generation strategies, but those tools seem especially bad in the scientific computing domain.
My bottleneck isn't how fast I can type. My bottleneck is explaining to a customer how our data processing will affect their analysis.
(To our CEO) - Stop forcing us to use the wrong tools for our jobs.
(To the rest of the world) - Maybe I'm wrong and just being a luddite, but I haven't seem results that live up to the hype yet, especially within the scientific computing world.
The last task I tried to get an LLM to do was a fairly straightforward refactor of some of our C# web controllers - just adding a CancellationToken to the controller method signature whenever the underlying services could accept one. It struggled so badly with that task that I eventually gave up and just did it by hand.
The widely cited study that shows LLMs slow things down by 20% or so very much coheres with my experience, which is generally: fight with the LLM, give up, do it by hand.
That's a critical feature for keeping a human in the loop, preventing big detours and token waste.
The company I work for builds a similar tool - NonBioS.ai. It is in some ways similar to what the author does above - but packaged as a service. So the NonBioS agent has a root VM and can write/build all the software you want. You access/control it through a web chat interface - we take care of all the orchestration behind the scenes.
It's also in a free beta right now, and signup takes a minute if you want to give it a shot. You can find out quickly whether the Claude Code/NonBioS experience is better than Cursor.
If you use the interface at nonbios.ai, you will quickly realize that it is hard to reproduce on Slack/Discord, even though it's still technically 'chat'.
It's still nice to have a direct web interface to agents, but in general most orgs are dealing with service/information overload and chat is a good single source of truth, which is why integrations are so hot.
I may stop my experiment if there is any risk of being banned.
I’m just glad we’re getting past the insufferable “use Cursor or get left behind” attitude that was taking off a year ago.
(Sure, I could let them use my credentials but that isn’t really legit/fair use.)
Do the right thing, sign up for an API account and put some credits on there...
(and keep topping up those credits ;-)
I'm planning to run a local model on a $149 mini-pc and host it for the world from my bedroom. You can read a bit more about my thinking below.
https://joeldare.com/my_plan_to_build_an_ai_chat_bot_in_my_b...
These hosted models are better but it feels like the gap is closing and I hope it continues to close.
Presumably, they are all startups in stealth mode. But in a few months, prepare to be blown away.
I throw their results at each other, get them to debug and review each other's work.
Often I get all three to write the code for a given need and then ask all three to review all three answers to find the best solution.
If I’m building something sophisticated there might be 50 cycles of three-way code review until they all agree that there are no critical problems.
There’s no way I could do without all three at the same time; it’s essential.