this post was submitted on 07 Dec 2025
1073 points (98.1% liked)


Just want to clarify, this is not my Substack, I'm just sharing this because I found it insightful.

The author describes himself as a "fractional CTO" (no clue what that means, don't ask me) and advisor. His clients asked him how they could leverage AI. He decided to experience it for himself. From the author (emphasis mine):

I forced myself to use Claude Code exclusively to build a product. Three months. Not a single line of code written by me. I wanted to experience what my clients were considering—100% AI adoption. I needed to know firsthand why that 95% failure rate exists.

I got the product launched. It worked. I was proud of what I’d created. Then came the moment that validated every concern in that MIT study: I needed to make a small change and realized I wasn’t confident I could do it. My own product, built under my direction, and I’d lost confidence in my ability to modify it.

Now when clients ask me about AI adoption, I can tell them exactly what 100% looks like: it looks like failure. Not immediate failure—that’s the trap. Initial metrics look great. You ship faster. You feel productive. Then three months later, you realize nobody actually understands what you’ve built.

[–] Evotech@lemmy.world 24 points 2 days ago (2 children)

Just ask the AI to make the change?

[–] theneverfox@pawb.social 21 points 2 days ago (3 children)

AI isn't good at changing code, or really even understanding it... It's good at writing it, ideally 50-250 lines at a time

[–] Evotech@lemmy.world 6 points 2 days ago* (last edited 2 days ago) (1 children)

I'm just not following the mindset of "get AI to code your whole program" and then have real people maintain it? Sounds counterproductive.

I think you need to write your code for an AI to maintain. Use static code analysers like SonarQube to ensure that the code is maintainable (cognitive complexity) and that functions are small and well defined as you write it.
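
To make that concrete, here's a rough sketch of the kind of refactor those analysers push you toward; the example and thresholds are made up, not SonarQube's actual rules:

```python
# Made-up example: the kind of change that lowers "cognitive complexity".

# Before: nesting and mixed concerns in one function (the shape static
# analysers flag, and that an AI has to re-reverse-engineer every time).
def handle_order_nested(order):
    if order is not None:
        if order.get("items"):
            total = 0
            for item in order["items"]:
                if item.get("price", 0) > 0:
                    total += item["price"] * item.get("qty", 1)
            if order.get("coupon") == "SAVE10":
                total *= 0.9
            return total
    return 0

# After: guard clauses and small, well-named helpers with one job each.
def line_total(item):
    price = item.get("price", 0)
    return price * item.get("qty", 1) if price > 0 else 0

def apply_coupon(total, coupon):
    return total * 0.9 if coupon == "SAVE10" else total

def handle_order(order):
    if not order or not order.get("items"):
        return 0
    total = sum(line_total(item) for item in order["items"])
    return apply_coupon(total, order.get("coupon"))

print(handle_order({"items": [{"price": 10, "qty": 3}], "coupon": "SAVE10"}))  # 27.0
```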

[–] theneverfox@pawb.social 8 points 1 day ago (1 children)

I don't think we should be having the AI write the program in the first place. I think we're barreling towards a place where remotely complicated software becomes a lost technology

I don't mind if AI helps here and there, I certainly use it. But it's not good at custom fit solutions, and the world currently runs on custom fit solutions

AI is like no-code solutions. Yeah, it's powerful, easier to learn, and you can do a lot with it... But eventually you will hit a limit. You'll need to do something the system can't do, or something you can't make the system do because no one properly understands what you've built

At the end of the day, coding is a skill. If no one is building the required experience to work with complex systems, we're going to be swimming in an endless ocean of vibe-coded legacy apps in a decade

I just don't buy that AI will be able to take something like a set of State regulations and build a compliant outcome. Most of our base digital infrastructure is like that, or it uses obscure ancient systems that LLMs are basically allergic to working with

To me, we're risking everything on achieving AGI (and using it responsibly) before we run out of skilled workers, and we're several game changing breakthroughs from achieving that

[–] MangoCats@feddit.it 0 points 1 day ago (1 children)

I think we’re barreling towards a place where remotely complicated software becomes a lost technology

I think complicated software has been an art more than a science. For the past 30 years we have been developing formal processes to make it more of a procedural pursuit, but the art is still very much in there.

I think if AI-authored software is going to reach any level of valuable complexity, it's going to get there with the best of our current formal processes, plus some more that are being (rapidly) developed specifically for LLM-based tools.

But eventually you will hit a limit. You’ll need to do something...

And how do we surpass those limits? Generally: research. And for the past 20+ years where do we do most of that research? On the internet. And where were the LLMs trained, and what are they relatively good at doing quickly? Internet research.

At the end of the day, coding is a skill. If no one is building the required experience to work with complex systems

So is semiconductor design, application of transistors to implement logic gates, etc. We still have people who can do that, not very many, but enough. Not many people work in assembly language anymore, either...

[–] theneverfox@pawb.social 1 points 18 hours ago

So is semiconductor design, application of transistors to implement logic gates, etc. We still have people who can do that, not very many, but enough. Not many people work in assembly language anymore, either...

Yeah, that's a lost tech. We still use the same decades-old, even century-old, frameworks.

They're not perfect, but they are unchangeable. We no longer have the skills to adapt them to modern technology. Improvements are incremental; despite decades of effort you still can't reliably run a system on something like RISC-V.

[–] MangoCats@feddit.it 0 points 1 day ago (1 children)

It’s good at writing it, ideally 50-250 lines at a time

I find Claude Sonnet 4.5 to be good up to 800 lines at a chunk. If you structure your project into 800ish line chunks with well defined interfaces you can get 8 to 10 chunks working cooperatively pretty easily. Beyond about 2000 lines in a chunk, if it's not well defined, yeah - the hallucinations start to become seriously problematic.
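
By "well defined interfaces" I mean roughly this shape: each chunk only talks to its neighbours through a small, explicit contract, so the model can regenerate one chunk without touching the rest. A made-up Python sketch, nothing Claude-specific:

```python
# Hypothetical sketch of a chunk boundary: the ~800-line module behind
# InMemoryStore can be rewritten wholesale as long as it still satisfies
# the ReportStore contract that the other chunks depend on.
from typing import Protocol

class ReportStore(Protocol):
    def save(self, report_id: str, body: str) -> None: ...
    def load(self, report_id: str) -> str: ...

class InMemoryStore:
    """One chunk's implementation; swappable without edits elsewhere."""
    def __init__(self) -> None:
        self._data: dict[str, str] = {}

    def save(self, report_id: str, body: str) -> None:
        self._data[report_id] = body

    def load(self, report_id: str) -> str:
        return self._data[report_id]

def publish(store: ReportStore, report_id: str, body: str) -> None:
    # The calling chunk depends only on the contract, not the implementation.
    store.save(report_id, body.strip())

store = InMemoryStore()
publish(store, "weekly", "  all systems nominal  ")
print(store.load("weekly"))  # all systems nominal
```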

The new Opus 4.5 may have a higher complexity limit; I haven't really worked with it enough to characterize it... I do find Opus 4.5 to be much slower than Sonnet 4.5 was for similar problems.

[–] theneverfox@pawb.social 1 points 20 hours ago

Okay, but if it's writing 800 lines at once, it's making design choices. Which is all well and good for a one-off, but it will make those choices, make them a different way each time, and it will name everything in a very generic or very eccentric way

The AI can't remember how it did it, or how it does things. You can do a lot to help... even stuff that hasn't entered commercial products yet, like vectorized data stores that catalog key details and remind the LLM of them when appropriate
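
The vector-store trick is simpler than it sounds; it's roughly this shape, with the note contents invented and a toy word-overlap "embedding" standing in for a real embedding model:

```python
# Toy sketch: store project notes, retrieve the most similar ones for the
# current question, and prepend them to the prompt so the LLM is "reminded".
# Real setups use a trained embedding model; this word-overlap version only
# shows the shape of the idea. The notes themselves are made up.
import math
from collections import Counter

def embed(text: str) -> Counter:
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[w] * b[w] for w in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

notes = [
    "payment retries are handled in billing/retry.py, max 3 attempts",
    "all timestamps are stored as UTC, converted at the UI layer",
    "the importer assumes CSV headers are lowercase",
]
vectors = [(note, embed(note)) for note in notes]

def reminders_for(question: str, k: int = 2) -> list[str]:
    q = embed(question)
    ranked = sorted(vectors, key=lambda nv: cosine(q, nv[1]), reverse=True)
    return [note for note, _ in ranked[:k]]

# The billing/retry note ranks first, so it would be prepended to the prompt.
print(reminders_for("how are payment retries handled in billing?", k=1))
```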

2000 lines is nothing. My main project is well over a million lines, and the original author and I have to meet up to discuss how things flow through the system before changing it to meet the latest needs

But we can and do change it to meet the needs of the customer, with high stakes, because we wrote it. These days we use AI for the grunt work, and we have junior devs who do smaller tweaks.

If an AI is writing code a thousand lines at a time, no one knows how it works. The AI sure as hell doesn't. If it's 200 lines at a time, maybe we don't know details, but the decisions and the flow were decided by a person who understands the full picture

[–] BarneyPiccolo@lemmy.today 11 points 2 days ago (2 children)

I don't know shit about anything, but it seems to me that the AI already thought it gave you the best answer, so going back to the problem for a proper answer is probably not going to work. But I'd try it anyway, because what do you have to lose?

Unless it gets pissed off at being questioned, and destroys the world. I've seen more than a few movies about that.

[–] MangoCats@feddit.it 3 points 1 day ago

AI already thought it gave you the best answer, so going back to the problem for a proper answer is probably not going to work.

There's an LLM concept/parameter called "temperature" that determines basically how random the answer is.

As deployed, LLMs like Claude Sonnet or Opus run at a temperature that won't give the same answer every time, and when you combine this with feedback loops that point out failures (like compilers that tell the LLM when its code doesn't compile), the LLM can, and does, do the old Beckett: try, fail, try again, fail again, fail better next time. It usually reaches a solution that passes all the tests it is aware of.

The problem is: with a context window limit of 200,000 tokens, it's not going to be aware of all the relevant tests in more complex cases.
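
For the curious, temperature is just a knob on the token-sampling step: the model's raw scores get divided by the temperature before being turned into probabilities, so a higher value flattens the odds and a value near zero makes the top choice almost certain. A toy illustration, not any vendor's actual code:

```python
# Toy temperature sampling: higher temperature = flatter distribution = more
# varied picks; temperature near 0 = the top-scoring token almost every time.
import math
import random

def sample(logits: dict[str, float], temperature: float) -> str:
    scaled = {tok: score / temperature for tok, score in logits.items()}
    biggest = max(scaled.values())
    weights = {tok: math.exp(s - biggest) for tok, s in scaled.items()}  # stable softmax
    total = sum(weights.values())
    r = random.random() * total
    for tok, w in weights.items():
        r -= w
        if r <= 0:
            return tok
    return tok  # floating-point fallback

logits = {"return x": 2.0, "return y": 1.5, "raise ValueError": 0.2}
print([sample(logits, 0.2) for _ in range(5)])  # almost always "return x"
print([sample(logits, 1.5) for _ in range(5)])  # noticeably more varied
```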

[–] Evotech@lemmy.world 6 points 2 days ago* (last edited 2 days ago) (2 children)

You are, in a way, correct. If you keep sending the context of the "conversation" (in the same chat), it will reinforce its previous implementation.

The way AIs remember stuff is that you just give the model the entire thread of context together with your new question. It's all just text in, text out.

But once you start a new conversation (meaning you don't give it any previous chat history), it's essentially a "new" AI that doesn't know anything about your project.

This will have a new random seed, and if you ask it to look for mistakes etc. it will happily tell you that the last implementation was all wrong and here's how to fix it.

It's like a Minecraft world: the same seed will get you the same map every time. With AIs it's the same thing, ish. Start a new conversation or ask a different model (GPT, Google, Claude, etc.) and it will do things in a new way.
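
In code terms the "memory" is roughly this shape; call_model stands in for whichever provider SDK you actually use, since the exact parameter names vary by vendor:

```python
# Sketch of why the AI "remembers": the client resends the whole message
# history on every call. Start a fresh list and the model knows nothing.
def call_model(messages: list[dict]) -> str:
    # Placeholder: in reality this would call the vendor's chat API.
    return f"(model reply, given {len(messages)} messages of context)"

history: list[dict] = []

def ask(question: str) -> str:
    history.append({"role": "user", "content": question})
    reply = call_model(history)  # the full thread goes in every single time
    history.append({"role": "assistant", "content": reply})
    return reply

print(ask("Write the parser."))
print(ask("Now look for mistakes."))  # it re-reads its own earlier answer, which reinforces it

fresh_history: list[dict] = []  # a "new conversation": no memory of the project at all
```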

[–] TheBlackLounge@lemmy.zip 10 points 2 days ago (2 children)

Doesn't work. Take any semi-complex problem with multiple constraints and your team of AIs keeps running in circles. Very frustrating if you know it can be done. But what if you're a "fractional CTO" and you get actually contradictory constraints? We haven't yet gotten to AIs that will tell you that what you're asking is impossible.

[–] MangoCats@feddit.it 1 points 1 day ago

your team of AIs keeps running in circles

Depending on your team of human developers (and managers), they will do the same thing. Granted, most LLMs have a rather extreme sycophancy problem, but humans often do the same.

We haven't yet gotten to AIs that will tell you that what you're asking is impossible.

If it's a problem like under- or over-constrained geometry or equations, they (the better ones) will tell you. For difficult programming tasks I have definitely had the AIs bark up all the wrong trees trying to fix something, until I gave them specific direction on where to look for a fix (very much like my experiences with some human developers over the years).

I had a specific task that I was developing in one model. It was a hard problem, but I was making progress and could see the solution was near. Then I switched to a different model, which came back and told me "this is impossible, you're doing it wrong, you must give up this approach," right up until I showed it the results I had achieved to date with the other model. Then the same model that had told me it was impossible helped me finish the job completely and correctly. A lot like people.

[–] Evotech@lemmy.world 3 points 1 day ago

Yeah, right now you have to know what's possible and nudge the AI in the right direction, toward the approach you think is correct, if you want it to do things in an optimized way.

[–] BarneyPiccolo@lemmy.today -1 points 2 days ago (1 children)

Maybe the solution is to keep sending the code through various AI requests until it either gets polished up, or gains sentience and destroys the world. 50-50 chance.

This stuff ALWAYS ends up destroying the world on TV.

Seriously, everybody is complaining about the quality of AI products, but the whole point is for this stuff to keep learning and improving. At this stage, we're expecting a kindergartener to produce the work of a Harvard professor. Obviously, we're going to be disappointed.

But give that kindergartener time to learn and get better, and they'll end up a Harvard professor, too. AI may just need time to grow up.

And frankly, that's my biggest worry. If it can eventually start producing results that are equal to or better than most humans', then the Sociopathic Oligarchs won't need worker humans around, wasting money that could be in their bank accounts.

And we know what their solution to that problem will be.

[–] MangoCats@feddit.it 2 points 1 day ago (1 children)

This stuff ALWAYS ends up destroying the world on TV.

TV is also full of infinite free energy sources. In the real world, warp drive may be possible; you just need to annihilate the mass of Jupiter with an equivalent mass of antimatter to get the energy necessary to create a warp bubble that can move a small ship from the orbit of Pluto to a location a few light-years away. On TV they do it every week.

[–] BarneyPiccolo@lemmy.today 1 points 1 day ago

Sounds like we have a plan; let's get to work. The Cochrane Warp Drive isn't going to invent itself.