Just want to clarify, this is not my Substack, I'm just sharing this because I found it insightful.

The author describes himself as a "fractional CTO" (no clue what that means, don't ask me) and advisor. His clients asked him how they could leverage AI. He decided to experience it for himself. From the author (emphasis mine):

I forced myself to use Claude Code exclusively to build a product. Three months. Not a single line of code written by me. I wanted to experience what my clients were considering—100% AI adoption. I needed to know firsthand why that 95% failure rate exists.

I got the product launched. It worked. I was proud of what I’d created. Then came the moment that validated every concern in that MIT study: I needed to make a small change and realized I wasn’t confident I could do it. My own product, built under my direction, and I’d lost confidence in my ability to modify it.

Now when clients ask me about AI adoption, I can tell them exactly what 100% looks like: it looks like failure. Not immediate failure—that’s the trap. Initial metrics look great. You ship faster. You feel productive. Then three months later, you realize nobody actually understands what you’ve built.

[–] Agent641@lemmy.world 48 points 2 days ago (2 children)

I cannot understand and debug code written by AI. But I also cannot understand and debug code written by me.

Let's just call it even.

[–] MangoCats@feddit.it 1 points 1 day ago

I also cannot understand and debug code written by me.

So much this. I look back at stuff I wrote 10 years ago, shake my head, and console myself that "we were on a really aggressive schedule." At least in my mind I can do better; in practice the stuff has to ship eventually, and what ships is almost never what I would call perfect, or even ideal.

[–] ICastFist@programming.dev 8 points 2 days ago (1 children)

At least you can blame yourself for your own shitty code, which hopefully will never attempt to "accidentally" erase the entire project.

[–] PoliteDudeInTheMood@lemmy.ca 0 points 2 days ago (1 children)

I don't know how that happens; I regularly use Claude Code and it's constantly reminding me to push to git.

[–] MangoCats@feddit.it 1 points 1 day ago* (last edited 1 day ago) (1 children)

As an experiment I asked Claude to manage my git commits: it wrote the messages, kept a log, archived excess documentation, and it worked really well for about two weeks. Then, as the project got larger, the commit process took longer and longer to execute. I finally pulled the plug when the automated commit process, which had performed flawlessly for dozens of commits and archives, irretrievably lost a batch of work: it messed up the archive step and deleted the work without archiving it first, and it didn't commit it either.

AI/LLM workflows are non-deterministic, which means they make mistakes. If you want something reliable, scalable, and repeatable, have the AI write you code that does the job deterministically, as a tool rather than as a workflow. Of course, deterministic tools can't do things like summarize the content of a commit.
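
To make "tool, not workflow" concrete, here is a minimal sketch of the kind of deterministic helper you could have the AI write once and then run yourself. Everything in it (the script name, the archive directory, the exact steps) is made up for illustration; it just shells out to plain git and always archives before anything gets staged or removed.

```python
#!/usr/bin/env python3
"""Hypothetical safe_commit.py: archive first, then commit, never delete.

A deterministic stand-in for the AI-managed commit workflow described above.
All names and paths are illustrative, not taken from the original comment.
"""
import shutil
import subprocess
import sys
from datetime import datetime, timezone
from pathlib import Path

ARCHIVE_DIR = Path("docs/archive")  # assumed repo layout, adjust as needed


def git(*args: str) -> str:
    """Run a git command and fail loudly instead of guessing at repo state."""
    result = subprocess.run(["git", *args], capture_output=True, text=True)
    if result.returncode != 0:
        sys.exit(f"git {' '.join(args)} failed:\n{result.stderr}")
    return result.stdout


def archive(paths: list[str]) -> None:
    """Copy files into a timestamped archive folder; never delete originals."""
    stamp = datetime.now(timezone.utc).strftime("%Y%m%dT%H%M%SZ")
    dest = ARCHIVE_DIR / stamp
    dest.mkdir(parents=True, exist_ok=True)
    for p in paths:
        shutil.copy2(p, dest / Path(p).name)


def main() -> None:
    if len(sys.argv) < 2:
        sys.exit("usage: safe_commit.py <message> [files-to-archive...]")
    message, to_archive = sys.argv[1], sys.argv[2:]
    if to_archive:
        archive(to_archive)              # archive BEFORE anything is touched
    git("add", "-A")                     # stage everything, archives included
    if not git("status", "--porcelain").strip():
        print("nothing to commit")
        return
    git("commit", "-m", message)         # the only step that changes history
    print(git("log", "--oneline", "-1"), end="")


if __name__ == "__main__":
    main()
```

The point is that the ordering (archive, then stage, then commit) is fixed in code instead of being re-decided by a model on every run; the model can still be asked to write the commit message that gets passed in.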

[–] PoliteDudeInTheMood@lemmy.ca 1 points 1 day ago (1 children)

The longer the project, the more stupid Claude gets. I've seen it both in chat and in Claude Code, and Claude explains the situation quite well:

Increased cognitive load: Longer projects have more state to track - more files, more interconnected components, more conventions established earlier. Each decision I make needs to consider all of this, and the probability of overlooking something increases with complexity.

Git specifically: For git operations, the problem is even worse because git state is highly sequential - each operation depends on the exact current state of the repository. If I lose track of what branch we're on, what's been committed, or what files exist, I'll give incorrect commands.

Anything I do with Claude, I split into different chats. I won't give it access to git, but I will provide it with an updated copy of the repository via Repomix. I get much better results because of that.
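
For what it's worth, the "regenerate the packed repo and hand it to a fresh chat" step can itself be a small deterministic script. A rough sketch, assuming Repomix is run with its defaults via npx; the default output filename pattern is an assumption on my part:

```python
#!/usr/bin/env python3
"""Hypothetical pack_repo.py: rebuild a Repomix snapshot for a fresh chat.

Assumes `npx` is on PATH and that Repomix, run with defaults, writes a file
named repomix-output.* into the current directory (an assumption; check your
version's docs).
"""
import subprocess
import sys
from pathlib import Path


def main() -> None:
    # Run Repomix with its defaults; fail loudly rather than guessing.
    result = subprocess.run(["npx", "repomix"], capture_output=True, text=True)
    if result.returncode != 0:
        sys.exit(f"repomix failed:\n{result.stderr}")

    outputs = sorted(Path(".").glob("repomix-output.*"))
    if not outputs:
        sys.exit("repomix ran, but no repomix-output.* file was found")

    packed = outputs[0]
    size_kib = packed.stat().st_size / 1024
    print(f"packed repo written to {packed} ({size_kib:.0f} KiB)")
    print("paste its contents into a new chat instead of reusing an old one")


if __name__ == "__main__":
    main()
```

Keeping git out of the model's hands and handing it a read-only snapshot is the same "deterministic tool" split as in the earlier comment.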

[–] MangoCats@feddit.it 1 points 1 day ago

Yeah, context management is one big key. The "compacting conversation" hack is a good one: you can continue conversations indefinitely, but after each compact it will throw away some context that you thought was valuable.

The best explanation I have heard for the current limitations is that there is a "context sweet spot" for Opus 4.5 somewhere short of 200,000 tokens. As your context window fills past 100,000 tokens, at some point you reach "optimal understanding" of whatever is in there; as you continue toward 200,000 tokens, the hallucinations start to increase. As a hack, they "compact the conversation" and throw out less useful tokens, getting you back to the "essential core" of what you were discussing, so you can keep feeding it new prompts and getting new reactions with a lower hallucination rate. But with that lower hallucination rate also comes lower comprehension of what you said before the compacting event(s).

Some describe an aspect of this as the "lost in the middle" phenomenon: compaction tends to hang on to the very beginning and very end of the context window more aggressively than the middle, so "middle of the window" content is what mostly gets dropped during a compacting event.
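
A toy sketch of why the middle gets hit hardest, assuming a compaction heuristic that simply keeps the head and tail of the transcript until it fits a budget (real compaction summarizes rather than just dropping, so this is only illustrative, and every name and number here is made up):

```python
#!/usr/bin/env python3
"""Toy illustration of a keep-the-ends compaction heuristic.

Real conversation compaction summarizes content rather than just deleting it;
this sketch only shows why "middle of the window" material is the first to go.
"""


def compact(messages: list[str], budget: int, keep_head: int = 2) -> list[str]:
    """Drop messages from the middle until the transcript fits the budget.

    The first `keep_head` messages (system prompt, initial instructions) and
    the most recent messages are preserved; everything in between is fair game.
    """
    def size(msgs: list[str]) -> int:
        return sum(len(m) for m in msgs)  # crude stand-in for a token count

    head, rest = messages[:keep_head], messages[keep_head:]
    while rest and size(head + rest) > budget:
        rest.pop(0)  # oldest non-head message, i.e. the middle of the window
    return head + rest


if __name__ == "__main__":
    transcript = [f"msg {i}: " + "x" * 50 for i in range(20)]
    kept = compact(transcript, budget=600)
    print([m.split(":")[0] for m in kept])
    # -> ['msg 0', 'msg 1', 'msg 12', ..., 'msg 19']: the middle is gone
```

Whatever survives the cut still reads coherently, which is why the lowered comprehension only shows up later, when you ask about something that lived in the dropped middle.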