
Feds freaked over Fable 5 after simple 'fix this code' prompt, not jailbreak, says researcher (www.theregister.com)

submitted 5 hours ago by sanitation@lemmy.today to c/technology@lemmy.world


[–] Hackworth@piefed.ca 3 points 3 hours ago (3 children)

Did you try it? In the few coding tasks I threw at it, it performed much better than Opus.

[–] ID10T@programming.dev 1 points 58 minutes ago

I played with it at work for the afternoon when I noticed I had access. It was fine. Sure, it was an improvement, but it wasn’t so good that it could end the world. It was basically just more of the same for anyone familiar with coding agents.

[–] rozodru@piefed.world 6 points 3 hours ago (1 children)

I tried it. Had to VPN in to do so, but I tried it. I gave it 5 tasks; it succeeded in 2 of them, and the rest were hallucinations. So... yeah... guess it's much better than Opus.

[–] Hackworth@piefed.ca 2 points 2 hours ago (1 children)

rest were hallucinations

I'm having trouble parsing whatcha mean here if they were coding tasks. The code didn't run? Ran but had 0 functionality? If they were non-coding tasks, then agreed, I didn't notice it being significantly more accurate. Though I did appreciate the larger vocab. I wasn't gonna be able to afford to keep using it once it went to API pricing anyway.

[–] rozodru@piefed.world 3 points 2 hours ago

Sorry, should have been more specific. It was a mix of coding and non-coding. One coding task ran fine, another one just didn't work at all. One was a basic walkthrough/tutorial-type task that was accurate; the others were hallucinations.

[–] abbadon420@sh.itjust.works 3 points 3 hours ago (1 children)

Of course not. It's offline and my boss only pays for ChatGPT. I have used ChatGPT 5.4 and its performance is fine. I have not used it for coding, but I did notice it being a bit more coherent. I am not a power user though. I don't work with agents. I'm sure that makes it better, but I'm not willing to pay for the tokens.

[–] Hackworth@piefed.ca 2 points 3 hours ago (1 children)

I just use the web app, mostly to make self-contained HTML toys. During the brief period it was up and part of the general subscription, I asked it to make Terrace (the old board game from TNG) with very little help in the initial prompt. I kinda know how involved that task is, 'cause I manually wrote a Godot version back in '20. It nailed it with only minor fixes: 3D, reactive sound and visuals, a music score that is pretty chill, and Easy, Medium, and Hard levels of AI to compete against. I have yet to beat it on Hard. Opus couldn't touch that. I'm pretty sure the feds' response is simple retaliation against Anthropic for not playing ball with the DoD/W, but the capability jump was definitely notable. I saw someone liken it to the jump from GPT-3.5 to GPT-4, and I agree, if not a bit more.

[–] abbadon420@sh.itjust.works 1 points 2 hours ago

Yeah, that sounds like just another LLM iteration. It's nice and all. Great technological innovation. Very impressive. But is it worth investing billions upon billions of dollars? Is it worth breaking the chip market? Is it worth breaking the job market? Is it worth (possibly) causing a complete market crash when the bubble bursts?