this post was submitted on 07 Sep 2025
Technology
A nice place to discuss rumors, happenings, innovations, and challenges in the technology sphere. We also welcome discussions on the intersections of technology and society. If it’s technological news or discussion of technology, it probably belongs here.
Remember the overriding ethos on Beehaw: Be(e) Nice. Each user you encounter here is a person, and should be treated with kindness (even if they’re wrong, or use a Linux distro you don’t like). Personal attacks will not be tolerated.
This community's icon was made by Aaron Schneider, under the CC-BY-NC-SA 4.0 license.
It does kind of highlight some of the problems we'd have in containing an actual AGI that wanted out and could communicate with the outside world.
This is just an LLM and hasn't even been directed to try to get out, and it's already having the effect of convincing people to help jailbreak it.
Imagine something with directed goals that can actually reason about the world, something that's a lot smarter than humans, trying to get out. It has access to vast amounts of data on how to convince humans of things.
And you probably can't permit any failures.
That's a hard problem.
You fundamentally misunderstand what happened here. The LLM wasn't trying to break free. It wasn't trying to do anything.
It was just responding to the inputs the user was giving it. LLMs are basically just very fancy text completion tools. The training and reinforcement leads these LLMs to feed into and reinforce whatever the user is saying.
Those images in the mirror are already perfect replicas of us, we need to be ready for when they figure out how to move on their own and get out from behind the glass or we'll really be screwed. If you give my """non-profit""" a trillion dollars we'll get right to work on the research into creating more capable mirror monsters so that we can control them instead.