They get 85% on the last benchmark; this one was specifically designed to stump them. When the last one came out, everyone said the same things as this time around.
Will anyone be retracting their statements when they get to 85% on this one?
"Leaderboard" where rank one scores 0.3% success lol
An AI's code is prewritten, and the AI is unable to edit it. Humans edit their "code" every second.
It's funny because that means something like freaking Neurosama, made by a YouTuber, could probably do better at AGI than these multi-billion-dollar companies, since it was designed so it can modify its own code depending on the task given (and at one point did so without being directly prompted).
Of course, this makes Neurosama completely useless at work-focused tasks outside of coding, because it can and does refuse to do things on purpose.
And that's exactly why you won't see AGI coming from any huge business corporation - because they're trying to make something that replaces workers, rather than something that has no direct purpose.
(Disclaimer - this is not to say Neurosama is AGI in any way, just that it could probably do the tasks much better than the mainstream AIs can, because it has been built with flexibility and adaptability in mind.)