
876

Codeberg: an army of AI crawlers is slowing us down severely; AI crawlers have learned how to solve the Anubis challenges. (i.imgur.com)

submitted 21 hours ago by Pro@programming.dev to c/technology@lemmy.world

125 comments

cross-posted from: https://programming.dev/post/35852706

Source.


[–] PhilipTheBucket@piefed.social 90 points 21 hours ago (19 children)

I feel like at some point it needs to be an active response. Phase 1 is a teergrube-style tarpit that slows responses to a crawl to muck up the crawlers, with warnings in the headers and response body, and then phase 2 is a DDoS in response, or maybe just a drone strike and cut out the middleman. Once you're actively evading Anubis, fuckin' game on.
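For anyone who hasn't run into a teergrube before, here is a minimal Python sketch of the phase 1 idea: drip the response out a few bytes at a time, with a warning header attached. The header name, its wording, the chunk size, and the delay are all made up for illustration; nothing here is what Codeberg or Anubis actually does.

```python
import time
from typing import Callable, Iterator

# Hypothetical header name and text; the comment above only says
# "warnings in the headers and response body".
WARNING_HEADERS = {
    "X-Crawler-Warning": "Automated scraping detected; responses are being throttled.",
}


def tarpit_chunks(
    body: bytes,
    chunk_size: int = 16,
    delay_seconds: float = 2.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Iterator[bytes]:
    """Yield the body a few bytes at a time, pausing before each chunk."""
    for start in range(0, len(body), chunk_size):
        sleep(delay_seconds)  # the deliberate slowness that ties up the crawler
        yield body[start:start + chunk_size]
```

A streaming-capable server could send `WARNING_HEADERS` and then hand this generator back as the response body. Keep in mind that a tarpit also holds a connection open on your side, so it is cheapest on a server that handles idle connections asynchronously.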

[–] turbowafflz@lemmy.world 100 points 21 hours ago (7 children)

I think the best thing to do is not to block them when they're detected but to poison them instead. Feed them tons of text generated by tiny old language models: it's harder to detect, and it also messes up their training and makes the models less reliable. Of course you'd want to do that on a separate server so it doesn't slow down real users, but you probably don't need much power, since the scrapers probably don't really care about speed.
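To give a feel for how cheap the generation side can be, here is a rough sketch that swaps the "tiny old language model" for something even smaller: a word-level Markov chain. That is a stand-in chosen for brevity, not an actual language model, and `build_chain` and `generate` are illustrative names.

```python
import random
from collections import defaultdict


def build_chain(text: str) -> dict[str, list[str]]:
    """Map each word to the list of words that follow it in the seed text."""
    words = text.split()
    chain: dict[str, list[str]] = defaultdict(list)
    for current, following in zip(words, words[1:]):
        chain[current].append(following)
    return dict(chain)


def generate(chain: dict[str, list[str]], length: int, rng: random.Random) -> str:
    """Random-walk the chain to produce `length` words of filler text."""
    if not chain:
        raise ValueError("seed text needs at least two words")
    if length <= 0:
        return ""
    starts = sorted(chain)  # sorted so a seeded rng gives repeatable output
    word = rng.choice(starts)
    output = [word]
    while len(output) < length:
        followers = chain.get(word)
        # Dead end (a word that only appeared last): restart from a random word.
        word = rng.choice(followers) if followers else rng.choice(starts)
        output.append(word)
    return " ".join(output)
```

Seeding the chain with text from the real site keeps the vocabulary plausible while the sentences themselves drift into nonsense. A small neural model would produce more convincing output at a higher cost per page.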

[–] phx@lemmy.ca 16 points 18 hours ago (1 children)

Yeah, that was my thought. Don't reject them; that's obvious and they'll work around it. Feed them shit data - but not too obviously shit - and they'll not only swallow it, but it will eventually build up to levels where it compromises them.

I've suggested the same for plain old non-AI data stealing. Make the data useless to them and cost more work to separate good from bad, and they'll eventually either sod off or die.

A low-power AI actually seems like a good way to generate a ton of believable - but bad - data that can be used to fight the bad AIs. It doesn't need to be done in real time either, as datasets can be generated in advance.
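As a sketch of the "generated in advance" part: if the decoy pages already exist on disk or in memory, serving one is just a lookup. The version below picks a page by hashing the request path, so the same URL always returns the same decoy. That determinism is my own assumption about what makes the data look less obviously fake, and `pick_decoy` is a hypothetical helper.

```python
import hashlib


def pick_decoy(path: str, decoys: list[str]) -> str:
    """Choose a pre-generated page deterministically from the request path."""
    if not decoys:
        raise ValueError("decoys must not be empty")
    digest = hashlib.sha256(path.encode("utf-8")).digest()
    # Use the first 8 bytes of the hash as an index into the decoy pool.
    index = int.from_bytes(digest[:8], "big") % len(decoys)
    return decoys[index]
```

No generation happens at request time, so the box serving decoys needs almost no compute.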

[–] SorteKanin@feddit.dk 3 points 5 hours ago

> A low power AI actually seems like a good way to generate a ton of believable - but bad - data that can be used to fight the bad AIs.

Even "high power" AIs would produce bad data. It's well known by now that training an AI model on AI-generated data decreases model quality, and if that is repeated, the model just gets worse and worse (an effect often called model collapse). So yeah, this is definitely viable.
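A toy way to see the "worse and worse" effect, assuming a one-dimensional Gaussian as the "model": fit it to a handful of samples, draw the next training set from the fit, and repeat. This is only an analogy for the feedback loop, not an experiment on language models, and `simulate_refitting` is an illustrative name.

```python
import random
import statistics


def simulate_refitting(generations: int, sample_size: int, rng: random.Random) -> list[float]:
    """Return the fitted standard deviation after each round of self-training."""
    mean, std = 0.0, 1.0  # generation 0: the "real" data distribution
    history = [std]
    for _ in range(generations):
        # Train only on samples drawn from the previous generation's fit.
        samples = [rng.gauss(mean, std) for _ in range(sample_size)]
        mean = statistics.fmean(samples)
        std = statistics.stdev(samples)
        history.append(std)
    return history
```

With small samples, for example `simulate_refitting(200, 5, random.Random(0))`, the fitted spread tends to shrink toward zero over the generations: each round loses a bit of the original variety and never gets it back.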
