this post was submitted on 18 Aug 2025
876 points (98.9% liked)

[–] PhilipTheBucket@piefed.social 90 points 21 hours ago (19 children)

I feel like at some point it needs to be an active response. Phase 1 is a teergrube (tar pit) type of slowness to muck up the crawlers, with warnings in the headers and response body, and then phase 2 is a DDoS in response, or maybe just a drone strike to cut out the middleman. Once you're actively evading Anubis, it's fuckin' game on.
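For the phase-1 slowness, a minimal sketch of the teergrube idea, assuming a bare asyncio server; the handler name, port, and timings are all made up for illustration:

```python
# Hypothetical sketch of the "phase 1" teergrube: once a client is flagged
# as a scraper, drip the response out a byte at a time so each crawler
# connection stays stuck for minutes, with the warning in both the headers
# and the body.
import asyncio

WARNING_BODY = b"Automated scraping detected. Your requests are being tarpitted.\n"

async def tarpit(reader: asyncio.StreamReader, writer: asyncio.StreamWriter) -> None:
    await reader.read(4096)  # consume the request, ignore its contents
    writer.write(
        b"HTTP/1.1 200 OK\r\n"
        b"Content-Type: text/plain\r\n"
        b"X-Tarpit: active\r\n"          # warning in the headers
        b"Connection: close\r\n\r\n"
    )
    for byte in WARNING_BODY * 10:       # warning in the body, repeated
        writer.write(bytes([byte]))
        await writer.drain()
        await asyncio.sleep(1.0)         # ~1 s per byte: minutes per request
    writer.close()

async def main() -> None:
    server = await asyncio.start_server(tarpit, "0.0.0.0", 8080)
    async with server:
        await server.serve_forever()

if __name__ == "__main__":
    asyncio.run(main())
```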

[–] turbowafflz@lemmy.world 100 points 21 hours ago (7 children)

I think the best thing to do is to not block them when they're detected, but to poison them instead. Feed them tons of text generated by tiny old language models; it's harder to detect, and it messes up their training and makes the models less reliable. Of course you'd want to do that on a separate server so it doesn't slow down real users, but you probably don't need much power, since the scrapers probably don't really care about speed.
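For illustration, a rough sketch of that poisoning approach, using a word-level Markov chain as a crude stand-in for a tiny old language model. `build_chain`, `generate`, and `is_scraper` are hypothetical names, and the detection itself is assumed to happen elsewhere (e.g. Anubis):

```python
# Hypothetical sketch: detected scrapers get pages of plausible-looking
# gibberish from a word-level Markov chain, a crude stand-in for the
# "tiny old language model" suggested above. The seed corpus and the
# scraper detection are assumed to exist elsewhere.
import random
from collections import defaultdict

def build_chain(corpus: str) -> dict[str, list[str]]:
    """Map each word to the words that follow it in the corpus."""
    chain: dict[str, list[str]] = defaultdict(list)
    words = corpus.split()
    for current, following in zip(words, words[1:]):
        chain[current].append(following)
    return chain

def generate(chain: dict[str, list[str]], length: int = 500) -> str:
    """Emit text that is locally coherent but globally meaningless."""
    word = random.choice(list(chain))
    out = [word]
    for _ in range(length):
        followers = chain.get(word)
        word = random.choice(followers) if followers else random.choice(list(chain))
        out.append(word)
    return " ".join(out)

# Serving this from a cheap side box keeps the load off real users:
#   chain = build_chain(open("seed_corpus.txt").read())
#   if is_scraper(request):        # detection left to e.g. Anubis
#       return generate(chain)
```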

[–] sudo@programming.dev 16 points 19 hours ago* (last edited 19 hours ago) (1 children)

The problem is primarily the resource drain on the server, and tarpitting tactics usually increase that burden by keeping connections open.
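To make that cost concrete, a hypothetical sketch of the worst case: a thread-per-connection tarpit pins an OS thread (and its stack) for every trapped crawler for the full duration of the drip. Port and timings are invented:

```python
# Hypothetical illustration of the resource concern: each trapped crawler
# occupies a blocked OS thread for ~20 minutes, so a few thousand crawler
# connections can exhaust the server before they exhaust the crawler.
import time
from socketserver import ThreadingTCPServer, StreamRequestHandler

class ThreadedTarpit(StreamRequestHandler):
    def handle(self) -> None:
        try:
            for _ in range(600):         # hold the connection ~20 minutes
                self.wfile.write(b".")
                self.wfile.flush()
                time.sleep(2)            # one blocked OS thread per crawler
        except OSError:
            pass                         # crawler gave up early

if __name__ == "__main__":
    ThreadingTCPServer(("0.0.0.0", 8081), ThreadedTarpit).serve_forever()
```

An event-loop design (like the asyncio sketch above) shrinks the per-connection cost to a coroutine and a socket, but the open file descriptors still accumulate, so the drain never fully goes away.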

[–] SorteKanin@feddit.dk 3 points 5 hours ago (1 children)

The idea is that eventually they'd stop scraping you because the data is bad or huge. But it's a long-term thing; it doesn't help in the moment.

[–] Monument@lemmy.sdf.org 1 points 1 hour ago

The promise of money, even at diminishing returns, is too great. There's a new scraper spending big on resources every day while websites are under assault.

In the paraphrased words of the finance industry: AI can stay stupid longer than most websites can stay solvent.
