this post was submitted on 18 Aug 2025
984 points (98.9% liked)

Technology

[–] PhilipTheBucket@piefed.social 91 points 1 day ago (20 children)

I feel like at some point it needs to be an active response. Phase 1 is teergrube-style slowness to muck up the crawlers, with warnings in the headers and response body, and then phase 2 is a DDoS in response, or maybe just a drone strike and cut out the middleman. Once you're actively evading Anubis, fuckin' game on.
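
For anyone unfamiliar with the term, a teergrube (tarpit) just means answering suspected crawlers very slowly. Here is a rough sketch of the phase 1 idea only, not anything Anubis or Codeberg actually does. The header name `X-Crawler-Warning`, the warning text, and the function names are all made up for illustration, and the `sleep` parameter exists only so the pacing can be swapped out in tests.

```python
import time
from typing import Callable, Iterator

# Hypothetical warning text; not taken from any real deployment.
WARNING = b"Automated crawling of this site is not permitted.\n"


def tarpit_headers() -> list[tuple[str, str]]:
    """Response headers carrying the warning.

    "X-Crawler-Warning" is an invented header name, not a standard one.
    """
    return [
        ("Content-Type", "text/plain"),
        ("X-Crawler-Warning", "automated crawling is not permitted"),
    ]


def tarpit_body(
    payload: bytes = WARNING,
    chunk_size: int = 1,
    delay_s: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Iterator[bytes]:
    """Yield the payload in small chunks, pausing before each one.

    A streaming server that writes each chunk as it is yielded will keep
    the client's connection open for roughly
    (len(payload) / chunk_size) * delay_s seconds.
    """
    for start in range(0, len(payload), chunk_size):
        sleep(delay_s)
        yield payload[start:start + chunk_size]
```

Any WSGI-style server that streams an iterable body could return `tarpit_body()` for requests it has already classified as crawlers. Deciding which requests those are is the hard part and is not covered here.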

[–] traches@sh.itjust.works 14 points 1 day ago (4 children)

These crawlers come from random people’s devices via shady apps. Each request comes from a different IP.

[–] AmbitiousProcess@piefed.social 29 points 1 day ago (1 children)

Most of these AI crawlers are from major corporations operating out of datacenters with known IP ranges, which is why sites like Codeberg use IP range blocks. That's why Codeberg's response mentions that the crawling stopped after they fixed a configuration issue that had applied those IP range blocks only on non-Anubis routes.

For example, OpenAI publishes a list of IP ranges that their crawlers can come from, and also displays user agents for each bot.
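
To make the IP range point concrete, here is a minimal sketch of a range check using Python's standard `ipaddress` module. The ranges below are reserved documentation blocks used as placeholders, not any real crawler's addresses; in practice you would load whatever list the operator publishes. `BLOCKED_RANGES` and `is_blocked` are names invented for this example.

```python
import ipaddress

# Placeholder ranges from the reserved documentation blocks
# (RFC 5737 for IPv4, RFC 3849 for IPv6). These are NOT real crawler
# ranges; substitute the list published by the crawler operator.
BLOCKED_RANGES = [
    ipaddress.ip_network("192.0.2.0/24"),
    ipaddress.ip_network("198.51.100.0/24"),
    ipaddress.ip_network("2001:db8::/32"),
]


def is_blocked(client_ip: str) -> bool:
    """Return True if client_ip falls inside any blocked range.

    An IPv4 address is never considered inside an IPv6 network
    (and vice versa), so mixed lists are safe to check this way.
    """
    addr = ipaddress.ip_address(client_ip)
    return any(addr in network for network in BLOCKED_RANGES)


print(is_blocked("192.0.2.17"))   # inside the first placeholder range
print(is_blocked("203.0.113.5"))  # outside every listed range
```

This only works when the crawler stays inside its declared ranges, which is exactly the assumption that breaks once an operator starts rotating ASNs.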

Perplexity also publishes IP ranges, but Cloudflare later found them bypassing no-crawl directives with undeclared crawlers. They did use different IPs, but not from "shady apps." Instead, they would simply rotate ASNs and request a new IP.

They do this because it is still legal. Rotating ASNs, and IPs within an ASN, is not a crime. However, maliciously using apps installed on people's devices to route network traffic the owners are unaware of is. It also carries much higher latency, and could even allow for man-in-the-middle attacks, which they clearly don't want.

[–] PhilipTheBucket@piefed.social 13 points 1 day ago

Honestly, man, I get what you're saying, but also at some point all that stuff just becomes someone else's problem.

This is what people forget about the social contract: It goes both ways, it was an agreement for the benefit of all. The old way was that if you had a problem with someone, you showed up at their house with a bat / with some friends. That wasn't really the way, and so we arrived at this deal where no one had to do that, but then people always start to fuck over other people involved in the system thinking that that "no one will show up at my place with a bat, whatever I do" arrangement is a law of nature. It's not.

[–] sudo@programming.dev 3 points 1 day ago

Or your TV or IoT devices. Residential proxies are extremely shady businesses.

[–] PhilipTheBucket@piefed.social 1 points 1 day ago (1 children)

Is that really true? I guess I have no reason to doubt it, I just hadn't heard it before.

[–] sudo@programming.dev 8 points 1 day ago

Here's one example of a proxy provider offering to pay developers to inject their proxies into their apps ("100% ethical proxies" because they signed a ToS). Another is BrightData, which proxies traffic through users of its free HolaVPN.

IoT devices and smart TVs are also obvious suspects.
