this post was submitted on 21 May 2026
138 points (99.3% liked)

[–] algernon@lemmy.ml 1 points 5 hours ago (1 children)

Unless a significant portion of the internet does this, and we’re talking hundreds of millions of pages, the only cost here is to you.

Fun twist: no! There's a very neat trick you can do when you serve the crawlers poison: you can hide an identifier in the URLs you serve them, and then recognize that ID when they come back riding on remote-controlled Chrome instances. By serving them garbage, you fill their crawl queue with poisoned URLs, which helps you block crawlers that you wouldn't otherwise be able to block.
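To make the identifier trick concrete, here is a minimal sketch, not what I actually run. It assumes the ID is a random nonce followed by a truncated HMAC of that nonce, so the server can recognize its own poisoned URLs without storing them. `SECRET_KEY`, the `/archive` prefix, and both function names are made up for illustration.

```python
import hashlib
import hmac
import secrets

SECRET_KEY = b"change-me"  # hypothetical server-side secret
NONCE_LEN = 16             # hex characters of random nonce
TAG_LEN = 16               # hex characters of truncated HMAC tag


def _tag(nonce: str) -> str:
    """Truncated HMAC-SHA256 of the nonce, as lowercase hex."""
    digest = hmac.new(SECRET_KEY, nonce.encode(), hashlib.sha256).hexdigest()
    return digest[:TAG_LEN]


def make_poisoned_path(prefix: str = "/archive") -> str:
    """Build a URL path whose last segment carries a hidden identifier."""
    nonce = secrets.token_hex(NONCE_LEN // 2)
    return f"{prefix}/{nonce}{_tag(nonce)}"


def is_poisoned_path(path: str) -> bool:
    """Return True if the last path segment is a nonce+tag pair we issued."""
    slug = path.rstrip("/").rsplit("/", 1)[-1]
    if len(slug) != NONCE_LEN + TAG_LEN:
        return False
    nonce, tag = slug[:NONCE_LEN], slug[NONCE_LEN:]
    # Compare as bytes so non-ASCII input cannot raise a TypeError.
    return hmac.compare_digest(tag.encode(), _tag(nonce).encode())
```

Every link on a garbage page would come from `make_poisoned_path()`, and any later request where `is_poisoned_path()` is true can only have come from something that followed those links, whatever user agent it claims.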

Generating and serving garbage is incredibly cheap (cheaper than serving a file from a filesystem on SSD, in most cases), and once you have requests landing on poisoned URLs, you can firewall them off for a day or so, and reduce your costs even more.
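The "firewall them off for a day" part can be sketched as a ban table with an expiry. This is an in-process stand-in under my own assumptions, not a real firewall integration: the `BanList` class, its method names, and the 24-hour default are all hypothetical, and a real setup would more likely push the address into a kernel-level block list.

```python
import time

BAN_SECONDS = 24 * 60 * 60  # "a day or so"; assumed default


class BanList:
    """Remember clients that hit a poisoned URL and forget them after a TTL."""

    def __init__(self, ttl: float = BAN_SECONDS, clock=time.monotonic):
        self._ttl = ttl
        self._clock = clock   # injectable so the expiry logic can be tested
        self._expires = {}    # client address -> expiry timestamp

    def ban(self, client_ip: str) -> None:
        """Start (or restart) the ban window for this client."""
        self._expires[client_ip] = self._clock() + self._ttl

    def is_banned(self, client_ip: str) -> bool:
        """Check the ban, dropping the entry once it has expired."""
        expiry = self._expires.get(client_ip)
        if expiry is None:
            return False
        if self._clock() >= expiry:
            del self._expires[client_ip]
            return False
        return True
```

The request handler would call `ban()` whenever a request lands on a poisoned URL and check `is_banned()` first on every request, so a banned client gets dropped before any page is generated for it.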

We may not be able to poison the models, but we can poison their crawling queues. I have a year's worth of data to support that. They still haven't caught on.

[–] TheOctonaut@piefed.zip 0 points 5 hours ago (1 children)

They still haven't caught on

I admire the optimism to see it this way and not "it's still not worth it to them to bother blacklisting the domain"

[–] algernon@lemmy.ml 1 points 2 hours ago

I wonder why they haven't, too, because they're happily crawling domains that have never had anything but junk on them. To me, that suggests they have no idea they're trapped. Not at crawling time, at least.