Aggressive AI scrapers are making it kinda suck to run wikis (weirdgloop.org)

submitted 1 month ago by Itwasntme223@discuss.online to c/fuck_ai@lemmy.world

[–] algernon@lemmy.ml 4 points 1 month ago (1 children)

> Unless a significant portion of the internet does this, and we’re talking hundreds of millions of pages, the only cost here is to you.

Fun twist: no! There's a very neat trick you can do when you serve the crawlers poison: you can hide an identifier in the URLs you serve them, and then recognize that ID when they come back riding on remote-controlled Chrome instances. By serving them garbage, you can fill their queue with poisoned URLs, which helps you block crawlers that you wouldn't otherwise be able to block.
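To make the identifier trick concrete, here is a minimal sketch in Python, not the actual setup described above. It assumes the identifier is a random nonce plus a truncated HMAC tag, so a returning URL can be recognized without storing anything. `SECRET_KEY`, `PREFIX`, and both function names are hypothetical.

```python
import hashlib
import hmac
import secrets

# Hypothetical values for illustration only; the original post names neither.
SECRET_KEY = b"change-me"
PREFIX = "/maze/"


def _sign(nonce: str) -> str:
    """Return a short keyed tag for the nonce (truncated HMAC-SHA256)."""
    return hmac.new(SECRET_KEY, nonce.encode(), hashlib.sha256).hexdigest()[:16]


def make_poisoned_url() -> str:
    """Build a link for a garbage page, carrying a hidden, verifiable identifier."""
    nonce = secrets.token_hex(8)
    return f"{PREFIX}{nonce}-{_sign(nonce)}"


def is_poisoned_url(path: str) -> bool:
    """True if the path carries an identifier that this server issued."""
    if not path.startswith(PREFIX):
        return False
    nonce, sep, tag = path[len(PREFIX):].partition("-")
    if not sep:
        return False
    # Compare as bytes so odd characters in the path cannot raise an error.
    return hmac.compare_digest(tag.encode(), _sign(nonce).encode())
```

Any request whose path passes `is_poisoned_url` can only have come from something that followed a link out of an earlier garbage page, whatever user agent or IP it shows up with now.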

Generating and serving garbage is incredibly cheap (cheaper than serving a file from a filesystem on SSD, in most cases), and once you have requests landing on poisoned URLs, you can firewall them off for a day or so, and reduce your costs even more.
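A rough sketch of those two pieces, under stated assumptions: the garbage is random words drawn from a small list, and the "firewall" is an in-process table with a one-day expiry standing in for a real firewall rule. `WORDS`, `garbage_page`, and `TemporaryBlocklist` are hypothetical names, not anything from the setup described above.

```python
import random
import time

BLOCK_SECONDS = 24 * 60 * 60  # "a day or so", per the comment above
WORDS = ["lorem", "ipsum", "dolor", "sit", "amet", "wiki", "page", "edit"]


def garbage_page(seed: str, n_words: int = 200) -> str:
    """Generate filler text in memory; the same seed gives the same page."""
    rng = random.Random(seed)
    return " ".join(rng.choice(WORDS) for _ in range(n_words))


class TemporaryBlocklist:
    """In-process stand-in for a firewall rule that expires after a while."""

    def __init__(self, ttl: float = BLOCK_SECONDS, clock=time.monotonic):
        self._ttl = ttl
        self._clock = clock
        self._expires = {}  # client address -> expiry timestamp

    def block(self, client: str) -> None:
        """Block a client that landed on a poisoned URL."""
        self._expires[client] = self._clock() + self._ttl

    def is_blocked(self, client: str) -> bool:
        """Check a client, dropping its entry once the block has expired."""
        expiry = self._expires.get(client)
        if expiry is None:
            return False
        if self._clock() >= expiry:
            del self._expires[client]
            return False
        return True
```

Seeding the generator from the request path keeps a given poisoned URL stable across visits without writing anything to disk.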

We may not be able to poison the models, but we can poison their crawling queues. I have a year's worth of data to support that. They still haven't caught on.

[–] TheOctonaut@piefed.zip 0 points 1 month ago (1 children)

> They still haven't caught on

I admire the optimism of seeing it this way rather than as "it's still not worth it to them to bother blacklisting the domain"

[–] algernon@lemmy.ml 2 points 1 month ago

I wonder too, why they didn't, because they're happily crawling domains that never had anything but junk on them. To me, that suggests they have no idea they're trapped. Not at crawling time at least.