Fuck AI

Aggressive AI scrapers are making it kinda suck to run wikis (weirdgloop.org)

submitted 1 month ago by Itwasntme223@discuss.online to c/fuck_ai@lemmy.world

algernon@lemmy.ml 1 point 1 month ago

> The Daily Mail (vomit) alone publishes 1,500 articles a day. How many do you plan on publishing?

I have an automatically generated infinite maze. It produces roughly a million unique pages each day. It used to produce ~60 million pages / day, but a few months ago I decided to firewall some of the crawlers off instead of serving them garbage.
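For readers unfamiliar with the idea, here is a minimal sketch of how an "infinite maze" of this kind can work. It is an illustration only, not the software discussed in this thread: the function name `maze_page`, the word list, and the URL layout are all assumptions. Each page is derived deterministically from its own path, so nothing needs to be stored, and every page links to further pages that do not exist until they are requested.

```python
import hashlib
import random

# Hypothetical filler vocabulary; a real trap would likely use something less obvious.
WORDS = ["amber", "basalt", "cinder", "delta", "ember", "fjord",
         "garnet", "harbor", "inlet", "juniper", "kelp", "lagoon"]


def maze_page(path: str, links: int = 5, words: int = 40) -> str:
    """Return HTML for a maze page derived deterministically from its path."""
    # Seed a private RNG from a hash of the path: same path, same page.
    digest = hashlib.sha256(path.encode("utf-8")).digest()
    rng = random.Random(int.from_bytes(digest[:8], "big"))

    # Filler text that differs from page to page.
    text = " ".join(rng.choice(WORDS) for _ in range(words))

    # Child links point one level deeper; they are generated only when fetched.
    base = path.rstrip("/")
    hrefs = [f"{base}/{rng.getrandbits(32):08x}" for _ in range(links)]
    anchors = "\n".join(f'<a href="{h}">{h}</a>' for h in hrefs)

    return f"<html><body><p>{text}</p>\n{anchors}\n</body></html>"


if __name__ == "__main__":
    print(maze_page("/maze"))
```

Because the output depends only on the path, a crawler that revisits a URL sees the same page, while following any link leads to a page with a fresh set of links.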

And I run niche sites. A site with more lucrative traffic than mine (e.g., Codeberg, which uses the same software I do) likely generates a lot more garbage.

There was also a paper, commissioned by Anthropic, I believe, which concluded that as few as 250 malicious pages left in the training set are enough to poison even the largest model. Now, I do not trust anything Anthropic says. But even if we needed a billion pages to poison a model... I alone served that many in the past year.

TheOctonaut@piefed.zip 0 points 1 month ago

As you've said elsewhere, you've created a crawler trap, not a way to poison a model. You're wasting... some resources, I guess? Both theirs and your own. Fascinating to think that you've served a billion HTTP requests to no benefit to anyone and you believe this is you winning somehow.

algernon@lemmy.ml 1 point 1 month ago

Yes, it does have a cost. It has a far smaller cost than serving the real thing. It also allows me to firewall them off and stop serving them, even if they come at me with real browsers. That's a very definitive win: I saved CPU time, I saved RAM, I saved network bandwidth, and I stopped them from accessing my stuff. How is that not a win?