this post was submitted on 19 Aug 2025

626 points (98.9% liked)

The AI company Perplexity is complaining their bots can't bypass Cloudflare's firewall (www.searchenginejournal.com)

submitted 17 hours ago* (last edited 17 hours ago) by Davriellelouna@lemmy.world to c/technology@lemmy.world

146 comments

(page 2) 50 comments

[–] BaroqueInMind@piefed.social 18 points 16 hours ago

Cry more, Perplexity.

[–] Ermiar@lemmy.world 19 points 17 hours ago* (last edited 17 hours ago) (1 children)

Oh no! Anyway…

[–] prex@aussie.zone 13 points 16 hours ago

boo fucking hoo

[–] Ekybio@lemmy.world 19 points 17 hours ago (3 children)

Can someone with more knowledge shed a bit more light on this whole situation? I'm out of the loop on the technical details.

[–] panda_abyss@lemmy.ca 32 points 17 hours ago* (last edited 17 hours ago) (6 children)

Cloudflare runs as a CDN/cache/gateway service in front of a ton of websites. Their service helps protect those sites against DDoS attacks and malicious traffic.

A few weeks ago Cloudflare announced they were going to block AI crawling (good, in my opinion). However, they also added a paid service that these AI crawlers can use, so it actually becomes a revenue source for them.

This is a response to that from Perplexity, who run an AI search company. I don't actually know how their service works, but they were specifically called out in the announcement, and Cloudflare accused them of "stealth scraping", ignoring robots.txt, and other things.
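
For anyone unsure what "ignoring robots.txt" means in practice, here is a minimal sketch of the check a well-behaved crawler is expected to run before fetching a page. It uses Python's standard `urllib.robotparser`. The robots.txt contents, the `ExampleAIBot` user agent, and the `is_allowed` helper are all made up for illustration; nothing here is taken from Perplexity's or Cloudflare's actual code.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt contents. A real crawler would download this from
# https://<site>/robots.txt before requesting any other page on the site.
ROBOTS_TXT = """\
User-agent: ExampleAIBot
Disallow: /

User-agent: *
Disallow: /private/
"""


def is_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if robots_txt permits user_agent to fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    # The named bot is disallowed everywhere; other agents only under /private/.
    print(is_allowed(ROBOTS_TXT, "ExampleAIBot", "https://example.com/article"))
    print(is_allowed(ROBOTS_TXT, "SomeBrowser", "https://example.com/article"))
```

The check is purely voluntary: it only works if the crawler identifies itself honestly and chooses to obey the answer, which is why a crawler that presents itself as an ordinary browser sidesteps it entirely.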

[–] _cryptagion@lemmy.dbzer0.com 10 points 15 hours ago

It should be pointed out that Cloudflare didn't say they were going to block AI traffic, they give you the option to. The service is a free opt-in for people who want it.

[–] nutsack@lemmy.dbzer0.com 6 points 15 hours ago* (last edited 15 hours ago)

they don't outright block AI crawlers. they added some new tools and options for managing or blocking AI bot traffic, which the cloudflare customer can choose to use or not.

i'm running a free educational resource and i let the crawlers hit my site all they want, because it's useful knowledge unavailable anywhere else and it's served to them from cloudflare's free tier cache. i just don't know why they have to read it ten thousand times a day.

[–] BetaDoggo_@lemmy.world 21 points 17 hours ago* (last edited 17 hours ago) (1 children)

Perplexity (an "AI search engine" company with 500 million in funding) can't bypass Cloudflare's anti-bot checks. For each search, Perplexity scrapes the top results and summarizes them for the user. Cloudflare intentionally blocks Perplexity's scrapers because they ignore robots.txt and mimic real users to get around Cloudflare's blocking features. Perplexity argues that their scraping is acceptable because it's user-initiated.

Personally I think Cloudflare is in the right here. The scraped sites get 0 revenue from Perplexity searches (unless the user decides to go through the sources section and click the links), and Perplexity's scraping is unnecessarily traffic-intensive since they don't cache the scraped data.

[–] lividweasel@lemmy.world 6 points 14 hours ago (2 children)

…and Perplexity's scraping is unnecessarily traffic intensive since they don't cache the scraped data.

That seems almost maliciously stupid. We need to train a new model. Hey, where’d the data go? Oh well, let’s just go scrape it all again. Wait, did we already scrape this site? No idea, let’s scrape it again just to be sure.
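
To make the caching point concrete, here is a rough sketch of the kind of time-limited cache that would avoid re-fetching the same page over and over. `PageCache`, its `fetch` callback, and the TTL value are all hypothetical; this is an illustration of the general technique, not a description of how Perplexity's scraper works or fails to work.

```python
import time
from typing import Callable, Dict, Tuple


class PageCache:
    """Cache fetched pages for ttl_seconds so repeat requests skip the network."""

    def __init__(
        self,
        fetch: Callable[[str], str],
        ttl_seconds: float,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._fetch = fetch      # function that actually downloads a URL
        self._ttl = ttl_seconds  # how long a stored copy stays fresh
        self._clock = clock      # injectable so the cache can be tested
        self._store: Dict[str, Tuple[float, str]] = {}

    def get(self, url: str) -> str:
        now = self._clock()
        hit = self._store.get(url)
        if hit is not None and now - hit[0] < self._ttl:
            return hit[1]        # still fresh: the origin site is not contacted
        body = self._fetch(url)  # missing or stale: fetch once and remember it
        self._store[url] = (now, body)
        return body
```

With a one-hour TTL, ten thousand lookups of the same URL in a day would reach the origin site at most about 24 times instead of ten thousand.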

[–] EncryptKeeper@lemmy.world 8 points 15 hours ago (2 children)

I can’t get over their CEO, who looks like a nine-year-old. Not sure what it is about him.

[–] interdimensionalmeme@lemmy.ml 9 points 16 hours ago (6 children)

Just buy cloudflare duh

[–] ordnance_qf_17_pounder@reddthat.com 9 points 17 hours ago

Oh no!
