this post was submitted on 19 Aug 2025

691 points (98.9% liked)

The AI company Perplexity is complaining their bots can't bypass Cloudflare's firewall (www.searchenginejournal.com)

submitted 1 day ago* (last edited 1 day ago) by Davriellelouna@lemmy.world to c/technology@lemmy.world

[–] poopkins@lemmy.world 1 points 1 hour ago* (last edited 1 hour ago)

I've developed my own agent for assisting me with researching a topic I'm passionate about, and I ran into the exact same barrier: Cloudflare intercepts my request and is clearly checking if I'm a human using a web browser. (For my network requests, I've defined my own user agent.)

So I use that as a signal that the website doesn't want automated tools scraping their data. That's fine with me: my agent just tells me that there might be interesting content on the site and gives me a deep link. I can extract the data and carry on my research on my own.
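A minimal sketch of that fallback, for illustration only. It assumes the agent already has the HTTP status and response headers in hand; `FetchResult`, `looks_like_challenge`, and `handle_response` are hypothetical names, not part of any real agent. The `cf-mitigated: challenge` header is what Cloudflare documents for challenge pages; the status-code fallback is a rough guess that can produce false positives, which here only means backing off more often.

```python
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass
class FetchResult:
    url: str
    blocked: bool         # True when the site appears to challenge automated clients
    note: Optional[str]   # hint shown to the human researcher, if any


def looks_like_challenge(status: int, headers: Mapping[str, str]) -> bool:
    """Heuristic: does this response look like a Cloudflare challenge?"""
    lowered = {k.lower(): v.lower() for k, v in headers.items()}
    # Header Cloudflare sets on challenge page responses.
    if lowered.get("cf-mitigated") == "challenge":
        return True
    # Assumption: a 403/503 served by Cloudflare is treated as "not welcome" too.
    return status in (403, 503) and lowered.get("server") == "cloudflare"


def handle_response(url: str, status: int, headers: Mapping[str, str]) -> FetchResult:
    """Back off and hand the link to the human instead of retrying or spoofing."""
    if looks_like_challenge(status, headers):
        return FetchResult(url, True, f"Possibly relevant, open manually: {url}")
    return FetchResult(url, False, None)
```

The point of the design is that the agent never tries to get past the challenge; it only records the deep link for a person to follow.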

I completely understand where Perplexity is coming from, but at scale, implementations like this are awful for the web.

[–] kreskin@lemmy.world 5 points 2 hours ago* (last edited 2 hours ago)

They can't get their AI to check a box that says "I am not a robot"? I'd think that'd be a first-year comp sci student level task. And robots.txt files were basically always voluntary compliance anyway.
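To illustrate the "voluntary" part, here is a small sketch using Python's standard `urllib.robotparser`. The robots.txt content and the `ExampleAIBot` name are made up for the example. Nothing here stops a request from being sent; a crawler only honors the file if its author chooses to run a check like this before fetching.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: one named crawler is disallowed, everyone else is allowed.
ROBOTS_TXT = """\
User-agent: ExampleAIBot
Disallow: /

User-agent: *
Allow: /
"""


def allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if robots.txt permits this user agent to fetch the URL."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    print(allowed(ROBOTS_TXT, "ExampleAIBot", "https://example.org/page"))
    print(allowed(ROBOTS_TXT, "SomeBrowser", "https://example.org/page"))
```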

[–] TheGrandNagus@lemmy.world 13 points 4 hours ago

Can't believe I've lived to see Cloudflare be the good guys

[–] Wispy2891@lemmy.world 7 points 5 hours ago* (last edited 5 hours ago)

Here comes the ridiculous offer to buy Google chrome with money they don't have: easy delicious scraping directly from the user source

[–] kittenzrulz123@lemmy.blahaj.zone 19 points 8 hours ago

[–] tibi@lemmy.world 57 points 14 hours ago

You could say they are... Perplexed.

[–] Kissaki@feddit.org 81 points 16 hours ago* (last edited 16 hours ago) (1 children)

Perplexity argues that a platform’s inability to differentiate between helpful AI assistants and harmful bots causes misclassification of legitimate web traffic.

So, I assume Perplexity uses appropriate identifiable user-agent headers, to allow hosters to decide whether to serve them one way or another?
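As a rough sketch of what "deciding whether to serve them" could look like on the host side: the policy table and the bot names below are hypothetical, and the whole thing only works if the crawler declares itself honestly in its User-Agent header.

```python
# Hypothetical policy table: declared crawler token -> action chosen by the site owner.
POLICY = {
    "exampleaibot": "block",
    "examplesearchbot": "allow",
}


def decide(user_agent: str, default: str = "allow") -> str:
    """Pick an action based on the token a client declares in its User-Agent."""
    ua = user_agent.lower()
    for token, action in POLICY.items():
        if token in ua:
            return action
    # Unknown or undeclared clients fall through to the default action.
    return default
```

A crawler that sends a generic browser string falls through to the default, which is exactly the situation the comment is being sarcastic about.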

[–] lime@feddit.nu 29 points 13 hours ago

yeah it's almost like there was already a system for this in place

[–] WolfLink@sh.itjust.works 125 points 17 hours ago (1 children)

This is a nice CloudFlare ad

[–] pyre@lemmy.world 21 points 14 hours ago (2 children)

yeah. still not worth dealing with fucking cloudflare. fuck cloudflare.

[–] oppy1984@lemdro.id 1 points 7 minutes ago

I'm out of the loop, what's wrong with Cloudflare?

[–] int32@lemmy.dbzer0.com 7 points 14 hours ago (1 children)

DEATH TO CLOUDFLARE!

[–] pressanykeynow@lemmy.world 4 points 5 hours ago (1 children)

That would be terrible for a lot of people as they are the only company providing such services that doesn't charge for traffic.

[–] int32@lemmy.dbzer0.com 3 points 3 hours ago* (last edited 3 hours ago) (1 children)

They can use web.archive.org as a CDN (I do that for Cloudflare websites). But honestly, Cloudflare or not, the internet is broken.

[–] pressanykeynow@lemmy.world 1 points 45 minutes ago

Can you explain please? How can I use archive.org as a cdn for my website?

[–] NotASharkInAManSuit@lemmy.world 49 points 16 hours ago (1 children)

That’s the entire point, dipshit. I wish we got one of the cool techno dystopias rather than this boring corporate idiot one.

[–] Dojan@pawb.social 10 points 16 hours ago

I'm still holding out for Stephen Hawking to mail out Demon Summoning programs.

[–] frezik@lemmy.blahaj.zone 64 points 18 hours ago

Traveling snake oil salesman complains he can't pick people's locks.

[–] Glitchvid@lemmy.world 211 points 23 hours ago (2 children)

When a firm outright admits to bypassing or trying to bypass measures taken to keep them out, you'd think that would be a slam dunk case of unauthorized access under the CFAA with felony enhancements.

[–] GamingChairModel@lemmy.world 88 points 22 hours ago (2 children)

Fuck that. I don't need prosecutors and the courts to rule that accessing publicly available information in a way that the website owner doesn't want is literally a crime. That logic would extend to ad blockers and editing HTML/JS in an "inspect element" tab.

[–] kibiz0r@midwest.social 21 points 19 hours ago (1 children)

They already prosecute people under the unauthorized access provision. They just don’t prosecute rich people under it.

[–] GamingChairModel@lemmy.world 9 points 16 hours ago

They prosecuted and convicted a guy under the CFAA for figuring out the URL schema for an AT&T website designed to be accessed by the iPad when it first launched, and then just visiting that site by trying every URL in a script. And then his lawyer (the foremost expert on the CFAA) got his conviction overturned:

https://www.eff.org/cases/us-v-auernheimer

We have to maintain that fight, to make sure that the legal system doesn't criminalize normal computer tinkering, like using scripts or even browser settings in ways that site owners don't approve of.

[–] EncryptKeeper@lemmy.world 48 points 22 hours ago (20 children)

That logic would not extend to ad blockers, as the point of concern is gaining unauthorized access to a computer system or asset. Blocking ads would not be considered gaining unauthorized access to anything. In fact it would be the opposite of that.

[–] jve@lemmy.world 7 points 17 hours ago* (last edited 17 hours ago) (1 children)

Right? Isn’t this a textbook DMCA violation, too?

[–] WhyJiffie@sh.itjust.works 2 points 9 hours ago

for us, not for them. wait until they argue in court that actually it's us at fault and we need to provide access or else

[–] EtherWhack@lemmy.world 49 points 19 hours ago

[–] floquant@lemmy.dbzer0.com 220 points 1 day ago (1 children)

It's difficult to be a shittier company than OpenAI, but Perplexity seems to be trying hard.

[–] Brunbrun6766@lemmy.world 59 points 21 hours ago (1 children)

Step 1, SOMEHOW find a more punchable face than Altman

[–] Tollana1234567@lemmy.today 4 points 4 hours ago* (last edited 4 hours ago)

put META android zuckerberg on or mechahitler musk.

[–] SugarCatDestroyer@lemmy.world 18 points 17 hours ago* (last edited 17 hours ago) (2 children)

It seems like it's some kind of distraction to make people think things aren't as bad as they really are, it just sounds too far-fetched to me.

It's like a bear that has eaten too much and starts whining because a small rabbit is running away from him, even though the bear has already eaten almost all the rabbits and is clearly full.

[–] ubergeek@lemmy.today 46 points 20 hours ago (2 children)

Good. I went through my CF panel, and blocked some of those "AI Assistants" that by default were open, including Perplexity's.

[–] cupcakezealot@piefed.blahaj.zone 66 points 22 hours ago (1 children)

rare cloudflare w

[–] boonhet@sopuli.xyz 51 points 20 hours ago

As far as security is concerned, their w's are pretty common tbh. It's just the whole centralization issue.

[–] gravitas_deficiency@sh.itjust.works 27 points 19 hours ago

good, that means it’s working

I’m gonna be frustrated (though not surprised) if the response is anything other than this.

[–] peoplebeproblems@midwest.social 34 points 20 hours ago

Well... Good.

[–] JeeBaiChow@lemmy.world 88 points 1 day ago

Uh.. good?

[–] wosat@lemmy.world 31 points 21 hours ago

This is why companies like Perplexity and OpenAI are creating browsers.

[–] LodeMike@lemmy.today 11 points 17 hours ago

Words cannot describe how much I hate this person

[–] sylver_dragon@lemmy.world 51 points 1 day ago (3 children)

You'd think that a competent technology company with their own AI would be able to figure out a way to spoof Cloudflare's checks. I'd still think that.

[–] spankmonkey@lemmy.world 68 points 1 day ago* (last edited 23 hours ago)

Or find a more efficient way to manage data, since their current approach is basically DDOSing the internet for training data and also for responding to user interactions.

[–] iAvicenna@lemmy.world 36 points 22 hours ago (1 children)

ask AI how to do it?

[–] prex@aussie.zone 25 points 22 hours ago

They tried nothing & they're all out of ideas.
