this post was submitted on 19 Aug 2025

626 points (98.9% liked)

The AI company Perplexity is complaining their bots can't bypass Cloudflare's firewall (www.searchenginejournal.com)

submitted 17 hours ago* (last edited 17 hours ago) by Davriellelouna@lemmy.world to c/technology@lemmy.world

[–] kittenzrulz123@lemmy.blahaj.zone 5 points 2 hours ago

[–] tibi@lemmy.world 42 points 8 hours ago

You could say they are... Perplexed.

[–] Kissaki@feddit.org 58 points 9 hours ago* (last edited 9 hours ago) (1 children)

Perplexity argues that a platform’s inability to differentiate between helpful AI assistants and harmful bots causes misclassification of legitimate web traffic.

So, I assume Perplexity uses appropriate identifiable user-agent headers, to allow hosters to decide whether to serve them one way or another?
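For illustration, a minimal sketch of how a host could act on that header, assuming the crawler identifies itself honestly. The token list and the `decide` helper are hypothetical names for this example; check the crawler operator's published documentation for the strings it actually sends.

```python
# Assumed User-Agent tokens, for illustration only.
BLOCKED_TOKENS = ("PerplexityBot", "Perplexity-User")


def decide(user_agent, blocked=BLOCKED_TOKENS):
    """Return "block" if the User-Agent contains a blocked token, else "serve"."""
    ua = (user_agent or "").lower()  # a missing header is treated as an empty string
    if any(token.lower() in ua for token in blocked):
        return "block"
    return "serve"


print(decide("Mozilla/5.0 (compatible; PerplexityBot/1.0)"))   # block
print(decide("Mozilla/5.0 (X11; Linux x86_64) Firefox/141.0"))  # serve
```

A check like this only works when the client sends an identifiable header. A crawler that presents a generic browser string passes straight through it.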

[–] lime@feddit.nu 16 points 7 hours ago

yeah it's almost like there is already a system for this in place

[–] WolfLink@sh.itjust.works 101 points 10 hours ago (1 children)

This is a nice CloudFlare ad

[–] pyre@lemmy.world 18 points 7 hours ago (1 children)

yeah. still not worth dealing with fucking cloudflare. fuck cloudflare.

[–] int32@lemmy.dbzer0.com 7 points 7 hours ago

DEATH TO CLOUDFLARE!

[–] NotASharkInAManSuit@lemmy.world 40 points 10 hours ago (1 children)

That’s the entire point, dipshit. I wish we got one of the cool techno dystopias rather than this boring corporate idiot one.

[–] Dojan@pawb.social 6 points 10 hours ago

I'm still holding out for Stephen Hawking to mail out Demon Summoning programs.

[–] frezik@lemmy.blahaj.zone 52 points 11 hours ago

Traveling snake oil salesman complains he can't pick people's locks.

[–] SugarCatDestroyer@lemmy.world 15 points 10 hours ago* (last edited 10 hours ago) (1 children)

It seems like it's some kind of distraction to make people think things aren't as bad as they really are, it just sounds too far-fetched to me.

It's like a bear that has eaten too much and starts whining because a small rabbit is running away from him, even though the bear has already eaten almost all the rabbits and is clearly full.

[–] EtherWhack@lemmy.world 47 points 13 hours ago

[–] Glitchvid@lemmy.world 196 points 16 hours ago (2 children)

When a firm outright admits to bypassing or trying to bypass measures taken to keep them out, you'd think that would be a slam dunk case of unauthorized access under the CFAA with felony enhancements.

[–] GamingChairModel@lemmy.world 84 points 15 hours ago (2 children)

Fuck that. I don't need prosecutors and the courts to rule that accessing publicly available information in a way that the website owner doesn't want is literally a crime. That logic would extend to ad blockers and editing HTML/js in an "inspect element" tab.

[–] kibiz0r@midwest.social 20 points 12 hours ago (1 children)

They already prosecute people under the unauthorized access provision. They just don’t prosecute rich people under it.

[–] GamingChairModel@lemmy.world 8 points 10 hours ago

They prosecuted and convicted a guy under the CFAA for figuring out the URL schema for an AT&T website designed to be accessed by the iPad when it first launched, and then just visiting that site by trying every URL in a script. And then his lawyer (the foremost expert on the CFAA) got his conviction overturned:

https://www.eff.org/cases/us-v-auernheimer

We have to maintain that fight, to make sure that the legal system doesn't criminalize normal computer tinkering, like using scripts or even browser settings in ways that site owners don't approve of.

[–] EncryptKeeper@lemmy.world 44 points 15 hours ago (10 children)

That logic would not extend to ad blockers, as the point of concern is gaining unauthorized access to a computer system or asset. Blocking ads would not be considered gaining unauthorized access to anything. In fact it would be the opposite of that.

[–] GamingChairModel@lemmy.world 16 points 14 hours ago (9 children)

gaining unauthorized access to a computer system

And my point is that defining "unauthorized" to include visitors using unauthorized tools/methods to access a publicly visible resource would be a policy disaster.

If I put a banner on my site that says "by visiting my site you agree not to modify the scripts or ads displayed on the site," does that make my visit with an ad blocker "unauthorized" under the CFAA? I think the answer should obviously be "no," and that the way to define "authorization" is whether the website puts up some kind of login/authentication mechanism to block or allow specific users, not to put a simple request to the visiting public to please respect the rules of the site.

To me, a robots.txt is more like a friendly request to unauthenticated visitors than it is a technical implementation of some kind of authentication mechanism.

Scraping isn't hacking. I agree with the Third Circuit and the EFF: If the website owner makes a resource available to visitors without authentication, then accessing those resources isn't a crime, even if the website owner didn't intend for site visitors to use that specific method.
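To make the "friendly request" point about robots.txt concrete, here is a small sketch using Python's standard `urllib.robotparser`. The robots.txt content, the `ExampleBot` name and the `allowed` helper are made up for this example.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: one named crawler is asked to stay out,
# everyone else is allowed.
ROBOTS_TXT = """\
User-agent: ExampleBot
Disallow: /

User-agent: *
Allow: /
"""


def allowed(robots_txt, agent, url):
    """Ask the parser whether `agent` is permitted to fetch `url`."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, url)


print(allowed(ROBOTS_TXT, "ExampleBot", "https://example.com/page"))   # False
print(allowed(ROBOTS_TXT, "SomeBrowser", "https://example.com/page"))  # True
```

The check runs entirely on the client side. Nothing in it stops a crawler that never asks, or that ignores the answer, which is why it reads as a request and not an authentication mechanism.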

[–] jve@lemmy.world 7 points 10 hours ago* (last edited 10 hours ago) (1 children)

Right? Isn’t this a textbook DMCA violation, too?

[–] WhyJiffie@sh.itjust.works 1 points 3 hours ago

for us, not for them. wait until they argue in court that actually it's us at fault and we need to provide access or else

[–] ubergeek@lemmy.today 43 points 13 hours ago (2 children)

Good. I went through my CF panel, and blocked some of those "AI Assistants" that by default were open, including Perplexity's.

[–] floquant@lemmy.dbzer0.com 203 points 17 hours ago (1 children)

It's difficult to be a shittier company than OpenAI, but Perplexity seems to be trying hard.

[–] Brunbrun6766@lemmy.world 53 points 15 hours ago

Step 1, SOMEHOW find a more punchable face than Altman

[–] gravitas_deficiency@sh.itjust.works 25 points 13 hours ago

good, that means it’s working

I’m gonna be frustrated (though not surprised) if the response is anything other than this.

[–] cupcakezealot@piefed.blahaj.zone 63 points 15 hours ago (1 children)

rare cloudflare w

[–] boonhet@sopuli.xyz 48 points 14 hours ago

As far as security is concerned, their w's are pretty common tbh. It's just the whole centralization issue.

[–] LodeMike@lemmy.today 10 points 11 hours ago

Words cannot describe how much I hate this person

[–] peoplebeproblems@midwest.social 33 points 14 hours ago

Well... Good.

[–] wosat@lemmy.world 30 points 14 hours ago

This is why companies like Perplexity and OpenAI are creating browsers.

[–] JeeBaiChow@lemmy.world 87 points 17 hours ago

Uh.. good?

[–] kokesh@lemmy.world 8 points 11 hours ago (1 children)

Is there some simply deployable PHP honeytrap for AI crawlers?

[–] blargh513@sh.itjust.works 1 points 6 hours ago

Used to make tarpits with reverse proxies. Accept the connection and then hold the response until a few seconds before the default TCP timeout. Doesn't eat many resources as long as you have enough TCP connections and can reuse them effectively.
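A rough sketch of that tarpit idea in Python with `asyncio` (not PHP, and not behind a reverse proxy, so this is a swapped-in illustration and not the setup described above). The 30-second client timeout, the 3-second margin, the port and the function names are all assumptions for the example.

```python
import asyncio


def hold_time(client_timeout_s, margin_s=3.0):
    """Seconds to stall: a little less than the client's assumed timeout."""
    return max(0.0, client_timeout_s - margin_s)


async def tarpit(reader, writer, delay_s):
    """Accept the connection, wait, then send a minimal empty response."""
    try:
        await asyncio.sleep(delay_s)  # sleeping costs almost nothing per connection
        writer.write(
            b"HTTP/1.1 200 OK\r\nContent-Length: 0\r\nConnection: close\r\n\r\n"
        )
        await writer.drain()
    except ConnectionError:
        pass  # the client gave up first
    finally:
        writer.close()


async def serve(host="127.0.0.1", port=8080, client_timeout_s=30.0):
    """Run the tarpit until cancelled. The 30 s timeout is an assumed value."""
    delay = hold_time(client_timeout_s)

    async def handler(reader, writer):
        await tarpit(reader, writer, delay)

    server = await asyncio.start_server(handler, host, port)
    async with server:
        await server.serve_forever()
```

Start it with `asyncio.run(serve())`. Each stalled connection is just a sleeping coroutine, so the cost is mostly open sockets, which matches the point about needing enough TCP connections.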

[–] iAvicenna@lemmy.world 34 points 16 hours ago (1 children)

ask AI how to do it?

[–] prex@aussie.zone 24 points 16 hours ago

They tried nothing & they're all out of ideas.

[–] sylver_dragon@lemmy.world 50 points 17 hours ago (3 children)

You'd think that a competent technology company, with their own AI would be able to figure out a way to spoof Cloudflare's checks. I'd still think that.

[–] spankmonkey@lemmy.world 67 points 17 hours ago* (last edited 16 hours ago)

Or find a more efficient way to manage data, since their current approach is basically DDOSing the internet for training data and also for responding to user interactions.

[–] Quill7513@slrpnk.net 31 points 17 hours ago

see, but they're not competent. further, they don't care. most of these ai companies are snake oil. they're selling you a solution that doesn't meaningfully solve a problem. their main way of surviving is saying "this is what it can do now, just imagine what it can do if you invest money in my company."

they're scammers, the lot of them, running ponzi schemes with our money. if the planet dies for it, that's no concern of theirs. ponzi schemes require the schemer to have no long term plan, just a line of credit that they can keep drawing from until they skip town before the tax collector comes

[–] GissaMittJobb@lemmy.ml 19 points 15 hours ago (1 children)

Skill issue. Cope and seethe

[–] fossilesque@mander.xyz 10 points 13 hours ago

I hate that these bots ruin my read it later app. :(

[–] kescusay@lemmy.world 21 points 16 hours ago

I set up a WAF for my company's publicly facing developer portal to block out bot traffic from assholes like these guys. It reduced bot traffic to the site by something like - I kid you not - 99.999%.

Fucking data vultures.
