this post was submitted on 10 Sep 2025
129 points (96.4% liked)

Technology

[–] Kissaki@feddit.org 18 points 13 hours ago

> evolves robots.txt instructions by adding an automated licensing layer that's designed to block bots that don't fairly compensate creators for content

robots.txt - the well-known technology for blocking ill-intentioned bots /s

What's actually automated about the licensing layer? At some point I started skimming the article, and it never made that clear. Is the idea that the AI can "automatically" parse it?

```
# NOTICE: all crawlers and bots are strictly prohibited from using this
# content for AI training without complying with the terms of the RSL
# Collective AI royalty license. Any use of this content for AI training
# without a license is a violation of our intellectual property rights.

License: https://rslcollective.org/royalty.xml
```
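
Any "enforcement" here depends entirely on a crawler voluntarily going looking for that directive. A minimal sketch of what honoring it would even involve (hypothetical helper, example.com as a stand-in host; nothing in the file forces anyone to run this):

```python
# Minimal sketch, not taken from any RSL spec: pull the License: directive
# out of a site's robots.txt. The "layer" is a comment plus one line that
# cooperative bots may or may not bother to read.
import urllib.request

def find_license_url(robots_txt: str) -> str | None:
    """Return the value of the first 'License:' line, if present."""
    for line in robots_txt.splitlines():
        line = line.strip()
        if line.lower().startswith("license:"):
            return line.split(":", 1)[1].strip()
    return None

robots = urllib.request.urlopen("https://example.com/robots.txt").read().decode()
print(find_license_url(robots))  # e.g. https://rslcollective.org/royalty.xml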

Yeah, this is as useless as I thought it would be. Nothing here actively blocks anything.
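
For contrast, actually blocking means the server refuses the request. A rough sketch of that as WSGI middleware (the user-agent strings are illustrative, not a maintained list, and real setups usually do this at the proxy or CDN), which this notice does nothing to set up:

```python
# Contrast: "actively blocking" means the server itself rejects the request.
# BLOCKED_AGENTS is an illustrative list, not an authoritative one.
BLOCKED_AGENTS = ("GPTBot", "CCBot", "anthropic-ai")

def block_ai_crawlers(app):
    def middleware(environ, start_response):
        ua = environ.get("HTTP_USER_AGENT", "")
        if any(agent in ua for agent in BLOCKED_AGENTS):
            start_response("403 Forbidden", [("Content-Type", "text/plain")])
            return [b"Crawling not permitted.\n"]
        return app(environ, start_response)
    return middleware
```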

I love that the XML then just points to a text/html website. So nothing there for machine parsing; maybe it's meant for AI parsing.

I don't remember which AI company it was, but they argued they're not crawlers, just agents acting on the user's behalf for a specific request/action, and so robots.txt doesn't apply to them. Who knows how they'll react to this. But their incentives and their track record both point toward ignoring robots.txt.

Why ~~am I~~ is this comment so negative? Oh well.