As long as they cannot copyright what they generate using the pirated materials.
We're going to end up in a situation where whatever is necessary to train AI is permitted, and the main question is whether that will be through (re)interpretation of existing law or the passage of a new law.
Good thing I have a local model running that's constantly learning, for precisely this reason.
Arguing that training models isn't fair use is going to be a massive uphill battle; it's basically reading the book, but with a computer. It's not actually a big deal to people, unless you hold the copyright to a ton of works and want to get a percentage of all the AI income these companies have made.
Torrenting the books is almost certainly copyright infringement, but that has a relatively low payout compared to the money these companies are getting for their models. The training being fair use means that rights holders can't try to take any money from the model's use. Even at per-work levels, the statutory limits on infringement damages aren't significant compared to the legal cost of proving it happened.
There's an argument to be made that it is, in fact, not 'reading'. The training of the model could be considered a lossy compression of the data. And streaming movies in a lossy compression format is not fair use, is it?
It's not the storage of the information that matters so much as the presentation. Google's search index stores a huge amount of copyrighted material, even losslessly, but it only presents small snippets at a time, which is not considered copyright infringement. The real question is whether the information presented by the models is in a form that counts as copyright infringement. So far, courts have not found that it is.
The model doesn't stream out anyone's content though. The article mentions that the plaintiffs have provided no examples of a prompt that creates anything substantial.
Streaming a lossy compression would generally be infringement, but there is definitely a point where it stops being infringement if it's lossy enough.
What a model generally stores is factual information that isn't copyrightable in the first place. It's storing word counts, sentence lengths, sentiment analysis, and so on.
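To make the "lossy enough" point concrete, here's a toy sketch in Python. It's only an illustration of the kind of aggregate statistics mentioned above (word counts and sentence lengths), not how any real model is built; the `text_stats` helper and the two example sentences are made up for this comment. The point is just that two different texts can produce identical statistics, so the statistics alone can't give you the original back.

```python
import re
from collections import Counter

def text_stats(text):
    """Hypothetical helper: reduce a text to word counts and sentence lengths."""
    # Split into sentences on ., ! or ? and drop empty leftovers.
    sentences = [s for s in re.split(r"[.!?]+\s*", text) if s.strip()]
    # Lowercase words only; punctuation and word order are discarded.
    words = re.findall(r"[a-z']+", text.lower())
    return {
        "word_counts": Counter(words),
        "sentence_lengths": [len(re.findall(r"[a-z']+", s.lower())) for s in sentences],
    }

a = text_stats("The dog bit the man. The man ran.")
b = text_stats("The man bit the dog. The man ran.")

# Different texts, identical statistics: the original wording is not recoverable.
print(a == b)
```

Whether a real model is "lossy enough" in the legal sense is a separate question; this only shows why counts and lengths by themselves don't reproduce the work.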
They didn't say seeding is fair use, just that it's inherently part of torrenting. Good thing Sarah Silverman has PC Gamer there to pander for her.