this post was submitted on 18 Dec 2025
197 points (97.6% liked)
This is from a Stanford study that is summarized here:
https://www.linkedin.com/pulse/does-ai-actually-boost-developer-productivity-striking-%C3%A7elebi-tcp8f
There are other studies with different conclusions, but this one aligns with my own experience. To your point about AI not being able to reproduce the Linux kernel: this study also found that AI is significantly less effective on complex codebases, to the point of hurting productivity. That supports what you said, since the Linux kernel certainly qualifies as a complex codebase.
I agree big tech is using open source unethically, but how much different is this situation from the other ways big tech profits from open source without contributing back? Training proprietary LLMs on open source code is shitty, rent-seeking behavior, but not really a unique development, and certainly not something that undermines the core value of open source.
Destroying "share alike" doesn't undermine the core value of open source? What IS the core value?
The LLMs are not distributing the GPL code; their weights are trained on it. You can’t just have Copilot pump out something that works like the Linux kernel or Blender, except with different code that isn’t subject to the GPL. At best, the AI can learn from it and assist humans in developing a proprietary alternative. In that case, it’s not much better than having humans study a GPL codebase and write a proprietary alternative without AI. Either way, replicating the thing is going to cost a lot of money, so why not save the money, use the GPL code, and contribute back? It would also be hard to sell your proprietary alternative, because why wouldn’t people just use the FOSS version?
You can't "train" on code you haven't copied. That is kind of obvious, right? So did they have the right to copy and then reproduce the work without attribution?
Yeah, I guess this is a bit of a gray area. Under the GPL, you only get rights to the code if it was distributed to you. If some GPL code was only ever distributed to select people, none of whom passed it on to the general public, but GitHub still trained its models on the private repo, then that would technically be a license violation. That’s a niche scenario, though, since the intent is normally public distribution.