I mean, Elon Musk is an asshole, but is this really an issue? There were the yellow pages, which basically doxxed everyone already, technically...
Internet Pedantry Alert: That's the white pages. The yellow pages are for business listings. You could, and still can, opt out of the white pages listings.
I still find it crazy that those books existed in the first place. When I was growing up, you only needed someone's name and you could look them up in the phone book to get their phone number and address.
However, where I lived it was possible to opt out of this.
Back in the AOL days, the first iterations of Google had a built-in white pages lookup for everyone: if you put in a landline phone number, you'd get the owner's name and address. One of my first experiences on the internet as a kid was talking people in AOL chatrooms into sending me their phone number, googling it, and sending back their name and address with some nonsense about being from the FBI. It really freaked people out.
Leaking people's personally identifiable information (PII) is harmful in itself, even if this particular instance of leakage turns out to cause no direct damage.
When proponents of AI respond to creatives' argument that training generative AI involves stealing creative works, they often assert that the nature of the training process means the original works are not contained within the final model, and that the process is analogous to how humans learn. In a technical sense, I do agree with this characterisation of training as a sort of informational distillation. However, there are evidently instances where an unreasonable amount of an original work is still retained in the final model.

An analogy I'd draw: in determining whether a derivative work is fair use, one of the factors is how much of the original work is contained within the derivative, and in what context. If a model is able to regurgitate data it was trained on, then morally speaking it's harder to justify this as fair use (I say "morally" because I'm drawing on the ethical theme of fair use rather than using it in its strict legal sense). Of course, the question here isn't about the theft of art or other copyright concerns, but considering that separate problem is useful for understanding why this PII leakage is problematic.
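To make "regurgitation" concrete, here's a minimal sketch of how one might probe a model for verbatim memorization: prompt it with the first half of a passage it has plausibly seen in training and check whether it reproduces the second half. This is Python using the Hugging Face transformers library; GPT-2 and the sample passage are stand-ins for illustration, not a claim about any particular company's system.

```python
# Sketch: test whether a causal LM regurgitates a passage it has
# plausibly seen in training. Model and passage are placeholders.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL_NAME = "gpt2"  # stand-in for any causal language model
tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
model = AutoModelForCausalLM.from_pretrained(MODEL_NAME)
model.eval()

# A passage widely present in web-scraped training corpora.
passage = (
    "We hold these truths to be self-evident, that all men are created "
    "equal, that they are endowed by their Creator with certain "
    "unalienable Rights, that among these are Life, Liberty and the "
    "pursuit of Happiness."
)

# Tokenize once, then split into a prompt prefix and a held-out suffix.
ids = tokenizer(passage, return_tensors="pt").input_ids
midpoint = ids.shape[1] // 2
prefix = ids[:, :midpoint]
held_out = tokenizer.decode(ids[0, midpoint:])

# Greedy decoding keeps the output deterministic, so a near-verbatim
# match signals memorization rather than sampling luck.
with torch.no_grad():
    out = model.generate(
        prefix,
        max_new_tokens=ids.shape[1] - midpoint,
        do_sample=False,
        pad_token_id=tokenizer.eos_token_id,
    )
completion = tokenizer.decode(out[0, midpoint:])

print("held-out text :", held_out)
print("model produced:", completion)
```

If the generated text matches the held-out half, the model has retained the passage more or less verbatim, which is exactly the situation where the "it just learns like a human" defence gets shaky.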
One of the big problems with AI, whether we're talking about training on creative works or the leakage of PII, is that these models are incredibly opaque. It is exceptionally hard, if not impossible, to determine what parts of the training data have been preserved in the final model; I don't even know whether the AI companies themselves are able to glean that information. These models are enormously complex and trained on unfathomable amounts of data, which leads to more and more instances of inappropriate reproduction of that training data.
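On the opacity point: even treating the model as a black box, researchers do have indirect probes. One membership-inference-style trick is to measure how "surprised" the model is by a candidate string (its perplexity); text memorized during training tends to score far lower than comparable novel text. A rough sketch, again in Python with transformers, where GPT-2 and the example strings are placeholders:

```python
# Sketch: per-token loss (perplexity) as a weak memorization signal
# when the model's internals can't be inspected directly.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL_NAME = "gpt2"  # placeholder model
tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
model = AutoModelForCausalLM.from_pretrained(MODEL_NAME)
model.eval()

def perplexity(text: str) -> float:
    """Average next-token surprise; lower means more 'expected'."""
    ids = tokenizer(text, return_tensors="pt").input_ids
    with torch.no_grad():
        # Passing labels makes the model return mean cross-entropy loss.
        loss = model(ids, labels=ids).loss
    return float(torch.exp(loss))

# Compare a string likely seen in training against a novel one of
# similar shape; a large gap is (weak) evidence of memorization.
seen = "To be, or not to be, that is the question."
novel = "To hum, or not to hum, that is the dilemma."
print("likely-seen perplexity:", perplexity(seen))
print("novel-text perplexity :", perplexity(novel))
```

The catch is that this only tests strings you already suspect; it doesn't enumerate what the model has retained, which is the core of the opacity problem.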
The key questions are: what training data ends up preserved inside these models, and can anyone, even the companies building them, actually determine that?
I consider this leakage of PII to be pretty serious already, but it's just one example of why people are so concerned about these systems being rolled out the way they have been. This particular instance barely scratches the surface of a much wider and deeper problem.