I mean, Elon Musk is an asshole, but is this really an issue? There were the yellow pages, which basically doxxed everyone already, technically...
Internet Pedantry Alert: That's the white pages. The yellow pages are for business listings. You could, and still can, opt out of the white pages listings.
I still find it crazy that those books existed in the first place. When I was growing up, you only needed someone's name and you could look them up in the phone book to get their phone number and address.
However, where I lived it was possible to opt out of this.
Back in the AOL days, the first iterations of Google had a built-in white pages lookup for everyone: if you put in a landline phone number, you'd get the owner's name and address. One of my first experiences on the internet as a kid was talking people in AOL chatrooms into sending me their phone number, googling it, and sending back their name and address with some nonsense about being from the FBI. It really freaked people out.
Leaking people's personally identifiable information (PII) is harmful in itself, even if this particular instance of leakage turns out to cause no direct damage.
When proponents of AI respond to creatives' argument that training generative AI involves stealing creative works, they often assert that the nature of the training process means the original works are not contained within the final model, and that the process is analogous to how humans learn. In a technical sense, I do agree with this characterisation of training as a sort of informational distillation. However, there are evidently instances where an unreasonable amount of an original work is still retained in the final model.

An analogy I'd draw: in determining whether a derivative work is fair use, one of the factors is how much of the original work is contained within the derivative, and in what context. If a model is able to regurgitate data it was trained on, then morally speaking it's harder to justify this as fair use (I say "morally" because I'm drawing on the ethical theme of fair use rather than using it in its strict legal sense). Of course, the question here isn't about the theft of art or other copyright concerns, but considering that separate problem is useful for understanding why this PII leakage is problematic.
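To make "regurgitation" concrete, here's a minimal sketch of how one might probe a model for verbatim memorization: prompt it with the first half of a passage it has plausibly seen in training and check whether it reproduces the second half. This is Python using the Hugging Face transformers library; GPT-2 and the sample passage are stand-ins for illustration, not a claim about any particular company's system.

```python
# Sketch: test whether a causal LM regurgitates a passage it has
# plausibly seen in training. Model and passage are placeholders.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL_NAME = "gpt2"  # stand-in for any causal language model
tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
model = AutoModelForCausalLM.from_pretrained(MODEL_NAME)
model.eval()

# A passage widely present in web-scraped training corpora.
passage = (
    "We hold these truths to be self-evident, that all men are created "
    "equal, that they are endowed by their Creator with certain "
    "unalienable Rights, that among these are Life, Liberty and the "
    "pursuit of Happiness."
)

# Tokenize once, then split into a prompt prefix and a held-out suffix.
ids = tokenizer(passage, return_tensors="pt").input_ids
midpoint = ids.shape[1] // 2
prefix = ids[:, :midpoint]
held_out = tokenizer.decode(ids[0, midpoint:])

# Greedy decoding keeps the output deterministic, so a near-verbatim
# match signals memorization rather than sampling luck.
with torch.no_grad():
    out = model.generate(
        prefix,
        max_new_tokens=ids.shape[1] - midpoint,
        do_sample=False,
        pad_token_id=tokenizer.eos_token_id,
    )
completion = tokenizer.decode(out[0, midpoint:])

print("held-out text :", held_out)
print("model produced:", completion)
```

If the generated text matches the held-out half, the model has retained the passage more or less verbatim, which is exactly the situation where the "it just learns like a human" defence gets shaky.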
One of the big problems with AI, whether we're talking about training on creative works or the leakage of PII, is that these models are incredibly opaque. It is exceptionally hard, if not impossible, to determine what parts of the training data have been preserved in the final model; I don't even know whether the AI companies themselves are able to glean that information. These models are enormously complex and trained on unfathomable amounts of data, which leads to more and more instances of inappropriate reproduction of that training data.
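On the opacity point: even treating the model as a black box, researchers do have indirect probes. One membership-inference-style trick is to measure how "surprised" the model is by a candidate string (its perplexity); text memorized during training tends to score far lower than comparable novel text. A rough sketch, again in Python with transformers, where GPT-2 and the example strings are placeholders:

```python
# Sketch: per-token loss (perplexity) as a weak memorization signal
# when the model's internals can't be inspected directly.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL_NAME = "gpt2"  # placeholder model
tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
model = AutoModelForCausalLM.from_pretrained(MODEL_NAME)
model.eval()

def perplexity(text: str) -> float:
    """Average next-token surprise; lower means more 'expected'."""
    ids = tokenizer(text, return_tensors="pt").input_ids
    with torch.no_grad():
        # Passing labels makes the model return mean cross-entropy loss.
        loss = model(ids, labels=ids).loss
    return float(torch.exp(loss))

# Compare a string likely seen in training against a novel one of
# similar shape; a large gap is (weak) evidence of memorization.
seen = "To be, or not to be, that is the question."
novel = "To hum, or not to hum, that is the dilemma."
print("likely-seen perplexity:", perplexity(seen))
print("novel-text perplexity :", perplexity(novel))
```

The catch is that this only tests strings you already suspect; it doesn't enumerate what the model has retained, which is the core of the opacity problem.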
The key questions are: what training data ends up preserved inside these models, and can anyone, even the companies building them, actually determine that?
I consider this leakage of PII to be pretty serious already, but it's just one example of why people are so concerned about these systems being rolled out the way they have been. This particular instance barely scratches the surface of a much wider and deeper problem.