
Leaking people's personally identifiable information (PII) is harmful as a practice, even if this particular instance of leakage turned out to be harmless.

When proponents of AI respond to the argument from creatives that training Generative AI involves stealing creative works, they often assert that the training method means the original works are not contained within the end model, and that the process is analogous to how humans learn. In a technical sense, I do agree with this characterisation of training as a sort of informational distillation. However, there appear to be instances where an unreasonable amount of the original work is still retained in the final model. An analogy I'd draw here is that in determining whether a work derived from an existing one is fair use, one of the factors is how much of the original work is contained within the derivative, and in what context. If a model is able to regurgitate data that it was trained on, then morally speaking, it's harder to justify this as fair use (I say "morally" because I'm drawing on the ethical theme of fair use rather than using it in its strict legal sense). Of course, the question here isn't about the stealing of art or other copyright concerns, but considering this separate problem is useful for understanding why this leakage is problematic.
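To make "regurgitation" concrete: a rough way to probe for verbatim memorisation is to prompt the model with the opening of a text you suspect is in its training set and check whether its most likely continuation reproduces the rest word for word. The sketch below is only illustrative and not how any lab actually audits its models; it assumes the Hugging Face transformers library, and "gpt2" is just a stand-in for whichever model you want to probe.

```python
# Illustrative sketch only: probe a causal language model for verbatim memorisation
# by prompting it with the start of a known text and comparing its greedy
# continuation against the real remainder. "gpt2" is a placeholder model name.
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL_NAME = "gpt2"  # placeholder; substitute the model under scrutiny
tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
model = AutoModelForCausalLM.from_pretrained(MODEL_NAME)

def reproduces_verbatim(known_text: str, prefix_chars: int = 200, check_chars: int = 100) -> bool:
    """Return True if the model's greedy continuation of the prefix matches
    the true remainder of `known_text` character for character."""
    prefix, remainder = known_text[:prefix_chars], known_text[prefix_chars:]
    inputs = tokenizer(prefix, return_tensors="pt")
    output_ids = model.generate(
        **inputs,
        max_new_tokens=100,
        do_sample=False,  # greedy decoding: the single most likely continuation
        pad_token_id=tokenizer.eos_token_id,
    )
    # Decode only the newly generated tokens, not the prompt.
    continuation = tokenizer.decode(
        output_ids[0][inputs["input_ids"].shape[1]:],
        skip_special_tokens=True,
    )
    return remainder[:check_chars].strip() == continuation[:check_chars].strip()
```

Extraction research uses fancier statistics than an exact string match, but the underlying idea is the same: if the model can complete someone's details verbatim from a short prefix, that data is, for practical purposes, still in the model.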

One of the big problems with AI, whether we're talking about training on creative works or the leakage of PII, is that these models are incredibly opaque. It is exceptionally hard, if not impossible, to determine what parts of the training data have been preserved in the final model; I don't even know whether the AI companies themselves are able to glean that information. These models are so incredibly complex, and are trained on such unfathomable amounts of data, that we keep seeing more and more instances of inappropriate levels of reproduction of the training data.
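The closest thing anyone has to looking inside is indirect probing from the outside. One common heuristic, sketched below with a placeholder model, made-up strings, and an arbitrary threshold, is that text a model has memorised tends to get an unusually low average loss compared with a paraphrase of the same content.

```python
# Illustrative sketch only: flag possible memorisation by comparing the model's
# average per-token loss on an exact string against a paraphrase of it.
# Model name, example strings, and threshold are all placeholders.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL_NAME = "gpt2"  # placeholder; substitute the model under scrutiny
tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
model = AutoModelForCausalLM.from_pretrained(MODEL_NAME)

def avg_token_loss(text: str) -> float:
    """Average negative log-likelihood per token of `text` under the model."""
    ids = tokenizer(text, return_tensors="pt")["input_ids"]
    with torch.no_grad():
        # Passing `labels` makes the model return the mean cross-entropy loss.
        loss = model(input_ids=ids, labels=ids).loss
    return loss.item()

exact = "Jane Doe, 12 Example Street, Springfield, phone 555-0123"  # hypothetical record
paraphrase = "A woman named Jane Doe lives on a street in Springfield."
if avg_token_loss(exact) < 0.5 * avg_token_loss(paraphrase):
    print("Suspiciously low loss on the exact string: possible memorisation.")
```

Even this only tells you about strings you already suspect are in there; it says nothing about everything you haven't thought to check, which is exactly the opacity problem.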

The key questions are:

  • If the model can reproduce this, are there more harmful things that could plausibly be retrieved via the AI? (Given that we have been seeing models trained on extremely sensitive medical or legal data, the answer is "almost certainly".)
  • How can we know what PII or other sensitive data may have been contained in the training data, i.e. how do we gauge the severity of the risk of sensitive material being reproduced? (We certainly can't from the outside, and I'm doubtful whether even the engineers behind the models could effectively answer this; a naive scanning sketch follows this list.)
  • If we know for certain that sensitive material has been included in the training data, how do we stop (or reduce the likelihood of) that data being reproduced? Is it possible to train a general-purpose AI on sensitive data without significant risk of said data being reproduced? (Speaking as someone who has done a lot of nitty-gritty data work and coding with machine-learning systems, and who tries to keep up with the literature, to my knowledge we can't, and we might never be able to.)
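On the second question, the naive approach is to scan the raw training text for obvious patterns, roughly as in the sketch below. The patterns and the example string are made up, and this only catches the easy cases; names, addresses, and free-text identifiers sail straight through, which is part of why I don't think anyone can give a confident answer.

```python
# Illustrative sketch only: a naive regex scan for obvious PII patterns in raw
# training text. The patterns and the sample string are made up, and most real
# PII (names, addresses, free-text identifiers) will not match anything here.
import re

PII_PATTERNS = {
    "email": re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+"),
    "us_phone": re.compile(r"\b\d{3}[-.\s]?\d{3}[-.\s]?\d{4}\b"),
    "ssn_like": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
}

def scan_for_pii(document: str) -> dict[str, list[str]]:
    """Return the obviously PII-looking substrings found in `document`."""
    hits = {}
    for label, pattern in PII_PATTERNS.items():
        matches = pattern.findall(document)
        if matches:
            hits[label] = matches
    return hits

sample = "Contact Jane at jane.doe@example.com or 555-123-4567."
print(scan_for_pii(sample))
# -> {'email': ['jane.doe@example.com'], 'us_phone': ['555-123-4567']}
```

Dedicated PII-detection tooling does better than a handful of regexes, but none of it is exhaustive, and once the data has gone into the weights there is no comparable scan you can run on the model itself.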

I consider this leakage of PII to be pretty serious already, but it is also an example of why people are so concerned about these systems being rolled out the way they have been. This particular instance barely scratches the surface of a much wider and deeper problem.