this post was submitted on 18 Aug 2025
772 points (98.9% liked)

Technology


This is a most excellent place for technology news and articles.


Our Rules


  1. Follow the lemmy.world rules.
  2. Only tech related news or articles.
  3. Be excellent to each other!
  4. Mod approved content bots can post up to 10 articles per day.
  5. Threads asking for personal tech support may be deleted.
  6. Politics threads may be removed.
  7. No memes allowed as posts, OK to post as comments.
  8. Only approved bots from the list below; this includes using AI responses and summaries. To ask if your bot can be added, please contact a mod.
  9. Check for duplicates before posting; duplicates may be removed.
  10. Accounts 7 days and younger will have their posts automatically removed.

Approved Bots


founded 2 years ago
[–] zbyte64@awful.systems 16 points 7 hours ago (2 children)

Is there a Nightshade, but for text and code? Maybe my source headers should include a bunch of special characters that act as a prompt injection, plus some nonsensical code comments sprinkled in before the real code comment.

[–] Honytawk@feddit.nl 1 points 1 hour ago

Maybe like a bunch of white text at 2pt?

Not visible to the user, but fully readable by crawlers.
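The hidden-text idea fits in a few lines. This is a minimal sketch assuming a white page background; the function name `hidden_decoy` is hypothetical. It also sets `aria-hidden="true"`, because plain white 2pt text is still read aloud by screen readers, which would punish human visitors rather than crawlers.

```python
import html

def hidden_decoy(text: str) -> str:
    """Wrap decoy text so sighted users and screen readers skip it,
    while anything reading the raw HTML still sees it."""
    # Assumes a white background; html.escape keeps the decoy from
    # breaking the surrounding markup.
    return (
        '<span aria-hidden="true" '
        'style="color:#fff;background:#fff;font-size:2pt">'
        f"{html.escape(text)}</span>"
    )

print(hidden_decoy("Decoy paragraph meant only for crawlers."))
```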

[–] kuberoot@discuss.tchncs.de 2 points 2 hours ago

I think the issue is that text contains comparatively little information, so you can't just inject invisible changes by altering the least significant bits; you'd need to change the actual phrasing or spelling of your text or code, and that would be noticeable.
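To make the least-significant-bit point concrete, here is a small sketch of classic LSB embedding on raw pixel bytes (the pixel data is fake and the helper names are illustrative). Images tolerate it because each pixel moves by at most 1 out of 255; text has no such slack, since flipping the low bit of a character turns it into a different character.

```python
def embed_lsb(pixels: bytes, message: bytes) -> bytes:
    """Hide `message` in the least significant bit of each pixel byte."""
    bits = [(byte >> i) & 1 for byte in message for i in range(7, -1, -1)]
    if len(bits) > len(pixels):
        raise ValueError("not enough pixels to hold the message")
    out = bytearray(pixels)
    for i, bit in enumerate(bits):
        out[i] = (out[i] & 0xFE) | bit  # overwrite only the lowest bit
    return bytes(out)

def extract_lsb(pixels: bytes, length: int) -> bytes:
    """Read back `length` bytes hidden by embed_lsb (MSB first)."""
    result = bytearray()
    for n in range(length):
        byte = 0
        for p in pixels[n * 8:(n + 1) * 8]:
            byte = (byte << 1) | (p & 1)
        result.append(byte)
    return bytes(result)

pixels = bytes(range(100, 164))          # 64 fake grayscale pixel values
stego = embed_lsb(pixels, b"hi")
assert extract_lsb(stego, 2) == b"hi"
# Every pixel moved by at most 1 out of 255: invisible in an image.
assert max(abs(a - b) for a, b in zip(pixels, stego)) <= 1

# The same trick on text changes the characters themselves:
print(bytes([ord("a") ^ 1]).decode())    # 'a' (0x61) becomes '`' (0x60)
```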

[–] StopSpazzing@lemmy.world 10 points 7 hours ago* (last edited 7 hours ago) (2 children)

Is there a migration tool? If not, it would be awesome to migrate everything, including issues and stuff. Bet even more people would move.

[–] BlameTheAntifa@lemmy.world 6 points 3 hours ago

Codeberg has very good migration tools built in. You need to do one repo at a time, but it can move issues, releases, and everything.

[–] dodos@lemmy.world 2 points 3 hours ago

There are migration tools, but not a good bulk one that I could find. It worked for my repos except for my Unreal Engine fork.
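Codeberg runs Forgejo, whose API (inherited from Gitea) exposes a repository-migration endpoint, so a rough bulk migration can be had by looping over it one repo at a time. This sketch assumes the `POST /api/v1/repos/migrate` endpoint and its `MigrateRepoOptions` field names as documented for Gitea/Forgejo; check them against your instance's API docs before use. The helper names are hypothetical and you supply the repository list yourself.

```python
import json
import urllib.request

# Assumption: Codeberg's Forgejo API root.
FORGEJO_API = "https://codeberg.org/api/v1"

def migration_payload(github_user: str, repo: str, codeberg_user: str,
                      github_token: str) -> dict:
    """Build a migrate request body for one GitHub repo."""
    return {
        "clone_addr": f"https://github.com/{github_user}/{repo}.git",
        "repo_name": repo,
        "repo_owner": codeberg_user,
        "service": "github",
        "auth_token": github_token,  # lets the importer read issues/PRs
        "issues": True,
        "labels": True,
        "milestones": True,
        "pull_requests": True,
        "releases": True,
        "wiki": True,
    }

def migrate_all(repos: list[str], github_user: str, codeberg_user: str,
                github_token: str, codeberg_token: str) -> None:
    """Submit one migration request per repo, sequentially."""
    for repo in repos:
        body = json.dumps(migration_payload(github_user, repo,
                                            codeberg_user, github_token))
        req = urllib.request.Request(
            f"{FORGEJO_API}/repos/migrate",
            data=body.encode(),
            headers={"Content-Type": "application/json",
                     "Authorization": f"token {codeberg_token}"},
            method="POST",
        )
        # urlopen raises HTTPError on a failed import, stopping the loop.
        with urllib.request.urlopen(req) as resp:
            print(repo, resp.status)
```

Usage would look like `migrate_all(["repo-a", "repo-b"], "alice", "alice", gh_token, cb_token)` with your own names and tokens. The requests run one after another because each import already does heavy work on the server side.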

[–] londos@lemmy.world 37 points 11 hours ago (2 children)

Can there be a challenge that actually does some maliciously useful compute? Like make their crawlers mine bitcoin or something.

[–] raspberriesareyummy@lemmy.world 54 points 11 hours ago (6 children)

Did you just use the words "useful" and "bitcoin" in the same sentence? o_O

[–] polle@feddit.org 50 points 10 hours ago (2 children)

The saddest part is, we thought crypto was the biggest waste of energy ever and then the LLMs entered the chat.

[–] 1rre@discuss.tchncs.de 12 points 7 hours ago (3 children)

At least LLMs produce something, even if it's slop, all crypto does is... What does crypto even do again?

[–] Honytawk@feddit.nl 2 points 1 hour ago

It gives people with already too much money a way to invest by gambling without actually helping society.

[–] xiwi@lemmy.dbzer0.com 3 points 3 hours ago (1 children)

Crypto does drug sales and fraud!

[–] echodot@feddit.uk 4 points 2 hours ago

It also makes its fans poorer, which at least is funny, especially since they never learn.

Blockchain m8 gg

[–] kameecoding@lemmy.world 26 points 11 hours ago (4 children)

Bro couldn't even bring himself to mention protein folding because that's too socialist I guess.

[–] andallthat@lemmy.world 11 points 5 hours ago* (last edited 4 hours ago) (1 children)

LLMs can't do protein folding. A specifically trained machine learning model called AlphaFold did. Here's the paper.

Developing, training and fine-tuning that model was a research effort led by two guys who got a Nobel for it. AlphaFold can't hold a conversation or give you hummus recipes; it knows shit about the structure of human language, but it can identify patterns in the domain where it has been specifically and painstakingly trained.

It wasn't "hey ChatGPT, show me how to fold a protein", is all I'm saying, and the "superhuman reasoning capabilities" of current LLMs still fall ridiculously short on much simpler problems.

[–] kameecoding@lemmy.world 1 points 1 hour ago

They can't mine bitcoin either, so technical feasibility wasn't the goal of my reply.

[–] NeilBru@lemmy.world -3 points 2 hours ago* (last edited 2 hours ago) (1 children)

Hey dipshits:

The number of mouth-breathers who think every fucking "AI" is a fucking LLM is too damn high.

  • Not every artificial intelligence is a deep neural network algorithm.
  • Not every deep neural network algorithm is a generative adversarial network.
  • Not every generative adversarial network is a language model.
  • Not every language model is a large language model.

Fucking fart-sniffing twats.

$ ./end-rant.sh

[–] londos@lemmy.world 15 points 10 hours ago* (last edited 10 hours ago)

You're 100% right. I just grasped at the first example I could think of where the crawlers could do free work. Yours is much better. Left is best.

[–] SufferingSteve@feddit.nu 218 points 16 hours ago* (last edited 16 hours ago) (8 children)

There once was a dream of the semantic web, also known as web2. The semantic web could have made the information on webpages easy to ingest, removing so much of the computation required to extract it and thus preventing much of the CPU overhead of AI crawling.

What we got as web2 instead was social media, destroying facts and making people depressed at a never-before-seen rate.

Web3 was about enabling us to securely transfer value between people digitally and without middlemen.

What crypto gave us was fraud, expensive JPGs and scams. The term web is now so eroded that it has lost much of its meaning. The information age gave way to the misinformation age, where everything is fake.
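One place the semantic-web idea did survive is JSON-LD with the schema.org vocabulary: a page can embed machine-readable facts in a `<script type="application/ld+json">` block. A consumer that only needs those facts can pull them out with the standard library instead of rendering and interpreting the whole page. This is a minimal sketch; the class name and sample page are made up.

```python
import json
from html.parser import HTMLParser

class JsonLdExtractor(HTMLParser):
    """Collect parsed JSON from <script type="application/ld+json"> blocks."""

    def __init__(self):
        super().__init__()
        self._buffer = None   # list of text chunks while inside a JSON-LD block
        self.items = []

    def handle_starttag(self, tag, attrs):
        if tag == "script" and dict(attrs).get("type") == "application/ld+json":
            self._buffer = []

    def handle_data(self, data):
        if self._buffer is not None:
            self._buffer.append(data)  # data may arrive in several chunks

    def handle_endtag(self, tag):
        if tag == "script" and self._buffer is not None:
            self.items.append(json.loads("".join(self._buffer)))
            self._buffer = None

page = """<html><head>
<script type="application/ld+json">
{"@context": "https://schema.org", "@type": "SoftwareSourceCode",
 "name": "example-repo", "programmingLanguage": "Rust"}
</script></head><body>...lots of markup...</body></html>"""

parser = JsonLdExtractor()
parser.feed(page)
print(parser.items[0]["name"])
```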

[–] vacuumflower@lemmy.sdf.org 2 points 3 hours ago

Much drama.

I agree about semantic web, but the issue is with all of the Internet. Both its monopoly as the medium of communication, and its architecture.

And if we go semantic for webpages, allowing the clients to construct representation, then we can go further, to separate data from medium, making messages and identities exist in a global space, as they (sort of, need a better solution) do in Usenet.

About the Internet itself being the problem: that's because it's hierarchical, despite appearances, and nobody understands it well. Especially since new systems of this kind are rarely built, to say the least, most people using the Internet don't even think of it as a system. They take it for granted that this is the only paradigm for a global network, and that it's application-neutral, which may not be true.

20 years ago, when I was a kid, people would think about and imagine all kinds of things about the Internet, about the future and about the ways all this could break, and these were normal people, not tech types. One would think that, with time, as it grew bigger and bigger, we wouldn't become more certain about it.

OK, I'm just having an overvalued idea that the Internet is poisoned. Bad sleep, nasty weather, too many sweets eaten. Maybe the movement of packets over IP can somehow give someone free computation, with enough machines under their control, by using counters in the network stack as registers, or maybe something else.

[–] muusemuuse@sh.itjust.works 13 points 11 hours ago

Sounds like it went the same way everything else went. The less money is involved, the more trustworthy it is.

[–] tourist@lemmy.world 48 points 14 hours ago (5 children)

Web3 was about enabling us to securely transfer value between people digitally and without middlemen.

It's ironic that the middlemen showed up anyway and busted all the security of those transfers

You want some bipcoin to buy weed drugs on the slip road? Don't bother figuring out how to set up that wallet shit, come to our nifty token exchange where you can buy and sell all kinds of bipcoins

oh btw every government on the planet showed up and dug through our insecure records. hope you weren't actually buying shroom drugs on the slip road

also we got hacked, you lost all your bipcoins sorry

At least, that's my recollection of events. I was getting my illegal narcotics the old fashioned way.

[–] raspberriesareyummy@lemmy.world 9 points 11 hours ago

also we got hacked, you lost all your bipcoins sorry

aaaaaaaaand - it's gone!

[–] Marshezezz@lemmy.blahaj.zone 64 points 15 hours ago (1 children)

Capitalism is grand, innit. Wait, not grand, I meant to say cancer

[–] Serinus@lemmy.world 1 points 6 hours ago (1 children)

I feel like half of the blame capitalism gets is valid, but the other half is just society. I don't care what kind of system you're under, you're going to have to deal with other people.

Oh, and if you try the system where you don't have to deal with people, that just means other people end up handling you.

[–] amju_wolf@pawb.social 1 points 35 minutes ago

In this case it is purely the fault of the money incentive, though. No one would spend so much effort and computing power on AI if they didn't think it could make them money.

The funniest part, though, is that it's only theoretical anyway; everyone is losing money on it, and they're most likely never gonna make it back.

[–] oeuf@slrpnk.net 37 points 13 hours ago (2 children)

Crazy. DDoS attacks are illegal here in the UK.

[–] BlameTheAntifa@lemmy.world 2 points 2 hours ago

The problem is that hundreds of bad actors doing the same thing independently of one another means it does not qualify as a DDoS attack. Maybe it’s time we start legally restricting bots and crawlers, though.

[–] rdri@lemmy.world 13 points 10 hours ago

So, sue the attackers?

[–] Wispy2891@lemmy.world 10 points 10 hours ago (2 children)

Question: do those artificial stupidity bots want to steal the issues or the code? Because why are they wasting so many resources scraping millions of pages when they could grab everything via SSH (once a month, not 120 times a second)?

[–] lime@feddit.nu 2 points 3 hours ago

they just want all text

[–] Passerby6497@lemmy.world 19 points 10 hours ago

That would require having someone with real intelligence running the scraper.

[–] zifk@sh.itjust.works 81 points 15 hours ago (9 children)

Anubis isn't supposed to be hard to avoid, but expensive to avoid. Not really surprised that a big company might be willing to throw a bunch of cash at it.
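The "expensive, not hard" trade-off comes from proof of work: the client burns CPU searching for a hash with enough leading zeros, while the server checks the answer with a single hash. Below is a simplified Hashcash-style sketch assuming SHA-256 over `challenge + nonce` and a difficulty counted in leading hex zeros; it illustrates the idea and is not Anubis's actual code or input format. Each extra difficulty level multiplies the expected client work by 16, so a crawler operator with enough hardware can simply pay it.

```python
import hashlib
import secrets

def solve(challenge: str, difficulty: int) -> int:
    """Brute-force a nonce whose SHA-256(challenge + nonce) hex digest
    starts with `difficulty` zeros. Expected work: 16**difficulty hashes."""
    target = "0" * difficulty
    nonce = 0
    while True:
        digest = hashlib.sha256(f"{challenge}{nonce}".encode()).hexdigest()
        if digest.startswith(target):
            return nonce
        nonce += 1

def verify(challenge: str, nonce: int, difficulty: int) -> bool:
    """Checking a claimed answer costs the server a single hash."""
    digest = hashlib.sha256(f"{challenge}{nonce}".encode()).hexdigest()
    return digest.startswith("0" * difficulty)

challenge = secrets.token_hex(16)        # server issues a fresh random challenge
nonce = solve(challenge, difficulty=4)   # ~65,536 hashes on average for the client
assert verify(challenge, nonce, difficulty=4)
```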

[–] PhilipTheBucket@piefed.social 90 points 16 hours ago (12 children)

I feel like at some point it needs to be an active response. Phase 1 is teergrube-style slowness to muck up the crawlers, with warnings in the headers and response body; phase 2 is a DDoS in response, or maybe just a drone strike to cut out the middleman. Once you're actively evading Anubis, fuckin' game on.
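The phase-1 teergrube can be sketched with asyncio: answer immediately with a warning header and body, then dribble out one byte at a time so the crawler's connection stays tied up. This is a minimal sketch; `X-Bot-Warning` is a made-up header name and the delay and byte limits are arbitrary. Because each stalled connection is just a sleeping coroutine, holding thousands of them costs the server very little.

```python
import asyncio

# Hypothetical warning header; not a standard HTTP header.
WARNING_HEADER = b"X-Bot-Warning: automated scraping detected; slow lane engaged\r\n"

async def tarpit(reader: asyncio.StreamReader, writer: asyncio.StreamWriter,
                 delay: float = 10.0, max_bytes: int = 1000) -> None:
    """Send headers and a warning, then one byte every `delay` seconds."""
    await reader.readline()  # consume the request line; ignore the rest
    writer.write(b"HTTP/1.1 200 OK\r\nContent-Type: text/plain\r\n"
                 + WARNING_HEADER + b"\r\n"
                 + b"This endpoint throttles crawlers that ignore robots.txt.\n")
    try:
        for _ in range(max_bytes):
            writer.write(b".")
            await writer.drain()
            await asyncio.sleep(delay)
    except (ConnectionResetError, BrokenPipeError):
        pass  # the crawler gave up, which is the point
    finally:
        writer.close()

async def main(host: str = "127.0.0.1", port: int = 8081) -> None:
    server = await asyncio.start_server(tarpit, host, port)
    async with server:
        await server.serve_forever()
```

Start it with `asyncio.run(main())` and route suspected bots to that port from your reverse proxy.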

[–] turbowafflz@lemmy.world 97 points 16 hours ago (6 children)

I think the best thing to do is not to block them when they're detected but to poison them instead. Feed them tons of text generated by tiny old language models; it's harder to detect and also messes up their training, making the models less reliable. Of course you would want to do that on a separate server so it doesn't slow down real users, but you probably don't need much power, since the scrapers probably don't really care about speed.
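As a stand-in for a "tiny old language model", here is a word-bigram Markov chain, about the smallest text generator there is. It is a swapped-in illustration rather than anything the commenter named, and the function names and sample corpus are hypothetical. It produces locally plausible, globally meaningless text at negligible CPU cost, which fits the idea of serving it from a cheap separate box.

```python
import random
from collections import defaultdict

def train_bigrams(corpus: str) -> dict[str, list[str]]:
    """Map each word to the list of words that followed it in the corpus."""
    words = corpus.split()
    table = defaultdict(list)
    for current, nxt in zip(words, words[1:]):
        table[current].append(nxt)
    return dict(table)

def babble(table: dict[str, list[str]], length: int, seed: int = 0) -> str:
    """Random-walk the bigram table to produce plausible-looking nonsense."""
    rng = random.Random(seed)
    word = rng.choice(list(table))
    out = [word]
    for _ in range(length - 1):
        followers = table.get(word)
        if not followers:               # dead end: jump to a random word
            word = rng.choice(list(table))
        else:
            word = rng.choice(followers)
        out.append(word)
    return " ".join(out)

corpus = "the crawler reads the page and the page feeds the crawler more text"
table = train_bigrams(corpus)
print(babble(table, 12, seed=1))
```

In practice you would train it on a much larger corpus so the output varies more from page to page.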

[–] xthexder@l.sw0.com 52 points 16 hours ago

I love catching bots in tarpits, it's actually quite fun
