Is there Nightshade but for text and code? Maybe my source headers should include a bunch of special characters that then give a prompt injection. And sprinkle some nonsensical code comments before the real code comment.
Maybe like a bunch of white text at 2pt?
Not visible to the user, but fully readable by crawlers.
I think the issue is that text carries comparatively very little information, so you can't just inject invisible changes by changing the least significant bits - you'd need to change the actual phrasing/spelling of your text/code, and that'd be noticeable.
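To make the least-significant-bit point concrete, here is a minimal sketch. It only illustrates the bit-flipping argument in the comment above; it is not how Nightshade itself works, and the pixel values and payload are made up for the example.

```python
def embed_lsb(values, bits):
    """Overwrite the lowest bit of each 8-bit value with one payload bit."""
    out = list(values)
    for i, bit in enumerate(bits):
        out[i] = (out[i] & ~1) | bit
    return out


def extract_lsb(values, count):
    """Read the payload back from the lowest bit of the first `count` values."""
    return [v & 1 for v in values[:count]]


# Hypothetical 8-bit pixel intensities and a hypothetical payload.
pixels = [200, 13, 77, 254, 128, 65, 90, 31]
payload = [1, 0, 1, 1, 0, 0, 1, 0]

stego = embed_lsb(pixels, payload)
# Each pixel moves by at most 1 out of 255, which the eye will not notice.
max_pixel_change = max(abs(a - b) for a, b in zip(pixels, stego))

# The same trick on text: flipping the lowest bit of each character code
# produces different characters, so the change is immediately visible.
word = "cat"
flipped = "".join(chr(ord(ch) ^ 1) for ch in word)

print(max_pixel_change, repr(word), repr(flipped))
```

In a pixel the lowest bit is noise; in a character it is part of the meaning, which is why the same approach doesn't carry over.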
Is there a migration tool? If not, it would be awesome to migrate everything, including issues and stuff. Bet even more people would move.
Codeberg has very good migration tools built in. You need to do one repo at a time, but it can move issues, releases, and everything.
There are migration tools, but not a good bulk one that I could find. It worked for my repos except for my unreal engine fork.
Can there be a challenge that actually does some maliciously useful compute? Like make their crawlers mine bitcoin or something.
Did you just use the words "useful" and "bitcoin" in the same sentence? o_O
The saddest part is, we thought crypto was the biggest waste of energy ever and then the LLMs entered the chat.
At least LLMs produce something, even if it's slop. All crypto does is... What does crypto even do again?
It gives people with already too much money a way to invest by gambling without actually helping society.
Crypto does drug sales and fraud!
It also makes its fans poorer, which at least is funny, especially since they never learn.
Blockchain m8 gg
Bro couldn't even bring himself to mention protein folding because that's too socialist I guess.
LLMs can't do protein folding. A specifically-trained Machine Learning model called AlphaFold did. Here's the paper.
Developing, training and fine-tuning that model was a research effort led by two guys who got a Nobel for it. AlphaFold can't do conversation or give you hummus recipes. It knows shit about the structure of human language, but it can identify patterns in the domain where it has been specifically and painstakingly trained.
It wasn't "hey ChatGPT, show me how to fold a protein" is all I'm saying, and the "superhuman reasoning capabilities" of current LLMs are still falling ridiculously short on much simpler problems.
They can't bitcoin mine either, so technical feasibility wasn't the goal of my reply
Hey dipshits:
The number of mouth-breathers who think every fucking "AI" is a fucking LLM is too damn high.
- Every artificial intelligence is not a deep neural network algorithm.
- Every deep neural network algorithm is not a generative adversarial network.
- Every generative adversarial network is not a language model.
- Every language model is not a large language model.
Fucking fart-sniffing twats.
$ ./end-rant.sh
You're 100% right. I just grasped at the first example I could think of where the crawlers could do free work. Yours is much better. Left is best.
There once was a dream of the semantic web, also known as web2. The semantic web could have made the information on webpages easy to ingest, removing so much of the computation required to extract it, and thus preventing much of the CPU overhead of AI crawling.
What we got as web2 instead was social media, destroying facts and making people depressed at a never-before-seen rate.
Web3 was about enabling us to securely transfer value between people digitally and without middlemen.
What crypto gave us was fraud, expensive jpgs and scams. The term web is now so eroded that it has lost much of its meaning. The information age gave way to the misinformation age, where everything is fake.
Much drama.
I agree about semantic web, but the issue is with all of the Internet. Both its monopoly as the medium of communication, and its architecture.
And if we go semantic for webpages, allowing the clients to construct representation, then we can go further, to separate data from medium, making messages and identities exist in a global space, as they (sort of, need a better solution) do in Usenet.
About the Internet itself being the problem - that's because it's hierarchical, despite appearances, and nobody understands it well. Especially since new systems of this kind are not being built often, to say the least, so the majority of people using the Internet don't even think about it as a system. They take it for granted that this is the only paradigm for the global network, and that it's application-neutral, which may not be true.
20 years ago, when I was a kid, people would think and imagine all kinds of things about the Internet and about the future and about ways all this can break, and these were normal people, not tech types, and one would think with time we wouldn't become more certain, as it becomes bigger and bigger.
OK, I'm just having an overvalued idea that the Internet is poisoned. Bad sleep, nasty weather, too much sweets eaten. Maybe that movement of packets on the IP protocol can somehow give someone free computation, with enough machines under their control, by using counters in the network stack as registers, or maybe something else.
Sounds like it went the same way everything else went. The less money is involved, the more trustworthy it is.
Web3 was about enabling us to securely transfer value between people digitally and without middlemen.
It's ironic that the middlemen showed up anyway and busted all the security of those transfers
You want some bipcoin to buy weed drugs on the slip road? Don't bother figuring out how to set up that wallet shit, come to our nifty token exchange where you can buy and sell all kinds of bipcoins
oh btw every government on the planet showed up and dug through our insecure records. hope you weren't actually buying shroom drugs on the slip rod
also we got hacked, you lost all your bipcoins sorry
At least, that's my recollection of events. I was getting my illegal narcotics the old fashioned way.
also we got hacked, you lost all your bipcoins sorry
aaaaaaaaand - it's gone!
Capitalism is grand, innit. Wait, not grand, I meant to say cancer
I feel like half of the blame capitalism gets is valid, but the other half is just society. I don't care what kind of system you're under, you're going to have to deal with other people.
Oh, and if you try the system where you don't have to deal with people, that just means other people end up handling you.
In this case it is purely the fault of the money incentive though. No one would spend so much effort and computation power on AI if they didn't think it could make them money.
The funniest part, though, is that it's only theoretical anyway: everyone is only losing money on it and they're most likely never gonna make it back.
Crazy. DDoS attacks are illegal here in the UK.
The problem is that hundreds of bad actors doing the same thing independently of one another means it does not qualify as a DDoS attack. Maybe it’s time we start legally restricting bots and crawlers, though.
So, sue the attackers?
Question: do those artificial stupidity bots want to steal the issues or the code? Because why are they wasting a lot of resources scraping millions of pages when they can steal everything via SSH (once a month, not 120 times a second)?
they just want all text
That would require having someone with real intelligence running the scraper.
Anubis isn't supposed to be hard to avoid, but expensive to avoid. Not really surprised that a big company might be willing to throw a bunch of cash at it.
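For anyone wondering what "expensive to avoid" looks like in practice, here is a minimal proof-of-work sketch. It is a generic hash puzzle and not Anubis's actual implementation; the challenge string, the difficulty value, and the function names are all assumptions made for illustration.

```python
import hashlib
import itertools


def solve(challenge: str, difficulty: int) -> int:
    """Client side: brute-force a nonce whose SHA-256 hex digest
    starts with `difficulty` zero characters. Cost grows ~16x per level."""
    prefix = "0" * difficulty
    for nonce in itertools.count():
        digest = hashlib.sha256(f"{challenge}{nonce}".encode()).hexdigest()
        if digest.startswith(prefix):
            return nonce


def verify(challenge: str, nonce: int, difficulty: int) -> bool:
    """Server side: a single hash is enough to check the client's work."""
    digest = hashlib.sha256(f"{challenge}{nonce}".encode()).hexdigest()
    return digest.startswith("0" * difficulty)


# Hypothetical challenge; a real server would issue a fresh random one per visitor.
nonce = solve("example-challenge", 3)
print(nonce, verify("example-challenge", nonce, 3))
```

The asymmetry is the whole point: one visitor barely notices the work, the server checks it with one hash, and a crawler hitting millions of pages pays the cost millions of times. A company with enough money can still just pay it.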
I feel like at some point it needs to be active response. Phase 1 is a teergrube type of slowness to muck up the crawlers, with warnings in the headers and response body, and then phase 2 is a DDOS in response or maybe just a drone strike and cut out the middleman. Once you're actively evading Anubis, fuckin' game on.
I think the best thing to do is to not block them when they're detected but poison them instead. Feed them tons of text generated by tiny old language models: it's harder to detect, and it also messes up their training and makes the models less reliable. Of course you would want to do that on a separate server so it doesn't slow down real users, but you probably don't need much power, since the scrapers probably don't really care about the speed.
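A rough sketch of the poisoning idea, with a word-level Markov chain standing in for the "tiny old language model" (that swap is my assumption, chosen because it needs almost no compute). The corpus and function names are made up for the example.

```python
import random
from collections import defaultdict


def build_chain(text):
    """Map each word to the list of words that follow it in the corpus."""
    words = text.split()
    chain = defaultdict(list)
    for current, following in zip(words, words[1:]):
        chain[current].append(following)
    return chain


def babble(chain, length, seed=None):
    """Generate `length` words of plausible-looking nonsense."""
    rng = random.Random(seed)
    starts = sorted(chain)  # sorted so a fixed seed gives repeatable output
    word = rng.choice(starts)
    out = [word]
    for _ in range(length - 1):
        followers = chain.get(word)
        # Dead end (word only seen at the end of the corpus): restart anywhere.
        word = rng.choice(followers) if followers else rng.choice(starts)
        out.append(word)
    return " ".join(out)


# Hypothetical corpus; in practice you'd use something much larger.
corpus = (
    "the crawler fetches the page and the page links to the issue "
    "and the issue links to the commit and the commit links to the page"
)
chain = build_chain(corpus)
print(babble(chain, 20, seed=1))
```

Output like this is locally grammatical but meaningless, and it costs almost nothing per page to serve.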
I love catching bots in tarpits, it's actually quite fun
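If you want to try the tarpit (teergrube) approach, the core of it is just sending the response very slowly. Here is a minimal sketch of that drip-feed, independent of any web framework; the function name, chunk size, and delay are assumptions, and the `sleep` parameter exists only so the demo doesn't actually wait.

```python
import time


def drip(body: bytes, chunk_size: int = 1, delay: float = 0.5, sleep=time.sleep):
    """Yield `body` in tiny chunks, pausing before each one.
    A crawler that waits for the full response stays tied up the whole time."""
    for start in range(0, len(body), chunk_size):
        sleep(delay)
        yield body[start:start + chunk_size]


# Demo with a stub that records the pauses instead of really sleeping.
waits = []
chunks = list(drip(b"Disallowed", chunk_size=4, delay=0.5, sleep=waits.append))
print(chunks, sum(waits))
```

Hooked up to a streaming response, each connection costs the server next to nothing while the bot sits there waiting.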