I know this is the most ridiculous idea, but we need to pack our bags and make a new internet protocol, to separate us from the rest, at least for a while. Either way, most “modern” internet things (looking at you, JavaScript) are not modern at all, and starting over might help more than any of us could imagine.
Technology
This is a most excellent place for technology news and articles.
Our Rules
- Follow the lemmy.world rules.
- Only tech related news or articles.
- Be excellent to each other!
- Mod approved content bots can post up to 10 articles per day.
- Threads asking for personal tech support may be deleted.
- Politics threads may be removed.
- No memes allowed as posts, OK to post as comments.
- Only approved bots from the list below, this includes using AI responses and summaries. To ask if your bot can be added please contact a mod.
- Check for duplicates before posting, duplicates may be removed
- Accounts 7 days and younger will have their posts automatically removed.
Approved Bots
Like Gemini?
From official Website:
Gemini is a new internet technology supporting an electronic library of interconnected text documents. That's not a new idea, but it's not old fashioned either. It's timeless, and deserves tools which treat it as a first class concept, not a vestigial corner case. Gemini isn't about innovation or disruption, it's about providing some respite for those who feel the internet has been disrupted enough already. We're not out to change the world or destroy other technologies. We are out to build a lightweight online space where documents are just documents, in the interests of every reader's privacy, attention and bandwidth.
Yep! That was exactly the protocol on my mind. One thing, though, is that the Fediverse would need to be ported to Gemini, or at least for a new protocol to be created for Gemini.
Is there nightshade but for text and code? Maybe my source headers should include a bunch of special characters that then give a prompt injection. And sprinkle some nonsensical code comments before the real code comment.
Maybe like a bunch of white text at 2pt?
Not visible to the user, but fully readable by crawlers.
I think the issue is that text uses comparatively very little information, so you can't just inject invisible changes by changing the least insignificant bits - you'd need to change the actual phrasing/spelling of your text/code, and that'd be noticable.
Is there a migration tool? If not would be awesome to migrate everything including issues and stuff. Bet even more people woild move.
Codeberg has very good migration tools built in. You need to do one repo at a time, but it can move issues, releases, and everything.
There are migration tools, but not a good bulk one that I could find. It worked for my repos except for my unreal engine fork.
Can there be a challenge that actually does some maliciously useful compute? Like make their crawlers mine bitcoin or something.
Did you just say use the words "useful" and "bitcoin" in the same sentence? o_O
The saddest part is, we thought crypto was the biggest waste of energy ever and then the LLMs entered the chat.
At least LLMs produce something, even if it's slop, all crypto does is... What does crypto even do again?
It gives people with already too much money a way to invest by gambling without actually helping society.
Crypto does drug sales and fraud!
It also makes it's fans poorer, which at least is funny, especially since they never learn
Blockchain m8 gg
Bro couldn't even bring himself to mention protein folding because that's too socialist I guess.
LLMs can't do protein folding. A specifically-trained Machine Learning model called AlphaFold did. Here's the paper.
Developing, training and fine tuning that model was a research effort led by two guys who got a Nobel for it. Alphafold can't do conversation or give you hummus recipes, it knows shit about the structure of human language but can identify patterns in the domain where it has been specifically and painstakingly trained.
It wasn't "hey chatGPT, show me how to fold a protein" is all I'm saying and the "superhuman reasoning capabilities" of current LLMs are still falling ridiculously short of much simpler problems.
The crawlers for LLM are not themselves LLMs.
They can't bitcoin mine either, so technical feasibility wasn't the goal of my reply
You're 100% right. I just grasped at the first example I could think of where the crawlers could do free work. Yours is much better. Left is best.
Hey dipshits:
The number of mouth-breathers who think every fucking "AI" is a fucking LLM is too damn high.
AlphaFold is not a language model. It is specifically designed to predict the 3D structure of proteins, using a neural network architecture that reasons over a spatial graph of the protein's amino acids.
- Every artificial intelligence is not a deep neural network algorithm.
- Every deep neural network algorithm is not a generative adversarial network.
- Every generative adversarial network is not a language model.
- Every language model is not a large language model.
Fucking fart-sniffing twats.
$ ./end-rant.sh
There once was a dream of the semantic web, also known as web2. The semantic web could have enabled easy to ingest information of webpages, removing soo much of the computation required to get the information. Thus preventing much of the AI crawling cpu overhead.
What we got as web2 instead was social media. Destroying facts and making people depressed at a newer before seen rate.
Web3 was about enabling us to securely transfer value between people digitally and without middlemen.
What crypto gave us was fraud, expensive jpgs and scams. The term web is now even so eroded that it has lost much of its meaning. The information age gave way for the misinformation age, where everything is fake.
Much drama.
I agree about semantic web, but the issue is with all of the Internet. Both its monopoly as the medium of communication, and its architecture.
And if we go semantic for webpages, allowing the clients to construct representation, then we can go further, to separate data from medium, making messages and identities exist in a global space, as they (sort of, need a better solution) do in Usenet.
About the Internet itself being the problem - that's because it's hierarchical, despite appearances, and nobody understands it well. Especially since new systems of this kind are not being built often, to say the least, so the majority of people using the Internet doesn't even think about it as a system. It takes it for given that this is the only paradigm for the global network. And that it's application-neutral, which may not be true.
20 years ago, when I was a kid, people would think and imagine all kinds of things about the Internet and about the future and about ways all this can break, and these were normal people, not tech types, and one would think with time we wouldn't become more certain, as it becomes bigger and bigger.
OK, I'm just having an overvalued idea that the Internet is poisoned. Bad sleep, nasty weather, too much sweets eaten. Maybe that movement of packets on the IP protocol can somehow give someone free computation, with enough machines under their control, by using counters in the network stack as registers, or maybe something else.
Sound like it went the same way everything else went. The less money is involved the more trustworthy it is.
Web3 was about enabling us to securely transfer value between people digitally and without middlemen.
It's ironic that the middlemen showed up anyway and busted all the security of those transfers
You want some bipcoin to buy weed drugs on the slip road? Don't bother figuring out how to set up that wallet shit, come to our nifty token exchange where you can buy and sell all kinds of bipcoins
oh btw every government on the planet showed up and dug through our insecure records. hope you weren't actually buying shroom drugs on the slip rod
also we got hacked, you lost all your bipcoins sorry
At least, that's my recollection of events. I was getting my illegal narcotics the old fashioned way.
Crazy. DDoS attacks are illegal here in the UK.
The problem is that hundreds of bad actors doing the same thing independently of one another means it does not qualify as a DDoS attack. Maybe it’s time we start legally restricting bots and crawlers, though.
So, sue the attackers?
Anubis isn't supposed to be hard to avoid, but expensive to avoid. Not really surprised that a big company might be willing to throw a bunch of cash at it.
Question: those artificial stupidity bots want to steal the issues or want to steal the code? Because why they're wasting a lot of resources scraping millions of pages when they can steal everything via SSH (once a month, not 120 times a second)
they just want all text
That would require having someone with real intelligence running the scraper.