this post was submitted on 29 Aug 2025
Technology
Never heard of Kagi before, article convinced me I don't wanna use it anyways...lol.
Wasn't the original Google search algorithm published in a research paper? Maybe someone with more domain knowledge than I could help me understand this: is there any obstacle to starting a search engine today that just works like that? No AI, no login, no crazy business...just something nice and rudimentary. I do understand all the ways that system could be gamed, but given Google/Bing etc.'s dominance, I feel like a smaller search engine doesn't really need to worry about people trying to game its algorithm.
The basic algorithm is quite straightforward; it's the scale and the edge cases that make it hard to compete.
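For reference, the paper in question described PageRank (Brin & Page, 1998), and the core of it really does fit in a few lines: repeatedly redistribute each page's rank across its outgoing links, with a damping factor. A toy sketch in Python, with an invented four-page link graph:

```python
# Toy PageRank via power iteration. The link graph is made up for
# illustration; real engines run this over billions of pages.
links = {
    "a": ["b", "c"],
    "b": ["c"],
    "c": ["a"],
    "d": ["c"],
}
pages = list(links)
n = len(pages)
damping = 0.85  # probability of following a link vs. jumping randomly
rank = {p: 1.0 / n for p in pages}

for _ in range(50):
    new = {p: (1 - damping) / n for p in pages}
    for p, outs in links.items():
        share = rank[p] / len(outs)  # split this page's rank among its links
        for q in outs:
            new[q] += damping * share
    rank = new

# "c" ends up ranked highest: three of the four pages link to it.
best = max(rank, key=rank.get)
```

The hard part, as the parent comment says, isn't this loop: it's crawling, storing, and updating the graph at web scale.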
"Ideally", from a pure data perspective, everybody would have all the data and all the processing power to search through it on their own with whatever algorithm they prefer, like a massive P2P network of per-person datacenters.
Back in reality, that's wildly impractical. So we get a few search engines, with huge entry costs, offering more value the larger they get... which leads to lock-in, people trying to game their algorithms, filtering, monetization, and all the other issues.
Hrrmm. Webrings it is. But also, the search engine problem seems like one calling out for a creative solution. I'll try to look into it some more I guess. Maybe there's a way that you could distribute which peer indexes which sites. I would even be fine sharing some local processing power when I browse to run a local page ranking that then gets shared with peers...maybe it could be done in a way where attributes of the page are measured by prevalence and then the relative positive or negative weighting of those attributes could be adjusted per-user.
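The per-user weighting part of that idea is easy to sketch at toy scale: peers share objective per-page attribute measurements, and each user ranks results locally with their own positive or negative weights. Everything below (attribute names, sites, values) is invented for illustration:

```python
# Hypothetical per-user ranking: shared attribute measurements,
# personal weights. All data here is made up.
pages = {
    "blog.example/post":  {"ads": 0.1, "text_length": 0.8, "trackers": 0.0},
    "spam.example/win":   {"ads": 0.9, "text_length": 0.2, "trackers": 0.9},
    "docs.example/guide": {"ads": 0.0, "text_length": 0.9, "trackers": 0.1},
}

# One user's preferences: penalize ads and trackers, reward long text.
user_weights = {"ads": -1.0, "text_length": 0.5, "trackers": -2.0}

def score(attrs, weights):
    # Weighted sum of measured attributes; unknown attributes count as 0.
    return sum(weights.get(k, 0.0) * v for k, v in attrs.items())

ranked = sorted(pages, key=lambda p: score(pages[p], user_weights),
                reverse=True)
# The ad- and tracker-heavy page sinks to the bottom for this user.
```

The open problem isn't this scoring step, it's getting trustworthy attribute measurements from untrusted peers, which the reply below gets into.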
Hope it's not annoying for me to spitball ideas in random Lemmy comments.
There is an experimental distributed open source search engine: https://dawnsearch.org/
It has a series of issues of its own, though.
Per-user weighting was out of the reach of hardware 20 years ago... and is still out of the reach of anything other than very large distributed systems. No single machine is currently capable of holding even the index for the ~200 million active websites, much less the ~800 billion webpages in the Wayback Machine. Multiple page attributes... yes, that would be great, but again things escalate quickly. The closest "hope" would be some sort of LLM on the scale of hundreds of trillions of parameters... and even that might fall short.
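A quick back-of-envelope on those numbers makes the point; the bytes-per-page figure here is a rough assumption about a compressed inverted-index footprint, not a measurement:

```python
# Rough scale estimate for indexing Wayback-Machine-sized corpora.
# bytes_per_page is an assumed compressed index footprint, not measured.
pages = 800e9            # ~800 billion webpages (from the comment above)
bytes_per_page = 2_000   # assumption: ~2 KB of index data per page
total_pb = pages * bytes_per_page / 1e15  # petabytes
# ~1.6 PB even under this optimistic assumption: far beyond one machine.
```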
Distributed indexes, with queries getting shared among peers, mean that privacy goes out the window. Homomorphic encryption could potentially help with that, but that requires even more hardware.
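Homomorphic encryption isn't the only tool here: classic two-server private information retrieval (PIR) hides the query with plain XOR, at the cost of requiring two non-colluding index servers. A toy sketch with an invented four-record database:

```python
# Toy two-server PIR: the client fetches record i without either
# server learning i, as long as the servers don't collude.
import secrets
from functools import reduce

db = [b"cats", b"rust", b"kagi", b"p2p!"]  # equal-length records
n = len(db)

def xor(a, b):
    return bytes(x ^ y for x, y in zip(a, b))

def server_answer(query_bits):
    # Each server XORs together the records its query selects.
    selected = [rec for rec, bit in zip(db, query_bits) if bit]
    return reduce(xor, selected, b"\x00" * 4)

i = 2  # the record the client secretly wants
q1 = [secrets.randbelow(2) for _ in range(n)]  # uniformly random: leaks nothing
q2 = q1[:]
q2[i] ^= 1  # differs from q1 only at position i
# All shared selections cancel in the XOR; only db[i] survives.
record = xor(server_answer(q1), server_answer(q2))
```

The catch, as with homomorphic encryption, is cost: every query touches the whole database on both servers.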
TL;DR: it's being researched, but it's hard.
Makes me wonder if something similar to the Veilid architecture could solve some of these problems.