It’s really fast - nice job! Can you elaborate on the ranking algorithm you are using? It seems that this will become more important as you index more pages.
Thanks! A really simple one for now: number of matching terms, and then prioritising matches earlier in the result string. But this is something I'm looking forward to working on properly when I get a bigger index.
I also want to incorporate a community aspect to ranking, allowing upvoting and downvoting of results. I've not yet figured out how to reconcile this idea with not having any tracking though. Perhaps a separate interface for logged-in users.
One ambitious project I've kept coming back to over the years is search (and social sites / forums) where the votes, tags, and flags form a public dataset, and users can adjust their own weights (or even the ranking algorithm) to construct a "web of trust" that yields favorable results.
This way you can escape spammers, powertripping moderators, and the tyranny of the hive mind; it doesn't matter if there's a large population of spammers, shills, and idiots upvoting crap because you set their weights to zero (or negative). In fact, that becomes a feature, because by upvoting crap, they generate a crap filter for you. If the weights are also public, then you can automatically & algorithmically seed your web of trust (simplest algo for the sake of example: give positive weight to identities who upvoted and downvoted the same way you did), but you could still override the algo with manually set values if it gave too much weight to bad actors.
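To make the "simplest algo" concrete, here is a rough sketch under stated assumptions: votes are +1/-1 per item, a weight is the average agreement on items both of you voted on, and the names `seed_weights` and `score` are hypothetical. A real system would need something more robust than this.

```python
Votes = dict[str, int]  # item id -> +1 (upvote) or -1 (downvote)


def seed_weights(mine: Votes, others: dict[str, Votes]) -> dict[str, float]:
    """Seed a weight in [-1, 1] per identity from agreement with my votes."""
    weights: dict[str, float] = {}
    for identity, theirs in others.items():
        shared = mine.keys() & theirs.keys()
        if not shared:
            weights[identity] = 0.0  # no overlap: no opinion either way
            continue
        # +1 for each item voted the same way, -1 for each disagreement.
        agreement = sum(mine[item] * theirs[item] for item in shared)
        weights[identity] = agreement / len(shared)
    return weights


def score(item: str, others: dict[str, Votes], weights: dict[str, float]) -> float:
    """Weighted sum of everyone's votes on one item."""
    return sum(
        weights.get(identity, 0.0) * votes.get(item, 0)
        for identity, votes in others.items()
    )


mine = {"a": 1, "b": -1}
others = {
    "friend": {"a": 1, "b": -1, "c": 1},
    "spammer": {"a": -1, "b": 1, "spam": 1},
    "stranger": {"z": 1},
}
weights = seed_weights(mine, others)
# Manual override of the seeded value, as described above.
weights["stranger"] = 0.5
```

With this data the spammer ends up with weight -1.0, so their upvote on "spam" pushes its score down to -1.0. That is the crap filter in action.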
Obviously this has privacy implications (all your votes and your network become public), and can generate a large dataset (performance challenge, how do you distribute it / give access to it?), so it's far from a trivial project. For the privacy angle, I'd start by keeping identities pseudonymous (e.g. a public key or random id -- you don't know who's behind the identity unless they blurt it out). Furthermore, I think it'd be useful to automagically split your actions across multiple identities so it's harder to link all your activity. I think the system should also explicitly allow switching identities, for privacy but also because sometimes you just want a different "filter bubble" which helps tailor the content you get to what you're looking for. Maybe the network that yields best shopping results isn't the same network that yields best cooking recipes or technical docs.
With this model, everyone is a moderator and everyone can defer moderation to identities they trust, but neither the hive mind nor individuals have the ultimate power to dictate what you see. If you want to read spam or conspiracy theories, you just switch to your identity which upvotes such content and has positive weights towards other identities with similar votes.
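A tiny sketch of the identity-switching idea, assuming each identity is just a named table of weights over the same public vote data. The `Identity` class, `rank_items`, and the sample voters are all made up for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Identity:
    """A pseudonymous identity with its own trust weights (hypothetical)."""
    name: str
    weights: dict[str, float] = field(default_factory=dict)


def rank_items(identity: Identity, public_votes: dict[str, dict[str, int]]) -> list[str]:
    """Rank items by the weighted sum of public votes, as seen by one identity."""
    totals: dict[str, float] = {}
    for voter, votes in public_votes.items():
        weight = identity.weights.get(voter, 0.0)  # unknown voters count for nothing
        for item, vote in votes.items():
            totals[item] = totals.get(item, 0.0) + weight * vote
    return sorted(totals, key=lambda item: totals[item], reverse=True)


# The same public votes, viewed through two different identities.
public_votes = {
    "chef": {"risotto recipe": 1, "gpu review": -1},
    "gearhead": {"gpu review": 1, "risotto recipe": -1},
}
cooking = Identity("cooking", {"chef": 1.0})
shopping = Identity("shopping", {"gearhead": 1.0})

cooking_view = rank_items(cooking, public_votes)
shopping_view = rank_items(shopping, public_votes)
```

Nobody's votes change when you switch; only the weights do, so the same dataset yields a different ordering per identity.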
I doubt you're going to build this; I doubt people want this. I certainly want it. Maybe one day I'll try, but it probably won't work well without network effects (=reasonably large quantity of users). I just wanted to let you know about the idea because your project is inspiring and inspiring things inspire me to share ideas.. :)
This sounds like the sorting hat algorithm (TikTok) applied to query search engines. If there could be a way to visualize your recommendations network and switch to others without logging out, this could work really well. But a lot of research needs to be done, and it's in the interest of big actors to keep users blind inside their webs.
This topic is interesting to me because I'm building a faster search engine for programming queries and trying to solve the core issues that got us stuck with crappy engines.
I think there's something worth looking into here. Great insight about the "crap filter". What would you say is the most technically ambitious part of this project? I'd reckon that it is the resources needed to constantly pull updated content from all monitored pages at an acceptable frequency, e.g. daily/hourly/minute-ly.