On that note, I wonder if you could build a secondary search engine at home that only indexes the small part of the web you typically use?
This immediately sounds useless, but I feel[1] like I primarily search a handful of sites. If I can index the sites I mainly care about, and fall back to "normal" engines automatically or with a flag, then for the queries I care about I suddenly get 100% accurate[2] and 100% private searches.
This seems like a good idea to me. My only question is how much bandwidth and storage are needed to index something like Reddit. If it's too much to run on a moderate home computer, then what's the use?
[1]: I don't have any data to back up this claim, though. Purely a hunch.
[2]: Edit: well, I guess "100% accurate" depends on the search implementation, but it's still 100% private :)
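The local-first-with-fallback routing described above is simple enough to sketch. Everything here is hypothetical: `local_search` stands in for whatever local index you build, and `web_search` for handing the query off to a normal engine.

```python
# Toy sketch of local-first search routing. The local index and the
# fallback engine are both stand-ins, not real implementations.

def local_search(query):
    # Pretend local index covering a handful of sites you care about.
    fake_index = {
        "python asyncio": ["https://docs.python.org/3/library/asyncio.html"],
    }
    return fake_index.get(query, [])

def web_search(query):
    # Stand-in for falling back to a "normal" engine.
    return ["https://search.example/?q=" + query]

def search(query, force_web=False):
    """Try the private local index first; fall back to the web
    automatically when it has no results, or on demand via a flag."""
    if not force_web:
        results = local_search(query)
        if results:
            return results
    return web_search(query)
```

The flag mirrors the "automatically or with a flag" idea: queries the local index can answer never leave the machine, and everything else degrades gracefully to the public engine.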
If you only use a handful of sites, you can also solve this problem by building yourself a site that handles specific types of queries.
I'm doing something like this by building a site to search lectures (https://www.findlectures.com). The "fallback" for me is just to use YouTube, etc., instead.
Rather than trying to index everything on Reddit or YouTube, it's a lot easier if you just index the "good" parts, since there is a lot of low-quality material either way. I think you're more limited by bandwidth in what you can get into your own index.
A search index is basically a mapping of hashed search tokens -> URLs, so it can be pretty efficient to store locally (e.g. for a video search engine, you only need the unique words in the transcript/title, not the entire video).
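The token -> URLs mapping can be sketched in a few lines. This is my own illustration of the idea, not anyone's actual implementation; hashing with crc32 keeps keys as small fixed-size ints, at the cost of an occasional collision (a rare false-positive URL in the results).

```python
# Minimal inverted index: hashed token -> set of URLs.
from collections import defaultdict
import zlib

def tokenize(text):
    # Lowercase and split on whitespace; a real engine would also
    # strip punctuation, stem words, drop stopwords, etc.
    return set(text.lower().split())

def token_hash(token):
    # crc32 turns each token into a small 32-bit key.
    return zlib.crc32(token.encode("utf-8"))

index = defaultdict(set)

def add_document(url, text):
    # Only the unique words go in -- e.g. a video's transcript/title,
    # never the video itself.
    for tok in tokenize(text):
        index[token_hash(tok)].add(url)

def search(query):
    # Return URLs containing every query token (AND semantics).
    sets = [index.get(token_hash(t), set()) for t in tokenize(query)]
    return set.intersection(*sets) if sets else set()

add_document("https://example.com/talk1", "Intro to distributed systems lecture")
add_document("https://example.com/talk2", "Lecture on compilers and parsing")
print(search("lecture systems"))  # -> {'https://example.com/talk1'}
```

Since each entry is a 4-byte key plus a set of URLs, the on-disk footprint scales with vocabulary size and document count, not with the size of the underlying content, which is what makes a home-scale index plausible.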