
Hey, great project - the more competition in this space the better. To be honest, at the moment the algorithm doesn't return any sensible results for anything (at least that I can find), but I hope that you can find a way past this as it's a great place to have a project.

I've included some search terms below that I've tried - I've not cherrypicked these and believe they are indicative of current performance. Some of these might be the size of the index - however I suspect it's actually how the search is being parsed/ranked (in particular I think the top two examples show that).

> Search "best car brands"

Expected: Car Reviews

Returns a page showing the best mobile phone brands.

then...

> Then searching "Best Mobile Phone"

Expected: The article from the search above.

Returns a Gizmodo page showing the best apps to buy... "App Deals: Discounted iOS iPhone, iPad, Android, Windows Phone Apps"

> Searching "What is a test?"

Expected result: Some page describing what a test is, maybe wikipedia?

Returns "Test could confirm if Brad Pitt does suffer from face blindness"

> Searching "Duck Duck Go"

Expected result: DDG.com

Returns "There be dragons? Why net neutrality groups won't go to Congress"

> Searching "Google"

Expected result: Google.com

Returns: An article from The Independent, "Google has just created the world’s bluest jeans"



I guess that's the real problem. People like to wonder what would be the "ideal world" in a search engine. It may be wishful thinking, I don't know.

It seems really hard to produce quality search results. It takes a lot of investment, which makes it an expensive product. But no one wants to pay, so selling ads is the only way forward.

Maybe there's a way to convince people to pay what it takes? I dunno...


I would gladly pay $5 a month for a Google-quality search service that doesn’t track me. I’ve been using Duck Duck Go for most of this year, but frequently find myself falling back to !g because Google’s results really are much better.

I wonder how much money Google search makes per the average user. Is it more than $5/mo?


It is a bit above your price point, but I have been using Kagi.com (not affiliated, just impressed). They're in beta, but will charge ~$10 once they go GA. Like you, I tried DuckDuckGo for awhile, but resorted to g! so often that I started using it for everything out of habit.

In contrast, Kagi provides Google-quality results most of the time, better-than-Google semi-often, and worse-than-Google rarely. They support g!, but I only use it a couple of times a week, usually for site-specific searches.

Additionally, I really like that I am their customer and not their product - incentives are aligned for them to continue respecting my privacy and preferences.


I've been quite happy with DDG, serves about 80% of my needs.

The other 20%, where I resort to Google, are mostly things with a geographical/country context, where DDG really sucks and Google excels.


If you are OK with using Google, why not simply use Google for 100% of your needs?


One only uses Google services if one absolutely needs to. Google on the other hand never needs you. You are absolutely unnecessary and super easy to replace. Whichever you choose, the next search engine really needs our queries. If we give them enough they might be able to create a competitive product. If they do, Google will dramatically improve. I'm sure they have plenty of ideas, the incentive is just not there.


The problem with this approach is that the group most likely to pay to remove ads is also the same group that is most attractive to advertisers. So your pool of users to whom you serve ads is less attractive overall. That means you need to figure out not just how much you make serving those users ads, but also how much you lose by removing them from the ad viewer pool.


Ironically enough tracking you is what improves the search results.


Likewise, I'd pay that as well (I'd like no ads as well, but tracking is the main issue for me).

Similarly, I do have DDG as my main search on all machines and devices just out of principle, but its region-aware searching (I'm in NZ, and often only want NZ results) is very close to useless in my experience (with NZ as the region ticked, it will still return results from .ca and .co.uk domains, which I would have hoped would be almost trivial to remove), and Google seems much better in this area (but not perfect).

Similarly, there's often technical/programming things I'll search for that DDG doesn't have indexed at all, and Google does.

Google also seems a lot better at ignoring spelling differences (color/colour, favorite/favourite) than DDG, which is often (but not always!) useful.


In a recent HN post the creator of Kagi[1] was talking about it. I think it's probably what you're looking for.

[1] https://kagi.com/


A few years ago I read an estimate that put revenue at $180/user/year in the US. Rest of the world was lower.

Many here would pay $5/mo, but probably only a handful would pay $180/year.

Now imagine the hundreds of millions they serve. Safe to say most wouldn't pay even $1/year.

Also, if Google loses even 10% or 20% of their audience, the overall value lost will probably be higher due to network effects.


While search funds most of Google, most employees at Google don't work on search. A small proportion of Google's revenue would be plenty for good quality search.


Just wanted to mention that you could try falling back to StartPage (!sp) first as it uses results from Google.


I've been enjoying Neeva a lot, not affiliated at all, just a happy user :-)

I think they are $4.95/mo or something? I haven't paid a cent yet since there are a few discounts that they offer to prompt you to learn how to use it (I really liked that, and it def made me more likely to stick with it!)


$5 per month is sadly out of budget for many third world citizens because that $5 can be urgently used elsewhere to plug a need. I think somewhere in the range of $0.1-$0.5 might be doable though.


I don't know why Google doesn’t offer a paid-for, ad-free version. This would reduce their reliance on the ad industry and make search more like a utility.


> because Google’s results really are much better.

Or maybe they're average, but you only see the ones where DDG fails.

Next time also try Yandex and Baidu.


If you have specific areas of interest that are unpopular enough, your local YaCy instance can index those [say] 100 or 10 000 websites and the results will blow you away. Google is still useful of course, but it's a joke by comparison.

Say for laughs you are only interested in yourself. You put every page by you and about you in the crawler. It will obviously render fantastic results. Using Google you would have to start every query with your full name OR user names, and whatever you type after that doesn't even matter, it won't return pages with all keywords. With YaCy you just type the query and it will return EVERYTHING. To compare the two would be to compare useless with perfection.


Thanks for the feedback! I'll take a look at your examples and see if I can improve the rankings.


Fun idea. It seems to be getting stuck on the first word you enter.

e.g. you get the same results for “London” as you do for “London cats”, “London cat rescue” and “London test”.


The first thing I do to test a search engine is to search for my own username on various public sites to see if it can find me. It didn’t find me. But keep it up and I’m sure I’ll be in there eventually (or maybe I overestimate how interesting I am, hehe).


I get this is your usual testing case for search engines, but if you'd read their README you'd have seen it's inappropriate for the project at the current stage.


I was curious and tried a bunch of other searches, with similarly disappointing results. My searches were a bit more esoteric than Closi's.

"langlands program" (pure mathematics thing): yup, top result is indeed related to the Langlands program, though it isn't obviously what anyone would want as their first result for that search. Not bad.

"asmodeus" (evil spirit in one of the deuterocanonical books of the Bible, features extensively in later demonology, name used for an evil god in Dungeons & Dragons, etc.): completely blank page, no results, no "sorry, we have no results" message, nothing. Not good.

"clerihew" (a kind of comic biographical short poem popular in the late 19th / early 20th century): completely blank page. Not good.

"marlon brando" (Hollywood actor): first few results are at least related to the actor -- good! -- but I'd have expected to see something like his Wikipedia or IMDB page near the top, rather than the tangentially related things I actually got.

"b minor mass" (one of J S Bach's major compositions): nothing to do with Bach anywhere in the results; putting quotation marks around the search string doesn't help.

"top quark" (fundamental particle): results -- of which there were only 7 -- do seem to be about particle physics, and in some cases about the top quark, but as with Marlon Brando they're not exactly the results one would expect.

"ferrucio busoni" (composer and pianist): blank page.

"dry brine goose" (a thing one might be interested in doing at this time of year): five results, none relevant; top two were about Untitled Goose Game.

"alphazero" (game-playing AI made by Google): blank page. Putting a space in results in lots of results related to the word "alpha", none of which has anything to do with AlphaZero.

OK, let's try some more mainstream things.

"harry potter": blank page. Wat. Tried again; did give some results this time. They are indeed relevant to Harry Potter, though the unexpected first-place hit is Eric Raymond's rave review of Eliezer Yudkowsky's "Harry Potter and the Methods of Rationality", which I am fairly sure is not what Google gives as its first result for "harry potter" :-).

"iphone 12" (confession: I couldn't remember what the current generation was, and actually this is last year's): top results are all iPhone-related, but first one is about the iPhone 6, second is from 2007, third is about the iPhone 6, fourth is from 2007, fifth is about the iPhone 4S, etc.

"pfizer vaccine": does give fairly relevant-looking results, yay.


Thanks for the detailed feedback! I think most of the problems here are because we have a really small index right now. Increasing the number of documents is our top priority. I agree that some kind of feedback when there are no results would be a good idea.


I actually think it’s probably the algorithm too - if I take one of the search items returned from a search that I know is in the index, but then search for it with slightly different terminology (or a different tense / pluralisation), the same item doesn’t come up.


Ah good point, we haven't done any stemming or anything like that yet.
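For anyone wondering what stemming would change here: below is a minimal sketch in Python, not the project's code. It uses a deliberately crude suffix-stripping rule (the suffix list and the helper names `crude_stem`, `tokenize` and `matches` are my own assumptions for illustration) so that plural or different-tense forms of a word map to the same index term. A real engine would more likely use an established stemmer such as Porter's.

```python
import re

# Hypothetical, deliberately crude suffix list; longer suffixes are tried first.
SUFFIXES = ("ing", "ies", "es", "ed", "s")


def crude_stem(word: str) -> str:
    """Strip one common English suffix, keeping at least 3 characters of stem."""
    word = word.lower()
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            if suffix == "ies":
                return word[:-3] + "y"  # "cities" -> "city"
            return word[: -len(suffix)]
    return word


def tokenize(text: str) -> list[str]:
    """Lowercase, split on non-alphanumerics, and stem each token."""
    return [crude_stem(t) for t in re.findall(r"[a-z0-9]+", text.lower())]


def matches(query: str, document: str) -> bool:
    """True if every stemmed query term also appears in the stemmed document."""
    return set(tokenize(query)) <= set(tokenize(document))


print(tokenize("London cats"))
print(matches("london cat", "London cats rescue centre"))
```

With the same normalisation applied at index time and at query time, "cat" and "cats" (or "search" and "searching") hit the same entry. The rule is rough and will get some words wrong, which is why this is only a sketch.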


What does a search engine algorithm look like, and where can I find examples to build from?


Depends on which algorithm you are looking for, but these are commonly used:

* Okapi BM25 for determining the relevance of a result to a query.

* TF-IDF for determining the relevance of a term to a document.

* PageRank for ranking pages (or whole domains) by the links pointing to them.
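To make the first item in that list concrete, here is a minimal sketch of Okapi BM25 scoring in Python. It is one common variant (the IDF has a +1 inside the logarithm so it never goes negative), the function name `bm25_scores` is made up for this example, the defaults `k1=1.5` and `b=0.75` are just typical values, and it assumes documents arrive already tokenized as non-empty lists of terms. The IDF factor is the same idea that TF-IDF uses: rare terms count for more.

```python
import math
from collections import Counter


def bm25_scores(query_terms, docs, k1=1.5, b=0.75):
    """Score each tokenized document against the query with Okapi BM25.

    docs is a non-empty list of token lists; returns one score per document.
    """
    n_docs = len(docs)
    avg_len = sum(len(d) for d in docs) / n_docs

    # Document frequency: in how many documents each term occurs.
    df = Counter()
    for d in docs:
        df.update(set(d))

    scores = []
    for d in docs:
        tf = Counter(d)  # term frequency within this document
        score = 0.0
        for term in query_terms:
            if tf[term] == 0:
                continue
            # Rare terms get a larger weight.
            idf = math.log(1 + (n_docs - df[term] + 0.5) / (df[term] + 0.5))
            # Term frequency saturates via k1 and is normalised by length via b.
            length_norm = 1 - b + b * len(d) / avg_len
            score += idf * tf[term] * (k1 + 1) / (tf[term] + k1 * length_norm)
        scores.append(score)
    return scores


docs = [
    ["best", "car", "brands"],
    ["best", "mobile", "phone", "brands"],
    ["google", "blue", "jeans"],
]
print(bm25_scores(["car", "brands"], docs))
```

In this toy corpus the document containing both "car" and "brands" scores highest, the one with only "brands" comes second, and the unrelated one scores zero. A real engine would compute the same thing from an inverted index rather than looping over every document.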



