Show HN: Discover the tech stack for any website (sitestacks.com)
53 points by ishansgupta on Oct 4, 2017 | 38 comments


So today I learnt that Amazon.com is actually hosted on Google Cloud Platform and uses Google Domains as well as 1&1 Hosting, alongside Adobe TagManager and Google Analytics. Who could have known? /s

Something is really off there, what do you do to get these results? https://sitestacks.com/amazon.com


Wow! Unsure if that's actually true.


It's at least partially true:

This is valid: https://mail.google.com/a/amazon.com

This is not valid: https://mail.google.com/a/thisisafakedomain.cx

All that means is that they at least used Google Apps at one time. I have a number of abandoned accounts that show as valid even though the domains are dead and the accounts are no longer used.

The same type of domain enumeration doesn't work for Drive.


Interesting, but the results seem kind of misleading. For example, https://sitestacks.com/linestarve.com says I'm using "NameCheap Web Hosting" -- they're my registrar, but I run my own HTTP server for that domain. Meanwhile https://sitestacks.com/analytics.bitmash.io reports that it's built with Piwik (true, but I would expect that to appear on a page using Piwik, which it doesn't), but for some reason leaves out all mention of webserver, DNS hosting, and so on.


Same with my site, says iwantmyname hosting, but in reality that's just my registrar. Actually I'm using a very popular cloud VM service, so I'm not sure why that wasn't picked up.

Also couldn't discern my actual web stack beyond nginx and html5 and some javascript libraries like google analytics.

It's probably not very detectable, though...flask.py


Thanks for the feedback. Sounds like NameCheap should be picked up as a registrar; we'll take a look!

On the second point, we should clarify that the data is meant to be domain-wide (except for the real-time mapper for domains not already in our universe - there we only look at the first page so we can return data quickly).


What do you mean by 'domain-wide'? Is it supposed to pick up everything on the domain and its subdomains?


Had a quick go and was pretty disappointed. Most of this is just having a look at the third-party JS loaded. Other things are wild guesses based on DNS, which has nothing to do with the website; those results are either out of date or reside in unassociated parts of the business. The tech stack detection was almost entirely wrong :(
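For anyone curious what detection based on loaded third-party scripts amounts to, here is a minimal sketch in Python. It is not SiteStacks' actual method: the `SIGNATURES` table, the `ScriptCollector` class, and the `detect_from_scripts` function are all made up for illustration, and a real tool would need a far larger signature list.

```python
from html.parser import HTMLParser
from urllib.parse import urlparse

# Hypothetical signature table (an assumption): script host -> product name.
SIGNATURES = {
    "www.google-analytics.com": "Google Analytics",
    "www.googletagmanager.com": "Google Tag Manager",
    "cdn.segment.com": "Segment",
}


class ScriptCollector(HTMLParser):
    """Collects the host of every external <script src=...> on a page."""

    def __init__(self):
        super().__init__()
        self.hosts = set()

    def handle_starttag(self, tag, attrs):
        if tag != "script":
            return
        src = dict(attrs).get("src")
        if src:
            # Relative paths such as /static/app.js have an empty netloc.
            host = urlparse(src).netloc.lower()
            if host:
                self.hosts.add(host)


def detect_from_scripts(html):
    """Return the sorted product names whose script host appears in the HTML."""
    collector = ScriptCollector()
    collector.feed(html)
    return sorted({SIGNATURES[h] for h in collector.hosts if h in SIGNATURES})
```

This only sees what a single page loads in the browser, which is why nothing server-side (framework, language, hosting) can be inferred this way.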


In order for me to actually use your service it would need to be rolled into a browser extension. This needs to be at your fingertips when you need it; right now I need to copy-paste the URL into a new tab. I use Wappalyzer [1] a couple of times a day and love the browser integration.

[1] https://wappalyzer.com/



These tools never work well; they are usually a weak combination of scraping the HTML and JS tags and checking DNS entries. Siftery tried something similar with a bunch of VC money raised and is equally bad.

The only decent one is http://builtwith.com which also happens to be a great 1-man business.


From the website:

> This is a limited profile.
> To see additional products added by employees and vendors, check out Siftery's profile.

Both sites use the same domain provider and the same SSL cert provider. I'd say with medium confidence they are probably 2 products of the same company.


Looks like that's corroborated by the other comments on this page, which makes it even worse.


Doesn't http://www.builtwith.com/ do this already, and have so much data too?


Yeah, and they're not the only provider you can point to. Some competition pushes everyone to do better?

Here's my pitch for why you might want to use SiteStacks with a browser extension:

Lightweight: The extension is only 25 KB (mostly images). The one-click technology lookup runs entirely on our servers. No content insertions or background processes to slow down your browsing. It's like your favorite search engine.

Secure: SiteStacks doesn't download any of your browsing data - only the active tab URL is passed along.

Great product coverage: SiteStacks is supported by Siftery and its library of over 40,000 products. SiteStacks includes data for some products that isn't publicly available anywhere else.

Best-in-class accuracy: The data on SiteStacks benefits from validation from the awesome Siftery community. This built-in constant feedback loop helps us identify data collection methods that are yielding bad data and ultimately promotes best-in-class data accuracy.


Yup and this doesn't seem to be stacking up well against it.



SiteStacks can find the technology used at any domain, including a set of roughly 700,000 that we’re regularly checking.

What makes the dataset unique is the combination of programmatic data (code breadcrumbs, network requests, DNS, some NLP, etc.), but augmented by data validated by users directly.

The user-validated data is only available on Siftery (e.g. for sitestacks.com/uber.com you have to follow the link through to siftery.com/company/uber to see the full set), but all the programmatic methods are improved by user-validated data (e.g. if a method yields too many false positives, we bump it out).

We think this approach helps create the most accurate dataset of its kind. We’ve done some internal benchmarking and feel really good about it.

We’re looking for feedback on how this can be better, and open to partnering with others who want to make use of this data for good.


> SiteStacks can find the technology used at any domain

I punched in a URL of a website I built, it didn't have data, went out to get some, then reported back that it couldn't.

Meanwhile https://builtwith.com/reservations.camprrm.com worked.


Just tried this for our app, and it wrongly reported Mandrill and Flash (we’re not using either of them). We used Mandrill a few years ago, so this might be some stale historical data, but the app never used Flash.


What's the URL? We can report back exactly why it was picked up.

Even if we're wrong, it's exactly this kind of feedback loop that's built into the product and ultimately helps make the data better for everyone else.


Could Mandrill stuff still be listed in your DNS? Like TXT records for SPF or DKIM?
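To illustrate how a stale DNS record could produce a result like this, here is a hedged Python sketch that scans TXT record strings for SPF `include:` mechanisms. The `SPF_HINTS` table and both function names are assumptions for illustration; the records are passed in as plain strings, so fetching them (for example with `dig TXT yourdomain`) is left out.

```python
# Assumed hint table: SPF include domain -> product it suggests.
SPF_HINTS = {
    "spf.mandrillapp.com": "Mandrill",
    "_spf.google.com": "G Suite",
}


def spf_includes(txt_records):
    """Return the include: domains found in any SPF (v=spf1) TXT record."""
    includes = []
    for record in txt_records:
        parts = record.split()
        if not parts or parts[0].lower() != "v=spf1":
            continue  # not an SPF record, e.g. a site-verification token
        for term in parts[1:]:
            term = term.lstrip("+-~?")  # drop the optional SPF qualifier
            if term.lower().startswith("include:"):
                includes.append(term[len("include:"):].lower())
    return includes


def mail_products(txt_records):
    """Map SPF includes to product names using the assumed hint table."""
    found = {SPF_HINTS[d] for d in spf_includes(txt_records) if d in SPF_HINTS}
    return sorted(found)
```

A match here only shows that the record still exists, not that mail is currently sent through that provider, which would fit the "used it a few years ago" case.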


You already did a Show HN on this, though...

https://hackertimes.com/item?id=15249136


I should also mention we have new browser extensions for chrome - [1] and firefox - [2]

[1] https://chrome.google.com/webstore/detail/sitestacks-instant...

[2] https://addons.mozilla.org/en-US/firefox/addon/sitestacks/re...


Also see Whatruns, which was on HN last month: https://hackertimes.com/item?id=15098028


> We’re looking for feedback on how this can be better, and open to partnering with others who want to make use of this data for good.

Responding to the feedback and data discrepancies mentioned here would be a good start. The HN community here is testing this for you for free and providing you with valuable feedback, and asking you questions that you need to answer, if you want to make your product useful.

I don't see you (OP) responding to anyone. The 2 posts from you are both promoting the site.


We are looking into the data discrepancies; it might take a bit of time to get through the individual errors. DNS and registrar data seem to have brought on false positives. We need to beef up on front-end tech too. The real-time mapper needs work.

Humbling comments really, we have our work cut out.


https://sitestacks.com/products/g-suite-formerly-google-apps...

Doesn't seem like it is accurate though. Seems like this is more crowdsourced data than automatically figuring things out.


Ran it on https://canpicker.com/. It's kind of cool and accurate but I was expecting it to pick up stuff like react and maybe finer grain details like individual libraries.


Passenger doesn't necessarily imply Rails. builtwith.com gets this wrong too.
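A small Python sketch of the point, assuming detection is driven by response headers: a rule should report only what the header actually supports. Passenger can also serve Python and Node.js apps, so a Passenger banner by itself says nothing about Rails. The `RULES` table and `detect_from_headers` name are hypothetical.

```python
# Hypothetical rules: (header name, substring to look for, product reported).
# Note there is deliberately no rule mapping Passenger to Rails.
RULES = [
    ("server", "phusion passenger", "Phusion Passenger"),
    ("x-powered-by", "phusion passenger", "Phusion Passenger"),
    ("x-powered-by", "express", "Express"),
]


def detect_from_headers(headers):
    """Return products directly evidenced by the given response headers."""
    lowered = {k.lower(): v.lower() for k, v in headers.items()}
    found = []
    for name, needle, product in RULES:
        if needle in lowered.get(name, "") and product not in found:
            found.append(product)
    return found
```

Inferring the framework would need separate evidence, such as a framework-specific cookie name or asset path, and even that is a guess.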


Thanks for that piece of feedback. Team's looking into it.


I wish I could be more specific but I looked at this for the site I work on and quite a bit of it is incorrect.


I tried it with my website. It spotted google analytics (correct) and HTML5 (correct again)

I was curious to see if it was going to work out that I am running it on Google App Engine, but it did not figure that out.


Congrats on the launch. It looks like I can click on a tag (on hover, color and cursor change), but when I do that nothing happens.


You're right. We'll fix that. Thanks!


It does not pick up React, and it misreported a site as using Angular when it was based on React.


I'd like to see some ability to detect different CMSs.


Our focus has been more on business tech, but we're expanding coverage into CMSs, libraries, etc.

Here's some data we have on CMSs currently:

https://siftery.com/categories/content-management-system-cms...

https://sitestacks.com/products/wordpress

https://siftery.com/wordpress



