Hacker News | past | comments | ask | show | jobs | submit | ozgooen's comments

This sounds great.

How suitable do you think this is for CPU-intensive work? I'm interested in running servers for scientific computing, which would be rather CPU-heavy. It would be great to offload some of this to a nearby browser for bits and pieces that need low latency.


I would think it would be better to have a bunch of servers in a single datacenter if low latency between them is important. Fly.io sounds great for the case where you're willing to give up low latency between your own servers in exchange for low latency to your users.


We have some customers doing CPU heavy tasks like image and video processing, but we're not specifically optimizing for that right now. If there's demand we might offer better processors or GPUs for those workloads, or maybe even spot pricing on idle nodes, but that's far off.


Strongly thirded. I'm not sure I agree with absolutely everything they say, but overall I think they have a far more pragmatic and honest set of answers than any competing advice I hear. For disclosure, I worked there for a year.

The EA worldview takes some getting used to though.


Please do so. The UI is all open-source React, so you may be able to copy some components directly if you want. I'd be happy to help people out with this if you have requests.


You can copy & paste an array of samples and Guesstimate will sample from that cluster. For instance, try pasting the following into the value field of a cell: [1,1,1,2,2,2,2,3,3,3,3,3,4,4,4,4,4,7,7,7,7]
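As a rough illustration of what sampling from pasted data amounts to (this is a sketch of the general idea, not Guesstimate's actual implementation):

```python
import numpy as np

# The samples pasted into the cell's value field above.
pasted = [1, 1, 1, 2, 2, 2, 2, 3, 3, 3, 3, 3, 4, 4, 4, 4, 4, 7, 7, 7, 7]

# Treat the pasted values as an empirical distribution and
# resample from them with replacement, Monte Carlo style.
rng = np.random.default_rng(0)
draws = rng.choice(pasted, size=5000, replace=True)

print(draws.mean())  # close to the mean of the pasted samples (~3.5)
```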

You can use tools like distshaper6 to generate arbitrary distributions, then copy the samples into Guesstimate.

http://smpro.ca/pjs/distshaper/

Guesstimate doesn't yet support an input format for distributions other than raw samples.


I've heard of it being used in a few classes. There was one estimation session with a group of (I believe) 8th-graders. Honestly, I don't think you need to be great at statistics to understand the fundamental concepts.


We're keeping it running but aren't actively improving it.


Thanks for the feedback.

We have some documentation [here](https://docs.getguesstimate.com/), and some in the sidebar.

Generally, we recommend lognormal distributions for estimated parameters that can't be negative. This works when you span multiple orders of magnitude, though it's possible you may want an even more skewed distribution (which is unsupported).
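For a sense of what this looks like numerically, here's a sketch of fitting a lognormal to a 90% confidence interval for a non-negative quantity (the interval and the fitting convention are illustrative assumptions, not necessarily Guesstimate's exact internals):

```python
import numpy as np

# A 90% confidence interval for a quantity that can't be negative,
# e.g. "time to complete a task" of [2, 20] hours.
low, high = 2.0, 20.0

# Fit a lognormal so that log(low) and log(high) are the 5th and 95th
# percentiles of the underlying normal (z ~= 1.645 for 90% coverage).
z = 1.6448536
mu = (np.log(low) + np.log(high)) / 2
sigma = (np.log(high) - np.log(low)) / (2 * z)

rng = np.random.default_rng(1)
samples = rng.lognormal(mu, sigma, size=100_000)

print(np.quantile(samples, [0.05, 0.95]))  # roughly [2, 20]
```

Note how the lognormal naturally skews: the median here is ~6.3, well below the interval's midpoint of 11, which is usually what you want for order-of-magnitude-spanning quantities.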

I may be able to make a much longer video introduction sometime soon.


We initially had a lot of uncertainty on how to price it but wanted to experiment with more users rather than fewer, with the premise that if it were very successful we could scale up.

I think if I were to start again or spend much time restructuring it, I'd probably focus a lot more on enterprise customers. That would be quite a bit of work though, so I don't have intentions of doing that soon.


I feel like this was a big missed opportunity to link to your exploratory pricing guesstimate.


Cofounder here: happy to see this on HN again.

Update: Matthew (the other cofounder) and I got Guesstimate to a stage we were happy with. After a good amount of work it seemed like several customers were pretty happy with it, but there weren't many obvious ways of making a ton more money on it, and we had worked through most of the requested/obvious improvements. We're keeping it running, but it's not getting much active development at this time.

Note that it's all open source, so if you want to host it in-house you're encouraged to do so. I'm also happy to answer questions about it or help with specific modeling concerns.

Right now I'm working at the Future of Humanity Institute on some other tools I think will complement Guesstimate well. There was a certain point where it seemed like many of the next main features would make more sense in separate apps. Hopefully, I'll be able to announce one of these soon.


Are you able to apply global correlations to all the variates?

One of the triggers for the financial crisis in '08 was that the Monte Carlo pricers assumed the various risks were much less correlated than they actually were.

For example, they largely assumed that it was unlikely for many mortgages or underlying MBS securities to simultaneously default (low correlation). This is how many AAA rated CDO securities ended up trading at 50%+ discounts.

IMHO, any multivariate Monte Carlo analysis that doesn't show your sensitivity to correlation is essentially useless, since your answers may change completely.

In the second example model (https://www.getguesstimate.com/models/316), Fermi estimation for startups, you would expect many of the inputs (deals at Series A, B, C, amount raised per deal) in real life to be highly correlated with each other since they all depend on 'how well is VC in general doing right now?'

The final estimate of 'Capital Lost in Failed Cos from VC' has a range of $22B to $39B, which seems way too low. The amount of VC money lost during a crisis (like in '01) can easily be an order of magnitude more.


I'd definitely agree that correlations can be a really big deal, especially in very large models like that one.

Guesstimate doesn't currently allow for correlations as you're probably thinking of them. However, if two nodes are both functions of a third base node, then they will both be correlated with each other. You can use this to make somewhat hacky correlations in cases where there isn't a straightforward causal relationship.
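A sketch of the shared-base-node trick (the variable names and distributions are made up for illustration):

```python
import numpy as np

rng = np.random.default_rng(2)
n = 100_000

# Shared base node: both downstream quantities are functions of it,
# so they end up correlated without any direct correlation machinery.
market_strength = rng.normal(1.0, 0.2, n)

# Hypothetical downstream nodes, each inheriting the base node's variation.
deals_series_a = market_strength * rng.normal(100, 10, n)
raise_per_deal = market_strength * rng.normal(8, 1, n)  # $M per deal

corr = np.corrcoef(deals_series_a, raise_per_deal)[0, 1]
print(f"induced correlation: {corr:.2f}")  # roughly 0.75
```

Tuning the base node's variance relative to each child's own noise controls how strong the induced correlation is.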

Implementing non-causal correlations in an interface like this is definitely a significant challenge. It would essentially introduce another layer to the currently 2-dimensional grid. It's probably the feature I'd most like to add, but so far the cost has been too high.

I think Guesstimate is really ideal for smaller models, or for the prototyping of larger models. However, if you are making multi-million dollar decisions with hundreds of variables and correlations, I suggest more heavyweight tools (either enterprise Excel plugins or probabilistic programming).


Thanks for explaining your thought process. I read your other replies, and I agree that many decisions are made without any formal probabilistic model at all. There's a lot of value in sitting down and working out how things might relate to each other.

> where there isn't a straightforward causal relationship

One way to interpret a global pairwise correlation is simply that the person building the model is being systematically biased in one direction—either being too pessimistic or optimistic. This is a 'non-causal' relationship but often the biggest contributor to variance between the model and the real world.

Philosophically, this is a bit like the difference between 538's modeling approach and Princeton Election Consortium's for the 2016 election—the former gave Hillary a 2/3 chance of winning, while the latter ascribed a ~99% chance.

The risk of leaving modeling error out is that you'll end up with much more confidence than is called for—it feels very different to come up with a point estimate (I'll save $10k this year) vs. a tight range (I'll save 9k-11k this year), if the true range is much wider.

In the former case you know your point estimate may be very far off, but in the latter you may be tempted to rely on an estimate of variance that is too low.

> It could introduce essentially another layer to the currently 2-dimensional grid

You could probably get away with doing almost all of this automatically for the user, as long as they decide what the 'primary' output is:

- For every input, calculate whether it's positively or negatively correlated with the output

- Apply a global rank correlation to all the inputs with all the standard techniques, flipping the signs found above as appropriate

- Report what the output range looks like with a significant positive correlation (usually the negative-correlation case isn't as interesting)
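Outside of Guesstimate, that kind of sensitivity check can be sketched with a Gaussian copula on a toy two-input model (all names and distributions here are invented for illustration):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(3)
n = 50_000

def output_range(rho):
    # Draw two inputs that are (approximately) rank-correlated at rho
    # via a Gaussian copula, then push them through a toy model.
    cov = [[1, rho], [rho, 1]]
    z = rng.multivariate_normal([0, 0], cov, size=n)
    u = stats.norm.cdf(z)  # correlated uniforms
    deals = stats.lognorm(s=0.5, scale=100).ppf(u[:, 0])
    raised = stats.lognorm(s=0.5, scale=8).ppf(u[:, 1])  # $M per deal
    total = deals * raised
    return np.quantile(total, [0.05, 0.95])

for rho in (0.0, 0.6):
    lo, hi = output_range(rho)
    print(f"rho={rho}: 90% interval [{lo:.0f}, {hi:.0f}]")
```

The positively correlated case produces a noticeably wider output interval, which is exactly the sensitivity the comment argues you need to see before trusting the model.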


You can arbitrarily (rank-)correlate any variables of any distribution using copulas as an intermediary.

So basically: draw multivariate correlated standard normals with the intended correlation. Transform the standard normal draws into quantiles via the standard normal CDF (this works because the marginal distributions are standard normal). Now you have X and Y quantiles for your target distributions and you can draw from them.

The distributional transformations will slightly attenuate the correlation, and the choppier the distributions, the more attenuation you'll get. Additionally, while you can do this for 3+ variables at once, there are constraints on the correlation matrix describing the n-dimensional correlations (it has to be positive semi-definite).

If the variables have no fixed distribution you can use the eCDF of the actual data, so it's even possible to import e.g. population and income data and produce a permutation that gives you the correlation desired.

I agree that it's fairly difficult to do this if you have arbitrary DGPs with complex interdependencies -- if you correlate variables A and B, it's difficult to guarantee the observed correlation between f(A) and g(B) -- but you can still provide a lot of utility with the copula method.
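A minimal sketch of the recipe described above, using two deliberately different marginals:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(4)
n = 10_000

# Step 1: correlated standard normals with the intended correlation.
rho = 0.8
z = rng.multivariate_normal([0, 0], [[1, rho], [rho, 1]], size=n)

# Step 2: transform to quantiles (uniforms) -- the Gaussian copula.
u = stats.norm.cdf(z)

# Step 3: push through the inverse CDFs of two target marginals.
x = stats.expon(scale=3).ppf(u[:, 0])    # skewed marginal
y = stats.uniform(10, 5).ppf(u[:, 1])    # flat marginal on [10, 15]

# Rank correlation survives the transforms almost exactly;
# Pearson correlation is attenuated a bit, as noted above.
print("spearman:", stats.spearmanr(x, y)[0])
print("pearson: ", stats.pearsonr(x, y)[0])
```

For real data with no fixed distribution, step 3 would swap the analytic `ppf` for the empirical quantiles (eCDF) of the data, as the comment suggests.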


>any multivariate Monte Carlo analysis that doesn't show your sensitivity to correlation is essentially useless

This rationale is why this tool shouldn't be used for anything consequential -- like business decisions that can sink your company.


In my experience, many consequential business decisions aren't even made with probability distributions, let alone probabilistic models with realistic correlations. I would generally encourage people who are comfortable with more advanced probabilistic systems to use them.


I would really like to know the answer to the correlations question. It's essential for doing some of the things we do at our company.

Also, are we able to adjust the distribution of each variable?

EDIT: I think you can actually create correlations manually on the sheet, so it should be fine.


You can choose from a few distributions (normal, lognormal, uniform) in the main editor, or you can type many others using the function editor. The sidebar describes all of them.


I’ve used the product to “guesstimate” a few things, like the quality of life with a higher-paying job but a longer commute (not worth it!) and starting a business. I love how intuitive and clean the UI is, and how it puts probability estimation at my fingertips in simple, human language.

Thank you!


Curious, what were the specifics of the result?


Can you share what you found?


Hey, this is super cool. Looking forward to your future tools to complement Guesstimate.

I'd imagine an Excel plugin to do something similar would be valuable.


Great work; this looks awesome. I'm wondering what products would look like if hardware engineers applied this to modeling future products. At my startup (valispace.com) we currently only allow a simple propagation of worst-case values (Gaussian distribution or worst-case stacking), but I think that, especially in early design phases, this would be a huge help in foreseeing problems in complex projects. Do you know of anyone using Guesstimate for hardware engineering purposes?


When this was originally on HN, I was impressed and signed up as a paying subscriber. Over time, I noticed development had stopped and the tiny things that I used to work around started to annoy me more. It's still a neat platform but I cancelled my subscription last year.


Seems like you've put some solid work in here. Who would you name as your top competitors?


On the question of "what are other ways of doing MC analysis", there are two approaches.

The first is to use Excel apps like Oracle Crystal Ball or @Risk. These are aimed at business analysts. They're pretty expensive, but also quite powerful.

The other option is to use probabilistic programming languages. Stan and PYMC3 are probably the best right now, but hopefully others will mature considerably in the next few years.

That said, this is a pretty small space. The main "business competitor" is probably people just using google sheets or Excel without distributions to make models.

Crystal Ball: https://www.oracle.com/applications/crystalball/
@Risk: https://www.palisade.com/risk/default.asp
Stan: https://mc-stan.org/
PYMC3: https://docs.pymc.io


Maybe kaggle but different?


This has nothing to do with Kaggle, which is a competition platform.


Is Lobe only for image data? Would it work for inputs that are text files or similar?


They support more than image input; their examples include 3D models, accelerometer data, sound, 3D depth maps, and numeric data.

It goes the other way as well and supports generation.


Yeah! You can mainly work with images and arbitrary vectors (that's what the bounding box examples we show are using, for instance) currently, and have plans to include native support for text, video, etc. as time progresses.


I guess another question here is what the heuristics are for how many images are necessary for different levels of functionality. The demos look pretty impressive, but I'm not sure how much went into them.


We've been surprised by how little data folks have needed. If you look at the examples page, you'll see in the lower right-hand corner of each screenshot the number of examples uploaded and trained on. For some cases, like the water tank, it's fine to some extent if the model overfits the training data, because the Nest cam will only ever be pointed at the water tank; it's worked in all situations and been robust for us with only ~500 examples. Other times folks are more interested in prototyping an idea to see if it's possible on a wider scale, and a small dataset works well to prove that out.

