
As with many data analysis posts: give me more information about the data. In this case it could be particularly interesting. You have a correlation, and you estimate the temperature through indirect measurements. The important word here is "estimate".

As with all estimates, I care about the error and not much else. The only error measure provided is the mean absolute error, which I find a little unsatisfying: it gives very little information about how the error behaves.

The easiest approach is to compute the error for every sample and then show a histogram of those errors.

Why, you might wonder? In this case we are dealing with data that varies, though it often stabilizes over time. This means the error is sometimes large and at other times very small. If you take the absolute value of each error and average it (MAE), you may underestimate the error at some times and overestimate it at others. Here I would expect a tri-modal error distribution if we look at signed (positive and negative) errors.

A histogram is much more expressive (and oh so simple to generate) than a single mean, standard deviation, MAE, or MSE.
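To make that concrete, here is a minimal sketch in Python of computing the signed per-sample errors and printing a text histogram next to the MAE. The data is synthetic and invented purely for illustration (it is not the article's data), and the function names and the 0.5-degree bin width are my own assumptions.

```python
import math
import random
from collections import Counter


def signed_errors(estimated, measured):
    """Per-sample signed error: estimate minus measurement."""
    return [e - m for e, m in zip(estimated, measured)]


def mean_absolute_error(errors):
    """MAE: the average of the absolute per-sample errors."""
    return sum(abs(e) for e in errors) / len(errors)


def histogram(errors, bin_width):
    """Count errors per bin; bin k covers [k * bin_width, (k + 1) * bin_width)."""
    return Counter(math.floor(e / bin_width) for e in errors)


def print_histogram(counts, bin_width):
    """Print one text row per bin, including empty bins in between."""
    for k in range(min(counts), max(counts) + 1):
        low = k * bin_width
        print(f"{low:+5.1f} to {low + bin_width:+5.1f} | {'#' * counts.get(k, 0)}")


# Synthetic data, invented for illustration only (not the article's data):
# the estimate runs about 2 degrees low, about right, or about 2 degrees high.
random.seed(0)
measured = [20.0] * 150
estimated = [20.0 + random.gauss(offset, 0.3)
             for offset in (-2.0, 0.0, 2.0) for _ in range(50)]

errors = signed_errors(estimated, measured)
print(f"MAE = {mean_absolute_error(errors):.2f}")
print_histogram(histogram(errors, bin_width=0.5), bin_width=0.5)
```

With data like this the MAE is a single number, while the histogram shows three separate clusters of error, which is exactly the kind of structure the average hides.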



Nice points,

We were told the paper would be published (and open access) today, but it doesn't quite seem to be up yet: http://onlinelibrary.wiley.com/doi/10.1002/grl.50786/abstrac...

The paper does have more detail, and I believe supporting materials will be released. You can also download the data and do some of your own digging (either from the big file at the bottom, or just view source and grab the JS values that feed the graphs).


James,

Did you guys consider (or try out) Eureqa [1] to get a good model for the estimate? It works very well with this kind of data.

[1] http://creativemachines.cornell.edu/eureqa


We actually used R, but this looks very neat, thanks for sharing!



