
Excel isn't the program of choice for most scientists and computational biologists, who typically use R, Python, or command-line tools. However, we often get data from other scientists, or reanalyze data from other groups, that can contain these errors. It's such a frequent problem that there are scientific papers about it [1].

[1] https://genomebiology.biomedcentral.com/articles/10.1186/s13...



This ignores the reality that one will ultimately have to interact with people who have no understanding of any of these tools. Unless you happen to work in a pure computational biology group, one _will_ have to interact with lab workers, biologists with no training in (or understanding of) R or Python, doctors, etc. All these people will know Excel.


That's why I'm in favor of this change in nomenclature.


IMO the change is good, but it is a case of detected vs. undetected errors.

Recently I was working with some colleagues: I was the computer-savvy one and they were the lab people. I sent them some data as CSV, and when it was opened in Excel, 123.456 turned into 123456 (it was a locale problem: some people use "," as the decimal separator and some use "."). We noticed because the values should have been between 0 and 1000. But what if the column could range from 0 to 1000000? A small number of values inflated by three orders of magnitude could fly under the radar and distort further measurements. And once published, the error would stay undetected forever.
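To make the detected-vs-undetected point concrete, here is a minimal Python sketch. The helper names `parse_decimal` and `in_expected_range` are made up for illustration, and this is not how Excel parses numbers internally; it only mimics reading the same text under the two separator conventions.

```python
def parse_decimal(text, decimal_sep=".", thousands_sep=","):
    """Parse a number using separators chosen explicitly by the caller."""
    # Drop the grouping character first, then normalize the decimal mark.
    cleaned = text.strip().replace(thousands_sep, "").replace(decimal_sep, ".")
    return float(cleaned)


def in_expected_range(value, low, high):
    """Sanity check: is the value inside the range the column should have?"""
    return low <= value <= high


raw = "123.456"

# Read as the sender intended: "." is the decimal separator.
intended = parse_decimal(raw, decimal_sep=".", thousands_sep=",")

# Read under the other convention: "." groups thousands, "," is the decimal mark.
misread = parse_decimal(raw, decimal_sep=",", thousands_sep=".")

# With a tight expected range the misread value is caught...
caught = not in_expected_range(misread, 0, 1000)

# ...but with a wide range the same misread value passes silently.
slips_through = in_expected_range(misread, 0, 1_000_000)
```

With the 0 to 1000 range the misread value fails the check; widen the range to 0 to 1000000 and it passes, which is exactly the case where nobody notices.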

I prefer the programming-language approach: look, this is how you write a string, this is a char, this is a float and this is an integer. "2020-08-04" is a string until you ask me to turn it into a date. "SEPT1" is a string, and you are going to have to do quite the gymnastics to make me understand it as "date(2020, 9, 1)". Do you like "," or "." as the thousands separator? Then we first turn the number into a string and then format it, but the original number is kept.
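A rough sketch of what that looks like in Python, using only the standard library. The column names and the sample row are invented for the example.

```python
import csv
import io
from datetime import date

# Hypothetical two-column file: a gene name and a sampling date.
data = "gene,sampled\nSEPT1,2020-08-04\n"

# csv.DictReader returns every field as a str; nothing is guessed.
rows = list(csv.DictReader(io.StringIO(data)))

gene = rows[0]["gene"]              # stays the string "SEPT1"
sampled_text = rows[0]["sampled"]   # still a string at this point

# Conversion to a date only happens when explicitly requested.
sampled = date.fromisoformat(sampled_text)

# Formatting produces a new string; the underlying number is unchanged.
value = 1234567.891
shown = f"{value:,.2f}"
```

Here `gene` never becomes a date because nothing asked for that, and `shown` is only a display string, so `value` keeps its original precision.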


Excel technically has a type system where you can change the type of value a cell has. In my experience it is sometimes difficult to convince Excel to actually change a cell's data type, and doing so can often change the underlying data as well. Personally I avoid Excel if I can, because its quirks are just too frustrating. But it certainly has its uses.


I had to use Excel for 100% of my publications and posters in medical services research. Either the data is in Excel or Excel is a tidy place to put data dictionaries. While I'd love to use only R, most of my collaborators wouldn't be able to use it; it's niche, whereas Excel is the lingua franca of data analysis.


Those tools aren’t exclusive.

I use Python almost exclusively for data analysis but still open files in Excel to view them, for lots of reasons.

Nothing in Excel is reproducible, but it’s still on all my computers.

I did an analysis a few years ago of the programs run at my organization by job series, and scientists run Excel a lot.


I spent 30 minutes today convincing a data scientist that she shouldn't use Excel to store her interim data, and that even an untyped data format such as CSV or JSON would be a better medium than an Excel document.


I feel that Excel should still do the right thing, which is annoying, but the change also helps people avoid confusing the names with dates, for example when the Excel sheet has dates in it as well.



