Do people actually use Rust for data science? I mean this literally and not as a snarky retort. I dabble in data science in personal projects but I have little concept of the field in general. Polars seems cool, I just haven't seen Rust used as a general-purpose data science language like R or Python, which is what a dataframe library seems like it's about. As opposed to running some demanding calculations in C or C++ for performance. It seems like Rust might fit better in the latter, but again, that's not what a dataframe lib suggests to me.
Polars has a Python front end, which I have used. All the work happens in Rust but the queries can be specified in Python. The data is stored in the Apache Arrow format, so no copy is required for the same data to be accessible in both Rust and Python.
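To make the "no copy required" idea concrete, here is a minimal sketch using only the Python standard library. It is not Polars or Arrow: the `array` module and `memoryview` merely stand in for an Arrow buffer and a second consumer of it, and all variable names are made up for illustration.

```python
from array import array

# Producer side: a contiguous column of 64-bit floats, standing in for an
# Arrow-style buffer (this is the stdlib array module, not Arrow itself).
column = array("d", [1.0, 2.0, 3.0, 4.0])

# Consumer side: a memoryview exposes the same bytes through the buffer
# protocol without copying them.
view = memoryview(column)

# A write through the view is visible to the original owner, which shows
# that both names refer to one block of memory.
view[0] = 10.0
shares_memory = column[0] == 10.0

# By contrast, tolist() makes a copy: later writes do not reach it.
copied = view.tolist()
view[1] = 20.0
copy_is_independent = copied[1] == 2.0

print(shares_memory, copy_is_independent)
```

The general point is that two consumers can agree on a memory layout and then share one buffer instead of serializing data back and forth; how Polars does this in detail is not shown here.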
Oh, that's cool. I did not realize this. I realize this isn't novel but I do not really understand how a library written in one language is used by another - some sort of bindings that the library handles I guess.
It looks like Polars is using PyO3, which provides Rust bindings for Python. Python's reference implementation is in C, so I imagine it's interacting with that API [0] through FFI. Python extension modules (as these are called) are commonly compiled as dynamically linked libraries, and the binaries are shipped in Python wheels (or the build instructions in source distributions).
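For a feel of what "interacting with that API through FFI" means, here is a small sketch that calls two functions of the CPython C API from Python via the standard `ctypes` module. This is only an analogy: PyO3 links against the same C API from compiled Rust code rather than going through `ctypes`, and the variable names below are made up for illustration. It assumes the interpreter is CPython.

```python
import ctypes
import sys

# ctypes.pythonapi is a handle to the C library that implements the running
# CPython interpreter, so its attributes are functions of the Python C API.
get_version = ctypes.pythonapi.Py_GetVersion
get_version.restype = ctypes.c_char_p  # C signature: const char *Py_GetVersion(void)
get_version.argtypes = []

version_from_c = get_version().decode("ascii")

# PyLong_FromLong builds a Python int from a C long; py_object tells ctypes
# to hand the returned PyObject* back as an ordinary Python object.
long_from_c = ctypes.pythonapi.PyLong_FromLong
long_from_c.restype = ctypes.py_object
long_from_c.argtypes = [ctypes.c_long]

answer = long_from_c(42)

print(version_from_c.split()[0], sys.version.split()[0], answer)
```

An extension module works in the other direction: the interpreter loads the shared library and the library calls functions like these to create and return Python objects.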
I don't think anyone is doing any exploratory analytics using Rust.
But I do imagine someone out there puts on a data engineer hat and starts rewriting Python calls to Polars into pure Rust code, then compiling and deploying it.
I'm working on a project that leans heavily on matrix vectorization and large dirty datasets. I'm not a scientist, though.
Here's how I work.
* Hack together a PoC with python, sed, awk, grep, cut, xsv etc.
* Clean that up, run it on larger sample sets (samples made with said sed/awk/cut etc)
* Attempt to run it on the full dataset.
* Rewrite it in rust.
Steps 2 and 3 are hit-or-miss in Python. I find it near impossible to do any refactoring without static types and/or tests. And quite often, I watch a run for over an hour only to have it crash on that one broken line. Whereas the same happens in the Rust version in seconds: crucial for my trial-and-error style of building.
So: Python because I must, Rust as soon as it's clear what I'm going to do.
Python supports type hinting, which helps, but is far from the tool (crutch) that I need when refactoring. Still far too much guessfactoring and not the confident "it compiles, tests are green, it's Friday: let's deploy!" that Rust (or Java, or even TypeScript) offers me.
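A tiny illustration of why type hints alone fall short of a compiler check: CPython stores annotations as metadata and does not enforce them when the function is called. The function name `double` is made up for this sketch.

```python
def double(x: int) -> int:
    # The annotation says int, but CPython does not check it at call time.
    return x * 2


ok = double(21)

# A str slips through: "ab" * 2 is valid Python, so nothing fails here.
surprise = double("ab")

# The annotations are stored as plain metadata on the function object.
hints = double.__annotations__

print(ok, repr(surprise), hints)
```

A separate static checker such as mypy would flag the second call, but only if it is actually run as part of the workflow; the interpreter itself will not stop you.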
Yes. What's cool is that in Rust you have direct access to Polars, so you can do all the low-level munging and computations in Rust (and/or read/write the data to/from Arrow directly if need be) and return dataframes directly to Python. The front end is still Python, of course, but PyO3 makes it pretty trivial.