
Do people actually use Rust for data science? I mean this literally and not as a snarky retort. I dabble in data science in personal projects, but I have little concept of the field in general. Polars seems cool; I just haven't seen Rust used as a general-purpose data science language like R or Python, which is what a dataframe library seems to be about. That's as opposed to running some demanding calculations in C or C++ for performance. It seems like Rust might fit better in the latter role, but again, that's not what a dataframe lib suggests to me.


Polars has a Python front end, which I have used. All the work happens in Rust, but the queries can be specified in Python. The data is stored in the Apache Arrow format, so no copy is required for the same data to be accessible from both Rust and Python.
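To make the "no copy" part concrete, here is a small stdlib-only analogy. It is not Arrow or Polars; it only assumes Python's built-in buffer protocol, where two objects look at the same block of memory instead of each holding their own copy. Arrow applies the same idea across language boundaries with a standardized columnar layout.

```python
from array import array

# A contiguous block of three 64-bit floats, similar in spirit to one column.
values = array("d", [1.0, 2.0, 3.0])

# memoryview exposes the same memory through the buffer protocol; nothing is copied.
view = memoryview(values)

# Writing through the view changes the original array, because both
# names refer to the same underlying bytes.
view[0] = 10.0

shared = values[0] == 10.0   # True: the write is visible through `values`
size_in_bytes = view.nbytes  # 3 items * 8 bytes each
```

The point is only that "sharing" here means pointing at the same buffer, so the cost does not grow with the size of the data.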


Oh, that's cool. I did not realize this. I realize this isn't novel but I do not really understand how a library written in one language is used by another - some sort of bindings that the library handles I guess.


The magic of well-defined APIs! If you're interested in mixing different DS backends and kernels in a single notebook, check out Quarto [1].

[1]: https://quarto.org/


It looks like Polars is using PyO3, which provides Rust bindings for Python. Python's reference implementation is in C, so I imagine it's interacting with that API [0] through FFI. Commonly, Python extension modules (as these are called) are compiled as dynamically linked libraries, and the binaries (or compilation instructions) are included in Python wheels.

[0] https://docs.python.org/3/extending/extending.html
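For a rough feel of what "interacting with that API through FFI" means, here is a minimal sketch that assumes CPython and uses only the stdlib. It goes in the opposite direction from PyO3 (Python calling into the C API via ctypes, rather than Rust code being loaded by Python), and it is not how PyO3 is implemented. It only shows that the C API is a set of ordinary C functions exported by the interpreter.

```python
import ctypes
import importlib.machinery

# File suffixes this interpreter accepts for compiled extension modules,
# i.e. the dynamically linked libraries that ship inside wheels.
extension_suffixes = importlib.machinery.EXTENSION_SUFFIXES
print(extension_suffixes)

# ctypes.pythonapi exposes the CPython C API as a loaded library (CPython only).
py_long_from_long = ctypes.pythonapi.PyLong_FromLong
py_long_from_long.argtypes = [ctypes.c_long]    # C signature: PyObject *PyLong_FromLong(long)
py_long_from_long.restype = ctypes.py_object    # hand the returned PyObject* back as a Python object

# Build a Python int by calling the C function directly.
value = py_long_from_long(42)
print(value)
```

A compiled extension calls functions like `PyLong_FromLong` from native code to create and return Python objects; bindings such as PyO3 wrap those calls in a safer interface.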


Both R and Python's pandas use code written in C and Fortran to do the actual calculations when you ask them to manipulate data.


Polars is a Python library as well as a Rust library, and most of its use comes through the Python library.


I don’t think anyone is doing any exploratory analytics in Rust.

But I do imagine someone out there has a data engineer hat they put on, and starts rewriting Python calls to polars into pure rust code, compiling and then deploying.


I'm working on a project that leans heavily on matrix vectorization and large dirty datasets. I'm not a scientist, though.

Here's how I work.

* Hack together a PoC with python, sed, awk, grep, cut, xsv etc.

* Clean that up, run it on larger sample sets (samples made with said sed/awk/cut etc)

* Attempt to run it on the full dataset.

* Rewrite it in rust.

Steps 2 and 3 are hit-or-miss in Python. I find it near impossible to do any refactoring without static types and/or tests. And quite often, I watch a run for over an hour only to have it crash on that one broken line. Whereas the same happens in the Rust version in seconds: crucial for my trial-and-error style of building.

So: Python because I must; Rust as soon as it's clear what I'm going to do.
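One way to soften the "crash on that one broken line after an hour" problem while still in the Python stage is to collect bad rows instead of letting them raise. This is only a sketch: the two-column `name,value` layout and the `parse_rows` helper are made up for illustration and are not from the workflow described above.

```python
import csv
import io


def parse_rows(text):
    """Parse 'name,value' rows and return (good_rows, bad_rows).

    Malformed rows are recorded with their line number instead of
    aborting the whole run. The column layout is hypothetical.
    """
    good, bad = [], []
    reader = csv.reader(io.StringIO(text))
    for line_no, row in enumerate(reader, start=1):
        try:
            name, value = row          # ValueError if the column count is wrong
            good.append((name, float(value)))  # ValueError if not numeric
        except ValueError as exc:
            bad.append((line_no, row, str(exc)))
    return good, bad


# Small dirty sample: line 2 has a non-numeric value, line 4 is missing a column.
sample = "a,1.5\nb,oops\nc,3\nd\n"
good, bad = parse_rows(sample)
bad_line_numbers = [entry[0] for entry in bad]
```

Running a pass like this over the full dataset first surfaces every broken line at once, rather than one per hour-long run. It does not replace the compile-time checks the Rust rewrite gives you.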


Doesn't Python support optional static types these days?


It supports type hinting. Which helps, but is far from the tool (crutch) that I need when refactoring. Still far too much guessfactoring and not the confident "it compiles. tests are green. it's Friday: let's deploy!" that Rust (or Java, or even TypeScript) offers me.
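A tiny illustration of why hints alone fall short: the interpreter stores annotations but does not enforce them at runtime, so a violating call runs without complaint unless a separate static checker (mypy, for example) is part of the workflow. The `parse_price` function below is a made-up example.

```python
from typing import get_type_hints


def parse_price(raw: str) -> float:
    """Hypothetical helper: the hint says `raw` should be a str."""
    return float(raw)


# The hint is violated (an int is passed), yet nothing fails at runtime.
result = parse_price(3)

# The annotations are just metadata attached to the function.
hints = get_type_hints(parse_price)
```

A static checker would flag the `parse_price(3)` call, but only if you run it; the interpreter itself never will.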


It's got type annotations and mypy has a discussion about it here as well: https://github.com/python/mypy/issues/1282


Yes. What's cool is in Rust you have direct access to polars, so you can do all the low level munging and computations (and/or read/write the data to/from arrow directly from rust if need be) in Rust and return dataframes directly to Python. The front end is still Python, of course, but pyo3 makes it pretty trivial.


Rust bindings for Python: https://github.com/PyO3/pyo3


The first code snippet in the provided link is actually Python



