
I believe Vaex can do this, in addition to GPU processing and reading directly from S3. https://github.com/vaexio/vaex


To you and all the other sibling comments: Thanks a lot! Exactly what I have been looking for!

With regard to Vaex, I would really be interested in an independent benchmark comparing it to dask, Spark, data.table, etc. However, judging by the other comments, nobody else has been able to find one either.


The H2O benchmarks cover dataframe operations:

https://h2oai.github.io/db-benchmark/

It has pandas, dask, Spark, data.table, Polars, etc. Sadly, Vaex is currently missing from this suite.



