
It sounds like your Parquet file may have no partitioning. Apart from iterating over row groups, as someone else suggested, I suspect there is no better solution than downloading the whole thing to your computer, partitioning it in a sane way, and uploading it again. It's only 15 GB, so it should be fine even on an old laptop.

Of course then you might as well do all the processing you're interested in while the file is on your local disk, since it is probably much faster than the cloud service disk.



What do you mean by the parquet file might have no partitioning? Is the row group size not the implicit partitioning?



