Hacker Times | new | past | comments | ask | show | jobs | submit | login

That doesn't sound like an amazingly safe idea


It isn't. But that's easily mitigated with temp tables, an ephemeral database, COPY, etc.

Upstream can easily f-up and (accidentally) delete production data if you do this on a live db, which is why PostgreSQL and nearly all other DBs have a myriad of tools to solve this without doing it directly on a production database.
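The staging-table pattern alluded to above can be sketched as follows. In Postgres you'd COPY the feed into the staging table and do the rename inside a transaction; sqlite3 is used here only to keep the sketch self-contained and runnable, and all table names are made up:

```python
import sqlite3

def load_into_staging_then_swap(conn, rows):
    """Load a fresh feed into a staging table, then swap it in.

    A broken upload fails while loading the staging table and never
    touches the live `items` table. (Table names are hypothetical.)
    """
    cur = conn.cursor()
    cur.execute("DROP TABLE IF EXISTS items_staging")
    cur.execute("CREATE TABLE items_staging (id INTEGER, name TEXT)")
    cur.executemany("INSERT INTO items_staging VALUES (?, ?)", rows)
    # The swap: move the live table aside, promote staging, drop the old copy.
    cur.execute("ALTER TABLE items RENAME TO items_old")
    cur.execute("ALTER TABLE items_staging RENAME TO items")
    cur.execute("DROP TABLE items_old")
    conn.commit()

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE items (id INTEGER, name TEXT)")
conn.execute("INSERT INTO items VALUES (1, 'stale')")
load_into_staging_then_swap(conn, [(1, "fresh"), (2, "also fresh")])
count = conn.execute("SELECT COUNT(*) FROM items").fetchone()[0]
```

The point is that the only operation touching the live table is the rename at the very end, after the load has already succeeded.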


Maybe I'm missing something but I don't see how it's possible for a COPY statement alone to remove existing data.


If, in the regular scenario, you load 10000 rows of new data and delete the old, then it's fine.

What if someone screws up the zip and instead of 10000 today, it’s only 10?


I had this last week, but instead it was a third-party API: their service started returning null instead of true for the has_more property beyond the second page of results.

In either case, the solution is probably to check rough counts and error out if they're not reasonable.
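A rough-count check like the one suggested here might look like this (the threshold and names are assumptions for illustration, not anything from the thread):

```python
def count_is_reasonable(new_count, old_count, min_ratio=0.5):
    """True when the fresh load has at least min_ratio of the previous rows.

    Catches the "10 rows arrived where 10000 were expected" failure mode
    before the swap happens. min_ratio is a made-up tolerance; tune it.
    """
    return old_count == 0 or new_count >= old_count * min_ratio

# Normal day: counts are in the same ballpark, swap proceeds.
ok_full = count_is_reasonable(9500, 10000)
# Truncated feed: refuse to swap and alert instead.
ok_short = count_is_reasonable(10, 10000)
```

In practice you'd run this between loading the staging table and renaming it over the live one, and raise/alert when it returns False.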


I think the general rule is: don't replace the prod db until the new one passes tests.


What specific risks do you foresee with this approach?


Seems totally fine to me, as long as you can roll back if the download is truncated or the CRC checksum doesn't match.


> or the crc checksum doesn’t match.

Which wouldn't exist if the API is simply a single CSV file?

At least with a zip, the CRC exists (an incomplete zip file is detectable; an incomplete but syntactically valid CSV file is not).
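Python's standard zipfile module shows the difference: a truncated zip fails to open (or fails its per-file CRC check), whereas a truncated CSV would still parse cleanly:

```python
import io
import zipfile

def zip_is_intact(data: bytes) -> bool:
    """Return True if `data` is a complete zip whose entries pass CRC checks."""
    try:
        with zipfile.ZipFile(io.BytesIO(data)) as zf:
            # testzip() re-reads every entry and returns the first bad
            # name, or None when all CRCs match.
            return zf.testzip() is None
    except zipfile.BadZipFile:
        # e.g. the end-of-central-directory record is missing because
        # the download was cut off.
        return False

# Build a small zip in memory, then simulate a truncated download.
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w") as zf:
    zf.writestr("data.csv", "a,b\n1,2\n" * 1000)
good = buf.getvalue()
truncated = good[: len(good) // 2]
```

Here `zip_is_intact(good)` is True and `zip_is_intact(truncated)` is False, while the first half of a raw CSV file would look like a perfectly valid, shorter CSV.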


DROP DATABASE blah;


That’s not how COPY FROM works in Postgres. You give it a CSV and a table matching its structure, and it hammers the data into the table faster than anything else can.



