Hacker Times | new | past | comments | ask | show | jobs | submit | login

That doesn't sound like an amazingly safe idea


It isn't. But that's easily mitigated with temp tables, an ephemeral database, COPY, etc.

Upstream can easily f-up and (accidentally) delete production data if you do this on a live db, which is why PostgreSQL and nearly all other DBs have a myriad of tools to solve this without doing it directly on a production database.
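The staging-table pattern alluded to above can be sketched as follows. In Postgres you'd COPY the feed into the staging table and do the rename inside a transaction; sqlite3 is used here only to keep the sketch self-contained and runnable, and all table names are made up:

```python
import sqlite3

def load_into_staging_then_swap(conn, rows):
    """Load a fresh feed into a staging table, then swap it in.

    A broken upload fails while loading the staging table and never
    touches the live `items` table. (Table names are hypothetical.)
    """
    cur = conn.cursor()
    cur.execute("DROP TABLE IF EXISTS items_staging")
    cur.execute("CREATE TABLE items_staging (id INTEGER, name TEXT)")
    cur.executemany("INSERT INTO items_staging VALUES (?, ?)", rows)
    # The swap: move the live table aside, promote staging, drop the old copy.
    cur.execute("ALTER TABLE items RENAME TO items_old")
    cur.execute("ALTER TABLE items_staging RENAME TO items")
    cur.execute("DROP TABLE items_old")
    conn.commit()

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE items (id INTEGER, name TEXT)")
conn.execute("INSERT INTO items VALUES (1, 'stale')")
load_into_staging_then_swap(conn, [(1, "fresh"), (2, "also fresh")])
count = conn.execute("SELECT COUNT(*) FROM items").fetchone()[0]
```

The point is that the only operation touching the live table is the rename at the very end, after the load has already succeeded.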


Maybe I'm missing something but I don't see how it's possible for a COPY statement alone to remove existing data.


If, in the regular scenario, you load 10000 rows of new data and delete the old, then it's fine.

What if someone screws up the zip and instead of 10000 today, it’s only 10?


I had this last week, but instead it was a third-party API: their service started returning null instead of true for the has_more property beyond the second page of results.

In either case, the solution is probably to check rough counts and error out if they're not reasonable.
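A rough-count check like the one suggested here might look like this (the threshold and names are assumptions for illustration, not anything from the thread):

```python
def count_is_reasonable(new_count, old_count, min_ratio=0.5):
    """True when the fresh load has at least min_ratio of the previous rows.

    Catches the "10 rows arrived where 10000 were expected" failure mode
    before the swap happens. min_ratio is a made-up tolerance; tune it.
    """
    return old_count == 0 or new_count >= old_count * min_ratio

# Normal day: counts are in the same ballpark, swap proceeds.
ok_full = count_is_reasonable(9500, 10000)
# Truncated feed: refuse to swap and alert instead.
ok_short = count_is_reasonable(10, 10000)
```

In practice you'd run this between loading the staging table and renaming it over the live one, and raise/alert when it returns False.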


I think the general rule is: don't replace the prod db until the new one passes tests.


What specific risks do you foresee with this approach?


Seems totally fine to me, as long as you can roll back if the download is truncated or the CRC checksum doesn't match.


> or the crc checksum doesn’t match.

Which wouldn't exist if the API is simply a single CSV file?

At least with a zip, the CRC exists (an incomplete zip file is detectable; an incomplete but syntactically valid CSV file is not).
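Python's standard zipfile module shows the difference: a truncated zip fails to open (or fails its per-file CRC check), whereas a truncated CSV would still parse cleanly:

```python
import io
import zipfile

def zip_is_intact(data: bytes) -> bool:
    """Return True if `data` is a complete zip whose entries pass CRC checks."""
    try:
        with zipfile.ZipFile(io.BytesIO(data)) as zf:
            # testzip() re-reads every entry and returns the first bad
            # name, or None when all CRCs match.
            return zf.testzip() is None
    except zipfile.BadZipFile:
        # e.g. the end-of-central-directory record is missing because
        # the download was cut off.
        return False

# Build a small zip in memory, then simulate a truncated download.
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w") as zf:
    zf.writestr("data.csv", "a,b\n1,2\n" * 1000)
good = buf.getvalue()
truncated = good[: len(good) // 2]
```

Here `zip_is_intact(good)` is True and `zip_is_intact(truncated)` is False, while the first half of a raw CSV file would look like a perfectly valid, shorter CSV.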


DROP DATABASE blah;


That’s not how COPY FROM works in Postgres. You give it a CSV and a table matching its structure, and it hammers the data into the table faster than anything else can.



