Yup, people tend to conflate the concepts and refer to synthetic data as anonymised data. They are very different things.
Anonymised data or redacted data are transformations of a data set that one _hopes_ do not leak too much PII / sensitive data. People don’t use ML to anonymise directly, but they do use ML to classify PII as a first step before splatting or generalising it.
In that setup, it’s entirely expected that a classifier that isn’t 100% accurate will let PII leak through.
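A minimal sketch of that failure mode, using a toy regex-based "classifier" (real pipelines use ML/NER models, but the leak mechanism is identical: anything the model fails to flag passes through verbatim). The patterns and sample record here are hypothetical:

```python
import re

# Toy "classifier": regexes for two common PII formats. An ML model
# plays this role in practice, with the same blind-spot problem.
PII_PATTERNS = [
    re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),   # US SSN, dashed form only
    re.compile(r"\b[\w.]+@[\w.]+\.\w+\b"),  # simple email address
]

def redact(text: str) -> str:
    """Replace every span the classifier flags with [REDACTED]."""
    for pattern in PII_PATTERNS:
        text = pattern.sub("[REDACTED]", text)
    return text

record = "Contact jane@example.com, SSN 123-45-6789, alt SSN 123456789."
print(redact(record))
# The email and the dashed SSN are caught, but the undashed SSN is a
# format the classifier doesn't recognise, so it leaks into the output.
```

The point isn’t that regexes are bad and ML is good: swap the regexes for a 99%-accurate NER model and you still ship the 1% it misclassifies.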
This is a key reason why anonymisation and redaction are widely seen as problematic and are being replaced by synthetic data and, maybe in future, homomorphic encryption.
Homomorphic encryption, like any encryption-in-use technology, is no guarantee of privacy on its own. Synthetic data faces the same dilemma of utility vs anonymity as any other anonymisation tech.