
What kind of arms do you envision for normalizing text down to Latin-1, or even ASCII, while normalizing any whitespace to single space characters?
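For concreteness, here is a minimal Python sketch of the kind of normalization meant here. It assumes NFKD decomposition plus dropping whatever is still non-ASCII is acceptable; the function name and the small punctuation table are made up for illustration, and a Latin-1 target would only need a different codec name.

```python
import re
import unicodedata

# Hypothetical table: map a few common typographic characters to ASCII
# before the lossy step, so quotes and dashes are not silently dropped.
PUNCTUATION = str.maketrans({
    "\u2018": "'", "\u2019": "'",   # curly single quotes
    "\u201c": '"', "\u201d": '"',   # curly double quotes
    "\u2013": "-", "\u2014": "-",   # en and em dashes
})

def normalize_to_ascii(text: str) -> str:
    text = text.translate(PUNCTUATION)
    # NFKD splits accented letters into base letter + combining mark and
    # maps compatibility characters (e.g. no-break space) to plain forms.
    decomposed = unicodedata.normalize("NFKD", text)
    # Drop everything that is still not ASCII (combining marks,
    # zero-width characters, and so on). This step is lossy.
    ascii_only = decomposed.encode("ascii", "ignore").decode("ascii")
    # Collapse every run of whitespace into a single space.
    return re.sub(r"\s+", " ", ascii_only)
```

Anything hidden in the choice of quote style, spacing, or invisible characters does not survive this pass. Differences in the wording itself do.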

There is a known variant where every subscriber of a confidential text receives a slightly different copy with the same meaning. But it's much harder to implement, and it does not scale.



Not that much harder to implement. You could automate that too with some basic text substitution, e.g. replacing various instances of "and" with an ampersand or plus sign. You could also vary joined words like "without" / "with out", and alternate between the types of quotation marks used (as there are several in Unicode).
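A rough Python sketch of that substitution idea, under some assumptions: the variant table, the function name, and the use of a SHA-256 digest of the subscriber ID as the source of choices are all invented for illustration, and only two variants per substitution point are used to keep it short.

```python
import hashlib
import re

# Hypothetical substitution points, each with two interchangeable variants.
VARIANTS = [
    (re.compile(r"\band\b"), ["and", "&"]),
    (re.compile(r"\bwithout\b"), ["without", "with out"]),
]

def fingerprint(text: str, subscriber_id: str) -> str:
    # Derive a deterministic stream of bits from the subscriber ID.
    digest = hashlib.sha256(subscriber_id.encode("utf-8")).digest()
    bits = [(byte >> i) & 1 for byte in digest for i in range(8)]
    counter = 0

    def make_repl(options):
        def repl(match):
            nonlocal counter
            # Each match consumes one bit, which selects the variant.
            choice = options[bits[counter % len(bits)]]
            counter += 1
            return choice
        return repl

    for pattern, options in VARIANTS:
        text = pattern.sub(make_repl(options), text)
    return text
```

The same subscriber ID always produces the same copy, so a leaked copy can be compared against the regenerated ones. With only a handful of substitution points, two subscribers can easily end up with identical copies, so a real text would need many more of them.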

And this is without breaking into more intelligent heuristics where you swap out synonyms ("more intelligent" because you'd need to be careful not to alter passages that need to be kept verbatim, like quoted text, or where a synonym might alter the context of the sentence. But with a little care I think that is achievable as well).
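The "keep quoted text verbatim" part could look something like the following Python sketch. The synonym table is a made-up placeholder, only straight double quotes are treated as quotation delimiters, and it does nothing about the harder problem of a synonym changing the meaning of a sentence.

```python
import re

# Hypothetical synonym table; a real one would need review for context.
SYNONYMS = {"big": "large", "quick": "fast"}

# Capturing group so that re.split keeps the quoted passages in the result.
QUOTED = re.compile(r'("[^"]*")')

def swap_outside_quotes(text: str) -> str:
    parts = QUOTED.split(text)
    # With one capturing group, even indices are unquoted text and
    # odd indices are the quoted passages, which are left untouched.
    for i in range(0, len(parts), 2):
        for word, alternative in SYNONYMS.items():
            parts[i] = re.sub(rf"\b{re.escape(word)}\b", alternative, parts[i])
    return "".join(parts)
```

This version always swaps. To vary the copy per subscriber, each swap would have to be made conditional on some per-subscriber choice.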



