
> Remember the photocopier that worked by OCR and would occasionally mis-transcribe numbers?

That was perfectly ordinary compression?

The phenomenon is all over the place, most visible in autocorrect.



It was ordinary compression, a scheme called JBIG2. It did not mis-transcribe anything; it marked slightly different number or character blocks as the same, which resulted in parts of the image being replaced.

In other words, its match tolerance was a bit too lax, so it got poisoned by blocks in its own dictionary, thinking it already had a block for something it had just scanned.
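To make the failure mode concrete, here is a toy sketch in Python. It is not the real JBIG2 algorithm, only the general idea of matching blocks against a dictionary with a tolerance. The 5x5 glyphs, the Hamming-distance metric, the threshold values and the names `hamming`, `encode` and `decode` are all made up for illustration.

```python
# Toy illustration of dictionary-based block substitution with a match tolerance.
# NOT the real JBIG2 algorithm: glyphs, metric and thresholds are hypothetical.

GLYPH_6 = (
    "01110",
    "10000",
    "11110",
    "10001",
    "01110",
)
GLYPH_8 = (
    "01110",
    "10001",
    "01110",
    "10001",
    "01110",
)


def hamming(a, b):
    """Count the pixels that differ between two equally sized bitmaps."""
    return sum(pa != pb for ra, rb in zip(a, b) for pa, pb in zip(ra, rb))


def encode(blocks, tolerance):
    """Store each block as the index of the first dictionary entry within tolerance."""
    dictionary = []
    indices = []
    for block in blocks:
        for i, known in enumerate(dictionary):
            if hamming(block, known) <= tolerance:
                # "Close enough": reuse the old block instead of storing the new one.
                indices.append(i)
                break
        else:
            # No acceptable match, so the block becomes a new dictionary entry.
            dictionary.append(block)
            indices.append(len(dictionary) - 1)
    return dictionary, indices


def decode(dictionary, indices):
    """Rebuild the page from dictionary entries; substituted blocks stay substituted."""
    return [dictionary[i] for i in indices]


if __name__ == "__main__":
    page = [GLYPH_6, GLYPH_8]
    for tolerance in (1, 2):
        dictionary, indices = encode(page, tolerance)
        restored = decode(dictionary, indices)
        print(tolerance, "faithful" if restored == page else "8 replaced by 6")
```

The two toy glyphs differ in only two pixels, so a tolerance of 2 is enough for the 8 to be stored as a reference to the 6 scanned before it. The decoded page still looks clean and sharp, which is why this kind of error is much harder to spot than ordinary compression noise.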

More details can be found in [0] and [1].

[0]: https://www.theregister.com/2013/08/06/xerox_copier_flaw_mea...

[1]: http://www.dkriesel.com/en/blog/2013/0802_xerox-workcentres_...?


Yes! This is why I always turn off autocorrect! It’s true that I absolutely make more typos without it, but at least they’re obvious as typos, and not different words that potentially change the meaning of the sentence.



