
OCR'd ebooks are universally trash. For one, all formatting is gone. Anything in the book other than ASCII characters will vanish. You lose links within the book and all other advanced features.

And OCR is generally just not accurate enough and still makes very visible mistakes throughout the text.

Have you read many OCR'd ebooks? I have, and every single one was massively inferior. Most I would consider barely readable.



For books where you want to keep the formatting, the best option is Adobe Acrobat Pro and its Editable Text and Images feature. It replaces the scanned letters with a custom TrueType font. I used this in college to scan textbooks, and it worked really well. Modern OCR on books is incredibly accurate.

See https://www.youtube.com/watch?v=bhJ9zqY8Da0


A free, open-source alternative is Stirling PDF (https://github.com/Stirling-Tools/Stirling-PDF), which can do very accurate OCR while keeping the formatting.


I love it when formatting is removed. Some ebooks, especially EPUBs, don't work well with alternative fonts for some reason.



