
I like the concept and it is a good start. Pulling text from PDFs is especially painful. I think the output format needs improvement, though. Right now it is just one large array of strings, and a string is sometimes a single line and sometimes not. My particular use case is extracting raw data from a PDF, so I would like to see more structure in the output. For example, knowing where newlines, tabs, etc. are located would be very helpful for parsing raw data.
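As a rough illustration of the kind of structure meant here, the Python sketch below assumes each extracted string came with its position. The `Fragment` type and its `line` and `column` fields are hypothetical and are not part of the current API output; they only show how positional information would let a client rebuild rows and tab-separated text.

```python
from dataclasses import dataclass
from itertools import groupby


@dataclass
class Fragment:
    """One extracted string plus its position (hypothetical fields)."""
    text: str
    line: int    # assumed: zero-based line index on the page
    column: int  # assumed: zero-based column (tab stop) index within the line


def to_rows(fragments):
    """Group fragments into rows of cells, ordered by line and then column."""
    ordered = sorted(fragments, key=lambda f: (f.line, f.column))
    return [
        [f.text for f in group]
        for _, group in groupby(ordered, key=lambda f: f.line)
    ]


def to_text(fragments):
    """Rebuild plain text with tabs between cells and newlines between rows."""
    return "\n".join("\t".join(row) for row in to_rows(fragments))


# Fragments may arrive in any order; positions make the layout recoverable.
fragments = [
    Fragment("2013", line=1, column=0),
    Fragment("Year", line=0, column=0),
    Fragment("Total", line=0, column=1),
    Fragment("42", line=1, column=1),
]
rows = to_rows(fragments)
text = to_text(fragments)
```

With something like this, a table in a PDF could be turned into rows directly instead of guessing line boundaries from a flat array of strings.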

Here is the PDF I used to test: https://www.gov.uk/government/uploads/system/uploads/attachm...

Is there a technical reason for the 1-2MB limit or is it arbitrary?



Thanks for your comment!

That's something we can provide pretty easily, and we will try to include it in our next release. If you want us to help you with your specific problem, please send us an email at info@stamplin.com.

The limit is there to prevent our server from crashing, since for the moment we do not have the financial means to support a massive server farm. Again, if this limit prevents you from using our API, we may raise it if you ask us by email.



