The use case is exploratory literature review in a specific scientific field.
I have a setup that takes pdfs and does some OCR and layout detection with Amazon, and then bunch them with some internal reports. Then, I have a pipeline to write summaries of each document and another one to slice them into chunks, get embeddings and set up a vector store for a RAG chat bot. At the moment it’s using Mixtral and the command line. But I like being able to swap LLMs to experiments with different models and quantisation without hassle, and I more or less plan to set this up on a remote server to free some resources on my workstation so the web UI could come in handy. Running this locally is a must for confidentiality reasons. I’d like to get rid of Textract as well, but unfortunately I haven’t found a solution that’s even close. Tesseract in particular was very disappointing.
Curious, can you share more details about your usecase?