Whisper is awesome, but managing it in a production environment is not easy. I'm waiting for OpenAI (or someone else) to offer an API with a real-time factor (RTF) of < 1. RTF is inference time divided by the duration of the audio file. We could really use that.
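To make the definition concrete, here's a minimal sketch of the RTF calculation (the helper name and the example figures, taken from the whisper.cpp numbers mentioned in this thread, are illustrative):

```python
def real_time_factor(inference_seconds: float, audio_seconds: float) -> float:
    """RTF = time spent transcribing / duration of the audio.
    RTF < 1 means the model transcribes faster than real time."""
    return inference_seconds / audio_seconds

# e.g. ~6 s of inference for a 30 s segment
print(real_time_factor(6.0, 30.0))  # 0.2, i.e. 5x faster than real time
```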
Doesn't whisper.cpp already get you that? It takes ~6 seconds per 30-second segment on an M1 Max with the Large model. Do you mean you want a snappy appearance of words shortly after you say them, rather than having to recognise audio in 30-second segments?