
I occasionally get a single hallucinated word (more like a mis-transcription) where the audio contains a clunk/bang/cough/etc., but I've never had full hallucinated phrases from clean silence.

There are a couple of GitHub discussions on the Whisper repository with various fixes/hacks to deal with it: https://github.com/openai/whisper/discussions/679 https://github.com/openai/whisper/discussions/813

If you get a chance, I encourage you to try out the other newer models I mentioned, I think you'd be very impressed.



I don't see this as much different from what commonly happens with humans, when we think we hear our name called and it was just some environmental noise.

As for the silence, I wonder why the model even receives it. I would think a lot of it would be compressed out of existence to save bandwidth.
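For what it's worth, dropping silence before it reaches the model is easy to do on the client side. Here's a hypothetical, minimal RMS energy gate, not taken from Whisper or the linked discussions. It assumes mono float samples in [-1.0, 1.0] at 16 kHz, and the name `drop_silence`, the 30 ms frame size, and the 0.01 threshold are all illustrative choices. A real voice activity detector would be much more robust than this.

```python
import math

def drop_silence(samples, sample_rate=16000, frame_ms=30, threshold=0.01):
    """Hypothetical energy gate: return samples with low-energy frames removed.

    samples: list of mono float samples in [-1.0, 1.0] (assumed format).
    threshold: RMS level below which a frame counts as silence (illustrative value).
    """
    # Number of samples per analysis frame (480 at 16 kHz / 30 ms).
    frame_len = max(1, sample_rate * frame_ms // 1000)
    kept = []
    for start in range(0, len(samples), frame_len):
        frame = samples[start:start + frame_len]
        # Root-mean-square energy of this frame.
        rms = math.sqrt(sum(s * s for s in frame) / len(frame))
        # Keep only frames loud enough to plausibly contain sound.
        if rms >= threshold:
            kept.extend(frame)
    return kept
```

Dropping frames mid-stream shifts every later timestamp, so if you need word timings you'd want to keep the offsets of the retained frames rather than just concatenating them.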



