I'll second the recommendation for Kaldi. It's more complicated to get running than pocketsphinx, but in my experience Kaldi generally has better accuracy and lower latency than pocketsphinx for general use (with the caveats below).

https://github.com/gooofy/zamia-speech/ has been training good acoustic models that are worth looking at (including models trained for robustness against noise). The project also has lots of code, Docker images, and documentation.

pocketsphinx isn't actually that bad with its latest acoustic models and a small vocabulary, so whether it's good enough depends on your exact use case. But it generally doesn't do well with far-field mics or DSP-processed audio, doesn't handle noise well, and in my experiments wasn't quite as fast as Kaldi.

In my experience, better and larger language models make a world of difference to accuracy with either Kaldi or pocketsphinx, especially in the general-vocabulary case. Oddly, nobody really seems to talk about this, maybe because almost everyone just uses the default language model built from a news corpus from around the '80s.
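To illustrate why the language model matters so much, here's a toy sketch of n-best rescoring with an add-one smoothed bigram model. This is not how Kaldi or pocketsphinx work internally. The corpora, the acoustic scores, and the `lm_weight` parameter are all made up for illustration.

```python
import math
from collections import Counter

def train_bigram(sentences):
    """Count unigrams and bigrams, padding each sentence with <s> and </s>."""
    unigrams, bigrams = Counter(), Counter()
    for sentence in sentences:
        words = ["<s>"] + sentence.lower().split() + ["</s>"]
        unigrams.update(words)
        bigrams.update(zip(words, words[1:]))
    return unigrams, bigrams

def lm_log_prob(sentence, lm):
    """Add-one (Laplace) smoothed bigram log-probability of a sentence."""
    unigrams, bigrams = lm
    vocab_size = len(unigrams)
    words = ["<s>"] + sentence.lower().split() + ["</s>"]
    return sum(
        math.log((bigrams[(prev, word)] + 1) / (unigrams[prev] + vocab_size))
        for prev, word in zip(words, words[1:])
    )

def rescore(hypotheses, lm, lm_weight=1.0):
    """Return the transcript with the best acoustic + weighted LM score."""
    best_text, _ = max(
        hypotheses,
        key=lambda hyp: hyp[1] + lm_weight * lm_log_prob(hyp[0], lm),
    )
    return best_text

# Hypothetical n-best list of (transcript, acoustic log-score). The acoustic
# model slightly prefers the wrong transcript here.
hypotheses = [
    ("turn on the kitchen lights", -10.0),
    ("turn on the kitchen lice", -9.5),
]

# Tiny stand-ins for a generic/off-domain corpus vs. an in-domain one.
generic_lm = train_bigram([
    "the stock market fell today",
    "officials said the economy grew",
])
domain_lm = train_bigram([
    "turn on the kitchen lights",
    "turn off the kitchen lights",
    "turn on the bedroom lights",
    "dim the lights",
])

print(rescore(hypotheses, generic_lm))  # turn on the kitchen lice
print(rescore(hypotheses, domain_lm))   # turn on the kitchen lights
```

With the generic model, "lights" and "lice" are both unseen, so the LM can't break the tie and the acoustic model's (wrong) preference wins. The in-domain model flips the result. Real decoders use much larger n-gram models with better smoothing during search, but the basic effect is the same idea.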

I've never really gotten any of the various DeepSpeech-style systems working, so I can't speak to them.