That's a mix of Polish and Ukrainian in the transcript. Now, if I try speaking Ukrainian, I get a transcript in Russian every time. That's upsetting.
Oh no! The model won't transcribe an unsupported language, and incorrectly falls back to one that it was explicitly trained on.
The base model was likely pretrained on data that included Polish and Ukrainian. You shouldn't be surprised to learn that it doesn't perform great on languages it wasn't explicitly trained on, or that it drifts toward whichever language perhaps had the highest share of training data.
Played with the demo a bit. It's really good at English, and detects language change on the fly. Impressive.
But whatever I tried, it could not recognise my Ukrainian and would default to Russian, producing an absolutely ridiculous transcription. Other STT models recognise Ukrainian consistently, so I assume there is a lot of Russian in the training material, and zero Ukrainian. Made me really sad.
That's just the result of the model only supporting Russian (and 12 other languages) and not Ukrainian. It maps to the closest words from its training data.
Writing documentation, amongst other things, has been a lifesaver for me.
I use it to understand new codebases quickly, create the documentation boilerplate for the code I'm working on that needs better docs, or update/rewrite outdated ones.
When the codebase fits in the context window, it's simple. But even when I'm working on something larger, it only takes a bit of RAG-like effort to build a knowledge topology, and then it's super easy to get docs on (the actual!) architecture, specific components, and all the way down to the atomic function level.
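For what it's worth, here is a minimal sketch of what a first pass at such a "knowledge topology" might look like, assuming a Python codebase. The function name `build_code_map` and the shape of the returned dictionary are made up for illustration, not taken from any particular tool; the idea is just to collect per-file imports and top-level definitions so that only the relevant slices need to be handed to the model.

```python
import ast
from pathlib import Path


def build_code_map(root: str) -> dict:
    """Map each Python file under `root` to its imports and top-level definitions.

    Hypothetical helper: the output shape is an assumption for illustration.
    """
    code_map = {}
    for path in sorted(Path(root).rglob("*.py")):
        try:
            tree = ast.parse(path.read_text(encoding="utf-8"))
        except (SyntaxError, UnicodeDecodeError):
            continue  # skip files that cannot be read or parsed

        # Collect every imported module name anywhere in the file.
        imports = set()
        for node in ast.walk(tree):
            if isinstance(node, ast.Import):
                imports.update(alias.name for alias in node.names)
            elif isinstance(node, ast.ImportFrom) and node.module:
                imports.add(node.module)

        # Collect top-level functions and classes with the first docstring line.
        definitions = []
        for node in tree.body:
            if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef, ast.ClassDef)):
                doc = ast.get_docstring(node) or ""
                definitions.append(
                    {"name": node.name, "line": node.lineno, "doc": doc.split("\n")[0]}
                )

        key = path.relative_to(root).as_posix()
        code_map[key] = {"imports": sorted(imports), "definitions": definitions}
    return code_map
```

Calling `build_code_map("path/to/repo")` gives a compact index that can be serialised and used to decide which files to pull into the prompt for a given component. A real setup would likely add call graphs or embeddings on top; this only covers the structural skeleton.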
You have done so much for the growth and promotion of Mojolicious, and now you are doing a great job on security auditing (which is something most people are too lazy to do). Thank you very much! It's a good article to re-read from time to time, just like a checklist.
Do you plan any security-related articles in the near future?