Claude's tokenizers have actually been getting less efficient over the years (I think we're on at least the third iteration since Sonnet 3.5). And if you prompt the LLM in a language other than English, or if your users prompt it or generate content in other languages, the costs climb even higher. And I mean hundreds of percent higher for languages with complex scripts like Tamil or Japanese. If you're interested in the research we did comparing the tokenizers of several SOTA models across multiple languages, just hit me up.
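To give a flavor of why complex scripts cost more: this is a minimal sketch in plain Python, not any vendor's actual tokenizer. Byte-level BPE vocabularies bottom out at UTF-8 bytes, and Tamil or Japanese characters each take 3 bytes where ASCII takes 1, so unless the learned merges fully compensate, those languages pay more tokens for the same text.

```python
# Crude proxy for token cost (NOT a real tokenizer): byte-level BPE
# starts from UTF-8 bytes, so a script at 3 bytes/char begins with ~3x
# the raw symbols per character of ASCII English, before merges help.

def bytes_per_char(text: str) -> float:
    """UTF-8 bytes divided by code points: rough tokenization-cost proxy."""
    return len(text.encode("utf-8")) / len(text)

english = "hello world"
tamil = "வணக்கம் உலகம்"      # "hello world" in Tamil
japanese = "こんにちは世界"    # "hello world" in Japanese

for label, text in [("English", english), ("Tamil", tamil), ("Japanese", japanese)]:
    print(f"{label}: {bytes_per_char(text):.2f} bytes/char")
```

Real tokenizers close part of that gap with language-specific merges, but in our tests the gap never disappears for underrepresented scripts.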
Thanks for sharing! I've been begrudgingly using Darktable since that seems to be the best option on Linux, but the UI/UX never really clicked with me. I wish this were open source, but I will give it a shot (pun intended) for sure.
We're doing multilingual testing and I can confirm what you've observed: Gemma 4 is surprisingly good at multilingual tasks, especially given its size. This holds mainly for the dense 31B model.
Same, I quickly tested it for code gen and it produced mostly good code for simple problems, but it sometimes hallucinated words in non-English scripts inside the code.
For anyone interested in multilingual performance, which is not usually well benchmarked or reported: Gemma 4 does really well, especially the dense 31B version. In fact, it outperforms many models with an order of magnitude more parameters.
It can't quite handle really long-tail languages, but the claim of 35 supported languages (and a hint of some knowledge of up to 140) was substantiated by our tests.
If you're doing work outside of English and/or need to run a translation model on your own terms, Gemma 4 is a very good candidate.
Thank you. +1.
There are obviously differences, with things getting lost or slightly misaligned in the latent space, and these do degrade reasoning quality, but the decline is very small in high-resource languages.