Hacker News | new | past | comments | ask | show | jobs | submit | curioussquirrel's comments

Claude's tokenizers have actually been getting less efficient over the years (I think we're on at least the third iteration since Sonnet 3.5). And if you prompt the LLM in a language other than English, or if your users prompt it or generate content in other languages, the costs climb even further. I mean hundreds of percent more for languages with complex scripts like Tamil or Japanese. If you're interested in the research we did comparing the tokenizers of several SOTA models across multiple languages, just hit me up.
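One intuition for why complex scripts cost more: byte-level BPE tokenizers start from UTF-8 bytes, and Tamil or Japanese characters each occupy 3 bytes versus 1 for ASCII, so there is simply more raw material to merge. A minimal sketch of that effect (byte counts are only a proxy; actual token counts depend on each tokenizer's learned merges and vocabulary coverage, and the sample strings here are my own):

```python
# Rough illustration of why complex scripts can cost more tokens:
# byte-level BPE operates on UTF-8 bytes, and Tamil/Japanese
# characters take 3 bytes each versus 1 for ASCII English.
samples = {
    "English":  "Thank you",
    "Japanese": "ありがとう",
    "Tamil":    "நன்றி",
}

for lang, text in samples.items():
    n_chars = len(text)                    # Unicode code points
    n_bytes = len(text.encode("utf-8"))    # raw bytes the tokenizer sees
    print(f"{lang:9} {n_chars} chars -> {n_bytes} UTF-8 bytes "
          f"({n_bytes / n_chars:.1f} bytes/char)")
```

Both "thank you" strings above start from 15 bytes despite being only 5 characters long, while the 9-character English phrase is 9 bytes; a tokenizer whose vocabulary is dominated by English merges will typically compress the English bytes far better.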

I would encourage you to post a link here, and also to submit to HN if you haven't already. :)

Will do! Thanks for the encouragement.

Thanks for sharing! I've been begrudgingly using Darktable, since that seems to be the best option on Linux, but the UI/UX never really clicked with me. I wish this were open source, but I will give it a shot (pun intended) for sure.

Thank you for the transparency and insights! Very helpful.

We actually did the same thing re: generating charts in brand style to avoid any mishaps; I've slept much better since.


Absolutely unhinged and very entertaining. Thanks for sharing!

Give Gemma 31B a shot for translation; it does a very good job given its size.


We're doing multilingual testing and I can confirm what you've observed: Gemma 4 is surprisingly good at multilingual tasks, especially given its size. This is especially true of the dense 31B model.


Same; I quickly tested it for code gen and it produced mostly good code for simple problems, but it sometimes hallucinated words in non-English scripts inside the code.


For anyone interested in multilingual performance, which is not usually well benchmarked or reported: Gemma 4 does really well, especially the dense 31B version. In fact, it outperforms many models with an order of magnitude more parameters.

It is not quite capable of handling really long-tail languages, but the claim of 35 supported languages (and a hint of some knowledge of up to 140) was substantiated by our tests.

If you're doing work outside of English and/or need to run a translation model on your own terms, Gemma 4 is a very good candidate.


Thank you. +1. There are obviously differences, with things getting lost or slightly misaligned in the latent space, and these do cause degradation in reasoning quality, but the decline is very small in high-resource languages.


Such a good game and execution. Thank you.

