Claude's tokenizers have actually been getting less efficient over the years (I think we're on at least the third iteration since Sonnet 3.5). And if you prompt the LLM in a language other than English, or if your users prompt it or generate content in other languages, the costs climb even higher. And I mean hundreds of percent higher for languages with complex scripts like Tamil or Japanese. If you're interested in the research we did comparing the tokenizers of several SOTA models across multiple languages, just hit me up.
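To give a flavor of why complex scripts cost more: this is a minimal sketch in plain Python, not any vendor's actual tokenizer. Byte-level BPE vocabularies bottom out at UTF-8 bytes, and Tamil or Japanese characters each take 3 bytes where ASCII takes 1, so unless the learned merges fully compensate, those languages pay more tokens for the same text.

```python
# Crude proxy for token cost (NOT a real tokenizer): byte-level BPE
# starts from UTF-8 bytes, so a script at 3 bytes/char begins with ~3x
# the raw symbols per character of ASCII English, before merges help.

def bytes_per_char(text: str) -> float:
    """UTF-8 bytes divided by code points: rough tokenization-cost proxy."""
    return len(text.encode("utf-8")) / len(text)

english = "hello world"
tamil = "வணக்கம் உலகம்"      # "hello world" in Tamil
japanese = "こんにちは世界"    # "hello world" in Japanese

for label, text in [("English", english), ("Tamil", tamil), ("Japanese", japanese)]:
    print(f"{label}: {bytes_per_char(text):.2f} bytes/char")
```

Real tokenizers close part of that gap with language-specific merges, but in our tests the gap never disappears for underrepresented scripts.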
Thanks for sharing! I've been begrudgingly using Darktable since that seems to be the best option on Linux, but the UI/UX never really clicked with me. I wish this were open source, but I will give it a shot (pun intended) for sure.
We're doing multilingual testing and I can confirm what you've observed: Gemma 4 is surprisingly good at multilingual tasks, especially given its size. This holds mainly for the dense 31B model.
Same, I quickly tested it for code gen and it produced mostly good code for simple problems, but it sometimes hallucinated words in non-English scripts inside the code.
For anyone interested in multilingual performance, which is not usually well benchmarked or reported: Gemma 4 does really well, especially the dense 31B version. In fact, it outperforms many models with an order of magnitude more parameters.
It can't quite handle really long-tail languages, but the claim of 35 supported languages (and a hint of some knowledge of up to 140) was substantiated by our tests.
If you're doing work outside of English and/or need to run a translation model on your own terms, Gemma 4 is a very good candidate.
Thank you. +1.
There are obviously differences, with things getting lost or slightly misaligned in the latent space, and these do degrade reasoning quality, but the decline is very small in high-resource languages.