The Google Translate offline translation datasets are absolutely tiny - like, 20 MB in size for French. Obviously this is heavily quantised and so on, but it’s not that surprising that a 30-60+ billion parameter language model outcompetes Google Translate handily.
I assume you’re right - that Google Translate hasn’t been updated to take advantage of much bigger, more computationally complex models. I suppose for direct translation it’s probably not needed, but being able to ask ChatGPT to explain the translation (and any cultural nuances involved) is a game changer when you’re trying to learn a language.
But they could easily be using a larger dataset online?
Which cuts two ways: maybe the offline size tells us nothing about what they run online; or it does, in which case why keep it so small online when they have all this space and also offer Bard?
They don't charge anywhere near enough to do that, I'd imagine, and likely couldn't at the scale they operate at (I mean, it's even embedded into many apps to instantly translate banal things like comments). Imagine trying to translate a long news article with a sequence of max-length LLM inferences.
Yeah, I'm not actually suggesting it run through Bard or an LLM. I just mean that a small dataset size is surely a design requirement for offline translation on space-constrained devices; it doesn't necessarily mean they use the same datasets online. And if they do, why? Because it seems to be enough?
(It's a bit confusing to talk about, because surely it's just an older version of the same sort of thing: a smaller language model, right? I just think it could/should/would be a bit larger in the online hosted version.)