The Google Translate offline translation datasets are absolutely tiny - like, 20...

OJFord · on Aug 10, 2023

But they could easily be using a larger dataset online?

Which cuts two ways: maybe this line of reasoning doesn't mean anything; or, well, yes, exactly, but then why so small online, when they have all that space and also offer Bard?

saurik · on Aug 10, 2023

They don't charge anywhere near enough to do that, I'd imagine; and likely couldn't at the scale they operate at (I mean, they are even embedded into many apps to help instantly translate banal things like comments). Imagine trying to translate a long news article with a sequence of max-length LLM inferences.

OJFord · on Aug 10, 2023

Yeah, I'm not actually suggesting it run through Bard/an LLM. I just mean that a small dataset size is surely a design requirement for offline translation on space-constrained devices; it doesn't necessarily mean they use the same datasets online. And if they do... why? Because it seems to be enough?

(It's a bit confusing to talk about, because surely it's just an older version of the same sort of thing: a less-large language model, right? I just think it could/should/would be a bit larger in the online, hosted version.)