> Actually, I think this can be very easily done fully client-side, with good accuracy. Even on Android, the voice recognition can run client-side / offline.
I'm not sure I'd call it easy; you will certainly give up some accuracy compared with a state-of-the-art server model. For one thing, Firefox users are not going to download gigabytes of recognition model, so a client-side model would have to be a lot smaller than a server-side one.
It will quite possibly be slower too: the servers would most likely use GPUs for at least parts of the recognition, and it might not be easy to ensure the same on all the millions of PCs Firefox runs on.