Based on the video [1], it seems to be real-time text translation. It doesn't do real-time audio translation, which is what the article's title implies.
It's transcribing your audio and translating it into a subtitle. That seems like what you would expect, unless you thought it would synthesize a voice in Mandarin on the other end? That probably wouldn't even be desirable, just as subtitled movies are better than dubbed ones.
[1] https://www.youtube.com/watch?v=ePYMMXVJPYg