
Do LLMs perform differently in different languages? It'd be interesting to see research on that.


Basically always worse in languages other than English. Not sure if it's just the volume of the dataset, or dataset quality, or whether the GPT architecture is inherently English-centric, but LLMs don't have a universal subconscious with a superficial English frontend wrapping Universal Grammar, the kind of setup that would refute Sapir-Whorf. LLMs so far are English-based thinking machines (if we were to recognize their apparent behavior as "thinking").

Below are just cherry-picked search results, selected largely by whether the last few lines of the abstracts support my narrative, but it's a problem obvious enough that the rest of the world just knows.

0: "Not All Languages Are Created Equal in LLMs: Improving Multilingual Capability by Cross-Lingual-Thought Prompting": https://arxiv.org/abs/2305.07004

1: "Better to Ask in English: Cross-Lingual Evaluation of Large Language Models for Healthcare Queries": https://arxiv.org/abs/2310.13132

2: "Do Moral Judgment and Reasoning Capability of LLMs Change with Language? A Study using the Multilingual Defining Issues Test": https://arxiv.org/abs/2402.02135

3: "Should We Respect LLMs? A Cross-Lingual Study on the Influence of Prompt Politeness on LLM Performance": https://arxiv.org/abs/2402.14531

4: "Exploring Multilingual Human Value Concepts in Large Language Models: Is Value Alignment Consistent, Transferable and Controllable across Languages?": https://arxiv.org/abs/2402.18120

5: "How do Large Language Models Handle Multilingualism?": https://arxiv.org/abs/2402.18815
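The workaround in [0] (cross-lingual-thought prompting) can be sketched as a simple prompt template: ask the model to restate the request in English, reason in English, then answer in the source language. A minimal sketch, with the template wording paraphrased rather than the paper's exact prompt, and `ask_llm` a hypothetical stand-in for whatever chat API you use:

```python
def build_xlt_prompt(request: str, src_lang: str) -> str:
    """Wrap a non-English request in a cross-lingual-thought template.

    Paraphrase of the idea in arXiv:2305.07004: have the model translate
    the task into English, solve it in English (where it is strongest),
    then reply in the source language.
    """
    return (
        f"I will give you a request written in {src_lang}.\n"
        "1. Translate the request into English.\n"
        "2. Think through the task step by step in English.\n"
        f"3. Give the final answer in {src_lang}.\n\n"
        f"Request: {request}"
    )

# Usage with a hypothetical chat function:
# answer = ask_llm(build_xlt_prompt("LLMは言語によって性能が違いますか？", "Japanese"))
```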


If so, it would likely be a function of the amount of training data in that language.



