John McWhorter has a book about this called The Language Hoax: Why the World Looks the Same in Any Language, in which he's extremely skeptical of Sapir-Whorf, particularly the sort of stoner linguistics ("what if we're all, like, made of language, maaaaaan") that turns up in what seems to be a biannual Popular Science article.
In fact, time perceived as volumetric rather than as a length is the specific example he uses, and he includes the actual research. There are extremely specific cases like this where the effect does hold up, but it is so minute that it's difficult even to measure properly. In most cases it doesn't hold up at all. These tiny, barely measurable cases are nonetheless often used as evidence for frankly nonsensical claims, so it's important not to extrapolate to "Japanese uses the same word for blue and green, so Japanese people must be colourblind", which is where people often take this.
The strict interpretation of Sapir-Whorf, where language determines thought, is obviously nonsense. But the weak interpretation of Sapir-Whorf, where language merely influences thought to at least some degree, is obviously true (which is why Sapir-Whorf is pretty useless as a statement either way). The fact that there are minute differences between language speakers is because language is highly malleable and very difficult to police, so speakers tend to alter their language specifically to make it easier to think about things that are important to them (in the same way that we as programmers restructure code and rename variables to make the program easier to understand). In addition, global human societies are connected enough that useful linguistic concepts are rapidly disseminated into every language.
English now just catches all the words it has no definitions for and brings them into the language. A lot of Buddhist terminology made it in this way in the 19th century: English had no direct translations, so the words were simply given English definitions, effectively adding them to the language.
Even if 100% of humans spoke English as their primary language, we would still have words like Dharma, samsara, nirvana etc.
English will eventually take from every culture every significant word that lacks a direct translation, and those words will be understood by their English definitions by the majority of the global population.
We will still call that amalgamation of global languages English.
Don't overlook that language is fundamental to the process of discovering the "truth" here, as is culture. For example (of culture): if someone were to suggest we clean up our language while discussing the matter, the notion would be rejected absolutely.
> extremely skeptical of Sapir-Whorf, particularly the sort of stoner linguistics "what if we're all, like, made of language maaaaaan"
I don't get how this kind of skepticism can persist amid the current "arguably alive" LLM hype. LLMs are a machine-executable form of the strong Sapir-Whorf hypothesis: things that "think" and speak solely through the English language and the English language alone (in other languages they sound a lot like machine translation).
Is that the right way to think of LLMs (that they "think in English")?
I think a better way to think of them is as an n-dimensional "meaning space" in which words/phrases are n-vectors, where n is a very large number and each dimension carries semantic meaning. It may be the case that this meaning space is pretty much the same across all natural languages, which would be evidence that Sapir-Whorf is false and that differences between natural languages are largely cosmetic.
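The "meaning space" idea can be sketched with a toy example. These vectors are invented for illustration (real models learn hundreds or thousands of dimensions from data), but the mechanics are the same: words are points in a vector space, and cosine similarity measures how close their meanings are, regardless of which surface language the word came from.

```python
from math import sqrt

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = sqrt(sum(a * a for a in u))
    norm_v = sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

# Toy 4-dimensional "meaning space"; the dimensions and values here
# are made up, purely to illustrate the geometry.
embeddings = {
    "dog":    [0.90, 0.10, 0.00, 0.30],  # English
    "perro":  [0.88, 0.12, 0.02, 0.30],  # Spanish word for "dog"
    "banana": [0.10, 0.90, 0.40, 0.00],
}

# If meaning space is shared across languages, translations land close
# together, while unrelated words land far apart.
print(cosine(embeddings["dog"], embeddings["perro"]))   # close to 1.0
print(cosine(embeddings["dog"], embeddings["banana"]))  # much lower
```

In a real multilingual embedding model, the interesting empirical question is exactly the one raised above: whether "dog" and "perro" actually land near each other, or whether each language carves the space differently.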
That'll be the official explanation, but I've yet to see a working LLM that doesn't speak in translated American.
As one possible counterexample, I've seen one of the 7B models insist on using a Chinese verb in a Japanese sentence. While that's fascinating in itself, it's not exactly in line with the "differences between languages are cosmetic and we just don't realize it" narrative.
LLMs are basically always worse in languages other than English. I'm not sure whether that's just dataset volume, dataset quality, or the GPT architecture being inherently English-centric, but LLMs don't have a universal subconscious wrapped in a superficial English frontend over UG, of the kind that would support !(sapir-whorf). So far, LLMs are a kind of English-based thinking machine (if we recognize their apparent behavior as "thinking").
Below are just cherry-picked search results, selected largely by whether the last few lines of the abstracts support my narrative, but it's a problem obvious enough that the rest of the world just knows.
0: "Not All Languages Are Created Equal in LLMs: Improving Multilingual Capability by Cross-Lingual-Thought Prompting": https://arxiv.org/abs/2305.07004
1: "Better to Ask in English: Cross-Lingual Evaluation of Large Language Models for Healthcare Queries": https://arxiv.org/abs/2310.13132
2: "Do Moral Judgment and Reasoning Capability of LLMs Change with Language? A Study using the Multilingual Defining Issues Test": https://arxiv.org/abs/2402.02135
3: "Should We Respect LLMs? A Cross-Lingual Study on the Influence of Prompt Politeness on LLM Performance": https://arxiv.org/abs/2402.14531
4: "Exploring Multilingual Human Value Concepts in Large Language Models: Is Value Alignment Consistent, Transferable and Controllable across Languages?": https://arxiv.org/abs/2402.18120
Thanks for this. It looks like a great read. As an immigrant I was always mildly annoyed by this idea.
Like, yes, my language has a ton of words for all possible familial relationships (for example, different words for a maternal uncle and a paternal uncle), but that's because familial relationships are important in my culture, and that's reflected in the language.
Language is the reflection of culture, not the other way around.