Hacker Timesnew | past | comments | ask | show | jobs | submitlogin

I think it's only a matter of time before people start trying to optimize model performance and token usage by creating their own more technical dialect of English (LLMSpeak or something). It will reduce both ambiguity and token usage by using a highly compressed vocabulary, where very precise concepts are packed into single words (monads are just monoids in the category of endofunctors, what's the problem?). Grammatically, expect things like the Oxford comma to emerge that reduce ambiguity and rounds of back-and-forth clarification with the agent.

The uninitiated can continue trying to clumsily refer to the same concepts, but with 100x the tokens, as they lack the same level of precision in their prompting. Anyone wanting to maximize their LLM productivity will start speaking in this unambiguous, highly information-dense dialect that optimizes their token usage and LLM spend...



Have you just reinvented programming languages and reinforced the author's point?

Setting aside the problem of training, why bother prompting if you’re going to specify things so tightly that it resembles code?


Programming languages admit only unambiguous text. What he's proposing is more like EARS, Gherkin or Planguage.


Not necessarily. I was intending it as a thought experiment illustrating why some kind of formal language (whether that mean technical jargon, unambiguous syntax, unambiguous semantics, conlangs, specification languages, or some combination thereof) will eventually arise from natural language - as it has countless times in the past, within mathematics (as referenced in TFA) and elsewhere. Gherkin is kind of nice though.


Unless you're training your own model, wouldn't you have to send this dialect in your context all the time? Since the model is trained on all the human language text of the internet, not on your specialized one? At which point you need to use human language to define it anyway? So perhaps you could express certain things with less ambiguity once you define that, but it seems like your token usage will have to carry around that spec.


Let's use a non-ambiguous language for this. May I suggest Lojban [1][2]?

[1] https://en.wikipedia.org/wiki/Lojban

[2] Someone speaking it: https://www.youtube.com/watch?v=lxQjwbUiM9w


Lojban allows you to be vastly more semantically ambiguous than English while still speaking technically correctly. A single predicate-word (“gismu”) is a valid utterance. For example, saying “tanxe” is so vague or context-dependent enough as to be hard to translate: “something unspecified is a box, at an unspecified time or place, and it may or may not even exist”. Language will not save you. Or if it will, we already have them in the form of programming languages.


Lojban allows you to speak ambiguously, it just disallows grammatical ambiguity because in the 70s it was hypothesized that NLP understanding was impossible so humans would have to adapt instead of computers. That debate is over; understanding grammar is solved. The new debate is over semantic ambiguity.


It looks like that's about syntactic ambiguity, whereas the parent is talking semantic ambiguity


> optimizes their token usage and LLM spend

Context pollution is a bigger problem.

E.g., those SKILL.md files that are tens of kilobytes long, as if being exhaustively verbose and rambling will somehow make the LLM smarter. (It won't, it will just dilute the context with irrelevant stuff.)


Human language is already very efficient for conveying the ideas we have. Some languages are more efficient at conveying certain concepts, but all are able to handle the 90% case. I'd expect any attempts to build a "technical dialect of English" to go about as well as Esperanto.


We already speak in a "technical dialect of English". All we need is some jargon to talk about technical things. (Lawyers have their own jargon too, also chemists, etc)

Some languages don't have this kind of vocabulary, because there aren't enough speakers that deal with technical things in a given area (and those that do, use another language to communicate)


For me the proof that we do indeed have and use a technical dialect of English in this field lies in the simple observation that no matter how much praise I get at work for how good my English, that doesn't map at all with my ability (or lack thereof) to converse fluently with the random Joe on the street of an English speaking country


The thing is, doesn't the LLM need to be trained on this dialect, and if the training material we have is mostly ambiguous, how do we disambiguate it for the purpose of training?

In my mind this is solving different problems. We want it to parse out our intent from ambiguous semantics because that's how humans actually think and speak. The ones who think they don't are simply unaware of themselves.

If we create this terse and unambiguous language for LLMs, it seems likely to me that they would benefit most from using it with each other, not with humans. Further, they already kind of do this with programming languages which are, more or less, terse and unambiguous expression engines for working with computers. How would we meaningfully improve on this, with enough training data to do so?

I'm asking sincerely and not rhetorically because I'm under no illusion that I understand this or know any better.


Codex already has such a language. The specs it’s been writing for me are full of “dedupe”, “catch-up”, and I often need to feedback that it should use more verbose language. Some of that has been creeping into my lingo already. A colleague of mine suddenly says the word “today” all the time, and I suspect that’s because he uses Claude a lot. Today, as in, current state of the code.


It was mentioned somewhere else on hn today, but why do I care about token usage? I prompt AI day and night for coding and other stuff via claude code max 200 and mistral; haven't had issues for many months now.


it’s a measure of efficiency. one might not care about tokens until vendors jack up the price and running your own comparable model is infeasible.


You may not but many people do. My boss routinely runs over his quota of Claude max.


> by creating their own more technical dialect of English

Ah, the Lisp curse. Here we go again.

coincidently, the 80s AI bubble crashed partly because Lisp dialetcts aren't inter-changable.


Lisp doesn't get to claim all bad accidental programming languages are simply failing to be it, I don't care how cute that one quote is.


I bet a modern LLM could inter-change them pretty easily.


trained on public data, yes.

But some random in-house DSL? Doubt it.


and then someone will como along and say "wouldn'tt it be nice if this highly specific dialect was standardized?" goto 1


I do this already, it's called Python.


Or they could look at the past few centuries of language theory and start crafting better tokenizers with inductive biases.

We literally have proof that an iron age ontology of meaning as represented in Chinese characters is 40% more efficient than naive statistical analysis over a semi phonetic language and we still are acting like more compute will solve all our problems.


> We literally have proof that an iron age ontology of meaning as represented in Chinese characters is 40% more efficient than naive statistical analysis over a semi phonetic language

Can you elaborate? I think you're talking about https://github.com/PastaPastaPasta/llm-chinese-english , but I read those findings as far more nuanced and ambiguous than what you seem to be claiming here.


> We literally have proof that an iron age ontology of meaning as represented in Chinese characters is 40% more efficient than naive statistical analysis over a semi phonetic language and we still are acting like more compute will solve all our problems.

Post a link because until you do, I’m almost certain this is pseudoscientific crankery.

Chinese characters are not an “iron age ontology of meaning” nor anything close to that.

Also please cite the specific results in centuries-old “language theory” that you’re referring to. Did Saussure have something to say about LLMs? Or someone even older?




Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: