I think it's only a matter of time before people start trying to optimize model ...

grey-area · 2026-03-19T07:34:02 1773905642

Have you just reinvented programming languages and reinforced the author's point?

Setting aside the problem of training, why bother prompting if you’re going to specify things so tightly that it resembles code?

mike_hearn · 2026-03-19T09:17:56 1773911876

Programming languages admit only unambiguous text. What he's proposing is more like EARS, Gherkin or Planguage.

rdevilla · 2026-03-19T09:23:52 1773912232

Not necessarily. I was intending it as a thought experiment illustrating why some kind of formal language (whether that means technical jargon, unambiguous syntax, unambiguous semantics, conlangs, specification languages, or some combination thereof) will eventually arise from natural language, as it has countless times in the past, within mathematics (as referenced in TFA) and elsewhere. Gherkin is kind of nice though.

majormajor · 2026-03-19T05:21:23 1773897683

Unless you're training your own model, wouldn't you have to send this dialect in your context all the time? Since the model is trained on all the human language text of the internet, not on your specialized one? At which point you need to use human language to define it anyway? So perhaps you could express certain things with less ambiguity once you define that, but it seems like your token usage will have to carry around that spec.
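To put rough numbers on that trade-off, here is a back-of-the-envelope sketch in Python. Everything in it is an assumption for illustration: the function names, the idea of a single "compression ratio" for the dialect, and the premise that the spec is resent with every request with no prompt caching.

```python
import math


def net_tokens_saved(prompt_tokens: int, compression_ratio: float, spec_tokens: int) -> float:
    """Tokens saved per request when the dialect spec rides along in the context.

    compression_ratio is the (hypothetical) size of the dialect prompt relative
    to the plain-English prompt, e.g. 0.7 means the dialect is 30% shorter.
    A negative result means the dialect costs more than plain English.
    """
    if not 0.0 < compression_ratio <= 1.0:
        raise ValueError("compression_ratio must be in (0, 1]")
    dialect_tokens = prompt_tokens * compression_ratio
    return prompt_tokens - (dialect_tokens + spec_tokens)


def break_even_prompt_tokens(compression_ratio: float, spec_tokens: int) -> float:
    """Smallest plain-English prompt size at which the dialect stops losing tokens."""
    if not 0.0 < compression_ratio <= 1.0:
        raise ValueError("compression_ratio must be in (0, 1]")
    if compression_ratio == 1.0:
        # No compression at all: the spec overhead is never recovered.
        return math.inf
    return spec_tokens / (1.0 - compression_ratio)


if __name__ == "__main__":
    # Hypothetical numbers: a 500-token spec and a dialect half the size of English.
    print(break_even_prompt_tokens(0.5, 500))
    print(net_tokens_saved(2000, 0.5, 500))
```

Under these assumptions the dialect only pays off once the prompt is larger than the spec divided by the fraction saved, so a long spec with a modest compression ratio loses on every short request.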

nomel · 2026-03-19T04:25:11 1773894311

Let's use a non-ambiguous language for this. May I suggest Lojban [1][2]?

[1] https://en.wikipedia.org/wiki/Lojban

[2] Someone speaking it: https://www.youtube.com/watch?v=lxQjwbUiM9w

dwb · 2026-03-19T12:04:39 1773921879

Lojban allows you to be vastly more semantically ambiguous than English while still speaking technically correctly. A single predicate-word (“gismu”) is a valid utterance. For example, saying “tanxe” is so vague and context-dependent as to be hard to translate: “something unspecified is a box, at an unspecified time or place, and it may or may not even exist”. Language will not save you. Or if it will, we already have such languages in the form of programming languages.

mike_hearn · 2026-03-19T09:19:25 1773911965

Lojban allows you to speak ambiguously; it just disallows grammatical ambiguity, because in the 70s it was hypothesized that NLP understanding was impossible, so humans would have to adapt instead of computers. That debate is over; understanding grammar is solved. The new debate is over semantic ambiguity.

dooglius · 2026-03-19T05:13:09 1773897189

It looks like that's about syntactic ambiguity, whereas the parent is talking about semantic ambiguity.

otabdeveloper4 · 2026-03-19T05:02:06 1773896526

> optimizes their token usage and LLM spend

Context pollution is a bigger problem.

E.g., those SKILL.md files that are tens of kilobytes long, as if being exhaustively verbose and rambling will somehow make the LLM smarter. (It won't, it will just dilute the context with irrelevant stuff.)

kstenerud · 2026-03-19T07:32:25 1773905545

Human language is already very efficient for conveying the ideas we have. Some languages are more efficient at conveying certain concepts, but all are able to handle the 90% case. I'd expect any attempts to build a "technical dialect of English" to go about as well as Esperanto.

nextaccountic · 2026-03-19T07:47:34 1773906454

We already speak in a "technical dialect of English". All we need is some jargon to talk about technical things. (Lawyers have their own jargon too, as do chemists, etc.)

Some languages don't have this kind of vocabulary, because there aren't enough speakers who deal with technical things in a given area (and those who do use another language to communicate).

ithkuil · 2026-03-19T14:05:43 1773929143

For me the proof that we do indeed have and use a technical dialect of English in this field lies in a simple observation: no matter how much praise I get at work for how good my English is, that doesn't map at all to my ability (or lack thereof) to converse fluently with a random Joe on the street in an English-speaking country.

steve_adams_86 · 2026-03-19T06:17:43 1773901063

The thing is, doesn't the LLM need to be trained on this dialect, and if the training material we have is mostly ambiguous, how do we disambiguate it for the purpose of training?

In my mind this is solving different problems. We want it to parse out our intent from ambiguous semantics because that's how humans actually think and speak. The ones who think they don't are simply unaware of themselves.

If we create this terse and unambiguous language for LLMs, it seems likely to me that they would benefit most from using it with each other, not with humans. Further, they already kind of do this with programming languages which are, more or less, terse and unambiguous expression engines for working with computers. How would we meaningfully improve on this, with enough training data to do so?

I'm asking sincerely and not rhetorically because I'm under no illusion that I understand this or know any better.

manmal · 2026-03-19T05:30:35 1773898235

Codex already has such a language. The specs it’s been writing for me are full of “dedupe”, “catch-up”, and I often need to give it feedback that it should use more verbose language. Some of that has been creeping into my lingo already. A colleague of mine suddenly says the word “today” all the time, and I suspect that’s because he uses Claude a lot. Today, as in, the current state of the code.

anonzzzies · 2026-03-19T05:30:37 1773898237

It was mentioned somewhere else on hn today, but why do I care about token usage? I prompt AI day and night for coding and other stuff via claude code max 200 and mistral; haven't had issues for many months now.

sda2 · 2026-03-19T06:15:16 1773900916

It’s a measure of efficiency. One might not care about tokens until vendors jack up the price and running your own comparable model is infeasible.

abustamam · 2026-03-19T16:57:20 1773939440

You may not, but many people do. My boss routinely runs over his Claude Max quota.

est · 2026-03-19T04:58:21 1773896301

> by creating their own more technical dialect of English

Ah, the Lisp curse. Here we go again.

Coincidentally, the 80s AI bubble crashed partly because Lisp dialects weren't interchangeable.

Dylan16807 · 2026-03-19T05:19:17 1773897557

Lisp doesn't get to claim all bad accidental programming languages are simply failing to be it, I don't care how cute that one quote is.

reverius42 · 2026-03-19T05:15:25 1773897325

I bet a modern LLM could inter-change them pretty easily.

est · 2026-03-19T05:23:04 1773897784

trained on public data, yes.

But some random in-house DSL? Doubt it.

vrighter · 2026-03-19T07:04:12 1773903852

And then someone will come along and say "wouldn't it be nice if this highly specific dialect was standardized?" goto 1

globular-toast · 2026-03-19T15:37:21 1773934641

I do this already, it's called Python.

noosphr · 2026-03-19T06:30:19 1773901819

Or they could look at the past few centuries of language theory and start crafting better tokenizers with inductive biases.

We literally have proof that an iron age ontology of meaning as represented in Chinese characters is 40% more efficient than naive statistical analysis over a semi phonetic language and we still are acting like more compute will solve all our problems.

retsibsi · 2026-03-19T06:53:29 1773903209

> We literally have proof that an iron age ontology of meaning as represented in Chinese characters is 40% more efficient than naive statistical analysis over a semi phonetic language

Can you elaborate? I think you're talking about https://github.com/PastaPastaPasta/llm-chinese-english , but I read those findings as far more nuanced and ambiguous than what you seem to be claiming here.

umanwizard · 2026-03-19T07:12:33 1773904353

> We literally have proof that an iron age ontology of meaning as represented in Chinese characters is 40% more efficient than naive statistical analysis over a semi phonetic language and we still are acting like more compute will solve all our problems.

Post a link because until you do, I’m almost certain this is pseudoscientific crankery.

Chinese characters are not an “iron age ontology of meaning” nor anything close to that.

Also please cite the specific results in centuries-old “language theory” that you’re referring to. Did Saussure have something to say about LLMs? Or someone even older?