Knowledge benchmarks can't really be improved upon via distillation or RL. It re...

YetAnotherNick · 2026-05-28T20:25:10 1779999910

A lot of things aren't facts that can simply be stated. No one can just read a dictionary or a list of word translations and start speaking that language.

There isn't a clear definition of what counts as knowledge and what counts as intelligence. Is being able to write C knowledge? Is knowing about undefined behaviour in C knowledge?

ertgbnm · 2026-05-29T16:22:57 1780071777

My point is that if I made someone "smarter", they wouldn't suddenly know "What day, month, and year was Carrie Underwood’s album “Cry Pretty” certified Gold by the RIAA?", which is an example of a question in the SimpleQA benchmark.

So (in my opinion) knowledge benchmarks stagnating for small models is not evidence that improvements in small models' agentic coding performance will stagnate soon. Small models do not struggle with syntax; the barrier is not knowledge. The barrier is long-context coherence and problem solving, and I don't see a bottleneck to improving those in small models on the near horizon, as we get more and more high-quality reasoning traces to train on.

slashdave · 2026-05-28T17:51:40 1779990700

RL is about more than facts. Synthetic feedback is an obvious approach: does the model suggest code that compiles and performs well?
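To make the synthetic-feedback idea concrete, here is a minimal sketch of a reward that needs no human label: just a compile check, a few test cases, and a timer. It assumes Python candidates, and the `synthetic_reward` helper, its reward tiers, and `time_budget_s` are all hypothetical choices for illustration. A real RL pipeline would run candidates in a sandbox instead of calling `exec` in-process, and would measure performance far more carefully than a single timed pass.

```python
import time
from typing import Sequence, Tuple

# Hypothetical test case format: (positional args, expected return value).
TestCase = Tuple[tuple, object]


def synthetic_reward(source: str, func_name: str,
                     tests: Sequence[TestCase],
                     time_budget_s: float = 0.1) -> float:
    """Score a model-suggested Python function with automatic feedback only.

    Reward tiers (an illustrative assumption, not a standard scheme):
      0.0        -> source does not compile
      0.2        -> compiles, but the function is missing or raises
      0.2..0.8   -> partial credit for the fraction of tests passed
      0.8..1.0   -> all tests pass; the last 0.2 scales with speed
    """
    # Step 1: does it compile at all?
    try:
        code = compile(source, "<candidate>", "exec")
    except (SyntaxError, ValueError):
        return 0.0

    # Step 2: run the definitions. WARNING: exec on untrusted model output
    # is unsafe; a real system would isolate this in a sandbox.
    namespace: dict = {}
    try:
        exec(code, namespace)
        func = namespace[func_name]
    except Exception:
        return 0.2

    # Step 3: check correctness against the tests and time the whole pass.
    start = time.perf_counter()
    try:
        passed = sum(1 for args, expected in tests if func(*args) == expected)
    except Exception:
        return 0.2
    elapsed = time.perf_counter() - start

    if passed < len(tests):
        return 0.2 + 0.6 * passed / len(tests)

    # Step 4: all tests pass, so reward speed within the time budget.
    speed_bonus = max(0.0, 1.0 - elapsed / time_budget_s)
    return 0.8 + 0.2 * speed_bonus
```

For example, with tests `[((1, 2), 3), ((0, 0), 0), ((-1, 1), 0)]` and `func_name="add"`, a syntactically broken candidate scores 0.0, an `add` that returns `a - b` passes one of three tests and scores about 0.4, and a correct `add` scores between 0.8 and 1.0 depending on timing. The tiering keeps "compiles but wrong" clearly ahead of "doesn't compile", which gives the policy a gradient even before it gets anything right.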