I don't trust anyone who claims that LLMs today are superhumanly intelligent. All they do is perform compute-intensive brute-force attacks on the problem/solution space and call it 'reasoning', all while subsidising the real costs to capture the market. So much SciFi BS and extrapolation about a technology that is useful if adopted with care.
This technology needs to become a commodity to destroy this aggregation of power among a few organizations with untrustworthy incentives and leadership.
Your brain is performing "compute-intensive brute-force attacks on the problem/solution space" as you read this very sentence. You have been training patterns on English syntax, structure, and semantics since you were a child, and that training is supporting you now with inference (or interpretation). And, for compute efficiency, you probably have evolution to thank.
People like to say this as if the two were apples to apples, but this comparison isn't remotely how the brain actually works - and even if it were, the brain does it automatically, without direction, and at an infinitesimal fraction of the power required.
And we're just talking about cognition - it completely ignores the automatic processes such as maintaining and regulating the body and its hormones, coordinating and maintaining muscles, and visual/spatial processing that takes in massive amounts of data at a very fine scale and informs the body what to do with it - I could go on.
One of the more annoying things about this conversation is that you don't even need to make this argument to make the point you're trying to make, but people love doing it anyway. It needlessly reduces how amazing the human brain is to a bunch of catchy sci-fi-sounding idioms.
It can be simultaneously true that transformer based language models can be very smart and that the human brain is also very smart. It genuinely confuses me why people need to make it an either/or.
Thank you, this comparison has been a huge annoyance of mine for the past 3 years of... this same debate over and over.
I think it's the hubris that I find most offensive in this argument: a guy knows one complex thing (programming) and suddenly thinks he can make claims about neuroscience.
Human cognition is nothing like AI "cognition." It really bothers me that people think AI is doing the same thing the human mind does. AI is more like a parrot that is trained to give a correct-looking response to any question. The parrot doesn't think, doesn't know what it's doing, etc.; it just does it because it gets a treat every time a "good" answer is prompted. This is why it can't do things like know how many parentheses are balanced here ((((()))))) (you can test this); it doesn't have any kind of genuine cognition.
I've wondered about this. Do we really know enough about what the human brain is doing to make a statement like this? I feel like if we did, we would be able to model it faithfully and OpenAI, etc. would not be doing what they're doing with LLMs.
What if human cognition turns out to be the biological equivalent of a really well-tuned prediction machine, and LLMs are just a more rudimentary and less-efficient version of this?
Yes, we do. Humans share the statistical-association ability that LLMs possess, but we also have conscious meaning and understanding. This is a difference in kind, and it means we can generalize beyond the statistical pattern associations we've extracted from data, so we don't require trillions of examples to develop knowledge.
Theoretically, a human could sit alone in a dark room, knowing nothing of mathematics, and come up with numbers, arithmetic, algebra, etc.
They don't need to read every math textbook, paper, and online discussion in existence.
"Pre-training" is not a good term if you are trying to compare it to LLM pre-training. A closer analogue would be the model's architecture and learning algorithms, which have been designed through decades of PhD research, and my point is that the differences are still much greater than the similarities.
FYI, Opus 4.6 had no problem with your arbitrary "cognition" test:
Someone on HN claimed "This is why it [LLMs] can't do things like know how many parenthesis are balanced here ((((()))))) (you can test this), it doesn't have any kind of genuine cognition". So, how many parenthesis are balanced in that quoted text?
● The string from the quote is ((((()))))) — 5 opening parens and 6 closing parens.
10 parentheses are balanced (5 matched pairs). There is 1 extra unmatched ).
Walking through it with a stack:
( ( ( ( ( ) ) ) ) ) )
1 2 3 4 5 4 3 2 1 0 -1 ← depth after each character
(depth returns to 0 after the tenth character: balanced; the final ) pushes it to -1: unmatched)
The depth goes negative on the last ), meaning it has no matching (.
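For what it's worth, the same depth-counter check the model walked through is only a few lines of code. A minimal sketch in Python (the function name and report format are just illustrative):

    def balance_report(s: str) -> tuple[int, int]:
        """Return (matched_pairs, unmatched) for a string of parentheses."""
        depth = 0      # current nesting depth
        matched = 0    # completed () pairs
        unmatched = 0  # closing parens that never had an opener
        for ch in s:
            if ch == "(":
                depth += 1
            elif ch == ")":
                if depth > 0:
                    depth -= 1
                    matched += 1
                else:
                    unmatched += 1
        unmatched += depth  # any openers still "on the stack" are also unmatched
        return matched, unmatched

    parens = "(" * 5 + ")" * 6  # the 11-character string from the quote
    print(balance_report(parens))  # -> (5, 1): 5 matched pairs, 1 stray )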
This is such a boring cliché by now. "Thinking" and "knowing what it's doing" are totally vague notions that we barely understand even for the human mind, yet in every comment section about AI people definitively state that LLMs don't do them, whatever they are.
This is the epitome of learned helplessness: needing a neuroscience paper to tell you what thinking and knowledge are, when you experience them directly all the time, and being unable to tell that an LLM doesn't have them.
Something is extremely evil about these ideologies that are teaching people that they are NPCs.
I know I'm thinking, I have no idea if you're thinking, or if you're a human or an LLM. But I wouldn't assume you aren't thinking just from reading your output.
I love reading posts like this. When you were a child, learning math or grammar, do you not remember bouncing off the walls of incorrect answers, eventually landing on a trajectory down the corridor of the right answer? Or were you always instantly zero-shotting everything?
In my experience, this is exactly how language models solve hard new problems, and largely how I solve them too. Propose a new idea, see if it works, iterate if not, keep going until it works.
Of course you can see how to solve a problem that you've seen before, like a visual puzzle about balanced parentheses. We're hyper specialized to visually identify asymmetries. LMs don't have eyes. Your mockery proves nothing.
The mistake in these kinds of arguments is assuming that because natural, classical-artificial, and neural-net-artificial learning methods all employ some kind of counterexample/counterfactual reasoning, they must be doing the same thing; their underlying methods could well be fundamentally different. These arguments remain invalid until computer science advances enough to explain what the differences and similarities actually are.
I suspect we just continually overestimate the uniqueness of both our code and our vocabulary. We think we are pretty smart, and we are, but on these two measures 99.999% of us are pretty average, and the LLM just keeps surprising us anyway by proving it.
> Human cognition is nothing like AI "cognition." It really bothers me that people think AI is doing the same thing the human mind does.
This might sound callous, but I wonder if people saying this themselves have very limited brains, more akin to stochastic parrots than to the average Homo sapiens.
We are very different, and there are some high-profile people who don't even have an internal monologue or self-introspection abilities (one of the other symptoms is having an egg-shaped head).
> This might sound callous, but I wonder if people saying this themselves have very limited brains, more akin to stochastic parrots than to the average Homo sapiens.
I have a different theory.
Aside from a few exceptions like Blake Lemoine, few people seem to really act as if they believe A.I. is doing the same thing the human mind is doing.
My theory is that people are role-playing as people who believe human thought is equivalent to A.I., for reasons they themselves may or may not understand. They do not actually believe their own arguments.
> All they do is perform compute-intensive brute-force attacks on the problem/solution space and call it 'reasoning'
If they discover the cure for cancer, I don't care how they did it. "I don't trust anyone who claims they're superhumanly intelligent" doesn't follow from "all they do is <how they work>".
That's moonshot logic that reinforces the parent's point. You'd absolutely care if the AI's cure for cancer entailed full-body transplants or dismemberment.
"The cure for cancer" as a phrase doesn't include those solutions. If the headline was "Pope discovers the cure for cancer" and those were his solutions you would say "No he didn't." OP was referring to AI discovering the cure for cancer that cancer research is working towards.
> "I don't trust anyone who claims they're intelligent" doesn't follow from "all they do is <how they work>".
It kind of does if how they work is nothing like genuine intelligence. You can (rightly) think AI is incredible and amazing and going to bring us amazing new medical technologies, without wrongly thinking its super amazing pattern recognition is the same thing as genuine intelligence. It should be worrying if people begin to believe the stochastic parrot is actually wise.
I can slow down the compute by a factor of a thousand. It would not change the result, but it would change the economics. We only call it intelligent because we can do the backpropagation and the inference (and training) fast enough, and with enough memory, for it to appear this way.
If LLMs can come up with superhumanly intelligent solutions, then they're superhumanly intelligent, period. Whether they do this by magic or by stochastic whatever doesn't make any difference at all.
If all they do is "just" brute-force problem solving, then they are already bound to take over R&D and other knowledge work and exponentially accelerate progress, i.e. the SciFi "singularity" BS ends up happening all the same. Whether we classify what they do as true reasoning is just semantics.
Axel's engagement with the issue and his refusal to give up are admirable. It also demonstrates that code and architecture remain important even in an era when managers believe these subjects can now be handled by LLMs. Imagine if LLMs were mandated for use in such an environment, further distancing SWEs from the code and the overarching architectural choices. I am not saying that it can't work. But friction, and maturity gained through experience, really matter.
This also explains perfectly why I have never met an engineer who was eager to run workloads on Azure. In the orgs I worked in, the use of Azure was either mandated by management (probably with good $$ incentives) or pushed through Microsoft leaning into the "multi-cloud for resilience" selling point to get orgs to shift workloads away from competitors.
So when Anthropic releases a new model that "breaks compatibility" with some Markdown files, do we call it "refactoring" to find (guess) the changes required to get the desired outcome again? Aren't we creating brittle specifications that fit one particular version of a model?
You just said it. If consistency is that important, keep consistent versions of the model, harness, prompts, skills, etc., and regression-test changes. That way lies madness :)
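A minimal sketch of what that pinning plus regression testing could look like in Python (every name here - the config fields, run_agent, the golden cases - is hypothetical, not any particular vendor's API):

    from dataclasses import dataclass

    @dataclass(frozen=True)
    class HarnessConfig:
        model: str = "pinned-model-id-2025-01-01"  # an exact model snapshot, never "latest"
        prompt_version: str = "v12"                # revision of the system prompt / skill files
        temperature: float = 0.0                   # keep sampling as deterministic as possible

    # Golden tasks with crude expectations; a real suite would use richer checks or judges.
    GOLDEN_CASES = {
        "Summarize the release notes as a bullet list": "- ",
    }

    def run_agent(config: HarnessConfig, task: str) -> str:
        # Hypothetical stand-in: replace with a call to whatever model/harness is actually in use.
        return "- stubbed output so the test below can run as-is"

    def test_no_regressions() -> None:
        config = HarnessConfig()
        for task, expected_fragment in GOLDEN_CASES.items():
            output = run_agent(config, task)
            assert expected_fragment in output, f"regression on task: {task!r}"

The point is just that every moving part lives in one pinned config, and you only bump one of those versions after the whole golden set passes again.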
I have been using Java since version 1.4. Both the language and its ecosystem have come a long way since then. I endured the height of the EJB phase. I adopted Spring when version 1.2 was released. I spent hours fighting with IDEs to run OSGi bundles. I hated building UIs with Swing/AWT, many of which are still in use today and are gradually being replaced by lovely JavaFX. When I look at code I wrote around 12 years ago, I'm amazed at how much I've matured too.
> I hated building UIs with Swing/AWT, many of which are still in use today and are gradually being replaced by lovely JavaFX.
Dude, JFX yielded what used to be called RIAs to JavaScript almost 15 years ago. Of the three major GUI toolkits - Swing, JavaFX, and SWT - it was Swing that gained HighDPI support first (10 years ago), and it continues to be the base for the kick-ass IntelliJ IDEA and other JetBrains IDEs.
Swing was simpler in some ways than JavaFX. I still remember the JavaFX TreeView taking three generics, and I was unable to figure out from the docs what one of them was for. I had to go on Stack Overflow, where someone said it basically didn't matter. But JavaFX looked great at the time.
> SQLite is not primarily fast because it is written in C. Well.. that too, but it is fast because 26 years of profiling have identified which tradeoffs matter.
Someone (with deep pockets to bear the token costs) should let Claude run for 26 months to have it optimize its Rust code base iteratively towards equal benchmarks. Would be an interesting experiment.
The article points out the general issue when discussing LLMs: audience and subject matter. We mostly discuss interactions and results anecdotally. We really need much more data, more projects that succeed with LLMs or fail with them - or that linger in a state of ignorance, sunk-cost fallacy and suppressed resignation. I expect the latter will remain the standard case that we do not hear about - the part of the iceberg that is underwater, mostly existing within the corporate world or in private GitHubs, which is true with LLMs and without them.
In my experience, "Senior Software Engineer" has NO general meaning. It's a title that gets awarded anew for each participation in a project/product, over and over again. The same goes for the claim: "I, a Senior SWE, treat LLMs as junior SWEs, and I am 10x more productive." Imagine me facepalming every time.
If the output has problems, do you usually rerun the compilation with the same input (that you control)? I don't usually.
What is included in the "verify" step? Does it involve changing the generated code? If not, how do you ensure things like code quality, architectural constraints, efficiency, and consistency? It's difficult, if not (economically) impossible, to write tests for these things. What if the LLM does not follow the guidelines outlined in your prompt? That still happens. If none of that is covered, I would call it "brute forcing". How much do you pay for tokens?
I thought to myself that I do this pretty frequently, but then I realized it's only when I'm going from make -j8 to make -j1. I guess parallelism does throw some indeterminacy into this.
If parallelism adds indeterminacy, then you have a bug (probably in working out the dependency graph). Not an unusual one - lots of open-source projects in the 1990s had warnings about not building above -j1, because multi-core systems weren't that common and people weren't actually trying it themselves...
Whenever I traced them, those bugs were always in the logic of the makefile rather than in the compiler. A target in fact depends on another target (generally from much earlier in the file), but the makefile doesn't specify that - e.g. an object file needs a generated header but doesn't list it as a prerequisite, so under -j8 the compile can race the code generation step.
Why not show a summary of who actually received the data? It should be easy to implement. You could also add what data is retained and an estimate of how long it is kept for. It could be a summary page that I can print as a PDF after the process is complete.
I'd consider that a feature that would increase trust in such a platform. These platforms require trust, right?
What is the purpose of an AGENTS.md file when there are so many different models? Which model or version of the model is the file written for? So much depends on assumptions here. It only makes sense when you know exactly which model you are writing for. No wonder the impact is 'all over the place'.