Hacker News | new | past | comments | ask | show | jobs | submit | 4bpp's comments | login

Assuming the benchmarks are sound (rather than capturing a fluke), the provided explanation still does not pass the smell test. As far as I can tell, there is nothing about the training process of these models that would encourage them to make the output of any layer apart from (n-1) meaningful as the input of layer n, unless perhaps these layers were initialised as identity and the training process did not get to change them much. (Plausible for middle layers?)

Considering this, I think (again, assuming the benchmarks themselves are sound) the most plausible explanation for the observations is (1) the layers being duplicated are close to the identity function on most inputs; (2) something happened to the model in training (RLHF?) that forcefully degraded its reasoning performance; (3) the mechanism causing the degradation involves the duplicated layers, so their duplication has the effect of breaking the reasoning-degrading mechanism (e.g. by clobbering a "refusal" "circuit" that emerged in post-training).

More concisely, I'm positing that this is an approach that can only ever break things, and rather than boosting reasoning, it is selectively breaking things deleterious to reasoning.


Empirical findings tell a very different tale: all LLM layers use vaguely compatible internal representations. And middle layers in particular can be almost interchangeable - a lot of what they do seems to be "iterative refinement of the same representations". This is borne out by various probes and ablations, but the most obvious one is probably the good old logit lens.
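The logit-lens trick mentioned above can be sketched in a few lines. Everything here is a toy stand-in - random weights, a simulated residual stream - just to show the mechanism: decode each layer's hidden state straight through the final unembedding and see whether intermediate layers already "speak" the output vocabulary:

```python
import numpy as np

# Hypothetical toy setup: d_model-dim hidden states and an unembedding
# matrix W_U (with tied embeddings, the same space sits at top and bottom).
rng = np.random.default_rng(0)
d_model, vocab = 8, 16
W_U = rng.normal(size=(d_model, vocab))

# Simulate a residual stack: each "layer" only adds a small delta to the stream.
h = rng.normal(size=(d_model,))
hidden_states = []
for _ in range(4):
    h = h + 0.1 * rng.normal(size=(d_model,))
    hidden_states.append(h.copy())

def logit_lens(h, W_U):
    # Decode an intermediate hidden state directly through the unembedding.
    return int(np.argmax(h @ W_U))

decodes = [logit_lens(h, W_U) for h in hidden_states]
```

In a real model you would grab the actual per-layer hidden states and the real unembedding; the point of the lens is that these early decodes tend to be meaningful and drift smoothly toward the final prediction, which is what "shared representation space" cashes out to.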

This is likely to be shaped by tied embeddings and skips on one end, and maybe training pressures on the other.

The very top and the very bottom of the FF stack both reflect the same token embeddings - and this propagates through the model, setting up a shared identity space, with skip connections carrying it through the layers. No explicit shared identity is imposed, but there is an implicit one set by the architecture. Fairly well established.

(Now: highly speculative! Attention over past tokens creates an implicit "robustness/convergence" pressure? The model can't be "certain" if it'll have access to the right representations at a given layer, because representations depend not just on the past layers, but also on the highly uncertain contents of previous tokens as passed through attention. Which in turn depends on more of the same, increasing variance further. So the training causes: "each layer can't be certain of what it will have access to, so it develops to refine anything it currently has access to in a convergent fashion, because that's what's useful under pressure of attention-induced uncertainty".)

LLMs are notoriously non-fragile and robust to perturbations - far more so if you anneal with SFT/distillation after your model surgery, although that wasn't done here. Plenty of weird franken-LLM experiments demonstrate this empirically.

So I'm not too surprised to find that someone has managed to improve benchmark performance on a few narrow tasks by duplicating a few middle layers. "Duplicating a few layers that were doing convergent iterative refinement benefits a few tasks that suffered from insufficient depth of convergent iterative refinement" is a fairly reasonable hypothesis, in my eyes.

The chances of duplication "breaking something somewhere" are high, and I would expect the capability profile of an unannealed franken-LLM like this to have a few gaps in it if evaluated extensively against the original. But "franken-LLM layer duplication can actually improve some things" is far too plausible with what we know to be dismissed pre-emptively.


That's interesting, could you point me to some source on these findings?

It seems to me that the difference between "iterative improvement" as you put it and "close to the identity" (as in the output is close to the input for most of the volume of the input space) as I put it is fairly subtle, anyway. One experiment I would like to see is what happens to the reasoning performance if rather than duplicating the selected layers, they are deleted/skipped entirely. If the layers improve reasoning by iterative improvement, this should make the performance worse; but if they contain a mechanism that degrades reasoning and is not robust against unannealed self-composition, it should make the performance similarly better.
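That duplicate-versus-delete experiment is easy to sketch on a toy residual stack. All the weights and the "plan" here are hypothetical, just to show the knob being turned - in a real experiment the layers would be the model's actual middle blocks:

```python
import numpy as np

def run_stack(x, layers, plan=None):
    # plan maps layer index -> repeat count: 0 skips/deletes the layer,
    # 1 keeps it (the default), 2 duplicates it in place.
    plan = plan or {}
    for i, W in enumerate(layers):
        for _ in range(plan.get(i, 1)):
            x = x + np.tanh(x @ W)   # residual block with a toy sublayer
    return x

rng = np.random.default_rng(1)
layers = [rng.normal(scale=0.1, size=(4, 4)) for _ in range(6)]
x = rng.normal(size=(1, 4))

baseline   = run_stack(x, layers)
duplicated = run_stack(x, layers, {2: 2, 3: 2})  # repeat the middle layers
ablated    = run_stack(x, layers, {2: 0, 3: 0})  # delete/skip them instead
```

Comparing benchmark scores of `duplicated` vs `ablated` variants of the same model would distinguish the two hypotheses: "iterative refinement" predicts deletion hurts, while "near-identity layers hosting a degrading mechanism" predicts deletion helps about as much as duplication.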


https://arxiv.org/abs/2505.12540

https://arxiv.org/abs/2405.07987

These, other papers, and the lottery ticket phenomenon. What it boils down to is that any neural-network-like system that encodes some common mapping of a phenomenon in the context of the world - not necessarily a world model, but some "real-world thing" - will tend to map to a limited number of permutations of some archetypal representation, which will resemble other mappings of the same thing.

The lottery ticket phenomenon is a bit like the birthday paradox; there will be some number of structures in a large, random initialization of neural network weights that coincide with one or more archetypal mappings of complex objects. Some sub-networks are also useful mappings to features of one or more complex objects, which makes learning hierarchical nested networks of feature mappings easier; it's also why interpretability is so damned difficult.


> As far as I can tell, there is nothing about the training process of these models that would encourage them to make the output of any layer apart from (n-1) meaningful as the input of layer n

Right, I had the same thought.

Even if the output was in the same "format", does the LLM even have any way to know which order the outputs will go in? The ordering of the nodes is part of our representation of the network, it's not fundamental to it.

It would be like shuffling the bytes in a PNG file and expecting a program to still parse it as a PNG file.

The more I think about this, the more I don't get this at all.


These layers are residual layers, so what a layer does is:

x = x + layer(x)

so it's not too surprising that they can be used recurrently
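A minimal numpy sketch of that update rule, with a toy `tanh` map standing in for the real attention/MLP sublayers (the weights here are arbitrary, for illustration only):

```python
import numpy as np

def block(x, W):
    # residual update: the block only ever adds a delta to the stream
    return x + np.tanh(x @ W)

rng = np.random.default_rng(0)
W = rng.normal(scale=0.1, size=(4, 4))
x = rng.normal(size=(1, 4))

once  = block(x, W)
twice = block(block(x, W), W)  # a "duplicated" layer: same weights, applied again
```

Because the block's output lives in the same space as its input, feeding a layer its own output (as duplication does) is structurally well-formed, whatever it does to quality.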


Ah! Thank you

> there is nothing about the training process of these models that would encourage them to make the output of any layer apart from (n-1) meaningful as the input of layer n

There is something that does exactly that - the residual connections. Each layer only adds a delta to the residual stream, which means all layers share a common space. There are papers showing the correlation across layers; it is of course not uniform across depth, but consecutive layers tend to be correlated.


> far as I can tell, there is nothing about the training process of these models that would encourage them to make the output of any layer apart from (n-1) meaningful as the input of layer n

Wouldn't "pass-through" identity connections have exactly that effect? These are quite common in transformer models.


Yeah, that's what I meant with "initialised as identity and the training process did not get to change them much".

There are explicit residual connections in a transformer block. Look up "residual connections" in Google images and you will see.

Some transformers have a block recurrent structure, here is a paper that made a similar observation recently:

https://www.alphaxiv.org/abs/2512.19941


Basically all of them are using residual connections so it’s not that surprising honestly

> something happened to the model in training (RLHF?) that forcefully degraded its reasoning performance

I've been seeing more people speculating like this and I don't understand why. What evidence do we have for RLHF degrading performance on a key metric like reasoning? Why would this be tolerated by model developers?

Can someone point to an example of an AI researcher saying "oops, RLHF forcefully degrades reasoning capabilities, oh well, nothing we can do"?

It strikes me as conspiracist reasoning, like "there's a car that runs on water but they won't sell it because it would destroy oil profits".


The most obvious way would simply be excessive agreeableness. Users rate responses more highly if they affirm the user's thinking, but a general tendency to affirm would presumably result in the model being more inclined to affirm its own mistakes in a reasoning chain.

There was some research about it early on that was shared widely and shaped the folklore perception around it, such as the graph in https://static.wixstatic.com/media/be436c_84a7dceb0d834a37b3... from the GPT-4 whitepaper which shows that RLHF destroyed its calibration (ability to accurately estimate the likelihood that its guesses are correct). Of course the field may have moved on in the 2+ years that have passed since then.


"What have the Romans ever done for us?" (https://www.youtube.com/watch?v=Qc7HmhrgTuQ)


If you are so allergic to using terms previously reserved for animal behaviour, you can instead unpack the definition and say that they produce outputs which make human and algorithmic observers conclude that they did not instantiate some undesirable pattern in other parts of their output, while actually instantiating those undesirable patterns. Does this seem any less problematic than deception to you?


> Does this seem any less problematic than deception to you?

Yes. This sounds a lot more like a bug of sorts.

So many times when using language models I have seen answers contradicting answers previously given. The implication is simple: they have no memory.

They operate upon the tokens available at any given time, including previous output, and as information gets drowned out, those contradictions pop up. No sane person should presume intent to deceive, because that's not how those systems operate.

By calling it "deception" you are actually ascribing intentionality to something incapable of such. This is marketing talk.

"These systems are so intelligent they can try to deceive you" sounds a lot fancier than "Yeah, those systems have some odd bugs"


Running them in a loop with context, summaries, memory files or whatever you like to call them creates a different story right?


what kind of question is that


Environmentalism has always been a "weight of our sins" sort of issue. Plastic straws are a rounding error relative to all the capricious uses of plastic and fossil fuels in our economy, but few things feel as frivolous as using once and then throwing away a piece of plastic for personal convenience while engaging in an already-kinda-sinful feeling activity like indulging in a soft drink, while simultaneously the paper straw that turns to cardboard mash in your mouth is perfectly calibrated to make you feel like you are doing real penance without encumbering anything economically important.

So plastic straw bans (instead of plastic slipper bans, plastic food packaging bans, taxes on plastic clothes fibres...) are what we get. And because the structure of the cause/problem is the same, the language of environmentalism naturally attaches itself and gives form to the vague sense of moral unease surrounding AI. Governments are surely already building tomorrow's tightly integrated thought police drone swarm complexes, but a crusade against those who simulate a zoo of programming weasels in our midst is much easier and morally no less fulfilling.


Unfortunately, all it will take is an appropriate choice of story about "Nazis"/"child predators"/"pirates"/"terrorists"/"Russian bots" sideloading unregulated apps or disabling the GPS trackers on their cars, and every prospective member of Doctorow's great new coalition (including most everyone in attendance when the talk was given) can be peeled away with ease.


> freeze peach

Do you not think that trying to malign your opposition by putting a comical misspelling in their mouths is a bit infantile as a rhetorical tactic? The same thing being done to you would look something like an insinuation that what is being banned is "hurting someone's widdle fee-fees"; surely the discussion here would not benefit if everyone stooped down to that level.


> surely the discussion here would not benefit if everyone stooped down to that level.

Oh we were already at that level by that time: the comment mine responds to makes the claim that "it is really difficult to define what hate speech is" (untrue); that "more often than not it's used as a cudgel to silence the opposition" (unsubstantiated); and claims that the UK government's intentions match that of Iran and Russia (untrue).

For some reason, so many people seem to tolerate outright disinformation but draw the line at mild childishness. It's bewildering.


Do you think that the people who made those remarks you cite considered them untrue themselves? If yes, you are suggesting bad faith (which should be grounds to extricate yourself from the discussion and/or call it out, not add fuel to the fire); if not, you are suggesting that factual disagreement is appropriately answered by childishness, which basically is saying that you think every discussion worth the name should devolve into childishness.

Often, it seems like this concept of "disinformation" you invoke just serves as a way people give themselves moral license to suspend normal rules of debate conduct in the face of disagreement. Being charitable to your opponents and having to engage with their claims is tiring and difficult, and sometimes they even come better prepared - how much easier if you can just frame dissent as dangerous enemy action and shut it down.


Do you also insist that we treat with proper decorum those who throw out assertions that jet fuel cannot melt steel beams? I notice you have yet to criticise them for posting what is at best misguided and unsubstantiated misinformation, and at worst disinformation. Hardly decorum on their part, is it? Instead, you are hyperfocusing on my "freeze peach", disregarding everything else I said in my comment. I find this to be a boring distraction from the topic at hand.


Well, I don't see anything obvious to criticise about what your interlocutors posted; their statements seem plausible enough to me, and if there is actually a knockout argument against them, I don't know it, because the person who seemed to disagree (you) was busy making childish noises instead of making it!

> jet fuel/steel beams

This debate was carried out sufficiently publicly that I got the sense people actually ran experiments confirming the pro-beam softening/structural failure/whatever case; certainly the "truther" case should have been taken seriously before that, and with decorum always because there is no situation in which any debate in a moderatable forum benefits from playground behaviour.


Alas, the distraction continues.


It is by no means a good publication, but at the same time being accepted as a citation on Wikipedia or not is not necessarily a particularly objective measure of quality. I recommend reading https://www.tracingwoodgrains.com/p/reliable-sources-how-wik... for the critical perspective on Wikipedia's integrity in this regard.


As usual with these, it helps to try to keep the metaphor used for downplaying AI, but flip the script. Let's grant the author's perception that AI is a "bag of words", which is already damn good at producing the "right words" for any given situation, and only keeps getting better at it.

Sure, this is not the same as being a human. Does that really mean, as the author seems to believe without argument, that humans need not be afraid that it will usurp their role? In how many contexts is the utility of having a human, if you squint, not just that a human has so far been the best way to "produce the right words in any given situation", that is, to use the meat-bag only in its capacity as a word-bag? In how many more contexts would a really good magic bag of words be better than a human, if it existed, even if the current human is used somewhat differently? The author seems to rest assured that a human (long-distance?) lover will not be replaced by a "bag of words"; why, especially once the bag of words is also duct-taped to a bag of pictures and a bag of sounds?

I can just imagine someone - a horse breeder, or an anthropomorphised horse - dismissing all concerns on the eve of the automotive revolution, talking about how marketers and gullible marks are prone to hippomorphising anything that looks like it can be ridden and some more, and sprinkling some anecdotes about kids riding broomsticks, legends of pegasi and patterns of stars in the sky being interpreted as horses since ancient times.


I don't think the author's argument is that it won't replace any human labour. Or at least I wouldn't agree with such an argument. But the stronger case is that however much LLMs improve, they won't replace humans in general. In the furtherance of knowledge, because they are fundamentally parroting and synthesizing the already known, versus performing truly novel thought. And in creative fields, because people are fundamentally interested in the creations of other people, not of computers.

Neither of these is entirely true in all cases, but they could be expected to remain true in at least some (many) cases, and so the role for humans remains.


So a human is just a really expensive, unreliable bag of words. And we get more expensive and more unreliable by the day!

There's a quote I love but have misplaced, from the 19th century I think. "Our bodies are just contraptions for carrying our heads around." Or in this instance... bag of words transport system ;)



I think the canonical answer is that humans are “bags of mostly water” .


If I'm remembering the full quote correctly, it's "Ugly bags of mostly water."


I just came from the Pluribus sub-Reddit. I’ll take AI over that cohort any day.


So tell me, why do I still have a job, and why am I frequently successful in getting profitable / useful products into production if I'm "expensive and unreliable"?

I mean I use AI tools to help achieve the goal but I don’t see any signs of the things I’m building and doing being unreliable.


Her argument really only works if you institute new economic systems where humans don’t need to labor in order to eat or pay rent.


"Her"->"the"? (Or, who is "she" here?)

Either way, in what way is this relevant? If the human's labor is not useful at any price point to any entity with money, food or housing, then they presumably will not get paid/given food/housing for it.


Why are you repeating what I said with slightly different words?


Maybe because I didn't understand what you said. Who does "her" refer to?


Once upon a time, in a more innocent age, someone made a parody (of an even older Evangelical propaganda comic [1]) that imputed an unexpected motivation to cultists who worship eldritch horrors: https://www.entrelineas.org/pdf/assets/who-will-be-eaten-fir...

It occurred to me that this interpretation is applicable here.

[1] https://en.wikipedia.org/wiki/Chick_tract


"The plans for scanning your chats were on display for fifty Earth years at the local planning department in Alpha Centauri"?

Nobody's attention span is infinite. I don't doubt I could understand all details of the EU legislative process and keep track of what sort of terrible proposals are underway if I put in the time, but I have a day job, hobbies that are frankly more interesting, and enough national legislation to keep track of.

If you then also say that the outcome is still my responsibility as a voter, then it seems like the logical solution is that I should vote for whatever leave/obstruct-the-EU option is on the menu. I don't understand why I am obliged to surrender either a large and ever-growing slice of my attention or my one-over-400something-million share of sovereignty.


> I don't understand why I am obliged to surrender either a large and ever-growing slice of my attention or my one-over-400something-million share of sovereignty.

Because your puny state is no match for the US, China or soon enough, India. Heck, even Russia in its current incarnation outmatches 80% of the EU countries.

That's it, it's that simple, conceptually.

It's basically the Articles of Confederation vs the Constitution of the United States.

Yes, it's not a pretty process, but the alternative is worse.

We can all live in La-La-Land and pretend we're hobbits living in the Shire ("Keep your nose out of trouble and no trouble will come to you") until reality comes crashing down.


If the end result is going to be that the EU turns into Russia or China under the pretext of standing up to them (because apparently building an opaque process that civil society can't keep up with to ram through authoritarian laws is what it takes to be competitive?), then I'd rather they cut out the extra steps and let the Russians/Chinese take over. At least then nobody would be telling me that what I got is the outcome of some sacred democratic process I am obliged to respect.


Why, then, is the supposed anti-US/China/India/Russia power bloc trying to pass laws to mandate absolute surveillance of all private communications? If the EU is going to continue attempting to legislate away people's freedoms for purposes that are completely out of scope for the reason it exists, then the natural result is that people will turn on the EU. There is little purpose in staving off the surrendering of independence to US/China if the process entails surrendering even more freedom than they would demand to the EU, all the more so when the EU already rolls over to the US/China on almost everything anyways. I am supportive of a pan-European unification in theory, but if the result looks anything like this, no wonder people are disillusioned with the European project. With friends like the EU, who needs enemies?


Every government has abhorrent proposals. This is a PROPOSAL.

Then proposals maybe turn into laws, through a complex process. We are HERE.

A good government doesn't have many with abhorrent LAWS.


I understand that it is not currently law. I also understand that the EU has been dedicated to this road of eroding citizen privacy for decades, constantly trying to pass more and more egregious legislation. For example, the Data Retention Directive of 2006 was abhorrent law. After 8 years in force, it was struck down by the ECJ, which would be somewhat reassuring if not for the fact that the EU appears to consider the ECJ a thorn in its side that it seeks to undermine at every turn. I have very little faith that this will not eventually become abhorrent law given the persistence with which the EU pursues becoming a surveillance state.

