And conveniently, making your machine non-upgradeable lets the manufacturer enforce market segmentation and charge a huge premium for a small RAM upgrade (a la Apple).
LPCAMM2/SOCAMM2 exist, heck I think Framework is using LPCAMM2 in one of their new laptops.
Heck, I'm willing to bet that a lot of manufacturers would rather go that route than soldered in, if for no other reason than the relative cost of warranty work between the two.
However, people probably need to stop being obsessed with ultrathin laptops for that to happen.
> However, people probably need to stop being obsessed with ultrathin laptops for that to happen.
I've never been able to understand this. Once we made it down to ~20 mm (which for the record still accommodates dual-stacked SO-DIMMs, a 2.5 inch bay, and a user replaceable battery but not an RJ45 jack) I don't understand what the practical impact of any further reduction is supposed to be. Regardless of how thin you make it the thing will still be a massive rectangle that you can't flex or press on.
> Regardless of how thin you make it the thing will still be a massive rectangle that you can't flex or press on.
There's very wide variation between laptops in how noticeably they'll flex or yield or creak when pressed. Laptops with a build quality that actually feels solid are far from being ubiquitous or even a majority.
Doubling the thickness of my MacBook Air would probably make it regress on that solid feeling, unless the weight was also significantly increased.
And regardless of whether current laptop form factors could accommodate a 2.5" drive, there's no use in doing so. That drive form factor is entirely obsolete for laptops and is just a waste of space and materials, and has been for about a decade.
I wasn't saying that I want a 2.5 inch drive, I was merely listing off a number of rather large things that fit just fine within a 20 mm budget.
I'm not sure why you seem to think that making something thicker would reduce the stiffness or strength. It's generally the opposite - see the concept of a torsion box. Anyway that wasn't the point. The point was that regardless of how thin you make the thing it will forever remain a cumbersome and delicate item that you have to treat with care when packing so what meaningful positive impact does shaving off those last few mm have? It's never made any sense to me.
I came here to say just this myself! Modern DIMM formats make SFF/portable builds with unified memory pools far more plausible than prior designs. There's absolutely no reason desktop machines couldn't implement similar DIMM formats or design a new board standard around something similar.
Unified memory doesn't have to be soldered on or unserviceable. That's a choice Apple made because it fit their product vision, but it's not mandatory in the slightest.
LPCAMM2 is available in real systems at 7467MT/s and 120ns latency, vs Apple (and Intel) at 9600MT/s (and Apple's soldered memory at 100ns latency).
I don't know how linear or sensitive CPU and GPU benchmarks are to such a ~20% slowdown, but I don't think Apple wants to pay it. And it looks like the next generation will be even closer to the SoC.
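For scale, here is a quick sketch of the gap using only the figures quoted in this comment. The helper name is made up for illustration, and this says nothing about how real benchmarks respond to the difference.

```python
def percent_lower(slower: float, faster: float) -> float:
    """How far `slower` falls below `faster`, as a percentage of `faster`."""
    return (faster - slower) / faster * 100.0

# Figures quoted in the comment above.
lpcamm2_mts, soldered_mts = 7467, 9600   # transfer rate, MT/s
lpcamm2_ns, soldered_ns = 120, 100       # latency, ns

rate_gap = percent_lower(lpcamm2_mts, soldered_mts)
latency_gap = (lpcamm2_ns - soldered_ns) / soldered_ns * 100.0

print(f"transfer rate: {rate_gap:.1f}% lower")   # about 22%
print(f"latency:       {latency_gap:.1f}% higher")  # about 20%
```

So "20%" is roughly right on both axes, slightly understated for the transfer rate.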
How about the LPCAMM route? Framework uses LPCAMM2 in its Laptop 13 Pro mainboards and claims that it satisfies the iGPU and NPU hardware without needing soldered RAM.
I thought getting sleep right was something that had already happened before MS decided they needed to be able to wake your PC any time they want, and that it wasn't much related to hardware.
A note that this was rather common in the days before PC clones took off.
The vertical integration many associate with Apple was the common approach for most 8- and 16-bit home computers.
Naturally after all these years, many PC vendors want their margins back, and thus the phenomenon of everyone going back to vertical integration, especially in form factors that are ideal for such, like laptops, tablets and phones.
So the options boil down to classical desktops, or being picky about which laptops to buy.
I mean is it possible to make unified memory systems with good performance or is it not really feasible due to memory timing/trace length issues?
It's possible if you're willing to go with RAM that is much slower than what GPUs like, the kind CPUs often use. That's what integrated-graphics laptops have done for a long time, right?
But can you get high-end CPU and GPU performance with unified memory and maintain user-upgradable memory in a reasonable way? That's what I don't know.
> I mean is it possible to make unified memory systems with good performance or is it not really feasible due to memory timing/trace length issues?
LPCAMM and similar solutions exist, but have never been demonstrated running at speeds that match what the leading soldered memory systems are using; there's always been some speed penalty. I'm not sure we've ever seen a system demonstrated using LPCAMM or similar for a 512-bit bus to match Apple's Max tier SoCs, so it's somewhat of an open question whether those solutions can offer upgradability at the high end of the market for unified memory systems.
> LPCAMM and similar solutions exist, but have never been demonstrated running at speeds that match what the leading soldered memory systems are using; there's always been some speed penalty.
LPCAMM2 supports up to 9600MT/s, which appears to be the same speed Apple is using.
> I'm not sure we've ever seen a system demonstrated using LPCAMM or similar for a 512-bit bus
Servers commonly use a 768-bit DDR5 memory bus per socket even without LPCAMM and LPCAMM allows shorter traces than traditional DIMMs. It's basically down to most existing DDR5 system boards/sockets having been designed before anyone was trying to run LLMs on consumer hardware, e.g. AM5 has a 128-bit memory bus and you're not changing that without a new socket. But every memory generation gets a new socket anyway, and the existing Threadripper Pro socket has a 512-bit memory bus as well.
Moreover, making the bus wider is "easy" -- the main problem with it is that it adds cost. Apple's least expensive machines use the same 128-bit memory bus as most PCs and the ones with the 512-bit bus cost as much as Threadripper if not more.
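The width-versus-cost argument rests on peak bandwidth scaling linearly with bus width at a given transfer rate. A minimal sketch of that arithmetic, using the bus widths and transfer rates mentioned in this thread; the function name is illustrative and the result is a theoretical peak that ignores refresh and protocol overhead:

```python
def peak_bandwidth_gbs(bus_width_bits: int, transfer_rate_mts: float) -> float:
    """Theoretical peak bandwidth in decimal GB/s for a given bus width and MT/s."""
    bytes_per_transfer = bus_width_bits / 8
    return bytes_per_transfer * transfer_rate_mts / 1000.0

# Bus widths and transfer rates taken from the discussion above.
for width in (128, 512):
    for rate in (7467, 9600):
        bw = peak_bandwidth_gbs(width, rate)
        print(f"{width:>3}-bit @ {rate} MT/s: {bw:6.1f} GB/s")
```

Going from 128 to 512 bits quadruples the peak at any given speed, which is a much bigger lever than the step from 7467 to 9600 MT/s.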
> LPCAMM2 supports up to 9600MT/s, which appears to be the same speed Apple is using.
The difference here is in what the standard defines on paper vs what is actually shipping in products and readily available off the shelf. Who's selling a whole system with LPCAMM2 certified for 9600MT/s? Intel's current-gen Panther Lake top of the line laptop chips are rated for 9600MT/s when using soldered LPDDR5x but only 7467MT/s when using LPCAMM2, according to their current datasheet: https://www.intel.com/content/www/us/en/content-details/8721...
That puts the current Intel-with-LPCAMM2 supported memory speed at 1.5 years and counting lag behind Apple's shipping memory speeds. Intel's own shipping memory speed moved past 7467MT/s a few months earlier than even Apple's.
> Servers commonly use a 768-bit DDR5 memory bus per socket even without LPCAMM and LPCAMM allows shorter traces than traditional DIMMs.
> Moreover, making the bus wider is "easy"
Citations needed. Servers aren't anywhere close to 9600MT/s yet; Intel and AMD are at 6400MT/s. The trace length advantages offered by LPCAMM2 don't necessarily mean the traces for the sixth or eighth channel would be short enough for 9600MT/s (which again, is not yet available even in a 128-bit configuration in shipping hardware). Adding more channels to even a LPCAMM2 configuration means adding more trace length, because only two modules can actually be adjacent to the CPU socket. (Maybe you could get to 512-bit with modules on the front and back of the board while maintaining trace lengths short enough to reach meaningfully higher speeds than regular DDR5, but so far nobody is doing that or even talking about it.)
> That puts the current Intel-with-LPCAMM2 supported memory speed at 1.5 years and counting lag behind Apple's shipping memory speeds.
It turns out Apple isn't getting 9600MT/s either. I assumed that soldering would be getting them at least what LPCAMM2 is rated for, but if you actually do the math, they're getting ~8500MT/s for their most expensive systems and ~7500MT/s for the others.
> Servers aren't anywhere close to 9600MT/s yet; Intel and AMD are at 6400MT/s.
Servers use conservative timings. EXPO memory kits above 6400MT/s are available for Threadripper with 8 channels. And again, these are using traditional DIMMs with longer traces rather than CAMM, but they're still managing an extremely wide bus with close to the same performance.
> The trace length advantages offered by LPCAMM2 don't necessarily mean the traces for the sixth or eighth channel would be short enough for 9600MT/s
CAMM modules use a compression fitting to attach the chips to the system board using approximately the same amount of space as the solder pads would for soldered chips. If you get to the point of having so many channels that the chips are in the way of the other chips then the soldered ones have the same problem.
> (which again, is not yet available even in a 128-bit configuration in shipping hardware).
A single LPCAMM2 module is a 128-bit bus. Every system that uses it has at least that.
> Maybe you could get to 512-bit with modules on the front and back of the board while maintaining trace lengths short enough to reach meaningfully higher speeds than regular DDR5, but so far nobody is doing that or even talking about it.
Nobody is really using a bus that wide with soldered memory either though, outside of the couple of Macs that start at ~$3500 and are getting the same speed Framework does with LPCAMM2.
> Framework already sells LPCAMM2 at 8533MT/s with full validation:
From your link:
> Framework Laptop 13 Pro (Intel® Core™ Ultra Series 3) supports one slot of LPCAMM2 memory up to 96GB at the native 7467 MT/s speed. It is compatible with LPCAMM2 modules with memory speed rated above 7467 MT/s, but the speed will be capped at 7467 MT/s because of the platform limitation.
The modules in question can only theoretically operate at 8533MT/s. Framework has yet to sell a system where the modules actually operate at more than 7467MT/s.
> It turns out Apple isn't getting 9600MT/s either. I assumed that soldering would be getting them at least what LPCAMM2 is rated for, but if you actually do the math, they're getting ~8500MT/s for their most expensive systems and ~7500MT/s for the others.
You're either doing the math wrong, or just plain looking at the wrong systems. Try looking at the M5 generation.
> CAMM modules use a compression fitting to attach the chips to the system board using approximately the same amount of space as the solder pads would for soldered chips. If you get to the point of having so many channels that the chips are in the way of the other chips then the soldered ones have the same problem.
Yes, that's a problem, and Apple has solved it by moving the DRAM on-package. Datacenter GPUs have also solved it that way by putting the DRAM on a silicon interposer to allow even wider bus widths. Soldering standard DRAM packages on the motherboard is not the limit of how memory can be soldered down.
> A single LPCAMM2 module is a 128-bit bus. Every system that uses it has at least that.
Yes, 128 bits at lower speeds. Did you forget that the whole point I'm making here is that the speeds are not the same?
> Nobody is really using a bus that wide with soldered memory either though, outside of the couple of Macs that start at ~$3500 and are getting the same speed Framework does with LPCAMM2.
The Mac Studio with the M3 Ultra is actually running the DRAM at a lower frequency than what Framework and other Intel-based systems could, but more than making up for it in bus width, to provide far more total memory bandwidth than any plausible LPCAMM2-based system that could be built today.
> You're either doing the math wrong, or just plain looking at the wrong systems. Try looking at the M5 generation.
The M5 generation isn't "1.5 years old" and even those aren't all that speed. The M5 Max with the 32-core GPU is ~7200MT/s, while the one with the 40-core GPU is over $4000.
> Yes, that's a problem, and Apple has solved it by moving the DRAM on-package.
There is no "package" here. Apple's processors are soldered to the logic board, as are Intel's in laptops. The DRAM Apple uses is standard LPDDR5 from the normal OEMs. Have a look at the LPCAMM2 module. It has four standard DRAM chips on the top and a connector on the bottom. DDR5 channels are really 32 bits wide, so the 128-bit module has four channels, four chips. The module is barely any larger than the chips themselves. It's not saving significant space by soldering them, it's just an alternative means of attaching them to the system board in the same place.
> Yes, 128 bits at lower speeds.
At the same speeds Apple was shipping a few months ago. Apple being the first to ship LPDDR5-9600 when it was that recent doesn't imply that it needs to be soldered, it implies that they're a huge company that can pay for early access to the new thing whether it's soldered or not. 9600MT/s LPCAMM2 modules have already been announced -- it's not a technical problem, it's an "Apple and OpenAI are buying out the fastest DRAM right now" problem.
> The Mac Studio with the M3 Ultra is actually running the DRAM at a lower frequency than what Framework and other Intel-based systems could, but more than making up for it in bus width, to provide far more total memory bandwidth than any plausible LPCAMM2-based system that could be built today.
By this logic the thing to beat it is the 8S Xeon servers from almost a decade ago with 48 channels of DDR4-2666. Or existing 2S servers with 24 channels of DDR5-6400.
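For reference, a rough sketch of the aggregate numbers behind those two server examples. It assumes 64 data bits per DIMM channel (ECC bits excluded) and simply sums across sockets, so it is a theoretical ceiling rather than what any single core or accelerator would see on a multi-socket NUMA machine.

```python
DDR_CHANNEL_DATA_BITS = 64  # assumption: 64 data bits per DDR4/DDR5 DIMM channel

def aggregate_gbs(channels: int, rate_mts: float) -> float:
    """Sum of theoretical per-channel peaks, in decimal GB/s."""
    return channels * (DDR_CHANNEL_DATA_BITS / 8) * rate_mts / 1000.0

# Channel counts and speeds as stated in the comment above.
configs = {
    "8S Xeon, 48ch DDR4-2666": (48, 2666),
    "2S server, 24ch DDR5-6400": (24, 6400),
}

for name, (channels, rate) in configs.items():
    print(f"{name}: {aggregate_gbs(channels, rate):.1f} GB/s")
```

Both land around or above 1 TB/s on paper, which is the point of the comparison: total bandwidth from sheer width is not unique to soldered memory.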
Ok, so the problem is you doing the math wrong. Note that the MacBook Pro configuration you're talking about has a DRAM capacity of 36GB, compared to 48+ GB for the ones with all the cores enabled and the full memory bandwidth. That 32-core config isn't running the DRAM slower, it's running with a narrower bus and fewer DRAM chips: https://theapplewiki.com/wiki/MacBook_Pro_(16-inch,_M5_Max)
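To make the disagreement concrete, here is a sketch of how the same bandwidth yields different implied transfer rates depending on the bus width you assume. The 384-bit width for the 32-core configuration is an assumption (three quarters of 512, consistent with the 36GB vs 48GB capacities mentioned above), not a figure stated in this thread.

```python
def peak_bandwidth_gbs(bus_width_bits: int, rate_mts: float) -> float:
    """Theoretical peak bandwidth in decimal GB/s."""
    return bus_width_bits / 8 * rate_mts / 1000.0

def implied_rate_mts(bandwidth_gbs: float, bus_width_bits: int) -> float:
    """Transfer rate needed to reach `bandwidth_gbs` on a bus of the given width."""
    return bandwidth_gbs * 1000.0 / (bus_width_bits / 8)

# The ~7200 MT/s estimate only follows if the bus is assumed to be 512 bits wide.
bandwidth = peak_bandwidth_gbs(512, 7200)

# Assumption: the cut-down configuration actually has a 384-bit bus.
ASSUMED_NARROW_BUS_BITS = 384
corrected_rate = implied_rate_mts(bandwidth, ASSUMED_NARROW_BUS_BITS)

print(f"bandwidth: {bandwidth:.1f} GB/s")
print(f"implied rate on a {ASSUMED_NARROW_BUS_BITS}-bit bus: {corrected_rate:.0f} MT/s")
```

Under that assumption the implied rate comes out at 9600 MT/s, i.e. the same DRAM speed on fewer channels.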
> There is no "package" here. Apple's processors are soldered to the logic board, as are Intel's in laptops.
Denying the difference between putting the RAM on-package vs on the motherboard doesn't make that difference stop being real.
> Apple being the first to ship LPDDR5-9600 when it was that recent doesn't imply that it needs to be soldered
Apple wasn't even close to being the first to ship LPDDR5-9600. Android phones using DRAM at that speed started shipping at the end of 2023, and moved on to 10700MT/s starting in 2024. The situation here is not anywhere close to being one of Apple paying a premium to get faster DRAM chips that other laptop manufacturers can't afford. Rather, for most of the past several years, laptop manufacturers (especially on the x86 side) have been unable to buy DRAM chips with a rating slow enough to match what their processors are capable of running at. It's become quite common to see on a ThinkPad spec sheet that e.g. the DRAM parts are rated for 7467MT/s but will only operate at 6400MT/s due to processor limitations, then the next year see that the DRAM parts are rated for 8533MT/s but run at 7467MT/s, and so on. LPDDR speed increases have been driven primarily by flagship smartphones, and even the leftover slower-binned parts are faster than what most laptops can handle.
> LPCAMM and similar solutions exist, but have never been demonstrated running at speeds that match what the leading soldered memory systems are using;
Does it need to be leading, though? Being median is just fine for what high-RAM systems are intended to be used for.
Don't I/you wish. The mechanical junction adds no delay, only manufacturing expense, and the delay of purchasing new systems to keep up with OS bloat.
Actually, the opposite is true. Socketed RAM can be overclocked and have its timings adjusted, while soldered RAM can't. I have two Lenovos: one with soldered RAM (X1 Carbon) and one T590 with a single slot holding a Crucial 16GB 260-pin SODIMM, DDR4 PC4-19200. Exact same processor; the X1 has soldered DDR3 at 532.0 MHz, PC3-1066. The T590 has DDR4, PC4-19200, 1200 MHz.
Both have a Core i7 8665U... and the T590 is much faster, with socketed RAM.
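The module labels and clocks in that comparison can be cross-checked with a little arithmetic. This is only a sketch: it assumes a standard 64-bit module data bus, and it assumes the reported 532 MHz is the memory I/O clock (monitoring tools do not always report the same clock, so treat that part as a guess).

```python
import re

MODULE_BYTES_PER_TRANSFER = 8  # assumption: standard 64-bit DIMM/SODIMM data bus

def module_rate_mts(label: str) -> float:
    """Convert a 'PCn-<MB/s>' module label to a transfer rate in MT/s."""
    match = re.fullmatch(r"PC\d?-(\d+)", label.strip())
    if match is None:
        raise ValueError(f"unrecognised module label: {label!r}")
    return int(match.group(1)) / MODULE_BYTES_PER_TRANSFER

def io_clock_mhz(rate_mts: float) -> float:
    """Double data rate: two transfers per I/O clock cycle."""
    return rate_mts / 2

t590_rate = module_rate_mts("PC4-19200")
t590_clock = io_clock_mhz(t590_rate)

# Assumption: the 532.0 MHz reading for the soldered machine is its I/O clock.
x1_rate = 2 * 532.0

print(f"T590: {t590_rate:.0f} MT/s at {t590_clock:.0f} MHz")
print(f"X1:   {x1_rate:.0f} MT/s (if 532 MHz is the I/O clock)")
```

The T590 figures are self-consistent (PC4-19200 is DDR4-2400 at 1200 MHz). If the 532 MHz reading is taken at face value, the gap is a difference in memory generation and speed rather than a property of sockets versus solder.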
I think you'll find that in the current day, high speed LP(?)DDR5 requires a better signal path than what the SODIMM can provide. Which is why laptop makers initially moved to soldered RAM before moving to CAMM (probably only for the high end ones).
Maybe I won't care about upgradeability right now. The architecture is clearly in flux, the roles of traditional "CPU" and "GPU" are rapidly evolving. Maybe in 5 years, or even 3 years, a brand-new machine from 2026 won't be worth upgrading for a new role due to a seriously different architecture, but would only be relegated to do something "traditional".
I wish manufacturers could consider a hybrid approach. There should be no reason an architecture can't support both unified memory (effectively L4(?) cache), and cheaper, upgradeable system memory on sticks for old-school application use.
Upgradable memory and unified memory aren't entirely mutually exclusive. You can design a chip that uses DDR5 and has a decently-powerful iGPU that can use that whole memory pool. But you'll be starving that GPU of bandwidth relative to what you'd achieve with soldered LPDDR, and it's not really worth the trouble of building a large iGPU unless you're also going to feed it with the fastest memory you can reasonably put down.
If you look at e.g. an Intel laptop chip, you'll see they design and build a memory PHY that can interface with either DDR5 or LPDDR5x. They don't support splitting it to have one controller operating with DDR5 and the other with LPDDR5x, for fairly obvious reasons: more complex hardware, harder for software/operating systems to manage optimally, and not a lot of benefits to drive demand and justify the expenses. The speed difference between LPDDR5x and DDR5 isn't really large enough to use LPDDR5x as an L4 cache; it would be more like two different NUMA nodes, with complications for laptop power management.
If you want somebody to build a chip with more than the usual 128-bit bus and make some of the memory controllers use LPDDR and some DDR5, then you're asking for a significant increase in chip cost due to the extra memory PHYs and pin count. That cost is only justified if almost all products using the bigger chips are going to actually take advantage of the full complement of memory controllers.
AFAIK PCIe6 just started getting implemented in hardware last year... PCIe7 Spec was just released last year too...
PCIe6 is a much larger change than 'just bump up the transfer rate'; the encoding changed too (on top of the new code length, it's no longer NRZ), so everyone needed to design and validate the new encoding block, negotiation, etc.
That said, I'm guessing PCIe7 will be a 'smoother' transition from PCIe6, i.e. we might see 7.0 products in 2027. That will theoretically get you ~240GB/sec on an x16 link, or a little less than the theoretical max of a current Strix Halo. (I'm guessing, however, that PCIe protocol overhead will make the difference larger.)
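A back-of-the-envelope sketch of that comparison. The inputs are assumptions not stated in the comment: 128 GT/s per lane for PCIe 7.0, and a 256-bit LPDDR5X-8000 memory interface for Strix Halo. The PCIe figure is the raw one-direction rate before any encoding or protocol overhead.

```python
def pcie_raw_gbs(gt_per_s_per_lane: float, lanes: int) -> float:
    """Raw one-direction link rate in GB/s (1 bit per transfer per lane)."""
    return gt_per_s_per_lane * lanes / 8

def memory_peak_gbs(bus_width_bits: int, rate_mts: float) -> float:
    """Theoretical peak memory bandwidth in decimal GB/s."""
    return bus_width_bits / 8 * rate_mts / 1000.0

pcie7_x16 = pcie_raw_gbs(128, 16)          # assumption: PCIe 7.0 at 128 GT/s per lane
strix_halo = memory_peak_gbs(256, 8000)    # assumption: 256-bit LPDDR5X-8000

# Efficiency implied by the ~240 GB/s figure quoted in the comment.
implied_efficiency = 240 / pcie7_x16

print(f"PCIe 7.0 x16 raw:  {pcie7_x16:.0f} GB/s per direction")
print(f"Strix Halo memory: {strix_halo:.0f} GB/s")
print(f"implied link efficiency at ~240 GB/s: {implied_efficiency:.1%}")
```

On these assumptions the raw link rate and the memory peak are the same on paper, so the "a little less" comes entirely from link overhead.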