vlovich123 · 2026-06-08T16:04:30 1780934670

Sliding window is for the draft model, not for the main one. 42B active params because it’s a sparse MoE, which is a common technique for larger models to avoid being bottlenecked by memory bandwidth.

moffkalast · 2026-06-08T16:27:12 1780936032

Seems to be for both according to the spec [0], maybe it's wrong though.

128 sounds really tiny, I wonder if they mean some kind of blocks?

[0] https://huggingface.co/XiaomiMiMo/MiMo-V2.5-Pro-FP4-DFlash#4...

E-Reverance · 2026-06-08T16:47:16 1780937236

No

> It uses 384 routed experts (top-8) with hybrid attention (full-attention + sliding-window 128 at 6:1 ratio) over 70 layers (1 dense + 69 MoE)

https://recipes.vllm.ai/XiaomiMiMo/MiMo-V2.5-Pro
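
For readers wondering what a window of 128 would mean at token granularity, here is a minimal sketch of a causal sliding-window mask. It assumes the window counts individual tokens (the quoted recipe does not spell out the unit), and `sliding_window_mask` is a hypothetical helper for illustration, not something from the MiMo or vLLM code.

```python
def sliding_window_mask(seq_len, window):
    """Return mask[i][j] == True when query token i may attend to key token j.

    Causal: a token never sees the future (j <= i).
    Windowed: a token sees at most `window` positions, itself included.
    """
    return [
        [j <= i and i - j < window for j in range(seq_len)]
        for i in range(seq_len)
    ]


# Tiny example so the pattern is readable; the quoted spec uses window=128.
mask = sliding_window_mask(seq_len=6, window=3)
visible_to_last = [j for j, ok in enumerate(mask[5]) if ok]
print(visible_to_last)
```

Under that assumption, a sliding-window layer only looks at the most recent `window` positions, while the full-attention layers in the hybrid stack still see the whole context.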

vlovich123 · 2026-06-08T14:11:27 1780927887

> But also: cutting one kid off from social networks ostracises them. The parents recognize it's a collective action problem.

OP already gave you your answer, you just chose to ignore it

vlovich123 · 2026-06-08T02:15:14 1780884914

MVCC is not inherently non-serializable. For example, Postgres adds serializability on top of it through SSI (Serializable Snapshot Isolation).

vlovich123 · 2026-06-08T02:12:40 1780884760

Do you mean to say “if you’re NOT going to use serialisable”? I think you missed the NOT, and every reply seemed to think you were arguing a different point, but your description about not using foreign keys and using Redis instead only makes sense if there’s a NOT there.

lmm · 2026-06-08T05:12:20 1780895540

No. If you're going to use serialisable you might as well just use something like Redis (which achieves serialisable behaviour by the much simpler approach of... actually executing your operations serially, and generally outperforms a traditional RDBMS set to serialisable transaction isolation). If you're going to use the horrendously complex machinery of a traditional RDBMS, it should be because you need a level of performance not achievable under serialisable isolation level.

vlovich123 · 2026-06-08T05:27:14 1780896434

Redis doesn’t support rollbacks, so on a conflict you’re left in a potentially inconsistent state, which is precisely what serializable transactions are supposed to prevent. Additionally, you’re very limited in the kind of logic you can express within a transaction safely. In SQL, decisions you make based on past reads either remain correct or the transaction goes unapplied, whereas Redis can do nothing here: once you’ve scheduled a transaction, it’ll complete all the operations you enqueued even if a racing operation altered the underlying data that drove the decision.

Pretending like Redis is suitable for an RDBMS workload because it executes things serially means you’re completely ignoring what transactions are actually used for and how they work.
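
The failure mode described above can be sketched with a toy in-memory model. This is not the Redis API: `ToyStore`, `execute_queued`, and `withdraw_with_stale_read` are hypothetical names, and the model covers only plain queue-then-execute semantics, ignoring features such as WATCH or Lua scripting.

```python
class ToyStore:
    """In-memory stand-in for a key-value store (hypothetical, not Redis)."""

    def __init__(self, data):
        self.data = dict(data)

    def execute_queued(self, ops):
        # Apply every queued (key, delta) in order.
        # There is no validation of earlier reads and no rollback.
        for key, delta in ops:
            self.data[key] = self.data.get(key, 0) + delta


def withdraw_with_stale_read(store, amount, racing_withdrawal):
    balance_seen = store.data["balance"]   # read before queueing anything
    queued = []
    if balance_seen >= amount:             # decision driven by that read
        queued.append(("balance", -amount))

    # A racing client changes the balance before the queue runs.
    store.data["balance"] -= racing_withdrawal

    # The queue still runs in full, even though the read is now stale.
    store.execute_queued(queued)
    return store.data["balance"]


final_balance = withdraw_with_stale_read(
    ToyStore({"balance": 100}), amount=80, racing_withdrawal=50
)
print(final_balance)
```

In this toy run the check "balance >= 80" passed against a value that was no longer true when the queue executed, so the account ends up overdrawn. A serializable SQL transaction would either see a consistent result or fail and leave nothing applied.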

lmm · 2026-06-08T05:51:42 1780897902

> you’re very limited in the kind of logic you can express within a transaction safely

On the contrary, you have Lua which is much more expressive than SQL (yes Turing completeness, but there's a huge difference in how easy it is to read and understand).

> Pretending like Redis is suitable for an RDBMS workload because it executes things serially means you’re completely ignoring what transactions are actually used for and how they work.

Well, the vast majority of RDBMS workloads don't use serializable isolation; that's part of my point. As for the rest, all I can say is I've worked on many systems in many industries and seen very few (honestly none) that actually made effective use of what transactions do and don't give you.

vlovich123 · 2026-06-08T16:12:01 1780935121

Redis does not support indexes and all sorts of other things. It works well only for a specific kind of workload but isn’t as generic or flexible.

SSI gives you performance close to that of snapshot isolation with the safety of serializability. The lowest I would ever go is snapshot isolation, and ideally SSI. For anything weaker I’d use an eventually consistent database instead, because then there’s at least no pretense.
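
To make the gap between snapshot isolation and serializability concrete, here is a toy sketch of write skew. It is a simplified model with hypothetical names (`run`, `go_off_call`), and the "serializable" branch uses plain read-set validation as a stand-in for what SSI detects; it is not Postgres's actual algorithm.

```python
def run(validate_reads):
    """Two transactions each take one person off call if two are on call.

    validate_reads=False models snapshot isolation: only write-write
    conflicts abort a commit. validate_reads=True additionally aborts a
    commit whose reads went stale (an assumption standing in for SSI).
    """
    db = {"alice": True, "bob": True}      # True means "on call"
    versions = {"alice": 0, "bob": 0}

    def begin():
        return {"snapshot": dict(db), "seen": dict(versions),
                "reads": set(), "writes": {}}

    def go_off_call(txn, me):
        txn["reads"].update(txn["snapshot"])       # reads every row
        if sum(txn["snapshot"].values()) >= 2:     # invariant check on snapshot
            txn["writes"][me] = False

    def commit(txn):
        keys = set(txn["writes"])
        if validate_reads:
            keys |= txn["reads"]
        if any(versions[k] != txn["seen"][k] for k in keys):
            return False                           # abort, nothing applied
        for key, value in txn["writes"].items():
            db[key] = value
            versions[key] += 1
        return True

    t1, t2 = begin(), begin()
    go_off_call(t1, "alice")
    go_off_call(t2, "bob")
    return (commit(t1), commit(t2)), db


si_result, si_db = run(validate_reads=False)
ser_result, ser_db = run(validate_reads=True)
print(si_result, si_db)
print(ser_result, ser_db)
```

Under the snapshot-only rule both commits succeed and nobody is left on call, even though each transaction checked the invariant. With read validation the second commit aborts and the invariant holds.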

vlovich123 · 2026-06-08T01:57:00 1780883820

That’s the same mentality C and C++ take. Even if it was only 20%, that’s a non-trivial number of vulnerabilities that have nothing to do with the code.

vlovich123 · 2026-06-06T17:18:20 1780766300

> (which is good for Rust adherents, I figure).

As a Rust adherent, please do not put words in our mouths or set up unrealistic expectations for other people by linking together concepts at a very shallow level.

Language-level memory safety has no answer for hardware security flaws, which is what side-channel attacks are. No programming language can provide memory privacy if another chip in your machine can read your memory, just as no programming language can protect your application from a vulnerability in the kernel it’s running on.

stego-tech · 2026-06-06T17:24:52 1780766692

Damn. That wasn’t my intention at all, I was just pointing out that Rust has another reason to see wider adoption vis a vis the usual Valley advertising bullshit of deliberately conflating hardware security with software security. I personally give no fucks what something is written in, only that it’s written well enough that I don’t have to twist arms or babysit yet another sloppy piece of code in my enterprise.

b112 · 2026-06-06T17:25:10 1780766710

But... it's rust.

vlovich123 · 2026-06-06T16:44:30 1780764270

The paper explicitly covers this: various memory CoW/snapshot mechanisms are probably faster and safer than the zygote pattern. As it stands, getting the zygote pattern correct and safe is something you have to plan for upfront. You can’t retrofit it, which is why the paper mentions it has poor composability. The advantages of the zygote pattern can also be overstated: the memory-sharing benefit is minimal because the fork has to happen so early, and modern OSes already transparently CoW duplicate pages in the background.

loeg · 2026-06-06T18:35:05 1780770905

In what sense can you not retrofit the zygote pattern?

vlovich123 · 2026-06-06T23:48:21 1780789701

I recommend at least skimming the paper, as it covers this. But essentially you can’t just inject a call at a random point in the code to start being a zygote. You have to plan up front the exact point at which you’re going to fork, and you have to do it at the start of the program, before any threads have started, before any files are open, and before any locks have been acquired. It’s basically all the challenges of invoking fork at arbitrary points in time.

The reason to do a zygote in the first place could be solved with alternative special APIs that are safer and harder to misuse. But we have fork so there’s not as big of a demand despite the warts.
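
The "fork early, before threads, files, or locks" constraint can be sketched as follows. This is a minimal POSIX-only illustration, not the paper's design: `PRELOADED` and `spawn_from_zygote` are hypothetical names, and a real zygote would stay alive as a server that forks on request.

```python
import os

# Expensive state is loaded once, while the process is still single-threaded
# and holds no locks. Children inherit it through copy-on-write pages.
PRELOADED = {"greeting": "hello"}


def spawn_from_zygote(work):
    """Fork a child that runs work(PRELOADED) and pipe its repr back."""
    read_fd, write_fd = os.pipe()
    pid = os.fork()
    if pid == 0:
        # Child: never return into the parent's code path.
        status = 1
        try:
            os.close(read_fd)
            os.write(write_fd, repr(work(PRELOADED)).encode())
            status = 0
        finally:
            os._exit(status)
    # Parent: close the write end so reading hits EOF when the child exits.
    os.close(write_fd)
    with os.fdopen(read_fd) as reader:
        output = reader.read()
    os.waitpid(pid, 0)
    return output


result = spawn_from_zygote(lambda state: state["greeting"].upper())
print(result)
```

The sketch only stays safe because the fork happens before anything else in the process could hold a lock or an open resource in an inconsistent state. Calling `spawn_from_zygote` later, from a program that already runs threads, is where the hazards discussed above begin.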

loeg · 2026-06-07T19:37:28 1780861048

Sure, but you can always retrofit a program to fork early on... this is a relatively trivial change. No?

vlovich123 · 2026-06-06T01:14:13 1780708453

If the author is this concerned about security, I’m curious why rsync doesn’t just build with Fil-C by default and skip the noise. Those who need the extra perf to do more than 1 gigabit/s can build it in “unsafe” mode.

saagarjha · 2026-06-06T01:42:01 1780710121

Because Fil-C is not a serious project

int_19h · 2026-06-06T02:00:24 1780711224

If you make claims like that, you need to expand on them or at least provide some references.

saagarjha · 2026-06-06T02:49:18 1780714158

It’s Fil’s side project that he uses to spend his extra creative energy and troll people on Twitter

vlovich123 · 2026-06-05T14:21:07 1780669267

Which version of Rust are these in?

modulared · 2026-06-05T14:56:55 1780671415

Looks like it was introduced in 1.81.0:

https://blog.rust-lang.org/2024/09/05/Rust-1.81.0/#new-sort-...

vlovich123 · 2026-06-04T04:31:54 1780547514

Something tells me a war with china isn’t going to be carriers duking it out but carriers filled to the brim with aviation and naval drones that seek and destroy enemy craft. As Iran has shown, you don’t need to attack the USA directly to destabilize its influence. The US market economical influence has been far more important for force projection and stabilizing trade than anything else and by all accounts Trump has pissed away allies on that front too. US force projection for trade stabilization is for minor things like protecting against pirates - you don’t need million dollar missiles for that.