
> Is there any implicit understanding in the community that byte types will inevitably be added to LLVM?

Among the people who are familiar with such things, yes. An RFC on the topic will be posted in the near future.


That was ambiguously phrased. The point I was trying to make here is that we don't have the situation that is very common for open-source projects, where a project might nominally have 100 contributors, but in reality one person does 95% of the changes.

LLVM of course has plenty of contributors who only ever landed one change, but the thing that matters for project health is that the group of "top contributors" is fairly large.

(And yes, this does differ by subproject, e.g. lld is an example of a subproject where one contributor is more active than everyone else combined.)


There may be a difference of degree here, but not a difference of kind.


Yes, the Orc C API follows different rules from the rest of the C API (https://github.com/llvm/llvm-project/blob/501416a755d1b85ca1...).


I know, but even if it's not breaking promises, the constant stream of changes still makes it rather painful to use LLVM. It doesn't help that unless you embed LLVM, you have to deal with a lot of different LLVM versions out there...


FWIW eventual stability is a goal, but there's going to be more churn as we work towards full arbitrary program execution (https://www.youtube.com/watch?v=qgtA-bWC_vM covers some recent progress).

If you're looking for stability in practice: the ORC LLJIT API is your best bet at the moment (or sticking to MCJIT until it's removed).
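
Roughly, the LLJIT surface looks something like the sketch below (this is illustrative only; the exact headers and return types shift a bit between LLVM releases):

    #include "llvm/ExecutionEngine/Orc/LLJIT.h"
    #include "llvm/Support/Error.h"
    #include "llvm/Support/TargetSelect.h"

    using namespace llvm;
    using namespace llvm::orc;

    // Assumes TSM is a ThreadSafeModule holding the IR you want to JIT.
    Error runWithLLJIT(ThreadSafeModule TSM) {
      InitializeNativeTarget();
      InitializeNativeTargetAsmPrinter();

      auto JIT = LLJITBuilder().create();
      if (!JIT)
        return JIT.takeError();

      if (Error Err = (*JIT)->addIRModule(std::move(TSM)))
        return Err;

      // Look up a symbol; the returned address can be cast to a function
      // pointer (the exact return type differs between LLVM versions).
      auto Sym = (*JIT)->lookup("main");
      if (!Sym)
        return Sym.takeError();

      return Error::success();
    }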


This particular case isn't really due to pattern matching -- it's a result of a generic optimization that evaluates the exit value of an add recurrence using binomial coefficients (even if the recurrence is non-affine). This means it will work even if the contents of the loop get more exotic (e.g. if you perform the sum over x * x * x * x * x instead of x).
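
For illustration, the shape of loop being described looks something like this (my sketch, not from the thread; whether the rewrite actually fires depends on the optimization level and SCEV's cost limits):

    // The running sum forms an add recurrence; LLVM's scalar evolution can
    // compute its exit value in closed form, so at -O2 the loop itself can
    // typically be deleted and replaced by a straight-line computation.
    unsigned sum_fifth_powers(unsigned n) {
      unsigned sum = 0;
      for (unsigned x = 0; x < n; ++x)
        sum += x * x * x * x * x; // non-affine recurrence, still handled
      return sum;
    }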


This depends a lot on what you're doing with LLVM. If you are just using LLVM as a code generation backend for your language frontend, you generally do not need an LLVM fork.

For example, while Rust does have an LLVM fork, it just exists for tighter control over backports, not because there are any modifications to LLVM.


> LLVM is a trap.

Is it? I think Rust is a great showcase for why it isn't. Of course it depends somewhat on your compiler implementation approach, but actual codegen-to-LLVM tends to only be a tiny part of the compiler, and it is not particularly hard to replace it with codegen-to-something-else if you so desire. Which is why there is now codegen_cranelift, codegen_gcc, etc.

The main "vendor lock-in" LLVM has is if you are exposing the tens of thousands of vendor SIMD intrinsics, but I think that's inherent to the problem space.

Of course, whether you're going to find another codegen backend (or are willing to write one yourself) that provides similar capabilities to LLVM is another question...

> You bootstrap extra fast, you get all sorts of optimization passes and platforms for free, but you lose out on the ability to tune the final optimization passes and performance of the linking stages.

You can tune the pass pipeline when using LLVM. If your language is C/C++ "like", the default pipeline is good enough that many such users of LLVM don't bother, but languages that differ more substantially will usually use fully custom pipelines.
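
As a rough sketch of what a custom pipeline looks like with the new pass manager (illustrative only; names like OptimizationLevel have moved around between releases):

    #include "llvm/IR/Module.h"
    #include "llvm/IR/PassManager.h"
    #include "llvm/Passes/PassBuilder.h"

    using namespace llvm;

    void optimize(Module &M) {
      LoopAnalysisManager LAM;
      FunctionAnalysisManager FAM;
      CGSCCAnalysisManager CGAM;
      ModuleAnalysisManager MAM;

      PassBuilder PB;
      PB.registerModuleAnalyses(MAM);
      PB.registerCGSCCAnalyses(CGAM);
      PB.registerFunctionAnalyses(FAM);
      PB.registerLoopAnalyses(LAM);
      PB.crossRegisterProxies(LAM, FAM, CGAM, MAM);

      // Either start from the default pipeline (older releases spell this
      // PassBuilder::OptimizationLevel::O2)...
      ModulePassManager MPM =
          PB.buildPerModuleDefaultPipeline(OptimizationLevel::O2);
      // ...or parse a fully custom textual pipeline, e.g.:
      //   ModulePassManager MPM;
      //   cantFail(PB.parsePassPipeline(MPM, "function(sroa,instcombine,gvn)"));

      MPM.run(M, MAM);
    }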

> I think we'll see cranelift take off in Rust quite soon, though I also think it wouldn't be the juggernaut of a language if they hadn't stuck with LLVM those early years.

I'd expect that most (compiled) languages do well to start with an LLVM backend. Having a tightly integrated custom backend can certainly be worthwhile (and Go is a great success story in that space), but it's usually not the defining feature of the language, and there is a great opportunity cost to implementing one.


It is a “trap” in the sense that it limits the design space for your language by forcing some of the choices implicit in C++. Rust went with slow build times like C++ and so was not held back in this dimension by LLVM.

Nothing about LLVM is a trap for C++ as that is what it was designed for.


At least at a skim, what this specifies for exposure/synthesis for reads/writes of the object representation is concerning. One of the consequences is that dead integer loads cannot be eliminated, as they may have an exposure side effect. I guess C might be able to get away with it due to the interaction with strict aliasing rules. Still quite surprised that they are going against consensus here (and reduces the likelihood that these semantics will get adopted by implementers).


Can you say more about what the consensus is that this is going against?


That type punning through memory does not expose or synthesize memory. There are some possible variations on this, but the most straightforward is that pointer to integer transmutes just return the address (without exposure) and integer to pointer transmutes return a pointer with nullary provenance.


> I guess C might be able to get away with it due to the interaction with strict aliasing rules.

But not for char-typed accesses. And even for larger types, I think you would have to worry about the combo of first memcpying from pointer-typed memory to integer-typed memory, then loading the integer. If you eliminate dead integer loads, then you would have to not eliminate the memcpy.
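
A rough sketch of that pattern (my illustration, not from the thread):

    #include <cstdint>
    #include <cstring>

    // The pointer's object representation is laundered through memcpy into
    // integer-typed storage and then read back through an otherwise dead
    // integer load. Under the proposed semantics, that load exposes the
    // provenance of p, so a compiler that deletes the dead load would have
    // to preserve the exposure some other way (or keep the memcpy).
    void launder_example(int *p) {
      std::uintptr_t bits;
      std::memcpy(&bits, &p, sizeof p);   // copy the object representation
      std::uintptr_t dead = bits;         // dead integer load of that representation
      (void)dead;
    }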


That's a great point. I initially thought we could assume no exposure for loads with non-pointer-compatible TBAA, but you are right that this is not correct if the memory has been laundered through memcpy.


You can still eliminate the memcpy if you mark the pointer exposed at that point.


I don't imagine that the exposed state would need to be represented in the final compiler output, so the optimiser could mark the pointer as exposed, but still eliminate the dead integer load.

Or from a pragmatic viewpoint, perhaps if the optimiser eliminates a dead load, then don't mark the pointer as exposed? After all, the whole point is to keep track of whether a synthesised pointer might potentially refer to the exposed pointer's storage. There's zero danger of that happening if the integer load never actually occurs.


I guess the internal exposure state would be "wrong" if the compiler removes the dead load (e.g. in a pass that runs before provenance analysis).

However, if all of the program paths from that point onward behave the same as if the pointer was marked as exposed, that would be fine. It’s only “wrong” to track the incorrect abstract machine state when that would lead to a different behaviour in the abstract machine.

In that sense I suppose it’s no different from things like removing a variable initialisation if the variable is never used. That also has a side effect in the abstract machine, but it can still be optimised out if that abstract machine side effect is not observable.


(Never mind, I misread your comment at first.) Yes, the representation access needs to be discussed... I took a couple of years to publish this document. More important would be whether the ptr2int exposure can actually be implemented.


> Interestingly, Windows on ARM hasn't made it up to Tier 1 yet.

An RFC for that has been submitted recently: https://github.com/rust-lang/rfcs/pull/3817


One peculiar thing about the benchmark results is that disabling individual UB seems to fairly consistently reduce performance without LTO, but improve it with LTO. I could see how the UB may be less useful with LTO, but it's not obvious to me why reducing UB would actually help LTO. As far as I can tell, the paper does not attempt to explain this effect.

Another interesting thing is that there is clearly synergy between different UB. For the LTO results, disabling each individual UB seems to be either neutral or an improvement, but if you disable all of them at once, then you get a significant regression.


They do a bit more than that. One of the options (-disable-object-based-analysis under AA2) disables the assumption that distinct identified objects do not alias, which is disabling pointer provenance in at least one key place.

So I think this option very roughly approximates a kind of "no provenance, but with address non-determinism" model, which still permits optimizations like SROA on non-escaping objects.
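
For readers unfamiliar with the term, here's a contrived sketch (mine, not from the paper) of what that assumption means:

    #include <cstdint>
    #include <cstdio>

    int x = 1, y = 2;

    int main() {
      int *p = &x + 1;                      // one-past-the-end of x
      if ((uintptr_t)p == (uintptr_t)&y) {  // addresses may compare equal...
        *p = 10;                            // ...but p carries x's provenance
      }
      // With object-based (provenance) reasoning, the store through p is
      // assumed not to modify the distinct object y, so this may print 2.
      std::printf("%d\n", y);
    }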


That's what I get for relying on a skim of the paper.

Also, hi, didn't know you commented on this site.

