> If you don't mind wasting a bit of time, you could forward size+alignment to t...

viraptor · on April 24, 2023

> and then just repeatedly allocate in the hope that you will eventually get a correctly aligned value out

If you preload something that patches all the new/delete interfaces, you can do this without guesswork.

    new(size, alignment) ->
      res=alloc(size+alignment)
      res_aligned=res+...
      offsets[res_aligned] = res

    new(size) -> alloc(size)

    free(ptr) ->
      free(offsets[ptr] || ptr)
      offsets.del(ptr)
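For concreteness, here is one way the pseudocode above could look in C++. It's a sketch under a few assumptions: the function names and the `g_offsets` table are hypothetical, malloc stands in for whatever allocator is being wrapped, `alignment` is a power of two, and it's single-threaded. A real preloaded shim would also need a lock and a table that doesn't itself allocate through operator new, or it would recurse into itself.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstdlib>
#include <new>
#include <unordered_map>

// Hypothetical names; a single-threaded sketch, not a drop-in allocator.
// Maps each aligned pointer we hand out to the pointer malloc returned.
std::unordered_map<void*, void*> g_offsets;

void* aligned_alloc_sketch(std::size_t size, std::size_t alignment) {
    // Over-allocate so an aligned address with `size` usable bytes must exist.
    if (size > SIZE_MAX - alignment) throw std::bad_alloc();
    void* res = std::malloc(size + alignment);
    if (!res) throw std::bad_alloc();
    std::uintptr_t addr = reinterpret_cast<std::uintptr_t>(res);
    std::uintptr_t mask = static_cast<std::uintptr_t>(alignment) - 1;
    void* res_aligned = reinterpret_cast<void*>((addr + mask) & ~mask);
    g_offsets[res_aligned] = res;  // remember where the real block starts
    return res_aligned;
}

void* plain_alloc_sketch(std::size_t size) {
    if (void* p = std::malloc(size ? size : 1)) return p;
    throw std::bad_alloc();
}

void free_sketch(void* ptr) {
    auto it = g_offsets.find(ptr);
    if (it == g_offsets.end()) {
        std::free(ptr);     // plain allocation: free it directly
        return;
    }
    std::free(it->second);  // aligned allocation: free the original block
    g_offsets.erase(it);
}
```

Over-allocating by `alignment` bytes is what removes the guesswork: some address inside the block is always suitably aligned, and the table lets `free_sketch` get back to the pointer malloc actually returned.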

olliej · on April 24, 2023

Haha, you've missed the issue. The question is what the system does when someone overrides the built-in allocator functions but does not override all of them.

You are absolutely correct that as a developer you can have your process override the allocator functions, and that is in fact what TF2 has done. The problem is that they have not overridden all of the allocation functions, and so they're crashing due to mismatched allocators being used. TF2 can "easily" fix this crash by implementing the aligned new, new[], delete, and delete[] operators in their custom allocator, or by simply removing their custom allocator's override of the global new & delete operators and using a common base class to get their faster allocator.

The question we're talking about is "how does the standard library respond to this scenario in a way that maximizes correctness?".

viraptor · on April 24, 2023

I was going for "ignore the issue, let's just re-patch all alloc/free pointers, built-in or external, new or old", which I think would still work (as long as anticheat doesn't freak out). It wouldn't suffer from inconsistencies, because you'd control all the calls again. Or is there something missing in this approach?

olliej · on April 25, 2023

You can't repatch all the calls. The OS/standard library provides a set of global operator new and delete implementations, and for largely historical reasons they are _required_ to allow processes to override them with their own implementations.

Now, when a program does decide to override the global operator new and delete functions, the standard library is required to use them instead. So generally the standard library exposes them as weak symbols, and the OS and stdlib link to them by symbol name. That way, at program launch, the program's versions of the operator new/delete symbols are the ones that win. That's how the OS and standard library are able to interact with the program despite it overriding what is ostensibly the system allocator.
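A small way to see that outcome from inside one program, assuming a typical hosted toolchain such as GCC/libstdc++ or Clang/libc++ (it shows the result, not the weak-symbol linking itself): replace the global operator new with a counting version, and the standard library's own allocations, here from std::vector via std::allocator, land in it even though the library never names it. The counter and helper names are hypothetical.

```cpp
#include <cstddef>
#include <cstdlib>
#include <new>
#include <vector>

// Counts calls to this program's replacement of the global operator new.
std::size_t g_replacement_new_calls = 0;
int* volatile g_sink = nullptr;  // lets the buffer escape so it can't be optimized away

void* operator new(std::size_t size) {
    ++g_replacement_new_calls;
    if (void* p = std::malloc(size ? size : 1)) return p;  // must not return null for size 0
    throw std::bad_alloc();
}

// Matching deallocation functions (unsized and sized) so blocks go back to free().
void operator delete(void* p) noexcept { std::free(p); }
void operator delete(void* p, std::size_t) noexcept { std::free(p); }

// std::vector never mentions our function, yet its allocator ends up calling it.
std::size_t count_vector_allocations() {
    std::size_t before = g_replacement_new_calls;
    {
        std::vector<int> v(1000, 7);
        g_sink = v.data();
    }
    return g_replacement_new_calls - before;
}
```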

So in principle the OS could simply make sure that the user-provided operator new, delete, etc. are always directed to the system allocator routines. The problem is that when compiling user code there's no obligation to call the user-provided new, delete, etc. through a symbol, and in general the compiler won't. Instead the calls will generally be compiled down to PC-relative loads and branches, as those are significantly faster. The net result is that while the OS _could_ force the symbols to always resolve to the system functions, things would break: the user code would still be calling the user-specified functions directly, those functions would not be compatible with the system ones, and the result would be sadness. Hence the user-defined operator implementations have to win.

The problem is what happens when not all of the operators are overloaded. Historically this hasn't been a problem: there are the plain and [] variants, which can be overloaded independently, and the nothrow variants of each, which in practice have not been an issue because by default they are implemented as essentially

    try { return ::operator new(size); } catch(...) { return nullptr; }

So the nothrow variants just directly call the operator new that people override.

The problem with operator new(size_t, align_val_t) is that, depending on your compiler flags, different versions of ::operator new will be called, and because of the alignment requirements the aligned operator new can't just forward to the default new implementation. So its introduction is the first time that a program's failure to implement the full suite of operators results in an actual runtime error rather than a minor inefficiency.
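To make the dispatch concrete, here's a sketch assuming C++17 or later and a C library that provides std::aligned_alloc (so not MSVC). Replacing both the plain and the aligned form with counting versions shows that a new-expression for an over-aligned type goes to the align_val_t overload, not the plain one, so a program that had only replaced the plain form would never see that allocation at all. `CacheLine`, the counters, and `allocate_both` are hypothetical names.

```cpp
#include <cstddef>
#include <cstdlib>
#include <new>

std::size_t g_plain_calls = 0;
std::size_t g_aligned_calls = 0;
void* volatile g_sink = nullptr;  // keeps allocations observable

void* operator new(std::size_t size) {
    ++g_plain_calls;
    if (void* p = std::malloc(size ? size : 1)) return p;
    throw std::bad_alloc();
}

void* operator new(std::size_t size, std::align_val_t al) {
    ++g_aligned_calls;
    auto align = static_cast<std::size_t>(al);
    // std::aligned_alloc expects size to be a multiple of the alignment.
    std::size_t rounded = (size + align - 1) / align * align;
    if (void* p = std::aligned_alloc(align, rounded ? rounded : align)) return p;
    throw std::bad_alloc();
}

// Both kinds of block come from the C allocator, so std::free releases either.
void operator delete(void* p) noexcept { std::free(p); }
void operator delete(void* p, std::size_t) noexcept { std::free(p); }
void operator delete(void* p, std::align_val_t) noexcept { std::free(p); }
void operator delete(void* p, std::size_t, std::align_val_t) noexcept { std::free(p); }

// 64 exceeds __STDCPP_DEFAULT_NEW_ALIGNMENT__ on common platforms.
struct alignas(64) CacheLine { char bytes[64]; };

void allocate_both() {
    int* i = new int(1);              // ordinary type: operator new(size_t)
    g_sink = i;
    delete i;
    CacheLine* c = new CacheLine{};   // over-aligned: operator new(size_t, align_val_t)
    g_sink = c;
    delete c;
}
```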

The reality is that the minimally correct solution is for all programs and libraries that overload the global allocation operators to override all of them. The better solution is for these programs and libraries to stop overloading the global allocators.
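In C++17 terms, "override all of them" means something like the list below. This is a sketch: the backend functions are hypothetical stand-ins (built on std::aligned_alloc/std::free) for whatever custom allocator a program actually uses, and it skips the std::new_handler retry loop that a conforming replacement operator new is supposed to run before throwing. The point is the shape: every allocation and deallocation form funnels into one backend, so no block is ever released by a different allocator than the one that produced it.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstdlib>
#include <new>

// Hypothetical backend standing in for a custom allocator.
static void* backend_alloc(std::size_t size, std::size_t align) noexcept {
    if (align < __STDCPP_DEFAULT_NEW_ALIGNMENT__) align = __STDCPP_DEFAULT_NEW_ALIGNMENT__;
    if (size > SIZE_MAX - align) return nullptr;                // avoid overflow when rounding
    std::size_t rounded = (size + align - 1) / align * align;  // aligned_alloc wants a multiple
    return std::aligned_alloc(align, rounded ? rounded : align);
}
static void backend_free(void* p) noexcept { std::free(p); }

// Simplification: no std::new_handler loop before throwing.
static void* alloc_or_throw(std::size_t size, std::size_t align) {
    if (void* p = backend_alloc(size, align)) return p;
    throw std::bad_alloc();
}
static std::size_t to_size(std::align_val_t al) { return static_cast<std::size_t>(al); }
constexpr std::size_t kDef = __STDCPP_DEFAULT_NEW_ALIGNMENT__;

// Allocation: plain, array, nothrow, aligned, aligned nothrow.
void* operator new(std::size_t n) { return alloc_or_throw(n, kDef); }
void* operator new[](std::size_t n) { return alloc_or_throw(n, kDef); }
void* operator new(std::size_t n, const std::nothrow_t&) noexcept { return backend_alloc(n, kDef); }
void* operator new[](std::size_t n, const std::nothrow_t&) noexcept { return backend_alloc(n, kDef); }
void* operator new(std::size_t n, std::align_val_t al) { return alloc_or_throw(n, to_size(al)); }
void* operator new[](std::size_t n, std::align_val_t al) { return alloc_or_throw(n, to_size(al)); }
void* operator new(std::size_t n, std::align_val_t al, const std::nothrow_t&) noexcept { return backend_alloc(n, to_size(al)); }
void* operator new[](std::size_t n, std::align_val_t al, const std::nothrow_t&) noexcept { return backend_alloc(n, to_size(al)); }

// Deallocation: every matching form returns to the same backend.
void operator delete(void* p) noexcept { backend_free(p); }
void operator delete[](void* p) noexcept { backend_free(p); }
void operator delete(void* p, std::size_t) noexcept { backend_free(p); }
void operator delete[](void* p, std::size_t) noexcept { backend_free(p); }
void operator delete(void* p, const std::nothrow_t&) noexcept { backend_free(p); }
void operator delete[](void* p, const std::nothrow_t&) noexcept { backend_free(p); }
void operator delete(void* p, std::align_val_t) noexcept { backend_free(p); }
void operator delete[](void* p, std::align_val_t) noexcept { backend_free(p); }
void operator delete(void* p, std::size_t, std::align_val_t) noexcept { backend_free(p); }
void operator delete[](void* p, std::size_t, std::align_val_t) noexcept { backend_free(p); }
void operator delete(void* p, std::align_val_t, const std::nothrow_t&) noexcept { backend_free(p); }
void operator delete[](void* p, std::align_val_t, const std::nothrow_t&) noexcept { backend_free(p); }
```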

Many years ago (more than a decade at this point), when WebKit first adopted a non-default allocator, it overloaded the global operators. Perhaps unsurprisingly this caused issues, and now WebKit (and presumably Blink) does the correct thing: there's a standard base class (FastAllocated or something) that defines operator new, delete, and the [] variants, and using it as a base class results in the non-default allocator.
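The base-class approach might look roughly like this. `FastAllocatedSketch` is a hypothetical name, not WebKit's actual class, and malloc plus a counter stands in for the faster allocator. Class-specific allocation functions only affect types that derive from the base, so the global operators, and everything the standard library does with them, stay untouched.

```cpp
#include <cstddef>
#include <cstdlib>
#include <new>

// Stand-in for a faster allocator: counts allocations and uses malloc.
std::size_t g_fast_allocations = 0;

struct FastAllocatedSketch {
    static void* operator new(std::size_t size) {
        ++g_fast_allocations;
        if (void* p = std::malloc(size ? size : 1)) return p;
        throw std::bad_alloc();
    }
    // Class-scope lookup finds the member operator new above.
    static void* operator new[](std::size_t size) { return operator new(size); }
    static void operator delete(void* p) noexcept { std::free(p); }
    static void operator delete[](void* p) noexcept { std::free(p); }
};

// Opting in is just a matter of inheriting from the base.
struct Node : FastAllocatedSketch { int value = 0; };
```

One caveat with this pattern: if a derived type is over-aligned (e.g. `alignas(64)`), a new-expression looks for a class-scope operator new taking std::align_val_t and, failing that, falls back to the plain class-scope one, so the base would want aligned overloads too.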

nemetroid · on April 25, 2023

> Haha, you've missed the issue.

That's not very nice. The root comment said nothing about making the system handle this automatically; it just described an idea for a potential fix to be applied to this particular case:

> It should be straightforward to make a little LD_PRELOAD shim to implement the new operator new on top of old overloads and thus restore proper functioning.

olliej · on April 27, 2023

> That's not very nice.

:(

It was not intended as a dismissive or derisive laugh at the author, but a laugh at the absurdity of the issue itself. Think "haha, you'd think that the reason is X, but technology is involved, and so everything is terrible" vs "haha you're dumb", which sure as heck was not my intended message.

nneonneo · on April 24, 2023

The latter suggestion assumes that there’s enough entropy in the allocation process to make this work. But that’s not guaranteed! Suppose that your allocator doesn’t pad allocations (e.g. because it uses a bitmap), and that it only guarantees 0x10 alignment. If the top of the heap happens to be unaligned with respect to your desired alignment (e.g. address ends in 0x10 when you want 0x20 alignment), you might wind up just repeatedly allocating unaligned blocks off the top of the heap forever.
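A toy model of that failure mode, purely illustrative (the function and numbers are hypothetical): a bump allocator with no padding, a heap top at an address ending in 0x10, and retries that each request the same 0x20 bytes. Every block starts 0x10 past a 0x20 boundary, so retrying never helps.

```cpp
#include <cstdint>

// Toy bump allocator: blocks are carved off the top of the heap with no
// padding and never freed. Returns the 1-based attempt on which a block
// aligned to `align` appeared, or -1 if none did within `max_attempts`.
int attempts_until_aligned(std::uintptr_t heap_top, std::uintptr_t block_size,
                           std::uintptr_t align, int max_attempts) {
    for (int attempt = 1; attempt <= max_attempts; ++attempt) {
        std::uintptr_t block = heap_top;  // allocate at the current top
        heap_top += block_size;           // bump past it; nothing is ever reused
        if (block % align == 0) return attempt;
    }
    return -1;
}
```

With 0x10-byte requests instead, the second attempt would land on a 0x20 boundary, which is why the retry approach depends on luck rather than on anything the allocator guarantees.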

This is not an easy problem to solve, unfortunately. On macOS I believe they solve this problem using the two-level namespace: symbol references include the library name, so “operator new(size_t)” from libstdc++ is distinct from “operator new(size_t)” from libtcmalloc.

Symbol versioning also seems like it should solve the problem: have the new interfaces explicitly declared with a newer ABI version (e.g. @@LIBCXX_17) and link only to those new versions from code that expects them. Of course, symbol versioning comes with its own set of nasty drawbacks, but in this case it seems like a solution that might work?

olliej · on April 24, 2023

> The latter suggestion assumes that there’s enough entropy in the allocation process to make this work. But that’s not guaranteed!

Oh absolutely, there's no guarantee it's ever aligned: the allocator could wrap an aligned allocator but include a pointer-sized prefix (à la array allocations), so you would be _guaranteed_ to never be more than pointer-size aligned :D
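A sketch of that prefix case, with hypothetical names and std::aligned_alloc standing in for the underlying aligned allocator: the base block is 64-byte aligned, but the caller receives the address just past a pointer-sized header, so every returned pointer is exactly pointer-size aligned and no amount of retrying produces one aligned to twice the pointer size.

```cpp
#include <cstddef>
#include <cstdlib>
#include <cstring>

// Hypothetical allocator: aligned base block plus a pointer-sized header.
constexpr std::size_t kBaseAlign = 64;
constexpr std::size_t kHeader = sizeof(void*);
static_assert(sizeof(std::size_t) <= kHeader, "size must fit in the header");

void* prefixed_alloc(std::size_t size) {
    // Round the total up to a multiple of kBaseAlign, as aligned_alloc expects.
    std::size_t total = (kHeader + size + kBaseAlign - 1) / kBaseAlign * kBaseAlign;
    auto* base = static_cast<unsigned char*>(std::aligned_alloc(kBaseAlign, total));
    if (!base) return nullptr;
    std::memcpy(base, &size, sizeof size);  // bookkeeping lives in the header
    return base + kHeader;                  // caller sees base + pointer size
}

void prefixed_free(void* p) {
    if (p) std::free(static_cast<unsigned char*>(p) - kHeader);
}
```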

As you say, versioning and namespacing are super problematic, but I'm not sure they'd even work here.

At its core the problem is that some code is compiled with the knowledge that it has aligned allocations, so it can assume alignment, and some parts are not. There are a bunch of options that ensure the allocator is consistent, but they devolve to either ignoring the new+delete overrides, or having the aligned allocators detect the override and forward to the unaligned allocators while hoping nothing depended on correct alignment.

amluto · on April 24, 2023

See my comment above. tcmalloc implements the C API as well, including aligned_alloc().

olliej · on April 25, 2023

It doesn't matter what C APIs the allocator you're using provides: if it (or you) wants to override the global new and delete operators, you need to override all of them. The system implementation can't just assume that the overriding implementation happens to override and/or be compatible with C's implementation.

Libcxx (the example here) uses posix_memalign for its aligned allocation, which tcmalloc could _also_ have overridden but doesn't. Again, the problem is only some of the allocation routines being overridden.