(I excluded the po files, as that gives one user too much credit, since he just checked in the translations, but didn't write them himself.)
This is a rather unusual contribution graph. Most I've seen follow a very stark power law, with one contributor doing over 90% of the work, and the second contributor doing around 5%. This graph shows a lot more collaboration from many people. The development of gcc seems healthier than ever!
I believe "ian" on that list is Ian Lance Taylor. He's on the Go team and works on gccgo so I think most of his contributions [0] are for the Go compiler.
I'm personally cheering for David Malcolm (dmalcolm) and his work on gccjit. He started testing it with GNU Octave's JIT, so that we could replace LLVM. We haven't done that yet, but maybe this year we'll be able to!
>This is a rather unusual contribution graph. Most I've seen follow a very stark power law, with one contributor doing over 90% of the work, and the second contributor doing around 5%. This graph shows a lot more collaboration from many people. The development of gcc seems healthier than ever!
Is it a sign of health (diverse community) or a sign of illness (no active technical lead, outsiders driving development in fragmented directions)? I don't think you can tell just by looking at the graph.
Jakub has the second-most contributions and also wrote the release announcement, so he seems to be the current de-facto leader. The graph suggests that he does not lead alone, though.
edit: Upon closer inspection, it appears that Arnaud Charlet is currently the top one because he's committed a lot of Ada code for other people from AdaCore. It would take some teasing out of the GNU changelogs instead of the svn log to figure out who really wrote what. So, looks like Jakub really is the leader.
Honest question: on OS X, Apple has embraced clang and no longer ships gcc. When I use Linux, clang is also available.
Can anyone chime in on how a developer would choose gcc over clang (or vice versa)? I have to admit that even though I write in C++, I seldom pay attention to which compiler I would use for one feature versus another. Compilers feel like a black box to me; I don't go inside them.
Edit: I don't know about Windows; I know gcc is available, but I haven't checked whether clang is.
Edit 2: Intel also makes compilers, to which I own, but never use. I guess I feel like I am not doing anything specific to Intel
Right now, clang is missing several features that I care about, but most of the industry doesn't[1]. Clang is a 90% or maybe 95% solution, and my use cases aren't covered. So I don't use it. That said, I am grateful for the competition: GCC has definitely been getting better thanks to clang.
In my particular case, I'm programming in a microcontroller environment where there is ordinary RAM at the NULL address. Dereferencing NULL therefore doesn't halt the program at all.
Your compiler, or rather your architecture definition supplied to the compiler, is broken. NULL should be an invalid pointer value — it need not actually be memory address 0. (The explicit constant 0 in a pointer context should be automatically translated by the compiler into this special invalid pointer value.) There have been several such architectures already, so this should not present an insurmountable problem.
Since foo is dereferenced unconditionally, the compiler is allowed to assume that it is not null, and therefore a later test to determine whether foo is null is useless and can be omitted.
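A minimal sketch of that pattern (hypothetical struct and function names; the dereference stands in for any unconditional use of the pointer):

```c
#include <stddef.h>

struct widget { int bar; };

/* The dereference on the first line lets the optimizer assume w is not
 * null, so at -O2 the later null check may be deleted as dead code. */
int describe(struct widget *w) {
    int bar = w->bar;    /* unconditional dereference */
    if (w == NULL)       /* the compiler may remove this test entirely */
        return -1;
    return bar;
}
```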
The real case where it matters is where the "if (foo) { ... }" test happens in a function that's been inlined into the caller, but not all the callers do the foo->bar() dereference. So it's only being removed in some cases.
Try "-fno-strict-overflow -fno-strict-aliasing -fno-delete-null-pointer-checks" and then see if your program gets slower.
That won't nearly get everything, but loop optimization in C relies on undefined behavior a lot of the time, so you'll get to see how much you need it.
That was the case circa 2012 with Clang 3.1 and GCC 4.7, but both have closed the gaps. Clang generates code roughly as fast as GCC and GCC compiles about as fast as Clang.
gcc generally provides somewhat better code than clang, both in runtime and in object size. Clang has traditionally done better at warnings and diagnostics, although gcc has definitely closed that space in the past few years (since 4.9 or so).
Are there circumstances where a developer should compile their products with more than one compiler to flush out issues, potentially fine-tune/optimize code, etc.? I've never thought about doing this.
Yes, at least make sure that your code compiles and the tests pass on more than one. That way you're sure that you're not depending on the peculiarities of one compiler (its specific behaviors that are undefined in the standard, optimizer actions, etc.).
At a previous job we ensured we got clean compiles with Visual C++, SPARCworks and xlc, even tho' all our customers were only actually interested in the Solaris version.
When I was working on multi-platform console engines, I had no choice but to compile with several highly varying compilers. It teaches you very quickly how quietly compiler-specific extensions and quirks work their way into your code.
Also, different dev environments offer different tools. A nice benefit of running in lots of environments was that I often could pick between the best tools of any platform whenever I had a problem they could help solve.
Yes; whenever they want to produce a higher quality product. Which I'd anticipate being "whenever possible", limited only by compiler extensions one has to use, and dev/target hardware.
I've worked at places where we maintained a dual build, VisualC++ and Linux gcc, because we wanted to be able to develop and run in multiple environments. That forces you to write clean standard-compliant code, although in some cases it could be a pain.
The one thing I hear mentioned is that GCC supports more processors than Clang does. I believe this is the reason that NetBSD and OpenBSD still use GCC.
I understand that. Still, what was meant? C/C++ don't have mechanisms for automatic parallelization, or if there are obscure extensions they are not commonly used. Compilation is usually a single-processor task, with linear speedup obtained by doing multiple compilation units in parallel (make -jN). So in what way can a compiler be "better" for multi-processor / multi-core CPU architectures?
I believe that in this context, "more processors" meant "more different models of processors". That is, there exist processors for which gcc can produce binaries and clang cannot produce binaries.
Well, support for more target architectures, really. I can easily imagine cases where a compiler could have multiple backends for the same target arch, so the number of backends could be misleading in that regard.
Different set of bugs (I know of more annoying bugs in clang and libc++ than I do of their GNU counterparts, perhaps only because I know gcc better), different performance in the generated code. Try them both and benchmark, see which you like best. I believe you can get gcc from homebrew, but beware of linking gcc-built code with libstdc++ and not libc++.
1. License. If you are worried about GPL (especially v3), then you are better off with clang.
2. GCC is generally considered to produce better binaries.
3. Smart projects will support both and test with both. Before, even though projects tried to be portable, GCC was the only compiler most developers had available. Now they have two (although clang does try to support almost everything GCC does, so it doesn't say a huge amount about portability).
4. If you want to quickly see whether an optimization was applied (like inlining or loop unrolling), looking at LLVM is more approachable, so clang is nice for that.
gcc also supports OpenMP 4.x and OpenACC, both of which are used extensively in the HPC world to support automated parallelization and offloading to accelerators like GPUs. clang only recently added support for OpenMP 3.1, and my understanding is that there are no plans for OpenACC support.
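For a flavour of the directive-based style OpenMP enables, here is a hedged sketch (compile with gcc -fopenmp; without OpenMP the pragma is ignored and the loop simply runs serially):

```c
/* Sum of squares 0..n-1. With -fopenmp, GCC splits the loop across
 * threads and combines the per-thread partial sums via the
 * reduction clause; the source stays a plain sequential loop. */
long sum_squares(int n) {
    long total = 0;
    #pragma omp parallel for reduction(+:total)
    for (int i = 0; i < n; i++)
        total += (long)i * i;
    return total;
}
```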
> Edit 2: Intel also makes compilers, to which I own, but never use.
I see this more and more often on Reddit, from posters I assume are millennials, and I am interested in the psychology of this error. Why do you put a "to" before "which"? You would not say "I own to an Intel compiler".
I never see people make this mistake with other pronouns, either: for instance, they will correctly turn "I saw pg yesterday" into "pg, whom I saw yesterday". What is it about "which" specifically that makes it hard for kids to use?
Well I see your edit mentions Windows and that was the problem when I last tried clang. It failed to compile FFmpeg. I don't know whether that is because it doesn't work, or didn't work, or my package was just outdated. GCC works. And once you force gcc to have colored messages there seems to be little advantage to using clang over it.
Out of inertia, I use gcc on all three platforms and it has been good to me. Sometimes I need to compile a specific library on OS X with clang, but things work just fine with gcc-built ones as well (using the ar provided with the Xcode CLI tools). Mind you, I use C, not C++, and I tend to grab a GCC snapshot now and then, compile it, and use it like that.
It also depends a little bit on your style of programming. For example, in C code, I personally like using gcc extensions like nested functions. As far as I know this is not supported by clang (but perhaps someone more knowledgeable can chime in).
But note that in C++03 you cannot use such a class as a template parameter, as it has no linkage (C++11 lifted this restriction). That made it mostly useless, since you usually want to use such local types with templates.
> This release features various improvements in the emitted diagnostics, including improved locations, location ranges, suggestions for misspelled identifiers, option names, etc., fix-it hints, and a couple of new warnings.
Every time I hear about new GCC releases all I can think of is how the compiler wars have clearly inspired lots of new feature development. From the high level it seems like GCC is still catching up with clang. Are there cases where GCC has leapt ahead of clang in some features?
I always find it amusing that in any 'clang compiles faster, gcc provides better binaries' discussion there are always immediately people chiming in that this is based on old data from gcc 4.9. But clang supposedly hasn't improved its binaries in all that time?
Unless you target a feature or cpu architecture that only one of them supports, it is close to a coin toss nowadays which compiler builds the faster binary.
>The C++ frontend now defaults to the C++14 standard instead of C++98, which it had been defaulting to previously
Honest question, in production, is this sort of a thing pretty much a non-issue?
Edit: Sorry, to be clearer, I meant to ask about the effect of things like changing defaults between versions. Perhaps it is a non-issue because most production code is (maybe) version-locked.
Also, a little vague, but out of curiosity, how often do C / C++ shops upgrade to the latest and greatest?
> Also, a little vague, but out of curiosity, how often do C / C++ shops upgrade to the latest and greatest?
Very, very slowly, especially in the enterprise and embedded areas.
Back when I used to work in C++, the compilers at my employers were version locked by IT admins, not developers. To make sure all company development projects were in sync.
Getting a new compiler version was akin to having a new OS version deployed across the company, in terms of overall process.
I don't understand the question. Is there a difference between C++14 and C++98? Yes. Are there important features in C++14 that aren't in C++98? Yes. Is there existing code that builds fine under C++98 and will break when compiled under C++14? Almost certainly yes as well.
Yes, C++14 and C++98 are night-and-day different: quite a few important language features (including lambdas), and plenty of new library additions (including threading).
There aren't any (AFAIK) explicit breaks of '98 code in '14. So the theory is that code should compile with the new compiler fine, but you'll likely hit some issues.
Check out Scott Meyers' book Effective Modern C++ for the deltas.
There are definitely backwards-incompatible changes in the things that were added since C++98. For one, there's a whole set of new keywords that's going to break any code that used them as identifiers.
Then there's a keyword that was re-purposed to do something totally different: in C++, auto used to be a storage duration specifier. In C++11 it is used to indicate type inference.
There were also changes that, while not breaking source compatibility, basically required implementations to break binary compatibility going from C++98 to C++11. This in particular resulted in a gigantic mess.
With regard to breaking changes the C++ committee is not anywhere near as conservative as the one for C.
There is one break that I know of. `>>` is now parsed as two tokens, and will preferentially mean "close two templates" instead of "right bit shift" in cases of ambiguity.
I work in embedded software, and I've sometimes felt mildly reactionary for suggesting that yeah, we can use C99. It's not "new". Really! We have "bool" now, for when things are, you know, Boolean! And we can declare variables where we need them, not in a huge prelude in the function ...
More useful than bool and mixed-declarations-and-code, in my opinion, are: designated initialisers; compound literals; inline functions; and varargs macros.
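A small sketch of those C99 conveniences together (hypothetical struct and macro names, just for illustration):

```c
#include <stdbool.h>
#include <stdio.h>

struct point { int x, y; };

/* varargs macro (C99) */
#define LOG(fmt, ...) fprintf(stderr, fmt "\n", __VA_ARGS__)

/* inline function (C99) */
static inline int dist2(struct point p) { return p.x * p.x + p.y * p.y; }

int demo(void) {
    struct point p = { .x = 3, .y = 4 };   /* designated initialiser */
    bool ok = (dist2(p) == 25);            /* bool, declared mid-block */
    LOG("dist2 = %d", dist2(p));
    /* compound literal: an unnamed struct value built in place; .y is 0 */
    return ok ? dist2((struct point){ .x = 1 }) : -1;
}
```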
"Non-issue" is probably going a little far, but if you're cutting edge enough in production to be deploying the latest gcc release, you probably have been running with the new frontend already, and ironed out the few deliberate incompatibilities and deprecations in C++11/14 long ago.
Depends on what kind of production you mean. In your own projects you should set the C++ standard yourself anyway, so this is a non-issue. If you are a distribution maintainer, it probably means you have to patch some makefiles that assume gcc always compiles with C++98.
This will be fun (not) for distributions. Clang and G++ also seem to be slower when a newer C++ standard is enabled. I don't really buy the argument that changing the default will help new users who want to use C++11 features. Why not make C++14 the default then?
I've recently had to enable C++11 in a project because a header I use added C++11 features, and it appears that going from C++98 to C++11 makes Clang and G++ considerably slower. Can anybody else confirm this?
Did you test the --std change in isolation, without your new headers? Some libraries' headers are ridiculously expensive when they make heavy use of templates. My favorite in that regard is Boost::Qi.
This may explain it; I hadn't thought of that. The header in question used C++11 atomics and therefore included a new header, which of course, due to C++'s non-existent module system, means my code pulls it in as well. I know of the plans to add a module system to C++, but (1) I'm not sure it would solve this issue without requiring each header to have a corresponding module, and (2) I am skeptical of the availability and support in the field. It is much, much easier to just use Rust or OCaml, honestly, and the combination of Rust+OCaml or Rust+Haskell might be the sweet spot.
I'm still confused after reading this. Does this mean that 6.0, 6.1.1, 6.2.1 etc. do not actually name specific code bases that can be downloaded? Or is 6.0 the name of a specific commit in the code?
Between GCC 4.9 and GCC 5, they made basically the same change Linux made between 2.6.39 and 3.0: they dropped one of the components in the version number. The amount of change between GCC 5, 6, 7, etc. is equivalent to the amount of change between 4.7, 4.8, 4.9, etc.
They're also using the version 0 in the second component to indicate prereleases. GCC 6.0.0 was the development version of GCC 6. The release candidate was 6.0.1. The first release of GCC 6, which came out today, is 6.1.0. Then there will be development under the name 6.1.1, and the first bugfix release on the GCC 6 line will be 6.2.0. The amount of change between 6.1.0 and 6.2.0 is equivalent to the amount of change between 4.9.0 and 4.9.1 in the old numbering scheme.
Thanks for the explanation. I realize this is bikeshedding, but is there a rationale behind using 6.0.1 as a release candidate? I get making major versions less rare, but I'm not sure I get the rest of it.
Ah, thanks for the link. I was wondering because Arch Linux is still on GCC 5.3, yet the release mail says that "the last release" (which I considered to be 6.0) to be about a year old.
And again, my favorite accidental usages of undefined behaviour in C++ are optimised into BS.
(1) if (this == NULL): this is super useful in non-virtual functions, because NULL is the object that subclasses all classes, so the only universal error object.
(2) memset in operator new: in C++03 we did not have initialisers inside struct {}, so we'd quickly zero the object to avoid error-prone manual initialisation of all the embedded ints.
This is really becoming ridiculous. Having a notion of how C/C++ corresponds to what the machine will do seems to be more and more regarded as evil.
I think this type of thinking is cynical and unhelpful. We complain that compiler technology is not keeping pace, then when they put in god knows how many hundreds of hours of work improving, we complain when we might have to spend a few to go back through our programs and make sure everything's okay. We want everyone else to do the work for us.
Knee-jerk rants aside, there is of course the underlying point that C and C++ have such wide scope for undefined and implementation-defined behavior that no mortal programmer can write a non-trivial C/C++ program that follows the standard to the letter.
Whether it's possible to salvage C/C++ by specifying some safe subset (for some ideas towards this see e.g. Regehr's "safe C" or DJB's "boring C") without sacrificing too much performance remains to be seen. Another option, then, would be to start over from a clean slate, e.g. Rust.
While most likely having less of an impact on the world at large, at least personally I find teaching myself Rust more fulfilling than the prospect of fighting political battles in the ISO committee. YMMV, of course.
tl;dr: The above has everything to do with the specification of the C and C++ languages rather than compiler writers' attempts to exploit the spec to its fullest potential. So yeah, as long as "compiler goodness" is (marketing-wise, at least) determined by SPECcpu scores, it's unfair to blame compiler writers for this mess.
If it's undefined behavior according to the standard then it's following the standard.
You know of course there is a trade-off between the possibility of optimization and strict compliance to a standard with zero or very little undefined behavior.
It's wrong to say C is "lost" because of these characteristics. It might not be a good fit for what you want to do, then use whatever you think is best. Just as Rust might be better than C for some cases, the reverse is true for others.
This is absolutely not true. I work with many C and C++ programmers, and we ship very large amounts of C and C++ code with a very high adherence to the standards.
I would assert that virtually every non-trivial C/C++ program (>100KLOC) contains undefined behavior.
Here's a list of undefined behaviors that's almost impossible to get rid of:
* Data races. This is particularly fun for anyone who started doing multithreaded code pre-C11/C++11, since volatile does not make sharing data across multiple threads safe without locking.
* Signed integer overflow. Are you sure that there is absolutely no input to your program that would cause one of your thousands of signed arithmetic operations to overflow?
* Buffer overflow. This is something like 90% of all security vulnerabilities.
* Uninitialized variables. Note that -Wuninitialized doesn't catch all cases, although this is relatively easy to mitigate with a paranoid style guide.
* Strict aliasing rules. Better yet, if you have any sort of custom memory allocation scheme, you're pretty much guaranteed to break this, since the only way you can access an object with a dynamic type of bytes (signed/unsigned char) is via signed/unsigned char. Functions like memcpy or malloc cannot legally be written in C without breaking this behavior. Also, there is not (to my knowledge) any dynamic checker for violations of this property, unlike the other things in this list.
Your program probably has undefined behavior. You just don't know it yet, and your compiler hasn't yet figured out how to squeeze out a 0.5% speedup from screwing you over because of it.
> I would assert that virtually every non-trivial C/C++ program
First, citation needed.
Second, the programs I work on would fall outside of your "virtually every".
> [big list of things]
This list doesn't prove anything. There's a similar list in the standards themselves. That's how we know what to avoid.
Oh, and we have an in-house static analysis program that catches all of those. And more.
Our code simply has no undefined behavior. It passes our static analysis program, it passes UBSAN, and no compiler has ever miscompiled our code (unless it was due to a compiler bug).
You can try all you like, but there's no way for you to convince me that our code has undefined behavior. And there are plenty of projects out there similar to ours.
> if you have any sort of custom memory allocation scheme, you're pretty much guaranteed to break this
Not true. You should learn about the aliasing rules before you speak authoritatively about them.
> Functions like memcpy or malloc cannot legally be written in C without breaking this behavior
Also not true.
> Your program probably has undefined behavior.
The chances of that are much less than the chances of you not knowing what you are talking about.
In that case (SQLite), code that passed UBSan, ASan, and Valgrind, and compiled correctly on all current compilers, was studied. A new dynamic undefined behavior checker found additional UB defects at a rate of over 1 per thousand lines of code.
I would believe your codebase could have a defect rate one, maybe even two orders of magnitude lower. But short of a formal code-correctness proof, better than that seems unlikely.
It is nothing short of sheer hubris to believe that you have avoided all undefined behavior, particularly given that there exist undefined behaviors that have no extant static or dynamic checkers (hi, strict aliasing).
I do know the strict aliasing rules quite well. As I said in a cousin post, the set of permissible accesses to an lvalue is governed by the dynamic type of the object. As a consequence, strict aliasing queries are not symmetric: an access through type P to an object of type Q may be allowed while the reverse is not. The case where this comes up is with signed/unsigned char. If you have a char foo[]; as the dynamic object, it is positively illegal to access it with anything other than unsigned or signed char. This is what really screws up a lot of code.
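A sketch of that asymmetry, plus the well-defined escape hatch (a hedged illustration; the bit-pattern comment assumes IEEE-754 floats):

```c
#include <stdint.h>
#include <string.h>

/* Reading any object's bytes through unsigned char* is always allowed: */
unsigned char first_byte(const float *f) {
    return ((const unsigned char *)f)[0];
}

/* But the reverse is not: a buffer whose dynamic type is unsigned char
 * may not be read through a float*. The well-defined way to reinterpret
 * bytes is memcpy, which modern compilers optimise to a plain move: */
uint32_t float_bits(float f) {
    uint32_t u;
    memcpy(&u, &f, sizeof u);   /* copies the representation bytes */
    return u;
}
```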
> Functions like memcpy or malloc cannot legally be written in C without breaking this behavior
> Also not true.
You cannot implement malloc in C because malloc always returns a pointer that's not a part of an existing allocation. The malloc in libc has special dispensation from the compiler to do this (GCC "malloc" attribute).
Similarly you can't implement pthread mutexes in C because they imply optimization barriers (all global memory might change) that a C function with a visible implementation wouldn't have.
Strict aliasing is the only item on your list that I could see leading someone to get upset with the optimizer. And, maaaaaaybe signed integer overflow. With strict aliasing, I don't think your examples are correct. They all involve typecasting pointers, but don't involve interacting with the original type later. So, memcpy uses a void* as char* and MyAllocator uses char* as MyType*, but in both cases they never go back and expect predictable new values in the original types.
The rest are clearly taught in CS100 to be doorways to chaos. In particular, complaining that volatile doesn't get rid of the need for locks is just making noise. Might as well complain that auto doesn't get rid of the need for locks. They are both unrelated to threading.
And of course the code is wrong, because the behaviour is undefined - in 64-bit architectures it is very common to use 64-bit registers for `int` even if in memory `sizeof(int) == 4`; thus `int a = INT_MAX;`, `a + 100` could still be greater than `INT_MAX`.
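The folding in question, sketched below (what a given compiler actually does depends on the optimisation level; with -O2, GCC may reduce the comparison to a constant):

```c
/* Because signed overflow is undefined, the compiler may assume
 * a + 100 > a for every int a, and fold this function to "return 1" --
 * even for inputs near INT_MAX, where two's-complement wraparound
 * would make the comparison false at runtime. */
int no_overflow_check(int a) {
    return a + 100 > a;
}
```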
> Data race leads to subtle bugs on all languages and runtimes, including Java and C#. They are just not called undefined behavior in those languages.
That's because they aren't. You might get incorrect results or exceptions or whatnot, but NOT undefined behavior, a.k.a. nasal demons. What can happen in cases of data races is always constrained by the VM model. (Obviously, this is modulo bugs in the actual VM implementation, but that probably goes without saying.)
In practice, undefined behavior is unpredictable because the compiler optimizes under the assumption that you have no undefined behavior. When you do, the optimization won't preserve the "as-if" rule.
The same applies to Java/C# when a data race is present. The JIT must be generating optimized code assuming that no data race occurs, because it is impossible to detect or correct one (at least in current implementations). When you do have a data race, the bugs will be as subtle and Schrödinger-like as when a data race occurs in a C program.
Data races in Java/C# will result in incorrect values (which may be serious if your program is doing anything important, and subtle to find, as you say), but they will not ever result in "undefined behavior" in the way ISO C defines the term. Specifically, a data race in Java/C# will not cause your program to do out-of-bounds memory accesses or use-after-free, or to corrupt unrelated objects, because other parts of the respective VM specifications prevent such outcomes.
There is a very important bit of strict aliasing that your statement misses. Yes, char* and unsigned char* can be used to access any lvalue. However, only signed/unsigned char* can be used to access a signed/unsigned char array. (Strict aliasing is defined via the dynamic type of an object, not via pairs of types that must not alias; symmetry is not inherent, and in the case of char* it is indeed not present.)
When you write a custom memory allocator, you request memory with `malloc` or `mmap` first. The return values of those functions are raw storage, not objects with a dynamic type of character array. Neither is the return value of your custom `allocate`. The strict aliasing rule does not apply.
Can you name an open-source C or C++ project that you believe avoids undefined behavior? Because some people have spent serious time looking without success. E.g. http://blog.regehr.org/archives/1292
I don't think many open source projects do, no. Most of them aren't extremely quality-focused, and don't spend the money to develop or purchase some of the static analysis and formal verification tools that help a code base reach the level of quality I'm referring to.
But if I tried hard enough, I'm sure I could find a few that have been run through some sort of formal verification tool.
Whether I could or not doesn't change my assertion at all though.
I cannot see the language being at fault in cases such as the ones pointed out by the article:
> SQLite’s vdbe struct has a member called aMem that uses 1-based array indexing. To avoid wasting an element, this array is initialized like this: p->aMem = allocSpace(...); p->aMem--;
In its deep history, the C language was defined by what the implementations did. Only later was it standardized, with an attempt to permit existing implementations be viewed as conforming or easily updated to be. Even decades later, though, the culture of developers tends to the view that something which has always worked in every important implementation should be allowed and continue to work in new implementations. This is such an example. Only a new generation of compiler writers are challenging this, trying to shift to a language-lawyer interpretation of the standards.
> the culture of developers tends to the view that something which has
> always worked in every important implementation should be allowed and
> continue to work in new implementations
Yes, and X3J11 stated as its first guiding principle,
> Existing code is important, existing implementations are not.
But then they committed the original sin against “simple C” by inventing ‘volatile’, breaking systems code written when p[0]=x was expected to write to location p. Unlike ‘const’, with which the programmer grants additional license to the compiler, C89 granted the non-‘volatile’ license to the compiler by default. In retrospect, I think C would have been better off retaining do-what-I-wrote as the default, and requiring the programmer to grant the compiler license to do otherwise. C already had the ‘register’ keyword to indicate that a variable should be compiled for speed and need not be literally preserved according to a naïve reading of the code.
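For reference, the canonical legitimate use of 'volatile' today is a flag written asynchronously, which the compiler must re-read on every access rather than cache in a register. A hedged sketch using a signal handler:

```c
#include <signal.h>

/* 'volatile sig_atomic_t' is the one flag type the C standard
 * guarantees is safe to set from a signal handler. */
static volatile sig_atomic_t stop = 0;

static void on_sigint(int sig) {
    (void)sig;
    stop = 1;
}

int run(void) {
    signal(SIGINT, on_sigint);
    raise(SIGINT);   /* simulate the asynchronous event */
    return stop;     /* without volatile, this read could be stale */
}
```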
"Very high adherence" means we pass UBSAN, we pass our in-house static analysis tool, we've never been miscompiled by any compiler (modern, old, bleeding-edge, on multiple platforms) unless it was due to a compiler bug, ...
By all practical definitions, and by any measurable way to declare a codebase free of undefined behavior, ours is.
And I bet others out there are too. There are formal verifiers for C programs that are used in industry. Unless there's a bug in the verifier, the program it verifies is, by definition, free of undefined behavior.
Just because you, or the open source projects you use, don't code to the standards, or don't have or use the tools that help you do so, doesn't mean there aren't industries out there who can, and do.
> We complain that compiler technology is not keeping pace
Who was complaining about that?
> when they put in god knows how many hundreds of hours of work improving, we complain when we might have to spend a few to go back through our programs and make sure everything's okay. We want everyone else to do the work for us.
Their whole job is to compile our programs - what else is a compiler good for? I don't know who GCC's users are these days - people who care more about benchmarks than working code? Because that seems to be who they're optimising for. Then again I suppose that's the entirety of C/C++'s market these days.
> Their whole job is to compile our programs - what else is a compiler good for? I don't know who GCC's users are these days - people who care more about benchmarks than working code? Because that seems to be who they're optimising for.
If all you want the compiler to do is make a working program, just disable all optimisations.
Are you saying you have an example of valid C++ code which fails to compile, and that this problem is getting worse, or just that you get unexpected behaviour when you write bad code?
"Valid" in that the standard absolutely requires that compilers give them some particular meaning, no matter how rules-lawyery the compiler is? No. "Valid" in the sense that the intent is perfectly obvious to any reader? Yes, plenty. "bad"? No, unless you use the standard as your definition of good/bad.
So compilers need to judge what is "perfectly obvious to any reader", now? And you're willing to deal with the fallout of different compilers having different conceptions of "perfectly obvious"?
That's... an ambitious project.
Maybe your time would be better spent convincing the C++ committee to define some more behavior instead of leaving it undefined.
The C++ committee exists to standardise the behaviour that major compilers implement. Trying to get them to define behaviour where there isn't already at least broad support would be putting the cart before the horse.
"Value range propagation now assumes that the this pointer of C++ member functions is non-null. This eliminates common null pointer checks but also breaks some non-conforming code-bases (such as Qt-5, Chromium, KDevelop)."
Or the D standard, or the Java standard, or the Fortran standard, or the Golang standard...
It is the GNU compiler collection, after all. I'm actually particularly excited about a new release of gdc myself. I've been reading Alexandrescu's excellent The D Programming Language, and I think it may be just the better C++ that I always wanted, ahead of Golang and Rust.
At least with the C standard, it's theoretically possible to learn it. I'd challenge you to find one person in your company who fully understands the C++ standard.
By the definition in the C Standard, undefined behavior is: "behavior, upon use of a nonportable or erroneous program construct or of erroneous data, for which this International Standard imposes no requirements".
If you're using some other definition, then say so, otherwise your comments make no sense.
True dat. C is a smaller language, so it is possible to know the whole language, but not the whole spectrum of UB. C++ is a vastly larger language; it is nearly impossible to know the whole language, but much more likely that you can understand the spectrum of UB.
Actually, some:
"Type-based alias analysis now disambiguates accesses to different pointers. This improves precision of the alias oracle by about 20-30% on higher-level C++ programs. Programs doing invalid type punning of pointer types may now need -fno-strict-aliasing to work correctly."
We do seem to have got into a race to make C and C++ compilers less and less useful while still following the letter of the rules (but certainly NOT the spirit).
I think modern C and C++ compilers are very useful. Then again I know that these languages are full of huge pitfalls, so I know what I'm getting into, and when I make a mistake I blame myself, not the compilers.
I also use the features of modern C/C++ compilers such as warnings (-Weverything on clang, then turn off those warnings I'm not interested in) and sanitizers (Undefined Behavior Sanitizer, Address Sanitizer).
I was under the impression that the whole point of undefined behavior in the spec was to give implementers flexibility. Am I wrong? If not, then it sounds like it's very much in the spirit of the rules.
That's implementation-defined behaviour (do something nonstandard and document it) or unspecified behaviour (do something specific and nonstandard but don't have to document it).
Undefined behaviour allows the compiler writer to have the code do anything unreliable and unpredictable and not document it. Its purpose is to allow compiler writers to completely ignore the erroneous code that results in UB and not spend effort trying to diagnose it or implement it. Most compilers still make an effort to diagnose some UB anyway, though.
The original reason was to ensure the standard was compatible with existing compilers (i.e. to leave behaviour which varied between existing compilers unstandardized) and to allow hardware-native behaviour for things like integer overflow and extended shifts, not to allow aggressive optimizations.
http://paste.debian.net/plain/442020