So you think you know C? (2016) (wordsandbuttons.online)
378 points by goranmoomin on July 6, 2019 | hide | past | favorite | 322 comments


Having written a conforming C compiler, at one point I knew everything there was to know about C (I forget details now and then, or confuse them with C++ and D).

But knowing every engineering detail is not the same thing as knowing how to program in C effectively. It's like being the engineer who designs a Grand Prix car. It does not mean you can drive it faster around the track than anyone else. Not even close.

For example, the C preprocessor is surprisingly complicated. I had to scrap it and rewrite it completely 3 times. If you try to make use of all those oddities, my advice is don't waste your time. Over time I removed all the C preprocessor tricks from my own code and just wrote ordinary C in its place. Much better.


I couldn't agree more. After spending many years working with LLVM, which is at its heart a C compiler, and understanding why it has to do the sometimes-terrifying things it does to get C to run well, I've become very paranoid when writing C or C++. My C/C++ code is as boring as possible.

(In fact I try to avoid writing C or C++ whenever possible these days; undefined behavior in the language is too pernicious and unfixable without breaking compatibility. I think both languages are approaching obsolescence.)


> My C/C++ code is as boring as possible.

One advantage to being an older programmer is I don't feel any need to show off any more. I try to make it so obvious that anyone would look at it and think "that's so simple, anyone could do it."

It's surprisingly hard to write simple code. Any idjit can come up with Rube Goldberg code.


Though different people consider different things simple. To one a loop is fine, to another a map is a better choice.


> to another a map is a better choice.

Are you thinking more of JS or another functional scripting language than C/C++? C doesn’t have map, so it’s not an option, and in C++ it’s called something else.

In JS, I so wish that I could switch to functional constructs like map permanently, but map and foreach are much slower than loops, an order of magnitude or more for tight loops. I’m still forced to use loops in performance critical code, even if I consider map a better choice.

Coming back to C++ after having been in JS land for 5 years, C++ feels constantly difficult to use, and all the names for the functional primitives don’t seem to make intuitive sense like they do in JS.


> order of magnitude or more for tight loops

I just learned this about JS's map/foreach a few weeks ago. I didn't think it was possible to be more disappointed by JS than I already was, but somehow I managed it.


map/filter/reduce is simpler, but it takes some getting used to. Loops have worn a deep rut in my brain.


As someone who was introduced to programming via map/filter/reduce, for loops are incredibly more complicated.

I don’t think I’ve ever written one correctly on first try.


I've used loops so much I don't even see the loop anymore as a collection of constructs, I see it as a single thing. There are a lot of easy mistakes to make with loops, but I don't make them anymore (of course, by writing that, I will make one!). For example:

    #include <stdbool.h>
    #include <stdio.h>
    typedef long T;
    bool find(T *array, size_t dim, T t) {
      int i;
      for (i = 0; i <= dim; i++);
      {
        int v = array[i];
        if (v == t)
          return true;
      }
    }
There are 5 errors in that example.
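For the curious, one possible corrected version (my own reconstruction, assuming the intent is a linear search over dim elements), with each fix marked:

```c
#include <stdbool.h>
#include <stddef.h>   /* 1: size_t was used without a header declaring it */

typedef long T;

bool find(T *array, size_t dim, T t) {
  size_t i;
  for (i = 0; i < dim; i++) { /* 2: "<= dim" overran the array;
                                 3: a stray ';' made the loop body empty */
    T v = array[i];           /* 4: "int v" could truncate a long */
    if (v == t)
      return true;
  }
  return false;               /* 5: falling off the end of a non-void
                                 function and using the result is UB */
}
```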


Is the fifth error that there are only four errors?


Are you primarily a JS/front-end dev?

Because this is crazy to me! Some languages only use for loops (Go).


Full-stack, spent most of my time on the backend in ruby or scala.


> I don’t think I’ve ever written one correctly on first try.

Please don't take offense, but this is odd to me. I honestly severely doubt I am some sort of programming super genius, but I have never had any issues setting looping logic correctly. (I took four years to teach myself programming & CS and now I've been at my first professional dev job for ~6 months.) None of my colleagues seem to have such issues either. What are you experiencing trouble with most? Off-by-one?


For those who may have trouble with loops, take heart: after 45 years of non-stop programming, often 10+ hours a day, I still find myself sometimes doing mental loop simulations with small (few element) data to make sure a loop is correct.

It's usually much easier to take extra time to make sure it's right than to debug it later.


I mean, I've written at most a couple dozen in my 7 years as a professional programmer (usually tight loops for perf), so sheer unfamiliarity is a big factor.

But yeah, syntax (what order the arguments go in) and off-by-one issues are the majority, I think. Plus figuring out what my initial accumulator needs to be.

Idk, map/filter and friends are just a much more direct mapping of how I think of programming.


Unfortunately, loops still seem to be the easiest way to iterate over two collections simultaneously.


“#zip”?

Or maybe I’m misunderstanding the use case?


Which only works if the sequences align perfectly. Handling misaligned collections is more awkward with functional constructs.

Functional graph programming is also still a bit of an open problem. There are awkward scenarios in both cases.


Ah, I see what you mean. Yeah, that’s always awkward

Not sure how you deal with that with for loops either. Increment the iteration var in the body of the loop? (Seems scary to me, but like I said, I’ve got terrible intuition with them)


For something like iterators:

    var ie1 = foo.GetEnumerator();
    var ie2 = bar.GetEnumerator();

    while (true)
    {
        var has1 = ie1.MoveNext();
        var has2 = ie2.MoveNext();
        if (!has1 && !has2)
            break;
        if (has1)
        {
            // do something with ie1.Current
        }
        if (has2)
        {
            // do something with ie2.Current
        }
    }


Same here.

C++ only when Java or .NET need its assistance, or integration with OS APIs that require C++ (NDK, WinUI, DX).

Then C only when there is no alternative (customer wants it, we only do C here, required lib is C only e.g. SDL, ...).

It is a herculean project, but maybe some day LLVM could be rewritten into something else. After all, it isn't the first compiler stack, just the one that became most famous.


What would we write drivers in then if C was obsoleted?


Apple says the future is C++ and Swift.

Microsoft says the future is a mix of constrained C++ (Core Guidelines), Rust and AOT C#.

Google says the future is C++ and Java (as of Treble) on Android, with Go, C++ and Rust on ChromeOS and Fuchsia.

ARM says C++ on mbed.

GenodeOS says C++ and Ada.

Newton, Symbian, Bada and BeOS used C++.

C is married to UNIX; they were born for each other. Other OSes have long followed other paths.


Minor point, but isn't Android moving to Kotlin? I don't think they see Java being a sustainable choice for the long term.


Yes they are, but in what concerns Treble and platform libraries, they plan to keep their Java 8 variant around.

Apparently they are also adding support for desugaring Java 10 language features (yep 10 not 12) that don't rely on new JVM bytecodes (as per Google IO talk about state of Android tooling).

At least for now.


Rust? I know a lot of people evangelise Rust — to the point of annoying others — but as we move into an era where the entire world runs on computers, it is just not acceptable to have decades-old infrastructure susceptible to bugs often caused by someone not understanding undefined or implementation-defined behaviour.


It's still a little early, but as someone who works at the bare metal level, Rust shows early promise.

The features required are still unstable (as in not in the stable version of the compiler), but it’s getting there, and doing it fast.

In any case, firmware behaves a lot like banking software; language change will take time.


Cargo feels too bloated to me to be suitable for something low-level like embedded. Unless the Rust team can make it more appealing to use the language without cargo, I don't think there's much future.


In bare metal systems, cargo would run on the developers host system, so that should not be an issue.

Also, most toolchains of the sector are way more complex to maintain and way more bloated.


We use rust in embedded and love cargo! Coming from C++ and the endless mess of build systems that exist there, cargo is a breath of fresh air! What don't you like about it?


The Oberon system has drivers in, well, Oberon (which is a high-level Pascal successor with garbage collection). Low-level memory access is done through magical peek/poke functions, in which all the dirtiness is concentrated (and these might not be allowed in user programs - not sure). This means the language as a whole is not littered with unsafe pointers just to service the tiny subset of programs that need them.


And it is used commercially.

Astrobe has been selling development kits for ARM based boards for years now.


pcwalton might have an opinion :). He is Rust's lead designer.


Not lead designer :)

But yes, Rust, or even in userspace, as newer and/or more microkernel-ish OS's allow for. Apple is doing work to allow drivers to be written in Swift...


I believe DriverKit is still C++.


Just some parts of it.

Some device classes (not to be confused with OOP classes) can only be programmed in C++, while others can be developed in any compiled language able to link to the OS APIs.

There is a WWDC session on it.


I watched it, and they're pretty clear that driver extensions (using DriverKit, like I mentioned) must be written in C or C++. System extensions can use any language.


I was thinking about the whole driver feature set, so got it wrong.


...and of course pre-Apple, DriverKit was Objective-C.


And pre-OS X, Mac OS was Object Pascal. :)


C has a standard with multiple competing compilers.

Rust does not have these same features to date.


With multiple slightly incompatible competing compilers.


D as BetterC!


Zig


And that's why people claiming that C++ is more complicated than C because it has an even bigger specification miss the point.

What counts is how easy it is to use in practice. You can get along just fine in C++ without knowing the exact aliasing rules from C or how to specialize a template. What matters is that the extra features of C++ make actual programming simpler, not harder (for example destructors (RAII), the standard library, classes, ...).


> You can get along just fine in C++ without knowing the exact aliasing rules from C

Nope, you can't. These are exactly the things that introduce undefined behavior (i.e. total breakage) if you aren't very careful about what you're doing at all times. Don't take my word for it, check out what the C++ designers themselves state about the issue in the C++ Core Guidelines. C/C++ is far from simple, and thinking that you can just make things up as you go along is a serious mistake.


You are absolutely right, but my point is that when you do modern C++, in the application code, it is very unlikely that you need to use reinterpret_cast, and therefore you don't need to know all the subtleties around it.

So despite C++ being more complex than C, if you limit yourself to some practical subset, it is actually easier than C.
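As a concrete illustration of the kind of aliasing subtlety being discussed (a textbook example, not code from anyone in this thread; it assumes 32-bit IEEE floats, as on all mainstream platforms):

```c
#include <stdint.h>
#include <string.h>

/* Undefined behavior: inspecting a float's bytes through a pointer of an
 * incompatible type violates the strict aliasing rules:
 *
 *   uint32_t bits = *(uint32_t *)&f;     // don't do this
 */

/* Well-defined alternative: copy the object representation. Compilers
 * typically optimize the memcpy down to a single register move. */
uint32_t float_bits(float f) {
    uint32_t u;
    memcpy(&u, &f, sizeof u);
    return u;
}
```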


Same can be (and has been) said about JavaScript. Or C. Or any language with any kind of issues.

And it doesn't work in practice.


I think it does sometimes? At least for enterprise software.

We use both C and C++ embedded as well as C++/Qt for the desktop control system, and we have no issue keeping to a sane subset of C++.

There are clearly defined rules and code review does the remaining enforcement. And it's not even hard or time consuming as everyone is pretty much aligned and every small issue can be easily resolved with a quick chat.


Strong agreement here.

You can limit yourself by choice to certain areas of the language that you know inside out (you know the asm they produce, etc.) and use it to solve problems. Don't worry about every single corner case. As you said, you can easily forget those things, especially if it's not your day-to-day job.

The same idea and principle can be found in "JavaScript - The Good Parts".


Having recently been doing some web development, I'd argue that JavaScript's design (even today, despite some modern additions) is so conducive to undebuggable spaghetti code (e.g. 'this') that there really are no good parts.

Sometimes the technology is objectively the wrong choice, e.g. compiling JavaScript to native code (not a JIT).


Some languages do fight your attempts to write good code every step of the way :-)

No, I won't name names.


I'm currently writing a library that loads, parses and includes C headers at compile time, in D.

It's not finished, but let's just say I've managed to make yours and Andrei's thoughtful design into a monster.


> Over time I removed all the C preprocessor tricks from my own code and just wrote ordinary C in its place. Much better.

I don't get it. If my macros produce standards-compliant code and they make my code easier to read and understand, why shouldn't I use them?

My goal as a developer is to write clean performant bug-free code... not to make life easy for compiler developers.


> If my macros produce standards-compliant code and they make my code easier to read and understand, why shouldn't I use them?

The problem is they don't make code easier to read and understand. Worse, the unhygienic nature of C macros makes it hard to contain them.

I haven't seen your code, so I'm speaking based on what I've seen of mine and others' code. If you dial it back, the person who has to deal with your code after you leave will appreciate it.

More generally speaking, if you're doing metaprogramming with the C macro system, you've outgrown the language and should consider a more powerful one.
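A tiny example of the hygiene problem (the classic textbook macro, not from anyone's code in particular):

```c
/* Unhygienic: the expansion splices raw tokens into the call site,
 * so operator precedence at the call site changes the meaning. */
#define SQUARE_BAD(x)  x * x        /* SQUARE_BAD(1 + 2) -> 1 + 2 * 1 + 2 == 5 */

/* Parenthesizing fixes precedence, but the argument is still
 * evaluated twice, so SQUARE(i++) is undefined behavior. */
#define SQUARE(x)      ((x) * (x))  /* SQUARE(1 + 2) -> 9 */
```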


I once worked at a place that had a platform specific "DEBUG" log macro.

It worked something like: DEBUG(msg); Except, it was defined in such a way that you actually had to have two closing parens, like DEBUG(msg)); It looked syntactically invalid, but whatever the macro did required it.

The entire code base was littered with WTFs like that...


I hated debug macros. There just was never a clean way to write them. I was determined that D would not suffer from that problem. `debug` is a keyword in D, and you can do things like:

    debug printf("I got here\n");
and the printf only gets compiled in when compiling with -debug. (Any statement can be used in place of the printf.) Even better, semantic checks for debug statements are relaxed - for example, purity is not checked for them.

Meaning you can embed debug printf's in functions marked 'pure', instead of having to use a monad.


Because when you work on a team, not everyone is a C Gandalf, probably not even yourself a couple of months later when fixing a bug with everyone screaming that the system is down.

Exaggerating here, but the rule of thumb is that it takes twice as long to debug code as it takes to write it, so how long do you want maintenance fixes to take?


Debugging is twice as hard as writing the code in the first place. Therefore, if you write the code as cleverly as possible, you are, by definition, not smart enough to debug it.

--Brian Kernighan


> rule of thumb is that it takes twice as long to debug as it takes to write it

Ok, so... macros which let me write code faster should help with debugging time as well? ;-)


It is an exponential curve with cleverness as the input parameter.

So it depends how complex they are.


Can you ensure that your preprocessor tricks always generate compliant code?

For me, writing embedded code is the ultimate test of programming skill, since a lot of C toolchains for embedded devices (as in bare-metal embedded) are unstable, only almost compliant, and full of weird behaviors and hacks.


Here's an example: https://github.com/Tarsnap/libcperciva/blob/master/datastruc...

As long as you use it properly, e.g.

ELASTICARRAY_DECL(PTRLIST, ptrlist, void *);

there's no way it will create non-compliant code.


How ('yeah, I'm going to write this') do you start a project like a C compiler? It seems like a huge thing to write. What did you write first? Did you already have a lot of compiler knowledge, so that you had a good idea of the structure/flow in your mind already? And how long did it take you?

tia


I'd written a couple of compiler-like toys before, but nothing like a real compiler. I just started writing it. Took a couple of years.


A compliant C compiler is probably a year's work in a modern language give or take?


It takes 3-5 months just to write a compliant preprocessor. Writing a basic code generator is a year, writing a basic optimizer is another year.

But if you want a competitive compiler, better pencil out 10 years.


> at one point I knew everything there was to know about C

after taking the test, wouldn't this be:

at one point I knew everything there was to know about (my implementation of) C

also: one time long ago I tried to use the c-preprocessor to preprocess a data file. ha ha ha ha ha. (conclusion: don't do that)


Somewhat related, what would you recommend for someone who has never programmed in C?


There are a few problems with the questionnaire. "I don't know" is a pretty generic choice to be given/chosen.

Say, for example, in question 5, the statement "return i++ + ++i;" is undefined, because the value of i is read and modified twice between sequence points (and of course, the order of evaluation of the operands is unspecified), which is not allowed in C. So the answer is "Undefined." (The explanation given on the page is not accurate enough in terms of C.)
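To make that concrete (the undefined statement is left in a comment; the function name is mine, just for illustration):

```c
/* return i++ + ++i;
 * Undefined: i is modified twice with no intervening sequence point,
 * so the standard imposes no requirements on the result. */

/* A well-defined rewrite forces one particular ordering: */
int well_defined(void) {
    int i = 1;
    int a = i++;   /* a = 1, then i becomes 2 */
    int b = ++i;   /* i becomes 3, then b = 3 */
    return a + b;  /* always 4 */
}
```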

And for question 1, the code is valid, but the result is not strictly defined. It depends on the implementation. So the answer is "Implementation defined."

The usage of "main()" hurts me, the strictly conforming way is to write it as "int main(void)" (or similar)

I feel like the questionnaire will piss off people who really know C.


To me "I don't know" is a very apt choice. It makes the point clear that indeed reading the code does not allow you to know the result, which is quite a pitfall.

In your comment you are jumping from "I don't know", which is the first step, to wanting to explain why.


There were multiple questions where I would have answered "It's undefined" or "It's implementation defined", but those weren't options. It's not that I don't know the answer; I know the answer ("it's implementation defined according to the spec, but on essentially every relevant platform, the result will be X"), but the "it's implementation defined" part of my answer isn't an option, so the only possible answer becomes "on essentially every relevant platform, the answer is X".

Using "I don't know" as a substitute for "I know that the standard clearly covers this, and it says that the result depends on the implementation" does seem to be designed to piss off people who know C. If they really wanted to get the point across that you don't know what is and isn't implementation defined or undefined, they shouldn't be using vague questions to mislead people; they should just plainly ask questions which people don't know the answer to.

I hate this kind of questioning, where you 100% know the subject matter the quiz is asking about, but the question and possible choices are so vague that you have to try to interpret what you suspect the person who wrote the quiz wants the answer to be. I once had an exam full of that kind of multiple-choice question, and guessed the exam author's intentions wrong on most of them.


the quiz asks what each one would evaluate to. “Undefined” or “implementation dependent” are not answers to that question. “I don’t know [because it is undefined]” or “I don’t know [based on the given information]” are logically consistent answers to the question that was asked


Ok, say I made a quiz where I ask you about what `((((16 <= 16) << 16) >> 16) <= 16)` evaluates to. I could give you the options 0, 1, 16, or "I don't know". If I as a quiz author wanted to test your knowledge about shift operators and esoteric uses of equality operators and your ability to reason through an expression, I would mark "1" as correct, and "0", "16", and "I don't know" as incorrect (I would include the "I don't know" option just because that's a common thing to do in quizzes, to not force people to guess one of the options if they don't actually know the answer).

My point isn't that there's no logically consistent answer. My point is that there are _two_ logically consistent answers, and which one is correct depends on the unknowable state of mind of the quiz author. On the other hand, if the options included "It's implementation defined" or "it's undefined", the author would have made their expectations clear, and the quiz would actually test people's knowledge of C rather than people's ability to try to reason about what sort of answer the author expects.
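For comparison, the expression above is fully defined (given a mainstream int width) and a compiler settles it mechanically:

```c
/* Relational operators yield an int 0 or 1, so:
 *   16 <= 16    -> 1
 *   1 << 16     -> 65536  (requires int wider than 16 bits; on a
 *                          16-bit int this shift would itself be UB,
 *                          which is part of the fun with C quizzes)
 *   65536 >> 16 -> 1
 *   1 <= 16     -> 1                                              */
int quiz(void) {
    return ((((16 <= 16) << 16) >> 16) <= 16);
}
```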


But the point is that I do know what that will print on my computer(s), with my compiler(s), on my architecture(s). "I don't know" is too generic a statement.


On your computer, yes, but the question specifically doesn't tell you what computer and compiler are used. Thus you can't know what the number is.


Even on a given CPU with a given compiler you usually do not know until you've tried.

This is an article about the C language and the starting point with all the examples is that you do not know for a fact what the result will be.


I understand your claim. But I've gotta say this claim is maybe a little too aggressive. I know for a fact that on a Sun-3 (68020 SunOS Desktop Pizza Box) using either gcc or the bundled cc, all of them would have been the same answer, and the answer would have been known to the coder, before running that code (unless you unleashed one of the gcc command line dogs). Except maybe #5, because who does this?


> So the answer is "Undefined."

The code is undefined, but "I don't know." is still the correct answer for what happens to the variable.


> The code is undefined, but "I don't know." is still the correct answer for what happens to the variable.

Well, then a better choice would be "I can't know."


"I cant know" is a subset of "I dont know"

"I dont know" was absolutely the correct answer.


Except that, from the perspective of pragmatics, "I don't know" has a social context of "There's an answer, and I don't know it."

In this case, a non-C programmer should answer "I don't know" to all of them. A person with a passing familiarity should answer similarly. A seasoned pro would be forced to answer the same. That makes it a rather useless tool for distinguishing people who think they know C but are honest when faced with their limitations from those who truly know it and know the answer is undetermined, which is supposed to be the point of the exercise.


>which is supposed to be the point of the exercise.

I don't think that assumption is justified. Someone could say the point of the exercise is to illustrate that C is confusing.


You don't think the claim is justified because some might say otherwise. "Some" is irrelevant; some might offer lots of mutually exclusive interpretations. It's the author's intent that matters, and the context of the author's post indicates that interpretation isn't the author's intent. His post begins with the question "So you think you know C?". He then goes on to present a test that is, by his own words, intended to show test takers whether or not they really understand the intricacies of C, and to make them think critically about the source of their knowledge: "I had to learn to rely on the standard instead of folklore; to trust measurements and not presumptions; to take 'things that simply work' skeptically".

Never once does the author mention that C is confusing, use the word "confusing", or otherwise indicate that general idea. If you're getting that impression, you're reading it into the text yourself. I'm not even saying you'd be incorrect, but that's not the author's intent, which was the basis of my comment.


If what "some" might say about what the author intends is irrelevant, then what you say about what the author intends is also irrelevant, because you are just some person (unless you're the author). My point was why should I trust your interpretation of what the author intends more than anyone else's.

>"So you think you know C?"

That goes along with the interpretation that the point is to illustrate C is confusing. It would go along with something like "You think you know it, you think it's simple, well actually you don't know it, it's confusing."

>intended identify to test takers whether or not they really understand the intricacies of C, and to think critically about that source of their knowledge

Yes, its intent is to indicate to test takers that a lot of them don't really understand the intricacies of C, which demonstrates that C is more confusing than they originally thought.

>Never once does the author mention that C is confusing, use the word confusing, or otherwise indicate that general idea.

Here are some quotes that indicate the idea that C is confusing:

>C is not that simple.

>It’s only reasonable that the type of short int and an expression with the largest integer being short int would be the same. But the reasonable doesn’t mean right for C.

>Actually, it’s much more complicated than that. Take a peek at the standard, you’ll enjoy it.

>The third one is all about dark corners.

>The test is clearly provocative and may even be a little offensive.

Then the author says that he did C for 15 years and thought he knew it, but then realized he didn't. That indicates to me either that the author is saying that he's not smart, or that C is confusing. The second appears to be the point the author is actually making.


My interpretation is based directly on what the author states. Your "some" is based on a vague aggregate group whose interpretations, in aggregate, would be diverse and often contradictory and mutually exclusive. Personally, I trust the explicit and implied meaning of the author's direct statements more than your mere speculation as to what others might interpret.


If you don't like "some" then replace it with me. I interpret it as the author saying C is confusing.

My interpretation is also based on what the author states, fairly explicitly. And I don't think there's anything that explicitly contradicts my interpretation.


You say it's confusing because the author says it's not simple. The same might be said of any language. Or of any learning specialty at all. It's not synonymous with confusing. You're severely stretching the meaning of the author's words when you say the author's point was to say that C is confusing. It's what you infer because you were confused, which points to this being personal to you, not the general intent of the author.


Showing how people misunderstand the intricacies of C is much closer to "C is confusing" than "distinguishing who is honest about their limitations".


And yet it's not confusing. Given the confines of any particular implementation and compiler, the behavior can be known without confusion. The author never directly mentions or implies that their intent is to convey that C is confusing. Quite the contrary: they indicate their intent is to demonstrate that certain segments of people who believe they know C don't in fact understand its intricacies.


> Given the confines of any particular implementation and compiler the behavior can be known without confusion.

Only through extreme levels of compiler code inspection, as it can vary based on optimization heuristics.

> Quite the contrary, they indicate their intent is to demonstrate that certain segments of people who believe they know C don't in fact understand its intricacies.

Demonstrating that people don't know C is subtly different from an intent of testing whether people know C. The point being made is about C itself.


> "I cant know" is a subset of "I dont know"

There is a world of difference between "don't know" and "can't know": the first implies a shortcoming on the side of the developer, while the second states that the question is patently meaningless even to someone who does master the language.


Compile it, look at the assembly. You can know. The answer will vary from place to place, but it isn't non-existent.


With what compiler? Targeting what architecture? The question is underspecified. "I can't know" is correct.


Q: What is a leaky partial abstraction of the C standard?

Ans: A compiler.


And then you update your compiler and something completely different happens.


Or your standards conforming but mischievous compiler does something nondeterministic ;)


1) This made me curious. Are any of the compilers in real use nondeterministic?

2) Probably that's not needed? A normal optimizing compiler just inlines the function somewhere new — and boom? Then again, can that really happen with practical contemporary compilers and this exact statement?


Most compilers are nondeterministic in small ways. For example, it's common to use hash tables that are keyed by pointer address and then iterate over the entries in storage order, so the order in which certain things are emitted will change from run to run. This is why "deterministic builds" are such a big deal, and not just an obvious thing that you get for free.

I don't know what the chances are that such a thing could ever translate into good assembly being emitted in one run and bad assembly being emitted in the next.


Register allocation can be quite tricky, and sometimes it can only explore a small part of the problem space, so if you don't start the algorithm with exactly the same seed you might end up with significantly different code in certain functions.


UB includes the code not compiling, although it rarely happens in practice.


> Compile it, look at the assembly. You can know. The answer will vary from place to place, but it isn't non-existent.

This pretty much is the definition of "undefined behaviour" in the context of a standardized language specification.


Whoa there! You mean “unspecified behavior”. int i = [unspecified] means that i has some value, but the spec doesn’t determine the value. Undefined behavior means that all your secrets might be sold to the highest bidder, your centrifuges might explode, and your computer is now full of ransomware.


> Whoa there! You mean “unspecified behavior”. int i = [unspecified] means that i has some value, but the spec doesn’t determine the value. Undefined behavior means that all your secrets might be sold to the highest bidder, your centrifuges might explode, and your computer is now full of ransomware.

That's only when using Boehm GC[0] in kernel device drivers that self-modify. Or any MSVC binary.

:-D

0 - https://www.hboehm.info/gc/


Whilst my other comment was intended to be jovial, it is hard to say if that was accurately conveyed. So this one will be serious.

The original problem definition, as specified by @pksadiq, read thusly:

> Say for example, in question 5, the statement "return i++ + ++i;" is undefined ...

This inspired a response by @Filligree of:

> Compile it, look at the assembly. You can know. The answer will vary from place to place, but it isn't non-existent.

Given the original constraint of an undefined statement result, and the suggested activity to address same, I posited that the recommended action is an exemplar of observing the product of undefined behaviour.

You then contributed:

> You mean “unspecified behavior”.

As per c-faq.com[0], there are three categories identified relating to this topic:

1 - implementation-defined: The implementation must pick some behavior; it may not fail to compile the program.

2 - unspecified: Like implementation-defined, except that the choice need not be documented.

3 - undefined: Anything at all can happen; the Standard imposes no requirements.

Whereas you imply a standards-conformant implementation of "return i++ + ++i;" is unspecified (category #2), it is, in fact, undefined (category #3). The support for this assertion is as follows.

As per the same site, Question 3.8[1] includes:

> Between the previous and next sequence point an object shall have its stored value modified at most once by the evaluation of an expression. Furthermore, the prior value shall be accessed only to determine the value to be stored.

And further states:

> ... if an object is written to within a full expression, any and all accesses to it within the same expression must be directly involved in the computation of the value to be written. This rule effectively constrains legal expressions to those in which the accesses demonstrably precede the modification.

And concludes with an example stating:

> ... the Standard declares that it is undefined, and that portable programs simply must not use such constructs.

Therefore, the original expression presented by @pksadiq is in fact an exemplar of an undefined expression as defined by category #3 shown above. Since both it and the message to which I originally responded satisfy same, I stand by my response given to @Filligree as having had informally defined the standard C concept of "undefined behaviour."

0 - http://c-faq.com/ansi/undef.html

1 - http://c-faq.com/expr/seqpoints.html


> Whereas you imply a standards-conformant implementation of "return i++ + ++i;" is unspecified (category #2), it is, in fact, undefined (category #3).

You're misreading things. amluto's assertion is that "The answer will vary from place to place, but it isn't non-existent." is a description of category #2. That assertion is basically correct, depending on how exactly you define "place to place".

An informal definition of category #3 is "The answer can vary from place to place, or not exist at all." ideally followed by "It might crash or run unrelated code or even prevent the preceding code from running." It's flat-out wrong to say a value "isn't non-existent" when it comes to source code exhibiting undefined behavior.


> The code is undefined, but "I don't know." is still the correct answer for what happens to the variable.

Actually, since undefined behavior should not be relied upon at all, the correct answer should be "never mind these examples, they are all bug-ridden".


No, it isn't. The correct answer to #2 is "According to the standard, the result is implementation defined, but on my target platform, 0". "I don't know" is the wrong answer.


The C specification does not say that undefined behaviour must give a deterministic result on a given platform. All you can say is "this one time I compiled and then ran this code, it gave 0". There is no requirement that the code compiles at all, nor that the same compiler on the same platform produces the same binary on every run, nor that the resulting binary produces the same result on every run, nor that the binary produces any result, nor that it doesn't sometimes produce a result and sometimes not, nor that the compiler doesn't sometimes produces a binary and sometimes not ... undefined behaviour is exactly that: undefined behaviour.


I'm well aware of what undefined behavior is. I still know it's undefined behavior and can read my compiler manual to answer the question of how the code behaves. "I don't know" is simply wrong.


> and can read my compiler manual to answer the question of how the code behaves.

Which is both not true (because the compiler manual usually won't define undefined behaviour) and irrelevant (because the questions were about C, not about a compiler).


In the example I chose (#2), most compilers totally specify the behavior. And the question was (right from the article) "what the return value would be?" In order for a function to return, it must be run. In order for a function to be run, it must be compiled. In order for a function to be compiled, there must be a compiler (or interpreter, I suppose).

You're being pedantic about something silly, but you're also wrong in your pedantry.


Somewhat related: my introductory classes involved a lot of games around pre- and post-increments and short-circuiting. While I get that understanding these operations is fundamentally important, is understanding ridiculous combinations of them important? I mean, these were the basis of large portions of some quizzes and midterms. I get playing with them from a theoretical perspective, as this can literally be done in many languages, but why force freshmen through these deep mental gymnastics? Maybe it was a play at making the classes weeder classes, and no other reason.


The questions are testing whether you really understand the basic rules of the language. Often times, the best way to test whether you really get the rules is to raise them in some odd context, so that you can’t just pattern match to figure out the result.


I don't know how convoluted the questions on your midterms were but one good reason that kind of irritating thing pops up in tests is that it's quite common in real world C code. Think of the old K&R string copy example.


Have you ever worked with a pre-ANSI (K&R) C compiler? Omitting the return type for main is legal in those old compilers. Newer ones give you a warning.


Omitting the return type is perfectly conforming C89.


> I feel like the questionnaire piss off people who really knows C.

I hated this test. I’ve spent 12 years working on C targeting various flavors of arm and x86.

Just because the behavior is undefined when compiled without warnings and run on a Soviet water integrator doesn’t mean the language is undefined for the 99.995% of the industry uses.

Behavior of c89 or later with -Wall -Werror on modern clang, gcc, icc, visual studio, is well understood on arm, x86, mips, risc, ppc, Cortex-m and just about every other hardware architecture.

But, C is a PITA, and I’ve been using Rust instead :)


It’s not just that stuff. What pissed me off was asking about the return code of a comparator. That’s just bad form. You’re only supposed to check for zero or nonzero. I have never used the value beyond that, and if you are, that’s a problem.


You’re incorrect. The result of a comparison is guaranteed to be zero or one in C. Similarly for the exclamation-point “not” operator, and || and &&.

This isn’t a recent standardization; it’s been an explicitly specified feature of C pretty much since the very beginning of the language. See page 7 of the prehistoric https://www.bell-labs.com/usr/dmr/www/cman.pdf


I found it an amusing exercise, if not terribly relevant, even as someone who spends 90% of his dev time in C.

What rubs me about these sorts of articles is that they make some presumption about the importance and necessity of writing truly portable C, as if the "C Standard" were in and of itself a terribly useful tool. This is in contrast to where I live most of the time, which is "GCC as an assembler macro language" (for a popular exposition on this subject see https://raphlinus.github.io/programming/rust/2018/08/17/unde...). And yeah, reading through the problem set I was critiquing it in the context of my shop's standards, where we might be packing and padding, using cacheline alignment, static assertions about sizeof things, specific integer types, etc. So these sorts of articles just come off as a little pedantic to folks like me. I don't doubt they're useful for some folks, and I guess it's interesting to come up from the depths of non-standard GNU extensions and march= flags to see what I take for granted.


It's very much worth reading, Linus Torvalds' opinion of standards that's linked in that article, but I'll link it again here: https://lkml.org/lkml/2018/6/5/769

"So standards are not some kind of holy book that has to be revered. Standards too need to be questioned."

The way I see it, a lot of compiler writers are basically taking the standard as gospel and ignoring everything else "because the standard doesn't say we can't" --- and that's a huge problem, because behaviour that the standard doesn't define often has a far more common-sense meaning that programmers expect. IMHO the onus should really be on the authors of compilers to find that reasonable meaning. In fact, the standard even suggests that one possible undefined behaviour is something like "behave in a manner characteristic of the environment" (can't remember nor be bothered looking up the standard.)


This is a common misconception. Compiler authors don't exploit undefined behavior to make themselves seem smart, or because they like breaking code. They exploit undefined behavior because somebody filed a bug saying some code was slow, and exploiting UB was the simplest way--or, in many cases, the only way--to fix the performance problem.

GCC and Clang do give you the option to avoid optimizations based on undefined behavior: compile at -O0. We think of the low-level nature of C as being good for optimization, but in many cases the C language as people expect it to work is at odds with fast code.

It's fascinating to actually dive into the specific instances of undefined behavior exploitation that get the most complaints. In each such case, there is virtually always a good reason for it. For example, treating signed overflow of integers as UB is important to avoid polluting perfectly ordinary loops with movsx instructions everywhere on x86-64. It's easy to see why compiler developers added these optimizations: someone filed a bug saying "hey, why is my loop full of movsx", and the developers fixed the problem.

Edit: Should be movsx instead of movzx, sorry.


Could you go into a little bit more detail regarding the movzx? Aren't 32-bit registers always zero-extended on x86-64?


Sure. Here's an in-depth explanation from Fabian Giesen: https://gist.github.com/rygorous/e0f055bfb74e3d5f0af20690759...


Thanks, rygorous is always a great read - although sometimes a little overwhelming. If I got the gist of it, I have a small correction to your comment: the issue is about movsxd (sign extended integer indexes), not movzx (zero extension).


> It's easy to see why compiler developers added these optimizations: someone filed a bug saying "hey, why is my loop full of movsx", and the developers fixed the problem.

"fixed" by breaking other expectations. Regardless of what the spec says, that's still a stupid way to do things. There's a child comment below which examines this case in detail; and the real solution is to make the analysis better, not use UB as a catch-all excuse.


> compiler writers are basically taking the standard as gospel

I would be rather disappointed if they didn't, honestly.


Consider the following statements:

1) The standard says I must do this, so I must do it.

2) The standard doesn't say I must not do this (but does allow me to either do it or not do it), so it's totally OK if I do it.

I think you're thinking of cases covered by statement 1, and I think pretty much everyone agrees that compiler writers should behave that way for the standard to mean anything.

The issues arise in cases covered by statement 2. Just because the standard allows a behavior doesn't mean that the behavior is a good one. And yes, code relying on you not having the behavior is not following the standard, and that's something the authors of that code should consider addressing. But on the other hand, the standard may allow a lot of behaviors that only make sense in some situations but not others (totally true of the C standard, depending on the underlying hardware) and as a compiler writer you should think carefully about what behaviors you actually want to implement.

As a concrete example, you _could_ write a C compiler targeting x86-64 which has sizeof(uint64_t) == 1, sizeof(unsigned int) == 1, sizeof(unsigned long) == 2, and sizeof(unsigned long long) == 2 (so 64-bit char, 64-bit short, 64-bit int, 128-bit long, 128-bit long long). Would this be a good idea? Probably not, unless you are trying to use it as a way to test for bugs in code that you will want to run on an architecture where those sizes would actually make sense...


It's a collective action problem. If we want to give up runtime performance and get stronger guarantees about what code will be understood to mean, we should revise the standard and start using new optimizers that respect it. If every compiler goes its own way, I only benefit from what they already agreed on.


GCC and many other compilers have been known to change the consequences of undefined behavior unpredictably when upgrading, changing compiler flags, etc. For some examples that matters.


Knowing what the standard says and keeping to it as much as possible is important because every now and then, a major compiler finds some exciting new way to optimise code based on undefined behaviour, and breaks code that assumed GCC would always do some seemingly obvious reasonable thing it did when the author tested it.


If you use C as an assembler macro language, you aren't actually writing C. You're likely to get burned someday, unless you compile at -O0.


> as if the "C Standard" were in and of itself a terribly useful tool

Not necessarily, I took it to mean that engineering is holistic and things like compiler behavior in the face of undefined parts of the standard are important to account for.


[flagged]


Hey, please don't add personal attacks on top of your substantive points in HN threads. It helps nothing, makes the thread nastier, and evokes worse from others. Also it's against the site guidelines: https://news.ycombinator.com/newsguidelines.html.


Where the author goes wrong is in assuming that somehow "I don't know" can be a final answer to these things. No, it is absolutely fucking vital that you know how the compiler will pad your structures in C. Similarly to the "what size is an int" on your architecture - on an ATmega8 this is 16 bit, but the chip can't actually do all 16 bit operations in single instructions.


I took that to be the point of the article though, that just looking at the code wasn't enough to know and you needed to go further to answer these cases for your exact use case or target platform.


Further: Unless your code is compiled, deployed to a rocket, and fired off the Earth never to return, the question of “what is my platform?” is meaningless in the context of writing good C.

So, today, using the compiler installed on your system right now, sizeof(int) = 32. Great. That means nothing, and changes nothing about whether your code is correct. You should not write code relying on it. Just like you should not measure the output of the questions on this test, and declare that you know what the answers are.


>Unless your code is compiled, deployed to a rocket, and fired off the Earth never to return, the question of “what is my platform?” is meaningless in the context of writing good C.

While I feel the tone of your comparison was intended to be a bit hyperbolic, the reality is that the bulk of modern C development occurs in a context similar to the one you describe. Further, the thought, utterly foreign to the vast majority of software developers, that the physical machine may not be some abstract and constantly mutating target with no hope of being understood is, imo, one of the great dying arts of software engineering, a death perpetuated by the same sort of folks who think CS education should be conducted in Java.

I contend that, these days, most C is written to target a particular compiler, physical machine, and/or device.


There is vastly more old C code than new, and it didn't target the x64 or ARM architectures it's running on now. Where it wasn't portable, that was a defect that had to be fixed.

My first job was a 4GL targeting customers running DOS on the 80286, complete with runtime linking. 100% of that work has been abandoned due to incompatibility. It contributed nothing to the profession beyond what I personally learned.


There is a Mac program BBEdit that was first written to target 68K 32 bit Macs, then PPC 32 bit Macs, then 32 bit x86 Macs and then 64 bit Macs. Probably within the next 3 years it will target ARM Macs.

The author said he never did a full scale rewrite. He slowly migrated code from one platform to the next.

Today, Apple’s code runs on both ARM and x86 and with Marzipan, as will developers code. True most will be in Objective C, but some low level code is still in C.


I hope I'm being on topic and reasonable to point out that the result of the sizeof operator is in "number of chars", not bits.


This is why, decades ago, the C world moved on, and added types like int32_t and size_t, so programmers can say what they mean.


My score is 3/5.

One immediate red flag I noticed is using "int", "char", "short" as if they have a definite size. They don't: the C standard only guarantees a minimum size. For example, the PDP-10 was a 36-bit machine. Assuming the size of a type is common practice nowadays, but one should at least use uint8_t, int32_t, etc. from stdint.h.

But I was still tricked. It should have been obvious in hindsight, but 12 years of schooling led me to think: if the author is asking these questions, at least one or two must be answerable (even if that's technically incorrect, you'd better guess the original intention of the question). So I still tried to guess and got two answers wrong... I'll have to be more careful next time...


In school, I was taught to choose the "best answer" on a test or quiz, and if you don't know something, choose the answer that looks right to you.

This test.. it reverses that entirely.


It is school that gets it wrong. In real life, knowing that you don't know something and acting accordingly is often way better than taking a guess.


I only got two out of five. One I really didn’t know, and I remembered enough C to remember sequence points.


Unfortunately the "best answer" is taken to mean "the wrong answer that the teacher programmed into you".


4/5 here. In fact after the third question provided "I don't know" as an answer I started to suspect something was up — especially since the author said only one answer was the right one ... why even provide "I don't know" then, I wondered?

I knew "int" was sort of platform-dependent (was 16-bits generally when I was learning to code, later 32-bit became more typical) — so combined with that niggle and all the "I don't knows", I (correctly) reevaluated by first couple of answers.

Still, didn't realize the last one was compiler-dependent.


The third one has another implementation-defined aspect: we do not know the value of a space ' '. In ASCII it is 0x20 (32), but that depends on the system; in EBCDIC it is 0x40 (64).



Replacing "I don't know" with "this is undefined in the spec" or "this is implementation specific" would probably increase the pass rates.


“I don’t know” is not the right answer. I do know. I know the answer to be “Unspecified”.


But you _don't_ know what it returns, because the behavior is either undefined or implementation defined, and you don't know the implementation.


But you do know that it's undefined or implementation defined. It's a "known unknown".


Yes, so you knowingly click "I don't know". You don't know the value, and you know that you don't know the value.


Look, do you know what will it return or not?


Not without knowing the compiler, which I think was the point.


I answered Idk to all. After the first two, the pattern became clear and I felt like if I wrote a compiler myself, the answers could be very different.

I've worked on 16 bit C code, 32 and now 64 bit code. So I knew that the behavior was implementation and optimization dependent. :)

Ignorance is bliss in C.


I posted this in the embedded C shop where I work, with the comment that people exactly in this place should all pass this test. Sadly only 1 in 5 passed (yours truly). Admittedly, the test is binary: either you pass or fail all of the questions (which is also sort of a give-away).

In the end this test proved to be really valuable, because the "I don't know" drove the point home, especially for smart folks who don't like to answer any test, ever, with "I don't know".


5/5 - but I came to Hacker News to procrastinate from the C parser I've been writing.


Wait, which of your projects needs a C parser?!


A new one


Well, that's a cop-out. Of course, if you consider absolutely any computer architecture, you can't assume simple things like sizeof(int) or data structure alignment. But if sizeof(int) is 4 and data needs to be aligned to its own size - like on any real architecture relevant today - many of these questions have a deterministic answer. In practice, compiler bugs are a much bigger issue than architecture assumptions.


What’s the latest compiler bug that caused a serious issue for you? (Genuinely interested, no sarcasm here.)


I failed on one, number 4. I bravely assumed 16-bit integers cannot exist. Can anyone name a concrete platform/compiler where int is/was 16 bits? Or is this just a theoretical option left open by the spec?


Turbo C on MS-DOS for one. In fact 16-bit int was the norm on that platform, because the architecture didn't have 32-bit general purpose registers.

In the C89 days, you'd use 'short' in aggregates (structs and arrays) for values you knew wouldn't exceed 16 bits, so as not to waste space; 'long' where you knew 16 bits wouldn't be enough; and 'int' the rest of the time (where 16 bits was enough, and there was no storage benefit to outweigh the performance benefit of using the native word size).


Why couldn't they? C had already existed for a couple of decades when 32-bit machines started getting popular. `int`, as the default integer type, is usually the size of the machine word for best performance. It would make no sense to have slow, emulated 32-bit `int`s on a 16-bit system, never mind 8-bit ones.


Many C compilers that target 8-bit (sometimes 16-bit) machines. MOS 6502, Zilog Z80, and Motorola 6809. Modern examples include Intel 8051, AVR and PIC.


AVR, so certain Arduinos. On an Arduino Uno, sizeof(int) returns 2.


So does AVR's (ex?) competitor, PIC8. This comes from the documentation of Microchip's C18 compiler. https://i.stack.imgur.com/1uV3l.jpg


Most compilers for 8-bit and 16-bit CPUs, e.g. Z80 or 80286.


My first thought was MSDOS/Turbo C, and I found this page when looking for confirmation: http://synfare.com/599N105E/hwdocs/sizes.html


My Amiga C compiler (Aztec C, from Manx) allowed either 16- or 32-bit ints. All or most system libraries used 32-bit parameters; despite this I insisted on the 16-bit version "for performance". In hindsight this was sort of insane: one missing L (say, in "1L" for casting to long) meant a not-so-quick floppy disk reboot. :-)

Anyhow, for a computer with 16 bit wide data bus, having 16 bit ints might be justified by performance (and/or reducing memory usage.)


Digital Mars C and C++ for DOS and (early) Windows.


I’ve worked with Arduinos where int was 16 bit.


x86 for one, up until 32-bit was introduced with the 80386.


pdp-11


Sad to say I scored perfectly, due to a similar early disillusionment on embedded platforms, and years of pain porting code between 16- and 32-bit architectures when the author thought they knew the size of “int”.


At the end of the test, the author talks about automation programming for a nuclear power plant. I don’t think I could ever sleep the same at night after writing something like that.


> I don’t think I could ever sleep the same at night after writing something like that

In these situations, you likely know your hardware and know your compiler, so you can actually provide an answer for 4 of the questions. The last one is a situation where someone should tell you not to get cute in the code review.

I wrote C in telecom and finance and in both places we enforced a rule: when you define a structure, put a comment after each element that says what you think the structure offset should be, and at the end of the structure #define a constant that says what you think the size of the structure should be. In a code review, if anyone noticed something that didn't look right, you could talk about it. In testing, you could also check that sizeof(foo_s) == FOO_S_SIZE and fail if it wasn't.

In some of our code, we would test the size of various types and structures on startup and immediately exit if they weren't what we expected. We'd print type sizes to logs to help debugging if there was ever a problem. We were supporting a single code base that ran on big endian, little endian, X86, Itanium, SPARC, ARM. Compilers change, but automated tests of type and structure sizes catch things immediately.

It may sound like a lot of work, but it actually isn't at all. It also helps a lot with long-term maintainability.


> In some of our code, we would test the size of various types and structures on startup

This is one of the things that C++ has actually improved a lot recently: doing this with static_assert is much nicer in terms of catching problems early... And yes, it's great for long-term maintainability.


C has had standardized static assert since the C11 spec was released. See [1] for instance.

[1]: https://stackoverflow.com/a/7287341


I never thought I’d hear myself saying this, but that syntax is uglier than C++’s.


Excellent! I haven't followed the C standards as closely; glad they added this as well.


Neat techniques. I had done similar stuff in some protocol code that I had a chance to write.


Particularly writing it in C... It isn't a language well suited to being fully defined (see this very article for why), and no, Rust/Go aren't either. But an Ada derivative or Haskell, perhaps; there's some amazing tooling for safety-critical systems, and those languages lend themselves to exposing side effects.


> But Ada derivative or Haskell perhaps

Ada, maybe? I don't know enough about it to comment. You definitely don't want to use Haskell for that sort of work load, though, at least not directly. Laziness-by-default is precisely the sort of hard-to-reason-about logic you don't want in that sort of application.

That said, if I had no alternative but to try to tackle this problem, I would seriously consider a strategy where I write a Haskell program that generates the actual program (potentially in ASM directly) for me.


I scored perfectly. I've been programming in C since 1989, on various platforms (started with the Amiga, then VAX/VMS, Linux x86, and various embedded systems.)


I emailed this to my boss telling him I got 0/5. He just setup a 1:1 meeting first thing Monday AM.


Let us know how it goes...


C programmer here. I try to never assume I know C.


If we're going to be -pedantic, shouldn't the author specify the exact standard of the C language under test? A lot of companies have varying implementations of C, and perhaps some do specify some of the behavior at hand here.


Implementations of C ≠ C standards, though of course I'm sure some random obscure compiler has tried to call its C dialect a "standard" at some point.


The better wording would be D) not enough information to give a definitive answer. It's like the old gotcha question: what is 1+1? Of course the answer is that it depends on whether you are working in binary or a base greater than 2.


Yeah, or "undefined behaviour".

But that reminds me of the joke: there are 10 kinds of people: those who know binary, those who don't, and those who didn't know the '10' was written in base 3.


Or doing the whole thing mod 2.


4/5 I tricked myself into thinking I could figure out the last one....oops.


Isn't there a more fundamental flaw in these questions? main() always returns an int; whether that's 4 or 8 bytes, and whether 0 or 1 means success or failure, depends on the implementation. Here's a bit of a discussion: https://stackoverflow.com/questions/204476/what-should-main-...

Reminds me of the dumb exams some teachers would set to trick you when in school to make themselves feel superior.


What's wrong with it returning int here? They aren't using any variables bigger than int, and the meaning of return codes is irrelevant.


Omitting the return type for main is legal in C89.


> And at this point, I only have to apologize. The test is clearly provocative and may even be a little offensive. I’m sorry if it causes any aggravation. [...] It was a research project in nuclear power plant automation, where absolutely no underspecification was tolerable.

I appreciate the apology here, and I can totally understand the concern about the spec in a safety critical environment.

Still, all questions on this test except the first are clearly examples of things you should never ever do in production code, which might undermine the message a bit? Yes, you can write bad code, and that’s true in every language I’ve ever used.

I’m guessing it would be hard to find a modern compiler on Windows, Mac or Linux that produced padding other than rounding up to nearest 4 bytes?

Sizeof(a+b) is obviously a weird thing to do.

char a = ' ' * 13 produces an overflow warning in gcc.

(((((i >= i) << i) >> i) <= i)) I hope nobody really did that.

return i++ + ++i; Not doing exactly this was drilled into us in CS 101. Still, I’d be interested to hear about a compiler that doesn’t return 1, since many people rely on the fact that ++i is pre-increment and i++ is post-increment. I don’t doubt one is out there, I’m curious to know which.

There probably weren’t many better choices 20 years ago... what would be the best choices today for a brand new nuclear power plant?


> Still, I’d be interested to hear about a compiler that doesn’t return 1

gcc (Ubuntu 8.3.0-6ubuntu1~18.10.1) 8.3.0 returns 2 (executes from left to right... at least the first time I ran it).


Wow, you’re right. Me too on ubu 16. Okay, I guess it’s not about pre or post increment. Maybe I should read the spec... ;) And good reason not to do this in code!


But code can be based on several standards, not only the C standard. For example, we know that the basic source character set must contain a space (C11 5.2.1), and that character constants have type int and represent the value equal to the code of the symbol (C11 6.4.4.4). We know the source character set in use, and therefore the code of the space character; on POSIX-compatible systems we can even configure a specific source character set.

We also know that a return statement in main is equivalent to a call to exit (C11 5.1.2.2.3), that only the 8 least significant bits of the returned value will be used (the POSIX.1-2001 definition of "exit"), and that INT_MAX must be at least 32767 (C11 5.2.4.2.1), so we are sure the result we observe from the return statement in main is a non-negative integer from 0 to 255. Finally, if we configure the source character set so that ' ' has code 32, we know for sure that we get the value 416 in the specified example. So we know for sure the answer to question 3, based on the C11, POSIX.1-2017, and ISO 646 standards.


I don't know if POSIX specifies CHAR_MAX, but on most systems, storing the value into a char will change the value, making it everything but 416: https://tio.run/##S0oszvifnFiiYFeSWlyil6xgY@Pq7/Y/M69EITcxM0...


My mistake: we have (32*13), with the minimal possible CHAR_BIT being 8. So it is either 416 for a char wider than eight bits, 160 for an unsigned eight-bit char, or -96 for a signed eight-bit char. It is then converted to a signed integer value (one of these three values), and we get the result as (int)(status & 0377). For all three cases the result will be 160.


This quiz wasn't illuminating at all. You generally start with assuming and validating a "C Datatype Model" i.e. ILP32/LP64 etc. for your system. Once you know that, these questions are easily answered.

If you want to see some real tricky C code, see some of the articles here: https://locklessinc.com/articles/


OK you tricked me. But the test would have been better if the answer was "undefined behavior" instead of "I don't know".


Not all of them are undefined behavior


Sure, but "I don't know" makes it seem like you're just saying "pass", which, when you finish the test, is obviously not what it represents.


The one that got me is this:

uint32_t foo = 1; foo <<= 32;

Here, I figured that foo would always be 0. Wrong. It was always 0 with GCC, but this is undefined in the spec and code like this can have a different value in clang. I actually had to make a security update to my little open source project because of this (although the code I wrote did not manifest the bug in an insecure way, even with clang).


I don't know any c, so I answered "I don't know" on all the questions. Turns out I'm actually a C genius.


> Eventually, I had to learn to rely on the standard instead of folklore; to trust measurements and not presumptions...

Indeed, testing your assumptions, because even a defined standard may result in a differing implementation of it. Especially in critical applications, testing the expectations gives some sense of a defined behavior.

This quizz is amusing as a mental exercise and a parable, but in reality all of these cases had to be fleshed out on a real platform, with real compiler and ... specified expectations of the behavior.

None of the cases in fact communicate a clear intent, except maybe #1 to figure out the padded size, still it's somewhat open-ended. Perhaps returning a specific condition (return sizeof(struct ...)==5; ) would show a clear intent. Not that it would change the right answer, just such a case may indeed be true on a specific platform, compile flags erc.


Just to flesh this out, I put the puzzle code through gcc 6.3, on ideone

https://ideone.com/PziSzq

Result:

1:8 2:0 3:-96 4:1 5:2

Sure, there's a bunch of warnings emitted... fortunately.


Pro tip: avoid `int` as much as possible in your C code. Use the fixed-width types from <stdint.h> instead.


There is a time and place for int32_t.


> There is a time and place for int32_t.

And time_t[0] is not one.

;-)

0 - https://pubs.opengroup.org/onlinepubs/9699919799/functions/t...


I guess that joke was hard to C.


I submitted a GitHub issue for the `main()` definitions and some other quibbles.

https://github.com/akalenuk/wordsandbuttons/issues/12


It's been corrected in the repo, but as I write this the web page has not been updated.


UB nasal demons etc etc.

But often we encounter UB in code that's already shipped. So it's good to have an intuition about what machine code was actually emitted, for example when deciding if a crash report is due to this particular UB, or not.


Or you could just look at the binary you shipped: by the time it’s compiled, you don’t really have to deal with undefined behavior.


5/5, but I don't think this test is very good at capturing the more obscure features of C, they all just deal with the fact that platforms have different datatypes/alignment requirements, except for the last one. I think a better example would be the following:

  int a=1, b=2, i, j;
  i = a += 2, a + b;
  j = (a += 2, a + b);
What's the value of a, b, i, and j? Hint: i and j are different.

Which raises the question: why does C have those features in the first place? The only C code where it's reasonably common seems to be in crypto algorithms.


5 and 7? If I remember the behavior of the comma operator correctly.

Anyway, I am inclined to agree that these are misfeatures. I’d almost certainly ask for it to be changed in code review.


I knew there was something up because long ago when developing for Arduino boards as part of a course, my mentor educated me on the difference of size of datatypes across different architectures.


I scored 2, but mostly out of luck for being 100% sure only about the 1st question, as I encountered alignment problems a lot of times in the past when dealing with structures to be sent through the network, and also between different architectures (and sometimes endianess too). If memory serves, there are #pragma directives to force the compiler to align structure members to a given interval, but they're compiler dependent and would make a non portable piece of code even less portable.


It is curious because, on each of the questions, I encountered problems with the questionnaire itself, and I tended to answer in the way the writer intended.

And then you find "it's a trap".

I think the questionnaire is not honest enough; a better answer for D would have been "we need more information" or "there are programming inconsistencies"...

I thought that when selecting "I don't know" I was saying "I don't know what's happening with the code and the inner details".


I managed to answer all of the questions correctly; I recognized them as undefined. Still, on the set of computers the program will actually run on, there may be enough that is defined in some cases (for example, that char is at least 8 bits, and/or that it is ASCII).

But the better thing to do, in my opinion, may be something like LLVM with macros (including standard macros for dealing with differences between systems, and user-defined macros for your own use).


There is a difference between "I don't know" and "undefined behaviour".

I know that most of this stuff is undefined according to the spec. But I (might) also know what my particular GCC version does in these cases.

What I know for sure is that those programs do output something, and it's not "I don't know" (the string) ; )

What I don't know, is what level of sophistication the author assumes.


Funny. My first thought on question one was that we'd need to make assumptions about which architecture we're working on to know the answer. By the time I got to question 3, I realized the author's trend. This is both the curse and blessing of C, a language that gives you barely more than a thin high-level translation layer over the raw silicon.


It is not difficult to overcome these limitations of C by typedef-ing definite types like signed int32, unsigned int8, and so on. Many embedded C headers have that as a standard way of clearing things up. Of course you can always sizeof(int) or whatever. (BTW this quiz, or one like it, has been around a long time, but it's still a good reminder.)


The author's explanations of the first three answers aren't sufficient. There is no requirement within C for `int` and `char` to be different sizes. Similarly, you don't know what ' ' * 13 will resolve to. It's architecture-dependent.

C, for all its simplicity, is a relatively complex language.


At least for the third question (char a = ' ' * 13;) XCode warns "Implicit conversion from 'int' to 'char' changes value from 416 to -96".

And for the fifth (return i++ + ++i;) it warns "Multiple unsequenced modifications to 'i'".

I would not skip those warnings.


There is a minor error in the explanation for the third one: the minimum allowed value for CHAR_BIT in C is 8 (it does not affect the result, principally because the value of ' ' could be anything in the range of char).


That's the one where I most wanted to quibble with the "explanation" of why there is no set answer: to me, the top-of-the-line answer is that the value of the character constant ' ' as an integer is implementation-defined (or maybe it depends on the execution environment? See, I'll get the precise wording wrong too); anyway, space is 32 in ASCII but 64 in EBCDIC. The most you could say is that it's not zero (and maybe that it's not -1? I'd have to check how EOF is defined).


My friend doesn't know how to code and got the whole questionnaire correct.


Can't even take the test. Which version of C? Which platform? Under DOS in the '90s, the answer to the first question would have been 3, which isn't even among the options.


Each of the questions has exactly one answer that is correct regardless of version and platform.


It would be correct to answer that I don't have enough information to choose an option. It's incorrect to answer that I don't know.


No, but there are (at least) two interpretations of ”I don’t know.” It’s a nice little exercise in epistemology.


This must be the first time my lack of knowledge gave me full score.


This is the language we still choose to trust our credit cards with.


I hope you were joking. This is just a "gotcha" quiz and not relevant at all.


I'm almost not. I made the point because this level of ambiguity and "do it yourself" is consistent throughout the language.

I know why we still use C, but the use of C is inherently prone to security problems.

C does not provide bounds checking by default, so it can be forgotten (Heartbleed) and the lack of either static checking, RAII or garbage collection (Not as a library e.g. Boehm) makes memory corruption all but inevitable.


People are forgetting that it is the very "looseness" of C that is responsible for its great success. The sheer volume of code in C (specifically, any number of complex and critical pieces of software) is a testament to that. People keep parroting the same old tired tropes about C without reflection and thought.

All the problems, both real and imaginary, in the language have been worked through/around since the beginning by simple discipline, guidelines, and external libraries. I am always annoyed when people bring up "memory corruption" as if it were some primordial sin. The power to manipulate raw memory in whatever way I want is so crucial that I am willing to live with the downside of possible corruption. In fact, most of the people I have worked with, and I myself, never found this to be as much of a problem as everybody else makes it out to be. We always followed good guidelines, had special libraries for memory allocation as needed, and had testing procedures to catch memory leaks. Everything worked out fine.

In conclusion, the power given to Programmers by C far outweighs any of its perceived downsides in real-world scenarios.


This is fine for your or my software but the risk of these bugs no matter how rare is too great for mass deployed code in something similar to OpenSSL.

Any good alternative still allows you manipulate raw memory, but provide a safe alternative which makes it much harder to fuck up.

What power do I actually lose by using a safer language?


The OpenSSL "Heartbleed" bug that you bring up is not related to inherent failures of the C language but to something else. Just as an aside, I actually have some background in the implementation of security protocols (specifically the IPSec framework) and FIPS certification for a cryptographic algorithms library, though by no means am I an expert. In the security community many people believe that "Heartbleed" was an intentional plant. See https://www.smh.com.au/technology/man-who-introduced-serious... OpenSSL is such a heavily used and vetted piece of software that the probability of this being an "accidental bug" is very, very low, and my money is on it having been deliberately inserted, i.e. C language features deliberately used towards a nefarious goal. So this is not a good example to bring up.

Now coming to your other point, in today's environment, it is true that you do not lose much for the most part when using a safer language because somebody else has done the dirty work in the implementation of the corresponding language's runtimes, compilers, libraries and ABIs. Without the latter you cannot have the former. After all at some point you have to move out of the cocoon provided by the language and meet real hardware (a good example is bare-metal programming on MCUs). And that is where C is needed and any challengers have to provide exactly similar "ugly, dangerous and unsafe" features if they want to dethrone the champ.


4/5, forgot that integer promotion is more complicated than what intuition would tell you.

To be honest if was pretty obvious what the right answer was, but tried to answer honestly anyway ;)


Yeah, I assumed short would be at least as big as a char, and that this would thus be comparing the size of a short against a short. Didn't realize it would get promoted to int.


I took it as "What will this do on your compiler?"


Agh, 0/5! They were correct in the context of MSVC tho.


Went to an IRC chat room when I was learning C in school. Asked if you could return a pointer to something that lives on the stack. Was talked down by an all-knowing dude telling me to go read K&R again. Proceeded to write a code sample [1] that showed it is possible (it's not really stable but works reliably in recursive calls IIRC).

I do not like this attitude (then again it was just one random dude).

[1] https://www.onlinegdb.com/HyO5VXRxS


> it is possible (it’s not really stable but works reliably in recursive calls IIRC). [...] I do not like this attitude

You might want to listen. You’re getting the K&R comment and the downvotes because this does not work, ever. It’s a really, really bad idea. In recursive calls, it might not crash right away, but you will have bad data, the memory at the pointer address will have been overwritten by the next stack frame that’s placed there.

Don’t ever return pointers to local memory because the memory is “gone” and unsafe to use the moment your function returns. Even if you try it and think it works, it can and probably will crash or run incorrectly in any other scenario - different person, different computer, different compiler, different day...

Your comments about getting a warning and ‘However if you wrap the local’s address... it “works”’ should be clues. The warning is the compiler telling you not to do it. The workaround doesn’t work, it only compiles. By using aliasing, you’re only tricking the compiler into not warning you, but the warning is there for a reason.


Listening to what? To the dude that tells me that's not possible and proceeds to dump a big pile of authority on top of my head, or to my own experiment that tells me another story?

I would have preferred to be told:

- yes and no. You'll get warnings if you try to return a pointer to a local, however, doing this and that, you can manage to do it.

- but once you have achieved that, the result will be dependent on the way the stack is handled (not really in your control). You'll feel some comfort doing this in recursive calls, however beware of signal.h.

But this isn't the answer I received. I guess C programmers do not know the difference between what you can do (however risky) and what you shouldn't do. Also when someone asks such "weird" questions, do not assume he's a beginner with no notion of what constructs he can handle safely, maybe he's someone trying to find the limits of C – and once these limits are identified it can be a good conversation starter about C's internal and the way various compilers differ.

Edit: also downvotes on HN are not like downvotes on Reddit: there's actually a limit (-2 ?). Below this the comment disappears. Conclusion: only downvote when the comment engages in antisocial behavior (not respecting the rules or common human decency, etc ...), not when you disagree with it. I always upvote an unfairly downvoted comment for these reasons.


I was trying to help by explaining it, instead of saying go read K&R, but I don’t get the feeling you really heard or understood me. There is no other story. There is no yes and no. There is only no. You cannot manage to do it. It does not work to return local memory from a function, ever, period. Once you return, it is 100% unsafe to try to use the memory from your previous stack. There is absolute zero comfort in recursive calls.

You are mistaking some luck in having it not crash once for thinking that it’s okay in some situations. It’s not okay under any circumstances. That’s what makes this even more dangerous. Your program could crash at any time. It might run a thousand times and then suddenly start crashing. It might always run for you, and then crash on other people. But just because it runs once without crashing doesn’t mean it’s working.

A signal is not the only way your function’s stack can get stomped on the very next instruction after you return. Other processes and other threads can do it, the memory system can relocate your program or another one into your previous memory space. Recursive calls are guaranteed to stomp on previous stack frames when your recursion depth decreases and then increases, the previous stack will be overwritten.

Returning a pointer to a local stack frame is always incorrect. It’s not risky, it’s wrong.

BTW: you have the ability to see comments below the downvote limit, go to your profile settings and turn on showdead.

I didn’t downvote you, if that’s why you were trying to explain voting behavior to me, but you will find on HN that downvotes and upvotes both happen for a wide variety of reasons, and are not limited to either whether people agree, nor whether the comments are polite. Downvotes are often cast for comments that break site guidelines, for example just failing to assume good faith can get you downvoted. So can making blanket generalizations about a group of people, like the above “I guess C programmers do not know the difference...”. See the comments section here: https://hackertimes.com/newsguidelines.html

I sometimes upvote what appear to be unfairly downvoted comments to me. I usually upvote people who read and respond to me, regardless of whether I agree with them.


    - Java: no
    - Ruby: no
    - PHP:  no
    - C:    yes and no


?? I don’t understand what you mean. Those other languages don’t have pointers, they only have references, but what do they have to do with this?

Why do you still think there’s some yes in C? It’s not making sense yet that your memory is gone after you return? Returning a pointer to a local variable is exactly the same as calling delete or free on a pointer and then reading from it. You officially don’t own the memory after a return statement, so if you try to use it, then what happens is indeterminate. Again, since it doesn’t seem to be sinking in: it is always wrong to return a pointer to local memory. But, if you really really don’t want to listen, and you’re sure it works sometimes, then I say go for it!


Yes.

Signal handlers allow C programs to respond to events outside of the normal control flow (see signal.h, etc.). This means that once a function, say fnc1, has returned, the memory on the stack that was used by fnc1 can end up being reused at any point in time. A signal, perhaps generated completely asynchronously to the program itself by a different process, causes a stack frame to be allocated (possibly on top of fnc1’s old stack frame) for use by the corresponding signal handler. This could happen at any time, even before fnc1’s caller gets a chance to use the pointer returned by fnc1.


Thanks, that was interesting.


While I got my 5/5, I think the answers should have been “I seriously don’t want to know”.


Don't we need to know the sizeof int and void* on this machine?

[edit - that seems to be the point sigh]


tl;dr - C has some undefined behavior, if you plan on things working based on your experience with one compiler and one computer, you will be surprised.


I wonder how many C programmers have experience with writing multi platform/compiler code. I have done a lot of C and C++ but never had to port it so I would not be surprised if I had made a ton of mistakes.


I failed the first couple, rapidly getting into I don't know.

Answered the rest, IDK.

Laughed! Great exercise.


I got #2 wrong, and definitely should have known about integer promotion.


What are the best strategies to cope with the undefined areas in C?


The best one is to not write C. The second best is probably to read the standard, be very careful (perhaps following a secure coding guideline), and make generous use of tooling to help you catch mistakes.


Either don't use C, or use C and avoid UB with sanitizers and skill (e.g. using typedefs that guarantee the size of x).


Could someone explain why in question 5

i = 0; return i++ + ++i;

Could produce different results?


There's explanations at the bottom after you click the button.


I read it but I still don’t understand. Isn’t it 0 + 2 vs 1 + 1? How would it produce something other than 2?


Imagine a dumb piece of hardware storing your variables. Two pieces of a statement try to do conflicting things to the same variable at the same time. This can cause the data to get corrupted, or the entire chip to have a fault. The C standard allows an implementation like this.


So for the left-to-right operand evaluation order I thought it’s this: look at the left operand, take zero. Then post-increment, now it’s one. Move on to the right operand. Pre-increment what is 1, so now it’s 2. Take 2. So it 0 + 2?


How do you get 0 + 2?

I could imagine the post-increment happening after the sum: giving 0 + 1…


That’s what I imagined, and what I get with Cygwin gcc: the result is 1. I thought the post-increment was supposed to happen only after the “statement”, but I was wrong. Other compilers, like gcc on Ubuntu, return 2.

This looks like a good explanation of why both are correct and why it’s confusing: https://stackoverflow.com/a/4445841


My answer to a lot of these questions is "If you write code like this and check it in to our corporate repository, I will cut out your heart and make you eat it."


Don't forget to include whoever reviewed it.


LOL, yeah, multiplying the space character by 13. I gritted my teeth when I saw that.


Well there may be some use for the code in question 1.


1/5. I don’t know C.


By the third question it was obvious that all of them are implementation-dependent or undefined behaviour.


In practice, things in C are not as undefined as the ISO working group specifies them. It is virtually inconceivable that a mainstream compiler stack would do anything other than what you'd expect with example four. As for struct alignment, that's something that most C programmers should know is implementation-defined (which is one of the reasons we even have sizeof to begin with, apart from the mere convenience of it).

Sure, you've made your point, but you've made it in a ham-fisted way which doesn't really help people understand why a given undefined or implementation-defined behaviour is the way it is, and what things they should verify about the implementation in order to predict where their code will not work.


> It is virtually inconceivable that a mainstream compiler stack would do anything other than what you'd expect with example four.

I disagree.

I can easily conceive of a (compiler, architecture, compiler options) tuple that simply crashes with an error at compile time or runtime with that code. Namely, some compiler for a 16-bit architecture with "sanitization" options enabled and optimizations disabled.

Integer overflow is one of the easiest "undefined behavior" cases to identify with mechanical checks. Much easier than bounds checking for example, where a general solution is quite tricky.


Sure, but portable C programs are not written for systems with 16-bit machine words practically ever anymore. Nobody's expecting to run libopus 1.3.1 unmodified on an 8051 (even if it might well do, it probably uses stdint.h anyway!). Furthermore, I've used toolchains for machines with 16-bit machine words, which made int 32 bits; surely this isn't uncommon.


I think this objection boils down to your perspective on what we mean by "C".

It's reasonable for some folks (especially working programmers who need to "get stuff done") to think that "what a reasonable compiler in their problem domain" would do is what "C" means. It's equally reasonable for other folks (especially compiler writers, verification experts, researchers, etc.) to think the ISO standard is what "C" means.

It would be great for the standard to be more "reasonable" and have less undefined behavior. But I, for one, cannot think of a more horrible, thankless chore than actually trying to make that happen. So much code is written in "C", and there are so many compilers and platforms, modern and legacy, that "C" runs on, each with their own notion of "reasonable," that it will take an incredible amount of work.


C compilers predate the C standard by many years, so “C is what my compiler does” is certainly a valid perspective.

Also, the vast majority of programming languages don’t have standards at all, and people still use them productively.


Examples of languages with a standards document, describing language semantics and standard library, not necessarily an ISO one.

Java, JavaScript, Modula-2, Pascal, C++, Fortran, C#, F#, VB.NET, Eiffel, D, Ada, Common Lisp, Scheme, Python, Scala, Haskell.


> It would be great for the standard to be more "reasonable" and have less undefined behavior.

GCC and Clang are mostly compatible, at least as far as the low-hanging fruit is concerned. If you consider them the authority, that generally resolves most interesting questions about what ISO declines to specify. I do not think that there is any great burning need for ISO to go and define things more rigorously.


There are other compilers.


j = 1

k = 2

i = ++j+++++k++

i = ?

j = ?

k = ?


Failed all of them. I totally agree with what the author is writing: don’t presume anything, measure. I also tend to code in a way where it’s not necessary to know all the intricacies. It’s not as clever as many like, and it often makes for longer code, but it’s usually easier to read.


> Don’t presume anything but measure.

I think that's exactly the wrong takeaway here. Most of these have a well-defined result on a given platform (host + abi). You can measure that result. But it'll be different on a different platform. And the others don't have a well-defined result — a real-world compiler will produce a result, and you can measure that, but it might be different tomorrow. Unless you know the difference between platform-specific behavior and undefined behavior, you don't know which ones to avoid.


Rather: don't presume anything, read the language specification for the version you are writing (C89, C99, etc).

"Measuring" is exactly the wrong thing to do because often it is indicative of only your specific architecture/compiler.


The real answer is, all 5 questions don't compile so they don't return anything.


What compiler are you using?


The tone is insufferable, the test is beyond useless.


> And at this point, I only have to apologize. The test is clearly provocative and may even be a little offensive. I’m sorry if it causes any aggravation.


What a lot of people don’t get is that it’s not pointers or manual memory management or even lack of language level support for “modern features” like object oriented programming, exceptions etc. that make C a pain to use. No it is the undefined and implementation dependent behaviours. There are simply so many of them that even experienced C programmers may, at times, run into trouble.

C is essentially a portable assembler. It’s not enough to learn the language, you need to have deep understanding of the underlying hardware and compiler infrastructure.


It is an unsavory set of questions that has no bearing on practical work, and it is implementation-dependent besides.

The answer is: no, I do not know C or any other language, for that matter. I know some implementations of various languages just well enough to write sound and readable code. And as I'm forced to use a few more languages than I'd like, I rely on local/Internet search to keep my brain concentrated on accomplishing the actual task rather than effing it up trying to figure out some esoteric constructs.


The questions are a bit cute, but they’re not testing obscure corner cases of the language. It’s not like asking if you have the trigraphs memorized. It’s testing fundamental rules about how C works: overflow, integer promotion, order of operations, memory layout, etc.


They're all obscure in a "I wouldn't write code like that or let it go through review" sense. They're very synthetic cases.


But would you write code that adds integers of two different types? Then you need to know the integer promotion rules. And this is just a way of testing that.


Of course it has a bearing on practical work, and that it’s implementation-dependent is the whole point. These are mistaken assumptions people make in the real world.


I didn't do the quiz, but just looking at the first question: I have run across structure packing/padding issues, so it's not entirely useless.


This is a perfect example of what I hate about some tests.

Didn't quite know the purpose of the test: there's a difference between code for any machine and any compiler, and gcc running on some vanilla x86, which is pretty common and could have been the content (e.g.: "you say it's undefined, that's obvious, everyone knows that already, but it's still deterministic... here's a breakdown of what happens in practice", blah blah blah). There's a real difference between the kind of "knowing" here and, say, the kind of "knowing" with unallocated pointers.

If it said "esoteric implementation on exotic hardware" then it's easy, you know what they are trying to do. If it said C89 you also know what's up. But how it's presented, it's a guess.

This was endemic throughout schooling. Instructors would say "just do your best" and I'd be like "wtf? There's like 2,3, maybe 4 perspectives on this with different answers depending on how clever you're trying to be or what you're trying to get at... Might as well put "I'm thinking of a number 1 through 5" on the exam".


You can easily get bitten by at least one of those examples just by compiling your application written on x86 to ARM to get it running on Android, so not sure if it's as esoteric as you think.


Or on different compilers, or the same compiler with different options, etc.


Or a new release of the same compiler.


But the test doesn't ask about C as compiled by X compiler on Y architecture. It just asks about C.

(You also seem to be assuming GCC is more predictable about undefined behaviour than it actually is.)


> but it's still deterministic

That's an assumption though. If it's undefined, the result may even depend on the order of some internal hash table which is affected by other code.


>but it's still deterministic

Undefined behavior means the standard says demons can fly out of your nose. Those demons might behave nondeterministically.


On a typical modern arch, there are consistent and reasonable answers:

1. 8 (https://godbolt.org/z/Udcuj0)

2. 0 (https://godbolt.org/z/kZz1YQ)

3. -96 (https://godbolt.org/z/1Xq_Oo)

4. 1 (https://godbolt.org/z/MIo0s_)

5. 2 (https://godbolt.org/z/J0dsVW)

I agree, you can argue that it's unspecified, undefined, or whatever. It might not be well defined by the C specification, but none of these programs produce surprising output. Programming in the real world requires that you are able to read and write code like this, even if it requires that you investigate (and depend on) the specific behaviour of your compiler/platform.


No, for the cases that involve undefined behavior, the results can differ arbitrarily depending on compiler settings, optimization settings, the presence of seemingly irrelevant code, or, in principle, the phase of the moon.

The behavior of any program that evaluates `i++ + ++i` is undefined. The solution is not to find out how it happens to behave in some circumstances. It's to find clearer code that expresses whatever the original intent was.


Sorry, but it’s this kind of attitude that leads to impossible-to-port code, platform lock-in, subtle bugs when tool chains change, and worst case: buffer overflows and outages. These programs may today produce “well defined” output on your favorite systems, but that doesn’t change the fact that it is invoking various flavors of undefined, implementation-defined behavior. It’s not safe C just because it works for me.

It’s an unfortunate truth that programming in the real world involves programmers who dare to explore these corners of C and claim to have answers to these questions. Stay away! Knowing C means knowing what is not defined as much as knowing what is.


You have to know what the standard allows because every optimizer change tries to be more aggressive without violating it. People have been bitten by error checks whose object code was elided because the error "can't happen". You have to decide whether you need code that will always work, or code that seemed to work for a while.


Switch the compiler to ARM, which is as "typical modern arch" as it gets, and see for yourself.


I don't think it's surprising when you switch machine architecture and/or word sizes that you get different results. In fact for me, that's completely normal and to be expected.


I sure hope you're not writing the libraries that I use in C/C++ :)


> I don't think it's surprising when you switch machine architecture and/or word sizes that you get different results.

I'd like these things to be listed and marked "implementation defined", personally.


Consistent as in works for me and I ran it twice!, not consistent as in guaranteed to continue working after sudo apt upgrade gcc. Undefined behaviour can and will be used to make assumptions for optimisations that will bite you.


> produce well defined output.

Perhaps “produce consistent results” would be more appropriate.



