Having written a conforming C compiler, at one point I knew everything there was to know about C (I forget details now and then, or confuse them with C++ and D).
But knowing every engineering detail is not the same thing as knowing how to program in C effectively. It's like being the engineer who designs a Grand Prix car. It does not mean you can drive it faster around the track than anyone else. Not even close.
For example, the C preprocessor is surprisingly complicated. I had to scrap it and rewrite it completely 3 times. If you try to make use of all those oddities, my advice is don't waste your time. Over time I removed all the C preprocessor tricks from my own code and just wrote ordinary C in its place. Much better.
I couldn't agree more. After spending many years working with LLVM, which is at its heart a C compiler, and understanding why it has to do the sometimes-terrifying things it has to do to get C to run well, I've become very paranoid when writing C or C++. My C/C++ code is as boring as possible.
(In fact I try to avoid writing C or C++ whenever possible these days; undefined behavior in the language is too pernicious and unfixable without breaking compatibility. I think both languages are approaching obsolescence.)
One advantage to being an older programmer is I don't feel any need to show off any more. I try to make it so obvious that anyone would look at it and think that's so simple, anyone could do it.
It's surprisingly hard to write simple code. Any idjit can come up with Rube Goldberg code.
Are you thinking more of JS or another functional scripting language than C/C++? C doesn’t have map, so it’s not an option, and in C++ it’s called something else.
In JS, I so wish that I could switch to functional constructs like map permanently, but map and foreach are much slower than loops, an order of magnitude or more for tight loops. I’m still forced to use loops in performance critical code, even if I consider map a better choice.
Coming back to C++ after having been in JS land for 5 years, C++ feels constantly difficult to use, and all the names for the functional primitives don’t seem to make intuitive sense like they do in JS.
I just learned this about JS's map/foreach a few weeks ago. I didn't think it was possible to be more disappointed by JS than I already was, but somehow I managed it.
I've used loops so much I don't even see the loop anymore as a collection of constructs, I see it as a single thing. There are a lot of easy mistakes to make with loops, but I don't make them anymore (of course, by writing that, I will make one!). For example:
#include <stdbool.h>
#include <stddef.h>

typedef long T;

bool find(T *array, size_t dim, T t) {
    size_t i;
    for (i = 0; i < dim; i++) {
        T v = array[i];
        if (v == t)
            return true;
    }
    return false;
}
> I don’t think I’ve ever written one correctly on first try.
Please don't take offense, but this is odd to me. I honestly severely doubt I am some sort of programming super genius, but I have never had any issues setting looping logic correctly. (I took four years to teach myself programming & CS and now I've been at my first professional dev job for ~6 months.) None of my colleagues seem to have such issues either. What are you experiencing trouble with most? Off-by-one?
For those who may have trouble with loops, take heart: after 45 years of non-stop programming, often 10+ hours a day, I still find myself sometimes doing mental loop simulations with small (few element) data to make sure a loop is correct.
It's usually much easier to take extra time to make sure it's right than to debug it later.
I mean, I've written at most a couple dozen in my 7 years as a professional programmer (usually tight loops for perf), so sheer unfamiliarity is a big factor.
But yeah, syntax (what order the arguments go in) and off-by-one issues are the majority, I think. Plus figuring out what my initial accumulator needs to be.
Idk, map/filter and friends are just a much more direct mapping of how I think of programming.
Ah, I see what you mean. Yeah, that’s always awkward
Not sure how you deal with that with for loops either. Increment the iteration var in the body of the loop? (Seems scary to me, but like I said, I’ve got terrible intuition with them)
var ie1 = foo.GetEnumerator();
var ie2 = bar.GetEnumerator();
while (true)
{
    var has1 = ie1.MoveNext();
    var has2 = ie2.MoveNext();
    if (!has1 && !has2)
        break;
    if (has1)
    {
        // do something with ie1.Current
    }
    if (has2)
    {
        // do something with ie2.Current
    }
}
C++ only when Java or .NET needs its assistance, or for integration with OS APIs that require C++ (NDK, WinUI, DX).
Then C only when there is no alternative (customer wants it, we only do C here, required lib is C only e.g. SDL, ...).
It is a herculean project, but maybe some day LLVM could be rewritten into something else. After all, it isn't the first compiler stack, just the one that became the most famous.
Yes they are, but in what concerns Treble and platform libraries, they plan to keep their Java 8 variant around.
Apparently they are also adding support for desugaring Java 10 language features (yep 10 not 12) that don't rely on new JVM bytecodes (as per Google IO talk about state of Android tooling).
Rust? I know a lot of people evangelise rust — to the point of annoyance of others — but as we move into an era where the entire world is run on computers, it is just not acceptable to have decades old infrastructure susceptible to bugs often caused by someone not understanding undefined or implementation dependent behaviour.
Cargo feels too bloated to me to be suitable for something low-level like embedded. Unless the rust team can make it more appealing to use the language without cargo, I don't think there's much future.
We use rust in embedded and love cargo! Coming from C++ and the endless mess of build systems that exist there, cargo is a breath of fresh air! What don't you like about it?
The Oberon system has drivers in, well, Oberon (which is a high-level Pascal successor with garbage collection). Low-level memory access is done through magical peek/poke functions, in which all the dirtiness is concentrated (and these might not be allowed in user programs - not sure). This means the language as a whole is not littered with unsafe pointers just to service the tiny subset of programs that need them.
But yes, Rust, or even in userspace, as newer and/or more microkernel-ish OS's allow for. Apple is doing work to allow drivers to be written in Swift...
Some device classes (not to be confused with OOP classes) can only be programmed in C++, while others can be developed in any compiled language able to link to the OS APIs.
I watched it, and they're pretty clear that driver extensions (using DriverKit, like I mentioned) must be written in C or C++. System extensions can use any language.
And that's why people claiming that C++ is more complicated than C because it has an even bigger specification miss the point.
What counts is how easy it is to use in practice. You can get along just fine in C++ without knowing the exact aliasing rules from C or how to specialize a template.
What matters is that the extra features of C++ make actual programming simpler, not harder (for example destructors (RAII), the standard library, classes, ...).
> You can get along just fine in C++ without knowing the exact aliasing rules from C
Nope, you can't. These are exactly the things that introduce undefined behavior (i.e. total breakage) if you aren't very careful about what you're doing at all times. Don't take my word for it, check out what the C++ designers themselves state about the issue in the C++ Core Guidelines. C/C++ is far from simple, and thinking that you can just make things up as you go along is a serious mistake.
You are absolutely right, but my point is that when you do modern C++, in the application code, it is very unlikely that you need to use reinterpret_cast in your code, and therefore you don't need to know all the subtlety about it.
So despite C++ being more complex than C, if you limit yourself to some practical subset, it is actually easier than C.
I think it does sometimes? At least for enterprise software.
We use both C and C++ embedded as well as C++/Qt for the desktop control system, and we have no issue keeping to a sane subset of C++.
There are clearly defined rules and code review does the remaining enforcement.
And it's not even hard or time consuming as everyone is pretty much aligned and every small issue can be easily resolved with a quick chat.
You can limit yourself by choice to certain areas of the language that you know inside out (you know the asm they produce, etc.) and use it to solve problems. Don't worry about every single corner case. As you said, you can easily forget those things, especially if it's not your day-to-day job.
The same idea and principle can be found in "JavaScript - The Good Parts".
Having recently been doing some web development, I'd argue that JavaScript's design (even today, despite some modern additions) is so conducive to undebuggable spaghetti code (i.e. this ) that there really are no good parts.
Sometimes the technology is objectively the wrong choice i.e. compiling javascript to native code (Not a JIT).
> If my macros produce standards-compliant code and they make my code easier to read and understand, why shouldn't I use them?
The problem is they don't make code easier to read and understand. Worse, the unhygienic nature of C macros makes it hard to contain them.
I haven't seen your code, so I'm speaking based on what I've seen of mine and others' code. If you dial it back, the person who has to deal with your code after you leave will appreciate it.
More generally speaking, if you're doing metaprogramming with the C macro system, you've outgrown the language and should consider a more powerful one.
I once worked at a place that had a platform specific "DEBUG" log macro.
It worked something like: DEBUG(msg); Except, it was defined in such a way that you actually had to have two closing parens, like DEBUG(msg)); It looked syntactically invalid, but whatever the macro did required it.
The entire code base was littered with WTFs like that...
I hated debug macros. There just was never a clean way to write them. I was determined that D would not suffer from that problem. `debug` is a keyword in D, and you can do things like:
debug printf("I got here\n");
and the printf only gets compiled in when compiling with -debug. (Any statement can be used after the printf.) Even better, semantic checks for debug statements are relaxed - for example, purity is not checked for them.
Meaning you can embed debug printf's in functions marked 'pure', instead of having to use a monad.
Because when you work on a team, not everyone is a C Gandalf, probably not even yourself a couple of months later when fixing a bug with everyone screaming that the system is down.
Exaggerating here, but the rule of thumb is that it takes twice as long to debug code as it took to write it, so how long do you want to take to do maintenance fixes?
Debugging is twice as hard as writing the code in the first place. Therefore, if you write the code as cleverly as possible, you are, by definition, not smart enough to debug it.
Can you assure that your preprocessor tricks always generate compliant code?
For me writing embedded code is the ultimate test for your programming skills, since a lot of C toolchains for embedded devices (as in bare metal embedded) are unstable, only almost compliant and full of weird behaviors and hacks.
How do you start a project like a C compiler ("yeah, I'm going to write this")? Seems like a huge thing to write. What did you write first? Did you already have a lot of compiler knowledge, so you had a good idea of the structure/flow in your mind already? And how long did it take you?
There are a few problems with the questionnaire. "I don't know" is a pretty generic choice to be given/chosen.
Say for example, in question 5, the statement "return i++ + ++i;" is undefined, because the value of i is read and modified twice between consecutive sequence points (and of course, the order of evaluation of the operands is unspecified), which is not allowed in C. So the answer is "Undefined." (The explanation given on the page is not accurate enough in terms of C.)
And for question 1, the code is valid, but the result is not strictly defined. It depends on the implementation. So the answer is "Implementation defined."
The usage of "main()" hurts me, the strictly conforming way is to write it as "int main(void)" (or similar)
I feel like the questionnaire pisses off people who really know C.
To me "I don't know" is a very apt choice. It makes the point clear that indeed reading the code does not allow you to know the result, which is quite a pitfall.
In your comment you are jumping from "I don't know", which is the first step, to wanting to explain why.
There were multiple questions where I would have answered "It's undefined" or "It's implementation defined", but those weren't options. It's not that I don't know the answer; I know the answer ("it's implementation defined according to the spec, but on essentially every relevant platform, the result will be X"), but the "it's implementation defined" part of my answer isn't an option, so the only possible answer becomes "on essentially every relevant platform, the answer is X".
Using "I don't know" as a substitute for "I know that the standard clearly covers this, and it says that the result depends on the implementation" does seem to be designed to piss off people who know C. If they really wanted to get the point across that you don't know what is and isn't implementation defined or undefined, they shouldn't be using vague questions to mislead people; they should just plainly ask questions which people don't know the answer to.
I hate this kind of questioning where you 100% know the subject matter the quiz is asking about, but the question and possible choices is so vague you have to try to interpret what you suspect the person who wrote the quiz wants the answer to be. I once had an exam which was full of that kind of multiple choice question, and guessed the exam author's intentions wrong on most of them.
the quiz asks what each one would evaluate to. “Undefined” or “implementation dependent” are not answers to that question. “I don’t know [because it is undefined]” or “I don’t know [based on the given information]” are logically consistent answers to the question that was asked
Ok, say I made a quiz where I ask you about what `((((16 <= 16) << 16) >> 16) <= 16)` evaluates to. I could give you the options 0, 1, 16, or "I don't know". If I as a quiz author wanted to test your knowledge about shift operators and esoteric uses of equality operators and your ability to reason through an expression, I would mark "1" as correct, and "0", "16", and "I don't know" as incorrect (I would include the "I don't know" option just because that's a common thing to do in quizzes, to not force people to guess one of the options if they don't actually know the answer).
My point isn't that there's no logically consistent answer. My point is that there are _two_ logically consistent answers, and which one is correct depends on the unknowable state of mind of the quiz author. On the other hand, if the options included "It's implementation defined" or "it's undefined", the author would have made their expectations clear, and the quiz would actually test people's knowledge of C rather than people's ability to try to reason about what sort of answer the author expects.
But the point is that I do know what that will print on my computer(s), with my compiler(s), on my architecture(s). "I don't know" is too generic a statement.
I understand your claim. But I've gotta say this claim is maybe a little too aggressive. I know for a fact that on a Sun-3 (68020 SunOS Desktop Pizza Box) using either gcc or the bundled cc, all of them would have been the same answer, and the answer would have been known to the coder, before running that code (unless you unleashed one of the gcc command line dogs). Except maybe #5, because who does this?
Except from the perspective of a pragmatics linguistic analysis, "I don't know" has a social context of "There's an answer, and I don't know it."
In this case, a non-C programmer should answer "I don't know" to all of them. A person with a passing familiarity should answer similarly. A seasoned pro would be forced to answer the same. Making it a rather useless tool for distinguishing people who think they know C but are honest when faced with their limitations or those who truly know it and know the answer is undetermined, which is supposed to be the point of the exercise.
You don't think the claim is justified because some might say otherwise. "Some" is irrelevant. Some might offer lots of mutually exclusive interpretations. It's the author's intent that matters, and the context of the author's post indicates the "some" interpretation isn't it. His post begins with the question "So you think you know C?". He then goes on to present a test that is, by his own words, intended to show test takers whether or not they really understand the intricacies of C, and to make them think critically about the source of their knowledge: "I had to learn to rely on the standard instead of folklore; to trust measurements and not presumptions; to take “things that simply work” skeptically"
Never once does the author mention that C is confusing, use the word confusing, or otherwise indicate that general idea. If you're getting that impression, it's your own reading into it. I'm not even saying you'd be incorrect, but that's not the author's intent, which was the basis of my comment.
If what "some" might say about what the author intends is irrelevant, then what you say about what the author intends is also irrelevant, because you are just some person (unless you're the author). My point was why should I trust your interpretation of what the author intends more than anyone else's.
>"So you think you know C?"
That goes along with the interpretation that the point is to illustrate C is confusing. It would go along with something like "You think you know it, you think it's simple, well actually you don't know it, it's confusing."
>intended to show test takers whether or not they really understand the intricacies of C, and to make them think critically about the source of their knowledge
Yes, its intent is to indicate to test takers that a lot of them don't really understand the intricacies of C, which demonstrates that C is more confusing than they originally thought.
>Never once does the author mention that C is confusing, use the word confusing, or otherwise indicate that general idea.
Here are some quotes that indicate the idea that C is confusing:
>C is not that simple.
>It’s only reasonable that the type of short int and an expression with the largest integer being short int would be the same. But the reasonable doesn’t mean right for C.
>Actually, it’s much more complicated than that. Take a peek at the standard, you’ll enjoy it.
>The third one is all about dark corners.
>The test is clearly provocative and may even be a little offensive.
Then the author says that he did C for 15 years and thought he knew it, but then realized he didn't. That indicates to me either that the author is saying that he's not smart, or that C is confusing. The second appears to be the point the author is actually making.
My interpretation is based directly on what the author states. Your "some" is based on a vague aggregate group whose interpretations, in aggregate, would be diverse and often contradictory and mutually exclusive. Personally, I trust the explicit and implied interpretation of the author's direct statements more than your mere speculation as to what others might interpret.
If you don't like "some" then replace it with me. I interpret it as the author saying C is confusing.
My interpretation is also based on what the author states, fairly explicitly. And I don't think there's anything that explicitly contradicts my interpretation.
You say it's confusing because the author says it's not simple. The same might be said of any language. Or of any learning specialty at all. It's not synonymous with confusing. You're severely stretching the meaning of the author's words when you say the author's point was to say that C is confusing. It's what you infer because you were confused, which points to this being personal to you, not the general intent of the author.
And yet its not confusing. Given the confines of any particular implementation and compiler the behavior can be known without confusion. The author never directly mentions or implies that their intent is to convey that C in confusing. Quite the contrary, they indicate their intent is to demonstrate that certain segments of people who believe they know C don't in fact understand its intricacies.
> Given the confines of any particular implementation and compiler the behavior can be known without confusion.
Only through extreme levels of compiler code inspection, as it can vary based on optimization heuristics.
> Quite the contrary, they indicate their intent is to demonstrate that certain segments of people who believe they know C don't in fact understand its intricacies.
Demonstrating that people don't know C is subtly different from an intent of testing whether people know C. The point being made is about C itself.
There is a world of difference between "don't know" and "can't know", as the first implies a shortcoming on the side of the developer while second one states that the question is patently meaningless to someone who does master the language.
1) This made me curious. Are any of the compilers in real use nondeterministic?
2) Probably that's not needed? A normal optimizing compiler just inlines the function somewhere new — and boom? Then again, can that really happen with practical contemporary compilers and this exact statement?
Most compilers are nondeterministic in small ways. For example, it's common to use hash tables that are keyed by pointer address and then iterate over the entries in storage order, so the order in which certain things are emitted will change from run to run. This is why "deterministic builds" are such a big deal, and not just an obvious thing that you get for free.
I don't know what the chances are that such a thing could ever translate into good assembly being emitted in one run and bad assembly being emitted in the next.
Register allocation can be quite tricky, and sometimes it can only explore a small part of the problem space, so if you don't start the algorithm with exactly the same seed you might end up with significantly different code in certain functions.
Whoa there! You mean “unspecified behavior”. int i = [unspecified] means that i has some value, but the spec doesn’t determine the value. Undefined behavior means that all your secrets might be sold to the highest bidder, your centrifuges might explode, and your computer is now full of ransomware.
> Whoa there! You mean “unspecified behavior”. int i = [unspecified] means that i has some value, but the spec doesn’t determine the value. Undefined behavior means that all your secrets might be sold to the highest bidder, your centrifuges might explode, and your computer is now full of ransomware.
That's only when using Boehm GC[0] in kernel device drivers that self-modify. Or any MSVC binary.
Whilst my other comment was intended to be jovial, it is hard to say if that was accurately conveyed. So this one will be serious.
The original problem definition, as specified by @pksadiq, read thusly:
> Say for example, in question 5, the statement "return i++ + ++i;" is undefined ...
This inspired a response by @Filligree of:
> Compile it, look at the assembly. You can know. The answer will vary from place to place, but it isn't non-existent.
Given the original constraint of an undefined statement result, and the suggested activity to address same, I posited that the recommended action is an exemplar of observing the product of undefined behaviour.
You then contributed:
> You mean “unspecified behavior”.
As per c-faq.com[0], there are three categories identified relating to this topic:
1 - implementation-defined: The implementation must pick some behavior; it may not fail to compile the program.
2 - unspecified: Like implementation-defined, except that the choice need not be documented.
3 - undefined: Anything at all can happen; the Standard imposes no requirements.
Whereas you imply a standards-conformant implementation of "return i++ + ++i;" is unspecified (category #2), it is, in fact, undefined (category #3). The support for this assertion is as follows.
As per the same site, Question 3.8[1] includes:
> Between the previous and next sequence point an object shall have its stored value modified at most once by the evaluation of an expression. Furthermore, the prior value shall be accessed only to determine the value to be stored.
And further states:
> ... if an object is written to within a full expression, any and all accesses to it within the same expression must be directly involved in the computation of the value to be written. This rule effectively constrains legal expressions to those in which the accesses demonstrably precede the modification.
And concludes with an example stating:
> ... the Standard declares that it is undefined, and that portable programs simply must not use such constructs.
Therefore, the original expression presented by @pksadiq is in fact an exemplar of an undefined expression as defined by category #3 shown above. Since both it and the message to which I originally responded satisfy same, I stand by my response given to @Filligree as having had informally defined the standard C concept of "undefined behaviour."
> Whereas you imply a standards-conformant implementation of "return i++ + ++i;" is unspecified (category #2), it is, in fact, undefined (category #3).
You're misreading things. amluto's assertion is that "The answer will vary from place to place, but it isn't non-existent." is a description of category #2. That assertion is basically correct, depending on how exactly you define "place to place".
An informal definition of category #3 is "The answer can vary from place to place, or not exist at all." ideally followed by "It might crash or run unrelated code or even prevent the preceding code from running." It's flat-out wrong to say a value "isn't non-existent" when it comes to source code exhibiting undefined behavior.
No, it isn't. The correct answer to #2 is "According to the standard, the result is implementation defined, but on my target platform, 0". "I don't know" is the wrong answer.
The C specification does not say that undefined behaviour must give a deterministic result on a given platform. All you can say is "this one time I compiled and then ran this code, it gave 0". There is no requirement that the code compiles at all, nor that the same compiler on the same platform produces the same binary on every run, nor that the resulting binary produces the same result on every run, nor that the binary produces any result, nor that it doesn't sometimes produce a result and sometimes not, nor that the compiler doesn't sometimes produces a binary and sometimes not ... undefined behaviour is exactly that: undefined behaviour.
I'm well aware of what undefined behavior is. I still know it's undefined behavior and can read my compiler manual to answer the question of how the code behaves. "I don't know" is simply wrong.
> and can read my compiler manual to answer the question of how the code behaves.
Which is both not true (because the compiler manual usually won't define undefined behaviour) and irrelevant (because the questions were about C, not about a compiler).
In the example I chose (#2) most compilers totally specify the behavior. And the question was (right from the article) "what the return value would be?" In order for a function to return, it must be run. In order for a function to be run, it must be compiled. In order for a function to be compiled, there must be a compiler (or interpreter, I suppose).
You're being pedantic about something silly, but you're also wrong in your pedantry.
Somewhat related, my introductory classes involved a lot of games around pre- and post- increments and short circuiting. While I get that understanding these operations is fundamentally important, is understanding ridiculous combinations of them important? I mean, these were the basis of large portions of some quizzes and midterms. I get playing with them from a theoretical perspective, as this can literally be done in many languages, but why force freshmen to play this deep mental gymnastics? Maybe a play at making the classes weeder classes and no other reason.
The questions are testing whether you really understand the basic rules of the language. Often times, the best way to test whether you really get the rules is to raise them in some odd context, so that you can’t just pattern match to figure out the result.
I don't know how convoluted the questions on your midterms were but one good reason that kind of irritating thing pops up in tests is that it's quite common in real world C code. Think of the old K&R string copy example.
Have you ever worked with a pre-ANSI (K&R) C compiler? Omitting the return type for main is legal in those old compilers. Newer ones give you a warning.
> I feel like the questionnaire pisses off people who really know C.
I hated this test. I’ve spent 12 years working on C targeting various flavors of arm and x86.
Just because the behavior is undefined when compiled without warnings and run on a Soviet water integrator doesn’t mean the language is undefined for the 99.995% of the industry uses.
Behavior of c89 or later with -Wall -Werror on modern clang, gcc, icc, visual studio, is well understood on arm, x86, mips, risc, ppc, Cortex-m and just about every other hardware architecture.
But, C is a pia, and I’ve been using rust instead :)
It’s not just that stuff. What pissed me off was asking about the return code of a comparator. That’s just bad form. You’re only supposed to check for zero or nonzero. I have never used the value beyond that, and if you are, that’s a problem.
You’re incorrect. The result of a comparison is guaranteed to be zero or one in C. Similarly for the exclamation-point “not” operator, and || and &&.
This isn’t a recent standardization; it’s been an explicitly specified feature of C pretty much since the very beginning of the language. See page 7 of the prehistoric https://www.bell-labs.com/usr/dmr/www/cman.pdf
I found it an amusing exercise, if not terribly relevant, even as someone who spends 90% of his dev time in C.
What rubs me about these sorts of articles is they make some presumption about the importance and necessity of writing truly portable C, as if the "C Standard" were in and of itself a terribly useful tool. This is in contrast to where I live most of the time, which is "GCC as an assembler macro language" (for a popular exposition on this subject see https://raphlinus.github.io/programming/rust/2018/08/17/unde...). And yeah, reading through the problem set I was critiquing it in context of my shop's standards, where we might be packing and padding, using cacheline alignment, static assertions about sizeof things, specific integer types, etc. So these sorts of articles just come off as a little pedantic to folks like me. I don't doubt they're useful for some folks, and I guess it's interesting to come up from the depths of non-standard GNU extensions and march= flags to see what I take for granted.
It's very much worth reading, Linus Torvalds' opinion of standards that's linked in that article, but I'll link it again here: https://lkml.org/lkml/2018/6/5/769
"So standards are not some kind of holy book that has to be revered. Standards too need to be questioned."
The way I see it, a lot of compiler writers are basically taking the standard as gospel and ignoring everything else "because the standard doesn't say we can't" --- and that's a huge problem, because behaviour that the standard doesn't define often has a far more common-sense meaning that programmers expect. IMHO the onus should really be on the authors of compilers to find that reasonable meaning. In fact, the standard even suggests that one possible undefined behaviour is something like "behave in a manner characteristic of the environment" (can't remember nor be bothered looking up the standard.)
This is a common misconception. Compiler authors don't exploit undefined behavior to make themselves seem smart, or because they like breaking code. They exploit undefined behavior because somebody filed a bug saying some code was slow, and exploiting UB was the simplest way--or, in many cases, the only way--to fix the performance problem.
GCC and Clang do give you the option to avoid optimizations based on undefined behavior: compile at -O0. We think of the low-level nature of C as being good for optimization, but in many cases the C language as people expect it to work is at odds with fast code.
It's fascinating to actually dive into the specific instances of undefined behavior exploitation that get the most complaints. In each such case, there is virtually always a good reason for it. For example, treating signed overflow of integers as UB is important to avoid polluting perfectly ordinary loops with movsx instructions everywhere on x86-64. It's easy to see why compiler developers added these optimizations: someone filed a bug saying "hey, why is my loop full of movsx", and the developers fixed the problem.
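The loop shape under discussion can be sketched roughly like this (a hypothetical example, not from any particular bug report): because signed overflow is undefined, the compiler may assume a 32-bit signed index never wraps and widen it to a 64-bit register once, instead of re-sign-extending on every iteration.

```c
#include <stddef.h>

/* Hedged sketch: with a signed int index on x86-64, the compiler is
   allowed to assume i never overflows (signed overflow is UB), so it
   can hoist the widening of i out of the loop. With an unsigned index,
   wraparound is well-defined, which can block that optimization. */
long sum(const int *a, int n) {
    long s = 0;
    for (int i = 0; i < n; i++)
        s += a[i];   /* a[i] needs a 64-bit address computation */
    return s;
}
```

The point is not that this code is clever; it is that a perfectly ordinary loop is where the UB-based optimization pays off.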
Thanks, rygorous is always a great read - although sometimes a little overwhelming. If I got the gist of it, I have a small correction to your comment: the issue is about movsxd (sign extended integer indexes), not movzx (zero extension).
> It's easy to see why compiler developers added these optimizations: someone filed a bug saying "hey, why is my loop full of movsx", and the developers fixed the problem.
"fixed" by breaking other expectations. Regardless of what the spec says, that's still a stupid way to do things. There's a child comment below which examines this case in detail; and the real solution is to make the analysis better, not use UB as a catch-all excuse.
1) The standard says I must do this, so I must do it.
2) The standard doesn't say I must not do this (but does allow me to either do it or not do it), so it's totally OK if I do it.
I think you're thinking of cases covered by statement 1, and I think pretty much everyone agrees that compiler writers should behave that way for the standard to mean anything.
The issues arise in cases covered by statement 2. Just because the standard allows a behavior doesn't mean that the behavior is a good one. And yes, code relying on you not having the behavior is not following the standard, and that's something the authors of that code should consider addressing. But on the other hand, the standard may allow a lot of behaviors that only make sense in some situations but not others (totally true of the C standard, depending on the underlying hardware) and as a compiler writer you should think carefully about what behaviors you actually want to implement.
As a concrete example, you _could_ write a C compiler targeting x86-64 which has sizeof(uint64_t) == 1, sizeof(unsigned int) == 1, sizeof(unsigned long) == 2, and sizeof(unsigned long long) == 2 (so 64-bit char, 64-bit short, 64-bit int, 128-bit long, 128-bit long long). Would this be a good idea? Probably not, unless you are trying to use it as a way to test for bugs in code that you will want to run on an architecture where those sizes would actually make sense...
It's a collective action problem. If we want to give up runtime performance and get stronger guarantees about what code will be understood to mean, we should revise the standard and start using new optimizers that respect it. If every compiler goes its own way, I only benefit from what they already agreed on.
GCC and many other compilers have been known to change the consequences of undefined behavior unpredictably when upgrading, changing compiler flags, etc. For some examples that matters.
Knowing what the standard says and keeping to it as much as possible is important because every now and then, a major compiler finds some exciting new way to optimise code based on undefined behaviour, and breaks code that assumed GCC would always do some seemingly obvious reasonable thing it did when the author tested it.
> as if the "C Standard" were in and of itself a terribly useful tool
Not necessarily, I took it to mean that engineering is holistic and things like compiler behavior in the face of undefined parts of the standard are important to account for.
Hey, please don't add personal attacks on top of your substantive points in HN threads. It helps nothing and makes the thread nastier and evokes worse from others. Also it's against the site guidelines: https://hackertimes.com/newsguidelines.html.
Where the author goes wrong is in assuming that somehow "I don't know" can be a final answer to these things. No, it is absolutely fucking vital that you know how the compiler will pad your structures in C. Similarly with "what size is an int" on your architecture: on an ATmega8 this is 16 bits, but the chip can't actually do all 16-bit operations in single instructions.
I took that to be the point of the article though, that just looking at the code wasn't enough to know and you needed to go further to answer these cases for your exact use case or target platform.
Further: Unless your code is compiled, deployed to a rocket, and fired off the Earth never to return, the question of “what is my platform?” is meaningless in the context of writing good C.
So, today, using the compiler installed on your system right now, your int is 32 bits. Great. That means nothing, and changes nothing about whether your code is correct. You should not write code relying on it. Just like you should not measure the output of the questions on this test and declare that you know what the answers are.
>Unless your code is compiled, deployed to a rocket, and fired off the Earth never to return, the question of “what is my platform?” is meaningless in the context of writing good C.
While I feel the tone of your comparison was intended to be a bit hyperbolic, the reality is that the bulk of modern C development occurs in a context similar to the one you describe. Further, the idea, utterly foreign to the vast majority of software developers, that the physical machine need not be some abstract and constantly mutating target with no hope of being understood is, imo, one of the great dying arts of software engineering, a death perpetuated by the same sort of folks who think CS education should be carried on in Java.
I contend that, these days, most C is written to target a particular compiler, physical machine, and/or device.
There is vastly more old C code than new, and it didn't target the x64 or ARM architectures it's running on now. Where it wasn't portable, that was a defect that had to be fixed.
My first job was a 4GL targeting customers running DOS on the 80286, complete with runtime linking. 100% of that work has been abandoned due to incompatibility. It contributed nothing to the profession beyond what I personally learned.
There is a Mac program BBEdit that was first written to target 68K 32 bit Macs, then PPC 32 bit Macs, then 32 bit x86 Macs and then 64 bit Macs. Probably within the next 3 years it will target ARM Macs.
The author said he never did a full scale rewrite. He slowly migrated code from one platform to the next.
Today, Apple’s code runs on both ARM and x86, and with Marzipan, so will developers’ code. True, most will be in Objective-C, but some low-level code is still in C.
One immediate red flag I have noticed is using "int", "char", "short" as if they have a definite size. They don't; the C standard only guarantees a minimum size. For example, many PDPs were 36-bit. Assuming the size of a variable is common practice nowadays, but at the very least one should use uint8_t, int32_t, etc., from stdint.h.
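A minimal sketch of that stdint.h style, assuming a hosted C11 compiler: the fixed-width typedefs either have exactly the stated width on the platform or do not exist at all, so a wrong assumption fails at compile time.

```c
#include <stdint.h>
#include <limits.h>

/* Illustrative struct: each field's width is pinned down by the type
   itself rather than by folklore about the target. */
typedef struct {
    uint8_t flags;    /* exactly 8 bits, unsigned */
    int32_t counter;  /* exactly 32 bits, two's complement */
} sample_record;

/* If these types exist, these properties are guaranteed by the standard. */
_Static_assert(sizeof(uint8_t) == 1, "uint8_t is one byte");
_Static_assert(sizeof(int32_t) * CHAR_BIT == 32, "int32_t is 32 bits");
```

On a platform that cannot provide an exact-width type (some DSPs, for instance), the typedef is simply absent and the code fails to compile, which is usually what you want.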
But I was still tricked; it should be obvious in hindsight. Twelve years of schooling led me to think: if the author is asking these questions, at least one or two must be answerable (even if technically incorrect, you'd better guess the original intention of the question). So I still tried to guess and got two wrong answers... I'll have to be more careful next time...
4/5 here. In fact after the third question provided "I don't know" as an answer I started to suspect something was up — especially since the author said only one answer was the right one ... why even provide "I don't know" then, I wondered?
I knew "int" was sort of platform-dependent (it was generally 16 bits when I was learning to code; later 32 bits became more typical), so combined with that niggle and all the "I don't knows", I (correctly) reevaluated my first couple of answers.
Still, didn't realize the last one was compiler-dependent.
The third one has another implementation-defined aspect: we do not know the value of a space ' '. In ASCII it is 0x20 (32), but that depends on the system; in EBCDIC it is 0x40 (64).
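A tiny illustration of the two layers of implementation-definedness here; the printed values assume an ASCII system with an 8-bit char:

```c
#include <stdio.h>

/* The value of ' ' is implementation-defined: 0x20 in ASCII-based
   character sets, 0x40 in EBCDIC. Storing ' ' * 13 into a char then
   further depends on char's width and signedness. */
void show_space_math(void) {
    printf("' ' = %d\n", ' ');            /* 32 on ASCII systems, 64 on EBCDIC */
    printf("' ' * 13 = %d\n", ' ' * 13);  /* 416 when ' ' == 32 */

    char c = ' ' * 13;   /* commonly 416 mod 256 = 160, i.e. -96 if char is signed */
    printf("(char)(' ' * 13) = %d\n", c);
}
```

So even after fixing the character set, the final char value still depends on CHAR_BIT and on whether plain char is signed.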
I answered Idk to all. After the first two, the pattern became clear and I felt like if I wrote a compiler myself, the answers could be very different.
I've worked on 16 bit C code, 32 and now 64 bit code. So I knew that the behavior was implementation and optimization dependent. :)
I posted this in the embedded C shop where I work, with the comment that here, of all places, everyone should pass this test. Sadly only 1 in 5 passed (yours truly). Admittedly, the test is binary: you either pass or fail all of the questions (which is also sort of a giveaway).
In the end the test proved really valuable, because the "I don't know" drove the point home, especially for smart folks who don't like to answer any test, ever, with "I don't know".
Well, that's a copout. Of course, if you take absolutely any computer architecture, you can't assume simple things like sizeof(int) or data structure alignment. But if sizeof(int) is at 4 and data needs to be aligned by its own size - like on any real architecture relevant today - many of these questions have a deterministic answer. In practice, compiler bugs are a much bigger issue than architecture assumptions.
I failed on one, number 4. I bravely assumed 16-bit integers cannot exist. Can anyone name a concrete platform/compiler where int is/was 16 bits? Or is this just a theoretical option left open by the spec?
Turbo C on MS-DOS for one. In fact 16-bit int was the norm on that platform, because the architecture didn't have 32-bit general purpose registers.
In the C89 days, you'd use 'short' in aggregates (structs and array) for values you knew wouldn't exceed 16 bits so didn't want to potentially waste space; 'long' in situations where you knew 16 bits wouldn't be enough; and 'int' the rest of the time (where 16 bits was enough, and there weren't any storage benefits to outweigh the performance benefit of using the native word size).
Why couldn't they? C had already existed for a couple of decades when 32-bit machines started getting popular. `int`, as the default integer type, is usually the size of the machine word for best performance. It would make no sense to have slow, emulated 32-bit `int`s on a 16-bit system, never mind 8-bit ones.
Many C compilers target 8-bit (sometimes 16-bit) machines: MOS 6502, Zilog Z80, and Motorola 6809. Modern examples include Intel 8051, AVR and PIC.
My Amiga C compiler (Manx Aztec C) allowed either 16- or 32-bit ints. All/most system libraries used 32-bit parameters; despite this, I insisted on the 16-bit version "for performance". In hindsight this was sort of insane: one missing L (say in "1L" for casting to long) meant a not-so-quick floppy disk reboot. :-)
Anyhow, for a computer with a 16-bit-wide data bus, having 16-bit ints might be justified by performance (and/or reduced memory usage).
Sad to say I scored perfectly, due to a similar early disillusionment on embedded platforms, and years of pain porting code between 16- and 32-bit architectures when the author thought they knew the size of “int”.
At the end of the test, the author talks about automation programming for a nuclear power plant. I don’t think I could ever sleep the same at night after writing something like that.
> I don’t think I could ever sleep the same at night after writing something like that
In these situations, you likely know your hardware and know your compiler, so you can actually provide an answer for 4 of the questions. The last one is a situation where someone should tell you not to get cute in the code review.
I wrote C in telecom and finance and in both places we enforced a rule: when you define a structure, put a comment after each element that says what you think the structure offset should be, and at the end of the structure #define a constant that says what you think the size of the structure should be. In a code review, if anyone noticed something that didn't look right, you could talk about it. In testing, you could also check that sizeof(foo_s) == FOO_S_SIZE and fail if it wasn't.
In some of our code, we would test the size of various types and structures on startup and immediately exit if they weren't what we expected. We'd print type sizes to logs to help debugging if there was ever a problem. We were supporting a single code base that ran on big endian, little endian, X86, Itanium, SPARC, ARM. Compilers change, but automated tests of type and structure sizes catch things immediately.
It may sound like a lot of work, but it actually isn't at all. It also helps a lot with long-term maintainability.
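A hedged sketch of the convention described above, with illustrative names and offsets (assuming a typical ABI where int32_t is 4-byte aligned); C11's _Static_assert lets the startup-time check move to compile time:

```c
#include <stddef.h>
#include <stdint.h>

/* Annotate each member with its expected offset, and record the
   expected total size in a macro, as in the rule described above. */
typedef struct foo_s {
    int32_t id;      /* expected offset 0 */
    int16_t kind;    /* expected offset 4 */
                     /* 2 bytes of padding expected here on typical ABIs */
    int32_t value;   /* expected offset 8 */
} foo_s;

#define FOO_S_SIZE 12

/* Compile-time equivalent of the startup checks: the build breaks
   immediately if a compiler or flag change alters the layout. */
_Static_assert(offsetof(foo_s, value) == 8, "unexpected padding in foo_s");
_Static_assert(sizeof(foo_s) == FOO_S_SIZE, "foo_s size changed");
```

Pre-C11 code can do the same at startup with `if (sizeof(foo_s) != FOO_S_SIZE) abort();`, which is essentially the scheme the comment describes.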
> In some of our code, we would test the size of various types and structures on startup
This is one of the things that C++ has actually improved a lot recently: doing this with static_assert is much nicer in terms of catching problems early... And yes, it's great for long-term maintainability.
Particularly writing it in C... It isn't a language well suited to being fully defined (see this very article for why), and no, Rust/Go aren't either. But an Ada derivative or Haskell, perhaps; there's some amazing tooling for safety-critical systems, and the languages themselves lend themselves to exposing side effects.
Ada, maybe? I don't know enough about it to comment. You definitely don't want to use Haskell for that sort of work load, though, at least not directly. Laziness-by-default is precisely the sort of hard-to-reason-about logic you don't want in that sort of application.
That said, if I had no alternative but to try to tackle this problem, I would seriously consider a strategy where I write a Haskell program that generates the actual program (potentially in ASM directly) for me.
I scored perfectly. I've been programming in C since 1989, on various platforms (started with the Amiga, then VAX/VMS, Linux x86, and various embedded systems.)
If we're going to be --pedantic, shouldn't the author specify the exact standard of the C language under test? A lot of companies have varying implementations of C and perhaps some do specify some of the behavior at hand here.
The better wording would be D) not enough information to give a definitive answer. It's like the old gotcha question: what is 1+1? Of course the answer is that it depends on whether you are using binary or an integer base greater than 2.
But that reminds me of the joke: there are 10 kinds of people: those who know binary, those who don't, and those who didn't know the '10' was written in base 3.
Isn't there a more fundamental flaw in these questions? main() always returns an int; whether that's 4 or 8 bytes, or whether 0 or 1 means success or failure, depends on the implementation. Here's a bit of a discussion: https://stackoverflow.com/questions/204476/what-should-main-...
Reminds me of the dumb exams some teachers would set to trick you when in school to make themselves feel superior.
> And at this point, I only have to apologize. The test is clearly provocative and may even be a little offensive. I’m sorry if it causes any aggravation. [...] It was a research project in nuclear power plant automation, where absolutely no underspecification was tolerable.
I appreciate the apology here, and I can totally understand the concern about the spec in a safety critical environment.
Still, all questions on this test except the first are clearly examples of things you should never ever do in production code, which might undermine the message a bit? Yes, you can write bad code, and that’s true in every language I’ve ever used.
I’m guessing it would be hard to find a modern compiler on Windows, Mac or Linux that produced padding other than rounding up to the nearest 4 bytes?
sizeof(a+b) is obviously a weird thing to do.
char a = ' ' * 13 produces an overflow warning in gcc.
(((((i >= i) << i) >> i) <= i)) I hope nobody really did that.
return i++ + ++i; Not doing exactly this was drilled into us in CS 101. Still, I’d be interested to hear about a compiler that doesn’t return 1, since many people rely on the fact that ++i is pre-increment and i++ is post-increment. I don’t doubt one is out there, I’m curious to know which.
There probably weren’t many better choices 20 years ago... what would be the best choice today for a brand-new nuclear power plant?
Wow, you’re right. Me too on ubu 16. Okay, I guess it’s not about pre or post increment. Maybe I should read the spec... ;) And good reason not to do this in code!
But code can rest on several standards, not only the C standard. For example, we know that the basic source character set must contain space (C11 5.2.1), and that character constants have type int and represent a value equal to the code of the symbol (C11 5.4.4). We know the source character set in use, and hence the code of the space character; on POSIX-compatible systems we can configure a specific source character set. We know that a return statement in the main function is equivalent to a call to the exit function (C11 5.1.2.2.3), that only the 8 least significant bits of the returned value will be used (POSIX.1-2001 definition of "exit"), and that INT_MAX must be at least 32767 (C11 5.2.4.2.1), so we are sure that the result we get from the return statement in main is a positive integer from 0 to 512. Finally, if we configure the source character set to be sure that ' ' has code 32, we know for certain that we get the value 416 in the specified example. So we do know the answer to question 3, based on the C11, POSIX.1-2017, and ISO 646 standards.
My mistake: we have (32*13), with the minimal possible CHAR_BIT being 8. So it is either 416 for a char wider than eight bits, 160 for an unsigned eight-bit char, or -96 for a signed eight-bit char. That is then extended to a signed integer value (one of these three values), and we get the result as (int)(status & 0377). For all three cases the result will be 160.
This quiz wasn't illuminating at all. You generally start with assuming and validating a "C Datatype Model" i.e. ILP32/LP64 etc. for your system. Once you know that, these questions are easily answered.
Here, I figured that foo would always be 0. Wrong. It was always 0 with GCC, but this is undefined in the spec and code like this can have a different value in clang. I actually had to make a security update to my little open source project because of this (although the code I wrote did not manifest the bug in an insecure way, even with clang).
> Eventually, I had to learn to rely on the standard instead of folklore; to trust measurements and not presumptions...
Indeed, testing your assumptions, because even a defined standard may result in a differing implementation of it. Especially in critical applications, testing the expectations gives some sense of a defined behavior.
This quizz is amusing as a mental exercise and a parable, but in reality all of these cases had to be fleshed out on a real platform, with real compiler and ... specified expectations of the behavior.
None of the cases in fact communicate a clear intent, except maybe #1 to figure out the padded size, still it's somewhat open-ended. Perhaps returning a specific condition (return sizeof(struct ...)==5; ) would show a clear intent. Not that it would change the right answer, just such a case may indeed be true on a specific platform, compile flags erc.
But often we encounter UB in code that's already shipped. So it's good to have an intuition about what machine code was actually emitted, for example when deciding if a crash report is due to this particular UB, or not.
5/5, but I don't think this test is very good at capturing the more obscure features of C, they all just deal with the fact that platforms have different datatypes/alignment requirements, except for the last one. I think a better example would be the following:
    int a = 1, b = 2, i, j;
    i = a += 2, a + b;
    j = (a += 2, a + b);
What's the value of a, b, i, j? Hint: i and j are different.
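For anyone checking their answer, here is the snippet worked through (spoilers); the key is that assignment binds tighter than the comma operator:

```c
/* Returns 1 if the commented values hold; the comments walk through
   why the unparenthesized and parenthesized lines differ. */
int comma_demo(void) {
    int a = 1, b = 2, i, j;

    /* Parsed as (i = (a += 2)), (a + b); the a + b is evaluated and
       discarded, so a == 3 and i == 3 afterwards. */
    i = a += 2, a + b;

    /* Here the whole comma expression is the right-hand side: a += 2
       runs first (a becomes 5), then j gets a + b, i.e. 7. */
    j = (a += 2, a + b);

    return a == 5 && b == 2 && i == 3 && j == 7;
}
```

So the final values are a = 5, b = 2, i = 3, j = 7, and unlike the quiz questions, all of this is fully defined behavior.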
Which raises the question: why does C have those features in the first place? The only C code where it's reasonably common seems to be in crypto algorithms.
I knew there was something up because long ago when developing for Arduino boards as part of a course, my mentor educated me on the difference of size of datatypes across different architectures.
I scored 2, but mostly out of luck, being 100% sure only about the 1st question, as I've encountered alignment problems many times in the past when dealing with structures to be sent over the network, and also between different architectures (and sometimes endianness too). If memory serves, there are #pragma directives to force the compiler to align structure members to a given interval, but they're compiler-dependent and would make a non-portable piece of code even less portable.
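A sketch of those packing directives; #pragma pack is a widely supported extension (GCC, Clang, MSVC), not part of the C standard, which is exactly why the layout should still be asserted rather than assumed:

```c
#include <stddef.h>
#include <stdint.h>

/* Non-portable sketch of a wire-format header (names are illustrative):
   pack(1) removes the padding a normal ABI would insert before length. */
#pragma pack(push, 1)
typedef struct {
    uint8_t  type;     /* offset 0 */
    uint32_t length;   /* offset 1: unaligned without packing */
} wire_header;
#pragma pack(pop)

/* Verify the compiler actually honored the pragma. */
_Static_assert(sizeof(wire_header) == 5, "wire_header is unpadded");
_Static_assert(offsetof(wire_header, length) == 1, "length is packed");
```

Note that reading the unaligned length member is fine through the struct itself, but taking its address and dereferencing it on a strict-alignment architecture can fault, which is another way packed structs reduce portability.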
It is curious: with each of the questions I ran into problems with the questionnaire itself, and I tended to answer the way the writer intended.
And then you find "it's a trap".
I think the questionnaire is not honest enough; a better wording for D would have been "we need more information" or "there are programming inconsistencies"...
I thought that by selecting "I don't know" I was saying "I don't know what's happening with the code and its inner details".
I managed to answer all of the questions correctly; I recognized them as undefined. Still, there may be enough that is defined on the set of computers the program will run on to answer some of them (for example, that char is at least 8 bits, and/or that it is ASCII).
But the better thing to do, in my opinion, may be something like LLVM with macros (including standard macros for dealing with differences between systems, and user-defined macros for your own use).
Funny. My first thought of question one is we'd need to make assumptions about which architecture we're working on to know this answer. By the time I got to question 3, I realized the author's trend. This is both the curse and blessing of C, a language that gives you just barely a high level translation layer over the raw silicon.
It is not difficult to work around these limitations of C by typedef'ing definite-width types like signed int32, unsigned int8 and so on. Many embedded C headers have that as a standard way of clearing things up. Of course you can always sizeof(int) or whatever. (BTW this quiz, or one like it, has been around a long time, but it's still a good reminder.)
The author's explanations of the first three answers aren't sufficient. There is no requirement in C for `int` and `char` to be different sizes. Similarly, you don't know what the result of ' ' * 13 will be; it's architecture-dependent.
C, for all its simplicity, is a relatively complex language.
There is a minor error in the explanation for the third one: the minimum allowed value for CHAR_BIT in C is 8 (it does not affect the result, principally because the value of ' ' could be anything in the range of char).
That's the one where I wanted most to quibble with the "explanation" of why there is no set answer: to me, the headline answer is that the value of the character constant ' ' as an integer is implementation-defined (or maybe it depends on the execution environment? See, I'll get the precise wording wrong too); anyway, space is 32 in ASCII but 64 in EBCDIC. The most you could say is that it's not zero (and maybe that it's not -1? I'd have to check how EOF is defined).
Can't even make the test. Which version of C? Which platform? Under DOS in the 90's, the answer to the first question would have been 3, it's not even proposed in the options.
I'm almost not. I made the point because this level of ambiguity and "do it yourself" is consistent throughout the language.
I know why we still use C, but the use of C is inherently prone to security problems.
C does not provide bounds checking by default, so it can be forgotten (Heartbleed) and the lack of either static checking, RAII or garbage collection (Not as a library e.g. Boehm) makes memory corruption all but inevitable.
People are forgetting that it is the very "looseness" of C that is responsible for its great success. The sheer volume of code in C (specifically, any number of complex and critical software systems) is a testament to that. People keep parroting the same old tired tropes about C without reflection and thought. All the problems, both real and imaginary, in the language have been worked through/around since the beginning by simple discipline, guidelines and external libraries.

I am always annoyed when people bring up "memory corruption" as if it were some primordial sin. The power to manipulate raw memory in whatever way I want is so crucial that I am willing to live with the downside of possible corruption. In fact, most of the people I have worked with, and I myself, never found this to be as much of a problem as everybody else makes it out to be. We always followed good guidelines, had special libraries for memory allocation as needed, and testing procedures to catch memory leaks. Everything worked out fine.
In conclusion, the power given to Programmers by C far outweighs any of its perceived downsides in real-world scenarios.
This is fine for your or my software but the risk of these bugs no matter how rare is too great for mass deployed code in something similar to OpenSSL.
Any good alternative still allows you manipulate raw memory, but provide a safe alternative which makes it much harder to fuck up.
What power do I actually lose by using a safer language?
The OpenSSL "Heartbleed" bug that you bring up is not related to inherent failures of the C language but to something else. Just as an aside, I actually have some background in the implementation of security protocols (specifically the IPsec framework) and FIPS certification of a cryptographic algorithms library, though by no means am I an expert. In the security community, many people believe that "Heartbleed" was an intentional plant. See https://www.smh.com.au/technology/man-who-introduced-serious... OpenSSL is such a heavily used and vetted piece of software that the probability of this being an "accidental bug" is very, very low, and my money is on it having been deliberately inserted, i.e., C language features deliberately used towards a nefarious goal. So this is not a good example to bring up.
Now coming to your other point: in today's environment, it is true that for the most part you do not lose much by using a safer language, because somebody else has done the dirty work in the implementation of that language's runtime, compilers, libraries and ABIs. Without the latter you cannot have the former. After all, at some point you have to move out of the cocoon provided by the language and meet real hardware (a good example is bare-metal programming on MCUs). And that is where C is needed, and any challengers have to provide exactly similar "ugly, dangerous and unsafe" features if they want to dethrone the champ.
Yeah, I assumed short would be at least as big as a char and that this would thus be comparing the size of short against short. Didn't realize it would get promoted to int.
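A small sketch of the promotion being described: the integer promotions turn operands narrower than int into int before arithmetic, so the sum of two shorts has type int.

```c
/* Returns 1 when the promotion behaves as described: sizeof applied to
   s1 + s2 sees the promoted type int, not short. (Note sizeof does not
   evaluate its operand, so no arithmetic actually happens here.) */
int promotion_demo(void) {
    short s1 = 1, s2 = 2;
    return sizeof(s1) == sizeof(short)
        && sizeof(s1 + s2) == sizeof(int);
}
```

On a platform where short and int happen to be the same width the comparison would not reveal the promotion, which is part of why the quiz answer is platform-dependent.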
Went to an IRC chat room when I was learning C in school. Asked if you could return a pointer to something that lives on the stack. Was talked down to by an all-knowing dude telling me to go read K&R again. Proceeded to write a code sample [1] that showed it is possible (it's not really stable, but works reliably in recursive calls IIRC).
I do not like this attitude (then again it was just one random dude).
> it is possible (it’s not really stable but works reliably in recursive calls IIRC). [...] I do not like this attitude
You might want to listen. You’re getting the K&R comment and the downvotes because this does not work, ever. It’s a really, really bad idea. In recursive calls, it might not crash right away, but you will have bad data, the memory at the pointer address will have been overwritten by the next stack frame that’s placed there.
Don’t ever return pointers to local memory because the memory is “gone” and unsafe to use the moment your function returns. Even if you try it and think it works, it can and probably will crash or run incorrectly in any other scenario - different person, different computer, different compiler, different day...
Your comments about getting a warning and ‘However if you wrap the local’s address... it “works”’ should be clues. The warning is the compiler telling you not to do it. The workaround doesn’t work, it only compiles. By using aliasing, you’re only tricking the compiler into not warning you, but the warning is there for a reason.
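For completeness, the two well-defined alternatives look something like this (function names are illustrative): give the object a lifetime that outlives the call, or let the caller own the storage.

```c
#include <stdlib.h>
#include <string.h>

/* Option 1: heap allocation outlives the call; the caller must free(). */
char *make_greeting(void) {
    char *p = malloc(6);
    if (p)
        memcpy(p, "hello", 6);   /* includes the terminating NUL */
    return p;                    /* heap memory, valid after return */
}

/* Option 2: the caller provides the buffer, so no lifetime question
   arises; returns 0 on success, -1 if the buffer is too small. */
int fill_greeting(char *buf, size_t n) {
    if (n < 6)
        return -1;
    memcpy(buf, "hello", 6);
    return 0;
}
```

A `static` local is a third option, at the cost of reentrancy; all three avoid the dangling pointer entirely.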
Listening to what ? To the dude that tells me that's not possible and proceeds to dump a big pile of authority on top of my head or to my own experiment that tells me another story ?
I would have preferred to be told:
- yes and no. You'll get warnings if you try to return a pointer to a local, however, doing this and that, you can manage to do it.
- but once you have achieved that, the result will be dependent on the way the stack is handled (not really in your control). You'll feel some comfort doing this in recursive calls, however beware of signal.h.
But this isn't the answer I received. I guess some C programmers don't distinguish between what you can do (however risky) and what you should do. Also, when someone asks such "weird" questions, do not assume he's a beginner with no notion of which constructs he can handle safely; maybe he's someone trying to find the limits of C, and once those limits are identified, it can be a good conversation starter about C's internals and the ways various compilers differ.
Edit: also downvotes on HN are not like downvotes on Reddit: there's actually a limit (-2 ?). Below this the comment disappears. Conclusion: only downvote when the comment engages in antisocial behavior (not respecting the rules or common human decency, etc ...), not when you disagree with it. I always upvote an unfairly downvoted comment for these reasons.
I was trying to help by explaining it, instead of saying go read K&R, but I don’t get the feeling you really heard or understood me. There is no other story. There is no yes and no. There is only no. You cannot manage to do it. It does not work to return local memory from a function, ever, period. Once you return, it is 100% unsafe to try to use the memory from your previous stack. There is absolute zero comfort in recursive calls.
You are mistaking some luck in having it not crash once for thinking that it’s okay in some situations. It’s not okay under any circumstances. That’s what makes this even more dangerous. Your program could crash at any time. It might run a thousand times and then suddenly start crashing. It might always run for you, and then crash on other people. But just because it runs once without crashing doesn’t mean it’s working.
A signal is not the only way your function’s stack can get stomped on the very next instruction after you return. Other processes and other threads can do it, the memory system can relocate your program or another one into your previous memory space. Recursive calls are guaranteed to stomp on previous stack frames when your recursion depth decreases and then increases, the previous stack will be overwritten.
Returning a pointer to a local stack frame is always incorrect. It’s not risky, it’s wrong.
BTW: you have the ability to see comments below the downvote limit, go to your profile settings and turn on showdead.
I didn’t downvote you, if that’s why you were trying to explain voting behavior to me, but you will find on HN that downvotes and upvotes both happen for a wide variety of reasons, and are not limited to whether people agree or whether the comments are polite. Downvotes are often cast for comments that break site guidelines; for example, just failing to assume good faith can get you downvoted. So can making blanket generalizations about a group of people, like the above “I guess C programmers do not know the difference...”. See the comments section here: https://hackertimes.com/newsguidelines.html
I sometimes upvote what appear to be unfairly downvoted comments to me. I usually upvote people who read and respond to me, regardless of whether I agree with them.
?? I don’t understand what you mean. Those other languages don’t have pointers, they only have references, but what do they have to do with this?
Why do you still think there’s some yes in C? It’s not making sense yet that your memory is gone after you return? Returning a pointer to a local variable is exactly the same as calling delete or free on a pointer and then reading from it. You officially don’t own the memory after a return statement, so if you try to use it, then what happens is indeterminate. Again, since it doesn’t seem to be sinking in: it is always wrong to return a pointer to local memory. But, if you really really don’t want to listen, and you’re sure it works sometimes, then I say go for it!
Signal handlers allow C programs to respond to events outside of the normal control flow (see signal.h, etc.). This means that once a function, say fnc1, has returned, the memory on the stack that was used by fnc1 can end up being reused at any point in time. A signal, perhaps generated completely asynchronously to the program itself by a different process, causes a stack frame to be allocated (possibly on top of fnc1’s old stack frame) for use by the corresponding signal handler. This could happen at any time, even before fnc1’s caller gets a chance to use the pointer returned by fnc1.
tl;dr - C has some undefined behavior, if you plan on things working based on your experience with one compiler and one computer, you will be surprised.
I wonder how many C programmers have experience with writing multi-platform/multi-compiler code. I have done a lot of C and C++ but never had to port it, so I would not be surprised if I had made a ton of mistakes.
The best one is to not write C. The second best is probably to read the standards papers, be very careful (perhaps following a secure coding guideline), and make generous use of tooling to help you catch mistakes.
Imagine a dumb piece of hardware storing your variables. Two pieces of a statement try to do conflicting things to the same variable at the same time. This can cause the data to get corrupted, or the entire chip to have a fault. The C standard allows an implementation like this.
So for the left-to-right operand evaluation order I thought it’s this: look at the left operand, take zero. Then post-increment, now it’s one. Move on to the right operand. Pre-increment what is 1, so now it’s 2. Take 2. So is it 0 + 2?
That’s what I imagined, but what I get with Cygwin gcc is 1. I thought that the post-increment was supposed to happen only after the “statement”, but I was wrong. Other compilers, like gcc on Ubuntu, return 2.
My answer to a lot of these questions is "If you write code like this and check it in to our corporate repository, I will cut out your heart and make you eat it."
In practice, things in C are not as undefined as the ISO working group specifies them. It is virtually inconceivable that a mainstream compiler stack would do anything other than what you'd expect with example four. As for struct alignment, that's something that most C programmers should know is implementation-defined (which is one of the reasons we even have sizeof to begin with, apart from the mere convenience of it).
Sure, you've made your point, but you've made it in a ham-fisted way which doesn't really help people understand why a given undefined or implementation-defined behaviour is the way it is, and what things they should verify about the implementation in order to predict where their code will not work.
> It is virtually inconceivable that a mainstream compiler stack would do anything other than what you'd expect with example four.
I disagree.
I can easily conceive of a (compiler, architecture, compiler options) tuple that simply crashes with an error at compile time or runtime with that code. Namely, some compiler for a 16-bit architecture with "sanitization" options enabled and optimizations disabled.
Integer overflow is one of the easiest "undefined behavior" cases to identify with mechanical checks. Much easier than bounds checking for example, where a general solution is quite tricky.
Sure, but portable C programs are not written for systems with 16-bit machine words practically ever anymore. Nobody's expecting to run libopus 1.3.1 unmodified on an 8051 (even if it might well do, it probably uses stdint.h anyway!). Furthermore, I've used toolchains for machines with 16-bit machine words, which made int 32 bits; surely this isn't uncommon.
I think this objection boils down to your perspective on what we mean by "C".
It's reasonable for some folks (especially working programmers who need to "get stuff done") to think that "what a reasonable compiler in their problem domain" would do is what "C" means. It's equally reasonable for other folks (especially compiler writers, verification experts, researchers, etc.) to think the ISO standard is what "C" means.
It would be great for the standard to be more "reasonable" and have less undefined behavior. But I, for one, cannot think of a more horrible, thankless chore than actually trying to make that happen. So much code is written in "C", and there are so many compilers and platforms, modern and legacy, that "C" runs on, each with their own notion of "reasonable," that it will take an incredible amount of work.
> It would be great for the standard to be more "reasonable" and have less undefined behavior.
GCC and Clang are mostly compatible, at least as far as the low-hanging fruit is concerned. If you consider them the authority, that generally resolves most interesting questions about what ISO declines to specify. I do not think that there is any great burning need for ISO to go and define things more rigorously.
Failed all of them. I totally agree with what the author is writing. Don’t presume anything; measure. I also tend to code in a way that makes it unnecessary to know all the intricacies. It’s not as clever as many like, and often makes for longer code, but it is usually easier to read.
I think that's exactly the wrong takeaway here. Most of these have a well-defined result on a given platform (host + abi). You can measure that result. But it'll be different on a different platform. And the others don't have a well-defined result — a real-world compiler will produce a result, and you can measure that, but it might be different tomorrow. Unless you know the difference between platform-specific behavior and undefined behavior, you don't know which ones to avoid.
> And at this point, I only have to apologize. The test is clearly provocative and may even be a little offensive. I’m sorry if it causes any aggravation.
What a lot of people don’t get is that it’s not pointers or manual memory management or even lack of language level support for “modern features” like object oriented programming, exceptions etc. that make C a pain to use. No it is the undefined and implementation dependent behaviours. There are simply so many of them that even experienced C programmers may, at times, run into trouble.
C is essentially a portable assembler. It’s not enough to learn the language, you need to have deep understanding of the underlying hardware and compiler infrastructure.
It is an unsavory set of questions that has no bearing on practical work, and is implementation-dependent besides.
The answer is: no, I do not know C, or any other language for that matter. I know some implementations of various languages just well enough to write sound and readable code. And as I’m forced to use a few more languages than I’d like, I rely on local/Internet search to keep my brain concentrated on accomplishing the actual task rather than effing it up trying to figure out some esoteric construct.
The questions are a bit cute, but they’re not testing obscure corner cases of the language. It’s not like asking if you have the trigraphs memorized. It’s testing fundamental rules about how C works: overflow, integer promotion, order of operations, memory layout, etc.
But would you write code that adds integers of two different types? Then you need to know the integer promotion rules. And this is just a way of testing that.
Of course it has a bearing on practical work, and that it’s implementation-dependent is the whole point. These are mistaken assumptions people make in the real world.
This is a perfect example of what I hate about some tests.
I didn't quite get the purpose of the test. There's a difference between code for any machine and any compiler, and gcc running on some vanilla x86, which is pretty common, and that could have been the content (e.g.: you say it's undefined, that's obvious, everyone knows that already, but it's still deterministic... here's a breakdown of what happens in practice, blah blah blah). There's a real difference between the kind of "knowing" here and, say, the kind of "knowing" with unallocated pointers.
If it said "esoteric implementation on exotic hardware" then it's easy, you know what they are trying to do. If it said C89 you also know what's up. But how it's presented, it's a guess.
This was endemic throughout schooling. Instructors would say "just do your best" and I'd be like "wtf? There's like 2, 3, maybe 4 perspectives on this with different answers depending on how clever you're trying to be or what you're trying to get at... Might as well put 'I'm thinking of a number 1 through 5' on the exam."
You can easily get bitten by at least one of those examples just by compiling your application written on x86 to ARM to get it running on Android, so not sure if it's as esoteric as you think.
I agree, you can argue that it's unspecified, undefined, or whatever. It might not be well defined by the C specification, but none of these programs produce surprising output. Programming in the real world requires that you are able to read and write code like this, even if it requires that you investigate (and depend on) the specific behaviour of your compiler/platform.
No, for the cases that involve undefined behavior, the results can differ arbitrarily depending on compiler settings, optimization settings, the presence of seemingly irrelevant code, or, in principle, the phase of the moon.
The behavior of any program that evaluates `i++ + ++i` is undefined. The solution is not to find out how it happens to behave in some circumstances. It's to find clearer code that expresses whatever the original intent was.
Sorry, but it’s this kind of attitude that leads to impossible-to-port code, platform lock-in, subtle bugs when tool chains change, and worst case: buffer overflows and outages. These programs may today produce “well defined” output on your favorite systems, but that doesn’t change the fact that it is invoking various flavors of undefined, implementation-defined behavior. It’s not safe C just because it works for me.
It’s an unfortunate truth that programming in the real world involves programmers who dare to explore these corners of C and claim to have answers to these questions. Stay away! Knowing C means knowing what is not defined as much as knowing what is.
You have to know what the standard allows because every optimizer change tries to be more aggressive without violating it. People have been bitten by error checks whose object code was elided because the error "can't happen". You have to decide whether you need code that will always work, or code that seemed to work for a while.
I don't think it's surprising when you switch machine architecture and/or word sizes that you get different results. In fact for me, that's completely normal and to be expected.
Consistent as in "works for me and I ran it twice!", not consistent as in guaranteed to continue working after sudo apt upgrade gcc. Undefined behaviour can and will be used to make assumptions for optimisations that will bite you.