
  >   temp.a[dat[i]] = i; // This is what I'd meant
I see.

  >   struct blob temp1 = x; // Allowed initial value
With, I presume, an eye toward further producing:

  x.a[dat[i]] = i;
  y.a[dat[i]] = i;
?

> Compilers may replace an automatic object whose address is not observable with two objects,

That makes sense.

> do a sign-extending load from `x` and/or `y`

I assume you mean zero-extending; otherwise `x=255` would result in `result=-1`, which is clearly wrong.
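For what it's worth, the two load flavours are easy to contrast in C itself; a minimal standalone sketch (function names are mine, not from the upthread code):

```c
#include <assert.h>
#include <stdint.h>

/* Zero-extending load: upper 24 bits cleared, value preserved. */
int32_t load_zext(uint8_t x) {
    return (int32_t)(uint32_t)x;
}

/* Sign-extending load: bit 7 copied upward.  (Converting 255 to
   int8_t is implementation-defined before C23, but yields -1 on
   every common two's-complement target.) */
int32_t load_sext(uint8_t x) {
    return (int32_t)(int8_t)x;
}
```

The two agree for values 0..127 and diverge above that, which is exactly why `x=255` distinguishes them.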

> Should a compiler be required to initialize "result" in that situation, or should programmers be required to allow for the possibility that if they don't initialize an automatic object it might behave somewhat strangely?

Of course not. Result (assuming mode&3 == 0) is undefined, and behaviour characteristic of the environment is that result (aka eg eax) can hold any (say) 32-bit value (whether that's 0..FFFF'FFFF or -8000'0000..7FFF'FFFF depends on what operations are applied, but `int` suggests the latter).

None of this involves the compiler inferring objective (and frequently false) properties of the input program (such as "this loop will terminate" or "p != NULL"), though.



> With, I presume, an eye toward further producing: x.a[dat[i]] = i; y.a[dat[i]] = i;

Bingo.

> I assume you mean zero-extending; otherwise `x=255` would result in `result=-1`, which is clearly wrong.

Naturally.

> None of this involves the compiler inferring objective (and frequently false) properties of the input program (such as "this loop will terminate" or "p != NULL"), though.

Thus the need to use an abstraction model which allows optimizations to alter observable aspects of a program whose behavior is, generally, defined. I wouldn't describe such things as "behavior characteristic of the environment", though the environment would affect the ways in which the effects of optimizations might be likely to manifest themselves.

Note that programs intended for different tasks on different platforms will benefit from slightly--but critically--different abstraction models, and there needs to be a way for programs to specify when deviations from the "load/store machine model" which would normally be acceptable, aren't. For example, there should be a way of indicating that a program requires that automatic objects always behave as though initialized with Unspecified rather than Indeterminate Value.

A good general-purpose abstraction model, however, should allow a compiler to make certain assumptions about the behaviors of constructs, or substitute alternative constructs whose behaviors would be allowed to differ, but would not allow a compiler to make assumptions about the behaviors of constructs it has changed to violate them.

Consider, for example:

    typedef void proc(int);  // Ever seen this shorthand for prototypes?
    proc do_something1, do_something2, do_something3;

    void test2(int z)
    {
      if (z < 60000) do_something3(z);
    }

    int q;
    void test1(int x)
    {
      q = x*60000/60000;
      if (q < 60000) do_something1(q);
      int y = x*60000/60000;
      if (y < 60000) do_something2(y);
      test2(y);
    }
Under a good general-purpose model, a compiler could generate code that could never set q to a value greater than INT_MAX/60000, and a 32-bit compiler that did so could assume that q's value would always be in range and thus omit the comparison. A compiler could also generate code that would simply set q to x, but would forfeit the right to assume that it couldn't be greater than INT_MAX/60000.
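To make the two permitted treatments concrete, here is a sketch (my function names; assumes 32-bit int, and quiet two's-complement wraparound as the behaviour the first option's generated code happens to exhibit -- the narrowing conversion is implementation-defined before C23):

```c
#include <assert.h>
#include <stdint.h>

/* Option 1: actually multiply and divide, wrapping mod 2^32.
   Whatever x is, the wrapped product fits in 32 bits, so the
   quotient always lies within -35791..35791 (INT_MAX/60000 == 35791)
   and a following "q < 60000" test could be elided. */
int32_t muldiv_wrapped(int32_t x) {
    return (int32_t)((uint32_t)x * 60000u) / 60000;
}

/* Option 2: algebraic simplification to x itself -- cheaper, but
   the compiler then forfeits the range assumption on the result. */
int32_t muldiv_simplified(int32_t x) {
    return x;
}
```

The two agree whenever x*60000 doesn't overflow; they diverge only for |x| > 35791, which is exactly the latitude the model grants.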

There could be optimization value in allowing a compiler to treat automatic objects "symbolically", allowing the second assignment/test combination to become:

      if (x*60000/60000 < 60000) 
        do_something2(x*60000/60000);
even though the effect of the substituted expression might not be consistent. I wouldn't favor allowing inconsistent substitutions by default, but would favor having a means of waiving normal behavioral guarantees against them for local automatic objects whose address is not taken. On the other hand, there would need to be an operator which, when given an operand with a non-deterministic value, would choose in Unspecified fashion from among the possibilities; to minimize security risks that could be posed by such values, I would say that function arguments should by default behave as though passed through that operator.

The guiding principle I would use in deciding that the value substitution would be reasonable when applied to y but not q or z would be that a programmer would be able to see how y's value is assigned, and see that it could produce something whose behavior would be "unusual", but a programmer looking at test2() would have no reason to believe such a thing about z.


> I wouldn't describe such things as "behavior characteristic of the environment",

`result` being a 32-bit integer (register) of dubious signedness is behaviour characteristic of the environment, which the implementation is sometimes obliged to paper over (eg with `and eax, 0xFF`) in the interests of being able to write correct code.
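In C terms, that paper-over is an explicit narrowing conversion; a sketch (function name mine):

```c
#include <assert.h>
#include <stdint.h>

/* The conversion to uint8_t is what obliges the implementation to
   emit the masking (the `and eax, 0xFF`): whatever junk the wider
   register held, the result is defined to be the value mod 256. */
uint8_t truncate_to_byte(uint32_t reg) {
    return (uint8_t)reg;
}
```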

> A good general-purpose abstraction model, however, should allow a compiler to make certain assumptions about the behaviors of constructs, or substitute alternative constructs whose behaviors would be allowed to differ, but would not allow a compiler to make assumptions about the behaviors of constructs it has changed to violate them.

> Under a good general-purpose model, a compiler could generate code that could never set q to a value greater than INT_MAX/60000, and a 32-bit compiler that did so could assume that q's value would always be in range and thus omit the comparison. A compiler could also generate code that would simply set q to x, but would forfeit the right to assume that it couldn't be greater than INT_MAX/60000.

Yes, clearly.

> I wouldn't favor allowing inconsistent substitutions by default, but would favor having a means of waiving normal behavioral guarantees

In that case, I'm not sure what we're even arguing about; the language standard might or might not standardize a way of specifying said waiver, but as long as it's not lumped in with -On or -std=blah that are necessary to get a proper compiler, it has no bearing on real-world programmers that're just trying to get working code. Hell, I'd welcome a -Ounsafe or whatever, just to see what sort of horrible mess it makes, as long as -Ono-unsafe exists and is the default.


> Yes, clearly.

Unfortunately, the C Standard doesn't specify an abstraction model that is amenable to the optimization of usable programs.

> In that case, I'm not sure what we're even arguing about; the language standard might or might not standardize a way of specifying said waiver, but as long as it's not lumped in with -On or -std=blah that are necessary to get a proper compiler, it has no bearing on real-world programmers that're just trying to get working code. Hell, I'd welcome a -Ounsafe or whatever, just to see what sort of horrible mess it makes, as long as -Ono-unsafe exists and is the default.

The only reason for contention between compiler writers and programmers is a desire to allow compilers to optimize based upon the assumption that a program won't do certain things. The solution to that contention would be to have a means of inviting optimizations in cases where they would be safe and useful, analogous to what `restrict` would be if the definition of "based upon" wasn't so heinously broken.


> to allow compilers to optimize based upon the assumption that a program won't do certain things.

Emphasis mine. This is always wrong. Correct (and thus legitimate-to-optimize-based-on) knowledge of program behavior is derived by actually looking at what the program actually does, eg "p can never be NULL because if it was, a previous jz/bz/cmovz pc would have taken us somewhere else"[0]. Optimising "based on" undefined behaviour is only legitimate to the extent that it consists of choosing the most convenient option from the space of concrete realizations of particular undefined behaviour that are consistent with the environment (especially the hardware).

0: Note that I don't say "a previous if-else statement", because when we say "p can never be NULL", we're already in the process of looking for reasons to remove if-else statements.
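A small illustration of the sound kind of inference (assumed example, not from upthread):

```c
#include <assert.h>
#include <stddef.h>

int deref_or_default(const int *p) {
    if (p == NULL)      /* the jz/bz described above */
        return 0;
    /* From here on, "p != NULL" is knowledge derived from what the
       program actually does: the NULL case already branched away,
       so a later redundant test like (p ? *p : -1) could soundly be
       folded to *p without assuming anything about the caller. */
    return *p;
}
```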


There are many cases where accommodating weird corner cases would be expensive, and would only be useful for some kinds of program. Requiring that all implementations intended for all kinds of task handle corner cases that won't be relevant for most kinds of tasks would needlessly degrade efficiency. The problem is that there's no way for programs to specify which corner cases they do or don't need.


> Requiring that all implementations intended for all kinds of task handle corner cases that won't be relevant for most kinds of tasks would needlessly degrade efficiency.

Yes, that's what undefined behaviour is for. Eg requiring that implementations handle integer overflow needlessly degrades efficiency of the overwhelming majority of tasks where integers do not in fact overflow.

> The problem is that there's no way for programs to specify which corner cases they do or don't need.

Wait, are you just asking for (the situationally appropriate equivalent of) `(int32_t)((uint32_t)x+(uint32_t)y)` and/or `#pragma unsafe assert(p!=NULL)`? Because while it's a shame the standard doesn't provide standardized ways to specify these things (as I admitted upthread), programs are perfectly capable of using the former, and implementations are perfectly capable of supporting the latter; I'm just arguing that the defaults should be sensible.
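For concreteness, the cast idiom referred to (wraparound is fully defined in the unsigned domain; the conversion back to int32_t is implementation-defined before C23 but wraps on every mainstream two's-complement target):

```c
#include <assert.h>
#include <stdint.h>

/* Addition that wraps mod 2^32 instead of invoking undefined
   behaviour on signed overflow. */
int32_t wrap_add(int32_t x, int32_t y) {
    return (int32_t)((uint32_t)x + (uint32_t)y);
}
```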


In many cases, the semantics programmers would require are much looser than anything provided for by the Standard. For example, if a programmer requires an expression that computes (x * y / z) when there is no overflow, and computes an arbitrary value with no side effects when there is an overflow, the programmer could write the expression with unsigned and signed casting operators, but that would force a compiler to generate machine code that actually performs the multiplication and division even in cases where it knows that y will always be twice z. Under "yield any value with no side effects" semantics, a compiler could replace the expression with (x * 2), which would be much faster to compute.
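The cast-operator version described above would look something like this (a sketch, for nonnegative operands); because its wrapped result is fully pinned down, a compiler that knows y == 2*z must still emit the multiply and divide:

```c
#include <assert.h>
#include <stdint.h>

/* Fully specified for nonnegative operands: 32-bit unsigned multiply
   (wrapping on overflow) followed by unsigned divide.  The exact
   wrapped value is mandated, so substituting x*2 when y == 2*z would
   be incorrect once x*y overflows. */
int32_t muldiv_unsigned(int32_t x, int32_t y, int32_t z) {
    return (int32_t)((uint32_t)x * (uint32_t)y / (uint32_t)z);
}
```

Under the looser "any value, no side effects" semantics, the same source expression could legally become x*2; with the casts, it cannot.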



