I'm reminded of a guy tasked with testing a 32-bit floating point library. After a number of false starts, he hit on the most effective approach possible.
Brute force.
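For a one-argument function on 32-bit floats, brute force really is tractable: 2^32 inputs is an afternoon of CPU time. A minimal sketch in C, with a hypothetical my_sqrtf standing in for the library function under test and a double-precision computation as the reference:

    #include <math.h>
    #include <stdint.h>
    #include <stdio.h>
    #include <string.h>

    /* Hypothetical function under test -- stand-in for the library's version. */
    static float my_sqrtf(float x) { return sqrtf(x); }

    /* Trusted reference: compute in double, round back to float.
       (For sqrt in particular, this double rounding is known to be safe.) */
    static float reference_sqrtf(float x) { return (float)sqrt((double)x); }

    int main(void) {
        uint64_t mismatches = 0;
        /* All 2^32 bit patterns: every float, including NaNs, infs, denormals. */
        for (uint64_t bits = 0; bits <= UINT32_MAX; bits++) {
            uint32_t b = (uint32_t)bits;
            float x, got, want;
            memcpy(&x, &b, sizeof x);
            got = my_sqrtf(x);
            want = reference_sqrtf(x);
            /* Compare bit patterns so NaN matches NaN and +0.0 differs from -0.0;
               relax this if NaN payloads differ on your platform. */
            uint32_t gb, wb;
            memcpy(&gb, &got, sizeof gb);
            memcpy(&wb, &want, sizeof wb);
            if (gb != wb) mismatches++;
        }
        printf("%llu mismatches out of 2^32 inputs\n", (unsigned long long)mismatches);
        return 0;
    }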
Oh, another story. An OG (original geek) I know once proved the floating point unit on a mainframe was broken via brute force. The bad part meant that the decimal floats, used by (and only by) the accounting department, sometimes produced erroneous results.
Me, I have a pitiful number of unit tests for the main codebase I work on. One module which does have unit tests also had a bug that resulted in us having to replace a couple of thousand units in the field.
Otherwise, most of the bugs that chap my keister aren't about procedural correctness; they're about state.
How do you test it by brute force? How is the algorithm supposed to know what the right thing to return is, unless you rewrite the whole thing, correctly this time? And, if you've done that, just replace the actual function with the test and be done with it.
It's a good idea in principle, though your third and fourth properties are not true of floating point addition.
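The parent's property list isn't quoted here, but assuming associativity is among them, a quick C demonstration of why it fails for floats:

    #include <stdio.h>

    int main(void) {
        float a = 1e8f, b = -1e8f, c = 1.0f;
        printf("%g\n", (a + b) + c); /* prints 1 */
        printf("%g\n", a + (b + c)); /* prints 0: b + c rounds back to -1e8,
                                        so the 1.0f is lost entirely */
        return 0;
    }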
When you have one available, testing the results against a reference implementation is a little more direct. Though, given how easy those property tests are, you might as well do both.
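Those property tests really are cheap to write. A minimal randomized sketch in C, checking commutativity (which does hold for IEEE 754 addition; the generator and iteration count are arbitrary choices for illustration):

    #include <math.h>
    #include <stdint.h>
    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>

    /* Crude generator: random 32-bit patterns, so NaNs, infinities and
       denormals all get exercised. (rand() is weak; fine for a sketch.) */
    static float random_float(void) {
        uint32_t bits = ((uint32_t)rand() << 16) ^ (uint32_t)rand();
        float f;
        memcpy(&f, &bits, sizeof f);
        return f;
    }

    int main(void) {
        for (long i = 0; i < 1000000; i++) {
            float a = random_float(), b = random_float();
            if (isnan(a) || isnan(b)) continue; /* NaN payload propagation varies */
            float x = a + b, y = b + a;
            /* Property: addition commutes, so this should stay silent. */
            uint32_t xb, yb;
            memcpy(&xb, &x, sizeof xb);
            memcpy(&yb, &y, sizeof yb);
            if (xb != yb) printf("counterexample: %a + %a\n", a, b);
        }
        return 0;
    }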
Your comparison source could be running on a different platform, or be slower, or ... and thus not be a drop-in replacement. (E.g. in the example of a broken floating-point unit, compare the results with a software emulation)
Hmm, true, it may have been a different platform/slower, thanks. It can't have been hardware/software, as the GP said they were testing a library (which implies software).
> How is the algorithm supposed to know what the right thing to return is
Sometimes I have, for example, MATLAB code that I 'know' is right and want to verify that my C/Python/Julia... code does the same thing. So I just call the MATLAB code, call the new code, and check that the results are the same.
Or I have a slow exact method and want to see if my faster approximation always gives an answer 'close enough' to the correct answer.
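As a concrete sketch of that second pattern, here's the classic bit-trick inverse square root checked against 1/sqrtf over a swept input range (the range and the way the error is measured are arbitrary choices for illustration):

    #include <math.h>
    #include <stdint.h>
    #include <stdio.h>
    #include <string.h>

    /* Fast approximation: the bit-trick inverse square root
       plus one Newton-Raphson step. */
    static float fast_rsqrt(float x) {
        uint32_t i;
        memcpy(&i, &x, sizeof i);
        i = 0x5f3759df - (i >> 1);
        float y;
        memcpy(&y, &i, sizeof y);
        return y * (1.5f - 0.5f * x * y * y);
    }

    int main(void) {
        float worst = 0.0f;
        /* Sweep the input range and compare against the slow, trusted answer. */
        for (float x = 1e-3f; x < 1e3f; x *= 1.001f) {
            float approx = fast_rsqrt(x);
            float exact = 1.0f / sqrtf(x); /* the "slow exact method" */
            float rel = fabsf(approx - exact) / exact;
            if (rel > worst) worst = rel;
        }
        printf("worst relative error: %g\n", worst); /* about 0.2% for this trick */
        return 0;
    }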
With floating-point math libraries especially, it's frequently the case that it's relatively easy to compute the desired result accurately by using higher precision for internal computations; e.g. you might compare against the same function evaluated in double and then rounded to float.
This sounds pretty sketchy at first, but it turns out that almost all of the difficulty (and hence almost all of the bugs) is in getting the last few bits right, and the last few bits of a much-higher-precision result don't matter when you're testing a lower-precision implementation, so combined with some basic sanity checks this is actually a very reasonable testing strategy (and widely used in industry).
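A minimal sketch of that strategy, taking expf as a hypothetical implementation under test and the C library's double-precision exp as the reference:

    #include <math.h>
    #include <stdint.h>
    #include <stdio.h>
    #include <string.h>

    int main(void) {
        uint64_t off_by_more_than_1ulp = 0;
        for (uint64_t bits = 0; bits <= UINT32_MAX; bits++) {
            uint32_t b = (uint32_t)bits;
            float x;
            memcpy(&x, &b, sizeof x);
            if (!isfinite(x)) continue;
            float got = expf(x);                /* implementation under test */
            float want = (float)exp((double)x); /* double result rounded to float */
            /* expf isn't required to be correctly rounded, so allow 1 ulp of slack. */
            if (got != want &&
                got != nextafterf(want, INFINITY) &&
                got != nextafterf(want, -INFINITY))
                off_by_more_than_1ulp++;
        }
        printf("%llu results more than 1 ulp from the reference\n",
               (unsigned long long)off_by_more_than_1ulp);
        return 0;
    }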
… this reply does nothing to answer the parent's question. It's obvious that you can run it on all possible values, but that doesn't answer the question of the return values. (unless you only want to verify "doesn't crash" or something similarly basic)
I believe the question was edited after I wrote my answer. This, or I just misread it - unfortunately HN doesn't let us know. Anyway, I'm providing another answer to the question as it is now.
In practice, testing _all_ possible inputs is usually not a feasible goal, and often downright impossible (since it would mean solving the halting problem).
While there are some languages and problem domains where formal verification methods are a possibility, for most of software development the best you can do is reduce the solution space.
A sane type system would be a start, for example.
And it's no coincidence that functional programming and practices that emphasize immutability are on the rise; Rust's ownership system is a direct consequence as well.
TDD _is_ important, if only for enabling well-factored code and somewhat guarding against regression bugs.
But - decades after Dijkstra's statement (which someone has already posted in this thread) - the code coverage honeymoon finally seems to be over.
Polyspace [0] allows full state space checking. Usually this is feasible because the kinds of modules you test are heavily constrained - 12-bit (or fewer) fixed-point values (sensor outputs / actuator inputs) - and contain no algebraic loops. With floats and/or large integers, this quickly becomes unwieldy (verification then takes weeks).
One of them would be fuzz testing.
You start with a predefined 'corpus' of inputs, which the fuzzer then mutates to reach code paths that haven't been covered yet. In the process it discovers many bugs and crashes.
The input fuzzing process is rarely purely random. There are advanced techniques that allow the fuzzer to link input data to the conditions of uncovered branches.
It is quite a useful mechanism for checking inputs, formats, and behaviour patterns (e.g. if you have two implementations of one model: a simple one that works 100% of the time but is slowish, and a more complex but very fast one).
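As a concrete example, a minimal differential harness for libFuzzer might look like this (checksum_simple and checksum_fast are hypothetical stand-ins for the 'simple but slow' and 'complex but fast' implementations):

    #include <assert.h>
    #include <stddef.h>
    #include <stdint.h>

    /* Hypothetical "simple but slow" implementation -- the trusted model. */
    static uint32_t checksum_simple(const uint8_t *buf, size_t len) {
        uint32_t sum = 0;
        for (size_t i = 0; i < len; i++)
            sum = sum * 31 + buf[i];
        return sum;
    }

    /* Hypothetical "complex but fast" implementation under test.
       (A stand-in here; in real use this would be the optimized code.) */
    static uint32_t checksum_fast(const uint8_t *buf, size_t len) {
        return checksum_simple(buf, len);
    }

    /* libFuzzer entry point; build with: clang -g -fsanitize=fuzzer,address harness.c
       The fuzzer mutates `data` to reach branches it hasn't covered yet. */
    int LLVMFuzzerTestOneInput(const uint8_t *data, size_t size) {
        assert(checksum_simple(data, size) == checksum_fast(data, size));
        return 0;
    }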
Fuzz testing is useful for testing whole systems in vivo.
Proofs are powerful, but you can still make errors in the proof or the transcription into code. And of course you don't know what other emergent behaviours will surprise you when the proved code interacts with unproved code.
Depending on your means and needs, you want to try both.
The parent was talking about full coverage. While fuzzers are great tools, able to generate test inputs for the easily coverable parts of the code, formally correct software has to be designed for provability.
Many embedded safety and medical applications prove correctness for all inputs and their combinations. Some also verify error behaviour (out of range). Granted, this is a relatively small input space, typically a few sensors.
To prove it works with all possible inputs, there are other tools at your disposal.