The findings really should be independent of the code. Reproduction should occur by taking the methodology and re-implementing the software and running new experiments.
That's exactly the philosophy we follow in, e.g., particle physics, and it's a common excuse for dismissing all the guidelines made in the article.
However, this kind of validation/falsification is often done between different research groups (maybe using different but formally equivalent approaches), while people within the same group have to deal with the 10-year-old code base.
I myself had a very bad experience with extending the undocumented Fortran 77 code (lots of gotos and common blocks) of my supervisor. Finally, I decided to rewrite the whole thing including my new results instead of just somehow embedding my results into the old code, for two reasons: (1) I'm presumably faster rewriting the whole thing including my new research than struggling with the old code, and (2) I simply would not trust the numerical results/phenomenology produced by the code.
After all, I'm wasting 2 months of my PhD for the marriage of my own results with known results which -in principle- could have been done within one day if the code base would allow for it.
So yes, if it's a one-man-show I would not give too much on code quality (though unit tests and git can save quite a lot of time during development), but if there is a chance that someone else is going to touch the code in the near future, it will save your colleagues time and improve the overall (scientific) productivity.
> If it's a one-man-show I would not give too much on code quality
This makes me a little uneasy, as "I'm not too worried about code quality" can easily translate into "Yes, I know my code is full of undefined behaviour, and I don't care."
> PS: quite excited about my first post here
Welcome to HN! Reddit has more cats, Slashdot has more jokes about sharks and laser beams, but somehow we get by.
Are we talking actual undefined behavior or just behavior that's undefined by the language standard?
The latter isn't great practice, but if your environment handles behavior deterministically, and you publish the version of the compiler you're using, it doesn't seem to be a problem for this type of code.
> Are we talking actual undefined behavior or just behavior that's undefined by the language standard?
'Undefined behaviour' is a term of art in C/C++ programming; there's no ambiguity.
> if your environment handles behavior deterministically, and you publish the version of the compiler you're using, it doesn't seem to be a problem for this type of code.
Code should be correct by construction, not correct by coincidence. Results from such code shouldn't be considered publishable. Mathematicians don't get credit for invalid proofs that happen to reach a conclusion which is correct.
Again, this isn't some theoretical quibble. There are plenty of sneaky ways undefined behaviour can manifest and cause trouble. [0][1][2]
In the domain of safety-critical software development in C, extreme measures are taken to ensure the absence of undefined behaviour. If scientists adopt a sloppier attitude toward code quality, they should expect to end up publishing invalid results. Frankly, this isn't news, and I'm surprised the standards seem to be so low.
Also, of all the languages out there, C and C++ are among the most unforgiving of minor bugs, and are a bad choice of language for writing poor-quality code. Ada and Java, for instance, won't give you undefined behaviour for writing int i; int j = i;.
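To make the "sneaky" point above concrete, here is a minimal C sketch built around one well-known case, a signed-overflow check. For a positive b, the tempting test `a + b < a` performs the overflowing addition before it compares anything, so the test itself is undefined and an optimizing compiler is allowed to delete it. The helper name `checked_add` is hypothetical (it is not from this thread or from any library). It compares against the limits first, so the addition only runs when it cannot overflow.

```c
#include <limits.h>
#include <stdbool.h>

/* Hypothetical helper for illustration: adds a and b only when the
 * result fits in an int. Returns false and leaves *out untouched
 * when the sum would overflow. */
bool checked_add(int a, int b, int *out)
{
    /* Both subtractions below stay in range because of the sign
     * check on b, so the guard itself never overflows. */
    if ((b > 0 && a > INT_MAX - b) || (b < 0 && a < INT_MIN - b)) {
        return false;
    }
    *out = a + b; /* safe: the range was verified above */
    return true;
}
```

A caller would branch on the return value and read `*out` only on success.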
I think it's poor practice, but undefined behavior shouldn't instantly invalidate results. In fact, this mindset is what keeps people from publishing the code in the first place.
Let the scientists publish UB code, and even the artifacts produced, the executables. Then, if such problems are found in the code by professionals, they can investigate it fully and find if it leads to a tangible flaw that invalidates the research or not.
You would drive yourself mad pointing out places in math proofs where some steps, even seemingly important ones, were skipped. But the papers are not retracted unless such a gap actually holds a flaw that invalidates the rest of the proof.
Let them publish their gross, awful, and even buggy code. Sometimes the bugs don't affect the outcomes.
Granted, it's not a guarantee that the results are wrong, but it's a serious issue with the experiment. I agree it wouldn't generally make sense to retract a publication unless it can be determined that the results are invalid. It should be possible to independently investigate this, if the source-code and input data are published, as they should be.
(It isn't universally true that reproduction of the experiment should be practical given that the source and data are published, as it may be difficult to reproduce supercomputer-powered experiments. iirc, training AlphaGo cost several million dollars of compute time, for instance.)
> this mindset is what keeps people from publishing the code in the first place
As I explained in [0], this attitude makes no sense at all. It has no place in modern science, and it's unfortunate the publication norms haven't caught up.
Scientific publication is meant to enable critical independent review of work, not to shield scientists from criticism from their peers, which is the exact opposite.
> Let the scientists publish UB code, and even the artifacts produced, the executables. Then, if such problems are found in the code by professionals, they can investigate it fully and find if it leads to a tangible flaw that invalidates the research or not.
I'm not sure what to make of 'professionals', but otherwise I agree, go ahead and publish the binaries too, as much as applicable. Could be a valuable addition. (In some cases it might not be possible/practical to publish machine-code binaries, such as when working with GPUs, or Java. These platforms tend to be JIT based, and hostile to dumping and restoring exact binaries.)
> Code should be correct by construction, not correct by coincidence.
Glad we agree, if you're aware of how your compiler handles these things, you can construct it to be correct in this way.
It won't be portable at all (even to the next patch version of the compiler), I would never let it pass a code review, but that doesn't sound like an issue that's relevant here.
> if you're aware of how your compiler handles these things, you can construct it to be correct in this way.
I presume we agree but I'll do my usual rant against UB: Deliberately introducing undefined behaviour into your code is playing with fire, and trying to outsmart the compiler is generally a bad idea. Unless the compiler documentation officially commits to a certain behaviour (rollover arithmetic for signed types, say), then you should take steps to avoid undefined behaviour. Otherwise, you're just going with guesswork, and if the compiler generates insane code, the standards documents define it to be your fault.
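The "rollover arithmetic" case above is one where a compiler may document a behaviour (GCC and Clang, for example, accept `-fwrapv`). Where wraparound is actually wanted, one option that does not depend on any flag is to do the addition in unsigned arithmetic, which the C standard defines as modular, and then map the result back into the signed range by hand. A rough sketch, assuming 32-bit wraparound is the goal; `wrapping_add_i32` is a made-up name for illustration:

```c
#include <stdint.h>

/* Hypothetical helper: two's-complement wraparound addition with no
 * undefined or implementation-defined steps. */
int32_t wrapping_add_i32(int32_t a, int32_t b)
{
    /* Signed-to-unsigned conversion and unsigned addition are both
     * defined modulo 2^32. */
    uint32_t u = (uint32_t)a + (uint32_t)b;

    if (u <= (uint32_t)INT32_MAX) {
        return (int32_t)u; /* value fits, conversion is exact */
    }
    /* u is in [2^31, 2^32 - 1]: shift down into [0, 2^31 - 1], then
     * offset by INT32_MIN to land in [INT32_MIN, -1]. */
    return (int32_t)(u - 0x80000000u) + INT32_MIN;
}
```

With this, `wrapping_add_i32(INT32_MAX, 1)` yields `INT32_MIN` on any conforming compiler, rather than depending on what a particular version happens to emit.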
It might be reasonable to make carefully disciplined and justified exceptions, but that should be done very cautiously. JIT relies on undefined behaviour, for instance, as ultimately you're treating an array as a function pointer.
> It won't be portable at all (even to the next patch version of the compiler)
Right, doing this kind of thing is extremely fragile. Does it ever crop up in real life? I've never had cause to rely on this kind of thing.
It would be possible to use a static assertion to ensure my code only compiles on the desired compiler, preventing unpleasant surprises elsewhere, but I've never seen a situation where it's helpful.
This isn't the same thing as relying on 'ordinary' compiler-specific functionality, such as GCC's fixed-point functionality. Such code will simply refuse to compile on other compilers.
> I would never let it pass a code review, but that doesn't sound like an issue that's relevant here.
Disagree. It should be possible to independently reproduce the experiment. Robust code helps with this. Code shouldn't depend on an exact compiler version; there's no good reason it should.
> After all, I'm wasting 2 months of my PhD for the marriage of my own results with known results which -in principle- could have been done within one day if the code base would allow for it.
Sounds like it is quite good science to do that, because it puts the computation on a pair of independent feet.
Otherwise, it could just be that the code you are using has a bug and nobody notices until it is too late.
I see your and MaxBarraclough's concerns. In my case, there exist 5-6 codes which do -at their core- the same thing as ours does, and they all have been cross-checked against each other within either theoretical or numerical precision (where possible). That's the spirit that sjburt was referring to, I guess, and which triggered me because it is only true to a certain extent.
Cross-checking is good scientific practice anyway, not only because of bugs in the code (that's actually a sub-leading problem imho), but because of the degree of difficulty of the problems and the complexity of their solutions (and their reproducibility). In that sense, cross-checking should discover both scientific "bugs" and programming bugs. The "debugging" is partly also done at the community level - at least in our field of research.
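As a toy illustration of cross-checking "within numerical precision", the sketch below compares two independent implementations of the same quantity against a relative tolerance. Everything in it is an assumption made for the example: the integrand, the two quadrature rules standing in for "different codes", and the names `agree_within`, `trapezoid` and `simpson`. None of it is taken from the codes mentioned above.

```c
#include <stdbool.h>

/* Stand-in for the physics: integrating this over [0, 1] gives ln 2. */
double integrand(double x) { return 1.0 / (1.0 + x); }

/* "Code A": composite trapezoid rule with n intervals. */
double trapezoid(double (*f)(double), double a, double b, int n)
{
    double h = (b - a) / n;
    double sum = 0.5 * (f(a) + f(b));
    for (int i = 1; i < n; ++i) {
        sum += f(a + i * h);
    }
    return sum * h;
}

/* "Code B": composite Simpson rule, n must be even. */
double simpson(double (*f)(double), double a, double b, int n)
{
    double h = (b - a) / n;
    double sum = f(a) + f(b);
    for (int i = 1; i < n; ++i) {
        sum += (i % 2 ? 4.0 : 2.0) * f(a + i * h);
    }
    return sum * h / 3.0;
}

/* True when a and b differ by at most rel_tol times the larger magnitude. */
bool agree_within(double a, double b, double rel_tol)
{
    double diff = a > b ? a - b : b - a;
    double abs_a = a < 0 ? -a : a;
    double abs_b = b < 0 ? -b : b;
    double scale = abs_a > abs_b ? abs_a : abs_b;
    return diff <= rel_tol * scale;
}
```

The tolerance has to reflect the known accuracy of the less precise method. With 1000 intervals, the two rules here are expected to agree to a relative 1e-6 but not to 1e-9, so a disagreement at 1e-9 says nothing about bugs.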
However, it is also a matter of efficiency. I -and many others too- need to re-implement not because of bug-hunting/cross-checking but simply because we do not understand the "ugly" code of our colleagues, and instead of taking the risk of breaking existing code we simply write new code, which is extremely inefficient (others may take the risk and then waste months on debugging and reverse-engineering, which is also inefficient).
So my point on writing "good code" is not so much about avoiding bugs but about being kind to your colleagues, saving them nerves and time (which they can then spend on actual science) and thus also saving taxpayers' money...