Is Fortran really the best language or it's just too hard to rewrite the code in...

AnimalMuppet · on Jan 28, 2019

Thirty years of edge case fixes are harder to re-write than anyone expects. Even harder is trusting the result. Did you get all the edge cases in the re-write? Will my code be inaccurate because of one you missed?
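One way to make the "did you get all the edge cases?" question concrete is to run the old and new code side by side on awkward inputs. The sketch below is only an illustration: `legacy_norm`, `rewritten_norm`, and `find_disagreements` are hypothetical names, and the overflow/underflow case is an assumed stand-in for the kind of fix that accumulates in old numerical code.

```python
import math

def legacy_norm(x, y):
    """Hypothetical stand-in for long-lived code: rescales to avoid overflow."""
    big, small = max(abs(x), abs(y)), min(abs(x), abs(y))
    if big == 0.0:
        return 0.0
    r = small / big
    return big * math.sqrt(1.0 + r * r)

def rewritten_norm(x, y):
    """Hypothetical clean rewrite that missed the overflow/underflow cases."""
    return math.sqrt(x * x + y * y)

def find_disagreements(reference, candidate, cases, rel_tol=1e-12):
    """Return (args, expected, got) for every case where the two differ."""
    bad = []
    for args in cases:
        expected, got = reference(*args), candidate(*args)
        if not math.isclose(expected, got, rel_tol=rel_tol, abs_tol=0.0):
            bad.append((args, expected, got))
    return bad

# Ordinary inputs agree; very large and very small inputs do not.
cases = [(3.0, 4.0), (0.0, 0.0), (1e200, 1e200), (1e-200, 1e-200)]
mismatches = find_disagreements(legacy_norm, rewritten_norm, cases)
```

A check like this only covers the inputs someone thought to list, which is the trust problem described above.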

geofft · on Jan 28, 2019

I should clarify - I mean that it's the best language because there's a large body of existing code and a community around it, not because Fortran is inherently better. I think you could rewrite it in Rust or Julia or something if you had a bunch of engineering effort and also enough organizing effort to make a good community around it and convince the non-NumPy/SciPy users to move to your new thing too.

steveklabnik · on Jan 28, 2019

Fun coincidence: the benchmark game’s n-body benchmark just had a Rust version that is now the fastest one submitted so far, beating out Fortran: https://benchmarksgame-team.pages.debian.net/benchmarksgame/...

tobias2014 · on Jan 28, 2019

To pre-empt any wrong interpretations or conclusions: The Rust version relies heavily on optimization annotations and direct calls to SSE functions. The Fortran version uses no special code and relies purely on the compiler optimizing code and algorithms. The Rust version relies on hand-tuned pieces and is several times longer. It is just a matter of work for someone to write an equivalent version in Fortran or C that will be equally fast or faster.

If you want to get a real-world impression, look at the other Rust implementations that roughly correspond to the Fortran code. They are almost twice as slow. This gives you some real-world insight into how much performance you can achieve using Fortran instead of Rust when spending the same time writing code.

dagw · on Jan 28, 2019

That is the big point about Fortran that keeps getting overlooked. Sure, heavily optimized C/Rust written by an expert in writing fast numeric code will absolutely hold its own against the equivalent Fortran. However, naively written C/Rust written by non-programmers in the clearest, most obvious way possible will almost always be much slower than the equivalent Fortran code.

steveklabnik · on Jan 28, 2019

The C version was ported to be the Rust one, it seems, and uses those intrinsics too. And, eventually this code will get to be a bit higher level while having the same output; those libraries are still a bit experimental though.

Overall, good points!

igouy · on Jan 28, 2019

> The C version was ported to be the Rust one, it seems, and uses those intrinsics too.

No. Although the Rust program was initially presented to me as a "port of fastest C SIMD variant" the programmer made additional optimizations not found in the C program:

- Moving the loop from outside into "bodies_advance(..)" (SSE pipelining(?))

- Bundle intermediate variables/arrays as struct NBodySim (caching)

- Fit array-sizes within struct NBodySim to the number of bodies (caching)

----

https://www.reddit.com/r/rust/comments/akgmef/rust_nbody_ben...

steveklabnik · on Jan 28, 2019

Nice, thanks!

ChrisRackauckas · on Jan 28, 2019

There are some Julia-based matmuls which are very close in performance with OpenBLAS, showing that it can happen. Written by an undergrad too.

xiphias2 · on Jan 28, 2019

Matmul is a very common operation; you get problems when you have more exotic computations.

An example where it got harder for me to find code is numerical CDF approximation for bivariate / trivariate normal distributions. This is of course just my example, but I'm sure there are a lot of similar operations that are really hard to rewrite because the math is so complex.

ChrisRackauckas · on Jan 28, 2019

That stuff is actually the easy stuff. If you write standard Julia code for those kinds of algorithms you'll hit C or Fortran speeds your first time if you know what you're doing. Lower-level kernels like BLAS have a lot of cache optimizations in how they do things like blocking, so it took a while for things like StaticArrays to be used in a way that recreates what's going on there. There is still some work needed on mutable stack-allocated buffers in order to optimize more of the cache handling, but it is quite close now!
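For readers unfamiliar with the "blocking" mentioned here, the following is a rough sketch of the loop structure only. It is not how OpenBLAS or any Julia package implements it, and the function names and the `block` parameter are illustrative assumptions. Pure Python will not show the cache benefit; the point is just that the tiled loops visit small sub-blocks that would fit in cache in a compiled language.

```python
def matmul_naive(a, b):
    """Textbook triple loop over lists of lists."""
    n, m, p = len(a), len(b), len(b[0])
    c = [[0.0] * p for _ in range(n)]
    for i in range(n):
        for j in range(p):
            s = 0.0
            for k in range(m):
                s += a[i][k] * b[k][j]
            c[i][j] = s
    return c

def matmul_blocked(a, b, block=2):
    """Same arithmetic, but iterating tile by tile (block x block sub-matrices)."""
    n, m, p = len(a), len(b), len(b[0])
    c = [[0.0] * p for _ in range(n)]
    for i0 in range(0, n, block):          # tile row of C
        for k0 in range(0, m, block):      # tile along the shared dimension
            for j0 in range(0, p, block):  # tile column of C
                for i in range(i0, min(i0 + block, n)):
                    for k in range(k0, min(k0 + block, m)):
                        aik = a[i][k]
                        for j in range(j0, min(j0 + block, p)):
                            c[i][j] += aik * b[k][j]
    return c

a = [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]]
b = [[7.0, 8.0], [9.0, 10.0], [11.0, 12.0]]
c_naive = matmul_naive(a, b)
c_blocked = matmul_blocked(a, b, block=2)
```

Real kernels add much more on top of this (packing, register tiling, SIMD), which is where most of the tuning effort goes.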

xiphias2 · on Jan 28, 2019

If it's so easy, can you show me example code for it? The speed doesn't matter; it's the algorithm that's important. Please make sure that you handle all the numerical instabilities, and show me the Julia code for bivariate and trivariate normal CDF computation that is not using integration, but a fast, correct and precise approximation.
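To give a sense of what "numerical instabilities" can mean here, the sketch below shows one well-known case in the simpler univariate setting: computing an upper-tail probability as 1 minus the CDF loses all precision far out in the tail, while the complementary error function does not. This is an illustration only, not the bivariate or trivariate algorithm being asked about, and the function names are hypothetical.

```python
import math

def upper_tail_naive(x):
    """P(Z > x) for a standard normal, computed as 1 - CDF(x)."""
    cdf = 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))
    return 1.0 - cdf  # cancels catastrophically once cdf rounds to 1.0

def upper_tail_stable(x):
    """Same quantity via erfc, which keeps precision in the far tail."""
    return 0.5 * math.erfc(x / math.sqrt(2.0))

# Near the center both agree; at x = 10 the naive form collapses to zero.
near = (upper_tail_naive(1.0), upper_tail_stable(1.0))
far = (upper_tail_naive(10.0), upper_tail_stable(10.0))
```

Multivariate CDF codes have to deal with this kind of cancellation in more places, which is part of why they are hard to write from scratch.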

ChrisRackauckas · on Jan 28, 2019

Did you not look at StatsFuns.jl? Bivariate is here: https://github.com/JuliaStats/StatsFuns.jl/blob/e21bc26b1773... . For trivariate you'd just do the same kind of translation process if you have a Fortran code which has an appropriate license, of course. If you have an example of a Fortran file for trivariate, we can open an issue on StatsFuns.jl and get an undergrad via Google Summer of Code to translate it over, or implement it from scratch from a paper's description.

The reason why this kind of code is easy to translate is because there is a direct mapping of language features (difficult to translate code is code that uses unique language structures). The only real difficulty of mapping (non-object oriented non-distributed) Fortran into a higher level language is keeping the speed.

xiphias2 · on Jan 28, 2019

It's cool, thank you very much!

I was looking at Distributions.jl when I was searching for it, but didn't find it there.