I'm curious what the use case of memcpy is in highly-optimised software. Are there any scenarios where copying bytes is better than using a char*+length tuple?
There can be hardware-driven requirements that force data to sit at a particular place in memory, and if you want that data to hang around you may need to move it somewhere else yourself.
There's also the case where your API accepts a pointer and a size, but you don't want to have lingering pointers into the caller's memory, so you have to copy the data over to the "inside" of the API. This kind of design is perhaps less common in demo software, but certainly plausible in embedded products which at least try to be somewhat optimized.
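A minimal sketch of that copy-in pattern, assuming a fixed-capacity internal buffer. The names `msg_slot`, `msg_slot_store` and `MSG_CAPACITY` are made up for illustration and don't come from any particular product:

```c
#include <stddef.h>
#include <string.h>

#define MSG_CAPACITY 64  /* hypothetical capacity, chosen for the example */

typedef struct {
    unsigned char data[MSG_CAPACITY];  /* storage owned by the API side */
    size_t len;                        /* number of valid bytes in data */
} msg_slot;

/* Copy the caller's bytes into storage owned by the slot, so nothing
 * inside the API keeps pointing into the caller's memory.
 * Returns 0 on success, -1 if the payload does not fit. */
int msg_slot_store(msg_slot *slot, const void *src, size_t len)
{
    if (len > MSG_CAPACITY)
        return -1;
    memcpy(slot->data, src, len);
    slot->len = len;
    return 0;
}
```

After `msg_slot_store` returns, the caller can reuse or free its buffer without affecting what the slot holds. The cost is one copy per call, which is the trade-off being described.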
That is exactly it. For example, on the Atari ST you display graphics by copying the bitmaps to the screen address.
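Roughly, and only as a sketch: the standard ST screen modes all use a 32000-byte frame buffer, so showing a full-screen picture can be a single copy. The function name `blit_fullscreen` is made up here, and the screen pointer is passed in as a parameter so the snippet does not depend on how it is obtained (on a real ST it would typically come from the OS):

```c
#include <stddef.h>
#include <string.h>

/* 320x200 with 4 bitplanes, 640x200 with 2, and 640x400 with 1
 * all come to 32000 bytes of screen memory. */
enum { ST_SCREEN_BYTES = 32000 };

/* Copy a complete, already screen-formatted bitmap to the screen.
 * Both pointers must reference at least ST_SCREEN_BYTES bytes and
 * must not overlap. */
void blit_fullscreen(void *screen, const void *bitmap)
{
    memcpy(screen, bitmap, ST_SCREEN_BYTES);
}
```

Time-critical code would often use a hand-tuned copy loop instead of the library `memcpy`, but the operation is the same: the bytes have to end up at the address the video hardware reads from.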
Much of the C code is used during precomputation of data before the actual time-critical code is run. This involves copying lots of data in order to set it up so that as little computation as possible is performed in the actual time-critical parts.
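One illustration of that kind of setup, as an assumed example rather than something taken from actual demo code: a lookup table is copied twice back to back, so the inner loop can read a window starting at any offset without wrapping the index. `TABLE_LEN`, `build_doubled_table` and `sum_window` are hypothetical names:

```c
#include <stddef.h>
#include <string.h>

enum { TABLE_LEN = 256 };  /* hypothetical table size */

/* Precomputation: store the table twice in a row.
 * doubled must have room for 2 * TABLE_LEN entries. */
void build_doubled_table(const short src[TABLE_LEN],
                         short doubled[2 * TABLE_LEN])
{
    memcpy(doubled, src, TABLE_LEN * sizeof src[0]);
    memcpy(doubled + TABLE_LEN, src, TABLE_LEN * sizeof src[0]);
}

/* Time-critical part: sum count entries starting at start, with no
 * modulo or mask on the index.
 * Requires start < TABLE_LEN and count <= TABLE_LEN. */
long sum_window(const short doubled[2 * TABLE_LEN],
                size_t start, size_t count)
{
    const short *p = doubled + start;
    long total = 0;
    size_t i;
    for (i = 0; i < count; i++)
        total += p[i];
    return total;
}
```

The extra copy doubles the memory used by the table, but it removes a per-element wrap check from the loop that runs every frame.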