I'd be interested to see benchmarks before weighing point one. At face value, I fully agree with your point and doubt it has any benefit. However, there is less that has to be swapped in for your program to run. I would be curious whether this gives an application some odd cache friendliness.
The tooling answer to this, it seems, would be to support statically linked libraries. But again, I would want to see numbers before personally worrying about this.
So a few things to consider with regard to point 1:
* Your executable is going to be loaded in whole pages no matter how small it is; on most platforms this means you'll need at least 4k of user VM.
* You'll need a page table, which has its own overhead. If someone was going to push back on the libc assertion I'd expect it here, as the PTEs for libc and the vDSO cannot be shared between processes (as far as I'm aware).
* In theory I would expect it to RUN faster, assuming it is a small toy program like the example, because there is less work to be done even with shared pages.
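To put the first bullet in concrete terms, here is a minimal sketch of the rounding involved. The helper name `pages_needed` is made up for illustration, and it only models page granularity; it says nothing about page-table overhead or what is actually resident.

```python
import mmap


def pages_needed(size_bytes, page_size=mmap.PAGESIZE):
    """Return how many whole pages a mapping of size_bytes occupies.

    Hypothetical helper: mappings are page-granular, so any non-zero
    size is rounded up to the next multiple of the page size.
    """
    if size_bytes < 0:
        raise ValueError("size_bytes must be non-negative")
    # Ceiling division without floating point.
    return -(-size_bytes // page_size)


if __name__ == "__main__":
    # Sizes chosen arbitrarily for illustration.
    for size in (1, 4096, 4097):
        n = pages_needed(size, page_size=4096)
        print(f"{size} bytes occupy {n} page(s), {n * 4096} bytes of VM")
```

With a 4k page, a one-byte image and a 4096-byte image both cost one page, and one extra byte costs a second page.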
Right, my question is along the lines of avoiding the pages of libc. An easy question here would be how many pages libc takes up. I'm assuming not many, but more than one.
I think there is a strong argument that these pages are often already resident in memory because everyone else is using them. However, if the function you used from libc would have fit in the pages you were already using for your application, I could imagine some benefit.
I continue to stress, though, that this is just imagined. Numbers would be the first thing I would have to collect before acting on this. (And I hope it doesn't sound like I am tasking you or anyone else with this. That is not my intent.)
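For anyone who does want a first number, one rough way to count the pages libc has mapped on Linux is to read `/proc/self/maps`. This is only a sketch: the function name `mapped_pages` is hypothetical, matching on the substring "libc" is an assumption about how the library file is named, and mapped pages are not the same as resident pages.

```python
import mmap
import os


def mapped_pages(maps_text, name_fragment="libc", page_size=mmap.PAGESIZE):
    """Count mapped pages per permission string for one library.

    maps_text is the content of a Linux /proc/<pid>/maps file.
    name_fragment is matched against the file name of each mapping
    (an assumption: libc's file name contains "libc").
    """
    totals = {}
    for line in maps_text.splitlines():
        # Fields: address range, perms, offset, device, inode, path.
        fields = line.split(None, 5)
        if len(fields) < 6:
            continue  # anonymous mapping, no backing path
        if name_fragment not in os.path.basename(fields[5]):
            continue
        start, end = (int(part, 16) for part in fields[0].split("-"))
        perms = fields[1]
        totals[perms] = totals.get(perms, 0) + (end - start) // page_size
    return totals


if __name__ == "__main__":
    # Linux only: inspect the mappings of this Python process.
    with open("/proc/self/maps") as maps_file:
        print(mapped_pages(maps_file.read()))
```

The split by permission is the useful part: read-only and executable mappings are file-backed and can be shared, while the `rw-p` mappings are the private, writable ones.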
At least two is the best answer I can give without naming a specific libc and architecture. Something to think about is that the binary size of libc is only half the story. Even if the executable portion of libc fits in a single page, a libc implementation holds onto a lot of per-thread and per-process statics.
A good libc developer could actually put these into separate pages based on how often they change. In other words, if a static is only ever set once, coalesce it into a page with other statics that are only set once and are not process dependent.
Why that's important: fork. When fork creates a new process, it marks the parent process's pages read-only and then performs copy-on-write when they are modified. In theory you can share both the binary and some of the statics between all the processes.
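The visible effect of copy-on-write can be sketched on a Unix-like system with a private anonymous mapping. The function name `cow_demo` is made up, and this only shows the semantics (the child's write never reaches the parent); it does not measure how many physical pages are actually shared.

```python
import mmap
import os


def cow_demo():
    """Fork with a private page and return (parent_view, child_view)."""
    # One private, anonymous page; after fork it is shared until written.
    page = mmap.mmap(-1, mmap.PAGESIZE,
                     flags=mmap.MAP_PRIVATE | mmap.MAP_ANONYMOUS)
    page[:6] = b"parent"
    read_fd, write_fd = os.pipe()

    pid = os.fork()
    if pid == 0:
        # Child: writing here triggers the copy; the parent's page is untouched.
        try:
            os.close(read_fd)
            page[:6] = b"child!"
            os.write(write_fd, page[:6])
        finally:
            os._exit(0)

    # Parent: collect what the child saw, then read our own copy.
    os.close(write_fd)
    child_view = os.read(read_fd, 6)
    os.close(read_fd)
    os.waitpid(pid, 0)
    parent_view = page[:6]
    page.close()
    return parent_view, child_view


if __name__ == "__main__":
    print(cow_demo())
```

A static that is set once before any fork never takes the write path shown in the child, which is why grouping such statics into their own pages lets every process keep sharing them.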