You can use strace to get this information. Linux also offers seccomp and landlo...

fefe23 · on April 27, 2022

I would like to emphatically point out that: no, you can't. You can't use strace to get this information.

Let's say you want to use seccomp to whitelist allowed syscalls. Your code opens a file, uses stat to get the file size, then mallocs that many bytes and reads the file contents into the buffer.

Trivial program, right?

glibc will turn open into openat and stat into statx. malloc can become nothing at all, or sbrk (the brk syscall), or mmap, and maybe also munmap. If you hit an error, it will also call write to output an error message to stderr.

dietlibc will do open, stat, mmap.

This is also libc version dependent!

But can you at least use strace to know which files will be opened? No, not even that! Because the code may open some files only under certain circumstances. For example, localtime will open /etc/localtime -- but then it will cache the result for a while. /etc/localtime may be a symlink. If you construct a container you would also need the thing it points to.
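
To make the symlink point concrete, here is a rough sketch of walking a symlink chain so that every file along the way can be copied into a container. The name `symlink_chain` is made up for illustration, and it only follows links in the last path component, so treat it as a starting point rather than a complete resolver.

```python
import os

def symlink_chain(path):
    """Return path plus every symlink target on the way to the final file.

    Hypothetical helper: only the last path component is followed;
    symlinked parent directories are not expanded.
    """
    chain = [path]
    seen = {path}
    while os.path.islink(path):
        target = os.readlink(path)
        # A relative target is interpreted relative to the link's directory;
        # an absolute target replaces the path entirely.
        path = os.path.normpath(os.path.join(os.path.dirname(path), target))
        if path in seen:
            raise OSError("symlink loop at " + path)
        seen.add(path)
        chain.append(path)
    return chain

if __name__ == "__main__":
    # Every entry printed here would have to exist inside the container.
    print(symlink_chain("/etc/localtime"))
```

Even then, this only tells you about the files you already knew to ask about, which is the underlying problem.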

What if you use a malloc that has explicit hugepage support?

Also note that seccomp and landlock are very much different from pledge and unveil. I am particularly appalled by landlock as compared to unveil. Go look up the landlock API if you don't believe me, and compare it to the unveil man page. If you thought the hoops seccomp makes you jump through are ridiculous, you haven't seen anything yet.

OpenBSD is not all gold either. pledge is not transitive. If your process pledges something but is allowed to execve something else, the process you exec is not bound by your pledge.

There is MUCH room for improvement all around.

However, the persisting idea that you can use a "training mode" or observe via strace to construct a whitelist IS WRONG and dangerously so. Unless you have 100% test coverage (and if you had, why do you still need a sandbox then?) you will miss error handling and optional code paths that you didn't enable in the configuration, and handling code for circumstances you didn't trigger and didn't foresee.
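
A toy model of the "training mode" problem, in case it helps. Nothing here touches real syscalls: `toy_program`, `learn_allowlist` and `missing_at_runtime` are hypothetical names, and the syscall names recorded per branch are assumptions chosen to mirror the example above, not what any particular libc emits.

```python
def toy_program(file_exists, trace):
    """Stand-in for a real program; records the syscalls each branch would make."""
    trace.add("openat")
    if not file_exists:
        # Error path: only reached when the file is missing.
        trace.add("write")       # error message to stderr
        trace.add("exit_group")
        return None
    trace.add("statx")
    trace.add("mmap")            # buffer allocation
    trace.add("read")
    trace.add("close")
    return "contents"

def learn_allowlist(training_runs):
    """Union of the syscalls observed across the training runs."""
    allow = set()
    for file_exists in training_runs:
        toy_program(file_exists, allow)
    return allow

def missing_at_runtime(allowlist, file_exists):
    """Syscalls a later run needs that the learned allowlist does not contain."""
    needed = set()
    toy_program(file_exists, needed)
    return needed - allowlist

if __name__ == "__main__":
    learned = learn_allowlist([True])   # training only ever saw the happy path
    print(sorted(missing_at_runtime(learned, False)))  # ['exit_group', 'write']
```

The learned list is only as complete as the branches the training runs happened to reach; the first missing file in production hits a syscall the sandbox never saw.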

goodpoint · on April 27, 2022

> Unless you have 100% test coverage (and if you had, why do you still need a sandbox then?)

100% test coverage does not guarantee absence of vulnerabilities in any way.

> you will miss error handling and optional code paths that you didn't enable in the configuration, and handling code for circumstances you didn't trigger and didn't foresee.

Having sandboxed many things, this is plain false.

Normal applications and daemons have no reason to call reboot() or stime() and so on in any obscure code path.

If they do, the sandbox should stop them and that's a feature and not a bug.

staticassertion · on April 27, 2022

Their point about test coverage is that you don't know what syscalls will be made, or with what arguments, without running the program with enough coverage to find out.

This is probably the most well known issue with sandboxing - maintaining the sandbox. It's especially hard with seccomp, because you could upgrade your distro, or a dependency, and suddenly you're making a different system call.

throwaway82652 · on April 27, 2022

> Also note that seccomp and landlock are very much different from pledge and unveil.

No, this is incorrect. They fundamentally do the same things. You're trying to say they're different because the API is different, but that's completely missing the point. If you really prefer their API, then you can go and use one of the emulations of pledge and unveil that have been built on top of seccomp and landlock. They work because there isn't really anything special going on there.

mmis1000 · on April 27, 2022

Docker actually comes with seccomp and AppArmor configs that ban many things by default, and it is not noticeable at all unless you are trying something like Docker in Docker. There are so many syscalls that shouldn't even be relevant to normal programs.

fefe23 · on April 27, 2022

That's why I explicitly mention "whitelisting".

If you just want to ban some obscure syscalls and call it a day, you can do that. It will probably even be helpful to some degree.

I personally think our aspirations should be higher than "let's ban ptrace(2)".

mmis1000 · on April 27, 2022

It's actually a bit more than that. The set of syscalls Docker allows isn't fixed; it depends on which Linux capabilities the container has been granted. For example, if you grant the container CAP_SYS_PTRACE, you probably also want ptrace(2) to be whitelisted. That avoids an all-or-nothing model where your program still breaks even after the capability is added.
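
Roughly, the idea looks like the sketch below. The rule table is purely illustrative and is not Docker's actual profile or file format; `RULES` and `allowed_syscalls` are made-up names, and only the capability names (CAP_SYS_PTRACE, CAP_SYS_BOOT) are real Linux capabilities.

```python
# Illustrative rule table, NOT Docker's real profile: each rule lists the
# syscalls it allows and the capability (if any) that must be granted first.
RULES = [
    {"syscalls": {"read", "write", "openat", "close"}, "requires_cap": None},
    {"syscalls": {"ptrace"}, "requires_cap": "CAP_SYS_PTRACE"},
    {"syscalls": {"reboot"}, "requires_cap": "CAP_SYS_BOOT"},
]

def allowed_syscalls(granted_caps):
    """Build the syscall whitelist for a container with the given capabilities."""
    allowed = set()
    for rule in RULES:
        cap = rule["requires_cap"]
        # Unconditional rules always apply; gated rules need their capability.
        if cap is None or cap in granted_caps:
            allowed |= rule["syscalls"]
    return allowed

if __name__ == "__main__":
    print(sorted(allowed_syscalls(set())))
    print(sorted(allowed_syscalls({"CAP_SYS_PTRACE"})))
```

So the whitelist is derived from the capability set instead of being maintained separately from it.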

salawat · on April 28, 2022

Then decompile the binary, bust out gdb, and do some forensics. Or realize you're about to embark on a multi-decade mission to Learn It All(TM), which equates to learning the quirks of enough niches of the programming community to maybe not be surprised by what you find... eh, 40% of the time.

staticassertion · on April 27, 2022

How is the libc issue specific to Linux? OpenBSD has a libc too.

loeg · on April 27, 2022

OpenBSD doesn't support syscalls from alternative libcs, so you reliably get consistent syscall behavior.

But I believe GP is mostly talking about pledge(2), a pretty easy way to implement common sets of seccomp-like restrictions, and unveil(2), an easy way to limit path visibility. These are OpenBSD security features that Linux does not have direct equivalents of.