Hacker Times | new | past | comments | ask | show | jobs | submit | login

> You don't debug a unikernel, mostly; you just kill and replace it

Curious how you would drive to root cause for the bugs that caused the crash in the first place. If you don't root-cause them, won't subsequent versions retain the same bugs?

There are bugs that can only manifest themselves in production. Any system where we don't have the ability to debug and reproduce these classes of problems in prod is essentially a non-starter for folks looking to operate reliable software.



Personally, I wouldn't use unikernels for my custom app-tier code† (unless my app-tier either was Erlang, or had a framework providing all the same runtime infrastructure that Erlang's OTP does.)

Instead, unikernels seem to me to be a good way to harden high-quality, battle-tested server software, of two main kinds:

• Long-running persistent-state servers that have their own management capabilities. For example, hardening Postgres (or Redis, or memcached) into a unikernel would be a great idea. A database server already cleans and maintains itself internally, and already offers "administration" facilities that completely moot OS-level interaction with the underlying hardware. (In fact, Postgres often requires you to avoid doing OS-level maintenance, because Postgres needs to manage changes on the box itself to be sure its own ACID guarantees are maintained. If the software has its own dedicated ops role—like a DBA—it's likely a perfect fit for unikernel-ification.)

• Entirely stateless network servers. Nginx would make a great unikernel, as would Tor, as would Bind (in non-authoritative mode), as would OpenVPN. This kind of software can be harshly killed+restarted whenever it gets wedged; errors are almost never correlated/clustered; and error rates are low enough (below the statistical noise floor, where the problem may as well have been a cosmic ray flipping bits of memory) that you can just wash your hands of responsibility for the few bugs that do occur (unless you're operating at Facebook/Google scale.) This is the same sort of software that currently works great containerized.

---

† The best solution for your custom app-tier, if you really want to go the unikernels-for-everything route, might be a hybrid approach. If you run your app in both unikernel-image and OS-process-on-a-Linux-VM setups, you'll get automatic instrumentation "for free" from the Linux-process instances, and production-level performance+security from the unikernel instances. You could effectively think of the Linux-process images as being your "debug build."

Two ways of deploying such hybrid releases come to mind:

• Create cluster-pools consisting of nodes from both builds and load-balance between them. Any problems that happen should eventually happen on an instrumented (OS-process) node. This still requires you to maintain Linux VMs in production, though.

• Create a beta program for your service. Run the Linux-process images in the "beta production" environment, and the unikernel images in the "stable production" environment. Beta users (a self-selected group) will be interacting with your new code and hopefully making it fall down in a place where it's still instrumented; paying customers will be working with a black-box system from which, hopefully, most of the faults will have been worked out. Weird things that happen in stable can be investigated (for a stateful app) by mirroring the state to the beta environment, or (for a stateless app) by accessing the data that causes the weird behavior through the beta frontend.
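The first option above — a mixed pool with most traffic on unikernel nodes and a trickle on instrumented ones — can be sketched as a weighted load-balancer config. This is a minimal illustration assuming an nginx frontend; the addresses, port, and the 9:1 split are all hypothetical:

```nginx
upstream app_backend {
    # Unikernel instances take most of the traffic (hypothetical addresses).
    server 10.0.1.10:8080 weight=9;
    server 10.0.1.11:8080 weight=9;

    # One instrumented Linux-process "debug build" instance, so that
    # rare failures eventually land on a node you can introspect.
    server 10.0.2.10:8080 weight=1;
}

server {
    listen 80;
    location / {
        proxy_pass http://app_backend;
    }
}
```

The weights just bias sampling: given enough traffic, a failure mode that hits roughly uniformly will eventually reproduce on the instrumented node.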


> In fact, Postgres often requires you to avoid doing OS-level maintenance, because Postgres needs to manage changes on the box itself to be sure its own ACID guarantees are maintained.

Huh? What are you talking about?


With unikernels you get a lot more consistency. E.g. I once saw a bug that came down to one server using reiserfs and another using ext2. There's no way to have that problem with a unikernel.

But sure, you need a debugger. So you use one. I'm not sure why the author seems to think that's so hard.


> But sure, you need a debugger. So you use one. I'm not sure why the author seems to think that's so hard.

The author wrote and continues to contribute to DTrace, which is an incredibly advanced facility for debugging and root-causing problems. GDB (for example) doesn't help you solve performance problems or root-cause them, because now your performance problem has become ptrace (or whatever tracing facility GDB uses on that system).

The point he was making is that there are problems with porting DTrace to a unikernel (it violates the whole "let's remove everything" principle, and you couldn't practically modify which DTrace probes you're using at runtime, because the only process is your misbehaving app -- good luck getting it to enable the probes you'd like to enable).
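For context on what would be lost: on a conventional OS, a privileged user attaches and changes probes on a live process from outside it, without the process's cooperation. Two classic DTrace one-liners, run from a privileged shell on the host; "myapp" is a hypothetical process name:

```shell
# Sample user stacks of the running app at ~997 Hz (a poor-man's profiler).
dtrace -n 'profile-997 /execname == "myapp"/ { @[ustack()] = count(); }'

# Count system calls made by the same app, aggregated by syscall name.
dtrace -n 'syscall:::entry /execname == "myapp"/ { @[probefunc] = count(); }'
```

In a unikernel there is no outside shell on the guest to run these from, and no separate kernel to host the probes — which is the porting problem being described.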


You can't modify them from within your app, sure. You modify them from a (privileged) outside context. Allowing the app to instrument itself that thoroughly violates the principle of least privilege the author was so fond of.


> With unikernels you get a lot more consistency

That's not unique to unikernels. You can get that with Docker containers or EC2 instances on AWS.


There's a lot more that can happen differently there. Docker doesn't hide all the details of the filesystem, kernel version, or the like. With EC2 instances you're still running a kernel that has a lot of moving parts of its own.


> With EC2 instances you're still running a kernel that has a lot of moving parts of its own.

I'm sure that's true, but it's not a statement directly about consistency.


It's a lot harder to get consistency out of a non-unikernel system running on EC2 - e.g. IME the Linux boot/hardware-probing process can behave nondeterministically before it even starts running your user program.



