Hacker News

From this article and others, it’s still unclear to me what the state-handling and state-sharing model of Erlang is. Presumably, the granularity of the crashing/restarting sequential processes is also the granularity of in-memory state sharing. But what about external state, like databases, queues, file systems? For example, if a process has taken an item off a queue and then crashes before having fully processed it, how is that accounted for? Or you might not even know from the outside if it has been fully, partially, or not at all processed yet. This is an example where correct error handling or not crashing is crucial, in my experience. Or what about processing pipelines where a component in the middle crashes. Is there something like that in Erlang? Is there an article explaining Erlang from that perspective?


> For example, if a process has taken an item off a queue and then crashes before having fully processed it, how is that accounted for?

I have worked with people who had deployed huge systems on the BEAM, who had a real problem with the answer to that and resorted to magical thinking.

When Erlang processes "crash", assuming the whole system didn't crash, a monitoring process is almost certainly alerted, so that the crashed process can be quickly restarted. This is the core of how supervision trees in Erlang are built.
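As a sketch of what that looks like (module and child names here are hypothetical, not from any real codebase), a minimal OTP supervisor that restarts a crashed worker is just:

```erlang
%% Hypothetical minimal supervisor: restarts my_worker whenever it crashes.
-module(my_sup).
-behaviour(supervisor).
-export([start_link/0, init/1]).

start_link() ->
    supervisor:start_link({local, ?MODULE}, ?MODULE, []).

init([]) ->
    SupFlags = #{strategy => one_for_one, % restart only the child that died
                 intensity => 5,          % tolerate at most 5 restarts...
                 period => 10},           % ...per 10 seconds, then give up
    Child = #{id => my_worker,
              start => {my_worker, start_link, []},
              restart => permanent},      % always restart, however it exited
    {ok, {SupFlags, [Child]}}.
```

The supervisor itself sits under another supervisor, which is how the tree composes.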

There are a lot of subtleties to that. The whole system may or may not be a single BEAM instance, and if there is more than one they can be distributed, i.e. processes on one machine receive failure messages from processes on others and can restart the processes elsewhere. In practice these mechanisms are sufficient to automatically recover from the majority of transient failures. (I should add there are two classic ways to blow up a whole BEAM instance, which make this less robust than it should be: a bad call into a NIF, a "Native Implemented Function" written in C or another native language, or posting messages to a process faster than it can consume them, which will eventually cause an OOM.)

But wanting that kind of accounting differs from the underlying philosophy of the runtime, which is that things are only done when they're done, and you should expect failures at any time. This maps onto its messaging paradigm.

What you actually sound like you want is a universe more like FoundationDB and QuiCK https://www.foundationdb.org/files/QuiCK.pdf where the DB and worker queue all live in one single transactional space, which certainly makes reasoning about a lot of these things easier, but has nothing to do with Erlang.


> what about [...] if a process has taken an item off a queue and then crashes before having fully processed it

> you might not even know from the outside if it has been fully, partially, or not at all processed yet

Erlang does not propose a unique solution to distributed problems, just good primitives.

So the answer would be the same: the queue keeps track of elements that have been popped but not yet completed, and the consumer reports back to the queue that processing failed so the element can be put back in full.

So in Erlang you might monitor a worker process and requeue items handled by processes that failed.
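A minimal sketch of that pattern, assuming a hypothetical `queue_server` module with `ack/1` and `requeue/1` (those APIs are made up for illustration, not part of OTP):

```erlang
%% Dispatch one item to a monitored worker; requeue it if the worker crashes.
dispatch(Item) ->
    {Pid, Ref} = spawn_monitor(fun() -> process_item(Item) end),
    receive
        {'DOWN', Ref, process, Pid, normal} ->
            queue_server:ack(Item);       % worker finished cleanly
        {'DOWN', Ref, process, Pid, _Reason} ->
            queue_server:requeue(Item)    % worker crashed: put the item back
    end.
```

The runtime delivers the `'DOWN'` message no matter how the worker died, which is the primitive everything else is built on.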


Thanks. So Erlang is really only about managing process lifetimes and simple RPC? In my experience processes often have meaningful internal state, meaningful in the sense that it matters if it gets lost due to a crash. If I understand correctly, Erlang doesn’t provide any particular model or mechanisms for dealing with that?


Like fidotron said, a process's internal state is lost if it crashes (or exits).

If you want that state to be durable, you need to store it durably. Mnesia provides (optionally distributed, optionally disk-backed) transactions, which may be appropriate depending on your durability needs (there are a lot of details). Or you could externalize durability to other systems.
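As a sketch of the Mnesia route (the `job` record and table are hypothetical), state survives a process crash because it lives in a disk-backed table rather than in the process:

```erlang
%% Durable state in Mnesia: the table outlives any worker process.
-record(job, {id, status}).

init_table() ->
    mnesia:create_table(job,
        [{attributes, record_info(fields, job)},
         {disc_copies, [node()]}]).       % kept in RAM, journaled to disk

mark_done(Id) ->
    mnesia:transaction(fun() ->
        [J] = mnesia:read(job, Id),
        mnesia:write(J#job{status = done})
    end).
```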

Erlang is wonderful, but it's not magic. It won't prevent hardware failures, so if an Erlang process fetches something from a queue and the CPU stops for whatever reason, you've got a tricky situation. Erlang does offer a way for a process to monitor other processes, including processes on remote nodes, so your process will be notified if the other process crashes or if the other node is disconnected; but if the other node is disconnected, you don't know what happened to the other process --- maybe it's still running and there's a connectivity issue, maybe the whole host OS crashed. You could perhaps set bidirectional monitors, and then know that the remote process would be notified of the disconnection as well, if it was still running... but you wouldn't know if the process finished (successfully or not) after the connectivity failed but before the failure was detected and processed.
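That ambiguity shows up directly in the monitor API. A sketch (how you obtain `RemotePid` is up to you, e.g. via a registered name or RPC):

```erlang
%% Monitor a (possibly remote) process and distinguish "it crashed"
%% from "the node went away, fate unknown".
watch(RemotePid) ->
    Ref = erlang:monitor(process, RemotePid),
    receive
        {'DOWN', Ref, process, RemotePid, noconnection} ->
            %% The node was disconnected: the process may well still
            %% be running on the other side. You simply don't know.
            unknown;
        {'DOWN', Ref, process, RemotePid, Reason} ->
            {exited, Reason}
    end.
```

The `noconnection` exit reason is exactly the "you don't know what happened" case described above.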


> In my experience processes often have meaningful internal state, meaningful in the sense that it matters if it gets lost due to a crash.

The erlang process state will be simply what it has on the stack. (Ignoring things like ETS tables for the moment).

Erlang has the concept of ports, used to interface with the world outside, which provide a sort of hook for cleanup in the event of a crash. Ports belong to processes; in the event of a crash, all associated ports are cleaned up. You can set this sort of thing up between purely Erlang processes as well.
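Between pure Erlang processes, the same cleanup-on-crash pattern is done with links and `trap_exit`. A sketch (`use_resource/1` and `release/1` are hypothetical):

```erlang
%% A janitor process that releases a resource when its linked worker dies,
%% whether the worker crashed or exited normally.
start_cleaner(Resource) ->
    spawn(fun() ->
        process_flag(trap_exit, true),    % turn exit signals into messages
        Worker = spawn_link(fun() -> use_resource(Resource) end),
        receive
            {'EXIT', Worker, _Reason} ->
                release(Resource)         % cleanup runs in every case
        end
    end).
```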

As the other commenter observed, erlang gives you the primitives to make distributed systems work; it does not prescribe solutions, especially around distributed transactions, which imo is one of the reasons some of the hype around the BEAM is misguided.


Erlang at least used to come with an in-memory database called Mnesia, which, in the places I've encountered it, depended on replicating all the state to every server; that usually caused some scaling issues.

There's nothing outright stopping you from doing proper design and building separate Erlang services that exchange state over regular protocols, but there does seem to be a temptation to put all the Erlang in one big monolith and then run into very hard memory and scaling issues when usage and data grow.

One high-profile Erlang user in the payments industry was mainly constrained by how big a server they could buy, since all their code ran on a single server with a hot standby. They have since moved to Java and rethought how they manage shared state.

Facebook managed to get ejabberd, the xmpp server written in erlang, to back their first Messenger, but it involved sharding to give each ejabberd-instance a small enough data set to cope, and a clever way to replicate presence data outside of erlang (storing it in compact memory blocks on each ejabberd server, and shipping them wholesale to a presence service at a regular cadence).

Pretty soon they tore ejabberd out, metaphorically burned it in a field and salted the earth... but how much of that was the fault of erlang itself, and how much it was the issue of having one corner with erlang in a largely C++ world isn't known to me.


Mnesia isn't in-memory only. It also journals to disk. You can also use disk-only tables (disc_only_copies) that don't hold the whole table in memory (but from what I've read, perf sucks... otoh, a lot of what people say about Mnesia conflicts with my experience, so maybe disc_only_copies is worth trying).

OTP ships with mnesia_frag which allows fragmenting a logical table into many smaller tables. You don't need to have all of the tables on all of the nodes that share an mnesia schema. That's at least one way to scale mnesia beyond what fits in memory on a single node. Single nodes are pretty big though; we were running 512GB mnesia nodes 10 years ago on commodity hardware, and GCP says 32TB is available. You can do a lot within a limit of 32TB per node.
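A sketch of what a fragmented table looks like (table name and fragment counts are illustrative only):

```erlang
%% A Mnesia table split into 8 fragments spread over the node pool.
create_fragmented() ->
    mnesia:create_table(big_table,
        [{attributes, [key, value]},
         {frag_properties,
          [{n_fragments, 8},             % 8 physical fragment tables
           {n_disc_copies, 1},           % one disk copy per fragment
           {node_pool, [node() | nodes()]}]}]).

%% Fragment-aware access has to go through mnesia:activity/4 with the
%% mnesia_frag access module, so keys are routed to the right fragment.
read_frag(Key) ->
    mnesia:activity(transaction,
                    fun mnesia:read/1, [{big_table, Key}],
                    mnesia_frag).
```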

There are other ways to shard too. At WhatsApp pre-FB, our pattern was to run Mnesia schemas with 4 nodes: half the nodes in service, the other half in our standby colo. All nodes had all the tables in a given schema, and requests were sharded so each schema group served only 1/N of the users, with each of the two active nodes in a schema group getting half of the requests (except during failure/maintenance). We found 4-node schemas were easiest to operate, and ensuring that in normal operations a single node (and in most cases a single worker process) would touch specific data made us comfortable running our data operations in the async_dirty context, which avoids locking.

We did have scaling challenges (many of which you can watch old Erlang Factory presentations about), but it was all surmountable, and many of the things would be easier today given improvements to BEAM and improvements in available servers.


> For example, if a process has taken an item off a queue and then crashes before having fully processed it, how is that accounted for?

I'm not sure I understand the question - all queue systems I've used separate delivery and acknowledgement, so if a process crashes during processing the messages will be redelivered once it restarts.

Do you have a concrete example of a flow you're curious about?

Maybe these could help:

- https://ferd.ca/the-zen-of-erlang.html

- https://jlouisramblings.blogspot.com/2010/11/on-erlang-state...



