Hacker Times

I've always been interested in distributed stream processing platforms (Storm, Spark, Samza, Flink, etc.), and in particular I've wanted one that doesn't run on the JVM (there used to be one called Concord). I came across differential dataflow a while ago, around the time I started writing more and more Rust.

I think the biggest issue is the documentation: not so much on writing code, but on building an actual production service with it. I think most of us now grok that you have a Kafka stream on one end and a datastore on the other, and that the quintessential map/reduce hello world is WordCount.java. That isn't clear from the differential dataflow documentation. I remember wondering how they were getting data from the outside world into this thing, and then thinking maybe I didn't understand the project at all.

Consider the example in the README: the hello world is "counting degrees in a graph". While it shows how simple it is to express that computation, it isn't interactive; it's unclear how one might change the input parameters (or if that's even possible). The hardest part of most of these frameworks is the glue, but once that's running, exploring what's possible is much easier. Differential Dataflow doesn't provide that for me right off the bat.

That said, I'm not surprised: the last time I checked, the Rust Kafka drivers weren't all there, and the project seemed to be evolving in parallel to everything else. I think what would make it more popular is a mental translation of common Spark tasks (like WordCount) into differential dataflow.
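As a rough attempt at that mental translation, here is a std-only Rust toy of WordCount. It is not the differential_dataflow API (that needs the `timely` and `differential-dataflow` crates), and it leaves out timestamps and partial orders entirely. It only illustrates the input/output model as I understand it: inputs are (record, diff) updates instead of a fixed dataset, and outputs are changes too, so a count that goes from 2 to 1 shows up as the old count retracted and the new one inserted. `WordCount` and `update` are hypothetical names for this sketch.

```rust
use std::collections::HashMap;

/// Toy, std-only model of an incrementally maintained word count (hypothetical,
/// not the differential_dataflow API). Inputs are (word, diff) updates: +1 inserts
/// an occurrence, -1 retracts one. A changed count is reported as the old
/// (word, count) retracted with -1 and the new (word, count) asserted with +1.
struct WordCount {
    counts: HashMap<String, i64>,
}

impl WordCount {
    fn new() -> Self {
        WordCount { counts: HashMap::new() }
    }

    /// Applies one batch ("round") of input updates and returns only the output
    /// changes it caused, sorted so the result is deterministic.
    fn update(&mut self, batch: &[(&str, i64)]) -> Vec<((String, i64), i64)> {
        // Net the batch out first so a word touched several times yields one change.
        let mut net: HashMap<&str, i64> = HashMap::new();
        for &(word, diff) in batch {
            *net.entry(word).or_insert(0) += diff;
        }

        let mut changes = Vec::new();
        for (word, diff) in net {
            if diff == 0 {
                continue; // insertions and retractions cancelled out: no output
            }
            let old = self.counts.get(word).copied().unwrap_or(0);
            let new = old + diff;
            if old != 0 {
                changes.push(((word.to_string(), old), -1)); // retract the old count
            }
            if new != 0 {
                changes.push(((word.to_string(), new), 1)); // assert the new count
                self.counts.insert(word.to_string(), new);
            } else {
                self.counts.remove(word);
            }
        }
        changes.sort();
        changes
    }
}

fn main() {
    let mut wc = WordCount::new();

    // Round 0: load the initial text.
    let text = "the quick brown fox jumps over the lazy dog";
    let batch: Vec<(&str, i64)> = text.split_whitespace().map(|w| (w, 1)).collect();
    for change in wc.update(&batch) {
        println!("round 0: {:?}", change);
    }

    // Round 1: retract one "the" and add "cat"; only the affected words produce output.
    for change in wc.update(&[("the", -1), ("cat", 1)]) {
        println!("round 1: {:?}", change);
    }
}
```

Round 1 reports just three changes: ("cat", 1) inserted, ("the", 2) retracted, and ("the", 1) inserted. The seven unchanged words produce nothing, which is the property that makes the Kafka-in, datastore-out setup cheap to keep up to date.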



> it's unclear how one might change the input parameters (or if that's even possible).

Yeah, the readme is pretty dense on terminology that is unfamiliar (at least to me).

It answers that question like this:

> In the examples above, we can add to and remove from edges, dynamically altering the graph, and get immediate feedback on how the results change

It would be great to show that in code, though. Otherwise it's easy to assume that "reachable" is a fixed result set, when presumably the whole point of the system is that you can subscribe to changes in "reachable" as "roots" or "edges" change.
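To make the "subscribe to changes" idea concrete without relying on crate APIs, here is a std-only Rust toy. It recomputes reachability from scratch after an input change and diffs the two snapshots, so it only illustrates the shape of the output a subscriber would consume (nodes joining with +1, leaving with -1). Differential dataflow's pitch is that it produces those changes incrementally instead of recomputing everything. The `reachable` and `diff` functions are hypothetical and not part of the library.

```rust
use std::collections::{BTreeSet, HashMap, VecDeque};

/// Nodes reachable from `roots` along directed `edges`, found by breadth-first search.
/// (Hypothetical helper: a full recomputation, not an incremental operator.)
fn reachable(roots: &BTreeSet<u32>, edges: &BTreeSet<(u32, u32)>) -> BTreeSet<u32> {
    let mut adj: HashMap<u32, Vec<u32>> = HashMap::new();
    for &(src, dst) in edges {
        adj.entry(src).or_default().push(dst);
    }
    let mut seen = roots.clone();
    let mut queue: VecDeque<u32> = roots.iter().copied().collect();
    while let Some(node) = queue.pop_front() {
        if let Some(neighbors) = adj.get(&node) {
            for &next in neighbors {
                // `insert` returns true only the first time a node is seen.
                if seen.insert(next) {
                    queue.push_back(next);
                }
            }
        }
    }
    seen
}

/// Turns two snapshots into a change stream: -1 for nodes that left, +1 for nodes that joined.
fn diff(before: &BTreeSet<u32>, after: &BTreeSet<u32>) -> Vec<(u32, i64)> {
    let mut changes: Vec<(u32, i64)> = before.difference(after).map(|&n| (n, -1)).collect();
    changes.extend(after.difference(before).map(|&n| (n, 1)));
    changes.sort();
    changes
}

fn main() {
    let roots: BTreeSet<u32> = [0u32].iter().copied().collect();
    let mut edges: BTreeSet<(u32, u32)> = [(0, 1), (1, 2), (3, 4)].iter().copied().collect();
    let before = reachable(&roots, &edges);
    println!("reachable: {:?}", before); // {0, 1, 2}

    // Change the input: drop edge 1 -> 2 and add edge 1 -> 3.
    edges.remove(&(1, 2));
    edges.insert((1, 3));
    let after = reachable(&roots, &edges);

    // A subscriber only needs the changes: node 2 leaves; nodes 3 and 4 join.
    println!("changes: {:?}", diff(&before, &after)); // [(2, -1), (3, 1), (4, 1)]
}
```

Applying the printed changes to the old result reproduces the new one, which is what a downstream consumer (say, a datastore sink) would do instead of reloading "reachable" wholesale.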


This was exactly my experience as well. I needed a non-JVM streaming platform, and Differential Dataflow seemed like a possible fit, but I wasn't able to unlock the magic. It's more likely a product maturity issue (docs, examples, defined use cases, etc.) than a technical one.



