Dolphin Scheduler (github.com/apache)
83 points by based2 on Dec 26, 2019 | 25 comments


I tried the DolphinScheduler online demo: http://106.75.43.194:8888/

Easy to use.


Email dailidong66@gmail.com and tell him you want to try the online demo.


DolphinScheduler ranks among the top 10 most valuable projects in OSChina GVP (Gitee Most Valuable Project).


How does this compare to something like Airflow or Luigi?



This may be of interest:

https://github.com/mikub/titanoboa


Thanks for linking! There seem to be similarities, but looking (very) briefly at DolphinScheduler, these are the potential differences:

- number of contributors :D

- titanoboa can process even a potentially cyclic graph

- in titanoboa you can write step functions directly in high-level programming languages such as Clojure and Java (so not just bash or Python) and deploy them at runtime

- the clustering setup in titanoboa is master-less

- titanoboa does not have such direct integration with Spark as it employs some map-reduce patterns internally
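On the cyclic-graph point: DAG-based schedulers typically validate that a workflow has no cycles before running it, since a cycle means no valid execution order exists. A minimal sketch of that check using Kahn's topological sort; the dependency-map format and the `topo_order` name are hypothetical and are not DolphinScheduler's or titanoboa's API.

```python
from collections import deque


def topo_order(deps: dict[str, list[str]]) -> list[str]:
    """Return an execution order for tasks, or raise ValueError on a cycle.

    `deps` maps each task name to the tasks it depends on (hypothetical format).
    """
    # Collect every task mentioned, including ones that only appear as dependencies.
    tasks = set(deps)
    for parents in deps.values():
        tasks.update(parents)

    # in_degree[t] = number of dependencies of t not yet scheduled
    in_degree = {t: 0 for t in tasks}
    children: dict[str, list[str]] = {t: [] for t in tasks}
    for task, parents in deps.items():
        for p in parents:
            in_degree[task] += 1
            children[p].append(task)

    # Start with tasks that have no dependencies; sorted for deterministic output.
    ready = deque(sorted(t for t in tasks if in_degree[t] == 0))
    order = []
    while ready:
        t = ready.popleft()
        order.append(t)
        for c in sorted(children[t]):
            in_degree[c] -= 1
            if in_degree[c] == 0:
                ready.append(c)

    if len(order) != len(tasks):
        # Tasks that never reached in-degree 0 sit on (or downstream of) a cycle.
        stuck = sorted(tasks - set(order))
        raise ValueError("cycle detected among: " + ", ".join(stuck))
    return order
```

For example, `topo_order({"extract": [], "transform": ["extract"], "load": ["transform"]})` returns `["extract", "transform", "load"]`, while `topo_order({"a": ["b"], "b": ["a"]})` raises `ValueError`. An engine that supports cyclic graphs has to drop this up-front check and instead decide at runtime when a loop terminates.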

But all in all, I have to say that DolphinScheduler seems quite nice! I'd also have to compliment it on the nice documentation (again, just briefly skimming through it).

(edit: formatting)


That looks interesting, but it reminds me more of Apache Camel.


Also interesting that it's one of the first open source applications from China I've seen.


I've actually stopped paying attention to https://github.com/vitalets/github-trending-repos because there are so many Chinese repositories each week.

It's just rare to get them on HN, because it's a nightmare to go through their docs and they're usually not even attempting to write their code in English. Basically unusable for all intents and purposes, even if it were quality software.


> Basically unusable for all intents and purposes, even if it were quality software.

Only if you don’t have anyone on your team who can read Chinese. Also, most repositories are not documented well or at all anyway, so language hardly matters.


Yes, most repositories aren't documented well either; that's definitely true and was part of my point, really.

How are you going to figure out why you're encountering a bug if not even the code itself is written in English?

It's fine for learning repositories or simple toy projects, but if you actually want your code to be used... please use the world language. (And no, English isn't my native language either.)


I wonder if we’re hitting a point where a better decompiler would be useful. Transliterate the code into your first language, English or not.


A decompiler can't pull contextually appropriate variable and function/method names out of nowhere (not to mention comments), which is the big roadblock when reading foreign-language code.

That is, you're just as likely to be able to follow foreign-language code as you are decompiled code. Either way, you've basically thrown out all the documentation and swapped out all the names for gibberish.


It seems like having only some team members be able to really understand the code would still be a risk.


This looks great; I've always wanted something like this. I've always had AutoSys or Control-M at work, and they both suck.

I'd just prefer if it had been around longer. Any other open source alternatives out there? I only know of Airflow and k8s CronJobs.



> Any other open source alternatives out there?

Here's a big handful of them: https://github.com/pditommaso/awesome-pipeline



I was just reading up on Broadway (https://hexdocs.pm/broadway/Broadway.html), written in Elixir, which provides the fundamentals of batching/job control. It’s by the creator of Elixir and is based on 7 years of libraries in the area, so the fundamentals are pretty well honed.


I've had a lot of success using Apache NiFi as a distributed scheduler / general-purpose workflow tool.


I'd love to see what kind of complexity you are managing there, and how.


The only other one I can think of off the top of my head is dagster: https://github.com/dagster-io/dagster

It’s made by Nick Schrock of GraphQL fame, among others. I’m sure there are hundreds of these projects, though.


From the beginning of Dagster's readme:

> Dagster is a system for building modern data applications.

> Combining an elegant programming model and beautiful tools, Dagster allows infrastructure engineers, data engineers, and data scientists to seamlessly collaborate to process and produce the trusted, reliable data needed in today's world.

Two paragraphs, communicating zero bits of information. I wish GitHub repositories, of all places, didn't contain such uninformative copy.


This looks almost exactly like Airflow - I wonder what Apache’s plan is for both of these to coexist.



