Selecting Workflow Managers
Problem
There are currently over 300 workflow managers available to choose from, and it's difficult to select the most appropriate one for a project. The factors to take into account range from the properties of the workflow manager itself to project-specific contexts.
Context
Data workflow managers can be defined as tools for defining, managing, and executing workflows. This very broad definition covers a variety of tools with very different goals, such as defining (i.e. developing) and executing (i.e. scheduling) workflows.
The large variety of workflow tools mirrors the variety of needs that people have when using and automating their data processing steps.
If...
- Your data analysis workflow includes many steps
- You have to frequently (manually) re-run a lot of steps
- You have a lot of different dependencies for different steps
- You have a lot of code for your steps
- It's easier to re-run all steps than to re-run specific steps
Then you should DEFINITELY use a workflow manager. If you'd rather skip the details, we developed a simple app that helps narrow down workflow managers with a few simple questions:
➡️ ➡️ CLICK HERE to try Workflow Explorer ⬅️ ⬅️
Content
How Do Workflow Managers Make Your Life Easier?
- Understandability: following a framework gives you space to clean and optimize your code. Most workflow managers will make each analysis "step" clearer.
- Modularity: you can run only the steps you need. In some cases you might even create components that you or others can re-use.
- Automation: one command usually gives you full control over a workflow run. Many systems take care of dependencies for you.
- Sharing and collaborating: collaborators or people interested in your analyses can navigate your code through a well-defined framework.
- Reproducibility: same input, same output, anywhere, so that your analyses stand the test of time.
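To make the modularity and automation points more concrete, here is a minimal sketch of what a workflow manager does at its core: it records which step depends on which, works out the steps a requested target actually needs, and runs them in dependency order. The step names (`download`, `clean`, `plot`, `report`) and the helper functions are hypothetical and only for illustration; real workflow managers add caching, scheduling, and environment handling on top. The sketch assumes Python 3.9 or later for the standard-library `graphlib` module.

```python
from graphlib import TopologicalSorter

# Hypothetical analysis: each step maps to the set of steps it depends on.
DEPENDENCIES = {
    "download": set(),
    "clean": {"download"},
    "plot": {"clean"},
    "report": {"clean"},
}


def required_steps(target, dependencies):
    """Return the target plus every step it transitively depends on."""
    needed = set()
    stack = [target]
    while stack:
        step = stack.pop()
        if step not in needed:
            needed.add(step)
            stack.extend(dependencies[step])
    return needed


def execution_order(target, dependencies):
    """Order only the steps needed for `target`, dependencies first."""
    needed = required_steps(target, dependencies)
    subgraph = {step: dependencies[step] for step in needed}
    return list(TopologicalSorter(subgraph).static_order())


def run(target, dependencies, actions):
    """Run each needed step once, in dependency order, and return that order."""
    order = execution_order(target, dependencies)
    for step in order:
        actions[step]()
    return order


if __name__ == "__main__":
    log = []
    # Stand-in actions that only record their own name when called.
    actions = {name: (lambda name=name: log.append(name)) for name in DEPENDENCIES}
    print(run("plot", DEPENDENCIES, actions))  # ['download', 'clean', 'plot']
```

Asking for `plot` runs `download` and `clean` first and never touches `report`, which is the "run only the steps you need" behaviour described above, driven by a single call.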
What to Avoid When Using Workflow Managers?
- Starting without being familiar with the workflow itself
- Skipping an assessment of the workflow manager's limitations
- Choosing without discussing with someone to get a second opinion
- Starting with a complex implementation instead of planning a simple one first
What Are General Limits of Workflow Managers?
Interoperability between different workflow managers is more of a luxury than a given. The Common Workflow Language has attempted to establish itself as an interoperable system that can "translate" workflow code across frameworks, but not all developers use it.
Conclusion
The world of workflow managers is still very unstructured, so make sure you understand your needs and get a second opinion before committing to one.