What is it?
Manband is an open source Ruby workflow engine. It is made up of workflow engines and job managers. Both can be scaled out, and the system tolerates failures (the loss of a server, for example).
A workflow is a YAML text description of the commands to run. Manband executes command-line scripts on input files selected by patterns.
It also provides an optional web interface to manage the workflows.
How it works
Manband uses RabbitMQ as its AMQP message broker.
An input program creates a workflow definition and sends a start message to the master queue. One of the masters (workflowhandler) processes the message and determines which nodes to manage. The master sends a run message to the node queue. A node (jobhandler) processes the message, runs the workflow node command and sends back a finish message. The master processes the finish message and handles the next nodes.
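The exchange above can be sketched without a broker. The following is a minimal in-memory simulation, not Manband's actual code: Ruby Queue objects stand in for the RabbitMQ queues, and the message fields (type, node) are assumptions made for illustration.

```ruby
# In-memory simulation of the start/run/finish exchange.
# Queue objects replace the RabbitMQ queues; message fields are hypothetical.
workflow = {
  'root'  => { 'command' => 'none', 'next' => 'node0' },
  'node0' => { 'command' => 'echo hello', 'next' => nil }
}

master_queue = Queue.new
node_queue   = Queue.new
log          = []

# Job handler: takes a run message, pretends to run the command, reports back.
jobhandler = Thread.new do
  while (msg = node_queue.pop) != :stop
    log << "run #{msg[:node]}"
    master_queue << { type: 'finish', node: msg[:node] }
  end
end

# Input program: sends the start message to the master queue.
master_queue << { type: 'start', node: 'root' }

# Workflow handler: reacts to start and finish messages.
loop do
  msg = master_queue.pop
  case msg[:type]
  when 'start'
    node_queue << { type: 'run', node: msg[:node] }
  when 'finish'
    following = workflow[msg[:node]]['next']
    break if following.nil?

    node_queue << { type: 'run', node: following }
  end
end

node_queue << :stop
jobhandler.join
puts log
```

Because both handlers only communicate through queues, several of each could consume the same queue, which is the property the real broker-based setup relies on for scaling.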
- Workflow suspend/resume
- Jobs status monitoring
- Stop workflow on 1 job error, resume on errors
- Parallel management of data according to a regexp string (one job per matching file)
- Job breakpoints
- Message tracking
- S3 storage
- Multi instances
- Execute with user rights
- IF case management
- Interactive jobs (support for job acknowledgement by another party)
Some settings can be provided in a configuration file passed with the --conf option.
The program also searches for a file named ".manband" in the user's home directory.
If neither is found, the program uses defaults.
An example configuration file (conf.yaml) is available in the test directory.
The job handler can be configured to execute commands with the rights of the user. To do so, add "sudo: true". This option must only be used in a secure environment, and users must match system users.
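The configuration lookup can be sketched as follows. This is an illustration under assumptions, not the program's own code: the precedence (the --conf file first, then ".manband", then defaults) is not stated in this document, and the DEFAULTS hash is hypothetical.

```ruby
require 'yaml'
require 'tmpdir'

# Hypothetical default; the real default values are not listed in this document.
DEFAULTS = { 'sudo' => false }.freeze

# Assumed precedence: --conf file first, then ~/.manband, then defaults.
def load_conf(cli_path, home_dir)
  candidates = [cli_path, File.join(home_dir, '.manband')].compact
  path = candidates.find { |candidate| File.file?(candidate) }
  return DEFAULTS.dup if path.nil?

  # An empty file yields nil or false, so fall back to an empty hash.
  DEFAULTS.merge(YAML.load_file(path) || {})
end

# Usage with a temporary directory standing in for the user home.
Dir.mktmpdir do |home|
  p load_conf(nil, home)                                   # no file: defaults
  File.write(File.join(home, '.manband'), "sudo: true\n")
  p load_conf(nil, home)                                   # values from .manband
end
```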
A workflow or a job can be suspended, at any time during the run. When a workflow is suspended, all running jobs finish their current work and then pause. Resuming the workflow continues all suspended jobs and restarts all jobs in error.
If a job is in error, the workflow is set in error. Resuming the workflow restarts the jobs in error.
A command can refer to the result directory of another job, which should be finished at the time of execution. The special syntax #node.nodename# is used together with a regexp that selects files in that directory. Each matching file runs one job instance, and the job is over when all instances are over. If the name starts with local, a local directory can be specified instead of a node name. Commands containing # are quoted here because an unquoted # preceded by a space starts a YAML comment.

    node0:
      description: local regexp, must start with node.localxxxx
      command: 'helloworld.sh -3 -i #node.local1#'
      next: node2
      local1:
        url: /home/me/myworkflowinputfiles
        regexp: '^test'

    node2:
      description: blablabla
      command: 'anotherscript.sh -i #node.node0# -o world.out'
      node0:
        regexp: '\.out$'
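To illustrate the "one job instance per matching file" rule, here is a minimal sketch. The function name expand_node_reference is hypothetical, and replacing the placeholder with the full path of each file is an assumption about what the job handler substitutes.

```ruby
require 'tmpdir'

# Build one command per file of `directory` whose name matches `regexp`.
# Assumption: the placeholder is replaced by the full path of the file.
def expand_node_reference(command, name, directory, regexp)
  pattern = Regexp.new(regexp)
  files = Dir.children(directory).select { |file| file =~ pattern }.sort
  files.map { |file| command.gsub("#node.#{name}#", File.join(directory, file)) }
end

# Usage with a throwaway directory standing in for node0's result directory.
Dir.mktmpdir do |dir|
  %w[a.out b.out notes.txt].each { |file| File.write(File.join(dir, file), '') }
  puts expand_node_reference(
    'anotherscript.sh -i #node.node0# -o world.out', 'node0', dir, '\.out$'
  )
end
```

With the three files above, two commands are produced (for a.out and b.out), and notes.txt is ignored.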
A node can define runtime variables. These variables can be sent via the command line (-var) at workflow startup, or set in the web interface at any time.

    node2:
      description: blablabla
      command: 'anotherscript.sh -i #var.myvarname# -o world.out'

If a variable is not set when the job runs, the job goes into error status. The variable can then be set via the command line and the job resumed.
Such variables can be combined with a breakpoint to pause after a job, check some results manually and adapt a parameter during the workflow execution.
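The variable substitution described above can be sketched like this. The names substitute_vars and MissingVariable are hypothetical; raising an exception stands in for the job going into error status.

```ruby
# Raised when a command refers to a variable that has not been set.
class MissingVariable < StandardError; end

# Replace every "#var.NAME#" placeholder by the value of NAME.
def substitute_vars(command, vars)
  command.gsub(/#var\.(\w+)#/) do
    name = Regexp.last_match(1)
    vars.fetch(name) { raise MissingVariable, "variable #{name} is not set" }
  end
end

command = 'anotherscript.sh -i #var.myvarname# -o world.out'
puts substitute_vars(command, 'myvarname' => 'input.txt')

begin
  substitute_vars(command, {})
rescue MissingVariable => e
  puts "job in error: #{e.message}"
end
```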
A job can be sent to a specific queue. At least one jobhandler must listen on that queue.

    node0:
      description: local regexp, must start with node.localxxxx
      queue: helloqueue
      command: 'helloworld.sh -3 -i #node.local1#'
      next: node2
A node can be declared with a breakpoint, i.e. it is preset to the suspended state. The node command is run, but the workflow stops after the run. The node can then be resumed to continue processing.

    node0:
      description: set a breakpoint
      command: sleep 10
      breakpoint: any message here
The root node can specify that N runs of the workflow should be executed, based on an input directory (and an optional regular expression). The workflow handler runs one workflow per matching input file in the directory. Each matching file is symlinked in the root directory of the workflow and can be referenced in other nodes:

    root:
      description: root node
      command: none
      next: node0
      url: "/tmp/test" # Launch 1 workflow instance per file in /tmp/test
      regexp: yaml # matching yaml
    node0:
      description: node0 refer to input file used in root node
      command: 'cat #node.root# > hello.out'
      next: node2
      root:
        regexp: 'yaml$' # our regular expression will match only one file
    ....
Workflows with the attribute instances=0 are single workflows. If instances is higher than 0, it is the number of workflows launched.
In the web interface, the list of workflows will only show main workflows, not sub workflows.
The IF treatment is a special node. For an IF node, the command is executed and its exit code selects the branch. All nodes in the branch that is not selected are skipped. After an IF, it is not possible to merge branches back into the same node (the node would not know which input to take).

    node1:
      description: test
      command: test -e /tmp/if.test
      next: node2,node3
      type: if

If the command exit code is 0, node2 is selected. If the code is 1, node3 is selected.
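The branch selection can be sketched as below. The function select_branch is hypothetical, and treating every non-zero exit code (not only 1) as the second branch is an assumption, since only codes 0 and 1 are described here.

```ruby
# Pick the next node of an IF node from the exit status of its command.
# Assumption: any non-zero status (not only 1) selects the second branch.
def select_branch(command, next_nodes)
  on_success, on_failure = next_nodes.split(',').map(&:strip)
  # Kernel#system returns true on exit status 0, false on a non-zero
  # status, and nil when the command could not be started.
  system(command) ? on_success : on_failure
end

puts select_branch('test -e /tmp/if.test', 'node2,node3')
```

The result printed by the usage line depends on whether /tmp/if.test exists on the machine running it.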
A node can be defined as interactive. In this case, the job handler does not send a FINISH message to the workflow handler, which keeps waiting for an answer. This feature can be used to trigger a script that asks a user for a manual action. The script in charge of the interaction sends the FINISH message once the user has completed the expected action.
To define a job as interactive:

    node1:
      description: test
      command: send_a_mail_and_wait_for_user_action.sh
      next: node2
      type: interactive

The script has in its environment the information needed to acknowledge the job:
* WID: workflow ID
* ID: job ID
* INSTANCE: instance of the job when used with multiple commands
The next node is executed when the message is sent to the workflow handler. A helper script, ackband.rb, is available to send this message.
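A script started by an interactive node could collect these identifiers as sketched below before acknowledging the job. The function job_reference is hypothetical, and how the values are then handed to ackband.rb is not shown because its options are not described here.

```ruby
# Read the identifiers exported to the script of an interactive job.
# `env` defaults to the process environment; a Hash can be passed for testing.
def job_reference(env = ENV)
  keys = %w[WID ID INSTANCE]
  missing = keys.reject { |key| env.key?(key) }
  raise ArgumentError, "missing variables: #{missing.join(', ')}" unless missing.empty?

  { workflow: env['WID'], job: env['ID'], instance: env['INSTANCE'] }
end

# Usage with example values instead of the real environment.
p job_reference({ 'WID' => '12', 'ID' => '3', 'INSTANCE' => '0' })
```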
    workflow:
      name: sworkflow
      description: sample workflows
      root:
        description: root node, executes both node0 and node1
        command: none
        next: node0,node1
      node0:
        description: first root node, creates an output file and call node2
        command: 'echo "hello world" > hello.out'
        next: node2
      node1:
        description: sleep for a while
        command: sleep 10
        next: node3
      node2:
        description: again a new file created
        command: 'echo "goodbye" > goodbye.out'
      node3:
        description: after node1, creates node3.out
        command: 'echo "node3" > node3.out'
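As an illustration of how such a definition can be read, the sketch below loads a shortened copy of the sample workflow and lists the nodes reachable from root by following the next fields. It assumes the nodes are nested under the workflow key; the helper names are hypothetical and this is not the workflow handler's own traversal.

```ruby
require 'yaml'

SAMPLE = <<~'WORKFLOW'
  workflow:
    name: sworkflow
    root:
      command: none
      next: node0,node1
    node0:
      command: 'echo "hello world" > hello.out'
      next: node2
    node1:
      command: sleep 10
      next: node3
    node2:
      command: 'echo "goodbye" > goodbye.out'
    node3:
      command: 'echo "node3" > node3.out'
WORKFLOW

# Names listed in the comma-separated "next" field of a node.
def successors(workflow, name)
  node = workflow[name]
  return [] unless node.is_a?(Hash) && node['next']

  node['next'].split(',').map(&:strip)
end

# Breadth-first walk from the root node, visiting each node once.
def traversal_order(workflow, start = 'root')
  order = []
  pending = [start]
  until pending.empty?
    name = pending.shift
    next if order.include?(name)

    order << name
    pending.concat(successors(workflow, name))
  end
  order
end

workflow = YAML.safe_load(SAMPLE)['workflow']
p traversal_order(workflow)
# prints ["root", "node0", "node1", "node2", "node3"]
```

In the real engine node0 and node1 would run in parallel; the list only shows which nodes are reachable and in which breadth-first order.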
To store results in S3, the user's S3 credentials must be added to the database. One can then define a "store all" option:

    options:
      store: all
    workflow:
      name: samplew
      description: sample workflow

or per node:

    node0:
      description: set a breakpoint
      command: echo "hello" > hello.out
      store: true

It is also possible to request S3 storage of a node directory after its run via the web interface.
Multiple files management
If a node declares a regexp in its command line and the regexp matches multiple files (instances), then:
- all files are processed, then next jobs are run
Messages are differentiated on instance id in the message. If instance id is 0, then there is only 1 instance for this job. Else instance ids start at 1 and commands are read in an array of commands.
In case of error, the whole job is set in error state. At resume, all instances will be run again.
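The instance numbering can be sketched as follows. The message layout (node, instance, command) and the function name are assumptions for illustration; only the numbering rule comes from the description above.

```ruby
# Build one run message per command of a job.
# A single command gets instance id 0; several commands get ids 1, 2, ...
def instance_messages(node, commands)
  return [{ node: node, instance: 0, command: commands.first }] if commands.size == 1

  commands.each_with_index.map do |command, index|
    { node: node, instance: index + 1, command: command }
  end
end

p instance_messages('node2', ['script.sh -i only.out'])
p instance_messages('node2', ['script.sh -i a.out', 'script.sh -i b.out'])
```

On resume after an error, the same list would be sent again in full, since all instances of the job are rerun.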
In debug mode, all messages are recorded in the database.
Each handler can be assigned a specific id for better tracking.
A node handler can specify a queue at startup. It will then receive jobs for this queue only.
The web interface shows current workflow per user with status and actions (suspend, resume,...). A default account is created (admin/admin) at startup.
Some screenshots of the web interface webband
Get message and workflow details in the web page, and also in JSON format.
Support of multiple repositories. The system needs shared storage between nodes, but several mounts could be configured and dispatched between workflows and/or users.