
Main Page


What is it?

Manband is an open source workflow engine written in Ruby. It consists of workflow engines and job managers, both of which scale out, and it tolerates failures (the loss of a server, for example).

A workflow is a YAML text description of the commands to run. Manband executes command-line scripts on files matching input patterns.

It also provides an optional web interface to manage the workflows.

How it works

Manband uses RabbitMQ as its AMQP message broker.

An input program creates a workflow definition and sends a start message to the master queue. One of the masters (workflowhandler) processes the message and determines which nodes to run. The master sends a run message to the node queue. A node (jobhandler) processes the message, runs the workflow node command and sends back a finish message. The master processes the finish message and moves on to the next nodes.
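The message flow above can be sketched with plain Ruby arrays standing in for the RabbitMQ queues. This is only an illustration: the message fields (:type, :node), the workflow hash layout and the function name are assumptions, no command is actually executed, and the root node is treated like any other node.

```ruby
# In-memory stand-in for the master and node queues (the real system uses
# RabbitMQ). Message fields and the workflow layout are hypothetical.
def simulate(workflow)
  master_queue = [{ type: :start }]
  node_queue = []
  log = []

  until master_queue.empty? && node_queue.empty?
    if (msg = master_queue.shift)
      # Master (workflowhandler): decide which nodes to run next.
      next_nodes = msg[:type] == :start ? ['root'] : workflow.fetch(msg[:node])[:next]
      log << "master: #{msg[:type]} #{msg[:node]}".strip
      next_nodes.each { |name| node_queue << { type: :run, node: name } }
    end

    if (msg = node_queue.shift)
      # Node (jobhandler): the node command would be run here.
      log << "node: run #{msg[:node]}"
      master_queue << { type: :finish, node: msg[:node] }
    end
  end
  log
end

workflow = {
  'root'  => { next: ['node0'] },
  'node0' => { next: [] }
}
puts simulate(workflow)
```

Each finish message received by the master triggers the run messages for the following nodes, so the workflow advances one message at a time.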

Features

  • Workflow suspend/resume
  • Jobs status monitoring
  • Stop workflow on 1 job error, resume on errors
  • Parallel management of data according to a regexp string (one job per matching file)
  • Job breakpoints
  • Message tracking
  • S3 storage
  • Multi instances
  • Execute with user rights
  • IF case management
  • Interactive jobs (a job can be acknowledged by another party)

Configuration

Some settings can be defined in a config file passed with the --conf option.

The program also searches for a file named ".manband" in the user's home directory.

If neither is found, the program uses its defaults.

An example config file (conf.yaml) is available in the test directory.
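The lookup order described above could be sketched as follows. The default values, the function name and the choice to merge the file over the defaults are assumptions made for illustration; the page does not say what happens when the --conf file does not exist, so this sketch simply falls through to the next candidate.

```ruby
require 'yaml'

# Hypothetical defaults; the real default values are not listed on this page.
DEFAULTS = { 'sudo' => false }.freeze

# cli_path: value given to --conf (or nil); home: the user's home directory.
def load_config(cli_path, home = Dir.home)
  candidates = [cli_path, File.join(home, '.manband')].compact
  path = candidates.find { |candidate| File.file?(candidate) }
  return DEFAULTS.dup if path.nil?

  # Assumption: values from the file override the defaults key by key.
  DEFAULTS.merge(YAML.load_file(path) || {})
end
```

For example, load_config(nil) reads ~/.manband when it exists and returns the defaults otherwise.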

Sudo

It is possible to configure the job handler to execute the command with the user's rights. To do so, add "sudo: true" to the configuration. This option must be used in a secure environment only, and workflow users must match system users.

Suspend/resume

A workflow or a job can be suspended. A workflow can be suspended at any time during its run; all current jobs then finish their processing and pause. Resuming a workflow continues all suspended jobs and restarts all jobs in error.

Errors

If a job is in error, the workflow is set to the error state. Resuming the workflow restarts the jobs in error.
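The effect of a workflow resume on individual jobs can be summarised in a small sketch. The state names (:suspended, :error, :running, :finished) and the action names are hypothetical labels chosen for this example, not manband's internal values.

```ruby
# Map each job to the action a workflow resume would trigger.
# State and action names are hypothetical labels for this illustration.
def resume_actions(job_states)
  job_states.each_with_object({}) do |(job, state), actions|
    case state
    when :suspended then actions[job] = :continue # paused jobs carry on
    when :error     then actions[job] = :restart  # failed jobs run again
    end
  end
end

states = { 'node0' => :finished, 'node1' => :suspended, 'node2' => :error }
p resume_actions(states)
```

Jobs that are finished or still running are left untouched.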

Node definition

Node reference

A command can refer to the result directory of another job (which should be finished at the time of execution). To do this, use the special syntax #node.nodename# together with a regexp that selects files in that directory. Each match runs one job instance, and the job is finished when all its instances are finished. If nodename starts with local, a local directory can be specified instead of a node name.

   node0:
     description: local regexp, must start with node.localxxxx
     command: 'helloworld.sh -3 -i #node.local1#'
     next: node2
     local1:
       url: /home/me/myworkflowinputfiles
       regexp: '^test'
   node2:
     description: blablabla
     command: 'anotherscript.sh -i #node.node0# -o world.out'
     node0:
       regexp: '\.out$'
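As an illustration of how a #node.nodename# reference could turn into one command per matching file, here is a minimal sketch. The function name and the refs structure are assumptions, and so is the choice to substitute the full path of each matching file; the real job handler may substitute something else.

```ruby
# Expand "#node.NAME#" placeholders into one command per matching file.
# refs maps a reference name to { dir:, regexp: }; this layout is hypothetical.
def expand_command(command, refs)
  commands = [command]
  refs.each do |name, ref|
    placeholder = "#node.#{name}#"
    next unless command.include?(placeholder)

    pattern = Regexp.new(ref[:regexp])
    files = Dir.children(ref[:dir]).select { |file| file.match?(pattern) }.sort
    # One command (job instance) per matching file.
    commands = commands.flat_map do |cmd|
      files.map { |file| cmd.sub(placeholder) { File.join(ref[:dir], file) } }
    end
  end
  commands
end
```

With a directory containing test1.txt, test2.txt and other.txt and the regexp '^test', the command from node0 above would expand into two instances.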

Runtime variables

A node can use runtime variables. These variables can be sent on the command line (-var) at workflow startup or set in the web interface at any time.

   node2:
     description: blablabla
     command: 'anotherscript.sh -i #var.myvarname# -o world.out'

If a variable is not set when the job runs, the job goes into error status. The variable can then be set via the command line and the job resumed.

Such variables can be combined with a breakpoint to pause after a job, manually check some results and adjust a parameter during workflow execution.
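The substitution and the error on an unset variable can be sketched like this. The exception class and function name are made up for the example; raising an exception stands in for the job going into error status.

```ruby
# Raised when a command refers to a variable that has not been set yet.
# Hypothetical class: it stands in for the job going into error status.
class MissingVariable < StandardError; end

# Replace every "#var.NAME#" placeholder with its value from vars.
def substitute_vars(command, vars)
  command.gsub(/#var\.(\w+)#/) do
    name = Regexp.last_match(1)
    vars.fetch(name) { raise MissingVariable, "variable #{name} is not set" }
  end
end

puts substitute_vars('anotherscript.sh -i #var.myvarname# -o world.out',
                     'myvarname' => 'input.txt')
# prints: anotherscript.sh -i input.txt -o world.out
```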


Queue selection

It is possible to specify that a job should go to a specific queue. At least one jobhandler should listen on this queue.

   node0:
     description: local regexp, must start with node.localxxxx
     queue: helloqueue
     command: 'helloworld.sh -3 -i #node.local1#'
     next: node2

Breakpoints

A node can be declared with a breakpoint, i.e., it is preset to the suspended state. The node command is run, but the node pauses once the command has finished. The node can be resumed to continue processing.

   node0:
     description: set a breakpoint
     command: sleep 10
     breakpoint: any message here

Multi instances

The root node can request N runs of the workflow, based on an input directory (and an optional regular expression). The workflow handler runs one workflow per input file in the directory. Each matching file is symlinked into the root directory of its workflow and can be used as a reference in other nodes:

   root:
     description: root node
     command: none
     next: node0
     url: "/tmp/test"    # Launch 1 workflow instance per file in /tmp/test
     regexp: yaml        # matching yaml
   node0:
     description: node0 refer to input file used in root node
     command: 'cat  #node.root# > hello.out'
     next: node2
     root:
       regexp: 'yaml$' # our regular expression will match only one file
   ....

Workflows with the attribute instances=0 are single workflows. If instances is greater than 0, it is the number of workflows launched.
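The per-file fan-out and the symlink step might look roughly like the sketch below. The directory layout (instanceN/root under a working directory) and the function name are purely hypothetical; only the idea of one workflow per matching file, with the file symlinked into that workflow's root directory, comes from the description above.

```ruby
require 'fileutils'

# Create one workflow root directory per matching input file and symlink the
# file into it. The "instanceN/root" layout is a made-up example.
def prepare_instances(input_dir, regexp, work_root)
  pattern = Regexp.new(regexp)
  files = Dir.children(input_dir).select { |file| file.match?(pattern) }.sort
  files.each_with_index.map do |file, index|
    instance_dir = File.join(work_root, "instance#{index + 1}", 'root')
    FileUtils.mkdir_p(instance_dir)
    File.symlink(File.join(input_dir, file), File.join(instance_dir, file))
    instance_dir
  end
end
```

The number of directories returned corresponds to the instances value of the main workflow.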

In the web interface, the list of workflows will only show main workflows, not sub workflows.


IF management

The IF treatment is a special node. For an IF node, the command is executed and its exit code selects the branch. All nodes in the branch that is not taken are skipped. After an IF, it is not possible to merge branches back into the same node (the node would not know which input to take).

   node1:
     description: test
     command: test -e  /tmp/if.test
     next: node2,node3
     type: if

If the command exit code is 0, node2 is selected. If the code is 1, node3 is selected.
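A minimal sketch of the branch selection, assuming the first name in next is the branch for exit code 0 and the second is used for any non-zero code (the page only mentions code 1). The function names are hypothetical.

```ruby
# Pick the branch of an IF node from the command's exit code.
# Assumption: any non-zero code selects the second branch.
def select_branch(next_field, exit_code)
  on_success, on_failure = next_field.split(',').map(&:strip)
  exit_code.zero? ? on_success : on_failure
end

# Run the node command and return the name of the selected next node.
def run_if_node(node)
  system(node['command'])
  select_branch(node['next'], $?.exitstatus)
end

node1 = { 'command' => 'test -e /tmp/if.test', 'next' => 'node2,node3', 'type' => 'if' }
puts run_if_node(node1) # node2 if /tmp/if.test exists, node3 otherwise
```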

Interactive jobs

A node can be defined as interactive. In this case, the job handler does not send a FINISH message, and the workflow handler keeps waiting for one. This feature can be used to trigger a script that asks a user for a manual action. The script in charge of the interaction then sends the FINISH message once the user has completed the expected action.

To define a job as interactive:

   node1:
     description: test
     command: send_a_mail_and_wait_for_user_action.sh
     next: node2
     type: interactive

The script will have in its environment the info needed to acknowledge the job:

  • WID: workflow ID
  • ID: job ID
  • INSTANCE: instance of the job when used with multiple commands

The next node is executed once the message is sent to the workflow handler. A helper script, ackband.rb, is available to send this message.
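An interaction script written in Ruby could collect the acknowledgement information from its environment as sketched below. The function name and the returned hash layout are assumptions, as is the default of 0 when INSTANCE is absent; how these values are passed on to ackband.rb is not described here, so that step is left out.

```ruby
# Read the values needed to acknowledge an interactive job.
# env defaults to the process environment; a Hash can be passed for testing.
def ack_info(env = ENV)
  {
    workflow_id: env.fetch('WID'),
    job_id: env.fetch('ID'),
    # Assumption: a missing INSTANCE means a single-instance job (id 0).
    instance: Integer(env.fetch('INSTANCE', '0'))
  }
end

p ack_info('WID' => '12', 'ID' => '34', 'INSTANCE' => '2')
```

Using fetch makes the script fail early with a KeyError if it is started outside a manband job.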

Sample workflow

   workflow:
     name: sworkflow
     description: sample workflows
     root:
       description: root node, executes both node0 and node1
       command: none
       next: node0,node1
     node0:
       description: first root node, creates an output file and call node2
       command: 'echo "hello world" > hello.out'
       next: node2
     node1:
       description: sleep for a while
       command: sleep 10
       next: node3
     node2:
       description: again a new file created
       command: 'echo "goodbye" > goodbye.out'
     node3:
       description: after node1, creates node3.out
       command: 'echo "node3" > node3.out'
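To see how the next fields link the nodes of this sample, the following sketch loads a trimmed copy of it (descriptions omitted) and walks the graph breadth-first from root. The helper names are hypothetical, and the printed order is only a traversal order: node0 and node1 are independent branches and may run in parallel.

```ruby
require 'yaml'

SAMPLE = <<~YML
  workflow:
    name: sworkflow
    root:
      command: none
      next: node0,node1
    node0:
      command: 'echo "hello world" > hello.out'
      next: node2
    node1:
      command: sleep 10
      next: node3
    node2:
      command: 'echo "goodbye" > goodbye.out'
    node3:
      command: 'echo "node3" > node3.out'
YML

# Names listed in a node's comma-separated "next" field (empty if absent).
def successors(workflow, name)
  workflow.fetch(name).fetch('next', '').split(',').map(&:strip)
end

# Breadth-first walk of the workflow graph, starting at the root node.
def visit_order(workflow, start = 'root')
  order = []
  queue = [start]
  until queue.empty?
    name = queue.shift
    next if order.include?(name)

    order << name
    queue.concat(successors(workflow, name))
  end
  order
end

workflow = YAML.safe_load(SAMPLE).fetch('workflow')
puts visit_order(workflow).join(' -> ')
# prints: root -> node0 -> node1 -> node2 -> node3
```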

S3 storage

To store results in S3, the user's S3 credentials must be added to the database. A "store all" option can then be defined for the whole workflow:

   options:
     store: all
   workflow:
     name: samplew
     description: sample workflow

or per node:

   node0:
     description: set a breakpoint
     command: 'echo "hello" > hello.out'
     store: true

It is also possible to request S3 storage of a node directory after its run via the web interface.
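The two ways of requesting storage can be combined in a small predicate. This is a sketch under the assumption that the workflow-wide "store: all" option takes effect regardless of the per-node setting; the function name is hypothetical.

```ruby
# Decide whether a node's result directory should be stored to S3.
# Assumption: "store: all" applies to every node, whatever the node declares.
def store_node?(options, node)
  (options || {})['store'] == 'all' || node['store'] == true
end

p store_node?({ 'store' => 'all' }, { 'command' => 'sleep 10' }) # true
p store_node?(nil, { 'store' => true })                          # true
p store_node?(nil, { 'command' => 'sleep 10' })                  # false
```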

Multiple files management

If a node declares a regexp for its command line and the regexp matches multiple files (instances), all files are processed before the next jobs are run.

Messages are differentiated by the instance id they carry. If the instance id is 0, the job has only one instance. Otherwise instance ids start at 1 and commands are read from an array of commands.

In case of error, the whole job is set to the error state. On resume, all instances are run again.
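The instance numbering can be illustrated with the sketch below. The message layout and function names are assumptions; only the numbering rule (0 for a single instance, 1..N otherwise) and the rule that the job is finished when every instance has finished come from the text above.

```ruby
# Build one message per command: id 0 for a single instance, 1..N otherwise.
# The message layout is hypothetical.
def instance_messages(commands)
  return [{ instance: 0, command: commands.first }] if commands.size == 1

  commands.each_with_index.map { |cmd, index| { instance: index + 1, command: cmd } }
end

# The job is finished only when every instance has reported back.
def job_finished?(messages, finished_ids)
  messages.all? { |message| finished_ids.include?(message[:instance]) }
end

messages = instance_messages(['cat a.out', 'cat b.out'])
p messages.map { |message| message[:instance] } # [1, 2]
p job_finished?(messages, [1])                  # false
p job_finished?(messages, [1, 2])               # true
```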

Workflow handler

In debug mode, all messages are recorded in the database.

Job handler

Each handler can be assigned a specific id for better tracking.

A job handler can be given a queue at startup. It then receives jobs only from this queue.

Web interface

The web interface shows each user's current workflows with their status and the available actions (suspend, resume, ...). A default account (admin/admin) is created at startup.


Screenshots

Some screenshots of the web interface (webband).

Roadmap

Web

Provide message and workflow details in the web page and also in JSON format.

Shared storage

Support for multiple repositories. The system needs shared storage between nodes, but several mounts could be configured and distributed between workflows and/or users.