Release Name: starpu-1.2.0rc1

Release Notes
StarPU 1.2.0rc1 (svn revision 14851)

New features:
  * MIC Xeon Phi support
  * SCC support
  * New function starpu_sched_ctx_exec_parallel_code to execute
    parallel code on the workers of the given scheduler context
  * MPI:
        - New internal communication system: a unique tag called
	  is now used for all communications, and a system
	  of hashmaps on each node which stores pending receives has been
	  implemented. Every message is now coupled with an envelope, sent
	  before the corresponding data, which allows the receiver to
	  allocate data correctly, and to submit the matching receive of
	  the envelope.
        - New function
          starpu_mpi_irecv_detached_sequential_consistency which
          enables or disables sequential consistency for the given
          data handle (sequential consistency is enabled or disabled
          based on the value of the function parameter and on the
          sequential consistency setting defined for the given data)
        - New functions starpu_mpi_task_build() and
        - New flag STARPU_NODE_SELECTION_POLICY to specify a policy for
          selecting a node to execute the codelet when several nodes
	  own data in W mode.
	- New selection node policies can be un/registered with the
	  functions starpu_mpi_node_selection_register_policy() and
	- New environment variable STARPU_MPI_COMM which enables
	  basic tracing of communications.
        - New function starpu_mpi_init_comm() which allows specifying
          an MPI communicator.

  * New STARPU_COMMUTE flag which can be passed along with STARPU_W or
    STARPU_RW to let StarPU commute write accesses.
  * Out-of-core support, through registration of disk areas as additional memory
    nodes. It can be enabled programmatically or through the STARPU_DISK_SWAP*
    environment variables.
  * Reclaiming is now periodically done before memory becomes full. This can
    be controlled through the STARPU_*_AVAILABLE_MEM environment variables.
  * New hierarchical schedulers which allow users to easily build
    their own scheduler, either by coding each "box" themselves or by
    combining existing boxes provided by StarPU. Hierarchical
    schedulers have very interesting scalability properties.
  * Add STARPU_CUDA_ASYNC and STARPU_OPENCL_ASYNC flags to allow asynchronous
    CUDA and OpenCL kernel execution.
    many asynchronous tasks are submitted in advance on CUDA and
    OpenCL devices. Setting the value to 0 forces a synchronous
    execution of all tasks.
  * Add CUDA concurrent kernel execution support through
    the STARPU_NWORKER_PER_CUDA environment variable.
  * Add CUDA and OpenCL kernel submission pipelining, to overlap costs and allow
    concurrent kernel execution on Fermi cards.
  * New locality work stealing scheduler (lws).
  * Add STARPU_VARIABLE_NBUFFERS, to be set in cl.nbuffers, and the nbuffers
    and modes fields in the task structure, which make it possible to define
    codelets taking a variable number of data.
  * Add support for implementing OpenMP runtimes on top of StarPU
  * New performance model format to better represent parallel tasks.
    Used to provide estimations for the execution times of the
    parallel tasks on scheduling contexts or combined workers.
  * starpu_data_idle_prefetch_on_node and
    starpu_idle_prefetch_task_input_on_node allow queuing prefetches that are
    performed only when the bus is idle.
  * starpu_data_prefetch_on_node no longer forcibly flushes data out;
    starpu_data_fetch_on_node is introduced for that purpose.
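
Illustration of the MPI envelope mechanism listed above. This is a minimal,
self-contained sketch and not StarPU-MPI's actual code: every type and function
name here (envelope, pending_recv, post_recv, on_envelope) is hypothetical, and
a small fixed-size table stands in for the per-node hashmaps. It only shows the
idea that an envelope arrives before the data, and that the receiver uses it
either to find the matching pending receive or to allocate a buffer of the
announced size.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical envelope: sent before the payload, it announces the
 * sender, the tag of the data and the size of the payload. */
struct envelope { int source; int data_tag; size_t size; };

enum slot_state { SLOT_EMPTY = 0, SLOT_WAITING, SLOT_MATCHED };

/* One receive posted by the application and not yet matched. */
struct pending_recv {
    enum slot_state state;
    int source;
    int data_tag;
    void *buffer;
};

#define TABLE_SIZE 64u /* power of two, so that masking wraps around */
static struct pending_recv table[TABLE_SIZE];

static unsigned slot_of(int source, int data_tag)
{
    return ((unsigned)source * 31u + (unsigned)data_tag) & (TABLE_SIZE - 1u);
}

/* Record a pending receive. Returns 0 on success, -1 if the table is full. */
static int post_recv(int source, int data_tag, void *buffer)
{
    unsigned i = slot_of(source, data_tag);
    for (unsigned n = 0; n < TABLE_SIZE; n++) {
        if (table[i].state != SLOT_WAITING) { /* empty or already matched */
            table[i] = (struct pending_recv){ SLOT_WAITING, source, data_tag, buffer };
            return 0;
        }
        i = (i + 1u) & (TABLE_SIZE - 1u); /* linear probing */
    }
    return -1;
}

/* Called when an envelope arrives. Returns the buffer the payload should be
 * received into: the posted buffer if a matching receive is pending
 * (*was_posted = 1), otherwise a fresh allocation of the announced size
 * (*was_posted = 0) that the caller owns. */
static void *on_envelope(const struct envelope *env, int *was_posted)
{
    assert(env != NULL && was_posted != NULL);
    unsigned i = slot_of(env->source, env->data_tag);
    for (unsigned n = 0; n < TABLE_SIZE && table[i].state != SLOT_EMPTY; n++) {
        if (table[i].state == SLOT_WAITING &&
            table[i].source == env->source &&
            table[i].data_tag == env->data_tag) {
            table[i].state = SLOT_MATCHED;
            *was_posted = 1;
            return table[i].buffer;
        }
        i = (i + 1u) & (TABLE_SIZE - 1u);
    }
    *was_posted = 0;
    return malloc(env->size);
}
```

In this sketch, matching is done on the (source, data tag) pair carried by the
envelope rather than on the communication tag, which is what makes it possible
to use one tag for every message.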

Small features:
  * Tasks can now have a name (via the field const char *name of
    struct starpu_task)
  * New functions starpu_data_acquire_cb_sequential_consistency() and
    starpu_data_acquire_on_node_cb_sequential_consistency() which allow
    enabling or disabling sequential consistency
  * New configure option --enable-fxt-lock which enables additional
    trace events focused on lock behaviour during the execution
  * Functions starpu_insert_task and starpu_mpi_insert_task are
    renamed to starpu_task_insert and starpu_mpi_task_insert. The old
    names are kept to avoid breaking existing code.
  * New configure option --enable-calibration-heuristic which allows
    the user to set the maximum authorized deviation of the
    history-based calibrator.
  * Allow application to provide the task footprint itself.
  * New function starpu_sched_ctx_display_workers() to display
    information about the workers belonging to a given scheduler context
  * The option --enable-verbose can be given as
    --enable-verbose=extra to increase the verbosity
  * Add codelet size, footprint and tag id in the paje trace.
  * Add STARPU_TAG_ONLY, to specify a tag for traces without making StarPU
    manage the tag.
  * On Linux x86, spinlocks now block after a hundred tries. This avoids
    typical 10ms pauses when the application thread tries to submit tasks.
  * New function char *starpu_worker_get_type_as_string(enum starpu_worker_archtype type)
  * Improve static scheduling by adding support for specifying the task
    execution order.
  * Add starpu_worker_can_execute_task_impl and
    starpu_worker_can_execute_task_first_impl to optimize getting the
    working implementations
  * Add STARPU_MALLOC_NORECLAIM flag to allocate without running a reclaim if
    the node is out of memory.
  * New flag STARPU_DATA_MODE_ARRAY for the starpu_task_insert
    function family, which allows defining an array of data handles
    along with their access modes.
  * New configure option --enable-new-check to enable new testcases
    which are known to fail
  * Add starpu_memory_allocate and _deallocate to let the application declare
    its own allocation to the reclaiming engine.
    disable CUDA costs simulation in simgrid mode.
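
Illustration of the "spin, then block" behaviour described for spinlocks above.
This is a minimal sketch under stated assumptions, not StarPU's implementation:
it uses a plain POSIX mutex, and the names SPIN_TRIES and spin_then_block_lock
are hypothetical. The lock is polled a bounded number of times (one hundred,
as in the item above) and the caller falls back to a blocking acquisition
afterwards instead of spinning indefinitely.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

#define SPIN_TRIES 100 /* number of non-blocking attempts before blocking */

/* Acquire the mutex, spinning at most SPIN_TRIES times before blocking.
 * Returns the number of failed attempts made before the lock was taken
 * (SPIN_TRIES means the blocking path was used). */
static int spin_then_block_lock(pthread_mutex_t *m)
{
    assert(m != NULL);
    for (int i = 0; i < SPIN_TRIES; i++) {
        if (pthread_mutex_trylock(m) == 0)
            return i; /* acquired while spinning */
    }
    pthread_mutex_lock(m); /* give up spinning and sleep until available */
    return SPIN_TRIES;
}
```

The bound keeps the fast path cheap when the lock is free, while a thread that
keeps failing to get the lock sleeps instead of spinning and competing for the
CPU with the thread that holds it.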

Changes:
  * Data interfaces (variable, vector, matrix and block) now define
    pack and unpack functions
  * StarPU-MPI: Fix reception of data which has not yet been
    registered by the application (i.e. it did not call
    starpu_data_set_tag(); such data is received as raw memory)
  * StarPU-MPI: Fix reception of data with the same tag
    from several nodes (see mpi/tests/gather.c)
  * Remove the long-deprecated cost_model fields and task->buffers field.
  * Fix complexity of implicit task/data dependency, from quadratic to linear.
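
Illustration of what pack and unpack functions do for a data interface, as
mentioned in the first item above. This is a minimal sketch and not StarPU's
interface code: the matrix_view structure and the matrix_pack and matrix_unpack
functions are hypothetical. It assumes a matrix stored with a leading
dimension ld that may be larger than the row length nx, so that the useful
elements are not contiguous in memory and must be gathered into one buffer
before being sent as a single message.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical matrix descriptor: ny rows of nx elements each, with ld
 * elements between the starts of two consecutive rows (ld >= nx). */
struct matrix_view {
    char *ptr;
    size_t nx, ny, ld, elemsize;
};

/* Gather the useful part of the matrix into a newly allocated contiguous
 * buffer. Stores the buffer size in *count and returns the buffer, which
 * the caller must free, or NULL if the allocation failed. */
static void *matrix_pack(const struct matrix_view *m, size_t *count)
{
    size_t row = m->nx * m->elemsize;
    char *out = malloc(row * m->ny);
    if (out == NULL)
        return NULL;
    for (size_t y = 0; y < m->ny; y++)
        memcpy(out + y * row, m->ptr + y * m->ld * m->elemsize, row);
    *count = row * m->ny;
    return out;
}

/* Scatter a packed buffer back into the matrix, leaving the padding
 * between rows untouched. */
static void matrix_unpack(struct matrix_view *m, const void *packed, size_t count)
{
    size_t row = m->nx * m->elemsize;
    assert(count == row * m->ny);
    for (size_t y = 0; y < m->ny; y++)
        memcpy(m->ptr + y * m->ld * m->elemsize,
               (const char *)packed + y * row, row);
}
```

A round trip through matrix_pack on one node and matrix_unpack on another
reproduces the nx by ny elements even when the two nodes use different leading
dimensions, as long as nx, ny and elemsize agree.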

Small changes:
  * Rename function starpu_trace_user_event() as