Release Name: starpu-1.2.0

Release Notes
StarPU 1.2.0 (svn revision 18521)
==============================================

New features:
  * MIC Xeon Phi support
  * SCC support
  * New function starpu_sched_ctx_exec_parallel_code to execute a
    parallel code on the workers of the given scheduler context
  * MPI:
        - New internal communication system: a single, unique tag is
          now used for all communications, and a system of hashmaps
          storing pending receives has been implemented on each node.
          Every message is now coupled with an envelope, sent before
          the corresponding data, which allows the receiver to allocate
          the data correctly and to submit the receive matching the
          envelope.
        - New function
          starpu_mpi_irecv_detached_sequential_consistency, which
          allows enabling or disabling sequential consistency for the
          given data handle (sequential consistency is enabled or
          disabled based on both the value of the function parameter
          and the sequential consistency setting of the given data).
        - New functions starpu_mpi_task_build() and
  	  starpu_mpi_task_post_build()
        - New flag STARPU_NODE_SELECTION_POLICY to specify a policy for
          selecting a node to execute the codelet when several nodes
	  own data in W mode.
        - New node selection policies can be registered and
          unregistered with the functions
          starpu_mpi_node_selection_register_policy() and
          starpu_mpi_node_selection_unregister_policy().
	- New environment variable STARPU_MPI_COMM which enables
	  basic tracing of communications.
        - New function starpu_mpi_init_comm(), which allows specifying
          an MPI communicator.
  * New STARPU_COMMUTE flag, which can be combined with STARPU_W or STARPU_RW
    to let StarPU commute write accesses, i.e. execute tasks accessing the
    same data in this mode in any order.
  * Out-of-core support, through registration of disk areas as additional memory
    nodes. It can be enabled programmatically or through the STARPU_DISK_SWAP*
    environment variables.
  * Reclaiming is now periodically done before memory becomes full. This can
    be controlled through the STARPU_*_AVAILABLE_MEM environment variables.
  * New hierarchical schedulers, which let users easily build their own
    scheduler, either by coding each "box" themselves or by combining
    existing StarPU boxes. Hierarchical schedulers have promising
    scalability properties.
  * Add STARPU_CUDA_ASYNC and STARPU_OPENCL_ASYNC flags to allow asynchronous
    CUDA and OpenCL kernel execution.
  * Add STARPU_CUDA_PIPELINE and STARPU_OPENCL_PIPELINE to specify how
    many asynchronous tasks are submitted in advance on CUDA and
    OpenCL devices. Setting the value to 0 forces a synchronous
    execution of all tasks.
  * Add CUDA concurrent kernel execution support through
    the STARPU_NWORKER_PER_CUDA environment variable.
  * Add CUDA and OpenCL kernel submission pipelining, to overlap costs and allow
    concurrent kernel execution on Fermi cards.
  * New locality work stealing scheduler (lws).
  * Add STARPU_VARIABLE_NBUFFERS, to be set in cl.nbuffers, and add nbuffers
    and modes fields to the task structure; together, these allow defining
    codelets that take a variable number of data handles.
  * Add support for implementing OpenMP runtimes on top of StarPU
  * New performance model format to better represent parallel tasks. It is
    used to estimate the execution times of parallel tasks on scheduling
    contexts or combined workers.
  * starpu_data_idle_prefetch_on_node and
    starpu_idle_prefetch_task_input_on_node allow queuing prefetches that are
    performed only when the bus is idle.
  * Make starpu_data_prefetch_on_node not forcibly flush data out, and
    introduce starpu_data_fetch_on_node for that purpose.
  * Add data access arbiters, to improve parallelism of concurrent data
    accesses, notably with STARPU_COMMUTE.
  * Anticipative writeback, to flush dirty data asynchronously before the
    GPU device is full. Disabled by default. Use STARPU_MINIMUM_CLEAN_BUFFERS
    and STARPU_TARGET_CLEAN_BUFFERS to enable it.
  * Add starpu_data_wont_use to advise that a piece of data will not be used
    in the near future.
  * Enable anticipative writeback by default.
  * New scheduler 'dmdasd' that considers priority when deciding on
    which worker to schedule
  * Add the capability to define specific MPI datatypes for
    StarPU user-defined interfaces.
  * Add tasks.rec trace output to make scheduling analysis easier.
  * Add Fortran 90 module and example using it
  * New StarPU-MPI gdb debug functions
  * Generate animated html trace of modular schedulers.
  * Add asynchronous partition planning. It only supports coherency through
    the home node of data for now.
  * Add STARPU_MALLOC_SIMULATION_FOLDED flag to save memory when simulating.
  * Include application threads in the trace.
  * Add starpu_task_get_task_scheduled_succs to get successors of a task.
  * Add graph inspection facility for schedulers.
  * New STARPU_LOCALITY flag to mark data which should be taken into account
    by schedulers for improving locality.
  * Experimental support for data locality in ws and lws.
  * Add a preliminary framework for native Fortran support for StarPU

Small features:
  * Tasks can now have a name (via the field const char *name of
    struct starpu_task)
  * New functions starpu_data_acquire_cb_sequential_consistency() and
    starpu_data_acquire_on_node_cb_sequential_consistency(), which allow
    enabling or disabling sequential consistency.
  * New configure option --enable-fxt-lock, which enables additional
    trace events focused on lock behaviour during the execution.
  * Functions starpu_insert_task and starpu_mpi_insert_task are
    renamed to starpu_task_insert and starpu_mpi_task_insert. The old
    names are kept to avoid breaking existing code.
  * New configure option --enable-calibration-heuristic which allows
    the user to set the maximum authorized deviation of the
    history-based calibrator.
  * Allow application to provide the task footprint itself.
  * New function starpu_sched_ctx_display_workers() to display information
    about the workers belonging to a given scheduler context.
  * The option --enable-verbose can be given as --enable-verbose=extra to
    increase the verbosity.
  * Add codelet size, footprint and tag id in the paje trace.
  * Add STARPU_TAG_ONLY, to specify a tag for traces without making StarPU
    manage the tag.
  * On Linux x86, spinlocks now block after a hundred tries. This avoids
    typical 10ms pauses when the application thread tries to submit tasks.
  * New function char *starpu_worker_get_type_as_string(enum starpu_worker_archtype type)
  * Improve static scheduling by adding support for specifying the task
    execution order.
  * Add starpu_worker_can_execute_task_impl and
    starpu_worker_can_execute_task_first_impl to retrieve more efficiently
    the implementations that a worker can execute.
  * Add STARPU_MALLOC_NORECLAIM flag to allocate without running a reclaim if
    the node is out of memory.
  * New flag STARPU_DATA_MODE_ARRAY for the starpu_task_insert family of
    functions, which allows defining an array of data handles along with
    their access modes.
  * New configure option --enable-new-check to enable new testcases
    which are known to fail
  * Add starpu_memory_allocate and starpu_memory_deallocate to let the
    application declare its own allocations to the reclaiming engine.
  * Add STARPU_SIMGRID_CUDA_MALLOC_COST and STARPU_SIMGRID_CUDA_QUEUE_COST to
    disable the simulation of CUDA costs in SimGrid mode.
  * Add starpu_task_get_task_succs to get the list of children of a given
    task.
  * Add starpu_malloc_on_node_flags, starpu_free_on_node_flags, and
    starpu_malloc_on_node_set_default_flags to control the allocation flags
    used for allocations done by StarPU.
  * Ranges can be provided in STARPU_WORKERS_CPUID
  * Add starpu_fxt_autostart_profiling to make it possible to avoid starting
    profiling automatically.
  * Add arch_cost_function perfmodel function field.
  * Add STARPU_TASK_BREAK_ON_SCHED, STARPU_TASK_BREAK_ON_PUSH, and
    STARPU_TASK_BREAK_ON_POP environment variables to debug schedulers.
  * Add starpu_sched_display tool.
  * Add starpu_memory_pin and starpu_memory_unpin to pin and unpin memory
    allocated by means other than starpu_malloc.
  * Add STARPU_NOWHERE to create synchronization tasks with data.
  * Document how to switch between different views of the same data.
  * Add STARPU_NAME to specify a task name from a starpu_task_insert call.
  * Add configure option --disable-fortran to disable Fortran support
  * Add configure option --with-smpirun to give the path to the smpirun executable
  * Add configure option --disable-build-tests to disable the build of tests
  * Add starpu-all-tasks debugging support
  * New function
    void starpu_opencl_load_program_source_malloc(const char *source_file_name, char **located_file_name, char **located_dir_name, char **opencl_program_source)
    which allocates the pointers located_file_name, located_dir_name
    and opencl_program_source.
  * Add submit_hook and do_schedule scheduler methods.
  * Add starpu_sleep.
  * Add starpu_task_list_ismember.
  * Add _starpu_fifo_pop_this_task.
  * Add STARPU_MAX_MEMORY_USE environment variable.
  * Add starpu_worker_get_id_check().
  * New function starpu_mpi_wait_for_all(MPI_Comm comm) that allows to
    wait until all StarPU tasks and communications for the given
    communicator are completed.
  * New function starpu_codelet_unpack_args_and_copyleft(), which
    copies into a new buffer the values that have not been unpacked by
    the current call.
  * Add STARPU_CODELET_SIMGRID_EXECUTE flag.
  * Add STARPU_CL_ARGS flag to the starpu_task_insert() and
    starpu_mpi_task_insert() functions.

Changes:
  * Data interfaces (variable, vector, matrix and block) now define
    pack and unpack functions.
  * StarPU-MPI: Fix for being able to receive data which have not yet
    been registered by the application (i.e. it did not call
    starpu_data_set_tag()); such data are received as raw memory.
  * StarPU-MPI: Fix for being able to receive data with the same tag
    from several nodes (see mpi/tests/gather.c)
  * Remove the long-deprecated cost_model fields and task->buffers field.
  * Fix complexity of implicit task/data dependency, from quadratic to linear.

Small changes:
  * Rename function starpu_trace_user_event() to
    starpu_fxt_trace_user_event()
  * "power" is renamed into "energy" wherever it applies, notably energy
    consumption performance models
  * Update starpu_task_build() to set starpu_task::cl_arg_free to 1 if
    some arguments of type ::STARPU_VALUE are given.
  * Simplify performance model loading API
  * Clearer semantics for the environment variables STARPU_NMIC and
    STARPU_NMICDEVS, which give the number of devices and the number of
    cores: STARPU_NMIC is now the number of devices, and STARPU_NMICCORES
    the number of cores per device.