Release Name: starpu-1.1.0

Release Notes
StarPU 1.1.0 (svn revision 11960)
==============================================
The scheduling context release

New features:
  * OpenGL interoperability support.
  * Capability to store compiled OpenCL kernels on the file system
  * Capability to load compiled OpenCL kernels
  * Performance models measurements can now be provided explicitly by
    applications.
  * Capability to emit communication statistics when running MPI code
  * Add starpu_unregister_submit, starpu_data_acquire_on_node and
    starpu_data_invalidate_submit
  * New functionality in the starpu_insert_task wrapper to pass an array
    of data handles via the parameter STARPU_DATA_ARRAY
  * Enable GPU-GPU direct transfers.
  * GCC plug-in
	- Add `registered' attribute
	- A new pass was added that warns about the use of possibly
	  unregistered memory buffers.
  * SOCL
        - Manual mapping of commands on specific devices is now
	  possible
        - SOCL does not require StarPU CPU tasks anymore. CPU workers
          are automatically disabled to enhance performance of OpenCL
          CPU devices
  * New interface: COO matrix.
  * Data interfaces: The pack operation of user-defined data interface
    defines a new parameter count which should be set to the size of
    the buffer created by the packing of the data.
  * MPI:
        - Communication statistics for MPI can only be enabled at
	  execution time by defining the environment variable
	  STARPU_COMM_STATS
        - Communication cache mechanism is enabled by default, and can
	  only be disabled at execution time by setting the
	  environment variable STARPU_MPI_CACHE to 0.
        - Initialisation functions starpu_mpi_initialize_extended()
          and starpu_mpi_initialize() are now deprecated. One
          should now use starpu_mpi_init(int *, char ***, int). The
          last parameter indicates whether MPI should be initialised.
        - Collective detached operations have new parameters, a
          callback function and an argument. This is to be consistent
          with the detached point-to-point communications.
        - When exchanging user-defined data interfaces, the size of
          the data is the size returned by the pack operation, i.e.
          data with dynamic size can now be exchanged with StarPU-MPI.
  * Add experimental simgrid support, to simulate execution with various
    numbers of CPUs and GPUs, amounts of memory, etc.
  * Add support for OpenCL simulators (which provide simulated execution time)
  * Add support for Temanejo, a task graph debugger
  * Theoretical bound lp output now includes data transfer time.
  * Update OpenCL driver so that it can enable only CPU devices (to do
    so, the environment variable STARPU_OPENCL_ONLY_ON_CPUS must be set
    to a positive value when executing an application)
  * Add Scheduling contexts to separate computation resources
        - Scheduling policies take into account the set of resources
          corresponding to the context they belong to
        - Add support to dynamically change scheduling contexts
          (Create and Delete a context, Add Workers to a context,
          Remove workers from a context)
        - Add support to indicate to which contexts the tasks are submitted
  * Add the Hypervisor to manage the Scheduling Contexts automatically
        - The Contexts can be registered to the Hypervisor
        - Only the registered contexts are managed by the Hypervisor
        - The Hypervisor can detect the initial distribution of resources
          of a context and construct it accordingly (the cost of
          execution is required)
        - Several policies can dynamically adapt the distribution of
          resources in contexts if the initial one was not appropriate
        - Add a platform to implement new policies of redistribution
          of resources
  * Implement a memory manager which checks the global amount of
    memory available on devices, and checks there is enough memory
    before doing an allocation on the device.
  * Discard environment variable STARPU_LIMIT_GPU_MEM and define
    instead STARPU_LIMIT_CUDA_MEM and STARPU_LIMIT_OPENCL_MEM
  * Introduce new variables STARPU_LIMIT_CUDA_devid_MEM and
    STARPU_LIMIT_OPENCL_devid_MEM to limit memory per specific device
  * Introduce new variable STARPU_LIMIT_CPU_MEM to limit memory for
    the CPU devices
  * New function starpu_malloc_flags to define a memory allocation with
    constraints based on the following values:
    - STARPU_MALLOC_PINNED specifies memory should be pinned
    - STARPU_MALLOC_COUNT specifies the memory allocation should be in
      the limits defined by the environment variables STARPU_LIMIT_xxx
      (see above). When no memory is left, starpu_malloc_flags tries
      to reclaim memory from StarPU and returns -ENOMEM on failure.
  * starpu_malloc calls starpu_malloc_flags with a value of flag set
    to STARPU_MALLOC_PINNED
  * Define new function starpu_free_flags similarly to starpu_malloc_flags
  * Define new public API starpu_pthread which is similar to the
    pthread API. It is provided with 2 implementations: a pthread one
    and a Simgrid one. Applications using StarPU and wishing to use
    the Simgrid StarPU features should use it.
  * Allow a dynamically allocated number of buffers per task, which
    overrides the value defined by --enable-maxbuffers=XXX
  * Performance model files are now stored in a directory whose name
    includes the version of the performance model format. The version
    number is also written in the file itself.
    When updating the format, the internal variable
    _STARPU_PERFMODEL_VERSION should be updated. It is then possible
    to switch easily between different versions of StarPU having
    different performance model formats.
  * Tasks can now define an optional prologue callback which is executed
    on the host when the task becomes ready for execution, before getting
    scheduled.
  * Small CUDA allocations (<= 4MiB) are now batched to avoid the huge
    cudaMalloc overhead.
  * Prefetching is now done for all schedulers whenever it can be done
    regardless of the scheduling decision.
  * Add a watchdog which makes it easy to trigger a crash when StarPU
    gets stuck.
  * Document how to migrate data over MPI.
  * New function starpu_wakeup_worker() to be used by schedulers to
    wake up a single worker (instead of all workers) when submitting a
    single task.
  * The functions starpu_sched_set/get_min/max_priority set/get the
    priorities of the current scheduling context, i.e. the one which
    was set by a call to starpu_sched_ctx_set_context() or the initial
    context if the function has not been called yet.
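
As a small illustration of the per-device limit variables listed above, the
sketch below builds the variable name for one device. The helper
limit_var_name is hypothetical and not part of StarPU; only the
STARPU_LIMIT_CUDA_devid_MEM / STARPU_LIMIT_OPENCL_devid_MEM naming pattern
comes from these notes, and it is assumed that devid is written as a plain
decimal device number.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper (not a StarPU function): writes the name of the
 * per-device memory limit variable, e.g. "STARPU_LIMIT_CUDA_0_MEM",
 * into buf. kind must be "CUDA" or "OPENCL", the two device families
 * named in the release notes. Assumes devid is a plain decimal number.
 * Returns 0 on success, -1 on an unknown kind or a too-small buffer. */
int limit_var_name(char *buf, size_t len, const char *kind, unsigned devid)
{
    assert(buf != NULL && kind != NULL);

    if (strcmp(kind, "CUDA") != 0 && strcmp(kind, "OPENCL") != 0)
        return -1;

    int n = snprintf(buf, len, "STARPU_LIMIT_%s_%u_MEM", kind, devid);

    /* snprintf reports the length it wanted; reject truncated names. */
    return (n < 0 || (size_t)n >= len) ? -1 : 0;
}
```

An application or launcher could export the resulting name with the desired
limit before starting StarPU.
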
Small features:
  * Add starpu_worker_get_by_type and starpu_worker_get_by_devid
  * Add starpu_fxt_stop_profiling/starpu_fxt_start_profiling which make it
    possible to pause trace recording.
  * Add trace_buffer_size configuration field to specify the tracing
    buffer size.
  * Add starpu_codelet_profile and starpu_codelet_histo_profile, tools which draw
    the profile of a codelet.
  * File STARPU-REVISION --- containing the SVN revision number from which
    StarPU was compiled --- is installed in the share/doc/starpu directory
  * starpu_perfmodel_plot can now directly draw GFlops curves.
  * New configure option --enable-mpi-progression-hook to enable the
    activity polling method for StarPU-MPI.
  * Allow disabling sequential consistency for a given task.
  * New macro STARPU_RELEASE_VERSION
  * New function starpu_get_version() to return as 3 integers the
    release version of StarPU.
  * Enable by default data allocation cache
  * New function starpu_perfmodel_directory() to print the directory
    storing performance models. Available through the new option -d of
    the tool starpu_perfmodel_display
  * New batch files to execute StarPU applications under Microsoft
    Visual Studio (they are installed in path_to_starpu/bin/mvsc).
  * Add cl_arg_free, callback_arg_free, prologue_callback_arg_free fields to
    enable automatic free(cl_arg); free(callback_arg);
    free(prologue_callback_arg) on task destroy.
  * New function starpu_task_build
Changes:
  * Rename all filter functions to follow the pattern
    starpu_DATATYPE_filter_FILTERTYPE. The script
    tools/dev/rename_filter.sh is provided to update your existing
    applications to use the new filter function names.
  * Rename various functions and datatypes. The script
    tools/dev/rename.sh is provided to update your existing
    applications to use the new names. It is also possible to compile
    with the pkg-config package starpu-1.0 to keep using the old
    names. It is however recommended to update your code and to use
    the package starpu-1.1.

  * Fix the block filter functions.
  * Fix StarPU-MPI on Darwin.
  * The FxT code can now be used on systems other than Linux.
  * Keep only one hashtable implementation common/uthash.h
  * The cache of starpu_mpi_insert_task is fixed and thus now enabled by
    default.
  * Improve starpu_machine_display output.
  * Standardize object names in the performance model API
  * SOCL
    - Virtual SOCL device has been removed
    - Automatic scheduling still available with command queues not
      assigned to any device
    - Remove modified OpenCL headers. ICD is now the only supported
      way to use SOCL.
    - SOCL test suite is only run when environment variable
      SOCL_OCL_LIB_OPENCL is defined. It should contain the location
      of the libOpenCL.so file of the OCL ICD implementation.
  * Fix main memory leak on multiple unregister/re-register.
  * Improve hwloc detection by configure
  * Cell:
    - It is no longer possible to enable the cell support via the
      gordon driver
    - Data interfaces no longer define functions to copy to and from
      SPU devices
    - Codelets no longer define pointers for Gordon implementations
    - Gordon workers are no longer enabled
    - Gordon performance models are no longer enabled
  * Fix data transfer arrows in paje traces
  * The "heft" scheduler no longer exists. Users should now pick "dmda"
    instead.
  * StarPU can now use poti to generate paje traces.
  * Rename scheduling policy "parallel greedy" to "parallel eager"
  * starpu_scheduler.h is no longer automatically included by
    starpu.h, it has to be manually included when needed
  * New batch files to run StarPU applications with Microsoft Visual C
  * Add examples/release/Makefile to test StarPU examples against an
    installed version of StarPU. That can also be used to test
    examples using a previous API.
  * Tutorial is installed in ${docdir}/tutorial
  * Schedulers eager_central_policy, dm and dmda no longer erroneously respect
    priorities. dmdas has to be used to respect priorities.
  * StarPU-MPI: Fix potential bug for user-defined datatypes. As MPI
    can reorder messages, we need to make sure the sending of the size
    of the data has been completed.
  * Documentation is now generated through doxygen.
  * Modification of perfmodels output format for future improvements.
  * Fix for properly dealing with NAN on Windows systems
  * Function starpu_sched_ctx_create() now takes a variable argument
    list to define the scheduler to be used, and the minimum and
    maximum priority values
  * The functions starpu_sched_set/get_min/max_priority set/get the
    priorities of the current scheduling context, i.e. the one which
    was set by a call to starpu_sched_ctx_set_context() or the initial
    context if the function has not been called yet.
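
Since the "heft" scheduler no longer exists, applications that keep a
scheduler name in their own configuration may want to translate it before
handing it to StarPU. The following is a minimal sketch of such a
translation; migrate_sched_name is a hypothetical application-side helper,
not a StarPU function, and it only covers the "heft" to "dmda" replacement
stated in these notes.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical application-side helper (not part of StarPU): returns the
 * scheduler name to use with StarPU 1.1.0 for a name that may predate it.
 * Only the "heft" -> "dmda" replacement from the release notes is
 * handled; every other name is returned unchanged. */
const char *migrate_sched_name(const char *name)
{
    assert(name != NULL);

    if (strcmp(name, "heft") == 0)
        return "dmda";

    return name;
}
```

Note that, per the notes above, dmda does not respect priorities; dmdas has
to be used when priorities matter.
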

Small changes:
  * STARPU_NCPU should now be used instead of STARPU_NCPUS. STARPU_NCPUS is
	still available for compatibility reasons.
  * include/starpu.h includes all include/starpu_*.h files; applications
	therefore only need to have #include <starpu.h>
  * Active task wait is now included in blocked time.
  * Fix GCC plugin linking issues starting with GCC 4.7.
  * Fix forcing calibration of never-calibrated archs.
  * CUDA applications are no longer compiled with the "-arch sm_13"
    option. It is specifically added to applications which need it.
  * Explicitly name the non-sleeping-non-running time "Overhead", and use
    another color in vite traces.
  * Use C99 variadic macro support, not GNU.
  * Fix performance regression: dmda queues were inadvertently made
    LIFOs in r9611.
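
For readers unfamiliar with the difference behind the "C99 variadic macro
support, not GNU" item, the sketch below contrasts the two spellings. The
macro FORMAT_MSG is a made-up example, not a macro from the StarPU sources.

```c
#include <stdio.h>

/* GNU-only spelling, shown for contrast and not compiled here: the
 * variadic part is given a name.
 *
 *   #define FORMAT_MSG(buf, len, fmt, args...) \
 *           snprintf((buf), (len), fmt, args)
 */

/* C99 spelling: the ellipsis is unnamed and expands via __VA_ARGS__.
 * The format string is folded into the variadic part so the macro also
 * works when the format takes no extra arguments. */
#define FORMAT_MSG(buf, len, ...) snprintf((buf), (len), __VA_ARGS__)
```

The C99 form is accepted by any conforming C99 compiler, whereas the named
form is a GNU extension.
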