Project Filelist for StarPU
File Release Notes and Changelog
Release Name: starpu-1.1.0
Release Notes
StarPU 1.1.0 (svn revision 11960)
==============================================
The scheduling context release

New features:
  * OpenGL interoperability support.
  * Capability to store compiled OpenCL kernels on the file system
  * Capability to load compiled OpenCL kernels
  * Performance model measurements can now be provided explicitly by
    applications.
  * Capability to emit communication statistics when running MPI code
  * Add starpu_unregister_submit, starpu_data_acquire_on_node and
    starpu_data_invalidate_submit
  * New functionality in the wrapper starpu_insert_task to pass an array
    of data_handles via the parameter STARPU_DATA_ARRAY
  * Enable GPU-GPU direct transfers.
  * GCC plug-in
    - Add `registered' attribute
    - A new pass was added that warns about the use of possibly
      unregistered memory buffers.
  * SOCL
    - Manual mapping of commands on specific devices is now possible
    - SOCL does not require StarPU CPU tasks anymore. CPU workers are
      automatically disabled to enhance performance of OpenCL CPU devices
  * New interface: COO matrix.
  * Data interfaces: The pack operation of user-defined data interfaces
    defines a new parameter count, which should be set to the size of the
    buffer created by the packing of the data.
  * MPI:
    - Communication statistics for MPI can only be enabled at execution
      time by defining the environment variable STARPU_COMM_STATS
    - The communication cache mechanism is enabled by default, and can
      only be disabled at execution time by setting the environment
      variable STARPU_MPI_CACHE to 0.
    - Initialisation functions starpu_mpi_initialize_extended() and
      starpu_mpi_initialize() have been deprecated. One should now use
      starpu_mpi_init(int *, char ***, int). The last parameter indicates
      whether MPI should be initialised.
    - Collective detached operations have new parameters, a callback
      function and an argument. This is to be consistent with the
      detached point-to-point communications.
    - When exchanging user-defined data interfaces, the size of the data
      is the size returned by the pack operation, i.e. data with dynamic
      size can now be exchanged with StarPU-MPI.
  * Add experimental simgrid support, to simulate execution with various
    numbers of CPUs and GPUs, amounts of memory, etc.
  * Add support for OpenCL simulators (which provide simulated execution
    time)
  * Add support for Temanejo, a task graph debugger
  * Theoretical bound lp output now includes data transfer time.
  * Update OpenCL driver to only enable CPU devices (the environment
    variable STARPU_OPENCL_ONLY_ON_CPUS must be set to a positive value
    when executing an application)
  * Add Scheduling contexts to separate computation resources
    - Scheduling policies take into account the set of resources
      corresponding to the context they belong to
    - Add support to dynamically change scheduling contexts (Create and
      Delete a context, Add Workers to a context, Remove workers from a
      context)
    - Add support to indicate to which contexts the tasks are submitted
  * Add the Hypervisor to manage the Scheduling Contexts automatically
    - The Contexts can be registered to the Hypervisor
    - Only the registered contexts are managed by the Hypervisor
    - The Hypervisor can detect the initial distribution of resources of
      a context and construct it accordingly (the cost of execution is
      required)
    - Several policies can dynamically adapt the distribution of
      resources in contexts if the initial one was not appropriate
    - Add a platform to implement new policies of redistribution of
      resources
  * Implement a memory manager which checks the global amount of memory
    available on devices, and checks there is enough memory before doing
    an allocation on the device.
  * Discard environment variable STARPU_LIMIT_GPU_MEM and define instead
    STARPU_LIMIT_CUDA_MEM and STARPU_LIMIT_OPENCL_MEM
  * Introduce new variables STARPU_LIMIT_CUDA_devid_MEM and
    STARPU_LIMIT_OPENCL_devid_MEM to limit memory per specific device
  * Introduce new variable STARPU_LIMIT_CPU_MEM to limit memory for the
    CPU devices
  * New function starpu_malloc_flags to define a memory allocation with
    constraints based on the following values:
    - STARPU_MALLOC_PINNED specifies memory should be pinned
    - STARPU_MALLOC_COUNT specifies the memory allocation should be in
      the limits defined by the environment variables STARPU_LIMIT_xxx
      (see above). When no memory is left, starpu_malloc_flags tries to
      reclaim memory from StarPU and returns -ENOMEM on failure.
  * starpu_malloc calls starpu_malloc_flags with the flag set to
    STARPU_MALLOC_PINNED
  * Define new function starpu_free_flags similarly to
    starpu_malloc_flags
  * Define new public API starpu_pthread which is similar to the pthread
    API. It is provided with 2 implementations: a pthread one and a
    Simgrid one. Applications using StarPU and wishing to use the Simgrid
    StarPU features should use it.
  * Allow a dynamically allocated number of buffers per task, which
    overrides the value defined by --enable-maxbuffers=XXX
  * Performance model files are now stored in a directory whose name
    includes the version of the performance model format. The version
    number is also written in the file itself. When updating the format,
    the internal variable _STARPU_PERFMODEL_VERSION should be updated. It
    is then possible to switch easily between different versions of
    StarPU having different performance model formats.
  * Tasks can now define an optional prologue callback which is executed
    on the host when the task becomes ready for execution, before getting
    scheduled.
  * Small CUDA allocations (<= 4MiB) are now batched to avoid the huge
    cudaMalloc overhead.
  * Prefetching is now done for all schedulers when it can be done
    whatever the scheduling decision.
  * Add a watchdog which makes it easy to trigger a crash when StarPU
    gets stuck.
  * Document how to migrate data over MPI.
  * New function starpu_wakeup_worker() to be used by schedulers to wake
    up a single worker (instead of all workers) when submitting a single
    task.
  * The functions starpu_sched_set/get_min/max_priority set/get the
    priorities of the current scheduling context, i.e. the one which was
    set by a call to starpu_sched_ctx_set_context(), or the initial
    context if the function has not been called yet.

Small features:
  * Add starpu_worker_get_by_type and starpu_worker_get_by_devid
  * Add starpu_fxt_stop_profiling/starpu_fxt_start_profiling, which make
    it possible to pause trace recording.
  * Add trace_buffer_size configuration field to specify the tracing
    buffer size.
  * Add starpu_codelet_profile and starpu_codelet_histo_profile, tools
    which draw the profile of a codelet.
  * File STARPU-REVISION --- containing the SVN revision number from
    which StarPU was compiled --- is installed in the share/doc/starpu
    directory
  * starpu_perfmodel_plot can now directly draw GFlops curves.
  * New configure option --enable-mpi-progression-hook to enable the
    activity polling method for StarPU-MPI.
  * Sequential consistency can now be disabled for a given task.
  * New macro STARPU_RELEASE_VERSION
  * New function starpu_get_version() to return as 3 integers the release
    version of StarPU.
  * Enable the data allocation cache by default
  * New function starpu_perfmodel_directory() to print the directory
    storing performance models. Available through the new option -d of
    the tool starpu_perfmodel_display
  * New batch files to execute StarPU applications under Microsoft Visual
    Studio (they are installed in path_to_starpu/bin/mvsc).
  * Add cl_arg_free, callback_arg_free, prologue_callback_arg_free fields
    to enable automatic free(cl_arg); free(callback_arg);
    free(prologue_callback_arg) on task destroy.
  * New function starpu_task_build

Changes:
  * Rename all filter functions to follow the pattern
    starpu_DATATYPE_filter_FILTERTYPE. The script
    tools/dev/rename_filter.sh is provided to update your existing
    applications to use the new filter function names.
  * Renaming of diverse functions and datatypes. The script
    tools/dev/rename.sh is provided to update your existing applications
    to use the new names. It is also possible to compile with the
    pkg-config package starpu-1.0 to keep using the old names. It is
    however recommended to update your code and to use the package
    starpu-1.1.
  * Fix the block filter functions.
  * Fix StarPU-MPI on Darwin.
  * The FxT code can now be used on systems other than Linux.
  * Keep only one hashtable implementation, common/uthash.h
  * The cache of starpu_mpi_insert_task is fixed and thus now enabled by
    default.
  * Improve starpu_machine_display output.
  * Standardize object names in the performance model API
  * SOCL
    - Virtual SOCL device has been removed
    - Automatic scheduling is still available with command queues not
      assigned to any device
    - Remove modified OpenCL headers. ICD is now the only supported way
      to use SOCL.
    - SOCL test suite is only run when environment variable
      SOCL_OCL_LIB_OPENCL is defined. It should contain the location of
      the libOpenCL.so file of the OCL ICD implementation.
  * Fix main memory leak on multiple unregister/re-register.
  * Improve hwloc detection by configure
  * Cell:
    - It is no longer possible to enable the cell support via the gordon
      driver
    - Data interfaces no longer define functions to copy to and from SPU
      devices
    - Codelets no longer define pointers for Gordon implementations
    - Gordon workers are no longer enabled
    - Gordon performance models are no longer enabled
  * Fix data transfer arrows in paje traces
  * The "heft" scheduler no longer exists. Users should now pick "dmda"
    instead.
  * StarPU can now use poti to generate paje traces.
  * Rename scheduling policy "parallel greedy" to "parallel eager"
  * starpu_scheduler.h is no longer automatically included by starpu.h,
    it has to be manually included when needed
  * New batch files to run StarPU applications with Microsoft Visual C
  * Add examples/release/Makefile to test StarPU examples against an
    installed version of StarPU. That can also be used to test examples
    using a previous API.
  * Tutorial is installed in ${docdir}/tutorial
  * Schedulers eager_central_policy, dm and dmda no longer erroneously
    respect priorities. dmdas has to be used to respect priorities.
  * StarPU-MPI: Fix potential bug for user-defined datatypes. As MPI can
    reorder messages, we need to make sure the sending of the size of the
    data has been completed.
  * Documentation is now generated through doxygen.
  * Modification of perfmodels output format for future improvements.
  * Fix for properly dealing with NAN on Windows systems
  * Function starpu_sched_ctx_create() now takes a variable argument list
    to define the scheduler to be used, and the minimum and maximum
    priority values
  * The functions starpu_sched_set/get_min/max_priority set/get the
    priorities of the current scheduling context, i.e. the one which was
    set by a call to starpu_sched_ctx_set_context(), or the initial
    context if the function was not called yet.

Small changes:
  * STARPU_NCPU should now be used instead of STARPU_NCPUS. STARPU_NCPUS
    is still available for compatibility reasons.
  * include/starpu.h includes all include/starpu_*.h files, applications
    therefore only need to have #include <starpu.h>
  * Active task wait is now included in blocked time.
  * Fix GCC plugin linking issues starting with GCC 4.7.
  * Fix forcing calibration of never-calibrated archs.
  * CUDA applications are no longer compiled with the "-arch sm_13"
    option. It is specifically added to applications which need it.
  * Explicitly name the non-sleeping-non-running time "Overhead", and use
    another color in vite traces.
  * Use C99 variadic macro support, not GNU.
  * Fix performance regression: dmda queues were inadvertently made LIFOs
    in r9611.
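
For readers unfamiliar with the distinction behind the "C99 variadic macro support, not GNU" item, the following is a minimal sketch of the C99 form. The macro SKETCH_FORMAT and the function sketch_describe are hypothetical names made up for this illustration; they are not StarPU macros or functions.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical macro, not part of StarPU.
 * C99 form: the variable part is written "..." in the parameter list and
 * expanded with __VA_ARGS__. The older GNU-only form names the variable
 * part instead, e.g. "#define M(buf, size, args...)", and expands "args". */
#define SKETCH_FORMAT(buf, size, ...) snprintf((buf), (size), __VA_ARGS__)

/* Formats a worker id and a task count into buf.
 * Returns the number of characters snprintf would have written. */
static int sketch_describe(char *buf, size_t size, int worker, int ntasks)
{
    assert(buf != NULL && size > 0);
    return SKETCH_FORMAT(buf, size, "worker %d ran %d tasks", worker, ntasks);
}
```

Because the format string is itself part of the variable arguments here, the macro is always invoked with at least one variadic argument, which keeps it within what C99 allows.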
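
The STARPU_MALLOC_COUNT item under new features describes allocations that must stay within a configured limit and fail with -ENOMEM otherwise. The sketch below only models that accounting idea in plain C. The struct sketch_budget and the functions sketch_malloc_counted and sketch_free_counted are hypothetical and do not reflect how StarPU implements starpu_malloc_flags; in particular, the memory reclaim step mentioned in the notes is left out.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical bookkeeping for one memory node; not a StarPU structure. */
struct sketch_budget {
    size_t limit; /* bytes allowed, e.g. derived from a STARPU_LIMIT_xxx value */
    size_t used;  /* bytes currently accounted for (always <= limit) */
};

/* Allocate size bytes if the budget allows it.
 * Returns 0 and stores the pointer in *out on success, -ENOMEM otherwise. */
static int sketch_malloc_counted(struct sketch_budget *b, void **out, size_t size)
{
    assert(b != NULL && out != NULL);
    if (size > b->limit - b->used)
        return -ENOMEM; /* a real runtime could try to reclaim memory first */
    void *p = malloc(size);
    if (p == NULL)
        return -ENOMEM;
    b->used += size;
    *out = p;
    return 0;
}

/* Release a counted allocation and give its bytes back to the budget. */
static void sketch_free_counted(struct sketch_budget *b, void *p, size_t size)
{
    assert(b != NULL && size <= b->used);
    free(p);
    b->used -= size;
}
```

With a 1024-byte budget, a 1000-byte request succeeds, a following 100-byte request is refused with -ENOMEM, and the same request succeeds once the first buffer has been released.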