Project Filelist for StarPU
File Release Notes and Changelog
Release Name: starpu-1.2.0
StarPU 1.2.0 (svn revision 18521)
==============================================

New features:
  * MIC Xeon Phi support
  * SCC support
  * New function starpu_sched_ctx_exec_parallel_code to execute parallel code on the workers of the given scheduler context
  * MPI:
    - New internal communication system: a unique tag is now used for all communications, and each node keeps a system of hashmaps which stores pending receives. Every message is now coupled with an envelope, sent before the corresponding data, which allows the receiver to allocate data correctly and to submit the receive matching the envelope.
    - New function starpu_mpi_irecv_detached_sequential_consistency which allows enabling or disabling sequential consistency for the given data handle (sequential consistency is enabled or disabled based on the value of the function parameter and the value of the sequential consistency defined for the given data)
    - New functions starpu_mpi_task_build() and starpu_mpi_task_post_build()
    - New flag STARPU_NODE_SELECTION_POLICY to specify a policy for selecting a node to execute the codelet when several nodes own data in W mode.
    - New node selection policies can be registered and unregistered with the functions starpu_mpi_node_selection_register_policy() and starpu_mpi_node_selection_unregister_policy()
    - New environment variable STARPU_MPI_COMM which enables basic tracing of communications.
    - New function starpu_mpi_init_comm() which allows specifying an MPI communicator.
  * New STARPU_COMMUTE flag which can be passed along with STARPU_W or STARPU_RW to let StarPU commute write accesses.
  * Out-of-core support, through registration of disk areas as additional memory nodes. It can be enabled programmatically or through the STARPU_DISK_SWAP* environment variables.
  * Reclaiming is now periodically done before memory becomes full. This can be controlled through the STARPU_*_AVAILABLE_MEM environment variables.
  * New hierarchical schedulers which allow users to easily build their own scheduler, either by coding each "box" themselves or by combining existing StarPU boxes. Hierarchical schedulers have very interesting scalability properties.
  * Add STARPU_CUDA_ASYNC and STARPU_OPENCL_ASYNC flags to allow asynchronous CUDA and OpenCL kernel execution.
  * Add STARPU_CUDA_PIPELINE and STARPU_OPENCL_PIPELINE to specify how many asynchronous tasks are submitted in advance on CUDA and OpenCL devices. Setting the value to 0 forces a synchronous execution of all tasks.
  * Add CUDA concurrent kernel execution support through the STARPU_NWORKER_PER_CUDA environment variable.
  * Add CUDA and OpenCL kernel submission pipelining, to overlap costs and allow concurrent kernel execution on Fermi cards.
  * New locality work stealing scheduler (lws).
  * Add STARPU_VARIABLE_NBUFFERS to be set in cl.nbuffers, and add nbuffers and modes fields to the task structure, which make it possible to define codelets taking a variable number of data.
  * Add support for implementing OpenMP runtimes on top of StarPU
  * New performance model format to better represent parallel tasks. Used to provide estimations for the execution times of the parallel tasks on scheduling contexts or combined workers.
  * starpu_data_idle_prefetch_on_node and starpu_idle_prefetch_task_input_on_node allow queuing prefetches to be done only when the bus is idle.
  * Make starpu_data_prefetch_on_node not forcibly flush data out, introduce starpu_data_fetch_on_node for that.
  * Add data access arbiters, to improve parallelism of concurrent data accesses, notably with STARPU_COMMUTE.
  * Anticipative writeback, to flush dirty data asynchronously before the GPU device is full. Disabled by default. Use STARPU_MINIMUM_CLEAN_BUFFERS and STARPU_TARGET_CLEAN_BUFFERS to enable it.
  * Add starpu_data_wont_use to advise that a piece of data will not be used in the near future.
  * Enable anticipative writeback by default.
  * New scheduler 'dmdasd' that considers priority when deciding on which worker to schedule
  * Add the capability to define specific MPI datatypes for StarPU user-defined interfaces.
  * Add tasks.rec trace output to make scheduling analysis easier.
  * Add Fortran 90 module and example using it
  * New StarPU-MPI gdb debug functions
  * Generate animated html trace of modular schedulers.
  * Add asynchronous partition planning. It only supports coherency through the main RAM for now.
  * Add asynchronous partition planning. It only supports coherency through the home node of data for now.
  * Add STARPU_MALLOC_SIMULATION_FOLDED flag to save memory when simulating.
  * Include application threads in the trace.
  * Add starpu_task_get_task_scheduled_succs to get successors of a task.
  * Add graph inspection facility for schedulers.
  * New STARPU_LOCALITY flag to mark data which should be taken into account by schedulers for improving locality.
  * Experimental support for data locality in ws and lws.
  * Add a preliminary framework for native Fortran support for StarPU

Small features:
  * Tasks can now have a name (via the field const char *name of struct starpu_task)
  * New functions starpu_data_acquire_cb_sequential_consistency() and starpu_data_acquire_on_node_cb_sequential_consistency() which allow enabling or disabling sequential consistency
  * New configure option --enable-fxt-lock which enables additional trace events focused on lock behaviour during the execution
  * Functions starpu_insert_task and starpu_mpi_insert_task are renamed to starpu_task_insert and starpu_mpi_task_insert. Old names are kept to avoid breaking existing code.
  * New configure option --enable-calibration-heuristic which allows the user to set the maximum authorized deviation of the history-based calibrator.
  * Allow the application to provide the task footprint itself.
  * New function starpu_sched_ctx_display_workers() to display information about the workers belonging to a given scheduler context
  * The option --enable-verbose can be called with --enable-verbose=extra to increase the verbosity
  * Add codelet size, footprint and tag id in the paje trace.
  * Add STARPU_TAG_ONLY, to specify a tag for traces without making StarPU manage the tag.
  * On Linux x86, spinlocks now block after a hundred tries. This avoids typical 10ms pauses when the application thread tries to submit tasks.
  * New function char *starpu_worker_get_type_as_string(enum starpu_worker_archtype type)
  * Improve static scheduling by adding support for specifying the task execution order.
  * Add starpu_worker_can_execute_task_impl and starpu_worker_can_execute_task_first_impl to optimize getting the working implementations
  * Add STARPU_MALLOC_NORECLAIM flag to allocate without running a reclaim if the node is out of memory.
  * New flag STARPU_DATA_MODE_ARRAY for the starpu_task_insert function family, to define an array of data handles along with their access modes.
  * New configure option --enable-new-check to enable new testcases which are known to fail
  * Add starpu_memory_allocate and starpu_memory_deallocate to let the application declare its own allocations to the reclaiming engine.
  * Add STARPU_SIMGRID_CUDA_MALLOC_COST and STARPU_SIMGRID_CUDA_QUEUE_COST to disable CUDA costs simulation in simgrid mode.
  * Add starpu_task_get_task_succs to get the list of children of a given task.
  * Add starpu_malloc_on_node_flags, starpu_free_on_node_flags, and starpu_malloc_on_node_set_default_flags to control the allocation flags used for allocations done by StarPU.
  * Ranges can be provided in STARPU_WORKERS_CPUID
  * Add starpu_fxt_autostart_profiling to be able to avoid autostart.
  * Add arch_cost_function perfmodel function field.
  * Add STARPU_TASK_BREAK_ON_SCHED, STARPU_TASK_BREAK_ON_PUSH, and STARPU_TASK_BREAK_ON_POP environment variables to debug schedulers.
  * Add starpu_sched_display tool.
  * Add starpu_memory_pin and starpu_memory_unpin to pin memory allocated by means other than starpu_malloc.
  * Add STARPU_NOWHERE to create synchronization tasks with data.
  * Document how to switch between different views of the same data.
  * Add STARPU_NAME to specify a task name from a starpu_task_insert call.
  * Add configure option --disable-fortran to disable Fortran
  * Add configure option --with-smpirun to give the path of the smpirun executable
  * Add configure option --disable-build-tests to disable the build of tests
  * Add starpu-all-tasks debugging support
  * New function void starpu_opencl_load_program_source_malloc(const char *source_file_name, char **located_file_name, char **located_dir_name, char **opencl_program_source) which allocates the pointers located_file_name, located_dir_name and opencl_program_source.
  * Add submit_hook and do_schedule scheduler methods.
  * Add starpu_sleep.
  * Add starpu_task_list_ismember.
  * Add _starpu_fifo_pop_this_task.
  * Add STARPU_MAX_MEMORY_USE environment variable.
  * Add starpu_worker_get_id_check().
  * New function starpu_mpi_wait_for_all(MPI_Comm comm) which waits until all StarPU tasks and communications for the given communicator are completed.
  * New function starpu_codelet_unpack_args_and_copyleft() which copies into a new buffer the values that have not been unpacked by the current call
  * Add STARPU_CODELET_SIMGRID_EXECUTE flag.
  * Add STARPU_CL_ARGS flag to starpu_task_insert() and starpu_mpi_task_insert() function calls

Changes:
  * Data interfaces (variable, vector, matrix and block) now define pack and unpack functions
  * StarPU-MPI: Fix for being able to receive data which have not yet been registered by the application (i.e. it did not call starpu_data_set_tag(); data are received as raw memory)
  * StarPU-MPI: Fix for being able to receive data with the same tag from several nodes (see mpi/tests/gather.c)
  * Remove the long-deprecated cost_model fields and task->buffers field.
  * Fix complexity of implicit task/data dependencies, from quadratic to linear.

Small changes:
  * Rename function starpu_trace_user_event() to starpu_fxt_trace_user_event()
  * "power" is renamed to "energy" wherever it applies, notably in energy consumption performance models
  * Update starpu_task_build() to set starpu_task::cl_arg_free to 1 if some arguments of type STARPU_VALUE are given.
  * Simplify performance model loading API
  * Better semantics for environment variables STARPU_NMIC and STARPU_NMICDEVS, the number of devices and the number of cores. STARPU_NMIC will be the number of devices, and STARPU_NMICCORES will be the number of cores per device.
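
Illustration for the "Ranges can be provided in STARPU_WORKERS_CPUID" entry: a minimal sketch of how such a value could be expanded into individual CPU ids. This is not StarPU's parser. The function name parse_cpuid_list is hypothetical, and the syntax (ids separated by whitespace, "a-b" meaning the inclusive range a..b) is an assumption made for the example.

```c
#include <ctype.h>
#include <limits.h>
#include <stdlib.h>

/* Hypothetical helper, not part of StarPU: expand a list such as
 * "0-2 5" into individual CPU ids.  Assumed syntax: ids separated by
 * whitespace, "a-b" meaning the inclusive range a..b.
 * Returns the number of ids written to out, or -1 if the list is
 * malformed or would produce more than max ids. */
int parse_cpuid_list(const char *s, int *out, int max)
{
    int n = 0;

    for (;;) {
        char *end;
        long lo, hi, id;

        /* Skip separators; an exhausted string ends the list. */
        while (isspace((unsigned char)*s))
            s++;
        if (*s == '\0')
            return n;
        if (!isdigit((unsigned char)*s))
            return -1;

        /* A lone id is treated as the range id..id. */
        lo = hi = strtol(s, &end, 10);
        s = end;
        if (*s == '-') {
            s++;
            if (!isdigit((unsigned char)*s))
                return -1;
            hi = strtol(s, &end, 10);
            s = end;
        }
        if (hi < lo || hi >= INT_MAX)
            return -1;
        /* An item must be followed by a separator or the end. */
        if (*s != '\0' && !isspace((unsigned char)*s))
            return -1;

        for (id = lo; id <= hi; id++) {
            if (n >= max)
                return -1;
            out[n++] = (int)id;
        }
    }
}
```

An application could pass the result of getenv("STARPU_WORKERS_CPUID") to such a helper after checking it for NULL. Rejecting malformed input outright, rather than skipping it, is a choice made for this sketch only.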
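
Illustration for the MPI entry on envelopes and pending receives: a minimal single-process sketch of matching an incoming envelope against a hash table of posted receives. It makes no MPI calls and does not reflect StarPU-MPI's actual data structures. All names (struct envelope, struct recv_table, post_recv, match_envelope) and the fixed bucket count are hypothetical.

```c
#include <stddef.h>
#include <stdlib.h>

#define NBUCKETS 64

/* Hypothetical envelope: announces a payload before it is sent. */
struct envelope {
    long data_tag;  /* application-level tag of the data */
    size_t size;    /* payload size in bytes */
};

/* A receive posted by the application and not yet matched. */
struct pending_recv {
    long data_tag;
    void *buffer;
    struct pending_recv *next;
};

/* Chained hash table of pending receives, keyed by data tag. */
struct recv_table {
    struct pending_recv *bucket[NBUCKETS];
};

static unsigned bucket_of(long tag)
{
    return (unsigned)((unsigned long)tag % NBUCKETS);
}

/* Record a posted receive at the tail of its bucket, so receives
 * with the same tag are matched in posting order.
 * Returns 0 on success, -1 on allocation failure. */
int post_recv(struct recv_table *t, long tag, void *buffer)
{
    struct pending_recv **p = &t->bucket[bucket_of(tag)];
    struct pending_recv *r = malloc(sizeof *r);

    if (r == NULL)
        return -1;
    r->data_tag = tag;
    r->buffer = buffer;
    r->next = NULL;
    while (*p != NULL)
        p = &(*p)->next;
    *p = r;
    return 0;
}

/* Handle an incoming envelope.  If a receive with the same tag is
 * pending, remove it and return its buffer (*early = 0).  Otherwise
 * allocate env->size bytes to hold the early data (*early = 1); the
 * caller owns that allocation. */
void *match_envelope(struct recv_table *t, const struct envelope *env,
                     int *early)
{
    struct pending_recv **p = &t->bucket[bucket_of(env->data_tag)];

    for (; *p != NULL; p = &(*p)->next) {
        if ((*p)->data_tag == env->data_tag) {
            struct pending_recv *r = *p;
            void *buf = r->buffer;

            *p = r->next;
            free(r);
            *early = 0;
            return buf;
        }
    }
    *early = 1;
    return malloc(env->size);
}
```

Because the envelope carries the payload size, the receiver in this sketch can allocate a buffer of the right size even when the application has not yet posted the matching receive.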