What's New
Major Updates in Slurm Version 14.11
Slurm Version 14.11 was released in November 2014. Major enhancements include:
- Added job array data structure and removed the array size restriction.
- Performance of job array operations and database interactions is dramatically improved.
- Added support for reserving CPUs and/or memory on a compute node for system use. This can largely eliminate system noise from applications.
- Added support for allocation of generic resources by model type for heterogeneous systems (e.g. request a Kepler GPU, a Tesla GPU, or a GPU of any type).
- Added support for non-consumable generic resources that are shared, but limited in number.
- Added support for automatic job requeue policy based on exit value.
- Added user options to set the CPU governor (OnDemand, Performance, PowerSave or UserSpace) in addition to the existing ability to explicitly set the CPU frequency.
- Added support for an advanced reservation start time that remains constant relative to the current time.
- Added reporting of Slurm message traffic by user, type, count and time consumed.
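Several of the 14.11 features above are exercised through configuration parameters and submission options. A minimal sketch follows, using parameter names from the slurm.conf and sbatch documentation; node names, GRES counts and exit codes are illustrative, not recommendations:

```
# slurm.conf fragment (illustrative values)
# Reserve cores and memory (in MB) on each node for system use,
# reducing OS noise seen by applications:
NodeName=tux[01-32] CoreSpecCount=2 MemSpecLimit=2048 Gres=gpu:kepler:2

# Automatically requeue any job that exits with this code:
RequeueExit=142
```

```
# Submission examples (illustrative)
sbatch --array=0-99 job.sh            # job array of 100 tasks
sbatch --gres=gpu:kepler:1 job.sh     # request a GPU by model type
sbatch --cpu-freq=Performance job.sh  # select the Performance CPU governor
```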
Major Updates Planned for Slurm Version 15.08
Slurm Version 15.08 is scheduled for release in August 2015. Planned major enhancements include:
- Convert charging from being based upon CPU time allocated to a more general system billing unit, which can be computed as a function of many different resources (e.g. CPU, memory, power, GPUs, etc.). A job's consumption of all these resources will be logged in Slurm's database.
- Add the ability for a compute node to be allocated to multiple jobs, but restricted to a single user.
- Add support for cluster-wide power capping.
- Allow a partition to have an associated Quality Of Service (QOS), giving the partition all of the limits available to a QOS.
- Add support for QOS-based job preemption to be used with job suspend/resume mechanism.
- Add support for burst buffers: data storage available before, during and/or after job computation in support of data staging, checkpointing, etc. Plugins provided for Cray DataWarp and a generic script-based interface.
- Add support for optimized job allocations with respect to SGI Hypercube topology.
- Add support for Remote CUDA (rCUDA).
- Add support for PMI Exascale (PMIx) for improved MPI scalability.
- Add support for asymmetric resource allocation and MPMD programming. Multiple resource allocation specifications (memory, CPUs, GPUs, etc.) will be supported in a single job allocation.
- Add support for communication gateway nodes to improve scalability.
- Add layouts framework, which will be the basis for further developments toward optimizing scheduling with respect to additional parameters such as temperature and power consumption.
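Some of the planned 15.08 capabilities correspond to configuration parameters that appeared in the slurm.conf documentation for that release; a hedged sketch of how they might be used (names and weight values are illustrative):

```
# slurm.conf fragment (illustrative)
# Attach a QOS to a partition so the partition inherits QOS limits,
# and bill jobs by a weighted combination of trackable resources
# rather than CPU time alone:
PartitionName=batch Nodes=tux[01-32] QOS=batch_qos \
    TRESBillingWeights="CPU=1.0,Mem=0.25G,GRES/gpu=2.0"

# Select a burst buffer plugin (Cray DataWarp shown):
BurstBufferType=burst_buffer/cray
```

With a DataWarp burst buffer configured, a job script can then request staging space via `#DW` directives (e.g. `#DW jobdw type=scratch capacity=100GiB access_mode=striped`), which Slurm provisions before the job starts.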
Major Updates in Slurm Version 16.05 and beyond
Detailed plans for release dates and contents of additional Slurm releases have not been finalized. Anyone desiring to perform Slurm development should notify slurm-dev@schedmd.com to coordinate activities. Future development plans include:
- Improved support for GPU affinity with respect to CPUs and network resources.
- Integration with FLEXlm (Flexnet Publisher) license management.
- Distributed architecture to support the management of resources with Intel MIC processors.
- IP communications over InfiniBand network for improved performance.
- Fault tolerance and dynamic job adaptation through a communication protocol between Slurm, MPI libraries and the application.
- Improved support for high-throughput computing (e.g. multiple slurmctld daemons on a single cluster).
- Add Kerberos credential support including credential forwarding and refresh.
- Improved support for provisioning and virtualization.
- Provide a web-based Slurm administration tool.
Last modified 31 March 2015