Out-of-core computation with dask
xarray integrates with dask to support streaming computation on datasets that don’t fit into memory.
Currently, dask is an entirely optional feature for xarray. However, the benefits of using dask are sufficiently strong that dask may become a required dependency in a future version of xarray.
For a full example of how to use xarray’s dask integration, read the blog post introducing xarray and dask.
What is a dask array?

Dask divides arrays into many small pieces, called chunks, each of which is presumed to be small enough to fit into memory.
Unlike NumPy, which has eager evaluation, operations on dask arrays are lazy. Operations queue up a series of tasks mapped over blocks, and no computation is performed until you actually ask values to be computed (e.g., to print results to your screen or write to disk). At that point, data is loaded into memory and computation proceeds in a streaming fashion, block-by-block.
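As a brief sketch of this lazy, block-wise model using dask directly (the array shape and chunk size below are arbitrary illustrative choices, not values from this page):

import dask.array as da

# 100 blocks of shape (1000, 1000); no memory is allocated yet
x = da.ones((10000, 10000), chunks=(1000, 1000))

# Queues one task per block; still lazy, nothing is computed
y = (x + x.T).mean()

# Asking for the value triggers block-by-block, streaming computation
print(y.compute())  # -> 2.0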
The actual computation is controlled by a multiprocessing pool or a thread pool, which allows dask to take full advantage of the multiple processors available on most modern computers.
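As an illustrative sketch of choosing the pool, using dask's configuration API from more recent releases than the version this page documents (older releases used a different options mechanism):

import dask
import dask.array as da

x = da.random.random((4000, 4000), chunks=(1000, 1000))

# The threaded scheduler is the default for dask arrays; it can be set globally
dask.config.set(scheduler="threads")
print(x.mean().compute())

# Or a multiprocessing pool can be selected for a single computation
print(x.mean().compute(scheduler="processes"))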
For more details on dask, read its documentation.
Reading and writing data
The usual way to create a dataset filled with dask arrays is to load the data from a netCDF file or files. You can do this by supplying a chunks argument to open_dataset() or by using the open_mfdataset() function.
In [1]: ds = xr.open_dataset('example-data.nc', chunks={'time': 10})
In [2]: ds
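The example above passes chunks to open_dataset(); a comparable sketch with open_mfdataset() might look like the following (the glob pattern and the 'time' dimension are hypothetical assumptions, not taken from this page):

import xarray as xr

# Combine many files into a single dataset backed by dask arrays;
# 'example-data-*.nc' is a hypothetical glob pattern
ds = xr.open_mfdataset('example-data-*.nc', chunks={'time': 10})

# Operations remain lazy until values are requested
mean = ds.mean(dim='time')   # builds a task graph; nothing is loaded yet
mean.load()                  # computes and loads the result into memory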