Interpolating data¶
Xarray offers flexible interpolation routines, which have a similar interface to our indexing.
Note
interp
requires scipy installed.
Scalar and 1-dimensional interpolation¶
Interpolating a DataArray
works mostly like labeled
indexing of a DataArray
,
In [1]: da = xr.DataArray(
...: np.sin(0.3 * np.arange(12).reshape(4, 3)),
...: [("time", np.arange(4)), ("space", [0.1, 0.2, 0.3])],
...: )
...:
# label lookup
In [2]: da.sel(time=3)
Out[2]:
<xarray.DataArray (space: 3)>
array([ 0.427, 0.141, -0.158])
Coordinates:
time int64 3
* space (space) float64 0.1 0.2 0.3
# interpolation
In [3]: da.interp(time=2.5)
Out[3]:
<xarray.DataArray (space: 3)>
array([0.701, 0.502, 0.259])
Coordinates:
* space (space) float64 0.1 0.2 0.3
time float64 2.5
Similar to the indexing, interp()
also accepts an
array-like, which gives the interpolated result as an array.
# label lookup
In [4]: da.sel(time=[2, 3])
Out[4]:
<xarray.DataArray (time: 2, space: 3)>
array([[ 0.974, 0.863, 0.675],
[ 0.427, 0.141, -0.158]])
Coordinates:
* time (time) int64 2 3
* space (space) float64 0.1 0.2 0.3
# interpolation
In [5]: da.interp(time=[2.5, 3.5])
Out[5]:
<xarray.DataArray (time: 2, space: 3)>
array([[0.701, 0.502, 0.259],
[ nan, nan, nan]])
Coordinates:
* space (space) float64 0.1 0.2 0.3
* time (time) float64 2.5 3.5
To interpolate data with a numpy.datetime64 coordinate you can pass a string.
In [6]: da_dt64 = xr.DataArray(
...: [1, 3], [("time", pd.date_range("1/1/2000", "1/3/2000", periods=2))]
...: )
...:
In [7]: da_dt64.interp(time="2000-01-02")
Out[7]:
<xarray.DataArray ()>
array(2.)
Coordinates:
time datetime64[ns] 2000-01-02
The interpolated data can be merged into the original DataArray
by specifying the time periods required.
In [8]: da_dt64.interp(time=pd.date_range("1/1/2000", "1/3/2000", periods=3))
Out[8]:
<xarray.DataArray (time: 3)>
array([1., 2., 3.])
Coordinates:
* time (time) datetime64[ns] 2000-01-01 2000-01-02 2000-01-03
Interpolation of data indexed by a CFTimeIndex
is also
allowed. See Non-standard calendars and dates outside the Timestamp-valid range for examples.
Note
Currently, our interpolation only works for regular grids.
Therefore, similarly to sel()
,
only 1D coordinates along a dimension can be used as the
original coordinate to be interpolated.
Multi-dimensional Interpolation¶
Like sel()
, interp()
accepts multiple coordinates. In this case, multidimensional interpolation
is carried out.
# label lookup
In [9]: da.sel(time=2, space=0.1)
Out[9]:
<xarray.DataArray ()>
array(0.974)
Coordinates:
time int64 2
space float64 0.1
# interpolation
In [10]: da.interp(time=2.5, space=0.15)
Out[10]:
<xarray.DataArray ()>
array(0.601)
Coordinates:
time float64 2.5
space float64 0.15
Array-like coordinates are also accepted:
# label lookup
In [11]: da.sel(time=[2, 3], space=[0.1, 0.2])
Out[11]:
<xarray.DataArray (time: 2, space: 2)>
array([[0.974, 0.863],
[0.427, 0.141]])
Coordinates:
* time (time) int64 2 3
* space (space) float64 0.1 0.2
# interpolation
In [12]: da.interp(time=[1.5, 2.5], space=[0.15, 0.25])
Out[12]:
<xarray.DataArray (time: 2, space: 2)>
array([[0.888, 0.867],
[0.601, 0.381]])
Coordinates:
* time (time) float64 1.5 2.5
* space (space) float64 0.15 0.25
interp_like()
method is a useful shortcut. This
method interpolates an xarray object onto the coordinates of another xarray
object. For example, if we want to compute the difference between
two DataArray
s (da
and other
) staying on slightly
different coordinates,
In [13]: other = xr.DataArray(
....: np.sin(0.4 * np.arange(9).reshape(3, 3)),
....: [("time", [0.9, 1.9, 2.9]), ("space", [0.15, 0.25, 0.35])],
....: )
....:
it might be a good idea to first interpolate da
so that it will stay on the
same coordinates of other
, and then subtract it.
interp_like()
can be used for such a case,
# interpolate da along other's coordinates
In [14]: interpolated = da.interp_like(other)
In [15]: interpolated
Out[15]:
<xarray.DataArray (time: 3, space: 3)>
array([[0.787, 0.911, nan],
[0.912, 0.789, nan],
[0.348, 0.069, nan]])
Coordinates:
* time (time) float64 0.9 1.9 2.9
* space (space) float64 0.15 0.25 0.35
It is now possible to safely compute the difference other - interpolated
.
Interpolation methods¶
We use scipy.interpolate.interp1d
for 1-dimensional interpolation and
scipy.interpolate.interpn()
for multi-dimensional interpolation.
The interpolation method can be specified by the optional method
argument.
In [16]: da = xr.DataArray(
....: np.sin(np.linspace(0, 2 * np.pi, 10)),
....: dims="x",
....: coords={"x": np.linspace(0, 1, 10)},
....: )
....:
In [17]: da.plot.line("o", label="original")
Out[17]: [<matplotlib.lines.Line2D at 0x7fa84acb79a0>]
In [18]: da.interp(x=np.linspace(0, 1, 100)).plot.line(label="linear (default)")
Out[18]: [<matplotlib.lines.Line2D at 0x7fa84acb7d60>]
In [19]: da.interp(x=np.linspace(0, 1, 100), method="cubic").plot.line(label="cubic")
Out[19]: [<matplotlib.lines.Line2D at 0x7fa84acb7ee0>]
In [20]: plt.legend()
Out[20]: <matplotlib.legend.Legend at 0x7fa84ae9b3a0>

Additional keyword arguments can be passed to scipy’s functions.
# fill 0 for the outside of the original coordinates.
In [21]: da.interp(x=np.linspace(-0.5, 1.5, 10), kwargs={"fill_value": 0.0})
Out[21]:
<xarray.DataArray (x: 10)>
array([ 0. , 0. , 0. , 0.814, 0.604, -0.604, -0.814, 0. , 0. , 0. ])
Coordinates:
* x (x) float64 -0.5 -0.2778 -0.05556 0.1667 ... 0.8333 1.056 1.278 1.5
# 1-dimensional extrapolation
In [22]: da.interp(x=np.linspace(-0.5, 1.5, 10), kwargs={"fill_value": "extrapolate"})
Out[22]:
<xarray.DataArray (x: 10)>
array([-2.893, -1.607, -0.321, 0.814, 0.604, -0.604, -0.814, 0.321, 1.607, 2.893])
Coordinates:
* x (x) float64 -0.5 -0.2778 -0.05556 0.1667 ... 0.8333 1.056 1.278 1.5
# multi-dimensional extrapolation
In [23]: da = xr.DataArray(
....: np.sin(0.3 * np.arange(12).reshape(4, 3)),
....: [("time", np.arange(4)), ("space", [0.1, 0.2, 0.3])],
....: )
....:
In [24]: da.interp(time=4, space=np.linspace(-0.1, 0.5, 10), kwargs={"fill_value": None})
Out[24]:
<xarray.DataArray (space: 10)>
array([nan, nan, nan, nan, nan, nan, nan, nan, nan, nan])
Coordinates:
time int64 4
* space (space) float64 -0.1 -0.03333 0.03333 0.1 ... 0.3 0.3667 0.4333 0.5
Advanced Interpolation¶
interp()
accepts DataArray
as similar to sel()
, which enables us more advanced interpolation.
Based on the dimension of the new coordinate passed to interp()
, the dimension of the result are determined.
For example, if you want to interpolate a two dimensional array along a particular dimension, as illustrated below,
you can pass two 1-dimensional DataArray
s with
a common dimension as new coordinate.
For example:
In [25]: da = xr.DataArray(
....: np.sin(0.3 * np.arange(20).reshape(5, 4)),
....: [("x", np.arange(5)), ("y", [0.1, 0.2, 0.3, 0.4])],
....: )
....:
# advanced indexing
In [26]: x = xr.DataArray([0, 2, 4], dims="z")
In [27]: y = xr.DataArray([0.1, 0.2, 0.3], dims="z")
In [28]: da.sel(x=x, y=y)
Out[28]:
<xarray.DataArray (z: 3)>
array([ 0. , 0.427, -0.773])
Coordinates:
x (z) int64 0 2 4
y (z) float64 0.1 0.2 0.3
Dimensions without coordinates: z
# advanced interpolation
In [29]: x = xr.DataArray([0.5, 1.5, 2.5], dims="z")
In [30]: y = xr.DataArray([0.15, 0.25, 0.35], dims="z")
In [31]: da.interp(x=x, y=y)
Out[31]:
<xarray.DataArray (z: 3)>
array([ 0.556, 0.635, -0.466])
Coordinates:
x (z) float64 0.5 1.5 2.5
y (z) float64 0.15 0.25 0.35
Dimensions without coordinates: z
where values on the original coordinates
(x, y) = ((0.5, 0.15), (1.5, 0.25), (2.5, 0.35))
are obtained by the
2-dimensional interpolation and mapped along a new dimension z
.
If you want to add a coordinate to the new dimension z
, you can supply
DataArray
s with a coordinate,
In [32]: x = xr.DataArray([0.5, 1.5, 2.5], dims="z", coords={"z": ["a", "b", "c"]})
In [33]: y = xr.DataArray([0.15, 0.25, 0.35], dims="z", coords={"z": ["a", "b", "c"]})
In [34]: da.interp(x=x, y=y)
Out[34]:
<xarray.DataArray (z: 3)>
array([ 0.556, 0.635, -0.466])
Coordinates:
x (z) float64 0.5 1.5 2.5
y (z) float64 0.15 0.25 0.35
* z (z) <U1 'a' 'b' 'c'
For the details of the advanced indexing, see more advanced indexing.
Interpolating arrays with NaN¶
Our interp()
works with arrays with NaN
the same way that
scipy.interpolate.interp1d and
scipy.interpolate.interpn do.
linear
and nearest
methods return arrays including NaN,
while other methods such as cubic
or quadratic
return all NaN arrays.
In [35]: da = xr.DataArray([0, 2, np.nan, 3, 3.25], dims="x", coords={"x": range(5)})
In [36]: da.interp(x=[0.5, 1.5, 2.5])
Out[36]:
<xarray.DataArray (x: 3)>
array([ 1., nan, nan])
Coordinates:
* x (x) float64 0.5 1.5 2.5
In [37]: da.interp(x=[0.5, 1.5, 2.5], method="cubic")
Out[37]:
<xarray.DataArray (x: 3)>
array([nan, nan, nan])
Coordinates:
* x (x) float64 0.5 1.5 2.5
To avoid this, you can drop NaN by dropna()
, and
then make the interpolation
In [38]: dropped = da.dropna("x")
In [39]: dropped
Out[39]:
<xarray.DataArray (x: 4)>
array([0. , 2. , 3. , 3.25])
Coordinates:
* x (x) int64 0 1 3 4
In [40]: dropped.interp(x=[0.5, 1.5, 2.5], method="cubic")
Out[40]:
<xarray.DataArray (x: 3)>
array([1.19 , 2.508, 2.93 ])
Coordinates:
* x (x) float64 0.5 1.5 2.5
If NaNs are distributed randomly in your multidimensional array,
dropping all the columns containing more than one NaNs by
dropna()
may lose a significant amount of information.
In such a case, you can fill NaN by interpolate_na()
,
which is similar to pandas.Series.interpolate()
.
In [41]: filled = da.interpolate_na(dim="x")
In [42]: filled
Out[42]:
<xarray.DataArray (x: 5)>
array([0. , 2. , 2.5 , 3. , 3.25])
Coordinates:
* x (x) int64 0 1 2 3 4
This fills NaN by interpolating along the specified dimension. After filling NaNs, you can interpolate:
In [43]: filled.interp(x=[0.5, 1.5, 2.5], method="cubic")
Out[43]:
<xarray.DataArray (x: 3)>
array([1.309, 2.316, 2.738])
Coordinates:
* x (x) float64 0.5 1.5 2.5
For the details of interpolate_na()
,
see Missing values.
Example¶
Let’s see how interp()
works on real data.
# Raw data
In [44]: ds = xr.tutorial.open_dataset("air_temperature").isel(time=0)
---------------------------------------------------------------------------
ModuleNotFoundError Traceback (most recent call last)
/build/python-xarray-iHArcV/python-xarray-0.21.0/xarray/tutorial.py in open_dataset(name, cache, cache_dir, engine, **kws)
113 try:
--> 114 import pooch
115 except ImportError as e:
ModuleNotFoundError: No module named 'pooch'
The above exception was the direct cause of the following exception:
ImportError Traceback (most recent call last)
<ipython-input-44-f62677abb3ca> in <module>
----> 1 ds = xr.tutorial.open_dataset("air_temperature").isel(time=0)
/build/python-xarray-iHArcV/python-xarray-0.21.0/xarray/tutorial.py in open_dataset(name, cache, cache_dir, engine, **kws)
114 import pooch
115 except ImportError as e:
--> 116 raise ImportError(
117 "tutorial.open_dataset depends on pooch to download and manage datasets."
118 " To proceed please install pooch."
ImportError: tutorial.open_dataset depends on pooch to download and manage datasets. To proceed please install pooch.
In [45]: fig, axes = plt.subplots(ncols=2, figsize=(10, 4))
In [46]: ds.air.plot(ax=axes[0])
---------------------------------------------------------------------------
AttributeError Traceback (most recent call last)
<ipython-input-46-cb8f083667be> in <module>
----> 1 ds.air.plot(ax=axes[0])
/build/python-xarray-iHArcV/python-xarray-0.21.0/xarray/core/common.py in __getattr__(self, name)
237 with suppress(KeyError):
238 return source[name]
--> 239 raise AttributeError(
240 f"{type(self).__name__!r} object has no attribute {name!r}"
241 )
AttributeError: 'Dataset' object has no attribute 'air'
In [47]: axes[0].set_title("Raw data")
Out[47]: Text(0.5, 1.0, 'Raw data')
# Interpolated data
In [48]: new_lon = np.linspace(ds.lon[0], ds.lon[-1], ds.dims["lon"] * 4)
---------------------------------------------------------------------------
AttributeError Traceback (most recent call last)
<ipython-input-48-6afeae5c8ac9> in <module>
----> 1 new_lon = np.linspace(ds.lon[0], ds.lon[-1], ds.dims["lon"] * 4)
/build/python-xarray-iHArcV/python-xarray-0.21.0/xarray/core/common.py in __getattr__(self, name)
237 with suppress(KeyError):
238 return source[name]
--> 239 raise AttributeError(
240 f"{type(self).__name__!r} object has no attribute {name!r}"
241 )
AttributeError: 'Dataset' object has no attribute 'lon'
In [49]: new_lat = np.linspace(ds.lat[0], ds.lat[-1], ds.dims["lat"] * 4)
---------------------------------------------------------------------------
AttributeError Traceback (most recent call last)
<ipython-input-49-40fe8b954339> in <module>
----> 1 new_lat = np.linspace(ds.lat[0], ds.lat[-1], ds.dims["lat"] * 4)
/build/python-xarray-iHArcV/python-xarray-0.21.0/xarray/core/common.py in __getattr__(self, name)
237 with suppress(KeyError):
238 return source[name]
--> 239 raise AttributeError(
240 f"{type(self).__name__!r} object has no attribute {name!r}"
241 )
AttributeError: 'Dataset' object has no attribute 'lat'
In [50]: dsi = ds.interp(lat=new_lat, lon=new_lon)
---------------------------------------------------------------------------
NameError Traceback (most recent call last)
<ipython-input-50-589e3eedfe36> in <module>
----> 1 dsi = ds.interp(lat=new_lat, lon=new_lon)
NameError: name 'new_lat' is not defined
In [51]: dsi.air.plot(ax=axes[1])
---------------------------------------------------------------------------
NameError Traceback (most recent call last)
<ipython-input-51-433e5c8b0d13> in <module>
----> 1 dsi.air.plot(ax=axes[1])
NameError: name 'dsi' is not defined
In [52]: axes[1].set_title("Interpolated data")
Out[52]: Text(0.5, 1.0, 'Interpolated data')

Our advanced interpolation can be used to remap the data to the new coordinate. Consider the new coordinates x and z on the two dimensional plane. The remapping can be done as follows
# new coordinate
In [53]: x = np.linspace(240, 300, 100)
In [54]: z = np.linspace(20, 70, 100)
# relation between new and original coordinates
In [55]: lat = xr.DataArray(z, dims=["z"], coords={"z": z})
In [56]: lon = xr.DataArray(
....: (x[:, np.newaxis] - 270) / np.cos(z * np.pi / 180) + 270,
....: dims=["x", "z"],
....: coords={"x": x, "z": z},
....: )
....:
In [57]: fig, axes = plt.subplots(ncols=2, figsize=(10, 4))
In [58]: ds.air.plot(ax=axes[0])
---------------------------------------------------------------------------
AttributeError Traceback (most recent call last)
<ipython-input-58-cb8f083667be> in <module>
----> 1 ds.air.plot(ax=axes[0])
/build/python-xarray-iHArcV/python-xarray-0.21.0/xarray/core/common.py in __getattr__(self, name)
237 with suppress(KeyError):
238 return source[name]
--> 239 raise AttributeError(
240 f"{type(self).__name__!r} object has no attribute {name!r}"
241 )
AttributeError: 'Dataset' object has no attribute 'air'
# draw the new coordinate on the original coordinates.
In [59]: for idx in [0, 33, 66, 99]:
....: axes[0].plot(lon.isel(x=idx), lat, "--k")
....:
In [60]: for idx in [0, 33, 66, 99]:
....: axes[0].plot(*xr.broadcast(lon.isel(z=idx), lat.isel(z=idx)), "--k")
....:
In [61]: axes[0].set_title("Raw data")
Out[61]: Text(0.5, 1.0, 'Raw data')
In [62]: dsi = ds.interp(lon=lon, lat=lat)
---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
<ipython-input-62-eb48b9af7203> in <module>
----> 1 dsi = ds.interp(lon=lon, lat=lat)
/build/python-xarray-iHArcV/python-xarray-0.21.0/xarray/core/dataset.py in interp(self, coords, method, assume_sorted, kwargs, method_non_numeric, **coords_kwargs)
3121
3122 coords = either_dict_or_kwargs(coords, coords_kwargs, "interp")
-> 3123 indexers = dict(self._validate_interp_indexers(coords))
3124
3125 if coords:
/build/python-xarray-iHArcV/python-xarray-0.21.0/xarray/core/dataset.py in _validate_interp_indexers(self, indexers)
2239 ) -> Iterator[tuple[Hashable, Variable]]:
2240 """Variant of _validate_indexers to be used for interpolation"""
-> 2241 for k, v in self._validate_indexers(indexers):
2242 if isinstance(v, Variable):
2243 if v.ndim == 1:
/build/python-xarray-iHArcV/python-xarray-0.21.0/xarray/core/dataset.py in _validate_indexers(self, indexers, missing_dims)
2202 from .dataarray import DataArray
2203
-> 2204 indexers = drop_dims_from_indexers(indexers, self.dims, missing_dims)
2205
2206 # all indexers should be int, slice, np.ndarrays, or Variable
/build/python-xarray-iHArcV/python-xarray-0.21.0/xarray/core/utils.py in drop_dims_from_indexers(indexers, dims, missing_dims)
832 invalid = indexers.keys() - set(dims)
833 if invalid:
--> 834 raise ValueError(
835 f"Dimensions {invalid} do not exist. Expected one or more of {dims}"
836 )
ValueError: Dimensions {'lat', 'lon'} do not exist. Expected one or more of Frozen({'x': 3, 'y': 4})
In [63]: dsi.air.plot(ax=axes[1])
---------------------------------------------------------------------------
NameError Traceback (most recent call last)
<ipython-input-63-433e5c8b0d13> in <module>
----> 1 dsi.air.plot(ax=axes[1])
NameError: name 'dsi' is not defined
In [64]: axes[1].set_title("Remapped data")
Out[64]: Text(0.5, 1.0, 'Remapped data')
