skbio.stats.distance.
mantel
(x, y, method='pearson', permutations=999, alternative='two-sided', strict=True, lookup=None)[source]¶Compute correlation between distance matrices using the Mantel test.
State: Experimental as of 0.4.0.
The Mantel test compares two distance matrices by computing the correlation between the distances in the lower (or upper) triangular portions of the symmetric distance matrices. Correlation can be computed using Pearson’s product-moment correlation coefficient or Spearman’s rank correlation coefficient.
As defined in [1], the Mantel test computes a test statistic \(r_M\) given two symmetric distance matrices \(D_X\) and \(D_Y\). \(r_M\) is defined as
where
and \(n\) is the number of rows/columns in each of the distance matrices. \(stand(D_X)\) and \(stand(D_Y)\) are distance matrices with their upper triangles containing standardized distances. Note that since \(D_X\) and \(D_Y\) are symmetric, the lower triangular portions of the matrices could equivalently have been used instead of the upper triangular portions (the current function behaves in this manner).
If method='spearman'
, the above equation operates on ranked distances
instead of the original distances.
Statistical significance is assessed via a permutation test. The rows and columns of the first distance matrix (x) are randomly permuted a number of times (controlled via permutations). A correlation coefficient is computed for each permutation and the p-value is the proportion of permuted correlation coefficients that are equal to or more extreme than the original (unpermuted) correlation coefficient. Whether a permuted correlation coefficient is “more extreme” than the original correlation coefficient depends on the alternative hypothesis (controlled via alternative).
Parameters: |
|
---|---|
Returns: |
|
Raises: |
|
See also
DistanceMatrix()
, scipy.stats.pearsonr()
, scipy.stats.spearmanr()
, pwmantel()
Notes
The Mantel test was first described in [2]. The general algorithm and
interface are similar to vegan::mantel
, available in R’s vegan
package [3].
np.nan
will be returned for the p-value if permutations is zero or if
the correlation coefficient is np.nan
. The correlation coefficient will
be np.nan
if one or both of the inputs does not have any variation
(i.e. the distances are all constant) and method='spearman'
.
References
[1] | Legendre, P. and Legendre, L. (2012) Numerical Ecology. 3rd English Edition. Elsevier. |
[2] | Mantel, N. (1967). “The detection of disease clustering and a generalized regression approach”. Cancer Research 27 (2): 209-220. PMID 6018555. |
[3] | http://cran.r-project.org/web/packages/vegan/index.html |
Examples
Import the functionality we’ll use in the following examples:
>>> from skbio import DistanceMatrix
>>> from skbio.stats.distance import mantel
Define two 3x3 distance matrices:
>>> x = DistanceMatrix([[0, 1, 2],
... [1, 0, 3],
... [2, 3, 0]])
>>> y = DistanceMatrix([[0, 2, 7],
... [2, 0, 6],
... [7, 6, 0]])
Compute the Pearson correlation between them and assess significance using a two-sided test with 999 permutations:
>>> coeff, p_value, n = mantel(x, y)
>>> print(round(coeff, 4))
0.7559
Thus, we see a moderate-to-strong positive correlation (\(r_M=0.7559\)) between the two matrices.
In the previous example, the distance matrices (x
and y
) have the
same IDs, in the same order:
>>> x.ids
('0', '1', '2')
>>> y.ids
('0', '1', '2')
If necessary, mantel
will reorder the distance matrices prior to
running the test. The function also supports a lookup
dictionary that
maps distance matrix IDs to new IDs, providing a way to match IDs between
distance matrices prior to running the Mantel test.
For example, let’s reassign the distance matrices’ IDs so that there are no matching IDs between them:
>>> x.ids = ('a', 'b', 'c')
>>> y.ids = ('d', 'e', 'f')
If we rerun mantel
, we get the following error notifying us that there
are nonmatching IDs (this is the default behavior with strict=True
):
>>> mantel(x, y)
Traceback (most recent call last):
...
ValueError: IDs exist that are not in both distance matrices.
If we pass strict=False
to ignore/discard nonmatching IDs, we see that
no matches exist between x and y, so the Mantel test still cannot be
run:
>>> mantel(x, y, strict=False)
Traceback (most recent call last):
...
ValueError: No matching IDs exist between the distance matrices.
To work around this, we can define a lookup
dictionary to specify how
the IDs should be matched between distance matrices:
>>> lookup = {'a': 'A', 'b': 'B', 'c': 'C',
... 'd': 'A', 'e': 'B', 'f': 'C'}
lookup
maps each ID to 'A'
, 'B'
, or 'C'
. If we rerun
mantel
with lookup
, we get the same results as the original
example where all distance matrix IDs matched:
>>> coeff, p_value, n = mantel(x, y, lookup=lookup)
>>> print(round(coeff, 4))
0.7559
mantel
also accepts input that is array_like
. For example, if we
redefine x and y as nested Python lists instead of DistanceMatrix
instances, we obtain the same result:
>>> x = [[0, 1, 2],
... [1, 0, 3],
... [2, 3, 0]]
>>> y = [[0, 2, 7],
... [2, 0, 6],
... [7, 6, 0]]
>>> coeff, p_value, n = mantel(x, y)
>>> print(round(coeff, 4))
0.7559
It is import to note that reordering/matching of IDs (and hence the
strict
and lookup
parameters) do not apply when input is
array_like
because there is no notion of IDs.