skbio.stats.composition.ancom(table, grouping, alpha=0.05, tau=0.02, theta=0.1, multiple_comparisons_correction=None, significance_test=None)
Performs a differential abundance test using ANCOM.
State: Experimental as of 0.4.1.
ANCOM calculates pairwise log ratios between all features and applies a significance test to each ratio to determine whether it differs significantly with respect to the variable of interest.
In an experiment with only two treatments, this tests the following null hypothesis for feature \(i\):
\[H_{0i}: \mathbb{E}[\ln(u_i^{(1)})] = \mathbb{E}[\ln(u_i^{(2)})]\]
where \(u_i^{(1)}\) is the mean abundance for feature \(i\) in the first group and \(u_i^{(2)}\) is the mean abundance for feature \(i\) in the second group.
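The pairwise log-ratio idea can be sketched as follows. This is a simplified illustration, not scikit-bio's implementation: the helper name `pairwise_w_sketch` and the toy counts are made up, the test is a plain `scipy.stats.f_oneway` at a fixed `alpha`, and no multiple comparisons correction or `tau`/`theta` cutoff is applied.

```python
import itertools

import numpy as np
import pandas as pd
from scipy.stats import f_oneway


def pairwise_w_sketch(table, grouping, alpha=0.05):
    """Hypothetical helper: count significant pairwise log ratios per feature."""
    log_table = np.log(table)  # requires strictly positive counts
    # Split the log-transformed samples by group label.
    groups = [log_table.loc[grouping == label] for label in grouping.unique()]
    w = pd.Series(0, index=table.columns, name="W_sketch")
    for i, j in itertools.combinations(table.columns, 2):
        # log(x_i / x_j) = log(x_i) - log(x_j), computed per sample.
        ratios = [g[i] - g[j] for g in groups]
        _, p_value = f_oneway(*ratios)
        if p_value < alpha:
            # A significant ratio counts toward both features involved.
            w.loc[i] += 1
            w.loc[j] += 1
    return w


# Toy data (made up): feature "c" is much more abundant in group 1.
toy = pd.DataFrame({"a": [10, 11, 12, 10, 12, 11],
                    "b": [10, 12, 11, 11, 10, 13],
                    "c": [10, 11, 10, 100, 110, 105]},
                   index=["s1", "s2", "s3", "s4", "s5", "s6"])
toy_groups = pd.Series([0, 0, 0, 1, 1, 1], index=toy.index)
w_sketch = pairwise_w_sketch(toy, toy_groups)
print(w_sketch)
```

A feature that truly changes between groups shifts every log ratio it takes part in, so it accumulates a high count, while unchanged features are flagged only through their ratio with the changed one.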
Parameters:
    table : pd.DataFrame
    grouping : pd.Series
    alpha : float, optional
    tau : float, optional
    theta : float, optional
    multiple_comparisons_correction : {None, 'holm-bonferroni'}, optional
    significance_test : function, optional
Returns:
    pd.DataFrame
See also
multiplicative_replacement, scipy.stats.ttest_ind, scipy.stats.f_oneway, scipy.stats.wilcoxon, scipy.stats.kruskal
Notes
The developers of this method recommend the following significance tests
([R60], Supplementary File 1, top of page 11): the standard parametric
t-test (scipy.stats.ttest_ind) or one-way ANOVA (scipy.stats.f_oneway)
if the number of groups is greater than 2, or non-parametric variants
such as Wilcoxon rank sum (scipy.stats.wilcoxon) or Kruskal-Wallis
(scipy.stats.kruskal) if the number of groups is greater than 2. Note
that in SciPy, scipy.stats.wilcoxon implements the signed-rank test for
paired samples; the rank-sum test for independent samples is
scipy.stats.ranksums. Because one-way ANOVA is equivalent to the
standard t-test when the number of groups is two, we default to
scipy.stats.f_oneway here, which can be used when there are two or more
groups. Users should refer to the documentation of these tests in SciPy
to understand the assumptions made by each test.
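The equivalence between one-way ANOVA and the equal-variance t-test for two groups can be checked numerically. The snippet below is a small sketch with made-up measurements; it only uses `scipy.stats.f_oneway` and `scipy.stats.ttest_ind` with their default settings.

```python
import numpy as np
from scipy.stats import f_oneway, ttest_ind

# Made-up measurements for two groups.
control = np.array([1.2, 0.9, 1.1, 1.4])
treated = np.array([1.9, 2.1, 1.7, 2.4])

f_stat, p_anova = f_oneway(control, treated)
t_stat, p_ttest = ttest_ind(control, treated)  # equal variances assumed by default

# With two groups, F equals t squared and the two p-values coincide.
print(np.isclose(f_stat, t_stat ** 2), np.isclose(p_anova, p_ttest))
```

The equivalence holds for the pooled-variance t-test only; Welch's variant (`equal_var=False`) generally gives a different p-value.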
This method cannot handle zero counts as input, since the logarithm
of zero is undefined. While handling zeros remains an unsolved problem,
many studies have shown promising results by replacing the zeros with
pseudocounts. This can also be done via the multiplicative_replacement
function.
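To illustrate the idea behind zero replacement, here is a minimal NumPy sketch of a multiplicative replacement. It is not scikit-bio's code: the helper name `multiplicative_replacement_sketch` is hypothetical, and the value of `delta` is an arbitrary choice for the example.

```python
import numpy as np


def multiplicative_replacement_sketch(counts, delta):
    """Hypothetical helper: replace zeros by delta, keeping each row summing to 1."""
    counts = np.atleast_2d(np.asarray(counts, dtype=float))
    closed = counts / counts.sum(axis=1, keepdims=True)  # rows as proportions
    zeros = closed == 0
    # Shrink the nonzero parts so the row still sums to 1 after inserting delta.
    shrink = 1.0 - zeros.sum(axis=1, keepdims=True) * delta
    return np.where(zeros, delta, closed * shrink)


# Made-up counts with zeros in the first two samples.
counts = [[0, 5, 5], [2, 0, 0], [1, 1, 2]]
replaced = multiplicative_replacement_sketch(counts, delta=0.01)
print(replaced)
```

Because the nonzero parts of a row are all scaled by the same factor, the ratios between them are unchanged, which is what the log-ratio tests rely on.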
References
[R59] Holm, S. “A simple sequentially rejective multiple test procedure”. Scandinavian Journal of Statistics (1979), 6.
[R60] Mandal et al. “Analysis of composition of microbiomes: a novel method for studying microbial composition”. Microbial Ecology in Health & Disease (2015), 26.
Examples
First import all of the necessary modules:
>>> from skbio.stats.composition import ancom
>>> import pandas as pd
Now let’s load in a pd.DataFrame with 6 samples and 7 unknown bacteria:
>>> table = pd.DataFrame([[12, 11, 10, 10, 10, 10, 10],
... [9, 11, 12, 10, 10, 10, 10],
... [1, 11, 10, 11, 10, 5, 9],
... [22, 21, 9, 10, 10, 10, 10],
... [20, 22, 10, 10, 13, 10, 10],
... [23, 21, 14, 10, 10, 10, 10]],
... index=['s1','s2','s3','s4','s5','s6'],
... columns=['b1','b2','b3','b4','b5','b6','b7'])
Then create a grouping vector. In this scenario there are only two classes: a control and a drug treatment. The first three samples are controls and the last three samples are treatments.
>>> grouping = pd.Series([0, 0, 0, 1, 1, 1],
... index=['s1','s2','s3','s4','s5','s6'])
Now run ancom and see if any features differ significantly between the treatment and the control.
>>> results = ancom(table, grouping)
>>> results['W']
b1 0
b2 4
b3 1
b4 1
b5 1
b6 0
b7 1
Name: W, dtype: int64
The W-statistic of a feature is the number of other features against which its log ratio was found to differ significantly between groups. In this scenario, b2 was detected to have significantly different abundances relative to four of the other features. To summarize the results from the W-statistic, let’s take a look at the results from the hypothesis test:
>>> results['reject']
b1 False
b2 True
b3 False
b4 False
b5 False
b6 False
b7 False
Name: reject, dtype: bool
From this we can conclude that only b2 was significantly different between the treatment and the control.
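In practice it is often convenient to pull out just the rejected features. The sketch below assumes `results` is a pd.DataFrame with the `W` and `reject` columns shown above; it is rebuilt by hand from that output here so the snippet runs without scikit-bio.

```python
import pandas as pd

# Rebuilt by hand from the output shown above (not computed here).
results = pd.DataFrame(
    {"W": [0, 4, 1, 1, 1, 0, 1],
     "reject": [False, True, False, False, False, False, False]},
    index=["b1", "b2", "b3", "b4", "b5", "b6", "b7"])

# Keep only the features whose null hypothesis was rejected.
significant = results[results["reject"]].index.tolist()
print(significant)
```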