skbio.stats.power.paired_subsamples(meta, cat, control_cats, order=None, strict_match=True)
Draws a list of samples varied by cat and matched for control_cats.
State: Experimental as of 0.4.0.
This function is designed to provide controlled samples, based on a metadata category. For example, one could control for age, sex, education level, and diet type while measuring exercise frequency.
Parameters: | |
---|---|
Returns: | ids – a set of ids which satisfy the criteria. These are not grouped by cat. An empty array indicates there are no sample ids which satisfy the requirements. |
Return type: | array |
Examples
Suppose we have a mapping file for a set of random individuals that records housing, sex, age, and antibiotic use.
>>> import pandas as pd
>>> import numpy as np
>>> meta = {'SW': {'HOUSING': '2', 'SEX': 'M', 'AGE': np.nan, 'ABX': 'Y'},
... 'TS': {'HOUSING': '2', 'SEX': 'M', 'AGE': '40s', 'ABX': 'Y'},
... 'CB': {'HOUSING': '3', 'SEX': 'M', 'AGE': '40s', 'ABX': 'Y'},
... 'BB': {'HOUSING': '1', 'SEX': 'M', 'AGE': '40s', 'ABX': 'Y'}}
>>> meta = pd.DataFrame.from_dict(meta, orient="index")
>>> meta #doctest: +SKIP
   ABX HOUSING  AGE SEX
BB   Y       1  40s   M
CB   Y       3  40s   M
SW   Y       2  NaN   M
TS   Y       2  40s   M
We may want to vary an individual’s housing situation, while holding constant their age, sex and antibiotic use so we can estimate the effect size for housing, and later compare it to the effects of other variables.
>>> from skbio.stats.power import paired_subsamples
>>> ids = paired_subsamples(meta, 'HOUSING', ['SEX', 'AGE', 'ABX'])
>>> np.hstack(ids) #doctest: +ELLIPSIS
array(['BB', 'TS', 'CB']...)
For this data set, TS, CB, and BB can be matched on their age, sex, and antibiotic use. SW cannot be matched to any housing group because strict_match is True and its AGE value is missing.
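The matching idea behind this example can be sketched in plain pandas, without scikit-bio. This is only an illustration of the concept, not the library's implementation: the helper name `match_by_controls` is hypothetical, dropping rows with missing values is assumed as a stand-in for strict matching, and the sketch always takes the first eligible sample instead of drawing one at random.

```python
import numpy as np
import pandas as pd


def match_by_controls(meta, cat, control_cats):
    """Hypothetical helper (not part of scikit-bio).

    Returns one sample ID per value of `cat` for every control group
    in which all values of `cat` are represented.
    """
    # Stand-in for strict matching: discard samples that are missing
    # the varied category or any of the control values.
    complete = meta.dropna(subset=[cat] + list(control_cats))
    cat_values = sorted(complete[cat].unique())

    matched = []
    for _, group in complete.groupby(list(control_cats)):
        # Skip control groups that lack one or more values of `cat`.
        if set(group[cat]) != set(cat_values):
            continue
        for value in cat_values:
            # Take the first sample ID for this value (no randomization).
            matched.append(group.index[group[cat] == value][0])
    return matched


meta = pd.DataFrame.from_dict(
    {'SW': {'HOUSING': '2', 'SEX': 'M', 'AGE': np.nan, 'ABX': 'Y'},
     'TS': {'HOUSING': '2', 'SEX': 'M', 'AGE': '40s', 'ABX': 'Y'},
     'CB': {'HOUSING': '3', 'SEX': 'M', 'AGE': '40s', 'ABX': 'Y'},
     'BB': {'HOUSING': '1', 'SEX': 'M', 'AGE': '40s', 'ABX': 'Y'}},
    orient='index')

ids = match_by_controls(meta, 'HOUSING', ['SEX', 'AGE', 'ABX'])
print(ids)  # ['BB', 'TS', 'CB']
```

As in the example above, SW is excluded because its AGE value is missing, while the three remaining samples share the same control values and cover every housing group.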