pybedtools.contrib.long_range_interaction.cis_trans_interactions

pybedtools.contrib.long_range_interaction.cis_trans_interactions(iterator, n, extra, verbose=True)[source]

Converts the output from tag_bedpe into a pandas DataFrame containing information about regions that contact each other in cis (same fragment) or trans (different fragments).

For example, given a BEDPE file representing 3D interactions in the genome, we want to identify which transcription start sites are connected to distal regions containing a peak.

>>> bedpe = pybedtools.example_bedtool('test_bedpe.bed')
>>> print(bedpe) 
chr1  1  10  chr1  50   90   pair1  5  +  -  x1
chr1  2  15  chr1  200  210  pair2  1  +  +  y1
>>> tsses = pybedtools.example_bedtool('test_tsses.bed')
>>> print(tsses) 
chr1  5   6   gene1
chr1  60  61  gene2
chr1  88  89  gene3
>>> peaks = pybedtools.example_bedtool('test_peaks.bed')
>>> print(peaks) 
chr1  3  4  peak1  50  .

Here's what the tracks look like. Note that pair1 is evidence of a gene1-gene2 interaction and a gene1-gene3 interaction:

 TRACKS:

               1         2  /         5         6    / 8         9     / 20
     0123456789012345678901 / 2345678901234567890123 / 012345678901234 / 0123456789
pair1 |||||||||------------ / --------|||||||||||||| / ||||||||||||||| /
pair2  |||||||||||||------- / ---------------------- / --------------- / ||||||||||
tsses     1                 /                   2    /         3
peaks   1
>>> from pybedtools.contrib.long_range_interaction import tag_bedpe
>>> from pybedtools.contrib.long_range_interaction import cis_tran_interactions
>>> iterator, n, extra = tag_bedpe(bedpe, {'tss': tsses, 'pk': peaks})
>>> for (label, group1, group2) in iterator:
...    for i in group1:
...        print(i, end='')
...    for i in group2:
...        print(i, end='')  
chr1  1    10   pair1  5  +  x1  pk   chr1  3   4   peak1  50  .  1
chr1  1    10   pair1  5  +  x1  tss  chr1  5   6   gene1  1
chr1  50   90   pair1  5  -  x1  tss  chr1  60  61  gene2  1
chr1  50   90   pair1  5  -  x1  tss  chr1  88  89  gene3  1
chr1  2    15   pair2  1  +  y1  pk   chr1  3   4   peak1  50  .  1
chr1  2    15   pair2  1  +  y1  tss  chr1  5   6   gene1  1
chr1  200  210  pair2  1  +  y1  .    -1    -1  .   -1     .  0

Now we run the same thing, but now aggregate it. Note that each piece of interaction evidence has its own line. The first line shows that pair1 has gene1 and peak1 in the same fragment, and that they are connected to gene2. The second line shows again that gene1 and peak1 are in the same fragmet and that they are also connected to gene3:

>>> iterator, n, extra = tag_bedpe(bedpe, {'tss': tsses, 'pk': peaks})
>>> df =  cis_trans_interactions(iterator, n, extra)
>>> print(df)
  target_label target_name cis_label cis_name distal_label distal_name  label
0          tss       gene1        pk    peak1          tss       gene2  pair1
1          tss       gene1        pk    peak1          tss       gene3  pair1
2           pk       peak1       tss    gene1          tss       gene2  pair1
3           pk       peak1       tss    gene1          tss       gene3  pair1
4          tss       gene2       tss    gene3          tss       gene1  pair1
5          tss       gene2       tss    gene3           pk       peak1  pair1
6          tss       gene3       tss    gene2          tss       gene1  pair1
7          tss       gene3       tss    gene2           pk       peak1  pair1
8          tss       gene1        pk    peak1            .           .  pair2
9           pk       peak1       tss    gene1            .           .  pair2

If we only care about genes:

>>> print(df[df.target_label == 'tss'])
  target_label target_name cis_label cis_name distal_label distal_name  label
0          tss       gene1        pk    peak1          tss       gene2  pair1
1          tss       gene1        pk    peak1          tss       gene3  pair1
4          tss       gene2       tss    gene3          tss       gene1  pair1
5          tss       gene2       tss    gene3           pk       peak1  pair1
6          tss       gene3       tss    gene2          tss       gene1  pair1
7          tss       gene3       tss    gene2           pk       peak1  pair1
8          tss       gene1        pk    peak1            .           .  pair2

Note that in pair2, there is no evidence of interaction between gene1 and gene2.

What interacts distally with gene2's TSS?

>>> print(set(df.ix[df.target_name == 'gene2', 'distal_name']).difference('.'))
set([u'gene1', u'peak1'])