skbio.sequence.GrammaredSequence(sequence, metadata=None, positional_metadata=None, interval_metadata=None, lowercase=False, validate=True)
Store sequence data conforming to a character set.
This is an abstract base class (ABC) and cannot be instantiated directly.
Subclass it to create grammared sequences with custom alphabets.
Raises:
ValueError: If sequence characters are not in the character set [1].
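The two statements above (the class is abstract, and construction fails with ValueError when characters fall outside the alphabet) can be illustrated with Python's abc module. This is a simplified, hypothetical sketch: ToyGrammaredSequence, ToyCustomSequence, and invalid_chars are invented names, and it uses plain abstract classmethods instead of skbio's class properties. It is not skbio's implementation.

```python
from abc import ABC, abstractmethod


class ToyGrammaredSequence(ABC):
    """Hypothetical, simplified stand-in for an abstract grammared sequence."""

    @classmethod
    @abstractmethod
    def definite_chars(cls):
        """Return the set of definite characters."""

    @classmethod
    @abstractmethod
    def degenerate_map(cls):
        """Return a dict mapping each degenerate char to a set of definite chars."""

    @classmethod
    @abstractmethod
    def gap_chars(cls):
        """Return the set of gap characters."""

    @classmethod
    def alphabet(cls):
        # Valid characters: definite, degenerate, and gap characters.
        return cls.definite_chars() | set(cls.degenerate_map()) | cls.gap_chars()

    @classmethod
    def invalid_chars(cls, sequence):
        # Characters present in the sequence but absent from the alphabet.
        return set(sequence) - cls.alphabet()

    def __init__(self, sequence):
        invalid = self.invalid_chars(sequence)
        if invalid:
            raise ValueError(f"Invalid characters in sequence: {sorted(invalid)}")
        self.sequence = sequence


class ToyCustomSequence(ToyGrammaredSequence):
    @classmethod
    def definite_chars(cls):
        return set("ABC")

    @classmethod
    def degenerate_map(cls):
        return {"X": set("AB")}

    @classmethod
    def gap_chars(cls):
        return set("-.")


try:
    ToyGrammaredSequence("ABC")  # abstract base: raises TypeError
except TypeError as exc:
    print("TypeError:", exc)

try:
    ToyCustomSequence("ABZ")  # 'Z' is outside the alphabet: raises ValueError
except ValueError as exc:
    print("ValueError:", exc)
```

Because the abstract methods are declared on the base class, Python refuses to instantiate it until a subclass overrides all of them.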
References
[1] Cornish-Bowden, A. Nomenclature for incompletely specified bases in nucleic acid sequences: recommendations 1984. Nucleic Acids Res. 1985 May 10; 13(9): 3021-3030.
Examples
Note in the example below that properties either need to be static or use skbio’s classproperty decorator.
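A class-level property lets a method be read as an attribute on the class itself, so the alphabet definitions are available without creating an instance. The following is a minimal, hypothetical descriptor in plain Python illustrating that idea; the name simple_classproperty is invented for this sketch and it is not skbio's classproperty.

```python
class simple_classproperty:
    """Hypothetical class-level property: a sketch, not skbio's classproperty."""

    def __init__(self, func):
        self.func = func

    def __get__(self, instance, owner):
        # The descriptor protocol passes the owning class even when the
        # attribute is looked up on an instance, so the wrapped function
        # always receives the class.
        return self.func(owner)


class Example:
    @simple_classproperty
    def gap_chars(cls):
        return set("-.")


print(Example.gap_chars)    # class-level access, no instance needed
print(Example().gap_chars)  # instance access goes through the same descriptor
```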
>>> from skbio.sequence import GrammaredSequence
>>> from skbio.util import classproperty
>>> class CustomSequence(GrammaredSequence):
...     @classproperty
...     def degenerate_map(cls):
...         return {"X": set("AB")}
...
...     @classproperty
...     def definite_chars(cls):
...         return set("ABC")
...
...     @classproperty
...     def default_gap_char(cls):
...         return '-'
...
...     @classproperty
...     def gap_chars(cls):
...         return set('-.')
>>> seq = CustomSequence('ABABACAC')
>>> seq
CustomSequence
--------------------------
Stats:
    length: 8
    has gaps: False
    has degenerates: False
    has definites: True
--------------------------
0 ABABACAC
>>> seq = CustomSequence('XXXXXX')
>>> seq
CustomSequence
-------------------------
Stats:
    length: 6
    has gaps: False
    has degenerates: True
    has definites: False
-------------------------
0 XXXXXX
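The Stats block in each repr follows from the alphabet definitions. This sketch assumes each flag simply tests whether at least one observed character belongs to the corresponding set; sequence_stats is a hypothetical helper written for illustration, not part of skbio.

```python
# Hypothetical grammar copied from the CustomSequence example above.
DEFINITE_CHARS = set("ABC")
DEGENERATE_MAP = {"X": set("AB")}
GAP_CHARS = set("-.")


def sequence_stats(sequence):
    """Recompute the Stats flags; an illustrative sketch, not skbio's code."""
    observed = set(sequence)
    return {
        "length": len(sequence),
        "has gaps": bool(observed & GAP_CHARS),
        "has degenerates": bool(observed & set(DEGENERATE_MAP)),
        "has definites": bool(observed & DEFINITE_CHARS),
    }


print(sequence_stats("ABABACAC"))
print(sequence_stats("XXXXXX"))
```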
Attributes
alphabet | Return valid characters.
default_gap_char | Gap character to use when constructing a new gapped sequence.
default_write_format |
definite_chars | Return definite characters.
degenerate_chars | Return degenerate characters.
degenerate_map | Return mapping of degenerate to definite characters.
gap_chars | Return characters defined as gaps.
interval_metadata | IntervalMetadata object containing info about interval features.
metadata | dict containing metadata which applies to the entire object.
nondegenerate_chars | Return non-degenerate characters.
observed_chars | Set of observed characters in the sequence.
positional_metadata | pd.DataFrame containing metadata along an axis.
values | Array containing underlying sequence characters.
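The alphabet-related attributes fit together as sets. The sketch below assumes, for illustration, that alphabet is the union of definite, degenerate, and gap characters and that degenerate_chars are the keys of degenerate_map. The grammar_problems function and its rules are hypothetical consistency checks, not skbio's validation code.

```python
# Hypothetical grammar mirroring the CustomSequence example; plain Python only.
definite_chars = set("ABC")
degenerate_map = {"X": set("AB")}
gap_chars = set("-.")
default_gap_char = "-"

# Assumed relationships between the attributes listed above.
degenerate_chars = set(degenerate_map)
alphabet = definite_chars | degenerate_chars | gap_chars


def observed_chars(sequence):
    """Set of characters that actually occur in `sequence`."""
    return set(sequence)


def grammar_problems():
    """Return consistency problems one might check for (hypothetical rules)."""
    problems = []
    if default_gap_char not in gap_chars:
        problems.append("default_gap_char is not one of gap_chars")
    if degenerate_chars & gap_chars:
        problems.append("degenerate and gap characters overlap")
    for char, expansion in degenerate_map.items():
        if not expansion <= definite_chars:
            problems.append(f"{char!r} expands to non-definite characters")
    return problems


print(sorted(alphabet))
print(sorted(observed_chars("ABABACAC")))
print(grammar_problems())
```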
Built-ins
bool(gs) | Return the truth value (truthiness) of this sequence.
x in gs | Determine if a subsequence is contained in this sequence.
copy.copy(gs) | Return a shallow copy of this sequence.
copy.deepcopy(gs) | Return a deep copy of this sequence.
gs1 == gs2 | Determine if this sequence is equal to another.
gs[x] | Slice this sequence.
__init_subclass__ | This method is called when a class is subclassed.
iter(gs) | Iterate over positions in this sequence.
len(gs) | Return the number of characters in this sequence.
gs1 != gs2 | Determine if this sequence is not equal to another.
reversed(gs) | Iterate over positions in this sequence in reverse order.
str(gs) | Return sequence characters as a string.
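Most of these built-ins come from Python's data model special methods. The class below is a hypothetical, minimal wrapper (MiniSequence is an invented name, not skbio's implementation) that shows which dunder method backs each built-in; it assumes, for illustration, that slicing and iteration return new objects of the same type, with each position yielded as a length-1 sequence.

```python
class MiniSequence:
    """Hypothetical minimal sequence wrapper; not skbio's implementation."""

    def __init__(self, chars):
        self._chars = str(chars)

    def __len__(self):                  # len(gs)
        return len(self._chars)

    def __bool__(self):                 # bool(gs): empty sequences are falsy
        return len(self._chars) > 0

    def __contains__(self, subsequence):  # x in gs: substring test
        return str(subsequence) in self._chars

    def __getitem__(self, index):       # gs[x]: result has the same type
        return type(self)(self._chars[index])

    def __iter__(self):                 # iter(gs): one length-1 sequence per position
        for char in self._chars:
            yield type(self)(char)

    def __reversed__(self):             # reversed(gs)
        for char in reversed(self._chars):
            yield type(self)(char)

    def __eq__(self, other):            # gs1 == gs2 (and != by inversion)
        return type(self) is type(other) and self._chars == other._chars

    def __str__(self):                  # str(gs)
        return self._chars


s = MiniSequence("ABABACAC")
print(len(s), "BAC" in s, str(s[2:5]))
print([str(p) for p in reversed(s[:4])])
```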
Methods
concat(sequences[, how]) | Concatenate an iterable of Sequence objects.
count(subsequence[, start, end]) | Count occurrences of a subsequence in this sequence.
definites() | Find positions containing definite characters in the sequence.
degap() | Return a new sequence with gap characters removed.
degenerates() | Find positions containing degenerate characters in the sequence.
distance(other[, metric]) | Compute the distance to another sequence.
expand_degenerates() | Yield all possible definite versions of the sequence.
find_motifs(motif_type[, min_length, ignore]) | Search the biological sequence for motifs.
find_with_regex(regex[, ignore]) | Generate slices for patterns matched by a regular expression.
frequencies([chars, relative]) | Compute frequencies of characters in the sequence.
gaps() | Find positions containing gaps in the biological sequence.
has_definites() | Determine if the sequence contains one or more definite characters.
has_degenerates() | Determine if the sequence contains one or more degenerate characters.
has_gaps() | Determine if the sequence contains one or more gap characters.
has_interval_metadata() | Determine if the object has interval metadata.
has_metadata() | Determine if the object has metadata.
has_nondegenerates() | Determine if the sequence contains one or more non-degenerate characters.
has_positional_metadata() | Determine if the object has positional metadata.
index(subsequence[, start, end]) | Find the position where a subsequence first occurs in the sequence.
iter_contiguous(included[, min_length, invert]) | Yield contiguous subsequences based on included.
iter_kmers(k[, overlap]) | Generate kmers of length k from this sequence.
kmer_frequencies(k[, overlap, relative]) | Return counts of words of length k from this sequence.
lowercase(lowercase) | Return a case-sensitive string representation of the sequence.
match_frequency(other[, relative]) | Return the count of positions that are the same between two sequences.
matches(other) | Find positions that match with another sequence.
mismatch_frequency(other[, relative]) | Return the count of positions that differ between two sequences.
mismatches(other) | Find positions that do not match with another sequence.
nondegenerates() | Find positions containing non-degenerate characters in the sequence.
read(file[, format]) | Create a new Sequence instance from a file.
replace(where, character) | Replace values in this sequence with a different character.
to_regex() | Return a regular expression object that accounts for degenerate characters.
write(file[, format]) | Write an instance of Sequence to a file.
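Several of these grammar-aware methods reduce to set and string operations over the alphabet. The following hypothetical sketch re-implements the ideas behind gaps, degap, expand_degenerates, iter_kmers, and to_regex as plain functions over str for the example's alphabet. It assumes that non-overlapping k-mers drop a trailing partial window and that to_regex turns each degenerate character into a character class; skbio's real methods operate on sequence objects and may return sequence objects or NumPy arrays instead of plain strings and lists.

```python
import itertools
import re

# Hypothetical grammar copied from the CustomSequence example.
DEGENERATE_MAP = {"X": set("AB")}
GAP_CHARS = set("-.")


def gaps(sequence):
    """Boolean mask: True where the position holds a gap character."""
    return [char in GAP_CHARS for char in sequence]


def degap(sequence):
    """Copy of the sequence with every gap character removed."""
    return "".join(char for char in sequence if char not in GAP_CHARS)


def expand_degenerates(sequence):
    """Yield every definite sequence the (possibly degenerate) input stands for."""
    # Each position contributes its possible definite characters; a definite
    # or gap character contributes only itself.
    choices = [sorted(DEGENERATE_MAP.get(char, {char})) for char in sequence]
    for combo in itertools.product(*choices):
        yield "".join(combo)


def iter_kmers(sequence, k, overlap=True):
    """Yield k-length words; without overlap, a trailing partial word is dropped."""
    step = 1 if overlap else k
    for start in range(0, len(sequence) - k + 1, step):
        yield sequence[start:start + k]


def to_regex(sequence):
    """Compile a pattern where each degenerate char matches any of its expansions."""
    parts = []
    for char in sequence:
        if char in DEGENERATE_MAP:
            members = "".join(re.escape(c) for c in sorted(DEGENERATE_MAP[char]))
            parts.append(f"[{members}]")
        else:
            parts.append(re.escape(char))
    return re.compile("".join(parts))


print(degap("AB-X.C"))
print(list(expand_degenerates("AXC")))
print(list(iter_kmers("ABABACAC", 3, overlap=False)))
print(bool(to_regex("AX").fullmatch("AB")))
```

Using itertools.product keeps expand_degenerates lazy, which matters because the number of expansions grows multiplicatively with each degenerate position.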