skbio.sequence.distance.hamming

skbio.sequence.distance.hamming(seq1, seq2)

Compute Hamming distance between two sequences.

State: Experimental as of 0.4.2.

The Hamming distance between two equal-length sequences is the proportion of positions at which the corresponding characters differ.
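As a minimal sketch of this definition, independent of scikit-bio, the proportion can be computed over plain Python strings. The helper name `hamming_proportion` is hypothetical and is not part of the library; its length check and empty-input handling are assumptions modeled on the behavior documented on this page, not the library's actual implementation.

```python
def hamming_proportion(a: str, b: str) -> float:
    """Proportion of positions at which two equal-length strings differ.

    Hypothetical helper for illustration only; not scikit-bio's code.
    """
    if len(a) != len(b):
        raise ValueError("Sequences must be the same length.")
    if len(a) == 0:
        # Zero mismatches over zero positions is undefined.
        return float("nan")
    # Count the positions where the paired characters are not equal.
    mismatches = sum(x != y for x, y in zip(a, b))
    return mismatches / len(a)


# Positions 0, 2 and 3 differ, so 3 of the 6 positions are mismatches.
result = hamming_proportion("AGGGTA", "CGTTTA")
print(result)
```

Dividing by the sequence length is what makes the result a proportion between 0.0 and 1.0 rather than a raw count of mismatches.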

Parameters:

seq1, seq2 (Sequence) – Sequences to compute the Hamming distance between.

Returns:

Hamming distance between seq1 and seq2.

Return type:

float

Raises:
  • TypeError – If seq1 and seq2 are not Sequence instances.
  • TypeError – If seq1 and seq2 are not the same type.
  • ValueError – If seq1 and seq2 are not the same length.

See also

scipy.spatial.distance.hamming()

Notes

np.nan is returned if the sequences are empty (that is, they do not contain any characters).
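When handling such a result, keep in mind that NaN never compares equal to anything, including itself. The sketch below uses a plain float NaN as a stand-in for the returned value (an assumption made so the example runs without scikit-bio); np.nan is an ordinary float NaN and behaves the same way.

```python
import math

# Stand-in for the value returned when both sequences are empty.
distance = float("nan")

# An equality test cannot detect NaN: this comparison is False.
equal_to_itself = distance == distance

# math.isnan is a reliable check for a float NaN.
is_undefined = math.isnan(distance)

print(equal_to_itself, is_undefined)
```

A check such as `math.isnan` (or `numpy.isnan`) is therefore needed before comparing or aggregating distances that may include empty sequences.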

This function does not make assumptions about the sequence alphabet in use. Each sequence object’s underlying sequence of characters is used to compute Hamming distance. Characters that may be considered equivalent in certain contexts (e.g., - and . as gap characters) are treated as distinct characters when computing Hamming distance.
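To illustrate the point about gap characters, the following sketch compares plain strings character by character, standing in for the sequence objects' underlying characters. The helper `proportion_differing` and the normalization step are hypothetical and not part of scikit-bio; they only show that - and . count as a mismatch unless the caller unifies them first.

```python
def proportion_differing(a: str, b: str) -> float:
    """Hypothetical stand-in: fraction of positions where a and b differ.

    Assumes non-empty input of equal length.
    """
    if len(a) != len(b):
        raise ValueError("Sequences must be the same length.")
    return sum(x != y for x, y in zip(a, b)) / len(a)


aligned1 = "AC-T"
aligned2 = "AC.T"

# '-' and '.' are different characters, so position 2 counts as a mismatch.
raw = proportion_differing(aligned1, aligned2)

# If both symbols should be treated as gaps, unify them before comparing.
normalized = proportion_differing(
    aligned1.replace(".", "-"),
    aligned2.replace(".", "-"),
)

print(raw, normalized)
```

Whether to normalize gap characters beforehand is a decision for the caller, since the distance itself only tests characters for equality.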

Examples

>>> from skbio import Sequence
>>> from skbio.sequence.distance import hamming
>>> seq1 = Sequence('AGGGTA')
>>> seq2 = Sequence('CGTTTA')
>>> hamming(seq1, seq2)
0.5