Class Allele

  • All Implemented Interfaces:
    Serializable, Comparable<Allele>

    public class Allele
    extends Object
    implements Comparable<Allele>, Serializable
    Immutable representation of an allele.

    Types of alleles:

     Ref: a t C g a // C is the reference base
        : a t G g a // C base is a G in some individuals
        : a t - g a // C base is deleted w.r.t. the reference
        : a t CAg a // A base is inserted w.r.t. the reference sequence
     

    In these cases, what are the alleles?

    • SNP of C/G -> { C , G } -> C is the reference allele
    • 1 base deletion of C -> { tC , t } -> tC is the reference allele; the preceding reference base t is included because null alleles are not allowed
    • 1 base insertion of A -> { C , CA } -> C is the reference allele; it is included because null alleles are not allowed

    Suppose I see the following in the population:

     Ref: a t C g a // C is the reference base
        : a t G g a // C base is a G in some individuals
        : a t - g a // C base is deleted w.r.t. the reference
     

    How do I represent this? There are three segregating alleles:

    { C , G , - }

    and these are represented as:

    { tC, tG, t }

    Now suppose I have this more complex example:

     Ref: a t C g a // C is the reference base
        : a t - g a
        : a t - - a
        : a t CAg a
     

    There are actually four segregating alleles:

    { Cg , -g, --, and CAg } over bases 2-4

    represented as:

    { tCg, tg, t, tCAg }

    Critically, it should be possible to apply an allele to a reference sequence to create the correct haplotype sequence:

    Allele + reference => haplotype
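
The "Allele + reference => haplotype" rule can be illustrated with a minimal standalone sketch. It does not use the Allele class: HaplotypeSketch and apply are hypothetical names, alleles are passed as plain strings in the padded form shown above (for example tCg, tg, t, tCAg), and the offset is assumed to be 0-based.

```java
import java.util.Locale;

/** Hypothetical helper for illustration only; not part of the Allele API. */
final class HaplotypeSketch {
    private HaplotypeSketch() {}

    /**
     * Replaces the reference allele located at the 0-based offset start of
     * reference with allele and returns the resulting haplotype in upper case.
     */
    static String apply(String reference, int start, String refAllele, String allele) {
        String ref = reference.toUpperCase(Locale.ROOT);
        String refBases = refAllele.toUpperCase(Locale.ROOT);
        // The reference allele must agree with the reference sequence at this offset.
        if (!ref.startsWith(refBases, start)) {
            throw new IllegalArgumentException(
                    "reference allele does not match the reference at offset " + start);
        }
        return ref.substring(0, start)
                + allele.toUpperCase(Locale.ROOT)
                + ref.substring(start + refBases.length());
    }
}
```

With the reference atCga and the reference allele tCg at offset 1, applying tg, t, and tCAg yields ATGA, ATA, and ATCAGA, which correspond to the three non-reference rows of the example above.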

    For convenience, we are going to create Alleles where the GenomeLoc of the allele is stored outside of the Allele object itself. So there's an idea of an A/C polymorphism independent of its surrounding context. Given a list of alleles, it's possible to determine the "type" of the variation:

          A / C @ loc => SNP
          - / A => INDEL
     

    If you know which allele is the reference, you can determine whether the variant is an insertion or a deletion.
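
As a rough illustration of how the variation type follows from the alleles once the reference allele is known, the sketch below compares a reference and an alternate allele given as plain strings. VariantTypeSketch, Type, and classify are hypothetical names, and the rules are a simplification for this example, not the library's own classification logic.

```java
/** Hypothetical classifier for illustration only; not part of the Allele API. */
final class VariantTypeSketch {
    enum Type { SNP, INSERTION, DELETION, OTHER }

    private VariantTypeSketch() {}

    /** Classifies a biallelic site from its reference and alternate bases. */
    static Type classify(String ref, String alt) {
        if (ref.length() == 1 && alt.length() == 1) {
            // Same single base on both sides means there is no variation.
            return ref.equalsIgnoreCase(alt) ? Type.OTHER : Type.SNP;
        }
        // The longer allele must extend the shorter one to count as a simple indel.
        if (alt.length() > ref.length() && hasPrefix(alt, ref)) {
            return Type.INSERTION;
        }
        if (ref.length() > alt.length() && hasPrefix(ref, alt)) {
            return Type.DELETION;
        }
        return Type.OTHER;
    }

    /** Case-insensitive prefix test, since 'atc' and 'ATC' denote the same bases. */
    private static boolean hasPrefix(String s, String prefix) {
        return s.regionMatches(true, 0, prefix, 0, prefix.length());
    }
}
```

Under these assumptions, classify("C", "G") gives SNP, classify("tC", "t") gives DELETION, and classify("C", "CA") gives INSERTION. Swapping the two arguments turns an insertion into a deletion, which is why the reference status matters.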

    Allele also supports the concept of a NO_CALL allele. This Allele represents a haplotype that couldn't be determined. This is usually represented by a '.' allele.

    Note that Alleles store all bases as bytes, in **UPPER CASE**. So 'atc' == 'ATC' from the perspective of an Allele.
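
The effect of upper-case storage can be pictured with a small standalone sketch. UpperCaseBasesSketch, normalize, and sameBases are hypothetical names, and the input is assumed to be ASCII; this is not the conversion code used by Allele.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

/** Hypothetical helper for illustration only; not part of the Allele API. */
final class UpperCaseBasesSketch {
    private UpperCaseBasesSketch() {}

    /** Returns the bases as upper-case ASCII bytes. */
    static byte[] normalize(String bases) {
        byte[] raw = bases.getBytes(StandardCharsets.US_ASCII);
        byte[] out = new byte[raw.length];
        for (int i = 0; i < raw.length; i++) {
            byte b = raw[i];
            // Shift lower-case ASCII letters to upper case; leave other bytes unchanged.
            out[i] = (b >= 'a' && b <= 'z') ? (byte) (b - ('a' - 'A')) : b;
        }
        return out;
    }

    /** Compares two base strings after normalization, so 'atc' equals 'ATC'. */
    static boolean sameBases(String a, String b) {
        return Arrays.equals(normalize(a), normalize(b));
    }
}
```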

    See Also:
    Serialized Form
    • Field Detail

      • SPAN_DEL

        public static final Allele SPAN_DEL
      • NO_CALL

        public static final Allele NO_CALL
      • NON_REF_ALLELE

        public static final Allele NON_REF_ALLELE
    • Constructor Detail

      • Allele

        protected Allele​(byte[] bases,
                         boolean isRef)
      • Allele

        protected Allele​(String bases,
                         boolean isRef)
      • Allele

        protected Allele​(Allele allele,
                         boolean ignoreRefState)
        Creates a new allele based on the provided one. Ref state will be copied unless ignoreRefState is true (in which case the returned allele will be non-Ref). This method is efficient because it can skip the validation of the bases (since the original allele was already validated)
        Parameters:
        allele - the allele from which to copy the bases
        ignoreRefState - should we ignore the reference state of the input allele and use the default ref state?
    • Method Detail

      • create

        public static Allele create​(byte[] bases,
                                    boolean isRef)
        Creates a new Allele that includes bases and is tagged as the reference allele if isRef == true. If bases == '-', a Null allele is created. If bases == '.', a no-call Allele is created. If bases == '*', a spanning deletion Allele is created.
        Parameters:
        bases - the DNA sequence of this variation, '-', '.', or '*'
        isRef - should we make this a reference allele?
        Throws:
        IllegalArgumentException - if bases contains illegal characters or is otherwise malformed
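
The special-symbol handling described for create can be approximated by the standalone sketch below. AlleleKindSketch, Kind, and kindOf are hypothetical names, and treating an empty array as the Null allele is an assumption based on the getBases() description; the sketch performs none of the library's validation.

```java
/** Hypothetical dispatcher for illustration only; not part of the Allele API. */
final class AlleleKindSketch {
    enum Kind { NULL, NO_CALL, SPAN_DEL, BASES }

    private AlleleKindSketch() {}

    /** Maps raw bases to the kind of allele they would denote. */
    static Kind kindOf(byte[] bases) {
        if (bases.length == 0) {
            // Assumption: an empty array is treated like the '-' Null allele.
            return Kind.NULL;
        }
        if (bases.length == 1) {
            switch (bases[0]) {
                case '-': return Kind.NULL;
                case '.': return Kind.NO_CALL;
                case '*': return Kind.SPAN_DEL;
                default: break;
            }
        }
        // Anything else is taken to be an ordinary base sequence (not validated here).
        return Kind.BASES;
    }
}
```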
      • create

        public static Allele create​(byte base,
                                    boolean isRef)
      • create

        public static Allele create​(byte base)
      • extend

        public static Allele extend​(Allele left,
                                    byte[] right)
      • wouldBeNullAllele

        public static boolean wouldBeNullAllele​(byte[] bases)
        Parameters:
        bases - bases representing an allele
        Returns:
        true if the bases represent the null allele
      • wouldBeStarAllele

        public static boolean wouldBeStarAllele​(byte[] bases)
        Parameters:
        bases - bases representing an allele
        Returns:
        true if the bases represent the SPAN_DEL allele
      • wouldBeNoCallAllele

        public static boolean wouldBeNoCallAllele​(byte[] bases)
        Parameters:
        bases - bases representing an allele
        Returns:
        true if the bases represent the NO_CALL allele
      • wouldBeSymbolicAllele

        public static boolean wouldBeSymbolicAllele​(byte[] bases)
        Parameters:
        bases - bases representing an allele
        Returns:
        true if the bases represent a symbolic allele
      • acceptableAlleleBases

        public static boolean acceptableAlleleBases​(String bases)
        Parameters:
        bases - bases representing a reference allele
        Returns:
        true if the bases represent a well-formatted allele
      • acceptableAlleleBases

        public static boolean acceptableAlleleBases​(String bases,
                                                    boolean isReferenceAllele)
        Parameters:
        bases - bases representing an allele
        isReferenceAllele - is a reference allele
        Returns:
        true if the bases represent a well-formatted allele
      • acceptableAlleleBases

        public static boolean acceptableAlleleBases​(byte[] bases)
        Parameters:
        bases - bases representing a reference allele
        Returns:
        true if the bases represent a well-formatted allele
      • acceptableAlleleBases

        public static boolean acceptableAlleleBases​(byte[] bases,
                                                    boolean isReferenceAllele)
        Parameters:
        bases - bases representing an allele
        isReferenceAllele - true if a reference allele
        Returns:
        true if the bases represent a well-formatted allele
      • create

        public static Allele create​(String bases,
                                    boolean isRef)
        Parameters:
        bases - bases representing an allele
        isRef - is this the reference allele?
        See Also:
        Allele(byte[], boolean)
      • create

        public static Allele create​(String bases)
        Creates a non-Ref allele. See Allele(byte[], boolean) for full information.
        Parameters:
        bases - bases representing an allele
      • create

        public static Allele create​(byte[] bases)
        Creates a non-Ref allele. See Allele(byte[], boolean) for full information.
        Parameters:
        bases - bases representing an allele
      • create

        public static Allele create​(Allele allele,
                                    boolean ignoreRefState)
        Creates a new allele based on the provided one. Ref state will be copied unless ignoreRefState is true (in which case the returned allele will be non-Ref). This method is efficient because it can skip the validation of the bases (since the original allele was already validated)
        Parameters:
        allele - the allele from which to copy the bases
        ignoreRefState - should we ignore the reference state of the input allele and use the default ref state?
      • isNoCall

        public boolean isNoCall()
      • isCalled

        public boolean isCalled()
      • isReference

        public boolean isReference()
      • isNonReference

        public boolean isNonReference()
      • isSymbolic

        public boolean isSymbolic()
      • getBases

        public byte[] getBases()
        Returns the DNA bases segregating in this allele. Note that this isn't reference-polarized, so the Null allele is represented by a vector of length 0.
        Returns:
        the segregating bases
      • getBaseString

        public String getBaseString()
        Returns the DNA bases segregating in this allele in String format. This is useful because toString() adds a '*' to reference alleles, and calling toString() on the byte[] returned by getBases() yields the array's identity string rather than the bases.
        Returns:
        the segregating bases
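
The reason a byte[] cannot simply be printed is standard Java behavior: arrays inherit Object.toString(). The sketch below shows the difference without using the Allele class; BaseStringSketch and toBaseString are hypothetical names, and ASCII-encoded bases are assumed.

```java
import java.nio.charset.StandardCharsets;

/** Hypothetical helper for illustration only; not part of the Allele API. */
final class BaseStringSketch {
    private BaseStringSketch() {}

    /** Decodes ASCII base bytes into a readable String. */
    static String toBaseString(byte[] bases) {
        return new String(bases, StandardCharsets.US_ASCII);
    }

    /** Returns what Object.toString() reports for the array, e.g. "[B@" plus a hash. */
    static String arrayIdentity(byte[] bases) {
        return bases.toString();
    }
}
```

Here toBaseString(new byte[] {'A', 'C'}) returns "AC", while arrayIdentity on the same array returns a string that starts with "[B@" and says nothing about the bases.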
      • getDisplayString

        public String getDisplayString()
        Return the printed representation of this allele. Same as getBaseString(), except for symbolic alleles. For symbolic alleles, the base string is empty while the display string contains <TAG>.
        Returns:
        the allele string representation
      • getDisplayBases

        public byte[] getDisplayBases()
        Same as getDisplayString() but returns the result as byte[]. Slightly faster than getDisplayString().
        Returns:
        the allele string representation
      • equals

        public boolean equals​(Object other)
        Overrides:
        equals in class Object
        Parameters:
        other - the other allele
        Returns:
        true if these alleles are equal
      • hashCode

        public int hashCode()
        Overrides:
        hashCode in class Object
        Returns:
        hash code
      • equals

        public boolean equals​(Allele other,
                              boolean ignoreRefState)
        Returns true if this and other are equal. If ignoreRefState is true, the comparison does not require both alleles to have the same ref tag.
        Parameters:
        other - allele to compare to
        ignoreRefState - if true, ignore ref state in comparison
        Returns:
        true if this and other are equal
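
The role of ignoreRefState can be sketched with a stripped-down stand-in class. SimpleAllele is a hypothetical type that holds only bases and a ref flag; it assumes equality means identical bases plus, optionally, identical ref state, which may not cover every detail of the real comparison.

```java
import java.util.Arrays;

/** Hypothetical stand-in for illustration only; not the real Allele class. */
final class SimpleAllele {
    private final byte[] bases;
    private final boolean isRef;

    SimpleAllele(byte[] bases, boolean isRef) {
        this.bases = bases.clone(); // defensive copy keeps the object immutable
        this.isRef = isRef;
    }

    /** Compares bases, and the ref flag too unless ignoreRefState is true. */
    boolean equals(SimpleAllele other, boolean ignoreRefState) {
        return other != null
                && (ignoreRefState || isRef == other.isRef)
                && Arrays.equals(bases, other.bases);
    }
}
```

Two SimpleAllele instances with bases AC, one ref and one non-ref, compare as equal only when ignoreRefState is true.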
      • basesMatch

        public boolean basesMatch​(byte[] test)
        Parameters:
        test - bases to test against
        Returns:
        true if this Allele contains the same bases as test, regardless of its reference status; handles Null and NO_CALL alleles
      • basesMatch

        public boolean basesMatch​(String test)
        Parameters:
        test - bases to test against
        Returns:
        true if this Allele contains the same bases as test, regardless of its reference status; handles Null and NO_CALL alleles
      • basesMatch

        public boolean basesMatch​(Allele test)
        Parameters:
        test - allele to test against
        Returns:
        true if this Allele contains the same bases as test, regardless of its reference status; handles Null and NO_CALL alleles
      • length

        public int length()
        Returns:
        the length of this allele. Null and NO_CALL alleles have 0 length.
      • getMatchingAllele

        public static Allele getMatchingAllele​(Collection<Allele> allAlleles,
                                               byte[] alleleBases)
      • oneIsPrefixOfOther

        public static boolean oneIsPrefixOfOther​(Allele a1,
                                                 Allele a2)