Class StructureName

  • All Implemented Interfaces:
    java.io.Serializable, java.lang.Comparable<StructureName>, StructureIdentifier

    public class StructureName
    extends java.lang.Object
    implements java.lang.Comparable<StructureName>, java.io.Serializable, StructureIdentifier
    A utility class that makes working with names of structures, domains and ranges easier. Accepts a wide range of identifier formats, including ScopDomain, CathDomain, PDP domains, and SubstructureIdentifier residue ranges. Where possible, data is extracted from the input string. Otherwise, range information may be loaded from one of the factory classes: CathFactory, ScopFactory, etc.
    See Also:
    the name. e.g. 4hhb, 4hhb.A, d4hhba_, PDP:4HHBAa etc., Serialized Form
    • Field Detail

      • name

        protected java.lang.String name
      • pdbId

        protected java.lang.String pdbId
      • chainName

        protected java.lang.String chainName
    • Constructor Detail

      • StructureName

        public StructureName​(java.lang.String name)
        Create a new StructureName from the given identifier, which may be a domain name, a substructure identifier, etc.

        The source and PDB ID are extracted at construction time, but fully interpreting the ID, which may require additional parsing or remote calls, is done lazily.

        The following sources are supported. Any may be prefixed by the source name followed by a colon (e.g. PDB:4HHB). In this case, that source will be used unequivocally. If no source is specified, StructureName will make a (usually reliable) guess as to which source was intended.

        • PDB PDB identifier, optionally followed by chain and/or residue ranges. Internally represented by a SubstructureIdentifier; see that class for the full format specification. Examples: 4hhb, 4hhb.A, 4hhb.A:1-50.
        • SCOP SCOP domain (or SCOPe, depending on the ScopFactory.getSCOP() version). Example: d1h6w.2
        • PDP Protein Domain Parser domain. PDP domains are not guessed, making the PDP: prefix obligatory. Example: PDP:4HHBAa
        • CATH CATH domain. Example: 1qvrC03
        • URL Arbitrary URLs. Most common protocols are handled, including http://, ftp://, and file://. Some parsing information can be passed as custom query parameters. Example: http://www.rcsb.org/pdb/files/1B8G.pdb.gz
        • FILE A file path. Supports relative paths and expands ~ to the user's home directory. Only existing files are detected automatically; to refer to a file that may not exist yet, prepend the FILE: prefix. Internally represented as a URLIdentifier after path expansion. Example: ~/custom_protein.pdb
        • ECOD ECOD domain. Example: e1lyw.1
        • BIO Biological assembly. These are not guessed, making the BIO: prefix obligatory. Example: BIO:2ehz:1
        Parameters:
        name - An identifier string
        Throws:
        java.lang.IllegalArgumentException - if the name has a recognizable source but is semantically invalid
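
The explicit-prefix rule described above can be illustrated with a minimal, standalone sketch. It does not use or reproduce BioJava's parsing code: the class SourcePrefixSketch and its methods are hypothetical, and matching only upper-case source names is an assumption made here so that URL schemes such as file:// are not mistaken for the FILE source.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical illustration only; not part of BioJava.
class SourcePrefixSketch {
    // Source names taken from the list in the constructor documentation.
    static final List<String> SOURCES =
            Arrays.asList("PDB", "SCOP", "PDP", "CATH", "URL", "FILE", "ECOD", "BIO");

    /** Returns the explicit source name, or null when no known prefix is present. */
    static String explicitSource(String name) {
        int colon = name.indexOf(':');
        if (colon <= 0) {
            return null; // no colon, or nothing before it
        }
        // Assumption: only exact, upper-case source names count as prefixes.
        String candidate = name.substring(0, colon);
        return SOURCES.contains(candidate) ? candidate : null;
    }

    /** Returns the identifier with any explicit source prefix removed. */
    static String withoutSource(String name) {
        return explicitSource(name) == null
                ? name
                : name.substring(name.indexOf(':') + 1);
    }

    public static void main(String[] args) {
        String[] examples = {
            "PDB:4HHB", "4hhb.A:1-50", "PDP:4HHBAa", "BIO:2ehz:1",
            "http://www.rcsb.org/pdb/files/1B8G.pdb.gz"
        };
        for (String id : examples) {
            System.out.println(id + " -> source=" + explicitSource(id)
                    + ", remainder=" + withoutSource(id));
        }
    }
}
```

Note that a colon alone does not imply a source: in 4hhb.A:1-50 the colon belongs to the residue range, so the sketch reports no explicit source and the identifier would be left to the guessing step.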
    • Method Detail

      • getChainId

        public java.lang.String getChainId()
        Gets the chain ID, for structures where it is unique and well-defined. May return '.' for multi-chain ranges, '_' for wildcard chains, or null if the information is unavailable.

        This method should only be used casually. For precise chainIds, it is better to use toCanonical() and iterate through the residue ranges.

        Returns:
      • getIdentifier

        public java.lang.String getIdentifier()
        Get the original form of the identifier
        Specified by:
        getIdentifier in interface StructureIdentifier
        Returns:
        The String form of this identifier
      • toString

        public java.lang.String toString()
        Overrides:
        toString in class java.lang.Object
      • isScopName

        public boolean isScopName()
      • isPDPDomain

        public boolean isPDPDomain()
      • isCathID

        public boolean isCathID()
      • isPdbId

        public boolean isPdbId()
      • isURL

        public boolean isURL()
      • isFile

        public boolean isFile()
        Indicates that the identifier was determined to correspond to a file. Note that some file identifiers may also be valid URLs; in that case, the URL source is preferred.
        Returns:
      • isEcodDomain

        public boolean isEcodDomain()
      • isBioAssembly

        public boolean isBioAssembly()
      • getBaseIdentifier

        public StructureIdentifier getBaseIdentifier()
                                              throws StructureException
        StructureName wraps another StructureIdentifier. The type of the base identifier depends on the source. Most StructureName methods delegate to the base identifier.

        It is possible that future versions of StructureName might change the return type. Except for some specialized uses, it is probably better to create the correct type of identifier directly, rather than creating a StructureName and casting the result of this method.

        Returns:
        A Str
        Throws:
        StructureException - Wraps exceptions that may be thrown by individual implementations. For example, a SCOP identifier may require that the domain definitions be available for download.
      • toCanonical

        public SubstructureIdentifier toCanonical()
                                           throws StructureException
        Description copied from interface: StructureIdentifier
        Convert to a canonical SubstructureIdentifier.

        This allows all domains to be converted to a standard format String.

        Specified by:
        toCanonical in interface StructureIdentifier
        Returns:
        A SubstructureIdentifier equivalent to this
        Throws:
        StructureException - Wraps exceptions that may be thrown by individual implementations. For example, a SCOP identifier may require that the domain definitions be available for download.
      • loadStructure

        public Structure loadStructure​(AtomCache cache)
                                throws StructureException,
                                       java.io.IOException
        Description copied from interface: StructureIdentifier
        Loads a structure encompassing the structure identified. The Structure returned should be suitable for passing as the input to StructureIdentifier.reduce(Structure). It is recommended that the most complete structure available be returned (e.g. the full PDB) to allow processing of unselected portions where appropriate.
        Specified by:
        loadStructure in interface StructureIdentifier
        Returns:
        A Structure containing at least the atoms identified by this, or null if Structures are not applicable.
        Throws:
        StructureException - For errors loading and parsing the structure
        java.io.IOException - Errors reading the structure from disk
      • hashCode

        public int hashCode()
        Overrides:
        hashCode in class java.lang.Object
      • equals

        public boolean equals​(java.lang.Object obj)
        Overrides:
        equals in class java.lang.Object
      • compareTo

        public int compareTo​(StructureName o)
        Orders identifiers lexicographically by PDB ID and then by the full identifier.
        Specified by:
        compareTo in interface java.lang.Comparable<StructureName>
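
A two-key ordering of this kind can be sketched with java.util.Comparator. The Id class below is a hypothetical stand-in that holds only the two sort keys; it is not StructureName, and it assumes both keys are non-null and that PDB IDs are already in a consistent case.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

// Hypothetical illustration only; not part of BioJava.
class OrderingSketch {
    /** Stand-in for an identifier, holding only the two sort keys. */
    static final class Id {
        final String pdbId;
        final String identifier;

        Id(String pdbId, String identifier) {
            this.pdbId = pdbId;
            this.identifier = identifier;
        }
    }

    // Primary key: PDB ID. Tie-breaker: the full identifier string.
    static final Comparator<Id> BY_PDB_THEN_IDENTIFIER =
            Comparator.comparing((Id id) -> id.pdbId)
                      .thenComparing((Id id) -> id.identifier);

    /** Returns the identifiers of a sorted copy of the input list. */
    static List<String> sortedIdentifiers(List<Id> ids) {
        List<Id> copy = new ArrayList<>(ids);
        copy.sort(BY_PDB_THEN_IDENTIFIER);
        List<String> result = new ArrayList<>();
        for (Id id : copy) {
            result.add(id.identifier);
        }
        return result;
    }

    public static void main(String[] args) {
        List<Id> ids = Arrays.asList(
                new Id("4HHB", "4hhb.B"),
                new Id("1QVR", "1qvrC03"),
                new Id("4HHB", "4hhb.A"));
        System.out.println(sortedIdentifiers(ids));
    }
}
```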
      • guessScopDomain

        public static ScopDomain guessScopDomain​(java.lang.String name,
                                                 ScopDatabase scopDB)

        Guess a SCOP domain. If an exact match is found, return that.

        Otherwise, return the first SCOP domain found for the specified protein such that

        • The chains match, or one of the chains is '_' or '.'.
        • The domains match, or one of the domains is '_'.
        In some cases there may be several valid matches. In this case a warning will be logged.
        Parameters:
        name - SCOP domain name, or a guess thereof
        scopDB - SCOP domain provider
        Returns:
        The best match for name among the domains of scopDB, or null if none match.
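
The matching rule above can be sketched on plain strings. This is a hedged illustration, not the BioJava implementation: it assumes seven-character SCOP identifiers of the form d + four-character PDB ID + chain character + domain character (for example d4hhba_), and it works on a plain list of candidate names rather than a ScopDatabase. The class ScopGuessSketch and its methods are hypothetical.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Locale;

// Hypothetical illustration only; not part of BioJava.
class ScopGuessSketch {
    /** True if the chain character acts as a wildcard ('_' or '.'). */
    static boolean isChainWildcard(char c) {
        return c == '_' || c == '.';
    }

    /** Applies the two bullet-point rules to a pair of 7-character names. */
    static boolean looselyMatches(String query, String candidate) {
        if (query.length() != 7 || candidate.length() != 7) {
            return false; // assumption: only the classic 7-character layout
        }
        String q = query.toLowerCase(Locale.ROOT);
        String c = candidate.toLowerCase(Locale.ROOT);
        if (!q.substring(1, 5).equals(c.substring(1, 5))) {
            return false; // different protein (PDB ID)
        }
        char qChain = q.charAt(5), cChain = c.charAt(5);
        boolean chainOk = qChain == cChain
                || isChainWildcard(qChain) || isChainWildcard(cChain);
        char qDomain = q.charAt(6), cDomain = c.charAt(6);
        boolean domainOk = qDomain == cDomain || qDomain == '_' || cDomain == '_';
        return chainOk && domainOk;
    }

    /** Exact match first, then the first loose match, else null. */
    static String guess(String query, List<String> candidates) {
        for (String candidate : candidates) {
            if (candidate.equalsIgnoreCase(query)) {
                return candidate;
            }
        }
        for (String candidate : candidates) {
            if (looselyMatches(query, candidate)) {
                return candidate;
            }
        }
        return null;
    }

    public static void main(String[] args) {
        List<String> domains = Arrays.asList("d4hhbb_", "d4hhba_");
        System.out.println(guess("d4hhba1", domains));
        System.out.println(guess("d4hhb._", domains));
        System.out.println(guess("d1qvrc_", domains));
    }
}
```

Because the sketch returns the first loose match, a query with a wildcard chain can match several candidates; this is the situation in which the documented method logs a warning.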