Class CollectSequencingArtifactMetrics


  • @DocumentedFeature
    public class CollectSequencingArtifactMetrics
    extends SinglePassSamProgram
    Quantifies substitution errors caused by mismatched base pairings during various stages of sample and library preparation. Two distinct error types are measured: artifacts introduced before the addition of the read1/read2 adapters ("pre adapter") and artifacts introduced after target selection ("bait bias"). For each type, the tool provides summary metrics as well as detail metrics broken down by reference context (the reference bases surrounding the substitution event). For a deeper explanation, see Costello et al. 2013: http://www.ncbi.nlm.nih.gov/pubmed/23303777
    • Field Detail

      • INTERVALS

        @Argument(doc="An optional list of intervals to restrict analysis to.",
                  optional=true)
        public File INTERVALS
      • DB_SNP

        @Argument(doc="VCF format dbSNP file, used to exclude regions around known polymorphisms from analysis.",
                  optional=true)
        public File DB_SNP
      • MINIMUM_QUALITY_SCORE

        @Argument(shortName="Q",
                  doc="The minimum base quality score for a base to be included in analysis.")
        public int MINIMUM_QUALITY_SCORE
      • MINIMUM_MAPPING_QUALITY

        @Argument(shortName="MQ",
                  doc="The minimum mapping quality score for a base to be included in analysis.")
        public int MINIMUM_MAPPING_QUALITY
      • MINIMUM_INSERT_SIZE

        @Argument(shortName="MIN_INS",
                  doc="The minimum insert size for a read to be included in analysis.")
        public int MINIMUM_INSERT_SIZE
      • MAXIMUM_INSERT_SIZE

        @Argument(shortName="MAX_INS",
                  doc="The maximum insert size for a read to be included in analysis. Set to 0 to have no maximum.")
        public int MAXIMUM_INSERT_SIZE
      • INCLUDE_UNPAIRED

        @Argument(shortName="UNPAIRED",
                  doc="Include unpaired reads. If set to true, all paired reads are included as well, and MINIMUM_INSERT_SIZE and MAXIMUM_INSERT_SIZE are ignored.")
        public boolean INCLUDE_UNPAIRED
      • INCLUDE_DUPLICATES

        @Argument(shortName="DUPES",
                  doc="Include duplicate reads. If set to true then all reads flagged as duplicates will be included as well.")
        public boolean INCLUDE_DUPLICATES
      • INCLUDE_NON_PF_READS

        @Argument(shortName="NON_PF",
                  doc="Whether or not to include non-PF reads.")
        public boolean INCLUDE_NON_PF_READS
      • TANDEM_READS

        @Argument(shortName="TANDEM",
                  doc="Set to true if mate pairs are being sequenced from the same strand, i.e. they're expected to face the same direction.")
        public boolean TANDEM_READS
      • USE_OQ

        @Argument(doc="When available, use original quality scores for filtering.")
        public boolean USE_OQ
      • CONTEXT_SIZE

        @Argument(doc="The number of context bases to include on each side of the assayed base.")
        public int CONTEXT_SIZE
      • CONTEXTS_TO_PRINT

        @Argument(doc="If specified, only print results for these contexts in the detail metrics output. However, the summary metrics output will still take all contexts into consideration.",
                  optional=true)
        public Set<String> CONTEXTS_TO_PRINT
      • FILE_EXTENSION

        @Argument(shortName="EXT",
                  doc="Append the given file extension to all metric file names (e.g. OUTPUT.pre_adapter_summary_metrics.EXT). If null, no extension is appended.",
                  optional=true)
        public String FILE_EXTENSION
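
        The read-level arguments above combine into a single inclusion decision per read. The following is a minimal sketch of one reading of the documented semantics, not Picard's actual implementation. The class ReadFilterSketch, its Settings holder, and the method names are hypothetical; the sketch also assumes insert sizes are compared by absolute value, since inferred insert sizes in SAM records can be negative.

        ```java
        public final class ReadFilterSketch {

            /** Hypothetical holder mirroring a subset of the documented arguments. */
            public static final class Settings {
                public int minimumMappingQuality;
                public int minimumInsertSize;
                public int maximumInsertSize;      // 0 means "no maximum"
                public boolean includeUnpaired;
                public boolean includeDuplicates;
                public boolean includeNonPfReads;
            }

            /** True if the insert size lies in [min, max]; max == 0 disables the upper bound. */
            public static boolean insertSizeOk(int insertSize, int min, int max) {
                int abs = Math.abs(insertSize);    // assumption: compare by absolute value
                return abs >= min && (max == 0 || abs <= max);
            }

            /** Decides whether a read should contribute to the analysis. */
            public static boolean acceptRead(Settings s,
                                             boolean paired,
                                             boolean duplicate,
                                             boolean failsVendorQc,
                                             int mappingQuality,
                                             int insertSize) {
                if (duplicate && !s.includeDuplicates) return false;
                if (failsVendorQc && !s.includeNonPfReads) return false;
                if (mappingQuality < s.minimumMappingQuality) return false;
                // With INCLUDE_UNPAIRED set, the insert-size limits are ignored entirely.
                if (s.includeUnpaired) return true;
                return paired && insertSizeOk(insertSize, s.minimumInsertSize, s.maximumInsertSize);
            }
        }
        ```

        MINIMUM_QUALITY_SCORE is applied per base rather than per read, so it is left out of this read-level check.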
    • Constructor Detail

      • CollectSequencingArtifactMetrics

        public CollectSequencingArtifactMetrics()
    • Method Detail

      • customCommandLineValidation

        protected String[] customCommandLineValidation()
        Description copied from class: CommandLineProgram
        Put any custom command-line validation in an override of this method. clp is initialized at this point and can be used to print usage and access argv. Any options set by the command-line parser can be validated.
        Overrides:
        customCommandLineValidation in class CommandLineProgram
        Returns:
        null if the command line is valid. If the command line is invalid, returns an array of error messages to be written to the appropriate place.
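
        The return contract (null when valid, otherwise an array of messages) can be sketched as follows. This is an illustration only: the class ValidationSketch, the validate method, and the two checks shown are hypothetical and are not taken from Picard's source.

        ```java
        import java.util.ArrayList;
        import java.util.List;

        public final class ValidationSketch {

            /**
             * Collects every problem found and returns them together,
             * or null when nothing is wrong (mirroring the documented contract).
             */
            public static String[] validate(int minimumInsertSize,
                                            int maximumInsertSize,
                                            int contextSize) {
                List<String> errors = new ArrayList<>();

                // Hypothetical check: a negative context window makes no sense.
                if (contextSize < 0) {
                    errors.add("CONTEXT_SIZE cannot be negative");
                }
                // Hypothetical check: a maximum of 0 means "no maximum", so only compare otherwise.
                if (maximumInsertSize != 0 && minimumInsertSize > maximumInsertSize) {
                    errors.add("MINIMUM_INSERT_SIZE cannot exceed MAXIMUM_INSERT_SIZE");
                }

                return errors.isEmpty() ? null : errors.toArray(new String[0]);
            }
        }
        ```

        Accumulating messages in a list before returning lets the caller see every invalid option in one run instead of only the first.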
      • setup

        protected void setup(htsjdk.samtools.SAMFileHeader header,
                             File samFile)
        Description copied from class: SinglePassSamProgram
        Should be implemented by subclasses to do one-time initialization work.
        Specified by:
        setup in class SinglePassSamProgram
      • acceptRead

        protected void acceptRead(htsjdk.samtools.SAMRecord rec,
                                  htsjdk.samtools.reference.ReferenceSequence ref)
        Description copied from class: SinglePassSamProgram
        Should be implemented by subclasses to accept SAMRecords one at a time. If the read has a reference sequence and a reference sequence file was supplied to the program it will be passed as 'ref'. Otherwise 'ref' may be null.
        Specified by:
        acceptRead in class SinglePassSamProgram
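
        Because the detail metrics are keyed by reference context, per-read processing needs the reference bases around each assayed position. The sketch below shows one way a context of CONTEXT_SIZE bases on each side could be extracted. It is an assumption-laden illustration, not Picard's code: ContextSketch and contextAt are hypothetical names, the reference is modeled as a plain String with 0-based positions, and returning null for a missing reference or a window that runs off either end is a design choice made here.

        ```java
        public final class ContextSketch {

            /**
             * Returns the (2 * contextSize + 1)-base window centered on position,
             * or null if the reference is absent or the window does not fit.
             */
            public static String contextAt(String refBases, int position, int contextSize) {
                if (refBases == null) {
                    return null;                     // 'ref' may be null, as documented above
                }
                int start = position - contextSize;  // first base of the window (inclusive)
                int end = position + contextSize;    // last base of the window (inclusive)
                if (start < 0 || end >= refBases.length()) {
                    return null;                     // window runs off the reference
                }
                return refBases.substring(start, end + 1);
            }
        }
        ```

        For example, contextAt("ACGTACGT", 3, 1) returns "GTA": the assayed base T at index 3 with one base on each side.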