Class EdgeNGramTokenizer

  • All Implemented Interfaces:
    Closeable, AutoCloseable

    public final class EdgeNGramTokenizer
    extends org.apache.lucene.analysis.Tokenizer
    Tokenizes the input from an edge into n-grams of given size(s).

    This Tokenizer creates n-grams from the beginning edge or the ending edge of an input token. maxGram can't be larger than 1024 because of an implementation limitation.
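As a rough illustration of the behavior described above, the plain-Java sketch below computes the grams for one input and has no Lucene dependency. It assumes three things: the whole input is treated as a single token, only its first 1024 characters are considered, and surrounding whitespace is trimmed. The class EdgeGrams, its method grams and its Side enum are hypothetical stand-ins, not the library's source.

```java
import java.util.ArrayList;
import java.util.List;

public class EdgeGrams {

    /** Hypothetical stand-in for EdgeNGramTokenizer.Side. */
    enum Side { FRONT, BACK }

    /** Assumed cap on how much input is examined (the 1024 limit mentioned above). */
    static final int MAX_CHARS = 1024;

    /**
     * Returns the edge n-grams of the (truncated, trimmed) input, shortest first.
     * Sizes run from minGram up to maxGram, stopping early if the input is shorter.
     */
    static List<String> grams(String input, Side side, int minGram, int maxGram) {
        // Assumption: only the first MAX_CHARS characters are kept, then whitespace is trimmed.
        String text = input.substring(0, Math.min(input.length(), MAX_CHARS)).trim();
        int len = text.length();
        List<String> out = new ArrayList<>();
        for (int size = minGram; size <= maxGram && size <= len; size++) {
            // FRONT grams start at index 0; BACK grams end at the last character.
            int start = (side == Side.FRONT) ? 0 : len - size;
            out.add(text.substring(start, start + size));
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(grams("abcde", Side.FRONT, 1, 3)); // [a, ab, abc]
        System.out.println(grams("abcde", Side.BACK, 1, 3));  // [e, de, cde]
    }
}
```

Front grams grow from the first character and back grams from the last one. Sizes larger than the input are simply not produced.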

    • Nested Class Summary

      Nested Classes 
      Modifier and Type Class Description
      static class  EdgeNGramTokenizer.Side
      Specifies which side of the input the n-gram should be generated from
      • Nested classes/interfaces inherited from class org.apache.lucene.util.AttributeSource

        org.apache.lucene.util.AttributeSource.AttributeFactory, org.apache.lucene.util.AttributeSource.State
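The constructors that take a String sideLabel identify the side by name. A hypothetical mirror of the Side enum is sketched below. The labels "front" and "back" are assumptions, as is the choice to return null for an unrecognized label. This page states neither.

```java
public class SideDemo {

    /** Hypothetical mirror of EdgeNGramTokenizer.Side; the labels are assumed. */
    enum Side {
        FRONT("front"),
        BACK("back");

        private final String label;

        Side(String label) {
            this.label = label;
        }

        /** Returns the name used by the String-based constructors (assumed). */
        String getLabel() {
            return label;
        }

        /** Maps a label to its Side, or returns null for an unrecognized label. */
        static Side getSide(String label) {
            for (Side s : values()) {
                if (s.label.equals(label)) {
                    return s;
                }
            }
            return null;
        }
    }

    public static void main(String[] args) {
        System.out.println(Side.getSide("back"));   // BACK
        System.out.println(Side.getSide("middle")); // null
    }
}
```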
    • Constructor Summary

      Constructors 
      Constructor Description
      EdgeNGramTokenizer(Reader input, String sideLabel, int minGram, int maxGram)
      Creates an EdgeNGramTokenizer that generates n-grams with sizes in the given range
      EdgeNGramTokenizer(Reader input, EdgeNGramTokenizer.Side side, int minGram, int maxGram)
      Creates an EdgeNGramTokenizer that generates n-grams with sizes in the given range
      EdgeNGramTokenizer(org.apache.lucene.util.AttributeSource.AttributeFactory factory, Reader input, String sideLabel, int minGram, int maxGram)
      Creates an EdgeNGramTokenizer that generates n-grams with sizes in the given range
      EdgeNGramTokenizer(org.apache.lucene.util.AttributeSource.AttributeFactory factory, Reader input, EdgeNGramTokenizer.Side side, int minGram, int maxGram)
      Creates an EdgeNGramTokenizer that generates n-grams with sizes in the given range
      EdgeNGramTokenizer(org.apache.lucene.util.AttributeSource source, Reader input, String sideLabel, int minGram, int maxGram)
      Creates an EdgeNGramTokenizer that generates n-grams with sizes in the given range
      EdgeNGramTokenizer(org.apache.lucene.util.AttributeSource source, Reader input, EdgeNGramTokenizer.Side side, int minGram, int maxGram)
      Creates an EdgeNGramTokenizer that generates n-grams with sizes in the given range
    • Method Summary

      All Methods Instance Methods Concrete Methods 
      Modifier and Type Method Description
      void end()  
      boolean incrementToken()
      Advances the stream to the next token; returns false at the end of the stream (EOS).
      void reset()  
      • Methods inherited from class org.apache.lucene.analysis.Tokenizer

        close, correctOffset, reset
      • Methods inherited from class org.apache.lucene.util.AttributeSource

        addAttribute, addAttributeImpl, captureState, clearAttributes, cloneAttributes, copyTo, equals, getAttribute, getAttributeClassesIterator, getAttributeFactory, getAttributeImplsIterator, hasAttribute, hasAttributes, hashCode, reflectAsString, reflectWith, restoreState, toString
    • Constructor Detail

      • EdgeNGramTokenizer

        public EdgeNGramTokenizer(Reader input,
                                  EdgeNGramTokenizer.Side side,
                                  int minGram,
                                  int maxGram)
        Creates an EdgeNGramTokenizer that generates n-grams with sizes in the given range
        Parameters:
        input - Reader holding the input to be tokenized
        side - the EdgeNGramTokenizer.Side from which to chop off an n-gram
        minGram - the smallest n-gram to generate
        maxGram - the largest n-gram to generate
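Read together, the parameter descriptions above imply some constraints on the arguments. minGram should be at least 1 and must not exceed maxGram, and a side must be given. The sketch below shows such checks in plain Java. The helper names checkArgs and isValid, the use of IllegalArgumentException, and the messages are assumptions, not behavior documented on this page.

```java
public class GramArgs {

    /** Hypothetical stand-in for EdgeNGramTokenizer.Side. */
    enum Side { FRONT, BACK }

    /** Hypothetical check: throws IllegalArgumentException if the arguments cannot form a valid range. */
    static void checkArgs(Side side, int minGram, int maxGram) {
        if (side == null) {
            throw new IllegalArgumentException("side must be non-null");
        }
        if (minGram < 1) {
            throw new IllegalArgumentException("minGram must be greater than zero");
        }
        if (minGram > maxGram) {
            throw new IllegalArgumentException("minGram must not be greater than maxGram");
        }
    }

    /** Returns true if checkArgs accepts the arguments. */
    static boolean isValid(Side side, int minGram, int maxGram) {
        try {
            checkArgs(side, minGram, maxGram);
            return true;
        } catch (IllegalArgumentException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println(isValid(Side.FRONT, 1, 3)); // true
        System.out.println(isValid(Side.FRONT, 0, 3)); // false
        System.out.println(isValid(Side.BACK, 4, 2));  // false
    }
}
```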
      • EdgeNGramTokenizer

        public EdgeNGramTokenizer(org.apache.lucene.util.AttributeSource source,
                                  Reader input,
                                  EdgeNGramTokenizer.Side side,
                                  int minGram,
                                  int maxGram)
        Creates an EdgeNGramTokenizer that generates n-grams with sizes in the given range
        Parameters:
        source - AttributeSource to use
        input - Reader holding the input to be tokenized
        side - the EdgeNGramTokenizer.Side from which to chop off an n-gram
        minGram - the smallest n-gram to generate
        maxGram - the largest n-gram to generate
      • EdgeNGramTokenizer

        public EdgeNGramTokenizer(org.apache.lucene.util.AttributeSource.AttributeFactory factory,
                                  Reader input,
                                  EdgeNGramTokenizer.Side side,
                                  int minGram,
                                  int maxGram)
        Creates an EdgeNGramTokenizer that generates n-grams with sizes in the given range
        Parameters:
        factory - AttributeSource.AttributeFactory to use
        input - Reader holding the input to be tokenized
        side - the EdgeNGramTokenizer.Side from which to chop off an n-gram
        minGram - the smallest n-gram to generate
        maxGram - the largest n-gram to generate
      • EdgeNGramTokenizer

        public EdgeNGramTokenizer(Reader input,
                                  String sideLabel,
                                  int minGram,
                                  int maxGram)
        Creates an EdgeNGramTokenizer that generates n-grams with sizes in the given range
        Parameters:
        input - Reader holding the input to be tokenized
        sideLabel - the name of the EdgeNGramTokenizer.Side from which to chop off an n-gram
        minGram - the smallest n-gram to generate
        maxGram - the largest n-gram to generate
      • EdgeNGramTokenizer

        public EdgeNGramTokenizer(org.apache.lucene.util.AttributeSource source,
                                  Reader input,
                                  String sideLabel,
                                  int minGram,
                                  int maxGram)
        Creates an EdgeNGramTokenizer that generates n-grams with sizes in the given range
        Parameters:
        source - AttributeSource to use
        input - Reader holding the input to be tokenized
        sideLabel - the name of the EdgeNGramTokenizer.Side from which to chop off an n-gram
        minGram - the smallest n-gram to generate
        maxGram - the largest n-gram to generate
      • EdgeNGramTokenizer

        public EdgeNGramTokenizer(org.apache.lucene.util.AttributeSource.AttributeFactory factory,
                                  Reader input,
                                  String sideLabel,
                                  int minGram,
                                  int maxGram)
        Creates an EdgeNGramTokenizer that generates n-grams with sizes in the given range
        Parameters:
        factory - AttributeSource.AttributeFactory to use
        input - Reader holding the input to be tokenized
        sideLabel - the name of the EdgeNGramTokenizer.Side from which to chop off an n-gram
        minGram - the smallest n-gram to generate
        maxGram - the largest n-gram to generate
    • Method Detail

      • incrementToken

        public boolean incrementToken()
                               throws IOException
        Advances the stream to the next token; returns false at the end of the stream (EOS).
        Specified by:
        incrementToken in class org.apache.lucene.analysis.TokenStream
        Throws:
        IOException
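incrementToken() returns a boolean. It returns true when a new token has been made current and false once the stream is exhausted. The Lucene-free model below shows the usual consumer loop: call reset(), then call incrementToken() until it returns false. The class EdgeGramStream and its accessors are hypothetical stand-ins for the tokenizer and its term and offset attributes. Only front-edge grams are modeled, and the offsets assume untrimmed input.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical, Lucene-free model of the incrementToken() contract for front-edge grams.
 * Each call either makes a new token current and returns true, or returns false at end of stream.
 */
public class EdgeGramStream {
    private final String text;
    private final int minGram;
    private final int maxGram;
    private int gramSize;
    private boolean started;

    // Stand-ins for the tokenizer's attributes (term text and character offsets).
    private String term;
    private int startOffset;
    private int endOffset;

    EdgeGramStream(String input, int minGram, int maxGram) {
        this.text = input;
        this.minGram = minGram;
        this.maxGram = maxGram;
    }

    /** Returns the stream to its initial state; call before the first incrementToken(). */
    void reset() {
        started = false;
    }

    /** Makes the next front-edge gram current; returns false once no gram is left. */
    boolean incrementToken() {
        if (!started) {
            started = true;
            gramSize = minGram;
        }
        if (gramSize > maxGram || gramSize > text.length()) {
            return false;
        }
        startOffset = 0;      // front grams always start at the first character
        endOffset = gramSize; // end offset is exclusive
        term = text.substring(startOffset, endOffset);
        gramSize++;
        return true;
    }

    String term() { return term; }
    int startOffset() { return startOffset; }
    int endOffset() { return endOffset; }

    /** Drains a stream with the usual consumer loop and formats each token as term[start,end). */
    static List<String> describe(String input, int minGram, int maxGram) {
        EdgeGramStream ts = new EdgeGramStream(input, minGram, maxGram);
        List<String> out = new ArrayList<>();
        ts.reset();
        while (ts.incrementToken()) {
            out.add(ts.term() + "[" + ts.startOffset() + "," + ts.endOffset() + ")");
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(describe("abcdef", 2, 4)); // [ab[0,2), abc[0,3), abcd[0,4)]
    }
}
```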
      • end

        public void end()
        Overrides:
        end in class org.apache.lucene.analysis.TokenStream
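In Lucene, end() is where a TokenStream performs end-of-stream work such as setting the final offset. The sketch below assumes that this tokenizer keeps only the first 1024 characters for n-gram generation but keeps counting the characters it reads beyond them. Under that assumption, the final offset is the full input length rather than the buffer size. The class CappedRead, its Result type and the readCapped helpers are hypothetical and are not the library's code.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;

public class CappedRead {

    /** Hypothetical result: the retained prefix plus the total number of chars consumed. */
    static final class Result {
        final String kept;
        final int totalRead;

        Result(String kept, int totalRead) {
            this.kept = kept;
            this.totalRead = totalRead;
        }
    }

    /** Keeps at most cap chars from the reader but counts (and discards) everything after them. */
    static Result readCapped(Reader in, int cap) throws IOException {
        char[] buf = new char[cap];
        int kept = 0;
        // Fill the buffer; read() may return fewer chars than requested.
        while (kept < cap) {
            int n = in.read(buf, kept, cap - kept);
            if (n == -1) {
                return new Result(new String(buf, 0, kept), kept);
            }
            kept += n;
        }
        // Buffer full: drain the rest so the total (the assumed final offset) stays accurate.
        int total = kept;
        char[] scratch = new char[1024];
        int n;
        while ((n = in.read(scratch, 0, scratch.length)) != -1) {
            total += n;
        }
        return new Result(new String(buf, 0, kept), total);
    }

    /** Convenience overload for in-memory strings. */
    static Result readCapped(String s, int cap) {
        try {
            return readCapped(new StringReader(s), cap);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        Result r = readCapped("x".repeat(1500), 1024);
        System.out.println(r.kept.length() + " kept, final offset " + r.totalRead); // 1024 kept, final offset 1500
    }
}
```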
      • reset

        public void reset()
                   throws IOException
        Overrides:
        reset in class org.apache.lucene.analysis.TokenStream
        Throws:
        IOException
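reset() returns the tokenizer to its initial state so that a consumer can run the reset(), incrementToken(), end() cycle again. The sketch below models that reuse. It assumes that reset() only clears per-input state, such as a "started" flag, and that new input is supplied separately. The real tokenizer reads from a Reader, so replaying it would need a fresh Reader. Here the hypothetical method setInput plays that role, and the class ReusableEdgeGrams is likewise hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical model of reusing one front-edge gram stream across several inputs. */
public class ReusableEdgeGrams {
    private final int minGram;
    private final int maxGram;
    private String text = "";
    private int gramSize;
    private boolean started;
    private String term;

    ReusableEdgeGrams(int minGram, int maxGram) {
        this.minGram = minGram;
        this.maxGram = maxGram;
    }

    /** Hypothetical: supplies new input, as a fresh Reader would for the real tokenizer. */
    void setInput(String input) {
        this.text = input;
    }

    /** Clears per-input state so the next incrementToken() starts again at minGram. */
    void reset() {
        started = false;
    }

    /** Makes the next front-edge gram current; returns false once no gram is left. */
    boolean incrementToken() {
        if (!started) {
            started = true;
            gramSize = minGram;
        }
        if (gramSize > maxGram || gramSize > text.length()) {
            return false;
        }
        term = text.substring(0, gramSize++); // uses the current size, then advances it
        return true;
    }

    /** Runs one full consumer cycle over the given input and collects the grams. */
    List<String> run(String input) {
        setInput(input);
        reset();
        List<String> out = new ArrayList<>();
        while (incrementToken()) {
            out.add(term);
        }
        return out;
    }

    public static void main(String[] args) {
        ReusableEdgeGrams grams = new ReusableEdgeGrams(1, 2);
        System.out.println(grams.run("cat")); // [c, ca]
        System.out.println(grams.run("dog")); // [d, do]
    }
}
```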