Class CJKAnalyzer

  • All Implemented Interfaces:
    Closeable, AutoCloseable

    public final class CJKAnalyzer
    extends org.apache.lucene.analysis.StopwordAnalyzerBase
    An Analyzer that tokenizes text with StandardTokenizer, normalizes character width with CJKWidthFilter, folds case with LowerCaseFilter, forms bigrams of CJK characters with CJKBigramFilter, and filters stopwords with StopFilter.
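The chain described above can be illustrated without Lucene. The following is a minimal plain-Java sketch, not Lucene's implementation: the class name CjkChainSketch and its methods are hypothetical, NFKC normalization stands in for CJKWidthFilter (it folds more than width variants), and the tokenization rule is a simplification of StandardTokenizer. Emitting a lone CJK character as a single-character token is also an assumption of this sketch.

```java
import java.text.Normalizer;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Set;

/** Plain-Java illustration of the analysis chain; hypothetical, not Lucene code. */
public class CjkChainSketch {

    /** True for Han, Hiragana, Katakana and Hangul code points. */
    public static boolean isCjk(int codePoint) {
        Character.UnicodeScript script = Character.UnicodeScript.of(codePoint);
        return script == Character.UnicodeScript.HAN
            || script == Character.UnicodeScript.HIRAGANA
            || script == Character.UnicodeScript.KATAKANA
            || script == Character.UnicodeScript.HANGUL;
    }

    /** Normalizes, lowercases, tokenizes, bigrams CJK runs, then drops stopwords. */
    public static List<String> analyze(String text, Set<String> stopwords) {
        // Assumption: NFKC approximates width folding; lowercasing follows.
        String s = Normalizer.normalize(text, Normalizer.Form.NFKC)
                             .toLowerCase(Locale.ROOT);
        List<String> tokens = new ArrayList<>();
        int i = 0;
        while (i < s.length()) {
            int cp = s.codePointAt(i);
            if (isCjk(cp)) {
                // Collect a run of adjacent CJK characters.
                List<String> run = new ArrayList<>();
                while (i < s.length() && isCjk(s.codePointAt(i))) {
                    int c = s.codePointAt(i);
                    run.add(new String(Character.toChars(c)));
                    i += Character.charCount(c);
                }
                if (run.size() == 1) {
                    tokens.add(run.get(0)); // lone character stays a unigram
                }
                // Overlapping bigrams: c1c2, c2c3, ...
                for (int k = 0; k + 1 < run.size(); k++) {
                    tokens.add(run.get(k) + run.get(k + 1));
                }
            } else if (Character.isLetterOrDigit(cp)) {
                // Non-CJK word: a run of letters or digits.
                int start = i;
                while (i < s.length()) {
                    int c = s.codePointAt(i);
                    if (!Character.isLetterOrDigit(c) || isCjk(c)) {
                        break;
                    }
                    i += Character.charCount(c);
                }
                tokens.add(s.substring(start, i));
            } else {
                i += Character.charCount(cp); // skip whitespace and punctuation
            }
        }
        tokens.removeIf(stopwords::contains);
        return tokens;
    }
}
```

Overlapping bigrams let a query match any two-character sequence in unsegmented CJK text without a dictionary, at the cost of a larger index than word-based segmentation would produce.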
    • Nested Class Summary

      • Nested classes/interfaces inherited from class org.apache.lucene.analysis.ReusableAnalyzerBase

        org.apache.lucene.analysis.ReusableAnalyzerBase.TokenStreamComponents
    • Field Summary

      Fields 
      Modifier and Type Field Description
      static String[] STOP_WORDS
      Deprecated.
      • Fields inherited from class org.apache.lucene.analysis.StopwordAnalyzerBase

        matchVersion, stopwords
    • Constructor Summary

      Constructors 
      Constructor Description
      CJKAnalyzer(org.apache.lucene.util.Version matchVersion)
      Builds an analyzer which removes words in getDefaultStopSet().
      CJKAnalyzer(org.apache.lucene.util.Version matchVersion, String... stopWords)
      Deprecated.
      CJKAnalyzer(org.apache.lucene.util.Version matchVersion, Set<?> stopwords)
      Builds an analyzer with the given stop words.
    • Method Summary

      Methods 
      Modifier and Type Method Description
      protected org.apache.lucene.analysis.ReusableAnalyzerBase.TokenStreamComponents createComponents(String fieldName, Reader reader)
      static Set<?> getDefaultStopSet()
      Returns an unmodifiable instance of the default stop-words set.
      • Methods inherited from class org.apache.lucene.analysis.StopwordAnalyzerBase

        getStopwordSet, loadStopwordSet, loadStopwordSet, loadStopwordSet
      • Methods inherited from class org.apache.lucene.analysis.ReusableAnalyzerBase

        initReader, reusableTokenStream, tokenStream
      • Methods inherited from class org.apache.lucene.analysis.Analyzer

        close, getOffsetGap, getPositionIncrementGap, getPreviousTokenStream, setPreviousTokenStream
    • Field Detail

      • STOP_WORDS

        @Deprecated
        public static final String[] STOP_WORDS
        Deprecated.
        An array containing common English words that are not usually useful for searching, plus some double-byte punctuation marks.
    • Constructor Detail

      • CJKAnalyzer

        public CJKAnalyzer(org.apache.lucene.util.Version matchVersion)
        Builds an analyzer which removes words in getDefaultStopSet().
      • CJKAnalyzer

        public CJKAnalyzer(org.apache.lucene.util.Version matchVersion,
                           Set<?> stopwords)
        Builds an analyzer with the given stop words.
        Parameters:
        matchVersion - Lucene compatibility version
        stopwords - a stopword set
      • CJKAnalyzer

        @Deprecated
        public CJKAnalyzer(org.apache.lucene.util.Version matchVersion,
                           String... stopWords)
        Deprecated.
        Builds an analyzer which removes words in the provided array.
        Parameters:
        matchVersion - Lucene compatibility version
        stopWords - stop word array
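This page does not say what replaces the deprecated varargs constructor. Assuming callers move to the Set-based constructor listed above, a stop-word array first has to become a set. The sketch below shows one way to do that in plain Java; the class StopWordMigrationSketch and the method toStopSet are hypothetical names, not part of Lucene.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.LinkedHashSet;
import java.util.Set;

/** Hypothetical helper for moving from a stop-word array to a stop-word set. */
public class StopWordMigrationSketch {

    /** Converts a legacy stop-word array into an unmodifiable set. */
    public static Set<String> toStopSet(String... stopWords) {
        // LinkedHashSet drops duplicates while keeping the original order.
        return Collections.unmodifiableSet(
            new LinkedHashSet<>(Arrays.asList(stopWords)));
    }
}
```

The resulting set could then be passed wherever a Set<?> of stop words is expected.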
    • Method Detail

      • getDefaultStopSet

        public static Set<?> getDefaultStopSet()
        Returns an unmodifiable instance of the default stop-words set.
        Returns:
        an unmodifiable instance of the default stop-words set.
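Because the returned set is unmodifiable, extending the defaults means copying them first. The sketch below shows that copy-then-extend pattern in plain Java. DefaultStopSetSketch, defaultStopSet() and the three sample words are hypothetical stand-ins, not the real Lucene set or its contents.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.HashSet;
import java.util.Set;

/** Hypothetical stand-in showing how to extend an unmodifiable stop set. */
public class DefaultStopSetSketch {

    // Assumption: sample contents only; the real default set is not shown here.
    private static final Set<String> DEFAULT =
        Collections.unmodifiableSet(new HashSet<>(Arrays.asList("a", "and", "the")));

    /** Plays the role of getDefaultStopSet() in this sketch. */
    public static Set<?> defaultStopSet() {
        return DEFAULT;
    }

    /** Copies the base set into a new mutable set and adds the extra words. */
    public static Set<Object> extend(Set<?> base, String... extra) {
        Set<Object> copy = new HashSet<>(base);
        copy.addAll(Arrays.asList(extra));
        return copy;
    }

    /** Returns true if the set rejects modification attempts. */
    public static boolean isReadOnly(Set<?> set) {
        try {
            // Removing a fresh object has no effect on a mutable set.
            set.remove(new Object());
            return false;
        } catch (UnsupportedOperationException e) {
            return true;
        }
    }
}
```

The copy leaves the shared default set untouched, so other analyzers that rely on it are unaffected.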
      • createComponents

        protected org.apache.lucene.analysis.ReusableAnalyzerBase.TokenStreamComponents createComponents(String fieldName,
                                                                                                         Reader reader)
        Specified by:
        createComponents in class org.apache.lucene.analysis.ReusableAnalyzerBase