Class DutchAnalyzer

  • All Implemented Interfaces:
    Closeable, AutoCloseable

    public final class DutchAnalyzer
    extends org.apache.lucene.analysis.ReusableAnalyzerBase
    Analyzer for the Dutch language.

    Supports an external list of stopwords (words that will not be indexed at all), an external list of exclusions (words that will be indexed but not stemmed), and an external list of word-stem pairs that overrule the algorithm (dictionary stemming). A default set of stopwords is used unless an alternative list is specified; the exclusion list is empty by default.
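The interaction of the three word lists can be illustrated with a small, Lucene-free sketch. Everything in it is hypothetical: `WordListSketch`, `analyze`, and the toy suffix-stripping `toyStem` are stand-ins invented for illustration, not part of the Lucene API, and the real analyzer uses a Snowball stemmer rather than this placeholder. The precedence shown (exclusions checked before the override dictionary) is an assumption based on the filter order listed for createComponents.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;

/** Illustrative only: mimics how the three word lists interact; it does not use Lucene. */
final class WordListSketch {

    /** Hypothetical stand-in for the algorithmic stemmer: strips a trailing "en". */
    static String toyStem(String word) {
        return word.endsWith("en") && word.length() > 4
                ? word.substring(0, word.length() - 2)
                : word;
    }

    static List<String> analyze(List<String> tokens,
                                Set<String> stopwords,
                                Set<String> exclusions,
                                Map<String, String> overrides) {
        List<String> out = new ArrayList<>();
        for (String raw : tokens) {
            String token = raw.toLowerCase(Locale.ROOT);
            if (stopwords.contains(token)) {
                continue;                      // stopword: not indexed at all
            }
            if (exclusions.contains(token)) {
                out.add(token);                // exclusion: indexed, but not stemmed
            } else if (overrides.containsKey(token)) {
                out.add(overrides.get(token)); // dictionary stem overrules the algorithm
            } else {
                out.add(toyStem(token));       // everything else: algorithmic stemming
            }
        }
        return out;
    }

    public static void main(String[] args) {
        List<String> result = analyze(
                List.of("De", "fietsen", "boeken", "Leiden"),
                Set.of("de", "het", "een"),
                Set.of("leiden"),
                Map.of("fietsen", "fiets"));
        System.out.println(result); // prints [fiets, boek, leiden]
    }
}
```

Here "De" is dropped as a stopword, "fietsen" takes its stem from the override map, "boeken" goes through the placeholder stemmer, and "Leiden" is kept unstemmed because it is on the exclusion list.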

    You must specify the required Version compatibility when creating DutchAnalyzer:

    NOTE: This class uses the same Version dependent settings as StandardAnalyzer.

    • Constructor Detail

      • DutchAnalyzer

        public DutchAnalyzer(org.apache.lucene.util.Version matchVersion)
        Builds an analyzer with the default stop words (getDefaultStopSet()) and a few default entries for the stem override dictionary.
      • DutchAnalyzer

        public DutchAnalyzer(org.apache.lucene.util.Version matchVersion,
                             Set<?> stopwords)
      • DutchAnalyzer

        public DutchAnalyzer(org.apache.lucene.util.Version matchVersion,
                             Set<?> stopwords,
                             Set<?> stemExclusionTable)
      • DutchAnalyzer

        public DutchAnalyzer(org.apache.lucene.util.Version matchVersion,
                             Set<?> stopwords,
                             Set<?> stemExclusionTable,
                             org.apache.lucene.analysis.CharArrayMap<String> stemOverrideDict)
      • DutchAnalyzer

        @Deprecated
        public DutchAnalyzer(org.apache.lucene.util.Version matchVersion,
                             String... stopwords)
        Deprecated.
        Builds an analyzer with the given stop words.
        Parameters:
        matchVersion -
        stopwords -
      • DutchAnalyzer

        @Deprecated
        public DutchAnalyzer(org.apache.lucene.util.Version matchVersion,
                             HashSet<?> stopwords)
        Deprecated.
        Builds an analyzer with the given stop words.
        Parameters:
        stopwords -
      • DutchAnalyzer

        @Deprecated
        public DutchAnalyzer(org.apache.lucene.util.Version matchVersion,
                             File stopwords)
        Deprecated.
        Builds an analyzer with the given stop words.
        Parameters:
        stopwords -
    • Method Detail

      • getDefaultStopSet

        public static Set<?> getDefaultStopSet()
        Returns an unmodifiable instance of the default stop-words set.
        Returns:
        an unmodifiable instance of the default stop-words set.
      • setStemDictionary

        @Deprecated
        public void setStemDictionary(File stemdictFile)
        Deprecated.
        This prevents reuse of TokenStreams. If you wish to use a custom stem dictionary, create your own Analyzer with StemmerOverrideFilter.
        Reads a stem dictionary file that overrules the stemming algorithm. This is a text file that contains one word\tstem pair per line, i.e. two tab-separated words.
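The word\tstem file format described above can be read with a few lines of plain Java. This is a minimal sketch, not the analyzer's own parsing code: `StemDictionarySketch` and `parse` are hypothetical names, and skipping lines that do not contain exactly two tab-separated fields is an assumption made here for robustness.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.LinkedHashMap;
import java.util.Map;

/** Illustrative only: parses a word<TAB>stem dictionary without using Lucene. */
final class StemDictionarySketch {

    /** Returns a word-to-stem map; malformed lines are skipped (an assumption). */
    static Map<String, String> parse(Reader in) {
        Map<String, String> dict = new LinkedHashMap<>();
        try (BufferedReader reader = new BufferedReader(in)) {
            String line;
            while ((line = reader.readLine()) != null) {
                String[] parts = line.split("\t");
                if (parts.length == 2) {
                    dict.put(parts[0], parts[1]); // word -> stem
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return dict;
    }

    public static void main(String[] args) {
        // Sample content: two valid entries, one blank line, one malformed line.
        String sample = "fietsen\tfiets\n\nkinderen\tkind\nno-tab-here\n";
        System.out.println(parse(new StringReader(sample)));
    }
}
```

Since setStemDictionary is deprecated, a map built this way would more likely be copied into a CharArrayMap<String> and passed to the constructor that accepts stemOverrideDict, or used with StemmerOverrideFilter in a custom Analyzer.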
      • createComponents

        protected org.apache.lucene.analysis.ReusableAnalyzerBase.TokenStreamComponents createComponents(String fieldName,
                                                                                                         Reader aReader)
        Returns a (possibly reused) TokenStream which tokenizes all the text in the provided Reader.
        Specified by:
        createComponents in class org.apache.lucene.analysis.ReusableAnalyzerBase
        Returns:
        A TokenStream built from a StandardTokenizer filtered with StandardFilter, LowerCaseFilter, StopFilter, KeywordMarkerFilter (if a stem exclusion set is provided), StemmerOverrideFilter, and SnowballFilter.