Uses of Class
org.apache.lucene.analysis.TokenStream
-
Packages that use TokenStream:

org.apache.lucene.analysis: API and code to convert text into indexable/searchable tokens.
org.apache.lucene.analysis.ar: Analyzer for Arabic.
org.apache.lucene.analysis.bg: Analyzer for Bulgarian.
org.apache.lucene.analysis.br: Analyzer for Brazilian Portuguese.
org.apache.lucene.analysis.cjk: Analyzer for Chinese, Japanese, and Korean, which indexes bigrams (overlapping groups of two adjacent Han characters).
org.apache.lucene.analysis.cn: Analyzer for Chinese, which indexes unigrams (individual Chinese characters).
org.apache.lucene.analysis.cn.smart: Analyzer for Simplified Chinese, which indexes words.
org.apache.lucene.analysis.compound: A filter that decomposes compound words found in many Germanic languages into their word parts.
org.apache.lucene.analysis.cz: Analyzer for Czech.
org.apache.lucene.analysis.de: Analyzer for German.
org.apache.lucene.analysis.el: Analyzer for Greek.
org.apache.lucene.analysis.en: Analyzer for English.
org.apache.lucene.analysis.es: Analyzer for Spanish.
org.apache.lucene.analysis.fa: Analyzer for Persian.
org.apache.lucene.analysis.fi: Analyzer for Finnish.
org.apache.lucene.analysis.fr: Analyzer for French.
org.apache.lucene.analysis.ga: Analysis for Irish.
org.apache.lucene.analysis.gl: Analyzer for Galician.
org.apache.lucene.analysis.hi: Analyzer for Hindi.
org.apache.lucene.analysis.hu: Analyzer for Hungarian.
org.apache.lucene.analysis.hunspell: Stemming TokenFilter using a Java implementation of the Hunspell stemming algorithm.
org.apache.lucene.analysis.icu: Analysis components based on ICU.
org.apache.lucene.analysis.icu.segmentation: Tokenizer that breaks text into words with the Unicode Text Segmentation algorithm.
org.apache.lucene.analysis.id: Analyzer for Indonesian.
org.apache.lucene.analysis.in: Analysis components for Indian languages.
org.apache.lucene.analysis.it: Analyzer for Italian.
org.apache.lucene.analysis.ja: Analyzer for Japanese.
org.apache.lucene.analysis.lv: Analyzer for Latvian.
org.apache.lucene.analysis.miscellaneous: Miscellaneous TokenStreams.
org.apache.lucene.analysis.ngram: Character n-gram tokenizers and filters.
org.apache.lucene.analysis.nl: Analyzer for Dutch.
org.apache.lucene.analysis.no: Analyzer for Norwegian.
org.apache.lucene.analysis.path: Analysis components for path-like strings such as filenames.
org.apache.lucene.analysis.payloads: Provides various convenience classes for creating payloads on Tokens.
org.apache.lucene.analysis.phonetic: Analysis components for phonetic search.
org.apache.lucene.analysis.position: Filter for assigning position increments.
org.apache.lucene.analysis.pt: Analyzer for Portuguese.
org.apache.lucene.analysis.query: Automatically filters high-frequency stopwords.
org.apache.lucene.analysis.reverse: Filter to reverse token text.
org.apache.lucene.analysis.ru: Analyzer for Russian.
org.apache.lucene.analysis.shingle: Word n-gram filters.
org.apache.lucene.analysis.snowball: TokenFilter and Analyzer implementations that use Snowball stemmers.
org.apache.lucene.analysis.standard: Standards-based analyzers implemented with JFlex.
org.apache.lucene.analysis.stempel: Stempel: Algorithmic Stemmer.
org.apache.lucene.analysis.sv: Analyzer for Swedish.
org.apache.lucene.analysis.synonym: Analysis components for synonyms.
org.apache.lucene.analysis.th: Analyzer for Thai.
org.apache.lucene.analysis.tr: Analyzer for Turkish.
org.apache.lucene.analysis.wikipedia: Tokenizer that is aware of Wikipedia syntax.
org.apache.lucene.collation: CollationKeyFilter converts each token into its binary CollationKey using the provided Collator, then encodes the CollationKey as a String using IndexableBinaryStringTools, to allow it to be stored as an index term.
org.apache.lucene.document: The logical representation of a Document for indexing and searching.
org.apache.lucene.facet.enhancements: Enhanced category features.
org.apache.lucene.facet.enhancements.association: Association category enhancements.
org.apache.lucene.facet.index: Indexing of document categories.
org.apache.lucene.facet.index.streaming: Expert: attributes streaming definition for indexing facets.
org.apache.lucene.index.memory: High-performance single-document main-memory Apache Lucene fulltext search index.
org.apache.lucene.queryParser: A simple query parser implemented with JavaCC.
org.apache.lucene.search.highlight: The highlight package contains classes to provide "keyword in context" features, typically used to highlight search terms in the text of results pages.
-
Uses of TokenStream in org.apache.lucene.analysis
Subclasses of TokenStream in org.apache.lucene.analysis:

class ASCIIFoldingFilter: Converts alphabetic, numeric, and symbolic Unicode characters which are not in the first 127 ASCII characters (the "Basic Latin" Unicode block) into their ASCII equivalents, if one exists.
class CachingTokenFilter: Can be used if the token attributes of a TokenStream are intended to be consumed more than once.
class CannedTokenStream: TokenStream from a canned list of Tokens.
class CharTokenizer: An abstract base class for simple, character-oriented tokenizers.
class EmptyTokenizer: Emits no tokens.
class FilteringTokenFilter: Abstract base class for TokenFilters that may remove tokens.
class ISOLatin1AccentFilter: Deprecated. If you build a new index, use ASCIIFoldingFilter, which covers a superset of Latin-1.
class KeywordMarkerFilter: Marks terms as keywords via the KeywordAttribute.
class KeywordTokenizer: Emits the entire input as a single token.
class LengthFilter: Removes words that are too long or too short from the stream.
class LetterTokenizer: A tokenizer that divides text at non-letters.
class LimitTokenCountFilter: Limits the number of tokens while indexing.
class LookaheadTokenFilter<T extends LookaheadTokenFilter.Position>: An abstract TokenFilter that makes it easier to build graph token filters requiring some lookahead.
class LowerCaseFilter: Normalizes token text to lower case.
class LowerCaseTokenizer: Performs the function of LetterTokenizer and LowerCaseFilter together.
class MockFixedLengthPayloadFilter: TokenFilter that adds random fixed-length payloads.
class MockGraphTokenFilter: Randomly inserts overlapped (posInc=0) tokens, with posLength sometimes > 1.
class MockHoleInjectingTokenFilter
class MockRandomLookaheadTokenFilter: Uses LookaheadTokenFilter to randomly peek at future tokens.
class MockTokenizer: Tokenizer for testing.
class MockVariableLengthPayloadFilter: TokenFilter that adds random variable-length payloads.
class NumericTokenStream: Expert: provides a TokenStream for indexing numeric values that can be used by NumericRangeQuery or NumericRangeFilter.
class PorterStemFilter: Transforms the token stream as per the Porter stemming algorithm.
class StopFilter: Removes stop words from a token stream.
class TeeSinkTokenFilter: Provides the ability to set aside attribute states that have already been analyzed.
static class TeeSinkTokenFilter.SinkTokenStream: TokenStream output from a tee with optional filtering.
class TokenFilter: A TokenFilter is a TokenStream whose input is another TokenStream.
class Tokenizer: A Tokenizer is a TokenStream whose input is a Reader.
class TypeTokenFilter: Removes tokens whose types appear in a set of blocked types from a token stream.
class ValidatingTokenFilter: A TokenFilter that checks consistency of the tokens (e.g. that offsets are consistent with one another).
class WhitespaceTokenizer: A tokenizer that divides text at whitespace.

Fields in org.apache.lucene.analysis declared as TokenStream:

protected TokenStream TokenFilter.input: The source of tokens for this filter.
protected TokenStream ReusableAnalyzerBase.TokenStreamComponents.sink
Methods in org.apache.lucene.analysis that return TokenStream:

protected TokenStream ReusableAnalyzerBase.TokenStreamComponents.getTokenStream(): Returns the sink TokenStream.
TokenStream Analyzer.reusableTokenStream(String fieldName, Reader reader): Creates a TokenStream that is allowed to be reused from the previous time that the same thread called this method.
TokenStream LimitTokenCountAnalyzer.reusableTokenStream(String fieldName, Reader reader)
TokenStream MockAnalyzer.reusableTokenStream(String fieldName, Reader reader)
TokenStream PerFieldAnalyzerWrapper.reusableTokenStream(String fieldName, Reader reader)
TokenStream ReusableAnalyzerBase.reusableTokenStream(String fieldName, Reader reader): Uses ReusableAnalyzerBase.createComponents(String, Reader) to obtain an instance of ReusableAnalyzerBase.TokenStreamComponents.
abstract TokenStream Analyzer.tokenStream(String fieldName, Reader reader): Creates a TokenStream which tokenizes all the text in the provided Reader.
TokenStream LimitTokenCountAnalyzer.tokenStream(String fieldName, Reader reader)
TokenStream MockAnalyzer.tokenStream(String fieldName, Reader reader)
TokenStream PerFieldAnalyzerWrapper.tokenStream(String fieldName, Reader reader)
TokenStream ReusableAnalyzerBase.tokenStream(String fieldName, Reader reader): Uses ReusableAnalyzerBase.createComponents(String, Reader) to obtain an instance of ReusableAnalyzerBase.TokenStreamComponents and returns the sink of the components.

Methods in org.apache.lucene.analysis with parameters of type TokenStream:

static void BaseTokenStreamTestCase.assertTokenStreamContents(TokenStream ts, String[] output)
static void BaseTokenStreamTestCase.assertTokenStreamContents(TokenStream ts, String[] output, int[] posIncrements)
static void BaseTokenStreamTestCase.assertTokenStreamContents(TokenStream ts, String[] output, int[] startOffsets, int[] endOffsets)
static void BaseTokenStreamTestCase.assertTokenStreamContents(TokenStream ts, String[] output, int[] startOffsets, int[] endOffsets, int[] posIncrements)
static void BaseTokenStreamTestCase.assertTokenStreamContents(TokenStream ts, String[] output, int[] startOffsets, int[] endOffsets, int[] posIncrements, int[] posLengths, Integer finalOffset)
static void BaseTokenStreamTestCase.assertTokenStreamContents(TokenStream ts, String[] output, int[] startOffsets, int[] endOffsets, int[] posIncrements, Integer finalOffset)
static void BaseTokenStreamTestCase.assertTokenStreamContents(TokenStream ts, String[] output, int[] startOffsets, int[] endOffsets, Integer finalOffset)
static void BaseTokenStreamTestCase.assertTokenStreamContents(TokenStream ts, String[] output, int[] startOffsets, int[] endOffsets, String[] types, int[] posIncrements)
static void BaseTokenStreamTestCase.assertTokenStreamContents(TokenStream ts, String[] output, int[] startOffsets, int[] endOffsets, String[] types, int[] posIncrements, int[] posLengths, Integer finalOffset)
static void BaseTokenStreamTestCase.assertTokenStreamContents(TokenStream ts, String[] output, int[] startOffsets, int[] endOffsets, String[] types, int[] posIncrements, int[] posLengths, Integer finalOffset, boolean offsetsAreCorrect)
static void BaseTokenStreamTestCase.assertTokenStreamContents(TokenStream ts, String[] output, int[] startOffsets, int[] endOffsets, String[] types, int[] posIncrements, Integer finalOffset)
static void BaseTokenStreamTestCase.assertTokenStreamContents(TokenStream ts, String[] output, String[] types)
Constructors in org.apache.lucene.analysis with parameters of type TokenStream:

ASCIIFoldingFilter(TokenStream input)
CachingTokenFilter(TokenStream input)
FilteringTokenFilter(boolean enablePositionIncrements, TokenStream input)
ISOLatin1AccentFilter(TokenStream input): Deprecated.
KeywordMarkerFilter(TokenStream in, Set<?> keywordSet): Creates a new KeywordMarkerFilter that marks the current token as a keyword, via the KeywordAttribute, if the token's term buffer is contained in the given set.
KeywordMarkerFilter(TokenStream in, CharArraySet keywordSet): Creates a new KeywordMarkerFilter that marks the current token as a keyword, via the KeywordAttribute, if the token's term buffer is contained in the given set.
LengthFilter(boolean enablePositionIncrements, TokenStream in, int min, int max): Builds a filter that removes words that are too long or too short from the text.
LengthFilter(TokenStream in, int min, int max): Deprecated. Use LengthFilter(boolean, TokenStream, int, int) instead.
LimitTokenCountFilter(TokenStream in, int maxTokenCount): Builds a filter that only accepts tokens up to a maximum number.
LookaheadTokenFilter(TokenStream input)
LowerCaseFilter(TokenStream in): Deprecated. Use LowerCaseFilter(Version, TokenStream) instead.
LowerCaseFilter(Version matchVersion, TokenStream in): Creates a new LowerCaseFilter that normalizes token text to lower case.
MockFixedLengthPayloadFilter(Random random, TokenStream in, int length)
MockGraphTokenFilter(Random random, TokenStream input)
MockHoleInjectingTokenFilter(Random random, TokenStream in)
MockRandomLookaheadTokenFilter(Random random, TokenStream in)
MockVariableLengthPayloadFilter(Random random, TokenStream in)
PorterStemFilter(TokenStream in)
StopFilter(boolean enablePositionIncrements, TokenStream in, Set<?> stopWords): Deprecated. Use StopFilter(Version, TokenStream, Set) instead.
StopFilter(boolean enablePositionIncrements, TokenStream input, Set<?> stopWords, boolean ignoreCase): Deprecated. Use StopFilter(Version, TokenStream, Set) instead.
StopFilter(Version matchVersion, TokenStream in, Set<?> stopWords): Constructs a filter which removes words from the input TokenStream that are named in the Set.
StopFilter(Version matchVersion, TokenStream input, Set<?> stopWords, boolean ignoreCase): Deprecated. Use StopFilter(Version, TokenStream, Set) instead.
TeeSinkTokenFilter(TokenStream input): Instantiates a new TeeSinkTokenFilter.
TokenFilter(TokenStream input): Constructs a token stream filtering the given input.
TokenStreamComponents(Tokenizer source, TokenStream result): Creates a new ReusableAnalyzerBase.TokenStreamComponents instance.
TokenStreamToDot(String inputText, TokenStream in, PrintWriter out): If inputText is non-null and the TokenStream has offsets, the surface form is included in each arc's label.
TypeTokenFilter(boolean enablePositionIncrements, TokenStream input, Set<String> stopTypes)
TypeTokenFilter(boolean enablePositionIncrements, TokenStream input, Set<String> stopTypes, boolean useWhiteList)
ValidatingTokenFilter(TokenStream in, String name, boolean offsetsAreCorrect): The name argument is used to identify this stage when throwing exceptions (useful if you have more than one instance in your chain).
-
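The classes above all follow one pattern: a Tokenizer turns a Reader into a stream of tokens, and each TokenFilter wraps another TokenStream, rewriting or dropping tokens as the consumer pulls them with incrementToken(). That decorator pattern can be sketched in plain, self-contained Java; every class and method name below is an illustrative stand-in, not Lucene's actual API.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;
import java.util.Locale;

// Sketch of the TokenStream/TokenFilter contract: a source produces tokens,
// filters wrap other streams. Illustrative names only, not Lucene API.
class ChainDemo {

    static abstract class SimpleStream {
        String term;                       // current token text
        abstract boolean incrementToken(); // advance; false when exhausted
    }

    // Plays the role of a Tokenizer: turns raw text into tokens.
    static class WhitespaceSource extends SimpleStream {
        private final Iterator<String> it;
        WhitespaceSource(String text) {
            it = Arrays.asList(text.trim().split("\\s+")).iterator();
        }
        boolean incrementToken() {
            if (!it.hasNext()) return false;
            term = it.next();
            return true;
        }
    }

    // Plays the role of LowerCaseFilter: rewrites each token in place.
    static class LowerCaseSketch extends SimpleStream {
        private final SimpleStream input;  // the wrapped stream, like TokenFilter.input
        LowerCaseSketch(SimpleStream input) { this.input = input; }
        boolean incrementToken() {
            if (!input.incrementToken()) return false;
            term = input.term.toLowerCase(Locale.ROOT);
            return true;
        }
    }

    // Plays the role of StopFilter: drops tokens found in a stop set.
    static class StopSketch extends SimpleStream {
        private final SimpleStream input;
        private final List<String> stopWords;
        StopSketch(SimpleStream input, List<String> stopWords) {
            this.input = input;
            this.stopWords = stopWords;
        }
        boolean incrementToken() {
            while (input.incrementToken()) {  // keep pulling until a kept token
                if (!stopWords.contains(input.term)) {
                    term = input.term;
                    return true;
                }
            }
            return false;
        }
    }

    static List<String> analyze(String text) {
        SimpleStream ts = new StopSketch(
                new LowerCaseSketch(new WhitespaceSource(text)),
                Arrays.asList("the", "a"));
        List<String> out = new ArrayList<>();
        while (ts.incrementToken()) out.add(ts.term);
        return out;
    }

    public static void main(String[] args) {
        System.out.println(analyze("The Quick Fox")); // [quick, fox]
    }
}
```

The consumer never cares whether it holds a bare tokenizer or a deep chain of filters; it only calls incrementToken() in a loop, which is why every constructor on this page takes a TokenStream as its wrapped input.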
Uses of TokenStream in org.apache.lucene.analysis.ar
Subclasses of TokenStream in org.apache.lucene.analysis.ar:

class ArabicLetterTokenizer: Deprecated. (3.1) Use StandardTokenizer instead.
class ArabicNormalizationFilter: A TokenFilter that applies ArabicNormalizer to normalize the orthography.
class ArabicStemFilter: A TokenFilter that applies ArabicStemmer to stem Arabic words.

Constructors in org.apache.lucene.analysis.ar with parameters of type TokenStream:

ArabicNormalizationFilter(TokenStream input)
ArabicStemFilter(TokenStream input)
-
Uses of TokenStream in org.apache.lucene.analysis.bg
Subclasses of TokenStream in org.apache.lucene.analysis.bg:

class BulgarianStemFilter: A TokenFilter that applies BulgarianStemmer to stem Bulgarian words.

Constructors in org.apache.lucene.analysis.bg with parameters of type TokenStream:

BulgarianStemFilter(TokenStream input)
-
Uses of TokenStream in org.apache.lucene.analysis.br
Subclasses of TokenStream in org.apache.lucene.analysis.br:

class BrazilianStemFilter: A TokenFilter that applies BrazilianStemmer.

Constructors in org.apache.lucene.analysis.br with parameters of type TokenStream:

BrazilianStemFilter(TokenStream in): Creates a new BrazilianStemFilter.
BrazilianStemFilter(TokenStream in, Set<?> exclusiontable): Deprecated. Use KeywordAttribute with KeywordMarkerFilter instead.
-
Uses of TokenStream in org.apache.lucene.analysis.cjk
Subclasses of TokenStream in org.apache.lucene.analysis.cjk:

class CJKBigramFilter: Forms bigrams of CJK terms that are generated from StandardTokenizer or ICUTokenizer.
class CJKTokenizer: Deprecated. Use StandardTokenizer, CJKWidthFilter, CJKBigramFilter, and LowerCaseFilter instead.
class CJKWidthFilter: A TokenFilter that normalizes CJK width differences: folds fullwidth ASCII variants into the equivalent Basic Latin, and folds halfwidth Katakana variants into the equivalent kana.

Constructors in org.apache.lucene.analysis.cjk with parameters of type TokenStream:

CJKBigramFilter(TokenStream in)
CJKBigramFilter(TokenStream in, int flags): Creates a new CJKBigramFilter, specifying which writing systems should be bigrammed.
CJKWidthFilter(TokenStream input)
-
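The "overlapping groups of two adjacent Han characters" that CJKBigramFilter's description refers to can be sketched in a few lines of self-contained Java; this is only an illustration of the bigramming idea, not Lucene's implementation.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative sketch of overlapping CJK bigram formation: a run of Han
// characters becomes the sequence of adjacent, overlapping pairs, while a
// lone character is left as a unigram. Not Lucene's implementation.
class BigramSketch {
    static List<String> bigrams(String hanRun) {
        List<String> out = new ArrayList<>();
        if (hanRun.length() < 2) {           // a lone character stays a unigram
            if (!hanRun.isEmpty()) out.add(hanRun);
            return out;
        }
        for (int i = 0; i + 2 <= hanRun.length(); i++) {
            out.add(hanRun.substring(i, i + 2)); // adjacent, overlapping pair
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(bigrams("一二三四")); // [一二, 二三, 三四]
    }
}
```

Indexing overlapping bigrams lets any two-character query match exactly, at the cost of roughly doubling the number of indexed terms relative to unigrams.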
Uses of TokenStream in org.apache.lucene.analysis.cn
Subclasses of TokenStream in org.apache.lucene.analysis.cn:

class ChineseFilter: Deprecated. Use StopFilter instead, which has the same functionality.
class ChineseTokenizer: Deprecated. Use StandardTokenizer instead, which has the same functionality.

Constructors in org.apache.lucene.analysis.cn with parameters of type TokenStream:

ChineseFilter(TokenStream in): Deprecated.
-
Uses of TokenStream in org.apache.lucene.analysis.cn.smart
Subclasses of TokenStream in org.apache.lucene.analysis.cn.smart:

class SentenceTokenizer: Tokenizes input text into sentences.
class WordTokenFilter: A TokenFilter that breaks sentences into words.

Methods in org.apache.lucene.analysis.cn.smart that return TokenStream:

TokenStream SmartChineseAnalyzer.reusableTokenStream(String fieldName, Reader reader)
TokenStream SmartChineseAnalyzer.tokenStream(String fieldName, Reader reader)

Constructors in org.apache.lucene.analysis.cn.smart with parameters of type TokenStream:

WordTokenFilter(TokenStream in): Constructs a new WordTokenFilter.
-
Uses of TokenStream in org.apache.lucene.analysis.compound
Subclasses of TokenStream in org.apache.lucene.analysis.compound:

class CompoundWordTokenFilterBase: Base class for decomposition token filters.
class DictionaryCompoundWordTokenFilter: A TokenFilter that decomposes compound words found in many Germanic languages.
class HyphenationCompoundWordTokenFilter: A TokenFilter that decomposes compound words found in many Germanic languages.

Constructors in org.apache.lucene.analysis.compound with parameters of type TokenStream:

CompoundWordTokenFilterBase(TokenStream input, String[] dictionary): Deprecated.
CompoundWordTokenFilterBase(TokenStream input, String[] dictionary, boolean onlyLongestMatch): Deprecated.
CompoundWordTokenFilterBase(TokenStream input, String[] dictionary, int minWordSize, int minSubwordSize, int maxSubwordSize, boolean onlyLongestMatch): Deprecated.
CompoundWordTokenFilterBase(TokenStream input, Set<?> dictionary): Deprecated.
CompoundWordTokenFilterBase(TokenStream input, Set<?> dictionary, boolean onlyLongestMatch): Deprecated.
CompoundWordTokenFilterBase(TokenStream input, Set<?> dictionary, int minWordSize, int minSubwordSize, int maxSubwordSize, boolean onlyLongestMatch): Deprecated.
CompoundWordTokenFilterBase(Version matchVersion, TokenStream input, String[] dictionary)
CompoundWordTokenFilterBase(Version matchVersion, TokenStream input, String[] dictionary, boolean onlyLongestMatch)
CompoundWordTokenFilterBase(Version matchVersion, TokenStream input, String[] dictionary, int minWordSize, int minSubwordSize, int maxSubwordSize, boolean onlyLongestMatch)
CompoundWordTokenFilterBase(Version matchVersion, TokenStream input, Set<?> dictionary)
CompoundWordTokenFilterBase(Version matchVersion, TokenStream input, Set<?> dictionary, boolean onlyLongestMatch)
CompoundWordTokenFilterBase(Version matchVersion, TokenStream input, Set<?> dictionary, int minWordSize, int minSubwordSize, int maxSubwordSize, boolean onlyLongestMatch)
DictionaryCompoundWordTokenFilter(TokenStream input, String[] dictionary): Deprecated.
DictionaryCompoundWordTokenFilter(TokenStream input, String[] dictionary, int minWordSize, int minSubwordSize, int maxSubwordSize, boolean onlyLongestMatch): Deprecated.
DictionaryCompoundWordTokenFilter(TokenStream input, Set dictionary): Deprecated.
DictionaryCompoundWordTokenFilter(TokenStream input, Set dictionary, int minWordSize, int minSubwordSize, int maxSubwordSize, boolean onlyLongestMatch): Deprecated.
DictionaryCompoundWordTokenFilter(Version matchVersion, TokenStream input, String[] dictionary): Deprecated. Use the constructors taking Set.
DictionaryCompoundWordTokenFilter(Version matchVersion, TokenStream input, String[] dictionary, int minWordSize, int minSubwordSize, int maxSubwordSize, boolean onlyLongestMatch): Deprecated. Use the constructors taking Set.
DictionaryCompoundWordTokenFilter(Version matchVersion, TokenStream input, Set<?> dictionary): Creates a new DictionaryCompoundWordTokenFilter.
DictionaryCompoundWordTokenFilter(Version matchVersion, TokenStream input, Set<?> dictionary, int minWordSize, int minSubwordSize, int maxSubwordSize, boolean onlyLongestMatch): Creates a new DictionaryCompoundWordTokenFilter.
HyphenationCompoundWordTokenFilter(TokenStream input, HyphenationTree hyphenator, String[] dictionary): Deprecated.
HyphenationCompoundWordTokenFilter(TokenStream input, HyphenationTree hyphenator, String[] dictionary, int minWordSize, int minSubwordSize, int maxSubwordSize, boolean onlyLongestMatch)
HyphenationCompoundWordTokenFilter(TokenStream input, HyphenationTree hyphenator, Set<?> dictionary): Deprecated.
HyphenationCompoundWordTokenFilter(TokenStream input, HyphenationTree hyphenator, Set<?> dictionary, int minWordSize, int minSubwordSize, int maxSubwordSize, boolean onlyLongestMatch)
HyphenationCompoundWordTokenFilter(Version matchVersion, TokenStream input, HyphenationTree hyphenator): Creates a HyphenationCompoundWordTokenFilter with no dictionary.
HyphenationCompoundWordTokenFilter(Version matchVersion, TokenStream input, HyphenationTree hyphenator, int minWordSize, int minSubwordSize, int maxSubwordSize): Creates a HyphenationCompoundWordTokenFilter with no dictionary.
HyphenationCompoundWordTokenFilter(Version matchVersion, TokenStream input, HyphenationTree hyphenator, String[] dictionary): Deprecated. Use the constructors taking Set.
HyphenationCompoundWordTokenFilter(Version matchVersion, TokenStream input, HyphenationTree hyphenator, String[] dictionary, int minWordSize, int minSubwordSize, int maxSubwordSize, boolean onlyLongestMatch): Deprecated. Use the constructors taking Set.
HyphenationCompoundWordTokenFilter(Version matchVersion, TokenStream input, HyphenationTree hyphenator, Set<?> dictionary): Creates a new HyphenationCompoundWordTokenFilter instance.
HyphenationCompoundWordTokenFilter(Version matchVersion, TokenStream input, HyphenationTree hyphenator, Set<?> dictionary, int minWordSize, int minSubwordSize, int maxSubwordSize, boolean onlyLongestMatch): Creates a new HyphenationCompoundWordTokenFilter instance.
-
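The dictionary-driven decomposition these filters perform can be sketched in self-contained Java: scan the compound token for dictionary words within the subword size bounds and emit each hit as a subword. This is only an illustration of the idea behind DictionaryCompoundWordTokenFilter; the real filter also keeps the original token, applies minWordSize and onlyLongestMatch, and integrates with the attribute machinery. Parameter names are borrowed from the constructors above; the method itself is hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Illustrative sketch of dictionary-based compound splitting (not Lucene's
// code): every dictionary word found as a substring of the token, within the
// [minSubwordSize, maxSubwordSize] length bounds, is collected as a subword.
class CompoundSketch {
    static List<String> subwords(String token, Set<String> dictionary,
                                 int minSubwordSize, int maxSubwordSize) {
        List<String> out = new ArrayList<>();
        for (int start = 0; start < token.length(); start++) {
            for (int len = minSubwordSize;
                 len <= maxSubwordSize && start + len <= token.length(); len++) {
                String candidate = token.substring(start, start + len);
                if (dictionary.contains(candidate)) out.add(candidate);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        // German "Rindfleisch" (beef) decomposes into "rind" + "fleisch".
        Set<String> dict = new HashSet<>(Arrays.asList("rind", "fleisch"));
        System.out.println(subwords("rindfleisch", dict, 2, 15)); // [rind, fleisch]
    }
}
```

Emitting subwords at the same position as the compound is what lets a query for "fleisch" match a document containing only "rindfleisch".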
Uses of TokenStream in org.apache.lucene.analysis.cz
Subclasses of TokenStream in org.apache.lucene.analysis.cz:

class CzechStemFilter: A TokenFilter that applies CzechStemmer to stem Czech words.

Constructors in org.apache.lucene.analysis.cz with parameters of type TokenStream:

CzechStemFilter(TokenStream input)
-
Uses of TokenStream in org.apache.lucene.analysis.de
Subclasses of TokenStream in org.apache.lucene.analysis.de:

class GermanLightStemFilter: A TokenFilter that applies GermanLightStemmer to stem German words.
class GermanMinimalStemFilter: A TokenFilter that applies GermanMinimalStemmer to stem German words.
class GermanNormalizationFilter: Normalizes German characters according to the heuristics of the German2 snowball algorithm.
class GermanStemFilter: A TokenFilter that stems German words.

Constructors in org.apache.lucene.analysis.de with parameters of type TokenStream:

GermanLightStemFilter(TokenStream input)
GermanMinimalStemFilter(TokenStream input)
GermanNormalizationFilter(TokenStream input)
GermanStemFilter(TokenStream in): Creates a GermanStemFilter instance.
GermanStemFilter(TokenStream in, Set<?> exclusionSet): Deprecated. Use KeywordAttribute with KeywordMarkerFilter instead.
-
Uses of TokenStream in org.apache.lucene.analysis.el
Subclasses of TokenStream in org.apache.lucene.analysis.el:

class GreekLowerCaseFilter: Normalizes token text to lower case, removes some Greek diacritics, and standardizes final sigma to sigma.
class GreekStemFilter: A TokenFilter that applies GreekStemmer to stem Greek words.

Constructors in org.apache.lucene.analysis.el with parameters of type TokenStream:

GreekLowerCaseFilter(TokenStream in): Deprecated. Use GreekLowerCaseFilter(Version, TokenStream) instead.
GreekLowerCaseFilter(Version matchVersion, TokenStream in): Creates a GreekLowerCaseFilter that normalizes Greek token text.
GreekStemFilter(TokenStream input)
-
Uses of TokenStream in org.apache.lucene.analysis.en
Subclasses of TokenStream in org.apache.lucene.analysis.en:

class EnglishMinimalStemFilter: A TokenFilter that applies EnglishMinimalStemmer to stem English words.
class EnglishPossessiveFilter: TokenFilter that removes possessives (trailing 's) from words.
class KStemFilter: A high-performance kstem filter for English.

Constructors in org.apache.lucene.analysis.en with parameters of type TokenStream:

EnglishMinimalStemFilter(TokenStream input)
EnglishPossessiveFilter(TokenStream input): Deprecated. Use EnglishPossessiveFilter(Version, TokenStream) instead.
EnglishPossessiveFilter(Version version, TokenStream input)
KStemFilter(TokenStream in)
-
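The trailing-possessive removal that EnglishPossessiveFilter's description refers to amounts to stripping a final "'s" from each token. A minimal self-contained sketch of that idea (not Lucene's code, which also handles version-dependent apostrophe variants) might look like this:

```java
// Illustrative sketch of trailing-possessive removal in the spirit of
// EnglishPossessiveFilter (not Lucene's implementation): a token ending
// in "'s" loses that two-character suffix; everything else passes through.
class PossessiveSketch {
    static String strip(String term) {
        if (term.endsWith("'s")) {
            return term.substring(0, term.length() - 2);
        }
        return term;
    }

    public static void main(String[] args) {
        System.out.println(strip("lucene's")); // lucene
        System.out.println(strip("cats"));     // cats (plural, unchanged)
    }
}
```

Removing the possessive before stemming keeps "lucene's" and "lucene" mapping to the same index term.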
Uses of TokenStream in org.apache.lucene.analysis.es
Subclasses of TokenStream in org.apache.lucene.analysis.es:

class SpanishLightStemFilter: A TokenFilter that applies SpanishLightStemmer to stem Spanish words.

Constructors in org.apache.lucene.analysis.es with parameters of type TokenStream:

SpanishLightStemFilter(TokenStream input)
-
Uses of TokenStream in org.apache.lucene.analysis.fa
Subclasses of TokenStream in org.apache.lucene.analysis.fa:

class PersianNormalizationFilter: A TokenFilter that applies PersianNormalizer to normalize the orthography.

Constructors in org.apache.lucene.analysis.fa with parameters of type TokenStream:

PersianNormalizationFilter(TokenStream input)
-
Uses of TokenStream in org.apache.lucene.analysis.fi
Subclasses of TokenStream in org.apache.lucene.analysis.fi:

class FinnishLightStemFilter: A TokenFilter that applies FinnishLightStemmer to stem Finnish words.

Constructors in org.apache.lucene.analysis.fi with parameters of type TokenStream:

FinnishLightStemFilter(TokenStream input)
-
Uses of TokenStream in org.apache.lucene.analysis.fr
Subclasses of TokenStream in org.apache.lucene.analysis.fr:

class ElisionFilter: Removes elisions from a TokenStream.
class FrenchLightStemFilter: A TokenFilter that applies FrenchLightStemmer to stem French words.
class FrenchMinimalStemFilter: A TokenFilter that applies FrenchMinimalStemmer to stem French words.
class FrenchStemFilter: Deprecated. Use SnowballFilter with FrenchStemmer instead, which has the same functionality.

Constructors in org.apache.lucene.analysis.fr with parameters of type TokenStream:

ElisionFilter(TokenStream input): Deprecated. Use ElisionFilter(Version, TokenStream) instead.
ElisionFilter(TokenStream input, String[] articles): Deprecated. Use ElisionFilter(Version, TokenStream, Set) instead.
ElisionFilter(TokenStream input, Set<?> articles): Deprecated. Use ElisionFilter(Version, TokenStream, Set) instead.
ElisionFilter(Version matchVersion, TokenStream input): Constructs an elision filter with standard stop words.
ElisionFilter(Version matchVersion, TokenStream input, Set<?> articles): Constructs an elision filter with a Set of stop words.
FrenchLightStemFilter(TokenStream input)
FrenchMinimalStemFilter(TokenStream input)
FrenchStemFilter(TokenStream in): Deprecated.
FrenchStemFilter(TokenStream in, Set<?> exclusiontable): Deprecated. Use KeywordAttribute with KeywordMarkerFilter instead.
-
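Elision removal, as performed by ElisionFilter, strips a contracted article and its apostrophe from the front of a token (French "l'avion" becomes "avion"). A minimal self-contained sketch of that idea, with an illustrative method name and article set rather than Lucene's actual defaults:

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;

// Illustrative sketch of elision removal in the spirit of ElisionFilter
// (not Lucene's code): if the text before the first apostrophe is a known
// article, drop the article together with the apostrophe.
class ElisionSketch {
    static String elide(String term, Set<String> articles) {
        int apos = term.indexOf('\'');
        if (apos >= 0
                && articles.contains(term.substring(0, apos).toLowerCase(Locale.ROOT))) {
            return term.substring(apos + 1); // keep only the word after "'"
        }
        return term;
    }

    public static void main(String[] args) {
        Set<String> articles = new HashSet<>(Arrays.asList("l", "d", "qu"));
        System.out.println(elide("l'avion", articles)); // avion
        System.out.println(elide("avion", articles));   // avion (unchanged)
    }
}
```

Removing the elided article before stemming lets "l'avion" and "avion" collapse to the same index term, for the same reason possessive stripping helps in English.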
Uses of TokenStream in org.apache.lucene.analysis.ga
Subclasses of TokenStream in org.apache.lucene.analysis.ga:

class IrishLowerCaseFilter: Normalises token text to lower case, handling t-prothesis and n-eclipsis (i.e., that 'nAthair' should become 'n-athair').

Constructors in org.apache.lucene.analysis.ga with parameters of type TokenStream:

IrishLowerCaseFilter(TokenStream in): Creates an IrishLowerCaseFilter that normalises Irish token text.
-
Uses of TokenStream in org.apache.lucene.analysis.gl
Subclasses of TokenStream in org.apache.lucene.analysis.gl:

class GalicianMinimalStemFilter: A TokenFilter that applies GalicianMinimalStemmer to stem Galician words.
class GalicianStemFilter: A TokenFilter that applies GalicianStemmer to stem Galician words.

Constructors in org.apache.lucene.analysis.gl with parameters of type TokenStream:

GalicianMinimalStemFilter(TokenStream input)
GalicianStemFilter(TokenStream input)
-
Uses of TokenStream in org.apache.lucene.analysis.hi
Subclasses of TokenStream in org.apache.lucene.analysis.hi:

class HindiNormalizationFilter: A TokenFilter that applies HindiNormalizer to normalize the orthography.
class HindiStemFilter: A TokenFilter that applies HindiStemmer to stem Hindi words.

Constructors in org.apache.lucene.analysis.hi with parameters of type TokenStream:

HindiNormalizationFilter(TokenStream input)
HindiStemFilter(TokenStream input)
-
Uses of TokenStream in org.apache.lucene.analysis.hu
Subclasses of TokenStream in org.apache.lucene.analysis.hu:

class HungarianLightStemFilter: A TokenFilter that applies HungarianLightStemmer to stem Hungarian words.

Constructors in org.apache.lucene.analysis.hu with parameters of type TokenStream:

HungarianLightStemFilter(TokenStream input)
-
Uses of TokenStream in org.apache.lucene.analysis.hunspell
Subclasses of TokenStream in org.apache.lucene.analysis.hunspell:

class HunspellStemFilter: TokenFilter that uses Hunspell affix rules and words to stem tokens.

Constructors in org.apache.lucene.analysis.hunspell with parameters of type TokenStream:

HunspellStemFilter(TokenStream input, HunspellDictionary dictionary): Creates a new HunspellStemFilter that will stem tokens from the given TokenStream using the affix rules in the provided HunspellDictionary.
HunspellStemFilter(TokenStream input, HunspellDictionary dictionary, boolean dedup): Creates a new HunspellStemFilter that will stem tokens from the given TokenStream using the affix rules in the provided HunspellDictionary.
-
Uses of TokenStream in org.apache.lucene.analysis.icu
Subclasses of TokenStream in org.apache.lucene.analysis.icu:

class ICUFoldingFilter: A TokenFilter that applies search term folding to Unicode text, applying foldings from UTR#30 Character Foldings.
class ICUNormalizer2Filter: Normalizes token text with ICU's Normalizer2.
class ICUTransformFilter: A TokenFilter that transforms text with ICU.

Constructors in org.apache.lucene.analysis.icu with parameters of type TokenStream:

ICUFoldingFilter(TokenStream input): Creates a new ICUFoldingFilter on the specified input.
ICUNormalizer2Filter(TokenStream input): Creates a new ICUNormalizer2Filter that combines NFKC normalization and case folding, and removes default ignorables (NFKC_Casefold).
ICUNormalizer2Filter(TokenStream input, com.ibm.icu.text.Normalizer2 normalizer): Creates a new ICUNormalizer2Filter with the specified Normalizer2.
ICUTransformFilter(TokenStream input, com.ibm.icu.text.Transliterator transform): Creates a new ICUTransformFilter that transforms text on the given stream.
-
Uses of TokenStream in org.apache.lucene.analysis.icu.segmentation
Subclasses of TokenStream in org.apache.lucene.analysis.icu.segmentation Modifier and Type Class Description class
ICUTokenizer
Breaks text into words according to UAX #29: Unicode Text Segmentation (http://www.unicode.org/reports/tr29/) -
Uses of TokenStream in org.apache.lucene.analysis.id
Subclasses of TokenStream in org.apache.lucene.analysis.id Modifier and Type Class Description class
IndonesianStemFilter
A TokenFilter that applies IndonesianStemmer to stem Indonesian words.
Constructors in org.apache.lucene.analysis.id with parameters of type TokenStream Constructor Description IndonesianStemFilter(TokenStream input)
IndonesianStemFilter(TokenStream input, boolean stemDerivational)
Create a new IndonesianStemFilter. -
Uses of TokenStream in org.apache.lucene.analysis.in
Subclasses of TokenStream in org.apache.lucene.analysis.in Modifier and Type Class Description class
IndicNormalizationFilter
A TokenFilter that applies IndicNormalizer to normalize text in Indian Languages. class
IndicTokenizer
Deprecated. (3.6) Use StandardTokenizer instead.
Constructors in org.apache.lucene.analysis.in with parameters of type TokenStream Constructor Description IndicNormalizationFilter(TokenStream input)
-
Uses of TokenStream in org.apache.lucene.analysis.it
Subclasses of TokenStream in org.apache.lucene.analysis.it Modifier and Type Class Description class
ItalianLightStemFilter
A TokenFilter that applies ItalianLightStemmer to stem Italian words.
Constructors in org.apache.lucene.analysis.it with parameters of type TokenStream Constructor Description ItalianLightStemFilter(TokenStream input)
-
Uses of TokenStream in org.apache.lucene.analysis.ja
Subclasses of TokenStream in org.apache.lucene.analysis.ja Modifier and Type Class Description class
JapaneseBaseFormFilter
Replaces term text with the BaseFormAttribute. class
JapaneseKatakanaStemFilter
A TokenFilter that normalizes common katakana spelling variations ending in a long sound character by removing this character (U+30FC). class
JapanesePartOfSpeechStopFilter
Removes tokens that match a set of part-of-speech tags. class
JapaneseReadingFormFilter
A TokenFilter that replaces the term attribute with the reading of a token in either katakana or romaji form. class
JapaneseTokenizer
Tokenizer for Japanese that uses morphological analysis.
Constructors in org.apache.lucene.analysis.ja with parameters of type TokenStream Constructor Description JapaneseBaseFormFilter(TokenStream input)
JapaneseKatakanaStemFilter(TokenStream input)
JapaneseKatakanaStemFilter(TokenStream input, int minimumLength)
JapanesePartOfSpeechStopFilter(boolean enablePositionIncrements, TokenStream input, Set<String> stopTags)
JapaneseReadingFormFilter(TokenStream input)
JapaneseReadingFormFilter(TokenStream input, boolean useRomaji)
-
Uses of TokenStream in org.apache.lucene.analysis.lv
Subclasses of TokenStream in org.apache.lucene.analysis.lv Modifier and Type Class Description class
LatvianStemFilter
A TokenFilter that applies LatvianStemmer to stem Latvian words.
Constructors in org.apache.lucene.analysis.lv with parameters of type TokenStream Constructor Description LatvianStemFilter(TokenStream input)
-
Uses of TokenStream in org.apache.lucene.analysis.miscellaneous
Subclasses of TokenStream in org.apache.lucene.analysis.miscellaneous Modifier and Type Class Description class
EmptyTokenStream
An always exhausted token stream. class
PrefixAndSuffixAwareTokenFilter
Links two PrefixAwareTokenFilter instances. class
PrefixAwareTokenFilter
Joins two token streams and leaves the last token of the first stream available to be used when updating the token values in the second stream based on that token. class
SingleTokenTokenStream
A TokenStream containing a single token. class
StemmerOverrideFilter
Provides the ability to override any KeywordAttribute-aware stemmer with custom dictionary-based stemming.
Methods in org.apache.lucene.analysis.miscellaneous that return TokenStream Modifier and Type Method Description TokenStream
PrefixAwareTokenFilter. getPrefix()
TokenStream
PrefixAwareTokenFilter. getSuffix()
Methods in org.apache.lucene.analysis.miscellaneous with parameters of type TokenStream Modifier and Type Method Description void
PrefixAwareTokenFilter. setPrefix(TokenStream prefix)
void
PrefixAwareTokenFilter. setSuffix(TokenStream suffix)
Constructors in org.apache.lucene.analysis.miscellaneous with parameters of type TokenStream Constructor Description PrefixAndSuffixAwareTokenFilter(TokenStream prefix, TokenStream input, TokenStream suffix)
PrefixAwareTokenFilter(TokenStream prefix, TokenStream suffix)
StemmerOverrideFilter(Version matchVersion, TokenStream input, Map<?,String> dictionary)
Create a new StemmerOverrideFilter, performing dictionary-based stemming with the provided dictionary. -
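The StemmerOverrideFilter constructor above can be wired in front of a regular stemmer, as in this minimal sketch (assuming Lucene 3.6 on the classpath; the class name, dictionary entry, and input text are illustrative):

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

import org.apache.lucene.analysis.PorterStemFilter;
import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.analysis.WhitespaceTokenizer;
import org.apache.lucene.analysis.miscellaneous.StemmerOverrideFilter;
import org.apache.lucene.analysis.tokenattributes.CharTermAttribute;
import org.apache.lucene.util.Version;

public class StemmerOverrideDemo {
    // Returns the stemmed tokens for a hypothetical input.
    public static List<String> tokens() throws Exception {
        TokenStream ts = new WhitespaceTokenizer(Version.LUCENE_36, new StringReader("mice dogs"));
        // "mice" is mapped directly to "mouse" and keyword-marked, so the
        // downstream PorterStemFilter leaves it alone; "dogs" is stemmed normally.
        ts = new StemmerOverrideFilter(Version.LUCENE_36, ts, Collections.singletonMap("mice", "mouse"));
        ts = new PorterStemFilter(ts);
        CharTermAttribute term = ts.addAttribute(CharTermAttribute.class);
        List<String> out = new ArrayList<String>();
        ts.reset();
        while (ts.incrementToken()) {
            out.add(term.toString());
        }
        ts.end();
        ts.close();
        return out;
    }

    public static void main(String[] args) throws Exception {
        System.out.println(tokens());
    }
}
```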
Uses of TokenStream in org.apache.lucene.analysis.ngram
Subclasses of TokenStream in org.apache.lucene.analysis.ngram Modifier and Type Class Description class
EdgeNGramTokenFilter
Tokenizes the given token into n-grams of given size(s). class
EdgeNGramTokenizer
Tokenizes the input from an edge into n-grams of given size(s). class
NGramTokenFilter
Tokenizes the input into n-grams of the given size(s). class
NGramTokenizer
Tokenizes the input into n-grams of the given size(s).
Constructors in org.apache.lucene.analysis.ngram with parameters of type TokenStream Constructor Description EdgeNGramTokenFilter(TokenStream input, String sideLabel, int minGram, int maxGram)
Creates EdgeNGramTokenFilter that can generate n-grams in the sizes of the given range.
EdgeNGramTokenFilter(TokenStream input, EdgeNGramTokenFilter.Side side, int minGram, int maxGram)
Creates EdgeNGramTokenFilter that can generate n-grams in the sizes of the given range.
NGramTokenFilter(TokenStream input)
Creates NGramTokenFilter with default min and max n-grams.
NGramTokenFilter(TokenStream input, int minGram, int maxGram)
Creates NGramTokenFilter with given min and max n-grams. -
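A minimal sketch of the EdgeNGramTokenFilter(TokenStream, String, int, int) constructor above, using the "front" side label to emit 1- to 3-character prefixes (assuming Lucene 3.6 on the classpath; the class name and input are illustrative):

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;

import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.analysis.WhitespaceTokenizer;
import org.apache.lucene.analysis.ngram.EdgeNGramTokenFilter;
import org.apache.lucene.analysis.tokenattributes.CharTermAttribute;
import org.apache.lucene.util.Version;

public class EdgeNGramDemo {
    // Emits 1- to 3-character prefixes of each token ("front" edge).
    public static List<String> tokens() throws Exception {
        TokenStream ts = new WhitespaceTokenizer(Version.LUCENE_36, new StringReader("hello"));
        ts = new EdgeNGramTokenFilter(ts, "front", 1, 3);
        CharTermAttribute term = ts.addAttribute(CharTermAttribute.class);
        List<String> out = new ArrayList<String>();
        ts.reset();
        while (ts.incrementToken()) {
            out.add(term.toString());
        }
        ts.end();
        ts.close();
        return out;
    }

    public static void main(String[] args) throws Exception {
        System.out.println(tokens());
    }
}
```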
Uses of TokenStream in org.apache.lucene.analysis.nl
Subclasses of TokenStream in org.apache.lucene.analysis.nl Modifier and Type Class Description class
DutchStemFilter
Deprecated. Use SnowballFilter with DutchStemmer instead, which has the same functionality.
Constructors in org.apache.lucene.analysis.nl with parameters of type TokenStream Constructor Description DutchStemFilter(TokenStream _in)
Deprecated.
DutchStemFilter(TokenStream _in, Map<?,?> stemdictionary)
Deprecated.
DutchStemFilter(TokenStream _in, Set<?> exclusiontable)
Deprecated. Use KeywordAttribute with KeywordMarkerFilter instead.
DutchStemFilter(TokenStream _in, Set<?> exclusiontable, Map<?,?> stemdictionary)
Deprecated. Use KeywordAttribute with KeywordMarkerFilter instead. -
Uses of TokenStream in org.apache.lucene.analysis.no
Subclasses of TokenStream in org.apache.lucene.analysis.no Modifier and Type Class Description class
NorwegianLightStemFilter
A TokenFilter that applies NorwegianLightStemmer to stem Norwegian words. class
NorwegianMinimalStemFilter
A TokenFilter that applies NorwegianMinimalStemmer to stem Norwegian words.
Constructors in org.apache.lucene.analysis.no with parameters of type TokenStream Constructor Description NorwegianLightStemFilter(TokenStream input)
NorwegianMinimalStemFilter(TokenStream input)
-
Uses of TokenStream in org.apache.lucene.analysis.path
Subclasses of TokenStream in org.apache.lucene.analysis.path Modifier and Type Class Description class
PathHierarchyTokenizer
Tokenizer for path-like hierarchies. class
ReversePathHierarchyTokenizer
Tokenizer for domain-like hierarchies. -
Uses of TokenStream in org.apache.lucene.analysis.payloads
Subclasses of TokenStream in org.apache.lucene.analysis.payloads Modifier and Type Class Description class
DelimitedPayloadTokenFilter
Characters before the delimiter are the "token", those after are the payload. class
NumericPayloadTokenFilter
Assigns a payload to a token based on the Token.type(). class
TokenOffsetPayloadTokenFilter
Adds the Token.setStartOffset(int) and Token.setEndOffset(int) values as the payload: the first 4 bytes are the start offset, the last 4 bytes the end offset. class
TypeAsPayloadTokenFilter
Makes the Token.type() a payload.
Constructors in org.apache.lucene.analysis.payloads with parameters of type TokenStream Constructor Description DelimitedPayloadTokenFilter(TokenStream input, char delimiter, PayloadEncoder encoder)
NumericPayloadTokenFilter(TokenStream input, float payload, String typeMatch)
TokenOffsetPayloadTokenFilter(TokenStream input)
TypeAsPayloadTokenFilter(TokenStream input)
-
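The DelimitedPayloadTokenFilter constructor above can be exercised as follows; a sketch assuming Lucene 3.6 on the classpath, with FloatEncoder and PayloadHelper from the same payloads package (the class name and input are illustrative):

```java
import java.io.StringReader;

import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.analysis.WhitespaceTokenizer;
import org.apache.lucene.analysis.payloads.DelimitedPayloadTokenFilter;
import org.apache.lucene.analysis.payloads.FloatEncoder;
import org.apache.lucene.analysis.payloads.PayloadHelper;
import org.apache.lucene.analysis.tokenattributes.PayloadAttribute;
import org.apache.lucene.util.Version;

public class DelimitedPayloadDemo {
    // Splits "term|payload" tokens: the text before '|' stays the term,
    // the text after it is encoded as a float payload.
    public static float payloadOf(String input) throws Exception {
        TokenStream ts = new WhitespaceTokenizer(Version.LUCENE_36, new StringReader(input));
        ts = new DelimitedPayloadTokenFilter(ts, '|', new FloatEncoder());
        PayloadAttribute payAtt = ts.addAttribute(PayloadAttribute.class);
        ts.reset();
        ts.incrementToken();
        float value = PayloadHelper.decodeFloat(payAtt.getPayload().getData());
        ts.end();
        ts.close();
        return value;
    }

    public static void main(String[] args) throws Exception {
        System.out.println(payloadOf("hello|1.5"));
    }
}
```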
Uses of TokenStream in org.apache.lucene.analysis.phonetic
Subclasses of TokenStream in org.apache.lucene.analysis.phonetic Modifier and Type Class Description class
BeiderMorseFilter
TokenFilter for Beider-Morse phonetic encoding. class
DoubleMetaphoneFilter
Filter for DoubleMetaphone (supporting secondary codes). class
PhoneticFilter
Create tokens for phonetic matches.
Constructors in org.apache.lucene.analysis.phonetic with parameters of type TokenStream Constructor Description BeiderMorseFilter(TokenStream input, org.apache.commons.codec.language.bm.PhoneticEngine engine)
BeiderMorseFilter(TokenStream input, org.apache.commons.codec.language.bm.PhoneticEngine engine, org.apache.commons.codec.language.bm.Languages.LanguageSet languages)
Create a new BeiderMorseFilter.
DoubleMetaphoneFilter(TokenStream input, int maxCodeLength, boolean inject)
PhoneticFilter(TokenStream in, org.apache.commons.codec.Encoder encoder, boolean inject)
-
Uses of TokenStream in org.apache.lucene.analysis.position
Subclasses of TokenStream in org.apache.lucene.analysis.position Modifier and Type Class Description class
PositionFilter
Set the positionIncrement of all tokens to the "positionIncrement", except the first return token which retains its original positionIncrement value.
Constructors in org.apache.lucene.analysis.position with parameters of type TokenStream Constructor Description PositionFilter(TokenStream input)
Constructs a PositionFilter that assigns a position increment of zero to all but the first token from the given input stream.
PositionFilter(TokenStream input, int positionIncrement)
Constructs a PositionFilter that assigns the given position increment to all but the first token from the given input stream. -
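The effect of the PositionFilter(TokenStream) constructor above can be observed through the PositionIncrementAttribute; a minimal sketch assuming Lucene 3.6 on the classpath (the class name and input are illustrative):

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;

import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.analysis.WhitespaceTokenizer;
import org.apache.lucene.analysis.position.PositionFilter;
import org.apache.lucene.analysis.tokenattributes.PositionIncrementAttribute;
import org.apache.lucene.util.Version;

public class PositionFilterDemo {
    // Collects the position increments: the first token keeps 1, every
    // later token is forced to 0 (all tokens stacked at one position).
    public static List<Integer> increments() throws Exception {
        TokenStream ts = new WhitespaceTokenizer(Version.LUCENE_36, new StringReader("a b c"));
        ts = new PositionFilter(ts);
        PositionIncrementAttribute posIncr = ts.addAttribute(PositionIncrementAttribute.class);
        List<Integer> out = new ArrayList<Integer>();
        ts.reset();
        while (ts.incrementToken()) {
            out.add(posIncr.getPositionIncrement());
        }
        ts.end();
        ts.close();
        return out;
    }

    public static void main(String[] args) throws Exception {
        System.out.println(increments());
    }
}
```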
Uses of TokenStream in org.apache.lucene.analysis.pt
Subclasses of TokenStream in org.apache.lucene.analysis.pt Modifier and Type Class Description class
PortugueseLightStemFilter
A TokenFilter that applies PortugueseLightStemmer to stem Portuguese words. class
PortugueseMinimalStemFilter
A TokenFilter that applies PortugueseMinimalStemmer to stem Portuguese words. class
PortugueseStemFilter
A TokenFilter that applies PortugueseStemmer to stem Portuguese words.
Constructors in org.apache.lucene.analysis.pt with parameters of type TokenStream Constructor Description PortugueseLightStemFilter(TokenStream input)
PortugueseMinimalStemFilter(TokenStream input)
PortugueseStemFilter(TokenStream input)
-
Uses of TokenStream in org.apache.lucene.analysis.query
Methods in org.apache.lucene.analysis.query that return TokenStream Modifier and Type Method Description TokenStream
QueryAutoStopWordAnalyzer. reusableTokenStream(String fieldName, Reader reader)
TokenStream
QueryAutoStopWordAnalyzer. tokenStream(String fieldName, Reader reader)
-
Uses of TokenStream in org.apache.lucene.analysis.reverse
Subclasses of TokenStream in org.apache.lucene.analysis.reverse Modifier and Type Class Description class
ReverseStringFilter
Reverse token string, for example "country" => "yrtnuoc".
Constructors in org.apache.lucene.analysis.reverse with parameters of type TokenStream Constructor Description ReverseStringFilter(TokenStream in)
Deprecated. Use ReverseStringFilter(Version, TokenStream) instead.
ReverseStringFilter(TokenStream in, char marker)
Deprecated. Use ReverseStringFilter(Version, TokenStream, char) instead.
ReverseStringFilter(Version matchVersion, TokenStream in)
Create a new ReverseStringFilter that reverses all tokens in the supplied TokenStream.
ReverseStringFilter(Version matchVersion, TokenStream in, char marker)
Create a new ReverseStringFilter that reverses and marks all tokens in the supplied TokenStream. -
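The Version-taking ReverseStringFilter constructor above, sketched on the document's own example token "country" (assuming Lucene 3.6 on the classpath; the class name is illustrative):

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;

import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.analysis.WhitespaceTokenizer;
import org.apache.lucene.analysis.reverse.ReverseStringFilter;
import org.apache.lucene.analysis.tokenattributes.CharTermAttribute;
import org.apache.lucene.util.Version;

public class ReverseDemo {
    // Reverses each token, e.g. "country" => "yrtnuoc".
    public static List<String> tokens() throws Exception {
        TokenStream ts = new WhitespaceTokenizer(Version.LUCENE_36, new StringReader("country"));
        ts = new ReverseStringFilter(Version.LUCENE_36, ts);
        CharTermAttribute term = ts.addAttribute(CharTermAttribute.class);
        List<String> out = new ArrayList<String>();
        ts.reset();
        while (ts.incrementToken()) {
            out.add(term.toString());
        }
        ts.end();
        ts.close();
        return out;
    }

    public static void main(String[] args) throws Exception {
        System.out.println(tokens());
    }
}
```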
Uses of TokenStream in org.apache.lucene.analysis.ru
Subclasses of TokenStream in org.apache.lucene.analysis.ru Modifier and Type Class Description class
RussianLetterTokenizer
Deprecated. Use StandardTokenizer instead, which has the same functionality. class
RussianLightStemFilter
A TokenFilter that applies RussianLightStemmer to stem Russian words. class
RussianLowerCaseFilter
Deprecated. Use LowerCaseFilter instead, which has the same functionality. class
RussianStemFilter
Deprecated. Use SnowballFilter with RussianStemmer instead, which has the same functionality.
Constructors in org.apache.lucene.analysis.ru with parameters of type TokenStream Constructor Description RussianLightStemFilter(TokenStream input)
RussianLowerCaseFilter(TokenStream in)
Deprecated.
RussianStemFilter(TokenStream in)
Deprecated. -
Uses of TokenStream in org.apache.lucene.analysis.shingle
Subclasses of TokenStream in org.apache.lucene.analysis.shingle Modifier and Type Class Description class
ShingleFilter
A ShingleFilter constructs shingles (token n-grams) from a token stream.class
ShingleMatrixFilter
Deprecated. Will be removed in Lucene 4.0.
Methods in org.apache.lucene.analysis.shingle that return TokenStream Modifier and Type Method Description TokenStream
ShingleAnalyzerWrapper. reusableTokenStream(String fieldName, Reader reader)
TokenStream
ShingleAnalyzerWrapper. tokenStream(String fieldName, Reader reader)
Constructors in org.apache.lucene.analysis.shingle with parameters of type TokenStream Constructor Description ShingleFilter(TokenStream input)
Construct a ShingleFilter with default shingle size: 2.
ShingleFilter(TokenStream input, int maxShingleSize)
Constructs a ShingleFilter with the specified shingle size from the TokenStream input.
ShingleFilter(TokenStream input, int minShingleSize, int maxShingleSize)
Constructs a ShingleFilter with the specified shingle size from the TokenStream input.
ShingleFilter(TokenStream input, String tokenType)
Construct a ShingleFilter with the specified token type for shingle tokens and the default shingle size: 2.
ShingleMatrixFilter(TokenStream input, int minimumShingleSize, int maximumShingleSize)
Deprecated. Creates a shingle filter using default settings.
ShingleMatrixFilter(TokenStream input, int minimumShingleSize, int maximumShingleSize, Character spacerCharacter)
Deprecated. Creates a shingle filter using default settings.
ShingleMatrixFilter(TokenStream input, int minimumShingleSize, int maximumShingleSize, Character spacerCharacter, boolean ignoringSinglePrefixOrSuffixShingle)
Deprecated. Creates a shingle filter using the default ShingleMatrixFilter.TokenSettingsCodec.
ShingleMatrixFilter(TokenStream input, int minimumShingleSize, int maximumShingleSize, Character spacerCharacter, boolean ignoringSinglePrefixOrSuffixShingle, ShingleMatrixFilter.TokenSettingsCodec settingsCodec)
Deprecated. Creates a shingle filter with ad hoc parameter settings. -
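The ShingleFilter(TokenStream) constructor above (default shingle size 2) can be sketched like this, assuming Lucene 3.6 on the classpath (the class name and input text are illustrative):

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;

import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.analysis.WhitespaceTokenizer;
import org.apache.lucene.analysis.shingle.ShingleFilter;
import org.apache.lucene.analysis.tokenattributes.CharTermAttribute;
import org.apache.lucene.util.Version;

public class ShingleDemo {
    // Wraps a tokenizer with the default shingle size (2):
    // the output interleaves unigrams and bigrams.
    public static List<String> tokens() throws Exception {
        TokenStream ts = new WhitespaceTokenizer(Version.LUCENE_36, new StringReader("please divide this"));
        ts = new ShingleFilter(ts);
        CharTermAttribute term = ts.addAttribute(CharTermAttribute.class);
        List<String> out = new ArrayList<String>();
        ts.reset();
        while (ts.incrementToken()) {
            out.add(term.toString());
        }
        ts.end();
        ts.close();
        return out;
    }

    public static void main(String[] args) throws Exception {
        System.out.println(tokens());
    }
}
```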
Uses of TokenStream in org.apache.lucene.analysis.snowball
Subclasses of TokenStream in org.apache.lucene.analysis.snowball Modifier and Type Class Description class
SnowballFilter
A filter that stems words using a Snowball-generated stemmer.
Methods in org.apache.lucene.analysis.snowball that return TokenStream Modifier and Type Method Description TokenStream
SnowballAnalyzer. reusableTokenStream(String fieldName, Reader reader)
Deprecated. Returns a (possibly reused) StandardTokenizer filtered by a StandardFilter, a LowerCaseFilter, a StopFilter, and a SnowballFilter.
TokenStream
SnowballAnalyzer. tokenStream(String fieldName, Reader reader)
Deprecated. Constructs a StandardTokenizer filtered by a StandardFilter, a LowerCaseFilter, a StopFilter, and a SnowballFilter.
Constructors in org.apache.lucene.analysis.snowball with parameters of type TokenStream Constructor Description SnowballFilter(TokenStream in, String name)
Construct the named stemming filter.
SnowballFilter(TokenStream input, SnowballProgram stemmer)
-
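The SnowballFilter(TokenStream, String) constructor above takes a Snowball stemmer name such as "English"; a minimal sketch assuming Lucene 3.6 on the classpath (the class name and input are illustrative):

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;

import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.analysis.WhitespaceTokenizer;
import org.apache.lucene.analysis.snowball.SnowballFilter;
import org.apache.lucene.analysis.tokenattributes.CharTermAttribute;
import org.apache.lucene.util.Version;

public class SnowballDemo {
    // Stems each token with the Snowball "English" stemmer.
    public static List<String> tokens() throws Exception {
        TokenStream ts = new WhitespaceTokenizer(Version.LUCENE_36, new StringReader("running"));
        ts = new SnowballFilter(ts, "English");
        CharTermAttribute term = ts.addAttribute(CharTermAttribute.class);
        List<String> out = new ArrayList<String>();
        ts.reset();
        while (ts.incrementToken()) {
            out.add(term.toString());
        }
        ts.end();
        ts.close();
        return out;
    }

    public static void main(String[] args) throws Exception {
        System.out.println(tokens());
    }
}
```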
Uses of TokenStream in org.apache.lucene.analysis.standard
Subclasses of TokenStream in org.apache.lucene.analysis.standard Modifier and Type Class Description class
ClassicFilter
Normalizes tokens extracted with ClassicTokenizer. class
ClassicTokenizer
A grammar-based tokenizer constructed with JFlex. class
StandardFilter
Normalizes tokens extracted with StandardTokenizer. class
StandardTokenizer
A grammar-based tokenizer constructed with JFlex. class
UAX29URLEmailTokenizer
This class implements Word Break rules from the Unicode Text Segmentation algorithm, as specified in Unicode Standard Annex #29. URLs and email addresses are also tokenized according to the relevant RFCs.
Constructors in org.apache.lucene.analysis.standard with parameters of type TokenStream Constructor Description ClassicFilter(TokenStream in)
Construct filtering in.
StandardFilter(TokenStream in)
Deprecated. Use StandardFilter(Version, TokenStream) instead.
StandardFilter(Version matchVersion, TokenStream in)
-
Uses of TokenStream in org.apache.lucene.analysis.stempel
Subclasses of TokenStream in org.apache.lucene.analysis.stempel Modifier and Type Class Description class
StempelFilter
Transforms the token stream as per the stemming algorithm.
Constructors in org.apache.lucene.analysis.stempel with parameters of type TokenStream Constructor Description StempelFilter(TokenStream in, StempelStemmer stemmer)
Create filter using the supplied stemming table.
StempelFilter(TokenStream in, StempelStemmer stemmer, int minLength)
Create filter using the supplied stemming table. -
Uses of TokenStream in org.apache.lucene.analysis.sv
Subclasses of TokenStream in org.apache.lucene.analysis.sv Modifier and Type Class Description class
SwedishLightStemFilter
A TokenFilter that applies SwedishLightStemmer to stem Swedish words.
Constructors in org.apache.lucene.analysis.sv with parameters of type TokenStream Constructor Description SwedishLightStemFilter(TokenStream input)
-
Uses of TokenStream in org.apache.lucene.analysis.synonym
Subclasses of TokenStream in org.apache.lucene.analysis.synonym Modifier and Type Class Description class
SynonymFilter
Matches single- or multi-word synonyms in a token stream.
Constructors in org.apache.lucene.analysis.synonym with parameters of type TokenStream Constructor Description SynonymFilter(TokenStream input, SynonymMap synonyms, boolean ignoreCase)
-
Uses of TokenStream in org.apache.lucene.analysis.th
Subclasses of TokenStream in org.apache.lucene.analysis.th Modifier and Type Class Description class
ThaiWordFilter
TokenFilter that uses BreakIterator to break each Token that is Thai into separate Token(s) for each Thai word.
Constructors in org.apache.lucene.analysis.th with parameters of type TokenStream Constructor Description ThaiWordFilter(TokenStream input)
Deprecated. Use the ctor with matchVersion instead!
ThaiWordFilter(Version matchVersion, TokenStream input)
Creates a new ThaiWordFilter with the specified match version. -
Uses of TokenStream in org.apache.lucene.analysis.tr
Subclasses of TokenStream in org.apache.lucene.analysis.tr Modifier and Type Class Description class
TurkishLowerCaseFilter
Normalizes Turkish token text to lower case.
Constructors in org.apache.lucene.analysis.tr with parameters of type TokenStream Constructor Description TurkishLowerCaseFilter(TokenStream in)
Create a new TurkishLowerCaseFilter that normalizes Turkish token text to lower case. -
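A minimal sketch of the TurkishLowerCaseFilter constructor above, showing the Turkish-specific handling of dotted capital I (U+0130), assuming Lucene 3.6 on the classpath (the class name and input are illustrative):

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;

import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.analysis.WhitespaceTokenizer;
import org.apache.lucene.analysis.tokenattributes.CharTermAttribute;
import org.apache.lucene.analysis.tr.TurkishLowerCaseFilter;
import org.apache.lucene.util.Version;

public class TurkishLowerCaseDemo {
    // Lowercases with Turkish casing rules: U+0130 (dotted capital I)
    // maps to plain 'i', unlike Java's default Locale-insensitive toLowerCase.
    public static List<String> tokens() throws Exception {
        TokenStream ts = new WhitespaceTokenizer(Version.LUCENE_36, new StringReader("\u0130STANBUL"));
        ts = new TurkishLowerCaseFilter(ts);
        CharTermAttribute term = ts.addAttribute(CharTermAttribute.class);
        List<String> out = new ArrayList<String>();
        ts.reset();
        while (ts.incrementToken()) {
            out.add(term.toString());
        }
        ts.end();
        ts.close();
        return out;
    }

    public static void main(String[] args) throws Exception {
        System.out.println(tokens());
    }
}
```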
Uses of TokenStream in org.apache.lucene.analysis.wikipedia
Subclasses of TokenStream in org.apache.lucene.analysis.wikipedia Modifier and Type Class Description class
WikipediaTokenizer
Extension of StandardTokenizer that is aware of Wikipedia syntax. -
Uses of TokenStream in org.apache.lucene.collation
Subclasses of TokenStream in org.apache.lucene.collation Modifier and Type Class Description class
CollationKeyFilter
Converts each token into its CollationKey, and then encodes the CollationKey with IndexableBinaryStringTools, to allow it to be stored as an index term. class
ICUCollationKeyFilter
Converts each token into its CollationKey, and then encodes the CollationKey with IndexableBinaryStringTools, to allow it to be stored as an index term.
Methods in org.apache.lucene.collation that return TokenStream Modifier and Type Method Description TokenStream
CollationKeyAnalyzer. reusableTokenStream(String fieldName, Reader reader)
TokenStream
ICUCollationKeyAnalyzer. reusableTokenStream(String fieldName, Reader reader)
TokenStream
CollationKeyAnalyzer. tokenStream(String fieldName, Reader reader)
TokenStream
ICUCollationKeyAnalyzer. tokenStream(String fieldName, Reader reader)
Constructors in org.apache.lucene.collation with parameters of type TokenStream Constructor Description CollationKeyFilter(TokenStream input, Collator collator)
ICUCollationKeyFilter(TokenStream input, com.ibm.icu.text.Collator collator)
-
Uses of TokenStream in org.apache.lucene.document
Fields in org.apache.lucene.document declared as TokenStream Modifier and Type Field Description protected TokenStream
AbstractField. tokenStream
Methods in org.apache.lucene.document that return TokenStream Modifier and Type Method Description TokenStream
Field. tokenStreamValue()
The TokenStream for this field to be used when indexing, or null. TokenStream
Fieldable. tokenStreamValue()
The TokenStream for this field to be used when indexing, or null. TokenStream
NumericField. tokenStreamValue()
Returns a NumericTokenStream for indexing the numeric value.
Methods in org.apache.lucene.document with parameters of type TokenStream Modifier and Type Method Description void
Field. setTokenStream(TokenStream tokenStream)
Expert: sets the token stream to be used for indexing and causes isIndexed() and isTokenized() to return true.
Constructors in org.apache.lucene.document with parameters of type TokenStream Constructor Description Field(String name, TokenStream tokenStream)
Create a tokenized and indexed field that is not stored.
Field(String name, TokenStream tokenStream, Field.TermVector termVector)
Create a tokenized and indexed field that is not stored, optionally with storing term vectors. -
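The Field(String name, TokenStream tokenStream) constructor above creates a pre-analyzed, tokenized, unstored field; a minimal sketch assuming Lucene 3.6 on the classpath (the class name, field name, and input are illustrative):

```java
import java.io.StringReader;

import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.analysis.WhitespaceTokenizer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.util.Version;

public class PreAnalyzedFieldDemo {
    // Builds a tokenized, indexed, unstored field directly from a TokenStream.
    public static Field build() {
        TokenStream ts = new WhitespaceTokenizer(Version.LUCENE_36, new StringReader("pre analyzed tokens"));
        return new Field("body", ts);
    }

    public static void main(String[] args) {
        Field body = build();
        Document doc = new Document();
        doc.add(body);
        System.out.println(body.isIndexed() + " " + body.isStored());
    }
}
```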
Uses of TokenStream in org.apache.lucene.facet.enhancements
Subclasses of TokenStream in org.apache.lucene.facet.enhancements Modifier and Type Class Description class
EnhancementsCategoryTokenizer
A tokenizer which adds payload to each category token according to the CategoryEnhancements defined in the given EnhancementsIndexingParams.
Methods in org.apache.lucene.facet.enhancements that return TokenStream Modifier and Type Method Description protected TokenStream
EnhancementsDocumentBuilder. getParentsStream(CategoryAttributesStream categoryAttributesStream)
Methods in org.apache.lucene.facet.enhancements with parameters of type TokenStream Modifier and Type Method Description CategoryListTokenizer
CategoryEnhancement. getCategoryListTokenizer(TokenStream tokenizer, EnhancementsIndexingParams indexingParams, TaxonomyWriter taxonomyWriter)
Get the CategoryListTokenizer which generates the category list for this enhancement. protected CategoryListTokenizer
EnhancementsDocumentBuilder. getCategoryListTokenizer(TokenStream categoryStream)
protected CategoryTokenizer
EnhancementsDocumentBuilder. getCategoryTokenizer(TokenStream categoryStream)
Constructors in org.apache.lucene.facet.enhancements with parameters of type TokenStream Constructor Description EnhancementsCategoryTokenizer(TokenStream input, EnhancementsIndexingParams indexingParams)
Constructor. -
Uses of TokenStream in org.apache.lucene.facet.enhancements.association
Subclasses of TokenStream in org.apache.lucene.facet.enhancements.association Modifier and Type Class Description class
AssociationListTokenizer
Tokenizer for associations of a category.
Methods in org.apache.lucene.facet.enhancements.association with parameters of type TokenStream Modifier and Type Method Description CategoryListTokenizer
AssociationEnhancement. getCategoryListTokenizer(TokenStream tokenizer, EnhancementsIndexingParams indexingParams, TaxonomyWriter taxonomyWriter)
Constructors in org.apache.lucene.facet.enhancements.association with parameters of type TokenStream Constructor Description AssociationListTokenizer(TokenStream input, EnhancementsIndexingParams indexingParams, CategoryEnhancement enhancement)
-
Uses of TokenStream in org.apache.lucene.facet.index
Methods in org.apache.lucene.facet.index that return TokenStream Modifier and Type Method Description protected TokenStream
CategoryDocumentBuilder. getParentsStream(CategoryAttributesStream categoryAttributesStream)
Get a stream of categories which includes the parents, according to policies defined in indexing parameters.
Methods in org.apache.lucene.facet.index with parameters of type TokenStream Modifier and Type Method Description protected CategoryListTokenizer
CategoryDocumentBuilder. getCategoryListTokenizer(TokenStream categoryStream)
Get a category list tokenizer (or a series of such tokenizers) to create the category list tokens. protected CategoryTokenizer
CategoryDocumentBuilder. getCategoryTokenizer(TokenStream categoryStream)
Get a CategoryTokenizer to create the category tokens. protected CountingListTokenizer
CategoryDocumentBuilder. getCountingListTokenizer(TokenStream categoryStream)
Get a CountingListTokenizer for creating counting list tokens. -
Uses of TokenStream in org.apache.lucene.facet.index.streaming
Subclasses of TokenStream in org.apache.lucene.facet.index.streaming Modifier and Type Class Description class
CategoryAttributesStream
An attribute stream built from an Iterable of CategoryAttribute. class
CategoryListTokenizer
A base class for category list tokenizers, which add category list tokens to category streams. class
CategoryParentsStream
This class adds parents to a CategoryAttributesStream. class
CategoryTokenizer
Basic class for setting the CharTermAttributes and PayloadAttributes of category tokens. class
CategoryTokenizerBase
A base class for all token filters which add term and payload attributes to tokens and are to be used in CategoryDocumentBuilder. class
CountingListTokenizer
CategoryListTokenizer for facet counting.
Constructors in org.apache.lucene.facet.index.streaming with parameters of type TokenStream Constructor Description CategoryListTokenizer(TokenStream input, FacetIndexingParams indexingParams)
CategoryTokenizer(TokenStream input, FacetIndexingParams indexingParams)
CategoryTokenizerBase(TokenStream input, FacetIndexingParams indexingParams)
Constructor.
CountingListTokenizer(TokenStream input, FacetIndexingParams indexingParams)
-
Uses of TokenStream in org.apache.lucene.index.memory
Methods in org.apache.lucene.index.memory that return TokenStream Modifier and Type Method Description <T> TokenStream
MemoryIndex. keywordTokenStream(Collection<T> keywords)
Convenience method; creates and returns a token stream that generates a token for each keyword in the given collection, "as is", without any transforming text analysis.
Methods in org.apache.lucene.index.memory with parameters of type TokenStream Modifier and Type Method Description void
MemoryIndex. addField(String fieldName, TokenStream stream)
Equivalent to addField(fieldName, stream, 1.0f). void
MemoryIndex. addField(String fieldName, TokenStream stream, float boost)
Iterates over the given token stream and adds the resulting terms to the index; equivalent to adding a tokenized, indexed, termVectorStored, unstored Lucene Field. -
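MemoryIndex.addField and MemoryIndex.search can be combined as in this sketch (assuming Lucene 3.6 on the classpath; the class name, field name, and text are illustrative):

```java
import java.io.StringReader;

import org.apache.lucene.analysis.WhitespaceAnalyzer;
import org.apache.lucene.index.Term;
import org.apache.lucene.index.memory.MemoryIndex;
import org.apache.lucene.search.TermQuery;
import org.apache.lucene.util.Version;

public class MemoryIndexDemo {
    // Indexes one field into an in-memory index and scores a query
    // against it; a non-zero score means the query matched.
    public static float score() throws Exception {
        MemoryIndex index = new MemoryIndex();
        WhitespaceAnalyzer analyzer = new WhitespaceAnalyzer(Version.LUCENE_36);
        index.addField("content", analyzer.tokenStream("content", new StringReader("quick brown fox")));
        return index.search(new TermQuery(new Term("content", "fox")));
    }

    public static void main(String[] args) throws Exception {
        System.out.println(score() > 0);
    }
}
```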
Uses of TokenStream in org.apache.lucene.queryParser
Subclasses of TokenStream in org.apache.lucene.queryParser Modifier and Type Class Description static class
QueryParserTestBase.QPTestFilter
Filter which discards the token 'stop' and expands the token 'phrase' into 'phrase1 phrase2'.
Methods in org.apache.lucene.queryParser that return TokenStream Modifier and Type Method Description TokenStream
QueryParserTestBase.QPTestAnalyzer. tokenStream(String fieldName, Reader reader)
Constructors in org.apache.lucene.queryParser with parameters of type TokenStream Constructor Description QPTestFilter(TokenStream in)
-
Uses of TokenStream in org.apache.lucene.search.highlight
Subclasses of TokenStream in org.apache.lucene.search.highlight Modifier and Type Class Description class
OffsetLimitTokenFilter
This TokenFilter limits the number of tokens while indexing by adding up the current offset. class
TokenStreamFromTermPositionVector
Methods in org.apache.lucene.search.highlight that return TokenStream Modifier and Type Method Description static TokenStream
TokenSources. getAnyTokenStream(IndexReader reader, int docId, String field, Analyzer analyzer)
A convenience method that tries a number of approaches to getting a token stream. static TokenStream
TokenSources. getAnyTokenStream(IndexReader reader, int docId, String field, Document doc, Analyzer analyzer)
A convenience method that tries to first get a TermPositionVector for the specified docId, then falls back to using the passed-in Document to retrieve the TokenStream. static TokenStream
TokenSources. getTokenStream(String field, String contents, Analyzer analyzer)
static TokenStream
TokenSources. getTokenStream(Document doc, String field, Analyzer analyzer)
static TokenStream
TokenSources. getTokenStream(IndexReader reader, int docId, String field)
static TokenStream
TokenSources. getTokenStream(IndexReader reader, int docId, String field, Analyzer analyzer)
static TokenStream
TokenSources. getTokenStream(TermPositionVector tpv)
static TokenStream
TokenSources. getTokenStream(TermPositionVector tpv, boolean tokenPositionsGuaranteedContiguous)
Low level api. TokenStream
WeightedSpanTermExtractor. getTokenStream()
TokenStream
QueryScorer. init(TokenStream tokenStream)
TokenStream
QueryTermScorer. init(TokenStream tokenStream)
TokenStream
Scorer. init(TokenStream tokenStream)
Called to init the Scorer with a TokenStream.
Methods in org.apache.lucene.search.highlight with parameters of type TokenStream Modifier and Type Method Description String
Highlighter. getBestFragment(TokenStream tokenStream, String text)
Highlights chosen terms in a text, extracting the most relevant section. String[]
Highlighter. getBestFragments(TokenStream tokenStream, String text, int maxNumFragments)
Highlights chosen terms in a text, extracting the most relevant sections. String
Highlighter. getBestFragments(TokenStream tokenStream, String text, int maxNumFragments, String separator)
Highlights terms in the text, extracting the most relevant sections and concatenating the chosen fragments with a separator (typically "..."). TextFragment[]
Highlighter. getBestTextFragments(TokenStream tokenStream, String text, boolean mergeContiguousFragments, int maxNumFragments)
Low level api to get the most relevant (formatted) sections of the document. Map<String,WeightedSpanTerm>
WeightedSpanTermExtractor. getWeightedSpanTerms(Query query, TokenStream tokenStream)
Creates a Map of WeightedSpanTerms from the given Query and TokenStream. Map<String,WeightedSpanTerm>
WeightedSpanTermExtractor. getWeightedSpanTerms(Query query, TokenStream tokenStream, String fieldName)
Creates a Map of WeightedSpanTerms from the given Query and TokenStream. Map<String,WeightedSpanTerm>
WeightedSpanTermExtractor. getWeightedSpanTermsWithScores(Query query, TokenStream tokenStream, String fieldName, IndexReader reader)
Creates a Map of WeightedSpanTerms from the given Query and TokenStream. TokenStream
QueryScorer. init(TokenStream tokenStream)
TokenStream
QueryTermScorer. init(TokenStream tokenStream)
TokenStream
Scorer. init(TokenStream tokenStream)
Called to init the Scorer with a TokenStream. void
Fragmenter. start(String originalText, TokenStream tokenStream)
Initializes the Fragmenter. void
NullFragmenter. start(String s, TokenStream tokenStream)
void
SimpleFragmenter. start(String originalText, TokenStream stream)
void
SimpleSpanFragmenter. start(String originalText, TokenStream tokenStream)
Constructors in org.apache.lucene.search.highlight with parameters of type TokenStream Constructor Description OffsetLimitTokenFilter(TokenStream input, int offsetLimit)
TokenGroup(TokenStream tokenStream)
-
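Several of the highlight methods above combine as in this sketch of Highlighter.getBestFragment with a QueryScorer (assuming Lucene 3.6 on the classpath; the class name, field name, and text are illustrative):

```java
import java.io.StringReader;

import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.analysis.WhitespaceAnalyzer;
import org.apache.lucene.index.Term;
import org.apache.lucene.search.TermQuery;
import org.apache.lucene.search.highlight.Highlighter;
import org.apache.lucene.search.highlight.QueryScorer;
import org.apache.lucene.util.Version;

public class HighlightDemo {
    // Highlights the query term in a piece of text using the default
    // formatter, which wraps matches in <B>...</B>.
    public static String fragment() throws Exception {
        String text = "the quick brown fox jumps over the lazy dog";
        Highlighter highlighter = new Highlighter(new QueryScorer(new TermQuery(new Term("f", "fox"))));
        TokenStream ts = new WhitespaceAnalyzer(Version.LUCENE_36).tokenStream("f", new StringReader(text));
        return highlighter.getBestFragment(ts, text);
    }

    public static void main(String[] args) throws Exception {
        System.out.println(fragment());
    }
}
```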