Uses of Class
org.apache.lucene.analysis.TokenStream
-
Packages that use TokenStream:

org.apache.lucene.analysis
    API and code to convert text into indexable/searchable tokens.
org.apache.lucene.analysis.standard
    Standards-based analyzers implemented with JFlex.
org.apache.lucene.collation
    CollationKeyFilter converts each token into its binary CollationKey using the provided Collator, and then encodes the CollationKey as a String using IndexableBinaryStringTools, to allow it to be stored as an index term.
org.apache.lucene.document
    The logical representation of a Document for indexing and searching.
-
-
Uses of TokenStream in org.apache.lucene.analysis
Subclasses of TokenStream in org.apache.lucene.analysis:

class ASCIIFoldingFilter
    Converts alphabetic, numeric, and symbolic Unicode characters that are not among the first 127 ASCII characters (the "Basic Latin" Unicode block) into their ASCII equivalents, where such equivalents exist.
class CachingTokenFilter
    Can be used when the token attributes of a TokenStream are intended to be consumed more than once.
class CharTokenizer
    An abstract base class for simple, character-oriented tokenizers.
class FilteringTokenFilter
    Abstract base class for TokenFilters that may remove tokens.
class ISOLatin1AccentFilter
    Deprecated. If you build a new index, use ASCIIFoldingFilter, which covers a superset of Latin-1.
class KeywordMarkerFilter
    Marks terms as keywords via the KeywordAttribute.
class KeywordTokenizer
    Emits the entire input as a single token.
class LengthFilter
    Removes words that are too long or too short from the stream.
class LetterTokenizer
    A tokenizer that divides text at non-letters.
class LimitTokenCountFilter
    A TokenFilter that limits the number of tokens while indexing.
class LowerCaseFilter
    Normalizes token text to lower case.
class LowerCaseTokenizer
    Performs the function of LetterTokenizer and LowerCaseFilter together.
class NumericTokenStream
    Expert: provides a TokenStream for indexing numeric values that can be used by NumericRangeQuery or NumericRangeFilter.
class PorterStemFilter
    Transforms the token stream as per the Porter stemming algorithm.
class StopFilter
    Removes stop words from a token stream.
class TeeSinkTokenFilter
    Provides the ability to set aside attribute states that have already been analyzed.
static class TeeSinkTokenFilter.SinkTokenStream
    TokenStream output from a tee, with optional filtering.
class TokenFilter
    A TokenStream whose input is another TokenStream.
class Tokenizer
    A TokenStream whose input is a Reader.
class TypeTokenFilter
    Removes from the token stream any tokens whose types appear in a set of blocked types.
class WhitespaceTokenizer
    A tokenizer that divides text at whitespace.

Fields in org.apache.lucene.analysis declared as TokenStream:

protected TokenStream TokenFilter.input
    The source of tokens for this filter.
protected TokenStream ReusableAnalyzerBase.TokenStreamComponents.sink
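The protected TokenFilter.input field above is the heart of the decorator design: every filter wraps another TokenStream and pulls tokens from it on demand. The following stdlib-only sketch mimics that pattern; the "Mini*" class names are hypothetical stand-ins, and the real Lucene API uses incrementToken() with an attribute-based interface rather than returning Strings.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;
import java.util.Locale;

// Stripped-down stand-in for TokenStream: yields tokens one at a time.
abstract class MiniTokenStream {
    abstract String next(); // next token, or null when the stream is exhausted
}

// Stand-in for Tokenizer: the head of the chain, splitting raw text.
class MiniWhitespaceTokenizer extends MiniTokenStream {
    private final Iterator<String> it;
    MiniWhitespaceTokenizer(String text) {
        it = Arrays.asList(text.trim().split("\\s+")).iterator();
    }
    String next() { return it.hasNext() ? it.next() : null; }
}

// Stand-in for TokenFilter: holds a protected "input" stream, mirroring
// the role of the TokenFilter.input field documented above.
abstract class MiniTokenFilter extends MiniTokenStream {
    protected final MiniTokenStream input;
    MiniTokenFilter(MiniTokenStream input) { this.input = input; }
}

// Lower-cases each token pulled from its input, like LowerCaseFilter.
class MiniLowerCaseFilter extends MiniTokenFilter {
    MiniLowerCaseFilter(MiniTokenStream in) { super(in); }
    String next() {
        String t = input.next();
        return t == null ? null : t.toLowerCase(Locale.ROOT);
    }
}

public class FilterChainSketch {
    // Compose tokenizer and filter, then drain the chain.
    public static List<String> analyze(String text) {
        MiniTokenStream ts = new MiniLowerCaseFilter(new MiniWhitespaceTokenizer(text));
        List<String> out = new ArrayList<>();
        for (String t = ts.next(); t != null; t = ts.next()) out.add(t);
        return out;
    }
}
```

The outermost filter's next() triggers a pull all the way down the chain, which is why a consumer only ever talks to the last stream in the stack.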
Methods in org.apache.lucene.analysis that return TokenStream:

protected TokenStream ReusableAnalyzerBase.TokenStreamComponents.getTokenStream()
    Returns the sink TokenStream.
TokenStream Analyzer.reusableTokenStream(String fieldName, Reader reader)
    Creates a TokenStream that is allowed to be re-used from the previous time the same thread called this method.
TokenStream LimitTokenCountAnalyzer.reusableTokenStream(String fieldName, Reader reader)
TokenStream PerFieldAnalyzerWrapper.reusableTokenStream(String fieldName, Reader reader)
TokenStream ReusableAnalyzerBase.reusableTokenStream(String fieldName, Reader reader)
    Uses ReusableAnalyzerBase.createComponents(String, Reader) to obtain an instance of ReusableAnalyzerBase.TokenStreamComponents.
abstract TokenStream Analyzer.tokenStream(String fieldName, Reader reader)
    Creates a TokenStream that tokenizes all the text in the provided Reader.
TokenStream LimitTokenCountAnalyzer.tokenStream(String fieldName, Reader reader)
TokenStream PerFieldAnalyzerWrapper.tokenStream(String fieldName, Reader reader)
TokenStream ReusableAnalyzerBase.tokenStream(String fieldName, Reader reader)
    Uses ReusableAnalyzerBase.createComponents(String, Reader) to obtain an instance of ReusableAnalyzerBase.TokenStreamComponents and returns the sink of the components.

Constructors in org.apache.lucene.analysis with parameters of type TokenStream:

ASCIIFoldingFilter(TokenStream input)
CachingTokenFilter(TokenStream input)
FilteringTokenFilter(boolean enablePositionIncrements, TokenStream input)
ISOLatin1AccentFilter(TokenStream input)
    Deprecated.
KeywordMarkerFilter(TokenStream in, Set<?> keywordSet)
    Creates a new KeywordMarkerFilter that marks the current token as a keyword, via the KeywordAttribute, if the token's term buffer is contained in the given set.
KeywordMarkerFilter(TokenStream in, CharArraySet keywordSet)
    Creates a new KeywordMarkerFilter that marks the current token as a keyword, via the KeywordAttribute, if the token's term buffer is contained in the given set.
LengthFilter(boolean enablePositionIncrements, TokenStream in, int min, int max)
    Builds a filter that removes words that are too long or too short from the text.
LengthFilter(TokenStream in, int min, int max)
    Deprecated. Use LengthFilter(boolean, TokenStream, int, int) instead.
LimitTokenCountFilter(TokenStream in, int maxTokenCount)
    Builds a filter that only accepts tokens up to a maximum number.
LowerCaseFilter(TokenStream in)
    Deprecated. Use LowerCaseFilter(Version, TokenStream) instead.
LowerCaseFilter(Version matchVersion, TokenStream in)
    Creates a new LowerCaseFilter that normalizes token text to lower case.
PorterStemFilter(TokenStream in)
StopFilter(boolean enablePositionIncrements, TokenStream in, Set<?> stopWords)
    Deprecated. Use StopFilter(Version, TokenStream, Set) instead.
StopFilter(boolean enablePositionIncrements, TokenStream input, Set<?> stopWords, boolean ignoreCase)
    Deprecated. Use StopFilter(Version, TokenStream, Set) instead.
StopFilter(Version matchVersion, TokenStream in, Set<?> stopWords)
    Constructs a filter that removes from the input TokenStream the words named in the Set.
StopFilter(Version matchVersion, TokenStream input, Set<?> stopWords, boolean ignoreCase)
    Deprecated. Use StopFilter(Version, TokenStream, Set) instead.
TeeSinkTokenFilter(TokenStream input)
    Instantiates a new TeeSinkTokenFilter.
TokenFilter(TokenStream input)
    Constructs a token stream filtering the given input.
TokenStreamComponents(Tokenizer source, TokenStream result)
    Creates a new ReusableAnalyzerBase.TokenStreamComponents instance.
TypeTokenFilter(boolean enablePositionIncrements, TokenStream input, Set<String> stopTypes)
TypeTokenFilter(boolean enablePositionIncrements, TokenStream input, Set<String> stopTypes, boolean useWhiteList)
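Several of these constructors (StopFilter, LengthFilter, TypeTokenFilter) configure token removal, with enablePositionIncrements controlling whether dropped tokens leave a positional gap. A minimal stdlib-only sketch of the removal semantics alone, applied to a plain token list; the class name is hypothetical, and position increments are deliberately ignored here:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

public class RemovalFilterSketch {
    // Mimics StopFilter + LengthFilter behavior on a token list:
    // drop tokens found in stopWords, then drop tokens whose length
    // falls outside [min, max]. Real Lucene filters stream tokens and
    // can additionally adjust position increments for the gaps.
    public static List<String> filter(List<String> tokens, Set<String> stopWords,
                                      int min, int max) {
        List<String> out = new ArrayList<>();
        for (String t : tokens) {
            if (stopWords.contains(t)) continue;                // StopFilter-like
            if (t.length() < min || t.length() > max) continue; // LengthFilter-like
            out.add(t);
        }
        return out;
    }
}
```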
-
Uses of TokenStream in org.apache.lucene.analysis.standard
Subclasses of TokenStream in org.apache.lucene.analysis.standard:

class ClassicFilter
    Normalizes tokens extracted with ClassicTokenizer.
class ClassicTokenizer
    A grammar-based tokenizer constructed with JFlex.
class StandardFilter
    Normalizes tokens extracted with StandardTokenizer.
class StandardTokenizer
    A grammar-based tokenizer constructed with JFlex.
class UAX29URLEmailTokenizer
    Implements the Word Break rules from the Unicode Text Segmentation algorithm, as specified in Unicode Standard Annex #29; URLs and email addresses are also tokenized according to the relevant RFCs.

Constructors in org.apache.lucene.analysis.standard with parameters of type TokenStream:

ClassicFilter(TokenStream in)
    Constructs a filter over in.
StandardFilter(TokenStream in)
    Deprecated. Use StandardFilter(Version, TokenStream) instead.
StandardFilter(Version matchVersion, TokenStream in)
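The point of UAX29URLEmailTokenizer over plain word splitting is that URLs and e-mail addresses survive as single tokens. A rough stdlib illustration of that difference; the class name and regex are illustrative assumptions only, since the real tokenizer is JFlex-generated and follows the full UAX #29 rules:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class EmailAwareTokenizerSketch {
    // Try to match an e-mail-like run first; otherwise fall back to a
    // plain run of word characters. Far simpler than the RFC grammar.
    private static final Pattern TOKEN =
        Pattern.compile("[\\w.+-]+@[\\w.-]+|\\w+");

    public static List<String> tokenize(String text) {
        List<String> out = new ArrayList<>();
        Matcher m = TOKEN.matcher(text);
        while (m.find()) out.add(m.group());
        return out;
    }
}
```

A whitespace or letter tokenizer would instead shatter "bob@example.com" into three or more fragments, which is exactly what this tokenizer class exists to avoid.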
-
Uses of TokenStream in org.apache.lucene.collation
Subclasses of TokenStream in org.apache.lucene.collation:

class CollationKeyFilter
    Converts each token into its CollationKey, and then encodes the CollationKey with IndexableBinaryStringTools, to allow it to be stored as an index term.

Methods in org.apache.lucene.collation that return TokenStream:

TokenStream CollationKeyAnalyzer.reusableTokenStream(String fieldName, Reader reader)
TokenStream CollationKeyAnalyzer.tokenStream(String fieldName, Reader reader)

Constructors in org.apache.lucene.collation with parameters of type TokenStream:

CollationKeyFilter(TokenStream input, Collator collator)
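The idea behind CollationKeyFilter can be previewed with the JDK alone: java.text.Collator produces CollationKeys whose byte-wise order matches the collator's sort order, so storing an encoding of those bytes as index terms makes plain term-order range queries locale-correct. A small sketch under that assumption; the hex encoding here is a stand-in for IndexableBinaryStringTools, and the class name is hypothetical:

```java
import java.text.Collator;

public class CollationKeySketch {
    // Encode a term's CollationKey bytes as a lowercase hex string.
    // Lexicographic order of these strings matches the byte order of
    // the keys, and therefore the Collator's sort order.
    public static String encode(String term, Collator collator) {
        byte[] bytes = collator.getCollationKey(term).toByteArray();
        StringBuilder sb = new StringBuilder(bytes.length * 2);
        for (byte b : bytes) sb.append(String.format("%02x", b));
        return sb.toString();
    }

    // Compare two terms purely through their encoded keys.
    public static boolean sortsBefore(String a, String b, Collator c) {
        return encode(a, c).compareTo(encode(b, c)) < 0;
    }
}
```

Because comparison happens on the encoded strings, an index that stores them can answer locale-sensitive range queries with ordinary term comparisons.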
-
Uses of TokenStream in org.apache.lucene.document
Fields in org.apache.lucene.document declared as TokenStream:

protected TokenStream AbstractField.tokenStream

Methods in org.apache.lucene.document that return TokenStream:

TokenStream Field.tokenStreamValue()
    The TokenStream for this field to be used when indexing, or null.
TokenStream Fieldable.tokenStreamValue()
    The TokenStream for this field to be used when indexing, or null.
TokenStream NumericField.tokenStreamValue()
    Returns a NumericTokenStream for indexing the numeric value.

Methods in org.apache.lucene.document with parameters of type TokenStream:

void Field.setTokenStream(TokenStream tokenStream)
    Expert: sets the token stream to be used for indexing and causes isIndexed() and isTokenized() to return true.

Constructors in org.apache.lucene.document with parameters of type TokenStream:

Field(String name, TokenStream tokenStream)
    Creates a tokenized and indexed field that is not stored.
Field(String name, TokenStream tokenStream, Field.TermVector termVector)
    Creates a tokenized and indexed field that is not stored, optionally storing term vectors.
-