Package | Description |
---|---|
org.apache.lucene.analysis | API and code to convert text into indexable/searchable tokens. |
org.apache.lucene.analysis.standard | Standards-based analyzers implemented with JFlex. |
org.apache.lucene.collation | CollationKeyFilter converts each token into its binary CollationKey using the provided Collator, and then encodes the CollationKey as a String using IndexableBinaryStringTools, to allow it to be stored as an index term. |
org.apache.lucene.index | Code to maintain and access indices. |
org.apache.lucene.util | Some utility classes. |
Modifier and Type | Class | Description |
---|---|---|
class | ASCIIFoldingFilter | This class converts alphabetic, numeric, and symbolic Unicode characters which are not in the first 127 ASCII characters (the "Basic Latin" Unicode block) into their ASCII equivalents, if one exists. |
class | CachingTokenFilter | This class can be used if the token attributes of a TokenStream are intended to be consumed more than once. |
class | CharTokenizer | An abstract base class for simple, character-oriented tokenizers. |
class | FilteringTokenFilter | Abstract base class for TokenFilters that may remove tokens. |
class | ISOLatin1AccentFilter | Deprecated. If you build a new index, use ASCIIFoldingFilter which covers a superset of Latin 1. |
class | KeywordMarkerFilter | Marks terms as keywords via the KeywordAttribute. |
class | KeywordTokenizer | Emits the entire input as a single token. |
class | LengthFilter | Removes words that are too long or too short from the stream. |
class | LetterTokenizer | A LetterTokenizer is a tokenizer that divides text at non-letters. |
class | LimitTokenCountFilter | This TokenFilter limits the number of tokens while indexing. |
class | LowerCaseFilter | Normalizes token text to lower case. |
class | LowerCaseTokenizer | LowerCaseTokenizer performs the function of LetterTokenizer and LowerCaseFilter together. |
class | NumericTokenStream | Expert: This class provides a TokenStream for indexing numeric values that can be used by NumericRangeQuery or NumericRangeFilter. |
class | PorterStemFilter | Transforms the token stream as per the Porter stemming algorithm. |
class | StopFilter | Removes stop words from a token stream. |
class | TeeSinkTokenFilter | This TokenFilter provides the ability to set aside attribute states that have already been analyzed. |
static class | TeeSinkTokenFilter.SinkTokenStream | TokenStream output from a tee with optional filtering. |
class | TokenFilter | A TokenFilter is a TokenStream whose input is another TokenStream. |
class | Tokenizer | A Tokenizer is a TokenStream whose input is a Reader. |
class | TokenStream | |
class | TypeTokenFilter | Removes tokens whose types appear in a set of blocked types from a token stream. |
class | WhitespaceTokenizer | A WhitespaceTokenizer is a tokenizer that divides text at whitespace. |
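
Several of the classes listed above are meant to be chained: a tokenizer produces tokens and each filter transforms or drops them. The following is a minimal plain-Java sketch of that idea, not Lucene code. The class and method names are hypothetical, and the accent folding uses java.text.Normalizer as a stand-in that only removes combining marks, whereas ASCIIFoldingFilter uses its own mapping and covers a wider set of characters.

```java
import java.text.Normalizer;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

// Plain-Java approximation of a tokenizer followed by a chain of filters.
// None of these methods are Lucene API; all names are hypothetical.
class AnalysisChainSketch {

    // Rough stand-in for WhitespaceTokenizer: divide text at whitespace.
    static List<String> tokenizeAtWhitespace(String text) {
        List<String> tokens = new ArrayList<>();
        for (String piece : text.trim().split("\\s+")) {
            if (!piece.isEmpty()) {
                tokens.add(piece);
            }
        }
        return tokens;
    }

    // Rough stand-in for LowerCaseFilter.
    static String lowerCase(String token) {
        return token.toLowerCase(Locale.ROOT);
    }

    // Stand-in for ASCIIFoldingFilter (assumption: NFD decomposition followed
    // by removal of combining marks, which handles accented letters only).
    static String foldAccents(String token) {
        String decomposed = Normalizer.normalize(token, Normalizer.Form.NFD);
        return decomposed.replaceAll("\\p{M}+", "");
    }

    // Tokenize, normalize each token, then apply stop-word and length checks
    // in the spirit of StopFilter and LengthFilter.
    static List<String> analyze(String text, Set<String> stopWords, int minLen, int maxLen) {
        List<String> out = new ArrayList<>();
        for (String token : tokenizeAtWhitespace(text)) {
            String term = foldAccents(lowerCase(token));
            if (stopWords.contains(term)) {
                continue; // dropped as a stop word
            }
            if (term.length() < minLen || term.length() > maxLen) {
                continue; // dropped as too short or too long
            }
            out.add(term);
        }
        return out;
    }

    public static void main(String[] args) {
        Set<String> stop = new HashSet<>(Arrays.asList("the", "a"));
        // prints [cafe, serves, resume]
        System.out.println(analyze("The Caf\u00e9 serves a R\u00e9sum\u00e9", stop, 2, 10));
    }
}
```

In Lucene itself each stage is a TokenStream that shares attributes with its input rather than passing Strings around; the sketch trades that design for brevity.
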
Modifier and Type | Method | Description |
---|---|---|
abstract boolean | TeeSinkTokenFilter.SinkFilter.accept(AttributeSource source) | Returns true, iff the current state of the passed-in AttributeSource shall be stored in the sink. |
Constructor | Description |
---|---|
CharTokenizer(AttributeSource source, Reader input) | Deprecated. Use CharTokenizer(Version, AttributeSource, Reader) instead. |
CharTokenizer(Version matchVersion, AttributeSource source, Reader input) | Creates a new CharTokenizer instance. |
KeywordTokenizer(AttributeSource source, Reader input, int bufferSize) | |
LetterTokenizer(AttributeSource source, Reader in) | Deprecated. Use LetterTokenizer(Version, AttributeSource, Reader) instead. |
LetterTokenizer(Version matchVersion, AttributeSource source, Reader in) | Construct a new LetterTokenizer using a given AttributeSource. |
LowerCaseTokenizer(AttributeSource source, Reader in) | Deprecated. Use LowerCaseTokenizer(Version, AttributeSource, Reader) instead. |
LowerCaseTokenizer(Version matchVersion, AttributeSource source, Reader in) | Construct a new LowerCaseTokenizer using a given AttributeSource. |
NumericTokenStream(AttributeSource source, int precisionStep) | Expert: Creates a token stream for numeric values with the specified precisionStep using the given AttributeSource. |
Tokenizer(AttributeSource source) | Deprecated. Use Tokenizer(AttributeSource, Reader) instead. |
Tokenizer(AttributeSource source, Reader input) | Construct a token stream processing the given input using the given AttributeSource. |
TokenStream(AttributeSource input) | A TokenStream that uses the same attributes as the supplied one. |
WhitespaceTokenizer(AttributeSource source, Reader in) | Deprecated. |
WhitespaceTokenizer(Version matchVersion, AttributeSource source, Reader in) | Construct a new WhitespaceTokenizer using a given AttributeSource. |
Modifier and Type | Class | Description |
---|---|---|
class | ClassicFilter | Normalizes tokens extracted with ClassicTokenizer. |
class | ClassicTokenizer | A grammar-based tokenizer constructed with JFlex. |
class | StandardFilter | Normalizes tokens extracted with StandardTokenizer. |
class | StandardTokenizer | A grammar-based tokenizer constructed with JFlex. |
class | UAX29URLEmailTokenizer | This class implements Word Break rules from the Unicode Text Segmentation algorithm, as specified in Unicode Standard Annex #29. URLs and email addresses are also tokenized according to the relevant RFCs. |
Constructor | Description |
---|---|
ClassicTokenizer(Version matchVersion, AttributeSource source, Reader input) | Creates a new ClassicTokenizer with a given AttributeSource. |
StandardTokenizer(Version matchVersion, AttributeSource source, Reader input) | Creates a new StandardTokenizer with a given AttributeSource. |
UAX29URLEmailTokenizer(AttributeSource source, Reader input) | Deprecated. |
UAX29URLEmailTokenizer(Version matchVersion, AttributeSource source, Reader input) | Creates a new UAX29URLEmailTokenizer with a given AttributeSource. |
Modifier and Type | Class | Description |
---|---|---|
class | CollationKeyFilter | Converts each token into its CollationKey, and then encodes the CollationKey with IndexableBinaryStringTools, to allow it to be stored as an index term. |
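
The conversion described for CollationKeyFilter can be illustrated with the JDK alone. The sketch below is an assumption-laden approximation, not the filter's implementation: it obtains a CollationKey from a java.text.Collator, takes its bytes, and encodes them as lowercase hex. Hex is only a readable stand-in for IndexableBinaryStringTools, chosen because plain String comparison of the hex form preserves the byte order. The class and method names are hypothetical.

```java
import java.text.CollationKey;
import java.text.Collator;
import java.util.Locale;

// JDK-only sketch of "token -> CollationKey -> bytes -> String".
// Not Lucene code; hex replaces IndexableBinaryStringTools for readability.
class CollationKeySketch {

    // Convert a token to the binary form of its CollationKey.
    static byte[] keyBytes(Collator collator, String token) {
        CollationKey key = collator.getCollationKey(token);
        return key.toByteArray();
    }

    // Encode bytes as two lowercase hex digits each, so that comparing the
    // resulting Strings gives the same order as comparing unsigned bytes.
    static String toHex(byte[] bytes) {
        StringBuilder sb = new StringBuilder(bytes.length * 2);
        for (byte b : bytes) {
            sb.append(Character.forDigit((b >> 4) & 0xf, 16));
            sb.append(Character.forDigit(b & 0xf, 16));
        }
        return sb.toString();
    }

    // The String that would stand in for the index term in this sketch.
    static String indexTerm(Collator collator, String token) {
        return toHex(keyBytes(collator, token));
    }

    public static void main(String[] args) {
        Collator english = Collator.getInstance(Locale.ENGLISH);
        String apple = indexTerm(english, "apple");
        String banana = indexTerm(english, "Banana");
        // Raw code-point order puts "Banana" first because 'B' < 'a'.
        System.out.println("raw:      " + ("apple".compareTo("Banana") < 0));
        // The encoded collation keys follow the collator's order instead.
        System.out.println("collated: " + (apple.compareTo(banana) < 0));
    }
}
```

The point of the encoding step is that an index compares terms as plain Strings, so locale-aware ordering has to be baked into the term itself.
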
Modifier and Type | Method | Description |
---|---|---|
AttributeSource | FieldInvertState.getAttributeSource() | |
Modifier and Type | Method | Description |
---|---|---|
AttributeSource | AttributeSource.cloneAttributes() | Performs a clone of all AttributeImpl instances returned in a new AttributeSource instance. |
Modifier and Type | Method | Description |
---|---|---|
void | AttributeSource.copyTo(AttributeSource target) | Copies the contents of this AttributeSource to the given target AttributeSource. |
Constructor | Description |
---|---|
AttributeSource(AttributeSource input) | An AttributeSource that uses the same attributes as the supplied one. |
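
The difference between a source that "uses the same attributes as the supplied one" and one produced by cloneAttributes() is easy to miss. The toy model below is a hedged illustration in plain Java, not the Lucene implementation: ToyAttributeSource and its String-keyed map are hypothetical simplifications (Lucene keys attributes by interface class). It shows sharing versus copying only.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Toy model, not Lucene code: contrasts sharing attribute instances
// with cloning them into an independent source.
class ToyAttributeSource {

    // Attribute values keyed by name (a simplification for this sketch).
    private final Map<String, StringBuilder> attributes;

    // A fresh source with its own, initially empty, set of attributes.
    ToyAttributeSource() {
        this.attributes = new LinkedHashMap<>();
    }

    // Shares (does not copy) the attributes of the supplied source, which is
    // how a filter can see the values written by the stream it wraps.
    ToyAttributeSource(ToyAttributeSource input) {
        this.attributes = input.attributes;
    }

    // Returns the existing attribute instance, or registers a new one.
    StringBuilder addAttribute(String name) {
        return attributes.computeIfAbsent(name, k -> new StringBuilder());
    }

    // Returns a new source holding independent copies of every attribute.
    ToyAttributeSource cloneAttributes() {
        ToyAttributeSource copy = new ToyAttributeSource();
        for (Map.Entry<String, StringBuilder> e : attributes.entrySet()) {
            copy.attributes.put(e.getKey(), new StringBuilder(e.getValue()));
        }
        return copy;
    }

    public static void main(String[] args) {
        ToyAttributeSource tokenizer = new ToyAttributeSource();
        StringBuilder term = tokenizer.addAttribute("term");
        term.append("hello");

        ToyAttributeSource filter = new ToyAttributeSource(tokenizer); // shared
        ToyAttributeSource snapshot = tokenizer.cloneAttributes();     // copied

        term.setLength(0);
        term.append("world");

        System.out.println(filter.addAttribute("term"));   // prints world
        System.out.println(snapshot.addAttribute("term")); // prints hello
    }
}
```
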
Copyright © 2000-2018 Apache Software Foundation. All Rights Reserved.