Uses of Class
org.apache.lucene.analysis.Tokenizer
Packages that use Tokenizer:
- org.apache.lucene.analysis: API and code to convert text into indexable/searchable tokens.
- org.apache.lucene.analysis.ar: Analyzer for Arabic.
- org.apache.lucene.analysis.cjk: Analyzer for Chinese, Japanese, and Korean, which indexes bigrams (overlapping groups of two adjacent Han characters).
- org.apache.lucene.analysis.cn: Analyzer for Chinese, which indexes unigrams (individual Chinese characters).
- org.apache.lucene.analysis.cn.smart: Analyzer for Simplified Chinese, which indexes words.
- org.apache.lucene.analysis.icu.segmentation: Tokenizer that breaks text into words with the Unicode Text Segmentation algorithm.
- org.apache.lucene.analysis.in: Analysis components for Indian languages.
- org.apache.lucene.analysis.ja: Analyzer for Japanese.
- org.apache.lucene.analysis.ngram: Character n-gram tokenizers and filters.
- org.apache.lucene.analysis.path: Analysis components for path-like strings such as filenames.
- org.apache.lucene.analysis.ru: Analyzer for Russian.
- org.apache.lucene.analysis.standard: Standards-based analyzers implemented with JFlex.
- org.apache.lucene.analysis.wikipedia: Tokenizer that is aware of Wikipedia syntax.
Uses of Tokenizer in org.apache.lucene.analysis
Subclasses of Tokenizer in org.apache.lucene.analysis:
- CharTokenizer: An abstract base class for simple, character-oriented tokenizers.
- EmptyTokenizer: Emits no tokens.
- KeywordTokenizer: Emits the entire input as a single token.
- LetterTokenizer: A tokenizer that divides text at non-letters.
- LowerCaseTokenizer: Performs the function of LetterTokenizer and LowerCaseFilter together.
- MockTokenizer: Tokenizer for testing.
- WhitespaceTokenizer: A tokenizer that divides text at whitespace.

Fields in org.apache.lucene.analysis declared as Tokenizer:
- protected Tokenizer ReusableAnalyzerBase.TokenStreamComponents.source

Constructors in org.apache.lucene.analysis with parameters of type Tokenizer:
- TokenStreamComponents(Tokenizer source): Creates a new ReusableAnalyzerBase.TokenStreamComponents instance.
- TokenStreamComponents(Tokenizer source, TokenStream result): Creates a new ReusableAnalyzerBase.TokenStreamComponents instance.
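Several of the subclasses above (LetterTokenizer, WhitespaceTokenizer) follow the CharTokenizer pattern: text is split wherever a character fails a per-subclass predicate. The following is a minimal stdlib sketch of that splitting logic, not the Lucene implementation (which streams from a Reader in fixed-size buffers):

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.IntPredicate;

public class CharTokenizerSketch {
    // Collect maximal runs of characters accepted by the predicate,
    // mimicking how CharTokenizer subclasses decide token boundaries.
    static List<String> tokenize(String text, IntPredicate isTokenChar) {
        List<String> tokens = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            if (isTokenChar.test(c)) {
                current.append(c);
            } else if (current.length() > 0) {
                tokens.add(current.toString());
                current.setLength(0);
            }
        }
        if (current.length() > 0) tokens.add(current.toString());
        return tokens;
    }

    public static void main(String[] args) {
        // WhitespaceTokenizer-like: token chars are anything but whitespace.
        System.out.println(tokenize("foo  bar baz", c -> !Character.isWhitespace(c))); // [foo, bar, baz]
        // LetterTokenizer-like: token chars are letters only.
        System.out.println(tokenize("don't stop", Character::isLetter)); // [don, t, stop]
    }
}
```

LowerCaseTokenizer would additionally lower-case each accepted character as it is collected.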
Uses of Tokenizer in org.apache.lucene.analysis.ar
Subclasses of Tokenizer in org.apache.lucene.analysis.ar:
- ArabicLetterTokenizer: Deprecated (3.1). Use StandardTokenizer instead.
Uses of Tokenizer in org.apache.lucene.analysis.cjk
Subclasses of Tokenizer in org.apache.lucene.analysis.cjk:
- CJKTokenizer: Deprecated. Use StandardTokenizer, CJKWidthFilter, CJKBigramFilter, and LowerCaseFilter instead.
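The bigram behavior these CJK components share (overlapping pairs of adjacent Han characters, per the package description above) can be illustrated with a short stdlib sketch; this shows the concept only, not the Lucene code:

```java
import java.util.ArrayList;
import java.util.List;

public class BigramSketch {
    // Emit overlapping bigrams over a run of characters, the way the
    // CJK analyzers index pairs of adjacent Han characters.
    static List<String> bigrams(String run) {
        List<String> out = new ArrayList<>();
        for (int i = 0; i + 1 < run.length(); i++) {
            out.add(run.substring(i, i + 2));
        }
        return out;
    }

    public static void main(String[] args) {
        // Latin letters stand in for Han characters here.
        System.out.println(bigrams("ABCD")); // [AB, BC, CD]
    }
}
```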
Uses of Tokenizer in org.apache.lucene.analysis.cn
Subclasses of Tokenizer in org.apache.lucene.analysis.cn:
- ChineseTokenizer: Deprecated. Use StandardTokenizer instead, which has the same functionality.
Uses of Tokenizer in org.apache.lucene.analysis.cn.smart
Subclasses of Tokenizer in org.apache.lucene.analysis.cn.smart:
- SentenceTokenizer: Tokenizes input text into sentences.
Uses of Tokenizer in org.apache.lucene.analysis.icu.segmentation
Subclasses of Tokenizer in org.apache.lucene.analysis.icu.segmentation:
- ICUTokenizer: Breaks text into words according to UAX #29: Unicode Text Segmentation (http://www.unicode.org/reports/tr29/).
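ICUTokenizer applies the UAX #29 word-break rules via ICU4J. The JDK's own java.text.BreakIterator follows the same word-boundary specification, so it can serve as a conceptual stand-in for what UAX #29 segmentation produces (this is not the Lucene code and ICU's rule tailoring may differ in detail):

```java
import java.text.BreakIterator;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

public class Uax29Sketch {
    // Segment text into word tokens using the JDK's UAX #29-based
    // BreakIterator, keeping only segments that look like words.
    static List<String> words(String text) {
        BreakIterator it = BreakIterator.getWordInstance(Locale.ROOT);
        it.setText(text);
        List<String> tokens = new ArrayList<>();
        int start = it.first();
        for (int end = it.next(); end != BreakIterator.DONE; start = end, end = it.next()) {
            String candidate = text.substring(start, end);
            // BreakIterator also yields the gaps between words; keep only
            // segments that contain at least one letter or digit.
            if (candidate.codePoints().anyMatch(Character::isLetterOrDigit)) {
                tokens.add(candidate);
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        System.out.println(words("Hello world 42")); // [Hello, world, 42]
    }
}
```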
Uses of Tokenizer in org.apache.lucene.analysis.in
Subclasses of Tokenizer in org.apache.lucene.analysis.in:
- IndicTokenizer: Deprecated (3.6). Use StandardTokenizer instead.
Uses of Tokenizer in org.apache.lucene.analysis.ja
Subclasses of Tokenizer in org.apache.lucene.analysis.ja:
- JapaneseTokenizer: Tokenizer for Japanese that uses morphological analysis.
Uses of Tokenizer in org.apache.lucene.analysis.ngram
Subclasses of Tokenizer in org.apache.lucene.analysis.ngram:
- EdgeNGramTokenizer: Tokenizes the input from an edge into n-grams of given size(s).
- NGramTokenizer: Tokenizes the input into n-grams of the given size(s).
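The distinction between the two n-gram tokenizers can be sketched in plain Java: NGramTokenizer slides a window over every position, while EdgeNGramTokenizer anchors the grams at one edge of the input. This is a conceptual sketch only; the real tokenizers' emission order and attributes depend on the Lucene version:

```java
import java.util.ArrayList;
import java.util.List;

public class NGramSketch {
    // All n-grams of sizes minGram..maxGram at every position,
    // the token set NGramTokenizer produces over its input.
    static List<String> ngrams(String text, int minGram, int maxGram) {
        List<String> out = new ArrayList<>();
        for (int n = minGram; n <= maxGram; n++) {
            for (int i = 0; i + n <= text.length(); i++) {
                out.add(text.substring(i, i + n));
            }
        }
        return out;
    }

    // N-grams anchored at the front edge, as EdgeNGramTokenizer does
    // when configured for the leading side of the input.
    static List<String> edgeNgrams(String text, int minGram, int maxGram) {
        List<String> out = new ArrayList<>();
        for (int n = minGram; n <= Math.min(maxGram, text.length()); n++) {
            out.add(text.substring(0, n));
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(ngrams("abcd", 2, 3));     // [ab, bc, cd, abc, bcd]
        System.out.println(edgeNgrams("abcd", 1, 3)); // [a, ab, abc]
    }
}
```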
Uses of Tokenizer in org.apache.lucene.analysis.path
Subclasses of Tokenizer in org.apache.lucene.analysis.path:
- PathHierarchyTokenizer: Tokenizer for path-like hierarchies.
- ReversePathHierarchyTokenizer: Tokenizer for domain-like hierarchies.
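The token pattern PathHierarchyTokenizer produces is one token per level of the path, each extending the previous one. A minimal sketch of that pattern, assuming the default '/' delimiter (the real tokenizer is configurable and streams from a Reader):

```java
import java.util.ArrayList;
import java.util.List;

public class PathHierarchySketch {
    // Emit one token per level of a path, each a prefix of the next,
    // mirroring PathHierarchyTokenizer's output shape.
    static List<String> pathTokens(String path) {
        List<String> out = new ArrayList<>();
        int idx = 0;
        while ((idx = path.indexOf('/', idx + 1)) != -1) {
            out.add(path.substring(0, idx));
        }
        out.add(path);
        return out;
    }

    public static void main(String[] args) {
        System.out.println(pathTokens("/usr/local/bin")); // [/usr, /usr/local, /usr/local/bin]
    }
}
```

ReversePathHierarchyTokenizer works from the opposite end, which suits dotted domain-like strings where the significant hierarchy grows from the right.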
Uses of Tokenizer in org.apache.lucene.analysis.ru
Subclasses of Tokenizer in org.apache.lucene.analysis.ru:
- RussianLetterTokenizer: Deprecated. Use StandardTokenizer instead, which has the same functionality.
Uses of Tokenizer in org.apache.lucene.analysis.standard
Subclasses of Tokenizer in org.apache.lucene.analysis.standard:
- ClassicTokenizer: A grammar-based tokenizer constructed with JFlex.
- StandardTokenizer: A grammar-based tokenizer constructed with JFlex.
- UAX29URLEmailTokenizer: Implements the Word Break rules from the Unicode Text Segmentation algorithm, as specified in Unicode Standard Annex #29. URLs and email addresses are also tokenized according to the relevant RFCs.
Uses of Tokenizer in org.apache.lucene.analysis.wikipedia
Subclasses of Tokenizer in org.apache.lucene.analysis.wikipedia:
- WikipediaTokenizer: Extension of StandardTokenizer that is aware of Wikipedia syntax.