Package org.apache.lucene.analysis
Class MockTokenizer
- java.lang.Object
  - org.apache.lucene.util.AttributeSource
    - org.apache.lucene.analysis.TokenStream
      - org.apache.lucene.analysis.Tokenizer
        - org.apache.lucene.analysis.MockTokenizer
- All Implemented Interfaces: Closeable, AutoCloseable
public class MockTokenizer extends org.apache.lucene.analysis.Tokenizer
Tokenizer for testing. This tokenizer is a replacement for the WHITESPACE, SIMPLE, and KEYWORD tokenizers. If you are writing a component such as a TokenFilter, it's a great idea to test it by wrapping this tokenizer instead, for extra checks. This tokenizer has the following behavior:
- An internal state machine is used for checking consumer consistency. These checks can be disabled with setEnableChecks(boolean).
- For convenience, optionally lowercases terms that it outputs.
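The consumer-consistency checking mentioned above can be pictured with a small, stdlib-only model. This is a minimal sketch, not Lucene's implementation: the class `WorkflowCheckSketch`, its `State` enum, and the exact set of allowed transitions are assumptions made for illustration. Only the method names `reset()`, `incrementToken()`, `end()`, `close()`, and `setEnableChecks(boolean)` come from this page.

```java
// Illustrative only: a simplified model of consumer workflow checking.
// Class name, states, and transitions are hypothetical, not Lucene's code.
final class WorkflowCheckSketch {
    private enum State { CREATED, RESET, EXHAUSTED, ENDED, CLOSED }

    private final String[] tokens;
    private int next = 0;
    private State state = State.CREATED;
    private boolean enableChecks = true;

    WorkflowCheckSketch(String... tokens) {
        this.tokens = tokens;
    }

    // Mirrors the toggle described on this page: checks are on by default.
    void setEnableChecks(boolean enableChecks) {
        this.enableChecks = enableChecks;
    }

    void reset() {
        check(state == State.CREATED || state == State.CLOSED, "reset() called in state " + state);
        next = 0;
        state = State.RESET;
    }

    boolean incrementToken() {
        check(state == State.RESET, "incrementToken() called in state " + state);
        if (next < tokens.length) {
            next++;
            return true;
        }
        state = State.EXHAUSTED; // the consumer has seen the final "false"
        return false;
    }

    void end() {
        check(state == State.EXHAUSTED, "end() called in state " + state);
        state = State.ENDED;
    }

    void close() {
        check(state == State.ENDED, "close() called in state " + state);
        state = State.CLOSED;
    }

    // Fails only when checking is enabled and the transition is not allowed.
    private void check(boolean ok, String message) {
        if (enableChecks && !ok) {
            throw new IllegalStateException(message);
        }
    }
}
```

In this model a consumer that calls `reset()`, loops on `incrementToken()` until it returns false, then calls `end()` and `close()` passes every check, while a consumer that skips `reset()` fails on the first `incrementToken()` unless checks have been disabled.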
Field Summary
Fields
Modifier and Type    Field                       Description
static int           DEFAULT_MAX_TOKEN_LENGTH
static int           KEYWORD                     Acts similar to KeywordTokenizer.
static int           SIMPLE                      Acts like LetterTokenizer.
static int           WHITESPACE                  Acts similar to WhitespaceTokenizer.
-
Constructor Summary
Constructors
Constructor    Description
MockTokenizer(Reader input)
MockTokenizer(Reader input, int pattern, boolean lowerCase)
MockTokenizer(Reader input, int pattern, boolean lowerCase, int maxTokenLength)
MockTokenizer(org.apache.lucene.util.AttributeSource.AttributeFactory factory, Reader input, int pattern, boolean lowerCase, int maxTokenLength)
-
Method Summary
Modifier and Type    Method                                   Description
void                 close()
void                 end()
boolean              incrementToken()
protected boolean    isTokenChar(int c)
protected int        normalize(int c)
protected int        readCodePoint()
void                 reset()
void                 reset(Reader input)
void                 setEnableChecks(boolean enableChecks)    Toggle consumer workflow checking: if your test consumes tokenstreams normally you should leave this enabled.
Methods inherited from class org.apache.lucene.util.AttributeSource
addAttribute, addAttributeImpl, captureState, clearAttributes, cloneAttributes, copyTo, equals, getAttribute, getAttributeClassesIterator, getAttributeFactory, getAttributeImplsIterator, hasAttribute, hasAttributes, hashCode, reflectAsString, reflectWith, restoreState, toString
Field Detail
-
WHITESPACE
public static final int WHITESPACE
Acts similar to WhitespaceTokenizer.
- See Also:
- Constant Field Values
-
KEYWORD
public static final int KEYWORD
Acts similar to KeywordTokenizer. TODO: Keyword returns an "empty" token for an empty reader...
- See Also:
- Constant Field Values
-
SIMPLE
public static final int SIMPLE
Acts like LetterTokenizer.
- See Also:
- Constant Field Values
-
DEFAULT_MAX_TOKEN_LENGTH
public static final int DEFAULT_MAX_TOKEN_LENGTH
- See Also:
- Constant Field Values
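To make the three pattern constants above more concrete, here is a minimal stdlib-only sketch of the splitting rules they are modeled on: WhitespaceTokenizer splits on whitespace, LetterTokenizer emits runs of letters, and KeywordTokenizer emits the whole input as one token. The class `MockPatternSketch`, its `Pattern` enum, and the `tokenize` helper are hypothetical names introduced for illustration. This is an approximation and not the Lucene implementation; in particular it does not model the maximum token length or the empty-reader behavior noted in the TODO.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative only: approximates the three patterns with plain Java.
// Names here are hypothetical stand-ins, not part of the Lucene API.
final class MockPatternSketch {
    enum Pattern { WHITESPACE, KEYWORD, SIMPLE }

    // Splits text into tokens, working on code points rather than chars.
    static List<String> tokenize(String text, Pattern pattern, boolean lowerCase) {
        List<String> tokens = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        for (int i = 0; i < text.length(); ) {
            int cp = text.codePointAt(i);
            i += Character.charCount(cp);
            if (isTokenChar(cp, pattern)) {
                // Optional lowercasing, applied per code point.
                current.appendCodePoint(lowerCase ? Character.toLowerCase(cp) : cp);
            } else if (current.length() > 0) {
                tokens.add(current.toString());
                current.setLength(0);
            }
        }
        if (current.length() > 0) {
            tokens.add(current.toString());
        }
        return tokens;
    }

    // Decides whether a code point belongs inside a token for the given pattern.
    static boolean isTokenChar(int cp, Pattern pattern) {
        switch (pattern) {
            case WHITESPACE:
                return !Character.isWhitespace(cp);
            case SIMPLE:
                return Character.isLetter(cp);
            default: // KEYWORD: every code point is part of the single token
                return true;
        }
    }
}
```

For the input `Hello, World 42`, the sketch yields `Hello,`, `World`, `42` for the whitespace rule, `hello`, `world` for the letter rule with lowercasing enabled, and the unchanged input as a single token for the keyword rule.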
-
-
Constructor Detail
-
MockTokenizer
public MockTokenizer(org.apache.lucene.util.AttributeSource.AttributeFactory factory, Reader input, int pattern, boolean lowerCase, int maxTokenLength)
-
MockTokenizer
public MockTokenizer(Reader input, int pattern, boolean lowerCase, int maxTokenLength)
-
MockTokenizer
public MockTokenizer(Reader input, int pattern, boolean lowerCase)
-
MockTokenizer
public MockTokenizer(Reader input)
-
-
Method Detail
-
incrementToken
public final boolean incrementToken() throws IOException
- Specified by:
incrementToken
in class org.apache.lucene.analysis.TokenStream
- Throws:
IOException
-
readCodePoint
protected int readCodePoint() throws IOException
- Throws:
IOException
-
isTokenChar
protected boolean isTokenChar(int c)
-
normalize
protected int normalize(int c)
-
reset
public void reset() throws IOException
- Overrides:
reset
in class org.apache.lucene.analysis.TokenStream
- Throws:
IOException
-
close
public void close() throws IOException
- Specified by:
close
in interface AutoCloseable
- Specified by:
close
in interface Closeable
- Overrides:
close
in class org.apache.lucene.analysis.Tokenizer
- Throws:
IOException
-
reset
public void reset(Reader input) throws IOException
- Overrides:
reset
in class org.apache.lucene.analysis.Tokenizer
- Throws:
IOException
-
end
public void end() throws IOException
- Overrides:
end
in class org.apache.lucene.analysis.TokenStream
- Throws:
IOException
-
setEnableChecks
public void setEnableChecks(boolean enableChecks)
Toggle consumer workflow checking: if your test consumes tokenstreams normally you should leave this enabled.
-
-