Class ICUTransformFilter

  • All Implemented Interfaces:
    Closeable, AutoCloseable

    public final class ICUTransformFilter
    extends org.apache.lucene.analysis.TokenFilter
    A TokenFilter that transforms text with ICU.

    ICU provides text-transformation functionality via its Transliteration API. Script conversion is its most common use, but a Transliterator can perform a more general class of tasks: the API specifies only that a segment of the input text is replaced by new text. The particulars of the conversion are determined entirely by subclasses of Transliterator.

    Some useful transformations for search are built-in:

    • Conversion from Traditional to Simplified Chinese characters
    • Conversion from Hiragana to Katakana
    • Conversion from Fullwidth to Halfwidth forms
    • Script conversions, for example Serbian Cyrillic to Latin
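
    The built-in transformations listed above can be tried directly against ICU4J, without Lucene. The following is a minimal sketch, assuming ICU4J is on the classpath. The class name BuiltInTransformsSketch and the helper apply are hypothetical. Only "Traditional-Simplified" appears in this page; the other transform IDs are ICU identifiers chosen for illustration, and "Cyrillic-Latin" is a generic script transform standing in for the Serbian example.

```java
import com.ibm.icu.text.Transliterator;

public class BuiltInTransformsSketch {

    // Hypothetical helper: look up a transform by its ICU ID and apply it to a string.
    static String apply(String id, String text) {
        return Transliterator.getInstance(id).transliterate(text);
    }

    public static void main(String[] args) {
        // Traditional to Simplified Chinese characters
        System.out.println(apply("Traditional-Simplified", "漢字"));   // 汉字
        // Hiragana to Katakana
        System.out.println(apply("Hiragana-Katakana", "ひらがな"));     // ヒラガナ
        // Fullwidth to Halfwidth forms
        System.out.println(apply("Fullwidth-Halfwidth", "ＡＢＣ"));     // ABC
        // Script conversion (generic Cyrillic to Latin; ID is an assumption)
        System.out.println(apply("Cyrillic-Latin", "Србија"));
    }
}
```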

    Example usage:

    stream = new ICUTransformFilter(stream, Transliterator.getInstance("Traditional-Simplified"));

    For more details, see the ICU User Guide.
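
    The one-line usage above can be expanded into a complete sketch that builds a token stream, wraps it in the filter, and reads back the transformed terms. This assumes the Lucene analysis-common and ICU analysis modules plus ICU4J are on the classpath. The class name TransformFilterSketch, the helper transformTokens, and the choice of WhitespaceTokenizer are illustrative assumptions, not part of this class.

```java
import com.ibm.icu.text.Transliterator;
import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.analysis.Tokenizer;
import org.apache.lucene.analysis.core.WhitespaceTokenizer;
import org.apache.lucene.analysis.icu.ICUTransformFilter;
import org.apache.lucene.analysis.tokenattributes.CharTermAttribute;

import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

public class TransformFilterSketch {

    // Hypothetical helper: split text on whitespace, transform each token
    // with the named ICU transform, and collect the resulting terms.
    static List<String> transformTokens(String text, String transformId) {
        Tokenizer tokenizer = new WhitespaceTokenizer();
        tokenizer.setReader(new StringReader(text));
        Transliterator transform = Transliterator.getInstance(transformId);

        List<String> terms = new ArrayList<>();
        // Closing the filter also closes the wrapped tokenizer.
        try (TokenStream stream = new ICUTransformFilter(tokenizer, transform)) {
            CharTermAttribute term = stream.addAttribute(CharTermAttribute.class);
            stream.reset();                       // required before the first incrementToken()
            while (stream.incrementToken()) {
                terms.add(term.toString());       // term text after transformation
            }
            stream.end();                         // signal end-of-stream to the chain
        } catch (IOException e) {
            // Wrapped only to keep this sketch's signature simple.
            throw new UncheckedIOException(e);
        }
        return terms;
    }

    public static void main(String[] args) {
        System.out.println(transformTokens("漢字 語言", "Traditional-Simplified"));
    }
}
```

    The reset, incrementToken, end, close sequence follows the usual TokenStream consumer workflow; in an indexing setup the same filter would typically be created inside an Analyzer rather than consumed by hand.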
    • Nested Class Summary

      • Nested classes/interfaces inherited from class org.apache.lucene.util.AttributeSource

        org.apache.lucene.util.AttributeSource.AttributeFactory, org.apache.lucene.util.AttributeSource.State
    • Field Summary

      • Fields inherited from class org.apache.lucene.analysis.TokenFilter

        input
    • Constructor Summary

      Constructors 
      Constructor Description
      ICUTransformFilter(org.apache.lucene.analysis.TokenStream input, com.ibm.icu.text.Transliterator transform)
      Create a new ICUTransformFilter that transforms text on the given stream.
    • Method Summary

      Modifier and Type Method Description
      boolean incrementToken()  
      • Methods inherited from class org.apache.lucene.analysis.TokenFilter

        close, end, reset
      • Methods inherited from class org.apache.lucene.util.AttributeSource

        addAttribute, addAttributeImpl, captureState, clearAttributes, cloneAttributes, copyTo, equals, getAttribute, getAttributeClassesIterator, getAttributeFactory, getAttributeImplsIterator, hasAttribute, hasAttributes, hashCode, reflectAsString, reflectWith, restoreState, toString
    • Constructor Detail

      • ICUTransformFilter

        public ICUTransformFilter(org.apache.lucene.analysis.TokenStream input,
                                  com.ibm.icu.text.Transliterator transform)
        Create a new ICUTransformFilter that transforms text on the given stream.
        Parameters:
        input - TokenStream to filter.
        transform - Transliterator to transform the text.
    • Method Detail

      • incrementToken

        public boolean incrementToken()
                               throws IOException
        Advances to the next token from the input stream and replaces its term text with the output of the Transliterator. Returns false once the input stream is exhausted.
        Specified by:
        incrementToken in class org.apache.lucene.analysis.TokenStream
        Throws:
        IOException