Package org.apache.lucene.util.encoding
The base class for all encoders is IntEncoder, and for most of the
encoders there is a matching IntDecoder implementation (not all
encoders need a decoder).
An encoder encodes the integers that are passed to encode into a set
output stream (see reInit). One should always call close when all
integers have been encoded, to ensure that the encoder finishes properly.
Some encoders buffer values in memory and encode them in batches in order
to optimize the encoding, and not closing them may result in loss of
information or a corrupt stream.
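To make the buffering point concrete, here is a minimal standalone sketch of a batching writer. It does not use the Lucene classes; the class name BatchingSketch, the batch size of 4 and the one-byte-per-value output are all assumptions chosen to keep the example short.

```java
import java.io.ByteArrayOutputStream;

/** Hypothetical batching writer; illustrates why close() matters. Not Lucene code. */
final class BatchingSketch {
    private final ByteArrayOutputStream out;
    private final int[] buffer = new int[4]; // assumed batch size
    private int count = 0;

    BatchingSketch(ByteArrayOutputStream out) {
        this.out = out;
    }

    /** Buffers the value and writes a batch only when the buffer is full. */
    void encode(int value) {
        buffer[count++] = value;
        if (count == buffer.length) {
            flush();
        }
    }

    /** Writes whatever is still buffered. */
    void close() {
        flush();
    }

    private void flush() {
        for (int i = 0; i < count; i++) {
            out.write(buffer[i]); // one byte per value, to keep the sketch short
        }
        count = 0;
    }

    public static void main(String[] args) {
        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        BatchingSketch encoder = new BatchingSketch(sink);
        for (int v = 1; v <= 6; v++) {
            encoder.encode(v);
        }
        System.out.println(sink.size()); // 4: the last two values are still buffered
        encoder.close();
        System.out.println(sink.size()); // 6: close() wrote the partial batch
    }
}
```

Without the call to close, the last two values would never reach the stream, which is the kind of information loss described above.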
A proper and typical usage of an encoder looks like this:

    int[] data = <the values to encode>
    IntEncoder encoder = new VInt8IntEncoder();
    // Declared as ByteArrayOutputStream so that toByteArray() is available below.
    ByteArrayOutputStream out = new ByteArrayOutputStream();
    encoder.reInit(out);
    for (int val : data) {
      encoder.encode(val);
    }
    encoder.close();

    // Print the bytes in binary
    byte[] bytes = out.toByteArray();
    for (byte b : bytes) {
      // Mask with 0xFF so negative bytes are not sign-extended to 32 bits.
      System.out.println(Integer.toBinaryString(b & 0xFF));
    }

Each encoder also implements createMatchingDecoder which returns the
matching decoder for this encoder.
As mentioned above, not all encoders have a matching decoder (like some
encoder filters which are explained next), however every encoder should
return a decoder following a call to that method. To complete the
example above, one can easily iterate over the decoded values like this:
    // 'encoder' and 'bytes' are the variables from the encoding example above.
    IntDecoder d = encoder.createMatchingDecoder();
    d.reInit(new ByteArrayInputStream(bytes));
    long val;
    while ((val = d.decode()) != IntDecoder.EOS) {
      System.out.println(val);
    }

Some encoders don't perform any encoding at all, or do not include any
encoding logic. Those are called IntEncoderFilters. A filter is an
encoder which delegates the encoding task to a given encoder, but
performs additional logic before the values are sent for encoding. An
example is DGapIntEncoder, which encodes the gaps between values rather
than the values themselves. Another example is SortingIntEncoder, which
sorts all the values in ascending order before they are sent for
encoding. This encoder aggregates the values in its encode implementation,
and the actual encoding only happens upon calling close.
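The effect of sorting followed by d-gap encoding can be sketched without the Lucene classes. The class DGapSketch and its method names are hypothetical; the sketch assumes the first gap is taken relative to 0 and is not claimed to match the output of DGapIntEncoder byte for byte.

```java
import java.util.Arrays;

/** Standalone illustration of sort + d-gap preprocessing. Not Lucene code. */
final class DGapSketch {

    /** Replaces each value with its difference from the previous value (first gap is relative to 0). */
    static int[] toGaps(int[] sortedValues) {
        int[] gaps = new int[sortedValues.length];
        int previous = 0;
        for (int i = 0; i < sortedValues.length; i++) {
            gaps[i] = sortedValues[i] - previous;
            previous = sortedValues[i];
        }
        return gaps;
    }

    /** Reverses toGaps by accumulating a running sum. */
    static int[] fromGaps(int[] gaps) {
        int[] values = new int[gaps.length];
        int previous = 0;
        for (int i = 0; i < gaps.length; i++) {
            previous += gaps[i];
            values[i] = previous;
        }
        return values;
    }

    public static void main(String[] args) {
        int[] values = {1000, 1003, 1001, 1010};
        // Sorting first (the job of a sorting filter) keeps every gap non-negative.
        int[] sorted = values.clone();
        Arrays.sort(sorted);
        int[] gaps = toGaps(sorted);
        System.out.println(Arrays.toString(gaps));           // [1000, 1, 2, 7]
        System.out.println(Arrays.toString(fromGaps(gaps))); // [1000, 1001, 1003, 1010]
    }
}
```

Small gaps are the reason such filters are usually chained in front of a variable-length encoder: most of the values handed to the delegate then fit in a single byte.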
Extending IntEncoder
Extending IntEncoder is a very easy task. One only needs to implement
encode and createMatchingDecoder, as the base implementation takes care
of re-initializing the output stream and closing it. The following example
illustrates how one can write an encoder (and a matching decoder) which
'tags' the stream with the type/ID of the encoder. Such tagging is important
in scenarios where an application uses different encoders for different
streams, and wants to manage some sort of mapping from an encoder ID
to an IntEncoder/IntDecoder implementation, so that a proper decoder will
be initialized on the fly:

    import java.io.IOException;
    import java.io.OutputStream;

    import org.apache.lucene.util.encoding.IntDecoder;
    import org.apache.lucene.util.encoding.IntEncoder;
    import org.apache.lucene.util.encoding.IntEncoderFilter;

    public class TaggingIntEncoder extends IntEncoderFilter {

      public TaggingIntEncoder(IntEncoder encoder) {
        super(encoder);
      }

      @Override
      public void encode(int value) throws IOException {
        // No filtering is done; the value is passed straight to the wrapped encoder.
        encoder.encode(value);
      }

      @Override
      public IntDecoder createMatchingDecoder() {
        return new TaggingIntDecoder();
      }

      @Override
      public void reInit(OutputStream out) {
        super.reInit(out);
        // Assumes the application has a static EncodersMap class which is able to
        // return a unique ID for a given encoder.
        int encoderID = EncodersMap.getID(encoder);
        this.out.write(encoderID);
      }

      @Override
      public String toString() {
        return "Tagging (" + encoder.toString() + ")";
      }
    }

And the matching decoder:

    import java.io.IOException;
    import java.io.InputStream;

    import org.apache.lucene.util.encoding.IntDecoder;

    public class TaggingIntDecoder extends IntDecoder {

      // Will be initialized upon calling reInit.
      private IntDecoder decoder;

      @Override
      public void reInit(InputStream in) {
        super.reInit(in);
        // Read the ID of the encoder that tagged this stream.
        int encoderID = in.read();
        // Assumes EncodersMap can return the proper IntEncoder given the ID.
        decoder = EncodersMap.getEncoder(encoderID).createMatchingDecoder();
      }

      @Override
      public long decode() throws IOException {
        return decoder.decode();
      }

      @Override
      public String toString() {
        // Parentheses are required: '+' binds tighter than '==' and '?:'.
        return "Tagging (" + (decoder == null ? "none" : decoder.toString()) + ")";
      }
    }

The example implements TaggingIntEncoder as a filter over another
encoder. Even though it does not do any filtering on the actual values, it
feels right to present it as a filter. In any case, this is just example
code, and one can choose to implement it in whatever way makes sense for
the application. For simplicity, error checking was omitted from the
sample code.
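The tagging idea itself (write an ID first, read it back to decide how to decode the rest) can be tried out without Lucene. The sketch below is a standalone illustration under stated assumptions: the class TaggedStreamSketch, the two format IDs and the two payload formats are all hypothetical and stand in for the application's own encoder registry.

```java
import java.nio.ByteBuffer;
import java.util.Arrays;

/** Hypothetical tagged-stream sketch: the first byte selects the payload format. Not Lucene code. */
final class TaggedStreamSketch {
    static final int RAW_INT_ID = 1;     // assumed ID: 4 bytes per value
    static final int SINGLE_BYTE_ID = 2; // assumed ID: 1 byte per value (0..255 only)

    /** Maps a format ID to its value width; rejects unknown IDs. */
    static int bytesPerValue(int formatId) {
        switch (formatId) {
            case RAW_INT_ID: return 4;
            case SINGLE_BYTE_ID: return 1;
            default: throw new IllegalArgumentException("Unknown format ID: " + formatId);
        }
    }

    /** Writes the tag byte followed by the values in the chosen format. */
    static byte[] write(int formatId, int[] values) {
        int width = bytesPerValue(formatId);
        ByteBuffer buf = ByteBuffer.allocate(1 + values.length * width);
        buf.put((byte) formatId);
        for (int value : values) {
            if (width == 4) {
                buf.putInt(value);
            } else {
                buf.put((byte) value);
            }
        }
        return buf.array();
    }

    /** Reads the tag byte, then decodes the rest with the format it names. */
    static int[] read(byte[] data) {
        ByteBuffer buf = ByteBuffer.wrap(data);
        int width = bytesPerValue(buf.get() & 0xFF);
        int[] values = new int[buf.remaining() / width];
        for (int i = 0; i < values.length; i++) {
            values[i] = (width == 4) ? buf.getInt() : (buf.get() & 0xFF);
        }
        return values;
    }

    public static void main(String[] args) {
        int[] values = {3, 7, 200};
        byte[] compact = write(SINGLE_BYTE_ID, values);
        byte[] raw = write(RAW_INT_ID, values);
        // The reader does not need to be told which format was used.
        System.out.println(compact.length + " bytes: " + Arrays.toString(read(compact)));
        System.out.println(raw.length + " bytes: " + Arrays.toString(read(raw)));
    }
}
```

A single tag byte limits the scheme to 256 distinct IDs, which mirrors the single write call used for the encoder ID in the example above.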

Class Summary

Class                   Description
ChunksIntEncoder        An IntEncoder which encodes values in chunks.
DGapIntDecoder          An IntDecoder which wraps another IntDecoder and reverts the d-gap that was encoded by DGapIntEncoder.
DGapIntEncoder          An IntEncoderFilter which encodes the gap between the given values, rather than the values themselves.
EightFlagsIntDecoder    Decodes data which was encoded by EightFlagsIntEncoder.
EightFlagsIntEncoder    A ChunksIntEncoder which encodes data in chunks of 8.
FourFlagsIntDecoder     Decodes data which was encoded by FourFlagsIntEncoder.
FourFlagsIntEncoder     A ChunksIntEncoder which encodes values in chunks of 4.
IntDecoder              Decodes integers from a set InputStream.
IntEncoder              Encodes integers to a set OutputStream.
IntEncoderFilter        An abstract implementation of IntEncoder which serves as a filter on the values to encode.
NOnesIntDecoder         Decodes data which was encoded by NOnesIntEncoder.
NOnesIntEncoder         A variation of FourFlagsIntEncoder which translates the data as follows: values ≥ 2 are translated to value+1 (2 ⇒ 3, 3 ⇒ 4 and so forth).
SimpleIntDecoder        A simple stream decoder which can decode values encoded with SimpleIntEncoder.
SimpleIntEncoder        A simple IntEncoder, writing an integer as 4 raw bytes.
SortingIntEncoder       An IntEncoderFilter which sorts the values to encode in ascending order before encoding them.
UniqueValuesIntEncoder  An IntEncoderFilter which ensures only unique values are encoded.
VInt8IntDecoder         An IntDecoder which can decode values encoded by VInt8IntEncoder.
VInt8IntEncoder         An IntEncoder which implements variable length encoding.
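
The summary contrasts writing an integer as 4 raw bytes with variable length encoding. The sketch below shows the general variable-length technique in standalone Java: 7 data bits per byte, with the high bit marking that more bytes follow. The class VarIntSketch is hypothetical, and the byte order used here (low-order group first) is an assumption; it is not claimed to be byte-compatible with VInt8IntEncoder.

```java
import java.io.ByteArrayOutputStream;
import java.util.ArrayList;
import java.util.List;

/** Generic 7-bits-per-byte variable-length integer sketch. Not Lucene code. */
final class VarIntSketch {

    /** Encodes each value with 7 data bits per byte; the high bit flags a continuation. */
    static byte[] encode(int[] values) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        for (int value : values) {
            while ((value & ~0x7F) != 0) {
                out.write((value & 0x7F) | 0x80); // more bytes follow
                value >>>= 7;
            }
            out.write(value); // last byte: high bit clear
        }
        return out.toByteArray();
    }

    /** Decodes a byte array produced by encode. */
    static List<Integer> decode(byte[] data) {
        List<Integer> values = new ArrayList<>();
        int value = 0;
        int shift = 0;
        for (byte b : data) {
            value |= (b & 0x7F) << shift;
            if ((b & 0x80) != 0) {
                shift += 7; // continuation bit set: keep accumulating
            } else {
                values.add(value);
                value = 0;
                shift = 0;
            }
        }
        return values;
    }

    public static void main(String[] args) {
        int[] values = {5, 300, 70000};
        byte[] encoded = encode(values);
        // 1 + 2 + 3 = 6 bytes, versus 12 bytes at 4 raw bytes per value.
        System.out.println(encoded.length);  // 6
        System.out.println(decode(encoded)); // [5, 300, 70000]
    }
}
```

The saving depends entirely on the values being small, which is why variable-length encoders pair well with filters that shrink the values first.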