Package org.apache.lucene.util.fst
Class Util
- java.lang.Object
-
- org.apache.lucene.util.fst.Util
-
public final class Util extends Object
Static helper methods.
WARNING: This API is experimental and might change in incompatible ways in the next release.
-
-
Nested Class Summary
Modifier and Type, Class, Description
static class Util.MinResult<T>
Holds a single input (IntsRef) + output, returned by shortestPaths(org.apache.lucene.util.fst.FST<T>, org.apache.lucene.util.fst.FST.Arc<T>, java.util.Comparator<T>, int).
-
Method Summary
Modifier and Type, Method, Description
static <T> T get(FST<T> fst, BytesRef input)
Looks up the output for this input, or null if the input is not accepted.
static <T> T get(FST<T> fst, IntsRef input)
Looks up the output for this input, or null if the input is not accepted.
static IntsRef getByOutput(FST<Long> fst, long targetOutput)
Reverse lookup (lookup by output instead of by input), in the special case when your FST's outputs are strictly ascending.
static <T> Util.MinResult<T>[] shortestPaths(FST<T> fst, FST.Arc<T> fromNode, Comparator<T> comparator, int topN)
Starting from node, find the top N min cost completions to a final node.
static BytesRef toBytesRef(IntsRef input, BytesRef scratch)
Just converts IntsRef to BytesRef; you must ensure the int values fit into a byte.
static <T> void toDot(FST<T> fst, Writer out, boolean sameRank, boolean labelStates)
Dumps an FST to a GraphViz dot language description for visualization.
static IntsRef toIntsRef(BytesRef input, IntsRef scratch)
Just takes unsigned byte values from the BytesRef and converts into an IntsRef.
static IntsRef toUTF32(char[] s, int offset, int length, IntsRef scratch)
Decodes the Unicode codepoints from the provided char[] and places them in the provided scratch IntsRef, which must not be null, returning it.
static IntsRef toUTF32(CharSequence s, IntsRef scratch)
Decodes the Unicode codepoints from the provided CharSequence and places them in the provided scratch IntsRef, which must not be null, returning it.
-
-
-
Method Detail
-
get
public static <T> T get(FST<T> fst, IntsRef input) throws IOException
Looks up the output for this input, or null if the input is not accepted.
- Throws:
IOException
-
get
public static <T> T get(FST<T> fst, BytesRef input) throws IOException
Looks up the output for this input, or null if the input is not accepted.
- Throws:
IOException
-
getByOutput
public static IntsRef getByOutput(FST<Long> fst, long targetOutput) throws IOException
Reverse lookup (lookup by output instead of by input), in the special case when your FST's outputs are strictly ascending. This locates the input/output pair where the output is equal to the target, and will return null if that output does not exist.
NOTE: this only works with FST<Long>, only works when the outputs are ascending in order with the inputs, and only works when you shared the outputs (pass doShare=true to PositiveIntOutputs.getSingleton(boolean)). For example, simple ordinals (0, 1, 2, ...) or file offsets (when appending to a file) fit this.
- Throws:
IOException
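The requirement that outputs ascend in the same order as the inputs can be illustrated without Lucene. The following is a minimal stdlib-only sketch, not Lucene's implementation: a plain trie stands in for the FST, a per-subtree minimum stands in for shared outputs, and the class name ReverseLookupSketch and its methods are hypothetical. Because outputs ascend with the inputs, each subtree covers a contiguous range of outputs, so the lookup can always descend into the last child whose minimum does not exceed the target.

```java
import java.util.Map;
import java.util.TreeMap;

/** Stdlib-only sketch: a plain trie stands in for the FST. Not Lucene code. */
final class ReverseLookupSketch {
    private static final class Node {
        final TreeMap<Character, Node> children = new TreeMap<>();
        long minOutput = Long.MAX_VALUE; // smallest output stored in this subtree
        long finalOutput = -1;           // output if an input ends here, else -1
    }

    private final Node root = new Node();

    /** Assumes distinct inputs whose non-negative outputs ascend in input order. */
    void add(String input, long output) {
        Node node = root;
        node.minOutput = Math.min(node.minOutput, output);
        for (char c : input.toCharArray()) {
            node = node.children.computeIfAbsent(c, k -> new Node());
            node.minOutput = Math.min(node.minOutput, output);
        }
        node.finalOutput = output;
    }

    /** Returns the input whose output equals target, or null if there is none. */
    String getByOutput(long target) {
        if (target < 0) {
            return null; // outputs are assumed non-negative in this sketch
        }
        StringBuilder path = new StringBuilder();
        Node node = root;
        while (true) {
            if (node.finalOutput == target) {
                return path.toString();
            }
            // Pick the last child whose subtree minimum does not exceed the target.
            Map.Entry<Character, Node> best = null;
            for (Map.Entry<Character, Node> e : node.children.entrySet()) {
                if (e.getValue().minOutput > target) {
                    break;
                }
                best = e;
            }
            if (best == null) {
                return null; // no subtree can contain the target
            }
            path.append(best.getKey());
            node = best.getValue();
        }
    }

    public static void main(String[] args) {
        ReverseLookupSketch index = new ReverseLookupSketch();
        index.add("cat", 0);
        index.add("cats", 1);
        index.add("dog", 2);
        index.add("dot", 5);
        System.out.println(index.getByOutput(1)); // cats
        System.out.println(index.getByOutput(3)); // null
    }
}
```

If the outputs did not ascend with the inputs, the subtree ranges would overlap and the greedy descent could pick the wrong branch, which is why the restriction matters.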
-
shortestPaths
public static <T> Util.MinResult<T>[] shortestPaths(FST<T> fst, FST.Arc<T> fromNode, Comparator<T> comparator, int topN) throws IOException
Starting from node, find the top N min cost completions to a final node.
NOTE: you must share the outputs when you build the FST (pass doShare=true to PositiveIntOutputs.getSingleton(boolean)).
- Throws:
IOException
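The idea of "top N min cost completions" can be sketched as a best-first search. The example below is a stdlib-only illustration under stated assumptions, not Lucene's implementation: a plain trie stands in for the FST, a long cost stands in for the output type T, and the names TopNSketch, add and topN are hypothetical. Each queued candidate carries a lower bound on the cost of any completion beneath it, so the first N complete candidates popped from the queue are the N cheapest.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Map;
import java.util.PriorityQueue;
import java.util.TreeMap;

/** Stdlib-only sketch of best-first top-N search over a trie. Not Lucene code. */
final class TopNSketch {
    private static final class Node {
        final TreeMap<Character, Node> children = new TreeMap<>();
        long minCost = Long.MAX_VALUE; // cheapest completion at or below this node
        long finalCost = -1;           // cost if an input ends here, else -1
    }

    private static final class Candidate {
        final Node node;
        final String path;
        final long cost;        // exact cost if complete, otherwise a lower bound
        final boolean complete;

        Candidate(Node node, String path, long cost, boolean complete) {
            this.node = node;
            this.path = path;
            this.cost = cost;
            this.complete = complete;
        }
    }

    private final Node root = new Node();

    /** Assumes distinct inputs and non-negative costs. */
    void add(String input, long cost) {
        Node node = root;
        node.minCost = Math.min(node.minCost, cost);
        for (char c : input.toCharArray()) {
            node = node.children.computeIfAbsent(c, k -> new Node());
            node.minCost = Math.min(node.minCost, cost);
        }
        node.finalCost = cost;
    }

    /** Returns up to n inputs that start with prefix, cheapest first. */
    List<String> topN(String prefix, int n) {
        List<String> results = new ArrayList<>();
        Node start = root;
        for (char c : prefix.toCharArray()) {
            start = start.children.get(c);
            if (start == null) {
                return results; // nothing starts with this prefix
            }
        }
        Comparator<Candidate> byCost = (a, b) -> Long.compare(a.cost, b.cost);
        PriorityQueue<Candidate> queue = new PriorityQueue<>(byCost);
        queue.add(new Candidate(start, prefix, start.minCost, false));
        while (!queue.isEmpty() && results.size() < n) {
            Candidate c = queue.poll();
            if (c.complete) {
                results.add(c.path);
                continue;
            }
            if (c.node.finalCost >= 0) {
                // An input ends here: re-queue it with its exact cost.
                queue.add(new Candidate(c.node, c.path, c.node.finalCost, true));
            }
            for (Map.Entry<Character, Node> e : c.node.children.entrySet()) {
                Node child = e.getValue();
                queue.add(new Candidate(child, c.path + e.getKey(), child.minCost, false));
            }
        }
        return results;
    }

    public static void main(String[] args) {
        TopNSketch sketch = new TopNSketch();
        sketch.add("app", 3);
        sketch.add("apple", 5);
        sketch.add("apricot", 1);
        sketch.add("banana", 2);
        System.out.println(sketch.topN("ap", 2)); // [apricot, app]
    }
}
```

In this sketch the prefix plays the role of the starting node, and inputs with equal cost may come back in either order.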
-
toDot
public static <T> void toDot(FST<T> fst, Writer out, boolean sameRank, boolean labelStates) throws IOException
Dumps an FST to a GraphViz dot language description for visualization. Example of use:
    PrintWriter pw = new PrintWriter("out.dot");
    Util.toDot(fst, pw, true, true);
    pw.close();
and then, from the command line:
    dot -Tpng -o out.png out.dot
Note: larger FSTs (a few thousand nodes) won't even render; don't bother.
- Parameters:
sameRank - If true, the resulting dot file will try to order states in layers of breadth-first traversal. This may mess up arcs, but makes the output FST's structure a bit clearer.
labelStates - If true, states will have labels equal to their offsets in their binary format. Expands the graph considerably.
- Throws:
IOException
- See Also:
- "http://www.graphviz.org/"
-
toUTF32
public static IntsRef toUTF32(CharSequence s, IntsRef scratch)
Decodes the Unicode codepoints from the provided CharSequence and places them in the provided scratch IntsRef, which must not be null, returning it.
-
toUTF32
public static IntsRef toUTF32(char[] s, int offset, int length, IntsRef scratch)
Decodes the Unicode codepoints from the provided char[] and places them in the provided scratch IntsRef, which must not be null, returning it.
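The decoding that both toUTF32 overloads describe, from UTF-16 chars to one int per Unicode code point, can be sketched with the JDK alone. This is an illustration under stated assumptions rather than Lucene's code: a plain int[] stands in for the scratch IntsRef, and the class name Utf32Sketch is hypothetical.

```java
import java.util.Arrays;

/** Stdlib-only sketch; a plain int[] stands in for Lucene's IntsRef. */
final class Utf32Sketch {
    /** Decodes UTF-16 chars into code points, one int per code point. */
    static int[] toUTF32(CharSequence s) {
        int[] scratch = new int[s.length()]; // upper bound: one code point per char
        int count = 0;
        for (int i = 0; i < s.length(); ) {
            int cp = Character.codePointAt(s, i);
            scratch[count++] = cp;
            i += Character.charCount(cp); // 2 for a surrogate pair, otherwise 1
        }
        return Arrays.copyOf(scratch, count);
    }

    public static void main(String[] args) {
        // 'a' followed by U+1F600, which needs a surrogate pair in UTF-16
        System.out.println(Arrays.toString(toUTF32("a\uD83D\uDE00"))); // [97, 128512]
    }
}
```

The result can be shorter than the input: a surrogate pair occupies two chars but yields a single code point.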
-
toIntsRef
public static IntsRef toIntsRef(BytesRef input, IntsRef scratch)
Just takes unsigned byte values from the BytesRef and converts into an IntsRef.
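Since Java bytes are signed, "unsigned byte values" means each byte is widened to the range 0..255. The stdlib-only sketch below shows that conversion together with the reverse direction described for toBytesRef in the Method Summary. Plain arrays stand in for BytesRef and IntsRef, the class name ByteIntSketch is hypothetical, and the explicit range check is an addition of this sketch: the description of toBytesRef only says the caller must ensure the values fit into a byte.

```java
import java.util.Arrays;

/** Stdlib-only sketch; plain arrays stand in for BytesRef and IntsRef. */
final class ByteIntSketch {
    /** Widens each byte to its unsigned value in the range 0..255. */
    static int[] toInts(byte[] input) {
        int[] out = new int[input.length];
        for (int i = 0; i < input.length; i++) {
            out[i] = input[i] & 0xFF; // mask off the sign extension
        }
        return out;
    }

    /** Narrows each int back to a byte; the range check is this sketch's own addition. */
    static byte[] toBytes(int[] input) {
        byte[] out = new byte[input.length];
        for (int i = 0; i < input.length; i++) {
            if (input[i] < 0 || input[i] > 255) {
                throw new IllegalArgumentException("value does not fit into a byte: " + input[i]);
            }
            out[i] = (byte) input[i];
        }
        return out;
    }

    public static void main(String[] args) {
        byte[] bytes = {(byte) 0xE9, 0x41};
        int[] ints = toInts(bytes);
        System.out.println(Arrays.toString(ints)); // [233, 65]
        System.out.println(Arrays.equals(bytes, toBytes(ints))); // true
    }
}
```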
-
-