Package org.apache.lucene.util.fst
Class Builder<T>
- java.lang.Object
-
- org.apache.lucene.util.fst.Builder<T>
-
public class Builder<T> extends Object
Builds a minimal FST (maps an IntsRef term to an arbitrary output) from pre-sorted terms with outputs. The FST becomes an FSA if you use NoOutputs. The FST is written on-the-fly into a compact serialized format byte array, which can be saved to / loaded from a Directory or used directly for traversal. The FST is always finite (no cycles).NOTE: The algorithm is described at http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.24.3698
The parameterized type T is the output type. See the subclasses ofOutputs
.- WARNING: This API is experimental and might change in incompatible ways in the next release.
-
-
Nested Class Summary
Nested Classes Modifier and Type Class Description static class
Builder.Arc<T>
Expert: holds a pending (seen but not yet serialized) arc.static class
Builder.FreezeTail<T>
Expert: this is invoked by Builder whenever a suffix is serialized.static class
Builder.UnCompiledNode<T>
Expert: holds a pending (seen but not yet serialized) Node.
-
Constructor Summary
Constructors Constructor Description Builder(FST.INPUT_TYPE inputType, int minSuffixCount1, int minSuffixCount2, boolean doShareSuffix, boolean doShareNonSingletonNodes, int shareMaxTailLength, Outputs<T> outputs, Builder.FreezeTail<T> freezeTail, boolean willPackFST)
Instantiates an FST/FSA builder with all the possible tuning and construction tweaks.Builder(FST.INPUT_TYPE inputType, Outputs<T> outputs)
Instantiates an FST/FSA builder without any pruning.
-
Method Summary
All Methods Instance Methods Concrete Methods Modifier and Type Method Description void
add(IntsRef input, T output)
It's OK to add the same input twice in a row with different outputs, as long as outputs impls the merge method.FST<T>
finish()
Returns final FST.int
getMappedStateCount()
long
getTermCount()
int
getTotStateCount()
void
setAllowArrayArcs(boolean b)
Pass false to disable the array arc optimization while building the FST; this will make the resulting FST smaller but slower to traverse.
-
-
-
Constructor Detail
-
Builder
public Builder(FST.INPUT_TYPE inputType, Outputs<T> outputs)
Instantiates an FST/FSA builder without any pruning. A shortcut toBuilder(FST.INPUT_TYPE, int, int, boolean, boolean, int, Outputs, FreezeTail, boolean)
with pruning options turned off.
-
Builder
public Builder(FST.INPUT_TYPE inputType, int minSuffixCount1, int minSuffixCount2, boolean doShareSuffix, boolean doShareNonSingletonNodes, int shareMaxTailLength, Outputs<T> outputs, Builder.FreezeTail<T> freezeTail, boolean willPackFST)
Instantiates an FST/FSA builder with all the possible tuning and construction tweaks. Read parameter documentation carefully.- Parameters:
inputType
- The input type (transition labels). Can be anything fromFST.INPUT_TYPE
enumeration. Shorter types will consume less memory. Strings (character sequences) are represented asFST.INPUT_TYPE.BYTE4
(full unicode codepoints).minSuffixCount1
- If pruning the input graph during construction, this threshold is used for telling if a node is kept or pruned. If transition_count(node) >= minSuffixCount1, the node is kept.minSuffixCount2
- (Note: only Mike McCandless knows what this one is really doing...)doShareSuffix
- Iftrue
, the shared suffixes will be compacted into unique paths. This requires an additional hash map for lookups in memory. Setting this parameter tofalse
creates a single path for all input sequences. This will result in a larger graph, but may require less memory and will speed up construction.doShareNonSingletonNodes
- Only used if doShareSuffix is true. Set this to true to ensure FST is fully minimal, at cost of more CPU and more RAM during building.shareMaxTailLength
- Only used if doShareSuffix is true. Set this to Integer.MAX_VALUE to ensure FST is fully minimal, at cost of more CPU and more RAM during building.outputs
- The output type for each input sequence. Applies only if building an FST. For FSA, useNoOutputs.getSingleton()
andNoOutputs.getNoOutput()
as the singleton output object.willPackFST
- Pass true if you will pack the FST before saving. This causes the FST to create additional data structures internally to facilitate packing, but it means the resulting FST cannot be saved: it must first be packed usingFST.pack(int, int)
}.
-
-
Method Detail
-
getTotStateCount
public int getTotStateCount()
-
getTermCount
public long getTermCount()
-
getMappedStateCount
public int getMappedStateCount()
-
setAllowArrayArcs
public void setAllowArrayArcs(boolean b)
Pass false to disable the array arc optimization while building the FST; this will make the resulting FST smaller but slower to traverse.
-
add
public void add(IntsRef input, T output) throws IOException
It's OK to add the same input twice in a row with different outputs, as long as outputs impls the merge method.- Throws:
IOException
-
finish
public FST<T> finish() throws IOException
Returns final FST. NOTE: this will return null if nothing is accepted by the FST.- Throws:
IOException
-
-