Package org.apache.lucene.misc
Class SweetSpotSimilarity
java.lang.Object
  org.apache.lucene.search.Similarity
    org.apache.lucene.search.DefaultSimilarity
      org.apache.lucene.misc.SweetSpotSimilarity
All Implemented Interfaces:
Serializable
public class SweetSpotSimilarity extends DefaultSimilarity
A similarity with a lengthNorm that provides for a "plateau" of equally good lengths, and tf helper functions.
For lengthNorm, a global min/max can be specified to define the plateau of lengths that should all have a norm of 1.0. Below the min and above the max, the lengthNorm falls off along a 1/sqrt curve.
A per field min/max can be specified if different fields have different sweet spots.
For tf, baselineTf and hyperbolicTf functions are provided, which subclasses can choose between.
Field Summary
Fields inherited from class org.apache.lucene.search.DefaultSimilarity
discountOverlaps
Fields inherited from class org.apache.lucene.search.Similarity
NO_DOC_ID_PROVIDED
Constructor Summary
SweetSpotSimilarity()
Method Summary
float baselineTf(float freq)
Implemented as: (x <= min) ? base : sqrt(x+(base**2)-min), but with a special case check for 0.
float computeLengthNorm(String fieldName, int numTerms)
Implemented as: 1/sqrt( steepness * (abs(x-min) + abs(x-max) - (max-min)) + 1 ).
float computeNorm(String fieldName, FieldInvertState state)
Implemented as state.getBoost() * lengthNorm(fieldName, numTokens), where numTokens does not count overlap tokens if discountOverlaps is true by default or true for this specific field.
float hyperbolicTf(float freq)
Uses a hyperbolic tangent function that allows for a hard max...
void setBaselineTfFactors(float base, float min)
Sets the baseline and minimum function variables for baselineTf.
void setHyperbolicTfFactors(float min, float max, double base, float xoffset)
Sets the function variables for the hyperbolicTf functions.
void setLengthNormFactors(int min, int max, float steepness)
Sets the default function variables used by lengthNorm when no field specific variables have been set.
void setLengthNormFactors(String field, int min, int max, float steepness, boolean discountOverlaps)
Sets the function variables used by lengthNorm for a specific named field.
float tf(int freq)
Delegates to baselineTf.
Methods inherited from class org.apache.lucene.search.DefaultSimilarity
coord, getDiscountOverlaps, idf, queryNorm, setDiscountOverlaps, sloppyFreq, tf
Methods inherited from class org.apache.lucene.search.Similarity
decodeNorm, decodeNormValue, encodeNorm, encodeNormValue, getDefault, getNormDecoder, idfExplain, idfExplain, idfExplain, lengthNorm, scorePayload, setDefault
Method Detail
-
setBaselineTfFactors
public void setBaselineTfFactors(float base, float min)
Sets the baseline and minimum function variables for baselineTf.
See Also:
baselineTf(float)
-
setHyperbolicTfFactors
public void setHyperbolicTfFactors(float min, float max, double base, float xoffset)
Sets the function variables for the hyperbolicTf functions.
Parameters:
min - the minimum tf value to ever be returned (default: 0.0)
max - the maximum tf value to ever be returned (default: 2.0)
base - the base value to be used in the exponential for the hyperbolic function (default: 1.3)
xoffset - the midpoint of the hyperbolic function (default: 10.0)
See Also:
hyperbolicTf(float)
-
setLengthNormFactors
public void setLengthNormFactors(int min, int max, float steepness)
Sets the default function variables used by lengthNorm when no field specific variables have been set.
-
setLengthNormFactors
public void setLengthNormFactors(String field, int min, int max, float steepness, boolean discountOverlaps)
Sets the function variables used by lengthNorm for a specific named field.
Parameters:
field - field name
min - minimum value
max - maximum value
steepness - steepness of the curve
discountOverlaps - if true, numOverlapTokens will be subtracted from numTokens; if false then numOverlapTokens will be assumed to be 0 (see DefaultSimilarity.computeNorm(String, FieldInvertState) for details).
See Also:
Similarity.lengthNorm(java.lang.String, int)
-
computeNorm
public float computeNorm(String fieldName, FieldInvertState state)
Implemented as state.getBoost() * lengthNorm(fieldName, numTokens), where numTokens does not count overlap tokens if discountOverlaps is true by default or true for this specific field.
Overrides:
computeNorm in class DefaultSimilarity
Parameters:
fieldName - field name
state - current processing state for this field
Returns:
the calculated float norm
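The overlap handling described for computeNorm can be sketched in isolation. The class below is a hypothetical standalone illustration, not the Lucene source: the class name, the map of per-field flags, and the plain int arguments (standing in for values that would come from FieldInvertState) are all assumptions. It also assumes that a per-field setting, when present, takes precedence over the default flag.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical standalone sketch; it does not use any Lucene classes.
final class OverlapDiscountSketch {
    private final boolean defaultDiscount;
    // Per-field discountOverlaps flags (assumed to override the default).
    private final Map<String, Boolean> perField = new HashMap<>();

    OverlapDiscountSketch(boolean defaultDiscount) {
        this.defaultDiscount = defaultDiscount;
    }

    void setFieldDiscount(String field, boolean discountOverlaps) {
        perField.put(field, discountOverlaps);
    }

    // length and numOverlap stand in for values read from the field's state.
    int numTokens(String field, int length, int numOverlap) {
        boolean discount = perField.getOrDefault(field, defaultDiscount);
        return discount ? length - numOverlap : length;
    }

    public static void main(String[] args) {
        OverlapDiscountSketch sketch = new OverlapDiscountSketch(true);
        sketch.setFieldDiscount("title", false);
        System.out.println(sketch.numTokens("body", 12, 2));
        System.out.println(sketch.numTokens("title", 12, 2));
    }
}
```

The resulting numTokens is the value that would be passed to the length norm and then multiplied by the boost.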
-
computeLengthNorm
public float computeLengthNorm(String fieldName, int numTerms)
Implemented as: 1/sqrt( steepness * (abs(x-min) + abs(x-max) - (max-min)) + 1 ), where x is numTerms.
This degrades to 1/sqrt(x) when min and max are both 1 and steepness is 0.5.
TODO: potential optimization is to just flat out return 1.0f if numTerms is between min and max.
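To see the plateau and the drop-off, the documented formula can be evaluated directly. The following is a minimal standalone sketch of that formula only; the class and method names are hypothetical, and min, max and steepness are passed as arguments rather than read from configured factors.

```java
// Hypothetical standalone sketch of the documented length norm formula.
final class LengthNormSketch {

    // 1/sqrt( steepness * (abs(x-min) + abs(x-max) - (max-min)) + 1 )
    static float lengthNorm(int numTerms, int min, int max, float steepness) {
        // Zero anywhere inside [min, max]; grows linearly outside it.
        double distance = Math.abs(numTerms - min) + Math.abs(numTerms - max) - (max - min);
        return (float) (1.0 / Math.sqrt(steepness * distance + 1.0));
    }

    public static void main(String[] args) {
        // Lengths inside the plateau [3, 10] share the same norm.
        System.out.println(lengthNorm(3, 3, 10, 0.5f));
        System.out.println(lengthNorm(10, 3, 10, 0.5f));
        // A length outside the plateau gets a smaller norm.
        System.out.println(lengthNorm(12, 3, 10, 0.5f));
        // min = max = 1 and steepness = 0.5 reduces to 1/sqrt(x).
        System.out.println(lengthNorm(4, 1, 1, 0.5f));
    }
}
```

Inside the plateau the two absolute differences sum to exactly max-min, so the term under the square root is 1 and the norm is 1.0.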
-
tf
public float tf(int freq)
Delegates to baselineTf.
Overrides:
tf in class Similarity
Parameters:
freq - the frequency of a term within a document
Returns:
a score factor based on a term's within-document frequency
See Also:
baselineTf(float)
-
baselineTf
public float baselineTf(float freq)
Implemented as: (x <= min) ? base : sqrt(x+(base**2)-min), but with a special case check for 0.
This degrades to sqrt(x) when min and base are both 0.
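The baseline formula is easy to check by hand with a small standalone sketch. The class and method names below are hypothetical, base and min are passed as arguments, and the special case is assumed to mean that a frequency of 0 returns 0.

```java
// Hypothetical standalone sketch of the documented baseline tf formula.
final class BaselineTfSketch {

    // (x <= min) ? base : sqrt(x + base**2 - min), with 0 handled separately.
    static float baselineTf(float freq, float base, float min) {
        if (freq == 0.0f) {
            return 0.0f; // assumed meaning of the "special case check for 0"
        }
        return (freq <= min) ? base : (float) Math.sqrt(freq + base * base - min);
    }

    public static void main(String[] args) {
        // At or below min, every non-zero frequency scores the baseline value.
        System.out.println(baselineTf(3f, 1.5f, 5f));
        // Above min, the score grows like a shifted square root.
        System.out.println(baselineTf(9f, 1.5f, 5f));
        // base = 0 and min = 0 reduces to sqrt(x).
        System.out.println(baselineTf(16f, 0f, 0f));
    }
}
```

The base**2 term makes the curve continuous at x = min, where sqrt(min + base**2 - min) equals base.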
-
hyperbolicTf
public float hyperbolicTf(float freq)
Uses a hyperbolic tangent function that allows for a hard max...
tf(x)=min+(max-min)/2*(((base**(x-xoffset)-base**-(x-xoffset))/(base**(x-xoffset)+base**-(x-xoffset)))+1)
This code is provided as a convenience for subclasses that want to use a hyperbolic tf function.
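As an illustration of how the curve behaves, here is a minimal standalone sketch that evaluates the documented expression. The class and method names are hypothetical and the four factors are passed as arguments. The NaN guard is an addition made for this sketch, since very large exponents overflow to Infinity/Infinity; it is not something the description above states.

```java
// Hypothetical standalone sketch of the documented hyperbolic tf formula.
final class HyperbolicTfSketch {

    static float hyperbolicTf(float freq, float min, float max, double base, float xoffset) {
        double x = freq - xoffset;
        double up = Math.pow(base, x);
        double down = Math.pow(base, -x);
        // Ratio in (-1, 1): the hyperbolic tangent written with an arbitrary base.
        double ratio = (up - down) / (up + down);
        float result = min + (float) ((max - min) / 2.0 * (ratio + 1.0));
        // Guard added for this sketch: overflow produces NaN, so fall back
        // to the limit the curve approaches on that side.
        if (Float.isNaN(result)) {
            return x > 0 ? max : min;
        }
        return result;
    }

    public static void main(String[] args) {
        // Using the documented defaults: min 0.0, max 2.0, base 1.3, xoffset 10.0.
        for (float freq : new float[] {1f, 5f, 10f, 15f, 100f}) {
            System.out.println(freq + " -> " + hyperbolicTf(freq, 0f, 2f, 1.3, 10f));
        }
    }
}
```

At freq = xoffset the ratio is 0, so the result is the midpoint between min and max; as freq grows, the result approaches max without exceeding it.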
-
-