Class WildcardStringParser


  • public class WildcardStringParser
    extends java.lang.Object
    Deprecated.
    Will probably be removed in the near future
    This class parses arbitrary strings against a given wildcard string mask. The wildcard characters are '*' and '?'.

    The string masks provided are treated as case sensitive.
    Null string masks, as well as null strings to be parsed, lead to rejection.

    This class is custom designed for wildcard string parsing and is several times faster than the implementation based on the Jakarta Regexp package.


    This task is performed using regular expression techniques. The strings that can be generated with the well-known wildcard characters stated above are a subset of the strings that can be generated with regular expressions.
    The '*' corresponds to ([Union of all characters in the alphabet])*
    The '?' corresponds to ([Union of all characters in the alphabet])
          These expressions are not well suited to textual representation, I must say. Are there any math tags included in HTML?
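    The correspondence above can be illustrated with java.util.regex. The following is a minimal sketch, not part of this class: the class name WildcardToRegex and the helper toPattern are hypothetical, and '.' (any character) stands in for the union of the hard-coded alphabet, so the sketch accepts more characters than this parser does.

```java
import java.util.regex.Pattern;

// Hypothetical illustration only; not part of WildcardStringParser.
class WildcardToRegex {

    // Translates a wildcard mask into an equivalent regular expression.
    // Assumption: '.' (any character) replaces the union of the alphabet.
    static Pattern toPattern(String mask) {
        StringBuilder regex = new StringBuilder();
        for (char c : mask.toCharArray()) {
            switch (c) {
                case '*':
                    regex.append(".*");   // zero or more characters
                    break;
                case '?':
                    regex.append('.');    // exactly one character
                    break;
                default:
                    // Quote literals so that '.', '-' etc. lose any regex meaning.
                    regex.append(Pattern.quote(String.valueOf(c)));
            }
        }
        // DOTALL lets '.' match line terminators as well.
        return Pattern.compile(regex.toString(), Pattern.DOTALL);
    }

    public static void main(String[] args) {
        Pattern pattern = toPattern("*_28????.jp*");
        System.out.println(pattern.matcher("gupu_280915.jpg").matches()); // prints true
        System.out.println(pattern.matcher("gupu_290915.jpg").matches()); // prints false
    }
}
```

    Matching is case sensitive by default, which agrees with the behaviour described for the string masks.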

    The complete meta-language for regular expressions is much larger. This fact makes it fairly straightforward to build data structures for parsing, because the number of rules for building these structures is quite limited, as stated below.

    To bring this over to mathematical terms: The parser is a nondeterministic finite automaton (latin) representing the grammar which is stated by the string mask. The language accepted by this automaton is the set of all strings accepted by this automaton.
    The formal automaton quintuple consists of:

    1. A finite set of states, depending on the wildcard string mask. For each character in the mask a state representing that character is created. The number of states therefore coincides with the length of the mask.
    2. An alphabet consisting of all legal filename characters, including the two wildcard characters '*' and '?'. This alphabet is hard-coded in this class. It contains {a .. å}, {A .. Å}, {0 .. 9}, {.}, {_}, {-}, {*} and {?}.
    3. A finite set of initial states, here only consisting of the state corresponding to the first character in the mask.
    4. A finite set of final states, here only consisting of the state corresponding to the last character in the mask.
    5. A transition relation that is a finite set of transitions satisfying some formal rules.
      This implementation on the other hand, only uses ad-hoc rules which start with an initial setup of the states as a sequence according to the string mask.
      Additionally, the following rules complete the building of the automaton:
      1. If the next state represents the same character as the next character in the string to test - go to this next state.
      2. If the next state represents '*' - go to this next state.
      3. If the next state represents '?' - go to this next state.
      4. If a '*' is followed by one or more '?', the last of these '?' states counts as a '*' state. Some extra checks regarding the number of characters read must be imposed if this is the case...
      5. If the next character in the string to test does not coincide with the next state - go to the last state representing '*'. If there are none - rejection.
      6. If there is no subsequent state (final state) and the state represents '*' - acceptance.
      7. If there is no subsequent state (final state) and the end of the string to test is reached - acceptance.
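
      The rules above resemble the widely used greedy wildcard matching technique that falls back to the most recent '*' on a mismatch. The following is a minimal sketch of that general technique, not the code of this class: the class name WildcardMatchSketch and the method matches are hypothetical, the alphabet check is omitted, and the special handling of '?' after '*' in rule 4 is not reproduced, because plain backtracking covers that case.

```java
// Hypothetical illustration only; not the implementation of WildcardStringParser.
class WildcardMatchSketch {

    static boolean matches(String mask, String text) {
        if (mask == null || text == null) {
            return false;               // null values lead to rejection
        }
        int m = 0;                      // position in the mask
        int t = 0;                      // position in the text
        int lastStar = -1;              // mask index of the most recent '*'
        int starText = 0;               // text index where that '*' started matching
        while (t < text.length()) {
            if (m < mask.length() && mask.charAt(m) == '*') {
                lastStar = m++;         // rule 2: enter the '*' state
                starText = t;
            } else if (m < mask.length()
                    && (mask.charAt(m) == '?' || mask.charAt(m) == text.charAt(t))) {
                m++;                    // rules 1 and 3: advance on '?' or an equal character
                t++;
            } else if (lastStar >= 0) {
                m = lastStar + 1;       // rule 5: go back to the last '*' state
                t = ++starText;         // and let it absorb one more character
            } else {
                return false;           // rule 5: no '*' to fall back on
            }
        }
        while (m < mask.length() && mask.charAt(m) == '*') {
            m++;                        // rule 6: trailing '*' states accept the empty rest
        }
        return m == mask.length();      // rule 7: mask and text are both exhausted
    }

    public static void main(String[] args) {
        System.out.println(matches("*_28????.jp*", "gupu_280915.jpg")); // prints true
        System.out.println(matches("*_28????.jp*", "gupu_280915.png")); // prints false
        System.out.println(matches("*.JPG", "photo.jpg"));              // prints false
    }
}
```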

      Disclaimer: This class does not build a finite automaton according to formal mathematical rules. The proper way of implementation would be to find the complete set of transition relations, decompose these into rules accepted by a deterministic finite automaton, and finally build this automaton to be used for string parsing. Instead, this class is implemented ad hoc, based on the informal transition rules stated above. Therefore the correctness cannot be guaranteed until extensive testing has been performed on this class... anyway, I think I have succeeded. Parsing faults must be reported to the author.
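
      For comparison, the more formal route can be approximated without building the deterministic automaton explicitly, by tracking the set of states the nondeterministic automaton could be in after each character. The following is a minimal sketch under stated assumptions: the names WildcardNfaSketch, accepts and closeOverStars are hypothetical, state i means "the first i mask characters have been matched" (so there are mask length + 1 states, unlike the one-state-per-character layout described above), and the alphabet check is omitted.

```java
// Hypothetical illustration only; not part of WildcardStringParser.
class WildcardNfaSketch {

    static boolean accepts(String mask, String text) {
        if (mask == null || text == null) {
            return false;                       // null values lead to rejection
        }
        int n = mask.length();
        boolean[] active = new boolean[n + 1];  // active[i]: state i is reachable
        active[0] = true;                       // single initial state
        closeOverStars(mask, active);
        for (int k = 0; k < text.length(); k++) {
            char c = text.charAt(k);
            boolean[] next = new boolean[n + 1];
            for (int i = 0; i < n; i++) {
                if (!active[i]) {
                    continue;
                }
                char mc = mask.charAt(i);
                if (mc == '*') {
                    next[i] = true;             // '*' consumes c and stays put
                } else if (mc == '?' || mc == c) {
                    next[i + 1] = true;         // '?' or an equal character advances
                }
            }
            closeOverStars(mask, next);
            active = next;
        }
        return active[n];                       // single final state
    }

    // A '*' may match the empty string, so being in front of a '*'
    // also means being behind it. One left-to-right pass is enough.
    static void closeOverStars(String mask, boolean[] states) {
        for (int i = 0; i < mask.length(); i++) {
            if (states[i] && mask.charAt(i) == '*') {
                states[i + 1] = true;
            }
        }
    }

    public static void main(String[] args) {
        System.out.println(accepts("*_28????.jp*", "gupu_280915.jpg")); // prints true
        System.out.println(accepts("a*b", "abc"));                      // prints false
    }
}
```

      Each input character is processed once against every mask position, so the running time is proportional to mask length times text length and no backtracking is needed.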

    Examples of usage:
    This example will print "Accepted!".

     WildcardStringParser parser = new WildcardStringParser("*_28????.jp*");
     if (parser.parseString("gupu_280915.jpg")) {
         System.out.println("Accepted!");
     } else {
         System.out.println("Not accepted!");
     }
     

    Theories and concepts are based on the book Elements of the Theory of Computation, by Harry R. Lewis and Christos H. Papadimitriou, (c) 1981 by Prentice Hall.

    Author:
    Eirik Torske
    • Field Summary

      Fields 
      Modifier and Type Field Description
      static char[] ALPHABET
      Deprecated.
      Field ALPHABET
      static char FREE_PASS_CHARACTER
      Deprecated.
      Field FREE_PASS_CHARACTER
      static char FREE_RANGE_CHARACTER
      Deprecated.
      Field FREE_RANGE_CHARACTER
    • Constructor Summary

      Constructors 
      Constructor Description
      WildcardStringParser​(java.lang.String pStringMask)
      Deprecated.
      Creates a wildcard string parser.
      WildcardStringParser​(java.lang.String pStringMask, boolean pDebugging)
      Deprecated.
      Creates a wildcard string parser.
      WildcardStringParser​(java.lang.String pStringMask, boolean pDebugging, java.io.PrintStream pDebuggingPrintStream)
      Deprecated.
      Creates a wildcard string parser.
    • Method Summary

      All Methods Static Methods Instance Methods Concrete Methods Deprecated Methods 
      Modifier and Type Method Description
      protected java.lang.Object clone()
      Deprecated.
       
      boolean equals​(java.lang.Object pObject)
      Deprecated.
      Method equals
      protected void finalize()
      Deprecated.
       
      java.lang.String getStringMask()
      Deprecated.
      Gets the string mask that was used when building the parser automaton.
      int hashCode()
      Deprecated.
      Method hashCode
      static boolean isFreePassCharacter​(char pCharToCheck)
      Deprecated.
      Tests if a certain character is the designated "free-pass" character ('?').
      static boolean isFreeRangeCharacter​(char pCharToCheck)
      Deprecated.
      Tests if a certain character is the designated "free-range" character ('*').
      static boolean isInAlphabet​(char pCharToCheck)
      Deprecated.
      Tests if a certain character is a valid character in the alphabet that applies to this automaton.
      static boolean isWildcardCharacter​(char pCharToCheck)
      Deprecated.
      Tests if a certain character is a wildcard character ('*' or '?').
      boolean parseString​(java.lang.String pStringToParse)
      Deprecated.
      Parses a string according to the rules stated above.
      java.lang.String toString()
      Deprecated.
      Method toString
      • Methods inherited from class java.lang.Object

        getClass, notify, notifyAll, wait, wait, wait
    • Field Detail

      • ALPHABET

        public static final char[] ALPHABET
        Deprecated.
        Field ALPHABET
      • FREE_RANGE_CHARACTER

        public static final char FREE_RANGE_CHARACTER
        Deprecated.
        Field FREE_RANGE_CHARACTER
        See Also:
        Constant Field Values
      • FREE_PASS_CHARACTER

        public static final char FREE_PASS_CHARACTER
        Deprecated.
        Field FREE_PASS_CHARACTER
        See Also:
        Constant Field Values
    • Constructor Detail

      • WildcardStringParser

        public WildcardStringParser​(java.lang.String pStringMask)
        Deprecated.
        Creates a wildcard string parser.
        Parameters:
        pStringMask - the wildcard string mask.
      • WildcardStringParser

        public WildcardStringParser​(java.lang.String pStringMask,
                                    boolean pDebugging)
        Deprecated.
        Creates a wildcard string parser.
        Parameters:
        pStringMask - the wildcard string mask.
        pDebugging - true will cause debug messages to be emitted to System.out.
      • WildcardStringParser

        public WildcardStringParser​(java.lang.String pStringMask,
                                    boolean pDebugging,
                                    java.io.PrintStream pDebuggingPrintStream)
        Deprecated.
        Creates a wildcard string parser.
        Parameters:
        pStringMask - the wildcard string mask.
        pDebugging - true will cause debug messages to be emitted.
        pDebuggingPrintStream - the java.io.PrintStream to which the debug messages will be emitted.
    • Method Detail

      • isInAlphabet

        public static boolean isInAlphabet​(char pCharToCheck)
        Deprecated.
        Tests if a certain character is a valid character in the alphabet that applies to this automaton.
      • isFreeRangeCharacter

        public static boolean isFreeRangeCharacter​(char pCharToCheck)
        Deprecated.
        Tests if a certain character is the designated "free-range" character ('*').
      • isFreePassCharacter

        public static boolean isFreePassCharacter​(char pCharToCheck)
        Deprecated.
        Tests if a certain character is the designated "free-pass" character ('?').
      • isWildcardCharacter

        public static boolean isWildcardCharacter​(char pCharToCheck)
        Deprecated.
        Tests if a certain character is a wildcard character ('*' or '?').
      • getStringMask

        public java.lang.String getStringMask()
        Deprecated.
        Gets the string mask that was used when building the parser automaton.
        Returns:
        the string mask used for building the parser automaton.
      • parseString

        public boolean parseString​(java.lang.String pStringToParse)
        Deprecated.
        Parses a string according to the rules stated above.
        Parameters:
        pStringToParse - the string to parse.
        Returns:
        true if and only if the string is accepted by the automaton.
      • toString

        public java.lang.String toString()
        Deprecated.
        Method toString
        Overrides:
        toString in class java.lang.Object
        Returns:
      • equals

        public boolean equals​(java.lang.Object pObject)
        Deprecated.
        Method equals
        Overrides:
        equals in class java.lang.Object
        Parameters:
        pObject -
        Returns:
      • hashCode

        public int hashCode()
        Deprecated.
        Method hashCode
        Overrides:
        hashCode in class java.lang.Object
        Returns:
      • clone

        protected java.lang.Object clone()
                                  throws java.lang.CloneNotSupportedException
        Deprecated.
        Overrides:
        clone in class java.lang.Object
        Throws:
        java.lang.CloneNotSupportedException
      • finalize

        protected void finalize()
                         throws java.lang.Throwable
        Deprecated.
        Overrides:
        finalize in class java.lang.Object
        Throws:
        java.lang.Throwable