Class BinaryCasSerDes6


  • public class BinaryCasSerDes6
    extends Object
    User-callable serialization and deserialization of the CAS in a compressed binary format.

    This serializes/deserializes the state of the CAS. It can map type systems, so the sending and receiving type systems do not have to be the same.
      - Types and features are matched by name, and features must have the same range (slot kind).
      - Types and/or features present in one type system but not in the other are skipped over.

    The header tells the reader the format and the compression level.

    How to Serialize:
      1) Create an instance of this class.
         a) If doing a delta serialization, pass in the mark and a ReuseInfo object that was created after deserializing this CAS initially.
         b) If serializing to a target with a different type system, pass the target's TypeSystemImpl object so the serialization can filter the types for the target.
      2) Call serialize() to serialize the CAS.
      3) If serializing to a target from which you expect to receive back a delta CAS, create a ReuseInfo object from this object and reuse it for deserializing the delta CAS.

    TypeSystemImpl objects are lazily augmented by customized TypeInfo instances for each type encountered in serializing or deserializing. These are preserved for future calls, so their setup / initialization is only needed the first time.

    TypeSystemImpl objects are also lazily augmented by typeMappers for individual different target type systems; these too are preserved and reused on future calls.

    Compressed binary CASes are designed to be "self-describing": the format of the compressed binary CAS, including version info, is inserted at the beginning so that a proper deserialization method can be chosen automatically.

    The compressed binary format implemented by this class supports type system mapping. Types in the source that are not in the target (or vice versa) are omitted. Types with "extra" features have their extra features omitted (or, on deserialization, they are set to their default value: null, 0, etc.).
    Feature slots that hold references to types not in the target type system are replaced with 0 (null).

    How to Deserialize:
      1) Get an appropriate CAS to deserialize into. For a delta CAS, it does not have to be empty, but it must be the originating CAS from which the delta was produced.
      2) If the target type system == the CAS's type system and the serialized form is not delta, call aCAS.reinit(source). Otherwise, create an instance of this class -> xxx
         a) If the object being deserialized has a different type system, set the "target" type system to the TypeSystemImpl instance of the object being deserialized.
         b) If delta deserializing, pass in the ReuseInfo object created when the CAS was serialized.
      3) Call xxx.deserialize(inputStream).

    Compression/Decompression
    Works in two stages:
      - application of Zip/Unzip to particular sub-collections of CAS data, grouped according to similar data distribution
      - collection of like kinds of data (to make the zipping more effective)
    There can be up to ~20 of these collections, such as control info, float-exponents, string chars.

    Deserialization:
      - Read all bytes.
      - Create separate ByteArrayInputStreams for each segment.
      - Create appropriate unzip data input streams for these.

    Slow, expensive data:
      - extra type system info: lazily created and added to the shared TypeSystemImpl object; set up per type actually referenced
      - mapper for type system: lazily created and added to the shared TypeSystemImpl object in an identity-map cache (size limit = 10 per source type system?); the key is the target TypeSystemImpl

    Defaulting:
      flags: doMeasurements, compressLevel, CompressStrategy
    Per serialize call: cas, output, [target ts], [mark for delta]
    Per deserialize call: cas, input, [target ts], whether-to-save-info-for-delta-serialization

    CASImpl has an instance method with defaulting args for serialization.
    CASImpl has reinit, which works with compressed binary serialization objects if there is no type mapping.
    If there is type mapping:
      (new BinaryCasSerDes6(cas,
                            marker-or-null,
                            targetTypeSystem (for stream being deserialized),
                            reuseInfo-or-null))
          .deserialize(in-stream)

    Use Cases, filtering and delta
    **************************************************************************
    * (de)serialize * filter? * delta? * Use case
    **************************************************************************
    * serialize     *   N     *   N    * Saving a Cas,
    *               *         *        * sending Cas to service with identical ts
    **************************************************************************
    * serialize     *   Y     *   N    * sending Cas to service with
    *               *         *        * different ts (a guaranteed subset)
    **************************************************************************
    * serialize     *   N     *   Y    * returning Cas to client
    *               *         *        *   uses info saved when deserializing
    *               *         *        * (?? saving just a delta to disk??)
    **************************************************************************
    * serialize     *   Y     *   Y    * NOT SUPPORTED (not needed)
    **************************************************************************
    * deserialize   *   N     *   N    * reading/(receiving) CAS, identical TS
    **************************************************************************
    * deserialize   *   Y     *   N    * reading/receiving CAS, different TS
    *               *         *        * ts not guaranteed to be superset
    *               *         *        * for "reading" case.
    **************************************************************************
    * deserialize   *   N     *   Y    * receiving CAS, identical TS
    *               *         *        *   uses info saved when serializing
    **************************************************************************
    * deserialize   *   Y     *   Y    * receiving CAS, different TS (tgt a feature subset)
    *               *         *        *   uses info saved when serializing
    **************************************************************************
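    The two-stage compression idea described in this class comment (collect like kinds of data into separate segments, then zip each segment on its own, and on reading create one ByteArrayInputStream plus unzip stream per segment) can be sketched with JDK classes only. This is a minimal illustration, not the actual BinaryCasSerDes6 wire format: the class name SegmentedZipSketch, the two-segment layout, and the length-prefixed framing are all assumptions made for the example.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.zip.DeflaterOutputStream;
import java.util.zip.InflaterInputStream;

/** Hypothetical sketch: NOT the real compressed binary CAS layout. */
class SegmentedZipSketch {

  /** Splits floats into an exponent segment and a sign+mantissa segment, then deflates each. */
  static byte[] write(float[] values) {
    try {
      ByteArrayOutputStream exponents = new ByteArrayOutputStream();
      ByteArrayOutputStream mantissas = new ByteArrayOutputStream();
      DataOutputStream mantOut = new DataOutputStream(mantissas);
      for (float v : values) {
        int bits = Float.floatToRawIntBits(v);
        exponents.write((bits >>> 23) & 0xFF);  // the 8 exponent bits
        mantOut.writeInt(bits & 0x807FFFFF);    // sign bit + 23 mantissa bits
      }
      mantOut.flush();

      // Assumed framing: value count, then each zipped segment prefixed by its length.
      ByteArrayOutputStream result = new ByteArrayOutputStream();
      DataOutputStream out = new DataOutputStream(result);
      out.writeInt(values.length);
      writeSegment(out, exponents.toByteArray());
      writeSegment(out, mantissas.toByteArray());
      out.flush();
      return result.toByteArray();
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }

  /** Reads all bytes, opens one unzip data input stream per segment, and recombines the floats. */
  static float[] read(byte[] data) {
    try {
      DataInputStream in = new DataInputStream(new ByteArrayInputStream(data));
      int count = in.readInt();
      DataInputStream exponents = openSegment(in);
      DataInputStream mantissas = openSegment(in);
      float[] values = new float[count];
      for (int i = 0; i < count; i++) {
        int bits = (exponents.readUnsignedByte() << 23) | mantissas.readInt();
        values[i] = Float.intBitsToFloat(bits);
      }
      return values;
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }

  private static void writeSegment(DataOutputStream out, byte[] raw) throws IOException {
    ByteArrayOutputStream buf = new ByteArrayOutputStream();
    try (DeflaterOutputStream zip = new DeflaterOutputStream(buf)) {
      zip.write(raw);
    }
    byte[] zipped = buf.toByteArray();
    out.writeInt(zipped.length);
    out.write(zipped);
  }

  private static DataInputStream openSegment(DataInputStream in) throws IOException {
    byte[] zipped = new byte[in.readInt()];
    in.readFully(zipped);
    return new DataInputStream(new InflaterInputStream(new ByteArrayInputStream(zipped)));
  }
}
```

    Calling SegmentedZipSketch.read(SegmentedZipSketch.write(values)) should return the original values. Keeping exponents apart from mantissas groups bytes with a similar distribution, which is the stated reason for collecting like kinds of data before zipping.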
    • Constructor Detail

      • BinaryCasSerDes6

        public BinaryCasSerDes6(AbstractCas aCas,
                                MarkerImpl mark,
                                TypeSystemImpl tgtTs,
                                BinaryCasSerDes6.ReuseInfo rfs,
                                boolean doMeasurements,
                                BinaryCasSerDes6.CompressLevel compressLevel,
                                BinaryCasSerDes6.CompressStrat compressStrategy)
                         throws ResourceInitializationException
        Sets up serialization or deserialization using binary compression, with (optional) type mapping, processing only reachable Feature Structures.
        Parameters:
        aCas - required; refers to the CAS being serialized or deserialized into
        mark - if not null, the serialization mark for delta serialization. Unused for deserialization.
        tgtTs - if not null, the target type system. For serialization, this is a subset of the CAS's type system.
        rfs - Reused Feature Structure information:
          - For delta serialization: must not be null; it is the value saved after deserializing the original, before any modifications / additions were made.
          - For normal serialization: can be null; if not null, it is used in place of recalculating, for speed.
          - For delta deserialization: must not be null; it is the value saved after serializing to the service.
          - For normal deserialization: must be null.
        doMeasurements - if true, measurements are done (on serialization)
        compressLevel - if not null, specifies enum instance for compress level
        compressStrategy - if not null, specifies enum instance for compress strategy
        Throws:
        ResourceInitializationException - if the target type system is incompatible with the source type system
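        The rules for the rfs parameter depend on the operation, so a caller may want to check them before constructing an instance. The following is a hypothetical pre-check helper written only from the parameter description above; the class ReuseInfoRulesSketch, its Mode enum, and the check method are assumptions for illustration and are not part of the UIMA API, and it makes no claim about how BinaryCasSerDes6 itself reacts to a wrong combination.

```java
/** Hypothetical caller-side helper; not part of UIMA. */
class ReuseInfoRulesSketch {

  enum Mode { SERIALIZE, DELTA_SERIALIZE, DESERIALIZE, DELTA_DESERIALIZE }

  /**
   * Returns null when the combination matches the documented rules for rfs,
   * otherwise a short message describing the violated rule.
   */
  static String check(Mode mode, boolean hasReuseInfo) {
    switch (mode) {
      case DELTA_SERIALIZE:
        return hasReuseInfo ? null
            : "delta serialization needs the ReuseInfo saved after deserializing the original";
      case DELTA_DESERIALIZE:
        return hasReuseInfo ? null
            : "delta deserialization needs the ReuseInfo saved after serializing to the service";
      case DESERIALIZE:
        return hasReuseInfo ? "normal deserialization requires rfs to be null" : null;
      default:
        return null; // SERIALIZE: rfs is optional and only speeds things up
    }
  }
}
```

        For example, check(Mode.DESERIALIZE, true) returns a message, while check(Mode.SERIALIZE, false) returns null.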
      • BinaryCasSerDes6

        public BinaryCasSerDes6(AbstractCas cas,
                                MarkerImpl mark,
                                TypeSystemImpl tgtTs,
                                BinaryCasSerDes6.ReuseInfo rfs,
                                boolean doMeasurements)
                         throws ResourceInitializationException
        Sets up serialization (possibly delta) or deserialization (possibly delta) using binary compression, with type mapping, processing only reachable Feature Structures, and optionally producing measurements.
        Parameters:
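        cas - the CAS being serialized or deserialized into
        mark - if not null, the serialization mark for delta serialization
        tgtTs - if not null, the target type system
        rfs - Reused Feature Structure information: a speed-up on serialization, required on delta deserialization
        doMeasurements - if true, measurements are done (on serialization)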
        Throws:
        ResourceInitializationException - if the target type system is incompatible with the source type system
    • Method Detail

      • deserialize

        public void deserialize(InputStream istream,
                                AllowPreexistingFS allowPreexistingFS)
                         throws IOException
        Version used by UIMA-AS to read a delta CAS from remote parallel steps.
        Parameters:
        istream - input stream
        allowPreexistingFS - what to do if item already exists below the mark
        Throws:
        IOException - passed through
      • compareCASes

        public boolean compareCASes(CASImpl c1,
                                    CASImpl c2)
        Compares two CASes, which may have different type systems. If the type systems differ, a type mapper is constructed and used to selectively ignore types or features not in the other type system. The mapper filters c1 -> c2. Only feature structures reachable via indexes or references are compared. The order must match.
        Parameters:
        c1 - CAS to compare
        c2 - CAS to compare
        Returns:
        true if equal (for types / features in both)
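        The comparison rule (ignore types and features that are missing from the other type system, compare the rest in order) can be illustrated with plain collections. This is a sketch under stated assumptions, not the UIMA implementation: MappedCompareSketch, its Fs class, and the representation of a type system as a map from type name to feature names are all hypothetical stand-ins for CASImpl and TypeSystemImpl.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Objects;
import java.util.Set;

/** Hypothetical sketch of name-based filtering during comparison; not the UIMA code. */
class MappedCompareSketch {

  /** Stand-in for a feature structure: a type name plus feature values keyed by feature name. */
  static final class Fs {
    final String type;
    final Map<String, Object> features;

    Fs(String type, Map<String, Object> features) {
      this.type = type;
      this.features = features;
    }
  }

  /** Compares two lists in order, looking only at types and features present in both type systems. */
  static boolean compare(List<Fs> c1, Map<String, Set<String>> ts1,
                         List<Fs> c2, Map<String, Set<String>> ts2) {
    List<Fs> a = keepSharedTypes(c1, ts2);
    List<Fs> b = keepSharedTypes(c2, ts1);
    if (a.size() != b.size()) {
      return false;
    }
    for (int i = 0; i < a.size(); i++) {
      Fs x = a.get(i);
      Fs y = b.get(i);
      if (!x.type.equals(y.type)) {
        return false; // the order must match
      }
      // Features are matched by name; only those known to both type systems are compared.
      Set<String> shared = new HashSet<>(ts1.getOrDefault(x.type, Set.of()));
      shared.retainAll(ts2.getOrDefault(x.type, Set.of()));
      for (String feature : shared) {
        if (!Objects.equals(x.features.get(feature), y.features.get(feature))) {
          return false;
        }
      }
    }
    return true;
  }

  /** Drops feature structures whose type does not exist in the other type system. */
  private static List<Fs> keepSharedTypes(List<Fs> fss, Map<String, Set<String>> other) {
    List<Fs> kept = new ArrayList<>();
    for (Fs fs : fss) {
      if (other.containsKey(fs.type)) {
        kept.add(fs);
      }
    }
    return kept;
  }
}
```

        With this sketch, a feature structure of a type that exists on only one side is skipped, and a feature such as a part-of-speech tag that exists on only one side does not affect the result.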