hdf5storage.utilities

Module for various utility functions.

There are utility functions for low-level reading and writing, setting and deleting HDF5 attributes, encoding and decoding strings and complex arrays, etc.

does_dtype_have_a_zero_shape(dt)

Determines whether a dtype (or any of its fields) has a zero shape.

write_data(f, grp, name, data, type_string, …)

Writes a piece of data into an open HDF5 file.

read_data(f, grp, name, options[, dsetgrp])

Reads a piece of data from an open HDF5 file.

write_object_array(f, data, options)

Writes an array of objects recursively.

read_object_array(f, data, options)

Reads an array of objects recursively.

next_unused_name_in_group(grp, length)

Gives a name that isn’t used in a Group.

convert_numpy_str_to_uint16(data)

Converts a numpy.unicode_ to UTF-16 in numpy.uint16 form.

convert_numpy_str_to_uint32(data)

Converts numpy.unicode_ to its numpy.uint32 representation.

convert_to_str(data)

Decodes data to the str type.

convert_to_numpy_str(data[, length])

Decodes data to Numpy unicode string (numpy.unicode_).

convert_to_numpy_bytes(data[, length])

Decodes data to Numpy UTF-8 encoded string (numpy.bytes_).

decode_complex(data[, complex_names])

Decodes possibly complex data read from an HDF5 file.

encode_complex(data, complex_names)

Encodes complex data to have arbitrary complex field names.

get_attribute(target, name)

Gets an attribute from a Dataset or Group.

convert_attribute_to_string(value)

Convert an attribute value to a string.

get_attribute_string(target, name)

Gets a string attribute from a Dataset or Group.

convert_attribute_to_string_array(value)

Converts an Attribute value to a string array.

get_attribute_string_array(target, name)

Gets a string array Attribute from a Dataset or Group.

set_attribute(target, name, value)

Sets an attribute on a Dataset or Group.

set_attribute_string(target, name, value)

Sets an attribute to a string on a Dataset or Group.

set_attribute_string_array(target, name, …)

Sets an attribute to an array of strings on a Dataset or Group.

set_attributes_all(target, attributes[, …])

Set Attributes in bulk and optionally discard others.

del_attribute(target, name)

Deletes an attribute on a Dataset or Group.

does_dtype_have_a_zero_shape

hdf5storage.utilities.does_dtype_have_a_zero_shape(dt)[source]

Determines whether a dtype (or any of its fields) has a zero shape.

Determines whether the given numpy.dtype has a shape with a zero element or if one of its fields does, or if one of its fields’ fields does, and so on recursively. The following dtypes do not have zero shape.

  • 'uint8'

  • [('a', 'int32'), ('blah', 'float16', (3, 3))]

  • [('a', [('b', 'complex64')], (2, 1, 3))]

But the following do:

  • ('uint8', (1, 0))

  • [('a', 'int32'), ('blah', 'float16', (3, 0))]

  • [('a', [('b', 'complex64')], (2, 0, 3))]

Parameters

dt (numpy.dtype) – The dtype to check.

Returns

yesno – Whether dt or one of its fields has a shape with at least one element that is zero.

Return type

bool

Raises

TypeError – If dt is not a numpy.dtype.
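
The recursion described above can be sketched in plain Python. This is a hypothetical stand-in rather than the library's implementation: it walks the nested dtype specifications listed above (strings, (base, shape) tuples, and lists of fields) instead of numpy.dtype objects, and the name spec_has_zero_shape is made up for illustration.

```python
def spec_has_zero_shape(spec):
    """Return True if a dtype spec, or any nested field, has a 0 in its shape.

    Hypothetical helper: works on plain specs, not numpy.dtype objects.
    """
    if isinstance(spec, str):
        # A bare type name such as 'uint8' carries no shape.
        return False
    if isinstance(spec, tuple):
        # (base, shape) form, e.g. ('uint8', (1, 0)).
        base, shape = spec
        return 0 in shape or spec_has_zero_shape(base)
    # Otherwise a list of fields: (name, base) or (name, base, shape).
    for field in spec:
        base = field[1]
        shape = field[2] if len(field) > 2 else ()
        if 0 in shape or spec_has_zero_shape(base):
            return True
    return False


# The two groups of examples from the description above.
no_zero = [
    'uint8',
    [('a', 'int32'), ('blah', 'float16', (3, 3))],
    [('a', [('b', 'complex64')], (2, 1, 3))],
]
has_zero = [
    ('uint8', (1, 0)),
    [('a', 'int32'), ('blah', 'float16', (3, 0))],
    [('a', [('b', 'complex64')], (2, 0, 3))],
]
```

Every entry of no_zero yields False and every entry of has_zero yields True, matching the two lists in the description.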

write_data

hdf5storage.utilities.write_data(f, grp, name, data, type_string, options)[source]

Writes a piece of data into an open HDF5 file.

Low level function to store a Python type (data) into the specified Group.

Changed in version 0.2: Added return value obj.

Parameters
  • f (h5py.File) – The open HDF5 file.

  • grp (h5py.Group or h5py.File) – The Group to place the data in.

  • name (str) – The name to write the data to.

  • data (any) – The data to write.

  • type_string (str or None) – The type string of the data, or None to deduce automatically.

  • options (hdf5storage.core.Options) – The options to use when writing.

Returns

obj – The base Dataset or Group having the name name in grp that was made, or None if nothing was written.

Return type

h5py.Dataset or h5py.Group or None

Raises

See also

hdf5storage.write

Higher level version.

read_data, hdf5storage.Options

read_data

hdf5storage.utilities.read_data(f, grp, name, options, dsetgrp=None)[source]

Reads a piece of data from an open HDF5 file.

Low level function to read a Python type of the specified name from the specified Group.

Changed in version 0.2: Added argument dsetgrp.

Parameters
  • f (h5py.File) – The open HDF5 file.

  • grp (h5py.Group or h5py.File) – The Group to read the data from.

  • name (str) – The name of the data to read.

  • options (hdf5storage.core.Options) – The options to use when reading.

  • dsetgrp (h5py.Dataset or h5py.Group or None, optional) – The Dataset or Group object to read if that has already been obtained and thus should not be re-obtained (None otherwise). If given, overrides grp and name.

Returns

The data named name in Group grp.

Return type

data

Raises

See also

hdf5storage.read

Higher level version.

write_data, hdf5storage.Options

write_object_array

hdf5storage.utilities.write_object_array(f, data, options)[source]

Writes an array of objects recursively.

Writes the elements of the given object array recursively in the HDF5 Group options.group_for_references and returns an h5py.Reference array to all the elements.

Parameters
  • f (h5py.File) – The HDF5 file handle that is open.

  • data (numpy.ndarray of objects) – Numpy object array to write the elements of.

  • options (hdf5storage.core.Options) – hdf5storage options object.

Returns

obj_array – A reference array pointing to all the elements written to the HDF5 file. For those that couldn’t be written, the respective element points to the canonical empty.

Return type

numpy.ndarray of h5py.Reference

Raises

TypeNotMatlabCompatibleError – If writing a type not compatible with MATLAB and options.action_for_matlab_incompatible is set to 'error'.

read_object_array

hdf5storage.utilities.read_object_array(f, data, options)[source]

Reads an array of objects recursively.

Reads the elements of the given HDF5 Reference array recursively and constructs a numpy.object_ array from its elements, which is returned.

Parameters
  • f (h5py.File) – The HDF5 file handle that is open.

  • data (numpy.ndarray of h5py.Reference) – The array of HDF5 References to read and make an object array from.

  • options (hdf5storage.core.Options) – hdf5storage options object.

Raises

NotImplementedError – If reading the object from file is currently not supported.

Returns

obj_array – The Python object array containing the items pointed to by data.

Return type

numpy.ndarray of numpy.object_
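
The idea behind write_object_array and read_object_array (store each element separately, keep only references to them, then follow the references back) can be illustrated with a toy model. This is a sketch under loud assumptions: a plain dict stands in for the Group that holds referenced objects, string keys stand in for h5py.Reference values, and write_objects and read_objects are hypothetical names, not library functions.

```python
import itertools


def write_objects(store, items):
    """Store each item under a fresh key and return the list of keys.

    `store` is a plain dict standing in for the group of referenced
    objects; the keys stand in for HDF5 references.
    Nested lists are written recursively.
    """
    refs = []
    for item in items:
        if isinstance(item, list):
            # Recurse: the stored value is itself a list of references.
            value = ('refs', write_objects(store, item))
        else:
            value = ('leaf', item)
        # Pick the first numeric key that is not already in use.
        key = next(k for k in map(str, itertools.count()) if k not in store)
        store[key] = value
        refs.append(key)
    return refs


def read_objects(store, refs):
    """Follow each reference and rebuild the nested list."""
    out = []
    for key in refs:
        kind, value = store[key]
        out.append(read_objects(store, value) if kind == 'refs' else value)
    return out


store = {}
data = [1, 'two', [3.0, [4, 5]]]
refs = write_objects(store, data)
restored = read_objects(store, refs)
```

Reading the references back reproduces the original nested structure, which is the round-trip property the two functions are meant to provide.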

next_unused_name_in_group

hdf5storage.utilities.next_unused_name_in_group(grp, length)[source]

Gives a name that isn’t used in a Group.

Generates a name of the desired length that is not a Dataset or Group in the given group. Note that if length is too small and grp is full enough, no unused names may remain, in which case this function will hang.

Parameters
  • grp (h5py.Group or h5py.File) – The HDF5 Group (or File if at ‘/’) to generate an unused name in.

  • length (int) – Number of characters the name should be.

Returns

name – A name that isn’t already an existing Dataset or Group in grp.

Return type

str
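
A minimal sketch of this kind of name generation follows. It is not the library's code: the lowercase alphabet, the name unused_name, and the max_tries guard are all assumptions added for illustration, and a set stands in for the Group (anything supporting the `in` operator would do).

```python
import random
import string


def unused_name(existing, length, max_tries=10000):
    """Draw random lowercase names until one is not in `existing`.

    `existing` is any container supporting `in` (a set here, standing in
    for a Group). `max_tries` is an added guard, not part of the
    documented function, so the loop cannot spin forever.
    """
    alphabet = string.ascii_lowercase
    for _ in range(max_tries):
        name = ''.join(random.choice(alphabet) for _ in range(length))
        if name not in existing:
            return name
    raise RuntimeError('no unused name found; try a larger length')


taken = {'a', 'b', 'c'}
name = unused_name(taken, 1)
```

With only 26 one-character names available, a full group exhausts the guard and raises, which is the situation in which an unguarded loop would hang.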

convert_numpy_str_to_uint16

hdf5storage.utilities.convert_numpy_str_to_uint16(data)[source]

Converts a numpy.unicode_ to UTF-16 in numpy.uint16 form.

Converts a numpy.unicode_ or an array of them (they are UTF-32 strings) to UTF-16 in the equivalent array of numpy.uint16. The conversion raises an exception if any character cannot be converted to UTF-16. Strings are expanded along rows (across columns), so a 2x3x4 array of 10 element strings is turned into a 2x3x40 array of uint16’s if every UTF-32 character converts to a UTF-16 singlet, as opposed to a UTF-16 doublet.

Parameters

data (numpy.unicode_ or numpy.ndarray of numpy.unicode_) – The string or array of them to convert.

Returns

array – The result of the conversion.

Return type

numpy.ndarray of numpy.uint16

Raises

UnicodeEncodeError – If a UTF-32 character has no UTF-16 representation.
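
The difference between a UTF-16 singlet and doublet, and the failure case, can be seen with the standard library alone. This sketch works on Python str values instead of Numpy strings, and to_utf16_units is a hypothetical helper name.

```python
import struct


def to_utf16_units(text):
    """Return the UTF-16 code units of `text` as a list of ints."""
    raw = text.encode('utf-16-le')  # little-endian, no byte-order mark
    return list(struct.unpack('<%dH' % (len(raw) // 2), raw))


singlets = to_utf16_units('abc')        # one 16-bit unit per character
doublet = to_utf16_units('\U0001F600')  # outside the BMP: surrogate pair

try:
    to_utf16_units('\ud800')            # a lone surrogate cannot be encoded
    failed = False
except UnicodeEncodeError:
    failed = True
```

Characters in the Basic Multilingual Plane take one uint16 each, while U+1F600 takes two, so the output length depends on the content of the strings.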

convert_numpy_str_to_uint32

hdf5storage.utilities.convert_numpy_str_to_uint32(data)[source]

Converts numpy.unicode_ to its numpy.uint32 representation.

Converts a numpy.unicode_ or an array of them (they are UTF-32 strings) into the equivalent array of numpy.uint32 that is byte for byte identical. Strings are expanded along rows (across columns), so a 2x3x4 array of 10 element strings is turned into a 2x3x40 array of uint32’s.

Parameters

data (numpy.unicode_ or numpy.ndarray of numpy.unicode_) – The string or array of them to convert.

Returns

array – The result of the conversion.

Return type

numpy.ndarray of numpy.uint32
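
The shape arithmetic above can be sketched without Numpy. The example assumes fixed-width strings padded with NUL code points (as Numpy's fixed-width unicode type stores them); row_to_uint32 and expanded_shape are hypothetical helpers that operate on plain lists and tuples.

```python
def row_to_uint32(strings, width):
    """Flatten a row of strings into code points, NUL-padded to `width`."""
    out = []
    for s in strings:
        points = [ord(ch) for ch in s]
        out.extend(points + [0] * (width - len(points)))
    return out


def expanded_shape(shape, width):
    """Shape after every string along the last axis becomes `width` ints."""
    return shape[:-1] + (shape[-1] * width,)


row = row_to_uint32(['ab', 'c'], 2)     # [97, 98, 99, 0]
shape = expanded_shape((2, 3, 4), 10)   # (2, 3, 40)
```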

convert_to_str

hdf5storage.utilities.convert_to_str(data)[source]

Decodes data to the str type.

Decodes data to a str. If it can’t be decoded, it is returned as is. Unsigned integers, Python bytes, and Numpy strings (numpy.unicode_ and numpy.bytes_) are supported. Python 3.x bytes and numpy.bytes_ are assumed to be encoded in UTF-8.

Parameters

data (some type) – Data to decode into a str.

Returns

s – If data can be decoded into a str, the decoded version is returned. Otherwise, data is returned unchanged.

Return type

str or data
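
The "decode if possible, otherwise return unchanged" contract can be sketched as follows. This is a simplified illustration and not the library's logic: it handles only Python str, bytes, and sequences of non-negative integers (treated here as Unicode code points, which is an assumption), and to_str_or_same is a hypothetical name.

```python
def to_str_or_same(data):
    """Decode `data` to str when possible; otherwise return it unchanged."""
    if isinstance(data, str):
        return data
    if isinstance(data, (bytes, bytearray)):
        try:
            return bytes(data).decode('utf-8')  # bytes are assumed UTF-8
        except UnicodeDecodeError:
            return data
    is_uint_seq = isinstance(data, (list, tuple)) and all(
        isinstance(x, int) and not isinstance(x, bool) and x >= 0
        for x in data)
    if is_uint_seq:
        try:
            # Simplification: treat each integer as one code point.
            return ''.join(chr(x) for x in data)
        except (ValueError, OverflowError):
            return data
    return data


decoded = to_str_or_same(b'caf\xc3\xa9')   # valid UTF-8
untouched = to_str_or_same(b'\xff')        # invalid UTF-8, returned as is
```

Because failures return the input rather than raising, callers that need to know whether decoding happened have to check the type of the result.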

convert_to_numpy_str

hdf5storage.utilities.convert_to_numpy_str(data, length=None)[source]

Decodes data to Numpy unicode string (numpy.unicode_).

Decodes data to Numpy unicode string (UTF-32), which is numpy.unicode_, or an array of them. If it can’t be decoded, it is returned as is. Unsigned integers, Python string types (str, bytes), and numpy.bytes_ are supported. If it is an array of numpy.bytes_, an array of those all converted to numpy.unicode_ is returned. bytes and numpy.bytes_ are assumed to be encoded in UTF-8.

For an array of unsigned integers, it may be desirable to make an array with strings of some specified length as opposed to an array of the same size with each element being a one element string. This naturally arises when converting strings to unsigned integer types in the first place, so it needs to be reversible. The length parameter specifies how many to group together into a string (desired string length). For 1d arrays, this is along its only dimension. For higher dimensional arrays, it is done along each row (across columns). So, for a 3x5x10 input array of uints and a length of 5, the output array would be a 3x5x2 of 5 element strings.

Parameters
  • data (some type) – Data to decode into a Numpy unicode string.

  • length (int or None, optional) – The number of consecutive elements (in the case of unsigned integer data) to compose each string in the output array from. None indicates the full amount for a 1d array or the number of columns (full length of row) for a higher dimension array.

Returns

s – If data can be decoded into a numpy.unicode_ or a numpy.ndarray of them, the decoded version is returned. Otherwise, data is returned unchanged.

Return type

numpy.unicode_ or numpy.ndarray of numpy.unicode_ or data

See also

convert_to_str, convert_to_numpy_bytes, numpy.unicode_
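
The effect of the length parameter on unsigned integer input can be sketched on a single row. The helpers group_into_strings and grouped_shape are hypothetical and use plain lists and tuples in place of Numpy arrays.

```python
def group_into_strings(code_points, length=None):
    """Join consecutive code points into strings of `length` characters.

    With length=None the whole row becomes a single string.
    """
    if not code_points:
        return []
    if length is None:
        length = len(code_points)
    return [''.join(chr(c) for c in code_points[i:i + length])
            for i in range(0, len(code_points), length)]


def grouped_shape(shape, length):
    """Shape after grouping the last axis into strings of `length`."""
    return shape[:-1] + (shape[-1] // length,)


row = [ord(c) for c in 'helloworld']   # one row of 10 unsigned integers
pair = group_into_strings(row, 5)      # ['hello', 'world']
whole = group_into_strings(row)        # ['helloworld']
shape = grouped_shape((3, 5, 10), 5)   # (3, 5, 2)
```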

convert_to_numpy_bytes

hdf5storage.utilities.convert_to_numpy_bytes(data, length=None)[source]

Decodes data to Numpy UTF-8 encoded string (numpy.bytes_).

Decodes data to a Numpy UTF-8 encoded string, which is numpy.bytes_, or an array of them in which case it will be ASCII encoded instead. If it can’t be decoded, it is returned as is. Unsigned integers, Python string types (str, bytes), and numpy.unicode_ (UTF-32) are supported.

For an array of unsigned integers, it may be desirable to make an array with strings of some specified length as opposed to an array of the same size with each element being a one element string. This naturally arises when converting strings to unsigned integer types in the first place, so it needs to be reversible. The length parameter specifies how many to group together into a string (desired string length). For 1d arrays, this is along its only dimension. For higher dimensional arrays, it is done along each row (across columns). So, for a 3x5x10 input array of uints and a length of 5, the output array would be a 3x5x2 of 5 element strings.

Parameters
  • data (some type) – Data to decode into a Numpy UTF-8 encoded string or strings.

  • length (int or None, optional) – The number of consecutive elements (in the case of unsigned integer data) to compose each string in the output array from. None indicates the full amount for a 1d array or the number of columns (full length of row) for a higher dimension array.

Returns

b – If data can be decoded into a numpy.bytes_ or a numpy.ndarray of them, the decoded version is returned. Otherwise, data is returned unchanged.

Return type

numpy.bytes_ or numpy.ndarray of numpy.bytes_ or data

See also

convert_to_str, convert_to_numpy_str, numpy.bytes_

decode_complex

hdf5storage.utilities.decode_complex(data, complex_names=(None, None))[source]

Decodes possibly complex data read from an HDF5 file.

Decodes possibly complex datasets read from an HDF5 file. HDF5 doesn’t have a native complex type, so they are stored as H5T_COMPOUND types with fields such as ‘r’ and ‘i’ for the real and imaginary parts. As there is no standardization for field names, the field names have to be given explicitly, or the field names in data have to be analyzed to work out which ones hold the real and imaginary parts. A variety of reasonably expected combinations of field names are checked and used if available to decode. If decoding is not possible, data is returned as is.

Parameters
  • data (arraylike) – The data read from an HDF5 file, that might be complex, to decode into the proper Numpy complex type.

  • complex_names (tuple of 2 str and/or Nones, optional) – tuple of the names to use (in order) for the real and imaginary fields. A None indicates that various common field names should be tried.

Returns

c – If data can be decoded into a complex type, the decoded complex version is returned. Otherwise, data is returned unchanged.

Return type

decoded data or data

See also

encode_complex

Notes

Currently looks for real field names of ('r', 're', 'real') and imaginary field names of ('i', 'im', 'imag', 'imaginary') ignoring case.
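
The case-insensitive field name search described in the note can be sketched as below. This is an illustration only: a plain dict stands in for one compound HDF5 element, the requirement of exactly two fields is an assumption, and find_field and decode_record are hypothetical names.

```python
REAL_NAMES = ('r', 're', 'real')
IMAG_NAMES = ('i', 'im', 'imag', 'imaginary')


def find_field(fields, candidates):
    """Return the first field name matching a candidate, ignoring case."""
    for name in fields:
        if name.lower() in candidates:
            return name
    return None


def decode_record(record):
    """Turn a two-field record into a complex number if the names match.

    `record` is a plain dict standing in for one compound HDF5 element.
    Anything that does not look complex is returned unchanged.
    """
    if not isinstance(record, dict) or len(record) != 2:
        return record
    re_name = find_field(record, REAL_NAMES)
    im_name = find_field(record, IMAG_NAMES)
    if re_name is None or im_name is None:
        return record
    return complex(record[re_name], record[im_name])


z = decode_record({'Real': 1.0, 'Imag': -2.0})   # names matched ignoring case
same = decode_record({'x': 1, 'y': 2})           # not complex, unchanged
```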

encode_complex

hdf5storage.utilities.encode_complex(data, complex_names)[source]

Encodes complex data to have arbitrary complex field names.

Encodes complex data to have the real and imaginary field names given in complex_names. This is needed because the field names have to be set so that the data can be written to an HDF5 file with the right field names (HDF5 doesn’t have a native complex type, so an H5T_COMPOUND type has to be used).

Parameters
  • data (arraylike) – The data to encode as a complex type with the desired real and imaginary part field names.

  • complex_names (tuple of 2 str) – tuple of the names to use (in order) for the real and imaginary fields.

Returns

d – data encoded into having the specified field names for the real and imaginary parts.

Return type

encoded data

See also

decode_complex
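
The encoding direction amounts to splitting each complex value into two named fields. A minimal sketch, with a dict standing in for a compound element and encode_record as a hypothetical name:

```python
def encode_record(value, complex_names):
    """Split a complex number into a record with the given field names."""
    real_name, imag_name = complex_names
    z = complex(value)
    # Real part first, then imaginary, following the order of complex_names.
    return {real_name: z.real, imag_name: z.imag}


rec = encode_record(3 - 4j, ('real', 'imag'))
```

The field names are entirely up to the caller, which is why the same pair has to be supplied (or guessed) again when decoding.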

get_attribute

hdf5storage.utilities.get_attribute(target, name)[source]

Gets an attribute from a Dataset or Group.

Gets the value of an Attribute if it is present (None is returned if not).

Parameters
  • target (Dataset or Group) – Dataset or Group to get the attribute of.

  • name (str) – Name of the attribute to get.

Returns

The value of the attribute if it is present, or None if it isn’t.

Return type

value

convert_attribute_to_string

hdf5storage.utilities.convert_attribute_to_string(value)[source]

Convert an attribute value to a string.

Converts the attribute value to a string if possible (None is returned if it isn’t a string type).

New in version 0.2.

Parameters

value – The Attribute value.

Returns

s – The str value of the attribute if the conversion is possible, or None if not.

Return type

str or None

get_attribute_string

hdf5storage.utilities.get_attribute_string(target, name)[source]

Gets a string attribute from a Dataset or Group.

Gets the value of an Attribute that is a string if it is present (None is returned if it is not present or isn’t a string type).

Parameters
  • target (Dataset or Group) – Dataset or Group to get the string attribute of.

  • name (str) – Name of the attribute to get.

Returns

s – The str value of the attribute if it is present, or None if it isn’t or isn’t a type that can be converted to str.

Return type

str or None
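
The "value or None" behavior of the attribute getters can be sketched with a stand-in object. h5py exposes attributes through a mapping-like attrs member; here FakeNode wraps a plain dict, and FakeNode, get_attr and get_attr_string are hypothetical names used only for illustration.

```python
class FakeNode:
    """Stand-in for a Dataset or Group: only the `attrs` mapping matters."""

    def __init__(self, **attrs):
        self.attrs = dict(attrs)


def get_attr(target, name):
    """Return the attribute value, or None if it is absent."""
    return target.attrs.get(name)


def get_attr_string(target, name):
    """Return the attribute as str, or None if absent or not string-like."""
    value = get_attr(target, name)
    if isinstance(value, str):
        return value
    if isinstance(value, bytes):
        try:
            return value.decode('utf-8')  # assumption: bytes are UTF-8
        except UnicodeDecodeError:
            return None
    return None


node = FakeNode(kind=b'matrix', size=3)
kind = get_attr_string(node, 'kind')        # decoded to 'matrix'
not_text = get_attr_string(node, 'size')    # present but not a string
missing = get_attr_string(node, 'missing')  # absent
```

Note that None is returned both for a missing attribute and for one of the wrong type, so the two cases cannot be told apart from the return value alone.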

convert_attribute_to_string_array

hdf5storage.utilities.convert_attribute_to_string_array(value)[source]

Converts an Attribute value to a string array.

Converts the value of an Attribute to a string array if possible (None is returned if not).

New in version 0.2.

Parameters

value – The Attribute value.

Returns

array – The converted string array value if possible, or None if it isn’t.

Return type

list of str or None

get_attribute_string_array

hdf5storage.utilities.get_attribute_string_array(target, name)[source]

Gets a string array Attribute from a Dataset or Group.

Gets the value of an Attribute that is a string array if it is present (None is returned if not).

Parameters
  • target (Dataset or Group) – Dataset or Group to get the attribute of.

  • name (str) – Name of the string array Attribute to get.

Returns

array – The string array value of the Attribute if it is present, or None if it isn’t.

Return type

list of str or None

set_attribute

hdf5storage.utilities.set_attribute(target, name, value)[source]

Sets an attribute on a Dataset or Group.

If the attribute name doesn’t exist yet, it is created. If it already exists, it is overwritten if it differs from value.

Notes

set_attributes_all is the fastest way to set and delete Attributes in bulk.

Parameters
  • target (Dataset or Group) – Dataset or Group to set the attribute of.

  • name (str) – Name of the attribute to set.

  • value (numpy type other than numpy.unicode_) – Value to set the attribute to.

set_attribute_string

hdf5storage.utilities.set_attribute_string(target, name, value)[source]

Sets an attribute to a string on a Dataset or Group.

If the attribute name doesn’t exist yet, it is created. If it already exists, it is overwritten if it differs from value.

Notes

set_attributes_all is the fastest way to set and delete Attributes in bulk.

Parameters
  • target (Dataset or Group) – Dataset or Group to set the string attribute of.

  • name (str) – Name of the attribute to set.

  • value (string) – Value to set the attribute to. Can be any sort of string type that will convert to a numpy.bytes_.

set_attribute_string_array

hdf5storage.utilities.set_attribute_string_array(target, name, string_list)[source]

Sets an attribute to an array of strings on a Dataset or Group.

If the attribute name doesn’t exist yet, it is created. If it already exists, it is overwritten with the list of strings string_list (they will be vlen strings).

Notes

set_attributes_all is the fastest way to set and delete Attributes in bulk.

Parameters
  • target (Dataset or Group) – Dataset or Group to set the string array attribute of.

  • name (str) – Name of the attribute to set.

  • string_list (list of str) – List of strings to set the attribute to. Strings must be str.

set_attributes_all

hdf5storage.utilities.set_attributes_all(target, attributes, discard_others=True)[source]

Set Attributes in bulk and optionally discard others.

Sets each Attribute in turn (modifying it in place if possible when it is already present) and optionally discards all other Attributes not explicitly set. This function yields much greater performance than the equivalent individual calls to set_attribute, set_attribute_string, set_attribute_string_array and del_attribute put together.

New in version 0.2.

Parameters
  • target (Dataset or Group) – Dataset or Group to set the Attributes of.

  • attributes (dict) – The Attributes to set. The keys (str) are the names. The values are tuples of the Attribute kind and the value to set. Valid kinds are 'string_array', 'string', and 'value'. The values must correspond to what set_attribute_string_array, set_attribute_string and set_attribute would take respectively.

  • discard_others (bool, optional) – Whether to discard all other Attributes not explicitly set (default) or not.
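
The layout of the attributes dict and the effect of discard_others can be sketched on a plain dict. This is not the library's implementation: apply_attributes is a hypothetical name, the dict stands in for a target's attributes, and raising ValueError for an unknown kind is an assumption made for this example.

```python
def apply_attributes(attrs, attributes, discard_others=True):
    """Apply {name: (kind, value)} to the mapping `attrs` in place."""
    for name, (kind, value) in attributes.items():
        if kind == 'string_array':
            new = [str(s) for s in value]
        elif kind == 'string':
            new = str(value)
        elif kind == 'value':
            new = value
        else:
            # Assumption for this sketch: reject unknown kinds loudly.
            raise ValueError('unknown attribute kind: %r' % (kind,))
        # Skip the write when the stored value already matches.
        if name not in attrs or attrs[name] != new:
            attrs[name] = new
    if discard_others:
        # Drop every attribute that was not named in `attributes`.
        for name in set(attrs) - set(attributes):
            del attrs[name]


attrs = {'stale': 1, 'units': 'm'}
apply_attributes(attrs, {
    'units': ('string', 'm'),
    'axes': ('string_array', ['x', 'y']),
    'scale': ('value', 2.5),
})
```

After the call, 'stale' is gone because discard_others defaults to True; passing discard_others=False would keep it alongside the newly set attributes.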

del_attribute

hdf5storage.utilities.del_attribute(target, name)[source]

Deletes an attribute on a Dataset or Group.

If the attribute name exists, it is deleted.

Parameters
  • target (Dataset or Group) – Dataset or Group to delete the attribute of.

  • name (str) – Name of the attribute to delete.