fileslice
Utilities for getting array slices out of file-like objects
calc_slicedefs(sliceobj, in_shape, itemsize, ...) | Return parameters for slicing array with sliceobj given memory layout
canonical_slicers(sliceobj, shape[, check_inds]) | Return canonical version of sliceobj for an array of shape shape
fileslice(fileobj, sliceobj, shape, dtype[, ...]) | Slice array in fileobj using sliceobj slicer and array definitions
fill_slicer(slicer, in_len) | Return slice object with Nones filled out to match in_len
is_fancy(sliceobj) | Returns True if sliceobj is attempting fancy indexing
optimize_read_slicers(sliceobj, in_shape, ...) | Calculates slices to read from disk, and slices to apply after reading
optimize_slicer(slicer, dim_len, all_full, ...) | Return maybe modified slice and post-slice slicing for slicer
predict_shape(sliceobj, in_shape) | Predict shape of array from slicing an array of shape in_shape with sliceobj
read_segments(fileobj, segments, n_bytes) | Read n_bytes bytes of data implied by segments from fileobj
slice2len(slicer, in_len) | Output length after slicing original length in_len with slicer
slice2outax(ndim, sliceobj) | Matching output axes for an input array with ndim dimensions and slice sliceobj
slicers2segments(read_slicers, in_shape, ...) | Get segments from read_slicers given input in_shape and memory steps
strided_scalar(shape[, scalar]) | Return array of shape shape where all entries point to value scalar
threshold_heuristic(slicer, dim_len, stride) | Whether to force full axis read or contiguous read of stepped slice
calc_slicedefs
nibabel.fileslice.calc_slicedefs(sliceobj, in_shape, itemsize, offset, order, heuristic=<function threshold_heuristic>)
Return parameters for slicing array with sliceobj given memory layout
Calculate the best combination of skips / (read + discard) to use for reading the data from disk / memory, then generate corresponding segments, the disk offsets and read lengths to read the memory. If we have chosen some (read + discard) optimization, then we need to discard the surplus values from the read array using post_slicers, a slicing tuple that takes the array as read from a file-like object, and returns the array we want.
Parameters:
sliceobj : object
something that can be used to slice an array as in arr[sliceobj]
in_shape : sequence
shape of underlying array to be sliced
itemsize : int
element size in array (in bytes)
offset : int
offset of array data in underlying file or memory buffer
order : {‘C’, ‘F’}
memory layout of underlying array
heuristic : callable, optional
function taking slice object, dim_len, stride length as arguments, returning one of ‘full’, ‘contiguous’, None. See optimize_slicer() and threshold_heuristic()
Returns:
segments : list
list of 2 element lists where lists are (offset, length), giving absolute memory offset in bytes and number of bytes to read
read_shape : tuple
shape with which to interpret memory as read from segments. Interpreting the memory read from segments with this shape, and a dtype, gives an intermediate array - call this R
post_slicers : tuple
Any new slicing to be applied to the array R after reading via segments and reshaping via read_shape. Slices are in terms of read_shape. If empty, no new slicing to apply
canonical_slicers
nibabel.fileslice.canonical_slicers(sliceobj, shape, check_inds=True)
Return canonical version of sliceobj for an array of shape shape
sliceobj is a slicer for an array A implied by shape.
- Expand sliceobj with slice(None) to add any missing (implied) axes in sliceobj
- Find any slicers in sliceobj that do a full axis slice and replace by slice(None)
- Replace any floating point values for slicing with integers
- Replace negative integer slice values with equivalent positive integers.
Does not handle fancy indexing (indexing with arrays or array-like indices)
Parameters:
sliceobj : object
something that can be used to slice an array as in arr[sliceobj]
shape : sequence
shape of array that will be indexed by sliceobj
check_inds : {True, False}, optional
Whether to check if integer indices are out of bounds
Returns:
can_slicers : tuple
version of sliceobj for which Ellipses have been expanded, missing (implied) dimensions have been appended, slice objects equivalent to slice(None) have been replaced by slice(None), integer axes have been checked, and negative indices set to positive equivalent
fileslice
nibabel.fileslice.fileslice(fileobj, sliceobj, shape, dtype, offset=0, order='C', heuristic=<function threshold_heuristic>)
Slice array in fileobj using sliceobj slicer and array definitions
fileobj contains the contiguous binary data for an array A of shape, dtype, memory layout shape, dtype, order, with the binary data starting at file offset offset.
Our job is to return the sliced array A[sliceobj] in the most efficient way in terms of memory and time.
Sometimes it will be quicker to read memory that we will later throw away, to save time we might lose doing short seeks on fileobj. Call these alternatives: (read + discard); and skip. This routine guesses when to (read + discard) or skip using the callable heuristic, with a default using a hard threshold for the memory gap large enough to prefer a skip.
Parameters:
fileobj : file-like object
binary file-like object. Implements read and seek
sliceobj : object
something that can be used to slice an array as in arr[sliceobj]
shape : sequence
shape of full array inside fileobj
dtype : dtype object
dtype of array inside fileobj
offset : int, optional
offset of array data within fileobj
order : {‘C’, ‘F’}, optional
memory layout of array in fileobj
heuristic : callable, optional
function taking slice object, axis length, stride length as arguments, returning one of ‘full’, ‘contiguous’, None. See optimize_slicer(), and see threshold_heuristic() for an example.
Returns:
sliced_arr : array
Array in fileobj as sliced with sliceobj
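The two alternatives named above, skip and (read + discard), can be illustrated without nibabel. The following is a minimal sketch using only the standard library. The helper names read_by_skipping and read_and_discard are hypothetical and are not part of nibabel.fileslice, and the sketch assumes a 1D array of little-endian int16 values sliced with a positive step.

```python
import io
import struct

ITEMSIZE = 2  # bytes per little-endian int16 element (assumption for this sketch)


def read_by_skipping(fileobj, offset, start, stop, step):
    """SKIP strategy: one seek and one short read per wanted element."""
    out = []
    for i in range(start, stop, step):
        fileobj.seek(offset + i * ITEMSIZE)
        out.append(struct.unpack('<h', fileobj.read(ITEMSIZE))[0])
    return out


def read_and_discard(fileobj, offset, start, stop, step):
    """READ + DISCARD strategy: one contiguous read, then slice in memory."""
    n_items = stop - start
    fileobj.seek(offset + start * ITEMSIZE)
    raw = fileobj.read(n_items * ITEMSIZE)
    block = struct.unpack('<%dh' % n_items, raw)
    return list(block[::step])  # the post-read slice throws away the surplus


# A 4-byte header followed by the values 0..9 stored as int16
data = b'HEAD' + struct.pack('<10h', *range(10))
fobj = io.BytesIO(data)
skipped = read_by_skipping(fobj, 4, 1, 9, 3)
discarded = read_and_discard(fobj, 4, 1, 9, 3)
```

Both calls return the elements at indices 1, 4 and 7. The first issues three seeks and three 2-byte reads; the second issues one seek and one 16-byte read, at the cost of holding five unwanted values in memory until the final slice.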
fill_slicer
nibabel.fileslice.fill_slicer(slicer, in_len)
Return slice object with Nones filled out to match in_len
Also fixes too large stop / start values according to slice() slicing rules.
The returned slicer can have a None as slicer.stop if slicer.step is negative and the input slicer.stop is None. This is because we can’t represent the stop as an integer, because -1 has a different meaning.
Parameters:
slicer : slice object
in_len : int
length of axis on which slicer will be applied
Returns:
can_slicer : slice object
slice with start, stop, step set to explicit values, with the exception of stop for negative step, which is None for the case of slicing down through the first element
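The reason stop has to stay None for a negative step can be seen in a small sketch built on the built-in slice.indices(). This is an illustration of the behaviour described above, not nibabel's implementation; the name fill_slicer_sketch is hypothetical.

```python
def fill_slicer_sketch(slicer, in_len):
    """Return an equivalent slice with explicit start, stop and step."""
    start, stop, step = slicer.indices(in_len)
    if step < 0:
        if start < 0:
            # Start was clipped to "before the first element": nothing selected
            return slice(0, 0, step)
        if stop < 0:
            # Slicing down through element 0; an explicit -1 here would be
            # read as "last element", so the stop has to remain None
            stop = None
    return slice(start, stop, step)


filled_fwd = fill_slicer_sketch(slice(None), 5)            # slice(0, 5, 1)
filled_rev = fill_slicer_sketch(slice(None, None, -1), 5)  # slice(4, None, -1)
filled_big = fill_slicer_sketch(slice(2, 100), 5)          # slice(2, 5, 1)
```

The too-large stop in the last call is clipped to the axis length, matching ordinary slice() rules.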
is_fancy
nibabel.fileslice.is_fancy(sliceobj)
Returns True if sliceobj is attempting fancy indexing
Parameters:
sliceobj : object
something that can be used to slice an array as in arr[sliceobj]
Returns:
tf : bool
True if sliceobj represents fancy indexing, False for basic indexing
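As a rough illustration of the distinction between basic and fancy indexing, the sketch below treats slices, None, Ellipsis and scalar indices as basic and anything else as fancy. It is a simplified stand-in written for this page, not nibabel's code; the name is_fancy_sketch and the exact rules are assumptions.

```python
def is_fancy_sketch(sliceobj):
    """Return True if any element of sliceobj looks like fancy indexing."""
    if not isinstance(sliceobj, tuple):
        sliceobj = (sliceobj,)
    for slicer in sliceobj:
        # Array-like objects with one or more dimensions count as fancy
        if getattr(slicer, 'ndim', 0) > 0:
            return True
        if isinstance(slicer, slice) or slicer is None or slicer is Ellipsis:
            continue
        try:
            int(slicer)  # scalar indices are basic indexing
        except (TypeError, ValueError):
            return True  # lists, tuples and similar are fancy
    return False


basic = is_fancy_sketch((slice(None), 1))
fancy = is_fancy_sketch(([0, 2], slice(None)))
```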
optimize_read_slicers
nibabel.fileslice.optimize_read_slicers(sliceobj, in_shape, itemsize, heuristic)
Calculates slices to read from disk, and slices to apply after reading
Parameters:
sliceobj : object
something that can be used to slice an array as in arr[sliceobj]. Can be assumed to be canonical in the sense of canonical_slicers
in_shape : sequence
shape of underlying array to be sliced. Array for in_shape assumed to be already in ‘F’ order. Reorder shape / sliceobj for slicing a ‘C’ array before passing to this function.
itemsize : int
element size in array (bytes)
heuristic : callable
function taking slice object, axis length, and stride length as arguments, returning one of ‘full’, ‘contiguous’, None. See optimize_slicer(); see threshold_heuristic() for an example.
Returns:
read_slicers : tuple
sliceobj maybe rephrased to fill out dimensions that are better read from disk and later trimmed to their original size with post_slicers. read_slicers implies a block of memory to be read from disk. The actual disk positions come from slicers2segments run over read_slicers. Includes any newaxis dimensions in sliceobj
post_slicers : tuple
Any new slicing to be applied to the read array after reading. The post_slicers discard any memory that we read to save time, but that we don’t need for the slice. Includes any newaxis dimension added by sliceobj
optimize_slicer
nibabel.fileslice.optimize_slicer(slicer, dim_len, all_full, is_slowest, stride, heuristic=<function threshold_heuristic>)
Return maybe modified slice and post-slice slicing for slicer
Parameters:
slicer : slice object or int
dim_len : int
length of axis along which to slice
all_full : bool
Whether dimensions up until now have been full (all elements)
is_slowest : bool
Whether this dimension is the slowest changing in memory / on disk
stride : int
size of one step along this axis
heuristic : callable, optional
function taking slice object, dim_len, stride length as arguments, returning one of ‘full’, ‘contiguous’, None. See threshold_heuristic() for an example.
Returns:
to_read : slice object or int
maybe modified slice based on slicer expressing what data should be read from an underlying file or buffer. to_read must always have positive step (because we don’t want to go backwards in the buffer / file)
post_slice : slice object
slice to be applied after array has been read. Applies any transformations in slicer that have not been applied in to_read. If axis will be dropped by to_read slicing, so no slicing would make sense, return string ‘dropped’
Notes
This is the heart of the algorithm for making segments from slice objects.
A contiguous slice is a slice with slice.step in (1, -1)
A full slice is a contiguous slice returning all elements.
The main question we have to ask is whether we should transform to_read, post_slice to prefer a full read and partial slice. We only do this in the case of all_full==True. In this case we might benefit from reading a contiguous chunk of data even if the slice is not contiguous, or reading all the data even if the slice is not full. Apply the heuristic callable to decide whether to do this, and adapt to_read and post_slice accordingly.
Otherwise (apart from constraint to be positive) return to_read unaltered and post_slice as slice(None)
predict_shape
nibabel.fileslice.predict_shape(sliceobj, in_shape)
Predict shape of array from slicing an array of shape in_shape with sliceobj
Parameters:
sliceobj : object
something that can be used to slice an array as in arr[sliceobj]
in_shape : sequence
shape of array that could be sliced by sliceobj
Returns:
out_shape : tuple
predicted shape arising from slicing an array of shape in_shape with sliceobj
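The shape prediction can be sketched with plain Python for the basic-indexing case. The function below is an illustration only, under the assumption that sliceobj contains just integers, slices and None (no Ellipsis, no fancy indexing); predict_shape_sketch is a hypothetical name, not the nibabel function.

```python
def predict_shape_sketch(sliceobj, in_shape):
    """Predict the output shape for basic indexing without Ellipsis."""
    if not isinstance(sliceobj, tuple):
        sliceobj = (sliceobj,)
    out_shape = []
    axis = 0  # next input axis to be consumed
    for slicer in sliceobj:
        if slicer is None:
            out_shape.append(1)  # newaxis adds a length-1 output axis
            continue
        dim_len = in_shape[axis]
        axis += 1
        if isinstance(slicer, slice):
            out_shape.append(len(range(*slicer.indices(dim_len))))
        # an integer index drops the axis, so nothing is appended
    # input axes not mentioned in sliceobj are kept whole
    out_shape.extend(in_shape[axis:])
    return tuple(out_shape)


shape_a = predict_shape_sketch((slice(None), 1), (2, 3, 4))
shape_b = predict_shape_sketch(
    (slice(0, 3, 2), None, slice(None, None, -1)), (3, 5))
```

Here shape_a is (2, 4), because the integer index removes the middle axis, and shape_b is (2, 1, 5).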
read_segments
nibabel.fileslice.read_segments(fileobj, segments, n_bytes)
Read n_bytes bytes of data implied by segments from fileobj
Parameters:
fileobj : file-like object
Implements seek and read
segments : sequence
sequence of 2-element sequences, each being (offset, length), giving absolute file offset in bytes and number of bytes to read
n_bytes : int
total number of bytes that will be read
Returns:
buffer : buffer object
object implementing buffer protocol, such as byte string or ndarray or mmap or ctypes c_char_array
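A minimal sketch of reading (offset, length) segments from a file-like object follows. It uses io.BytesIO as a stand-in for a real file, and read_segments_sketch is a hypothetical name; the length check against n_bytes is an assumption about reasonable behaviour, not a statement of what nibabel does.

```python
import io


def read_segments_sketch(fileobj, segments, n_bytes):
    """Seek to each offset, read each length, and join the pieces."""
    chunks = []
    for offset, length in segments:
        fileobj.seek(offset)
        chunks.append(fileobj.read(length))
    data = b''.join(chunks)
    if len(data) != n_bytes:
        raise ValueError('segments do not add up to n_bytes')
    return data


fobj = io.BytesIO(b'0123456789abcdef')
buf = read_segments_sketch(fobj, [(2, 3), (10, 2)], 5)
```

With these segments, buf holds the three bytes starting at offset 2 followed by the two bytes starting at offset 10.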
slice2len
nibabel.fileslice.slice2len(slicer, in_len)
Output length after slicing original length in_len with slicer
Parameters:
slicer : slice object
in_len : int
Returns:
out_len : int
Length after slicing
Notes
Returns same as len(np.arange(in_len)[slicer])
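The same length can be computed without allocating an array, using the built-in slice.indices() and range. This is a sketch of one way to get the equivalent value, with the hypothetical name slice2len_sketch; it is not presented as how nibabel computes it.

```python
def slice2len_sketch(slicer, in_len):
    """Number of elements selected by slicer from a sequence of length in_len."""
    return len(range(*slicer.indices(in_len)))


n_fwd = slice2len_sketch(slice(1, None, 2), 10)      # elements 1, 3, 5, 7, 9
n_rev = slice2len_sketch(slice(None, None, -3), 10)  # elements 9, 6, 3, 0
n_empty = slice2len_sketch(slice(5, 2), 10)          # start past stop: empty
```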
slice2outax
nibabel.fileslice.slice2outax(ndim, sliceobj)
Matching output axes for an input array with ndim dimensions and slice sliceobj
Parameters:
ndim : int
number of axes in input array
sliceobj : object
something that can be used to slice an array as in arr[sliceobj]
Returns:
out_ax_inds : tuple
Say A is a (pretend) input array of ndim dimensions. Say B = A[sliceobj]. out_ax_inds has one value per axis in A giving corresponding axis in B.
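The axis bookkeeping can be sketched as below. The sketch assumes basic indexing without Ellipsis, and it assumes that an input axis removed by an integer index is reported as None; that convention is not stated in the text above, and slice2outax_sketch is a hypothetical name.

```python
def slice2outax_sketch(ndim, sliceobj):
    """Map each input axis to its output axis (None if the axis is dropped)."""
    if not isinstance(sliceobj, tuple):
        sliceobj = (sliceobj,)
    # Pad with full slices for input axes not mentioned in sliceobj
    n_indexed = sum(1 for s in sliceobj if s is not None)
    sliceobj = sliceobj + (slice(None),) * (ndim - n_indexed)
    out_ax_inds = []
    out_ax = 0  # next output axis number
    for slicer in sliceobj:
        if slicer is None:
            out_ax += 1  # newaxis: an output axis with no input axis
        elif isinstance(slicer, slice):
            out_ax_inds.append(out_ax)
            out_ax += 1
        else:
            out_ax_inds.append(None)  # integer index drops this input axis
    return tuple(out_ax_inds)


ax_a = slice2outax_sketch(3, (slice(None), 0))
ax_b = slice2outax_sketch(2, (None, slice(None), 1))
```

In the first call the result is (0, None, 1): input axis 1 disappears and input axis 2 becomes output axis 1. In the second call the leading newaxis shifts input axis 0 to output axis 1, giving (1, None).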
slicers2segments
nibabel.fileslice.slicers2segments(read_slicers, in_shape, offset, itemsize)
Get segments from read_slicers given input in_shape and memory steps
Parameters:
read_slicers : object
something that can be used to slice an array as in arr[sliceobj]. Slice objects can be assumed canonical as in canonical_slicers, and positive as in _positive_slice
in_shape : sequence
shape of underlying array on disk before reading
offset : int
offset of array data in underlying file or memory buffer
itemsize : int
element size in array (in bytes)
Returns:
segments : list
list of 2 element lists where lists are [offset, length], giving absolute memory offset in bytes and number of bytes to read
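To make the idea of segments concrete, the sketch below enumerates the byte position of every selected element and merges neighbours that touch. It is a deliberately simple illustration, not nibabel's algorithm: it assumes ‘F’ (first axis fastest) layout, slice objects only (no integers or newaxis), and positive steps. The name segments_sketch is hypothetical.

```python
from itertools import product


def segments_sketch(read_slicers, in_shape, offset, itemsize):
    """Return [offset, length] segments for slices of F-ordered data."""
    # Byte strides for Fortran order: axis 0 is contiguous
    strides = []
    stride = itemsize
    for dim_len in in_shape:
        strides.append(stride)
        stride *= dim_len
    ranges = [range(*s.indices(n)) for s, n in zip(read_slicers, in_shape)]
    segments = []
    # product() varies its last argument fastest, so pass the axes reversed
    # to make axis 0 the fastest-changing index
    for idx in product(*reversed(ranges)):
        pos = offset + sum(i * st for i, st in zip(reversed(idx), strides))
        if segments and segments[-1][0] + segments[-1][1] == pos:
            segments[-1][1] += itemsize  # extend the previous segment
        else:
            segments.append([pos, itemsize])
    return segments


# A (4, 3) array of 2-byte items starting 10 bytes into the file
cols = segments_sketch((slice(None), slice(0, 3, 2)), (4, 3), 10, 2)
rows = segments_sketch((slice(1, 3), slice(None)), (4, 3), 10, 2)
full = segments_sketch((slice(None), slice(None)), (4, 3), 10, 2)
```

Selecting whole columns 0 and 2 gives two 8-byte segments, selecting rows 1 and 2 gives one short segment per column, and selecting everything collapses to a single 24-byte segment.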
strided_scalar
nibabel.fileslice.strided_scalar(shape, scalar=0.0)
Return array of shape shape where all entries point to value scalar
Parameters:
shape : sequence
Shape of output array.
scalar : scalar
Scalar value with which to fill array.
Returns:
strided_arr : array
Array of shape shape for which all values == scalar, built by setting all strides of strided_arr to 0, so the scalar is broadcast out to the full array shape.
threshold_heuristic
nibabel.fileslice.threshold_heuristic(slicer, dim_len, stride, skip_thresh=256)
Whether to force full axis read or contiguous read of stepped slice
Allows fileslice() to sometimes read memory that it will throw away in order to get maximum speed. In other words, trade memory for fewer disk reads.
Parameters:
slicer : slice object, or int
If slice, can be assumed to be full as in fill_slicer
dim_len : int
length of axis being sliced
stride : int
memory distance between elements on this axis
skip_thresh : int, optional
Memory gap threshold in bytes above which to prefer skipping memory rather than reading it and later discarding.
Returns:
action : {‘full’, ‘contiguous’, None}
Gives the suggested optimization for reading the data
- ‘full’ - read whole axis
- ‘contiguous’ - read all elements between start and stop
- None - read only memory needed for output
Notes
Let’s say we are in the middle of reading a file at the start of some memory length \(B\) bytes. We don’t need the memory, and we are considering whether to read it anyway (then throw it away) (READ) or stop reading, skip \(B\) bytes and restart reading from there (SKIP).
After trying some more fancy algorithms, a hard threshold (skip_thresh) for the maximum skip distance seemed to work well, as measured by times on nibabel.benchmarks.bench_fileslice
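A hard-threshold rule of this kind might look like the sketch below. The exact comparisons are assumptions made for illustration, since the text above only states that a threshold on the skip distance is used; only the three return values and the skip_thresh default of 256 come from the documentation. The sketch handles slices with explicit start and stop and a positive step, and threshold_sketch is a hypothetical name.

```python
def threshold_sketch(slicer, dim_len, stride, skip_thresh=256):
    """Suggest 'full', 'contiguous' or None for a filled, positive slice."""
    # Bytes between consecutive wanted elements along this axis
    step_gap = slicer.step * stride
    if step_gap > skip_thresh:
        return None  # gaps are large: prefer SKIP, read only what is needed
    # Gaps are small enough to read through; is the rest of the axis too?
    read_len = slicer.stop - slicer.start
    outside_gap = (dim_len - read_len) * stride
    return 'full' if outside_gap <= skip_thresh else 'contiguous'


small_step = threshold_sketch(slice(0, 100, 2), 100, 4)     # 'full'
large_step = threshold_sketch(slice(0, 100, 2), 100, 1024)  # None
short_span = threshold_sketch(slice(0, 10, 2), 1000, 4)     # 'contiguous'
```

In the first call the step gap is 8 bytes and the slice already spans the axis, so the whole axis is read. In the second each step skips 2048 bytes, so skipping wins. In the third the stepped elements are close together but most of the axis lies outside the slice, so only the span from start to stop is read.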