HDF5 1.13.0
Dataset Creation Properties

Functions

herr_t H5Pset_layout (hid_t plist_id, H5D_layout_t layout)
 Sets the type of storage used to store the raw data for a dataset.

H5D_layout_t H5Pget_layout (hid_t plist_id)
 Returns the layout of the raw data for a dataset.

herr_t H5Pset_chunk (hid_t plist_id, int ndims, const hsize_t dim[])
 Sets the size of the chunks used to store a chunked layout dataset.

int H5Pget_chunk (hid_t plist_id, int max_ndims, hsize_t dim[])
 Retrieves the size of chunks for the raw data of a chunked layout dataset.

herr_t H5Pset_szip (hid_t plist_id, unsigned options_mask, unsigned pixels_per_block)
 Sets up use of the SZIP compression filter.
 

Detailed Description

Function Documentation

◆ H5Pget_chunk()

int H5Pget_chunk(hid_t plist_id, int max_ndims, hsize_t dim[])

Retrieves the size of chunks for the raw data of a chunked layout dataset.

Parameters
[in]  plist_id   Dataset creation property list identifier
[in]  max_ndims  Size of the dim array
[out] dim        Array to store the chunk dimensions
Returns
Returns chunk dimensionality if successful; otherwise returns a negative value.

H5Pget_chunk() retrieves the size of chunks for the raw data of a chunked layout dataset. This function is only valid for dataset creation property lists. At most max_ndims elements of dim will be initialized.
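To make the return-value contract concrete, here is a small model of it in plain C. It is not HDF5's implementation: model_get_chunk() is a hypothetical function, and uint64_t stands in for hsize_t. What it illustrates is that the returned value is the full chunk rank, which can be larger than max_ndims, while no more than max_ndims entries of dim are written.

```c
#include <stdint.h>

/* Hypothetical model of the documented H5Pget_chunk() contract, not HDF5 code.
 * uint64_t stands in for hsize_t. stored[] holds the `rank` chunk dimensions
 * recorded in a dataset creation property list. */
int model_get_chunk(const uint64_t stored[], int rank, int max_ndims,
                    uint64_t dim[])
{
    /* At most max_ndims elements of dim[] are initialized. */
    int n = rank < max_ndims ? rank : max_ndims;
    for (int i = 0; i < n; i++)
        dim[i] = stored[i];

    /* The return value is the chunk rank, even if it exceeds max_ndims. */
    return rank;
}
```

A caller would typically compare the returned rank with max_ndims before assuming every chunk dimension was copied.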

Since
1.0.0

◆ H5Pget_layout()

H5D_layout_t H5Pget_layout(hid_t plist_id)

Returns the layout of the raw data for a dataset.

Parameters
[in]  plist_id  Dataset creation property list identifier
Returns
Returns the layout type (a non-negative value) of a dataset creation property list if successful. Valid return values are:
  • H5D_COMPACT: Raw data is stored in the object header in the file.
  • H5D_CONTIGUOUS: Raw data is stored separately from the object header in one contiguous chunk in the file.
  • H5D_CHUNKED: Raw data is stored separately from the object header in chunks in separate locations in the file.
  • H5D_VIRTUAL: Raw data is drawn from multiple datasets in different files.
Otherwise, returns a negative value indicating failure.

H5Pget_layout() returns the layout of the raw data for a dataset. This function is only valid for dataset creation property lists.

Note that a compact storage layout may affect writing data to the dataset with parallel applications. See the H5Dwrite() documentation for details.

Version
1.10.0 H5D_VIRTUAL and H5D_VIRTUAL_F added in this release.
Since
1.0.0

◆ H5Pset_chunk()

herr_t H5Pset_chunk(hid_t plist_id, int ndims, const hsize_t dim[])

Sets the size of the chunks used to store a chunked layout dataset.

Parameters
[in]  plist_id  Dataset creation property list identifier
[in]  ndims     The number of dimensions of each chunk
[in]  dim       An array defining the size, in dataset elements, of each chunk
Returns
Returns a non-negative value if successful; otherwise returns a negative value.

H5Pset_chunk() sets the size of the chunks used to store a chunked layout dataset. This function is only valid for dataset creation property lists.

The ndims parameter must currently equal the rank of the dataset.

The values of the dim array define the size of the chunks to store the dataset's raw data. The unit of measure for dim values is dataset elements.
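Because dim values count elements rather than bytes, the byte size of a chunk is the product of its dimensions times the element size. The hypothetical helper below does that arithmetic, with uint64_t standing in for hsize_t; for example, a 100x200 chunk of 8-byte doubles holds 20,000 elements, or 160,000 bytes.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: dim[] counts dataset elements, so the chunk's byte
 * size is the product of its dimensions times the element size in bytes.
 * uint64_t stands in for hsize_t; overflow is not checked in this sketch. */
uint64_t chunk_bytes(int ndims, const uint64_t dim[], size_t elem_size)
{
    uint64_t nelem = 1;
    for (int i = 0; i < ndims; i++)
        nelem *= dim[i];
    return nelem * (uint64_t)elem_size;
}
```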

As a side effect, this function sets the layout stored in the property list to H5D_CHUNKED if it is not already set to that value.

Note
Chunk size cannot exceed the size of a fixed-size dataset. For example, a dataset consisting of a 5x4 fixed-size array cannot be defined with 10x10 chunks. Chunk maximums:
  • The maximum number of elements in a chunk is 2^32-1, which is equal to 4,294,967,295. If H5Pset_chunk() is given chunk dimensions whose product exceeds 2^32-1, then H5Pset_chunk() will fail.
  • The maximum size for any chunk is 4GB. If H5Dwrite() attempts to write a chunk larger than 4GB, then H5Dwrite() will fail.
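The constraints above can be checked before the property list is ever used. The sketch below is a hypothetical pre-flight helper, not part of the HDF5 API. It assumes the 4GB chunk limit means 2^32 bytes, uses uint64_t in place of hsize_t, and also rejects zero-length chunk dimensions, which HDF5 does not accept. HDF5 still performs its own checks, so a helper like this only reports mistakes earlier.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* 4GB chunk limit, interpreted here as 2^32 bytes (an assumption). */
#define CHUNK_MAX_BYTES ((uint64_t)1 << 32)

/* Hypothetical pre-flight check, not an HDF5 function. uint64_t stands in
 * for hsize_t; fixed_size is true when the dataset cannot grow. */
bool chunk_within_limits(int ndims, const uint64_t chunk[], int rank,
                         const uint64_t dset_dims[], bool fixed_size,
                         size_t elem_size)
{
    if (ndims != rank)                 /* ndims must equal the dataset rank */
        return false;

    uint64_t nelem = 1;
    for (int i = 0; i < ndims; i++) {
        if (chunk[i] == 0)             /* chunk dimensions must be positive */
            return false;
        if (fixed_size && chunk[i] > dset_dims[i]) /* e.g. 10x10 on 5x4 */
            return false;
        if (chunk[i] > UINT32_MAX / nelem)         /* over 2^32-1 elements */
            return false;
        nelem *= chunk[i];
    }

    /* Byte-size limit, compared by division to avoid overflow. */
    return (uint64_t)elem_size <= CHUNK_MAX_BYTES / nelem;
}
```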
See also
H5Pset_layout(), H5Dwrite()
Since
1.0.0

◆ H5Pset_layout()

herr_t H5Pset_layout(hid_t plist_id, H5D_layout_t layout)

Sets the type of storage used to store the raw data for a dataset.

Parameters
[in]  plist_id  Dataset creation property list identifier
[in]  layout    Type of storage layout for raw data
Returns
Returns a non-negative value if successful; otherwise returns a negative value.

H5Pset_layout() sets the type of storage used to store the raw data for a dataset. This function is only valid for dataset creation property lists.

Valid values for layout are:

  • H5D_COMPACT: Store raw data in the dataset object header in the file. Use this layout only for datasets with small amounts of raw data. The raw data size limit is 64K (65520 bytes). Attempting to create a dataset with raw data larger than this limit will cause the H5Dcreate() call to fail.
  • H5D_CONTIGUOUS: Store raw data separately from the object header in one large chunk in the file.
  • H5D_CHUNKED: Store raw data separately from the object header as chunks of data in separate locations in the file.
  • H5D_VIRTUAL: Draw raw data from multiple datasets in different files.
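Whether H5D_COMPACT is an option depends only on the total raw data size against the 65520-byte limit given above. A minimal sketch of that check follows; fits_compact() is a hypothetical helper, and the element count and element size are assumed to come from the dataspace and datatype the caller already has.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Raw data limit stated for H5D_COMPACT: 64K, i.e. 65520 bytes. */
#define COMPACT_MAX_BYTES 65520u

/* Hypothetical helper, not an HDF5 function: true when npoints elements of
 * elem_size bytes each fit the compact limit, so H5D_COMPACT is an option. */
bool fits_compact(uint64_t npoints, size_t elem_size)
{
    if (elem_size == 0)
        return true;
    /* Compare by division so npoints * elem_size cannot overflow. */
    return npoints <= COMPACT_MAX_BYTES / elem_size;
}
```

For example, 8190 doubles (65520 bytes) fit exactly, while 8191 doubles do not.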

Note that a compact storage layout may affect writing data to the dataset with parallel applications. See the note in H5Dwrite() documentation for details.

Version
1.10.0 H5D_VIRTUAL added in this release.
Since
1.0.0

◆ H5Pset_szip()

herr_t H5Pset_szip(hid_t plist_id, unsigned options_mask, unsigned pixels_per_block)

Sets up use of the SZIP compression filter.

Parameters
[in]  plist_id          Dataset creation property list identifier
[in]  options_mask      A bit mask conveying the desired SZIP options. Valid values are H5_SZIP_EC_OPTION_MASK and H5_SZIP_NN_OPTION_MASK.
[in]  pixels_per_block  The number of pixels or data elements in each data block
Returns
Returns a non-negative value if successful; otherwise returns a negative value.

H5Pset_szip() sets an SZIP compression filter, H5Z_FILTER_SZIP, for a dataset. SZIP is a compression method designed for use with scientific data.

Before proceeding, all users should review the “Limitations” section below.

Users familiar with SZIP outside the HDF5 context may benefit from reviewing the Note “For Users Familiar with SZIP in Other Contexts” below.

In the text below, the term pixel refers to an HDF5 data element. This terminology derives from SZIP compression's use with image data, where pixel referred to an image pixel.

The SZIP bits_per_pixel value (see Note, below) is automatically set based on the HDF5 datatype. SZIP can be used with atomic datatypes that have a size of 8, 16, 32, or 64 bits. Specifically, a dataset with a datatype that is 8-, 16-, 32-, or 64-bit signed or unsigned integer; char; or 32- or 64-bit float can be compressed with SZIP. See Note, below, for further discussion of the SZIP bits_per_pixel setting.

SZIP options are passed in an options mask, options_mask, as follows.

Option                    Description (mutually exclusive; select one)
H5_SZIP_EC_OPTION_MASK    Selects entropy coding method
H5_SZIP_NN_OPTION_MASK    Selects nearest neighbor coding method
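Since the two options are mutually exclusive, an application assembling options_mask can verify that exactly one coding method is set. In the sketch below, EC_MASK and NN_MASK are stand-in bit values chosen for illustration; real code should use H5_SZIP_EC_OPTION_MASK and H5_SZIP_NN_OPTION_MASK from hdf5.h, and one_coding_method() is a hypothetical helper.

```c
#include <stdbool.h>

/* Stand-in bit values for this sketch only; real code should use
 * H5_SZIP_EC_OPTION_MASK and H5_SZIP_NN_OPTION_MASK from hdf5.h. */
#define EC_MASK (1u << 2)
#define NN_MASK (1u << 5)

/* Hypothetical check: exactly one of the two coding methods is selected. */
bool one_coding_method(unsigned options_mask)
{
    bool ec = (options_mask & EC_MASK) != 0;
    bool nn = (options_mask & NN_MASK) != 0;
    return ec != nn;
}
```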

The following guidelines can be used in determining which option to select:

  • The entropy coding method, the EC option specified by H5_SZIP_EC_OPTION_MASK, is best suited for data that has been processed. The EC method works best for small numbers.
  • The nearest neighbor coding method, the NN option specified by H5_SZIP_NN_OPTION_MASK, preprocesses the data and then applies the EC method as above.

Other factors may affect results, but the above criteria provide a good starting point for optimizing data compression.

SZIP compresses data block by block, with a user-tunable block size. This block size is passed in the parameter pixels_per_block and must be even and not greater than 32, with typical values being 8, 10, 16, or 32. This parameter affects the compression ratio; the more pixel values vary, the smaller this number should be to achieve better performance.
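The pixels_per_block rule can be checked before calling H5Pset_szip(). The helper below is hypothetical and encodes only the stated rule (even, at most 32); treating 0 as invalid is an extra assumption of this sketch.

```c
#include <stdbool.h>

/* Hypothetical check of the documented pixels_per_block rule: even and not
 * greater than 32. Also rejecting 0 is an assumption of this sketch. */
bool valid_pixels_per_block(unsigned ppb)
{
    return ppb > 0 && ppb % 2 == 0 && ppb <= 32;
}
```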

In HDF5, compression can be applied only to chunked datasets. If pixels_per_block is bigger than the total number of elements in a dataset chunk, H5Pset_szip() will succeed but the subsequent call to H5Dcreate() will fail; the conflict can be detected only when the property list is used.

To achieve optimal performance for SZIP compression, it is recommended that a chunk's fastest-changing dimension be equal to N times pixels_per_block where N is the maximum number of blocks per scan line allowed by the SZIP library. In the current version of SZIP, N is set to 128.
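The two chunk-shape considerations above reduce to simple arithmetic: the recommended fastest-changing dimension is 128 times pixels_per_block (4096 for pixels_per_block = 32), and dataset creation fails if pixels_per_block exceeds the chunk's element count. Both helpers below are hypothetical, and the constant 128 is taken from the text as the value for the current SZIP version.

```c
#include <stdbool.h>
#include <stdint.h>

/* N, the maximum number of blocks per scan line (128 in current SZIP). */
#define SZIP_BLOCKS_PER_SCANLINE 128u

/* Hypothetical helper: recommended length of a chunk's fastest-changing
 * dimension, N times pixels_per_block. */
uint64_t recommended_fastest_dim(unsigned ppb)
{
    return (uint64_t)SZIP_BLOCKS_PER_SCANLINE * ppb;
}

/* Hypothetical helper: H5Dcreate() fails when pixels_per_block exceeds the
 * number of elements in a chunk, even though H5Pset_szip() succeeded. */
bool ppb_fits_chunk(unsigned ppb, uint64_t chunk_elements)
{
    return ppb <= chunk_elements;
}
```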

SZIP compression is an optional HDF5 filter.

Limitations:

  • SZIP compression cannot be applied to compound, array, variable-length, enumeration, or any other user-defined datatypes. If an SZIP filter is set in a dataset creation property list used to create a dataset containing a non-allowed datatype, the call to H5Dcreate() will fail; the conflict can be detected only when the property list is used.
  • Users should review the SZIP copyright notice, which describes factors that affect their rights and ability to use SZIP compression.
Note
For Users Familiar with SZIP in Other Contexts:
The following notes are of interest primarily to those who have used SZIP compression outside of the HDF5 context. In non-HDF5 applications, SZIP typically requires that the user application supply additional parameters:
  • pixels_in_object, the number of pixels in the object to be compressed
  • bits_per_pixel, the number of bits per pixel
  • pixels_per_scanline, the number of pixels per scan line
These values need not be independently supplied in the HDF5 environment as they are derived from the datatype and dataspace, which are already known. In particular, HDF5 sets pixels_in_object to the number of elements in a chunk and bits_per_pixel to the size of the element or pixel datatype.
The following algorithm is used to set pixels_per_scanline:
  • If the size of a chunk's fastest-changing dimension, size, is greater than 4K, set pixels_per_scanline to 128 times pixels_per_block.
  • If size is less than 4K but greater than pixels_per_block, set pixels_per_scanline to the minimum of size and 128 times pixels_per_block.
  • If size is less than pixels_per_block, set pixels_per_scanline to the minimum of the number of elements in the chunk and 128 times pixels_per_block. (If the chunk has fewer elements than pixels_per_block, dataset creation fails, as described above.)
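The three rules can be restated as a small function. This is a hypothetical sketch, not HDF5's code: it takes 4K to mean 4096 elements, and at the boundaries the text leaves open (size exactly 4096, or size exactly pixels_per_block) it applies the second rule, which is an assumption. It returns 0 where dataset creation would fail.

```c
#include <stdint.h>

#define MAX_PIXELS_PER_SCANLINE 4096u  /* the "4K" threshold, in elements */
#define MAX_BLOCKS_PER_SCANLINE 128u   /* N, blocks per scan line */

static uint64_t min_u64(uint64_t a, uint64_t b) { return a < b ? a : b; }

/* Hypothetical restatement of the pixels_per_scanline rules. size is the
 * chunk's fastest-changing dimension and chunk_elements its element count.
 * Returns 0 when pixels_per_block exceeds the chunk (creation would fail). */
uint64_t pixels_per_scanline(uint64_t size, uint64_t chunk_elements,
                             unsigned ppb)
{
    uint64_t cap = (uint64_t)MAX_BLOCKS_PER_SCANLINE * ppb;
    if (size > MAX_PIXELS_PER_SCANLINE)
        return cap;                       /* rule 1 */
    if (size >= ppb)
        return min_u64(size, cap);        /* rule 2 */
    if (chunk_elements < ppb)
        return 0;                         /* chunk too small for one block */
    return min_u64(chunk_elements, cap);  /* rule 3 */
}
```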
The HDF5 datatype may have precision that is less than the full size of the data element, e.g., an 11-bit integer can be defined using H5Tset_precision(). To a certain extent, SZIP can take advantage of the precision of the datatype to improve compression:
  • If the HDF5 datatype precision is 24 bits or less and the offset of the bits in the HDF5 datatype is zero (see H5Tset_offset() or H5Tget_offset()), the data is in the lowest N bits of the data element. In this case, the SZIP bits_per_pixel is set to the precision of the HDF5 datatype.
  • If the offset is not zero, the SZIP bits_per_pixel will be set to the number of bits in the full size of the data element.
  • If the HDF5 datatype precision is 25-bit to 32-bit, the SZIP bits_per_pixel will be set to 32.
  • If the HDF5 datatype precision is 33-bit to 64-bit, the SZIP bits_per_pixel will be set to 64.
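The bits_per_pixel rules above can be sketched as one function. It is hypothetical, not HDF5's code; precision and offset are assumed to be in bits as reported by H5Tget_precision() and H5Tget_offset(), and size_bits is assumed to be 8 times H5Tget_size(). An 11-bit integer with zero offset keeps 11 bits, while the same precision at a nonzero offset in a 16-bit element uses 16.

```c
/* Hypothetical restatement of how SZIP bits_per_pixel is derived.
 * precision and offset are in bits (H5Tget_precision()/H5Tget_offset());
 * size_bits is the element size in bits (8 * H5Tget_size()). */
unsigned szip_bits_per_pixel(unsigned precision, unsigned offset,
                             unsigned size_bits)
{
    unsigned bits = precision;

    /* Nonzero offset: the data is not in the lowest bits, so use the
     * full element size. */
    if (precision < size_bits && offset != 0)
        bits = size_bits;

    /* Above 24 bits, round up to 32 or 64. */
    if (bits > 24)
        bits = (bits <= 32) ? 32 : 64;

    return bits;
}
```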
HDF5 always modifies the options mask provided by the user to set up usage of RAW_OPTION_MASK, ALLOW_K13_OPTION_MASK, and one of LSB_OPTION_MASK or MSB_OPTION_MASK, depending on the endianness of the datatype.
Since
1.6.0