A block allocation mechanism is a method the library uses to group "small" objects together in order to increase I/O performance: the small objects can often be accessed together in fewer read or write calls.
This is currently implemented in the HDF5 library by a "metadata block aggregation" algorithm, which allocates a fixed-size block for "generic" metadata and then preferentially sub-allocates small metadata requests from that larger block.
"Small data" is the "raw" dataset data stored in the file that is "small". "Small" in this case means similar in size to the typical metadata object stored in the file. [Normally, this tends to be around 200-300 bytes.]
When the size of raw data for datasets is on the same order as the size of metadata in the file (or smaller), the allocation behavior of the raw data is similar to that of the metadata in the file and benefits from the same block allocation algorithm. The metadata in a file is already handled by a block allocation mechanism, but the "small data" in the file would benefit from a separate block allocation mechanism. API functions are provided to adjust the fixed-size block for "small data" up or down from its default setting of 2KB.
No. This block allocation mechanism is only used when raw data is small enough to fit into the current allocation block. Raw data that is larger than the space available in the block is allocated within the file in the normal manner.
This proposed change adds a block allocation mechanism for "small data" in the file to the library. This is done in a manner nearly identical to the metadata block allocation mechanism already in the library, which operates as follows: a fixed-size block is allocated in the file, small allocation requests are preferentially sub-allocated from that block, a new block is allocated when the current one is exhausted, and requests too large to fit in a block are allocated directly from the file in the normal manner.
The "small data" block allocation mechanism would operate in the same fashion, although on raw data instead of metadata.
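The allocation strategy described above can be sketched in a few lines of C. This is an illustrative model, not the library's internal implementation; the names (`allocator_t`, `alloc_space`) and the treatment of the file as a simple growing end-of-file offset are assumptions for the sketch.

```c
#include <stddef.h>

/* Illustrative sketch (not HDF5 internals): requests that fit within the
 * fixed-size block are sub-allocated from it; requests too large for a
 * block are allocated directly at the end of the "file". */

#define BLOCK_SIZE 2048 /* default "small data" block size (2KB) */

typedef struct {
    size_t eof;        /* next free offset at the end of the file */
    size_t block_off;  /* start of the current small-data block */
    size_t block_used; /* bytes already sub-allocated from it */
    int    have_block; /* is a small-data block currently open? */
} allocator_t;

/* Return the file offset assigned to a request of `size` bytes. */
static size_t alloc_space(allocator_t *a, size_t size)
{
    size_t off;

    if (size > BLOCK_SIZE) {
        /* Too big to ever fit in a block: allocate directly. */
        off = a->eof;
        a->eof += size;
        return off;
    }
    if (!a->have_block || a->block_used + size > BLOCK_SIZE) {
        /* Start a fresh small-data block at the end of the file. */
        a->block_off  = a->eof;
        a->eof       += BLOCK_SIZE;
        a->block_used = 0;
        a->have_block = 1;
    }
    off = a->block_off + a->block_used;
    a->block_used += size;
    return off;
}
```

Note how several 200-300 byte allocations land contiguously within one 2KB block, so they can later be read back in a single I/O call.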
Adding this mechanism to the library requires a pair of "get/set" file access property list functions to adjust the size of the block used to sub-allocate "small" raw data.
It might be useful to specify more than one block size to sub-allocate from, i.e. one block for allocations less than 2KB, another block for allocations from 2-64KB, etc., up to a final limit above which allocations are made directly from the file. Given the way free space in the file is currently handled, however, this may be a bad idea...
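The multi-block idea above amounts to routing each request to the smallest block class that can hold it. A minimal sketch, with illustrative tier sizes and a hypothetical `pick_block_class` helper (neither is proposed API):

```c
#include <stddef.h>

/* Illustrative block-class tiers: 2KB and 64KB. */
static const size_t block_class[] = { 2048, 65536 };
#define N_CLASSES (sizeof(block_class) / sizeof(block_class[0]))

/* Return the index of the smallest block class that can hold a request
 * of `size` bytes, or -1 if the request exceeds the largest class and
 * should be allocated directly from the file. */
static int pick_block_class(size_t size)
{
    size_t i;

    for (i = 0; i < N_CLASSES; i++)
        if (size <= block_class[i])
            return (int)i;
    return -1;
}
```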
None. We require all space allocations in a file to be performed collectively, so all processes will make identical decisions about allocating from the "small data" block or from the file directly.
Backward compatibility is the ability for applications using the HDF5 library to compile and link with future versions of the library. Forward compatibility is the ability for applications using the HDF5 library to compile and link with previous versions of the library.
Adding this change has no forward or backward compatibility issues, since it only adds new API functions and does not change the behavior of existing API functions.
None.
herr_t H5Pset_small_data_block_size(hid_t fapl_id, hsize_t size)
    fapl_id    IN: File access property list ID
    size       IN: The maximum size of the block used to sub-allocate "small data" from

herr_t H5Pget_small_data_block_size(hid_t fapl_id, hsize_t *size)
    fapl_id    IN: File access property list ID
    size       OUT: The maximum size of the block used to sub-allocate "small data" from