What’s New in HDF5
1.8.0-beta
October 22, 2007
Background:
HDF5 Release 1.8.0 will represent a major update in the HDF5 library and utilities. We have attempted to provide new capabilities and improve performance while retaining compatibility with previous releases.
Backward and Forward API Compatibility:
This release contains many new API routines that take advantage of the new features. At the same time, it aims to provide stability for applications: existing API routines remain available and continue to operate in a backward-compatible manner whenever possible.
API Compatibility in HDF5 Release 1.8.0 discusses the specifics of API compatibility with respect to new features.
Backward and Forward Format Compatibility:
The HDF5 Release 1.8.0 library will read all existing HDF5 files, from this or any prior release. Although this release contains features that require additions and/or changes to the HDF5 file format, by default this release will write out files that conform to a “maximum compatibility” principle. That is, files are written with the earliest version of the file format that describes the information, rather than always using the latest version possible. This provides the best forward compatibility by allowing the maximum number of older versions of the library to read files produced with this release.
If library features are used that require new file format features, or if the application requests that the library write out only the latest version of the file format, the files produced with this version of the library may not be readable by older versions of the HDF5 library.
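The selection rule described above can be illustrated with a small toy model. This is only a sketch of the idea, not the library's implementation: the feature names, the version numbers, and the function `choose_format_version` are all invented for illustration and do not correspond to real HDF5 format versions or API calls.

```python
# Toy model of the "maximum compatibility" principle.
# ASSUMPTION: feature names and version numbers are hypothetical.
FEATURE_MIN_VERSION = {
    "plain_dataset": 0,          # describable by the oldest format
    "compact_links": 2,          # needs a newer format
    "creation_order_index": 2,   # needs a newer format
}
LATEST_VERSION = 2


def choose_format_version(features_used, force_latest=False):
    """Return the format version a writer would pick in this toy model."""
    if force_latest:
        # The application asked for the latest format unconditionally.
        return LATEST_VERSION
    # Otherwise pick the earliest version able to describe everything in use.
    return max((FEATURE_MIN_VERSION[f] for f in features_used), default=0)


print(choose_format_version(["plain_dataset"]))
print(choose_format_version(["plain_dataset", "compact_links"]))
print(choose_format_version(["plain_dataset"], force_latest=True))
```

In this model a file that uses only old features stays readable by old readers, while using a single new feature, or forcing the latest format, raises the version that a reader must understand.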
New Features in HDF5 Release 1.8.0 and Backward/Forward Format Compatibility Issues discusses the new features in the release from the point of view of their impact on format compatibility.
Remaining Anticipated Change between this Beta and the Final Release:
The following change is anticipated in the final version of HDF5 Release 1.8.0:
• A small number of functions will be renamed or deprecated to improve consistency with the new interface.
Features in this Beta not yet Described in this Document:
This beta includes API compatibility macros designed to facilitate application migration to HDF5 Release 1.8.0. These macros are intended to facilitate developer management of a clean, step-by-step migration from an older HDF5 Library to the new release; they can also be used to enable older applications to use the new library without requiring that the application be rewritten. See API Compatibility Macros in HDF5 for a full description.
Major New Features:
HDF5 Release 1.8.0 has now entered beta (1.8.0-beta), so the anticipated feature list is no longer expected to change. The features listed below are available in this 1.8.0-beta and are all thought to be stable (e.g., they routinely pass the daily tests).
• Use latest format – A switch to force use of the latest version of the HDF5 file format
This feature provides a new switch with which a user application can force the HDF5 Library to write data in the most up-to-date version of the file format.
See H5Pset_latest_format and H5Pget_latest_format in the HDF5 Reference Manual.
• Configurable Compact-or-Indexed Link Storage – Compact small groups and more scalable large groups
Compact link storage allows groups containing only a few links to take up much less space in the file; an improved implementation of indexed link storage provides a faster and more scalable method for storing and working with large groups containing many links. An application can set appropriate thresholds for switching between the compact and indexed storage formats.
See H5Pset_link_phase_change and H5Pget_link_phase_change in the HDF5 Reference Manual.
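The threshold behavior described above can be sketched with a toy model. The class `ToyGroup` and its threshold values are hypothetical and chosen only for illustration; the sketch assumes that a group switches to indexed storage once it holds more links than an upper threshold, and switches back to compact storage once it drops below a lower one. Consult the H5Pset_link_phase_change entry for the library's actual semantics and defaults.

```python
class ToyGroup:
    """Toy group that switches storage format at two thresholds.

    ASSUMPTION: the names and the values 8 and 6 are illustrative only.
    """

    def __init__(self, max_compact=8, min_dense=6):
        if min_dense > max_compact:
            raise ValueError("min_dense must not exceed max_compact")
        self.max_compact = max_compact
        self.min_dense = min_dense
        self.links = set()
        self.storage = "compact"

    def add_link(self, name):
        self.links.add(name)
        # Too many links for compact storage: switch to indexed.
        if self.storage == "compact" and len(self.links) > self.max_compact:
            self.storage = "indexed"

    def remove_link(self, name):
        self.links.discard(name)
        # Few enough links again: switch back to compact.
        if self.storage == "indexed" and len(self.links) < self.min_dense:
            self.storage = "compact"


group = ToyGroup()
for i in range(9):
    group.add_link(f"link{i}")
print(group.storage)
```

Using two separate thresholds rather than one keeps a group whose link count hovers around the boundary from converting back and forth on every insertion and removal.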
• External Links
This feature allows links in a group to refer to objects in another HDF5 file and enables the library to access those objects as if they were in the current file.
See H5Lcreate_external, H5Lget_info, H5Lget_val, H5Lunpack_elink_val, H5Pset_elink_prefix, and H5Pget_elink_prefix in the HDF5 Reference Manual.
• Link Creation Order Tracking and Indexing in Groups
HDF5 now enables an index on the order in which the links are created, allowing iteration and lookup of links by creation order as well as alphanumerically by name.
See H5Pset_link_creation_order and H5Pget_link_creation_order in the HDF5 Reference Manual.
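The difference between the two orderings can be shown with a minimal sketch. The class `ToyOrderedGroup` and its methods are hypothetical names used only for illustration; the sketch models the concept of keeping a creation index next to each link name and does not reflect how the library stores its indexes.

```python
class ToyOrderedGroup:
    """Toy group that records the creation index of each link."""

    def __init__(self):
        self._links = {}        # link name -> creation index
        self._next_index = 0

    def create_link(self, name):
        if name in self._links:
            raise KeyError(f"link already exists: {name}")
        self._links[name] = self._next_index
        self._next_index += 1

    def by_name(self):
        # Alphanumeric iteration, available with or without the index.
        return sorted(self._links)

    def by_creation_order(self):
        # Iteration made possible by tracking the creation index.
        return sorted(self._links, key=self._links.get)


group = ToyOrderedGroup()
for name in ["zeta", "alpha", "mid"]:
    group.create_link(name)
print(group.by_name())
print(group.by_creation_order())
```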
• Link (H5L) and Object (H5O) APIs
New Link and Object APIs enable greater flexibility in the creation of links and objects in an HDF5 file. The H5L routines allow links to be managed and manipulated more like objects in the HDF5 data model and provide detailed control of linking behavior. The H5O routines and related functions (H5Dcreate_anon, H5Gcreate_anon and H5Tcommit_anon) enable the creation and management of objects in a file independently of the links that integrate those objects into the file structure.
In the HDF5 Reference Manual, see the H5L and H5O APIs and the individual functions H5Dcreate_anon, H5Gcreate_anon, and H5Tcommit_anon.
• Enhanced Attribute Handling and Faster Access to Large Numbers of Attributes
The Attribute interface (H5A) includes several new functions for attribute management. When large numbers of attributes are attached to a single object, new functionality enables faster access and allows those attributes to be stored in much less space in the file.
In the HDF5 Reference Manual, see H5A API for the new attribute management functions and H5Pset_attr_phase_change and H5Pget_attr_phase_change for configuring the attribute storage format.
- Creation Order in Attributes – Attributes now allow an index on the order in which they are created, allowing iteration and lookup of attributes by creation order as well as alphanumerically by name.
See H5Pset_attr_creation_order and H5Pget_attr_creation_order in the HDF5 Reference Manual.
- Shared Object Header Messages (SOHM) – To conserve space in an HDF5 file, the capability has been added to designate large header messages as “shared.”
See H5Pset_shared_mesg_nindexes, H5Pget_shared_mesg_nindexes, H5Pset_shared_mesg_index, H5Pget_shared_mesg_index, H5Pset_shared_mesg_phase_change, and H5Pget_shared_mesg_phase_change in the HDF5 Reference Manual.
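The space saving from sharing can be sketched as follows. This is a toy model under stated assumptions, not the library's storage scheme: the class `ToySharedMessageStore`, the `min_size` threshold, and its value of 16 bytes are all hypothetical. The sketch assumes only that messages at or above some size are stored once and referenced from every object that uses them, while smaller messages stay inline in each object header.

```python
class ToySharedMessageStore:
    """Toy store that keeps one copy of each large header message."""

    def __init__(self, min_size=16):
        self.min_size = min_size   # ASSUMPTION: illustrative threshold
        self.shared = {}           # message bytes -> reference count

    def attach(self, message):
        """Record one use of `message`; return where it would be stored."""
        if len(message) < self.min_size:
            return "inline"        # small messages are not shared
        self.shared[message] = self.shared.get(message, 0) + 1
        return "shared"

    def stored_bytes(self):
        # Each distinct shared message is counted once, however often used.
        return sum(len(m) for m in self.shared)


store = ToySharedMessageStore()
datatype_message = b"x" * 32       # stand-in for a large header message
print(store.attach(datatype_message))
print(store.attach(datatype_message))
print(store.stored_bytes())
```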
• UNICODE Support – The UTF-8 Unicode encoding is now supported for strings in datasets, the names of links and the names of attributes.
See “UTF-8 Character Encoding in HDF5,” “Character Encoding for Links in HDF5 Files,” and, in the HDF5 Reference Manual, H5Pset_char_encoding and H5Pget_char_encoding.
• “Create Intermediate Groups” Property – This feature allows intermediate groups that do not exist yet to be created when creating or copying an object in a file.
See “Creating Missing Groups” [PDF] and, in the HDF5 Reference Manual, H5Pset_create_intermediate_group and H5Pget_create_intermediate_group.
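The effect of this property can be sketched with nested dictionaries standing in for groups. The function `create_group` and its `create_intermediate` flag are hypothetical names for illustration only; the sketch shows the behavior described above, not the library's API.

```python
def create_group(root, path, create_intermediate=False):
    """Create the group named by `path` inside the nested dict `root`.

    Missing intermediate groups are an error unless `create_intermediate`
    is set, in which case they are created along the way.
    """
    parts = [p for p in path.split("/") if p]
    if not parts:
        raise ValueError("path must name at least one group")
    node = root
    for name in parts[:-1]:
        if name not in node:
            if not create_intermediate:
                raise KeyError(f"missing intermediate group: {name}")
            node[name] = {}
        node = node[name]
    if parts[-1] in node:
        raise KeyError(f"group already exists: {parts[-1]}")
    node[parts[-1]] = {}
    return node[parts[-1]]


file_root = {}
create_group(file_root, "/a/b/c", create_intermediate=True)
print(file_root)
```

Without the flag, the same call on an empty file would fail at the first missing group, and the application would have to create each level itself.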
• Object Copying – This feature allows an object in one HDF5 file to be easily copied to a new location within the current file or to another HDF5 file. This is done at a low level in the HDF5 file, allowing entire group hierarchies to be copied quickly and compressed datasets to be copied without going through a decompression/compression cycle.
In the HDF5 Reference Manual, see the functions H5Ocopy, H5Gcreate_anon, H5Pset_copy_object, and H5Pset_create_intermediate_group and the tool h5copy. (The h5copy tool is not yet documented, but entering 'h5copy --help' on the command line provides basic information.)
• Improved Object Information Retrieval – Three new routines have been added to enhance the object information that can be retrieved. H5Lget_info retrieves information regarding a link, H5Oget_info retrieves information regarding an object, and H5Gget_info retrieves information regarding a group. The routine H5Gget_objinfo remains unchanged from Release 1.6.x, though deprecated in favor of the three new functions.
• Extendible Identifier API – A new set of identifier management routines has been added, which allow an application to use the HDF5 identifier-to-object mapping routines.
See the H5I APIs in the HDF5 Reference Manual and “Allowing Users to Access HDF5’s ID System.”
• New Compression Filters – These new I/O filters allow better compression of certain types of data:
o N-Bit Filter – This filter compresses data which uses N-bit datatypes. See H5Pset_nbit in the HDF5 Reference Manual and the section “Using Filters / N-bit” in the “Datasets” chapter of the HDF5 User’s Guide.
o Scale+Offset Filter – This filter compresses scalar (integer and floating-point) datatypes which stay within a range. See H5Pset_scaleoffset in the HDF5 Reference Manual and the section “Using Filters / Scale-Offset” in the “Datasets” chapter of the HDF5 User’s Guide.
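The reason that values confined to a narrow range compress well can be sketched for the integer case. This is a simplified illustration of the general offset-and-narrow-bit-width idea, not the filter's actual algorithm or on-disk layout; the function names are hypothetical, and details such as fill values and floating-point scaling are ignored.

```python
def scaleoffset_encode(values):
    """Return (offset, bits_per_value, residuals) for a list of integers."""
    if not values:
        raise ValueError("need at least one value")
    offset = min(values)
    # Storing each value relative to the minimum keeps the numbers small.
    residuals = [v - offset for v in values]
    # Bits needed to represent the largest residual.
    bits_per_value = max(residuals).bit_length()
    return offset, bits_per_value, residuals


def scaleoffset_decode(offset, residuals):
    """Recover the original integers from the offset and residuals."""
    return [r + offset for r in residuals]


data = [1000, 1003, 1001, 1007]
offset, bits, residuals = scaleoffset_encode(data)
print(offset, bits, residuals)
print(scaleoffset_decode(offset, residuals) == data)
```

In this example each value could be stored in 3 bits plus one shared offset, rather than in a full-width integer.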
• User-defined Datatype Conversion Callback Functions: Revised Datatype Conversion Exception Handling – It is now possible for an application to have greater control over exceptional circumstances (range errors, etc.) during datatype conversion.
See “Revising Numeric Overflows in HDF5” and “Data Conversion Of Arithmetic Data Types.”
• Integer-to-Floating-point Conversion Support – It is now possible for the HDF5 library to convert between integer and floating-point datatypes.
See H5Tconvert in the HDF5 Reference Manual.
• “NULL” Dataspace – A new type of dataspace, which allows datasets and attributes without any elements to be described.
See H5Screate in the HDF5 Reference Manual.
• Collective Chunk I/O in Parallel – The library now attempts to use MPI collective mode when performing I/O on chunked datasets when using the parallel I/O file driver.
• Enhanced Error Handling – A new set of error API routines has been added, which allow an application to integrate its error reporting with the HDF5 library error stack.
In the HDF5 Reference Manual, see the error stack APIs. Also see the supporting document “Unified Error Reporting for HDF5 and Client Libraries.”
• Metadata Caching
See “Metadata Caching in HDF5” in the HDF5 User’s Guide and the following function entries in the HDF5 Reference Manual:
In the H5F API:
H5Fget_mdc_config
H5Fget_mdc_hit_rate
H5Fget_mdc_size
H5Freset_mdc_hit_rate_stats
H5Fset_mdc_config
In the H5P API:
H5Pset_mdc_config
H5Pget_mdc_config
• Arithmetic Data Transform on I/O – This feature allows arithmetic operations (add/subtract/multiply/divide) to be performed on data elements as they are being written to/read from a file. See H5Pset_data_transform in the HDF5 Reference Manual.
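The idea of an element-wise transform applied during transfer can be sketched as follows. The helper `apply_transform` and the Fahrenheit-to-Celsius example are hypothetical and chosen only for illustration; the sketch uses a Python function where the library would use its own transform mechanism, so it shows the concept rather than the API.

```python
def apply_transform(values, transform):
    """Apply `transform` to each element, as a transfer step might."""
    return [transform(x) for x in values]


def fahrenheit_to_celsius(x):
    # Uses only the four arithmetic operations named in the text.
    return (5 / 9.0) * (x - 32)


in_memory = [32.0, 212.0, -40.0]
written = apply_transform(in_memory, fahrenheit_to_celsius)
print(written)
```

The application's buffer is left untouched; only the values passed on to storage are transformed.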
• Datatype and Dataspace Serial Conversion – Routines have been implemented to serialize/deserialize HDF5 datatypes and dataspaces. These routines allow datatype and dataspace information to be transmitted between processes or stored in non-HDF5 files.
See “Encode and Decode HDF5 Objects,” and the function entries in the HDF5 Reference Manual for H5Tencode, H5Tdecode, H5Sencode, and H5Sdecode.
• Two-way Conversion Between Datatype and Text Description of Datatype – This feature enables the creation of a datatype from a text definition of that datatype and the creation of a formal text definition from a datatype. The text definition is in DDL format; DDL definitions of HDF5 datatypes can be found in “DDL in BNF for HDF5.”
H5LTtext_to_dtype creates an HDF5 datatype based on the text description and returns the datatype identifier. Given a datatype identifier, H5LTdtype_to_text creates a DDL description of the datatype.
Also see “Conversion Between Text and Datatype.”
• New Packet Table and Dimension Scale High-Level APIs have been added to the high-level C interfaces.
The Packet Table API (H5PT) is designed to allow variable-length records to be added to tables easily.
The Dimension Scale API (H5DS) allows dimension scales to be created in HDF5 and attached to HDF5 datasets. Also see “HDF5 Dimension Scale Specification and Design Notes” (PDF).
• High-Level Fortran APIs – Fortran APIs have been added for the following High-Level HDF5 APIs:
H5Lite (H5LT)
H5Image (H5IM)
H5Table (H5TB)
• Tool Improvements – Three new tools have been added, and existing tools were enhanced:
o h5mkgrp is a new command-line tool that creates a new group in an HDF5 file. It is described in the next section, with other features that may change.
o h5stat (PDF) enables the analysis of an HDF5 file in various ways to determine useful statistics regarding the objects in the file, such as the numbers of objects per group, the sizes of datasets, the amount of free space in the file, etc.
o h5copy makes a complete copy of an object in an HDF5 file as a new object in that HDF5 file or as a new object in a different HDF5 file. (The h5copy tool is not yet documented, but entering 'h5copy --help' on the command line provides basic information.)
o Improved speed of h5dump – Performance improvements have been made to h5dump to speed it up when dealing with files that have large numbers of objects.
• Better UNIX/Linux Portability – This release now uses the latest GNU “auto” tools (autoconf, automake, and libtool) to provide much better portability between many machine and OS configurations. Building the HDF5 distribution can now be performed in parallel (with the gmake “-j” flag), speeding up the process of building, testing and installing the HDF5 distribution. Many other improvements have gone into the build infrastructure as well.
• FORTRAN API Wrapper Improvements – Several improvements were made to the FORTRAN build infrastructure, and support was added for previously missing and new API routines.
• C++ API Wrapper Improvements – Several improvements were made to the C++ build infrastructure, and support was added for previously missing and new API routines.