In addition to converting HDF4 data to HDF5, many users want to carry their programming model forward to HDF5, including the use of dimension names and scales. These users would like something at least similar to HDF4 dimensions. Another reason to provide dimension scale support is that some software packages, such as VisAD, DODS, Matlab, and IDL, work best when dimension information is available. Without dimension names and scales, these packages cannot use some of their most powerful features.
Dimension scales in HDF5 have been partly addressed in previous work. Two experiments have suggested possible approaches to dimension scales based on netCDF. The "NetCDF-H5 Prototype" explored a fairly complete implementation of netCDF on top of HDF5.[1] This work proposed a storage scheme and software to implement netCDF's model of dimensions. A later study, "Experiment with XSL," converted netCDF files to HDF5 files via XML and XSL style sheets.[2] This latter experiment used a different storage layout for dimensions from [1], and did not address any issues of programming model or compatibility.
The HDF4 to HDF5 Mapping is an official specification for a default representation of HDF4 objects in an HDF5 file. It includes rules for storing the dimension names and scales of an HDF4 object in an HDF5 file. Dimension scales are stored as one-dimensional datasets. The names and scales are associated with a dataset through conventional attributes, which hold a list of strings for the names and a list of object references that point to the dimension scale datasets ([3], section 3.1). This specification has been implemented by the h4toh5 utility and library, and it is already in use by important users.
Our brainstorming sessions floated a number of approaches to dimension scales, such as a facility to define 'generating functions' for dimension scales. These ideas seem to blend into the experimental 'transformation and units' activities. These approaches may be interesting in the long run, but it appears they require changes to the HDF5 library and/or format (which likely could not happen until 2002 at the earliest). Also, there hasn't been any consideration of how to support the dimension scales already being created by the h4toh5 utility.
It is important that we provide our users with some basic support for dimension scales in HDF5 as soon as possible. This support should be compatible with the h4toh5 utility that is already in the hands of users. Based on the earlier work above, it seems likely that these features can be initially implemented as part of the HDF5 'convenience' suite, with no changes to the core HDF5 library. This could be done immediately, and could be in the hands of users this year.
Function | Description |
hid_t H5Ccreate_dimscale(hsize_t size, char *name) | Create a 1D dataset, marked as a dimension scale, with name 'name' |
H5Cset_dim( int dimindex, hid_t dataset, hid_t dimscale) | Attach dimension scale to dataset, associated with dimension number 'dimindex'. |
H5Cset_name( int dimindex, hid_t dataset, char * dname) | Attach dimension name to dataset, associated with dimension number 'dimindex'. |
hid_t[] H5Cget_dim_scales(hid_t dataset) | Get a list of the dimension scales, in the order of the dimensions of the dataset. Some convention is needed to represent dimensions with no scale defined. |
char * H5Cget_dim_name(hid_t dimindex) | Get the name of dimension dimindex. |
Note that these functions are intended to work with the datasets created by the h4toh5 utility. We may define a more general storage model, but it is important that this API deal with HDF4_DIMSCALES and the storage conventions of the current HDF4 to HDF5 mapping.
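The behavior of the calls in the table can be pictured with a small in-memory model. The sketch below is not HDF5 code and does not touch a file. The `toy_dataset` struct, the `toy_*` functions, and the `TOY_NO_SCALE` sentinel are all hypothetical stand-ins, and plain integers take the place of `hid_t` identifiers. It illustrates one possible convention for "no scale defined": a reserved value in the per-dimension list.

```c
#include <assert.h>
#include <string.h>

#define TOY_MAX_RANK 8
#define TOY_NAME_LEN 32
#define TOY_NO_SCALE (-1)  /* hypothetical sentinel: no scale attached */

/* Hypothetical stand-in for one dataset and its two dimension attributes:
 * a list of scale references and a list of names, one entry per dimension. */
typedef struct {
    int  rank;
    int  scale_id[TOY_MAX_RANK];
    char dim_name[TOY_MAX_RANK][TOY_NAME_LEN];
} toy_dataset;

/* Start with every dimension unnamed and without a scale. */
int toy_init(toy_dataset *ds, int rank)
{
    assert(ds != NULL);
    if (rank < 1 || rank > TOY_MAX_RANK) return -1;
    ds->rank = rank;
    for (int i = 0; i < TOY_MAX_RANK; i++) {
        ds->scale_id[i] = TOY_NO_SCALE;
        ds->dim_name[i][0] = '\0';
    }
    return 0;
}

/* Attach a scale (an integer id here) to dimension number dimindex. */
int toy_set_dim(toy_dataset *ds, int dimindex, int scale_id)
{
    assert(ds != NULL);
    if (dimindex < 0 || dimindex >= ds->rank) return -1;
    ds->scale_id[dimindex] = scale_id;
    return 0;
}

/* Attach a name to dimension number dimindex. */
int toy_set_name(toy_dataset *ds, int dimindex, const char *name)
{
    assert(ds != NULL && name != NULL);
    if (dimindex < 0 || dimindex >= ds->rank) return -1;
    if (strlen(name) >= TOY_NAME_LEN) return -1;
    strcpy(ds->dim_name[dimindex], name);
    return 0;
}

/* Copy the scale list, in dimension order, into out; returns the rank. */
int toy_get_dim_scales(const toy_dataset *ds, int *out, int out_len)
{
    assert(ds != NULL && out != NULL);
    if (out_len < ds->rank) return -1;
    for (int i = 0; i < ds->rank; i++) out[i] = ds->scale_id[i];
    return ds->rank;
}
```

In this model, a two-dimensional dataset with a scale on dimension 1 only would report `TOY_NO_SCALE` for dimension 0, so the caller never has to guess whether an entry is valid.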
Dimension scales can also have attributes, and we may want to define standard attributes other than name, e.g., offset, scale, units, and format. If these are defined, we can provide get/set methods.
Global Order of Dimensions, a la netCDF
Many users are accustomed to the netCDF concept of dimensions that are global to the file and that can be manipulated as a set. For instance, dimensions can be retrieved in order of creation, and each dimension has a global index.
This feature could be supported using an approach similar to the HDF5 netCDF prototype [1]. If something like this is adopted, we will need to update the HDF4 to HDF5 mapping to use it.
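The idea of a file-wide, ordered set of dimensions can be sketched as a simple registry in which a dimension's position is both its creation order and its global index. This is only an illustration of the concept. It is not the storage layout used by the prototype in [1], and the `dim_registry` type and `reg_*` functions are hypothetical names.

```c
#include <assert.h>
#include <string.h>

#define REG_MAX_DIMS 64
#define REG_NAME_LEN 32

/* One file-wide dimension: a name and a length. */
typedef struct {
    char   name[REG_NAME_LEN];
    size_t size;
} reg_dim;

/* Hypothetical file-level registry: array position is the global index,
 * and entries are appended in creation order. */
typedef struct {
    int     count;
    reg_dim dims[REG_MAX_DIMS];
} dim_registry;

void reg_init(dim_registry *r)
{
    assert(r != NULL);
    r->count = 0;
}

/* Look a dimension up by name; returns its global index or -1. */
int reg_find(const dim_registry *r, const char *name)
{
    assert(r != NULL && name != NULL);
    for (int i = 0; i < r->count; i++)
        if (strcmp(r->dims[i].name, name) == 0) return i;
    return -1;
}

/* Define a new dimension; returns its global index, or -1 if the name
 * is already taken, too long, or the registry is full. */
int reg_def_dim(dim_registry *r, const char *name, size_t size)
{
    assert(r != NULL && name != NULL);
    if (r->count >= REG_MAX_DIMS) return -1;
    if (strlen(name) >= REG_NAME_LEN) return -1;
    if (reg_find(r, name) >= 0) return -1;
    strcpy(r->dims[r->count].name, name);
    r->dims[r->count].size = size;
    return r->count++;
}
```

Because indices are assigned on creation and never reused in this sketch, iterating from 0 to `count - 1` returns the dimensions in the order they were defined.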
Management of Shared Dimensions
The h4toh5 conventions can adequately represent shared dimensions. However, they currently have no way to handle 'shared names'. In addition, since the association of dimension names and scales is an attribute of each dataset, when the API deletes a dimension, it will need some way to delete the reference to that dimension in every dataset that might be using it. This could be done by adding a table that records which datasets are using which dimensions. This has not been specified.
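Since the table has not been specified, the following is only one way it might work, shown as an in-memory model rather than as HDF5 code. Each attachment is recorded twice: once in the dataset's own list (standing in for its attribute) and once in a reverse index. Deleting a scale then walks the reverse index and clears every reference. All names here (`ut_file`, `ut_attach`, `ut_delete_scale`, `UT_NO_SCALE`) are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

#define UT_MAX_DATASETS 4
#define UT_MAX_RANK     4
#define UT_MAX_USES     32
#define UT_NO_SCALE     (-1)

/* One row of the reverse index: scale_id is used by (dataset, dimindex). */
typedef struct { int scale_id; int dataset; int dimindex; } ut_use;

typedef struct {
    int    scale_of[UT_MAX_DATASETS][UT_MAX_RANK]; /* per-dataset references */
    ut_use uses[UT_MAX_USES];                      /* which dataset uses what */
    int    n_uses;
} ut_file;

void ut_init(ut_file *f)
{
    assert(f != NULL);
    f->n_uses = 0;
    for (int d = 0; d < UT_MAX_DATASETS; d++)
        for (int i = 0; i < UT_MAX_RANK; i++)
            f->scale_of[d][i] = UT_NO_SCALE;
}

/* Attach a scale and record the use; fails if the slot is occupied. */
int ut_attach(ut_file *f, int dataset, int dimindex, int scale_id)
{
    assert(f != NULL);
    if (dataset < 0 || dataset >= UT_MAX_DATASETS) return -1;
    if (dimindex < 0 || dimindex >= UT_MAX_RANK) return -1;
    if (f->n_uses >= UT_MAX_USES) return -1;
    if (f->scale_of[dataset][dimindex] != UT_NO_SCALE) return -1;
    f->scale_of[dataset][dimindex] = scale_id;
    f->uses[f->n_uses++] = (ut_use){ scale_id, dataset, dimindex };
    return 0;
}

/* Delete a scale: clear every reference to it and compact the table.
 * Returns the number of references that were cleared. */
int ut_delete_scale(ut_file *f, int scale_id)
{
    assert(f != NULL);
    int kept = 0, cleared = 0;
    for (int i = 0; i < f->n_uses; i++) {
        ut_use u = f->uses[i];
        if (u.scale_id == scale_id) {
            f->scale_of[u.dataset][u.dimindex] = UT_NO_SCALE;
            cleared++;
        } else {
            f->uses[kept++] = u;
        }
    }
    f->n_uses = kept;
    return cleared;
}
```

The cost of this design is that the table and the per-dataset attributes must be updated together. Any tool that edits the attributes directly, without going through the API, would leave the table stale.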
Unlimited Dimensions
While both the dataspace and the dimension scale dataset can be UNLIMITED, i.e., expandable, there is no way to keep them coordinated without library support. That is, if the dimension is extended, there is no way to automatically extend the dimension scale dataset that is assigned to it.
Furthermore, users may expect the magical effect that expansion of a dimension expands the dataspaces of any dataset using that dimension. This is extremely difficult to provide without library support.
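To make the coordination problem concrete, the sketch below models what a convenience layer would have to do on every extension: update the dimension, its scale, and every dataset that uses it. It is an in-memory illustration only. The `ex_*` names are hypothetical, and plain integers stand in for the extents that a real implementation would change through the library (for example with a call such as `H5Dset_extent`).

```c
#include <assert.h>
#include <stddef.h>

#define EX_MAX_RANK  4
#define EX_MAX_USERS 8

/* Stand-in for a dataset: just its current extents. */
typedef struct {
    int    rank;
    size_t extent[EX_MAX_RANK];
} ex_dataset;

/* Stand-in for a shared dimension and everything tied to it. */
typedef struct {
    size_t      size;                /* current length of the dimension */
    size_t      scale_len;           /* length of its 1-D scale dataset */
    int         n_users;
    ex_dataset *user[EX_MAX_USERS];  /* datasets using this dimension */
    int         axis[EX_MAX_USERS];  /* which axis of each dataset */
} ex_dimension;

/* Register that axis 'axis' of ds is governed by dimension d. */
int ex_use(ex_dimension *d, ex_dataset *ds, int axis)
{
    assert(d != NULL && ds != NULL);
    if (d->n_users >= EX_MAX_USERS) return -1;
    if (axis < 0 || axis >= ds->rank) return -1;
    if (ds->extent[axis] != d->size) return -1;  /* sizes must agree */
    d->user[d->n_users] = ds;
    d->axis[d->n_users] = axis;
    d->n_users++;
    return 0;
}

/* Grow the dimension and propagate the new size to the scale and to
 * every registered dataset. Returns the number of datasets updated. */
int ex_extend(ex_dimension *d, size_t new_size)
{
    assert(d != NULL);
    if (new_size < d->size) return -1;  /* this sketch only grows */
    d->size = new_size;
    d->scale_len = new_size;
    for (int i = 0; i < d->n_users; i++)
        d->user[i]->extent[d->axis[i]] = new_size;
    return d->n_users;
}
```

This only stays consistent if every extension goes through `ex_extend`. A program that extends a dataset directly bypasses the bookkeeping, which is the limitation described above: without library support, the coordination cannot be enforced.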
Our users need this now, so we should do what we can as soon as possible. We cannot do everything we might want, so we will simply have to do our best and document what can't be done.
2. Robert E. McGrath, "Experiment with XSL: translating scientific data", February 21, 2001. http://hdf.ncsa.uiuc.edu/HDF5/XML/nctoh5/writeup.htm
3. Mike Folk, Robert E. McGrath, Kent Yang, "Mapping HDF4 Objects to HDF5 Objects", Revised: October, 2000. http://hdf.ncsa.uiuc.edu/HDF5/papers/h4toh5/