Update to h5toh4: design notes


May 27, 2003

The current specification: Mapping HDF4 Objects to HDF5 Objects (Oct. 2000) [pdf]

The man page for the current utility: http://hdf.ncsa.uiuc.edu/HDF5/doc/Tools.html#Tools-H5toh4

This document summarizes changes to the tool. The documents listed above must be updated to reflect them.

Summary of HDF5 to HDF4 mapping

HDF5 supports two primary objects:
  1. Groups
  2. Datasets

Only HDF5 datasets and groups and their attributes can have HDF4 counterparts. All other HDF5 objects should be assumed to have no HDF4 counterpart. For instance, the HDF5 "named datatype" object has no HDF4 counterpart. Certain HDF5 datasets, attributes and groups also have no HDF4 counterpart. These include:


Changes


A. GROUPS

HDF5 groups map to HDF4 Vgroups by default.

Attributes of Groups

If the HDF5 file was originally converted from an HDF4 file (using h4toh5), an HDF5 group may have attributes with names prefixed by "HDF4_VGROUP_ATTR_".  These attributes are written to the HDF4 Vgroup after stripping off the prefix. (NOT IMPLEMENTED YET)

All other attributes are written to the HDF4 Vgroup without any change.
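
The naming rule for group attributes can be sketched as follows. This is only an illustration of the rule as described above, not the tool's code; the function name vgroup_attr_name is hypothetical, and the prefix handling is not implemented in h5toh4 yet.

```python
# Prefix that h4toh5 adds to attributes originating from an HDF4 Vgroup.
VGROUP_PREFIX = "HDF4_VGROUP_ATTR_"


def vgroup_attr_name(h5_name):
    """Return the attribute name to write on the HDF4 Vgroup.

    Hypothetical helper: strips the h4toh5 prefix when present and
    leaves every other name unchanged.
    """
    if h5_name.startswith(VGROUP_PREFIX):
        return h5_name[len(VGROUP_PREFIX):]
    return h5_name
```

For example, vgroup_attr_name("HDF4_VGROUP_ATTR_units") gives "units", while a name without the prefix is returned as is.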

The HDF5 root group


The HDF5 root group is a special case. HDF5 has a strictly single-rooted tree structure, whereas HDF4 supports a forest structure (a file can hold two or more disconnected trees). In the current version of h5toh4, the HDF5 root group is mapped to a corresponding HDF4 Vgroup, with all HDF4 objects descending from that group. In essence, this is a 'dummy' group that serves no purpose in the HDF4 file.

In the new version of h5toh4, by default, the HDF5 root group is removed from the file after all the root group attributes are processed. To preserve the behaviour of previous implementations, an option is available to override this default mapping.

Attributes of the Root Group


In both cases, the HDF5 root group attributes are processed in the same manner. Global attributes are set as attributes of the corresponding interface: attributes suffixed with GLO_SDS are written as global attributes of the SD interface, and attributes suffixed with GLO_GR are written as global attributes of the GR interface.

If the HDF5 input file was originally converted from an HDF4 file, there may be HDF4 file annotations (file labels and/or file descriptions) written as HDF5 root group attributes with the name HDF4_FILE_LABEL<n> for a file label and the name HDF4_FILE_DESC<n> for a file description. These attributes are recreated as HDF4 file annotations.

All other root group attributes are written as HDF4 file descriptions with the value "HDF5_<AttributeName> = <AttributeValue>".

Note that the root group attributes are processed in this way even if the 'do not suppress' root group option is specified; in that case, however, the attributes are not removed from the dummy Vgroup.
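
The root group attribute rules above can be summarized in a small sketch. The function name map_root_attribute and the returned labels are hypothetical. Two assumptions are made that the text does not settle: that <n> is a decimal index, and that the attribute name is kept unchanged when it is written to the SD or GR interface.

```python
import re

# Assumption: <n> in HDF4_FILE_LABEL<n> / HDF4_FILE_DESC<n> is a decimal index.
_FILE_LABEL = re.compile(r"HDF4_FILE_LABEL\d+")
_FILE_DESC = re.compile(r"HDF4_FILE_DESC\d+")


def map_root_attribute(name, value):
    """Return (target, attribute name or None, value) for a root group attribute."""
    if name.endswith("GLO_SDS"):
        # Assumption: the name is written unchanged, suffix included.
        return ("SD global attribute", name, value)
    if name.endswith("GLO_GR"):
        return ("GR global attribute", name, value)
    if _FILE_LABEL.fullmatch(name):
        # Recreated as an HDF4 file label annotation.
        return ("file label", None, value)
    if _FILE_DESC.fullmatch(name):
        # Recreated as an HDF4 file description annotation.
        return ("file description", None, value)
    # Every other attribute becomes a file description carrying name and value.
    return ("file description", None, f"HDF5_{name} = {value}")
```

With this sketch, an attribute named "author" with the value "me" would become a file description holding "HDF5_author = me".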

B. DATASETS

HDF5 datasets can represent SDS, GR Images, palettes and Vdatas. If there is no metadata to indicate which particular HDF4 object the HDF5 dataset corresponds to, an HDF5 dataset with an HDF4-compatible scalar datatype is assumed to be an HDF4 SDS.

HDF5 Image Dataset to HDF4 GR Image

Image data is stored as an HDF5 dataset with values of HDF5 class Integer or Float. The HDF5 Image specification (http://hdf.ncsa.uiuc.edu/HDF5/doc/ADGuide/ImageSpec.html: Section 1) defines certain standard attributes for an Image dataset.

The HDF5 dataset for an image is distinguished from other datasets by giving it an attribute "CLASS=IMAGE". An HDF5 dataset with this attribute name-value pair is converted into an HDF4 GR Image.

In addition, the HDF5 Image dataset may have an optional attribute "PALETTE" that is an array of object references for zero or more palettes. However, HDF4 supports only one palette per image. Only the first object reference from the array of object references is converted into a corresponding HDF4 palette and set as the default palette for the particular image. In the current implementation, other palettes are discarded WITHOUT warning.
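
The palette selection rule can be illustrated as below. The function name select_default_palette is hypothetical, and the references are stood in for by plain Python values. Returning the discarded references is a choice made for this sketch only, so that a caller could report them; the current implementation discards them silently.

```python
def select_default_palette(palette_refs):
    """Split the PALETTE reference array into (default, discarded).

    Only the first reference is converted to an HDF4 palette; the
    remaining ones have no HDF4 counterpart and are dropped.
    """
    refs = list(palette_refs)
    if not refs:
        # Zero palettes: the image has no default palette.
        return None, []
    return refs[0], refs[1:]
```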

The HDF5 Image dataset may have an optional attribute "INTERLACE_MODE" that specifies the layout of the data for images with more than one component for each pixel. The value "INTERLACE_PIXEL" maps to "MFGR_INTERLACE_PIXEL" in HDF4. The value "INTERLACE_PLANE" maps to "MFGR_INTERLACE_COMPONENT" in HDF4. If the attribute is not defined, the default value is assumed to be "INTERLACE_PIXEL".
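
As a minimal sketch, the interlace mapping is a two-entry lookup with a default. The names here are hypothetical and the HDF4 constants are represented as strings; raising KeyError for any other value is an assumption of this sketch, since the text does not say how unknown values are handled.

```python
# HDF5 INTERLACE_MODE value -> name of the HDF4 GR interlace constant.
H5_TO_H4_INTERLACE = {
    "INTERLACE_PIXEL": "MFGR_INTERLACE_PIXEL",
    "INTERLACE_PLANE": "MFGR_INTERLACE_COMPONENT",
}


def h4_interlace(interlace_mode=None):
    """Map an INTERLACE_MODE value (None if the attribute is absent)."""
    # A missing attribute is treated as INTERLACE_PIXEL.
    mode = "INTERLACE_PIXEL" if interlace_mode is None else interlace_mode
    # Assumption: an unrecognized value is an error (KeyError).
    return H5_TO_H4_INTERLACE[mode]
```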

Apart from these attributes ("CLASS", "PALETTE", "INTERLACE_MODE"), all other standard attributes defined by the Image Specification are written to the HDF4 GR Image with the attribute names prefixed with "HDF5_".

If the HDF5 Image dataset was created from an HDF4 GR Image (using h4toh5), the dataset may contain attributes with names prefixed by "HDF4_IMAGE_ATTR_". These attributes are written to the HDF4 GR Image by stripping off the prefix. (NOT IMPLEMENTED YET)

The HDF5 Image dataset may have additional non-standard attributes (not defined in the Image Specification) to describe the image. These attributes are written to the HDF4 GR Image without any change.
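
The three naming rules for Image dataset attributes can be combined in one sketch. The function name gr_image_attr_name is hypothetical, the STANDARD set is only an illustrative subset that should be checked against the Image Specification, and returning None for the three specially handled attributes is an assumption. The prefix stripping is not implemented in the tool yet.

```python
# Attributes interpreted by the conversion itself.
CONSUMED = {"CLASS", "PALETTE", "INTERLACE_MODE"}
# Illustrative subset of the other standard Image attributes (assumption).
STANDARD = {"IMAGE_VERSION", "IMAGE_SUBCLASS", "IMAGE_MINMAXRANGE"}
H4_PREFIX = "HDF4_IMAGE_ATTR_"


def gr_image_attr_name(h5_name):
    """Return the HDF4 GR Image attribute name, or None if not copied."""
    if h5_name in CONSUMED:
        # Assumption: these are not written as ordinary attributes.
        return None
    if h5_name in STANDARD:
        return "HDF5_" + h5_name
    if h5_name.startswith(H4_PREFIX):
        # Attribute that originated in HDF4: restore the original name.
        return h5_name[len(H4_PREFIX):]
    # Non-standard attributes are written without any change.
    return h5_name
```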

HDF5 Palette Dataset to HDF4 Palette

Palette data is stored as an HDF5 dataset with values of any HDF5 atomic numeric type.  The dataset will have dimensions (nentries by ncomponents), where 'nentries' is the number of colors (usually 256) and 'ncomponents' is the number of values per color (3 for RGB, 4 for CMYK, etc.).

The HDF5 Palette Specification (http://hdf.ncsa.uiuc.edu/HDF5/doc/ADGuide/ImageSpec.html: Section 2) defines certain standard attributes for a Palette dataset.

The HDF5 dataset for a palette is distinguished from other datasets by giving it an attribute "CLASS=PALETTE". As mentioned under "HDF5 Image Dataset to HDF4 GR Image" above, an HDF5 Image Dataset may have an attribute "PALETTE" that is an array of object references for zero or more palette datasets, each of which has an attribute "CLASS=PALETTE". The first object reference in the array corresponds to the default palette and is converted into an HDF4 palette and attached to the GR Image. All other palette datasets in the array are discarded WITHOUT warning in this implementation.

Apart from this attribute (CLASS), all other standard attributes defined by the Palette Specification are written to the HDF4 GR Image as palette attributes (Tag value: "DFTAG_LUT") with the attribute names prefixed with "HDF5_". (NOT IMPLEMENTED YET)

If the HDF5 Image dataset to which the HDF5 Palette dataset corresponds was created from an HDF4 GR Image (using h4toh5), the dataset may contain attributes with names prefixed by "HDF4_PALETTE_ATTR_". These attributes are written to the HDF4 GR Image as palette attributes (Tag value: "DFTAG_LUT") after stripping off the prefix. (NOT IMPLEMENTED YET)

The HDF5 Palette dataset may have additional non-standard attributes (not defined in the Palette Specification and not prefixed by "HDF4_PALETTE_ATTR_") to describe the palette. These attributes are written to the HDF4 GR Image as palette attributes (Tag value: "DFTAG_LUT") without any change. (NOT IMPLEMENTED YET)

HDF5 Table Dataset to HDF4 vdata

Table data is stored as an HDF5 one dimensional compound dataset. The HDF5 Table specification  (http://hdf.ncsa.uiuc.edu/HDF5/hdf5_hl/doc/RM_hdf5tb_spec.html) defines certain standard attributes for a table dataset.

The HDF5 dataset for a table is distinguished from other datasets by giving it an attribute "CLASS=TABLE". An HDF5 dataset with this attribute name-value pair is converted into an HDF4 vdata.

Apart from this attribute (CLASS), all other standard attributes defined by the Table Specification are written to the HDF4 vdata with the attribute names prefixed with "HDF5_".

The HDF5 table dataset may have additional non-standard attributes (not defined in the Table Specification) to describe the table. These attributes are written to the HDF4 vdata without any change.

Other HDF5 Dataset mappings

For any other dataset (not covered in sections above), the conversion is the same as in the current tool.

* If the datatype is an HDF4-compatible scalar datatype, it is converted to an HDF4 SDS.

If the HDF5 dataset was converted from an HDF4 dataset (using h4toh5), then it may contain attributes with names prefixed by "HDF_SDS_ATTR_". These attributes are written to the HDF4 SDS after stripping off the prefix. (NOT IMPLEMENTED YET)

All other attributes are written without any change.

* If the datatype is one dimensional compound, it is converted to an HDF4 vdata.
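
Taken together, the dataset mappings in this section amount to a dispatch on the CLASS attribute followed by a fallback on the datatype. The sketch below only illustrates that decision order; the function name h4_target and its boolean parameters are hypothetical, and returning None for a dataset with no HDF4 counterpart is an assumption.

```python
def h4_target(class_attr, is_h4_scalar, is_1d_compound):
    """Name the HDF4 object an HDF5 dataset maps to, or None.

    class_attr     -- value of the CLASS attribute, or None if absent
    is_h4_scalar   -- True for an HDF4-compatible scalar datatype
    is_1d_compound -- True for a one dimensional compound dataset
    """
    if class_attr == "IMAGE":
        return "GR Image"
    if class_attr == "PALETTE":
        return "palette"
    if class_attr == "TABLE":
        return "vdata"
    # No identifying metadata: fall back on the datatype.
    if is_h4_scalar:
        return "SDS"
    if is_1d_compound:
        return "vdata"
    # Assumption: anything else has no HDF4 counterpart.
    return None
```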

"Reversibility" Features, Not Implemented Yet

The h4toh5 utility or library creates many attributes with special names that identify the original HDF4 object. Ideally, the h5toh4 tool should "reverse" the conversion and convert these into an HDF4 annotation or attribute resembling the original. Table 1 lists cases that have been identified but not implemented yet.

* An HDF5 group may have attributes with names prefixed by "HDF4_VGROUP_ATTR_". These attributes should be written to the HDF4 Vgroup after stripping off the prefix. (NOT IMPLEMENTED YET)

* An image dataset may contain attributes with names prefixed by "HDF4_IMAGE_ATTR_". These attributes should be written to the HDF4 GR Image after stripping off the prefix. (NOT IMPLEMENTED YET)

* All standard attributes defined by the Palette Specification, apart from CLASS, should be written to the HDF4 GR Image as palette attributes (Tag value: "DFTAG_LUT") with the attribute names prefixed with "HDF5_". (NOT IMPLEMENTED YET)

* If the HDF5 Image dataset to which the HDF5 Palette dataset corresponds was created from an HDF4 GR Image (using h4toh5), the dataset may contain attributes with names prefixed by "HDF4_PALETTE_ATTR_". These attributes should be written to the HDF4 GR Image as palette attributes (Tag value: "DFTAG_LUT") after stripping off the prefix. (NOT IMPLEMENTED YET)

* The HDF5 Palette dataset may have additional non-standard attributes (not defined in the Palette Specification and not prefixed by "HDF4_PALETTE_ATTR_") to describe the palette. These attributes should be written to the HDF4 GR Image as palette attributes (Tag value: "DFTAG_LUT") without any change. (NOT IMPLEMENTED YET)

* If the HDF5 dataset was converted from an HDF4 dataset (using h4toh5), then it may contain attributes with names prefixed by "HDF_SDS_ATTR_". These attributes should be written to the HDF4 SDS after stripping off the prefix. (NOT IMPLEMENTED YET)