Adding XML output to the h5dump Utility

Robert E. McGrath
September 9, 2000

Requirements

The goal of this task is to add an option to the h5dump utility to output a description of the HDF5 file formatted in XML. The XML must conform to the HDF5 DTD.

The following features are required:

The following principles guide the implementation:

Proposed User Interface

This feature will add two new flags to the h5dump command.

Technical Approach and Design Notes

Based on a detailed analysis and experiments with the current dumper, it is clear that the XML feature can be added to the existing code without disrupting the standard options. Most of the dumper code will be shared between the different versions of the output, with some additional code to support XML (described below). To date, no changes to libtool or any other code are needed, although at least one libtool call needs to be overridden for XML (see #4 below).

Key Changes

1. Add the options described above, along with global variables to store their values. Logic will also be needed to disallow options that are not supported when XML is selected.
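The option handling in item 1 might look roughly like the sketch below. It is an illustration only: `xml_output_requested` and `find_xml_conflict` are hypothetical names, the `-xml` spelling is taken from its use later in this note, and because the set of unsupported options is not listed here, the caller passes it in rather than the sketch guessing at it.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical global, set during option parsing:
 * nonzero when XML output was requested. */
int xml_output_requested = 0;

/* Scan the command line. Records whether -xml was given and returns the
 * first argument that appears in `unsupported` when XML was requested,
 * or NULL if there is no conflict. The contents of `unsupported` are
 * supplied by the caller; this note does not enumerate them. */
const char *find_xml_conflict(int argc, char *const argv[],
                              const char *const unsupported[],
                              size_t n_unsupported)
{
    int xml_seen = 0;
    const char *conflict = NULL;

    assert(argv != NULL);
    for (int i = 1; i < argc; i++) {
        if (strcmp(argv[i], "-xml") == 0) {
            xml_seen = 1;
            continue;
        }
        /* Remember only the first conflicting argument. */
        for (size_t j = 0; conflict == NULL && j < n_unsupported; j++) {
            if (strcmp(argv[i], unsupported[j]) == 0)
                conflict = argv[i];
        }
    }
    xml_output_requested = xml_seen;
    return xml_seen ? conflict : NULL;
}
```

A caller would report the returned argument in an error message and exit before any dumping starts, so the standard (non-XML) path is unaffected.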

2. The dump_header format table must be changed. Note that the XML code will not use the 'header' strings, but will use other strings from that table.

3. Implement alternative versions of object dumps. The XML output is not only syntactically different; the order of some elements also differs. The cleanest implementation will be to provide alternative versions of:

The proposed implementation is a table of functions for the standard and XML dumps, with the active table selected by the -xml option. Calls to the dump functions will be indirect, via this table. This is analogous to the way the dump_header_format table is implemented.
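A minimal sketch of such a function table follows. All names here (`dump_fn`, `dump_functions`, `select_output_format`, and the individual dump routines) are hypothetical, the routine signature is simplified to a `FILE *` and an object name, and the group markup in particular is placeholder output rather than what the DTD requires.

```c
#include <assert.h>
#include <stdio.h>

/* Simplified, hypothetical signature: the real dump routines would take
 * HDF5 identifiers rather than just a name. */
typedef void (*dump_fn)(FILE *out, const char *name);

/* One entry per kind of object that needs an alternative version. */
typedef struct dump_functions {
    dump_fn dump_group_function;
    dump_fn dump_dataset_function;
} dump_functions;

/* Standard output (placeholder bodies). */
static void dump_group_ddl(FILE *out, const char *name)
{
    fprintf(out, "GROUP \"%s\" {\n}\n", name);
}

static void dump_dataset_ddl(FILE *out, const char *name)
{
    fprintf(out, "DATASET \"%s\" {\n}\n", name);
}

/* XML output (placeholder bodies). */
static void dump_group_xml(FILE *out, const char *name)
{
    fprintf(out, "<Group Name=\"%s\">\n</Group>\n", name);
}

static void dump_dataset_xml(FILE *out, const char *name)
{
    fprintf(out, "<Dataset Name=\"%s\">\n</Dataset>\n", name);
}

static const dump_functions ddl_functions = { dump_group_ddl, dump_dataset_ddl };
static const dump_functions xml_functions = { dump_group_xml, dump_dataset_xml };

/* Selected once after option parsing; defaults to the standard output. */
static const dump_functions *dump_function_table = &ddl_functions;

void select_output_format(int xml_requested)
{
    dump_function_table = xml_requested ? &xml_functions : &ddl_functions;
}

/* Shared code never names a specific version; it calls through the table. */
void dump_dataset(FILE *out, const char *name)
{
    assert(dump_function_table != NULL);
    dump_function_table->dump_dataset_function(out, name);
}
```

With this arrangement the traversal code stays shared between the two output formats, and adding a further format would mean adding another table rather than more branches.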

4. XML needs to output the target of references, not the value of the reference. This is necessary because a description that conforms to the DTD must be usable to create a new HDF5 file. (The dumper prints the reference value, which cannot be used to create a new copy of the file.)

The proposed output for reference data is a path that can be used to create a reference to the correct object. Region references would be a path plus additional markup (TBD) describing the region.

Implementing this feature requires adding code to the dumper.

First, there must be some mechanism for looking up at least one path, given an object reference.

There are two suggested implementations for this.

Option 1: New table of (reference, targetpath)
  Advantages:
    • No change to existing code
    • No cost when not using XML
  Disadvantages:
    • One more table of objects in memory
    • Additional complete pass through objects to create table

Option 2: Add to existing object table
  Advantages:
    • One standard table with all needed information
    • All information collected in one pass
  Disadvantages:
    • Significant change to existing table and code
    • Not needed when not requesting XML

The first option is recommended.
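The recommended option, a separate table of (reference, targetpath) pairs, could be sketched as follows. This is an assumption-laden illustration: `ref_key` is a stand-in integer type for an object reference (the real dumper would use the HDF5 reference type), all function names are hypothetical, and a linear search is used for brevity.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Stand-in for an object reference; an assumption for this sketch. */
typedef unsigned long ref_key;

typedef struct ref_path_entry {
    ref_key key;   /* the reference value */
    char   *path;  /* one path that reaches the referenced object */
} ref_path_entry;

typedef struct ref_path_table {
    ref_path_entry *entries;
    size_t          count;
    size_t          capacity;
} ref_path_table;

/* Record one (reference, path) pair during the extra pass over the file.
 * Returns 0 on success, -1 on allocation failure. */
int ref_path_table_add(ref_path_table *table, ref_key key, const char *path)
{
    assert(table != NULL && path != NULL);
    if (table->count == table->capacity) {
        size_t new_capacity = table->capacity ? table->capacity * 2 : 16;
        ref_path_entry *grown =
            realloc(table->entries, new_capacity * sizeof *grown);
        if (grown == NULL)
            return -1;
        table->entries = grown;
        table->capacity = new_capacity;
    }
    size_t len = strlen(path) + 1;
    char *copy = malloc(len);
    if (copy == NULL)
        return -1;
    memcpy(copy, path, len);
    table->entries[table->count].key = key;
    table->entries[table->count].path = copy;
    table->count++;
    return 0;
}

/* Return the first path recorded for `key`, or NULL if none was recorded. */
const char *ref_path_table_lookup(const ref_path_table *table, ref_key key)
{
    assert(table != NULL);
    for (size_t i = 0; i < table->count; i++) {
        if (table->entries[i].key == key)
            return table->entries[i].path;
    }
    return NULL;
}

/* Release all storage and reset the table to empty. */
void ref_path_table_free(ref_path_table *table)
{
    assert(table != NULL);
    for (size_t i = 0; i < table->count; i++)
        free(table->entries[i].path);
    free(table->entries);
    table->entries = NULL;
    table->count = 0;
    table->capacity = 0;
}
```

Because the table is built only when XML is requested, the standard dump pays nothing for it, which matches the advantage listed for this option. An object reachable by several paths only needs one entry, since any one path is enough to recreate the reference.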

The second change is to stop calling the tools library to dump references. Instead, a new routine will read the object reference, look up the path, and write the path to the XML file as the value of the <DataFromFile> element.

Example:

The dumper would show the value of an object reference thus:

 DATASET "Dataset3" {
    DATATYPE { H5T_REFERENCE }
    DATASPACE { SIMPLE ( 4 ) / ( 4 ) }
    DATA {
       DATASET 0:1696, DATASET 0:2152,
          GROUP 0:1320, DATATYPE 0:2268
    }
 }

The XML for the data part should be something like:

 <Dataset Name="Dataset3" OBJ-XID="Dataset3" Parents="">
   <Dataspace>
     <SimpleDataspace Ndims="1">
       <Dimension DimSize="4" MaxDimSize="4"/>
     </SimpleDataspace>
   </Dataspace>
   <DataType>
     <AtomicType>
       <ReferenceType>
         <ObjectReferenceType />
       </ReferenceType>
     </AtomicType>
   </DataType>
   <Data>
     <DataFromFile>
       "/Group1/Dataset1"
       "/Group1/Dataset2"
       "/Group1"
       "/Group1/Datatype1"
     </DataFromFile>
   </Data>
 </Dataset>
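
The <DataFromFile> portion of the example above could be produced by a routine along the following lines. This is a sketch under stated assumptions: `write_reference_paths` and `write_escaped` are hypothetical names, the paths are assumed to have been looked up already, and the escaping of `&`, `<` and `>` is an addition made here to keep the output well-formed, not something this note specifies.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Write `text`, replacing the characters that are not allowed to appear
 * literally in XML element content. Returns 0 on success, -1 on error. */
static int write_escaped(FILE *out, const char *text)
{
    while (*text != '\0') {
        /* Length of the leading run that needs no escaping. */
        size_t plain = strcspn(text, "&<>");
        if (plain > 0 && fwrite(text, 1, plain, out) != plain)
            return -1;
        text += plain;
        if (*text == '&') {
            if (fputs("&amp;", out) == EOF) return -1;
            text++;
        } else if (*text == '<') {
            if (fputs("&lt;", out) == EOF) return -1;
            text++;
        } else if (*text == '>') {
            if (fputs("&gt;", out) == EOF) return -1;
            text++;
        }
    }
    return 0;
}

/* Write each path, quoted and on its own line, inside a <DataFromFile>
 * element. A NULL entry stands for a reference whose path could not be
 * found and is treated as an error. Returns 0 on success, -1 on error. */
int write_reference_paths(FILE *out, const char *const *paths, size_t count)
{
    assert(out != NULL);
    if (fputs("<DataFromFile>\n", out) == EOF)
        return -1;
    for (size_t i = 0; i < count; i++) {
        if (paths[i] == NULL)
            return -1;
        if (fputs("  \"", out) == EOF ||
            write_escaped(out, paths[i]) != 0 ||
            fputs("\"\n", out) == EOF)
            return -1;
    }
    if (fputs("</DataFromFile>\n", out) == EOF)
        return -1;
    return 0;
}
```

How an unresolved reference should be reported is left open here; the sketch simply fails so that a partial element is never mistaken for complete data.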

5. Changes to the DTD

The DTD will need to be updated to support the following:


6. Other changes yet to be determined

Several questions remain open at this time and need to be investigated:


Summary and Miscellaneous comments

The overall changes are feasible, requiring several hundred lines of additional code and modification to about 50 lines of existing code.

An initial version, supporting the most important data types, can be done in a month of part-time work.

Some of this work is uncovering bugs in the XML DTD and h5gen tool, which makes debugging the h5dump code more complicated.