Robert E. McGrath
September 9, 2000
The following features are required:
Explanation: In an example XML file, the default preamble is

<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE HDF5-File PUBLIC "HDF5-File.dtd"
"http://hdf.ncsa.uiuc.edu/HDF5/XML/DTD/HDF5-File.dtd">
<HDF5-File>
<RootGroup OBJ-XID="root">
....

This instructs an interpreter to look for the DTD to interpret the XML at 'http://hdf.ncsa.uiuc.edu...'. If the file is to be used off the network, or behind a firewall, or with some custom version of the DTD, then the third line would be changed to point to a different file or URL. E.g., to use my own copy of the DTD:

<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE HDF5-File PUBLIC "HDF5-File.dtd"
"/tmp/mcgrath/my-HDF5-File.dtd">
<HDF5-File>
<RootGroup OBJ-XID="root">
....

This could be done by editing the output file, or by using:

h5dump -xml -dtd /tmp/mcgrath/my-HDF5-File.dtd file.h5
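The effect of the -dtd option on the preamble can be sketched as follows. This is an illustrative Python sketch, not the h5dump C code; the names xml_preamble and DEFAULT_DTD are hypothetical, and only the preamble text itself is taken from the examples above.

```python
# Hypothetical names; only the preamble text comes from the examples above.
DEFAULT_DTD = "http://hdf.ncsa.uiuc.edu/HDF5/XML/DTD/HDF5-File.dtd"


def xml_preamble(dtd_location=DEFAULT_DTD):
    """Return the XML declaration and DOCTYPE lines naming the given DTD."""
    return (
        '<?xml version="1.0" encoding="UTF-8"?>\n'
        '<!DOCTYPE HDF5-File PUBLIC "HDF5-File.dtd"\n'
        f'"{dtd_location}">\n'
    )


if __name__ == "__main__":
    # Corresponds to: h5dump -xml -dtd /tmp/mcgrath/my-HDF5-File.dtd file.h5
    print(xml_preamble("/tmp/mcgrath/my-HDF5-File.dtd"), end="")
```

With no argument the default DTD URL is written; with an argument only the third line of the preamble changes.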
1. Add the options as described above, and global variables
to store their values. Logic will also be needed to disallow
options that are not supported when XML is selected.
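The option handling in step 1 might look roughly like the sketch below. It is written in Python for brevity, whereas h5dump itself is C and would keep these values in global variables. The function name parse_options, the returned dictionary, and the placeholder entry in UNSUPPORTED_WITH_XML are all hypothetical; this note does not list which options are unsupported with XML, and rejecting -dtd without -xml is an assumption.

```python
# Placeholder only: the real set of options that cannot be combined with
# -xml is not specified in this note.
UNSUPPORTED_WITH_XML = {"-hypothetical-option"}


def parse_options(argv):
    """Return (options, remaining_args) for a list of command-line words.

    options holds the values the dumper would keep in global variables.
    Raises ValueError for combinations this sketch treats as invalid.
    """
    options = {"xml_output": False, "dtd_location": None}
    rest = []
    args = iter(argv)
    for arg in args:
        if arg == "-xml":
            options["xml_output"] = True
        elif arg == "-dtd":
            value = next(args, None)
            if value is None:
                raise ValueError("-dtd requires a file or URL")
            options["dtd_location"] = value
        else:
            rest.append(arg)

    if options["xml_output"]:
        bad = [a for a in rest if a in UNSUPPORTED_WITH_XML]
        if bad:
            raise ValueError("not supported with -xml: " + ", ".join(bad))
    elif options["dtd_location"] is not None:
        # Assumption: -dtd has no meaning unless XML output is selected.
        raise ValueError("-dtd is only meaningful with -xml")
    return options, rest
```

Checking the combination once, after all options are read, keeps the result independent of the order in which the options appear on the command line.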
2. The dump_header format table must be changed.
Note that the XML code will not use the 'header' strings, but will use
other strings from that table.
3. Implement alternative versions of object dumps. The XML output is not only syntactically different; the order of some elements also differs. The cleanest implementation will be to provide alternative versions of:
4. XML needs to output the target of references, not the value of the reference. This is necessary because XML conforming to the DTD must be usable to create a new HDF5 file. (The dumper prints the reference value, which cannot be used to create a new copy of the file.)
The proposed output for reference data is a path that can be used to create a reference to the correct object. Region references would be a path plus additional markup (TBD) describing the region.
Implementing this feature requires adding code to the dumper.
First, there must be some mechanism for looking up at least one path, given an object reference.
There are two suggested implementations for this.
   a. New table of (reference, targetpath)
   b. Add to existing object table.
The first option is recommended.
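A minimal sketch of the recommended option, a separate table of (reference, targetpath), is given below in Python for illustration; the dumper itself is C. The function name build_ref_table and the toy node layout (dictionaries with "addr" and "members" keys) are hypothetical stand-ins for walking the real file hierarchy. Only one path is recorded per object, since at least one path is all that is required.

```python
def build_ref_table(root, root_path="/"):
    """Walk a toy object tree and return {address: first path found}.

    Each node is a dict with an "addr" key and, for groups, a "members"
    dict mapping link names to child nodes.
    """
    table = {}
    stack = [(root_path, root)]
    while stack:
        path, node = stack.pop()
        if node["addr"] in table:
            # Already reached through another link (or a cycle): keep
            # the path that was recorded first.
            continue
        table[node["addr"]] = path
        members = node.get("members", {})
        # Push in reverse name order so members are visited in name order.
        for name in sorted(members, reverse=True):
            child_path = path.rstrip("/") + "/" + name
            stack.append((child_path, members[name]))
    return table
```

Because visited addresses are skipped, objects with several hard links and groups that contain themselves are each entered into the table once.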
The second change will be to not call the tools library to dump references.
Instead, a new routine will be called to read the object reference, look
up the path, and write the path to the XML file as the value
of the <DataFromFile> element.
Example:
The dumper would show the value of an object reference thus:
DATASET "Dataset3" {
DATATYPE { H5T_REFERENCE }
DATASPACE { SIMPLE ( 4 ) / ( 4 ) }
DATA {
DATASET 0:1696, DATASET 0:2152,
GROUP 0:1320, DATATYPE 0:2268
}
}
The XML for the data part should be something like:
<Dataset Name="Dataset3" OBJ-XID="Dataset3" Parents="">
<Dataspace>
<SimpleDataspace Ndims="1">
<Dimension DimSize="4" MaxDimSize="4"/>
</SimpleDataspace>
</Dataspace>
<DataType>
<AtomicType>
<ReferenceType>
<ObjectReferenceType />
</ReferenceType>
</AtomicType>
</DataType>
<Data>
<DataFromFile>
"/Group1/Dataset1"
"/Group1/Dataset2"
"/Group1"
"/Group1/Datatype1"
</DataFromFile>
</Data>
</Dataset>
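The new routine that writes reference data could be sketched as follows, again in Python for illustration only. The name data_from_file_xml is hypothetical, and the table argument is assumed to be a mapping from reference value to path as proposed above. The pairing of addresses and paths in the usage example follows the order of the two listings in this note, which is itself an assumption. Escaping of XML special characters in paths is an addition not discussed in this note.

```python
from xml.sax.saxutils import escape


def data_from_file_xml(refs, ref_table):
    """Return a <DataFromFile> element listing the target path of each reference.

    refs is a sequence of reference values; ref_table maps each value to
    a path. A reference missing from the table raises KeyError.
    """
    lines = ["<DataFromFile>"]
    for ref in refs:
        # Escape &, < and > so unusual object names keep the XML well-formed.
        lines.append('"%s"' % escape(ref_table[ref]))
    lines.append("</DataFromFile>")
    return "\n".join(lines)


if __name__ == "__main__":
    table = {
        "0:1696": "/Group1/Dataset1",
        "0:2152": "/Group1/Dataset2",
        "0:1320": "/Group1",
        "0:2268": "/Group1/Datatype1",
    }
    print(data_from_file_xml(["0:1696", "0:2152", "0:1320", "0:2268"], table))
```

How a dangling reference should be reported is left open here; the sketch simply lets the lookup fail.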
5. Changes to the DTD
The DTD will need to be updated to support the following:
6. Other changes yet to be determined
Several questions remain open at this time and need to be investigated:
The overall changes are feasible, requiring several hundred lines of additional code and modification to about 50 lines of existing code.
An initial version, supporting the most important data types, can be done in a month of part-time work.
Some of this work is uncovering bugs in the XML DTD and h5gen tool, which makes debugging the h5dump code more complicated.