This document is intended to serve as a review before the code is committed to CVS.
1. Changes to the main Program
1.1. New command line options
1.2. Initialize reference path table (XML only)
1.3. Alternative head and tail
1.4. dump_bb
1.5. dump_all == 0
1.6. dump_all == 1
2. Dump functions
2.1. Implementation of the main dump functions
2.1.1. xml_dump_group
2.1.2. xml_dump_dataset
2.1.3. xml_dump_data
2.1.4. xml_dump_attr
2.1.5. xml_dump_datatype and xml_dump_named_datatype
2.1.6. xml_dump_dataspace
2.2. Replacement output functions
2.2.1. xml_print_refs
2.2.2. xml_print_strs
2.2.3. xml_print_enum
2.2.4. xml_print_datatype
3. Handling of object references
4. Miscellaneous issues
4.1. Options not supported for XML
4.2. Options not supported for standard output
4.3. The dataformat and dump_header tables
4.4. Forward References
5. Changes to the DTD
6. Metrics and Code
1. Changes to the main Program
If '-xml' is selected, the dump_function_table
is set to the xml_function_table; otherwise, the standard ddl_function_table
is used. See below, section 4.
Usage: h5dump -xml file.h5 or h5dump -xml -dtd alternative.dtd file.h5
2. Dump functions
The functions are:
| Standard DDL Output | XML Output | Calling: dump_function_table-> |
| dump_group | xml_dump_group | dump_group_function() |
| dump_named_datatype | xml_dump_named_datatype | dump_named_datatype_function() |
| dump_dataset | xml_dump_dataset | dump_dataset_function() |
| dump_dataspace | xml_dump_dataspace | dump_dataspace_function() |
| dump_datatype | xml_dump_datatype | dump_datatype_function() |
| dump_attr | xml_dump_attr | dump_attr_function() |
| dump_data | xml_dump_data | dump_data_function() |
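The table-driven dispatch described above can be sketched as follows. This is a minimal illustration, not the h5dump source: the struct layout, the simplified string-returning signatures, the stub bodies, and `select_dump_functions` are assumptions made so the example stands alone, and only two of the seven entries are shown.

```c
#include <string.h>

/* Hypothetical, simplified signature; the real functions take HDF5 ids. */
typedef const char *(*dump_fn)(const char *name);

typedef struct {
    dump_fn dump_group_function;
    dump_fn dump_dataset_function;
} dump_functions;

/* Stub implementations: each returns a label naming the output style. */
static const char *dump_group(const char *name)       { (void)name; return "ddl group"; }
static const char *dump_dataset(const char *name)     { (void)name; return "ddl dataset"; }
static const char *xml_dump_group(const char *name)   { (void)name; return "xml group"; }
static const char *xml_dump_dataset(const char *name) { (void)name; return "xml dataset"; }

static const dump_functions ddl_function_table = { dump_group, dump_dataset };
static const dump_functions xml_function_table = { xml_dump_group, xml_dump_dataset };

/* Pick the table once, based on whether "-xml" appears on the command line. */
static const dump_functions *select_dump_functions(int argc, const char *const argv[])
{
    for (int i = 1; i < argc; i++) {
        if (strcmp(argv[i], "-xml") == 0)
            return &xml_function_table;
    }
    return &ddl_function_table;
}
```

Callers then go through the selected table (for example `table->dump_group_function(name)`) and never test the output mode again.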
2.1. Implementation of the main dump functions
The main dump functions (e.g., xml_dump_group) have identical
interfaces and similar semantics to the standard functions. However,
the order of output and other details are different.
2.1.1. xml_dump_group
To reduce the number of forward references (see section 4.4), xml_dump_group sorts the members of the group by type, outputting potential targets of references first. xml_dump_group does the following:
original:
dump all the objects in library order
revised:
dump all the H5_TYPE
then
dump all the H5_DATASET
then
dump all the H5_LINK
then
dump all the H5_GROUP
Note that this applies only to the order of the objects in the XML output. Nothing is changed for the standard output, and the information is otherwise the same.
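The revised ordering can be sketched as one pass over the members per object kind. This is an illustration under assumptions: the `obj_kind` enum reuses the names from the text as local stand-ins, and `xml_member_order` is a hypothetical helper, not a function in h5dump.

```c
#include <stddef.h>

/* Hypothetical stand-ins for the object kinds named in the text. */
typedef enum { H5_GROUP, H5_DATASET, H5_TYPE, H5_LINK } obj_kind;

/*
 * Fill `order` with member indices in XML output order: all named
 * datatypes, then datasets, then links, then groups. Within one kind,
 * members keep their library order. Returns the number of indices written.
 */
static size_t xml_member_order(const obj_kind *kinds, size_t n, size_t *order)
{
    static const obj_kind passes[] = { H5_TYPE, H5_DATASET, H5_LINK, H5_GROUP };
    size_t out = 0;

    for (size_t p = 0; p < sizeof passes / sizeof passes[0]; p++) {
        for (size_t i = 0; i < n; i++) {
            if (kinds[i] == passes[p])
                order[out++] = i;
        }
    }
    return out;
}
```

Making several passes rather than sorting keeps the library order within each kind, so the XML output differs from the standard output only in how the kinds are grouped.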
2.1.2. xml_dump_dataset
Similar to dump_dataset, but doesn't have chunking, attributes, etc.
2.1.5. xml_dump_datatype and xml_dump_named_datatype
Similar to standard dump_datatype, calls xml_print_datatype. (See below, section 2.2.4.)
This routine is straightforward; it just outputs XML format.
2.2. Replacement output functions
2.2.1. xml_print_refs
Print out object references as a full path (with some characters escaped). In pseudocode:
for each obj_ref do
    char *apath = xml_lookup_ref_path( obj_ref );
    printf("\"%s\"\n", xml_escape_the_string( apath ));
done
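The text does not say which characters are escaped. The sketch below assumes the five characters with predefined XML entities (`&`, `<`, `>`, `"`, `'`); the function name `xml_escape` and its buffer-based interface are also assumptions, chosen to keep the example free of dynamic allocation.

```c
#include <stddef.h>
#include <string.h>

/*
 * Copy `src` into `dst`, replacing the five XML special characters with
 * their predefined entities. Returns 0 on success, -1 if `dst` is too small.
 * The set of escaped characters is an assumption, not taken from h5dump.
 */
static int xml_escape(const char *src, char *dst, size_t dst_size)
{
    size_t out = 0;

    if (dst_size == 0)
        return -1;

    for (; *src != '\0'; src++) {
        char one[2] = { *src, '\0' };  /* default: copy the character as-is */
        const char *rep;
        size_t len;

        switch (*src) {
        case '&':  rep = "&amp;";  break;
        case '<':  rep = "&lt;";   break;
        case '>':  rep = "&gt;";   break;
        case '"':  rep = "&quot;"; break;
        case '\'': rep = "&apos;"; break;
        default:   rep = one;      break;
        }
        len = strlen(rep);
        if (out + len + 1 > dst_size)  /* keep room for the terminator */
            return -1;
        memcpy(dst + out, rep, len);
        out += len;
    }
    dst[out] = '\0';
    return 0;
}
```

A path containing none of these characters passes through unchanged, so ordinary object paths print exactly as stored.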
2.2.2. xml_print_strs
Print out strings with some characters escaped and with NULL
padding suppressed. In pseudocode:
for each str do
    char *s = str with trailing NULL padding removed;
    printf("\"%s\"\n", xml_escape_the_string( s ));
done
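Suppressing the padding amounts to finding the length of a fixed-size string buffer without its trailing NUL bytes. A minimal sketch, assuming the string arrives as a buffer plus its declared size; the function name is hypothetical.

```c
#include <stddef.h>

/*
 * Return the length of the fixed-size string in `buf` once trailing NUL
 * padding is ignored. NUL bytes in the middle of the data are kept.
 */
static size_t strip_null_padding(const char *buf, size_t size)
{
    while (size > 0 && buf[size - 1] == '\0')
        size--;
    return size;
}
```

The result can be passed to `printf` with a `%.*s` conversion so that only the meaningful characters are written.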
2.2.3. xml_print_enum
This is similar to the standard print_enum, except that it emits the correct XML elements.
2.2.4. xml_print_datatype
This is similar to the standard print_datatype, except that it emits the correct XML.
3. Handling of object references
The standard h5dump prints out object references as numbers.
For XML, we need to be able to reconstruct the reference, so we need something
to indicate what the reference refers to. The current design prints
out an absolute path for the object that is the target of the object reference.
E.g., a data value that is a reference to the dataset 'palette-1'
in '/PALGROUP' will be output in the XML as:
"/PALGROUP/palette-1"
If the object has more than one path to it, one of the paths is used.
To implement this feature, it is necessary to be able to find a path for an object from its reference. This is done with a table of (object_reference, "a full path") records. The table is constructed by walking the tree to visit every object and creating an entry for anything that could be the target of an object reference. This is done once during initialization (but only if XML output is requested). Duplicate paths are not entered in the table; each object has only one entry.
When processing the output, each object reference is looked up, and the path is printed in the XML output. References to nonexistent objects are a fatal error.
This implementation does not support region references, because the XML DTD does not specify them yet.
The reference lookup table is managed by three functions. This
table and the functions are used only by the XML output functions. The
only change to the standard code is a call in the main program to
initialize the table, which is made only if XML output is selected.
| Code | Description |
| struct ref_path_entry_table_t { hsize_t obj; hobj_ref_t *obj_ref; char *apath; struct ref_path_entry_table_t *next; }; struct ref_path_entry_table_t *ref_path_table; | The table. |
| static herr_t fill_ref_path_table(hid_t group, const char *name, void UNUSED *op_data) | The iterator function, used to initialize the table. Called once at startup. |
| char *lookup_ref_path(hobj_ref_t *ref) | Look up a path for a given object reference. |
| hobj_ref_t *ref_path_table_put(hid_t obj, char *path) | Insert a record. Used only when building the table. |
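The put and lookup operations can be sketched as a singly linked list. This is an illustration only: `obj_ref` is an integer stand-in for an HDF5 object reference, and `put_path` and `lookup_path` are hypothetical names with simplified signatures, not the functions listed in the table.

```c
#include <stdlib.h>
#include <string.h>

/* Stand-in for an HDF5 object reference; an assumption for this sketch. */
typedef unsigned long long obj_ref;

typedef struct ref_path_entry {
    obj_ref ref;
    char *apath;
    struct ref_path_entry *next;
} ref_path_entry;

static ref_path_entry *ref_path_table = NULL;

/* Return the stored path for `ref`, or NULL if it is not in the table. */
static const char *lookup_path(obj_ref ref)
{
    for (const ref_path_entry *e = ref_path_table; e != NULL; e = e->next) {
        if (e->ref == ref)
            return e->apath;
    }
    return NULL;
}

/*
 * Record `path` for `ref` unless the object already has an entry, so each
 * object keeps only the first path found. Returns 0 on success, -1 on
 * allocation failure.
 */
static int put_path(obj_ref ref, const char *path)
{
    ref_path_entry *e;
    size_t len;

    if (lookup_path(ref) != NULL)
        return 0;

    e = malloc(sizeof *e);
    if (e == NULL)
        return -1;
    len = strlen(path) + 1;
    e->apath = malloc(len);
    if (e->apath == NULL) {
        free(e);
        return -1;
    }
    memcpy(e->apath, path, len);
    e->ref = ref;
    e->next = ref_path_table;
    ref_path_table = e;
    return 0;
}
```

A linear scan is the simplest structure that satisfies the description; lookup cost grows with the number of objects in the file, which may matter for very large files.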
4. Miscellaneous issues
4.1. Options not supported for XML
| Option | Note |
| -header | This could be supported, but is not implemented at this time. |
| -bb | Not implemented for either standard or XML. |
| -v | The DTD does not define how to report OIDs. Also, the meaning of an OID in an XML description is not clear. |
| -o | Output to another file is not implemented. |
| -a, -g, -t, -d, -l | The XML DTD defines a description of the whole file. It is not clear how to report selected objects. |
4.2. Options not supported for standard output
Options specific to XML do not apply when XML is not selected.
| Option | Note |
| -dtd <URI> | The DTD is irrelevant to standard output, so a warning is issued. |
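The check behind that warning can be sketched as a scan of the argument list. The function name and the exact matching rules are assumptions; the real option parser in h5dump may be organized differently.

```c
#include <string.h>

/*
 * Return 1 if "-dtd" was given without "-xml", i.e. the case in which a
 * warning is warranted; return 0 otherwise.
 */
static int dtd_without_xml(int argc, const char *const argv[])
{
    int have_xml = 0;
    int have_dtd = 0;

    for (int i = 1; i < argc; i++) {
        if (strcmp(argv[i], "-xml") == 0)
            have_xml = 1;
        else if (strcmp(argv[i], "-dtd") == 0)
            have_dtd = 1;
    }
    return have_dtd && !have_xml;
}
```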
4.3. The dataformat and dump_header tables
The formats in the xml_dataformat table are mostly set to null or blank, which controls the output from the h5tools routines. For example, data separators are set to " " for XML. This table controls the appearance of the data, so its settings are critical.
The headers in the xml_format table are largely unused. Most XML elements have one or more attributes, which does not fit the way this table is used. This table could be eliminated and replaced with hand-coded strings.
5. Changes to the DTD
The changes are:
The DTD for HDF 1.2.2, HDF 1.4, and the diff of these two are available.
6. Metrics and Code
The XML support adds 18 functions and more than doubles the number
of lines of code in the dumper.
| Version | Lines (from 'wc') | Functions |
| h5dump.c (1.80) | 1860 | |
| h5dump with XML | 4308 | +18 |
The revised code is here: h5dump.c
The diffs are here: diffs