REMcG
Several global issues were raised in passing and should be noted.
What to do about data, especially binary data?
In general, XML does not have numeric types and does not understand the semantics of numbers.
XML can:
For handling the data in HDF-5 files, there seem to be two basic
strategies:
<DATA_FROM_FILE>
<POINTER_TO_DATA>
URL, path etc.
</POINTER_TO_DATA>
<DATA>
... character/Unicode encoding of the
data....
</DATA>
</DATA_FROM_FILE>
Investigate and propose details of both pointer and character representation of data.
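As a starting point for that investigation, the two strategies might be sketched as follows. This is a minimal illustration, not a proposal: it assumes that '<POINTER_TO_DATA>' and '<DATA>' are alternatives inside '<DATA_FROM_FILE>', and it uses Base64 for the character encoding only because some encoding is needed for the example. The function names and the file location are hypothetical.

```python
import base64
import xml.etree.ElementTree as ET


def data_by_pointer(location):
    """Build a DATA_FROM_FILE element that only points at the data."""
    wrapper = ET.Element("DATA_FROM_FILE")
    pointer = ET.SubElement(wrapper, "POINTER_TO_DATA")
    pointer.text = location  # URL, path, etc.
    return wrapper


def data_inline(raw_bytes):
    """Build a DATA_FROM_FILE element that carries the data as characters.

    Base64 is an assumption made for this sketch; the notes do not
    choose an encoding for binary data.
    """
    wrapper = ET.Element("DATA_FROM_FILE")
    data = ET.SubElement(wrapper, "DATA")
    data.text = base64.b64encode(raw_bytes).decode("ascii")
    return wrapper


# Hypothetical location and payload, for illustration only.
pointer_xml = ET.tostring(data_by_pointer("file:///tmp/example.h5"), encoding="unicode")
inline_xml = ET.tostring(data_inline(b"\x00\x01\x02\x03"), encoding="unicode")
print(pointer_xml)
print(inline_xml)
```

The pointer form keeps the XML small but depends on the referenced file staying available; the inline form is self-contained but grows with the data.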
RE pointer: Need to investigate the XML standards for XPointer and XLink, and "do the right thing".
RE character representation: Data in an HDF-5 file is often structured, i.e., it can be an array of structures. This raises the question of whether we want to "mark up" the data elements themselves, e.g., marking the rows, columns, fields, etc. of the data. This could be done by defining additional tags to be used within the '<DATA>' element.
An alternative is to have a standard for "flattening" data into a one-dimensional array of Unicode characters.
And, of course, we will probably follow a mixed approach. Strings and scalars can be represented in a straightforward way as Unicode strings with standard formats. Other data elements might be represented as several sub-elements, with further structure flattened. For example, a 2D array of compound data types might be represented as several '<ROW>' elements containing '<CELL>' elements, but each cell might be stored as a flat array of bytes with no further markup.
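The 2D example above could look roughly like this. It is a sketch under stated assumptions: the compound type (one 32-bit integer and one 64-bit float, little-endian), the hex encoding of the cell bytes, and the function names are all invented for illustration and are not part of any agreed format.

```python
import struct
import xml.etree.ElementTree as ET

# Hypothetical compound type: a 32-bit int and a 64-bit float, little-endian,
# packed without padding (12 bytes per cell).
CELL_FORMAT = "<id"


def array_to_xml(rows):
    """Mark up rows and cells, but flatten each compound cell to hex bytes."""
    data = ET.Element("DATA")
    for row in rows:
        row_el = ET.SubElement(data, "ROW")
        for count, value in row:
            cell_el = ET.SubElement(row_el, "CELL")
            # No markup inside the cell: the fields are packed into raw bytes.
            cell_el.text = struct.pack(CELL_FORMAT, count, value).hex()
    return data


def xml_to_array(data):
    """Invert array_to_xml, recovering a list of rows of (int, float) tuples."""
    return [
        [struct.unpack(CELL_FORMAT, bytes.fromhex(cell.text))
         for cell in row.findall("CELL")]
        for row in data.findall("ROW")
    ]


table = [[(1, 0.5), (2, 1.5)], [(3, 2.5), (4, 3.5)]]
encoded = array_to_xml(table)
decoded = xml_to_array(encoded)
print(ET.tostring(encoded, encoding="unicode"))
```

A reader of such a file needs the compound type definition from elsewhere in the document to interpret the cell bytes, which is the cost of leaving the cells unmarked.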
We discussed the use of XML attributes and elements. There is some freedom here, and sometimes the decision may be a matter of taste. We can and should choose whichever makes sense in a given case.
Rules and tips for choosing
Case by case, decide about using attributes or elements....
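To make the choice concrete, here is the same piece of information expressed both ways. The element and attribute names ("Dataset", "Name") and the value are hypothetical and chosen only for this sketch.

```python
import xml.etree.ElementTree as ET

# Hypothetical names: "Dataset" and "Name" are illustrative, not from the DTD.

# Option 1: the name is an attribute of the element.
as_attribute = ET.Element("Dataset", {"Name": "temperature"})

# Option 2: the name is a child element with character content.
as_element = ET.Element("Dataset")
ET.SubElement(as_element, "Name").text = "temperature"

print(ET.tostring(as_attribute, encoding="unicode"))
print(ET.tostring(as_element, encoding="unicode"))
```

Attributes are compact but can hold only a single string each; child elements are more verbose but can carry nested structure and can repeat.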
We discussed how best to represent the structure of an HDF-5 file with
XML.
The first approach is 'elegant', and represents the actual way that HDF-5 works. However, it is not the way the documentation describes the file, nor is it how the API or the dumper works. This approach also does not take advantage of the 'treeness' of XML, even when the file really is a tree, and it is more complicated than needed for the common simple cases.
The second approach still has links, but they are needed only for objects with more than one link. In most cases, the object will be nested in a natural way, with the XML matching the HDF-5 (and the DDL).
The general consensus was to do the tree plus links, because this makes the common case easy.
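One possible way to build such a tree is sketched below. It is only an illustration of the idea: the "nest on first visit, reference afterwards" rule is an assumed heuristic, and the element names ('OBJECT', 'LINK'), attribute names, and the sample file layout are all hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical file: object id -> list of (link name, target object id).
# Objects with no entry are leaves (e.g. datasets). "d1" has two links.
LINKS = {
    "root": [("g1", "g1"), ("g2", "g2")],
    "g1": [("data", "d1")],
    "g2": [("alias", "d1")],
}


def build(obj_id, name, seen):
    """Nest an object on first visit; emit a LINK reference afterwards."""
    if obj_id in seen:
        # Already nested elsewhere in the tree: refer to it by id.
        return ET.Element("LINK", {"Name": name, "Target": obj_id})
    # Mark before recursing so that cycles also end in a LINK.
    seen.add(obj_id)
    element = ET.Element("OBJECT", {"Name": name, "Id": obj_id})
    for child_name, child_id in LINKS.get(obj_id, []):
        element.append(build(child_id, child_name, seen))
    return element


tree = build("root", "/", set())
print(ET.tostring(tree, encoding="unicode"))
```

In this sample, "d1" is nested under "g1" and appears under "g2" only as a reference, so a file that really is a tree would produce no 'LINK' elements at all.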
Revise the DTD to do the tree with auxiliary links. Note: we will need to define heuristics beyond the DTD for how to construct the tree.