REMcG
2-25-00
We want the HDF-5 DTD to provide a way to include data from the HDF-5 file in the XML. Scalars and strings are simple to support. We need to decide how to handle structured data, including multidimensional arrays and tables (compound data types). To inform that decision, we are looking at existing proposals that we can borrow from or reuse.
The astronomy community has been developing standards for exchanging information about datasets with XML. This is a broad activity, but one part of it concerns the delivery of data via XML, especially tables. The ADC has a good overview page on this activity (http://pioneer.gsfc.nasa.gov/public/xml/), and our local effort (Astronomy Markup Language) is at http://monet.astro.uiuc.edu/~dguillau/these/.
We have already noted one of these proposals, XSIL. This note calls attention to a second proposal, from Ed Shaya of the NASA Goddard Astronomical Data Center, which he calls XDF (eXtensible Data Format). From what I have read, XDF seems to be a superset of XSIL, in the sense that you can express everything in XSIL as a special case of XDF. I think XDF includes much of what we need for HDF. (XDF seems to have considerable support among data centers and seems to be 'blessed' as a good thing by Damien Guillaume of Project 30.)
My proposal is that we examine XDF as a good example. We may want to:
XDF will look very familiar. The white paper describes a general data model that closely resembles HDF-5 and other contemporary models. I have not made a detailed mapping, but there is clearly substantial overlap between the data objects in HDF-5 and XDF's objects, even without extensions to XDF.
Basically, XDF provides tags for the dimensions of a dataspace and attributes for data types. It can describe tabular records (compound data types). It also has some ideas about including data pointers within this kind of scheme: you can do a hybrid, putting in tags for rows and columns but pointing to the content of a row in an external location.
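To make the hybrid idea concrete, here is a minimal sketch in Python of a description that carries dimension tags and a data type attribute, with one row stored inline and one row pointing to an external location. The element and attribute names (dataset, dataspace, dimension, data, row, href) and the path rows/1.dat are invented for illustration; they are not XDF's actual tags, which are defined in its DTD.

```python
import xml.etree.ElementTree as ET


def describe_dataset(name, dims, dtype, rows):
    """Build an XML description of a small tabular dataset.

    All element and attribute names are hypothetical, not taken from XDF.
    Each entry in `rows` is either a list of values (stored inline) or a
    string naming an external location that holds that row's content.
    """
    ds = ET.Element("dataset", {"name": name, "datatype": dtype})

    # One <dimension> tag per axis of the dataspace.
    space = ET.SubElement(ds, "dataspace")
    for size in dims:
        ET.SubElement(space, "dimension", {"size": str(size)})

    data = ET.SubElement(ds, "data")
    for row in rows:
        elem = ET.SubElement(data, "row")
        if isinstance(row, str):
            # Hybrid case: the tag is present, the values live elsewhere.
            elem.set("href", row)
        else:
            elem.text = " ".join(str(v) for v in row)
    return ds


# Row 0 is inline; row 1 is a pointer to a made-up external file.
doc = describe_dataset("temps", (2, 3), "int32", [[1, 2, 3], "rows/1.dat"])
print(ET.tostring(doc, encoding="unicode"))
```

The point of the sketch is only that the structure (dimensions, types, row tags) can stay in the XML while the bulk values stay outside it; whether a pointer belongs on each row or on the whole data section is a design choice we would still have to make.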
XDF is fairly well explained. The main Web page is at:
http://tarantella.gsfc.nasa.gov/xml/
Here are some key documents. In each case, I give the official URL and have made a local copy as well.
The best place to start is the "White Paper":
[http://tarantella.gsfc.nasa.gov/xml/XDFwhite.txt]
[local copy]
The DTD itself is at:
[http://tarantella.gsfc.nasa.gov/xml/XDF_DTD.txt]
[local copy]
(This DTD is, in my opinion, a classic example of how "lots of comments" does not equal "convey more useful information to the reader".)
An alternative view of the DTD is given in a tree diagram:
[http://tarantella.gsfc.nasa.gov/xml/XDFhtml/DTD-TREE.html]
[local copy]
There is an example XML document, marking up a dataset:
[http://tarantella.gsfc.nasa.gov/xml/XDFhtml/DTD-TREE.html]
[local copy]
Please look at this and think about whether we might want to use any of it and, if so, which parts. Also please prepare whatever other ideas you have about marking up data in XML.