REMcG, 2/16/00
1. It is important to realize that XML 'attributes' and 'elements' can often carry the same information, so you can frequently use either one. In DTD-0 I followed the DDL: the left-hand side of each BNF rule became an XML 'element', and the other items became XML 'attributes'.
To see this clearly, consider the example of the string datatype.
In DTD-0, I defined this to be:
<!ELEMENT StringType EMPTY>
<!ATTLIST StringType
    CSET    (H5T_CSET_ASCII)                                    #REQUIRED
    STRSIZE CDATA                                               #REQUIRED
    STRPAD  (H5T_STR_NULLTERM|H5T_STR_NULLPAD|H5T_STR_SPACEPAD) #REQUIRED
    CTYPE   (H5T_C_S1|H5T_FORTRAN_S1)                           #REQUIRED >
and the example XML looks like:
<DataType>
  <AtomicType>
    <StringType STRSIZE="17" STRPAD="H5T_STR_NULLTERM" CSET="H5T_CSET_ASCII" CTYPE="H5T_C_S1"/>
  </AtomicType>
</DataType>
This could be done with XML elements instead. The DTD could
be written:
<!ELEMENT StringType (CSET,STRSIZE,STRPAD,CTYPE)>
<!ELEMENT CSET    (#PCDATA)>
<!ELEMENT STRSIZE (#PCDATA)>
<!ELEMENT STRPAD  (#PCDATA)>
<!ELEMENT CTYPE   (#PCDATA)>
and the example XML would look something like:
<DataType>
  <AtomicType>
    <StringType>
      <CSET>H5T_CSET_ASCII</CSET>
      <STRSIZE>17</STRSIZE>
      <STRPAD>H5T_STR_NULLTERM</STRPAD>
      <CTYPE>H5T_C_S1</CTYPE>
    </StringType>
  </AtomicType>
</DataType>
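There does not appear to be any strong reason to use one way or
the other. Partly it depends on how deep you want the trees to be, and
partly on how you want to read the XML. Two concrete differences show
up in this example. First, the DTD can restrict an attribute to an
enumerated list of values (as it does for CSET, STRPAD, and CTYPE), but
it cannot constrain the #PCDATA content of an element. Second, the
element version fixes the order of the four properties, while
attributes may appear in any order.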
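To make the "use either one" point concrete, here is a minimal Python sketch using the standard `xml.etree.ElementTree` module. It reads the StringType in either encoding and returns the same dictionary. The function names are hypothetical. The sketch does no DTD validation, because ElementTree does not validate; it only shows that both forms carry the same four values.

```python
import xml.etree.ElementTree as ET

# The two StringType encodings from the text.
ATTR_FORM = """<StringType STRSIZE="17" STRPAD="H5T_STR_NULLTERM"
    CSET="H5T_CSET_ASCII" CTYPE="H5T_C_S1"/>"""

ELEM_FORM = """<StringType>
  <CSET>H5T_CSET_ASCII</CSET>
  <STRSIZE>17</STRSIZE>
  <STRPAD>H5T_STR_NULLTERM</STRPAD>
  <CTYPE>H5T_C_S1</CTYPE>
</StringType>"""

FIELDS = ("CSET", "STRSIZE", "STRPAD", "CTYPE")


def string_type_from_attributes(elem):
    """Read the properties from XML attributes (DTD-0 style)."""
    return {name: elem.get(name) for name in FIELDS}


def string_type_from_elements(elem):
    """Read the properties from child elements, trimming whitespace."""
    return {name: elem.findtext(name).strip() for name in FIELDS}


def read_string_type(xml_text):
    """Accept either encoding and return the same dictionary."""
    elem = ET.fromstring(xml_text)
    if elem.attrib:  # attribute form: the values live on the element itself
        return string_type_from_attributes(elem)
    return string_type_from_elements(elem)


if __name__ == "__main__":
    a = read_string_type(ATTR_FORM)
    b = read_string_type(ELEM_FORM)
    print(a == b)  # True: both encodings carry the same information
```

A reader that accepts both forms like this is cheap to write. A program that only writes one form is simpler still, so the choice mostly affects convenience, not expressiveness.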
I've found two practical issues that seem relevant:
XML is strictly hierarchical: a document is a single tree, and all elements must be nested inside the document root. Also, XML treats the order of elements as significant, whereas HDF-5 does not. XML can include pointers to other XML elements if we want (for example, with ID and IDREF attributes), as I illustrated.
As we know from the DDL, it is difficult to express the generality of the HDF-5 model in a tree. An HDF-5 object can be reached by more than one path, so a tree needs special cross and forward references to represent it.
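The ordering point matters to any program that compares or round-trips these files. The following is a minimal Python sketch. It assumes two hypothetical dumps of the same root group that list the same Datasets, identified by the OBJ-XID attribute used in the examples, in different orders. Compared as XML child sequences, the dumps differ. Compared as unordered collections, which is what HDF-5 cares about, they match.

```python
import xml.etree.ElementTree as ET
from collections import Counter

# Two hypothetical dumps of the same root group: same members,
# written in a different order.
DOC_A = """<RootGroup>
  <Dataset OBJ-XID="H5_dset1"/>
  <Dataset OBJ-XID="H5_dset2"/>
</RootGroup>"""

DOC_B = """<RootGroup>
  <Dataset OBJ-XID="H5_dset2"/>
  <Dataset OBJ-XID="H5_dset1"/>
</RootGroup>"""


def members_in_document_order(xml_text):
    """XML view: the group's direct children as an ordered list."""
    root = ET.fromstring(xml_text)
    return [(child.tag, child.get("OBJ-XID")) for child in root]


def membership(xml_text):
    """HDF-5 view: the same children as an unordered multiset."""
    return Counter(members_in_document_order(xml_text))


if __name__ == "__main__":
    print(members_in_document_order(DOC_A) == members_in_document_order(DOC_B))  # False
    print(membership(DOC_A) == membership(DOC_B))  # True
```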
In DTD-0, I created a link structure and made all 'linkable' ('named')
objects 'top level' objects. That is why I declared the file to
include (BootBlock?,RootGroup,(Group?,Dataset?,DataType?)*).
This rule does not reflect the structure of the file. Instead, the
structure is captured by the 'Link' objects, each of which has a source,
a destination, and a 'Name'. Paths can be traversed by reading all the
'Links' and constructing a map of the file. Note that each object pointed
to also has a pointer back to the link(s) pointing to it, so the XML allows
walking backward as well.
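The traversal described above can be sketched in Python with the standard `xml.etree.ElementTree` module. This sketch rests on several assumptions and is not DTD-0 itself. The document fragment and the IDs `H5_root` and `H5_g1` are hypothetical. The Link attribute names SOURCE, TARGET, and NAME are taken from the later example. The backward map is computed from the Links rather than read from the per-object back pointers.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

# Hypothetical DTD-0 style fragment: every named object is top level,
# and the hierarchy lives only in the Link elements.
DOC = """<HDF5-File>
  <RootGroup OBJ-XID="H5_root"/>
  <Group OBJ-XID="H5_g1"/>
  <Dataset OBJ-XID="H5_dset1"/>
  <Link SOURCE="H5_root" TARGET="H5_g1" NAME="g1"/>
  <Link SOURCE="H5_g1" TARGET="H5_dset1" NAME="dset1"/>
  <Link SOURCE="H5_root" TARGET="H5_dset1" NAME="alias"/>
</HDF5-File>"""


def build_link_maps(xml_text):
    """Return (forward, backward) maps built from all Link elements.

    forward[source][name] -> target
    backward[target]      -> list of (source, name) pointing at it
    """
    root = ET.fromstring(xml_text)
    forward = defaultdict(dict)
    backward = defaultdict(list)
    for link in root.iter("Link"):
        src, name, dst = link.get("SOURCE"), link.get("NAME"), link.get("TARGET")
        forward[src][name] = dst
        backward[dst].append((src, name))
    return forward, backward


def resolve(forward, path, root_id="H5_root"):
    """Walk an absolute path such as '/g1/dset1' one component at a time."""
    current = root_id
    for component in filter(None, path.split("/")):
        current = forward[current][component]  # KeyError if the path is missing
    return current


if __name__ == "__main__":
    forward, backward = build_link_maps(DOC)
    print(resolve(forward, "/g1/dset1"))  # H5_dset1
    print(resolve(forward, "/alias"))     # H5_dset1
    print(sorted(backward["H5_dset1"]))   # [('H5_g1', 'dset1'), ('H5_root', 'alias')]
```

Because the map is built once, both forward path lookups and backward "who points at me" queries are simple dictionary lookups. The cost is that a reader must load all the Links before it can answer either kind of question.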
It would be possible to nest all members within groups, sort of like the DDL does. Each object would appear as a member of exactly one Group, with additional links to reflect multiple references.
The DTD would look something like this:
<!ELEMENT HDF5-File (BootBlock?,RootGroup)>
<!ELEMENT BootBlock EMPTY>
<!ELEMENT RootGroup (Attribute*,(Group?|Dataset?|DataType?|Link?|SoftLink?)*)>
<!ELEMENT Group (Attribute*,(Group?|Dataset?|DataType?|Link?|SoftLink?)*)>
and the example XML might look something like this (this is not the complete file and it omits multiple references; please ignore the ID and IDREFS attributes):
<?xml version='1.0'?>
<!-- From the DDL spec, file example.h5 -->
<HDF5-File>
  <BootBlock/>
  <RootGroup>
    <Attribute NAME="attr1">
      <Dataspace>
        <ScalarDataspace/>
      </Dataspace>
      <DataType>
        <AtomicType>
          <StringType STRSIZE="17" STRPAD="H5T_STR_NULLTERM" CSET="H5T_CSET_ASCII" CTYPE="H5T_C_S1"/>
        </AtomicType>
      </DataType>
      <DataObjectInFile xml:link="locator" href="example.h5" HDF5_PATH="/"/>
    </Attribute>
    <Dataset OBJ-XID="H5_dset1" Parents="H5_dset1">
      <Dataspace>
        <SimpleDataspace Ndims="2">
          <Dimension DimSize="10" MaxDimSize="10"/>
          <Dimension DimSize="10" MaxDimSize="10"/>
        </SimpleDataspace>
      </Dataspace>
      <DataType>
        <AtomicType>
          <IntegerType Size="4" Sign="true" ByteOrder="BE" TypeCode="H5T_STD_I32BE"/>
        </AtomicType>
      </DataType>
      <DataObjectInFile xml:link="locator" href="example.h5" HDF5_PATH="/dset1"/>
    </Dataset>
    <Dataset OBJ-XID="H5_dset2" Parents="H5_dset2">
      <Dataspace>
        <SimpleDataspace Ndims="1">
          <Dimension DimSize="5" MaxDimSize="5"/>
        </SimpleDataspace>
      </Dataspace>
      <DataType>
        <CompoundType>
          <ScalarTypeDef FieldName="a">
            <AtomicType>
              <IntegerType Size="4" Sign="true" ByteOrder="BE" TypeCode="H5T_STD_I32BE"/>
            </AtomicType>
          </ScalarTypeDef>
          <ScalarTypeDef FieldName="b">
            <AtomicType>
              <FloatType Size="4" ByteOrder="BE" TypeCode="H5T_IEEE_F32BE"/>
            </AtomicType>
          </ScalarTypeDef>
          <ScalarTypeDef FieldName="c">
            <AtomicType>
              <FloatType Size="8" ByteOrder="BE" TypeCode="H5T_IEEE_F64BE"/>
            </AtomicType>
          </ScalarTypeDef>
        </CompoundType>
      </DataType>
      <DataObjectInFile xml:link="locator" href="example.h5" HDF5_PATH="/dset2"/>
    </Dataset>
    <Group>
      <DataType OBJ-XID="H5_type1" Parents="H5_type1">
        <CompoundType>
          <ArrayTypeDef FieldName="a" Ndims="1">
            <AtomicType>
              <IntegerType Size="4" Sign="true" ByteOrder="BE" TypeCode="H5T_STD_I32BE"/>
            </AtomicType>
            <Dimension DimSize="5" MaxDimSize="5"/>
          </ArrayTypeDef>
          <ArrayTypeDef FieldName="b" Ndims="2">
            <AtomicType>
              <FloatType Size="4" ByteOrder="BE" TypeCode="H5T_IEEE_F32BE"/>
            </AtomicType>
            <Dimension DimSize="5" MaxDimSize="5"/>
            <Dimension DimSize="6" MaxDimSize="6"/>
          </ArrayTypeDef>
        </CompoundType>
      </DataType>
    </Group>
    <Link SOURCE="H5_type1" TARGET="H5_type1" NAME="type1"/>
    <SoftLink LinkName="slink1" Target="somevalue" Source="H5_type1" OBJ-XID="H5_slink"/>
  </RootGroup>
</HDF5-File>
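To generate the nested form, a program must choose, for each object, the one Group that physically contains it. Every other reference to that object is then written as a Link. The Python sketch below shows one way to make that choice, under stated assumptions. The (source, name, target) link list and the IDs are hypothetical. Breadth-first order is an arbitrary choice made here, not something the DDL specifies. The walk starts at the root, and the first link that reaches an object becomes its home, so each object is nested under its shallowest path. Every later link to the same object is kept as an extra Link.

```python
from collections import deque

# Hypothetical links (source group, link name, target object).
# H5_dset1 is reachable both as /g1/dset1 and as /alias.
LINKS = [
    ("H5_root", "g1", "H5_g1"),
    ("H5_g1", "dset1", "H5_dset1"),
    ("H5_root", "alias", "H5_dset1"),
]


def nest(links, root_id="H5_root"):
    """Pick one containing Group per object by breadth-first search.

    Returns (home, extra):
      home[obj] -> (group, name): obj is nested directly inside group
      extra     -> list of (group, name, obj) for every other reference,
                   to be written as Link elements
    Objects not reachable from root_id are left out of both.
    """
    children = {}
    for src, name, dst in links:
        children.setdefault(src, []).append((name, dst))

    home, extra = {}, []
    placed = {root_id}
    queue = deque([root_id])
    while queue:
        group = queue.popleft()
        for name, obj in children.get(group, []):
            if obj in placed:
                extra.append((group, name, obj))  # already nested elsewhere
            else:
                placed.add(obj)
                home[obj] = (group, name)
                queue.append(obj)  # non-groups have no children, so this is a no-op for them
    return home, extra


if __name__ == "__main__":
    home, extra = nest(LINKS)
    print(home)   # {'H5_g1': ('H5_root', 'g1'), 'H5_dset1': ('H5_root', 'alias')}
    print(extra)  # [('H5_g1', 'dset1', 'H5_dset1')]
```

Any other spanning-tree rule, such as "first link in file order", would also work. The point is only that the nested form forces the writer to pick a single home for each object.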
It should be possible to express everything either way, and I doubt
there would be a big difference in size or complexity of the XML.
The main difference will be for programs that need to generate and parse
the XML. It's hard to know what the trade-offs or gotchas may be,
since we don't have much experience.