Enhancement Request for H5Gmove() and H5Glink()
===============================================
1. Introduction
---------------
Objects in the HDF5 API are identified in four ways:
  1. by handle to the object (type `hid_t')
  2. by "loc" (location) and name (e.g., H5Dopen(), H5Gopen(), etc.)
  3. by a unique, permanent object header address (H5Gget_objinfo())
  4. by pointer from a dataset or attribute
Method two takes an object name that is either absolute (beginning
with a slash) or relative. All absolute names are looked up beginning
at the root group of the file specified by the `loc' argument; all
relative names are looked up beginning at the group specified by the
`loc' argument (or the root group of the file if `loc' is a file
handle).
Some examples: If I have a file that contains a root group called "/"
(all files have such a group) and a subgroup called "foo", and "foo"
contains a dataset called "bar", then I can open the dataset in a
variety of ways:
If
    hid_t file = H5Fopen(...);
    hid_t root = H5Gopen(file, "/");
then any of the following can open the group "foo":
    hid_t foo = H5Gopen(file, "/foo");
    hid_t foo = H5Gopen(file, "./foo");
    hid_t foo = H5Gopen(file, "foo");
    hid_t foo = H5Gopen(root, "/foo");
    hid_t foo = H5Gopen(root, "./foo");
    hid_t foo = H5Gopen(root, "foo");
and any of the following can open the dataset "bar":
    hid_t bar = H5Dopen(file, "/foo/bar");
    hid_t bar = H5Dopen(file, "./foo/bar");
    hid_t bar = H5Dopen(file, "foo/bar");
    hid_t bar = H5Dopen(root, "/foo/bar");
    hid_t bar = H5Dopen(root, "./foo/bar");
    hid_t bar = H5Dopen(root, "foo/bar");
    hid_t bar = H5Dopen(foo, "/foo/bar");
    hid_t bar = H5Dopen(foo, "bar");
    hid_t bar = H5Dopen(foo, "./bar");
This flexibility is important because:
  1. It takes time to look up each component of a name. If a client
     is about to look up many names in a common group, then it is
     faster to look up and obtain a handle to the group first and
     then look up the members relative to that group than to look up
     the absolute names:
         hid_t baz = H5Gopen(file, "/foo/bar/baz");
  2. It prevents a client from having to construct absolute names.
     E.g., if a client is given a group name and a list of datasets
     in that group, then it only needs to open the group and then
     look up each dataset:
         hid_t dataset[numDatasets];
         for (i=0; i<numDatasets; i++) {
     rather than
         for (i=0; i<numDatasets; i++) {
  3. It allows a client to "forget" about the location of an object.
     If a client has a group that contains named datatypes then it
     can open that group once by name and then use the group handle
     throughout the life of the program to access the named
     datatypes in that group. (This is essentially #2 with the bit
     about opening the group separated from the `for' loop.)
  4. It allows the client to obtain a handle to a group of related
     objects and then rename or remove that group (or one of the
     parent groups) without affecting accessibility of the objects
     within the group. Consider a program that makes the following
     calls sometime during its execution:
         hid_t types = H5Gopen(file, "/some/deep/directory/containing/types");
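As a rough illustration of reason 1, the sketch below counts name
components. It assumes that the cost of a lookup is proportional to
the number of components traversed, which is a simplification; the
function names are hypothetical and nothing here calls HDF5.

```c
#include <assert.h>
#include <stddef.h>

/* Number of components in a name, ignoring empty components (repeated
 * or trailing slashes) and "." components. */
size_t count_components(const char *name)
{
    size_t count = 0;

    assert(name != NULL);
    while (*name != '\0') {
        const char *start;

        while (*name == '/')                 /* skip separators */
            name++;
        start = name;
        while (*name != '\0' && *name != '/')
            name++;
        if (name > start && !(name - start == 1 && start[0] == '.'))
            count++;
    }
    return count;
}

/* Opening each member by absolute name walks the whole group name plus
 * one member component, once per member. */
size_t lookups_absolute(const char *group_name, size_t num_members)
{
    return num_members * (count_components(group_name) + 1);
}

/* Opening the group once and then each member relative to it walks the
 * group name once, plus one component per member. */
size_t lookups_relative(const char *group_name, size_t num_members)
{
    return count_components(group_name) + num_members;
}
```

For the group "/foo/bar/baz" with 100 members, this model counts 400
component lookups with absolute names and 103 with relative names.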
2. Enhancement
--------------
Two API functions in HDF5-1.4 are deficient:
    H5Gmove(hid_t loc, const char *source,
            const char *destination);
    H5Glink(hid_t loc, H5G_link_t link_type,
            const char *source, const char *destination);
These two functions use `loc' for both the source and destination
objects. This means that the benefits described above only apply if
the source and destination names are in the same group.
I propose a change to the API by adding a second `loc' argument for
the destination:
    H5Gmove(hid_t srcloc, const char *source,
            hid_t dstloc, const char *destination);
    H5Glink(hid_t srcloc, const char *source,
            hid_t dstloc, const char *destination,
            H5G_link_t link_type);
3. Repercussions
----------------
This is not a backward-compatible API change. Any C/C++ application
that attempts to recompile with this API change and which includes
HDF5 public header files (e.g., "#include <hdf5.h>") will get an error
that the number of actual arguments does not match the number of
formal arguments. Any application that simply relinks with the new
HDF5 library will get an error about compile/link versions of the
library not matching.
In order to ease the burden on hdf5 clients I also propose:
  1. If the `dstloc' argument is zero then use the `srcloc' value
     for `dstloc'. (A zero-valued hid_t is not otherwise possible.)
  2. Make a public `#define H5G_SAME_LOC 0' so clients can document
     the fact that the destination location is the same as the source
     location (if they don't want to repeat the location argument).
  3. Document in the release notes that compile-time errors involving
     H5Gmove() and H5Glink() can be fixed by changing:
         H5Gmove(L1,SRC,DST) --> H5Gmove(L1,SRC,H5G_SAME_LOC,DST)
     (or by repeating the L1 argument in place of H5G_SAME_LOC)
  4. If HDF5 is configured with backward compatibility then the old
     function prototypes are kept and the source and destination
     locations are presumed to be the same. The new prototypes will
     be available as the names H5Gmove2() and H5Glink2(). This
     capability will be removed in the 1.7 development series.
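Items 1, 2 and 4 can be sketched together as follows. This is a model
of the proposed behavior, not library code: `loc_t' and SAME_LOC stand
in for `hid_t' and H5G_SAME_LOC, and the struct and function names are
hypothetical. The functions only record which locations a call would
use; they do not move anything.

```c
#include <assert.h>

typedef int loc_t;      /* stand-in for hid_t (assumption) */
#define SAME_LOC 0      /* stand-in for the proposed H5G_SAME_LOC */

/* The arguments a move would operate on after defaults are applied. */
typedef struct {
    loc_t       srcloc;
    const char *source;
    loc_t       dstloc;
    const char *destination;
} move_request;

/* Item 1: a zero `dstloc' means "use the `srcloc' value". */
loc_t effective_dstloc(loc_t srcloc, loc_t dstloc)
{
    return (dstloc == SAME_LOC) ? srcloc : dstloc;
}

/* New-style call with separate source and destination locations. */
move_request make_move2(loc_t srcloc, const char *source,
                        loc_t dstloc, const char *destination)
{
    move_request req;

    assert(srcloc != SAME_LOC);   /* a zero-valued location is not valid */
    req.srcloc      = srcloc;
    req.source      = source;
    req.dstloc      = effective_dstloc(srcloc, dstloc);
    req.destination = destination;
    return req;
}

/* Item 4: the old three-argument form presumes that the source and
 * destination locations are the same. */
move_request make_move(loc_t loc, const char *source,
                       const char *destination)
{
    return make_move2(loc, source, SAME_LOC, destination);
}
```

With this model, make_move(7, "a", "b") and
make_move2(7, "a", SAME_LOC, "b") both yield a destination location
of 7, while make_move2(7, "a", 9, "b") keeps the destination
location 9.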
4. Opinions
-----------
I prefer to fix H5Gmove() and H5Glink() rather than creating two
additional functions because:
  1. These functions are probably not used often (limited impact).
  2. The programmer will be notified of the change by a compile
     error (difficult to overlook).
  3. The change is trivial to fix by adding H5G_SAME_LOC to each call.
  4. Any new name would be more obscure than the current names.
  5. It decreases the amount of code and documentation to maintain.
  6. The current implementation doesn't follow the convention that
     all objects are identified by a location and name, and thus is
     "broken" in my opinion.