Enhancement Request for H5Gmove() and H5Glink()

===============================================

 

1. Introduction

---------------

 

Objects in the HDF5 API are identified in four ways:

 

   1. by handle to the object (type `hid_t')

   2. by "loc" (location) and name (e.g., H5Dopen(), H5Gopen(), etc.)

   3. by a unique, permanent object header address (H5Gget_objinfo())

   4. by pointer from a dataset or attribute

 

Method two takes an object name that is either absolute (beginning

with a slash) or relative. All absolute names are looked up beginning

at the root group of the file specified by the `loc' argument; all

relative names are looked up beginning at the group specified by the

`loc' argument (or the root group of the file if `loc' is a file

handle).
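
The lookup rule can be sketched with a toy model. This is not HDF5
code: `node_t', `find_child()' and `resolve()' are hypothetical names
invented for illustration, a file handle is modeled by the root node
itself, and only plain components and "." are handled.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical toy model, not HDF5: a group is a node with named children. */
#define MAX_CHILDREN 4

typedef struct node {
    const char  *name;
    struct node *root;                  /* root group of the containing file */
    struct node *child[MAX_CHILDREN];
    int          nchildren;
} node_t;

/* Find the direct child of `g' whose name equals the first `len' bytes of `comp'. */
static node_t *find_child(node_t *g, const char *comp, size_t len)
{
    for (int i = 0; i < g->nchildren; i++) {
        if (strlen(g->child[i]->name) == len &&
            strncmp(g->child[i]->name, comp, len) == 0)
            return g->child[i];
    }
    return NULL;
}

/* Resolve `name' against `loc': absolute names start at the root group of
 * loc's file, relative names start at loc itself. Returns NULL if any
 * component is missing. */
static node_t *resolve(node_t *loc, const char *name)
{
    assert(loc != NULL && name != NULL);

    node_t     *cur = (name[0] == '/') ? loc->root : loc;
    const char *p   = name;

    while (cur != NULL && *p != '\0') {
        while (*p == '/')
            p++;                        /* skip separators */
        size_t len = strcspn(p, "/");   /* length of the next component */
        if (len == 0)
            break;                      /* trailing slash or end of name */
        if (!(len == 1 && p[0] == '.'))
            cur = find_child(cur, p, len);  /* "." stays in place */
        p += len;
    }
    return cur;
}
```

With a root node that contains "foo", which in turn contains "bar",
the calls resolve(&root, "/foo/bar"), resolve(&foo, "bar") and
resolve(&foo, "/foo/bar") would all return the same node.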

 

Some examples: If I have a file that contains a root group called "/"

(all files have such a group) and a subgroup called "foo", and "foo"

contains a dataset called "bar", then I can open the dataset in a

variety of ways:

 

If

    hid_t file = H5Fopen(...);

    hid_t root = H5Gopen(file, "/");

 

then any of the following can open the group "foo":

    hid_t foo = H5Gopen(file, "/foo");

    hid_t foo = H5Gopen(file, "./foo");

    hid_t foo = H5Gopen(file, "foo");

 

    hid_t foo = H5Gopen(root, "/foo");

    hid_t foo = H5Gopen(root, "./foo");

    hid_t foo = H5Gopen(root, "foo");

 

and any of the following can open the dataset "bar":

    hid_t bar = H5Dopen(file, "/foo/bar");

    hid_t bar = H5Dopen(file, "./foo/bar");

    hid_t bar = H5Dopen(file, "foo/bar");

 

    hid_t bar = H5Dopen(root, "/foo/bar");

    hid_t bar = H5Dopen(root, "./foo/bar");

    hid_t bar = H5Dopen(root, "foo/bar");

 

    hid_t bar = H5Dopen(foo, "/foo/bar");

    hid_t bar = H5Dopen(foo, "bar");

    hid_t bar = H5Dopen(foo, "./bar");

 

This flexibility is important because:

 

   1. It takes time to look up each component of a name. If a client

      is about to look up many names in a common group:

 

         /foo/bar/baz/apple

         /foo/bar/baz/banana

         /foo/bar/baz/cherry

         /foo/bar/baz/date

         ...

         /foo/bar/baz/zucchini

 

      then it is faster to look up and obtain a handle to the group

      first and then look up the members relative to that group than

      to look up the absolute names:

 

         hid_t baz = H5Gopen(file, "/foo/bar/baz");

         hid_t apple = H5Dopen(baz, "apple");

         hid_t banana = H5Dopen(baz, "banana");

         hid_t cherry = H5Dopen(baz, "cherry");

         hid_t date = H5Dopen(baz, "date");

         ...

         hid_t zucchini = H5Dopen(baz, "zucchini");

 

   2. It prevents a client from having to construct absolute

      names. E.g., if a client is given a group name and list of

      datasets in that group, then it only needs to open the group and

      then look up each dataset:

 

          hid_t dataset[numDatasets];

          hid_t group = H5Gopen(file, groupName);

          for (i=0; i<numDatasets; i++) {

              dataset[i] = H5Dopen(group, datasetName[i]);

          }

          H5Gclose(group);

 

      rather than

         

          hid_t dataset[numDatasets];

          for (i=0; i<numDatasets; i++) {

              char *absoluteName = malloc(strlen(groupName)+strlen(datasetName[i])+2);

              sprintf(absoluteName, "%s/%s", groupName, datasetName[i]);

              dataset[i] = H5Dopen(file, absoluteName);

              free(absoluteName);

          }

 

   3. It allows a client to "forget" about the location of an

      object. If a client has a group that contains named datatypes

      then it can open that group once by name and then use the group

      handle throughout the life of the program to access the named

      datatypes in that group.  (This is essentially #2 with the bit

      about opening the group separated from the `for' loop.)

 

   4. It allows the client to obtain a handle to a group of related

      objects and then rename or remove that group (or one of the

      parent groups) without affecting accessibility of the objects

      within the group. Consider a program that makes the following

      calls sometime during its execution:

 

          hid_t types = H5Gopen(file, "/some/deep/directory/containing/types");

          ...

          H5Gmove(file, "/some/deep/directory", "/some/deep/group");

          ...

          hid_t timeval = H5Topen(types, "timeval");  /*still accessible*/

          /* but not accessible as /some/deep/directory/containing/types/timeval */

          ...

          H5Gunlink(file, "/some");

          ...

          hid_t statbuf = H5Topen(types, "statbuf"); /*still accessible*/

          /* but not accessible by any absolute name */




2. Enhancement

--------------

 

Two API functions in HDF5-1.4 are deficient:

 

    H5Gmove(hid_t loc, const char *source, const char *destination);

    H5Glink(hid_t loc, H5G_link_t link_type, const char *source, const char *destination);

 

These two functions use `loc' for both the source and destination

objects. This means that the benefits described above only apply if

the source and destination names are in the same group.

 

I propose a change to the API by adding a second `loc' argument for

the destination:

 

    H5Gmove(hid_t srcloc, const char *source,

            hid_t dstloc, const char *destination);

 

    H5Glink(hid_t srcloc, const char *source,

            hid_t dstloc, const char *destination,

            H5G_link_t link_type);
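
To illustrate what the second location argument allows, here is a toy
sketch of a move between two already-open groups. It is not HDF5 code:
`group_t', `link_t', `find_link()' and `toy_move()' are hypothetical,
and names are assumed to be single components relative to their
location.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define MAX_LINKS 8

/* Hypothetical toy model: a group is a small table of named links. */
typedef struct { const char *name; void *object; } link_t;
typedef struct { link_t link[MAX_LINKS]; int nlinks; } group_t;

/* Return the index of the link called `name' in `g', or -1 if absent. */
static int find_link(const group_t *g, const char *name)
{
    for (int i = 0; i < g->nlinks; i++)
        if (strcmp(g->link[i].name, name) == 0)
            return i;
    return -1;
}

/* Toy analogue of the proposed four-argument move: the source name is
 * looked up relative to `srcloc', the destination relative to `dstloc'.
 * Returns 0 on success, -1 on failure. */
static int toy_move(group_t *srcloc, const char *source,
                    group_t *dstloc, const char *destination)
{
    assert(srcloc && source && dstloc && destination);

    int i = find_link(srcloc, source);
    if (i < 0)
        return -1;                              /* no such source */
    if (find_link(dstloc, destination) >= 0)
        return -1;                              /* destination name taken */
    if (srcloc != dstloc && dstloc->nlinks == MAX_LINKS)
        return -1;                              /* destination group full */

    void *object = srcloc->link[i].object;
    srcloc->link[i] = srcloc->link[--srcloc->nlinks];  /* unlink from source */
    dstloc->link[dstloc->nlinks].name   = destination; /* link into destination */
    dstloc->link[dstloc->nlinks].object = object;
    dstloc->nlinks++;
    return 0;
}
```

A client holding two group handles can call toy_move(&a, "bar", &b,
"baz") without knowing the absolute name of either group. With a
single location argument, both names would have to be expressed
relative to the same group.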

 

3. Repercussions

----------------

 

This is not a backward-compatible API change. Any C/C++ application that

attempts to recompile with this API change and which includes HDF5

public header files (e.g., "#include <hdf5.h>") will get an error that

the number of actual arguments does not match the number of formal

arguments.  Any application that simply relinks with the new hdf5

library will get an error about compile/link versions of the library

not matching.

 

In order to ease the burden on hdf5 clients I also propose:

 

   1. If the `dstloc' argument is zero then use the `srcloc' value

      for `dstloc'. (A zero-valued hid_t is not otherwise possible.)

 

   2. Make a public `#define H5G_SAME_LOC 0' so clients can document

      the fact that the destination location is the same as the source

      location (if they don't want to repeat the location argument).

 

   3. Document in the release notes that compile-time errors involving

      H5Gmove() and H5Glink() can be fixed by changing:

 

        H5Gmove(L1,SRC,DST)   --> H5Gmove(L1,SRC,H5G_SAME_LOC,DST)

        H5Glink(L1,T,SRC,DST) --> H5Glink(L1,SRC,H5G_SAME_LOC,DST,T)

        (or by repeating the L1 argument in place of H5G_SAME_LOC)

 

   4. If HDF5 is configured with backward compatibility then the old

      function prototypes are kept and the source and destination

      locations are presumed to be the same.  The new prototypes will

      be available under the names H5Gmove2() and H5Glink2(). This

      capability will be removed in the 1.7 development series.
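
Items 1, 2 and 4 can be sketched together. This is a toy model, not
the library implementation: `toy_hid_t', `toy_move2()', `toy_move()'
and `last_move' are hypothetical names, and the "move" only records
which locations it would have used. H5G_SAME_LOC is given the value
proposed in item 2.

```c
#include <assert.h>
#include <stddef.h>

typedef int toy_hid_t;          /* stand-in for hid_t, for illustration only */
#define H5G_SAME_LOC 0          /* value proposed in item 2 */

/* Hypothetical record of the last call, so the effect can be inspected. */
static struct {
    toy_hid_t   srcloc, dstloc;
    const char *source, *destination;
} last_move;

/* New-style call: a zero `dstloc' means "same location as the source". */
static int toy_move2(toy_hid_t srcloc, const char *source,
                     toy_hid_t dstloc, const char *destination)
{
    assert(source != NULL && destination != NULL);

    if (dstloc == H5G_SAME_LOC)
        dstloc = srcloc;

    last_move.srcloc      = srcloc;
    last_move.dstloc      = dstloc;
    last_move.source      = source;
    last_move.destination = destination;
    return 0;
}

/* Old-style call, as kept under backward compatibility: one location
 * serves for both the source and the destination. */
static int toy_move(toy_hid_t loc, const char *source, const char *destination)
{
    return toy_move2(loc, source, H5G_SAME_LOC, destination);
}
```

Under these assumptions toy_move(loc, "a", "b") and toy_move2(loc,
"a", H5G_SAME_LOC, "b") record the same source and destination
locations, which is the equivalence the release-note rewrite in item 3
relies on.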

 

4. Opinions

-----------

 

I prefer to fix H5Gmove() and H5Glink() rather than creating two

additional functions because:

 

   1. These functions are probably not used often (limited impact).

   2. The programmer will be notified of the change by a compile

      error (difficult to overlook).

   3. The change is trivial to fix by adding H5G_SAME_LOC to each call.

   4. Any new name would be more obscure than the current names.

   5. It decreases the amount of code and documentation to maintain.

   6. The current implementation doesn't follow the convention that

      all objects are identified by a location and name, and thus is

      "broken" in my opinion.