A Proposal of File Driver Change

                                                Quincey Koziol and Raymond Lu

                                                            Oct 22, 2004

 

In this document, we discuss the current library behavior regarding file driver and a proposed change for the internal library design.

                                   

I.                   Current Library Behavior and Design

 

When users create and close an HDF5 file with a particular file driver, the library saves no information about the driver in the file other than the driver name (the multi driver is an exception).  When users re-open the file, the current library relies on them to pick the right file driver.  For the family, multi, or split driver, if the chosen driver is incorrect, the library fails in an unpredictable way.

 

For example, for the family driver, the library does not save the member file size anywhere.  The family driver and the member file size are set through the function H5Pset_fapl_family.  Our User’s Guide says, “the member size is only used when creating a new file or truncating an existing file(?); otherwise the member size comes from the size of the first member of the family file being opened”.  If a user wants to open a family file, he has to open it with the family driver and the correct family name template, which assumes he knows the file is a family file.  The original member file size has been lost, so the library sets the member file size to the current size of the first member file.
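To illustrate why the member size matters, here is a minimal sketch of how a logical file address could be mapped to a member file.  It assumes the family driver locates data by integer division of the address by the member size; that is an assumption about the driver internals, not something stated in this document, and the names family_loc_t and family_locate are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical result type: which member file holds an address, and where. */
typedef struct {
    uint64_t member;   /* index of the member file, starting from 0 */
    uint64_t offset;   /* byte offset inside that member file */
} family_loc_t;

/* Map a logical file address to (member, offset), assuming every member
 * file holds exactly memb_size bytes of the logical address space. */
static family_loc_t family_locate(uint64_t addr, uint64_t memb_size)
{
    family_loc_t loc;

    assert(memb_size > 0);
    loc.member = addr / memb_size;
    loc.offset = addr % memb_size;
    return loc;
}
```

Under this assumption, the same address resolves to a different member file when the member size used at open time differs from the one used at creation time, which is why it would help to record the original size.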

 

Below is part of the code that opens a family file named “family%05d.h5” with a member file size of FAMILY_SIZE.

 

    /* Create a file access property list */
    if((fapl = H5Pcreate(H5P_FILE_ACCESS)) < 0)
        goto error;

    /* Set family driver; set member file size to FAMILY_SIZE.
     * The third argument is the access property list for the member files. */
    if(H5Pset_fapl_family(fapl, (hsize_t)FAMILY_SIZE, H5P_DEFAULT) < 0)
        goto error;

    /* Open family file with family name template */
    if((file = H5Fopen("family%05d.h5", H5F_ACC_RDWR, fapl)) < 0)
        goto error;

 

The library replaces the integer format specifier in the name template “family%05d.h5” with the member number, starting from 0: “family00000.h5”, “family00001.h5”, “family00002.h5”, and so on.
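The expansion of the name template can be sketched with the standard snprintf function.  This is only an illustration of the naming scheme described above, not the library's own code; the helper name family_member_name is hypothetical.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Build the name of member file number `member` from a printf-style
 * template such as "family%05d.h5".
 * Returns 0 on success, -1 if the name does not fit in `out`. */
static int family_member_name(const char *name_template, int member,
                              char *out, size_t out_size)
{
    int n;

    assert(name_template != NULL && out != NULL && out_size > 0);
    n = snprintf(out, out_size, name_template, member);
    return (n < 0 || (size_t)n >= out_size) ? -1 : 0;
}
```

Calling the helper with member numbers 0, 1, and 2 yields the three file names listed above.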

 

If a user does not know the file is a family file and tries to open the first member file, family00000.h5, with the default file access property list, the open fails:

 

    /* Open a family file as a normal HDF5 file */
    if((file = H5Fopen("family00000.h5", H5F_ACC_RDWR, H5P_DEFAULT)) < 0)
        goto error;

 

The library behavior for family files is summarized in the following table:

 

    The way to open family file                               Library behavior
    --------------------------------------------------------  ----------------------------------
    Open with default property list and 1st member file name  Fail
    Open with default property list and 1st member name,      Succeed (library treats this file
      but there’s only 1 member file                            not as family file)
    Open with default property list and correct file name     Fail
      template
    Open with family driver and correct name template         Succeed

 

 

Of all the file drivers we have, we are concerned only with the family, multi, and split drivers, because these three make the file physically different: they break up a single output file into multiple files.  Choosing the right driver is therefore critical.  For the other drivers, such as sec2, stream, core, MPIO, SRB, and GASS, the choice of driver does not affect the output file physically.

 

In the HDF5 file superblock, there is a space reserved for driver information.  It has three fields: driver information size, driver name, and driver information.  Each file driver can have its own encoding and decoding functions to save its information in this space.  This space is optional.  Currently, only the multi driver stores its information in the superblock.
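As a rough sketch of what driver-specific encoding and decoding functions could look like, the code below packs a member size and a name template into a driver information payload and reads them back.  Everything here is hypothetical: the type driver_info_t, the functions family_encode and family_decode, the field widths, the little-endian byte order, and the driver name string "family" are assumptions for illustration and do not describe the actual superblock layout.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define DRIVER_NAME_LEN 8   /* assumed width of the driver-name field */

/* Hypothetical in-memory view of the three superblock fields. */
typedef struct {
    uint32_t      info_size;                 /* bytes used in info[] */
    char          name[DRIVER_NAME_LEN + 1]; /* driver name, NUL-terminated */
    unsigned char info[256];                 /* driver-specific payload */
} driver_info_t;

/* Encode: 8-byte little-endian member size, then the NUL-terminated
 * name template.  Returns 0 on success, -1 if the payload is too large. */
static int family_encode(driver_info_t *di, uint64_t memb_size,
                         const char *name_template)
{
    size_t tlen, i;

    assert(di != NULL && name_template != NULL);
    tlen = strlen(name_template) + 1;
    if (8 + tlen > sizeof di->info)
        return -1;
    memset(di, 0, sizeof *di);
    memcpy(di->name, "family", 7);
    for (i = 0; i < 8; i++)
        di->info[i] = (unsigned char)(memb_size >> (8 * i));
    memcpy(di->info + 8, name_template, tlen);
    di->info_size = (uint32_t)(8 + tlen);
    return 0;
}

/* Decode the payload written by family_encode.  Returns -1 if the block
 * does not belong to the family driver or is malformed. */
static int family_decode(const driver_info_t *di, uint64_t *memb_size,
                         const char **name_template)
{
    size_t i;

    assert(di != NULL && memb_size != NULL && name_template != NULL);
    if (strcmp(di->name, "family") != 0)
        return -1;
    if (di->info_size < 9 || di->info_size > sizeof di->info)
        return -1;
    if (di->info[di->info_size - 1] != '\0')
        return -1;
    *memb_size = 0;
    for (i = 0; i < 8; i++)
        *memb_size |= (uint64_t)di->info[i] << (8 * i);
    *name_template = (const char *)di->info + 8;
    return 0;
}
```

A fixed byte order is used in the sketch so that the payload would read back the same on machines with different native byte orders.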

 

II.                Suggestion to Change the Design of File Driver

 

For each file driver, we can save the information necessary for re-opening the file in the driver information space of the superblock.  In this way, a user who does not know which file driver was used to create the file can simply open it with the default file driver.  The library will check whether the correct driver has been used and, if not, automatically switch to the correct one.  The drivers we discuss here are still only family, multi, and split.  If a user adds his own driver to the library, we cannot know whether it changes the file physically, so he has to modify this part of the library himself to enable this feature.  Otherwise, the user is responsible for choosing the right driver.

 

Let’s again use the family file driver as an example.  Not knowing the file driver, a user picks the default file access property list to open the first member file:

 

    if((file = H5Fopen("family00000.h5", H5F_ACC_RDWR, H5P_DEFAULT)) < 0)
        goto error;

 

The library will first try to open the file “family00000.h5” with the default sec2 driver.  Once it finds out from the superblock that the file is a family file, it retrieves the file’s correct name template, “family%05d.h5”, and the member file size from the superblock.  The library then closes the file and re-opens it with the family driver and the name template.  Because the name template is stored in the file, we will have to prohibit users from renaming the member files of a family file after creating it.
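The decision the library would make after reading the superblock can be sketched as follows.  This is a simplified model of the proposed behavior, not library code: the names open_action_t, splits_file, and choose_open_action are hypothetical, and the driver name strings are placeholders for whatever is actually stored in the driver name field.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

typedef enum {
    KEEP_DRIVER,    /* continue with the driver the user asked for */
    SWITCH_DRIVER   /* close the file and re-open with the stored driver */
} open_action_t;

/* Only these three drivers break one logical file into several files. */
static int splits_file(const char *driver)
{
    return strcmp(driver, "family") == 0 ||
           strcmp(driver, "multi") == 0 ||
           strcmp(driver, "split") == 0;
}

/* `requested` is the driver from the file access property list;
 * `stored` is the driver name found in the superblock, or NULL or ""
 * when the file carries no driver information. */
static open_action_t choose_open_action(const char *requested,
                                        const char *stored)
{
    assert(requested != NULL);
    if (stored == NULL || stored[0] == '\0')
        return KEEP_DRIVER;           /* nothing recorded: behave as before */
    if (strcmp(requested, stored) == 0)
        return KEEP_DRIVER;           /* the user already chose correctly */
    /* For an unknown (user-defined) driver the user stays responsible. */
    return splits_file(stored) ? SWITCH_DRIVER : KEEP_DRIVER;
}
```

In this model a file with no recorded driver information is opened exactly as requested, so opening it with the wrong driver would still fail as it does today.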

 

If the user knows the file was created with the family driver, he can still open it as usual with the correct family name template and property list.  However, the member file size he passes in through the file access property list is not used, because the library will use the size saved in the superblock, which is the size with which the file was created.

 

III.             Backward Compatibility

 

This suggested change is only for Release 1.8.  No API change is involved, but there will be backward compatibility issues with earlier releases, because the library’s behavior may be slightly different.

 

The table below shows the compatibility between Release 1.6 and Release 1.8 for all three drivers: family, multi, and split.

 

 

                            Open with v1.6   Open with v1.6    Open with v1.8   Open with v1.8
                            using default    using correct     using default    using correct
                            property list    driver and name   property list    driver and name
                                             template                           template
    ----------------------  ---------------  ----------------  ---------------  ----------------
    File created with v1.6  Fail             Succeed           Fail             Succeed
    File created with v1.8  Fail             Succeed           Succeed (*)      Succeed

    (*) This is the major improvement for the new design.

 

 

We again use the family driver as an example.  An older version of the library, without the change, returns a failure if a user tries to open a family file with the default file access property list.  The new version of the library, with the change, also returns a failure if a user tries to open, with the default file access property list, a family file created with an older library, because such a file has no driver information in the superblock.

 

However, a difference appears when the user opens the file successfully with the family driver.  With the older library, the original member file size has been lost, whereas with the new library that size is stored in and retrieved from the superblock.