A Proposal of File Driver Change

                                                Quincey Koziol and Raymond Lu

                                                            Oct 22, 2004

                                                  Revised on May 9, 2005

 

In this document, we discuss the current library behavior regarding file drivers and propose a change to the internal library design.

                                   

I.                   Current Library Behavior and Design

 

When users create and close an HDF5 file with a particular file driver, the current library does not save any information about that driver in the file (the multi driver is an exception).  When users re-open the file, the library relies on them to pick the right file driver.  For the family, multi, or split driver, if the driver is incorrect, the library fails in an unpredictable way.

 

For example, for the family driver, the library does not save the member file size anywhere.  The family driver and the member file size are set through the function H5Pset_fapl_family.  Our User’s Guide says, “the member size is only used when creating a new file or truncating an existing file; otherwise the member size comes from the size of the first member of the family file being opened”.  To open a family file, a user has to know that it is a family file and open it with the family driver and the correct family name template.  Because the original member file size has been lost, the library sets the member file size to the current size of the first member file.
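
To illustrate why the member size matters, the sketch below assumes that every member except possibly the last holds exactly one member size of data, so a logical file address maps to a member number and an offset by simple division.  The names memb_loc and locate are hypothetical and are not part of the library.

```c
#include <assert.h>

/* Location of a logical file address inside a family of member files.
 * Hypothetical helper type, for illustration only. */
typedef struct {
    unsigned long long member;  /* index of the member file, starting at 0 */
    unsigned long long offset;  /* byte offset inside that member file */
} memb_loc;

/* Map a logical address to (member, offset), assuming fixed-size members. */
memb_loc locate(unsigned long long addr, unsigned long long memb_size)
{
    memb_loc loc;

    assert(memb_size > 0);
    loc.member = addr / memb_size;
    loc.offset = addr % memb_size;
    return loc;
}
```

With a member size of 1024 bytes, address 2500 falls in member 2; with a member size of 2048 bytes, the same address falls in member 1.  A library that guesses the member size differently from the one used at creation therefore looks for data in the wrong place.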

 

Below is a code fragment that opens a family file named “family%05d.h5” with a member file size of FAMILY_SIZE.

 

/* Create a file access property list */
if((fapl = H5Pcreate(H5P_FILE_ACCESS)) < 0)
      goto error;

/* Set family driver; set member file size to FAMILY_SIZE.
 * Each member file is accessed with the default property list. */
if(H5Pset_fapl_family(fapl, (hsize_t)FAMILY_SIZE, H5P_DEFAULT) < 0)
      goto error;

/* Open family file with family name template */
if((file = H5Fopen("family%05d.h5", H5F_ACC_RDWR, fapl)) < 0)
      goto error;

 

The library replaces the integer format specifier in the name template “family%05d.h5” with the member number, starting from 0: “family00000.h5”, “family00001.h5”, “family00002.h5”, …
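
The expansion of the name template can be sketched with the standard printf-style formatting.  This is a minimal illustration, assuming the template contains exactly one integer conversion; member_name is a hypothetical helper, not a library function.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Expand a family name template (e.g. "family%05d.h5") for one member.
 * Returns 0 on success, -1 if the expanded name does not fit in `out`.
 * Assumption: `tmpl` contains exactly one integer conversion. */
int member_name(char *out, size_t out_len, const char *tmpl, int memb)
{
    int n;

    assert(out != NULL && tmpl != NULL);
    n = snprintf(out, out_len, tmpl, memb);
    return (n < 0 || (size_t)n >= out_len) ? -1 : 0;
}
```

Calling member_name with members 0, 1, and 2 yields the three file names listed above.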

 

If a user does not know the file is a family file and tries to open the first member file family00000.h5 with the default file access property list, the open fails:

 

      /* Open a family file as normal HDF5 file */

   if((file = H5Fopen("family00000.h5", H5F_ACC_RDWR, H5P_DEFAULT)) < 0)

         goto error;

 

The library behavior for family files is summarized in the following table:

The way to open family file                                                           Library behavior
------------------------------------------------------------------------------------  -----------------------------------------------------------
Open with default property list and 1st member file name                              Fail
Open with default property list and 1st member name, but there is only 1 member file  Succeed (library does not treat this file as a family file)
Open with default property list and correct file name template                        Fail
Open with family driver and correct name template                                     Succeed
Open with family driver but wrong file name template                                  Fail

 

 

Of all the file drivers we have, we are only concerned with the family, multi, and split drivers, because these three drivers make the file physically different: they break up a single output file into multiple files.  For these drivers, whether users choose the right driver is critical.  For the other drivers, such as sec2, stream, core, MPIO, SRB, and GASS, the choice of driver does not affect the output file physically.

 

In the HDF5 file superblock, there is a space reserved for driver information.  It has three fields: driver information size, driver name, and driver information.  Each file driver can have its own encoding and decoding functions to save its information in this space.  This space is optional.  Currently, only the multi driver stores its information in the superblock.
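
As a rough sketch of what a driver's encoding and decoding functions do with these three fields, the code below packs a driver name and one piece of driver information (a family member size) into a byte buffer and reads it back.  The field widths, the byte order, and the function names drvinfo_encode and drvinfo_decode are assumptions made for illustration; they are not the library's on-disk format or API.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define DRV_NAME_LEN 8  /* assumed fixed width of the driver name field */

/* Assumed layout: [info size: 4 bytes LE][driver name: 8 bytes, NUL padded]
 * [driver info: 8-byte LE member size].
 * Returns the number of bytes written, or 0 if the arguments do not fit. */
size_t drvinfo_encode(unsigned char *buf, size_t buf_len,
                      const char *name, uint64_t memb_size)
{
    const uint32_t info_size = 8;
    size_t i, name_len;

    assert(buf != NULL && name != NULL);
    name_len = strlen(name);
    if (name_len > DRV_NAME_LEN || buf_len < 4 + DRV_NAME_LEN + info_size)
        return 0;

    for (i = 0; i < 4; i++)
        buf[i] = (unsigned char)((info_size >> (8 * i)) & 0xff);
    memset(buf + 4, 0, DRV_NAME_LEN);
    memcpy(buf + 4, name, name_len);
    for (i = 0; i < 8; i++)
        buf[4 + DRV_NAME_LEN + i] = (unsigned char)((memb_size >> (8 * i)) & 0xff);
    return 4 + DRV_NAME_LEN + info_size;
}

/* Decode a block written by drvinfo_encode().  `name` must have room for
 * DRV_NAME_LEN + 1 bytes.  Returns 0 on success, -1 on a malformed block. */
int drvinfo_decode(const unsigned char *buf, size_t buf_len,
                   char *name, uint64_t *memb_size)
{
    uint32_t info_size = 0;
    size_t i;

    assert(buf != NULL && name != NULL && memb_size != NULL);
    if (buf_len < 4 + DRV_NAME_LEN)
        return -1;
    for (i = 0; i < 4; i++)
        info_size |= (uint32_t)buf[i] << (8 * i);
    if (info_size != 8 || buf_len < 4 + DRV_NAME_LEN + 8)
        return -1;

    memcpy(name, buf + 4, DRV_NAME_LEN);
    name[DRV_NAME_LEN] = '\0';
    *memb_size = 0;
    for (i = 0; i < 8; i++)
        *memb_size |= (uint64_t)buf[4 + DRV_NAME_LEN + i] << (8 * i);
    return 0;
}
```

Writing the size field first lets a reader skip the block of a driver it does not recognize, which is one reason to keep the size separate from the driver-specific information.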

 

II.                Use Case

 

The user Dan Anov from Denmark has multiple programs on different kinds of platforms accessing his data files.  The datasets are extendable.  In order to support daily backups and Windows systems, he chooses the family file driver to break up his files, with each member smaller than 2GB.  Because the original family member size is not saved in the file, some programs may easily extend member files beyond 2GB.  Because multiple users access his files, some users who have not realized they are dealing with family files have tried to open certain member files with the default or some other file driver and corrupted the files.

 

III.             Suggestion of Changing the Design of File Driver

 

For each of the file drivers we are concerned with, we can save the information necessary for re-opening the file in the driver information block of the file’s superblock.  In this way, when a user opens a file with a certain file driver, our library can check whether the correct driver has been used.  If not, the open fails with an error message indicating the correct driver.

 

For the family driver, the member file size used when the file was created can also be stored in the driver information block.  The library will then check whether the user has passed in the correct member file size.  If not, the open fails with an error message indicating the correct size.  If the user does not know the right size, he or she can use the default value, which is 0.  The name template is also saved for possible future use.  We also add a version number field in case we want to save other information in the superblock in the future.
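
The member size check described above can be sketched as follows.  This is only an illustration of the proposed rule (a requested size of 0 means "accept the stored size"); the function name check_member_size and the wording of the message are hypothetical.

```c
#include <assert.h>
#include <stdio.h>

/* Compare the member size requested through the property list with the size
 * stored in the superblock.  A requested size of 0 is the default and means
 * "use the stored size".  Returns 0 and sets *effective on success; returns
 * -1 and writes an explanatory message into `msg` on a mismatch. */
int check_member_size(unsigned long long stored, unsigned long long requested,
                      unsigned long long *effective, char *msg, size_t msg_len)
{
    assert(effective != NULL && msg != NULL && msg_len > 0);
    msg[0] = '\0';
    if (requested == 0 || requested == stored) {
        *effective = stored;
        return 0;
    }
    snprintf(msg, msg_len,
             "family member size should be %llu, but %llu was specified",
             stored, requested);
    return -1;
}
```

Reporting the stored size in the message gives the user the value needed to retry the open or to repair the file.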

 

If for any reason the actual member size has been modified (for example, by using the Unix tool “cat” to concatenate members) and the file cannot be opened, users can use the tool h5repart to restore the original size.  The original size can be found in the error message.  If h5repart is used, we can let it modify the member size in the superblock.

 

As before, the drivers we discuss here are family, multi, and split.  If users add their own driver to the library, we cannot know whether it changes the file physically, so they have to modify this part of the library themselves to enable this feature.

 

Let us again use the family file driver as an example.  Not knowing the file driver, a user picks the default file access property list to open the first member file:

 

   if((file = H5Fopen("family00000.h5", H5F_ACC_RDWR, H5P_DEFAULT)) < 0)

         goto error;

 

Our library will try to open this file “family00000.h5” with the default sec2 driver.  Once the library finds out from the superblock that the file is a family file, it returns failure with a message indicating that the family driver should be used.  After receiving this error message, the user can use the correct family name template and property list to open the file.
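
The driver check in this flow can be sketched as a comparison between the driver name read from the superblock and the driver the user selected.  The function name check_driver, the use of an empty string for "no driver information stored", and the message text are assumptions for illustration only.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* `stored` is the driver name read from the superblock ("" when the file
 * carries no driver information); `used` is the driver the caller selected.
 * Returns 0 if the open may proceed, -1 with a message in `msg` otherwise. */
int check_driver(const char *stored, const char *used, char *msg, size_t msg_len)
{
    assert(stored != NULL && used != NULL && msg != NULL && msg_len > 0);
    msg[0] = '\0';
    if (stored[0] == '\0' || strcmp(stored, used) == 0)
        return 0;
    snprintf(msg, msg_len,
             "file was created with the %s driver; "
             "it cannot be opened with the %s driver", stored, used);
    return -1;
}
```

In the example above, stored would be "family" and used would be "sec2", so the open fails and the message names the family driver.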

 

With this change, the table of library behavior for the family driver becomes:

The way to open family file                                                           Library behavior
------------------------------------------------------------------------------------  ----------------------------------------------------------
Open with default property list and 1st member file name                              Fail with error message indicating the family driver
Open with default property list and 1st member name, but there is only 1 member file  Fail with error message indicating the family driver
Open with default property list and correct file name template                        Fail with error message indicating the family driver
Open with family driver and correct name template                                     Succeed
Open with family driver and wrong member file size                                    Fail with error message indicating the correct member size
Open with family driver but wrong file name template                                  Fail with error message

 

For each of the cases in the table above, we give an example.  Let the name template be family%05d.h5 and member file size be 1024 bytes.

 

  1. Open with default property list and first member file name:

 

   if((file = H5Fopen("family00000.h5", H5F_ACC_RDWR, H5P_DEFAULT)) < 0)

         goto error;

 

            Fail with error message indicating family driver should be used.

 

  2. Open with default property list and first member name but there is only one member file:

 

Same as the example above.

 

  3. Open with default property list and correct file name template:

 

   if((file = H5Fopen("family%05d.h5", H5F_ACC_RDWR, H5P_DEFAULT)) < 0)

         goto error;

 

            Fail with error message indicating family driver should be used.

 

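  4. Open with family driver and correct name template:

 

   /* Create a file access property list */
   if((fapl = H5Pcreate(H5P_FILE_ACCESS)) < 0)
         goto error;

   /* Select the family driver with the correct member size of 1024 bytes */
   if(H5Pset_fapl_family(fapl, (hsize_t)1024, H5P_DEFAULT) < 0)
         goto error;

   if((file = H5Fopen("family%05d.h5", H5F_ACC_RDWR, fapl)) < 0)
         goto error;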

 

            Succeed.

 

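  5. Open with family driver and wrong member file size:

 

   /* Create a file access property list */
   if((fapl = H5Pcreate(H5P_FILE_ACCESS)) < 0)
         goto error;

   /* Select the family driver with a wrong member size (2048 instead of 1024) */
   if(H5Pset_fapl_family(fapl, (hsize_t)2048, H5P_DEFAULT) < 0)
         goto error;

   if((file = H5Fopen("family%05d.h5", H5F_ACC_RDWR, fapl)) < 0)
         goto error;

 

            Fail with error message indicating the correct member file size.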

 

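  6. Open with family driver and wrong name template:

 

   /* Create a file access property list */
   if((fapl = H5Pcreate(H5P_FILE_ACCESS)) < 0)
         goto error;

   /* Select the family driver with the correct member size of 1024 bytes */
   if(H5Pset_fapl_family(fapl, (hsize_t)1024, H5P_DEFAULT) < 0)
         goto error;

   /* The template "family%d.h5" does not match the member file names */
   if((file = H5Fopen("family%d.h5", H5F_ACC_RDWR, fapl)) < 0)
         goto error;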

 

            Fail with error message.

 

IV.              Backward Compatibility

 

This suggested change is intended for Release 1.8 or a later release.  In this section, we simply assume it will be in Release 1.8.

 

No API change is involved, but there are backward compatibility issues with earlier releases: the library’s behavior may be slightly different.

 

The table below shows the compatibility between Release 1.6 and Release 1.8 for all three drivers: family, multi, and split.

How the file is opened                                 File created with v1.6     File created with v1.8
-----------------------------------------------------  -------------------------  -------------------------
Open with v1.6 using default property list             Fail in unpredictable way  Fail in unpredictable way
Open with v1.6 using correct driver and name template  Succeed                    Succeed
Open with v1.8 using default property list             Fail in unpredictable way  Fail with error message
Open with v1.8 using correct driver and name template  Succeed                    Succeed

 

 

We again use the family driver as an example.  An older version of the library, without the change, returns a failure if a user tries to open a family file with the default file access property list.  The new version of the library, with the change, also returns a failure if a user tries to open, with the default property list, a family file created with an older library, because there is no driver information in the superblock of such a file.

 

The difference appears when the user opens the file successfully with the family driver.  With the older library, the original member file size has been lost, while with the new library that size is stored in and retrieved from the superblock.

 

V.                 Differences Between This and the Previous Proposal

 

In the last document, we proposed to save the driver name, the file name template and member file size in the superblock for the family driver.  The library would be able to choose the right driver, name template and the member file size if the user has not done so. 

 

To simplify the design and avoid the drawbacks of the last design, we decided to save only the driver name and the member file size for checking.  The name template is also saved but not used at this stage.  Once the library finds out that the driver or the member file size is wrong, it simply returns failure with an error message.

 

VI.              Future Features

 

In the future, depending on user requests, we can provide users with some flexibility, such as an option to force a file open even though the member file size or the file name template is wrong.  This could be done through an extra parameter in the property list function that sets the family driver.