A Proposal of File Driver Change
Quincey Koziol and Raymond Lu
Revised on
In this document, we discuss the current library behavior regarding file drivers and propose a change to the internal library design.
I. Current Library Behavior and Design
When a user creates and closes an HDF5 file with a particular file driver, the current library does not save any information about that driver (the multi driver is an exception). When the user re-opens the file, the library relies on the user to pick the right file driver. For the family, multi, or split driver, if the wrong driver is chosen, the library fails in an unpredictable way.
For example, for the family driver, the library does not save the member file size anywhere. The family driver and the member file size are set through the function H5Pset_fapl_family. Our User’s Guide says, “the member size is only used when creating a new file or truncating an existing file; otherwise the member size comes from the size of the first member of the family file being opened”. To open a family file, a user must already know that it is a family file and must open it with the family driver and the correct family name template. Because the original member file size has been lost, the library sets the member file size to the current size of the first member file.
Below is part of the code that opens a family file named “family%05d.h5” with a member file size of FAMILY_SIZE.
    /* Create a file access property list */
    if((fapl = H5Pcreate(H5P_FILE_ACCESS)) < 0)
        goto error;

    /* Set family driver; set member file size to FAMILY_SIZE.
     * Each member file is accessed with the default property list. */
    if(H5Pset_fapl_family(fapl, (hsize_t)FAMILY_SIZE, H5P_DEFAULT) < 0)
        goto error;

    /* Open family file with family name template */
    if((file = H5Fopen("family%05d.h5", H5F_ACC_RDWR, fapl)) < 0)
        goto error;
The library replaces the integer format specifier in the name template “family%05d.h5” with the member number, starting from 0, giving “family00000.h5”, “family00001.h5”, “family00002.h5”, and so on.
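The name expansion described above can be sketched with snprintf. This is only an illustration: member_name is a hypothetical helper, not an HDF5 function, and the rule that a template without a '%' specifier is rejected is an assumption made for the sketch.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper, not an HDF5 API: expand a family name template such
 * as "family%05d.h5" into the name of member number `memb`.
 * Returns 0 on success, -1 if the template has no '%' specifier or the
 * expanded name does not fit in `buf`. */
static int member_name(char *buf, size_t buflen, const char *tmpl, int memb)
{
    int n;

    assert(buf != NULL && tmpl != NULL && memb >= 0);

    /* Without a format specifier every member would get the same name. */
    if (strchr(tmpl, '%') == NULL)
        return -1;

    n = snprintf(buf, buflen, tmpl, memb);
    return (n < 0 || (size_t)n >= buflen) ? -1 : 0;
}
```

Calling member_name with the template “family%05d.h5” and member numbers 0, 1, 2 yields the three file names listed above.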
If a user does not know that the file is a family file and tries to open the first member file, family00000.h5, with the default file access property list, the open fails:

    /* Open a family file as normal HDF5 file */
    if((file = H5Fopen("family00000.h5", H5F_ACC_RDWR, H5P_DEFAULT)) < 0)
        goto error;
We can summarize the library behavior for family files in the following table.

| The way to open family file | Library behavior |
|---|---|
| Open with default property list and 1st member file name | Fail |
| Open with default property list and 1st member name, but there’s only 1 member file | Succeed (library treats this file not as family file) |
| Open with default property list and correct file name template | Fail |
| Open with family driver and correct name template | Succeed |
| Open with family driver but wrong file name template | Fail |
Of all the file drivers we have, we are only concerned with the family, multi, and split drivers, because these three drivers make the file physically different: they break up a single output file into multiple files. Choosing the right driver is therefore critical. For the other drivers, such as sec2, stream, core, MPIO, SRB, and GASS, the choice of driver does not affect the output file physically.
In the HDF5 file superblock, there is space reserved for driver information. It has three fields: driver information size, driver name, and driver information. Each file driver can have its own encoding and decoding functions to save its information in this space. This space is optional. Currently, only the multi driver stores its information in the superblock.
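To make the three fields concrete, here is a minimal sketch of an encode and decode pair. It is not the library's code, and the byte layout is an assumption made for illustration: a 4-byte little-endian size, an 8-byte NUL-padded driver name, then the driver-specific bytes. The names drv_block_t, drv_block_encode, and drv_block_decode are hypothetical.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define DRV_NAME_LEN 8 /* assumed fixed width of the driver name field */

/* Hypothetical in-memory view of the driver information space. */
typedef struct {
    uint32_t       info_size;              /* size of `info` in bytes     */
    char           name[DRV_NAME_LEN + 1]; /* driver name, NUL-terminated */
    const uint8_t *info;                   /* driver-specific bytes       */
} drv_block_t;

/* Encode as: 4-byte little-endian size, 8-byte NUL-padded name, info bytes.
 * Returns the number of bytes written, or 0 if `out` is too small. */
static size_t drv_block_encode(const drv_block_t *b, uint8_t *out, size_t outlen)
{
    size_t need, namelen;
    int i;

    assert(b != NULL && out != NULL);
    need = 4 + DRV_NAME_LEN + (size_t)b->info_size;
    if (outlen < need)
        return 0;

    for (i = 0; i < 4; i++)
        out[i] = (uint8_t)(b->info_size >> (8 * i));

    namelen = strlen(b->name);
    if (namelen > DRV_NAME_LEN)
        namelen = DRV_NAME_LEN;
    memset(out + 4, 0, DRV_NAME_LEN);
    memcpy(out + 4, b->name, namelen);

    if (b->info_size > 0)
        memcpy(out + 4 + DRV_NAME_LEN, b->info, b->info_size);
    return need;
}

/* Decode the layout written by drv_block_encode(). On success `b->info`
 * points into `in`. Returns 0 on success, -1 if the buffer is truncated. */
static int drv_block_decode(const uint8_t *in, size_t inlen, drv_block_t *b)
{
    int i;

    assert(in != NULL && b != NULL);
    if (inlen < 4 + DRV_NAME_LEN)
        return -1;

    b->info_size = 0;
    for (i = 0; i < 4; i++)
        b->info_size |= (uint32_t)in[i] << (8 * i);
    if (inlen - (4 + DRV_NAME_LEN) < b->info_size)
        return -1;

    memcpy(b->name, in + 4, DRV_NAME_LEN);
    b->name[DRV_NAME_LEN] = '\0';
    b->info = in + 4 + DRV_NAME_LEN;
    return 0;
}
```

A driver that does not need this space would simply write nothing, which matches the statement that the space is optional.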
II. Use Case
For the user Dan Anov from
III. Suggestion of Changing the Design of File Driver
For each of the file drivers we are concerned with, we can save the information necessary for re-opening the file in the driver information block of the file’s superblock. Then, when a user opens a file with a certain file driver, the library checks whether the correct driver has been used. If not, the open fails with an error message indicating the correct driver.
For the family driver, the member file size used at creation can also be stored in the driver information block. The library will also check whether the user has passed in the correct member file size. If not, the open fails with an error message indicating the correct size. If the user does not know the right size, he or she can pass the default value, which is 0. The name template is also saved for possible future use. We also add a version number field in case we want to save other information in the superblock in the future.
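The member size rule above can be sketched as a small check. This is an illustration under stated assumptions, not the library's implementation: check_member_size is a hypothetical name, the stored size is assumed to have already been read from the superblock, and the message wording is invented.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical check, not library code: compare the member size passed by
 * the user with the size recorded in the superblock.
 * A requested size of 0 is the default and means "use the stored size".
 * Returns 0 if the open may proceed; otherwise returns -1 and writes a
 * message naming the correct size into `msg`. */
static int check_member_size(uint64_t requested, uint64_t stored,
                             char *msg, size_t msglen)
{
    assert(msg != NULL && msglen > 0);

    if (requested == 0 || requested == stored) {
        msg[0] = '\0';
        return 0;
    }

    snprintf(msg, msglen,
             "wrong member size %llu; the family member size should be %llu",
             (unsigned long long)requested, (unsigned long long)stored);
    return -1;
}
```

With a stored size of 1024, passing 1024 or 0 is accepted, while passing 2048 is rejected with a message that names 1024.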
If, for any reason, the actual member size has been changed (for example, by using the Unix tool “cat” to concatenate member files) and the file cannot be opened, users can use the tool h5repart to restore the original size. The original size can be found in the error message. If h5repart is used, we can let it modify the member size in the superblock.
Again, the drivers we discuss here are family, multi, and split. If a user adds his or her own driver to the library, we cannot know whether it changes the file physically, so the user has to modify this part of the library to enable this feature.
Let us again use the family file driver as an example. Not knowing the file driver, a user picks the default file access property list to open the first member file:

    if((file = H5Fopen("family00000.h5", H5F_ACC_RDWR, H5P_DEFAULT)) < 0)
        goto error;
Our library will try to open the file “family00000.h5” with the default sec2 driver. Once the library finds out from the superblock that the file is a family file, it returns failure with a message indicating that the family driver should be used. After receiving this error message, the user can use the correct family name template and property list to open the file.
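The driver check in this scenario can be sketched as a comparison between the driver name recorded in the superblock and the driver the user selected. The function check_driver, the plain driver names "family" and "sec2", and the message text are assumptions for illustration only; a file without driver information is treated here as "nothing to check".

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical check, not library code.
 * `stored` is the driver name read from the superblock, or NULL if the
 * file carries no driver information. `selected` is the driver chosen
 * through the file access property list.
 * Returns 0 if no mismatch is detected; otherwise returns -1 and writes
 * a message naming the correct driver into `msg`. */
static int check_driver(const char *stored, const char *selected,
                        char *msg, size_t msglen)
{
    assert(selected != NULL && msg != NULL && msglen > 0);

    msg[0] = '\0';
    if (stored == NULL || strcmp(stored, selected) == 0)
        return 0;

    snprintf(msg, msglen,
             "file was opened with the %s driver; the %s driver should be used",
             selected, stored);
    return -1;
}
```

In the example above, `stored` would be the family driver's name and `selected` the default sec2 driver, so the check fails and the message points the user to the family driver.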
We can redraw the table of library behavior for the family driver as follows.

| The way to open family file | Library behavior |
|---|---|
| Open with default property list and 1st member file name | Fail with error message indicating family driver |
| Open with default property list and 1st member name, but there’s only 1 member file | Fail with error message indicating family driver |
| Open with default property list and correct file name template | Fail with error message indicating family driver |
| Open with family driver and correct name template | Succeed |
| Open with family driver and wrong file size | Fail with error message indicating wrong size |
| Open with family driver but wrong file name template | Fail with error message |
For each of the cases in the table above, we give an example. Let the name template be family%05d.h5 and the member file size be 1024 bytes.

Open with default property list and 1st member file name:

    if((file = H5Fopen("family00000.h5", H5F_ACC_RDWR, H5P_DEFAULT)) < 0)
        goto error;

Fail with error message indicating family driver should be used.

Open with default property list and 1st member name, but there’s only 1 member file:

Same as the example above.

Open with default property list and correct file name template:

    if((file = H5Fopen("family%05d.h5", H5F_ACC_RDWR, H5P_DEFAULT)) < 0)
        goto error;

Fail with error message indicating family driver should be used.

Open with family driver and correct name template:

    if((fapl = H5Pcreate(H5P_FILE_ACCESS)) < 0)
        goto error;
    if(H5Pset_fapl_family(fapl, (hsize_t)1024, H5P_DEFAULT) < 0)
        goto error;
    if((file = H5Fopen("family%05d.h5", H5F_ACC_RDWR, fapl)) < 0)
        goto error;

Succeed.

Open with family driver and wrong file size:

    if((fapl = H5Pcreate(H5P_FILE_ACCESS)) < 0)
        goto error;
    if(H5Pset_fapl_family(fapl, (hsize_t)2048, H5P_DEFAULT) < 0)
        goto error;
    if((file = H5Fopen("family%05d.h5", H5F_ACC_RDWR, fapl)) < 0)
        goto error;

Fail with error message indicating correct file size.

Open with family driver but wrong file name template:

    if((fapl = H5Pcreate(H5P_FILE_ACCESS)) < 0)
        goto error;
    if(H5Pset_fapl_family(fapl, (hsize_t)1024, H5P_DEFAULT) < 0)
        goto error;
    if((file = H5Fopen("family%d.h5", H5F_ACC_RDWR, fapl)) < 0)
        goto error;

Fail with error message.
IV. Backward Compatibility
This suggested change is intended only for Release 1.8 or a later release. In this section, we simply assume it will be in Release 1.8.
No API change is involved, but there will be backward compatibility issues with earlier releases: the library’s behavior may be slightly different.
The table below shows the compatibility between Releases 1.6 and 1.8 for all three drivers: family, multi, and split.
| | Open with v1.6 using default property list | Open with v1.6 using correct driver and name template | Open with v1.8 using default property list | Open with v1.8 using correct driver and name template |
|---|---|---|---|---|
| File created with v1.6 | Fail in unpredictable way | Succeed | Fail in unpredictable way | Succeed |
| File created with v1.8 | Fail in unpredictable way | Succeed | Fail with error message | Succeed |
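The compatibility table can also be read as a small decision function. The sketch below only restates the table; the type and function names are hypothetical, and `correct_driver` stands for "opened with the correct driver and name template".

```c
#include <assert.h>
#include <stdbool.h>

typedef enum { LIB_V16, LIB_V18 } lib_version_t;

typedef enum {
    OPEN_SUCCEED,
    OPEN_FAIL_UNPREDICTABLE,
    OPEN_FAIL_WITH_MESSAGE
} open_outcome_t;

/* Hypothetical restatement of the compatibility table above. */
static open_outcome_t open_outcome(lib_version_t created_with,
                                   lib_version_t opened_with,
                                   bool correct_driver)
{
    assert(created_with == LIB_V16 || created_with == LIB_V18);
    assert(opened_with == LIB_V16 || opened_with == LIB_V18);

    /* The correct driver and name template succeed in every cell. */
    if (correct_driver)
        return OPEN_SUCCEED;

    /* A clean error message requires driver information in the file
     * (written by v1.8) and a library that checks it (v1.8). */
    if (created_with == LIB_V18 && opened_with == LIB_V18)
        return OPEN_FAIL_WITH_MESSAGE;

    return OPEN_FAIL_UNPREDICTABLE;
}
```

Only one cell changes relative to the current behavior: a file created with v1.8 and opened with v1.8 using the default property list.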
We again use the family driver as an example. An older version of the library, without the change, returns a failure if a user tries to open a family file with the default file access property list. The new version of the library, with the change, also returns a failure if a user tries to open, with the default file access property list, a family file created with an older library, because there is no driver information in the superblock.
However, there is a difference when the user opens the file successfully with the family driver. With the older library, the original member file size has been lost, while with the new library that size is stored in and retrieved from the superblock.
V. Difference Between This and the Last Proposals
In the last document, we proposed saving the driver name, the file name template, and the member file size in the superblock for the family driver. The library would then be able to choose the right driver, name template, and member file size if the user had not done so.
To simplify the design and avoid the drawbacks of the last design, we decided to save only the driver name and the member file size for checking. The name template is also saved but not used at this stage. Once the library finds out that the driver or the member file size is wrong, it simply returns failure with an error message.
VI. Future Features
In the future, depending on user requests, we can provide some flexibility, such as an option to force a file open even though the member file size or the file name template is wrong. This could be done through an extra parameter in the property list function that sets the family driver.