This section describes the functional requirements of the Parallel HDF5 (PHDF5) software and the assumed system requirements. Section 2 describes the programming model of the PHDF5 interface. Section 3 presents several sample PHDF5 programs.
Parallel HDF5 is designed to meet the following functional requirement: it must support parallel access to HDF5 files in the MPI environment. MPI is a standard interface for the distributed-memory parallel computing environment, in which inter-process communication is done by message passing. The MPI standard documents are available at http://www.mpi-forum.org. Other related MPI information, such as tutorials and implementations, can be found at http://www.mcs.anl.gov/Projects/mpi/.
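For readers unfamiliar with MPI, the following is a minimal sketch of the MPI program structure that every PHDF5 program assumes; the printf is illustrative only.

#include <mpi.h>
#include <stdio.h>

int main(int argc, char *argv[])
{
    int mpi_rank, mpi_size;

    /* Every PHDF5 program runs inside an MPI environment. */
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &mpi_rank);  /* this process's rank  */
    MPI_Comm_size(MPI_COMM_WORLD, &mpi_size);  /* number of processes  */

    printf("process %d of %d\n", mpi_rank, mpi_size);

    /* ... create property lists, open files, and access datasets here ... */

    MPI_Finalize();
    return 0;
}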
The following discussion describes the programming model of HDF5 specific to the MPI environment. For a more general and complete understanding of the HDF5 library, consult the documents and user guide at http://hdf.ncsa.uiuc.edu/HDF5.
HDF5 uses property lists to control the file access mechanism. The general model for accessing an HDF5 file in parallel consists of the following steps:
Each process of the MPI communicator creates an access property list and, via H5Pset_mpi, sets it up with the MPI information (communicator, info object) required by MPI_File_open, as defined in MPI-2. Note that H5Pset_mpi does not make duplicates of the communicator or the info object; the PHDF5 library duplicates them when an HDF5 file is opened. Therefore, any changes to the communicator or info object will affect the H5Fcreate/H5Fopen calls that follow the changes. Users are advised not to change the communicator or the info object after the H5Pset_mpi call. (From this point on, processes are limited to those that are members of the communicator given in the H5Pset_mpi call.)
Example:
/* Set up the file access property list with parallel I/O access;
   comm and info are the MPI communicator and info object. */
hid_t acc_pl = H5Pcreate(H5P_FILE_ACCESS);
H5Pset_mpi(acc_pl, comm, info);
All processes of the MPI communicator open an HDF5 file by a collective call (H5Fcreate or H5Fopen) with the access property list. The call must be collective because the underlying MPI_File_open() is a collective call.
Example:
/* Create the file collectively. */
hid_t fid = H5Fcreate("filexyz", H5F_ACC_TRUNC, H5P_DEFAULT, acc_pl);
All processes of the MPI communicator open a dataset by a collective call (H5Dcreate or H5Dopen). This version supports only collective dataset open. The call must be collective because all processes need common knowledge of the dataset object being accessed; this allows cooperative changes to the dataset object later. A future version may support datasets opened by a subset of the processes that have opened the file.
Example:
/* Create a 512x1024 dataset. */
hsize_t dims[2] = {512, 1024};
hid_t sid = H5Screate_simple(2, dims, NULL);
hid_t dataset = H5Dcreate(fid, "dataset1", H5T_NATIVE_INT, sid, H5P_DEFAULT);
Each process may perform an arbitrary number of independent data I/O accesses by independent calls (H5Dread or H5Dwrite) to the dataset, with the transfer property list set for independent access. (The default transfer mode is independent transfer.)
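For illustration, here is a minimal sketch of an independent write, assuming the 512x1024 dataset created above, that 512 divides evenly among the processes, and that <stdlib.h> is included for malloc; mpi_rank and mpi_size come from the MPI skeleton earlier, and the hyperslab call is shown with its modern hsize_t signature.

/* Each process writes its own block of rows, independently of the others. */
hsize_t count[2]  = {512 / mpi_size, 1024};    /* rows per process          */
hsize_t offset[2] = {mpi_rank * count[0], 0};  /* where this process starts */
int *data = malloc(count[0] * count[1] * sizeof(int));
/* ... fill data here ... */

/* Select this process's rows in the file, and describe the memory buffer. */
hid_t file_space = H5Dget_space(dataset);
H5Sselect_hyperslab(file_space, H5S_SELECT_SET, offset, NULL, count, NULL);
hid_t mem_space = H5Screate_simple(2, count, NULL);

/* H5P_DEFAULT requests the default, independent transfer mode. */
H5Dwrite(dataset, H5T_NATIVE_INT, mem_space, file_space, H5P_DEFAULT, data);

H5Sclose(mem_space);
H5Sclose(file_space);
free(data);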
If the dataset has an unlimited dimension and an H5Dwrite would write data beyond the current dimension size of the dataset, all processes that have opened the dataset must first make a collective call (H5Dallocate) to allocate more space for the dataset before the independent H5Dwrite call. The reason is that when data is written beyond the current dimension size, that dimension size must be increased to hold the new data. Changing the dimension size of a dataset is a structural change to the object and must be done by all processes.
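A sketch of this rule follows; the exact signature of H5Dallocate is an assumption here (this document names the call but not its arguments; later HDF5 releases expose this operation as H5Dextend/H5Dset_extent).

/* The dataset was created with an unlimited first dimension.  Before any
 * process writes beyond the current size, ALL processes collectively
 * allocate the larger extent (signature assumed). */
hsize_t new_dims[2] = {1024, 1024};   /* grow from 512 to 1024 rows       */
H5Dallocate(dataset, new_dims);       /* collective: every process calls it */

/* After the collective allocation, each process may write its own part
 * of the new rows with an independent H5Dwrite, as in the previous step. */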
All processes that have opened the dataset may perform collective data I/O by collective calls (H5Dread or H5Dwrite) to the dataset, with the transfer property list set for collective access. Pre-allocation (H5Dallocate) is not needed for unlimited-dimension datasets, since the H5Dallocate call, if needed, is done internally by the collective data access call. Though all collective accesses can be replaced with independent accesses by each process, collective access can provide better performance when the equivalent independent accesses would result in small fragments. A simple example is a two-dimensional dataset stored in row-major order. If each process needs to access the data by columns, individual independent accesses would result in multiple uncoordinated accesses to the dataset, each access segment the size of the column width. But if all processes access the dataset with one collective call, the library, with the extra information about the access pattern, can combine the small accesses into bigger I/O accesses and use gather/scatter to transfer data between the processes.
Changes to attributes can occur only at the main process (process 0). Read-only access to attributes can occur independently in each process that has opened the dataset.
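As an illustrative sketch of this rule (the attribute name and value are hypothetical, and H5Acreate is shown with its original five-argument signature):

hsize_t one = 1;
hid_t attr_space = H5Screate_simple(1, &one, NULL);
double scale = 1.5;               /* hypothetical attribute value */

if (mpi_rank == 0) {
    /* Only the main process creates and writes the attribute. */
    hid_t attr0 = H5Acreate(dataset, "scale", H5T_NATIVE_DOUBLE,
                            attr_space, H5P_DEFAULT);
    H5Awrite(attr0, H5T_NATIVE_DOUBLE, &scale);
    H5Aclose(attr0);
}
MPI_Barrier(MPI_COMM_WORLD);      /* wait until the attribute exists */

/* Every process may read the attribute independently. */
hid_t attr = H5Aopen_name(dataset, "scale");
H5Aread(attr, H5T_NATIVE_DOUBLE, &scale);
H5Aclose(attr);
H5Sclose(attr_space);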
All processes that have opened the dataset must close the dataset by a collective call (H5Dclose). The call must be collective so that all processes have the same knowledge that the dataset is no longer being accessed.
All processes that have opened the file must close the file by a collective call (H5Fclose). The call must be collective because the underlying MPI_File_close() is a collective call.
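Putting the two closing steps together, a typical shutdown sequence looks like this (a sketch using the variable names from the examples above):

/* Collective: all processes close the dataset, then the file. */
H5Dclose(dataset);
H5Sclose(sid);
H5Fclose(fid);        /* invokes the collective MPI_File_close() */
H5Pclose(acc_pl);     /* release the access property list        */

MPI_Finalize();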
The following are examples of code using the parallel HDF5 API. The examples share a common main program and the testphdf5.h header file.
This example shows how to open two HDF5 files with two different communicators containing two groups of processes.
This example shows how to create a fixed-dimension dataset. Each process then writes and reads data to and from part of the dataset independently of the other processes.
This example shows how to create a fixed-dimension dataset. All processes then write and read data to and from the dataset in collective mode.
Example: Independent access to extendible dataset. This example shows how to create an extendible-dimension dataset. All processes then collectively extend the size of the dataset, after which each process writes and reads data to and from part of the dataset independently of the other processes. A minimal sketch of this pattern follows.
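This sketch assumes later HDF5 releases' H5Dextend as the collective extension call (this document's design name for the operation is H5Dallocate) and a hypothetical chunk size, since extendible datasets must be chunked.

/* Create a dataset whose first dimension is extendible. */
hsize_t dims[2]    = {512, 1024};
hsize_t maxdims[2] = {H5S_UNLIMITED, 1024};   /* first dimension extendible */
hsize_t chunk[2]   = {64, 1024};              /* chunk size illustrative    */

hid_t sid  = H5Screate_simple(2, dims, maxdims);
hid_t dcpl = H5Pcreate(H5P_DATASET_CREATE);
H5Pset_chunk(dcpl, 2, chunk);                 /* chunking is required       */
hid_t dset = H5Dcreate(fid, "extendible", H5T_NATIVE_INT, sid, dcpl);

/* All processes collectively extend the dataset ... */
hsize_t new_dims[2] = {1024, 1024};
H5Dextend(dset, new_dims);

/* ... then each process writes its part of the new rows independently,
 * selecting a hyperslab as in the earlier independent-access sketch. */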