Parallel HDF5 Design

1. Design Overview

This section describes the functional requirements of the Parallel HDF5 (PHDF5) software and the assumed system requirements. Section 2 describes the programming model of the PHDF5 interface. Section 3 presents several sample PHDF5 programs.

1.1 Functional requirements

Parallel HDF5 is designed to meet the following functional requirements:

1.2. Design Specification

2. Programming Model

PHDF5 supports parallel access to HDF5 files in the MPI environment. MPI is a standard interface for the distributed memory parallel computing environment in which inter-process communication is done by message passing. The MPI standard documents are available at http://www.mpi-forum.org. Other related MPI information, such as tutorials and implementations, can be found at http://www.mcs.anl.gov/Projects/mpi/.

The following discussion describes the programming model of HDF5 specific to the MPI environment. For a general and more complete understanding of the HDF5 library, one may consult the documents and user guide at http://hdf.ncsa.uiuc.edu/HDF5.

HDF5 uses property lists to control the file access mechanism. The general model for accessing an HDF5 file in parallel consists of the following steps:

2.1. Set up the access property list

Each process of the MPI communicator creates an access property list via H5Pset_mpi and sets it up with MPI information (communicator, info object), as required by MPI_File_open as defined in MPI-2. Note that H5Pset_mpi does not duplicate the communicator or the info object; the PHDF5 library duplicates them when an HDF5 file is opened. Therefore, any changes to the communicator or info object affect the H5Fcreate/H5Fopen calls made after the changes. Users are advised not to modify the communicator or the info object after the H5Pset_mpi call.

(From this point on, processes are limited to those that are members of the communicator defined in the H5Pset_mpi call.)

Example:

/* Set up the file access property list with parallel I/O access. */
acc_pl = H5Pcreate(H5P_FILE_ACCESS);
H5Pset_mpi(acc_pl, comm, info);

2.2. File create/open

All processes of the MPI communicator open an HDF5 file by a collective call (H5Fcreate or H5Fopen) with the access property list. The call must be collective because the underlying MPI_File_open() is a collective call.

Example:

/* Create the file collectively. */
fid = H5Fcreate("filexyz", H5F_ACC_TRUNC, H5P_DEFAULT, acc_pl);

2.3. Dataset create/open

All processes of the MPI communicator open a dataset by a collective call (H5Dcreate or H5Dopen). This version supports only collective dataset open. The call must be collective because all processes need common knowledge of the dataset object being accessed; this allows cooperative changes to the dataset object later. A future version may support datasets opened by a subset of the processes that have opened the file.

Example:

/* Create a 512x1024 dataset. */
hsize_t dims[2] = {512, 1024};
sid = H5Screate_simple(2, dims, NULL);
dataset = H5Dcreate(fid, "dataset1", H5T_NATIVE_INT, sid, H5P_DEFAULT);

2.4. Dataset access

2.4.1. Independent dataset access

Each process may make an arbitrary number of independent data I/O accesses via independent calls (H5Dread or H5Dwrite) to the dataset, with the transfer property list set for independent access. (The default transfer mode is independent.)

If the dataset has an unlimited dimension and if the H5Dwrite is writing data beyond the current dimension size of the dataset, all processes that have opened the dataset must make a collective call (H5Dallocate) to allocate more space for the dataset before the independent H5Dwrite call. The reason is that when data is written beyond the current dimension size, that dimension size must be increased to hold the new data. Changing the dimension size of a dataset is a structural change of the object and must be done by all processes.
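The allocate-then-write sequence described above can be sketched as follows. This is only a sketch: H5Dallocate is the collective allocation call named in this design (its exact signature is assumed here), and the dataset handle, dataspaces, sizes, and buffer are hypothetical.

```
/* Sketch: collectively grow an unlimited-dimension dataset, then write
 * independently.  The new_dims values are hypothetical. */
hsize_t new_dims[2] = {1024, 1024};   /* beyond the current first-dim size */

/* Collective: every process that has opened the dataset participates. */
H5Dallocate(dataset, new_dims);

/* Independent: each process writes its own portion (selections assumed
 * set up beforehand in mem_sid/file_sid) with the default transfer mode. */
H5Dwrite(dataset, H5T_NATIVE_INT, mem_sid, file_sid, H5P_DEFAULT, my_data);
```

The collective call establishes the new dimension size in all processes before any of them writes past the old boundary.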

2.4.2. Collective dataset access

All processes that have opened the dataset may do collective data I/O access by collective calls (H5Dread or H5Dwrite) to the dataset, with the transfer property list set for collective access. Pre-allocation (H5Dallocate) is not needed for unlimited-dimension datasets, since the H5Dallocate call, if needed, is done internally by the collective data access call.

Though all collective accesses can be replaced with independent accesses by each process, collective accesses can provide better performance when the equivalent independent accesses would result in small fragments. A simple example is a two-dimensional dataset stored in row-major order. When each process needs to access the data by columns, individual independent access by each process results in many uncoordinated accesses to the dataset, each access segment only the size of the column width. But if all processes access the dataset with one collective call, the library, with the extra information about the access pattern, can combine the small accesses into bigger I/O accesses and use gather/scatter to transfer data among the processes.
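The column-access pattern above might look as follows. This is a sketch only: the selections and buffer are hypothetical, and the collective-transfer property call shown (H5Pset_dxpl_mpio with H5FD_MPIO_COLLECTIVE) is the name used in later HDF5 releases, assumed here in place of whatever this design finally adopts.

```
/* Sketch: each process reads one column of a row-major 2-D dataset with a
 * single collective call.  The hyperslab selection picks column mpi_rank. */
hsize_t start[2] = {0, mpi_rank};     /* begin at row 0, my column */
hsize_t count[2] = {512, 1};          /* whole column, one element wide */

H5Sselect_hyperslab(file_sid, H5S_SELECT_SET, start, NULL, count, NULL);

xfer_pl = H5Pcreate(H5P_DATASET_XFER);
H5Pset_dxpl_mpio(xfer_pl, H5FD_MPIO_COLLECTIVE);   /* later-API name */

/* Collective: all processes call H5Dread together; the library can
 * coalesce the per-column fragments into larger I/O requests. */
H5Dread(dataset, H5T_NATIVE_INT, mem_sid, file_sid, xfer_pl, col_buf);
```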

2.4.3. Dataset attributes access

Changes to attributes can occur only at the main process (process 0). Read-only access to attributes can occur independently in each process that has opened the dataset.

2.5. Dataset close

All processes that have opened the dataset must close the dataset by a collective call (H5Dclose). The call must be collective so that all processes have the same knowledge that the dataset is no longer being accessed.

2.6. File close

All processes that have opened the file must close the file by a collective call (H5Fclose). The call must be collective because the underlying MPI_File_close() is a collective call.

3. Parallel HDF5 Example

The following are examples of code using the parallel HDF5 API, drawn from the PHDF5 test program (its main program and the testphdf5.h header).

3.1. Opening multiple HDF5 files with different communicators

This example shows how to open two HDF5 files with two different communicators containing two groups of processes.

Example: Multi-open
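A minimal sketch of this pattern follows. It assumes two groups formed by splitting MPI_COMM_WORLD on rank parity; the file names and variable names are hypothetical.

```
/* Sketch: split MPI_COMM_WORLD into two groups; each group opens its own
 * HDF5 file with its own communicator. */
MPI_Comm split_comm;
int      mpi_rank, color;

MPI_Comm_rank(MPI_COMM_WORLD, &mpi_rank);
color = mpi_rank % 2;                  /* two groups: even and odd ranks */
MPI_Comm_split(MPI_COMM_WORLD, color, mpi_rank, &split_comm);

acc_pl = H5Pcreate(H5P_FILE_ACCESS);
H5Pset_mpi(acc_pl, split_comm, MPI_INFO_NULL);

/* Collective within split_comm only: each group creates a different file. */
fid = H5Fcreate(color ? "file_odd" : "file_even",
                H5F_ACC_TRUNC, H5P_DEFAULT, acc_pl);
```

Because the communicator recorded in the access property list is split_comm, H5Fcreate is collective only over that group, so the two opens proceed concurrently.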

3.2. Accessing a dataset via independent transfer mode

This example shows how to create a fixed dimension dataset. Each process then writes and reads data to and from part of the dataset independent of other processes.

Example: Independent access
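The independent-access pattern might be sketched as follows, assuming a 512x1024 dataset divided evenly into row blocks; mpi_size, the dataspaces, and the buffer are hypothetical.

```
/* Sketch: each process writes its own block of rows of a fixed-size
 * dataset independently. */
hsize_t start[2] = {mpi_rank * (512 / mpi_size), 0};   /* my slab of rows */
hsize_t count[2] = {512 / mpi_size, 1024};

mem_sid = H5Screate_simple(2, count, NULL);
H5Sselect_hyperslab(file_sid, H5S_SELECT_SET, start, NULL, count, NULL);

/* Independent: the default transfer property list implies independent
 * access, so no coordination with other processes occurs here. */
H5Dwrite(dataset, H5T_NATIVE_INT, mem_sid, file_sid, H5P_DEFAULT, my_rows);
```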

3.3. Accessing a dataset via collective transfer mode

This example shows how to create a fixed dimension dataset. All processes then write and read data to and from the dataset in the collective mode.

Example: Collective access
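The collective counterpart differs from the independent case only in the transfer property list; a sketch, with the selections assumed set up as in the independent example and H5Pset_dxpl_mpio used as a stand-in later-API name for requesting collective transfer.

```
/* Sketch: all processes write their slabs with one collective call.
 * Only the transfer property list differs from the independent case. */
xfer_pl = H5Pcreate(H5P_DATASET_XFER);
H5Pset_dxpl_mpio(xfer_pl, H5FD_MPIO_COLLECTIVE);   /* later-API name */

H5Dwrite(dataset, H5T_NATIVE_INT, mem_sid, file_sid, xfer_pl, my_rows);
H5Pclose(xfer_pl);
```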

3.4. Accessing an extendible dimension dataset

This example shows how to create an extendible dimension dataset. All processes then collectively extend the size of the dataset. Then each process writes and reads data to and from part of the dataset independent of other processes.

Example: Independent access to extendible dataset
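The extendible-dataset pattern can be sketched as follows. H5Dallocate is the collective allocation call named in this design (signature assumed); create_pl stands for a creation property list set up for an extendible layout, and all sizes and buffers are hypothetical.

```
/* Sketch: create an extendible dataset, extend it collectively, then
 * write independently. */
hsize_t dims[2]     = {0, 1024};
hsize_t max_dims[2] = {H5S_UNLIMITED, 1024};
hsize_t new_dims[2] = {512, 1024};

sid = H5Screate_simple(2, dims, max_dims);
dataset = H5Dcreate(fid, "extendible", H5T_NATIVE_INT, sid, create_pl);

/* Collective: all processes agree on the new size before anyone writes. */
H5Dallocate(dataset, new_dims);

/* Independent: each process writes its own rows as in Section 3.2. */
H5Dwrite(dataset, H5T_NATIVE_INT, mem_sid, file_sid, H5P_DEFAULT, my_rows);
```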


Comments and questions: hdfparallel@ncsa.uiuc.edu
Last modified: 29 Dec 1998