Background:
All of HDF5's I/O operations that involve storing or retrieving metadata in the file are performed through the HDF5 "metadata cache". This central location coordinates access to all HDF5 metadata and enforces rules about metadata creation and access. The metadata cache provides deserialized metadata objects to other parts of the HDF5 library, either by reading metadata from the file and deserializing it into a metadata object, or by returning an already deserialized object that it has cached from a prior use. When other parts of the HDF5 library are finished using a metadata object, they release it back to the metadata cache, which may hold it for future use.
Eventually, as the limits of the cache are reached, metadata objects that haven't been used recently are evicted from the metadata cache. If a metadata object has been modified, the metadata cache serializes it and writes the serial form back to the HDF5 file. Unmodified metadata objects are destroyed without accessing the file.
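The eviction behavior described above can be sketched as a small write-back cache. This is an illustrative model only, not HDF5's actual implementation; the class, method names, and `backing_store` dictionary are invented for the example (the `protect`/`unprotect` names merely echo the acquire/release idea described above).

```python
# Toy model of a write-back metadata cache: on eviction, dirty entries
# are "serialized" and written back to the backing store, while clean
# entries are simply discarded.  (Illustrative only -- not HDF5 code.)
import copy
from collections import OrderedDict

class ToyMetadataCache:
    def __init__(self, capacity, backing_store):
        self.capacity = capacity
        self.store = backing_store          # stands in for the HDF5 file
        self.entries = OrderedDict()        # addr -> [object, dirty flag]

    def protect(self, addr):
        """Fetch a metadata object, reading from the 'file' on a miss."""
        if addr in self.entries:
            self.entries.move_to_end(addr)  # mark as recently used
        else:
            self._make_room()
            # copy on read, so write-back is observable in this toy model
            self.entries[addr] = [copy.deepcopy(self.store[addr]), False]
        return self.entries[addr][0]

    def unprotect(self, addr, dirty=False):
        """Release a metadata object back to the cache."""
        if dirty:
            self.entries[addr][1] = True

    def _make_room(self):
        """Evict least-recently-used entries until there is room."""
        while len(self.entries) >= self.capacity:
            addr, (obj, dirty) = self.entries.popitem(last=False)
            if dirty:
                self.store[addr] = obj      # modified: write back
            # unmodified: discarded without touching the "file"

# Usage: modify one object, then force its eviction.
file = {0: {"type": "root group"},
        1: {"type": "dataset header"},
        2: {"type": "b-tree node"}}
cache = ToyMetadataCache(capacity=2, backing_store=file)

obj = cache.protect(0)
obj["link_count"] = 3                       # pretend we modified it
cache.unprotect(0, dirty=True)
cache.protect(1)
cache.unprotect(1)
cache.protect(2)                            # evicts entry 0, writing it back
```

After the last `protect`, entry 0 (dirty) has been written back to the backing store, while entry 1 would simply be discarded if evicted, since it was never dirtied.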
HDF5 API Calls with an MPI Application:
When an MPI application creates or modifies metadata in an HDF5 file, all processes must perform the HDF5 API call collectively:
- Why? - Because there is no "central" coordinating agent in the HDF5 library that controls space allocation in the file, the space allocation state on all processes must be kept in sync by collectively performing every operation that might allocate or free space in the file. Additionally, there is no mechanism for locking a group or dataset in an HDF5 file while operating on it; if different MPI processes modified the same object independently, their changes could overwrite each other or corrupt the file.
- Outcome: Because all operations that create or modify metadata must be performed collectively, all processes in the MPI application will have identical "dirty" metadata information in their HDF5 metadata caches. However, because an MPI process could have independently opened or read the HDF5 metadata object (see below), modifying a piece of HDF5 metadata may or may not require reading it from the file first.
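The effect of the collective-modification rule can be sketched with a toy simulation: every process applies the same metadata-modifying operations in the same order, so every process ends up with identical dirty metadata. This is an illustrative model, not HDF5 code; the operation names and data structures are invented for the example.

```python
# Toy simulation: a "collective" metadata modification is performed by
# every rank, so per-rank dirty metadata stays identical.
# (Illustrative model only -- names are invented for the example.)

def apply_collective(ranks, op):
    """A collective call: *all* ranks perform the same modification."""
    addr, new_value = op
    for cache in ranks:
        cache[addr] = new_value        # entry is now dirty on this rank

n_ranks = 4
dirty = [dict() for _ in range(n_ranks)]   # per-rank dirty metadata

# Stand-ins for collective operations such as creating a group or
# dataset, each of which modifies file metadata:
for op in [("superblock", "v1"),
           ("root group", "added /data"),
           ("free space", "alloc 4096")]:
    apply_collective(dirty, op)

# Every rank now holds identical dirty metadata.
assert all(d == dirty[0] for d in dirty)
```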
One process in an MPI application may perform metadata operations that open or read objects in an HDF5 file independently from other processes:
- Why? - This allows an MPI application more flexibility with its algorithms.
- Outcome: Because different processes could open or read different HDF5 objects, each MPI process's HDF5 metadata cache may hold different "clean" metadata.
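The independent-read case can be sketched the same way: each rank caches whatever it happens to read, so the clean entries per rank diverge. This is safe precisely because clean entries are discarded, never written, on eviction. Again, this is an illustrative model with invented names, not HDF5 code.

```python
# Toy simulation: independent reads mean each rank may cache
# different *clean* metadata.  (Illustrative model only.)

file = {"/A": "dataset A header",
        "/B": "dataset B header",
        "/C": "dataset C header"}

def independent_read(clean_cache, path):
    """One rank reads an object on its own; the entry is cached clean."""
    clean_cache[path] = file[path]

clean = [dict() for _ in range(3)]   # per-rank clean metadata
independent_read(clean[0], "/A")     # rank 0 reads dataset A
independent_read(clean[1], "/B")     # rank 1 reads dataset B
independent_read(clean[2], "/A")     # rank 2 reads datasets A and C
independent_read(clean[2], "/C")

# The clean caches differ across ranks -- harmless, because clean
# entries are destroyed without file I/O when evicted.
assert clean[0] != clean[1] != clean[2]
```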
The HDF5 Metadata Cache's use of MPI:
The metadata cache in HDF5 can use MPI to synchronize the I/O operations that are performed when evicting metadata objects; this is covered in the following Overview of the HDF5 Metadata Cache.
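One way such synchronization can work, given that every rank holds identical dirty metadata, is for a single designated rank to write the dirty entries to the file and then tell the other ranks to simply mark those entries clean. The sketch below models that idea only; it is not HDF5's actual protocol, and the function and variable names are invented for the example.

```python
# Toy sketch of synchronized eviction: one designated rank performs the
# file writes for dirty metadata; the others just mark the same entries
# clean.  (Illustrative model of the idea -- not HDF5's real protocol.)

file = {}                                 # stands in for the HDF5 file

def synchronized_flush(ranks, writer=0):
    # The writer rank chooses which dirty entries to flush...
    flushed = sorted(ranks[writer])
    # ...and is the only rank that actually touches the file.
    for addr in flushed:
        file[addr] = ranks[writer][addr]
    # Stand-in for a broadcast: every rank marks the flushed entries
    # clean (drops them from its dirty set) without any file I/O.
    for dirty in ranks:
        for addr in flushed:
            del dirty[addr]

# All four ranks hold identical dirty metadata (see the collective
# modification rule above), so this is safe:
dirty = [{"root group": "v2", "heap": "v5"} for _ in range(4)]
synchronized_flush(dirty)
```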