Parallel HDF5 Performance Measuring

DRAFT VERSION

1 Introduction

Currently, there is no formal procedure for measuring parallel HDF5 performance on different platforms. A formal, consistent procedure would show where our strengths and weaknesses lie on each platform. Because every platform would be tested in the same way, the results could also be compared directly across platforms.

The HDF5 library also has a number of adjustable parameters that affect its performance. The performance tools can help identify suitable values for these parameters, given the characteristics of an individual platform or file system.

2 Goals

There are many types of performance measurements (see Appendix A). This project concentrates on parallel I/O performance in an MPI environment. It measures three programming interfaces: the raw I/O interface, the MPI-IO interface, and HDF5 library calls. The initial data model will be the Synthetic model, followed by the Application model.

3 Requirements

I/O speed as a function of the following parameters:

Type of I/O interface (raw, such as POSIX; MPI-IO; PHDF5)
Number of processes
Dataset sizes
Number of datasets per file
Data file total sizes
Number of data files
Transfer buffer sizes

Output formats:

Text, for human reading
Binary, for plotting tools such as gnuplot

User interface

Command-line options to control the various parameters.
Environment variables to control the various parameters.

4 Algorithm Design

4.1 Current IOR Algorithm Design

IOR algorithm

4.2 NCSA's Performance Design Overview

This is a broad overview of the desired algorithmic features. We will implement the algorithm in stages, starting with the simplest design and adding features incrementally. The basic features are:

Output in ASCII (leave hooks for binary format).
Use raw I/O. Don't assume there are read() and write() functions
Calculate Base Times

HDF5 Overhead

File Open/Close (HDF5_FILE_OPENCLOSE)

Average time to open and close a file, measured by performing n opens and closes on m files

Dataset Creation (HDF5_DATASET_CREATE)

Average time to create a dataset, measured by performing n dataset creations on a file

File I/O

Calculate read/write times using (in order of implementation)

Fixed Dimensions (HDF5_WRITE/READ_FIXED_DIMS)
Chunked/Fixed Dimensions (HDF5_WRITE/READ_CHUNKED_FIXED_DIMS)
Chunked/Unlimited Dimensions (HDF5_WRITE/READ_CHUNKED_UNLIM_DIMS)

Data Conversions
Hyperslab Performance; Partial I/O

Variance (Print statistics after each loop iteration)

Number of Processors
Data Size
I/O Buffer Size
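
The base-time measurements listed above share one pattern: repeat an operation n times and report the average wall-clock time per call. The following is a minimal sketch of that pattern, assuming a POSIX system that provides clock_gettime(). The names timed_op, now_seconds, average_seconds, and count_call are hypothetical, and the callback only stands in for a real operation such as an HDF5 file open/close or dataset creation, which is not shown here.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 199309L   /* request clock_gettime() from <time.h> */
#endif
#include <assert.h>
#include <time.h>

/* Hypothetical callback type: performs one instance of the operation. */
typedef void (*timed_op)(void *ctx);

/* Current monotonic wall-clock time in seconds. */
double now_seconds(void)
{
    struct timespec ts;

    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (double)ts.tv_sec + (double)ts.tv_nsec * 1e-9;
}

/* Run op n times and return the average time per call in seconds. */
double average_seconds(timed_op op, void *ctx, int n)
{
    double start, elapsed;
    int i;

    assert(op != NULL && n > 0);

    start = now_seconds();
    for (i = 0; i < n; i++)
        op(ctx);
    elapsed = now_seconds() - start;

    return elapsed / n;
}

/* Stand-in operation for illustration: counts how often it is called. */
void count_call(void *ctx)
{
    ++*(int *)ctx;
}
```

A real test would pass a callback that opens and closes a file, or creates a dataset, in place of count_call.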

4.3 NCSA's Algorithm Design

4.3.1 pio_perf.c

main():
    call MPI_Init()
    call MPI_Comm_size()

    opts = call parse_command_line()
    output = call fopen(opts.output_file)

    call run_test_loop(output, opts)

    call MPI_Finalize()

run_test_loop(output, opts):
    for (num_procs = opts.min_num_procs;
              num_procs <= opts.max_num_procs; num_procs = num_procs * 2) do

        call create_comm_world(num_procs)
        call output("Number of processors = " + num_procs)

        for (buf_size = opts.min_xfer_size;
                  buf_size <= opts.max_xfer_size; buf_size = buf_size * 2) do

            opts.num_elmts = opts.file_size / (opts.num_dsets * sizeof(int))

            call output("Transfer Buffer Size: " + buf_size)
            call output("  # of files: " + opts.num_files + ", # of dsets: " +
                        opts.num_dsets + ", # of elmts per dset: " + opts.num_elmts)

            if (opts.run_raw_test)
                call run_test(output, RAW, opts)

            if (opts.run_mpi_test)
                call run_test(output, MPIO, opts)

            if (opts.run_phdf5_test)
                call run_test(output, PHDF5, opts)
        endfor

        call destroy_comm_world()
    endfor

run_test(output, type, parms):
    raw_size = parms.num_dsets * parms.num_elmts * sizeof(int)
    call output("Type of IO = ")

    switch (type) do
      case RAW:
        call output("RAW")
        break
      case MPIO:
        call output("MPIO")
        break
      case PHDF5:
        call output("PHDF5")
        break
    endswitch

    call MPI_Comm_size()

    initialize write and read Max/Min tables

    for (i = 0; i < parms.num_iters; ++i) do
        call MPI_Barrier()
        call do_pio(parms)

        collect Max/Min time for writes
        collect Max/Min time for reads
    endfor

    total_mm = accumulate_minmax_stuff(write_mm_table, parms.num_iters)

    call output("Write (" + parms.num_iters + " iterations)")
    call output("Minimum Time: " + total_mm.min + " (" + MB_PER_SEC(raw_size, total_mm.min) + "MB/s)")
    call output("Maximum Time: " + total_mm.max + " (" + MB_PER_SEC(raw_size, total_mm.max) + "MB/s)")
    call output("Average Time: " + total_mm.avg + " (" + MB_PER_SEC(raw_size, total_mm.avg) + "MB/s)")

    total_mm = accumulate_minmax_stuff(read_mm_table, parms.num_iters)

    call output("Read (" + parms.num_iters + " iterations)")
    call output("Minimum Time: " + total_mm.min + " (" + MB_PER_SEC(raw_size, total_mm.min) + "MB/s)")
    call output("Maximum Time: " + total_mm.max + " (" + MB_PER_SEC(raw_size, total_mm.max) + "MB/s)")
    call output("Average Time: " + total_mm.avg + " (" + MB_PER_SEC(raw_size, total_mm.avg) + "MB/s)")
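
The Max/Min bookkeeping and the MB/s conversion used by run_test could look roughly like the following. This is a sketch under stated assumptions, not the project's definitions: the minmax struct and the functions minmax_add, minmax_avg, and mb_per_sec are illustrative names, and 1 MB is taken to be 1024 * 1024 bytes. In an MPI run the per-process times would first have to be combined across ranks (for example with MPI_Reduce using MPI_MIN and MPI_MAX); the sketch leaves that out and covers only the per-iteration accumulation.

```c
#include <assert.h>
#include <stddef.h>

/* Illustrative running statistics over per-iteration times (seconds). */
typedef struct {
    double min;
    double max;
    double sum;
    size_t num;
} minmax;

/* Fold one timing sample into the running statistics. */
void minmax_add(minmax *mm, double seconds)
{
    assert(mm != NULL && seconds >= 0.0);

    if (mm->num == 0 || seconds < mm->min)
        mm->min = seconds;
    if (mm->num == 0 || seconds > mm->max)
        mm->max = seconds;
    mm->sum += seconds;
    mm->num++;
}

/* Mean of the samples seen so far; 0 if there are none. */
double minmax_avg(const minmax *mm)
{
    return mm->num ? mm->sum / (double)mm->num : 0.0;
}

/* Throughput in MB/s, assuming 1 MB = 1024 * 1024 bytes. */
double mb_per_sec(double bytes, double seconds)
{
    assert(seconds > 0.0);
    return bytes / (1024.0 * 1024.0) / seconds;
}
```

Because throughput is inversely proportional to time, the minimum time yields the highest MB/s figure and the maximum time the lowest.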

4.3.2 Usage

usage: pio_perf [OPTIONS]
  OPTIONS
     -h, --help                  Print a usage message and exit
     -d N, --num-dsets=N         Number of datasets per file [default: 1]
     -f S, --file-size=S         Size of a single file [default: 64M]
     -F N, --num-files=N         Number of files [default: 1]
     -H, --hdf5                  Run HDF5 performance test
     -i N, --num-iterations=N    Number of iterations to perform [default: 1]
     -m, --mpiio                 Run MPI-IO performance test
     -o F, --output=F            Output raw data into file F [default: none]
     -P N, --max-num-processes=N Maximum number of processes to use [default: 1]
     -p N, --min-num-processes=N Minimum number of processes to use [default: 1]
     -r, --raw                   Run raw (UNIX) performance test
     -X S, --max-xfer-size=S     Maximum transfer buffer size [default: 1M]
     -x S, --min-xfer-size=S     Minimum transfer buffer size [default: 1K]

  F - is a filename.
  N - is an integer >=0.
  S - is a size specifier, an integer >=0 followed by a size indicator:

          K - Kilobyte
          M - Megabyte
          G - Gigabyte

      Example: 37M = 37 Megabytes
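
A size specifier of this form could be parsed along the following lines. This is a minimal sketch, not the actual pio_perf parser. The function name parse_size is hypothetical, and two behaviors are assumptions that the usage text does not state: the indicators are treated as binary multiples (K = 1024 bytes), and a bare integer with no indicator is accepted as a byte count.

```c
#include <assert.h>
#include <ctype.h>
#include <errno.h>
#include <limits.h>
#include <stdlib.h>

/* Hypothetical helper: convert a size specifier such as "37M" to bytes.
 * Assumes K, M, and G are powers of 1024. Returns 0 on success and -1
 * on a malformed specifier or on overflow. */
int parse_size(const char *spec, unsigned long long *bytes)
{
    char *end;
    unsigned long long value, mult = 1;

    assert(spec != NULL && bytes != NULL);

    /* strtoull() accepts a leading sign or space, so require a digit. */
    if (!isdigit((unsigned char)spec[0]))
        return -1;

    errno = 0;
    value = strtoull(spec, &end, 10);
    if (errno == ERANGE)
        return -1;

    switch (*end) {
      case 'K': mult = 1ULL << 10; end++; break;
      case 'M': mult = 1ULL << 20; end++; break;
      case 'G': mult = 1ULL << 30; end++; break;
      case '\0': break;            /* assumption: bare number means bytes */
      default:  return -1;         /* unknown size indicator */
    }

    if (*end != '\0')
        return -1;                 /* trailing characters, e.g. "4MB" */
    if (value > ULLONG_MAX / mult)
        return -1;                 /* product would overflow */

    *bytes = value * mult;
    return 0;
}
```

Under these assumptions "37M" parses to 37 * 1024 * 1024 bytes, while inputs such as "-5K" or "12Q" are rejected.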

5 Implementation Steps

Determine the algorithm used by the IOR programs (Done)
Replicate some of the basic functions of the IOR programs
Verify that our program behaves in the "expected" way, that is, that it measures performance in the same way that the IOR programs do
Continue adding more and more of the features of the IOR programs into our version verifying the accuracy at each step

6 Conclusion

Appendix A: Types of Performance Tests

Our goal is to have in place an automated way of running performance tests on as many of our supported platforms as possible. The tests are grouped into four categories.

Tests by Processing
Tests by Programming Model
Tests by Programming Interface
Tests by Data Model

A.1 Tests by Processing

I/O performance: read/write speeds, file open/close speeds, etc.
Data conversion speed: conversions of endianness, floating-point representations, etc.

A.2 Tests by Programming Model

Sequential
MPI parallel

A.3 Tests by Programming Interface

Raw Interface - using basic C programming I/O calls such as fread/fwrite.
Special I/O library Interface - using I/O library calls built on the Raw Interface (e.g., MPI-IO).
Data Management Interface - using data management library calls (e.g., HDF, netCDF)
Application Model Interface - using application library built on the Data Management Interface.

A.4 Tests by Data Model

Synthetic Model - Arbitrary datasets with arbitrary data are processed. This is also called the Computer Science model.
Application Model - Datasets are defined according to the general data model of a class of applications (e.g., AMR).
Specific Application - Real application programs are measured (e.g., the FLASH application from the University of Chicago).

Albert Cheng & Bill Wendling
National Center for Supercomputing Applications
Send comments to
hdfparallel@ncsa.uiuc.edu