Use Cases for the HDF5 Standalone Tests

(Draft)

Elena Pourmal

March 12, 2005

 

This document describes the motivation for creating a standalone test suite, discusses the requirements for the test suite, and gives examples of use cases. The design of the test suite and implementation details are subjects for another document.

 

Motivation

 

The HDF5 C, Fortran, C++, and High-Level (HL) libraries come with comprehensive test suites. The test suites are part of the HDF5 source distribution. Most of the tests are configured, built, and run when the C, Fortran, C++, and HL HDF5 libraries are built. After the HDF5 libraries are installed, several examples that are also part of the HDF5 source distribution can be run to verify the correctness of the installation.

 

The HDF5 source contains a few HDF5 C performance programs, such as the h5perf, perf_meta, and benchpar parallel benchmarks and the chunk and iopipe sequential benchmarks, in the perform directory of the top HDF5 directory. Those performance tests do not run automatically when the HDF5 sequential and parallel libraries are tested. Also, there is no easy way to store the reported benchmark results and use them to compare performance across different platforms or across different versions of the HDF5 library.

 

Only the HDF5 C, C++, and Fortran examples are available with the installed HDF5 libraries. Performance programs are not installed, and users of the libraries usually do not have access to them. The HDF5 examples are too trivial to verify the correctness of the libraries. Also, they are not designed to be used as benchmarks. [mf1] 

 

While the provided HDF5 test suites serve well during the HDF5 library development process, they cannot be used for the verification, integration, and acceptance processes that are part of software installation in many computational laboratories and research centers such as NASA DAACs, government labs, and industrial companies. The HDF Group provides pre-built HDF5 libraries and tools. In most of the institutions where HDF5 is used in “production mode,” system administrators and support staff have to rebuild the libraries and rerun the HDF5 test suites that come with the source to assure the correctness of the libraries, instead of using pre-built binaries (even though those binaries are often created on the same system).

 

Existing HDF5 benchmarks focus only on comparing I/O performance between different layers, such as the system I/O, MPI I/O, and HDF5 I/O layers. They cannot be used by HDF5 library developers and HDF5 users to analyze the performance of sequential and parallel implementations of the same application, or to compare the performance of applications that use the same HDF5 objects but with different creation and/or access properties.

 

Neither the HDF5 test suites nor the HDF5 benchmarks can easily be used to check performance improvements or degradation between versions of the HDF5 libraries. [mf2] They also cannot be used to check backward/forward compatibility. [mf3] Often, the tests and benchmarks are too difficult to understand, and therefore they cannot serve as examples or templates for users’ application programs. [mf4] 

 

To summarize: the current HDF5 test suites and benchmarks do not satisfy the requirements that have emerged over the last few years. Those requirements are summarized in the next section.

 

Requirements for the new test suite

 

We need to provide the following: [mf5] 

1.      Sequential and parallel test suites to perform

a.       HDF5 performance regression testing

b.      HDF5 backward/forward compatibility regression testing

2.      Sequential and parallel benchmarks to compare the “HDF5 library to itself”, i.e., to compare the performance of applications that use the same HDF5 objects but with different properties

3.      Sequential and parallel test suites to verify the installation of the pre-built libraries and tools

4.      Sequential and parallel test suites to verify correctness of the major HDF5 features implemented in the pre-built libraries and tools

5.      Sequential and parallel test suites that can serve as “real” examples or application templates for the HDF5 users [mf6] 

6.      What else?[mf7] 

 

A. Correctness

     Unit*:          Does this module (routine or set of related routines) do what it should?
     System:         When combined, do the modules work together correctly?
     Regression***:  Did changes in the code make the correctness of existing code worse?

B. Performance

     Unit*:          How fast** does this module run under varying conditions?
     System:         How fast do library-level operations perform?
     Regression***:  Did changes in the code make performance degrade?

C. Object properties testing. (#2) (There are both correctness and performance aspects to this? Or perhaps this just gets folded into correctness and performance – “properties” being just one example of things we test for.)

     Unit*:          Correctness: At some level TBD, do I get the expected results for a given operation?
                     Performance: Do property changes result in the expected changes in performance? What is the amount of these changes?
     System:         Correctness: When properties are changed, are other library operations affected?
                     Performance: Do property changes affect overall performance? How and how much?
     Regression***:  Did changes make the correctness of existing code worse?
                     Did changes degrade the performance of existing code?

D. Testing of prebuilt libraries

     Unit*:          Not applicable?
     System:         Is this the same as A/System?
     Regression***:  Not applicable?

E. Examples (correctness and performance). (I would consider eliminating this from the testing stuff.)

     Unit*:          Not applicable?
     System:         Do examples return correct results, and how do they perform?
     Regression***:  When examples are changed, do they return the same results they did before? How does their performance compare to before?

*Granularity TBD.  I assume you are not including this in the plan.

** May also include space considerations, but this seems less of a concern, except for compression.

*** Applies to both modular and system testing.  Probably should be a different dimension.

 

Test suites, benchmarks and examples should also satisfy the following “implementation requirements”:

 

·        Come in the form of standalone test suites/benchmarks/examples (i.e., all programs have to be built against the pre-installed HDF5 libraries)

·        Can be easily built, either as an individual program or as the whole set of programs, on all HDF5-supported platforms and with all supported compilers

·        Be easy to understand by HDF5 support staff and HDF5 users, and be well documented

·        Can be easily modified when a new feature is requested

·        Be implemented in C (F90 and C++ later) (what about Java?)

·        Provide a standard way to measure performance

·        Timing results from the benchmarks need to be saved and used in performance regression testing to discover changes in performance (a minimal timing sketch follows this list)[mf8] 

·        All programs should be managed with CVS (Note: Quincey created the Subversion repository http://sleipnir.ncsa.uiuc.edu/svn/repos/hdf5test to manage the test suite and to evaluate the Subversion software)

·        What else?

·        There should be a “test harness” to make it easy to add, change and remove tests in a systematic way.  A test harness is “A system of test drivers and other tools to support test execution (e.g., stubs, executable test cases, and test drivers).”
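
The following is a minimal sketch of what such a standalone, timed test could look like. It is written against the HDF5 1.6 C API, uses gettimeofday() for wall-clock timing, and would be built against a pre-installed library with something like “h5cc -o write_contig write_contig.c”. The file name, dataset name, and sizes are placeholders, and the timing convention (whether to include the flush, how many repetitions to run, etc.) is still to be decided.

    /* write_contig.c -- sketch of a standalone timed test: write one
     * contiguous dataset of doubles and report the elapsed time.
     * Names and sizes are placeholders.
     */
    #include <stdio.h>
    #include <stdlib.h>
    #include <sys/time.h>
    #include "hdf5.h"

    #define NX 1024
    #define NY 1024

    static double now(void)              /* wall-clock time in seconds */
    {
        struct timeval tv;
        gettimeofday(&tv, NULL);
        return tv.tv_sec + tv.tv_usec / 1.0e6;
    }

    int main(void)
    {
        hsize_t dims[2] = {NX, NY};
        double *buf = malloc(NX * NY * sizeof(double));
        double  t0, t1;
        hid_t   file, space, dset;
        size_t  i;

        for (i = 0; i < NX * NY; i++)
            buf[i] = (double)i;

        file  = H5Fcreate("perf_contig.h5", H5F_ACC_TRUNC, H5P_DEFAULT, H5P_DEFAULT);
        space = H5Screate_simple(2, dims, NULL);
        dset  = H5Dcreate(file, "data", H5T_NATIVE_DOUBLE, space, H5P_DEFAULT);

        t0 = now();
        H5Dwrite(dset, H5T_NATIVE_DOUBLE, H5S_ALL, H5S_ALL, H5P_DEFAULT, buf);
        H5Fflush(file, H5F_SCOPE_LOCAL);        /* include the flush in the timing */
        t1 = now();

        printf("contiguous write: %.3f s (%.1f MB/s)\n",
               t1 - t0, (NX * NY * sizeof(double)) / (t1 - t0) / 1.0e6);

        H5Dclose(dset); H5Sclose(space); H5Fclose(file); free(buf);
        return 0;
    }

A real harness would presumably run the same body several times, report the timings in a fixed format, and save the results (or re-run older library versions) for the performance regression comparisons discussed above.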

 

 

 

Requirements 1–6 from the first list fall into the following three categories:

·        Performance (1 and 2)

·        Library maintenance and support (3 and 4)

·        User support (1, 2 and 5) (?)

 

It is not easy [mf9] to address all the requirements in one test suite, since three different goals are pursued. Also, performance testing (1 and 2) is the most urgent and critical task for successful HDF5 library development and support efforts; it has to be implemented first.

With careful design and deployment of performance regression tests that satisfy the “implementation requirements,” it will not be too difficult to address requirements 3–5.

  [mf10] 

 

Use cases

 

This section discusses the use cases for the standalone test suites, i.e., it focuses on how the test suite will be used; there is no intention to discuss what exactly will be tested.

 

1.      Sequential HDF5 performance regression testing

a.       We would like to check on a regular basis (say once a week or with each successful snapshot) that HDF5 performance does not degrade during the development process, i.e., that bug fixes, new features added to the library, etc. do not affect common raw data writing/reading operations, for example (see the sketch at the end of this use case):

                                                   i.      writing/reading contiguous dataset(s) of

1.      atomic types

2.      compound types (by struct or by member of the struct)

3.      variable-length types

4.      etc. types TBD

                                                 ii.      writing/reading contiguous dataset(s) by hyperslabs

1.      using regular hyperslabs

2.      using hyperslabs that result from set operations on hyperslabs

3.      etc. TBD

                                                iii.      writing/reading chunked dataset(s)

1.      etc. TBD

                                               iv.      writing/reading compressed dataset(s)

1.      GZIP compression

2.      SZIP compression

3.      Combinations with other filters

4.      etc. TBD

                                                 v.      TBD

b.      We would like to check on a regular basis (say once a week or with each successful snapshot) that HDF5 performance does not degrade during the development process, i.e., that bug fixes, new features added to the library, etc. do not affect common metadata writing/reading operations, for example:

                                                   i.      Creation/access/deletion of multiple objects in an HDF5 file (say 10^6 objects)

                                                 ii.      Creation/access/deletion of attributes (a lot of variations here)

                                                iii.      Creation/access of flat files (all objects in one group)

                                               iv.      Creation/access of nested files (deep tree structure)

                                                 v.      Creation/access of nested files (shallow tree structure)

c.       We would like to check on a regular basis (say once a week or with each successful snapshot) that HDF5 performance does not degrade during the development process, i.e., that bug fixes, new features added to the library, etc. do not affect the operations in a and b when different file drivers are used (especially the family and split drivers)

d.      TBD
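
The sketch below illustrates the kind of operation items 1.a.ii-iv would exercise: creating a chunked, GZIP-compressed dataset and writing one regular hyperslab into it (HDF5 1.6 C API; the file name, chunk size, compression level, and hyperslab shape are placeholders). The regression test would time the H5Dwrite call, as in the earlier sketch, for various chunk sizes, filters, and hyperslab selections.

    /* Sketch for use case 1.a: chunked + GZIP-compressed dataset,
     * written by a regular hyperslab (one block of rows). Placeholders throughout. */
    #include "hdf5.h"

    int main(void)
    {
        hsize_t dims[2]  = {1024, 1024};
        hsize_t chunk[2] = {64, 1024};            /* chunk = 64 full rows       */
        hsize_t start[2] = {0, 0};                /* hyperslab: first 64 rows   */
        hsize_t count[2] = {64, 1024};
        static int buf[64][1024];

        hid_t file   = H5Fcreate("perf_chunk.h5", H5F_ACC_TRUNC, H5P_DEFAULT, H5P_DEFAULT);
        hid_t fspace = H5Screate_simple(2, dims, NULL);
        hid_t mspace = H5Screate_simple(2, count, NULL);

        hid_t dcpl = H5Pcreate(H5P_DATASET_CREATE);
        H5Pset_chunk(dcpl, 2, chunk);             /* chunked layout             */
        H5Pset_deflate(dcpl, 6);                  /* GZIP compression, level 6  */

        hid_t dset = H5Dcreate(file, "data", H5T_NATIVE_INT, fspace, dcpl);

        /* select a regular hyperslab in the file and write it; a performance
         * regression test would time this call for various chunk sizes,
         * filters, and hyperslab shapes */
        H5Sselect_hyperslab(fspace, H5S_SELECT_SET, start, NULL, count, NULL);
        H5Dwrite(dset, H5T_NATIVE_INT, mspace, fspace, H5P_DEFAULT, buf);

        H5Dclose(dset); H5Pclose(dcpl);
        H5Sclose(mspace); H5Sclose(fspace); H5Fclose(file);
        return 0;
    }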

2.      Parallel HDF5 performance regression testing

a.       We would like to check on a regular basis (say once a week or with each successful snapshot) that HDF5 parallel performance does not degrade during the development process, i.e., that bug fixes, new features added to the library, etc. do not affect the common operations described in 1 (a and b), implemented (when applicable) using the parallel HDF5 library, for example (see the sketch at the end of this use case):

                                                   i.      Writing/reading a contiguous dataset in parallel

1.      Using collective I/O

2.      Using independent I/O

                                                 ii.      Writing/reading chunked dataset in parallel

1.      Using collective I/O

2.      Using independent I/O

                                                iii.      Writing/reading contiguous and chunked datasets by “simple” hyperslabs (aligned with a chunk or contiguous set of chunks)

                                               iv.      Writing/reading contiguous and chunked datasets by “complex” hyperslabs (a hyperslab that results from set operations, or that is not aligned nicely with the contiguous chunks)

                                                 v.      Creation/access/deletion of multiple objects in the file
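
A sketch of the parallel counterpart (use case 2.a.i: each process writes its own block of rows of a contiguous dataset using collective I/O) is shown below. It assumes an MPI environment and the HDF5 1.6 parallel C API; all names and sizes are placeholders. Switching the transfer property to H5FD_MPIO_INDEPENDENT gives the independent-I/O variant of the same test.

    /* Sketch for use case 2.a: each MPI process writes its own rows of a
     * contiguous dataset using collective I/O. Placeholders throughout. */
    #include <mpi.h>
    #include "hdf5.h"

    int main(int argc, char *argv[])
    {
        int     rank, nprocs;
        hsize_t dims[2], count[2], start[2];
        static int buf[64][1024];

        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        MPI_Comm_size(MPI_COMM_WORLD, &nprocs);

        dims[0]  = (hsize_t)nprocs * 64;  dims[1]  = 1024;   /* whole dataset        */
        count[0] = 64;                    count[1] = 1024;   /* this process's block */
        start[0] = (hsize_t)rank * 64;    start[1] = 0;

        /* open the file through the MPI-IO file driver */
        hid_t fapl = H5Pcreate(H5P_FILE_ACCESS);
        H5Pset_fapl_mpio(fapl, MPI_COMM_WORLD, MPI_INFO_NULL);
        hid_t file = H5Fcreate("perf_par.h5", H5F_ACC_TRUNC, H5P_DEFAULT, fapl);

        hid_t fspace = H5Screate_simple(2, dims, NULL);
        hid_t mspace = H5Screate_simple(2, count, NULL);
        hid_t dset   = H5Dcreate(file, "data", H5T_NATIVE_INT, fspace, H5P_DEFAULT);

        /* collective transfer mode for the raw data write */
        hid_t dxpl = H5Pcreate(H5P_DATASET_XFER);
        H5Pset_dxpl_mpio(dxpl, H5FD_MPIO_COLLECTIVE);

        H5Sselect_hyperslab(fspace, H5S_SELECT_SET, start, NULL, count, NULL);
        H5Dwrite(dset, H5T_NATIVE_INT, mspace, fspace, dxpl, buf);

        H5Pclose(dxpl); H5Dclose(dset); H5Sclose(mspace); H5Sclose(fspace);
        H5Pclose(fapl); H5Fclose(file);
        MPI_Finalize();
        return 0;
    }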

 

3.      Sequential and parallel HDF5 backward/forward compatibility regression testing

a.       We would like to check on a regular basis (say once a week or with each successful snapshot) that the HDF5 sequential library is backward/forward compatible (TBD), for example (a version-reporting sketch appears at the end of this use case):

                                                   i.      Create a file with a chunked dataset with multiple attributes using HDF5 sequential library version 1.6.3 and read it with the HDF5 sequential library version 1.4.5-post9

                                                 ii.      Create a file with multiple objects using HDF5 sequential library version 1.6.3 and read it with the HDF5 sequential library version 1.7.26-snap1

 

b.      We would like to check on a regular basis (say once a week or with each successful snapshot) that the HDF5 parallel library is backward/forward compatible (TBD), for example

                                                   i.      Similar to 3.a (i, ii) but implemented in parallel

c.       We would like to check on a regular basis (say once a week or with each successful snapshot) that the HDF5 sequential and parallel libraries are backward/forward compatible with each other (TBD), for example

                                                   i.      Create a file with a chunked dataset with multiple attributes using HDF5 sequential library version 1.6.3 and read it with the HDF5 parallel library version 1.4.5-post9
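
A compatibility test program itself is just an ordinary writer or reader; the useful supporting piece is a record of which installed library actually produced or read each file, so that results such as “written with 1.6.3, read with 1.4.5-post9” can be labeled automatically. A minimal sketch using the standard H5get_libversion() call:

    /* Sketch for use case 3: print the version of the HDF5 library a
     * compatibility test program was actually linked against. */
    #include <stdio.h>
    #include "hdf5.h"

    int main(void)
    {
        unsigned major, minor, release;

        H5get_libversion(&major, &minor, &release);
        printf("linked against HDF5 %u.%u.%u\n", major, minor, release);
        return 0;
    }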

 

4.      Sequential benchmarks to compare the “HDF5 library to itself”, i.e., to compare the performance of applications that use the same HDF5 objects but with different properties

a.       We would like to answer a user’s question: “How much performance and file space overhead will I get if I use chunked storage vs. contiguous storage?”

b.      We would like to answer a user’s question: “Is it better to create and access one dataset of a compound datatype with N fields, or N datasets of atomic types?” (see the sketch below)

c.       We would like to answer a user’s question: “Is it better to use variable-length datatypes, or fixed-length datatypes of an appropriate length plus compression?”
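
For question 4.b, the benchmark would store the same data in two ways and compare timings and file sizes. The sketch below shows how the “one compound dataset” side might build its datatype (HDF5 1.6 C API; the struct and field names are hypothetical); the “N datasets” side would simply create separate datasets of H5T_NATIVE_INT, H5T_NATIVE_FLOAT, and H5T_NATIVE_DOUBLE over the same dataspace.

    /* Sketch for use case 4.b: a compound datatype with N = 3 fields.
     * The benchmark would time writes of this dataset against writes of
     * three separate datasets of the corresponding atomic types. */
    #include <stddef.h>
    #include "hdf5.h"

    typedef struct {        /* hypothetical record layout */
        int    id;
        float  pressure;
        double temperature;
    } record_t;

    static hid_t make_record_type(void)
    {
        hid_t t = H5Tcreate(H5T_COMPOUND, sizeof(record_t));
        H5Tinsert(t, "id",          HOFFSET(record_t, id),          H5T_NATIVE_INT);
        H5Tinsert(t, "pressure",    HOFFSET(record_t, pressure),    H5T_NATIVE_FLOAT);
        H5Tinsert(t, "temperature", HOFFSET(record_t, temperature), H5T_NATIVE_DOUBLE);
        return t;
    }

    int main(void)
    {
        hsize_t dims[1] = {100000};                /* hypothetical record count */
        hid_t file  = H5Fcreate("perf_compound.h5", H5F_ACC_TRUNC, H5P_DEFAULT, H5P_DEFAULT);
        hid_t space = H5Screate_simple(1, dims, NULL);
        hid_t rtype = make_record_type();
        hid_t dset  = H5Dcreate(file, "records", rtype, space, H5P_DEFAULT);

        /* ... time H5Dwrite of record_t buffers here ... */

        H5Dclose(dset); H5Tclose(rtype); H5Sclose(space); H5Fclose(file);
        return 0;
    }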

 

Note: It is clear that the benchmarks needed in 4 can also be used in 1.

 

5.      Parallel benchmarks to compare the “HDF5 library to itself”, i.e., to compare the performance of applications that use the same HDF5 objects but with different properties

a.       We would like to answer a user’s question: “How much performance and file space overhead will I get if I use chunked storage vs. contiguous storage in my parallel application?”

b.      We would like to answer a user’s question: “Is it better to create and access one dataset of a compound datatype with N fields, or N datasets of atomic types in my parallel application?”

c.       The HDF Help Desk receives email from a user who complains about poor performance on his system when his application uses extendible datasets. We point the user to the benchmark in our standalone test suite that does exactly what he is complaining about (or we slightly modify it according to the request). The user then runs our benchmark and compares its timing result with the timing result from his application to confirm his theory.
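
A sketch of the extendible-dataset operation such a benchmark would time is shown below (the sequential form is shown; HDF5 1.6 C API, with the number of extension steps, block size, and names as placeholders):

    /* Sketch for use case 5.c: an extendible (unlimited) chunked dataset
     * that is grown and written one block of rows at a time. */
    #include "hdf5.h"

    int main(void)
    {
        hsize_t dims[2]    = {0, 1024};                    /* start empty       */
        hsize_t maxdims[2] = {H5S_UNLIMITED, 1024};        /* extendible rows   */
        hsize_t chunk[2]   = {64, 1024};
        hsize_t count[2]   = {64, 1024};
        hsize_t start[2]   = {0, 0};
        static int buf[64][1024];
        int     step;

        hid_t file   = H5Fcreate("perf_extend.h5", H5F_ACC_TRUNC, H5P_DEFAULT, H5P_DEFAULT);
        hid_t fspace = H5Screate_simple(2, dims, maxdims);
        hid_t mspace = H5Screate_simple(2, count, NULL);
        hid_t dcpl   = H5Pcreate(H5P_DATASET_CREATE);
        H5Pset_chunk(dcpl, 2, chunk);                      /* chunking is required
                                                              for extendible data */
        hid_t dset = H5Dcreate(file, "data", H5T_NATIVE_INT, fspace, dcpl);

        for (step = 0; step < 10; step++) {                /* grow, then write    */
            dims[0] += count[0];
            H5Dextend(dset, dims);                         /* extend the dataset  */
            H5Sclose(fspace);
            fspace = H5Dget_space(dset);                   /* refresh file space  */
            start[0] = (hsize_t)step * count[0];
            H5Sselect_hyperslab(fspace, H5S_SELECT_SET, start, NULL, count, NULL);
            H5Dwrite(dset, H5T_NATIVE_INT, mspace, fspace, H5P_DEFAULT, buf);
        }

        H5Dclose(dset); H5Pclose(dcpl);
        H5Sclose(mspace); H5Sclose(fspace); H5Fclose(file);
        return 0;
    }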

 

Note: It is clear that the benchmarks needed in 5 can also be used in 2.

 

6.      Benchmarks to compare the HDF5 parallel library with the HDF5 sequential library

a.       We would like to make sure that parallel HDF5 is scalable on an SGI Altix system, for example:

                                                   i.      Writing/reading a contiguous dataset in parallel is faster (or not) than writing/reading a contiguous dataset in sequential mode

                                                  ii.      Writing/reading multiple datasets in parallel is faster (or not) than writing/reading multiple datasets in sequential mode

b.      We would like to make sure that files created by the parallel library can be accessed/modified with the sequential library and vice versa.

 

7.      Sequential and parallel test suites to verify the installation of the pre-built libraries and tools

a.       We are asked by NERSC staff to help them install HDF5-1.6.4 on their systems. We take our binaries, ask the staff to install them, and then we or they run a standalone test suite to verify that the NERSC system has correct or compatible (or does it?) versions of the compression libraries and MPI I/O libraries, that permissions are set up correctly, etc. (see the sketch at the end of this use case)

b.      A user tries to build HDF5-1.6.4 on Solaris 2.9; we have only Solaris 2.8 binaries. We give him the Solaris 2.8 binaries for installation, along with the standalone test suite that the user runs to verify that he got working binaries.
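
One piece of such an installation check could be a probe of the optional compression filters, as in the sketch below. It assumes the H5Zfilter_avail() call of the 1.6 C API; analogous probes would cover the MPI-IO driver, file permissions, and so on.

    /* Sketch for use case 7: report whether the installed binaries were
     * built with the GZIP (deflate) and SZIP filters available. */
    #include <stdio.h>
    #include "hdf5.h"

    int main(void)
    {
        htri_t have_deflate = H5Zfilter_avail(H5Z_FILTER_DEFLATE);
        htri_t have_szip    = H5Zfilter_avail(H5Z_FILTER_SZIP);

        printf("GZIP (deflate) filter: %s\n", have_deflate > 0 ? "available" : "NOT available");
        printf("SZIP filter:           %s\n", have_szip    > 0 ? "available" : "NOT available");
        return 0;
    }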

 

8.      Sequential and parallel test suites to verify correctness of the major HDF5 features implemented in the pre-built libraries and tools

a.       The HDF Help Desk receives a question: “My program uses chunking, compression, and the Memory Core driver; it seems that the Memory Core driver is broken in the new release.” We point the user to the Memory Core driver test program in our standalone test suite (see the sketch at the end of this use case) that he can run and report the result. The user can also easily modify the program to make it as close to his application as possible and send it to the HDF Help Desk to report the problem.

b.      LLNL installed new binaries for the HDF5 libraries. They would like to know if the bug they saw with the HDF5 collective write call is gone. They run several modified programs from the standalone test suite to verify the fix.

c.       A user is evaluating the HDF5 library and wants to be sure that it does what he needs done. We point him to several programs from the standalone test suite that he can use as templates for his application.
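
For use case 8.a, the core (Memory) driver test could be a small self-checking program such as the sketch below (HDF5 1.6 C API; the 1 MB increment, the backing-store flag, and the dataset parameters are placeholders), which the user can run as-is and then modify to resemble his application.

    /* Sketch for use case 8.a: create a chunked, compressed dataset through
     * the core (in-memory) file driver and read it back for verification. */
    #include <stdio.h>
    #include "hdf5.h"

    int main(void)
    {
        hsize_t dims[2]  = {256, 256};
        hsize_t chunk[2] = {32, 256};
        static int out[256][256], in[256][256];
        int     i, j, status = 0;

        for (i = 0; i < 256; i++)
            for (j = 0; j < 256; j++)
                out[i][j] = i * 256 + j;

        hid_t fapl = H5Pcreate(H5P_FILE_ACCESS);
        H5Pset_fapl_core(fapl, 1024 * 1024, 1);       /* 1 MB increment, write the
                                                         in-memory image to disk  */
        hid_t file  = H5Fcreate("core_test.h5", H5F_ACC_TRUNC, H5P_DEFAULT, fapl);
        hid_t space = H5Screate_simple(2, dims, NULL);
        hid_t dcpl  = H5Pcreate(H5P_DATASET_CREATE);
        H5Pset_chunk(dcpl, 2, chunk);
        H5Pset_deflate(dcpl, 6);                      /* chunking + GZIP, as in the
                                                         user's description        */
        hid_t dset = H5Dcreate(file, "data", H5T_NATIVE_INT, space, dcpl);

        H5Dwrite(dset, H5T_NATIVE_INT, H5S_ALL, H5S_ALL, H5P_DEFAULT, out);
        H5Dread(dset, H5T_NATIVE_INT, H5S_ALL, H5S_ALL, H5P_DEFAULT, in);

        for (i = 0; i < 256; i++)
            for (j = 0; j < 256; j++)
                if (in[i][j] != out[i][j])
                    status = 1;
        printf(status ? "core driver test FAILED\n" : "core driver test passed\n");

        H5Dclose(dset); H5Pclose(dcpl); H5Sclose(space);
        H5Pclose(fapl); H5Fclose(file);
        return status;
    }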

9.      Sequential and parallel test suites that can serve as “real” examples or application templates for HDF5 users

a.       The HDF Help Desk receives a question: “I am new to HDF5 and have been studying your Tutorial. The Tutorial example with compression works on my system. I used your example as a template for my application, but I also need hyperslabs and compound datatypes. What do I need to use?”

b.      A user doesn’t use HDF5 at all, but someone gave him an HDF5 file with data he needs to read in his Fortran application. He asks us to help him get the data from the file. We can point him to a simple HDF5 reading program and tell him how to integrate it with his application.

 

 

 

 

 

 


 [mf1]Once we have test suites for correctness, backward/forward compatibility, and performance, I would not use examples for testing. Maybe that is what you are suggesting here.

 [mf2]Why not?

 [mf3]But we do have compatibility tests, don’t we?  They may be inadequate, but don’t we have them?

 [mf4]In an ideal world, I don’t think this is a good idea.  Performance benchmarks could be used as examples for writing benchmarks, but not necessarily for writing applications.  Same for correctness tests.

 [mf5]Seems like there are several categories and each has an absolute component and a regression component.  (1a) testing for functional correctness – do the functions do what we think they do, (1b) regression testing --  did changes in the code regress (make worse) the correctness of existing code, (2a) performance tests – what is the performance of a particular operation, (2b) performance regression – did changes in the code regress the performance of existing code. Not sure how this applies to the build process.  There’s also modular vs. system testing. (See table.)

 [mf6]I would not include these among the test suites.  This is documentation.

 [mf7]These are separate projects, right?  They might use some of the same techniques, even the same framework, but they’d be separate projects.

 [mf8]May be better to re-run older versions, rather than keep old results, because other system variables can change.  (E.g. test platform gets a faster CPU or disk, is less busy, etc.)

 [mf9]Agreed.  I think we should separate them.  Just do the performance tests at this point.  (Or the functionality tests.)  Develop a standard template (format, coding, documentation, configuration) that other types of test can adhere to, but just to one type at this point.

 [mf10]This is as far as I got.  I skimmed the rest of the doc, but didn’t read it in detail.  BTW, I came across this interesting web site in the process of reading the doc: http://www.geocities.com/xtremetesting/.