RFC:  Jam: A Tool To Fiddle with HDF5 User Blocks

Robert E. McGrath
July 7, 2004

Overview

HDF5 files may have arbitrary user data, called the user block, at the front of the file.  The user block is ignored by the HDF5 library; the HDF5 file is logically identical whether it has a user block or not.

The size of the user block can be set as a file creation property, which can be read to discover if a user block is present. The HDF5 library has no other functions on the user block.
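Outside the library, the presence and size of a user block can also be inferred from the bytes of the file, because the HDF5 file format places the superblock signature at byte 0 or at byte 512, 1024, 2048, and so on. The following is a minimal sketch of that check in Python, not part of the proposed tools. The function name find_user_block_size is hypothetical, and the sketch assumes an HDF5 file stored in a single file.

```python
import os

# The 8-byte signature that begins every HDF5 superblock.
HDF5_SIGNATURE = b"\x89HDF\r\n\x1a\n"


def find_user_block_size(path):
    """Return the user block size in bytes (0 if there is none), or None
    if no HDF5 signature is found at any candidate offset.

    Hypothetical helper for illustration only.
    """
    file_size = os.path.getsize(path)
    with open(path, "rb") as f:
        offset = 0
        while offset + len(HDF5_SIGNATURE) <= file_size:
            f.seek(offset)
            if f.read(len(HDF5_SIGNATURE)) == HDF5_SIGNATURE:
                return offset
            # Candidate offsets are 0, 512, 1024, 2048, ...
            offset = 512 if offset == 0 else offset * 2
    return None
```

This is a heuristic: user block data that happens to contain the signature at one of the candidate offsets would be misread as the start of the HDF5 file. Reading the file creation property remains the reliable method.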

At this time, there are no tools to put or get a user block for an existing file. In the future, we might implement features to read or add user blocks in tools such as h5repack and h5dump.

This RFC proposes two simple new utilities, jam and unjam, that provide an easy way to add and remove the user block of an existing HDF5 file. The tools are intended for use with any HDF5 file, but they may not work well for some files, e.g., HDF5 files not stored in a single file.

Principle of Operation

An HDF5 file with a user block has three parts:

  1. the user block data (can be anything, text or binary),
  2. padding up to the next boundary (byte 512, 1024, etc.),
  3. the HDF5 file, beginning with the HDF5 header.

Given an existing HDF5 file, it is necessary to create a new file (or rewrite the existing file) containing the user block, optional padding, and then the exact image of the HDF5 file.

Removing a user block reverses this operation: The HDF5 file is rewritten, to start at byte 0.
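The layout arithmetic can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the proposed implementation: the names padded_size, jam_bytes and unjam_bytes are hypothetical, the padding is assumed to be zero bytes, and the whole file is held in memory for simplicity. The boundary is taken to be the smallest power of two, at least 512, that holds the user block data.

```python
MIN_USER_BLOCK = 512  # smallest user block size the HDF5 format allows


def padded_size(data_len):
    """Smallest power of two >= 512 that can hold data_len bytes."""
    size = MIN_USER_BLOCK
    while size < data_len:
        size *= 2
    return size


def jam_bytes(user_data, hdf5_image):
    """Return user data, padding up to the boundary, then the HDF5 image."""
    if not user_data:
        return hdf5_image  # nothing to add, the header stays at byte 0
    size = padded_size(len(user_data))
    # Assumption: pad with zero bytes; the HDF5 image is copied unchanged.
    return user_data.ljust(size, b"\0") + hdf5_image


def unjam_bytes(jammed, user_block_size):
    """Split at the boundary: (user block including padding, HDF5 image)."""
    return jammed[:user_block_size], jammed[user_block_size:]
```

For example, 600 bytes of user data give a 1024-byte user block (600 bytes of data and 424 bytes of padding), so the HDF5 header lands at byte 1024.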





Tool Name: jam, unjam
Syntax:
jam -u user_block -i in_file.h5 [-o out_file.h5] [--clobber]
jam -h
unjam -i in_file.h5 [-u user_block | --delete] [-o out_file.h5]
unjam -h

Purpose:
jam: Add a user block to the front of an HDF5 file, creating a new concatenated file.
unjam: Split the user block and the HDF5 file into two files, one with the user block data and one with the HDF5 data.

Description:
jam concatenates a user_block file and an HDF5 file to create an HDF5 file with a user block. The user block can be any data (binary or text).  The output file is padded so that the HDF5 header begins at byte 512, 1024, etc.  (See the HDF5 File Format.)

If out_file.h5 is given, a new file is created with the user_block followed by the contents of in_file.h5. In this case, in_file.h5 is unchanged.

If out_file.h5 is not specified, the user_block is added to in_file.h5.  

If in_file.h5 already has a user block, the contents of user_block will be added to the end of the existing user block, and the HDF5 file will be shifted to the next boundary. If --clobber is set, any existing user block will be overwritten.
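The difference between appending and overwriting can be illustrated with a short Python sketch. The function name new_user_block is hypothetical. The sketch also makes one assumption that the text above does not spell out: because the tools treat the user block as opaque bytes, the "end of the existing user block" is taken to be the end of its padded region, so new data is placed after any existing padding.

```python
def padded_size(data_len, minimum=512):
    """Smallest power of two >= minimum that can hold data_len bytes."""
    size = minimum
    while size < data_len:
        size *= 2
    return size


def new_user_block(existing_block, new_data, clobber=False):
    """Return the padded user block to place in front of the HDF5 image.

    existing_block is the full existing user block, padding included
    (empty bytes if the file has none). Hypothetical helper.
    """
    # Assumption: without clobber, new data follows the whole existing
    # block, since its padding cannot be told apart from its data.
    combined = new_data if clobber else existing_block + new_data
    # Assumption: pad with zero bytes up to the next boundary.
    return combined.ljust(padded_size(len(combined)), b"\0")
```

Under these assumptions, adding 3 bytes to a file that already has a 512-byte user block produces a 1024-byte user block, while the same call with clobber set produces a 512-byte one.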

unjam splits an HDF5 file, writing the user block to a file or stdout and the HDF5 file to an HDF5 file with a header at byte 0 (i.e., with no user block).

If out_file.h5 is given, a new file is created containing the contents of in_file.h5 without the user block. In this case, in_file.h5 is unchanged.

If out_file.h5 is not specified, the user_block is removed and in_file.h5 is rewritten, starting at byte 0.

If user_block is set, the user block will be written to user_block.  If user_block is not set, the user block (if any) will be written to stdout. If --delete is selected, the user block will not be written.

Example Usage

Create a new file, newfile.h5, with the text in file mytext.txt as the user block for the HDF5 file file.h5.
jam -u mytext.txt -i file.h5 -o newfile.h5

Add the text in file mytext.txt to the front of the HDF5 file file.h5.
jam -u mytext.txt -i file.h5

Overwrite the user block (if any) in file.h5 with the contents of mytext.txt.
jam -u mytext.txt -i file.h5 --clobber

For an HDF5 file with a user block, with_ub.h5, extract the user block to user_block.txt and the HDF5 file to wo_ub.h5.
unjam -i with_ub.h5 -u user_block.txt -o wo_ub.h5

Return Value
jam returns the size of the output file, or -1 if an error occurs.

unjam returns the size of the output file, or -1 if an error occurs.

Caveats

These tools copy all the data in the file(s), sequentially, to new offsets.  For a large file, this copy can take a long time.
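The cost comes from the fact that every byte of the HDF5 file must be read and rewritten. A minimal Python sketch of such a sequential copy is shown below. It is an illustration only: the function name jam_to_new_file and the 1 MiB chunk size are arbitrary choices, the caller is assumed to supply a valid boundary (512, 1024, etc.), the padding is assumed to be zero bytes, and the input file is assumed to have no existing user block.

```python
import shutil

CHUNK = 1024 * 1024  # copy 1 MiB at a time (arbitrary choice)


def jam_to_new_file(user_block_path, in_path, out_path, boundary):
    """Write the user block, pad to `boundary`, then append in_path as is.

    Returns the size of the output file in bytes. Hypothetical helper.
    """
    with open(out_path, "wb") as out:
        with open(user_block_path, "rb") as ub:
            shutil.copyfileobj(ub, out, CHUNK)
        written = out.tell()
        if written > boundary:
            raise ValueError("user block does not fit below the boundary")
        # Assumption: fill the gap up to the boundary with zero bytes.
        out.write(b"\0" * (boundary - written))
        # Byte-for-byte copy; the HDF5 objects are never interpreted.
        with open(in_path, "rb") as src:
            shutil.copyfileobj(src, out, CHUNK)
        return out.tell()
```

Copying in fixed-size chunks keeps memory use constant, but the running time still grows linearly with the size of the HDF5 file.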

The most efficient way to create a user block is to create the file with a user block (see H5Pset_userblock...<<link>>), and write the user block data into that space from a program.

The user block is completely opaque to the HDF5 library and to the jam and unjam tools.  The user block is simply read or written as a string of bytes, which could be text or any kind of binary data.  It is up to the user to know what the contents of the user block mean and how to process them.

When the user block is extracted, all the data is written to the output, including any padding or unwritten data.

These tools move the HDF5 file through byte copies, i.e., they do not read or interpret the HDF5 objects.

See Also:
HDF5 Format at <<link to hdf5 spec.>>