hrepack specification
version 1.0

hrepack is a command line tool that performs a logical copy of an input HDF4 file to an output HDF4 file. It copies all the high level objects and optionally rewrites them with compression or uncompression, chunking, or both. Only datasets and images support compression in HDF4. Datasets and images are generally referred to as objects in this document. See also HDF compression and chunking.

Usage
Compression
Input examples
Reading the HDF file
Assumptions
Output
Options matrix
Use cases
Error messages
Testing

Usage

The usage and options are

hrepack -i input -o output [-h] [-v] [-t "comp_info"] [-c "chunk_info"] [-f comp_file] [-m number]
  -i input          Input HDF File
  -o output         Output HDF File
  [-h]              Print usage message
  [-t "comp_info"]  Compression type: "comp_info" is a string with the format
                       "<list of objects> : <type of compression> <compression parameters>"
                       <list of objects> is a comma separated list of object names, 
                          meaning apply compression only to those objects. "*" means all objects
                       <type of compression> can be:
                         RLE, for RLE compression
                         HUFF, for Huffman
                         GZIP, for gzip
                         JPEG, for JPEG
                         SZIP, for szip
                         NONE, to uncompress
                       <compression parameters> is optional compression info 
                         RLE, no parameter
                         HUFF, the skip-size
                         GZIP, the deflation level
                         JPEG, the quality factor
                         SZIP, no parameter
  [-c "chunk_info"] Apply chunking. "chunk_info" is a string with the format
                       "<list of objects> : <chunk information>"
                       <list of objects> is a comma separated list of object names, 
                          meaning apply chunking only to those objects. "*" means all objects
                       <chunk information> is the chunk size of each dimension:
                         <dim_1 x dim_2 x ... dim_n> or
                         NONE, to unchunk
  -f comp_file      File with compression info in it (instead of the two above options)
  -m number         Do not compress objects whose size in bytes is smaller than number.
                    If no size is specified a minimum of 1024 bytes is assumed.
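
The "comp_info" format described above can be split mechanically. The following is a minimal sketch, not the hrepack implementation: the names comp_spec_t, parse_comp_info, MAX_NAMES and MAX_NAME_LEN are hypothetical, and the size limits are arbitrary assumptions. It only separates the object list, the compression type and the optional numeric parameter; it does not check whether the type is known.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define MAX_NAMES    16   /* arbitrary limits chosen for this sketch */
#define MAX_NAME_LEN 64

/* Hypothetical container for one parsed -t argument. */
typedef struct {
    char names[MAX_NAMES][MAX_NAME_LEN]; /* object names, or "*" */
    int  n_names;
    char type[16];                       /* type word, e.g. "GZIP" */
    int  has_param;                      /* 1 if a parameter followed the type */
    int  param;
} comp_spec_t;

/* Split "<list of objects>:<type> [parameter]".
 * Returns 0 on success, -1 if the string is malformed. */
static int parse_comp_info(const char *str, comp_spec_t *out)
{
    assert(str != NULL && out != NULL);
    memset(out, 0, sizeof *out);

    const char *colon = strchr(str, ':');
    if (colon == NULL || colon == str)
        return -1;                       /* no type given, or empty object list */

    /* Walk the comma separated names that precede the colon. */
    const char *p = str;
    while (p < colon) {
        const char *end = memchr(p, ',', (size_t)(colon - p));
        if (end == NULL)
            end = colon;
        while (p < end && *p == ' ')
            p++;                         /* trim leading blanks */
        const char *q = end;
        while (q > p && q[-1] == ' ')
            q--;                         /* trim trailing blanks */
        size_t len = (size_t)(q - p);
        if (len == 0 || len >= MAX_NAME_LEN || out->n_names == MAX_NAMES)
            return -1;
        memcpy(out->names[out->n_names], p, len);
        out->names[out->n_names][len] = '\0';
        out->n_names++;
        p = end + 1;
    }

    /* After the colon: a type word and at most one integer parameter. */
    char param[16], extra;
    int n = sscanf(colon + 1, " %15s %15s %c", out->type, param, &extra);
    if (n < 1 || n > 2)
        return -1;                       /* nothing after ':' or trailing junk */
    if (n == 2) {
        char *endp;
        long v = strtol(param, &endp, 10);
        if (*endp != '\0')
            return -1;                   /* parameter is not a number */
        out->param = (int)v;
        out->has_param = 1;
    }
    return 0;
}
```

With this sketch, "A,B,C:HUFF 1" yields three names, the type HUFF and the parameter 1, while "dset4" (no colon) is rejected.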

Compression

The compression options differ slightly depending on whether the object is an SDS (scientific data set) or a GR (general raster) image. For SDSs the compression types are

RLE
Skipping Huffman
Gzip (deflate)
Szip

Some compression types take an additional parameter. RLE and Szip take none. For Skipping Huffman, it is the skip factor. For gzip it is the deflation level, 1 to 9, with 9 being maximum compression. Unless space is the only consideration, a level of 1 is recommended, as it runs much faster with only a few percent increase in size.

For GR images, an additional compression type is available

JPEG, for JPEG compression.

Alternatively, a compression control file can be specified with the -f option. The file uses the same format as the command line options -t and -c.

[-t "comp_info"]
[-c "chunk_info"]

Input examples

The following table presents some examples

$ hrepack -i file1.hdf -o file2.hdf -t "*:RLE"	Compresses all objects in the file `file1.hdf` using RLE compression.
$ hrepack -i file1.hdf -o file2.hdf -v -t "*:RLE"	The same as above, but in verbose mode. The -v option prints the hierarchy of the input file while compressing it to the output file. The output of this command is `Compressing </group1/dset1> with <compression type>, <chunking size>...`
$ hrepack -i file1.hdf -o file2.hdf -t "*:GZIP 6"	Compresses all objects in the file `file1.hdf` using gzip compression with deflation level 6.
$ hrepack -i file1.hdf -o file2.hdf -c "*:10x10"	Applies chunking (and no compression) to all objects, using a chunk size of 10 for each of the 2 dimensions.
$ hrepack -i file1.hdf -o file2.hdf -t "A,B,C:HUFF 1" -t "D,E:RLE" -c "D,E:10x10"	Applies Skipping Huffman compression with a skip factor of 1 to objects A, B and C. Applies RLE compression to objects D and E. Applies chunking to objects D and E, using a chunk size of 10 for each of the 2 dimensions.
$ hrepack -i file1.hdf -o file2.hdf -t "A:NONE"	Applies no compression to object A. This can be used to uncompress the object if it is compressed.
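
The chunk size part of a -c argument, such as the 10x10 in the examples above, can be turned into a list of dimension sizes as follows. This is a minimal sketch under stated assumptions: the function name parse_chunk_dims and the MAX_RANK bound are hypothetical and not taken from hrepack, and the input is the text after the colon with no blanks around the x separators.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define MAX_RANK 32   /* assumed upper bound on the number of dimensions */

/* Parse "<dim_1>x<dim_2>x...<dim_n>".
 * Returns the number of dimensions read, 0 for "NONE" (unchunk),
 * or -1 if the text is malformed. */
static int parse_chunk_dims(const char *text, long dims[MAX_RANK])
{
    assert(text != NULL && dims != NULL);
    if (strcmp(text, "NONE") == 0)
        return 0;

    int rank = 0;
    const char *p = text;
    for (;;) {
        char *end;
        if (*p < '0' || *p > '9')
            return -1;                /* every size must start with a digit */
        long v = strtol(p, &end, 10);
        if (v <= 0 || rank == MAX_RANK)
            return -1;                /* zero size, or too many dimensions */
        dims[rank++] = v;
        if (*end == '\0')
            return rank;              /* consumed the whole string */
        if (*end != 'x')
            return -1;                /* only 'x' may separate sizes */
        p = end + 1;
    }
}
```

For "10x10" the sketch returns 2 with both sizes set to 10. An input such as "AxB" returns -1, which corresponds to the invalid chunking case.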

Reading the HDF file

When reading the input HDF file, a main loop is used to locate all the objects in the file. This loop preserves the hierarchy of the input file on the output file. The algorithm used is

1. Obtain the number of lone VGroups in the HDF file.

2. Loop over these groups. In each iteration a table is updated with the tag/reference pair of each object found.

   a. Obtain the tag/reference pairs of the members of the group.

   b. Switch on the tag of the current object. Four cases are possible:

      Object is a group: recursively repeat the process (obtain the tag/reference pairs for this group and do another tag switch). Add the object to the table.

      Object is a dataset: Add the object to the table. Process the object (make an identical object in the output file, using the absolute path and the compression/chunking options).

      Object is an image: Add the object to the table. Process the object.

      Object is a vdata: Add the object to the table. Save the object in the output file, creating the corresponding group(s).

3. Read all the HDF interfaces (SDS, GR and VS), checking for objects that are already in the table (meaning they belong to a previously inspected group and should not be processed again). Objects that are not in the table are processed and saved. These objects belong to the root group.

4. Read all global attributes and annotations and save them in the output file.
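
The two passes of this algorithm (groups first, then everything that no group referenced) can be sketched on an in-memory model. This is only an illustration, not the hrepack source: node_t, table_t, visit and traverse are hypothetical names, no HDF library calls are made, and incrementing a counter stands in for writing an object to the output file. The check at the top of visit, which skips any object already in the table, is an assumption of this sketch.

```c
#include <assert.h>
#include <stddef.h>

typedef enum { KIND_GROUP, KIND_DATASET, KIND_IMAGE, KIND_VDATA } kind_t;

/* Hypothetical in-memory stand-in for an HDF object. */
typedef struct node {
    int tag, ref;                        /* tag/reference pair */
    kind_t kind;
    const struct node *const *members;   /* children, used when kind is KIND_GROUP */
    size_t n_members;
} node_t;

#define TABLE_MAX 64                     /* arbitrary capacity for this sketch */
typedef struct {
    int tag[TABLE_MAX], ref[TABLE_MAX];
    size_t n;
} table_t;

static int table_has(const table_t *t, int tag, int ref)
{
    for (size_t i = 0; i < t->n; i++)
        if (t->tag[i] == tag && t->ref[i] == ref)
            return 1;
    return 0;
}

static void table_add(table_t *t, int tag, int ref)
{
    assert(t->n < TABLE_MAX);
    t->tag[t->n] = tag;
    t->ref[t->n] = ref;
    t->n++;
}

/* Add the object to the table, then recurse into groups or "process"
 * any other object by counting it. */
static void visit(const node_t *obj, table_t *t, size_t *copied)
{
    if (table_has(t, obj->tag, obj->ref))
        return;                          /* assumption: never handle an object twice */
    table_add(t, obj->tag, obj->ref);
    if (obj->kind == KIND_GROUP) {
        for (size_t i = 0; i < obj->n_members; i++)
            visit(obj->members[i], t, copied);
    } else {
        (*copied)++;                     /* stands in for writing to the output file */
    }
}

/* Pass 1 walks the lone groups; pass 2 picks up objects no group referenced.
 * Returns the number of non-group objects processed. */
static size_t traverse(const node_t *const *lone_groups, size_t n_groups,
                       const node_t *const *all_objects, size_t n_objects,
                       table_t *t)
{
    size_t copied = 0;
    t->n = 0;
    for (size_t i = 0; i < n_groups; i++)
        visit(lone_groups[i], t, &copied);
    for (size_t i = 0; i < n_objects; i++)
        visit(all_objects[i], t, &copied);   /* no-op if already in the table */
    return copied;
}
```

Because every object enters the table before its members are visited, an object that is reachable through several groups is processed only once in this sketch.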

Assumptions

These assumptions are made when the program is executed

If the objects are already compressed, process them with the new options.
Objects smaller than 1024 bytes (or a specified minimum) are not compressed.
If there is a conflict in the input options (same object name with different compression type, for example) then the program warns about a conflict, and exits.

Output

The hrepack output is divided into four steps. The following input

-v -i hrepacktst.hdf -o hrepacktst_out.hdf -t "dset4:HUFF 1" -c "dset4:2x2"

produces this output

Objects to chunk are...
        dset4 2 2
Objects to compress are...
        dset4            HUFF compression, parameter 1
Building list of objects in hrepacktst.hdf...
                  g1
                  g1/g2
                  g1/g2/g3
                  images
                  dset4
  chunk           dset_chunk
  chunk   GZIP    dset_chunk_comp
          GZIP    dset_comp
Searching for objects to modify...
                  dset4  ...Found
Making new file hrepacktst_out.hdf...
                  g1
                  g1/g2
                  g1/g2/g3
                  images
  chunk   HUFF    dset4
  chunk           dset_chunk
  chunk   GZIP    dset_chunk_comp
          GZIP    dset_comp

Step 1

First the input options are checked for validity. If they are valid, they are printed

Objects to chunk are...
        dset4 2 2
Objects to compress are...
        dset4            HUFF compression, parameter 1

Step 2

Then all the objects in the file are located, as described in "Reading the HDF file". The output is

Building list of objects in hrepacktst.hdf...
                  g1
                  g1/g2
                  g1/g2/g3
                  images
                  dset4
  chunk           dset_chunk
  chunk   GZIP    dset_chunk_comp
          GZIP    dset_comp

Step 3

Then, the objects given in the input are matched with the list of objects in the file. If any of the input objects are not present in the file, an error message is given. Otherwise, the output is

Searching for objects to modify...
                  dset4  ...Found

Step 4

Finally, a second traversal of the file is made and the output file is generated with the input options. Objects not named in the options must keep the compression and chunking state they had in the input file

Making new file hrepacktst_out.hdf...
                  g1
                  g1/g2
                  g1/g2/g3
                  images
  chunk   HUFF    dset4
  chunk           dset_chunk
  chunk   GZIP    dset_chunk_comp
          GZIP    dset_comp

Options matrix

The combination of the -t compress and -c chunking options gives the following matrix. There are 2 behavior states for compress and chunk: SELECTED and ALL. SELECTED is the case when some objects are specified. ALL is the case when the wildcard "*" is used.

                       Compress
                       SELECTED   ALL
Chunking   SELECTED    1          2
           ALL         3          4

For each one of the behavior states, there are several possible options:

Option     State      Possible values
Compress   ALL        NONE, GZIP, RLE, HUFF
           SELECTED   NONE, GZIP, RLE, HUFF
Chunk      ALL        NONE, chunk
           SELECTED   NONE, chunk
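
The case numbers 1 to 4 in the first matrix follow directly from whether each option uses the wildcard. A minimal sketch, with the hypothetical helper names is_all and matrix_case, taking the object list part of the -t and -c arguments:

```c
#include <assert.h>
#include <string.h>

/* An option is in the ALL state when its object list is the wildcard "*". */
static int is_all(const char *object_list)
{
    assert(object_list != NULL);
    return strcmp(object_list, "*") == 0;
}

/* Map the two states to the matrix cell:
 * compress SELECTED/ALL picks the column, chunking SELECTED/ALL picks the row. */
static int matrix_case(const char *comp_objects, const char *chunk_objects)
{
    int comp_all  = is_all(comp_objects);
    int chunk_all = is_all(chunk_objects);
    return 1 + comp_all + 2 * chunk_all;
}
```

For example, compressing "*" while chunking only "dset4" gives case 2, and compressing "dset_comp" while chunking "*" gives case 3.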

Use cases

Some example use cases are presented. The input file has the following objects

                  g1
                  g1/g2
                  g1/g2/g3
                  images
                  dset4
  chunk           dset_chunk
  chunk   GZIP    dset_chunk_comp
          GZIP    dset_comp

Case 1.a

This case compresses selected objects and chunks selected objects. The command line input and program output are

-v -i hrepacktst.hdf -o hrepacktst_out.hdf -t "dset4:HUFF 1" -c "dset4:2x2"

Objects to chunk are...
        dset4 2 2
Objects to compress are...
        dset4            HUFF compression, parameter 1
Building list of objects in hrepacktst.hdf...
                  g1
                  g1/g2
                  g1/g2/g3
                  images
                  dset4
  chunk           dset_chunk
  chunk   GZIP    dset_chunk_comp
          GZIP    dset_comp
Searching for objects to modify...
                  dset4  ...Found
Making new file hrepacktst_out.hdf...
                  g1
                  g1/g2
                  g1/g2/g3
                  images
  chunk   HUFF    dset4
  chunk           dset_chunk
  chunk   GZIP    dset_chunk_comp
          GZIP    dset_comp

Case 1.b

To uncompress and unchunk a previously compressed and chunked object we use

-v -i hrepacktst.hdf -o hrepacktst_out.hdf -t "dset_chunk_comp:NONE" -c "dset_chunk_comp:NONE"

Objects to chunk are...
        dset_chunk_comp NONE
Objects to compress are...
        dset_chunk_comp          NONE compression, parameter 0
Building list of objects in hrepacktst.hdf...
                  g1
                  g1/g2
                  g1/g2/g3
                  images
                  dset4
  chunk           dset_chunk
  chunk   GZIP    dset_chunk_comp
          GZIP    dset_comp
Searching for objects to modify...
                  dset_chunk_comp...Found
Making new file hrepacktst_out.hdf...
                  g1
                  g1/g2
                  g1/g2/g3
                  images
                  dset4
  chunk           dset_chunk
                  dset_chunk_comp
          GZIP    dset_comp

Case 2

This case compresses all objects and chunks selected objects. The command line input and program output are

-v -i hrepacktst.hdf -o hrepacktst_out.hdf -t "*:RLE" -c "dset4:2x2"

Objects to chunk are...
        dset4 2 2
Objects to compress are...
        Compress all
Building list of objects in hrepacktst.hdf...
                  g1
                  g1/g2
                  g1/g2/g3
                  images
                  dset4
  chunk           dset_chunk
  chunk   GZIP    dset_chunk_comp
          GZIP    dset_comp
Searching for objects to modify...
                  dset4  ...Found
Making new file hrepacktst_out.hdf...
                  g1
                  g1/g2
                  g1/g2/g3
                  images
  chunk   RLE     dset4
  chunk   RLE     dset_chunk
  chunk   RLE     dset_chunk_comp
          RLE     dset_comp

Case 3

This case compresses selected objects and chunks all objects. The command line input and program output are

-v -i hrepacktst.hdf -o hrepacktst_out.hdf -t "dset_comp:RLE" -c "*:2x2"

Objects to chunk are...
        Chunk all
Objects to compress are...
        dset_comp        RLE compression, parameter 0
Building list of objects in hrepacktst.hdf...
                  g1
                  g1/g2
                  g1/g2/g3
                  images
                  dset4
  chunk           dset_chunk
  chunk   GZIP    dset_chunk_comp
          GZIP    dset_comp
Searching for objects to modify...
                  dset_comp...Found
Making new file hrepacktst_out.hdf...
                  g1
                  g1/g2
                  g1/g2/g3
                  images
  chunk           dset4
  chunk           dset_chunk
  chunk   GZIP    dset_chunk_comp
  chunk   RLE     dset_comp

Case 4.a

This case compresses all objects and chunks all objects. The command line input and program output are

-v -i hrepacktst.hdf -o hrepacktst_out.hdf -t "*:HUFF 1" -c "*:2x2"

Objects to chunk are...
        Chunk all
Objects to compress are...
        Compress all
Building list of objects in hrepacktst.hdf...
                  g1
                  g1/g2
                  g1/g2/g3
                  images
                  dset4
  chunk           dset_chunk
  chunk   GZIP    dset_chunk_comp
          GZIP    dset_comp
Searching for objects to modify...
Making new file hrepacktst_out.hdf...
                  g1
                  g1/g2
                  g1/g2/g3
                  images
  chunk   HUFF    dset4
  chunk   HUFF    dset_chunk
  chunk   HUFF    dset_chunk_comp
  chunk   HUFF    dset_comp

Case 4.b

To uncompress and unchunk all objects we use

-v -i hrepacktst.hdf -o hrepacktst_out.hdf -t "*:NONE" -c "*:NONE"

Objects to chunk are...
        Chunk all
Objects to compress are...
        Compress all
Building list of objects in hrepacktst.hdf...
                  g1
                  g1/g2
                  g1/g2/g3
                  images
                  dset4
  chunk           dset_chunk
  chunk   GZIP    dset_chunk_comp
          GZIP    dset_comp
Searching for objects to modify...
Making new file hrepacktst_out.hdf...
                  g1
                  g1/g2
                  g1/g2/g3
                  images
                  dset4
                  dset_chunk
                  dset_chunk_comp
                  dset_comp

Error messages

Each of the steps 1 to 4 outlined in Output performs error checking.

Step 1 Error Messages

The error messages of this step concern invalid input, for example a nonexistent compression type, an invalid compression parameter, or invalid chunking information. The following table summarizes the Step 1 error messages for invalid input.

Reason of error	Example input (other parameters omitted)	Example output
Invalid compression input. The compression type was not defined.	-t "dset4"	Input Error: Invalid compression input in <dset4>
Invalid compression type.	-t "dset4:MYCOMP 6"	Input Error: Invalid compression type in <dset4:MYCOMP 6>
Invalid compression parameter. The compression parameter is not a number.	-t "dset4:GZIP aa"	Input Error: Compression parameter not digit in <dset4:GZIP aa>
Invalid compression parameter. The compression parameter is numeric but not valid for the compression type (GZIP accepts values between 0 and 9).	-t "dset4:GZIP 10"	Input Error: Invalid compression parameter in <dset4:GZIP 10>
Missing compression parameter. The compression type requires a parameter and none was given.	-t "dset4:HUFF"	Input Error: Missing compression parameter in <dset4:HUFF>
Extra compression parameter. A parameter was given for a compression type that takes none (RLE has no parameter).	-t "dset4:RLE 8"	Input Error: Extra compression parameter in RLE <dset4:RLE 8>
Invalid chunking input.	-c "dset4"	Input Error: Invalid chunking input in <dset4>
Invalid chunking input.	-c "dset4:AxB"	Input Error: Invalid chunking in <dset4:AxB>
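
The compression rows of this table can be read as a small decision procedure over the type and its parameter. The sketch below is one possible reading, not the hrepack code: comp_check_t and check_comp are hypothetical names. Several points are assumptions because the text does not state them: that GZIP requires a parameter, that SZIP and NONE reject a parameter in the same way as RLE, and that JPEG is accepted with or without a quality factor.

```c
#include <assert.h>
#include <string.h>

typedef enum {
    CHK_OK,
    CHK_BAD_TYPE,        /* "Invalid compression type" */
    CHK_BAD_PARAM,       /* "Invalid compression parameter" */
    CHK_MISSING_PARAM,   /* "Missing compression parameter" */
    CHK_EXTRA_PARAM      /* "Extra compression parameter" */
} comp_check_t;

/* Classify an already split "<type> [parameter]" pair.
 * has_param is nonzero when a numeric parameter was supplied. */
static comp_check_t check_comp(const char *type, int has_param, int param)
{
    assert(type != NULL);

    /* Types documented as taking no parameter (SZIP, NONE: assumed like RLE). */
    if (strcmp(type, "RLE") == 0 || strcmp(type, "SZIP") == 0 ||
        strcmp(type, "NONE") == 0)
        return has_param ? CHK_EXTRA_PARAM : CHK_OK;

    /* Skipping Huffman needs its skip size. */
    if (strcmp(type, "HUFF") == 0)
        return has_param ? CHK_OK : CHK_MISSING_PARAM;

    /* GZIP: level must lie in 0..9 (requiring a level is an assumption). */
    if (strcmp(type, "GZIP") == 0) {
        if (!has_param)
            return CHK_MISSING_PARAM;
        return (param < 0 || param > 9) ? CHK_BAD_PARAM : CHK_OK;
    }

    /* JPEG: quality factor is not range checked in this sketch (assumption). */
    if (strcmp(type, "JPEG") == 0)
        return CHK_OK;

    return CHK_BAD_TYPE;
}
```

Keeping the classification separate from the message text makes it easy to test each row of the table on its own.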

After all the compression and chunking input information is gathered, an additional test is made for repeated names or mixing of the wildcard '*' with other names.

Reason of error	Example input (other parameters omitted)	Example output
Repeated names.	-t "dset4:RLE" -t "dset4:HUFF 1"	Input Error: compression information already inserted for <dset4>
Repeated names.	-c "dset4:10x10" -c "dset4:5x5"	Input Error: chunk information already inserted for <dset4>
Mixing of wildcard with names.	-c "*:10x10" -c "dset4:5x5"	Error: Invalid chunking input: '*' is present with other objects <dset4:5x5>
Mixing of wildcard with names.	-t "*:RLE" -t "dset4:RLE"	Error: Invalid compression input: '*' is present with other objects <dset4:RLE>
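
Both checks in this table operate on the full list of object names gathered from all -t options (or, separately, from all -c options). A minimal sketch, assuming the names have already been collected into an array; names_check_t and check_names are hypothetical names, and the order in which the two conditions are tested is an arbitrary choice of this sketch:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

typedef enum { NAMES_OK, NAMES_REPEATED, NAMES_WILDCARD_MIXED } names_check_t;

/* names[] holds every object name given for one option kind (-t or -c). */
static names_check_t check_names(const char *const names[], size_t n)
{
    assert(names != NULL || n == 0);

    /* The wildcard must be the only entry when it is present. */
    for (size_t i = 0; i < n; i++)
        if (strcmp(names[i], "*") == 0 && n > 1)
            return NAMES_WILDCARD_MIXED;

    /* Pairwise comparison is enough for the short lists expected here. */
    for (size_t i = 0; i < n; i++)
        for (size_t j = i + 1; j < n; j++)
            if (strcmp(names[i], names[j]) == 0)
                return NAMES_REPEATED;

    return NAMES_OK;
}
```

Called with the names from -t "dset4:RLE" -t "dset4:HUFF 1" the sketch reports a repeated name, and with the names from -c "*:10x10" -c "dset4:5x5" it reports wildcard mixing.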

Step 2 Error Messages

After the input is checked for validity, a first traversal of the file is made to check that the input names are present in the file. The main loop used to locate all the objects in the file is explained in the Reading the HDF file section.

Reason of error	Example input	Example output
The object is not in the file	-i hrepacktst.hdf -o hrepacktst_out.hdf -t "dset:RLE"	Error: <dset> not found in file <hrepacktst.hdf>. Exiting...
The object given in the input is not compressible. Only datasets and images are compressible.	-i hrepacktst.hdf -o hrepacktst_out.hdf -t "vdata4:RLE"	Error: <vdata4> not compressible/chunk object in file <hrepacktst.hdf>. Exiting...

Testing

Testing of hrepack is done with an application program that uses the hdiff function. An HDF test file is generated, hrepack is called with several use cases, and hdiff is then called to compare the input and output files. This only verifies whether the data in both files is equal. To test the compression and chunking of the new file, API functions are used. Bad command line input is tested manually.

Each test is of the form

hrepack_addcomp("dset4:GZIP 6",&options);
hrepack_addchunk("dset4:2x2",&options);
hrepack(FILENAME,FILENAME_OUT,&options);
ret=hdiff(FILENAME,FILENAME_OUT,fspec);

hrepack_addcomp adds the compression option and hrepack_addchunk adds the chunk option. The return value of hdiff tells whether the object data of the two files differ. A sample output is

Testing copy all objects with no compression / no chunking             PASSED
Testing compress selected objects / no chunking                        PASSED
Testing no compress /  chunking selected                               PASSED

For debugging purposes the verbose output of both calls (hrepack and hdiff) can be turned on

#if defined (HREPACK_DEBUG)
	options.verbose=1; 
	fspec.verbose  =1;
#endif

and the output is

Testing compressing SELECTED, chunking SELECTED
-----------------------------------------------------
Objects to chunk are...
        dset4 2 2
Objects to compress are...
        dset4            HUFF compression, parameter 1
Building list of objects in hrepacktst.hdf...
                  g1
                  g1/g2
                  g1/g2/g3
                  dset4
  chunk           dset_chunk
  chunk   GZIP    dset_chunk_comp
          GZIP    dset_comp
Searching for objects to modify...
                  dset4  ...Found
Making new file hrepacktst_out.hdf...
                  g1
                  g1/g2
                  g1/g2/g3
  chunk   HUFF    dset4
  chunk           dset_chunk
  chunk   GZIP    dset_chunk_comp
          GZIP    dset_comp

---------------------------------------
Tag Ref Name
---------------------------------------
1965 2 g1
1965 4 g1/g2
1965 5 g1/g2/g3
720 49 dset4
720 54 dset_chunk
720 47 dset_chunk_comp
720 78 dset_comp
---------------------------------------
Tag Ref Name
---------------------------------------
1965 2 g1
1965 4 g1/g2
1965 5 g1/g2/g3
720 47 dset4
720 74 dset_chunk
720 77 dset_chunk_comp
720 98 dset_comp
---------------------------------------
file1 file2
---------------------------------------
x x g1
x x g1/g2
x x g1/g2/g3
x x dset4
x x dset_chunk
x x dset_chunk_comp
x x dset_comp
Comparing <dset4>
Comparing <dset_chunk>
Comparing <dset_chunk_comp>
Comparing <dset_comp>

Last modified: September 18, 2003