hrepack


hrepack is a command line tool that performs a logical copy of an input HDF4 file to an output HDF4 file, copying all the high level objects and optionally rewriting the objects with compression/decompression and/or chunking applied. Only datasets and images support compression in HDF4. Datasets and images are generally referred to as objects in this document. See also HDF compression and chunking.

Usage

The usage and options are

hrepack -i input -o output [-h] [-v] [-t "comp_info"] [-c "chunk_info"] [-f comp_file] [-m number]
  -i input          Input HDF File
  -o output         Output HDF File
  [-h]              Print usage message
  [-v]              Verbose mode; print the hierarchy of the input file while copying
  [-t "comp_info"]  Compression type: "comp_info" is a string with the format
                       "<list of objects> : <type of compression> <compression parameters>"
                       <list of objects> is a comma separated list of object names, 
                          meaning apply compression only to those objects. "*" means all objects
                       <type of compression> can be:
                         RLE, for RLE compression
                         HUFF, for Huffman
                         GZIP, for gzip
                         JPEG, for JPEG
                         SZIP, for szip
                         NONE, to uncompress
                       <compression parameters> is optional compression info 
                         RLE, no parameter
                         HUFF, the skip-size
                         GZIP, the deflation level
                         JPEG, the quality factor
                         SZIP, pixels per block
  [-c "chunk_info"] Apply chunking. "chunk_info" is a string with the format
                       "<list of objects> : <chunk information>"
                       <list of objects> is a comma separated list of object names, 
                          meaning apply chunking only to those objects. "*" means all objects
                       <chunk information> is the chunk size of each dimension:
                         <dim_1 x dim_2 x ... dim_n> or
                          NONE, to remove chunking
  [-f comp_file]    File with compression information in it (instead of the -t and -c options above)
  [-m number]       Do not compress objects whose size in bytes is smaller than number.
                    If no size is specified, a minimum of 1024 bytes is assumed.

Compression

The compression options vary slightly depending on whether the object is an SDS (scientific data set) or a GR (general raster) image. For SDSs the compression types are

RLE
Skipping Huffman
Gzip (deflate)
Szip

Some compression types have additional compression information. It is not needed for RLE. For Skipping Huffman, it is the skip factor. For GZIP it is the deflation level, 1 to 9, with 9 being maximum compression. Unless space is the only consideration, a level of 1 is recommended, as it runs much faster with only a few percent increase in size. For SZIP it is the pixels-per-block parameter (an even number between 2 and 32).

For GR images, an additional compression type is available

JPEG, for JPEG compression.
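
Internally, hrepack applies these settings through the HDF4 library when it rewrites an object. As a rough illustration only (a minimal sketch, not hrepack source code; the file and dataset names are made up), applying gzip compression to an SDS with the HDF4 C API looks like this:

/* Sketch: apply GZIP (deflate) compression to a new SDS before writing
 * its data.  This is roughly what -t "dset:GZIP 6" asks hrepack to do. */
#include "mfhdf.h"

int gzip_example(void)
{
    int32     dims[2] = {10, 10};
    int32     sd_id   = SDstart("file2.hdf", DFACC_CREATE);
    int32     sds_id  = SDcreate(sd_id, "dset", DFNT_INT32, 2, dims);
    comp_info c_info;

    c_info.deflate.level = 6;                 /* deflation level 1-9 */
    if (SDsetcompress(sds_id, COMP_CODE_DEFLATE, &c_info) == FAIL)
        return -1;
    /* Skipping Huffman would use c_info.skphuff.skp_size with
       COMP_CODE_SKPHUFF; GR images use GRsetcompress instead. */

    /* ... SDwritedata() calls go here ... */
    SDendaccess(sds_id);
    SDend(sd_id);
    return 0;
}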

Alternatively, a compression control file can be specified with the -f option. The compression info file has the same format as the command line options -t and -c.

[-t "comp_info"]
[-c "chunk_info"]

Input examples

The following table presents some examples 

$hrepack -i file1.hdf -o file2.hdf -t "*:RLE"
compresses all objects in the file file1.hdf, using RLE compression
$hrepack -i file1.hdf -o file2.hdf -v -t "*:RLE"

The same as above, but in verbose mode. The -v option prints out the hierarchy of the input file while compressing it to the output file. The output of this command is 
Compressing </group1/dset1> with <compression type>, <chunking size>... 

$hrepack -i file1.hdf -o file2.hdf -t "*:GZIP 6"
compresses all objects in the file file1.hdf, using gzip compression with deflation level 6
$hrepack -i file1.hdf -o file2.hdf -c "*:10x10"
applies chunking (and no compression) to all objects using a chunk size of 10 for the 2 dimensions.
$hrepack -i file1.hdf -o file2.hdf 
-t "A,B,C:HUFF 1" -t "D,E:RLE"
-c "D,E:10x10"
applies Skipping Huffman compression with a skip factor of 1 to objects A, B and C, 
applies RLE compression to objects D and E, and
applies chunking to objects D and E using a chunk size of 10 for the 2 dimensions.
$hrepack -i file1.hdf -o file2.hdf -t "A:NONE"
applies no compression to object A. This can be used to uncompress the object, if it is compressed.
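
There is no example for the -m option in the table above; a hypothetical invocation combining it with -t could be

$hrepack -i file1.hdf -o file2.hdf -t "*:GZIP 1" -m 2048
compresses all objects in the file file1.hdf with gzip deflation level 1, leaving objects smaller than 2048 bytes uncompressed.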

Reading the HDF file

When reading the input HDF file, a main loop is used to locate all the objects in the file. This loop preserves the hierarchy of the input file in the output file. The algorithm used is as follows (a C sketch of the traversal appears after the list):

  1. Obtain the number of lone VGroups in the HDF file.
  2. Loop over each one of these groups. In each iteration a table is updated with the tag/reference pair of each object found.
    1. Obtain the pairs of tag/references for the group
    2. Switch on the tag of the current object. Four cases are possible:
      1. Object is a group: recursively repeat the process (obtain the pairs of tag/references for this group and do another tag switch). Add the object to the table.
      2. Object is a dataset: Add the object to the table. Process object (make an identical object in the output file, using the absolute path, and using the compression/chunking options).
      3. Object is an image: Add the object to the table. Process object.
      4. Object is a vdata: Add the object to the table. Save the object in the output file, creating the corresponding group(s).
  3. Read all the HDF interfaces (SDS, GR and VS), checking for objects that are already in the table (meaning they belong to a previously inspected group and should not be processed). If they are not in the table, process and save the object. These objects belong to the root group.
  4. Read all global attributes and annotations and save them in the output file.
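
A condensed C sketch of this traversal, in terms of the standard HDF4 Vgroup calls, is shown below. It is illustrative only: error handling is omitted, and the object table and the processing steps are only indicated in comments.

/* Sketch of the lone-Vgroup traversal described above (not hrepack source). */
#include <stdlib.h>
#include "hdf.h"

void traverse(int32 file_id)      /* file_id obtained from Hopen() */
{
    int32 nlone, *refs, vg_id, tag, ref;
    int   i, j, npairs;

    Vstart(file_id);
    nlone = Vlone(file_id, NULL, 0);           /* 1. number of lone Vgroups    */
    refs  = (int32 *) malloc(nlone * sizeof(int32));
    Vlone(file_id, refs, nlone);

    for (i = 0; i < nlone; i++) {              /* 2. loop over the lone groups */
        vg_id  = Vattach(file_id, refs[i], "r");
        npairs = Vntagrefs(vg_id);
        for (j = 0; j < npairs; j++) {
            Vgettagref(vg_id, j, &tag, &ref);  /* 2.1 tag/reference pairs      */
            switch (tag) {                     /* 2.2 switch on the tag        */
            case DFTAG_VG:  /* group:   recurse, add to table        */ break;
            case DFTAG_NDG: /* dataset: add to table, process object */ break;
            case DFTAG_RIG: /* image:   add to table, process object */ break;
            case DFTAG_VH:  /* vdata:   add to table, save object    */ break;
            }
        }
        Vdetach(vg_id);
    }
    free(refs);
    Vend(file_id);
}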

Assumptions

These assumptions are made when the program is executed.

Output

The hrepack output is divided into several steps. The following input

-v -i hrepacktst.hdf -o hrepacktst_out.hdf -t "dset4:HUFF 1" -c "dset4:2x2"

produces this output

Objects to chunk are...
        dset4 2 2
Objects to compress are...
        dset4            HUFF compression, parameter 1
Building list of objects in hrepacktst.hdf...
                  g1
                  g1/g2
                  g1/g2/g3
                  images
                  dset4
  chunk           dset_chunk
  chunk   GZIP    dset_chunk_comp
          GZIP    dset_comp
Searching for objects to modify...
                  dset4  ...Found
Making new file hrepacktst_out.hdf...
                  g1
                  g1/g2
                  g1/g2/g3
                  images
  chunk   HUFF    dset4
  chunk           dset_chunk
  chunk   GZIP    dset_chunk_comp
          GZIP    dset_comp

Step 1

First, a check is made that the input is valid. If it is, the input is printed:

Objects to chunk are...
        dset4 2 2
Objects to compress are...
        dset4            HUFF compression, parameter 1

Step 2

Then all the objects in the file are located, as described in "Reading the HDF file". The output is

Building list of objects in hrepacktst.hdf...
                  g1
                  g1/g2
                  g1/g2/g3
                  images
                  dset4
  chunk           dset_chunk
  chunk   GZIP    dset_chunk_comp
          GZIP    dset_comp

Step 3

Then, the objects given in the input are matched against the list of objects in the file. If any of the input objects are not present in the file, an error message is given. Otherwise, the output is

Searching for objects to modify...
                  dset4  ...Found
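
Conceptually, this step is a lookup of every input object name in the table built during the first traversal. A hypothetical sketch follows (the table layout and names are illustrative, not the actual hrepack data structures):

/* Hypothetical sketch of the Step 3 name matching. */
#include <stdio.h>
#include <stdint.h>
#include <string.h>

typedef struct {            /* one entry per object found in Step 2 */
    int32_t tag, ref;
    char    path[256];      /* absolute path, e.g. "g1/g2/g3/dset"  */
} obj_entry_t;

/* Return the index of 'name' in the table, or -1 if it is not in the file. */
static int find_object(const obj_entry_t *table, int nobjects, const char *name)
{
    int i;
    for (i = 0; i < nobjects; i++)
        if (strcmp(table[i].path, name) == 0) {
            printf("%20s  ...Found\n", name);
            return i;
        }
    fprintf(stderr, "Error: <%s> not found in file\n", name);
    return -1;
}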

Step 4

Finally, a second traversal of the file is made and the output file is generated with the input options applied. The previous condition of the objects in the file (their existing compression and chunking settings) must be preserved:

Making new file hrepacktst_out.hdf...
                  g1
                  g1/g2
                  g1/g2/g3
                  images
  chunk   HUFF    dset4
  chunk           dset_chunk
  chunk   GZIP    dset_chunk_comp
          GZIP    dset_comp
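
Objects that are not named in the input keep their original settings during this traversal. One way to obtain those settings with the HDF4 API is sketched below; it assumes the HDF 4.2 query calls SDgetcompinfo and SDgetchunkinfo and is not the actual hrepack code.

/* Sketch: query an input SDS's existing compression and chunking so that
 * they can be reapplied on the copy when the object is not named in the
 * -t / -c options. */
#include "mfhdf.h"

int query_settings(int32 in_sds_id)
{
    comp_coder_t  comp_type = COMP_CODE_NONE;
    comp_info     c_info;
    HDF_CHUNK_DEF chunk_def;
    int32         chunk_flags;

    /* existing compression (COMP_CODE_NONE if the object is uncompressed) */
    if (SDgetcompinfo(in_sds_id, &comp_type, &c_info) == FAIL)
        return -1;

    /* existing chunking (HDF_NONE, HDF_CHUNK, or HDF_CHUNK | HDF_COMP) */
    if (SDgetchunkinfo(in_sds_id, &chunk_def, &chunk_flags) == FAIL)
        return -1;

    /* the copy then calls SDsetchunk()/SDsetcompress() on the output SDS
       with these values before writing the data */
    return 0;
}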

 

Error messages

Each of the steps 1 to 4 outlined in the Output section has error checking functionality.

Step 1 Error Messages

The error messages of this step have to do with invalid input, for example a non-existent compression type, an invalid compression parameter, or invalid chunking information. The following table summarizes the Step 1 error messages regarding invalid input.

Reason of error: Invalid compression input; the compression type was not defined.
Example input (other parameters omitted): -t "dset4"
Example output: Input Error: Invalid compression input in <dset4>

Reason of error: Invalid compression type.
Example input (other parameters omitted): -t "dset4:MYCOMP 6"
Example output: Input Error: Invalid compression type in <dset4:MYCOMP 6>

Reason of error: Invalid compression parameter; the parameter is not a digit.
Example input (other parameters omitted): -t "dset4:GZIP aa"
Example output: Input Error: Compression parameter not digit in <dset4:GZIP aa>

Reason of error: Invalid compression parameter; the parameter is numeric but not valid for the compression type (GZIP has values between 0 and 9).
Example input (other parameters omitted): -t "dset4:GZIP 10"
Example output: Input Error: Invalid compression parameter in <dset4:GZIP 10>

Reason of error: Missing compression parameter; the compression type requires a parameter but none was given.
Example input (other parameters omitted): -t "dset4:HUFF"
Example output: Input Error: Missing compression parameter in <dset4:HUFF>

Reason of error: Extra compression parameter; a parameter was given for a type that takes none (RLE has no parameter).
Example input (other parameters omitted): -t "dset4:RLE 8"
Example output: Input Error: Extra compression parameter in RLE <dset4:RLE 8>

Reason of error: Invalid chunking input.
Example input (other parameters omitted): -c "dset4"
Example output: Input Error: Invalid chunking input in <dset4>

Reason of error: Invalid chunking input; the chunk dimensions are not numeric.
Example input (other parameters omitted): -c "dset4:AxB"
Example output: Input Error: Invalid chunking in <dset4:AxB>

After all the compression and chunking input information is gathered, an additional test is made for repeated names or mixing of the wildcard '*' with other names.

Reason of error: Repeated names.
Example input (other parameters omitted): -t "dset4:RLE" -t "dset4:HUFF 1"
Example output: Input Error: compression information already inserted for <dset4>

Reason of error: Repeated names.
Example input (other parameters omitted): -c "dset4:10x10" -c "dset4:5x5"
Example output: Input Error: chunk information already inserted for <dset4>

Reason of error: Mixing of the wildcard with names.
Example input (other parameters omitted): -c "*:10x10" -c "dset4:5x5"
Example output: Error: Invalid chunking input: '*' is present with other objects <dset4:5x5>

Reason of error: Mixing of the wildcard with names.
Example input (other parameters omitted): -t "*:RLE" -t "dset4:RLE"
Example output: Error: Invalid compression input: '*' is present with other objects <dset4:RLE>

Step 2 Error Messages

After checking for valid input, a first traversal of the file is made to examine whether the input object names are present in the file. The main loop used to locate all the objects in the file is explained in the Reading the HDF file section.

Reason of error: The object given in the input is not in the file.
Example input: -i hrepacktst.hdf -o hrepacktst_out.hdf -t "dset:RLE"
Example output: Error: <dset> not found in file <hrepacktst.hdf>. Exiting...

Reason of error: The object given in the input is not compressible or chunkable; only datasets and images can be compressed or chunked.
Example input: -i hrepacktst.hdf -o hrepacktst_out.hdf -t "vdata4:RLE"
Example output: Error: <vdata4> not compressible/chunk object in file <hrepacktst.hdf>. Exiting...

 



Last modified: March 19, 2007