The usage and options are
hrepack -i input -o output [-h] [-v] [-t "comp_info"] [-c "chunk_info"] [-f comp_file] [-m number]
  -i input          Input HDF file
  -o output         Output HDF file
  [-h]              Print usage message
  [-v]              Verbose mode
  [-t "comp_info"]  Compression type: "comp_info" is a string with the format
                    "<list of objects> : <type of compression> <compression parameters>"
                    <list of objects> is a comma separated list of object names,
                       meaning apply compression only to those objects. "*" means all objects
                    <type of compression> can be:
                       RLE, for RLE compression
                       HUFF, for Huffman
                       GZIP, for gzip
                       JPEG, for JPEG
                       SZIP, for szip
                       NONE, to uncompress
                    <compression parameters> is optional compression info:
                       RLE, no parameter
                       HUFF, the skip-size
                       GZIP, the deflation level
                       JPEG, the quality factor
                       SZIP, no parameter
  [-c "chunk_info"] Apply chunking. "chunk_info" is a string with the format
                    "<list of objects> : <chunk information>"
                    <list of objects> is a comma separated list of object names,
                       meaning apply chunking only to those objects. "*" means all objects
                    <chunk information> is the chunk size of each dimension:
                       <dim_1 x dim_2 x ... dim_n> or
                       NONE, to unchunk
  -f comp_file      File with compression info in it (instead of the two above options)
  -m number         Do not compress objects whose size in bytes is smaller than number.
                    If no size is specified, a minimum of 1024 bytes is assumed.
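The "comp_info" string format described above can be illustrated with a small parser. This is only a sketch under stated assumptions: the `comp_info_t` structure, the `parse_comp_info` function, and the size limits are hypothetical names invented for this example and are not taken from the hrepack source.

```c
#include <stdio.h>
#include <string.h>

#define MAX_NAMES    16   /* illustrative limit, not an hrepack limit */
#define MAX_NAME_LEN 64

/* Hypothetical container for one parsed -t argument. */
typedef struct {
    char names[MAX_NAMES][MAX_NAME_LEN]; /* object names, or "*" for all */
    int  n_names;
    char type[8];                        /* RLE, HUFF, GZIP, JPEG, SZIP, NONE */
    int  has_param;                      /* 1 if an integer parameter was given */
    int  param;
} comp_info_t;

/* Parse "<list of objects> : <type> [parameter]".
   Returns 0 on success, -1 if the string is malformed. */
int parse_comp_info(const char *str, comp_info_t *out)
{
    const char *colon = strchr(str, ':');
    if (colon == NULL || colon == str)
        return -1;                       /* no type given, or empty name list */

    memset(out, 0, sizeof *out);

    /* Split the part before ':' on commas. */
    const char *p = str;
    while (p < colon) {
        const char *end = memchr(p, ',', (size_t)(colon - p));
        if (end == NULL)
            end = colon;

        /* Trim spaces around the name. */
        const char *s = p, *e = end;
        while (s < e && *s == ' ') s++;
        while (e > s && e[-1] == ' ') e--;

        size_t len = (size_t)(e - s);
        if (len == 0 || len >= MAX_NAME_LEN || out->n_names == MAX_NAMES)
            return -1;
        memcpy(out->names[out->n_names], s, len);
        out->names[out->n_names][len] = '\0';
        out->n_names++;
        p = end + 1;
    }

    /* Read the compression type and the optional integer parameter. */
    int n = sscanf(colon + 1, " %7s %d", out->type, &out->param);
    if (n < 1)
        return -1;
    out->has_param = (n == 2);
    return 0;
}
```

With this sketch, the string "A,B,C:HUFF 1" yields three names, the type HUFF and the parameter 1, while "*:RLE" yields the single name "*" and no parameter. Checking that the type and parameter are valid is a separate step.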
The compression options vary slightly when the object is an SDS or a GR. For SDSs the compression types are
- RLE
- Skipping Huffman
- Gzip (deflate)
- Szip
Some compression types take an additional compression parameter. None is needed for RLE. For Skipping Huffman, it is the skip factor. For gzip, it is the deflation level, 1 to 9, with 9 being maximum compression. Unless space is the only consideration, a level of 1 is recommended, as it is much faster with only a few percent increase in size.
For GR images, an additional compression type is available
JPEG, for JPEG compression.
Alternatively, a compression information file can be specified with the -f option. This file uses the same format as the command line options -t and -c.
[-t "comp_info"] [-c "chunk_info"]
The following table presents some examples
$ hrepack -i file1.hdf -o file2.hdf -t "*:RLE" | Compresses all objects in the file file1.hdf using RLE compression. |
$ hrepack -i file1.hdf -o file2.hdf -v -t "*:RLE" | The same as above, but in verbose mode. The -v option prints the hierarchy of the input file while compressing it to the output file. The verbose output is described in the Output section. |
$ hrepack -i file1.hdf -o file2.hdf -t "*:GZIP 6" | Compresses all objects in the file file1.hdf using gzip compression with deflation level 6. |
$ hrepack -i file1.hdf -o file2.hdf -c "*:10x10" | Applies chunking (and no compression) to all objects, using a chunk size of 10 for each of the 2 dimensions. |
$ hrepack -i file1.hdf -o file2.hdf | Applies Skipping Huffman compression with a skip factor of 1 to objects A, B and C. Applies RLE compression to objects D and E. Applies chunking to objects D and E, using a chunk size of 10 for each of the 2 dimensions. |
$ hrepack -i file1.hdf -o file2.hdf -t "A:NONE" | Applies no compression to object A. This can be used to uncompress the object if it is compressed. |
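The chunk information in examples such as "*:10x10" is a list of dimension sizes separated by "x", or the word NONE. A minimal sketch of how that part of the string could be read is shown here. The function name `parse_chunk_dims` and the `MAX_RANK` limit are assumptions made for this example, not hrepack identifiers.

```c
#include <stdlib.h>
#include <string.h>

#define MAX_RANK 32   /* illustrative upper bound on the number of dimensions */

/* Parse the <chunk information> part of a -c argument, e.g. "10x10".
   Returns the number of dimensions read, 0 for "NONE" (unchunk),
   or -1 if the text is not a valid list of positive sizes. */
int parse_chunk_dims(const char *s, long dims[MAX_RANK])
{
    if (strcmp(s, "NONE") == 0)
        return 0;

    int rank = 0;
    for (;;) {
        char *end;

        /* Require a digit here: strtol alone would also accept signs and spaces. */
        if (*s < '0' || *s > '9')
            return -1;

        long v = strtol(s, &end, 10);
        if (v <= 0 || rank == MAX_RANK)
            return -1;
        dims[rank++] = v;

        if (*end == '\0')
            return rank;      /* reached the end of the list */
        if (*end != 'x')
            return -1;        /* only 'x' may separate the sizes */
        s = end + 1;
    }
}
```

For "10x10" the sketch returns 2 with both sizes set to 10, and for "AxB" it returns -1, which corresponds to the invalid chunking input case.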
When reading the input HDF file, a main loop is used to locate all the objects in the file. This loop preserves the hierarchy of the input file on the output file. The algorithm used is
- Obtain the number of lone VGroups in the HDF file.
- Loop over each of these groups. In each iteration a table is updated with the tag/reference pair of an object.
  - Obtain the tag/reference pairs for the group.
  - Switch on the tag of the current object. Four cases are possible:
    - Object is a group: recursively repeat the process (obtain the tag/reference pairs for this group and do another tag switch). Add the object to the table.
    - Object is a dataset: Add the object to the table. Process the object (make an identical object in the output file, using the absolute path and the compression/chunking options).
    - Object is an image: Add the object to the table. Process the object.
    - Object is a vdata: Add the object to the table. Save the object in the output file, creating the corresponding group(s).
- Read all the HDF interfaces (SDS, GR and VS), checking for objects that are already in the table (meaning they belong to a previously inspected group and should not be processed again). If they are not in the table, then process and save the object. These objects belong to the root group.
- Read all global attributes and annotations and save them in the output file.
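The traversal above can be sketched without the HDF library by using a small in-memory object model. Everything in this example is an assumption made for illustration: the `object` and `table` types, the `visit` and `build_table` functions, and the fixed table size are hypothetical. The real tool obtains tag/reference pairs through the HDF interfaces instead.

```c
#include <stdio.h>
#include <string.h>

enum kind { GROUP, DATASET, IMAGE, VDATA };

/* Hypothetical in-memory stand-in for an HDF object. */
typedef struct object {
    enum kind kind;
    int tag, ref;
    const char *name;
    struct object **members;   /* children, used for groups only */
    int n_members;
} object;

#define TABLE_MAX    64
#define PATH_MAX_LEN 128

/* Table of tag/reference pairs already seen, with the absolute path of each. */
typedef struct {
    int  tag[TABLE_MAX], ref[TABLE_MAX];
    char path[TABLE_MAX][PATH_MAX_LEN];
    int  n;
} table;

static int table_has(const table *t, int tag, int ref)
{
    for (int i = 0; i < t->n; i++)
        if (t->tag[i] == tag && t->ref[i] == ref)
            return 1;
    return 0;
}

/* Add one object to the table and, if it is a group, recurse into it. */
static void visit(const object *o, const char *prefix, table *t)
{
    if (table_has(t, o->tag, o->ref) || t->n == TABLE_MAX)
        return;

    int i = t->n++;
    t->tag[i] = o->tag;
    t->ref[i] = o->ref;
    snprintf(t->path[i], PATH_MAX_LEN, "%s%s%s",
             prefix, prefix[0] ? "/" : "", o->name);

    if (o->kind == GROUP) {
        for (int k = 0; k < o->n_members; k++)
            visit(o->members[k], t->path[i], t);
    }
    /* For a dataset, image or vdata, the real tool would copy the object
       to the output file at this point. */
}

/* First walk the lone groups, then pick up objects not reached through any group. */
void build_table(object **lone_groups, int n_lone,
                 object **all_objects, int n_all, table *t)
{
    t->n = 0;
    for (int i = 0; i < n_lone; i++)
        visit(lone_groups[i], "", t);
    for (int i = 0; i < n_all; i++)
        visit(all_objects[i], "", t);   /* skipped if already in the table */
}
```

The table lookup serves two purposes in this sketch: it keeps objects that were reached through a group from being processed again in the second pass, and it stops the recursion if a group is reachable more than once.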
These assumptions are made when the program is executed
The hrepack output is divided into several steps. The following input
-v -i hrepacktst.hdf -o hrepacktst_out.hdf -t "dset4:HUFF 1" -c "dset4:2x2"
produces this output
Objects to chunk are...
    dset4 2 2
Objects to compress are...
    dset4 HUFF compression, parameter 1
Building list of objects in hrepacktst.hdf...
    g1
    g1/g2
    g1/g2/g3
    images
    dset4
    chunk dset_chunk
    chunk GZIP dset_chunk_comp
    GZIP dset_comp
Searching for objects to modify...
    dset4 ...Found
Making new file hrepacktst_out.hdf...
    g1
    g1/g2
    g1/g2/g3
    images
    chunk HUFF dset4
    chunk dset_chunk
    chunk GZIP dset_chunk_comp
    GZIP dset_comp
First a check is made if the input is valid. If so, it is printed
Objects to chunk are...
    dset4 2 2
Objects to compress are...
    dset4 HUFF compression, parameter 1
Then all the objects in the file are located, as described in "Reading the HDF file". The output is
Building list of objects in hrepacktst.hdf...
    g1
    g1/g2
    g1/g2/g3
    images
    dset4
    chunk dset_chunk
    chunk GZIP dset_chunk_comp
    GZIP dset_comp
Then, the objects given in the input are matched with the list of objects in the file. If any of the input objects are not present in the file, an error message is given. Otherwise, the output is
Searching for objects to modify...
    dset4 ...Found
Finally, a second traversal of the file is made and the output file is generated with the input options. The previous condition of the objects in the file must be preserved
Making new file hrepacktst_out.hdf...
    g1
    g1/g2
    g1/g2/g3
    images
    chunk HUFF dset4
    chunk dset_chunk
    chunk GZIP dset_chunk_comp
    GZIP dset_comp
The combination of the -t compress and -c chunking options gives the following matrix. There are 2 behavior states for compress and chunk: SELECTED and ALL. SELECTED is the case when some objects are specified. ALL is the case when the wildcard "*" is used.
Chunking \ Compress | SELECTED | ALL |
SELECTED | 1 | 2 |
ALL | 3 | 4 |
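The numbering of the four cases in the matrix can be expressed as a small function of the two arguments. This is a sketch for illustration only. The names `scope_of` and `matrix_case` are invented here, and the sketch assumes that both -t and -c are given and that ALL is recognized by an object list consisting of the wildcard alone.

```c
#include <string.h>

typedef enum { SELECTED = 0, ALL = 1 } scope;

/* Classify a -t or -c argument: ALL if its object list is just "*". */
static scope scope_of(const char *arg)
{
    while (*arg == ' ')
        arg++;
    if (*arg != '*')
        return SELECTED;
    arg++;
    while (*arg == ' ')
        arg++;
    return (*arg == ':') ? ALL : SELECTED;
}

/* Return the matrix case number (1 to 4) for a pair of arguments.
   Rows of the matrix are the chunking state, columns the compression state. */
int matrix_case(const char *comp_arg, const char *chunk_arg)
{
    return 1 + (int)scope_of(comp_arg) + 2 * (int)scope_of(chunk_arg);
}
```

For example, `matrix_case("*:RLE", "dset4:2x2")` gives 2 (compress all, chunk selected), and `matrix_case("dset_comp:RLE", "*:2x2")` gives 3 (compress selected, chunk all).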
For each one of the behavior states, there are several possible options:
Option | Behavior state | Possible values |
Compress | ALL | NONE, GZIP, RLE, HUFF |
Compress | SELECTED | NONE, GZIP, RLE, HUFF |
Chunk | ALL | NONE, chunk |
Chunk | SELECTED | NONE, chunk |
Some example use cases are presented. The input file has the following objects
    g1
    g1/g2
    g1/g2/g3
    images
    dset4
    chunk dset_chunk
    chunk GZIP dset_chunk_comp
    GZIP dset_comp
This case compresses selected objects and chunks selected objects. The command line input and program output are
-v -i hrepacktst.hdf -o hrepacktst_out.hdf -t "dset4:HUFF 1" -c "dset4:2x2"
Objects to chunk are...
    dset4 2 2
Objects to compress are...
    dset4 HUFF compression, parameter 1
Building list of objects in hrepacktst.hdf...
    g1
    g1/g2
    g1/g2/g3
    images
    dset4
    chunk dset_chunk
    chunk GZIP dset_chunk_comp
    GZIP dset_comp
Searching for objects to modify...
    dset4 ...Found
Making new file hrepacktst_out.hdf...
    g1
    g1/g2
    g1/g2/g3
    images
    chunk HUFF dset4
    chunk dset_chunk
    chunk GZIP dset_chunk_comp
    GZIP dset_comp
To uncompress and unchunk a previously compressed and chunked object we use
-v -i hrepacktst.hdf -o hrepacktst_out.hdf -t "dset_chunk_comp:NONE" -c "dset_chunk_comp:NONE"
Objects to chunk are...
    dset_chunk_comp NONE
Objects to compress are...
    dset_chunk_comp NONE compression, parameter 0
Building list of objects in hrepacktst.hdf...
    g1
    g1/g2
    g1/g2/g3
    images
    dset4
    chunk dset_chunk
    chunk GZIP dset_chunk_comp
    GZIP dset_comp
Searching for objects to modify...
    dset_chunk_comp...Found
Making new file hrepacktst_out.hdf...
    g1
    g1/g2
    g1/g2/g3
    images
    dset4
    chunk dset_chunk
    dset_chunk_comp
    GZIP dset_comp
This case compresses all objects and chunks selected objects. The command line input and program output are
-v -i hrepacktst.hdf -o hrepacktst_out.hdf -t "*:RLE" -c "dset4:2x2"
Objects to chunk are...
    dset4 2 2
Objects to compress are...
    Compress all
Building list of objects in hrepacktst.hdf...
    g1
    g1/g2
    g1/g2/g3
    images
    dset4
    chunk dset_chunk
    chunk GZIP dset_chunk_comp
    GZIP dset_comp
Searching for objects to modify...
    dset4 ...Found
Making new file hrepacktst_out.hdf...
    g1
    g1/g2
    g1/g2/g3
    images
    chunk RLE dset4
    chunk RLE dset_chunk
    chunk RLE dset_chunk_comp
    RLE dset_comp
This case compresses selected objects and chunks all objects. The command line input and program output are
-v -i hrepacktst.hdf -o hrepacktst_out.hdf -t "dset_comp:RLE" -c "*:2x2"
Objects to chunk are...
    Chunk all
Objects to compress are...
    dset_comp RLE compression, parameter 0
Building list of objects in hrepacktst.hdf...
    g1
    g1/g2
    g1/g2/g3
    images
    dset4
    chunk dset_chunk
    chunk GZIP dset_chunk_comp
    GZIP dset_comp
Searching for objects to modify...
    dset_comp...Found
Making new file hrepacktst_out.hdf...
    g1
    g1/g2
    g1/g2/g3
    images
    chunk dset4
    chunk dset_chunk
    chunk GZIP dset_chunk_comp
    chunk RLE dset_comp
This case compresses all objects and chunks all objects. The command line input and program output are
-v -i hrepacktst.hdf -o hrepacktst_out.hdf -t "*:HUFF 1" -c "*:2x2"
Objects to chunk are...
    Chunk all
Objects to compress are...
    Compress all
Building list of objects in hrepacktst.hdf...
    g1
    g1/g2
    g1/g2/g3
    images
    dset4
    chunk dset_chunk
    chunk GZIP dset_chunk_comp
    GZIP dset_comp
Searching for objects to modify...
Making new file hrepacktst_out.hdf...
    g1
    g1/g2
    g1/g2/g3
    images
    chunk HUFF dset4
    chunk HUFF dset_chunk
    chunk HUFF dset_chunk_comp
    chunk HUFF dset_comp
To uncompress and unchunk all objects we use
-v -i hrepacktst.hdf -o hrepacktst_out.hdf -t "*:NONE" -c "*:NONE"
Objects to chunk are...
    Chunk all
Objects to compress are...
    Compress all
Building list of objects in hrepacktst.hdf...
    g1
    g1/g2
    g1/g2/g3
    images
    dset4
    chunk dset_chunk
    chunk GZIP dset_chunk_comp
    GZIP dset_comp
Searching for objects to modify...
Making new file hrepacktst_out.hdf...
    g1
    g1/g2
    g1/g2/g3
    images
    dset4
    dset_chunk
    dset_chunk_comp
    dset_comp
Each of the steps 1 to 4 outlined in Output has error checking functionalities.
The error messages of this step concern invalid input, for example a nonexistent compression type, an invalid compression parameter, or invalid chunking information. The following table summarizes the Step 1 error messages regarding invalid input.
Reason of error | Example input (other parameters omitted) | Example output |
Invalid compression input. The compression type was not defined. | -t "dset4" | Input Error: Invalid compression input in <dset4> |
Invalid compression type. | -t "dset4:MYCOMP 6" | Input Error: Invalid compression type in <dset4:MYCOMP 6> |
Invalid compression parameter. This case refers to a non-digit compression parameter. | -t "dset4:GZIP aa" | Input Error: Compression parameter not digit in <dset4:GZIP aa> |
Invalid compression parameter. This case refers to a numeric compression parameter that is not valid for the compression type (GZIP accepts values between 0 and 9). | -t "dset4:GZIP 10" | Input Error: Invalid compression parameter in <dset4:GZIP 10> |
Missing compression parameter. This case refers to a required compression parameter that was not given. | -t "dset4:HUFF" | Input Error: Missing compression parameter in <dset4:HUFF> |
Extra compression parameter. This case refers to a parameter given for a type that takes none (RLE has no parameter). | -t "dset4:RLE 8" | Input Error: Extra compression parameter in RLE <dset4:RLE 8> |
Invalid chunking input. | -c "dset4" | Input Error: Invalid chunking input in <dset4> |
Invalid chunking input. | -c "dset4:AxB" | Input Error: Invalid chunking in <dset4:AxB> |
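The compression rows of the table can be read as a sequence of checks on the -t string. The sketch below reproduces that order of checks. It is not the hrepack implementation. The enum and function names are hypothetical, and the policy that GZIP and JPEG parameters are optional while only HUFF requires one is an assumption drawn from the usage text and the table, which shows a missing-parameter error only for HUFF.

```c
#include <ctype.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical status codes, one per row of the table. */
typedef enum {
    COMP_OK,
    ERR_INPUT,       /* no ":" or no type given          */
    ERR_TYPE,        /* unknown compression type         */
    ERR_NOT_DIGIT,   /* parameter is not a number        */
    ERR_PARAM,       /* number out of range for the type */
    ERR_MISSING,     /* required parameter absent        */
    ERR_EXTRA        /* parameter given where none fits  */
} comp_status;

comp_status check_comp_input(const char *arg)
{
    const char *colon = strchr(arg, ':');
    if (colon == NULL || colon == arg)
        return ERR_INPUT;

    char type[16] = "", param[16] = "";
    int n = sscanf(colon + 1, " %15s %15s", type, param);
    if (n < 1)
        return ERR_INPUT;

    /* Assumed parameter policy: HUFF requires one, GZIP and JPEG accept one,
       RLE, SZIP and NONE take none. */
    int required = strcmp(type, "HUFF") == 0;
    int optional = strcmp(type, "GZIP") == 0 || strcmp(type, "JPEG") == 0;
    int none     = strcmp(type, "RLE") == 0 || strcmp(type, "SZIP") == 0 ||
                   strcmp(type, "NONE") == 0;
    if (!required && !optional && !none)
        return ERR_TYPE;

    if (n == 1)
        return required ? ERR_MISSING : COMP_OK;
    if (none)
        return ERR_EXTRA;

    for (const char *p = param; *p != '\0'; p++)
        if (!isdigit((unsigned char)*p))
            return ERR_NOT_DIGIT;

    /* GZIP levels run from 0 to 9. */
    if (strcmp(type, "GZIP") == 0 && strtol(param, NULL, 10) > 9)
        return ERR_PARAM;

    return COMP_OK;
}
```

Each example input from the table maps to one status code, for instance "dset4:GZIP 10" to `ERR_PARAM` and "dset4:RLE 8" to `ERR_EXTRA`.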
After all the compression and chunking input information is gathered, an additional test is made for repeated names and for mixing of the wildcard '*' with other names.
Reason of error | Example input (other parameters omitted) | Example output |
Repeated names. | -t "dset4:RLE" -t "dset4:HUFF 1" | Input Error: compression information already inserted for <dset4> |
Repeated names. | -c "dset4:10x10" -c "dset4:5x5" | Input Error: chunk information already inserted for <dset4> |
Mixing of wildcard with names. | -c "*:10x10" -c "dset4:5x5" | Error: Invalid chunking input: '*' is present with other objects <dset4:5x5> |
Mixing of wildcard with names. | -t "*:RLE" -t "dset4:RLE" | Error: Invalid compression input: '*' is present with other objects <dset4:RLE> |
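The two checks in this table operate on the list of object names collected from all -t options (or, separately, from all -c options). A minimal sketch follows, with hypothetical names (`list_status`, `check_names`) that do not come from the hrepack source.

```c
#include <string.h>

typedef enum { LIST_OK, ERR_REPEATED, ERR_WILDCARD_MIX } list_status;

/* names[] holds every object name gathered from one kind of option. */
list_status check_names(const char *const names[], int n)
{
    /* The wildcard must be the only entry if it is present. */
    for (int i = 0; i < n; i++)
        if (strcmp(names[i], "*") == 0 && n > 1)
            return ERR_WILDCARD_MIX;

    /* No name may appear twice. */
    for (int i = 0; i < n; i++)
        for (int j = i + 1; j < n; j++)
            if (strcmp(names[i], names[j]) == 0)
                return ERR_REPEATED;

    return LIST_OK;
}
```

The quadratic comparison is adequate here because the list comes from the command line and is short.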
After the checking for valid input a first traversal of the file is made with the purpose of examining if the input names are present in the file. The main loop used to locate all the objects in the file is explained in the Reading the HDF file section.
Reason of error | Example input | Example output |
The object is not in the file | -i hrepacktst.hdf -o hrepacktst_out.hdf -t "dset:RLE" | Error: <dset> not found in file <hrepacktst.hdf>. Exiting... |
Checks if the object given in the input is compressible. Only datasets and images are compressible. | -i hrepacktst.hdf -o hrepacktst_out.hdf -t "vdata4:RLE" | Error: <vdata4> not compressible/chunk object in file <hrepacktst.hdf>. Exiting... |
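Once the list of objects in the file has been built, the two checks in this table reduce to a lookup. The sketch below assumes that a requested name is compared against the full path recorded for each object. That matching rule, the `file_object` type and the `match_object` function are assumptions made for this example.

```c
#include <string.h>

typedef enum { MATCH_OK, ERR_NOT_FOUND, ERR_NOT_COMPRESSIBLE } match_status;

/* Hypothetical entry in the list of objects found in the input file. */
typedef struct {
    const char *path;        /* absolute path, e.g. "g1/g2" or "dset4" */
    int compressible;        /* 1 for datasets and images, 0 otherwise  */
} file_object;

/* Look up one requested name in the list of objects found in the file. */
match_status match_object(const char *name, const file_object *objs, int n)
{
    for (int i = 0; i < n; i++) {
        if (strcmp(objs[i].path, name) == 0)
            return objs[i].compressible ? MATCH_OK : ERR_NOT_COMPRESSIBLE;
    }
    return ERR_NOT_FOUND;
}
```

With a list containing "dset4" as a dataset and "vdata4" as a vdata, the name "dset" is reported as not found and "vdata4" as not compressible, matching the two rows of the table.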
Testing of hrepack is done with an application program that uses the hdiff function. An HDF test file is generated, then hrepack is called with several use cases, then hdiff is called to verify the difference between the input and output files. This only verifies that the data in both files is equal or not. To test the compress/chunking of the new file, API functions are used. The test of bad input from the command line is done manually.
Each test is of the form
hrepack_addcomp("dset4:GZIP 6",&options);
hrepack_addchunk("dset4:2x2",&options);
hrepack(FILENAME,FILENAME_OUT,&options);
ret=hdiff(FILENAME,FILENAME_OUT,fspec);
The hrepack_addcomp function adds the compression option and the hrepack_addchunk function adds the chunking option. The return value of hdiff tells whether the object data of the two files differ. A sample output is
Testing copy all objects with no compression / no chunking    PASSED
Testing compress selected objects / no chunking               PASSED
Testing no compress / chunking selected                       PASSED
For debugging purposes the verbose output of both calls (hrepack and hdiff) can be turned on
#if defined (HREPACK_DEBUG)
 options.verbose=1;
 fspec.verbose =1;
#endif
and the output is
Testing compressing SELECTED, chunking SELECTED
-----------------------------------------------------
Objects to chunk are...
    dset4 2 2
Objects to compress are...
    dset4 HUFF compression, parameter 1
Building list of objects in hrepacktst.hdf...
    g1
    g1/g2
    g1/g2/g3
    dset4
    chunk dset_chunk
    chunk GZIP dset_chunk_comp
    GZIP dset_comp
Searching for objects to modify...
    dset4 ...Found
Making new file hrepacktst_out.hdf...
    g1
    g1/g2
    g1/g2/g3
    chunk HUFF dset4
    chunk dset_chunk
    chunk GZIP dset_chunk_comp
    GZIP dset_comp
---------------------------------------
Tag     Ref     Name
---------------------------------------
1965    2       g1
1965    4       g1/g2
1965    5       g1/g2/g3
720     49      dset4
720     54      dset_chunk
720     47      dset_chunk_comp
720     78      dset_comp
---------------------------------------
Tag     Ref     Name
---------------------------------------
1965    2       g1
1965    4       g1/g2
1965    5       g1/g2/g3
720     47      dset4
720     74      dset_chunk
720     77      dset_chunk_comp
720     98      dset_comp
---------------------------------------
file1   file2
---------------------------------------
x       x       g1
x       x       g1/g2
x       x       g1/g2/g3
x       x       dset4
x       x       dset_chunk
x       x       dset_chunk_comp
x       x       dset_comp
Comparing <dset4>
Comparing <dset_chunk>
Comparing <dset_chunk_comp>
Comparing <dset_comp>
Last modified: September 18, 2003 Describes hrepack |