SZIP Support: Proposals for Handling "Read-Only" Libraries

Robert E. McGrath, Quincey Koziol, Elena Pourmal

April, 2004

1. Overview

The HDF libraries are required to include SZIP compression as a standard filter. The SZIP library has some restrictions on its use for commercial purposes. Specifically, the decoder is free for all to use, but the encoder may be used only for non-commercial purposes.

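The SZIP library has been modified so that it can be compiled in two versions:

1. Full: both the encoder and the decoder are enabled.
2. Decode-only: only the decoder is enabled.

The former may require a license for commercial use. The latter is free for all use.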

Because of the legal issues, it is very important that this change be deployed as soon as possible. We must be able to provide users with the ability to use HDF under either license condition. At present, many users are deconfiguring SZIP entirely because they have no way to use the decode-only option.

The overall approach will be to have one version of the HDF libraries, which can be linked to either version of SZIP, depending on the user's preference and rights. We will distribute two versions of the SZIP binaries, full and decode-only; the user may download and use either.

To realize this goal, the HDF libraries must be modified to behave reasonably when the SZIP encoder is not available. For example, in this configuration a dataset previously compressed with SZIP can be read, but datasets cannot be created with SZIP, nor can data be written compressed with SZIP.

In addition to the changes to the libraries, various tools will need to be modified to provide meaningful feedback to the user, e.g., "this dataset cannot be modified because you do not have the SZIP license".

This document proposes required changes to the HDF libraries.

2. Challenges for the HDF Libraries

The SZIP library presents a new and unprecedented case for HDF: it is a filter that may be configured to be "one-way." In the current libraries, a filter is either present or absent.  If present, it is always applied (although it may be silently skipped in some cases).

The SZIP library now has three configurations: absent, present read/write, and present read-only. The fundamental goal for the changes to the library is to handle the third case in a reasonable way, and in a way that the calling program can understand.

In the future, there may be other filters with similar 'read-only' configurations, so the solutions should be applicable to any filter.

3. Required Changes

3.1 Format Changes

No changes to either the HDF4 or the HDF5 file format are required.

3.2 Filter Operations


A new error must be defined: "filter present, but writes not allowed". For example, if an H5Dwrite call fails because SZIP is required but encoding is disabled, the error should state that reason.

In HDF5, the registration protocol must be revised from:
typedef struct H5Z_class_t {
    H5Z_filter_t id;		/* Filter ID number			     */
    const char	*name;		/* Comment for debugging		     */
    H5Z_can_apply_func_t can_apply; /* The "can apply" callback for a filter */
    H5Z_set_local_func_t set_local; /* The "set local" callback for a filter */
    H5Z_func_t filter;		/* The actual filter function		     */
} H5Z_class_t;

to:
typedef struct H5Z_class_t {
    unsigned version;           /* Version # of structure                    */
    H5Z_filter_t id;		/* Filter ID number			     */
    unsigned encoder_present;	/* Flag to indicate the filter has an encoder */
    unsigned decoder_present;	/* Flag to indicate the filter has a decoder */
    const char	*name;		/* Comment for debugging		     */
    H5Z_can_apply_func_t can_apply; /* The "can apply" callback for a filter */
    H5Z_set_local_func_t set_local; /* The "set local" callback for a filter */
    H5Z_func_t filter;		/* The actual filter function		     */
} H5Z_class_t;

The new "encoder_present" and "decoder_present" flags should be set by the application registering the filter in order to indicate that the filter has an encoder and decoder, respectively. The "version" field should be set to the constant "H5Z_CLASS_T_VERS" by the application and will be used by the library to determine the correct format for interpreting the H5Z_class_t structure passed by the application. Including the version information allows an application to be re-linked with a later version of the HDF5 library without concern that the H5Z_class_t structure will be mis-interpreted.
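To illustrate how the "version" field protects against mis-interpretation, here is a minimal, self-contained sketch of a filter registry. All names in it (`filter_class_t`, `FILTER_CLASS_VERS`, `filter_register`, `filter_find`) are hypothetical stand-ins for `H5Z_class_t`, `H5Z_CLASS_T_VERS` and `H5Zregister`, not the HDF5 API, and the filter function signature is simplified.

```c
#include <stddef.h>

/* Hypothetical stand-ins for the HDF5 types; these are NOT the real
 * H5Zpublic.h definitions, only enough to show the versioned struct. */
typedef int filter_id_t;
typedef size_t (*filter_func_t)(unsigned flags, size_t nbytes, void **buf);

#define FILTER_CLASS_VERS 1   /* plays the role of H5Z_CLASS_T_VERS */
#define MAX_FILTERS 16

typedef struct {
    unsigned version;          /* set by the caller to FILTER_CLASS_VERS */
    filter_id_t id;            /* filter ID number */
    unsigned encoder_present;  /* nonzero if this build can encode */
    unsigned decoder_present;  /* nonzero if this build can decode */
    const char *name;          /* comment for debugging */
    filter_func_t filter;      /* the filter function itself */
} filter_class_t;

static filter_class_t registry[MAX_FILTERS];
static size_t n_registered = 0;

/* Register a filter class. Returns 0 on success, -1 on failure.
 * A library that knew several layouts would switch on cls->version here;
 * this sketch understands only the current one and rejects anything else
 * rather than misreading the struct. */
int filter_register(const filter_class_t *cls)
{
    if (cls == NULL || cls->version != FILTER_CLASS_VERS)
        return -1;
    if (n_registered == MAX_FILTERS)
        return -1;
    registry[n_registered++] = *cls;
    return 0;
}

/* Look up a registered filter by ID; returns NULL if it is not registered. */
const filter_class_t *filter_find(filter_id_t id)
{
    for (size_t i = 0; i < n_registered; i++)
        if (registry[i].id == id)
            return &registry[i];
    return NULL;
}
```

With this scheme a decode-only SZIP build registers itself with `encoder_present = 0` and `decoder_present = 1`, e.g. `{ FILTER_CLASS_VERS, 4, 0, 1, "szip (decode only)", NULL }`, where 4 is the ID HDF5 uses for SZIP (`H5Z_FILTER_SZIP`).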

In HDF5, the semantics of the H5Z_FLAG_OPTIONAL flag must be refined. Currently, this flag is defined as follows:

If the filter fails [...] during an H5Dwrite operation then the filter is just excluded from the pipeline for the chunk for which it failed...This is commonly used for compression filters: if the filter result would be larger than the input, then the compression filter returns failure and the uncompressed data is stored in the file.
If this bit is not set (i.e., the filter is required), the operation will fail.

When SZIP encoding is enabled, the flag should work as described above. However, when encoding is disabled, all reads should succeed, but all writes should fail, even if the filter is marked optional (rather than silently writing the data uncompressed).

Note that, while this behavior is new, it does not contradict the current documentation, nor change the behavior of existing code or files. Therefore, this is considered a "refinement" to the current library, which applies to a new case.
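The refined rule can be summarized as a per-chunk decision. The following is a sketch under stated assumptions: the enum names are hypothetical stand-ins for HDF5 error codes, and `ERR_ENCODING_DISABLED` represents the new "filter present, but writes not allowed" error.

```c
/* Hypothetical outcomes of pushing one chunk through one filter on write. */
typedef enum {
    WRITE_COMPRESSED,       /* encoder ran and its output is stored */
    WRITE_UNCOMPRESSED,     /* optional filter failed; chunk stored as-is */
    ERR_FILTER_FAILED,      /* required filter failed; the write fails */
    ERR_ENCODING_DISABLED   /* "filter present, but writes not allowed" */
} write_status_t;

/* optional        - nonzero if H5Z_FLAG_OPTIONAL is set for this filter
 * encoder_present - nonzero if the linked filter library can encode
 * encode_ok       - nonzero if the encoder succeeded (ignored when there
 *                   is no encoder to run) */
write_status_t write_chunk_outcome(int optional, int encoder_present, int encode_ok)
{
    /* Refinement: with no encoder the write fails even for an optional
     * filter, instead of silently storing the data uncompressed. */
    if (!encoder_present)
        return ERR_ENCODING_DISABLED;

    /* Existing behavior, unchanged, when the encoder is available. */
    if (encode_ok)
        return WRITE_COMPRESSED;
    return optional ? WRITE_UNCOMPRESSED : ERR_FILTER_FAILED;
}
```

The key design point is the order of the checks: the encoder test comes before the optional-flag test, so a read-only filter can never fall through to the uncompressed path.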

In HDF4, the semantics of filters do not change. If encoding is disabled, the write will fail.
(Details are TBD.)

4. User Visible Changes (HDF5)

There are several user-visible cases in which the HDF5 library should recognize the read-only configuration.

4.1. Create Dataset with SZIP


When SZIP is configured read-only, a request to create a dataset with SZIP encoding should fail. There are three ways this may happen in HDF5.

1. Call H5Pset_szip to add SZIP to a Dataset Creation Property List

The library should detect that SZIP encoding is not enabled, and return a new failure code that indicates "encoding is disabled".

2. Copy the dataset creation properties from another dataset and try to create a new dataset

In this scenario, a dataset in a file was created with another version of the library using SZIP. The program calls H5Dget_create_plist to retrieve the dataset creation properties, and then tries to create a new dataset, calling H5Dcreate with those properties.

In this case, the library must detect that SZIP encoding is not enabled; H5Dcreate should fail and return "encoding is disabled".

3. Extend a dataset that is compressed with SZIP

In this scenario, a dataset in a file was created with another version of the library using SZIP. The dataset is extendible, has a fill value defined, and has a fill policy that requires writing the fill values when space is allocated.

This file is opened with SZIP encoding disabled, and H5Dextend is called to extend the dataset.

In this case, H5Dextend should fail and return "encoding is disabled", since the fill values for the new space would have to be written compressed with SZIP.
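All three cases reduce to the same question: does creating or extending the dataset require running an encoder that is not present? A minimal sketch, assuming a simplified pipeline model; `pipeline_filter_t`, `pipeline_can_encode`, `extend_check` and the return codes are hypothetical, not HDF5 functions. The extend rule encodes our reading of case 3: the encoder is needed only when fill values must be written as space is allocated.

```c
#include <stddef.h>

/* Hypothetical model of one entry in a dataset creation filter pipeline. */
typedef struct {
    int id;               /* filter ID, e.g. 4 for SZIP in HDF5 */
    int encoder_present;  /* nonzero if this build can encode with it */
} pipeline_filter_t;

enum { CHECK_OK = 0, CHECK_ENCODING_DISABLED = -2 };

/* Cases 1 and 2 (H5Pset_szip, H5Dcreate with a copied property list):
 * creation is refused if any filter in the pipeline cannot encode. */
int pipeline_can_encode(const pipeline_filter_t *filters, size_t n)
{
    for (size_t i = 0; i < n; i++)
        if (!filters[i].encoder_present)
            return CHECK_ENCODING_DISABLED;
    return CHECK_OK;
}

/* Case 3 (H5Dextend): extending only needs the encoder when the new space
 * must be filled immediately, i.e. a fill value is defined and the fill
 * policy writes it when space is allocated. */
int extend_check(const pipeline_filter_t *filters, size_t n,
                 int fill_defined, int fill_on_alloc)
{
    if (fill_defined && fill_on_alloc)
        return pipeline_can_encode(filters, n);
    return CHECK_OK;
}
```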


4.2. Write Data to an SZIP Compressed Dataset

Data compressed with SZIP by one program may later be read by another program that has the encoder disabled. In this case, reading the data succeeds as expected, but data written back cannot be re-compressed, i.e., the attempt to compress will fail.

In this case, the library must take one of two actions:
  1. Fail the write, or
  2. Write the data without compression.
The proposed default is to 'fail', i.e., return an error from the write operation. See the discussion of the H5Z_FLAG_OPTIONAL flag, above. The error should be "encoding is disabled".

We could support the second behavior with a new transfer property to override the default. This is discussed in Section 6 below.

4.3. Discover Whether Encoding is Enabled

The HDF libraries have functions to discover the settings for compression and other filters. These facilities need to be enhanced so that the calling program can discover whether SZIP encoding is enabled.

While a program can discover that SZIP is disabled by attempting to create or write using SZIP, it is highly desirable to provide inquiry functions so a program can easily determine whether SZIP encoding is enabled.  This can be used by tools to behave gracefully when SZIP is read-only, e.g., to inform the user that this dataset cannot be compressed with this version of the library.  

1. Filter availability

The availability of filters is a feature of the library (how it was linked), so there should be a new API call to test any filter.

We propose a new API function, for example:

Name: H5Zget_filter_info
Signature:
    herr_t H5Zget_filter_info(H5Z_filter_t filter, unsigned int *filter_config_flags)
Purpose:
    Determines whether a filter is available and, if so, which features are enabled.
Description:
    H5Zget_filter_info determines whether the filter specified in filter is available to the application. If so, the enabled features are returned as a bit field in filter_config_flags. The feature flags are:

        H5Z_FILTER_CONFIG_ENCODE_ENABLED     - encoding is enabled
        H5Z_FILTER_CONFIG_DECODE_ENABLED     - decoding is enabled
Parameters:
    H5Z_filter_t filter
        IN: Filter identifier.
    unsigned int *filter_config_flags
        OUT: Bit mask of filter features.
Returns:
    Returns a non-negative value if successful; otherwise returns a negative value. A negative value is returned if the filter queried is not registered with the library.
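A caller would typically test the returned bit mask before attempting to create or write SZIP data. The sketch below assumes the flag values (0x0001 and 0x0002) and replaces the library call with a hypothetical stand-in, `query_filter_info`, that simulates a decode-only build with only SZIP (ID 4) registered; `filter_can_encode` and `filter_can_decode` are likewise illustrative helpers.

```c
/* Flag names as proposed above; the numeric values are assumed here. */
#define H5Z_FILTER_CONFIG_ENCODE_ENABLED 0x0001u
#define H5Z_FILTER_CONFIG_DECODE_ENABLED 0x0002u

/* Hypothetical stand-in for H5Zget_filter_info: this pretend build has only
 * SZIP (ID 4) registered, and only its decoder. Returns 0 on success and -1
 * for a filter that is not registered, matching the proposed return rule. */
static int query_filter_info(int filter, unsigned *config_flags)
{
    if (filter != 4)
        return -1;
    *config_flags = H5Z_FILTER_CONFIG_DECODE_ENABLED;
    return 0;
}

/* Nonzero if data can be written with this filter. An unregistered filter
 * is treated as "cannot encode" rather than as an error. */
int filter_can_encode(int filter)
{
    unsigned flags = 0;
    if (query_filter_info(filter, &flags) < 0)
        return 0;
    return (flags & H5Z_FILTER_CONFIG_ENCODE_ENABLED) != 0;
}

/* Nonzero if data compressed with this filter can be read. */
int filter_can_decode(int filter)
{
    unsigned flags = 0;
    if (query_filter_info(filter, &flags) < 0)
        return 0;
    return (flags & H5Z_FILTER_CONFIG_DECODE_ENABLED) != 0;
}
```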


2. Filter properties

Currently, there are several functions that retrieve the settings for a filter, e.g., the parameters to the compression algorithm, from a dataset creation property list. It is desirable that these inquiry functions (H5Pget_filter, H5Pget_filter_by_id, and so on) be extended to report whether writing is enabled.

The proposed extension is to add another returned value that reports the availability of the filter (READ, WRITE, NONE, BOTH).

For example:

herr_t H5Pget_filter_by_id( hid_t plist_id, H5Z_filter_t filter, unsigned int *flags, size_t *cd_nelmts, unsigned int cd_values[], size_t namelen, char name[] )

would be extended to have a new OUT parameter that reports how this filter is configured:

herr_t H5Pget_filter_by_id( hid_t plist_id, H5Z_filter_t filter, unsigned int *flags, size_t *cd_nelmts, unsigned int cd_values[], size_t namelen, char name[], unsigned int *filter_config )

unsigned int *filter_config
OUT: Bit vector specifying certain general properties of the filter.
H5Z_FILTER_CONFIG_ENCODE_ENABLED     - encoding is enabled
H5Z_FILTER_CONFIG_DECODE_ENABLED     - decoding is enabled
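The filter_config bit vector maps directly onto the four availability states listed above. A small sketch, assuming flag values 0x0001 and 0x0002 (the numeric values are illustrative); `filter_availability` is a hypothetical helper, not part of the proposed API.

```c
/* Bit meanings as proposed; numeric values are assumed for this sketch. */
#define H5Z_FILTER_CONFIG_ENCODE_ENABLED 0x0001u
#define H5Z_FILTER_CONFIG_DECODE_ENABLED 0x0002u

/* Map a filter_config bit vector onto the four availability states
 * (READ, WRITE, NONE, BOTH) described in this section. */
const char *filter_availability(unsigned filter_config)
{
    int can_write = (filter_config & H5Z_FILTER_CONFIG_ENCODE_ENABLED) != 0;
    int can_read  = (filter_config & H5Z_FILTER_CONFIG_DECODE_ENABLED) != 0;

    if (can_write && can_read)
        return "BOTH";
    if (can_write)
        return "WRITE";
    if (can_read)
        return "READ";
    return "NONE";
}
```

A decode-only SZIP build would report READ for an SZIP-compressed dataset, which is exactly the case in which a tool should refuse to rewrite it.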

5. User Visible Changes (HDF4)

There are several user-visible cases in which the HDF4 library should recognize the read-only configuration.

5.1. Create Dataset with SZIP


When SZIP is configured read-only, a request to create an object with SZIP encoding should fail.

An SDS (or GR image) is created with SDcreate (GRcreate), then compression is requested with SDsetcompress (GRsetcompress).

In this case, SDsetcompress (GRsetcompress) should fail. The dataset itself can still be created, but it will not be compressed.

The failure code is TBD.


5.2. Write Data to an SZIP Compressed Dataset

In this scenario, an SDS (or GR image) is created with one version of the library and compressed with SZIP. The file is then opened with a different version of the library, with SZIP encoding disabled, and the program writes data to the SDS (GR image) with SDwrite or SDwritechunk (GRwrite or GRwritechunk).

In this case, the write should fail and return "encoding is disabled".


5.3. Discover Whether Encoding is Enabled

As discussed above, there needs to be a method to discover whether SZIP encoding is enabled.  This can be used by tools to behave gracefully when SZIP is read-only, e.g., to inform the user that this dataset cannot be compressed with this version of the library.  

This information can be added as a new value to the comp_info_t union, which is returned by SDgetcompress (GRgetcompress).  

struct
{
    int32 bits_per_pixel;
    int32 compression_mode;
    int32 options_mask;
    int32 pixels;
    int32 pixels_per_block;
    int32 pixels_per_scanline;
    int32 config_flags;
} szip; /* for szip encoding */

/* values for config_flags */

#define CSZIP_ENCODER_ENABLED 1
#define CSZIP_DECODER_ENABLED 2
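A tool could use the new config_flags field to decide whether an SDS or GR image can be rewritten. This sketch is hypothetical: `szip_info_t` copies the proposed szip member, `int32` is redefined locally in place of HDF4's own typedef, and `szip_access_problem` is an illustrative helper.

```c
#include <stddef.h>
#include <stdint.h>

typedef int32_t int32;   /* stand-in for HDF4's own int32 typedef */

/* Flag values as proposed above. */
#define CSZIP_ENCODER_ENABLED 1
#define CSZIP_DECODER_ENABLED 2

/* Hypothetical copy of the proposed szip member of the compression-info
 * union, including the new config_flags field. */
typedef struct {
    int32 bits_per_pixel;
    int32 compression_mode;
    int32 options_mask;
    int32 pixels;
    int32 pixels_per_block;
    int32 pixels_per_scanline;
    int32 config_flags;
} szip_info_t;

/* Tool-side check on the value returned by SDgetcompress (GRgetcompress).
 * Returns a message to show the user, or NULL if the object can be both
 * read and rewritten with SZIP. */
const char *szip_access_problem(const szip_info_t *info)
{
    if (!(info->config_flags & CSZIP_DECODER_ENABLED))
        return "SZIP decoding is not available: this dataset cannot be read";
    if (!(info->config_flags & CSZIP_ENCODER_ENABLED))
        return "SZIP encoding is disabled: this dataset cannot be modified";
    return NULL;
}
```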


6.  Optional library features that might be done in the future

For the HDF5 library, we might add a new data transfer property to override the failure on write when encoding is disabled. I.e., when requested, the library could write uncompressed chunks into the dataset.  This feature should not be done now, but could be added in the future, if needed.

The HDF4 API could be extended to add an inquiry to determine whether the compression method is available, e.g.,
HCcomp_available(comp_coder_t)
This should not be done now.

7. Changes to Tools

Once the library changes are available, several standard utilities and tools should be modified to inform the user clearly when SZIP encoding is disabled. Essentially, any tool that may create or write data using SZIP needs to check whether SZIP encoding is available and give a reasonable result or message when SZIP is read-only.

These tools include hdfview (Java), h5repack, h4toh5, h5toh4, etc.
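The check such a tool needs is small: query the encoder once, then either proceed, fall back, or stop with a clear message. The sketch below is hypothetical; `choose_szip_output` and the `allow_fallback` option are illustrations, not features of h5repack or any other existing tool. Its default is to stop, matching the library's proposed default of failing the write.

```c
#include <stdio.h>

/* Hypothetical outcomes for a repack-style tool asked to write SZIP output. */
typedef enum { OUT_SZIP, OUT_UNCOMPRESSED, OUT_ABORT } out_choice_t;

/* encoder_enabled: result of an inquiry such as H5Zget_filter_info
 *                  (or the HDF4 config_flags), done once at start-up.
 * allow_fallback:  a hypothetical user option, e.g. a command-line switch,
 *                  permitting uncompressed output instead of failure.
 * log:             where to report the decision to the user. */
out_choice_t choose_szip_output(int encoder_enabled, int allow_fallback, FILE *log)
{
    if (encoder_enabled)
        return OUT_SZIP;
    if (allow_fallback) {
        fprintf(log, "warning: SZIP encoding is disabled in this build; "
                     "writing the data uncompressed\n");
        return OUT_UNCOMPRESSED;
    }
    fprintf(log, "error: SZIP encoding is disabled in this build; "
                 "cannot write SZIP-compressed output\n");
    return OUT_ABORT;
}
```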

Users will need to make similar changes to their code, if needed. We need to inform HDF-EOS and IDL, for example, so that they can determine how they wish to deal with this.

Note that work on tools and applications cannot begin until the new API functions are added to the libraries.

8. Documentation and Examples

It will be important to clearly document this behavior and provide examples for how to detect and handle the case when SZIP encoding is not available.

9. Summary of Changes


These changes need to be implemented as soon as possible. For HDF4, they can be implemented either this year in HDF4.2r1 or in 2005. For HDF5, we can choose to implement some or all of the features in HDF5 1.6.x or HDF5 1.8.x; the latter will not be available to users until 2005.

One factor to consider is that tools and applications cannot implement the needed changes until the inquiry functions are available.  Deferring the inquiry functions until 2005 will mean that tools and user applications will not implement the needed changes until 2005+.

The two tables below list the changes, with suggested target releases. The comment column indicates the nature of the change (Extension, Refinement, New API, or Change API) and the section in which it is discussed above.

Changes to HDF5

Feature                                              Comment                                                   Implement in
New error message                                    Extension (Section 3.2)                                   1.6.3
Register new function                                Change to H5Z_class_t for H5Zregister call (Section 3.2)  1.8.0
Refine semantics of H5Z_FLAG_OPTIONAL                Refinement (Section 3.2)                                  1.6.3
H5Pset_szip, fail if SZIP encoder disabled           Refinement (Section 4.1)                                  1.6.3
H5Dcreate, fail if SZIP encoder disabled             Refinement (Section 4.1)                                  1.6.3
H5Dextend, fail some cases if SZIP encoder disabled  Refinement (Section 4.1)                                  1.6.3
H5Dwrite, fail if SZIP encoder disabled              Refinement (Section 4.2)                                  1.6.3
H5Zget_filter_info                                   New API (Section 4.3)                                     1.6.3
H5Pget_filter_by_id (etc.)                           Change API (Section 4.3)                                  1.8.0
Documentation, examples                              When implemented                                          1.6.3, etc.
Tool support                                         TBD, requires inquiry functions                           1.6.3+
User applications                                    Requires inquiry functions                                1.6.3+


Changes to HDF4

Feature                                        Comment                              Implement in
New error message                              Extension (Section 3.2)              4.2r1
Register new function                          (Probably no change?) (Section 3.2)  4.2r1
SDsetcompress, GRsetcompress:                  Refinement (Section 5.1)             4.2r1
  fail if SZIP encoder disabled
SDwrite, SDwritechunk, GRwrite, GRwritechunk:  Refinement (Section 5.2)             4.2r1
  fail if SZIP encoder disabled
Change comp_info_t                             Change API (Section 5.3)             4.2r1
Documentation, examples                        When implemented                     4.2r1
Tool support                                   TBD, requires inquiry functions      4.2r1+
User applications                              Requires inquiry functions           4.2r1+