SZIP Support--Requirements and Proposals (under construction)

March, 2004

Overview

The SZIP library has been modified so that it can be compiled in two versions: the full library, and a version with the encoder removed. The latter is free for all use; the former may require a license for commercial use.

When the encoder is disabled, an attempt to encode data returns an error.  The SZIP library includes a new function to discover whether the encoder is enabled or disabled.

The HDF libraries and tools should be modified to use this feature, and to behave reasonably for both configurations of SZIP. This document describes required changes to the HDF libraries and tools.

Goals

The overall goal is to be able to distribute HDF libraries and tools that work with SZIP, whether the encoder is enabled or disabled. We will distribute two versions of the SZIP library, with and without the encoder, each with its corresponding license.

Ideally, the HDF libraries should automatically detect the configuration of SZIP (i.e., whether SZIP encoding is allowed). This goal can be achieved with a few modifications to the existing SZIP filter and HDF5 library, and similar changes to HDF4.

In addition, certain tools (such as h5repack and HDFview) should be modified to inform the user gracefully when they attempt to write to a dataset that requires SZIP encoding and the encoder is not available.

Goals for the HDF Libraries

The SZIP library presents a new and unprecedented case for HDF: it is a filter that may be configured to be "one-way." In the current libraries, a filter is either present or absent.  If present, it is always applied (although it may be silently skipped in some cases).

The SZIP library now has three configurations: absent, present read/write, and present read-only. The fundamental goal for the changes to the library is to handle the third case in a reasonable way, and in a way that the calling program can understand.

In the future, there may be other filters with similar 'read-only' configurations, so the solutions should be applicable to any filter.

Required Changes

Internal Operations

Fundamentally, when the HDF library discovers the SZIP module (e.g., when HDF5 registers the SZIP filter), it should probe the SZIP library to discover whether encoding is enabled.  This probe is not difficult to implement, but a completely general implementation should consider a generic interface for all filters.
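
One way such a generic probe could look is sketched below in C. Every name in it (filter_class_t, can_encode, filter_register, filter_mode, the FILTER_* values) is hypothetical and is not part of HDF5, HDF4, or SZIP; the stub function merely stands in for the query that the SZIP library provides. The idea is that each filter may supply an optional callback, and the library records the result once, at registration time.

```c
#include <stddef.h>

/* Hypothetical three-state description of a filter (not an HDF type). */
typedef enum {
    FILTER_ABSENT = 0,   /* filter is not registered at all      */
    FILTER_READ_ONLY,    /* decoder present, encoder disabled    */
    FILTER_READ_WRITE    /* decoder and encoder both available   */
} filter_mode_t;

/* Hypothetical filter class. The optional callback reports whether the
 * underlying compression library was built with its encoder. */
typedef struct {
    int         id;
    const char *name;
    int       (*can_encode)(void);  /* NULL means "can always encode" */
} filter_class_t;

typedef struct {
    filter_class_t cls;
    filter_mode_t  mode;
} filter_entry_t;

#define MAX_FILTERS 8

static filter_entry_t filter_table[MAX_FILTERS];
static size_t         filter_count = 0;

/* Register a filter and probe it once. Returns 0 on success, -1 on error. */
int filter_register(const filter_class_t *cls)
{
    filter_entry_t *entry;

    if (cls == NULL || filter_count >= MAX_FILTERS)
        return -1;

    entry = &filter_table[filter_count++];
    entry->cls = *cls;
    if (cls->can_encode == NULL || cls->can_encode())
        entry->mode = FILTER_READ_WRITE;
    else
        entry->mode = FILTER_READ_ONLY;
    return 0;
}

/* Look up the recorded mode; unknown ids are reported as absent. */
filter_mode_t filter_mode(int id)
{
    size_t i;

    for (i = 0; i < filter_count; i++)
        if (filter_table[i].cls.id == id)
            return filter_table[i].mode;
    return FILTER_ABSENT;
}

/* Stand-in for a decode-only SZIP build: the encoder is not available. */
static int szip_stub_encoder_disabled(void)
{
    return 0;
}
```

A filter that never supplies the callback keeps today's behavior (present means read/write), so existing filters would not need to change under this sketch.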

<<not sure what changes may be needed in HDF5 or HDF4>>

A new error must be defined, i.e., "filter present, but writes not allowed".

User Visible Changes

There are three user visible cases where the HDF5 library should recognize the read-only case. (HDF4 has similar cases.)

1. Create Dataset with SZIP


When SZIP is configured read-only, a request to create a dataset with SZIP encoding should fail. In the case of HDF4, an attempt to enable SZIP encoding (SDsetcompress) should fail.

This is similar to the case where SZIP is not configured at all, although the error should be distinguishable from "no SZIP at all".
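
As an illustration of keeping the two failures distinguishable, the following C sketch maps the filter state to separate status codes. The names (filter_status_t, check_create, filter_status_message) and the numeric values are assumptions made for this example only; they are not existing HDF5 or HDF4 error codes.

```c
/* Hypothetical status codes; the values are illustrative only. */
typedef enum {
    FILTER_OK             =  0,
    FILTER_ERR_ABSENT     = -1,  /* no SZIP at all                     */
    FILTER_ERR_NO_ENCODER = -2   /* SZIP present, but decode-only      */
} filter_status_t;

/* Decide whether a "create dataset with this filter" request may proceed.
 * The absent case is checked first so that the two errors never overlap. */
filter_status_t check_create(int filter_present, int encoder_enabled)
{
    if (!filter_present)
        return FILTER_ERR_ABSENT;
    if (!encoder_enabled)
        return FILTER_ERR_NO_ENCODER;
    return FILTER_OK;
}

/* Text a tool could show to the user for each status. */
const char *filter_status_message(filter_status_t status)
{
    switch (status) {
    case FILTER_OK:
        return "filter available";
    case FILTER_ERR_ABSENT:
        return "filter not present";
    case FILTER_ERR_NO_ENCODER:
        return "filter present, but writes not allowed";
    }
    return "unknown filter status";
}
```

With separate codes, a caller can tell a user "install SZIP" in one case and "this SZIP build cannot encode" in the other.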


2. Write Data to an SZIP Compressed Dataset

It is possible for data to be created and compressed with SZIP by one program, and later read by another program in which the encoder is disabled. In this case, reading the data will succeed as expected, but data that is written back cannot be re-compressed, i.e., the attempt to compress will fail.

In this case, the library must take one of two actions:
  1. Fail the write, or
  2. Write without compression.
The suggested default is to fail, i.e., to return an error from the write operation.  An option to write the data through uncompressed could also be provided.
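
The two actions, with "fail" as the default, can be sketched as a small decision function in C. The names (no_encoder_policy_t, write_outcome_t, plan_write) are hypothetical; how such a policy would actually be set (for example, through a property) is left open here.

```c
/* Hypothetical policy for writes when the encoder is disabled.
 * The zero value is "fail", so an unset policy gets the suggested default. */
typedef enum {
    ON_NO_ENCODER_FAIL = 0,
    ON_NO_ENCODER_WRITE_RAW
} no_encoder_policy_t;

typedef enum {
    WRITE_FAILED       = -1,  /* error returned to the caller          */
    WRITE_COMPRESSED   =  0,  /* normal case: filter applied           */
    WRITE_UNCOMPRESSED =  1   /* explicit opt-in: data written as-is   */
} write_outcome_t;

/* Decide how a write to an SZIP-compressed dataset should proceed. */
write_outcome_t plan_write(int encoder_enabled, no_encoder_policy_t policy)
{
    if (encoder_enabled)
        return WRITE_COMPRESSED;
    if (policy == ON_NO_ENCODER_WRITE_RAW)
        return WRITE_UNCOMPRESSED;
    return WRITE_FAILED;
}
```

The point of the sketch is that an uncompressed write happens only when the caller has asked for it, and is reported as a distinct outcome rather than being silent.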

Note: the current, unmodified behavior of the HDF5 library is to silently skip the compression, i.e., to write the data without compression with no notification to the caller that SZIP is disabled.  This behavior does not meet the requirements.

<<I'm not sure what the behavior of HDF4 is>>

3. Discover Whether Encoding is Enabled

The HDF library has a function to discover the settings for compression and other filters. There needs to be a method to discover whether SZIP encoding is enabled.  This can be used by tools to behave gracefully when SZIP is read-only, e.g., to inform the user that this dataset cannot be compressed with this version of the library.  

Note: it would be possible to attempt to write to the dataset, and receive an error if encoding is disabled.  This is not considered to be a good solution.

This goal can be met by either extending an existing API (e.g., H5Pget_filter_id, or SDgetcompress), or by creating a new API call. Either approach is very simple.

Extending existing calls

One approach is to extend the current H5Pget_filter_id (and HDF4 SDgetcompress) to return one additional value, indicating whether the filter is enabled or not.

This approach has the advantage that the user makes a single call to find out "everything of interest" about the filter.

On the other hand, this changes the behavior of existing API calls (albeit in a way that is not likely to be a problem). It also mixes apples and oranges: permanent per-dataset settings are returned together with per-library configuration settings.

Note, too, that the stored properties (i.e., the compression parameters) never change, whereas the configuration of the library depends on which version of SZIP it is linked with.  This might cause confusion, e.g., in a test program.

New API calls

An alternative is to create new calls to query the status of the SZIP (or other) filter.

Advantages of this approach are that existing features are not changed, and that the API reflects the actual semantics of the software (i.e., the configuration of the filter is a library configuration issue, not a stored property of the dataset).

The main disadvantage is the added complexity and inconvenience of an extra API call in the user code.
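
To make the cost of the extra call concrete, here is a minimal C sketch of what a separate query could look like and how a tool might use it. The function filter_get_config, the helper tool_can_compress, the FILTER_CAN_* flags, and the filter ids in the table are all hypothetical; this is not a concrete proposal for the HDF5 or HDF4 API.

```c
#include <stddef.h>

/* Hypothetical capability flags describing the library configuration. */
#define FILTER_CAN_DECODE 0x1u
#define FILTER_CAN_ENCODE 0x2u

typedef struct {
    int      id;
    unsigned flags;
} filter_config_t;

/* Example configuration for the sketch: filter 1 is read/write and
 * filter 2 is decode-only. A real library would fill this in when the
 * filters are registered. */
static const filter_config_t configured_filters[] = {
    { 1, FILTER_CAN_DECODE | FILTER_CAN_ENCODE },
    { 2, FILTER_CAN_DECODE }
};

/* Query the configuration of a filter, independent of any dataset.
 * Returns 0 and sets *flags on success, -1 if the filter is absent. */
int filter_get_config(int filter_id, unsigned *flags)
{
    size_t i;
    size_t n = sizeof configured_filters / sizeof configured_filters[0];

    if (flags == NULL)
        return -1;
    for (i = 0; i < n; i++) {
        if (configured_filters[i].id == filter_id) {
            *flags = configured_filters[i].flags;
            return 0;
        }
    }
    return -1;
}

/* What a tool would call before offering to compress a dataset. */
int tool_can_compress(int filter_id)
{
    unsigned flags = 0;

    if (filter_get_config(filter_id, &flags) < 0)
        return 0;                       /* filter absent */
    return (flags & FILTER_CAN_ENCODE) != 0;
}
```

In this form the query takes no dataset or property list argument, which reflects the point above that encoder availability belongs to the library configuration rather than to the stored dataset.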



<<Please make suggestions or revisions to these arguments>>

<<TBD:  concrete proposal>>