SZIP Support--Requirements and Proposals (under construction)
March, 2004
Overview
The SZIP library has been modified so that it can be compiled in two versions:
the full library, and a version with the encoder removed. The latter is free for
all use; the former may require a license for commercial use.
When the encoder is disabled, an attempt to encode data returns an error. The SZIP
library includes a new function to discover whether the encoder is enabled or
disabled.
The HDF libraries and tools should be modified to use this feature, and to
behave reasonably for both configurations of SZIP. This document describes
required changes to the HDF libraries and tools.
Goals
The overall goal is to be able to distribute HDF libraries and tools that
work with SZIP, whether encoding is enabled or disabled. We will distribute two
versions of the SZIP library, with and without the encoder, each with its
corresponding license.
Ideally, the HDF libraries should automatically detect the configuration
of SZIP (i.e., whether SZIP encoding is allowed). This goal can be achieved
with a few modifications to the existing SZIP filter and HDF5 library, and
similar changes to HDF4.
In addition, certain tools (such as h5repack and HDFview) should be modified
to inform the user gracefully when they attempt to write to a dataset that
requires SZIP encoding and the encoder is not available.
Goals for the HDF Libraries
The SZIP library presents a new and unprecedented case for HDF: it is a filter
that may be configured to be "one-way." In the current libraries, a filter
is either present or absent. If present, it is always applied (although
it may be silently skipped in some cases).
The SZIP library now has three configurations: absent, present read/write,
and present read-only. The fundamental goal for the changes to the library
is to handle the third case in a reasonable way, and in a way that the calling
program can understand.
In the future, there may be other filters with similar 'read-only'
configurations, so the solutions should be applicable to any filter.
Required Changes
Internal Operations
Fundamentally, when the HDF library discovers the SZIP module (e.g., HDF5
registers the SZ filter), it should probe the SZIP library to discover if
encoding is enabled. This probe is not difficult to implement, but
a completely general implementation should consider a generic interface for
all filters.
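As an illustration of a probe at registration time, the following is a minimal sketch of a generic filter table that records one of the three configurations (absent, read/write, read-only) for any filter. All names here (filter_register, filter_mode, filter_mode_t, fake_szip_encoder_disabled) are hypothetical and are not part of HDF5, HDF4, or SZIP; the stand-in probe function takes the place of the SZIP library's own encoder query.

```c
#include <stddef.h>

/* Hypothetical three-state capability, applicable to any filter. */
typedef enum {
    FILTER_ABSENT = 0,   /* filter is not linked in at all */
    FILTER_READ_WRITE,   /* encoder and decoder are available */
    FILTER_READ_ONLY     /* decoder only: the "one-way" case */
} filter_mode_t;

/* Generic probe callback: returns nonzero if the filter can encode. */
typedef int (*filter_can_encode_fn)(void);

typedef struct {
    int                  id;          /* filter identifier */
    const char          *name;
    filter_can_encode_fn can_encode;  /* NULL means "can always encode" */
    filter_mode_t        mode;        /* filled in once, at registration */
} filter_entry_t;

#define MAX_FILTERS 8
static filter_entry_t filter_table[MAX_FILTERS];
static size_t         filter_count = 0;

/* Register a filter and probe its encoder once, at registration time. */
int filter_register(int id, const char *name, filter_can_encode_fn can_encode)
{
    filter_entry_t *e;

    if (filter_count >= MAX_FILTERS)
        return -1;
    e = &filter_table[filter_count++];
    e->id = id;
    e->name = name;
    e->can_encode = can_encode;
    e->mode = (can_encode == NULL || can_encode() != 0)
                  ? FILTER_READ_WRITE
                  : FILTER_READ_ONLY;
    return 0;
}

/* Look up the recorded mode; unregistered filters are reported as absent. */
filter_mode_t filter_mode(int id)
{
    size_t i;

    for (i = 0; i < filter_count; i++)
        if (filter_table[i].id == id)
            return filter_table[i].mode;
    return FILTER_ABSENT;
}

/* Stand-in for the SZIP library's encoder query (assumption: a real
 * build would call the function provided by the SZIP library). */
int fake_szip_encoder_disabled(void)
{
    return 0; /* encoder removed */
}
```

Because the probe is stored as a callback, other filters with a read-only configuration could plug into the same table without any SZIP-specific code in the core.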
<<not sure what changes may be needed in HDF5 or HDF4>>
A new error must be defined, i.e., "filter present, but writes not allowed".
User Visible Changes
There are three user visible cases where the HDF5 library should recognize the read-only case. (HDF4 has similar cases.)
1. Create Dataset with SZIP
When SZIP is configured read-only, a request to create a dataset with SZIP
encoding should fail. In the case of HDF4, an attempt to enable SZIP encoding (SDsetcompress) should fail.
This is similar to the case where SZIP is not configured
at all, although the error should be distinguishable from "no SZIP at all".
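The distinction between the two failures can be sketched as follows. This is only an illustration: the error codes and the function check_create_with_filter are hypothetical names chosen for this example, not existing HDF5 or HDF4 identifiers.

```c
#include <string.h>

/* Hypothetical filter configurations. */
typedef enum { FILTER_ABSENT, FILTER_READ_WRITE, FILTER_READ_ONLY } filter_mode_t;

/* Hypothetical error codes; the point is that the two failures differ. */
typedef enum {
    ERR_NONE             = 0,
    ERR_FILTER_MISSING   = -1, /* "no SZIP at all" */
    ERR_FILTER_NO_ENCODE = -2  /* "filter present, but writes not allowed" */
} filter_err_t;

/* Decide whether a dataset may be created with the given filter. */
filter_err_t check_create_with_filter(filter_mode_t mode)
{
    switch (mode) {
    case FILTER_READ_WRITE:
        return ERR_NONE;
    case FILTER_READ_ONLY:
        return ERR_FILTER_NO_ENCODE;
    case FILTER_ABSENT:
    default:
        return ERR_FILTER_MISSING;
    }
}

/* Human-readable message for each code, e.g. for an error stack. */
const char *filter_err_str(filter_err_t err)
{
    switch (err) {
    case ERR_NONE:             return "ok";
    case ERR_FILTER_MISSING:   return "filter not available";
    case ERR_FILTER_NO_ENCODE: return "filter present, but writes not allowed";
    default:                   return "unknown error";
    }
}
```

A caller can then treat the read-only case specially, for example by suggesting a different compression method rather than reporting that SZIP is not installed.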
2. Write Data to an SZIP Compressed Dataset
It is possible for data to be created and compressed with SZIP by one program,
and later read by another program that has the encoder disabled. In this case,
reading the data will succeed as expected, but data written back cannot
be re-compressed, i.e., the attempt to compress will fail.
In this case, the library must take one of two actions:
- fail the write, or
- write without compression.
The suggested default is to fail, i.e., return an error from the write
operation. An option to write through uncompressed could be provided
if desired.
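The two actions can be sketched as a per-write policy. This is a minimal sketch under stated assumptions: write_chunk, write_policy_t, and the error codes are hypothetical names, and stub_encode is a placeholder that simply copies bytes so that the example stays self-contained.

```c
#include <stddef.h>
#include <string.h>

typedef enum {
    WRITE_POLICY_FAIL = 0,      /* suggested default: return an error */
    WRITE_POLICY_UNCOMPRESSED   /* optional: write the raw bytes through */
} write_policy_t;

#define WRITE_ERR_NO_ENCODER (-2L) /* hypothetical "writes not allowed" code */
#define WRITE_ERR_NO_SPACE   (-3L) /* output buffer too small */

/* Placeholder for the real encoder: copies the input unchanged. */
static size_t stub_encode(const unsigned char *in, size_t n, unsigned char *out)
{
    memcpy(out, in, n);
    return n;
}

/* Returns the number of bytes written, or a negative error code.
 * *was_compressed is set only when the write succeeds. */
long write_chunk(int encoder_enabled, write_policy_t policy,
                 const unsigned char *in, size_t n,
                 unsigned char *out, size_t out_cap, int *was_compressed)
{
    if (n > out_cap)
        return WRITE_ERR_NO_SPACE;

    if (encoder_enabled) {
        *was_compressed = 1;
        return (long)stub_encode(in, n, out);
    }

    /* Encoder disabled: never skip compression silently. */
    if (policy == WRITE_POLICY_FAIL)
        return WRITE_ERR_NO_ENCODER;

    memcpy(out, in, n);   /* caller explicitly opted in to raw writes */
    *was_compressed = 0;
    return (long)n;
}
```

In this sketch the uncompressed path is reachable only when the caller has chosen it, so the caller is never left unaware that the data was stored without compression.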
Note: the current, unmodified behavior of the HDF5 library is to silently skip
the compression, i.e., to write the data uncompressed, with no notification
to the caller that SZIP encoding is disabled. This behavior does not meet the
requirements.
<<I'm not sure what the behavior of HDF4 is>>
3. Discover Whether Encoding is Enabled
The HDF library has a function to discover the settings for compression and
other filters. There needs to be a method to discover whether SZIP encoding
is enabled. This can be used by tools to behave gracefully when SZIP
is read-only, e.g., to inform the user that this dataset cannot be compressed
with this version of the library.
Note: it would be possible to attempt to write to the dataset and receive an error if encoding is disabled. This is not considered to be a good solution.
This goal can be met by either extending an existing API (e.g., H5Pget_filter_id, or SDgetcompress),
or by creating a new API call. Either approach is very simple.
Extending existing calls
One approach is to extend the current H5Pget_filter_id (and HDF4 SDgetcompress)
to return one additional value, indicating whether the filter is enabled
or not.
This approach has the advantage that the user makes a single call to find out "everything of interest" about the filter.
On the other hand, this changes the behavior of existing API calls (albeit
in a way that is not likely to be a problem). Also, it mixes apples and
oranges: permanent per-dataset settings with per-library configuration
settings.
Note, too, that the stored properties (i.e., the compression parameters)
will never change, whereas the configuration of the library differs depending
on how the library was linked. This might cause confusion, e.g., in
a test program.
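The following sketch illustrates the extended-call approach and the possible confusion it carries. It does not use the real H5Pget_filter_id or SDgetcompress signatures; get_filter_extended, stored_szip_params_t, and library_set_encoder_enabled are hypothetical names, and the library-wide flag stands in for whichever SZIP library happens to be linked.

```c
#include <stddef.h>

/* Parameters stored with the dataset; these never change. */
typedef struct {
    unsigned options_mask;
    unsigned pixels_per_block;
} stored_szip_params_t;

/* Library-wide configuration; depends on which SZIP library is linked. */
static int library_encoder_enabled = 1;

/* Test hook standing in for "link against the other SZIP library". */
void library_set_encoder_enabled(int enabled)
{
    library_encoder_enabled = enabled ? 1 : 0;
}

/* Extended getter: the stored parameters plus one additional value.
 * Any output pointer may be NULL if the caller is not interested. */
int get_filter_extended(const stored_szip_params_t *stored,
                        unsigned *options_mask,
                        unsigned *pixels_per_block,
                        int *encode_enabled)
{
    if (stored == NULL)
        return -1;
    if (options_mask != NULL)
        *options_mask = stored->options_mask;
    if (pixels_per_block != NULL)
        *pixels_per_block = stored->pixels_per_block;
    if (encode_enabled != NULL)
        *encode_enabled = library_encoder_enabled; /* not a stored property */
    return 0;
}
```

With this shape, two programs that read the same dataset receive identical stored parameters but may receive different values for the additional output, which is the mixing of per-dataset and per-library information described above.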
New API calls
An alternative is to create new calls to query the status of the SZIP (or other) filter.
Advantages of this approach include that existing features are not changed,
and the API reflects the actual semantics of the software (i.e., the configuration
of the filter is a library configuration issue, not a stored property of
the dataset).
The main disadvantage is the added complexity and inconvenience of an extra API call in the user code.
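One possible shape for such a call is sketched below, together with the way a tool might use it before attempting a write. This is an assumption-laden illustration rather than a concrete proposal: query_filter_config, tool_can_compress, the flag names, and the filter identifiers are all hypothetical, and the static table stands in for the library's real configuration.

```c
#include <stddef.h>

/* Hypothetical capability flags reported by the library. */
#define FILTER_CONFIG_ENCODE 0x1u
#define FILTER_CONFIG_DECODE 0x2u

/* Illustrative filter identifiers for this sketch only. */
#define FILTER_ID_DEFLATE 1
#define FILTER_ID_SZIP    4

typedef struct {
    int      id;
    unsigned flags;
} filter_config_t;

/* Stand-in for the library configuration: SZIP linked without encoder. */
static const filter_config_t configured[] = {
    { FILTER_ID_DEFLATE, FILTER_CONFIG_ENCODE | FILTER_CONFIG_DECODE },
    { FILTER_ID_SZIP,    FILTER_CONFIG_DECODE }
};

/* New query call: returns 0 and fills *flags, or -1 if the filter is absent. */
int query_filter_config(int filter_id, unsigned *flags)
{
    size_t i;

    for (i = 0; i < sizeof(configured) / sizeof(configured[0]); i++) {
        if (configured[i].id == filter_id) {
            *flags = configured[i].flags;
            return 0;
        }
    }
    *flags = 0;
    return -1;
}

/* Tool-side check: returns 1 if compression is possible; otherwise
 * returns 0 and sets *reason to a message suitable for the user. */
int tool_can_compress(int filter_id, const char **reason)
{
    unsigned flags;

    if (query_filter_config(filter_id, &flags) < 0) {
        *reason = "filter is not available in this library";
        return 0;
    }
    if ((flags & FILTER_CONFIG_ENCODE) == 0) {
        *reason = "filter is read-only: encoding is disabled in this library";
        return 0;
    }
    *reason = NULL;
    return 1;
}
```

The extra call costs the user one additional line, but the answer comes from the library configuration alone and never from the dataset's stored properties.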
<<Please make suggestions or revisions to these arguments>>
<<TBD: concrete proposal>>