SZIP Support-- Proposals for Handling "Read Only" Libraries.
Robert E. McGrath, Quincey Koziol, Elena Pourmal
April, 2004
1. Overview
The HDF libraries are required to include SZIP compression as a standard
filter. The SZIP library has some restrictions on its use for commercial
purposes. Specifically, the decoder is free for all to use, but the encoder
may be used only for non-commercial purposes.
The SZIP library has been modified so that it can be compiled in two versions:
- the full library, and
- the library with the encoder disabled/removed.
The former may require a license for commercial use. The latter is free for all
use.
Because of the legal issues, it is very important that this change is deployed
as soon as possible. We must be able to provide users with the ability to
use HDF under both license conditions. At this time, many users are deconfiguring
SZIP because they have no way to use the decoder-only option.
The overall approach will be to have one version of the HDF libraries, which
can be linked to either version of SZIP, depending on the user's preference
and rights. We will distribute two versions of the SZIP binaries, full and
decode-only; the user may download and use either.
In order to realize this goal the HDF libraries must be modified to behave
reasonably in the case when the SZIP encoder is not available. E.g.,
in this configuration, a dataset previously compressed with SZIP can be read,
but datasets cannot be created with SZIP, nor can data be written compressed
with SZIP.
In addition to the changes to the libraries, miscellaneous tools will need
to be modified to provide meaningful feedback to the user, e.g., "this dataset
cannot be modified because you do not have the SZIP license".
This document proposes the required changes to the HDF libraries.
2. Challenges for the HDF Libraries
The SZIP library presents a new and unprecedented case for HDF: it is a filter
that may be configured to be "one-way." In the current libraries, a filter
is either present or absent. If present, it is always applied (although
it may be silently skipped in some cases).
The SZIP library now has three configurations: absent, present read/write,
and present read-only. The fundamental goal for the changes to the library
is to handle the third case in a reasonable way, and in a way that the calling
program can understand.
In the future, there may be other filters with similar 'read-only'
configurations, so the solutions should be applicable to any filter.
3. Required Changes
3.1 Format Changes
No change to either the HDF4 or the HDF5 file format is required.
3.2 Filter Operations
A new error must be defined, i.e., "filter present, but writes not allowed".
E.g., if an H5Dwrite fails because SZIP is required but encoding is disabled,
the error returned should state that reason.
In HDF5, the registration protocol must be revised from:
typedef struct H5Z_class_t {
    H5Z_filter_t id;                /* Filter ID number */
    const char *name;               /* Comment for debugging */
    H5Z_can_apply_func_t can_apply; /* The "can apply" callback for a filter */
    H5Z_set_local_func_t set_local; /* The "set local" callback for a filter */
    H5Z_func_t filter;              /* The actual filter function */
} H5Z_class_t;
to:
typedef struct H5Z_class_t {
    unsigned version;               /* Version # of structure */
    H5Z_filter_t id;                /* Filter ID number */
    unsigned encoder_present;       /* Flag to indicate the filter has an encoder */
    unsigned decoder_present;       /* Flag to indicate the filter has a decoder */
    const char *name;               /* Comment for debugging */
    H5Z_can_apply_func_t can_apply; /* The "can apply" callback for a filter */
    H5Z_set_local_func_t set_local; /* The "set local" callback for a filter */
    H5Z_func_t filter;              /* The actual filter function */
} H5Z_class_t;
The new "encoder_present" and "decoder_present" flags should be set by the
application registering the filter in order to indicate that the filter has an
encoder and decoder, respectively. The "version" field should be set to the
constant "H5Z_CLASS_T_VERS" by the application and will be used by the
library to determine the correct format for interpreting the H5Z_class_t
structure passed by the application. Including the version information allows
an application to be re-linked with a later version of the HDF5 library without
concern that the H5Z_class_t structure will be mis-interpreted.
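As a rough illustration of how an application might fill in the new fields, the sketch below builds a class record for either SZIP build. It is not the library's code: every demo_* name is a hypothetical stand-in so that the example compiles without hdf5.h, the values chosen for DEMO_CLASS_T_VERS and DEMO_FILTER_SZIP are assumptions, the filter callback signature is simplified, and the can_apply and set_local callbacks are omitted for brevity.

```c
#include <stddef.h>

/* Hypothetical stand-ins so the sketch compiles without hdf5.h. */
typedef int demo_filter_t;
#define DEMO_CLASS_T_VERS 1 /* assumed value standing in for H5Z_CLASS_T_VERS */
#define DEMO_FILTER_SZIP  4 /* assumed ID standing in for the SZIP filter */

/* Simplified callback type; the real H5Z_func_t takes more arguments. */
typedef size_t (*demo_func_t)(unsigned flags, size_t nbytes, void **buf);

/* Mirrors the proposed H5Z_class_t layout (callbacks trimmed). */
typedef struct demo_class_t {
    unsigned      version;         /* Version # of structure */
    demo_filter_t id;              /* Filter ID number */
    unsigned      encoder_present; /* Nonzero if the filter can encode */
    unsigned      decoder_present; /* Nonzero if the filter can decode */
    const char   *name;            /* Comment for debugging */
    demo_func_t   filter;          /* The actual filter function */
} demo_class_t;

/* Placeholder filter body: reports the buffer size unchanged. */
size_t demo_szip_filter(unsigned flags, size_t nbytes, void **buf)
{
    (void)flags;
    (void)buf;
    return nbytes;
}

/* Build the record an application would register for a given SZIP
 * build. Pass has_encoder = 0 for the decode-only library. */
demo_class_t demo_make_szip_class(int has_encoder)
{
    demo_class_t cls;
    cls.version         = DEMO_CLASS_T_VERS;
    cls.id              = DEMO_FILTER_SZIP;
    cls.encoder_present = has_encoder ? 1u : 0u;
    cls.decoder_present = 1u; /* both SZIP builds include the decoder */
    cls.name            = "szip";
    cls.filter          = demo_szip_filter;
    return cls;
}
```

Calling demo_make_szip_class(0) yields the "present read-only" configuration: the decoder flag is set and the encoder flag is clear.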
In HDF5, the semantics of the H5Z_FLAG_OPTIONAL flag must be refined. Currently, this flag is defined as follows:
    If the filter fails [...] during an H5Dwrite operation then the filter is
    just excluded from the pipeline for the chunk for which it failed... This is
    commonly used for compression filters: if the filter result would be larger
    than the input, then the compression filter returns failure and the
    uncompressed data is stored in the file.
If this bit is not set (i.e., the filter is required), the operation will fail.
When SZIP encoding is enabled, it should work as described above. However,
when encoding is disabled, all reads should succeed, but all writes should fail
(rather than silently writing the data uncompressed).
Note that, while this behavior is new, it does not contradict the current
documentation, nor change the behavior of existing code or files. Therefore,
this is considered a "refinement" to the current library, which applies to
a new case.
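The refined write-side rule can be summarized as a small decision function. This is only a sketch of the semantics described above, not library code; the enum and function names are hypothetical.

```c
/* Possible results of pushing one chunk through a filter on write.
 * All names here are hypothetical, for illustration only. */
typedef enum {
    WRITE_FILTERED,            /* filter applied, filtered data stored */
    WRITE_UNFILTERED,          /* optional filter skipped, raw data stored */
    WRITE_ERR_FILTER_FAILED,   /* required filter failed */
    WRITE_ERR_ENCODE_DISABLED  /* filter present, but writes not allowed */
} demo_write_outcome_t;

/* encoder_present: nonzero if the filter was registered with an encoder.
 * is_optional:     nonzero if H5Z_FLAG_OPTIONAL is set for the filter.
 * filter_failed:   nonzero if the encoder ran and reported failure. */
demo_write_outcome_t demo_write_outcome(int encoder_present,
                                        int is_optional,
                                        int filter_failed)
{
    /* New case: with no encoder the write fails even for an optional
     * filter, rather than silently storing uncompressed data. */
    if (!encoder_present)
        return WRITE_ERR_ENCODE_DISABLED;

    if (!filter_failed)
        return WRITE_FILTERED;

    /* Existing behavior: an optional filter that fails is skipped. */
    return is_optional ? WRITE_UNFILTERED : WRITE_ERR_FILTER_FAILED;
}
```

The first test is what distinguishes "encoding is disabled" from an ordinary filter failure; the remaining branches leave existing behavior unchanged.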
In HDF4, the semantics of filters do not change. If encoding is disabled, the write will fail. (Details are TBD.)
4. User Visible Changes (HDF5)
There are user visible cases where the HDF5 library should recognize the read-only case.
4.1. Create Dataset with SZIP
When SZIP is configured read-only, a request to create a dataset with SZIP
encoding should fail. There are three ways this may happen in HDF5.
1. Call H5Pset_szip to add SZIP to a Dataset Creation Property List
The library should detect that SZIP encoding is not enabled, and return
a new failure code that indicates "encoding is disabled".
2. Copy the Dataset Creation Properties from another dataset, try to create a new dataset.
In this scenario, a dataset in a file was created with another version of
the library using SZIP. The program calls H5Dget_create_plist to retrieve the dataset creation
properties, and then tries to create a new dataset, calling H5Dcreate
with those properties.
In this case, the library must detect that SZIP encoding is not enabled; H5Dcreate should fail and return "encoding is disabled".
3. Extend a dataset that is compressed with SZIP
In this scenario, a dataset in a file was created
with another version of the library using SZIP. The dataset is extendible,
has a fill value defined, and has a fill policy that requires writing the
fill values when space is allocated.
This file is opened with SZIP encoding disabled, and H5Dextend is called to extend the dataset.
In this case, the H5Dextend should fail, and return "encoding is disabled".
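The create and extend conditions above can be written as two predicates. This is a hedged sketch of the rules as stated in this section, not the library's implementation; the struct and function names are hypothetical.

```c
/* Hypothetical summary of the dataset properties that matter here. */
typedef struct {
    int uses_szip;          /* filter pipeline includes SZIP */
    int fill_value_defined; /* a fill value is defined */
    int fill_at_alloc;      /* fill values are written when space is allocated */
} demo_dset_info_t;

/* Scenarios 1 and 2: requesting SZIP on a new dataset must fail
 * whenever the encoder is disabled. Returns 1 if the call must fail. */
int demo_create_must_fail(int uses_szip, int encoder_enabled)
{
    return uses_szip && !encoder_enabled;
}

/* Scenario 3: extending must fail only when the extension would write
 * fill values through the SZIP encoder. Returns 1 if it must fail. */
int demo_extend_must_fail(const demo_dset_info_t *d, int encoder_enabled)
{
    if (!d->uses_szip || encoder_enabled)
        return 0;
    return d->fill_value_defined && d->fill_at_alloc;
}
```

Under this reading, an extension that allocates no filled chunks does not need the encoder, so only some H5Dextend calls fail.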
4.2. Write Data to an SZIP Compressed Dataset
It is possible for data to be created by one program compressed with SZIP,
and later opened by another program with the encoder disabled. In this case,
reading the data will succeed as expected, but data written back cannot
be re-compressed, i.e., the attempt to compress will fail.
In this case, the library must do one of two actions:
- fail the write, or
- write without compression
The proposed default is to 'fail', i.e., return an error from the write
operation. See the discussion of the H5Z_FLAG_OPTIONAL flag, above. The error should be "encoding is disabled".
We could support the second behavior with a new transfer property to override
the default. This is discussed in section 6 below.
4.3. Discover Whether Encoding is Enabled
The HDF library has functions to discover the settings for compression and
other filters. These facilities need to be enhanced so the calling program can discover whether SZIP encoding is enabled or not.
While a program can discover that SZIP is disabled by attempting to create
or write using SZIP, it is highly desirable to provide inquiry functions
so a program can easily determine whether SZIP encoding is enabled. This
can be used by tools to behave gracefully when SZIP is read-only, e.g., to
inform the user that this dataset cannot be compressed with this version
of the library.
1. Filter availability
The availability of filters is a feature of the library (how it was linked),
so there should be a new API call to test any filter.
We propose a new API function, e.g.:
- Name: H5Zget_filter_info
- Signature:
  herr_t H5Zget_filter_info(H5Z_filter_t filter, unsigned int *filter_config_flags)
- Purpose:
  Determines whether a filter is available, and if so, what features are enabled.
- Description:
  H5Zget_filter_info determines whether the filter specified in filter
  is available to the application. If so, the features are returned in a bit field. The feature flags are:
  - H5Z_FILTER_CONFIG_ENCODE_ENABLED: encoding is enabled
  - H5Z_FILTER_CONFIG_DECODE_ENABLED: decoding is enabled
- Parameters:
  - H5Z_filter_t filter
    IN: Filter identifier.
  - unsigned int *filter_config_flags
    OUT: Bit mask of filter features.
- Returns:
  Returns a non-negative value if successful;
  otherwise returns a negative value. A negative value will be
  returned if a filter that is not registered with the library is
  queried.
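One way such a query could behave is sketched below against a small table of registered filters. This is illustrative only: the table, the demo_* names, and the bit values are assumptions made for the example, and rejecting a NULL output pointer is a choice this proposal does not specify.

```c
#include <stddef.h>

/* Stand-ins for the proposed feature flags; the bit values are assumed. */
#define DEMO_CONFIG_ENCODE_ENABLED 0x0001u
#define DEMO_CONFIG_DECODE_ENABLED 0x0002u

/* Hypothetical record of one registered filter. */
typedef struct {
    int      id;              /* filter identifier */
    unsigned encoder_present; /* nonzero if an encoder was registered */
    unsigned decoder_present; /* nonzero if a decoder was registered */
} demo_filter_entry_t;

/* Looks up 'filter' among the n entries of 'table'. On success, stores
 * the feature bit mask in *flags and returns 0. Returns a negative
 * value if the filter is not registered (or if flags is NULL, which is
 * an assumption of this sketch). */
int demo_get_filter_info(const demo_filter_entry_t *table, size_t n,
                         int filter, unsigned *flags)
{
    size_t i;

    if (flags == NULL)
        return -1;

    for (i = 0; i < n; i++) {
        if (table[i].id == filter) {
            *flags = 0u;
            if (table[i].encoder_present)
                *flags |= DEMO_CONFIG_ENCODE_ENABLED;
            if (table[i].decoder_present)
                *flags |= DEMO_CONFIG_DECODE_ENABLED;
            return 0;
        }
    }
    return -1; /* filter is not registered */
}
```

A tool would test the returned mask for the encode bit before offering SZIP compression to the user.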
2. Filter properties
Currently, there are several functions that retrieve the settings for a filter,
e.g., the parameters to the compression algorithm. These are retrieved from
a dataset creation property list. It is desirable that the inquiry functions
(H5Pget_filter, H5Pget_filter_by_id, and so on) be extended to report whether writing is enabled.
The proposed extension is to add another returned value that reports the availability of the filter (READ, WRITE, NONE, BOTH).
For example:
herr_t H5Pget_filter_by_id(hid_t plist_id, H5Z_filter_t filter,
    unsigned int *flags, size_t *cd_nelmts, unsigned int cd_values[],
    size_t namelen, char name[])
would be extended to have a new OUT parameter, which reports how this filter is configured:
herr_t H5Pget_filter_by_id(hid_t plist_id, H5Z_filter_t filter,
    unsigned int *flags, size_t *cd_nelmts, unsigned int cd_values[],
    size_t namelen, char name[], unsigned int *filter_config)
- unsigned int *filter_config
  OUT: Bit vector specifying certain general properties of the filter.
  - H5Z_FILTER_CONFIG_ENCODE_ENABLED: encoding is enabled
  - H5Z_FILTER_CONFIG_DECODE_ENABLED: decoding is enabled
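The bit vector maps directly onto the four availability states (READ, WRITE, NONE, BOTH). A minimal sketch of that mapping follows; the enum and function names are hypothetical, and the flag values are assumed here only when no header has already defined them.

```c
/* Assumed bit values, used only if no header has defined the flags. */
#ifndef H5Z_FILTER_CONFIG_ENCODE_ENABLED
#define H5Z_FILTER_CONFIG_ENCODE_ENABLED 0x0001u
#endif
#ifndef H5Z_FILTER_CONFIG_DECODE_ENABLED
#define H5Z_FILTER_CONFIG_DECODE_ENABLED 0x0002u
#endif

/* Hypothetical names for the four availability states. */
typedef enum {
    DEMO_AVAIL_NONE,  /* neither encoder nor decoder */
    DEMO_AVAIL_READ,  /* decoder only */
    DEMO_AVAIL_WRITE, /* encoder only */
    DEMO_AVAIL_BOTH   /* encoder and decoder */
} demo_avail_t;

/* Translate a filter_config bit vector into one availability state. */
demo_avail_t demo_availability(unsigned filter_config)
{
    int enc = (filter_config & H5Z_FILTER_CONFIG_ENCODE_ENABLED) != 0;
    int dec = (filter_config & H5Z_FILTER_CONFIG_DECODE_ENABLED) != 0;

    if (enc && dec)
        return DEMO_AVAIL_BOTH;
    if (dec)
        return DEMO_AVAIL_READ;
    if (enc)
        return DEMO_AVAIL_WRITE;
    return DEMO_AVAIL_NONE;
}
```

For the decode-only SZIP library, this mapping yields DEMO_AVAIL_READ.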
5. User Visible Changes (HDF4)
There are user visible cases where the HDF4 library should recognize the read-only case.
5.1. Create Dataset with SZIP
When SZIP is configured read-only, a request to create an object with SZIP
encoding should fail.
An SDS (or GR image) is created with SDcreate (GRcreate), then compression is requested with SDsetcompress (GRsetcompress).
In this case, the SDsetcompress (GRsetcompress) should fail. The dataset can be created, but it will not be compressed.
The failure code is TBD.
5.2. Write Data to an SZIP Compressed Dataset
In this scenario, a dataset (GR image) is created with one version of the
library, and compressed with SZIP. The file is opened using a different
version of the library, with SZIP encoding disabled. The program writes
data to the SDS (GR), with SDwrite or SDwritechunk (GRwrite, GRwritechunk).
In this case, the write should fail, and return "encoding is disabled".
5.3. Discover Whether Encoding is Enabled
As discussed above, there needs to be a method to discover whether SZIP encoding
is enabled. This can be used by tools to behave gracefully when SZIP
is read-only, e.g., to inform the user that this dataset cannot be compressed
with this version of the library.
This information can be added as a new field in the szip member of the comp_info_t union, which is returned by SDgetcompress (GRgetcompress).
struct
{
    int32 bits_per_pixel;
    int32 compression_mode;
    int32 options_mask;
    int32 pixels;
    int32 pixels_per_block;
    int32 pixels_per_scanline;
    int32 config_flags;     /* new: encoder/decoder availability */
} szip; /* for szip encoding */
/* values for config_flags */
#define CSZIP_ENCODER_ENABLED 1
#define CSZIP_DECODER_ENABLED 2
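A tool could use config_flags to decide what to tell the user before it attempts a write. The sketch below assumes the two flag values listed above; the function name and the message wording are hypothetical, and int32_t stands in for the HDF4 int32 type.

```c
#include <stddef.h>
#include <stdint.h>

/* Flag values as proposed above. */
#define CSZIP_ENCODER_ENABLED 1
#define CSZIP_DECODER_ENABLED 2

/* Returns NULL when SZIP writes are possible, otherwise a message a
 * tool could show to the user. Hypothetical helper, not part of HDF4. */
const char *demo_szip_write_advice(int32_t config_flags)
{
    if (config_flags & CSZIP_ENCODER_ENABLED)
        return NULL; /* encoder present: nothing to report */

    if (config_flags & CSZIP_DECODER_ENABLED)
        return "SZIP is read-only in this library: "
               "this dataset can be read but not written or compressed";

    return "SZIP is not available in this library";
}
```

The caller would pass the config_flags value obtained from SDgetcompress (GRgetcompress) and print the message, if any, instead of calling SDsetcompress or SDwrite.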
6. Optional library features that might be done in the future
For the HDF5 library, we might add a new data transfer property to override
the failure on write when encoding is disabled. I.e., when requested, the
library could write uncompressed chunks into the dataset. This feature
should not be done now, but could be added in the future, if needed.
The HDF4 API could be extended to add an inquiry to determine if the compression
method is available, e.g.,
HCcomp_available( comp_code_t )
This should not be done now.
7. Changes to Tools
Once the library changes are available, several standard utilities and tools
should be modified to provide clear information to the user when the SZIP
encoding is disabled. Essentially, any tool that may create or write
data using SZIP needs to be modified to check for the availability and give
a reasonable result or message when SZIP is read only.
These tools include: hdfview (Java), h5repack, h4toh5, h5toh4, etc.
Users will need to make similar changes to their code, if needed. We
need to inform HDF-EOS and IDL, for example, so they can determine how
they wish to deal with this.
Note that work on tools and applications cannot begin until the new API functions are added to the libraries.
8. Documentation and Examples
It will be important to clearly document this behavior and provide examples
for how to detect and handle the case when SZIP encoding is not available.
9. Summary of Changes
These changes need to be implemented as soon as possible. In the case
of HDF4, these can be implemented this year in HDF4.2r1, or in 2005. In
the case of HDF5, we can choose to implement some or all of the features
in HDF5-1.6.x or HDF5-1.8.x. The latter will not be available to users until
2005.
One factor to consider is that tools and applications cannot implement the
needed changes until the inquiry functions are available. Deferring
the inquiry functions until 2005 will mean that tools and user applications
will not implement the needed changes until 2005+.
The two tables below list the changes, with suggested target releases. The
comment indicates the nature of the change and where it is discussed above.
In the tables, the comments mean:
- "Extension" -- adds to existing enumeration or list
- "Change" -- modifies existing API or data structure
- "Refinement" -- new or different behavior, does not impact existing code
Changes to HDF5
Feature | Comment | Implement in:
New error message | Extension (Section 3.2) | 1.6.3
Register new function | Change to H5Z_class_t for H5Zregister call (Section 3.2) | 1.8.0
Refine semantics of H5Z_FLAG_OPTIONAL | Refinement (Section 3.2) | 1.6.3
H5Pset_szip, fail if SZIP encoder disabled | Refinement (Section 4.1) | 1.6.3
H5Dcreate, fail if SZIP encoder disabled | Refinement (Section 4.1) | 1.6.3
H5Dextend, fail some cases if SZIP encoder disabled | Refinement (Section 4.1) | 1.6.3
H5Dwrite, fail if SZIP encoder disabled | Refinement (Section 4.2) | 1.6.3
H5Zget_filter_info | New API (Section 4.3) | 1.6.3
H5Pget_filter_by_id (etc.) | Change API (Section 4.3) | 1.8.0
Documentation, examples | When implemented | 1.6.3, etc.
Tool support | TBD, requires inquiry functions | 1.6.3+
User applications | requires inquiry functions | 1.6.3+
Changes to HDF4
Feature | Comment | Implement in:
New error message | Extension (Section 3.2) | r1
Register new function | (Probably no change?) (Section 3.2) | r1
SDsetcompress, GRsetcompress: fail if SZIP encoder disabled | Refinement (Section 5.1) | r1
SDwrite, SDwritechunk, GRwrite, GRwritechunk: fail if SZIP encoder disabled | Refinement (Section 5.2) | r1
change comp_info_t | Change API (Section 5.3) | r1
Documentation, examples | When implemented | r1
Tool support | TBD, requires inquiry functions | r1+
User applications | requires inquiry functions | r1+