============================

  1. Why these issues need to be faced now


  2. Although we have been aware for some time of the differences between HDF4 and HDF5 in how storage space is allocated in a file and how fill-values are treated, this has not been a pressing problem that needed to be dealt with. Unfortunately, there is a bug causing memory for variable-length (VL) data to be leaked in the file when data elements are overwritten, and that bug is tied to these storage and fill-value issues.

    Currently, when VL data elements are overwritten in a dataset, the space occupied by the previous piece of VL data is not released to the file for reuse; it is leaked. Because the previous value of the VL data would need to be read from the dataset in order to be properly released, this ties in with the fill-values stored in the file. (In the current library design, the dataset stores a heap ID giving the location of the VL data, not the VL data itself, and a heap ID of all zeros indicates that there is no VL data for a particular location. So currently, the only valid fill-value for VL data is an all-zero value, indicating that no VL data has been stored in the heap.)

    If fill-values are not written to the file, there is the potential for junk to be read back from the file as the VL data to be released, causing errors. Currently, when no fill-value is set for a dataset, the library relies on the filesystem to zero-fill blocks allocated to the file. We have already seen this assumption break down under Win9x, where the OS does not zero-fill file blocks and users report "junk" in datasets which have been created, but not written to.

    So, VL data requires valid fill-values to be present in the file in order to be certain that reading the VL data about to be overwritten yields correct information: either to free the previous VL data (for non-NULL valued VL data) or to skip freeing it (for NULL valued VL data). Junk (or the potential for junk) in the data read from the file opens the possibility of corrupting the file if that junk is used to try to free the previous VL data.

  3. How are things handled currently in HDF4 vs. HDF5?
    1. HDF4

    2. These issues are specific to how the SD*() API functions operate in the latest version of HDF4; other portions of the HDF4 library may operate in different ways. Only the normal (i.e. "contiguous") and chunked storage methods are discussed; other storage methods are treated as normal storage in HDF4.
      1. Dataset Storage Allocation
        Allocating space to store a dataset is deferred until the space is needed. Space is only needed when non-fill-value data is written to a dataset. This allows for very large datasets to be defined, and if they are not written to, the file size can stay very small. This applies to both contiguous and chunked data.

      2. Fill-values
        1. Metadata
          Metadata documenting the fill-value is always written to a file. Either the default fill-value (of zero) or the user's fill value is written as an attribute of the dataset.
        2. Writing
          Fill-values are only written to the dataset or chunk when the entire dataset or chunk is not going to be written in a single I/O request. For example, in a contiguously stored dataset, if a hyperslab in the middle of the dataset is the first piece of data written by the user, fill-values are written to the dataset and then the user's data is written in the hyperslab location. However, if the entire dataset is going to be written in one write call, the fill-value writing step is skipped, since the fill-values would all be immediately overwritten with actual data. Note: fill-value writing in HDF4 can be turned off completely by a user who either "knows" that they will write the entire dataset in successive calls, or who does not care about data outside the region(s) they are writing to in the dataset.
        3. Reading
          If storage for the dataset or chunk is not allocated yet, the fill value is used to fill the buffer to return to the application and the file data is not read.

    3. HDF5


    4. These issues apply to all datasets in HDF5. Only the contiguous and chunked storage methods are discussed; other storage methods are treated as contiguous storage in HDF5.
      1. Dataset Storage Allocation

        Space for contiguously stored data is always allocated when the dataset is created. Space for chunked data is allocated as needed, when data is written to the portion of the dataset that a chunk occupies. (The exception is parallel I/O, where all the chunks for a dataset are also allocated at creation time.)
      2. Fill-values
        1. Metadata

          Metadata documenting the fill-value for a dataset is only written out if the user explicitly set a fill-value for the dataset during creation. Although an implicit fill-value of zero is assumed for the dataset, this is neither enforced nor recorded.
        2. Writing

          Fill-values are written to contiguously stored data only when the dataset is created (and only if the user has set a fill-value). This occurs regardless of whether the fill-values will be overwritten by future writes to the dataset. Fill-values for chunked data are somewhat more controlled: they are written only when data is actually written to a particular chunk. (The library may also be smart enough to notice when an entire chunk is being written and skip writing fill-values in that case; this has not been investigated.)
        3. Reading

          Fill-values are only used for chunked datasets, when an unallocated chunk is read. Because contiguously stored data always has space allocated in the file, the library assumes there is always valid data to read for contiguous datasets.

  4. Suggestions for improving HDF5's behavior


  5. We can provide the user with three properties to control the library's behavior: when to allocate space, when to write the fill value, and what fill value to write. Each property can take the following values (as used in the tables below):

      When to allocate space:    early (at dataset creation) or late (deferred until data is written)
      When to write fill value:  never, or at allocation time
      What fill value to write:  undefined, default, or user-defined


    Using these three properties, the library's fill-value writing behavior during the dataset create-write-close cycle is listed in the table below.

    When to allocate space | When to write fill value | What fill value to write | Library create-write-close behavior
    -----------------------+--------------------------+--------------------------+------------------------------------
    early                  | never                    | -----                    | Library allocates space when the dataset is created, but never writes the fill value to the dataset.
    late                   | never                    | -----                    | Library allocates space when the dataset is written to, but never writes the fill value to the dataset.
    -----                  | allocation               | undefined                | Error on creating the dataset; dataset is not created.
    early                  | allocation               | default or user-defined  | Allocate space when the dataset is created. Write the fill value (default or user-defined) to the entire dataset.
    late                   | allocation               | default or user-defined  | Do not allocate space until the user's data values are written to the dataset. Write the fill value to the entire dataset before writing the user's data values.

    ("-----" stands for any value.)

    During an H5Dread function call, the library's behavior depends on whether space has been allocated, whether the fill value has been written to storage, how the fill value is defined, and when the fill value is to be written.

    Is space allocated? | What is the fill value? | When to write fill value? | H5Dread behavior
    --------------------+-------------------------+---------------------------+-----------------
    No                  | undefined               | -----                     | Error. The dataset does not exist: no data has been written and no fill value is defined.
    No                  | default or user-defined | -----                     | Fill the user's buffer with the fill value.
    Yes                 | undefined               | -----                     | Return data from storage (the dataset); trash is possible.
    Yes                 | default or user-defined | never                     | Return data from storage (the dataset); trash is possible.
    Yes                 | default or user-defined | allocation                | Return data from storage (the dataset).

    ("-----" stands for any value.)


QAK:1/9/02