Proposed changes to the dumper output (revision 4)

This document describes proposed changes to the output of the h5dump tool. The examples are written in the data description language (DDL) that h5dump uses to describe the contents of an HDF5 file.

Filters

What is being added

The word FILTERS, followed by a list of filter identifiers and any extra filter parameters.

Where it goes

In the <dataset_info> part of the DDL.

Is this always printed, or is it an option?

Option

Command line: h5dump -H -p -d all ../testfiles/h5dumptst/tfilters.h5

Example output for FILTERS 

HDF5 "../testfiles/h5dumptst/tfilters.h5" {
DATASET "all" {
   DATATYPE  H5T_STD_I32LE
   DATASPACE  SIMPLE { ( 20, 10 ) / ( 20, 10 ) }
   STORAGE_LAYOUT CHUNKED {
      SIZE 458 ( 10, 5 )
   }
   FILTERS {
      PREPROCESSING SHUFFLE
      COMPRESSION SZIP {
         PIXELS_PER_BLOCK 4
         MODE HARDWARE
         CODING ENTROPY
         BYTE_ORDER LSB
         HEADER RAW
      }
      COMPRESSION DEFLATE { LEVEL 5 }
      CHECKSUM FLETCHER32
   }
   FILLVALUE {
      FILL_TIME IFSET
      ALLOC_TIME INCR
      VALUE  { 0 }
   }
}
}

Command line: h5dump -H -p -d myfilter ../testfiles/h5dumptst/tfilters.h5

Example output for FILTERS when the filter is a user-defined filter. The data cannot be read, and a message is printed.

HDF5 "../testfiles/h5dumptst/tfilters.h5" {
DATASET "myfilter" {
   DATATYPE  H5T_STD_I32LE
   DATASPACE  SIMPLE { ( 20, 10 ) / ( 20, 10 ) }
   STORAGE_LAYOUT CHUNKED {
      SIZE 800 ( 10, 5 )
   }
   FILTERS {
      UNKNOWN_FILTER 405 PARAMS { 5 6 }
   }
   FILLVALUE {
      FILL_TIME IFSET
      ALLOC_TIME INCR
      VALUE  { 0 }
   }
}
}
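The two examples above follow a few layout rules: a bare keyword for a filter without parameters, a one-line { KEY VALUE } block for a single parameter, one parameter per line otherwise, and UNKNOWN_FILTER with its numeric id and raw parameter list. A minimal Python sketch of those rules follows; the function name format_filters and the (kind, name, params) tuple layout are hypothetical and not taken from h5dump.

```python
def format_filters(filters, indent="   "):
    """Render a proposed FILTERS block as a list of output lines.

    Each filter is a (kind, name, params) tuple (a hypothetical layout):
      - kind is PREPROCESSING, COMPRESSION, CHECKSUM or UNKNOWN_FILTER
      - for known filters, params is a dict of keyword -> value
      - for UNKNOWN_FILTER, name is the numeric filter id and params is
        the list of raw parameter values
    """
    inner = indent * 2
    lines = [f"{indent}FILTERS {{"]
    for kind, name, params in filters:
        if kind == "UNKNOWN_FILTER":
            # User-defined filter: only its id and raw parameters are known.
            values = " ".join(str(p) for p in params)
            lines.append(f"{inner}UNKNOWN_FILTER {name} PARAMS {{ {values} }}")
        elif not params:
            # No parameters, e.g. PREPROCESSING SHUFFLE.
            lines.append(f"{inner}{kind} {name}")
        elif len(params) == 1:
            # A single parameter stays on one line, e.g. DEFLATE { LEVEL 5 }.
            ((key, value),) = params.items()
            lines.append(f"{inner}{kind} {name} {{ {key} {value} }}")
        else:
            # Several parameters: one per line, as in the SZIP example.
            lines.append(f"{inner}{kind} {name} {{")
            for key, value in params.items():
                lines.append(f"{inner}{indent}{key} {value}")
            lines.append(f"{inner}}}")
    lines.append(f"{indent}}}")
    return lines


if __name__ == "__main__":
    print("\n".join(format_filters([
        ("PREPROCESSING", "SHUFFLE", {}),
        ("COMPRESSION", "DEFLATE", {"LEVEL": 5}),
        ("CHECKSUM", "FLETCHER32", {}),
    ])))
```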

Storage layout

What is being added

The word STORAGE_LAYOUT, followed by the name of the layout and any extra layout parameters.

Where it goes

In the <dataset_info> part of the DDL.

Is this always printed, or is it an option?

Option

Command line: h5dump -H -p -d contiguous ../testfiles/h5dumptst/tfilters.h5

Example output for STORAGE_LAYOUT CONTIGUOUS

HDF5 "../testfiles/h5dumptst/tfilters.h5" {
DATASET "contiguous" {
COMMENT "This is a dataset with contiguous storage"
   DATATYPE  H5T_STD_I32LE
   DATASPACE  SIMPLE { ( 20, 10 ) / ( 20, 10 ) }
   STORAGE_LAYOUT CONTIGUOUS {
      SIZE 800 OFFSET 0
   }
   FILLVALUE {
      FILL_TIME IFSET
      ALLOC_TIME LATE
      VALUE  { 0 }
   }
}
}

Command line: h5dump -H -p -d compact ../testfiles/h5dumptst/tfilters.h5

Example output for STORAGE_LAYOUT COMPACT

HDF5 "../testfiles/h5dumptst/tfilters.h5" {
DATASET "compact" {
COMMENT "This is a dataset with compact storage"
   DATATYPE  H5T_STD_I32LE
   DATASPACE  SIMPLE { ( 20, 10 ) / ( 20, 10 ) }
   STORAGE_LAYOUT COMPACT {
      SIZE 800
   }
   FILLVALUE {
      FILL_TIME IFSET
      ALLOC_TIME EARLY
      VALUE  { 0 }
   }
}
}

Command line: h5dump -H -p -d deflate ../testfiles/h5dumptst/tfilters.h5

Example output for STORAGE_LAYOUT CHUNKED

HDF5 "../testfiles/h5dumptst/tfilters.h5" {
DATASET "deflate" {
   DATATYPE  H5T_STD_I32LE
   DATASPACE  SIMPLE { ( 20, 10 ) / ( 20, 10 ) }
   STORAGE_LAYOUT CHUNKED {
      SIZE 385 ( 10, 5 )
   }
   FILTERS {
      COMPRESSION DEFLATE { LEVEL 9 }
   }
   FILLVALUE {
      FILL_TIME IFSET
      ALLOC_TIME INCR
      VALUE  { 0 }
   }
}
}

Command line: h5dump -H -p -d external ../testfiles/h5dumptst/tfilters.h5

Example output for STORAGE_LAYOUT CONTIGUOUS when the data is stored in external files

HDF5 "../testfiles/h5dumptst/tfilters.h5" {
DATASET "external" {
   DATATYPE  H5T_STD_I32LE
   DATASPACE  SIMPLE { ( 100 ) / ( 100 ) }
   STORAGE_LAYOUT CONTIGUOUS EXTERNAL {
      FILENAME ext1.bin SIZE 200 OFFSET 0
      FILENAME ext2.bin SIZE 200 OFFSET 0
   }
   FILLVALUE {
      FILL_TIME IFSET
      ALLOC_TIME LATE
      VALUE  { 0 }
   }
}
}
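The four STORAGE_LAYOUT variants above differ only in the header word and the body lines. A minimal Python sketch of that selection is shown below; format_layout and its keyword arguments are hypothetical. For the uncompressed contiguous and compact examples, SIZE 800 is consistent with 20 x 10 elements of the 4-byte H5T_STD_I32LE type (20 x 10 x 4 = 800 bytes); the sketch simply prints whatever size it is given.

```python
def format_layout(layout, size=None, offset=None, chunk_dims=None,
                  external=None, indent="   "):
    """Render a proposed STORAGE_LAYOUT block as a list of output lines.

    layout is "CONTIGUOUS", "COMPACT" or "CHUNKED". For a contiguous dataset
    stored in external files, pass external as (filename, size, offset) tuples.
    """
    inner = indent * 2
    if layout == "CHUNKED":
        # Chunk dimensions are printed after the size, e.g. ( 10, 5 ).
        dims = ", ".join(str(d) for d in chunk_dims)
        header, body = "CHUNKED", [f"{inner}SIZE {size} ( {dims} )"]
    elif layout == "COMPACT":
        header, body = "COMPACT", [f"{inner}SIZE {size}"]
    elif layout == "CONTIGUOUS" and external:
        # One line per external file segment.
        header = "CONTIGUOUS EXTERNAL"
        body = [f"{inner}FILENAME {name} SIZE {sz} OFFSET {off}"
                for name, sz, off in external]
    elif layout == "CONTIGUOUS":
        header, body = "CONTIGUOUS", [f"{inner}SIZE {size} OFFSET {offset}"]
    else:
        raise ValueError(f"unknown storage layout: {layout!r}")
    return [f"{indent}STORAGE_LAYOUT {header} {{", *body, f"{indent}}}"]
```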

Fill value

What is being added

The word FILLVALUE, followed by three items:

1) The word FILL_TIME, followed by the name of the fill-value write time (when the fill value is written to the dataset)
2) The word ALLOC_TIME, followed by the name of the storage allocation time (when space for the dataset's data is allocated)
3) The fill value itself, printed in the same format as the dataset's data

Where it goes

In the <dataset_info> part of the DDL.

Is this always printed, or is it an option?

Option

Command line: h5dump -H -p -d "fill early" ../testfiles/h5dumptst/tfilters.h5

Example output for FILLVALUE

HDF5 "../testfiles/h5dumptst/tfilters.h5" {
DATASET "fill early" {
   DATATYPE  H5T_STD_I32LE
   DATASPACE  SIMPLE { ( 20, 10 ) / ( 20, 10 ) }
   STORAGE_LAYOUT CHUNKED {
      SIZE 800 ( 10, 5 )
   }
   FILLVALUE {
      FILL_TIME ALLOC
      ALLOC_TIME EARLY
      VALUE  { -99 }
   }
}
}
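The FILL_TIME and ALLOC_TIME keywords in the examples read like the HDF5 C enumeration names with their prefixes removed (for example, H5D_FILL_TIME_ALLOC becomes ALLOC and H5D_ALLOC_TIME_EARLY becomes EARLY). Assuming that mapping, a minimal Python sketch of the FILLVALUE block follows; the helper names are hypothetical, and the fill value is passed in already formatted as the DATA section would print it.

```python
FILL_TIME_PREFIX = "H5D_FILL_TIME_"
ALLOC_TIME_PREFIX = "H5D_ALLOC_TIME_"


def short_name(constant, prefix):
    """Strip an enum prefix, e.g. "H5D_FILL_TIME_IFSET" -> "IFSET"."""
    if not constant.startswith(prefix):
        raise ValueError(f"{constant!r} does not start with {prefix!r}")
    return constant[len(prefix):]


def format_fillvalue(fill_time, alloc_time, value, indent="   "):
    """Render a proposed FILLVALUE block; value is already formatted text."""
    inner = indent * 2
    return [
        f"{indent}FILLVALUE {{",
        f"{inner}FILL_TIME {short_name(fill_time, FILL_TIME_PREFIX)}",
        f"{inner}ALLOC_TIME {short_name(alloc_time, ALLOC_TIME_PREFIX)}",
        # Two spaces after VALUE, matching the examples in this document.
        f"{inner}VALUE  {{ {value} }}",
        f"{indent}}}",
    ]
```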

Comments

What is being added

The word COMMENT, followed by the object's comment as a quoted string.

Where it goes

At the beginning of the <dataset_info> part of the DDL.

Is this always printed, or is it an option?

Option

Example output for COMMENT

HDF5 "../testfiles/h5dumptst/tfilters.h5" {
DATASET "contiguous" {
COMMENT "This is a dataset with contiguous storage"
   DATATYPE  H5T_STD_I32LE
   DATASPACE  SIMPLE { ( 20, 10 ) / ( 20, 10 ) }
}
}
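The proposal does not say how a double quote inside a comment is printed. One option, sketched below in Python under that assumption, is to escape backslashes and double quotes so the comment remains a single quoted DDL string, and to print nothing for an object that has no comment. format_comment is a hypothetical helper.

```python
def format_comment(comment):
    """Return the proposed COMMENT line, or None if there is no comment."""
    if not comment:
        return None  # objects without a comment get no COMMENT line
    # Assumption: escape backslashes first, then double quotes, so the
    # comment stays one well-formed quoted string.
    escaped = comment.replace("\\", "\\\\").replace('"', '\\"')
    return f'COMMENT "{escaped}"'
```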

File superblock additions

What is being added

Boot block, file driver, and B-tree information are added to SUPER_BLOCK. Note: in the implementation, these items will be printed in the order in which the information is stored in the file format; the order below might not reflect this.

1) The word SUPERBLOCK_VERSION, followed by the superblock version number
2) The word FREELIST_VERSION, followed by the free-list version number
3) The word SYMBOLTABLE_VERSION, followed by the symbol table version number
4) The word OBJECTHEADER_VERSION, followed by the object header version number
5) The word USERBLOCK_SIZE, followed by the user block size
6) The word OFFSET_SIZE, followed by the offset size used in the HDF5 file
7) The word LENGTH_SIZE, followed by the length size used in the HDF5 file
8) The word BTREE_RANK, followed by the symbol table B-tree 1/2 rank
9) The word BTREE_LEAF, followed by the symbol table leaf node 1/2 size
10) The word FILE_DRIVER, followed by the name identifier of the file driver
11) The word ISTORE_K, followed by the 1/2 rank of the B-trees used to index chunked datasets

See H5Pget_version(), H5Pget_userblock(), H5Pget_sizes(), H5Pget_sym_k(), H5Pget_driver(), and H5Pget_istore_k().

Where it goes

In the optional <file_super_block> part of the DDL.

Is this always printed, or is it an option?

Option

Command line: h5dump -H -B -d contiguous ../testfiles/h5dumptst/tfilters.h5

Example output

HDF5 "../testfiles/h5dumptst/tfilters.h5" {
SUPER_BLOCK {
   SUPERBLOCK_VERSION 0
   FREELIST_VERSION 0
   SYMBOLTABLE_VERSION 0
   OBJECTHEADER_VERSION 0
   USERBLOCK_SIZE 0
   OFFSET_SIZE 8
   LENGTH_SIZE 8
   BTREE_RANK 16
   BTREE_LEAF 4
   FILE_DRIVER H5FD_SEC2
   ISTORE_K 32
}
DATASET "contiguous" {
COMMENT "This is a dataset with contiguous storage"
   DATATYPE  H5T_STD_I32LE
   DATASPACE  SIMPLE { ( 20, 10 ) / ( 20, 10 ) }
}
}
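Once the values have been queried, printing SUPER_BLOCK is a matter of emitting a fixed list of keywords in a fixed order. The Python sketch below takes an already-filled dict (format_superblock and the dict input are hypothetical) and uses the order listed in this proposal; the comments name the HDF5 C call from the "See" list that would supply each value. Note that H5Pget_driver() returns a driver identifier, so a real implementation would still need to map it to a name such as H5FD_SEC2.

```python
# Keywords in the order listed in this proposal. The comments name the
# HDF5 C call that would supply each value; the final order may instead
# follow the on-disk file format, as the note in this section says.
SUPERBLOCK_KEYS = [
    "SUPERBLOCK_VERSION",    # H5Pget_version()
    "FREELIST_VERSION",      # H5Pget_version()
    "SYMBOLTABLE_VERSION",   # H5Pget_version()
    "OBJECTHEADER_VERSION",  # H5Pget_version()
    "USERBLOCK_SIZE",        # H5Pget_userblock()
    "OFFSET_SIZE",           # H5Pget_sizes()
    "LENGTH_SIZE",           # H5Pget_sizes()
    "BTREE_RANK",            # H5Pget_sym_k()
    "BTREE_LEAF",            # H5Pget_sym_k()
    "FILE_DRIVER",           # H5Pget_driver(), mapped to a name
    "ISTORE_K",              # H5Pget_istore_k()
]


def format_superblock(info, indent="   "):
    """Render a SUPER_BLOCK block from a dict of already-queried values."""
    missing = [key for key in SUPERBLOCK_KEYS if key not in info]
    if missing:
        raise KeyError(f"missing superblock fields: {missing}")
    lines = ["SUPER_BLOCK {"]
    lines += [f"{indent}{key} {info[key]}" for key in SUPERBLOCK_KEYS]
    lines.append("}")
    return lines
```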

File contents

What is being added

The word FILE_CONTENTS, followed by a list of all the objects in the file. Each entry has the format <object type> <path>, where <object type> can be group, dataset, link, or datatype.

Where it goes

After the file name.

Is this always printed, or is it an option?

Option

Command line: h5dump -n ../testfiles/h5dumptst/tfilters.h5

Example output for FILE_CONTENTS

HDF5 "../testfiles/h5dumptst/tfilters.h5" {
FILE_CONTENTS {
 dataset    /all
 dataset    /bitfield
 dataset    /char
 dataset    /compact
 dataset    /contiguous
 dataset    /deflate
 dataset    /enum
 dataset    /external
 dataset    /fill early
 dataset    /fill ifset
 dataset    /fill never
 dataset    /fletcher32
 datatype   /my type
 dataset    /myfilter
 datatype   /myvlen
 dataset    /shuffle
 dataset    /string
 dataset    /szip
 dataset    /vlen
 }
}
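The example lists objects in alphabetical order of path (/my type sorts before /myfilter because a space sorts before a letter). The Python sketch below reproduces that order by walking a nested-dict model of the file and visiting names in sorted order at each level. The dict model, walk, and format_contents are hypothetical; a real tool would presumably obtain the same information by iterating over the groups in the file.

```python
def walk(tree, prefix=""):
    """Yield (object type, path) pairs for a nested-dict model of a file.

    Groups are dicts; any other value is the object's type name
    ("dataset", "datatype" or "link"). Names are visited in sorted order.
    """
    for name in sorted(tree):
        path = f"{prefix}/{name}"
        node = tree[name]
        if isinstance(node, dict):
            yield "group", path
            yield from walk(node, path)  # descend into the group
        else:
            yield node, path


def format_contents(tree):
    """Render the proposed FILE_CONTENTS block as a list of output lines."""
    lines = ["FILE_CONTENTS {"]
    # Pad the type to 11 columns so the paths line up, as in the example.
    lines += [f" {otype:<11}{path}" for otype, path in walk(tree)]
    lines.append(" }")
    return lines
```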


Array indices 

What is being added

At the beginning of each output row, the index of the first element on that row, in the format (row,col) for a 2D array.

Where it goes

In the DATA section.

Is this always printed, or is it an option?

On by default

Example output for the array indices

HDF5 "../testfiles/h5dumptst/tfilters.h5" {
DATASET "compact" {
   DATATYPE  H5T_STD_I32LE
   DATASPACE  SIMPLE { ( 20, 10 ) / ( 20, 10 ) }
   DATA {
        (0,0) 0, 1, 2, 3, 4, 5, 6, 7, 8, 9,
        (1,0) 10, 11, 12, 13, 14, 15, 16, 17, 18, 19,
        (2,0) 20, 21, 22, 23, 24, 25, 26, 27, 28, 29,
        (3,0) 30, 31, 32, 33, 34, 35, 36, 37, 38, 39,
        (4,0) 40, 41, 42, 43, 44, 45, 46, 47, 48, 49,
        (5,0) 50, 51, 52, 53, 54, 55, 56, 57, 58, 59,
        (6,0) 60, 61, 62, 63, 64, 65, 66, 67, 68, 69,
        (7,0) 70, 71, 72, 73, 74, 75, 76, 77, 78, 79,
        (8,0) 80, 81, 82, 83, 84, 85, 86, 87, 88, 89,
        (9,0) 90, 91, 92, 93, 94, 95, 96, 97, 98, 99,
        (10,0) 100, 101, 102, 103, 104, 105, 106, 107, 108, 109,
        (11,0) 110, 111, 112, 113, 114, 115, 116, 117, 118, 119,
        (12,0) 120, 121, 122, 123, 124, 125, 126, 127, 128, 129,
        (13,0) 130, 131, 132, 133, 134, 135, 136, 137, 138, 139,
        (14,0) 140, 141, 142, 143, 144, 145, 146, 147, 148, 149,
        (15,0) 150, 151, 152, 153, 154, 155, 156, 157, 158, 159,
        (16,0) 160, 161, 162, 163, 164, 165, 166, 167, 168, 169,
        (17,0) 170, 171, 172, 173, 174, 175, 176, 177, 178, 179,
        (18,0) 180, 181, 182, 183, 184, 185, 186, 187, 188, 189,
        (19,0) 190, 191, 192, 193, 194, 195, 196, 197, 198, 199
   }
}
}
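For the 20 x 10 example above, each output row starts with (r,0) and rows are separated by commas. A minimal Python sketch of that layout follows; format_data_rows is a hypothetical name. It assumes each row fits on one output line, so every prefix has column 0; a version that wraps long rows would instead have to print the index of the first element on each continuation line.

```python
def format_data_rows(matrix, indent=" " * 8):
    """Print a 2D array one row per line, each prefixed by (row,0).

    Assumes every row fits on a single output line.
    """
    lines = []
    last = len(matrix) - 1
    for r, row in enumerate(matrix):
        values = ", ".join(str(v) for v in row)
        # Rows are comma-separated, so all but the last row end with ",".
        sep = "," if r < last else ""
        lines.append(f"{indent}({r},0) {values}{sep}")
    return lines
```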

Print char arrays as a string

What is being added

A switch to display character-array datasets as strings; that is, instead of printing "C, H, A, R, , A, R, R, A, Y", the output would look like "CHAR ARRAY".

Where it goes

In the DATA section.

Is this always printed, or is it an option?

Always

Command line: h5dump -d string ../testfiles/h5dumptst/tfilters.h5

Example output

HDF5 "../testfiles/h5dumptst/tfilters.h5" {
DATASET "string" {
   DATATYPE  H5T_STRING {
         STRSIZE 12;
         STRPAD H5T_STR_NULLTERM;
         CSET H5T_CSET_ASCII;
         CTYPE H5T_C_S1;
      }
   DATASPACE  SIMPLE { ( 1 ) / ( 1 ) }
   DATA {
        (0) "string\n new"
   }
}
}
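The difference between the two display modes for a 1D character array can be shown in a few lines of Python. The helper names below are hypothetical, and stopping at the first NUL character is an assumption about null-padded data, not something this proposal specifies.

```python
def char_array_elementwise(chars):
    """Current behaviour: every character is printed as a separate element."""
    return ", ".join(chars)


def char_array_as_string(chars):
    """Proposed behaviour: join the characters into one quoted string.

    Assumption: the array may be padded with NUL characters, so output
    stops at the first NUL.
    """
    text = "".join(chars).split("\0", 1)[0]
    return f'"{text}"'
```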

Add ability to interpret CR/LF when viewing attributes

What is being added

Print an actual newline or tab when the string contains \n or \t.

Where it goes

In the DATA section.

Is this always printed, or is it an option?

TBD

Example output with a newline in a string

HDF5 "example.h5" {
GROUP "/" {
   DATASET "dset1" {
      DATATYPE H5T_STD_I8LE
      DATASPACE SIMPLE { ( 10, 10 ) / ( 10, 10 ) }
      DATA {
         "my string
          with a newline"
      }
   }
}
}

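The Python sketch below contrasts the current escaped form (as in "string\n new" in the previous example) with the proposed form, where each newline starts a new output line and continuation lines are indented one extra column to line up with the text after the opening quote. render_string and its interpret flag are hypothetical; whether this is the default is still TBD in the proposal.

```python
def render_string(value, indent, interpret=True):
    """Return the output lines for one quoted string in the DATA section."""
    if not interpret:
        # Current behaviour: control characters are printed as escapes.
        escaped = (value.replace("\\", "\\\\")
                        .replace("\n", "\\n")
                        .replace("\t", "\\t"))
        return [f'{indent}"{escaped}"']
    # Proposed behaviour: each newline starts a new output line, and
    # continuation lines are indented one extra column so that they line
    # up with the text after the opening quote. Tabs need no special case:
    # they are written out as real tab characters instead of an escape.
    parts = value.split("\n")
    lines = [f'{indent}"{parts[0]}']
    lines += [f"{indent} {part}" for part in parts[1:]]
    lines[-1] += '"'
    return lines
```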

Display object references as paths

What is being added

Display each object reference together with the path of the object it points to.

Where it goes

In the DATA section.

Is this always printed, or is it an option?

Always

Current output

h5dump prints the type of the referenced object (DATASET) and its object ID. In this example there are two references (a 1D array with two elements).

HDF5 "../testfiles/h5dumptst/tfilters.h5" {
DATASET "reference" {
   DATATYPE  H5T_REFERENCE
   DATASPACE  SIMPLE { ( 2 ) / ( 2 ) }
   DATA {
        (0) DATASET 0:39104, DATASET 0:38832 
   }
}
}

New output

The path of each referenced object is added after the existing information.

HDF5 "../testfiles/h5dumptst/tfilters.h5" {
DATASET "reference" {
   DATATYPE  H5T_REFERENCE
   DATASPACE  SIMPLE { ( 2 ) / ( 2 ) }
   DATA {
        (0) DATASET 0:39104 /g1/dset1, DATASET 0:38832 /g1/dset2
   }
}
}
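One way to produce the new output is to build a table from object ID to path in a single traversal of the file before the data is printed, then look up each reference in it. The Python sketch below assumes such a table already exists (id_to_path and format_references are hypothetical names) and falls back to the current output when the path of a referenced object is unknown.

```python
def format_references(refs, id_to_path):
    """Format a row of object references, appending each target's path.

    refs: list of (object type, object id) tuples, e.g. ("DATASET", "0:39104").
    id_to_path: hypothetical lookup table built by traversing the file once.
    """
    items = []
    for otype, oid in refs:
        path = id_to_path.get(oid)
        # Fall back to the current output if the target's path is unknown.
        items.append(f"{otype} {oid} {path}" if path else f"{otype} {oid}")
    return ", ".join(items)
```

Building the table once avoids searching the file again for every reference. An object reachable through several hard links has several valid paths, so any such table must choose which one to display.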

Last modified: Wednesday, June 16, 2004