TOOL NAME:
     h5import

SYNTAX:
     h5import -h[elp], OR
     h5import <infile> -c[onfig] <configfile> [<infile> -c[onfig] <configfile>...] -o[utfile] <outfile>

PURPOSE:
     To convert data stored in one or more ASCII or binary files into one or
     more datasets (in accordance with the user-specified type and storage
     properties) in an existing or new HDF5 file.

DESCRIPTION:
     The primary objective of the utility is to convert floating point or
     integer data stored in ASCII text or binary form into a dataset
     according to the type and storage properties specified by the user.
     The utility can also accept ASCII text files and store the contents in
     a compact form as a one-dimensional array of strings (not implemented
     in this version).

     The input data to be written as a dataset can be provided to the
     utility in one of the following forms:
     1. ASCII text file with numeric data (floating point or integer data).
     2. Binary file with native floating point data (32-bit or 64-bit).
     3. Binary file with native integer (signed or unsigned) data
        (8-bit, 16-bit, 32-bit or 64-bit).
     4. ASCII text file containing strings (text data). (Not implemented
        in this version)

     Every input file is associated with a configuration file, also
     provided as an input to the utility. (See the section
     "CONFIGURATION FILE" for how it is to be organized.) The class, size
     and dimensions of the input data are specified in this configuration
     file. Note that floating point data in an ASCII text file may be
     organized in fixed notation (for example 323.56) or in scientific
     notation (for example 3.23E+02). A different input-class specification
     is used for each form. (Note: only the fixed-notation floating point
     form has been implemented in this version.)

     The utility extracts the input data from the input file according to
     the specified parameters and saves it into an HDF5 dataset. The user
     can specify the output type and storage properties in the
     configuration file. The user is required to specify the path of the
     dataset. If the groups in the path leading to the dataset do not
     exist, they will be created by the utility. If no group is specified,
     the dataset will be created under the root group.

     In addition to the name, the user is also required to provide the
     class and size of the output data to be written to the dataset, and
     may optionally specify the output-architecture and the
     output-byte-order. If output-architecture is not specified, the
     default is NATIVE. Output-byte-orders are fixed for some architectures
     and may be specified only if output-architecture is IEEE, UNIX or STD.

     Also, layout and other storage properties such as compression,
     external storage and extendible datasets may be optionally specified.
     The layout and storage properties denote how the raw data is to be
     organized on disk. If these options are not specified, the default is
     contiguous layout and storage.

     The dataset can be organized in any of the following ways:
     1. Contiguous.
     2. Chunked.
     3. External storage file (has to be contiguous).
     4. Extendible dataset (has to be chunked).
     5. Compressed (has to be chunked).
     6. Compressed & extendible (has to be chunked).

     If the user wants to store the raw data in a non-HDF5 file, the
     external storage file option is to be used and the name of the file is
     to be specified. If the user wants the dimensions of the dataset to be
     unlimited, the extendible dataset option can be chosen. The user may
     also specify the type of compression and the level to which the
     dataset must be compressed by setting the compressed option.

SYNOPSIS:
     h5import -h[elp], OR
     h5import <infile> -c[onfig] <configfile> [<infile> -c[onfig] <configfile>...] -o[utfile] <outfile>

     -h[elp]:
          Prints this summary of usage, and exits.

     <infile>:
          Name of the input file(s), containing a single n-dimensional
          floating point or integer array in either ASCII text, native
          floating point (32-bit or 64-bit) or native integer (8-bit,
          16-bit, 32-bit or 64-bit) form. Data is to be specified in the
          order of fastest changing dimensions first.

     -c[onfig] <configfile>:
          Every input file should be associated with a configuration file,
          and this is done by the -c option. <configfile> is the name of
          the configuration file. (See the section "CONFIGURATION FILE".)

     -o[utfile] <outfile>:
          Name of the HDF5 output file. Data from one or more input files
          are stored as one or more datasets in <outfile>. The output file
          may be an existing file, or it may be new, in which case it will
          be created.
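     As an illustration (the file names below are hypothetical, not
     prescribed by the tool), importing two input files, each with its own
     configuration file, into a single HDF5 file could look like:

          h5import temperature.txt -c temperature.conf pressure.bin -c pressure.conf -o results.h5

     The "fastest changing dimensions first" ordering means that, for an
     input declared with RANK 2 and DIMENSION-SIZES 2 3, the six values in
     the input file are read as elements [0][0], [0][1], [0][2], [1][0],
     [1][1], [1][2], in that order (assuming the usual C-style row-major
     interpretation).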
CONFIGURATION FILE:
     The configuration file is an ASCII text file and must be organized as
     "CONFIG-KEYWORD VALUE" pairs, one pair per line. The configuration
     file may have the following keywords, each followed by an acceptable
     value.

     Required KEYWORDS:
          PATH
          INPUT-CLASS
          INPUT-SIZE
          RANK
          DIMENSION-SIZES
          OUTPUT-CLASS
          OUTPUT-SIZE

     Optional KEYWORDS:
          OUTPUT-ARCHITECTURE
          OUTPUT-BYTE-ORDER
          CHUNKED-DIMENSION-SIZES
          COMPRESSION-TYPE
          COMPRESSION-PARAM
          EXTERNAL-STORAGE
          MAXIMUM-DIMENSIONS

     Values for keywords:

     PATH:
          Strings separated by spaces to represent the path of the dataset.
          If the groups in the path do not exist, they will be created.
          For example,
               PATH grp1/grp2/dataset1
               PATH: keyword
               grp1: group under the root. If non-existent, will be
                     created.
               grp2: group under grp1. If non-existent, will be created
                     under grp1.
               dataset1: the name of the dataset to be created.

     INPUT-CLASS:
          String denoting the type of input data ("TEXTIN", "TEXTFP",
          "TEXTFPE", "FP", "IN", "STR", "TEXTUIN", "UIN").
          "TEXTIN" denotes an ASCII text file with signed integer data in
          ASCII form, "TEXTUIN" denotes an ASCII text file with unsigned
          integer data in ASCII form, "TEXTFP" denotes an ASCII text file
          containing floating point data in fixed notation (325.34),
          "TEXTFPE" denotes an ASCII text file containing floating point
          data in scientific notation (3.2534E+02) (not implemented in this
          version), "FP" denotes a floating point binary file, "IN" denotes
          a signed integer binary file, "UIN" denotes an unsigned integer
          binary file, and "STR" denotes an ASCII text file whose contents
          should be stored as a 1-D array of strings (not implemented in
          this version). If INPUT-CLASS is "STR", then RANK,
          DIMENSION-SIZES, OUTPUT-CLASS, OUTPUT-SIZE, OUTPUT-ARCHITECTURE
          and OUTPUT-BYTE-ORDER will be ignored.

     INPUT-SIZE:
          Integer denoting the size of the input data in bits (8, 16, 32,
          64). For floating point data, INPUT-SIZE can be 32 or 64. For
          integers (signed and unsigned), INPUT-SIZE can be 8, 16, 32
          or 64.

     RANK:
          Integer denoting the number of dimensions.

     DIMENSION-SIZES:
          Integers separated by spaces to denote the dimension sizes for
          the number of dimensions determined by RANK.

     OUTPUT-CLASS:
          String denoting the data type of the dataset to be written
          ("IN", "FP", "UIN").

     OUTPUT-SIZE:
          Integer denoting the size of the data in the output dataset to be
          written. If OUTPUT-CLASS is "FP", OUTPUT-SIZE can be 32 or 64.
          If OUTPUT-CLASS is "IN" or "UIN", OUTPUT-SIZE can be 8, 16, 32
          or 64.

     OUTPUT-ARCHITECTURE:
          String denoting the type of output architecture. Can accept the
          following values:
               STD
               IEEE
               INTEL
               CRAY
               MIPS
               ALPHA
               NATIVE (default)
               UNIX
          Refer to section 6, Predefined Atomic Types, of the H5T (datatype
          interface) chapter in the HDF5 User Guide to learn more about
          these architectures.
          (http://hdf.ncsa.uiuc.edu/HDF5/doc/Datatypes.html)
          (Only STD, IEEE and NATIVE are implemented in this version. The
          extensibility for implementing other architectures has been
          provided for.)

     OUTPUT-BYTE-ORDER:
          String denoting the output byte order. Ignored if
          OUTPUT-ARCHITECTURE is not specified or if it is not IEEE, UNIX
          or STD. Can accept the following values:
               BE (default)
               LE

     CHUNKED-DIMENSION-SIZES:
          Integers separated by spaces to denote the dimension sizes of the
          chunk for the number of dimensions determined by RANK. Required
          field to denote that the dataset will be stored with chunked
          storage. If this field is absent, the dataset will be stored with
          contiguous storage.

     COMPRESSION-TYPE:
          String denoting the type of compression to be used with the
          chunked storage. Requires CHUNKED-DIMENSION-SIZES to be
          specified. The only currently supported compression method is
          GZIP. Will accept the following value:
               GZIP

     COMPRESSION-PARAM:
          Integer used to denote the compression level. This option must
          always be specified when COMPRESSION-TYPE is specified. The
          values are applicable only to GZIP compression.
               Value 1-9: the level of compression.
               1 results in the fastest compression, while 9 results in the
               best compression ratio. The default level of compression
               is 6.

     EXTERNAL-STORAGE:
          String to denote the name of the non-HDF5 file in which to store
          the data. Cannot be used if CHUNKED-DIMENSION-SIZES,
          COMPRESSION-TYPE or MAXIMUM-DIMENSIONS is specified.
               Value: the name of the external file, given as a string.

     MAXIMUM-DIMENSIONS:
          Integers separated by spaces to denote the maximum dimension
          sizes of all the dimensions determined by RANK. Requires
          CHUNKED-DIMENSION-SIZES to be specified. A value of -1 for any
          dimension implies an UNLIMITED dimension size for that particular
          dimension.

EXAMPLES:
     1. A configuration file may look like:

          PATH work h5 pkamat First-set
          INPUT-CLASS TEXTFP
          RANK 3
          DIMENSION-SIZES 5 2 4
          OUTPUT-CLASS FP
          OUTPUT-SIZE 64
          OUTPUT-ARCHITECTURE IEEE
          OUTPUT-BYTE-ORDER LE
          CHUNKED-DIMENSION-SIZES 2 2 2
          MAXIMUM-DIMENSIONS 8 8 -1

     The above configuration will accept a floating point array (5 x 2 x 4)
     in an ASCII file with the rank and dimension sizes specified, and will
     save it in a chunked dataset (of chunk pattern 2 x 2 x 2) of 64-bit
     floating point, in IEEE architecture and little-endian byte order. The
     maximum dimension sizes of the dataset will be 8 x 8 x unlimited (the
     third dimension is unlimited). The dataset will be stored at
     "/work/h5/pkamat/First-set".

     2. Another configuration could be:

          PATH Second-set
          INPUT-CLASS IN
          RANK 5
          DIMENSION-SIZES 6 3 5 2 4
          OUTPUT-CLASS IN
          OUTPUT-SIZE 32
          CHUNKED-DIMENSION-SIZES 2 2 2 2 2
          COMPRESSION-TYPE GZIP
          COMPRESSION-PARAM 7

     The above configuration will accept an integer array (6 x 3 x 5 x 2 x 4)
     in a binary file with the rank and dimension sizes specified, and will
     save it in a chunked dataset (of chunk pattern 2 x 2 x 2 x 2 x 2) of
     32-bit integers in native format (as OUTPUT-ARCHITECTURE is not
     specified). The dataset will be compressed using GZIP with a
     compression level of 7, and will be stored at "/Second-set".
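     As a further illustration (the file names here are hypothetical, not
     prescribed by the tool): assuming the first configuration above is
     saved in a file named first.conf and the ASCII data resides in
     first-data.txt, the import could be run as:

          h5import first-data.txt -c first.conf -o pkamat.h5

     Also as a sketch only (this configuration is not taken from the
     examples above, and the names and sizes are assumptions), a minimal
     configuration that uses EXTERNAL-STORAGE to keep the raw data in a
     non-HDF5 file might look like:

          PATH ext-set
          INPUT-CLASS UIN
          INPUT-SIZE 16
          RANK 1
          DIMENSION-SIZES 100
          OUTPUT-CLASS UIN
          OUTPUT-SIZE 16
          EXTERNAL-STORAGE ext-raw.data

     Note that CHUNKED-DIMENSION-SIZES, COMPRESSION-TYPE and
     MAXIMUM-DIMENSIONS must be omitted in this case, since external
     storage requires contiguous layout.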