TOOL NAME:
     h5import

SYNTAX:
     h5import -h[elp], OR
     h5import <infile> -c[onfig] <configfile> [<infile> -c[onfig] <configfile>...] -o[utfile] <outfile>

PURPOSE:
     To convert data stored in one or more ASCII or binary files into one or
     more datasets (in accordance with the user-specified type and storage
     properties) in an existing or new HDF5 file.

DESCRIPTION:
     The primary objective of the utility is to convert floating point or
     integer data stored in ASCII text or binary form into a dataset
     according to the type and storage properties specified by the user.
     The utility can also accept ASCII text files and store the contents in
     a compact form as a one-dimensional array of strings (not implemented
     in this version).

     The input data to be written as a dataset can be provided to the
     utility in one of the following forms:
     1. ASCII text file with numeric data (floating point or integer data).
     2. Binary file with native floating point data (32-bit or 64-bit).
     3. Binary file with native integer (signed or unsigned) data
        (8-bit, 16-bit, 32-bit or 64-bit).
     4. ASCII text file containing strings (text data). (Not implemented
        in this version)

     Every input file is associated with a configuration file, also
     provided as an input to the utility. (See the section
     "CONFIGURATION FILE" for how it is to be organized.) The class, size
     and dimensions of the input data are specified in this configuration
     file. Note that floating point data in an ASCII text file may be
     organized in fixed notation (for example 323.56) or in scientific
     notation (for example 3.23E+02). A different input-class specification
     is used for each form. (Note: only the fixed-notation floating point
     form has been implemented in this version.)

     The utility extracts the input data from the input file according to
     the specified parameters and saves it into an HDF5 dataset. The user
     can specify the output type and storage properties in the
     configuration file. The user is required to specify the path of the
     dataset. If the groups in the path leading to the dataset do not
     exist, they will be created by the utility. If no group is specified,
     the dataset will be created under the root group.

     In addition to the name, the user is also required to provide the
     class and size of the output data to be written to the dataset, and
     may optionally specify the output-architecture and the
     output-byte-order. If output-architecture is not specified, the
     default is NATIVE. Output-byte-orders are fixed for some architectures
     and may be specified only if output-architecture is IEEE, UNIX or STD.

     Also, layout and other storage properties such as compression,
     external storage and extendible datasets may be optionally specified.
     The layout and storage properties denote how the raw data is to be
     organized on disk. If these options are not specified, the default is
     contiguous layout and storage.

     The dataset can be organized in any of the following ways:
     1. Contiguous.
     2. Chunked.
     3. External storage file (has to be contiguous).
     4. Extendible dataset (has to be chunked).
     5. Compressed (has to be chunked).
     6. Compressed & extendible (has to be chunked).

     If the user wants to store the raw data in a non-HDF5 file, the
     external storage file option is to be used and the name of the file is
     to be specified. If the user wants the dimensions of the dataset to be
     unlimited, the extendible dataset option can be chosen. The user may
     also specify the type of compression and the level to which the
     dataset must be compressed by setting the compressed option.

SYNOPSIS:
     h5import -h[elp], OR
     h5import <infile> -c[onfig] <configfile> [<infile> -c[onfig] <configfile>...] -o[utfile] <outfile>

     -h[elp]:
          Prints this summary of usage, and exits.

     <infile>:
          Name of the input file(s), containing a single n-dimensional
          floating point or integer array in either ASCII text, native
          floating point (32-bit or 64-bit) or native integer (8-bit,
          16-bit, 32-bit or 64-bit) form. Data is to be specified in the
          order of fastest changing dimensions first.

     -c[onfig] <configfile>:
          Every input file should be associated with a configuration file,
          and this is done by the -c option. <configfile> is the name of
          the configuration file. (See the section "CONFIGURATION FILE".)

     -o[utfile] <outfile>:
          Name of the HDF5 output file. Data from one or more input files
          are stored as one or more datasets in <outfile>. The output file
          may be an existing file, or it may be new, in which case it will
          be created.
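     As an illustration (the file names below are hypothetical, not
     prescribed by the tool), importing two input files, each with its own
     configuration file, into a single HDF5 file could look like:

          h5import temperature.txt -c temperature.conf pressure.bin -c pressure.conf -o results.h5

     The "fastest changing dimensions first" ordering means that, for an
     input declared with RANK 2 and DIMENSION-SIZES 2 3, the six values in
     the input file are read as elements [0][0], [0][1], [0][2], [1][0],
     [1][1], [1][2], in that order (assuming the usual C-style row-major
     interpretation).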
CONFIGURATION FILE:
     The configuration file is an ASCII text file and must be organized as
     "CONFIG-KEYWORD VALUE" pairs, one pair per line. The configuration
     file may have the following keywords, each followed by an acceptable
     value.

     Required KEYWORDS:
          PATH
          INPUT-CLASS
          INPUT-SIZE
          RANK
          DIMENSION-SIZES
          OUTPUT-CLASS
          OUTPUT-SIZE

     Optional KEYWORDS:
          OUTPUT-ARCHITECTURE
          OUTPUT-BYTE-ORDER
          CHUNKED-DIMENSION-SIZES
          COMPRESSION-TYPE
          COMPRESSION-PARAM
          EXTERNAL-STORAGE
          MAXIMUM-DIMENSIONS

     Values for keywords:

     PATH:
          Strings separated by spaces to represent the path of the dataset.
          If the groups in the path do not exist, they will be created.
          For example,
               PATH grp1/grp2/dataset1
               PATH: keyword
               grp1: group under the root. If non-existent, will be
                     created.
               grp2: group under grp1. If non-existent, will be created
                     under grp1.
               dataset1: the name of the dataset to be created.

     INPUT-CLASS:
          String denoting the type of input data ("TEXTIN", "TEXTFP",
          "TEXTFPE", "FP", "IN", "STR", "TEXTUIN", "UIN").
          "TEXTIN" denotes an ASCII text file with signed integer data in
          ASCII form, "TEXTUIN" denotes an ASCII text file with unsigned
          integer data in ASCII form, "TEXTFP" denotes an ASCII text file
          containing floating point data in fixed notation (325.34),
          "TEXTFPE" denotes an ASCII text file containing floating point
          data in scientific notation (3.2534E+02) (not implemented in this
          version), "FP" denotes a floating point binary file, "IN" denotes
          a signed integer binary file, "UIN" denotes an unsigned integer
          binary file, and "STR" denotes an ASCII text file whose contents
          should be stored as a 1-D array of strings (not implemented in
          this version). If INPUT-CLASS is "STR", then RANK,
          DIMENSION-SIZES, OUTPUT-CLASS, OUTPUT-SIZE, OUTPUT-ARCHITECTURE
          and OUTPUT-BYTE-ORDER will be ignored.

     INPUT-SIZE:
          Integer denoting the size of the input data in bits (8, 16, 32,
          64). For floating point data, INPUT-SIZE can be 32 or 64. For
          integers (signed and unsigned), INPUT-SIZE can be 8, 16, 32
          or 64.

     RANK:
          Integer denoting the number of dimensions.

     DIMENSION-SIZES:
          Integers separated by spaces to denote the dimension sizes for
          the number of dimensions determined by RANK.

     OUTPUT-CLASS:
          String denoting the data type of the dataset to be written
          ("IN", "FP", "UIN").

     OUTPUT-SIZE:
          Integer denoting the size of the data in the output dataset to be
          written. If OUTPUT-CLASS is "FP", OUTPUT-SIZE can be 32 or 64.
          If OUTPUT-CLASS is "IN" or "UIN", OUTPUT-SIZE can be 8, 16, 32
          or 64.

     OUTPUT-ARCHITECTURE:
          String denoting the type of output architecture. Can accept the
          following values:
               STD
               IEEE
               INTEL
               CRAY
               MIPS
               ALPHA
               NATIVE (default)
               UNIX
          Refer to section 6, Predefined Atomic Types, of the H5T (datatype
          interface) chapter in the HDF5 User Guide to learn more about
          these architectures.
          (http://hdf.ncsa.uiuc.edu/HDF5/doc/Datatypes.html)
          (Only STD, IEEE and NATIVE are implemented in this version. The
          extensibility for implementing other architectures has been
          provided for.)

     OUTPUT-BYTE-ORDER:
          String denoting the output byte order. Ignored if
          OUTPUT-ARCHITECTURE is not specified or if it is not IEEE, UNIX
          or STD. Can accept the following values:
               BE (default)
               LE

     CHUNKED-DIMENSION-SIZES:
          Integers separated by spaces to denote the dimension sizes of the
          chunk for the number of dimensions determined by RANK. Required
          field to denote that the dataset will be stored with chunked
          storage. If this field is absent, the dataset will be stored with
          contiguous storage.

     COMPRESSION-TYPE:
          String denoting the type of compression to be used with the
          chunked storage. Requires CHUNKED-DIMENSION-SIZES to be
          specified. The only currently supported compression method is
          GZIP. Will accept the following value:
               GZIP

     COMPRESSION-PARAM:
          Integer used to denote the compression level. This option must
          always be specified when COMPRESSION-TYPE is specified. The
          values are applicable only to GZIP compression.
               Value 1-9: the level of compression.
               1 results in the fastest compression, while 9 results in the
               best compression ratio. The default level of compression
               is 6.

     EXTERNAL-STORAGE:
          String to denote the name of the non-HDF5 file in which to store
          the data. Cannot be used if CHUNKED-DIMENSION-SIZES,
          COMPRESSION-TYPE or MAXIMUM-DIMENSIONS is specified.
               Value: the name of the external file, given as a string.

     MAXIMUM-DIMENSIONS:
          Integers separated by spaces to denote the maximum dimension
          sizes of all the dimensions determined by RANK. Requires
          CHUNKED-DIMENSION-SIZES to be specified. A value of -1 for any
          dimension implies an UNLIMITED dimension size for that particular
          dimension.

EXAMPLES:
     1. A configuration file may look like:

          PATH work h5 pkamat First-set
          INPUT-CLASS TEXTFP
          RANK 3
          DIMENSION-SIZES 5 2 4
          OUTPUT-CLASS FP
          OUTPUT-SIZE 64
          OUTPUT-ARCHITECTURE IEEE
          OUTPUT-BYTE-ORDER LE
          CHUNKED-DIMENSION-SIZES 2 2 2
          MAXIMUM-DIMENSIONS 8 8 -1

     The above configuration will accept a floating point array (5 x 2 x 4)
     in an ASCII file with the rank and dimension sizes specified, and will
     save it in a chunked dataset (of chunk pattern 2 x 2 x 2) of 64-bit
     floating point, in IEEE architecture and little-endian byte order. The
     maximum dimension sizes of the dataset will be 8 x 8 x unlimited (the
     third dimension is unlimited). The dataset will be stored at
     "/work/h5/pkamat/First-set".

     2. Another configuration could be:

          PATH Second-set
          INPUT-CLASS IN
          RANK 5
          DIMENSION-SIZES 6 3 5 2 4
          OUTPUT-CLASS IN
          OUTPUT-SIZE 32
          CHUNKED-DIMENSION-SIZES 2 2 2 2 2
          COMPRESSION-TYPE GZIP
          COMPRESSION-PARAM 7

     The above configuration will accept an integer array (6 x 3 x 5 x 2 x 4)
     in a binary file with the rank and dimension sizes specified, and will
     save it in a chunked dataset (of chunk pattern 2 x 2 x 2 x 2 x 2) of
     32-bit integers in native format (as OUTPUT-ARCHITECTURE is not
     specified). The dataset will be compressed using GZIP with a
     compression level of 7, and will be stored at "/Second-set".
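     As a further illustration (the file names here are hypothetical, not
     prescribed by the tool): assuming the first configuration above is
     saved in a file named first.conf and the ASCII data resides in
     first-data.txt, the import could be run as:

          h5import first-data.txt -c first.conf -o pkamat.h5

     Also as a sketch only (this configuration is not taken from the
     examples above, and the names and sizes are assumptions), a minimal
     configuration that uses EXTERNAL-STORAGE to keep the raw data in a
     non-HDF5 file might look like:

          PATH ext-set
          INPUT-CLASS UIN
          INPUT-SIZE 16
          RANK 1
          DIMENSION-SIZES 100
          OUTPUT-CLASS UIN
          OUTPUT-SIZE 16
          EXTERNAL-STORAGE ext-raw.data

     Note that CHUNKED-DIMENSION-SIZES, COMPRESSION-TYPE and
     MAXIMUM-DIMENSIONS must be omitted in this case, since external
     storage requires contiguous layout.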