1 TOOL NAME:
        h5import
  
   2 SYNTAX:
	h5import -h[elp], OR
	h5import <infile> <options> [<infile> <options>...]
					      -o[utfile] <outfile>

   3 PURPOSE:
 	To convert data stored in one or more ASCII or binary files
	into one or more datasets (in accordance with the 
	user-specified type and storage properties) in an existing 
	or new HDF5 file.
	
   4 DESCRIPTION:
	The primary objective of the utility is to convert floating
	point or integer data stored in ASCII text or binary form 
	into a data-set according to the type and storage properties
	specified by the user. 
        
        The utility can also accept ASCII text files and store the 
        contents in a compact form as an array of one-dimensional 
	strings (Not implemented in this version).

	The input data to be written as a data-set can be provided
	to the utility in one of the follwing forms:
	1. ASCII text file with numeric data (floating point or 
	integer data). 
	2. Binary file with native floating point data (32-bit or 
	64-bit) 
	3. Binary file with native integer (signed or unsigned)
	data (8-bit or 16-bit or 32-bit or 64-bit). 
	4. ASCII text file containing strings (text data).
        (Not implemented)

	Every input file can be associated with options used to specify
	the data type and storage properties. There are two ways of 
	specifying the options: 
        1. command line arguments or 2. configuration file.

	For simple data-sets command line arguments may be used. Using
	command line arguments, only the class, size, dimensions of the 
	input data and the path of the output data-set can be specified. 
	The advanced storage features can only be specified
	using a configuration file which is also to be provided as an input 
	to the utility (See Section 5.B. "CONFIGURATION FILE" to know how it is 
	to be organised). The configuration file is the recommended way of 
	specifying options.

	The dimension sizes are the only required parameters in both ways
	of specifying options. Defaults for all other parameters exist in
	both forms. It should be noted that exactly one of the two 
	ways of specifying the options may be used for every input file, 
	and any number of input files can be specified in one command.

	The floating point data in the ASCII text file may be
	organized in the fixed floating form (for example 323.56)
	or in a scientific notation (for example 3.23E+02). A 
	different input-class specification is to be used for both
	forms. (Note: Only the fixed form floating point version
        has been implemented in this version)
	
	The utility extracts the input data from the input file 
	according to the specified parameters and saves it into 
	an H5 dataset. 
	
	The user can specify output type and storage properties in 
	the configuration file. The user can also specify the 
	path of the dataset. If the groups in the path leading to 
	the data-set do not exist, the groups will be created by the
	utility. If no group is specified, the dataset will be
	created under the root group. If no path is specified the dataset
	is created as 'dataset1'.

	The user can also provide the class and size of output data 
	to be written to the dataset and also the output-architecure,
	and the output-byte-order. If output-architecture is not 
	specified the default is NATIVE. Output-byte-orders are fixed
	for some architectures and is relevant only if output-
	architecture is IEEE, UNIX or STD.

 	Also, layout and other storage properties such as 
	compression, external storage and extendible data-sets may be
	optionally specified.  The layout and storage properties 
	denote how raw data is to be organized on the disk. If these 
	options are not specified the default is Contiguous layout 
	and storage.

     	The dataset can be organized in any of the following ways:
	1. Contiguous.
	2. Chunked.
	3. External Storage File    (has to be contiguous)
	4. Extendible data sets     (has to be chunked)
	5. Compressed.		    (has to be chunked)
	6. Compressed & Extendible  (has to be chunked)
       
	If the user wants to store raw data in a non-HDF file then 
	the external storage file option is to be used and the name 
	of the file is to be specified. 
	
	If the user wants the dimensions of the data-set to be
	unlimited, the extendible data set option can be chosen. 

	The user may also specify the type of compression and the 
	level to which the data set must be compresses by setting 
	the compressed option.

    5 SYNOPSIS:
	h5import -h[elp], OR
	h5import <infile> <options> [<infile> <options>...]
					      -o[utfile] <outfile>
	   
        -h[elp]:
                Prints this summary of usage, and exits.
  
        <infile(s)>:
                Name of the Input file(s), containing a 
		single n-dimensional floating point or integer array 
		in either ASCII text, native floating point(32-bit 
		or 64-bit) or native integer(8-bit or 16-bit or 
		32-bit or 64-bit). Data to be specified in the order
		of fastest changing dimensions first.
		
	<options>
		There are two ways of specifying options - 
		command-line arguments 
		(See Section 5.A. "COMMAND-LINE ARGUMENTS")
		OR  
		configuration file. 
		(See Section 5.B. "CONFIGURATION FILE")

		Every input file should be associated with exactly
		one of those two ways.
                
        -o[utfile] <outfile>:
                Name of the HDF 5 output file. Data from one or more 
		input files are stored as one or more data sets in 
		<outfile>. The output file may be an existing file or 
		it maybe new in which case it will be created.

	A. COMMAND-LINE ARGUMENTS
	The options can be specified on the command-line in the following 
	format.

	h5import <infile> -d[ims] <dim-list> [-p[ath] pathname] [-t[ype] <input-type>] 
		  [-s[ize] <input-size] [<infile> ...] -o[utfile] <outfile>


	-d[ims] <dim-list>:
		<dim-list> is a String with no spaces and only numbers separated by commas,
		describing the dimensions of the input data.
		For example, a 50 x 100 2D array is to be specified with
		    -dims 50,100
		    
                If the configuration file <-c option on the command line> is 
		not being used, then this is a required command-line
		argument.

	-p[ath] <pathname>: 
		<pathname> is a string consisiting of one or more strings separated 
		by '/' to represent the path of the data-set in the output file. 
		If the groups in the path do no exist, they will be created. 
		This is an optional argument. If not used, the default path will 
		be '/dataset1'.

	-t[ype] <input-class>:
		<input-class> is a string denoting the class of the input data. Also 
		determines the class of the output data. See section 5.C. "More on INPUT-CLASS".
		This is an optional argument. If not used, the default	value is 'FP'.

	-s[ize] <input-size>:
		<input-type> is a string denoting the size of the input data. Also 
		determines the size of the output data. See section 5.D. "More on INPUT-SIZE".
		This is an optional argument. If not used, the default	value is 32.

	It should be noted that the arguments although optional, when used have to appear 
	in a definite order as presented in the syntax. For example, the -p option can only be 
	used immediately following the -d option. The -t option immediately follows 
	the -p option (or the -d option if the -p option is absent) and immediately 
	before the -s option (if present).
	
		
	Examples using command-line arguments:
	    1. h5import infile -dims 2,3,4 -type TEXTIN -size 32 -o out1
	    This command will create a file 'out1' with a 2x3x4 32-bit integer dataset 
	    having the path '/dataset1'.
	    
	    2. h5import infile -dims 20,50 -path bin1/dset1 -type FP -size 64 -o out2
	     This command will create a file 'out2' with a 20x50 64-bit float dataset 
	    having the path '/bin1/dset1'.
  
	B. CONFIGURATION FILE:
	A configuration file can be specified using the -c option.
	The <options> in the syntax is to be replaced by 
	-c <configfile>.

	For example, the syntax of the command will be
		h5import <infile> -c <configfile> [<infile>...} ]
							-o[utfile] <outfile>

	The configuration file is an ASCII text file and must be 
	organized as "CONFIG-KEYWORD VALUE" pairs, one pair on each 
	line.

        The configuration file may have the following keywords each 
	followed by an acceptable value.
	
	Required KEYWORDS:
                RANK
		DIMENSION-SIZES

	Optional KEYWORDS:
		 PATH
                INPUT-CLASS
		INPUT-SIZE
		OUTPUT-CLASS
		OUTPUT-SIZE
		OUTPUT-ARCHITECTURE
		OUTPUT-BYTE-ORDER
  		CHUNKED-DIMENSION-SIZES
		COMPRESSION-TYPE
		COMPRESSION-PARAM
		EXTERNAL-STORAGE
		MAXIMUM-DIMENSIONS
		
	
       	Values for keywords:
		PATH:
			Strings separated by '/' to represent
			the path of the data-set. If the groups in
			the path do no exist, they will be created. 
			If this keyword is not specified, the default 
			value is '/dataset1'.
			For example,
				PATH grp1/grp2/dataset1
				PATH: keyword
				grp1: group under the root. If
				      non-existent will be created.
				grp2: group under grp1. If 
				      non-existent will be created 
				      under grp1.
				dataset1: the name of the data-set 
					  to be created.
		
                INPUT-CLASS:
			String denoting the type of input data.
			See section 5.C. "More on INPUT-CLASS".
                      
			If INPUT-CLASS is "STR", then RANK, 
			DIMENSION-SIZES, OUTPUT-CLASS, OUTPUT-SIZE, 
			OUTPUT-ARCHITECTURE and OUTPUT-BYTE-ORDER 
			will be ignored.
			
			If this keyword is not specified, the default
			value is 'FP'.
                
		INPUT-SIZE:
			Integer denoting the size of the input data 
			See section 5.D. "More on INPUT-SIZE"

			If this keyword is not specified, the default
			value is 32'
			
		RANK:
			This is a required keyword.
			Integer denoting the number of dimensions.
		
		DIMENSION-SIZES:
			This is a required keyword.
		        Integers separated by spaces to denote the 
			dimension sizes for the no. of dimensions 
			determined by rank.
                
		OUTPUT-CLASS:
			String dentoting data type of the dataset to 
			be written ("IN","FP", "UIN").
			If this keyword is not specified and the keyword
			INPUT-CLASS is specified then, 
			if INPUT-CLASS is "IN" or "TEXTIN", 
			OUTPUT-CLASS is "IN";
			if INPUT-CLASS is "FP" or "TEXTFP" or "TEXTFPE" 
			OUTPUT-CLASS is "FP";
			if INPUT-CLASS is "UIN" or "TEXTUIN", 
			OUTPUT-CLASS is "UIN";
			
			If this keyword is not specified and the keyword
			INPUT-CLASS is also not specified then, 
			the default value of OUTPUT-CLASS is "FP".

		OUTPUT-SIZE:
			Integer denoting the size of the data in the 
			output dataset to be written.
			If OUTPUT-CLASS is "FP", OUTPUT-SIZE can be 
			32 or 64.
			If OUTPUT-CLASS is "IN" or "UIN", OUTPUT-SIZE
			can be 8, 16, 32 or 64.

			If this keyword is not specified and the keyword
			INPUT-CLASS is specified then, OUTPUT-SIZE is same
			as INPUT-SIZE.
			
			If this keyword is not specified and the keyword
			INPUT-CLASS is also not specified then, default value
			for OUTPUT-SIZE is 32.
		
		OUTPUT-ARCHITECTURE:
			STRING denoting the type of output 
			architecture. Can accept the following values
			STD
			IEEE
			INTEL
			CRAY
			MIPS
			ALPHA
			NATIVE (default)
			UNIX
			Refer to section 6 Predefined Atomic Types in the 
			H5T (datatype interface) in the HDF5 User Guide to
			know more about these architecutres.
			(http://hdf.ncsa.uiuc.edu/HDF5/doc/Datatypes.html)
			(Only STD, IEEE and NATIVE are implemented in
			this version. The extensibiilty for implementing 
			other architecutres has been provided for.)

		OUTPUT-BYTE-ORDER:
			String denoting the output-byte-order. Ignored
			if the OUTPUT-ARCHITECTURE is not specified or
			if it is IEEE, UNIX or STD. Can accept the 
			following values.
			BE (default)
			LE
		
		By default, all of the following options are disabled, i.e.,
		the default storage properties are, no chunking, no compression,
		no external storage and no extensible dimensions.

		CHUNKED-DIMENSION:
			Integers separated by spaces to denote the 
			dimension sizes of the chunk for the no. of 
			dimensions determined by rank. Required field
			to denote that the dataset will be stored with
			chunked storage. If this field is absent the
			dataset will be stored with contiguous storage.
			
		COMPRESSION-TYPE:
			String denoting the type of compression to be
			used with the chunked storage. Requires the
			CHUNKED-DIMENSION to be specified. The only 
			currently supported compression method is GZIP. 
			Will accept the following value
			GZIP

		COMPRESSION-PARAM:
			Integer used to denote compression level and 
			this option is to be always specified when 
			the COMPRESSION-TYPE option is specified. The
			values are applicable only to GZIP 
			compression.			
			Value 1-9: The level of Compression. 
				1 will result in the fastest 
				compression while 9 will result in 
				the best compression ratio. The default
				level of compression is 6.
					
		EXTERNAL-STORAGE:
			String to denote the name of the non-HDF5 file 
			to store data to. Cannot be used if CHUNKED-
			DIMENSIONS or COMPRESSION-TYPE or MAXIMUM-
			DIMENSIONS is specified.
			Value <external-filename>: the name of the 
			external file as a string to be used.

		MAXIMUM-DIMENSIONS:
			Integers separated by spaces to denote the 
			maximum dimension sizes of all the 
			dimensions determined by rank. Requires the
			CHUNKED-DIMENSION to be specified. A value of 
			-1 for any dimension implies UNLIMITED 
			DIMENSION size for that particular dimension.

	Examples of configuration file:
	1. Configuration File may look like:
		
		PATH work h5 pkamat First-set
                INPUT-CLASS TEXTFP
		RANK 3
		DIMENSION-SIZES 5 2 4
		OUTPUT-CLASS FP
		OUTPUT-SIZE 64
		OUTPUT-ARCHITECTURE IEEE
		OUTPUT-BYTE-ORDER LE
  		CHUNKED-DIMENSION 2 2 2 
		MAXIMUM-DIMENSIONS 8 8 -1
	
	The above configuration will accept a floating point array 
	(5 x 2 x 4)  in an ASCII file with the rank and dimension sizes 
	specified and will save it in a chunked data-set (of pattern 
	2 X 2 X 2) of 64-bit floating point in the little-endian order 
	and IEEE architecture. The dataset will be stored at
	"/work/h5/pkamat/First-set"

	2. Another configuration could be:

		PATH Second-set
                INPUT-CLASS IN	
		RANK 5
		DIMENSION-SIZES 6 3 5 2 4
		OUTPUT-CLASS IN
		OUTPUT-SIZE 32
  		CHUNKED-DIMENSION 2 2 2 2 2
		COMPRESSION-TYPE GZIP
		COMPRESSION-PARAM 7

		
	The above configuration will accept an integer array 
	(6 X 3 X 5 x 2 x 4)  in a binary file with the rank and 
	dimension sizes specified and will save it in a chunked data-set
	(of pattern 2 X 2 X 2 X 2 X 2) of 32-bit floating point in 
	native format (as output-architecure is not specified). The 
	first and the third dimension will be defined as unlimited. The 
	data-set will be compressed using GZIP and a compression level 
	of 7.
	The dataset will be stored at "/Second-set"

	C. More on INPUT-CLASS
	The input-class can have any of the following values.  
	(TEXTIN, TEXTFP, TEXTFPE, FP, IN, STR, TEXTUIN, UIN). 
	TEXTIN denotes an ASCII text file with signed integer data in ASCII form,
	TEXTUIN denotes an ASCII text file with unsigned integer data in ASCII form,
	TEXTFP denotes an ASCII text file containing floating point data in the 
	fixed notation	(325.34),
	TEXTFPE denotes an ASCII text file containing	floating point data in the 
	scientific notation (3.2534E+02) (Not implemented in this version),
	FP denotes a floating point binary file,
	IN denotes a signed integer binary file,
	UIN denotes an unsigned integer binary file,
	& STR denotes an ASCII text file the contents of which should be stored as an 1-D 
	array of strings (Not implemented in this version).

	D. More on INPUT-SIZE
	The input-size can have any of the following values. (8, 16, 32, 64). 

	For floating point, (TEXTFP, TEXTFPE, FP)
		INPUT-SIZE can be 32 or 64.
	For integers (signed and unsigned) (TEXTIN, TEXTUIN, IN, UIN)
		INPUT-SIZE can be 8, 16, 32 or 64.