Reporting external error messages
Description of the problem:
The HDF5 error stack keeps track of the errors the library "pushes" onto
it. (Technically, it is a limited-length array of records.) The stack
requires that every error record pushed onto it have static storage.
(Again, technically, the stack does not make a copy of the record; it just
stores a 'reference' to it. So if something changes part of the record's
contents, the stack record changes with it, and if something frees a string
pointer in the record, a later request to print the error messages will
dereference an invalid pointer.)
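The aliasing described above can be reproduced with a toy stack. This is a minimal sketch, not HDF5 code: the names ref_rec_t, ref_push, demo_aliasing and the capacity REF_STACK_MAX are hypothetical, and the real H5E record layout is not shown in this note.

```c
#include <assert.h>
#include <string.h>

#define REF_STACK_MAX 32          /* hypothetical fixed capacity */

typedef struct {
    int         major;            /* error class, stored by value */
    const char *desc;             /* message text, stored by reference only */
} ref_rec_t;

static ref_rec_t ref_stack[REF_STACK_MAX];
static int       ref_stack_n = 0;

/* Push keeps the caller's pointer; nothing is copied. */
static void ref_push(int major, const char *desc)
{
    if (ref_stack_n < REF_STACK_MAX) {
        ref_stack[ref_stack_n].major = major;
        ref_stack[ref_stack_n].desc  = desc;
        ref_stack_n++;
    }
}

/* Returns 1 if the stacked message changed after it was pushed. */
int demo_aliasing(void)
{
    static char buf[64];          /* static storage, but still writable */

    ref_stack_n = 0;
    strcpy(buf, "first message");
    ref_push(1, buf);
    assert(ref_stack_n == 1);

    /* Overwriting the buffer also rewrites what the record points to. */
    strcpy(buf, "second message");
    return strcmp(ref_stack[0].desc, "second message") == 0;
}
```

Had buf been heap memory that was freed after the push, the final strcmp would read freed storage, which is the dereference error the paragraph warns about.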
Currently, only static message strings coded in the HDF5 library are pushed
to the error stack. Message strings from the system (e.g., strerror()) or
from other external libraries (e.g., MPI_Error_string()) cannot be pushed
there, and the HDF5 library has no other mechanism for accessing these
strings.
Possible solutions:
One proposal to fix this is to change the H5E error stack routines to
strdup every error string passed to them, thus isolating the error stack
record content from the original error strings. This raises a couple of
issues:
1) Currently, H5Eclear just resets the stack counter. It does not need to
free anything, since the stack holds only references to, or integer values
of, the errors pushed to it. If the stack used strdup for incoming strings,
H5Eclear would have to walk through the stack records one by one and free
all the strings. This adds processing time.
2) Often, “errors” are pushed to the stack when there is no real error.
E.g., when a new dataset is created, its name is searched for first. The
search routine reports that no object of that name exists, pushes a message
to the stack, and returns a negative value to the calling routine, which
knows it is okay to create the new dataset. The caller clears the error
stack and creates the new dataset. This means there would be many
unnecessary strdup and free calls, which would likely fragment the heap
memory.
One can avoid the second problem by giving H5Epush() an extra parameter
that tells the stack routine whether or not to do a strdup. That would
eliminate many of the unnecessary strdup and free calls, but the first
problem remains. It also means a change of API. (Of course, a new routine
such as H5Epush_dup() could be added instead.)
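The copy-on-request variant might look like the following. This is a sketch under stated assumptions, not the H5E implementation: dup_rec_t, dup_push, dup_clear, the owned flag, and the free_calls counter are all hypothetical, and malloc plus memcpy stands in for strdup only to stay within ISO C.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define DUP_STACK_MAX 32          /* hypothetical fixed capacity */

typedef struct {
    const char *desc;             /* message text */
    int         owned;            /* nonzero: desc is a private copy to free */
} dup_rec_t;

static dup_rec_t dup_stack[DUP_STACK_MAX];
static int       dup_stack_n = 0;
static int       free_calls  = 0; /* instrumentation for the demo only */

/* Push a message; copy it only when the caller asks for it. */
static int dup_push(const char *desc, int dup)
{
    if (dup_stack_n >= DUP_STACK_MAX)
        return -1;
    if (dup) {
        size_t len  = strlen(desc) + 1;
        char  *copy = malloc(len);
        if (copy == NULL)
            return -1;
        memcpy(copy, desc, len);
        dup_stack[dup_stack_n].desc = copy;
    } else {
        dup_stack[dup_stack_n].desc = desc;
    }
    dup_stack[dup_stack_n].owned = dup;
    dup_stack_n++;
    return 0;
}

/* Clear can no longer just reset the counter: it must visit every record. */
static void dup_clear(void)
{
    for (int i = 0; i < dup_stack_n; i++) {
        if (dup_stack[i].owned) {
            free((void *)dup_stack[i].desc);
            free_calls++;
        }
    }
    dup_stack_n = 0;
}

/* Returns 1 if the copy survived a change to its source and was freed once. */
int demo_dup_flag(void)
{
    char local[32];
    int  intact;

    free_calls = 0;
    strcpy(local, "transient text");
    dup_push("static text", 0);   /* string literal: no copy needed */
    dup_push(local, 1);           /* non-static text: copied */
    local[0] = '\0';              /* the original changes afterwards */

    intact = strcmp(dup_stack[1].desc, "transient text") == 0;
    dup_clear();
    return intact && free_calls == 1 && dup_stack_n == 0;
}
```

The flag removes the copies for static messages, but dup_clear still has to loop over the whole stack, which is the cost raised in issue 1.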
Proposal:
Since the current need for non-static error message strings really comes
from external libraries and arises at the very bottom of the virtual file
driver layer, I propose that the drivers manage the strings themselves.
E.g., the MPIO driver can declare a static char string and use it to
receive error messages from MPI_Error_string(). It is then all right to
pass this static string to H5Epush(). One may object that the next call to
the MPIO file driver may replace the contents of the same static string.
That is okay, because by then the error stack will have been cleared at
least once: the stack is cleared every time the HDF5 code goes “down”.
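The same idea can be tried without MPI by using strerror() as the external message source. The sketch below is illustrative only: drv_errstr, drv_fail, tiny_push, tiny_clear and the buffer size are hypothetical names and values, and the toy stack stands in for the H5E routines.

```c
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <string.h>

#define DRV_ERRSTR_MAX 256        /* hypothetical buffer size */

/* Driver-owned static buffer: valid for the life of the program. */
static char drv_errstr[DRV_ERRSTR_MAX];

/* Toy reference-only stack standing in for the library's error stack. */
static const char *tiny_stack[8];
static int         tiny_stack_n = 0;

static void tiny_push(const char *desc)
{
    if (tiny_stack_n < 8)
        tiny_stack[tiny_stack_n++] = desc;
}

static void tiny_clear(void)
{
    tiny_stack_n = 0;
}

/* Copy the external message into the driver's buffer, then push the buffer. */
static int drv_fail(int errnum)
{
    snprintf(drv_errstr, sizeof drv_errstr, "%s", strerror(errnum));
    tiny_push(drv_errstr);
    return -1;
}

/* Returns 1 if each failure left exactly one record pointing at the buffer. */
int demo_driver_buffer(void)
{
    int first_ok, second_ok;

    tiny_clear();                 /* stack is cleared on the way "down" */
    drv_fail(ENOENT);
    first_ok = (tiny_stack_n == 1) && (tiny_stack[0] == drv_errstr)
               && (drv_errstr[0] != '\0');

    tiny_clear();                 /* cleared again before the next call */
    drv_fail(EACCES);             /* safe to reuse the same buffer now */
    second_ok = (tiny_stack_n == 1) && (tiny_stack[0] == drv_errstr);

    return first_ok && second_ok;
}
```

Reusing the buffer is safe here only because the stack is cleared between the two failures; two pushes of drv_errstr without an intervening clear would leave both records showing the later message.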
Example:
Below is an example implementation of pushing the MPI_Error_string() message to the error stack.
#ifdef H5_HAVE_PARALLEL
/*
 * MPI error handling macros.
 */

/* Static buffer that receives the text from MPI_Error_string(). */
extern char H5E_mpi_error_str[MPI_MAX_ERROR_STRING];
extern int  H5E_mpi_error_str_len;

/* Translate the MPI error code to text and push that text on the stack. */
#define HMPI_ERROR(mpierr){                                                   \
    MPI_Error_string(mpierr, H5E_mpi_error_str, &H5E_mpi_error_str_len);      \
    HERROR(H5E_INTERNAL, H5E_MPIERRSTR, H5E_mpi_error_str);                   \
}

/* Push the MPI text, then report str through HGOTO_ERROR. */
#define HMPI_GOTO_ERROR(retcode, str, mpierr){                                \
    HMPI_ERROR(mpierr);                                                       \
    HGOTO_ERROR(H5E_INTERNAL, H5E_MPI, retcode, str);                         \
}

/* Push the MPI text, then report str through HRETURN_ERROR. */
#define HMPI_RETURN_ERROR(retcode, str, mpierr){                              \
    HMPI_ERROR(mpierr);                                                       \
    HRETURN_ERROR(H5E_INTERNAL, H5E_MPI, retcode, str);                       \
}
#endif
---
Revised: 2002/04/03
Albert Cheng (aching@ncsa.uiuc.edu)