====================================
Nested Datatypes support in PyTables
====================================

---------------
Design Document
---------------

:Author: Ivan Vilata
:Author: Francesc Altet
:Company: Cárabos Coop. V.
:Date: 2005-04-21

Abstract
========

This document proposes a redesign of the PyTables_ software to support
nested datatypes in Table_ objects. First, it explains how to declare
these nested datatypes using generalizations of the existing declarative
objects in PyTables. Next, it discusses how several existing PyTables
classes must be modified and enhanced to the same end. Finally, it
proposes how the ``RecArray`` and ``Record`` classes (see numarray_),
the foundation for I/O in Table objects, must be subclassed to support
nested datatypes as well.

Although it is not critical for this report, it is understood that these
nested datatypes will be saved on disk using nested compound datatypes
in the underlying HDF5_ library. Likewise, once this proposal has been
implemented, PyTables will support native HDF5 files containing datasets
that use nested compound datatypes and follow the HDF5_HL_ table
specification.

.. _PyTables: http://www.pytables.org
.. _Table: http://pytables.sourceforge.net/html-doc/usersguide4.html#section4.5
.. _numarray: http://stsdas.stsci.edu/numarray/
.. _HDF5: http://hdf.ncsa.uiuc.edu/HDF5/
.. _HDF5_HL: http://hdf.ncsa.uiuc.edu/HDF5/hdf5_hl/

.. contents::

Declaration of Nested Datatypes during ``Table`` creation
=========================================================

The user will be able to declare nested datatypes in ``Table`` objects
by using generalizations of the existing declarative methods in
PyTables. These generalizations are described next.

Nested Subclasses of ``IsDescription``
--------------------------------------

``IsDescription`` is a metaclass designed to be used as an easy, yet
meaningful way to describe the properties of ``Table`` objects through
the use of classes that inherit properties from it. To support nested
datatypes, it should allow the declaration of nested subclasses of
``IsDescription``. This should look like::

    class NestedType(IsDescription):
        id = Int64Col()
        pos = Float32Col(shape=(2,))

        class info(IsDescription):
            name = StringCol(length=2)
            value = Complex64Col()

Nested Dictionary
-----------------

Another way of describing the types in a ``Table`` is through a
dictionary. The ``Table`` constructor will be enhanced so that it
accepts nested dictionaries as type descriptors. Such a generalized
dictionary should look like::

    {'id': Int64Col(),
     'pos': Float64Col(shape=(2,)),
     'info': {'name': StringCol(length=2),
              'value': Complex64Col()},
    }

NestedRecArray
--------------

Finally, the ``Table`` constructor also accepts a ``RecArray`` object
that is used as a descriptor of the column types. As ``RecArray``
currently supports only flat datatypes, the ``Table`` constructor will
be enhanced to accept ``NestedRecArray`` objects as well as ``RecArray``
ones. Using the same example as above, creating such a
``NestedRecArray`` should look like::

    array(databuffer,
          names=['id', 'pos', ('info', ['name', 'value'])],
          formats=['Int64', '(2,)Float64', ['a2', 'Complex64']])

(See `Subclassing RecArray and Record`_ for more information on the
aforementioned classes.)

Modifications needed for ``Table`` accessors
============================================

To fully support nested datatypes, some methods of the ``Table`` class,
as well as of the ``Cols`` class, must be modified. The changes in the
behavior of the existing methods of these classes are documented
here__.

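
As a rough illustration of what nested column access involves, the
following sketch walks a nested dictionary description like the one
shown earlier and lists its leaf columns. The ``flatten_description``
helper and the ``/`` path separator are assumptions made only for this
example and are not part of the proposed API; plain strings stand in for
the ``Col`` instances so that the snippet runs without PyTables::

    def flatten_description(description, prefix=''):
        """Return (path, column) pairs for every leaf column, depth first.

        Hypothetical helper: the '/' separator is an assumption.
        """
        leaves = []
        for name, value in description.items():
            path = prefix + name
            if isinstance(value, dict):
                # A nested dictionary describes a nested column: recurse.
                leaves.extend(flatten_description(value, path + '/'))
            else:
                leaves.append((path, value))
        return leaves

    # Strings are placeholders for the real column descriptors.
    description = {
        'id': 'Int64Col()',
        'pos': 'Float64Col(shape=(2,))',
        'info': {'name': 'StringCol(length=2)',
                 'value': 'Complex64Col()'},
    }
    paths = [path for path, _ in flatten_description(description)]

Here ``paths`` contains one entry per leaf column (``id``, ``pos``,
``info/name`` and ``info/value``), which is the kind of enumeration an
accessor for nested columns would need in some form.
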
__ html/public/Table-module.html

Subclassing ``RecArray`` and ``Record``
=======================================

In-memory table operations in PyTables make extensive use of the
``RecArray`` class in the ``numarray.records`` module. However, this
class does not support nested fields (i.e. non-homogeneous fields with
sub-fields). The ``NestedRecArray`` class shall extend the behavior of
``RecArray`` to support this kind of construct while remaining as
compatible as possible with the original class. A companion
``NestedRecord`` class will extend ``Record`` in the same way.

Original methods which currently return ``NumArray``, ``Record`` or
``RecArray`` objects will now be able to return ``NestedRecArray`` and
``NestedRecord`` objects. A complete explanation of the API intended to
be provided by the ``nestedrecords`` module can be found here__.

__ html/public/nestedrecords-module.html
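
The nested ``names`` and ``formats`` arguments shown in the
``NestedRecArray`` example are parallel structures: a nested field is a
``(name, [sub-names])`` tuple in ``names`` and a list of sub-formats in
``formats``. The sketch below pairs them into a single tree and rejects
mismatched shapes. The ``pair_fields`` function is a hypothetical helper
written for this illustration, not part of the ``nestedrecords`` API::

    def pair_fields(names, formats):
        """Zip parallel nested names/formats lists into one tree.

        Hypothetical helper, for illustration only.
        """
        if len(names) != len(formats):
            raise ValueError("names and formats differ in length")
        fields = []
        for name, fmt in zip(names, formats):
            if isinstance(name, tuple):
                # Nested field: (field_name, [sub_names]) with [sub_formats].
                field_name, sub_names = name
                if not isinstance(fmt, list):
                    raise ValueError(
                        "nested field %r needs a list of formats" % field_name)
                fields.append((field_name, pair_fields(sub_names, fmt)))
            else:
                if isinstance(fmt, list):
                    raise ValueError(
                        "flat field %r cannot take nested formats" % name)
                fields.append((name, fmt))
        return fields

    names = ['id', 'pos', ('info', ['name', 'value'])]
    formats = ['Int64', '(2,)Float64', ['a2', 'Complex64']]
    description = pair_fields(names, formats)

The resulting ``description`` keeps flat fields as ``(name, format)``
pairs and represents ``info`` as a pair whose second item is the list of
its own sub-fields.
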