"""
Support for arrays of nested records.

This module provides the `NestedRecArray` and `NestedRecord` classes,
which can be used to handle arrays of nested records in a way which is
compatible with ``numarray.records``.

Nested record arrays are made up of a sequence of nested records.  A
nested record is made up of a set of non-nested and nested fields, each
of them having a different name.  Non-nested fields have homogeneous
n-dimensional values (where n >= 1), while nested fields consist of a
set of fields (sub-fields), each of them with a different name.
Sub-fields can also be nested.

Several utility functions are provided for creating nested record
arrays.
"""

import sys

import numarray.records

from AttributeAccess import AttributeAccess


__docformat__ = 'reStructuredText'
"""The format of documentation strings in this module."""


def array(buffer=None, formats=None, shape=0, names=None,
          byteorder=sys.byteorder, aligned=0, descr=None):
    """
    Create a new instance of a `NestedRecArray`.

    This function can be used to build a new array of nested records.
    The new array is returned as a result.

    The function works much like ``numarray.records.array()``, with some
    differences:

    1. In addition to flat buffers and regular sequences of non-nested
       elements, the `buffer` argument can take regular sequences where
       each element has a structure nested to an arbitrary depth.  Of
       course, all elements in a non-flat buffer must have the same
       format.

    2. The `formats` argument only supports sequences of strings and
       other sequences.  Each string defines the shape and type of a
       non-nested field.  Each sequence contains the formats of the
       sub-fields of a nested field.  The structure of this argument
       must match that of the elements in `buffer`.  This argument may
       have a recursive structure.

    3. The `names` argument only supports lists of strings and 2-tuples.
       Each string defines the name of a non-nested field.
       Each 2-tuple contains the name of a nested field and a list
       describing the names of its sub-fields.  The structure of this
       argument must match that of the elements in `buffer`.  This
       argument may have a recursive structure.

    The `descr` argument is a new-style description of the structure of
    the `buffer`.  It is intended to replace the `formats` and `names`
    arguments, so it cannot be used at the same time as them [#descr]_.

    The `descr` argument is a list of 2-tuples, each of them describing
    a field.  The first value in a tuple is the *name* of the field,
    while the second one is a description of its *structure*.  If the
    second value is a string, it defines the format (shape and type) of
    a non-nested field.  Otherwise, it is a list of 2-tuples describing
    the sub-fields of a nested field.

    If `descr` is ``None`` (or omitted), the function tries to infer the
    whole structure of the array from that of the `buffer`, and
    automatic names (``c1``, ``c2`` etc. on each nested field) are
    assigned to all fields.

    The `descr` argument may have a recursive structure.

    Please note that names used in `names` or `descr` should *not*
    contain the string ``'/'``, since it is used as the field/sub-field
    separator by `NestedRecArray.asRecArray()`.  If the separator is
    found in a name, a ``ValueError`` is raised.

    .. [#descr] The syntax of `descr` is based on that of the
       ``__array_descr__`` attribute in the proposed standard
       `N-dimensional array interface`__.

    __ http://numeric.scipy.org/array_interface.html


    When to use `descr` or `formats` and `names`
    ============================================

    Since both the name and the structure of fields must always be
    specified with `descr`, the `formats` and `names` arguments come in
    handy when one does not want to explicitly specify names or
    structure.  To leave out the names, use `formats`; to leave out the
    structure, use `names`.  When fully specifying names and structure,
    the `descr` argument is preferred over `formats` and `names` for the
    sake of code legibility and conciseness.

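
The correspondence between `descr` and the `names`/`formats` pair can
be sketched in plain Python.  This is only an illustrative sketch:
``split_descr`` is a hypothetical helper that is not part of this
module or of ``numarray``, and it assumes that format strings are plain
``str`` objects.  It also applies the ``'/'`` check described above.

```python
def split_descr(descr):
    # Hypothetical helper, not part of this module or of numarray.
    # Splits a descr-style list of (name, structure) 2-tuples into the
    # equivalent names and formats arguments.
    names, formats = [], []
    for name, structure in descr:
        if '/' in name:
            # The separator is reserved for flattened field names.
            raise ValueError('field name %r contains the separator' % (name,))
        if isinstance(structure, str):
            # Non-nested field: the structure is its format string.
            names.append(name)
            formats.append(structure)
        else:
            # Nested field: recurse into the list of sub-field 2-tuples.
            subnames, subformats = split_descr(structure)
            names.append((name, subnames))
            formats.append(subformats)
    return names, formats


descr = [('id', 'Int64'), ('pos', '(2,)Float32'),
         ('info', [('name', 'a2'), ('value', 'Complex64')])]
names, formats = split_descr(descr)
```

With this `descr`, the sketch yields a `names` list with a 2-tuple for
the nested ``info`` field and a `formats` list with a sub-list in the
same position, so both mirror the nesting of `descr`.
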
    Disambiguating the `buffer` structure
    =====================================

    Sometimes a field may be inferred either as a non-nested
    multi-element field or as a nested field made up of several
    sub-fields.  Let us take the following `buffer` value::

        [('x', (1, 2, 3.0))]

    Is the second field a non-nested one with a ``(3,)Float64`` format,
    or is it a nested field with a ``['Int32', 'Int32', 'Float64']``
    format?  In this case, since all three types can be promoted to a
    single type (``Float64``), the field will be considered as a
    non-nested field with three elements.  Now, let us see this case::

        [('x', ('y', 2, 3.0))]

    Here the types of the second field cannot be unified, so it will be
    taken as a nested field with format ``['a1', 'Int32', 'Float64']``.

    To avoid this kind of ambiguity, please use the `names`, `formats`
    or `descr` arguments to force a structure.


    Examples
    ========

    The following examples will help to clarify the words above.  In
    them, an array of two elements is created.  Each element has three
    fields: a 64-bit integer (``id``), a two-element 32-bit floating
    point (``pos``) and a nested field (``info``); the nested field has
    two sub-fields: a two-character string (``name``) and a 64-bit
    complex (``value``).

    Example 1
    ---------

    In this example the array is created by specifying both its contents
    and its structure, so the structure of the arguments used must be
    coherent.

    This is how the array would be created in the old-style way,
    i.e. using the `formats` and `names` arguments:

    >>> nra = array(
    ...     [(1, (0.5, 1.0), ('a1', 1j)), (2, (0, 0), ('a2', 1+.1j))],
    ...     names=['id', 'pos', ('info', ['name', 'value'])],
    ...     formats=['Int64', '(2,)Float32', ['a2', 'Complex64']])

    And this is how the array would be created in the new-style way,
    i.e. using the `descr` argument:

    >>> nra = array(
    ...     [(1, (0.5, 1.0), ('a1', 1j)), (2, (0, 0), ('a2', 1+.1j))],
    ...     descr=[('id', 'Int64'), ('pos', '(2,)Float32'),
    ...            ('info', [('name', 'a2'), ('value', 'Complex64')])])

    Note how `formats` and `descr` mimic the structure of each of the
    elements in `buffer`.

    Example 2
    ---------

    Here the structure of the array is simply inferred from that of the
    `buffer` argument.

    >>> nra = array(
    ...     [(1, (0.5, 1.0), ('a1', 1j)), (2, (0, 0), ('a2', 1+.1j))])

    If names were to be given to the fields, the structure of `names`
    should match that of `buffer`:

    >>> nra = array(
    ...     [(1, (0.5, 1.0), ('a1', 1j)), (2, (0, 0), ('a2', 1+.1j))],
    ...     names=['id', 'pos', ('info', ['name', 'value'])])

    Thus, the following call would fail with an ``IndexError``:

    >>> nra = array(
    ...     [(1, (0.5, 1.0), ('a1', 1j)), (2, (0, 0), ('a2', 1+.1j))],
    ...     names=['id', 'pos', ('info', ['name']), 'value'])

    The structure of `names` would match something like::

        [(1, (0.5, 1.0), ('a1',), 1j), (2, (0, 0), ('a2',), 1+.1j)]

    Example 3
    ---------

    Now the array is created from a flat string representing the data in
    memory.  Names will be automatically assigned.  For that to work,
    the resulting array shape and record format must be fully specified.

    >>> datastring = binary_representation_of_data
    >>> nra = array(
    ...     datastring, shape=2,
    ...     formats=['Int64', '(2,)Float32', ['a2', 'Complex64']])

    Byte ordering and alignment are assumed to be those of the host
    machine, since they have not been explicitly stated via the
    `byteorder` and `aligned` arguments.
    """

    raise NotImplementedError


def fromarrays(arrayList, formats=None, names=None, shape=0,
               byteorder=sys.byteorder, aligned=0, descr=None):
    """
    Create a new instance of a `NestedRecArray` from field arrays.

    This function can be used to build a new array of nested records
    from a list of arrays, one for each field.  The new array is
    returned as a result.

    The function works much like ``numarray.records.fromarrays()``, but
    `arrayList` may also contain nested fields, i.e. sequences of other
    arrays (nested or not).  All non-nested arrays appearing in
    `arrayList` must have the same length.
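
The relation between a nested `arrayList` and the record-oriented
`buffer` accepted by `array()` can be sketched in plain Python.  This
is only an illustrative sketch: ``arrays_to_records`` is a hypothetical
helper that is not part of this module or of ``numarray``, and it
assumes that a `formats` argument is given to tell nested fields from
non-nested ones.

```python
def arrays_to_records(array_list, formats):
    # Hypothetical helper, not part of this module or of numarray.
    # Turns a list of field arrays (one per field, possibly nested)
    # into a list of records (one tuple per row).
    columns = []
    for field, fmt in zip(array_list, formats):
        if isinstance(fmt, str):
            # Non-nested field: one value per row.
            columns.append(list(field))
        else:
            # Nested field: build its rows from its own sub-arrays.
            columns.append(arrays_to_records(field, fmt))
    # All field arrays must describe the same number of rows.
    lengths = set(len(column) for column in columns)
    if len(lengths) > 1:
        raise ValueError('field arrays have different lengths')
    return [tuple(row) for row in zip(*columns)]


formats = ['Int64', '(2,)Float32', ['a2', 'Complex64']]
array_list = [[1, 2], [(0.5, 1.0), (0, 0)], [['a1', 'a2'], [1j, 1+.1j]]]
records = arrays_to_records(array_list, formats)
```

Under these assumptions, ``records`` holds one tuple per row with the
nested ``info`` values grouped in their own tuple, which is the layout
that `array()` takes as its `buffer` argument.
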

    The rest of the arguments work as explained in `array()`.


    Disambiguating the `arrayList` structure
    ========================================

    Ambiguity may result in some cases.  Let us take the following
    `arrayList` value::

        [[1, 2], [(0, 0.5), (1, 1)]]

    Is the second field a non-nested one with a ``(2,)Float64`` format,
    or is it a nested field with a ``['Float64', 'Int32']`` format?  In
    this case, since both types can be promoted to a single type
    (``Float64``), the field will be considered as a non-nested field
    with two elements.  Now, let us see this case::

        [[1, 2], [('x', 'y'), (1, 2)]]

    Here the types of the second field cannot be unified, so it will be
    taken as a nested field with format ``['a1', 'Int32']``.

    To avoid this kind of ambiguity, please use the `names`, `formats`
    or `descr` arguments to force a structure.


    Example
    =======

    Let us build the sample array used in the examples of `array()`.  In
    the old way:

    >>> nra = fromarrays(
    ...     [[1, 2], [(0.5, 1.0), (0, 0)], [['a1', 'a2'], [1j, 1+.1j]]],
    ...     names=['id', 'pos', ('info', ['name', 'value'])],
    ...     formats=['Int64', '(2,)Float32', ['a2', 'Complex64']])

    In the new way:

    >>> nra = fromarrays(
    ...     [[1, 2], [(0.5, 1.0), (0, 0)], [['a1', 'a2'], [1j, 1+.1j]]],
    ...     descr=[('id', 'Int64'), ('pos', '(2,)Float32'),
    ...            ('info', [('name', 'a2'), ('value', 'Complex64')])])

    Note how `formats` and `descr` mimic the structure of the whole
    `arrayList`.
    """

    raise NotImplementedError


class NestedRecArray(numarray.records.RecArray):
    """
    Array of nested records.

    This is a generalization of the ``numarray.records.RecArray`` class.
    It supports nested fields and records via the `NestedRecord` class.

    This class is compatible with ``RecArray``.  However, part of its
    behaviour has been extended to support nested fields:

    1. Getting a single item from an array will return a `NestedRecord`,
       a special kind of ``Record`` with support for nested structures.

    2. Getting a range of items will return another `NestedRecArray`
       instead of an ordinary ``RecArray``.

    3. Getting a whole field may return a `NestedRecArray` instead of a
       ``NumArray`` or ``CharArray``, if the field is nested.

    Fields and sub-fields can be accessed using both the `field()`
    method and the ``fields`` interface, which allows accessing fields
    as Python attributes: ``nrec = nrarr.fields.f1.fields.subf1[4]``.
    The `field()` method supports the ``'/'`` separator to access
    sub-fields.

    Nested record arrays can be converted to ordinary record arrays by
    using the `asRecArray()` method.

    Finally, the constructor of this class is not intended to be used
    directly by users.  Instead, use one of the creation functions
    (`array()`, `fromarrays()` or the others).
    """

    def __init__(self, recarray, descr):
        super(NestedRecArray, self).__init__(
            recarray._data, recarray._formats, recarray._shape,
            recarray._names, recarray._byteoffset, recarray._byteorder,
            recarray._aligned)

        self.fields = AttributeAccess(self, 'field')
        """
        Provides attribute access to fields.

        For instance, accessing ``recarray.fields.x`` is equivalent to
        ``recarray.field('x')``, and ``recarray.fields.x.fields.y`` is
        equivalent to ``recarray.field('x/y')``.  This functionality is
        mainly intended for interactive usage from the Python console.
        """

        raise NotImplementedError

    def field(self, fieldName):
        """
        Get field data as an array.

        If the named field (`fieldName`, a string) is not nested, a
        ``NumArray`` or ``CharArray`` object representing the values in
        that field is returned.  Otherwise, a `NestedRecArray` object is
        returned.

        `fieldName` can be used to provide the name of sub-fields.  In
        that case, it will consist of several field name components
        separated by the string ``'/'``.  For instance, if there is a
        nested field named ``x`` with a sub-field named ``y``, the
        latter can be accessed by using ``'x/y'`` as the value of
        `fieldName`.
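
The way a ``'/'``-separated `fieldName` selects a sub-field can be
sketched in plain Python over a `descr`-style structure.  This is only
an illustrative sketch: ``lookup_field`` is a hypothetical helper that
is not part of this module or of ``numarray``, and it returns the
description of the selected field rather than its data.

```python
def lookup_field(descr, field_name):
    # Hypothetical helper, not part of this module or of numarray.
    # Follows the '/'-separated components of field_name through a
    # descr-style list of (name, structure) 2-tuples.
    structure = descr
    for component in field_name.split('/'):
        if isinstance(structure, str):
            # A non-nested field has no sub-fields to descend into.
            raise KeyError(field_name)
        for name, substructure in structure:
            if name == component:
                structure = substructure
                break
        else:
            # No field with that name at this nesting level.
            raise KeyError(field_name)
    return structure


descr = [('id', 'Int64'), ('pos', '(2,)Float32'),
         ('info', [('name', 'a2'), ('value', 'Complex64')])]
leaf = lookup_field(descr, 'info/name')
nested = lookup_field(descr, 'info')
```

Here ``leaf`` is a format string because ``info/name`` is not nested,
while ``nested`` is a list of 2-tuples.  This mirrors the rule above:
non-nested fields yield plain arrays, nested ones yield another nested
structure.
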
        """

        raise NotImplementedError

    def asRecArray(self):
        """
        Convert a nested array to a non-nested equivalent array.

        This function creates a new vanilla ``RecArray`` instance
        equivalent to this one by *flattening* its fields.  Only
        bottom-level fields are included in the array.  Sub-fields are
        named by prepending the names of their parent fields up to the
        top-level fields, using ``'/'`` as a separator.

        The data area of the array is copied into the new one.

        Example
        -------

        Let us take the following nested array:

        >>> nra = array([(1, (0, 0), ('a1', 1j)), (2, (0, 0), ('a2', 2j))],
        ...             names=['id', 'pos', ('info', ['name', 'value'])],
        ...             formats=['Int64', '(2,)Float32', ['a2', 'Complex64']])

        Calling ``nra.asRecArray()`` would return the same array as
        calling:

        >>> ra = numarray.records.array(
        ...     [(1, (0, 0), 'a1', 1j), (2, (0, 0), 'a2', 2j)],
        ...     names=['id', 'pos', 'info/name', 'info/value'],
        ...     formats=['Int64', '(2,)Float32', 'a2', 'Complex64'])

        Please note that the shape of multi-dimensional fields is kept.
        """

        raise NotImplementedError


class NestedRecord(numarray.records.Record):
    """
    Nested record.

    This is a generalization of the ``numarray.records.Record`` class to
    support nested fields.  It represents a record in a `NestedRecArray`
    or an isolated record.  In the second case, its names are
    automatically set to ``c1``, ``c2`` etc. on each nested field.

    This class is compatible with ``Record``.  However, getting a field
    may return a `NestedRecord` instead of a Python scalar, ``NumArray``
    or ``CharArray``, if the field is nested.

    Fields and sub-fields can be accessed using both the `field()`
    method and the ``fields`` interface, which allows accessing fields
    as Python attributes: ``nfld = nrec.fields.f1.fields.subf1[4]``.
    The `field()` method supports the ``'/'`` separator to access
    sub-fields.

    Nested records can be converted to ordinary records by using the
    `asRecord()` method.
    """

    def __init__(self, input, row=0):
        self.fields = AttributeAccess(self, 'field')
        """
        Provides attribute access to fields.

        For instance, accessing ``record.fields.x`` is equivalent to
        ``record.field('x')``, and ``record.fields.x.fields.y`` is
        equivalent to ``record.field('x/y')``.  This functionality is
        mainly intended for interactive usage from the Python console.
        """

        raise NotImplementedError

    def field(self, fieldName):
        """
        Get field data.

        If the named field (`fieldName`, a string) is not nested, a
        Python scalar, ``NumArray`` or ``CharArray`` object with the
        value of that field is returned.  Otherwise, a `NestedRecord`
        object is returned.

        `fieldName` can be used to provide the name of sub-fields.  In
        that case, it will consist of several field name components
        separated by the string ``'/'``.  For instance, if there is a
        nested field named ``x`` with a sub-field named ``y``, the
        latter can be accessed by using ``'x/y'`` as the value of
        `fieldName`.
        """

        raise NotImplementedError

    def setField(self, fieldName, value):
        """
        Set field data.

        Sets the field indicated by `fieldName` (a string) to the given
        `value`.  The structure of the value must match that of the
        field.

        `fieldName` can be used to provide the name of sub-fields, as
        described in `NestedRecord.field()`.
        """

        raise NotImplementedError

    def asRecord(self):
        """
        Convert a nested record to a non-nested equivalent record.

        This function creates a new vanilla ``Record`` instance
        equivalent to this one by *flattening* its fields.  Only
        bottom-level fields are included in the record.

        The data area of the record is copied into the new one.

        Example
        -------

        Let us take the following nested record:

        >>> nr = NestedRecord([1, (0, 0), ('a1', 1j)])

        Calling ``nr.asRecord()`` would return the same record as
        calling:

        >>> r = numarray.records.Record([1, (0, 0), 'a1', 1j])

        Please note that the shape of multi-dimensional fields is kept.
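
The flattening of values can be sketched in plain Python.  This is
only an illustrative sketch: ``flatten_record`` is a hypothetical
helper that is not part of this module or of ``numarray``, and it
assumes that a `formats`-style list is available to tell nested fields
from multi-dimensional ones.

```python
def flatten_record(values, formats):
    # Hypothetical helper, not part of this module or of numarray.
    # Flattens the values of one nested record, guided by a
    # formats-style list (strings for non-nested fields, lists for
    # nested ones).
    flat = []
    for value, fmt in zip(values, formats):
        if isinstance(fmt, str):
            # Non-nested field: keep the value (and its shape) as is.
            flat.append(value)
        else:
            # Nested field: splice in its flattened sub-fields.
            flat.extend(flatten_record(value, fmt))
    return flat


formats = ['Int64', '(2,)Float32', ['a2', 'Complex64']]
flat = flatten_record([1, (0, 0), ('a1', 1j)], formats)
```

The ``(0, 0)`` value stays a single item because its format is a
string, while the values of the nested field are spliced into the
top-level list.
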
        """

        raise NotImplementedError



## Local Variables:
## mode: python
## py-indent-offset: 4
## tab-width: 4
## fill-column: 72
## End: