Module nestedrecords

Support for arrays of nested records.

This module provides the NestedRecArray and NestedRecord classes, which can be used to handle arrays of nested records in a way which is compatible with numarray.records.

Nested record arrays are made up of a sequence of nested records. A nested record is made up of a set of non-nested and nested fields, each of them having a different name. Non-nested fields have homogeneous n-dimensional values (where n >= 1), while nested fields consist of a set of fields (sub-fields), each of them with a different name. Sub-fields can also be nested.

Several utility functions are provided for creating nested record arrays.


Classes
NestedRecArray Array of nested records.
NestedRecord Nested record.

Function Summary
  array(buffer, formats, shape, names, byteorder, aligned, descr)
Create a new instance of a NestedRecArray.
  fromarrays(arrayList, formats, names, shape, byteorder, aligned, descr)
Create a new instance of a NestedRecArray from field arrays.

Function Details

array(buffer=None, formats=None, shape=0, names=None, byteorder='little', aligned=0, descr=None)

Create a new instance of a NestedRecArray.

This function can be used to build a new array of nested records. The new array is returned as a result.

The function works much like numarray.records.array(), with some differences:

  1. In addition to flat buffers and regular sequences of non-nested elements, the buffer argument can take regular sequences where each element has a structure nested to an arbitrary depth. Of course, all elements in a non-flat buffer must have the same format.

  2. The formats argument only supports sequences of strings and other sequences. Each string defines the shape and type of a non-nested field. Each sequence contains the formats of the sub-fields of a nested field.

    The structure of this argument must match that of the elements in buffer. This argument may have a recursive structure.

  3. The names argument only supports lists of strings and 2-tuples. Each string defines the name of a non-nested field. Each 2-tuple contains the name of a nested field and a list describing the names of its sub-fields.

    The structure of this argument must match that of the elements in buffer. This argument may have a recursive structure.

The descr argument is a new-style description of the structure of the buffer. It is intended to replace the formats and names arguments, so it cannot be used together with them [1].

The descr argument is a list of 2-tuples, each of them describing a field. The first value in a tuple is the name of the field, while the second one is a description of its structure. If the second value is a string, it defines the format (shape and type) of a non-nested field. Otherwise, it is a list of 2-tuples describing the sub-fields of a nested field.

If descr is None (or omitted), array() attempts to infer the whole structure of the array from that of the buffer, and assigns automatic names (c1, c2, etc., within each nested field) to all fields.

The descr argument may have a recursive structure.
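As a rough illustration of the automatic naming described above, the following sketch assigns names to a formats-style structure. The helper auto_names is hypothetical and not part of this module; it assumes that numbering restarts at c1 inside each nested field, and the scheme actually used by array() may differ in detail.

```python
def auto_names(formats):
    """Build a names-style list of automatic names for a formats structure.

    Hypothetical helper: assumes numbering restarts at c1 in each nested field.
    """
    names = []
    for position, fmt in enumerate(formats, start=1):
        name = 'c%d' % position
        if isinstance(fmt, list):
            # Nested field: a 2-tuple of its name and its sub-field names.
            names.append((name, auto_names(fmt)))
        else:
            # Non-nested field: a plain name.
            names.append(name)
    return names

print(auto_names(['Int64', '(2,)Float32', ['a2', 'Complex64']]))
# ['c1', 'c2', ('c3', ['c1', 'c2'])]
```

The result has the same shape as a names argument, so it could be passed wherever explicit names are expected.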

Please note that the names given in the names or descr arguments must not contain the string '/', since it is used as the field/sub-field separator by NestedRecArray.asRecArray(). If the separator is found in a name, a ValueError is raised.

[1] The syntax of descr is based on that of the __array_descr__ attribute in the proposed standard N-dimensional array interface.
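To make the separator rule concrete, here is a minimal sketch of how a nested descr could be flattened into '/'-separated paths, rejecting any name that already contains the separator. The function flatten_descr is hypothetical; it assumes that asRecArray() joins field and sub-field names with '/' in this way, which the text implies but does not spell out.

```python
SEPARATOR = '/'  # field/sub-field separator, as used by asRecArray()

def flatten_descr(descr, prefix=''):
    """Flatten a nested descr into a list of (path, format) pairs.

    Hypothetical helper, not part of the nestedrecords module.
    """
    flat = []
    for name, desc in descr:
        if SEPARATOR in name:
            # A '/' inside a name would make the flattened path ambiguous.
            raise ValueError('field name %r contains %r' % (name, SEPARATOR))
        path = prefix + name
        if isinstance(desc, str):
            # Non-nested field: keep its format as-is.
            flat.append((path, desc))
        else:
            # Nested field: prefix every sub-field with 'name/'.
            flat.extend(flatten_descr(desc, path + SEPARATOR))
    return flat

descr = [('id', 'Int64'), ('pos', '(2,)Float32'),
         ('info', [('name', 'a2'), ('value', 'Complex64')])]
print(flatten_descr(descr))
# [('id', 'Int64'), ('pos', '(2,)Float32'), ('info/name', 'a2'), ('info/value', 'Complex64')]
```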

When to use descr or formats and names

Since descr always requires both the name and the structure of every field, the formats and names arguments are more convenient when one does not want to specify names or structure explicitly: to leave names out, use formats alone; to leave the structure out, use names alone. When names and structure are both fully specified, descr is preferred over formats and names for the sake of legibility and conciseness.
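Because descr carries the same information as formats and names together, one form can be converted mechanically into the other. The hypothetical helper below (not part of this module) splits a descr into equivalent names and formats arguments, following the layout rules given for array(): plain strings for non-nested fields, 2-tuples in names and sub-lists in formats for nested fields.

```python
def split_descr(descr):
    """Convert a descr list into equivalent (names, formats) arguments.

    Hypothetical helper, not part of the nestedrecords module.
    """
    names, formats = [], []
    for name, desc in descr:
        if isinstance(desc, str):
            # Non-nested field: a plain name and a format string.
            names.append(name)
            formats.append(desc)
        else:
            # Nested field: recurse into the sub-field descriptions.
            sub_names, sub_formats = split_descr(desc)
            names.append((name, sub_names))
            formats.append(sub_formats)
    return names, formats

names, formats = split_descr(
    [('id', 'Int64'), ('pos', '(2,)Float32'),
     ('info', [('name', 'a2'), ('value', 'Complex64')])])
print(names)    # ['id', 'pos', ('info', ['name', 'value'])]
print(formats)  # ['Int64', '(2,)Float32', ['a2', 'Complex64']]
```

Applied to the descr used in Example 1, it yields exactly the names and formats arguments of the old-style call there.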

Disambiguating the buffer structure

Sometimes a field may be inferred either as a non-nested multidimensional field or as a nested field made up of several sub-fields. Consider the following buffer value:

[('x', (1, 2, 3.0))]

Is the second field a non-nested one with a (3,)Float64 format, or a nested field with an ['Int32', 'Int32', 'Float64'] format? In this case, since all three types can be promoted to a single type (Float64), the field is taken as a non-nested field of shape (3,). Now consider this case:

[('x', ('y', 2, 3.0))]

Here the types of the second field cannot be unified, so it is taken as a nested field with format ['a1', 'Int32', 'Float64'].

To avoid this kind of ambiguity, use the names, formats or descr arguments to force a structure.
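The inference rule above can be sketched as follows. This is a simplified model, not the module's code: it assumes Python ints map to Int32, floats to Float64 and strings to a<length>, matching the formats quoted above, and it ignores complex numbers and other types. The names infer_format and scalar_format are hypothetical.

```python
NUMERIC_RANK = {'Int32': 0, 'Float64': 1}  # promotion order: Int32 < Float64

def scalar_format(value):
    """Map a Python scalar to a format string (assumed mapping)."""
    if isinstance(value, str):
        return 'a%d' % len(value)
    if isinstance(value, int):
        return 'Int32'
    if isinstance(value, float):
        return 'Float64'
    raise TypeError('unsupported value: %r' % (value,))

def infer_format(value):
    """Infer the format of a single field value from a buffer element."""
    if not isinstance(value, tuple):
        return scalar_format(value)
    formats = [infer_format(item) for item in value]
    if formats and all(isinstance(f, str) and f in NUMERIC_RANK for f in formats):
        # Every item promotes to one numeric type: a non-nested field.
        common = max(formats, key=NUMERIC_RANK.get)
        return '(%d,)%s' % (len(value), common)
    # The item types cannot be unified: a nested field.
    return formats

print(infer_format((1, 2, 3.0)))    # (3,)Float64
print(infer_format(('y', 2, 3.0)))  # ['a1', 'Int32', 'Float64']
```

Passing explicit formats or descr sidesteps this guesswork entirely, which is why the text recommends it.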

Examples

The following examples help to clarify the rules above. Each of them creates an array of two elements, where each element has three fields: a 64-bit integer (id), a pair of 32-bit floating point values (pos), and a nested field (info). The nested field has two sub-fields: a two-character string (name) and a complex number in Complex64 format (value).

Example 1

In this example the array is created by specifying both its contents and its structure, so the structure of the arguments must be consistent with that of the contents.

This is how the array would be created in the old-style way, i.e. using the formats and names arguments:

>>> nra = array(
...     [(1, (0.5, 1.0), ('a1', 1j)), (2, (0, 0), ('a2', 1+.1j))],
...     names=['id', 'pos', ('info', ['name', 'value'])],
...     formats=['Int64', '(2,)Float32', ['a2', 'Complex64']])

And this is how the array would be created in the new-style way, i.e. using the descr argument:

>>> nra = array(
...     [(1, (0.5, 1.0), ('a1', 1j)), (2, (0, 0), ('a2', 1+.1j))],
...     descr=[('id', 'Int64'), ('pos', '(2,)Float32'),
...            ('info', [('name', 'a2'), ('value', 'Complex64')])])

Note how formats and descr mimic the structure of each of the elements in buffer.

Example 2

Here the structure of the array is simply inferred from that of the buffer argument.

>>> nra = array(
...     [(1, (0.5, 1.0), ('a1', 1j)), (2, (0, 0), ('a2', 1+.1j))])

If names were to be given to the fields, the structure of names should match that of buffer:

>>> nra = array(
...     [(1, (0.5, 1.0), ('a1', 1j)), (2, (0, 0), ('a2', 1+.1j))],
...     names=['id', 'pos', ('info', ['name', 'value'])])

Thus, the following call would fail with an IndexError:

>>> nra = array(
...     [(1, (0.5, 1.0), ('a1', 1j)), (2, (0, 0), ('a2', 1+.1j))],
...     names=['id', 'pos', ('info', ['name']), 'value'])

Those names would instead match a buffer structured like this:

[(1, (0.5, 1.0), ('a1',), 1j), (2, (0, 0), ('a2',), 1+.1j)]
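The structural check implied above can be sketched with a hypothetical helper, names_match, which only compares the nesting of names against one buffer element and ignores types. It is an illustration, not the check performed by array(), which reports the mismatch as an IndexError.

```python
def names_match(names, record):
    """Return True if names has the same nesting as one buffer element.

    Hypothetical helper: checks structure only, not types.
    """
    if len(names) != len(record):
        return False
    for name, value in zip(names, record):
        if isinstance(name, tuple):
            # Nested field: the value must be a tuple with matching sub-fields.
            _, sub_names = name
            if not isinstance(value, tuple) or not names_match(sub_names, value):
                return False
    return True

record = (1, (0.5, 1.0), ('a1', 1j))
print(names_match(['id', 'pos', ('info', ['name', 'value'])], record))  # True
print(names_match(['id', 'pos', ('info', ['name']), 'value'], record))  # False
```

Note that a non-nested name such as 'pos' accepts a tuple value, since that tuple may be a multidimensional field rather than a nested one.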

Example 3

Now the array is created from a flat string representing the data in memory. Names will be automatically assigned. For that to work, the resulting array shape and record format must be fully specified.

>>> datastring = binary_representation_of_data  # placeholder for the raw bytes of both records
>>> nra = array(
...     datastring, shape=2,
...     formats=['Int64', '(2,)Float32', ['a2', 'Complex64']])

Byte ordering and alignment are assumed to be those of the host machine, since they have not been explicitly stated via the byteorder and aligned arguments.

fromarrays(arrayList, formats=None, names=None, shape=0, byteorder='little', aligned=0, descr=None)

Create a new instance of a NestedRecArray from field arrays.

This function can be used to build a new array of nested records from a list of arrays, one for each field. The new array is returned as a result.

The function works much like numarray.records.fromarrays(), but arrayList may also contain nested fields, i.e. sequences of other arrays (nested or not). All non-nested arrays appearing in arrayList must have the same length.

The remaining arguments work as explained in array().
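The length requirement on arrayList can be sketched as a walk that follows a descr and measures every non-nested array, treating each nested field as one array per sub-field. Both helpers below are hypothetical and only illustrate the rule; they are not part of this module.

```python
def leaf_lengths(arrays, descr):
    """Collect the length of every non-nested array, guided by descr."""
    if len(arrays) != len(descr):
        raise ValueError('expected %d fields, got %d' % (len(descr), len(arrays)))
    lengths = []
    for array, (_, desc) in zip(arrays, descr):
        if isinstance(desc, str):
            # Non-nested field: one value per record.
            lengths.append(len(array))
        else:
            # Nested field: one array per sub-field, checked recursively.
            lengths.extend(leaf_lengths(array, desc))
    return lengths

def common_length(arrays, descr):
    """Return the shared length of all non-nested arrays, or raise."""
    lengths = leaf_lengths(arrays, descr)
    if len(set(lengths)) > 1:
        raise ValueError('non-nested arrays differ in length: %r' % (lengths,))
    return lengths[0] if lengths else 0

descr = [('id', 'Int64'), ('pos', '(2,)Float32'),
         ('info', [('name', 'a2'), ('value', 'Complex64')])]
arrays = [[1, 2], [(0.5, 1.0), (0, 0)], [['a1', 'a2'], [1j, 1+.1j]]]
print(common_length(arrays, descr))  # 2
```

The shared length is the number of records in the resulting array.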

Disambiguating the buffer structure

Ambiguities may arise in some cases. Consider the following arrayList value:

[[1, 2], [(0, 0.5), (1, 1)]]

Is the second field a non-nested one with a (2,)Float64 format, or a nested field with a ['Float64', 'Int32'] format? In this case, since both types can be promoted to a single type (Float64), the field is taken as a non-nested field of shape (2,). Now consider this case:

[[1, 2], [('x', 'y'), (1, 2)]]

Here the types of the second field cannot be unified, so it is taken as a nested field with format ['a1', 'Int32'].

To avoid this kind of ambiguity, use the names, formats or descr arguments to force a structure.
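The two readings above can be modeled as follows: the non-nested reading treats each element of the field as one record's value, while the nested reading treats each element as a whole sub-field array. This is a simplified sketch under an assumed type mapping (Python ints to Int32, floats to Float64, strings to a<length>); all names are hypothetical and not part of this module.

```python
NUMERIC_RANK = {'Int32': 0, 'Float64': 1}  # promotion order: Int32 < Float64

def scalar_format(value):
    """Map a Python scalar to a format string (assumed mapping)."""
    if isinstance(value, str):
        return 'a%d' % len(value)
    if isinstance(value, int):
        return 'Int32'
    if isinstance(value, float):
        return 'Float64'
    raise TypeError('unsupported value: %r' % (value,))

def common_format(values):
    """Return one format for all values, or None if they cannot be unified."""
    formats = {scalar_format(v) for v in values}
    if not formats:
        return None
    if len(formats) == 1:
        return formats.pop()
    if formats <= set(NUMERIC_RANK):
        return max(formats, key=NUMERIC_RANK.get)
    return None

def nested_reading(field):
    """Formats obtained by reading each element as one sub-field array."""
    return [common_format(element) for element in field]

def infer_field(field):
    """Prefer the non-nested reading when all values promote to one type."""
    lengths = {len(element) for element in field}
    flat = common_format([v for element in field for v in element])
    if len(lengths) == 1 and flat in NUMERIC_RANK:
        return '(%d,)%s' % (lengths.pop(), flat)
    return nested_reading(field)

print(infer_field([(0, 0.5), (1, 1)]))    # (2,)Float64
print(infer_field([('x', 'y'), (1, 2)]))  # ['a1', 'Int32']
```

Read as nested, the first value would give ['Float64', 'Int32'], which is the alternative the text mentions.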

Example

Let us build the sample array used in the examples of array(). In the old-style way:

>>> nra = fromarrays(
...     [[1, 2], [(0.5, 1.0), (0, 0)], [['a1', 'a2'], [1j, 1+.1j]]],
...     names=['id', 'pos', ('info', ['name', 'value'])],
...     formats=['Int64', '(2,)Float32', ['a2', 'Complex64']])

In the new-style way:

>>> nra = fromarrays(
...     [[1, 2], [(0.5, 1.0), (0, 0)], [['a1', 'a2'], [1j, 1+.1j]]],
...     descr=[('id', 'Int64'), ('pos', '(2,)Float32'),
...            ('info', [('name', 'a2'), ('value', 'Complex64')])])

Note how formats and descr mimic the structure of the whole arrayList.
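Because formats and descr mimic the structure of arrayList, the column-wise arrayList can be transposed back into the record-wise buffer used by array(). The hypothetical helper below sketches that transposition for the example above; it is not part of this module and performs no type conversion.

```python
def rows_from_arrays(arrays, descr):
    """Turn per-field arrays (fromarrays style) into records (array style).

    Hypothetical helper, not part of the nestedrecords module.
    """
    columns = []
    for array, (_, desc) in zip(arrays, descr):
        if isinstance(desc, str):
            # Non-nested field: already one value per record.
            columns.append(list(array))
        else:
            # Nested field: rebuild its sub-records recursively.
            columns.append(rows_from_arrays(array, desc))
    # Transpose columns into one tuple per record.
    return [tuple(values) for values in zip(*columns)]

descr = [('id', 'Int64'), ('pos', '(2,)Float32'),
         ('info', [('name', 'a2'), ('value', 'Complex64')])]
arrays = [[1, 2], [(0.5, 1.0), (0, 0)], [['a1', 'a2'], [1j, 1+.1j]]]
rows = rows_from_arrays(arrays, descr)
print(rows)
```

The resulting rows are the same buffer passed to array() in its examples, which is why both calls build the same sample array.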


Generated by Epydoc 2.1 on Thu Apr 21 13:11:50 2005 http://epydoc.sf.net