Background:
    - Explain what groups are used for in HDF5
    - Explain about how hard & soft links work
    - Try to explain how B-tree & symbol tables are implemented currently (make
        some pictures!)

Feature Requests:
    - External:
        - Access links in a group according to the order they were created
            (netCDF4 & CGNS)
        - "Large" empty groups (Boeing) [Inefficient symbol table format
            (unneccesary/unused "cached info" field and other issues)]
        - Checksum file metadata (LANL)
        - non-ASCII link names (Intel?)
    - Internal:
        - Inefficient method for computing # of objects in group [must traverse
            entire B-tree for group, adding up the number of links in each
            symbol table]
        - Inefficient "local heap" storage method for names (heap can get
            large for atomic I/O operation to bring into metadata cache and
            heap data must be recopied in file when heap grows)
        - Entire B-tree must be traversed in order to reach link info, even for
            "early" exact matches of entries in the tree (inefficient)
        - References are allowed to dangle, which can cause bugs when objects
            they point to are deleted
        - Possibly other kinds of ordering of links in groups

Summary of New Use Cases:
    - Open n'th object in link creation order (or by other order)
    - Iterate over links in creation order (or by other order)
    - Detect file corruption by checksumming file metadata
    - Efficiency concerns with current format
    - Correct errors with current reference implementation
    - Store non-ASCII link names

New Requirements/Features:
    - From use cases:
        - Iterate over links in creation order
        - Open n'th object in link creation order
        - Open n'th object in alphabetic order
        - Track # of objects in group
        - Support non-ASCII character sets (ie. Unicode)
        - Checksum group metadata: (file format)
            - Object header
            - B-tree nodes
            - "Local" heap (or whereever link info is stored)
        - Make empty groups take much less space
        - Eliminate wasted space in each link
        - Revise references to be a form of link (or at least not allow them to
            dangle)
        - Add version # to B-tree nodes
        - Revise local heap format to allow multiple non-contigous blocks
        - Allow "early out" for exact matches during B-tree traversal
        - Allow indices on other aspects of links in the future
    - Interesting ideas not directly from use cases:
        - Add back-pointers to linked objects
        - Compress group info
        - Attributes on links
        - Add redundant copies of each piece of metadata, to possibly
            recover from metadata corruption

"No Format Change" Implementation Option:
    - Why?
        - Forwardly compatible with older libraries (sorta)
        - Probably less work (since they capture less requirements)
    - Why Not?
        - Older libraries modifying the file can corrupt "high level" data
            structures used to implement
        - "High level" data structures used to implement are in "user space"
    - Requirements Captured
        - Probably just these:
            - Iterate over links in creation order
            - Open n'th object in link creation order
            - Open n'th object in alphabetic order
            - Track # of objects in group
        - With more contortions, we could capture more, though...
    - Open Questions/Issues
        - Should existing API functions update "high level" data structures
                created (see implementation options, below)?
            - Yes, probably, since we want existing API functions to keep
                everything in sync
        - Doesn't address inefficiencies/drawbacks of current group storage
            mechanisms.
        - "High level" data structures can get out of sync if file is modified
            by older version of the library.
        - All of the implementation options put their new information in places
            that are visible to the user and can be modified by them.
            (This could be good (in rare cases, probably) or bad (most of the
            time...)  Also, the name(s) chosen for the objects to implement
            this method of implementing features could clash with names a user
            wanted to use.
    - Possible Implementation Options:
        - Attribute on each object linked to from a group
            - Implemented by storing "creation time" attribute on each object
                linked to from a group.
            - Pros:
                - Deleting a link in a group is handled easily and doesn't
                    require extra maintenance of other data structures
            - Cons:
                - This won't work for soft links, since they are allowed to
                    dangle and therefore might not have an object to hang the
                    "creation time" attribute on.
                - This is a nightmare for objects in multiple groups.  We'd
                    have to create multiple "creation time" attributes on the
                    object, each with some way to correlate back to the group
                    the attribute belonged to.
                - Determining the n'th object in creation order would require
                    accessing all the objects in the group, in order to gather
                    and sort the "creation time" attributes.
                - Iterating over objects in the group will need to access all
                    the objects in the group, in order to gather and sort the
                    "creation time" attributes.
        - Attribute on group for each link in it.
            - Implemented by storing a "creation time" attribute on each group
                for each of the links in the group.
            - Pros:
                - Works for soft links
                - Works for objects in multiple groups
            - Cons:
                - Will need some sort of naming scheme for the attributes that
                    involves the link name in either the name of the attribute
                    or stored in the attribute data, so the creation time
                    stored in the attribute can be correlated with the
                    appropriate link name in the group.
                - Duplicating link names (in group info and in attribute info)
                    makes storage space inefficiencies for groups worse
                - Would create potentially _lots_ of attributes on group
                    objects.  Since attributes are searched for in a linear
                    fashion, this can make operations on attributes in general
                    on groups very slow.  It will also fragment up the object
                    header for the group, making I/O on the it perform poorly.
                - Finding the appropriate attribute to delete requires a linear
                    access of all the attributes, in order to find the creation
                    time attribute which correlated to the link being deleted.
                    Also, deleting a link from a group requires deleting its
                    creation time attribute from the group, making object
                    header fragmentation issues worse.
                - Determining the n'th object in creation order would require
                    accessing all the attributes on the group, in order to
                    gather and sort the "creation time" attributes.
                - Iterating over objects in the group will need to access all
                    the attributes on the group, in order to gather and sort
                    the "creation time" attributes.
[Could pack all this information into one variable-length datatype attribute,
which has its own set of pros & cons]
        - Add dataset to each group to maintain "creation time" index
            - Implemented with a unlimited length 1-D array of variable-length
                strings (the names of links in the group), stored in the order
                they are created in the group.
            - Pros:
                - If the chunk size is 1 element and we make a way to delete
                    chunks from a dataset, deleting entries from the middle
                    of a the dataset is relatively cheap and easy.  (Otherwise,
                    deleting entries from a group is very painful, as all the
                    entries following the deleted one would need to be copied
                    up one position in the dataset)
                - Works for soft links
                - Works for objects in multiple groups
            - Cons:
                - Duplicating link names (in group info and in dataset info)
                    makes storage space inefficiencies for groups worse
                - Link names for dataset would be stored in "global" heap,
                    which would compete for space in the metadata cache with
                    the "local" heap info for the group's link name info
                - "Global" heaps can fragment the file badly in certain
                    circumstances  (This is true in general for "global" heap
                    info though, and not limited to their use in this case)
                - 1 element chunks make for inefficient chunked storage B-trees
                    (If API for deleting elements was implemented instead of
                    API for deleting chunks, this could be mitigated by using
                    larger chunks, but then more data would need to be shuffled
                    around when an element was deleted)
                - Would need to implement user-level API for deleting
                    chunks (or elements) from [the middle of] datasets.
                    (Could be "pro" if desired by users, though)
                - Deleting a link from the group would mean a linear traversal
                    of the dataset, to find the matching entry to remove for
                    the link
                - Current B-tree implementation not fast for finding the
                    n'th entry (since it needs to start at the beginning)
                    (Still, this is faster than the attribute methods outlined
                    earlier)
        - Add group to each group to maintain "creation time" index
            - Implemented with an "index" group within each "normal" group,
                which used a string form of "creation time" as the 'name' of
                each object in the index group and linked to the same object as
                the links in the normal group.
            - Pros:
                - Works for soft links with absolute path (soft link info for
                    links in index group is same as soft link info for links
                    in normal group)
                - Don't need new APIs for deleting elements from middle of
                    dataset
            - Cons:
                - Storing creation times in text form is inefficient
                - Creation times as names is stored in another "local" heap
                    which competes for space in the metadata cache with
                    normal group info
                - Soft links with relative path in normal group would need to
                    be converted to use absolute path name in index group
                    (since a path name of the form "foo/bar" in the normal
                    group would need to be something like "../foo/bar" in
                    the index group (since it's a group hanging off the normal
                    group) and we don't support "upward" traversals in the
                    group hierarchy).  This conversion could be difficult or
                    impossible to perform.
                - Deleting a link by name doesn't have a conclusive way to
                    remove correct link in index group, since the only way
                    to locate the link removed is to perform a linear search
                    of the index group for a link with the same object header
                    address (or soft link name) and if the same object is linked
                    to multiple times in the group (with different link names),
                    there is no way to determine which is the correct entry to
                    remove in the index group.
[Could kludge this by including link name after creation time 'name' somehow]
                - Current B-tree implementation not fast for finding the
                    n'th entry (since it needs to start at the beginning)
                    (Still, this is faster than the attribute methods outlined
                    earlier)
[Syncronization overhead for parallel operation, for all these 'user mode'
implementation]
                

"Format Change" Implementation Option:
    - Why?
        - Should be able to capture all requirements (if desired)
        - Adding some of these features means making core change to group
            functionality and if we're updating the format, we might as well
            sweep as many changes in as possible to avoid many updates to the
            format.
        - "Aggressive" format change option (outlined below) moves us toward
            indexing on datasets.
    - Why Not?
        - Not forwardly compatible (ie. older libraries can't read) (could
            be good though, since they can't mangle the information)
        - Users might be reluctant to change due to what they percieve as a
            format change
        - Lots of work/time to get new features done
        - Lots of work/time to maintain new format as well as older one in
            source code (hard to tell if this would be more work/time than
            the "no format change" implementations)
    - Requirements Captured
        - Could be all, but probably will need to just choose highest priority
            ones.
        - My lists would be:
            - Definitely:
                - Iterate over links in creation order
                - Open n'th object in link creation order
                - Open n'th object in alphabetic order
                - Track # of objects in group
                - Checksum group metadata:
                    - Object header
                    - B-tree nodes
                    - "Local" heap (or whereever link info is stored)
                - Make empty groups take much less space
                - Eliminate wasted space in each link
                - Revise references to be a form of link (or at least not allow
                    them to dangle)
                - Revise local heap format to allow multiple non-contigous
                    blocks
                - Add version # to B-tree nodes
                - Allow "early out" for exact matches during B-tree traversal
            - If possible:
                - Support non-ASCII character sets (ie. Unicode)
                - Add back-pointers to linked objects
                - Allow future indices on other aspects of links
                - Compress group info
            - Probably not:
                - Attributes on links
    - Open Questions/Issues
        - What other information about a link (besides the link name and its
            creation time) do we want to capture (and possibly index) for
            every link?
        - Are there other types of links besides hard & soft links? (since those
            two mostly focus on how we get to the linked-to object)
        - Back-pointer ideas:
            - Adding back-pointers to objects makes groups sort of
                "two way", by having links to objects within them and
                having links back to other objects that point to them.  Does
                this have any interesting implications?  (ie. could we
                traverse the group graph in reverse?)
            - If we add back-pointers to objects, should they include
                "normal" object & dataset region references, or just
                links from groups?
            - If we add back-pointers to objects, how should we "name"
                back-pointers?  (ie. should they be named, like the links
                to objects from groups, or should they be unnamed and chosen
                by specifying the n'th back-pointer?)
        - Should we allow "old style" groups to be created?  Should we allow
            mixing old & new style groups in the same file?  (I can think
            of ways to stop this, but they aren't very nice to users so we
            should probably allow this as gracefully as possible)
    - Possible Implementation Options:
        - Revise format data structures: (the "Conservative" solution)
            - Keep general current idea of groups (B-trees on links), but
                update to enable features desired.
            - Format Changes:
                - Revise "Group" message to allow for new information
                - Revise format of links
                - Revise format of B-trees
                - Revise format of "local" heap
        - Create new format data structures: (the "Aggressive" solution)
            - Revise "group" to actually be a dataset of vlen link info
                elements with indices on fields of the link info (like the link
                name and creation time)
            - Notes:
                - It would be nice if the dataset used for storing the group's
                    info didn't have coordinates associated with each element
                    and was just a "bag" of elements (link info objects) that
                    could be added to or subtracted from, and the indices on
                    the dataset were the only way to locate elements.
                - Is there a useful way to give this sort of feature to users
                    of datasets?  Probably, if we implement a new "indexed"
                    storage mechanism for datasets, in addition to contiguous,
                    chunked, etc.  We'd also have to think about how this would
                    affect H5Dwrite/H5Dread, since they are so closely tied up
                    with using coordinates for selecting the elements to
                    perform I/O on.  Assumption is to use B-trees for storing
                    indices on the elements.  This could be important for
                    NCASSR indexing work...
            - Format Changes:
                - Revise "Group" message to [actually] be a dataset, with a
                    dataspace(?), datatype, etc.  (would use new "indexed
                    storage" mechanism mentioned below)
                - Add new new storage mechanism for dataset's raw data: indexed
                    storage.
                - Create new type of heap which would be more efficient for
                    collections of variable-length info.  (Breaking the
                    information up into smaller blocks probably, etc.)
                - Update references so that they act like hard links [how to
                    handle current idea of soft links in groups?]

API Changes:
    - Depend on requirements captured and/or features added
    - Will probably require adding "group creation" and "group access"
        property lists, in order to control some of the new features.
    - Existing API routines may need to be changed or augmented (with new
        API functions) to handle operations on creation order as well as
        link name.


Key Insights:
    - What we are really trying to do with groups is create different types
        of indices on link objects within the group, as if the group were a
        dataset whose elements were links and we are indexing on fields of
        each link.
    - Current datasets are like "bags" of elements, with an implicit
        "coordinate" field for each element, indexed by the "coordinate" field,
        and stored according to this implicit index.  [Is there a useful way
        to take advantage of storing elements according to an type of implicit
        index?] [Packing elements this way makes it very painful to insert
        and/or delete elements in the "middle" of the index]