POOMA Tutorial 12
Object I/O

Contents:
    Overview
    Object Serialization and Object Persistence Models
    Design of POOMA I/O
    What's in POOMA Version 2.2
    Using POOMA I/O
    The ObjectSet Interface
        ObjectSet Constructors
        ObjectSet::open()
        ObjectSet::flush() and ObjectSet::close()
        ObjectSet::store() and ObjectSet::retrieve()
        Queries on ObjectSet
    Data Types Supported in POOMA 2.2
    Use Case
        Doof2d Example Modified for POOMA I/O

Overview

The POOMA framework has been engineered to support rapid development of scientific and engineering applications. POOMA provides its users with a high-level C++ language interface for creating numerical applications optimized for performance on platforms ranging from desktop computers to parallel supercomputers with thousands of processors. POOMA data abstractions and programming models are general, flexible, and user-extensible.

The POOMA I/O classes have been designed to provide efficient I/O services while keeping to the design philosophy of POOMA. POOMA I/O supports the abstractions that make the POOMA framework powerful and flexible by making the classes that embody them persistent. As with the rest of POOMA, the I/O system is both flexible and extensible by users as well as by developers.

Object Serialization and Object Persistence Models

There are two broad categories of data management appropriate to object-oriented applications. The first is object serialization, and the second is object persistence.

The simplest I/O model is based on inserting data items into input or output streams. Data is typically extracted in the same order as originally stored. Object-oriented applications present special problems for I/O, since the ability of users to add new data types means that many if not most types are unknown to the system. Systems that support object serialization usually have some means of prescribing how the data contained in complex types is to be marshaled and inserted into a stream. Once this definition is in place, new object types can be read or written in the same way as intrinsic types. C++ allows users to overload the insertion and extraction operators (<< and >>) for this very purpose. However, as the structure of data types becomes more complicated, the burden falling on users to serialize new types for storage can be quite heavy. Several languages and frameworks provide means of facilitating object serialization. These include, for example, Java and Python.
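As a minimal sketch of the stream-based model just described, the following overloads << and >> for a small user-defined type. The Particle struct and the particleRoundTrip() helper are hypothetical names invented for this illustration; they are not part of POOMA.

```cpp
#include <cassert>
#include <iostream>
#include <sstream>

// Hypothetical user-defined type, used only for illustration.
struct Particle {
    int id;
    double x;
    double v;
};

// Serialize: write the members in a fixed order, separated by spaces.
std::ostream& operator<<(std::ostream& os, const Particle& p) {
    return os << p.id << ' ' << p.x << ' ' << p.v;
}

// Deserialize: read the members back in exactly the same order.
std::istream& operator>>(std::istream& is, Particle& p) {
    return is >> p.id >> p.x >> p.v;
}

// Round-trip a Particle through a string stream and report success.
bool particleRoundTrip() {
    Particle in = {7, 1.5, -0.25};
    std::stringstream stream;
    stream << in;                  // the producer inserts the object
    Particle out = {0, 0.0, 0.0};
    stream >> out;                 // the consumer must know a Particle comes next
    return out.id == in.id && out.x == in.x && out.v == in.v;
}
```

Note that the reader has to know both the type and the position of each item in the stream, which is the coupling between producer and consumer that the persistence model tries to relax.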

The next level of sophistication in object storage is object persistence. In an object persistence model, objects are stored as a collection of discrete entities, each individually retrievable at random from a collection of objects. A full-featured object-oriented database (OODB) knows enough about the structure of the object types in its collection to perform sophisticated queries based on object metadata.

There is often a tradeoff between these two categories of services. Object serialization is typically more efficient than object database persistence since data is simply marshaled and inserted into a stream. However, the requirement that data-consuming applications know what types of objects to expect as well as their sequence often leads to overly tight coupling between data-producing and data-consuming applications. Thus serialization is fine for monolithic applications performing what amounts to state dumps, but not as good for multi-application collaborative environments. On the other hand, there are many situations when one would just as soon not have the overhead of an object-oriented database no matter how streamlined.

Object-oriented applications benefit enormously from object-oriented data management. After all, the principal reason many programmers prefer object-oriented languages is so that they can create and exploit new data types. Object storage systems provide a way to store and retrieve user-defined types as easily as intrinsic types.

Design of POOMA I/O

The goal of POOMA I/O is to provide object serialization and object database persistence models, both of which have been shown to be extremely useful in object-oriented frameworks. The challenge is to make both of these capabilities flexible enough and lightweight enough to satisfy the requirements of the POOMA framework for extensibility and performance. Here we discuss the basic ideas behind the design of POOMA I/O in order to give the reader a feeling for how these sometimes conflicting requirements can be satisfied simultaneously.

The first level of the design is comprised of a set of classes called the storage classes that are transparent to users. They organize any given storage resource into byte records. The system does not necessarily know the internal structure of a byte record, only its length in bytes. Records are elements of byte arrays. Each array is independently accessible within a storage resource and each record or element of a byte array is also independently addressable. A range of elements within a byte array can be read or written in one operation. These byte arrays are automatically extended whenever an operation writes past the current number of elements. Arrays are members of a collection called a storage set which serves as the logical interface to storage in terms of arrays. The physical storage in this implementation is a disk file, but a storage set is an abstraction barrier that need not be associated with a file in general. For example, future implementations may support storage sets based on databases or remote application resources.
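The behavior of a byte array described above (opaque records, independent addressing, range writes, automatic extension) can be pictured with a small in-memory model. This is only a sketch under stated assumptions: ByteArraySketch, makeRecord() and extendedSizeExample() are hypothetical names, and the real storage classes work against a disk file rather than a std::vector.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical in-memory stand-in for one byte array in a storage set.
// Each element is an opaque record: only its length in bytes is known.
class ByteArraySketch {
public:
    typedef std::vector<unsigned char> Record;

    // Write 'count' records starting at element 'first'. The array is
    // extended automatically if the write goes past the current end.
    void write(std::size_t first, const Record* records, std::size_t count) {
        if (first + count > records_.size())
            records_.resize(first + count);
        for (std::size_t i = 0; i < count; ++i)
            records_[first + i] = records[i];
    }

    // Read a single record; each element is independently addressable.
    const Record& read(std::size_t index) const { return records_.at(index); }

    // Current number of elements.
    std::size_t size() const { return records_.size(); }

private:
    std::vector<Record> records_;
};

// Helper that turns text into an opaque record.
ByteArraySketch::Record makeRecord(const std::string& text) {
    return ByteArraySketch::Record(text.begin(), text.end());
}

// Writes two records at elements 3 and 4 of an empty array.
std::size_t extendedSizeExample() {
    ByteArraySketch array;
    ByteArraySketch::Record batch[2] = {makeRecord("abc"), makeRecord("de")};
    array.write(3, batch, 2);            // elements 0..2 become empty records
    assert(array.read(4).size() == 2);   // "de" is two bytes long
    return array.size();                 // the array now holds five elements
}
```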

The second level is made up of the object storage classes. These classes view storage as a set of typed objects called an object set. Any instance of a type supported by the I/O system can be stored along with a descriptive label in one operation. Object sets can be queried to reveal the number of objects contained, the types of objects contained, the number of objects of each type, and the labels of each object. A single operation is sufficient to retrieve an object given either its name, or its instance ID which is equivalent to its position in the list of object instances for a given type.

The storage of specific types is enabled by specializations of two generic classes: object serializers and object adapters. As one would infer from the discussion above, serializers serialize objects to a stream, whereas adapters adapt specific types to storage and retrieval in an object set. Adapters often use the services of serializers. The object storage classes in turn use the services provided by the storage set and byte array classes.
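One way to picture "specializations of a generic class" is a class template that is declared but left undefined, with one specialization per supported type. The sketch below follows that pattern for int and std::string. SerializerSketch and serializerRoundTrip() are hypothetical names chosen for this illustration, and the byte layout shown is an assumption, not the format POOMA writes.

```cpp
#include <cassert>
#include <cstddef>
#include <istream>
#include <ostream>
#include <sstream>
#include <string>

// Hypothetical primary template: left undefined so that using an
// unsupported type is a compile-time error.
template <class T> struct SerializerSketch;

// Specialization for int: copy the native bytes.
template <> struct SerializerSketch<int> {
    static void write(std::ostream& os, const int& v) {
        os.write(reinterpret_cast<const char*>(&v), sizeof v);
    }
    static void read(std::istream& is, int& v) {
        is.read(reinterpret_cast<char*>(&v), sizeof v);
    }
};

// Specialization for std::string: a length prefix followed by the characters.
template <> struct SerializerSketch<std::string> {
    static void write(std::ostream& os, const std::string& s) {
        const std::size_t n = s.size();
        os.write(reinterpret_cast<const char*>(&n), sizeof n);
        os.write(s.data(), static_cast<std::streamsize>(n));
    }
    static void read(std::istream& is, std::string& s) {
        std::size_t n = 0;
        is.read(reinterpret_cast<char*>(&n), sizeof n);
        s.resize(n);
        if (n > 0) is.read(&s[0], static_cast<std::streamsize>(n));
    }
};

// Write an int and a string to a binary stream, then read them back.
bool serializerRoundTrip() {
    std::stringstream stream(std::ios::in | std::ios::out | std::ios::binary);
    SerializerSketch<int>::write(stream, 42);
    SerializerSketch<std::string>::write(stream, std::string("label"));
    int i = 0;
    std::string s;
    SerializerSketch<int>::read(stream, i);
    SerializerSketch<std::string>::read(stream, s);
    return i == 42 && s == "label";
}
```

Supporting a new type then means adding one more specialization, without touching the code that calls write() and read().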

To support a different storage type or format, or to optimize I/O for performance, one need only modify the basic storage classes thus leaving the object storage classes unchanged. Several different types of storage can coexist in the same application. The benefit of this design is that new types can be supported simply by creating new serializer and adapter specializations. Our intent is to allow users as well as developers to extend the range of supported types by writing a small amount of new code, or by writing a simple high-level description of the new classes.

The main goal of the POOMA I/O design is to achieve a high level of support for object storage and management without incurring the overhead of a full-featured object-oriented database. Straightforward storage and retrieval operations are provided based on simple queries.

It was also considered important to expose the basic I/O mechanisms through the storage set and byte array classes so that developers could gauge the performance implications of an implementation based on generic storage abstractions. The separation of basic I/O from object management permits performance to be optimized without requiring modifications in any portion of the object management layer.

What's in POOMA Version 2.2

I/O for Version 2.2 of POOMA is experimental. As such it does not support the full scope of capabilities described above, nor the full complement of POOMA framework objects. The reason for including it in this release is to get user feedback and suggestions as early as possible.

Historically, an object persistence model was considered first and object serialization later. The compatibility of these two models, as well as a straightforward solution for supporting and leveraging both, emerged later in design iteration cycles. Thus, in this release users can store and retrieve POOMA objects in an object set, but cannot serialize the same objects to a standard output stream. This feature will be added in the next release.

Since storage adapters are currently hand-crafted, there are only a few basic types supported at this time. Experience gained in writing adapters and serializers for this release will allow us to semi-automate the process of adding support for new types. Some capability of this kind as well as full coverage of all POOMA objects is intended for the next major release of the software.

This release supports standard native binary I/O. Future releases may support storage using the HDF5 format.

Using POOMA I/O

This section describes the basic process of storing and retrieving objects in POOMA. The essential mechanism is very simple. Each supported type may be stored in a collection of objects called an object set. An object set is created, opened, and closed like a file. It has three templated member functions to perform object storage and retrieval operations, a store() function and two variants of retrieve() depending on whether the object is to be recovered by name or by ID. Simple query functions of the object set reveal its contents. Objects can be added to existing objects in an object set, or all objects can be removed upon opening. Once stored, an object cannot be deleted separately. To retrieve an object, the user must supply a default instance of the corresponding type. A typical session in which a user stores data would be as follows:

Create an object set giving it a name. This may be either a new object set or an existing one to which objects are to be added. To store objects, the access mode must be appropriate for write access.
Store one or more objects supplying a name or label for each. Names need not be unique.
Close the object set.

A session in which a user retrieves objects could be described in the following way:

Create an object set supplying a name matching an existing object set. To retrieve objects the access mode must be appropriate for reading.
Create default instances of objects matching the ones to be retrieved.
Retrieve the objects by giving either names or IDs.
Close the object set.

To aid in retrieving objects, a set of basic queries is provided by the object set interface. The essential functionality of object set queries may be summarized as follows:

Report the number of distinct types in the object set.
Report the name of a type given its index k, where k = 0, ..., (number of types -1).
For a given type indicated by type name or index, report the number of instances.
For a given type indicated by type name or index k, report the name of object j, where j = 0, ..., (number of instances -1).

The ID of an object is an integer (type long) that by convention is the position of the object in the list of instances of that type. That is, if an instance of a given type is second on the list, its ID is 1 (indexed from zero). The primary key for objects contained in an object set is the pair of attributes comprised of its type name or type index and its instance ID. Names are user-defined labels and are not primary keys, i.e., they are not unique and in fact may be null. If a request is made to retrieve an object by name, the object set restores the first instance that matches the name.
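The keying rules in the preceding paragraph can be exercised with a small in-memory model. The sketch below is not the POOMA ObjectSet: MiniObjectSet, TypeNameSketch and keyingExample() are hypothetical names, values are kept as text purely for brevity, and only int and double are covered. It shows that IDs are counted separately for each type and that retrieval by name returns the first match.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <sstream>
#include <string>
#include <utility>
#include <vector>

// Hypothetical type-name trait; only the types used below are covered.
template <class T> struct TypeNameSketch;
template <> struct TypeNameSketch<int>    { static const char* name() { return "int"; } };
template <> struct TypeNameSketch<double> { static const char* name() { return "double"; } };

// Hypothetical in-memory model of the keying rules described above.
class MiniObjectSet {
    typedef std::vector<std::pair<std::string, std::string> > Instances;
    std::map<std::string, Instances> byType_;   // type name -> (label, value) list

    // Returns the instance list for a type, or a null pointer if absent.
    const Instances* find(const std::string& typeName) const {
        std::map<std::string, Instances>::const_iterator it = byType_.find(typeName);
        return it == byType_.end() ? 0 : &it->second;
    }

public:
    // Returns the instance ID: the position in the list for that type.
    template <class T>
    long store(const T& t, const std::string& label) {
        std::ostringstream text;
        text << t;
        Instances& list = byType_[TypeNameSketch<T>::name()];
        list.push_back(std::make_pair(label, text.str()));
        return static_cast<long>(list.size()) - 1;
    }

    // Retrieve by ID. Returns 0 on success, nonzero otherwise.
    template <class T>
    int retrieve(T& t, long id) const {
        const Instances* list = find(TypeNameSketch<T>::name());
        if (!list || id < 0 || id >= static_cast<long>(list->size())) return 1;
        std::istringstream text((*list)[static_cast<std::size_t>(id)].second);
        text >> t;
        return 0;
    }

    // Retrieve by label: the first instance of type T with that label wins.
    template <class T>
    int retrieve(T& t, const std::string& label) const {
        const Instances* list = find(TypeNameSketch<T>::name());
        if (!list) return 1;
        for (std::size_t i = 0; i < list->size(); ++i)
            if ((*list)[i].first == label)
                return retrieve(t, static_cast<long>(i));
        return 1;
    }
};

bool keyingExample() {
    MiniObjectSet set;
    long a = set.store(10, "steps");     // first int: ID 0
    long b = set.store(20, "steps");     // second int, same label: ID 1
    long c = set.store(0.5, "factor");   // first double: ID 0 (counted per type)
    int byId = 0, byName = 0;
    int s1 = set.retrieve(byId, b);                        // picks the second int
    int s2 = set.retrieve(byName, std::string("steps"));   // picks the first match
    return a == 0 && b == 1 && c == 0 && s1 == 0 && s2 == 0
        && byId == 20 && byName == 10;
}
```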

The following section provides details of the object set interface.

The ObjectSet Interface

The object set interface is the main interface to object storage. To store and retrieve objects, an instance of an object set must exist in the user's application with an access mode appropriate to the intended storage operations.

ObjectSet Constructors

The following constructors create instances of object sets:

ObjectSet()    This is the default constructor. Constructed this way, an object set is unusable until an open() operation places it in an appropriate state attached to a particular storage resource.
ObjectSet(const std::string& name, StorageResourceType type, StorageAccessMode mode)    This is the primary constructor. The arguments are:
name    The name of the object set. For file-based storage (the only type for this release) this is literally the name of the file.
type    This is an instance of an enumerated type called StorageResourceType whose allowed values for this release are:

stdStorage Standard binary file

mode    An instance of an enumerated type called StorageAccessMode that defines the access mode. The allowed values are:

storageIn Read-only access

storageOut Write-only

storageOutTrunc Write-only; destroy existing data if the resource exists

storageInOut Read-write; append new data to existing data

storageInOutTrunc Read-write; destroy existing data if the resource exists
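For readers who think in terms of standard C++ file streams, the five access modes can be related to std::ios_base::openmode flags. The mapping below is an assumption made for illustration only; it is not taken from the POOMA sources. AccessModeSketch re-declares the enumerators locally so that the sketch compiles on its own, and toOpenMode() and hasFlag() are hypothetical helpers.

```cpp
#include <cassert>
#include <ios>

// Hypothetical re-declaration of the five access modes, for this sketch only.
enum AccessModeSketch {
    storageIn, storageOut, storageOutTrunc, storageInOut, storageInOutTrunc
};

// One plausible translation to standard stream flags (an assumption).
std::ios_base::openmode toOpenMode(AccessModeSketch mode) {
    const std::ios_base::openmode bin = std::ios_base::binary;
    switch (mode) {
    case storageIn:         return bin | std::ios_base::in;
    // Assumed: plain write-only access keeps existing data and adds to it.
    case storageOut:        return bin | std::ios_base::out | std::ios_base::app;
    case storageOutTrunc:   return bin | std::ios_base::out | std::ios_base::trunc;
    // in|out leaves existing data in place; where new records are placed
    // would be decided by the storage layer, not by the stream flags.
    case storageInOut:      return bin | std::ios_base::in | std::ios_base::out;
    case storageInOutTrunc: return bin | std::ios_base::in | std::ios_base::out
                                       | std::ios_base::trunc;
    }
    return bin | std::ios_base::in;  // not reached for valid enumerators
}

// True if every bit of 'flag' is set in 'mode'.
bool hasFlag(std::ios_base::openmode mode, std::ios_base::openmode flag) {
    return (mode & flag) == flag;
}
```

Under this assumed mapping, only the two Trunc modes carry the trunc flag, which matches the table: they are the only modes that destroy existing data.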

Example:

ObjectSet obset("DataFile.std", stdStorage, storageInOutTrunc);
Creates an object set obset as a binary file whose name will be "DataFile.std". The file is opened for read-write, but if a file by that name already exists, all existing data will be destroyed (i.e., the file will be truncated).

ObjectSet::open()

The open() operation assumes the existence of an object set and assumes that it has either been default constructed, or that it has been previously closed. There are two variants. They are:

int open(const std::string& name, StorageResourceType type, StorageAccessMode mode) Opens a default object set or closed set assuming all attributes of the object set are new. The arguments have the same meaning as in the main constructor. It returns 0 if successful.

int open(const std::string& name, StorageAccessMode mode) This variant assumes that the storage resource type has already been set. It generates an error if the object set has only been default constructed, and returns 0 if successful.

Examples:

status= obset.open("DataFile.std", storageIn);
assert(status==0);
Opens the previous file (assuming it has been closed) in read-only mode.
status= obset.open("OtherData.dat", stdStorage, storageOutTrunc);
assert(status==0);
Having closed the previous object set, this opens a completely different resource (again a standard binary file, the only type in this release) for output, destroying any pre-existing version.

ObjectSet::flush() and ObjectSet::close()

These functions respectively flush and close the object set. They take no arguments. The flush() function ensures that all objects are persistent, and close() closes the file or resource. close() invokes flush() before closing the resource.
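Because an unclosed object set may leave data unflushed, some applications may prefer to tie close() to scope exit. The following is a sketch of that idiom, not a POOMA facility: CloseOnExit is a hypothetical guard that works with any class providing close(), and FakeSet is a stand-in that only mimics the flush-before-close behavior described above. Whether the real ObjectSet destructor already closes the resource is not stated here, so the guard simply makes the call explicit.

```cpp
#include <cassert>

// Hypothetical scope guard: calls close() on any object that has one.
template <class Set>
class CloseOnExit {
public:
    explicit CloseOnExit(Set& set) : set_(set) {}
    ~CloseOnExit() { set_.close(); }
private:
    CloseOnExit(const CloseOnExit&);             // not copyable
    CloseOnExit& operator=(const CloseOnExit&);  // not assignable
    Set& set_;
};

// Stand-in with the same flush()/close() shape as described above.
struct FakeSet {
    int flushes;
    bool open;
    FakeSet() : flushes(0), open(true) {}
    void flush() { ++flushes; }
    void close() { if (open) { flush(); open = false; } }  // close flushes first
};

int flushCountAfterScope() {
    FakeSet set;
    {
        CloseOnExit<FakeSet> guard(set);
        set.flush();       // explicit checkpoint in the middle of a run
    }                      // guard closes the set here; close() flushes again
    assert(!set.open);
    return set.flushes;    // one explicit flush plus one from close()
}
```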

ObjectSet::store() and ObjectSet::retrieve()

These functions perform the main storage operations. There are two versions of retrieve() depending on whether one wants to retrieve an object by name or by ID.

template <class T>
long store(T& t, const std::string& objectName)    Stores an instance of the given type along with a user-defined label. The function returns the object ID assigned by the object set. Valid IDs are zero or greater.
t    The given memory-resident object instance.
objectName    The user-assigned name or label to be associated with this instance.
template <class T>
int retrieve(T& t, long id)        Retrieves an object given its ID. It returns 0 if successful.
t    The memory-resident object instance to be instantiated from the persistent version.
id    The ID for the stored instance.
template <class T>
int retrieve(T& t, const std::string& objectName)    Retrieves an object given its label. Labels are not unique. If there is more than one object of the given type with the same label, it restores the first one. It returns 0 if successful.
t    The memory-resident object instance to be instantiated from the persistent version.
objectName    The user-assigned name or label associated with this instance.

Examples:

int nTimeSteps=1000;
long id= obset.store(nTimeSteps, "Number of Time Steps");
assert(id>=0);
Stores the given int instance with the associated label "Number of Time Steps." An integer (long) ID is returned.
int nSteps;
int status= obset.retrieve(nSteps,id);
assert(status==0);
Retrieves the value previously stored given the ID, presumably known. Alternatively one could use:
status= obset.retrieve(nSteps,"Number of Time Steps");
assert(status==0);

Queries on ObjectSet

The following functions allow applications to query the status of an object set:

const std::string& name() const Returns the name of the object set.
StorageAccessMode mode() const Returns the current access mode.

bool isOpen() const Boolean operation to check whether the set is open.
bool isClosed() const Boolean operation to check whether the set is closed.

These functions query the contents of an object set:

int numTypes() const    Returns the number of types in the set.
int numInstances(const std::string& typeName)    Returns the number of instances of a given type referred to by type name.
typeName    The name of the type in question.
int numInstances(long typeID) const    Returns the number of instances of a given type referred to by type ID.
typeID    The type ID or index. Within a given object set, the types contained are indexed from 0, ..., (number of types -1).
const std::string& typeName(long typeID) const    Returns the type name given a type ID.
typeID    The type ID or index.
long typeID(const std::string& typeName)    Returns the type ID given the type name.
typeName    The name of the type in question.
const std::string& objectName(const std::string& typeName, long instanceID)    Returns the object name given a type name and instance ID.
typeName    The name of the type in question.
instanceID    The instance of this type. Instances are numbered from 0, ..., (number of instances -1) for a given type.
const std::string& objectName(long typeID, long instanceID)    Returns the object name given the type ID and the instance ID.
typeID    The type ID or index.
instanceID    The instance of this type.

Examples:

The following is based on the premise that the application has opened an existing file by creating an instance called obset in read-only mode. The application generates a report on the contents of the file.

std::string obsetName= obset.name();
int nTypes= obset.numTypes();
std::cout<<"Contents of ObjectSet "<<obsetName<<std::endl;
std::cout<<"Number of types = "<<nTypes<<std::endl;
if(nTypes!=0){
    std::cout<<"Type    Type Name    Number of Instances"<<std::endl;
    int numInstances;
    int j;
    for(int i=0; i<nTypes; i++){
        numInstances= obset.numInstances(i);
        std::cout<<i<<"    "<<obset.typeName(i)<<"    "
                 <<numInstances<<std::endl;
        std::cout<<"    Instance    Object Name"<<std::endl;
        for(j=0; j<numInstances; j++){
            std::cout<<"    "<<j<<"    "
                     <<obset.objectName(i,j)<<std::endl;
        }
        std::cout<<std::endl;
    }
}

The next example is based on a similar premise. In this case, the application knows that there are several instances of complex<double> called "Field Value". Complex numbers are a templated type in C++ whose conventional type designation in POOMA I/O is "std::complex<T>". The application collects the values by retrieving each instance of this type that matches the name and putting it in a standard C++ vector container.

std::vector<std::complex<double> > fieldVals;
std::complex<double> complexVal;
int nInstances= obset.numInstances("std::complex<T>");
int status;
for(int i=0; i<nInstances; i++){
    if(obset.objectName("std::complex<T>",i)=="Field Value"){
        status= obset.retrieve(complexVal,i);
        assert(status==0);
        fieldVals.push_back(complexVal);
    }
}

Data Types Supported in POOMA 2.2

The range of data types supported by the object persistence capability in POOMA Version 2.2 is considerably short of the full scope of POOMA, but basic enough that it should be useful. It should also give a reasonable demonstration of this emerging POOMA framework capability. In the next version, not only will the range of types be considerably broadened, but serialization as well as persistence will be supported. There will also be tools to facilitate inclusion of new types by users or developers. For now, the following are supported entities:

Intrinsic or atomic data types:

Type	Designation	Description
int	"int"	Native int
long	"long"	Native long
float	"float"	Native float
double	"double"	Native double

Complex number instances:

Type	Designation	Description
std::complex<T>	"std::complex<T>"	Complex numbers from the standard numerical library. T may be float or double.

Standard library strings:

Type	Designation	Description
std::string	"std::string"	Standard string of arbitrary length.

Pooma Vector instances:

Type	Designation	Description
Vector<Dim,T,Engine=Full>	"Vector<Dim,T>"	Pooma Vector class based on the standard Full engine where the dimension Dim may be any size, and T is int, long, float, double, or std::complex<T>.

Pooma Brick and Compressible Brick Arrays:

Type	Designation	Description
Array<Dim,T,Brick> and Array<Dim,T,CompressibleBrick>	"Array<Dim,T,Brick>" and "Array<Dim,T,CompressibleBrick>" respectively	Pooma Array of dimension Dim=1,... 7 of Brick or CompressibleBrick engine types. T may be int, long, float or double in this release.

Pooma Intervals:

Type	Designation	Description
Interval<Dim>	"Interval<Dim>"	Pooma Interval of dimension Dim=1,... 7.
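The designations in the tables above are literal strings: "std::complex<T>" is spelled with the letter T regardless of whether T is float or double. A traits class with a partial specialization is one way such a mapping could be expressed. The sketch below is hypothetical (DesignationSketch and designationOf() are not POOMA names) and covers only types from the standard library.

```cpp
#include <cassert>
#include <complex>
#include <string>

// Hypothetical trait: undefined for unsupported types, so they fail to compile.
template <class T> struct DesignationSketch;

template <> struct DesignationSketch<int>    { static std::string name() { return "int"; } };
template <> struct DesignationSketch<long>   { static std::string name() { return "long"; } };
template <> struct DesignationSketch<float>  { static std::string name() { return "float"; } };
template <> struct DesignationSketch<double> { static std::string name() { return "double"; } };
template <> struct DesignationSketch<std::string> {
    static std::string name() { return "std::string"; }
};

// Partial specialization: every std::complex<T> shares one designation.
template <class T> struct DesignationSketch<std::complex<T> > {
    static std::string name() { return "std::complex<T>"; }
};

// Convenience wrapper that deduces the type from an instance.
template <class T>
std::string designationOf(const T&) {
    return DesignationSketch<T>::name();
}
```

With this sketch, designationOf(std::complex<float>()) and designationOf(std::complex<double>()) both yield "std::complex<T>", which is the string an application would pass to queries such as numInstances().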

Use Case

The following use case demonstrates how object persistence in POOMA Version 2.2 would be used in an application. This is a modification of the Doof2d example (simple diffusion calculation) given in another tutorial. The additional I/O instructions are the statements that involve dataSet and strstrm, together with the comments that introduce them.

Doof2d Example Modified for POOMA I/O

// create arrays
Array<2> a, b;

// create an object set to store the data;
// truncate the file if it already exists
ObjectSet dataSet("Doof2dDB.dat", stdStorage, storageOutTrunc);

// get problem size
int n;
std::cout << "Size (typically 100-1000): ";
std::cin >> n;
int i, niters = n/2;

// create a description for this run using a string stream
// and then store as a string variable
std::ostringstream strstrm;
strstrm<<"This is a run of the Doof2d example with "
    <<" problem size N="<<n<<"."<<std::endl;
strstrm<<"Stencils were not used in this run."<<std::endl;
std::string descr= strstrm.str();
dataSet.store(descr,"Run Description");

// store the problem size and number of iterations
dataSet.store(n,"Problem Size");
dataSet.store(niters, "Number of Iterations");

// create array domain and resize arrays
Interval<1> N(1,n);
Interval<2> domain(N,N);

// store the problem domain interval
dataSet.store(domain,"Problem Domain Interval");

a.initialize(domain);
b.initialize(domain);

// get domains and constant for diffusion stencil
Interval<1> I(2,n-1), J(2,n-1);
const double fact = 1.0/9.0;

// store the numerical constant factor used in the stencil calculation
dataSet.store(fact,"Numerical Factor");

// reset array element values
a = 0.0; b=0.0;
double initialVal= 1000.0;
a(niters,niters) = initialVal;

// store the initial peak value
dataSet.store(initialVal,"Initial Peak Value");

// Run 9pt doof2d without coefficients using expression
std::cout << "Diffusion using expression ..." << std::endl;
std::cout << "iter = 0, a_mid = " << a(niters,niters) << std::endl;
for (i=1; i<=niters; ++i) {
  b(I,J) = fact * (a(I+1,J+1) + a(I+1,J  ) + a(I+1,J-1) +
                   a(I  ,J+1) + a(I  ,J  ) + a(I  ,J-1) +
                   a(I-1,J+1) + a(I-1,J  ) + a(I-1,J-1));
  a = b;
  std::cout << "iter = " << i << ", a_mid = " << a(niters,niters)
            << std::endl;

  // for each iteration store the result array
  // labeled by iteration number
  strstrm.str("");
  strstrm<<"Result at iteration "<<i;
  dataSet.store(a,strstrm.str());
}

dataSet.close();

If one were to write and execute the content report generator example given above on this file the output would read:

Contents of ObjectSet Doof2dDB.dat
Number of types = 5
Type    Type Name    Number of Instances
0    std::string    1
    Instance    Object Name
    0    Run Description
1    int    2
    Instance    Object Name
    0    Problem Size
    1    Number of Iterations
2    Interval<Dim>    1
    Instance    Object Name
    0    Problem Domain Interval
3    double    2
    Instance    Object Name
    0    Numerical Factor
    1    Initial Peak Value
4    Array<Dim,T,Brick>    (however many iterations)
    Instance    Object Name
    0    Result at iteration 1
    1    Result at iteration 2
    2    Result at iteration 3
    ... (however many iterations)

The next example assumes that the application programmer has some familiarity with the data-producing application. Let visArray(array,string) be the API to some hypothetical visualization tool that renders false color images of POOMA 2d arrays where array is the array and string is a standard string label for the plot. The following code segment would take the database file generated by the modified Doof2d example and produce plots.

ObjectSet dset("Doof2dDB.dat", stdStorage, storageIn);
int nIters;
int status;
status= dset.retrieve(nIters,"Number of Iterations");
assert(status==0);
std::string plotLabel;
Array<2> array;
for(int i=0;i<nIters;i++){
    plotLabel= dset.objectName("Array<Dim,T,Brick>",i);
    status= dset.retrieve(array,i);
    assert(status==0);
    visArray(array,plotLabel);
}
dset.close();

There are several other ways that the data could be recovered assuming less familiarity with the application, and using the object set queries to learn more. More sophisticated queries are needed in order to do a good job of acquiring data when nothing a priori is known about the contents of a dataset. Such queries are planned for the next version of POOMA.
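As one illustration of learning about a data set through the existing queries, the sketch below scans every type and instance for a given label and returns the matching (type ID, instance ID) pairs. findByLabel() is a hypothetical helper, written as a template so that it relies only on numTypes(), numInstances(long) and objectName(long, long) as listed in the query section. FakeCatalog is a stand-in that lets the sketch run without POOMA.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Scan every type and instance and collect the (type ID, instance ID)
// pairs whose label matches. Set is any class offering numTypes(),
// numInstances(long) and objectName(long, long).
template <class Set>
std::vector<std::pair<long, long> > findByLabel(Set& set, const std::string& label) {
    std::vector<std::pair<long, long> > hits;
    const long nTypes = set.numTypes();
    for (long t = 0; t < nTypes; ++t) {
        const long n = set.numInstances(t);
        for (long j = 0; j < n; ++j)
            if (set.objectName(t, j) == label)
                hits.push_back(std::make_pair(t, j));
    }
    return hits;
}

// Hypothetical stand-in so the sketch can be exercised without POOMA.
struct FakeCatalog {
    std::vector<std::vector<std::string> > labels;  // labels[type][instance]
    int numTypes() const { return static_cast<int>(labels.size()); }
    int numInstances(long typeID) const {
        return static_cast<int>(labels[static_cast<std::size_t>(typeID)].size());
    }
    const std::string& objectName(long typeID, long instanceID) const {
        return labels[static_cast<std::size_t>(typeID)][static_cast<std::size_t>(instanceID)];
    }
};

// Two types carry an object labeled "Problem Size"; both are found.
std::size_t labelScanExample() {
    FakeCatalog cat;
    cat.labels.resize(2);
    cat.labels[0].push_back("Problem Size");
    cat.labels[0].push_back("Number of Iterations");
    cat.labels[1].push_back("Problem Size");
    std::vector<std::pair<long, long> > hits = findByLabel(cat, "Problem Size");
    assert(hits[0].first == 0 && hits[1].first == 1);
    return hits.size();
}
```

The caller still has to map each type ID back to a C++ type through typeName() before calling retrieve(), since retrieve() needs a default instance of the correct type.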
