Developed at the Polish-Japanese Institute of Information Technology
© Copyright by ODRA team, © Copyright by PJIIT

ODRA – Object Database for Rapid Application development
Description and Programmer Manual

by Kazimierz Subieta and the ODRA team

4. ODRA Object-Oriented Store Model

ODRA is based on a UML-like object model, with complex objects, (nested) collections, classes, methods, static inheritance and binary associations. We plan to extend the UML object model with dynamic object roles and dynamic inheritance. The model fully covers the relational model, which can be considered a primitive object model where each tuple is an object and no inheritance, methods or associations are supported. This observation is important for building object-relational wrappers for ODRA. The ODRA model also covers the XML model, which (conceptually) offers hierarchies of nested objects with no classes, inheritance or associations. However, some minor features of the XML model have no direct counterparts in the ODRA store model; in particular, the order of XML subobjects is not supported in ODRA. In the same way the ODRA store model can be 1:1 compatible with the RDF model, with the Topic Maps model, etc. These properties of the ODRA store model make it possible to implement SBQL for many different data environments. Moreover, such an implementation can be strongly type checked and optimized by powerful SBQL query optimization methods.

4.1 Modules
In ODRA the basic unit of database organization is a module. As in popular object-oriented languages, a module is a separate component of an application. An ODRA module groups a set of database objects and compiled programs and can be a basis for reuse and for separation of programmers' workspaces. From the technical point of view, and according to the assumed object relativity principle, modules can be perceived as special-purpose complex objects that store metadata and data.

4.1.1 Module metabase and database
Each module includes (apart from some internal system data) two kinds of information: metadata stored in a metabase and data stored in a module database. A metabase stores information
needed during compilation of SBQL source code. It is used for query
analysis, type checking and optimization. Objects that are stored in the
metabase contain meta-information about objects stored in the database. For
example, for a declaration of a particular object (a variable) in the module
source code the metabase stores such information as the name of the object,
its type and its cardinality. Thanks to this information many type errors can
be detected during compilation[1].
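
The role of the metabase in compile-time checking can be illustrated with a minimal Python sketch. It is not ODRA code: the MetaVariable record, the metabase dictionary and the check_assignment function are hypothetical names introduced only for this illustration, and sal and hobby are example declarations. The sketch assumes that a metabase entry holds the three items named above (name, type, cardinality) and shows that a type error can be reported from metadata alone, without touching any stored data.

```python
from dataclasses import dataclass
from typing import Union


@dataclass(frozen=True)
class MetaVariable:
    """Hypothetical metabase entry for one declared object (variable)."""
    name: str
    type: str                    # e.g. "integer", "string"
    min_card: int                # minimal number of occurrences
    max_card: Union[int, str]    # maximal number of occurrences, or "*"


# Illustrative metabase content (assumed, not taken from a real ODRA module).
metabase = {
    "sal": MetaVariable("sal", "integer", 1, 1),
    "hobby": MetaVariable("hobby", "string", 0, "*"),
}

# Mapping from Python literal types to the atomic type names used above.
LITERAL_TYPES = {int: "integer", float: "real", str: "string", bool: "boolean"}


def check_assignment(name, literal):
    """Return the type errors of `name := literal`, using metadata only."""
    decl = metabase.get(name)
    if decl is None:
        return ["undeclared name: " + name]
    found = LITERAL_TYPES[type(literal)]
    if found != decl.type:
        return [name + ": expected " + decl.type + ", got " + found]
    return []


print(check_assignment("sal", 2500))    # well-typed: empty error list
print(check_assignment("sal", "high"))  # reports one type error
```

The check consults only the metabase dictionary, which mirrors the separation between metadata used by the compiler and data used at runtime.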
Moreover, the information stored in the metabase is essential for query optimization. The module metabase is used both during compilation and at runtime. In contrast, a module database stores only data needed at runtime.

4.1.2 System module
A new database contains a single default module called the system module. The system module is the root for all user defined modules. Additionally, it stores the data and metadata that can be perceived as ODRA standard library objects. All user defined modules automatically import the system module.

4.1.3 User defined modules
Each new user account added to a database server is assigned a default module that represents the root of the user defined database. The name of the module is the same as the name of the user account. All data created inside this module belong to the corresponding user. The user data can be additionally organized into sub-modules.

4.2 Objects, Nested Objects
In this document we primarily use the term object to denote stored data structures. Frequently there is a correspondence between such data structures and real-world objects, but this is a rather informal relationship that does not always hold. We make no distinction between objects and variables known from many programming languages. Sometimes the concepts are distinguished according to membership in classes: objects must be members of classes, while variables need not. Because there is only a very subtle difference between the class and type concepts, such a criterion is not firm. Hence we call any stored data structure an object or a variable, without assuming any syntactic or semantic difference between the concepts.

Our objects inherit a property of programming variables: objects can be stored structures only. SBQL queries never return objects, but values of objects and references to objects, perhaps within some complex structures, such as records and bags. We totally reject the so-called closure property, which claims that the input for queries (i.e. objects) and the output from queries belong to the same conceptual domain. Careful analysis of semantic situations convinced us that the closure property, understood in this way, is conceptual nonsense.

During the design of our data model we
have assumed important principles that govern semantic properties of objects. They are known as object relativity, total internal identification and orthogonal persistence. The principles are formulated as follows:

Object relativity: If some object O1 can be defined, then an object O2 having O1 as a component can also be defined. There are no limitations concerning the number of hierarchy levels of objects. Objects on any hierarchy level are treated uniformly. In particular,
an atomic object (having no sub-objects inside) should be allowed as a
regular data structure, independent from other structures. The relativity of
objects implies the relativity of corresponding query capabilities, i.e.
there should be no difference in language concepts and constructs acting on
different object hierarchy levels. Traditionally, an object consists of
attributes, an attribute consists of sub-attributes, etc. In SBQL there is no
need for such distinction: attributes, sub-attributes, pointer links between
objects, procedures, methods, views, etc. are objects too. The principle reduces the size of the database model, the size of the specification of query languages addressing the model, the size of the implementation, and the size of the documentation. It also makes both the database model and the corresponding query language easier to learn. By minimizing the number of concepts, the principle of object relativity supports the development of a universal theory of query languages, which is necessary to reason about query optimization methods.

Total internal identification: Each object that can be separately
retrieved, updated, inserted, deleted, authorized, indexed, protected,
locked, etc. should possess a unique internal identifier. The identifier is
not printable and the programmer never uses it explicitly. A unique internal
identifier should be assigned not only to objects on the top level of their
hierarchy, but to all sub-objects, including atomic ones. If some atomic
objects create a repeating group, e.g. a person has many hobbies, each object
in the group should possess a unique identifier. For persistent objects (i.e. database objects) their identifiers should be persistent too, i.e. invariant throughout the lifetime of the objects. We are not interested in the structure and meaning of internal identifiers. For us it is essential that all objects and all their sub-objects can be unambiguously identified through their unique internal identifiers. The principle makes it possible to create references and pointers to all possible objects, and thus to avoid conceptual problems with binding, scoping, updating, deleting, parameter passing, and other functionalities that require object references as query primitives. Note that object identifier is a purely technical term, in contrast to object identity, which belongs to another domain of discourse, related to business modeling
rather than to data structures.

Orthogonal persistence: There is no conceptual difference in typing and accessing persistent and volatile objects. In particular, a database can store individual objects (not only collections) and the volatile main memory of an application can contain collections of objects. Persistent objects are usually shared among many clients, hence they must obey the transactional semantics. However, persistent (but non-shared) objects can also be stored at the client side; in this case the transactional semantics is not necessary. ODRA introduces three kinds of persistence: permanent, which is stored on a server and shared; temporal, which is stored at a client and not shared; and local, which is assigned to a particular procedure, function, method or transaction call.

According to the object relativity principle each ODRA data element is an object with an internal identifier i, an external name n and a value v. At the lowest (physical) level there are three kinds of objects: atomic objects, pointer objects and complex objects.

The basic data store model (called M0) is a set of objects as described above and a set of identifiers of root objects (starting points for navigation in the database object graph). Usually the starting points are identifiers of modules. At the higher logical level a complex object is used to represent different kinds of conceptual objects: modules, metabases, classes, views, procedures, database links, indexes, and so on. An example of ODRA objects is presented in Fig.4-1.
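
The store model described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the ODRA implementation: the Store and Ref classes and the create and children functions are hypothetical, integer identifiers stand in for the non-printable internal ones, and the sketch assumes that the three kinds of objects are atomic, pointer and complex objects. The location values are made-up sample data.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Ref:
    """Value of a pointer object: the identifier of the target object."""
    target: int


@dataclass
class Store:
    """Sketch of M0: a set of objects plus identifiers of root objects."""
    objects: dict = field(default_factory=dict)   # oid -> (name, value)
    roots: list = field(default_factory=list)     # identifiers of root objects
    _next: int = 1                                 # next free identifier

    def create(self, name, value, parent=None):
        """Create an object <i, n, v>; attach it to parent or make it a root."""
        oid = self._next
        self._next += 1
        self.objects[oid] = (name, value)
        if parent is None:
            self.roots.append(oid)
        else:
            # The value of a complex object is the list of its sub-object ids.
            self.objects[parent][1].append(oid)
        return oid

    def children(self, oid, name):
        """Identifiers of all sub-objects of oid with the given external name."""
        return [c for c in self.objects[oid][1] if self.objects[c][0] == name]


store = Store()
module = store.create("demo", [])                 # complex object used as a root
emp = store.create("Emp", [], parent=module)      # complex object
store.create("name", "Poe", parent=emp)           # atomic object
store.create("sal", 2000, parent=emp)             # atomic object
dept = store.create("Dept", [], parent=module)
store.create("location", "Rome", parent=dept)     # sample value (assumed)
store.create("location", "Paris", parent=dept)    # same name again: allowed
store.create("employs", Ref(emp), parent=dept)    # pointer object

print(store.children(dept, "location"))           # two identifiers, one name
```

Every object, including each atomic sub-object, receives its own identifier, and external names need not be unique within a parent, which is all the sketch is meant to show.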

4.3 Structures
A structure in ODRA differs from Pascal records and from C/C++ structures. For stored objects, structures are distinguished in the typing system only. For instance, a sequence of objects (<i6, name, “Poe”>, <i7, sal, 2000>, <i8, worksIn, “Sales”>) can be considered a structure of the type record{name:string, sal:integer, worksIn:string}. In structure types the number of elements, their order, their names and their types are fixed. However, a structure is a concept related to the typing system only. In the object store model such a concept is not necessary: structures are simply ordered collections of objects.

In the case of query results, structures are sequences of elements that are not collections and that are results of queries. ODRA does not require each structure element to be named. Any result of a query, except a collection, can be an element of a structure; in particular, atomic values, references to objects and binders. For instance, <i1, i2, x(5)> is a structure instance having three elements: the identifiers i1 and i2 and the binder x(5).

4.4 Collections and Cardinalities
In the ODRA store model we assume
no uniqueness of external names on any level of object hierarchy. For
instance, in Fig.4-1 the name Emp is
assigned to three objects and the name Dept
is assigned to two objects. Within the “Trade” Dept object name location is assigned to two atomic sub-objects and within the
“Ads” Dept object name employs is assigned to two pointer
sub-objects. This is the way in which we deal with collections. Note that similar assumptions are made in XML. In this way we unify several concepts related to collections, such as sets, bags, extents and repeating attributes. We also abstract from the concepts of structure, record and tuple, as known e.g. from C/C++, Pascal and relational systems. For
the goal of building the formal semantics of query and programming languages
such notions are secondary and can be expressed in the terms of the ODRA
store model as complex objects.

In the ODRA store model a collection does not occur as a single entity having its own unique identifier. However, it is possible to create a complex object with subobjects of the same type. For instance, one can create an object Employees having many Emp objects. This is the only way in which a collection may obtain a unique identifier.

Because each object differs from other objects at least by its object identifier, it makes little sense to distinguish stored collections by their kinds, such as sets and bags (cf. the ODMG standard). The current ODRA version does not support the stored collection kinds known as sequence and array. Such extensions are planned for the next release.

The situation with collections is a bit different when we consider results returned by queries. In general, we consider the unification of collections stored in an object store and collections returned by queries to be conceptually doubtful[2]. For query results, the current ODRA version supports the collection types bag and sequence. Sequences may appear as the result of the order by (sorting) operator. The collection type set is not supported by the ODRA typing system; however, the programmer can make a set from a bag by applying the function distinct, just like in SQL.

In the ODRA typing system
collections are constrained by cardinalities (known e.g. from UML). A cardinality is a pair of symbols written as [min..max], where min is a non-negative integer denoting the minimal number of collection elements and max is a natural number or * denoting the maximal number of collection elements. The symbol * denotes “as many as you like”. For instance, [0..1] denotes a collection which is empty or contains one element, [1..1] is a collection having exactly one element, [0..*] is a collection having any number of elements and [1..*] is a non-empty collection having any number of elements. Other cardinalities are possible. If max is a number, then min ≤ max. Cardinality [1..1] is the default and can be omitted. Moreover, a collection with exactly one element is considered by the typing system as identical to that element. The cardinality [0..1] denotes an element which may occur or not. This is the way in which ODRA deals with the concept that is known from relational systems as NULL. In SBQL we apply a liberal typing system (called semi-structured) where any collection having exactly one element e is equivalent to this element e (thus e.g. comparisons of elements and one-element collections are possible) and each single element e can be considered a bag with e as a single element.
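
The cardinality rules and the two coercions can be summarized in a short Python sketch. The Cardinality class and the as_bag and as_element functions are hypothetical names chosen for this illustration; a Python list stands in for a bag, and None stands in for the symbol *. The sketch makes no claim about how the ODRA type checker is actually implemented.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Cardinality:
    """A cardinality [min..max]; max=None plays the role of '*'."""
    min: int = 1                 # default cardinality is [1..1]
    max: Optional[int] = 1

    def __post_init__(self):
        if self.min < 0:
            raise ValueError("min must be non-negative")
        if self.max is not None and self.min > self.max:
            raise ValueError("min must not exceed max")

    def admits(self, count):
        """True if a collection with `count` elements satisfies the cardinality."""
        return self.min <= count and (self.max is None or count <= self.max)


def as_bag(value):
    """Treat a single element as a bag with that element as its only member."""
    return value if isinstance(value, list) else [value]


def as_element(bag):
    """Coerce a one-element bag to its element; the check is made dynamically."""
    if len(bag) != 1:
        raise TypeError("a bag with %d elements is not a single element" % len(bag))
    return bag[0]


optional = Cardinality(0, 1)         # [0..1]: the element may be absent
many = Cardinality(1, None)          # [1..*]: non-empty, unbounded

print(optional.admits(0), many.admits(0))
print(as_element([2000]) == 2000)    # element vs. one-element bag comparison
```

An absent optional element is represented here by an empty bag, which corresponds to the way [0..1] replaces NULL in the text above.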
Note that similar coercion rules are also applied in SQL.

4.5 Links
In ODRA links are understood as triples <i1, n, i2>, where i1 is the reference of the link itself, n is an external name used in source code and i2 is a reference to the object that the link leads to. For instance, <i21, employs, i1> is a link (having the reference i21) that can be inserted into a Dept object and leads to an Emp object with the reference i1. Currently directed links (i.e. pointers) and bidirectional links (i.e. twin pointers) are supported.
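
A link triple and a pair of twin pointers can be modelled as follows. This is a minimal sketch, not ODRA internals: the Link class, the links dictionary and both helper functions are hypothetical, and the identifiers i5 and i22 as well as the reverse link name worksIn are assumptions added to complete the <i21, employs, i1> example.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Link:
    oid: str                     # reference of the link itself (i1 in <i1, n, i2>)
    name: str                    # external name n
    target: str                  # reference i2 of the object the link leads to
    owner: str                   # object into which the link is inserted
    twin: Optional[str] = None   # reference of the reverse link, if any


links = {}                       # link reference -> Link


def add_pointer(oid, name, owner, target):
    """Insert a directed link into `owner` that leads to `target`."""
    links[oid] = Link(oid, name, target, owner)
    return links[oid]


def add_twin_pointers(oid_fwd, name_fwd, oid_back, name_back, a, b):
    """Create a bidirectional link between a and b as two twin pointers."""
    fwd = add_pointer(oid_fwd, name_fwd, owner=a, target=b)
    back = add_pointer(oid_back, name_back, owner=b, target=a)
    fwd.twin, back.twin = back.oid, fwd.oid
    return fwd, back


# Dept object i5 employs Emp object i1; the reverse pointer is assumed.
fwd, back = add_twin_pointers("i21", "employs", "i22", "worksIn", a="i5", b="i1")
print(fwd)
print(back)
```

Each twin records the reference of its counterpart, so an update of one direction can locate and adjust the other.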
Bidirectional links are instances of the concept that is known as a relationship (in the Entity-Relationship Model or the ODMG standard) or an association (in UML). Links are strongly typed and can be updated, inserted and deleted. Links follow the orthogonal persistence principle, i.e. we do not restrict links to persistent and shared objects only. Links implement association instances known from UML; however, only binary associations with no properties and no association classes are supported.

Deleting any object A implies that all links leading to A are deleted (or nullified) too; hence no dangling links (links leading to garbage or improper objects) can appear. Note that we do not follow the opposite idea, known e.g. from Java, in which the programmer first removes or nullifies all the links that lead to A and object A is then removed by an automatic garbage collector. For several reasons, e.g. a restricted client subschema, such an idea is unsuitable for database objects: due to a limited view and limited access rights, the application programmer may be unable to remove or nullify all the links that lead to the object that he/she wants to delete. Hence, ODRA and SBQL provide an explicit deletion operator, just like SQL.

4.6 Procedures, Functions and Transactions
ODRA supports procedures and functions in the classical variant known from the majority of programming languages; arbitrary calls of procedures/functions from procedures/functions are supported, including any recursive calls. The novelty of ODRA procedures and functions concerns parameter passing and the return from a function (a functional procedure). Both the parameters and the returned result can be determined by SBQL queries. This allows one to make programs much more conceptual and shorter. ODRA basically supports the parameter passing method that is known as strict-call-by-value. The method means that the actual parameter is calculated before the function call, then it is named with the name of the formal parameter, and then the body of the procedure/function is executed. This parameter passing method combines call-by-value and call-by-reference, known e.g. from Pascal. No syntax distinguishes call-by-value and call-by-reference, just like in C/C++.
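
The three steps of strict-call-by-value (evaluate the actual parameter, name the result with the formal parameter name, execute the body) can be sketched in Python. Everything here is an assumption made for illustration: the Ref class, the store dictionary with its two sample salaries, the environment stack represented as a list of dictionaries, and the functions call, lookup and raise_salaries are not ODRA APIs. Queries are modelled as Python callables.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Ref:
    """A reference to a stored object, as a query may return it."""
    oid: str


store = {"i1": 2000, "i2": 3000}   # sample stored values (assumed)


def lookup(env_stack, name):
    """Bind a name in the topmost environment section that defines it."""
    for frame in reversed(env_stack):
        if name in frame:
            return frame[name]
    raise NameError(name)


def call(body, formal_names, actual_queries, env_stack):
    """Strict call-by-value: evaluate, name, then execute the body."""
    # 1. Every actual parameter (a query) is evaluated before the call.
    results = [query(env_stack) for query in actual_queries]
    # 2. Each result is named with the corresponding formal parameter name.
    frame = dict(zip(formal_names, results))
    # 3. The body runs in an environment extended with the new section.
    env_stack.append(frame)
    try:
        return body(env_stack)
    finally:
        env_stack.pop()


def raise_salaries(env_stack):
    """Body of a hypothetical procedure raise_salaries(s, amount)."""
    refs = lookup(env_stack, "s")            # a bag of references
    amount = lookup(env_stack, "amount")     # a plain value
    for ref in refs:
        store[ref.oid] += amount             # update through the reference
    return len(refs)


env_stack = []
updated = call(
    raise_salaries,
    ["s", "amount"],
    [lambda env: [Ref("i1"), Ref("i2")], lambda env: 100],
    env_stack,
)
print(updated, store)
```

The same calling mechanism serves both parameters: the first query happens to return references, so the body can update stored objects, while the second returns a value, and no extra syntax is needed to tell the two cases apart.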
The big advantage of the method is that it is simple to implement, fully consistent and allows for the declarative and macroscopic (many-data-at-a-time) processing that is implied by queries. Parameters of ODRA procedures and functions are typed. The result of a function is typed too. Types are strongly checked at compile time; when necessary, type checking is delegated to run time.

Procedures and functions can be persistent, i.e. they can be stored at a database server and shared among many clients. This accomplishes the paradigm that is known from relational database systems as database procedures. Procedures and functions are stored within modules or within classes. In the latter case they are called methods and by default they act on an environment that includes the internals of a class member object.

Concerning the source code, transactions in ODRA are similar to procedures. Except for the keyword transaction and the command abort, their semantic and pragmatic properties are the same as for procedures. Transactions are strongly type checked, may have parameters being queries, may have a local data environment and may return a result. Like procedures, transactions can be stored within modules or within classes, and can be stored on the server side (within the database) or on the client application side. Transactions can invoke other transactions without limitations (hence nested transactions are supported). Transaction invocations differ slightly from procedure invocations at run time because of the ACID semantics on shared resources. A transaction invocation can be aborted, and in this case its updates are canceled (rolled back). During runtime a transaction invocation is represented by a special object. ODRA uses the traditional (pessimistic) 2PL transaction processing algorithm, with deadlocks prevented by the wait-die method.

A more detailed description of procedures, functions and transactions will be presented in the relevant chapters of this documentation.

4.7 Views
For the Virtual Repository concept within the eGov Bus project we have applied a new approach to database views that allows us to achieve a power of updatable views that has not even been considered so far in the database domain. Our method has some commonalities with INSTEAD OF trigger views implemented in Oracle, SQL Server and DB2, but it is based on different principles, is much more powerful and efficient, and may address any object-oriented database model, including an XML data model. In general, the method is based on overloading generic updating operations (create, delete, update, insert, etc.) acting on virtual objects by invocations of procedures that are written by the view definer. The procedures are an inherent part of the view definition. The procedures have full algorithmic power, thus there are no limitations concerning the mapping of view updates into updates of stored data. ODRA updatable views allow one to achieve full transparency of virtual objects: they cannot be distinguished from stored objects by any programming option. This feature is very important for distributed and heterogeneous databases. ODRA views can be used as mediators on top of local resources to convert them virtually to the required format, as integrators that fuse fragmented data from different sources, and as customizers that adapt the data to the needs of a particular end user application. ODRA views are the basis for the Virtual Repository Management System that lies at the centre of the eGov Bus software.

Concerning storage, views share the properties of procedures, functions and transactions. In particular, they can be stored within modules on a database server, within modules of client applications or within classes. In the latter case views accomplish the feature that is known as virtual attributes. Views are first-class entities that can be dynamically inserted into or removed from a particular environment.

A more detailed description of ODRA views will be presented in the relevant chapters of this documentation.

4.8 Classes, Inheritance, Polymorphism, Types and Schemata
A class in ODRA is a programming entity having two forms:

A class has some number of member objects. During processing of a member object the programmer can use all properties stored within its class. Classes can be connected into an ODRA schema, as shown in Fig.2-4.

As in the UML object model, classes inherit properties of their superclasses. Multiple inheritance is allowed, but name conflicts are not automatically resolved (similarly to UML). A method from a class hierarchy can be overridden. An abstract method can be instantiated differently in different specialized classes (due to late binding); this feature is known as polymorphism.

ODRA assumes strong or semi-strong type checking of all the programming entities and contexts. Strong typing is a prerequisite for query optimization and for resolving some ambiguities or ellipses that may occur in SBQL queries. For some purposes, however, strong typing can be switched off. The ODRA typing system includes atomic types (integer, real, string, date, boolean) that are known from other programming languages. Further atomic types are considered, but not implemented yet. The programmer can also define his/her own complex types known as records. All type constructors can be nested with no limitations. Collection types are specified by cardinality numbers, for instance, [0..*], [1..*], [0..1], etc.

The ODRA internal typing system checks some attributes that are assigned to type signatures. Currently the following attributes are supported:

Other type signature attributes are considered, e.g. type name (for type equivalence based on type names), binary large object (for checking operations on multimedia) and side effects of queries and functions. The typing system also performs several automatic coercions (changing types) and automatic dereferences. For instance, a bag can be coerced to an element of this bag. If necessary, coercions are checked dynamically.

A database schema in ODRA is a specification of object types, classes and declarations that supports the majority of the elements known from UML.

A more detailed specification of the ODRA types, classes and schemata will be given in the following chapters of the document.

Last modified: June 17, 2008

[1] It is also possible to execute the system in a special “unsafe”, un-optimized mode with compile-time query analysis switched off and all the control moved to the runtime environment.
[2] See SQL, where stored collections (tables) are unordered sets, but collections returned by an SQL query can be sets (application of the distinct operator), bags (in the typical case) and sequences (application of the order by operator).