Imperative Constructs in SBQL
Back to Description
of SBA and SBQL.
In SBA we reconstruct
query languages’ concepts from the point of view of programming languages
(PLs). The approach is motivated by our belief that there is no definite border
line between querying and programming; thus there should be a universal theory
that uniformly covers both aspects. The design of modern and universal database
PLs having querying capabilities requires methods and principles that are
already acknowledged by the common practice of developing compilers and
interpreters. Practically useful PLs deal with object-oriented notions
(classes, methods, inheritance, etc.), procedures and functional procedures
(including recursive ones), parameter passing methods, various control
statements, binding strategies, scoping rules, modularization, strong typing,
etc. They should follow software engineering principles such as orthogonality,
modularity, minimality, universality, genericity, typing safety, and clean,
precise semantics. SBA is an alternative to theoretical concepts emerging on
the object-oriented wave, such as nested relational algebras, object algebras,
object calculi, F-logic, comprehensions, structural recursion, monoid calculus,
functional approaches, etc. Careful analysis of them and our implementation experience
convinced us that all of them are limited and inadequate for this kind of
query/programming languages that we intend to develop.
The SBA solution relies on adopting
a run-time mechanism of PLs and introducing necessary improvements to it. The
main syntactic decision is the unification of PL expressions and queries;
queries remain the only kind of PL expressions. For instance, in SBA there is
no conceptual difference between expressions such as 2+2 and (x+y)*z and queries such as Employee where Salary = 1000 or (Employee where Salary = (x+y)*z).Name . All such expressions/queries can be used as
arguments of imperative statements, as parameters of procedures, functions or
methods and as a return from a functional procedure. This approach is unique to
SBQL – other PLs that involve querying capabilities always separate
queries and expressions (especially if SQL is used as a query language). Even
PLs that are claimed to be “integrated” with queries, such as
PL/SQL or .Net C# + LINQ, assume some subdivision between queries and
expressions. We consider this as an obvious sign of the impedance mismatch - a severe plague of
current database applications. We believe that currently only SBQL consequently
and in 100% has removed this plaque.
SBQL
queries can be embedded within statements that can change the database or
program state. Concerning this issue we follow the state-of-the-art standards
known from majority of programming languages. Typical imperative constructs are
creating a new object, deleting an object, assigning new value to an object (updating) and inserting an object into another object.
We also introduce typical control and loop statements such as if…then…else…, switch,
while loops, for each iterators and others. Some peculiarities are implied by
queries that may return collections; thus there are possibilities to generalize
imperative constructs according to this new feature.
Treating
queries as programming language’s expressions requires an essential
assumption concerning the semantics of queries. There was a lot of discussion
in the literature concerning what queries have to return. For instance, an SQL
query returns a tables that is a value rather than a collection of
references to tuples (tuple identifiers). The ODMG standard assumes that
queries return objects what is an
obvious nonsense (see the discussion on the closure
property) or some elliptic terminology. However, for some updating
operators, in particular, for assignment, deleting and inserting,
queries-expressions should return references
to objects (i.e. object identifiers) rather than values or objects.
Therefore during the development of SBA and SBQL we have returned to the
classical notions of programming languages. In SBQL from the very beginning we
have assumed that queries never return objects but references to objects (in
particular). This assumption is inevitable if one has to adopt queries as expressions
of a programming language.
The
presence of imperative constructs gives for the programmer some freedom if
he/she should use queries or a sequence of programming statements. This freedom
is first of all constrained by the conceptual modeling – the programmer
should program his/her task as simply as it can be from his/her point of view.
Usually declarative queries are much more comprehensive than sequences of
commands. There is also a performance aspect: declarative queries are much more
prospective for automatic query optimization than sequences of imperative
statements.
In this
chapter we describe in detail all the imperative constructs that we suggest to
introduce in an integrated query/programming language. However, this is only a
suggestion – we do not want to declare which construct can belong to SBQL
and which cannot. Our goal is to develop the principle, not details, and to
provide the basis for further examples presented in this book.
In general, we make no difference
between objects, as understood in
object-oriented programming languages, and variables,
as understood in classical programming languages. Both concepts are considered
triples <i, n, u>, as defined
in the store models AS0, AS1, AS2
and AS3, where i is an internal unique object identifier, n is an object name (not necessarily unique), and u is an object value, which can be
atomic (e.g. integer value), can be a pointer (an identifier of another object)
or can be a set of objects. This definition is recursive and can define objects
with any number of nested objects, with many hierarchy levels. In this
description we avoid to use the term variable,
naming variables objects too. Some
people distinguish objects and variables according to their attitude to
classes: objects are members of classes, unlike variables. However this distinction
is secondary looking at the semantics of SBQL: we can simply assume that
variables are objects for which classes are empty (i.e. there is no methods
that are associated to objects, hence creating such an empty class is omitted).
Usually we assume that both objects and variables are associated with types
that allow for strong compile-time type checking of the contexts in which
objects/variables are used. Objects that have the nature of variables are
associated with types only, while proper objects are associated with types that
are invariants of classes.
Objects can be created in different
environments. The most obvious environment is a local environment of the
procedure or method where the given object creating instruction is put and
executed. Such objects are volatile
by definition, because their life is terminated when their procedures/methods
are completed. Such objects are not available for any external bindings, e.g.
from external procedures or methods. Objects can also be created within the
module in which the given procedure/method is put and executed. By definition,
such objects are shared among all the procedures or methods that are inside the
given module. In other words, bodies of procedures or methods that are members
of the given module can refer to all objects that are stored inside the module.
Objects of a module can be also available for binding from other modules, under
the condition that this is explicitly allowed by some module interface or
export list (this is a form of encapsulation).
In such a case usually references to objects are preceded by the module name
(but this can be simplified in some cases).
Objects can also be persistent, that is, they are elements
of a database. Persistent objects are also members of modules, but all their
properties (values, OIDs, etc) outlive switching off the application (and
computer) that executes the given module. After restoring the application, its
persistent objects become available in exactly the same form and value as
before the switch off.
Objects can be shared among many
parallel processes. To avoid inconsistencies, access to shared objects must be
disciplined by ACID transactions.
Usually it is assumed that persistent objects are always shared and volatile
objects are never shared. In our view, sharing and persistence are orthogonal
features. In particular, it is possible that persistent objects are not shared
(they are available to only one application, hence the transactional semantics
makes little sense). A typical case concerns a client application that need to
store persistently some results of calculations. And v/v, volatile objects that
are stored within modules can be shared, hence should follow the transactional
semantics. Hence some syntactic option should distinguish persistent objects
and shared objects. Depending on a persistency/sharing model this option can
belong to different definitions. We avoid the concept that some classes (hence
all its member objects) or types are qualified as persistent/shared. This
concept makes impossible to define classes or types that are relevant to local,
volatile and persistent objects, hence violates the principle of orthogonal
persistence. Instead of that we can assume that proper flags (keywords) are
assigned to modules rather than to classes.
Following strong typing, all objects
that are to be created must be declared
within a proper environment. Declaration has the form:
objectName: type [cardinality];
For instance:
x: integer;
An object named x with an integer value is declared; cardinality [1..1] is skipped.
Employee: record {
name: string;
salary: integer;
job: string;
worksIn:
ref Company[0..1];
} [0..*];
A collection of objects named Employee is declared. The collection has
any number of elements. The type is record (we avoid the term structure known from C/C++) having a
subobject name (of string type), a
subobject salary (of integer type), a
subobject job (of string type) and an
optional (cardinality [0..1]) subobject worksIn
being a pointer to an object named Company
(keyword ref). In our convention,
typing of a pointer requires the name of the object that the pointer points to
rather than the type of the object (unlike majority of programming languages).
This decision is caused by database-orientation: in the database schemata
object names are first class citizens while types are rather second-class and
at high abstraction level are omitted at all.
The programmer is responsible to put
such declarations in proper places, for instance, within the body of a
procedure/method, within a proper module, etc. When declaration is valid,
corresponding objects can be created by special instructions. When the lower
cardinality of the type of an object is 1 ( this also concerns the cardinality
[1..1], usually skipped) declaration of an object is equivalent to its creation
with some default value. To be ready to use such an object must be initialized
by an assignment.
Creating an object requires determining two features:
the state of a new object and its location (where the object is to be created).
The second feature is usually related to object persistence - a newly created
object is to be persistent when it has to be stored in a persistent environment
(i.e. a database or persistent module).
Creation of an object can be determined by a very
simple syntax:
instruction ::= create query;
query ::= create query;
It is assumed that the query being the argument of the
create instruction should return all the information that is necessary to
create an object or objects, in particular, names of objects and subobjects and
their values. The concept of binder that was introduced in previous chapters is
a very good mean for determining that. Hence it is assumed that the query returns a (perhaps) nested binder
or several such binders. The instruction creates one or more objects according
to the binders returned by the query, assuming that names of objects are determined
by names of binders, values of objects are the same as the values of binders
and nesting of objects exactly follows the nesting of binders. If the
instruction is used as a query, it returns a references (a bag of references)
of a newly created object (objects).
|
E.IM.1 |
Create an Employee
object. |
|
SBQL: |
create (“Doe” as name, 1000 as salary, “analyst” as job,
(Company where compName = “ECME”) as worksIn) as Employee; |
The query being the argument of create returns the nested
binder Employee( struct{ name(“Doe”),
salary(1000), job(“analyst”), worksIn(iECME ) } ), where iECME is the identifier of
the ECME company object. This binder is used to create an object by a simple
rule which assigns a new unique identifier to each (nested) binder.
Some SBQL users consider this syntax as a bit awkward
in the context of creating instructions, thus one can invent a more friendly
syntax (perhaps, only in this context). In the ODRA system the syntax q as n can be substituted by n(q) , hence the above example can also be written as:
|
E.IM.2 |
Create an Employee
object. |
|
SBQL: |
create Employee(
name( “Doe”), salary(1000), job(“analyst”), worksIn( Company where compName = “ECME” ) ); |
In general, in this book we avoid the discussion on
syntactic issues hence we will use the form E.IM.1 rather than E.IM.2. Both
forms we consider semantically equivalent.
The create instruction must obey the strong typing
system. In SBQL it is impossible to create a new object that does not conform
to declared specifications. If the object is created as an element of a
collection, the cardinality of the collection after creation is dynamically
checked. Creating objects requires also altering the environment stack ENVS,
which must be augmented by a corresponding binder (binders) in a proper stack
section. If the created object is to be shared, it should follow the transactional
semantics, i.e. it is locked till the given transaction will be committed.
After committing, the object is to be available for other transaction thus its
binder must be propagated to all current environment stacks of particular
applications.
Concerning the functionality and semantics of this
instruction we need to give several comments.
a) In the query being the argument of this
instruction returns several binders, the instruction creates as many objects as
binders. In particular, if the query returns the empty bag, no object is
created and this is not considered a failure. This rule is valid for any level
of object nesting. For instance, if Company objects are declared as:
Company: record {
compName: string;
location: string [1..*];
employs: ref Employee[0..*];
} [0..*];
then the instruction:
|
E.IM.3 |
Create a Company
object. |
|
SBQL: |
create ( “ECME” as compName, bag(“
(Employee where job = “analyst”) as
employs) as Company; |
creates an object with three location subobjects and with some number
of pointer objects employs leading to
Employee objects.
b) Because the order
of subobjects within an object is inessential, the order of binder constructs
in the query is inessential too. For instance, E.IM.4 is equivalent to E.IM.1.
This rule, however, can be changed if the defined query language has features
that make the order of subobjects essential (this, for instance, is a desired
feature of XML query languages).
|
E.IM.4 |
Create an Employee object. |
|
SBQL: |
create ( “analyst” as job, (Company where compName = “ECME”) as worksIn
“Doe” as name, 1000 as salary,) as Employee; |
c)
In general, a query being the argument of a create instruction
can return binders with references as values. In such a case the instruction
performs dereferencing (according to the specified type). The dereferencing is
not performed in cases when the type specification provides a pointer object.
For instance, the Employee type
provides worksIn pointers and the Company type provides employs pointers (through ref keywords), hence the results of
corresponding queries Company
where compName = “ECME” and
Employee where job = “analyst” are not dereferenced. In some cases ref keywords can be added before such
queries, to explicitly show in the source code that dereferencing should not be
performed and a pointer object is to be created. Using these keywords should be
checked by the strong typing mechanism, to avoid programmers’ mistakes.
E.IM.5 is an equivalent to E.IM.1, but there is more opportunity for strong
type checking.
|
E.IM.5 |
Create an Employee
object. |
|
SBQL: |
create (“Doe” as name, 1000 as salary, “analyst” as job,
ref(Company where compName = “ECME”) as worksIn) as Employee; |
d)
It may also happen that a query being the argument of
the create instruction will return an identifier of a complex objects, but the
type does not provide that a pointer is to be created. In such a case the
identifier is dereferenced according to a simple rule that maps the
corresponding object into binders, by removing all object identifiers and
retaining the nested structure of objects.
|
E.IM.6 |
Create a copy of the Doe’s object. |
|
SBQL: |
create (Employee where name = “Doe”) as Employee; |
Assume that the Doe’s object is represented as
the following nested triples:
<i1, Employee, {<i2, name,
“Doe”>, <i3, salary, 1000>, <i4,
job, “analyst”>, <i5, worksIn, iACME>}>
In this case the query Employee where name = “Doe” will return i1.
Then, the create instruction requires dereferencing of i1:
deref( i1 ) = struct( name( “Doe”), salary(1000),
job(“analyst”), worksIn(iACME) )
Hence such an instruction is reduced to the case presented in E.IM.1 and
E.IM.2. Dereferencing of references to a complex object is a natural extension
of the dereferences of references to atomic objects (which return their
values).
Locations of
Created Objects
The location of an object is
precisely determined by the strong typing system, thus in majority of cases
there is no need to determine its locations by special syntax. During the
static analysis phase (compilation) the type of a newly created object can be
determined. This type is compared with type signatures that are stored at the
static environment stack (S_ENVS – it will be explained in a chapter
devoted to strong typing). The comparison starts from the top of the stack to
its bottom and should determine a stack section having the corresponding
signature. This environment is the proper one for creating the new object. If
no corresponding signature is found on the stack, the statement is incorrect
and the strong typing mechanism should display a typing error.
For instance, if the database
contains specification of objects Employee and the local environment of some
procedure contains a similar specification, then the instruction of creating an
Employee object within this procedure will locate the object in the local
environment rather than in the database.
The local environment section is higher on the environment stack than
the database section, hence it will be visited by the binding mechanism earlier
that the database section.
If the environment is untyped
(what we consider a very disadvantageous case) then the location of created
objects must be determined by special syntax. In particular, in the LOQIS
system [SMA90, Subi91] the keyword create can be augmented by keywords permanent or temporary. The first
one denotes the database and the second one denotes a global user environment.
If no such a keyword is specified, the new object is created within a local
environment of the procedure that contains the instruction. A similar solution
is also used in the ODRA system, but with a bit different meaning. SBQL
implemented in ODRA is strongly typed. Keywords permanent, temporal and local are used for better understanding
of programs by the programmer and present additional opportunity for strong
type checking. There is, of course, a lot of other syntactic decisions
concerning this aspect.
Because creating and deleting
objects requires actions on an environment stack (perhaps on many environment
stacks, if objects are shared among many applications or processes) the
organization of environment stacks should support these operations. Actually,
this optimization causes a bit different concept of environment stack
construction. This new concept has been implemented both in Loqis and in ODRA.
In this concept binders that are to be stored at the environment stack are
stored at the object store rather than a the environment stack. The stack
stores references to object store environments. This is illustrated in Fig. 62
(compare Fig.3 and Fig.17). The upper part of this figure presents the
conceptual view on ENVS that is used to define the semantics of SBQL. The lower
part of this figure presents the optimized version, where binders in some stack
sections are substituted by an identifier of the proper object store environment.
In this way operators create and delete that act on a store environment need
not to update all the ENVS stacks that are currently present in all user
sessions or processes. In Loqis and ODRA this organization is further optimized
by inserting a section with proper binders (the result of the nested function) at the beginning of
each complex object.

Fig.62. Conceptual
and optimized environment stack
The instruction can be
augmented by an explicit clause determining where a new object is to be
created. The syntax can be as follows:
instruction ::= create query within query;
query ::= create query within query;
In the construct create q1 within q2
the query q1 determines
the state of the newly created object and the query q2 determines the object inside which the new object is
created[1].
Note that the above instruction is some generalization of the SQL insert instruction. As previously, the
second syntactic form assumes that the construct returns references to newly
created objects.
|
E.IM.7 |
Create an Address
object within the Doe’s object. |
|
SBQL: |
create (“
within Employee where name = “Doe”; |
The instruction must confirm
to specification of types. The above instruction is correct only when the type
of Employee includes an Address subobject in the form as
presented above.
In principle, the above
instruction can be considered as redundant, because it can be accomplished by
two instructions: create and insert:
|
E.IM.8 |
Create an Address
object within the Doe’s opbject. |
|
SBQL: |
create (“ insert Address into Employee where name =
“Doe”; |
However, for two reasons this
method would be inconvenient for programmer. First, it requires explicit
specification of the type of object to be inserted and declaring a proper
variable, which means two or more additional lines of code. Second, there could
be problems with naming. For instance, declaring a local object Address could
be awkward if the database would also contain objects Address – in this
case binding to these database object becomes impossible from the body of the
current procedure code, hence some additional tricks could be necessary.
In SBQL we have assumed
explicit deleting, as in SQL. Some authors claim that object programming
languages should not allow for explicit deleting, as such an operation is
dangerous. It may cause that some pointers or references to deleted objects
will become inconsistent (will became so-called dangling pointers). Hence an object is to be removed implicitly and
automatically by the garbage collection mechanism in situation when the object
is no more available (e.g. all pointers leading to it are nullified).
While these arguments have
some value in case of lower-level languages, they are totally misleading in
case of database programming languages. First of all, deleting an object has
clear conceptual and business meaning. Conceptual modeling of programs is
incomparably more important than any considerations concerning machine actions
and features. Secondly, in the shared data environment the programmer may not
enough rights and may have severe difficulties to recognize and remove/nullify
all the pointers that lead to the object. There are also other arguments, related
to database schema and operation on the schema. Summing up, arguments that are
valid for low-level programming are totally inadequate for business-level
programming that we want to support.
The SBQL syntax for deleting
objects is the following:
instruction ::= delete query;
We assume that the query returns references to objects
that have to be deleted. The instruction physically deletes all of them, so
they are no more available for further querying. If there are pointers that
lead to the deleted objects they are deleted too or nullified. In ODRA it is
assumed that each pointer from object A to object B is physically coupled with
(invisible) backward pointer from B to A, hence this operation can be easily
performed and always leaves the database in a consistent state. The instruction
makes no difference concerning which kind of object is considered and on which
object hierarchy level it is located – in the same way we may delete
objects, attributes, views, stored procedures, etc. The instruction must conform
to the type specification. Type failures can be detected statically (during
compilation), e.g. deleting a subobject name
within Employee, or dynamically
(during runtime), in particular, violating declared cardinalities of a
collection.
|
E.IM.9 |
Delete all objects of analysts. |
|
SBQL: |
delete Employee where job = “analyst”; |
|
E.IM.10 |
Delete the Address subobject from the Doe’s object. |
|
SBQL: |
delete ( Employee where name = “Doe”).Address; |
There are several peculiarities with this instruction:
a) Duplicates returned
by the query should not cause failures. It may happen for various reasons that
the query being the argument of the delete instruction will return duplicate
references. If the object is deleted, an attempt do delete it the second time
may result in failure. Note that this requirement could be difficult to reduce
to the case delete unique(query); ,
where the unique function removes duplicates. As will be shown a bit later,
duplicates may appear due to other rules and can be tangled within complex
structures. Hence each instruction should collect identifiers of objects
already deleted and check each next deletion if it is necessary or not.
b) Duplicates within
pointers leading to deleted objects should not cause failures. If the list of
references to objects to be deleted contains duplicates the list of pointers
that lead to the deleted object contain duplicates too. If these pointers are
to be removed, the program should not fail.
c) The query may
return binders with references rather than references alone. In such cases the
deleting operation should not fail. It should remove objects according to their
references being the values of the binders.
|
E.IM.11 |
Delete the Doe’s object. |
|
SBQL: |
delete Employee as e where e.name
= “Doe”; |
d) A query being the
argument of the delete instruction may return arbitrarily complex structure
with references. This feature makes it possible to organize complex (cascade)
deleting within one query.
|
E.IM.12 |
Delete the ECME company together with all the
analysts that work for it. |
|
SBQL: |
delete (Company
where compName = “ECME”) as c
join (c.employs.Employee where
job = “analyst”) as a; |
The query will return the bag
of structures: bag( struct( c(iComp1), a(iEmp1),
struct( c(iComp2), a(iEmp2),…
) ). Note that in this case the query will return duplicates of c(iECME)
binders, which should be handled without failures.
The assignment operator can be introduced in the
classical version:
instruction ::= query :=
query ;
Semantically, the assignment leftQuery := rightQuery requires that the leftQuery returns a reference to an object and the rightQuery returns a value, which will
be the new value of the object. If the rightQuery
returns a reference too, then the dereferencing operator is implicitly applied.
The operator does not change the identifier of the object. In SQL the above
operator is accomplished in the form of the update
clause which also allows to make more than one update within one statement. The
same effect we can achieve by combining the assignment operator with the for each operator (the example will be
shown later). The SQL syntax we consider disadvantageous and not sufficiently
orthogonal with other query language constructs, hence we do not use it. We
also consciously avoid the use of the equality as an assignment operator (as in
C/C++. Java, C#, etc.) using = as a comparison (following some hundreds years
of the mathematical tradition and common elementary school teaching[2]).
In case when an object referenced by the leftQuery is specified with the
cardinality [0..1], [0..*], etc. it may happen that the leftQuery will return an empty result. This causes that the
programmer for all objects specified by a cardinality with lower bound 0 must
check the existence of the object before the assignment and create it if it
does not exist. This can be considered as an awkward feature. There is a
possibility to change the semantics of the assignment operator to avoid this inconveniency,
see the section Assignment to
Absent Object.
The assignment operator in the above syntax cannot be
macroscopic. That is, the leftQuery
should return exactly one reference (if the assignment to absent object is not
implemented) and the rightQuery
should return exactly one value. If these queries return bags it would be
impossible to determine which value is to be assigned to a particular reference.
If they return sequences, the situation is not better, because it could be very
difficult to assure that the sizes of sequences are the same and the orders of
references and values are correct. Moreover, in many cases parts of the queries
must be repeated in left and right queries. It is much easier to use the
operator for each to achieve the
effect of the macroscopic assignment.
In the Loqis system we also implemented the syntax
instruction ::= update query ;
with the assumption that the query returns a
two-column table (a bag of two-element structures), where the first column
contains references and the second column contains values. Then, for each row
of this table the corresponding value is assigned to the corresponding
reference. This form is a bit more powerful than the previous assignment form
embedded within a for each loop.
However, its syntax appears to be awkward for large queries and the advantage
over the for each construct is
difficult to justify. In our examples we show the use of this construct and the
reader will be able to assess if it has some advantages or not.
The same assignment operator can be used to update
pointer objects and to update complex objects. We illustrate these cases by
examples, see the schema from Fig.63.

|
E.IM.13 |
For all programmers get rise of 5% (disregard
optional jobs). |
|
SBQL: |
for each Emp where job = “programmer” do { sal := sal * 1.05; } |
|
SBQL: |
update Emp where job = “programmer”. (sal , sal * 1.05); |
|
E.IM.14 |
Increase the value of a local variable x by 1. |
|
SBQL: |
x := x + 1; |
|
SBQL: |
update x , x + 1; |
|
E.IM.15 |
For all employees older than 40 get rise of
100 and change their job to engineer. |
|
SBQL: |
for each Emp
where age > 40 do { sal := sal + 100; job := “engineer”; }; |
|
SBQL: |
update (Emp
where age > 40 ). bag((sal , sal + 100), ( job ,
“engineer”)); |
The second form is typologically incorrect (regarding
the current SBQL implementation), as it joins within one bag a structure of two
reals and a structure of two strings. It would be possible in languages without
strong typing or with heterogeneous (variant) types. However, such cases are
not recommended.
Note also that both statements are incorrect if job is absent. The full and correct
version of this statement could be the following:
|
E.IM.16 |
For all employees older than 40 get rise of
100 and change their job to engineer (a fully correct version). |
|
SBQL: |
for each Emp
where age > 40 do { sal := sal + 100; if
not exists job then create “nothing” as
job; job := “engineer”; }; |
Such an if..then
statement must be inserted in every case when lower cardinality of the object
type is 0. This is considered awkward, thus can be improved by altering the
semantics of the assignment; see the section Assignment to Absent Object.
|
E.IM.17 |
For departments located in |
|
SBQL: |
for each (Dept.loc as dl) where dl = “ |
Note that loc
within Dept-s are collections and we
have to update some elements of them. The assignment concerns the
“auxiliary variable” dl,
but the update will concern some loc
subobject. In contrast to other query languages that we are aware of, SBQL
auxiliary variables may bear references to objects as well.
|
E.IM.18 |
Move all programmers to the department
managed by Doe. |
|
SBQL: |
for each (Emp where exists job) where job = “programmer” do
{ worksIn := Dept where (boss.Emp.name) = “Doe”; } |
This statement updates pointers worksIn and employs. We
have assumed (following ODRA implementation) that pointers worksIn and employs are
bidirectional: updating of one of them causes automatic updating of its twin.
We have also assumed that worksIn is
typed as a pointer to a Dept object, hence ref before Dept is
optional (we skip it). If the query language is not strongly typed and has no
bidirectional pointers, the statement is more complicated:
|
E.IM.19 |
Move all programmers to the department
managed by Doe. |
|
SBQL: |
for each ((Emp where exists job) where job = “programmer”) as p do { create
(ref Dept where (boss.Emp.name) = “Doe”) as dd; delete
Dept.employs where Emp = p;
//removing old employs pointers
create
ref p as employs within dd.Dept; //new employs
pointers p.worksIn
:=
dd; //new worksIn
pointers }; |
A local object dd is used to store a
pointer to the Doe’s department.
The statement can also be used to update complex
objects. The semantics is very similar to the semantics of the create instruction. The left side of the
updating instruction must return the reference to an object being update. The reference
is not changed after updating. At the beginning, all subobjects of the object
are deleted. Then, new subobjects are inserted according to the right side of
the instruction. For a complex object it should contain a structure with
binders; each binder is then changed into a subobject by generating a new
unique object identifier. Names/values of binders are the names/values of
created subobjects.
|
E.IM.20 |
Get new data for the employee with eno =
223344. |
|
SBQL: |
(Emp where eno = 223344) :=
( “Lee” as name, 1980 as birthYear, (“ 223344 as eno,
“programmer” as job, 2000 as sal, (Dept
where dno = 789) as worksIn ); |
|
SBQL: |
(Emp where eno = 223344) :=
( name(
“Lee” ), birthYear(
1980 ), address(
city( “ eno(223344),
job( “programmer” ), sal( 2000 ), worksIn(
Dept where dno = 789 ) ); |
The second syntactic form uses n(q) rather than q as n.
It seems to be better for reading.
While the identifier of an object being updated is not
changed after updating, this does not concern its subobjects. In ODRA we
assumed that all old subobjects are cancelled and new subobjects receive new
identifiers. Although some identifiers of the subobjects can be retained, in
general it hay happen (due to optional and repeating subobjects) that some
identifiers must be lost and some new identifiers must be generated.
Semantically, it is difficult to imagine the situation in which the persistence
of the identifiers of subobjects may have some meaning. However, if such
situations exist, the identifiers of the subobjects can be retained as far as
possible. This requires a bit more sophisticated implementation.
Our definition of the assignment operator allows also
for cases when the right side query returns a reference to an objects. Such a
reference is then dereferenced according to a simple rule, as follows:
·
For atomic objects <i, n, v>: deref( i ) = v;
·
For pointer objects <i1, n, i2>: deref( i1 )
= i2;
·
For complex object <i, n, { <i1, n1, T1>, <i2,
n2, T2>,…,<ik,
nk, T1> }>, where Ti
is an atomic value, an identifier or a set of objects:
deref( i ) = struct{ n1(deref(i1)), n2(deref(i2)), … nk(deref(ik))}.
Note that the definition is recursive and general.
Below we present examples.
|
E.IM.21 |
Dereferencing a reference to a complex object |
|
Object: |
<i, Emp,
{ <i1,
name, “Lee”>, <i2,
birthYear, 1980>, <i3,
address, {<i4, city, “ <i7,
eno, 223344>, <i8,
job, “programmer” >,
<i9,
sal, 2000>, <i10,
worksIn, i789> }> |
|
deref(i): |
struct{ name(
“Lee” ), birthYear(
1980 ), address(
struct{city( “ eno(223344),
job(
“programmer” ), sal(
2000 ), worksIn(i789) } |
|
E.IM.22 |
For Nec assign all the data stored within the
Doe’s object. |
|
SBQL: |
(Person
where name = “Nec”) := (Person
where name = “Doe”); |
|
E.IM.23 |
Change the Poe’s address to the address
of Doe (disregard optional addresses). |
|
SBQL: |
(Person
where name = “Poe”).address := (Person where name = “Doe”).address; |
In some languages additional variants of the
assignment are introduced: x
+= y denoting x :=
x+y, x -= y
denoting x := x –y,
etc. Such shortcuts are a bit controversial, although many programmers consider
them useful.
The assignment operators meets also some problems with
the substitutability
principle. In the assignment X := Y both X and Y can be typed by type T, but according to the
substitutability X can return a reference to an object typed by some subtype T1
of T. This causes some doubts what to do with attributes that are present in T1
but absent in T. For instance, we can take the example E.IM.22, but the person
“Nec” is also an employee, hence his object may have also
subobjects eno, job and
worksIn (and perhaps optional subobjects sal and manages).
According to our standard semantics of the assignment, in the first stage the
internal of the object referenced by X is deleted, then the new internal is
created, according to Y. However, because “Doe” need not to have
subobjects eno, job and
worksIn, after the assignment the object referred by X cannot be
typed as an Emp object. Hence the
assignment has changed the type of the object, what may cause various
inconsistencies in other parts of the program. For instance, the object is no
more connected by a link manages,
hence some Dept object has no boss subobject, what is inconsistent
with its type.
Perhaps there are simple solutions of this problem.
For instance, we can cancel only those subobjects that are delivered by the
type of X, and leave without changes other subobjects. This means some
complication of the semantics. Usually more complicated semantics is more
difficult to implement and is more error prone. Perhaps some programming
experience for such cases is required to propose a reliable and simple
solution.
Inserting Objects
Insertion allows to insert an object
into another object[3]. The syntax is as follows:
instruction ::= insert query into query;
instruction ::= insert query into module;
In the second syntactic form insert q into
module we assume that modules
can be specified by different options that are unavailable as queries. For
instance, modules can possess their specific names or can be specified by
keywords such as local, global, persistent, etc. The semantics of this instruction is identical as
the semantics described above.
In ODRA the syntax of the
instruction reminds an assignment:
instruction ::= lquery :< rquery;
where lquery ::= query, rquery ::= query; an object (objects) referenced by rquery is inserted into an object referenced by lquery.
For the AS2
store model this instruction should allow for inserting a new role into an
object.
Similarly to creating instructions, the insert instruction
should leave all the environment stacks that are currently operating by running
processes or threads in a consistent state. Efficient implementation of this
requirement requires changes in physical stack’s organization, as
presented in Fig.62.
|
E.IM.24 |
For each employee younger than 30 and having
no job attribute, insert the subobject job
with the value “assistant” (cf. Fig.63). |
|
SBQL: |
for each (Emp
where age < 30 and not exists
job) as e do { insert create “assistant” as job into e; } |
We follow the semantics of the create
instruction which returns the reference of a newly created object. Note some typing
peculiarity. Formally, because the object job is created in the current
environment, we should assume that the type of the variable job is declared
within it. However, because the object is created only for a moment, we can
relax this requirement and skip declaring its type. The types will be finally
checked after performing the entire instruction.
|
E.IM.25 |
Move the |
|
SBQL: |
insert (Dept
where dname = “Sales”). loc
as p where p = “ into Dept
where dname = “Ads”; |
Note that the first query
returns a binder named p with a
reference to a loc subobject.
|
E.IM.26 |
Move all the employees older than 65 from the
current module to the Retired module. |
|
SBQL: |
insert Emp where age > 65 into Retired; |
Note that after the insertion all links worksIn/employs and manages/boss are not changed.
Sometimes the programmer wants to insert a copy of an
object into a given object. The syntax could be as follows:
instruction ::= insert copy query into query;
instruction ::= insert copy query into module;
The copies of objects receive new unique identifiers.
|
E.IM.26 |
For all the persons with no addresses copy
the address of Doe. |
|
SBQL: |
insert copy (Emp where name = “Doe”).address
into Emp where not
exists address; |
Changing Object Name
In many business applications
objects can change their business role without changing their identities. A
business role is frequently expressed by an object name. In terms of programming
languages, change the name of a business objects requires change the name of an
object or a programming variable. In majority of programming languages this is
impossible because names of objects are second class citizens: they can be
manipulated in a source code but cannot be manipulated during run time. The
situation is different in databases, where all the names are available during
run time, but usually there is no programming capabilities to change them.
Additionally, changing object name can be constrained by its type.
In the current implementation
of SBQL in the ODRA system all the names of objects (subobjects) are
first-class citizen due to the assumed late binding of program entities. Hence
there is no obstacles to change these names, according to the current business
needs. This feature is so far not implemented in ODRA, but it is planned.
Changing object name should be compatible with the declared types and the
cardinalities of collections The syntax can be as follows:
instruction ::= rename query to query;
Semantically, rename q1 to q2 implies the following action.
Query q1 returns
references to objects to be renamed and query q2 returns a string that will be the new name of the objects.
The new name is assigned to the objects determined by q1.
|
E.IM.27 |
Assume the database has collections Emp and
SeniorEmp of the same type and class. Move all employees older than 50 to the
collection SeniorEmp. |
|
SBQL: |
rename Emp where age > 50 to “SeniorEmp”; |
Control Statements
There are several good
patterns of control statements that are developed in existing programming
languages and we do not want to invent something radically new. Mostly we
follow Java. We present them only for completeness of the definition and for
using them in examples. In different versions of SBQL we have implemented the
following constructs:
instruction ::= if query
then programBlock
instruction ::= if query
then programBlock else programBlock
programBlock ::= { Instructions } | { variables
Instructions }
Instructions :=
instruction | instruction Instructions
In the statement if q then
p query q should return a Boolean value. If q returns true the
program block p is executed, otherwise
no action is performed. In the statement
if q then p1 else
p2 if query q
returns true then p1 is
executed, otherwise p2 is
executed. A programBlock can contain
declarations of local variables; their scope is limited to the block.
instruction ::= case query do labelledInstructions endcase
instruction ::= case query do labelledInstructions else programBlock endcase
labelledInstructions
::=
labelBlock | labelBlock labelledInstructions
labelBlock ::= label
: programBlock
label ::= literal
This switching command is well
known from other languages. In the syntax we assume that a label must be unique literal of the type determined by the query. After the query returns some value, it is compared with all the labels in the
given case statement. If a label matches the query result the
corresponding programBlock is
executed. If no label matches the query, then no action is performed (the first
syntactic variant) or the programBlock
after else is performed (the second syntactic variant).
Other assumptions concerning
labels are possible. In Loqis we implemented a variant where a label can be
determined by any query (which should return an atomic value, which is to be
compared with the result of the query after case).
This variant gives a bit more possibilities for the programmer. However,
because it is less explicit concerning the source code, it causes more
opportunities for bugs.
Other group of control
statements concerns organizing loops. In the current ODRA implementation these
instructions follow Java, with the well-known semantics:
instruction ::= while
query do programBlock
instruction ::= do
programBlock while (query)
instruction ::= for(initstmnt; query; incrstmnt) do programBlock
initstmnt ::=
instruction
incrstmnt ::=
instruction
|
E.IM.28 |
For intervals [0, 99], [100, 199], [200,
299], … , [2000, 2099] print an interval and the number of employees
earning a salary from the interval if the number is not zero. |
|
SBQL: |
for( x
:= 0; x <= 20; x := x+1 ) do { y:
integer; z: integer; y := 100 *
x; z
:= count(Emp where sal >= y and sal <= y+99); if z <> 0 then { print( y, y+99, z); } } |
For each statement
The for each statement follows the semantics of non-algebraic operators
of SBQL. It operates on the environment stack ENVS in a similar way as
non-algebraic operators do. The operator processes collections of unknown
sizes.
instruction ::= for
each query do programBlock
The semantics of the
instruction for each q do p
is as follows. First, query q is evaluated; it can return a single element e, a bag of elements bag{e1, e2, …} or a sequence of elements sequence{
e1, e2, …}. A singe element e is considered bag{e}. For each element e
returned by q the following actions
is performed:
·
nested(e) is calculated and pushed at the top
of ENVS. If e is a reference to an
object that is a member of class C1,
which is a subclass of C2, which is a subclass of C3, etc. then the top of ENVS contains the following sections
(starting from the top): nested(e), nested(iC1), nested(iC2), nested(iC3), … . In all details the rule is the same as
explained for non-algebraic operators in the models AS1, AS2 and AS3. For the AS2
model (dynamic object roles) super-roles also influence ENVS, as explained
previously. For the AS3 model (encapsulation) instead of the nested function the nested_public function is applied.
·
Program p is
executed within this new state of the environment stack.
·
All sections that are pushed at ENVS in the first step
are popped.
If q returns a bag then the order of elements e is not determined, but all of them must be taken into account. If
q returns a sequence, then the order
of elements e is determined by this
sequence. If an empty bag (sequence) is returned, the instruction does nothing
and this case do not cause errors or exceptions.
In general, the instruction
can be unsafe, e.g. when program p
removes an object having a reference returned by q. In ODRA we do not prevent this inconsistency just in this place;
in general, exception is risen if any operation is to be performed on
non-existing object.
One can consider a syntactic
variant of the for each instruction for the case when q returns exactly one element.
instruction ::= with
query do programBlock
In this case the machine
checks if query returns exactly one
element ad causes typological error or exception otherwise. The semantics is
exactly the same.
Unlike other proposals
concerning for each statements, we do
not introduce a special iteration variable (or variables). In SBQL such special
variables are unnecessary, as they can be introduced through as or group as operators as auxiliary names with the standard semantics
(see here).
The for each operator has been
already exemplified in examples E.IM.13, E.IM.15, E.IM.16, E.IM.17, E.IM.18,
E.IM.19, E.IM.24. Some next examples are presented below.
|
E.IM.29 |
A variant of E.IM.13 with an “iteration
variable”. For all programmers get rise of 5% (disregard optional
jobs). |
|
SBQL: |
for each (Emp where job = “programmer”) as e do { e.sal := e.sal * 1.05; } |
Two “iteration
variables”:
|
E.IM.30 |
Print names of employees and names of their
bosses in the alphabetic order according to the names of employees. |
|
SBQL: |
for each Emp
as e join (e.worksIn.Dept.boss.Emp) as b order by e.name do { print(“Employee
” + e.name + “ works
for ” + b.name); newline(); } |
|
E.IM.31 |
Increase the salary of Doe by 100 and change its job to engineer
(assume there is exactly one Doe). |
|
SBQL: |
with Emp
where name = “Doe” do
{ sal
:= sal +100; if
not exists job then {create “” as
job;} job
:= “engineer”; } |
Low-level Iterators
The presented above constructs
for organizing program loops, including the for
each statement, are sufficient to program almost all programming cases which
require iterations. However, there are tasks when these capabilities are
insufficient. The literature mentions an example of such tasks, namely, merging
two sorted collections into one sorted collection. Assume that the collections
are named C1 and C2 and the resulting collection is C. The algorithm can be
represented in a pseudocode as follows:
C := empty_sequence;
e1 := get_first_element(C1);
e2 := get_first_element(C2);
while e1 <> null or e2 <> null do {
if e2 = null or e1 < e2 then {
C := C + e1; //concatenate C with e1
e1 := get_next_element(C1);
}
else {
C
:= C + e2; //concatenate C with e2
e2 := get_next_element(C2);
}
}
The algorithm cannot be coded
by two nested for each statements, because
a next element to be processed can be taken from the first or from the second
collection, according to the result of comparison. For coding such tasks we
need functions such as get_first_element, get_next_element and similar. In
programming languages such functions are collected into a construct called
iterator. An iterator is a data structure that stores an iterator state (e.g. a
reference to a processed collection and a reference to a currently processed
object) and some (usually small) collection of functions such as getFirst, getNext, getPrior, etc. If an
iterator is to be applied to a result of a query, its state must contain this
result. Except SQL, no programming language implements so general iterators.
Usually iterators traverse stored collections, but generalization of them to
collections derived by queries or views seems not to be a challenging task.
Low-level iterators imply a
similar problem as for each
statements in case when a collection being traversed by a iterator is augmented
or reduced in its loop. Such modifications must be implemented with an extreme
care, to avoid the situation when some element is to be processed two times or
not processed at all.
In SQL such iterators are
known as cursors. A big disadvantage
of SQL cursors is that their names are global for the application, what makes
difficulties e.g. with nesting iterators within functions that can be called
from many parts of an application program (in particular, recursive functions).
This can also be considered as a sign of impedance mismatch, because SQL
interpreter (by definition) ignores the environment stack determining the scope
for names[4].
In integrated languages such as SBQL names and states of iterators should
follow the stack-based discipline, similarly to names of all other local data
structures in the stack-based approach. There are many good patterns of
iterators in existing programming languages, e.g. Java, and we do not want to
contribute to this state-of-the-art. In this chapter we give no syntax and
semantics of iterators, considering them as further development of SBQL.
Other
Issues
There are several other issues
related to imperative statements in SBQL. They are well recognized in other
object-oriented programming languages and are orthogonal to the stack-based
approach that we want to explain on these pages. In ODRA we implemented
exceptions taking the syntax and semantics from Java. Recently we have also
implemented a prototype version of templates, taking the idea from C++. Implementation
of other features are considered.
Last modified: February 23, 2010
[1] In the ODRA system this instruction has the syntax
remaining the assignment operator:
lQuery :<<
name(rQuery);
The
left side determines where a new object must be created and the right side
determines the state of the new object.
[2] In SQL the character = is used as an equality
and as an assignment in a single update
statement. Such an overloading of the operator is obviously error-prone.
[3] Note that the insert
clause in SQL inserts new tuples into a table, hence from the conceptual point
of view it corresponds to the create instruction explained above. SQL has no an
inserting clause as introduced in this section.
[4] For this reason in the DBPL project [MRSS92] the author was forced to organize an own
explicit stack of cursors that exactly duplicates the actions of the environment stack of Modula-2 (the
programming language for DBPL in which SQL clauses were implicitly nested).