The scope of this article is to suggest ways of storing boolean values into a

database, and to work on them. This is a topic that pops up quite frequently

at Oracle Support due to the fact that although PL/SQL does support the boolean

type, Oracle RDBMS doesn't have any built-in boolean data type for table

columns. This implies that it is not possible to create columns with such a

type. However, it is fairly easy, albeit cumbersome, to simulate one. To do

this we ideally should restrict ourselves to use whatever existing data type

and, possibly, package Oracle already offers us, in order to limit the overhead

to a minimum. We'll consider first an Oracle 7 procedural approach, which still

works in Oracle 8. In a later article, we'll use the new object-based features

of Oracle 8 to wrap this up in a much cleaner way.

Boolean: what's that ?

----------------------

Boolean values are extremely useful and used in everyday life, even though

most people don't call them by this name or suspect their existence. Each time

a question can be answered by yes or no, this answer can be stored in a boolean

variable. Such variables, sometimes called flags, are allowed only two values,

generally named TRUE or FALSE, but sometimes 1 or 0, ON or OFF, OPEN or CLOSED,

etc.., depending on their usage context. This type is important enough that it

deserved a whole mathematical theory of its own called boolean algebra: the

logic, with implications on computer technology and digital communications.

However important, Oracle RDBMS Server does not support this type directly. It

is left to the application developer to implement the specifics of using this

datatype, this article aims at aiding in this task. One often encounters columns

of datatype char(1) that has distinct values of 'Y'/'N' etc. But what if there

are tens of such flags ? And how to compare multiple such flags together ? Let's

not forget that the combination of n flags grows exponentially in 2**n. If for

instance one has to conditionally test 5 logical properties, one may have to do up

to 32 (i.e. 2**5) tests.

Requisites

----------

Proper support for such a type should enforce its domain integrity and provide

sensible operations on its variables. Domain integrity just consists in ensuring

that only one of two conventional values is stored in a boolean variable, whereas

usual logical operations to supply are: and, or, xor, not. What conventional

values to store into a boolean variable really depends on whether one wants to

implement a truly universal boolean-like type or simply an ad-hoc one that best

suits one current application's needs. Here, we'll assume that we don't want to

implement a new boolean type each time we need one. We want a generic, well-proven

one that can be reused over and over again. Thus, those generic conventional values

should be '1' for logical TRUE and '0' for logical FALSE. Also, what about the

capability to manipulate groups of boolean values as a whole? Besides adding

power, this drastically reduces name space cluttering. Such packs of booleans can

be seen as strings of bits, a handy representation for straightforward bitwise

operations. Incidentally, such strings of bits are ideally suited for

implementing sets, and the bitwise functions are directly usable for doing set

operations such as union (or), intersection (and), complement (not), and symmetric

complement (xor). Thus, an added benefit of providing support for booleans is an

almost implicit, free of charge support for sets.

One not so good solution

------------------------

One easy way to store boolean values would consist in using the VARCHAR2(n) column

type: it is expandable up to 4000 values and individual values can be easily

accessed through powerful Oracle built-in string functions. On the con side:

a) each boolean value would take up 1 or 2 characters (depending on whether a

single-byte or a double-byte character set has been selected);

b) we must make sure the chosen characters for representing the TRUE and FALSE

values are always respected regardless of character set considerations,

particularly when exporting/importing;

c) we must implement our own bitwise functions for manipulating such values or

at least providing an interface to some existing package if any;

A better solution

-----------------

Another better existing data type we can use to store boolean values is RAW(n).

This data type stores binary values such as multimedia ones, and guarantees that

no interpretation (i.e. conversions) is ever performed on them. Thus their original

value is preserved throughout their existence. The advantages are: expandability,

applicability of the built-in single-byte string functions, immunity to the

character set and national language variations including during export/import.

But the biggest incentive is the existence of the utl_raw package. Thanks to

this package, we won't have to implement the binary functions ourselves: that

package provides us with functions bit_and, bit_or, bit_xor and bit_complement

(i.e. the not operator). The package specification can be found in

$ORACLE_HOME/rdbms/admin/utlraw.sql, and its body in $ORACLE_HOME/rdbms/admin/prvtb.plb.

Here are the interesting functions' headers:

FUNCTION bit_and(r1 IN RAW,

r2 IN RAW) RETURN RAW;

FUNCTION bit_or(r1 IN RAW,

r2 IN RAW) RETURN RAW;

FUNCTION bit_xor(r1 IN RAW,

r2 IN RAW) RETURN RAW;

FUNCTION bit_complement(r IN RAW) RETURN RAW;

The n in RAW(n) specifies the number of allocated bytes for storage. Thus,

boolean values will be stored in bits by packs of 8. To find this size value,

one can use the following formula:

n = ceil(number_of _needed_booleans/8)

However, there are a few caveats in using RAW(n) and those functions. Firstly,

the programmer must keep a mental image of the flags and how they are mapped

into a RAW(n) string of bits. One must shift from independent, high-level single

variables to low-level concepts such as bits, bytes, words, quadwords or more.

While this is quite common for system programmers, it can be quite disconcerting

at first for database programmers and may need some practice. Secondly, the

programmer must express those strings bits in hexadecimal. This implies that bits

are divided in groups of 4, each of which is then transcribed each into a

hexadecimal digit. Moreover, since the utl_raw package's functions work on the RAW

data type, hexadecimal values must be converted from/to RAW(n) through the

function hextoraw() and rawtohex(). For example, one has to write:

MyVar := rawtohex(utl_raw.bit_and(hextoraw('07af'),hextoraw('ff81')));

Admittedly, this is not that big a problem but it clutters one's code. On the

pro side, the hextoraw() function ensures that the string literals or variables

really are hexadecimal numbers, which enforces domain integrity. If a

non-hexadecimal number is passed to hextoraw(), an ora-01465 is issued by the

function. Thirdly, the functions in this package have several quirks of their

own: one must ensure that the parameters are all the same length because these

functions work from left to right and don't left pad with 0 bits, contrary to

what would be expected by common arithmetic rules where right justification is

implicit. Consider this example:

select utl_raw.bit_or(hextoraw('300'), hextoraw('f0')) example1 from dual;

EXAMPLE1

-------

F300

instead of the expected 3F0.

If the user makes sure both operands are the same length, then the ambiguity

is resolved:

select utl_raw.bit_or(hextoraw('300'), hextoraw('0f0')) example2 from dual;

EXAMPLE2

--------

03F0

Note that a 0 has been prefixed because hexadecimal values are manipulated

one full byte (i.e. 2 nibbles or hexadecimal digits) at a time. This is the

reason why the first case yielded F300 instead of the more obvious F00.

Another quirk is illustrated here:

select utl_raw.bit_or(hextoraw('30123'), hextoraw('0f0')) example3 from dual;

EXAMPLE3

--------

03F123

Given the above, we could have expected 3F1 or 03F100. What happened here ?

We are simply hitting another peculiarity of those functions. When one argument

is shorter than the other one, evaluation simply stops and the remaining digits

of the longest arguments are appended to the result. The rule of thumb here is

therefore to make sure that both operands have the same length and that the

number of hexadecimal digits is even. This will prevent any bad surprise. Had

we respected this simple rule, the previous statement would have yielded:

select utl_raw.bit_or(hextoraw('030123'), hextoraw('0000f0')) example4 from dual;

EXAMPLE4

--------

0301F3

which was predictable.

Fourthly, it is important to stick to a convention about the direction of reading

the bit string, and the index numbering. Usually, the bits are read from right

to left. Bit number 0 (the least significant one) is the rightmost one, whereas

the last bit (the most significant one) is the leftmost one. This can be

disconcerting at first because the last bit is actually written first, but then

this is how we write numbers daily and we aren't that much amazed any more. The

index being 0-based should not be a surprise either because this corresponds to

the weight (i.e. the numeric base's exponent) in the positional notation.

Additional functions

--------------------

With this implementation based on RAW(n), one further issue is raised: the individual

accessibility and workability of the single variables. For this, one could use the

following functions:

PROCEDURE SetBit(R IN OUT RAW,

Index IN BINARY_INTEGER);

PROCEDURE ResetBit(R IN OUT RAW,

Index IN BINARY_INTEGER);

PROCEDURE ToggleBit(R IN OUT RAW,

Index IN BINARY_INTEGER);

FUNCTION TestBit(R IN RAW,

Index IN BINARY_INTEGER) RETURN BOOLEAN;

Note that here we can use the build-in PL/SQL boolean type because we are actually

programming and no longer dealing with storage considerations. We have thrown in

exception handling to make sure the index lays between the range 0..8 * utl_raw.length(R)

(remember: we want to access individual bits from a suite of bytes through their

0-based bit index). To simplify we have used the predefined exception

SUBSCRIPT_OUTSIDE_LIMIT . Moreover, we could add purity pragmas to allow using

these functions from within SQL statements. SetBit(), ResetBit(), ToggleBit()

and TestBit() could be implemented with bit-masking techniques, i.e. and-ing or

or-ing the passed value with a specially generated constant. A basic constructor

for doing this could be defined as follows:

FUNCTION MakeMask(NumberOfBits IN BINARY_INTEGER,

TheIndex IN BINARY_INTEGER) RETURN RAW IS

-- 0 <= TheIndex < NumberOfBits

-- otherwise exception SUBSCRIPT_OUTSIDE_LIMIT raised;

-- returns a string of NumberOfBits all zero bits but the TheIndex-th

-- one;

BEGIN

IF TheIndex < 0 OR Index >= NumberOfBits THEN

RAISE SUBSCRIPT_OUTSIDE_LIMIT;

END IF;

RETURN hextoraw(lpad(ToBase(power(2, TheIndex), 16),

2 * Ceil(NumberOfBits/8),

'0'));

END MakeMask;

where ToBase() is a function that returns a character string containing a number

in the given base (in the range 2..16). MakeMask() always returns an even number

of nibbles (i.e. generates full bytes). Here is an implementation of ToBase():

FUNCTION ToBase(N IN BINARY_INTEGER,

TheBase IN BINARY_INTEGER) RETURN STRING IS

TheNumber BINARY_INTEGER := N;

S STRING(100) := '';

Digit NUMBER(2);

BEGIN

IF TheBase < 2 OR TheBase > 16 THEN

RAISE VALUE_ERROR;

END IF;

<<Convert>>

LOOP

Digit := MOD(TheNumber, TheBase);

IF Digit >= 0 and Digit < 10 THEN

S := TO_CHAR(Digit) || S;

ELSIF Digit = 10 THEN

S := 'A' || S;

ELSIF Digit = 11 THEN

S := 'B' || S;

ELSIF Digit = 12 THEN

S := 'C' || S;

ELSIF Digit = 13 THEN

S := 'D' || S;

ELSIF Digit = 14 THEN

S := 'E' || S;

ELSIF Digit = 15 THEN

S := 'F' || S;

END IF;

TheNumber := FLOOR(TheNumber / TheBase);

EXIT Convert WHEN TheNumber = 0;

END LOOP;

RETURN S;

END ToBase;

With MakeMask() available, SetBit(), ResetBit(), ToggleBit() and TestBit()

could possibly be:

PROCEDURE SetBit(R IN OUT RAW,

Index IN BINARY_INTEGER) IS

BEGIN

R := utl_raw.bit_or(R,

MakeMask(8 * utl_raw.length(R), Index));

END SetBit;

PROCEDURE ResetBit(R IN OUT RAW,

Index IN BINARY_INTEGER) IS

BEGIN

R := utl_raw.bit_and(R,

utl_raw.bit_complement(MakeMask(8 * utl_raw.length(R),

Index)));

END ResetBit;

PROCEDURE ToggleBit(R IN OUT RAW,

Index IN BINARY_INTEGER) IS

BEGIN

R := utl_raw.bit_xor(R,

MakeMask(8 * utl_raw.length(R), Index));

END ToggleBit;

FUNCTION TestBit(R IN RAW,

Index IN BINARY_INTEGER) RETURN BOOLEAN IS

BEGIN

RETURN utl_raw.compare(

utl_raw.bit_and(R, MakeMask(8 * utl_raw.length(R), TheIndex)),

hextoraw('00'),

hextoraw('00')

) = 1;

END TestBit;

For completeness, one could define the constructors MakeResetAll() and

MakeSetAll() too:

FUNCTION MakeResetAll(NumberOfBits IN BINARY_INTEGER) RETURN RAW IS

-- returns a string of NumberOfBits 0 bits;

Temp RAW(2000);

BEGIN

Temp := MakeMask(NumberOfBits, 0);

RETURN utl_raw.bit_xor(Temp, Temp);

END MakeResetAll;

FUNCTION MakeSetAll(NumberOfBits IN BINARY_INTEGER) RETURN RAW IS

-- returns a string of NumberOfBits 1 bits;

BEGIN

RETURN utl_raw.bit_complement(MakeResetAll(NumberOfBits));

END MakeSetAll;

It is also possible to define lots of other functions that work on bits, such as

logical & arithmetical shifts and rotations but this is outside the scope of this

article.

Putting the pieces together

---------------------------

At this point, we have chosen built-in data type for storage (RAW(n)), we have

decided how to identify single boolean values from the string of bits

(through a 0-based index starting from the right and increasing toward the left)

and how to individually access and edit them (through the xxxxBit() functions

above), and we are able to use an existing package for working on pairs of

strings of booleans with due cautions (package utl_raw). Much of the overall

complexity can be hidden behind a package that supersedes the utl_raw one and

adds the xxxxBit() family of functions together with other useful functions:

PACKAGE BitsOps IS

FUNCTION BitAnd(r1 IN RAW,

r2 IN RAW) RETURN RAW;

FUNCTION BitOr(r1 IN RAW,

r2 IN RAW) RETURN RAW;

FUNCTION BitXor(r1 IN RAW,

r2 IN RAW) RETURN RAW;

FUNCTION BitNot(r IN RAW) RETURN RAW;

PROCEDURE SetBit(R IN OUT RAW,

Index IN BINARY_INTEGER);

PROCEDURE ResetBit(R IN OUT RAW,

Index IN BINARY_INTEGER);

PROCEDURE ToggleBit(R IN OUT RAW,

Index IN BINARY_INTEGER);

FUNCTION TestBit(R IN RAW,

Index IN BINARY_INTEGER) RETURN BOOLEAN;

FUNCTION MakeResetAll(NumberOfBits IN BINARY_INTEGER) RETURN RAW;

FUNCTION MakeSetAll(NumberOfBits IN BINARY_INTEGER) RETURN RAW;

-- private functions, shown here for documentation purposes;

-- FUNCTION MakeMask(NumberOfBits IN BINARY_INTEGER,

-- TheIndex IN BINARY_INTEGER) RETURN RAW;

-- FUNCTION ToBase(N IN BINARY_INTEGER,

-- Base IN BINARY_INTEGER) RETURN STRING;

END BitsOps;

MakeMask() and ToBase() do not need be called by the package's clients, as they

are privately defined. Similarly, as we have previously shown that sets could internally

be implemented as strings of bits, we could implement SetOps, a set package that

closely matches the BitOps one with following respective functions:

Intersection() (BitAnd), Union() (BitOr), SymetricComplement() (BitXor),

Incl() (i.e. inclusion, SetBit), Excl() (i.e. exclusion, ResetBit),

In() (i.e. test membership, TestBit), MakeEmptySet() (MakeResetAll),

MakeFullSet() (MakeSetAll). The following table summarizes the relationship

between bit operations and set operations.

BitOps functions SetOps functions

---------------- ----------------

BitAnd() Intersection()

BitOr() Union()

BitXor() SymetricComplement()

SetBit() Incl()

ResetBit() Excl()

TestBit() In()

MakeResetAll() MakeEmptySet()

MakeSetAll() MakeFullSet()

Let's now get practical.

A concrete example

------------------

Let's pick the case of the Oracle Support employees. We all have a schedule in

which we record the incumbent tasks on a daily basis: vacations (V), on-site

travels (OS), hot-line (HL), second-level (SL), project (P), education (E),

sickness (S) and miscellaneous (M). Actually, these tasks are recorded with a

half-day granularity and there may be other tasks but let's simplify for the sake

of our example. One obvious business rule demands that on a given day one given

employee be busy on one and only one task. How to implement such requirements ?

We could define one table column per task to monitor. Each record would contain

the employee id, the date and the columns V, OS, HL, SL, P, E, S and M. This can

be expressed by the notation Schedule(Employee#, TheDate, V, OS, HL, SL, P, E, S, M).

A value 1 in, say, HL would mean that the corresponding employee has hot line that

day, and a 0 the contrary. Thus a NUMBER(1) suffices to store the task.

The table would contain 365/366 rows per employee. We are about 50 employees at Support.

That's almost 20 thousands rows per year. Even more if several years are recorded.

Now, we all know how powerful Oracle is but what if we could reduce the table to

50 rows, i.e. one employee's year per row only ? This "miracle" is possible by

storing one full year schedule into a set, i.e. one array of booleans, a RAW(46),

where 46 = ceil(366/8). The new schedule's structure becomes

Schedule_s(Employee#, V_s, OS_s, HL_s, SL_s, P_s, E_s, S_s, M_s), where the

_s suffix changes the corresponding column's meaning to 'the set of that task during

one calendar year'. The record is bigger (about 300 bytes more) but this is largely

compensated by the advantage of being able to access a full year's worth of one

employee's schedule at once. If one consider the whole data size, one can calculate

that the second case is less than 1% of the first case. If several years have

to be kept at once in the table, then a simple

ALTER TABLE

Schedule_s

MODIFY

(V_s RAW(92), OS_s RAW(92), ... M_s RAW(92))

would add another year of schedule to the table since each year takes up 46 bytes.

Obviously, the date origin (i.e. the date of the first recorded day) must be stored

somewhere, typically in some application's global parameter table. But this storage

economy comes at a price. In the first case, adding 2 days of second-level support

for employee 100, on March 22nd and March 26th could simply be done thus (let's

suppose no schedule exists for that employee at those dates yet, otherwise a

SELECT ... FOR UPDATE is necessary):

INSERT INTO

Schedule

VALUES

(:EmployeeNo,

to_date('22-03-1999', 'DD-MM-YYYY'),

0, 0, 0, 1, 0, 0, 0, 0);

INSERT INTO

Schedule

VALUES

(:EmployeeNo,

to_date('26-03-1999', 'DD-MM-YYYY'),

0, 0, 0, 1, 0, 0, 0, 0);

In the second case, and supposing the row already exists for that employee (this

would be part of an application's initialization step), one could do it as follows:

DECLARE

DateToUpdate1 DATE := to_date('22-03-1999', 'DD-MM-YYYY');

DateToUpdate2 DATE := to_date('26-03-1999', 'DD-MM-YYYY');

Emp Schedule%rowtype;

BEGIN

SELECT INTO

Emp

FROM

Schedule_s

WHERE

Employee# = :EmployeeNo

FOR UPDATE;

SetOps.Incl(Emp.SL_s,

DateToUpdate1 - TRUNC(DateToUpdate1, 'YEAR'));

SetOps.Incl(Emp.SL_s,

DateToUpdate2 - TRUNC(DateToUpdate2, 'YEAR'));

INSERT INTO

Schedule_s

VALUES

(Emp.Employee#,

Emp.V_s, Emp.OS_s, Emp.HL_s, Emp.SL_s,

Emp.P_s, Emp.E_s, Emp.S_s, Emp.M_s);

COMMIT;

END;

As it is immediately apparent, manipulating these more compact data structures is

more complex and a fully procedural approach is unavoidable. However, a full year's

data is available for the current employee after only one SELECT. More importantly,

there is no risk that an employee's day schedule is inserted more than once, contrary

to the 1st case. Enforcing the business rule that all employees are daily busy on one

and only one task category can be quite easy in the 1st case:

SELECT

Employee#, TheDate, V + OS + HL + SL + P + E + S + M

FROM

Schedule

WHERE

V + OS + HL + SL + P + E + S + M != 1

AND

TheDate BETWEEN to_date(:DateToCheck1) AND to_date(:DateToCheck2);

In the 2nd case, one needs to resort to a PL/SQL procedure:

EXECUTE

TheApp.CheckBusinessRule(to_date(:DateToCheck1),

to_date(:DateToCheck2));

CheckBusinessRule()'s implementation looks much trickier:

PROCEDURE CheckBusinessRule(DateToCheck1 IN DATE,

DateToCheck2 IN DATE) IS

Emp Schedule%rowtype;

NbTasks BINARY_INTEGER;

BEGIN

FOR SELECT INTO Emp FROM Schedule_s LOOP

FOR DateToCheck IN DateToCheck1 - TRUNC(DateToCheck1, 'YEAR')

..

DateToCheck2 - TRUNC(DateToCheck2, 'YEAR')

LOOP

CountTasksAtDate(Emp,

DateToCheck1 + DateToCheck,

NbTasks);

IF NbTasks != 1 THEN

-- output the error, e.g. through dbms_output;

dbms_output.put_line(Emp.Employee#) || ' ' ||

to_char(DateToCheck1 + DateToCheck,

'DD-MM-YYYY') || ' ' ||

to_char(NbTasks));

END IF;

END LOOP;

END LOOP;

END CheckBusinessRule;

The procedure CountTasksAtDate() is implemented as follows:

PROCEDURE CountTasksAtDate(Emp IN Schedule%rowtype,

DateToCheck IN Date,

NbTasks OUT BINARY_INTEGER) IS

Sum BINARY_INTEGER;

Index BINARY_INTEGER := DateToCheck - TRUNC(DateToCheck, 'YEAR');

BEGIN

Sum := bool_to_number(BitOps.In(Emp.V_s, Index)) +

bool_to_number(BitOps.In(Emp.OS_s, Index)) +

bool_to_number(BitOps.In(Emp.HL_s, Index)) +

bool_to_number(BitOps.In(Emp.SL_s, Index)) +

bool_to_number(BitOps.In(Emp.P_s, Index)) +

bool_to_number(BitOps.In(Emp.E_s, Index)) +

bool_to_number(BitOps.In(Emp.S_s, Index)) +

bool_to_number(BitOps.In(Emp.M_s, Index));

NbTasks := Sum;

END CountTasksAtDate;

with bool_to_number() quite obviously returns 1 if its boolean argument is TRUE,

and 0 otherwise. Of course, those are only implementation examples and numerous

variations are possible. The relevant concept is that they all should be packaged

into an application-dependent module, examplified here by TheApp. Note that if

this business rule were coded as a check constraint at table creation time then it

would be automatically evaluated for each inserted or updated row. The moral

here is a general law in programming: the best data structures are always a

trade-off between space (storage requirements) and time (complexity in accessing

the data structures) in a given problem context. There is no One Size Fits All

rule. The choice heavily depends on the application's performance requirements,

or lack thereof.

An easier alternative

---------------------

An intermediate, mixed solution would consist in packing all the individual tasks

in one set of tasks. The new table structure would then be:

Schedule_s2(Employee#, TheDate, TaskSet). Enforcing the business rule would simply

consist in verifying that TaskSet, expressed as a number, is non null and a power

of 2 such as 1, 4, 8, ... etc. because such numbers have only one bit set to 1,

all the others being 0. Better yet, if one stays at the high-level concept of sets,

one would only need the SetOps.Card() cardinal function i.e. number of elements in

a set) and check that SetOps.Card(TaskSet) = 1 for a given row sub-range. Here is a

quick and dirty draft of this function:

FUNCTION Card(R IN RAW) RETURN BINARY_INTEGER IS

Cardinal BINARY_INTEGER := 0;

BEGIN

FOR I IN 0 .. 8 * utl_raw.length(R) - 1 LOOP

IF TestBits(R, I) THEN

Cardinal := Cardinal + 1;

END IF;

END LOOP;

RETURN Cardinal;

END Card;

In this implementation, checking the business rule becomes:

SELECT

Employee#, TheDate, SetOps.Card(TaskSet)

FROM

Schedule_s2

WHERE

SetOps.Card(TaskSet) != 1

AND

TheDate BETWEEN to_date(:DateToCheck1) AND to_date(:DateToCheck2);

This evidently looks simpler than the preceding implementation. Storage gain over

the standard solution is quite substantial too. Therefore, this alternative represents

a good compromise between storage efficiency and implementation complexity.

Conclusion

----------

Although there is no built-in boolean data type, it is reasonably easy to implement

one. Actually, it is more convenient to add support for packs of booleans, yielding

the immediate benefit of support for sets at almost no additional cost. Our

solution is implemented through RAW(n). We use the standard package utl_raw for

the bitwise functions it offers, and build the BitOps and SetOps packages on top

of it. The advantages of packing individual boolean values into RAWs is manipulation

power and flexibility along with storage economy and disk I/O efficiency. Yet,

this solution has one inherent drawback, which is the increased usage complexity.

However, this can be hidden inside functions and application-dependent packages

with well-defined, high-level interfaces, or alleviated by choosing intermediate,

less space efficient but simpler implementations.

database, and to work on them. This is a topic that pops up quite frequently

at Oracle Support due to the fact that although PL/SQL does support the boolean

type, Oracle RDBMS doesn't have any built-in boolean data type for table

columns. This implies that it is not possible to create columns with such a

type. However, it is fairly easy, albeit cumbersome, to simulate one. To do

this we ideally should restrict ourselves to use whatever existing data type

and, possibly, package Oracle already offers us, in order to limit the overhead

to a minimum. We'll consider first an Oracle 7 procedural approach, which still

works in Oracle 8. In a later article, we'll use the new object-based features

of Oracle 8 to wrap this up in a much cleaner way.

Boolean: what's that ?

----------------------

Boolean values are extremely useful and used in everyday life, even though

most people don't call them by this name or suspect their existence. Each time

a question can be answered by yes or no, this answer can be stored in a boolean

variable. Such variables, sometimes called flags, are allowed only two values,

generally named TRUE or FALSE, but sometimes 1 or 0, ON or OFF, OPEN or CLOSED,

etc.., depending on their usage context. This type is important enough that it

deserved a whole mathematical theory of its own called boolean algebra: the

logic, with implications on computer technology and digital communications.

However important, Oracle RDBMS Server does not support this type directly. It

is left to the application developer to implement the specifics of using this

datatype, this article aims at aiding in this task. One often encounters columns

of datatype char(1) that has distinct values of 'Y'/'N' etc. But what if there

are tens of such flags ? And how to compare multiple such flags together ? Let's

not forget that the combination of n flags grows exponentially in 2**n. If for

instance one has to conditionally test 5 logical properties, one may have to do up

to 32 (i.e. 2**5) tests.

Requisites

----------

Proper support for such a type should enforce its domain integrity and provide

sensible operations on its variables. Domain integrity just consists in ensuring

that only one of two conventional values is stored in a boolean variable, whereas

usual logical operations to supply are: and, or, xor, not. What conventional

values to store into a boolean variable really depends on whether one wants to

implement a truly universal boolean-like type or simply an ad-hoc one that best

suits one current application's needs. Here, we'll assume that we don't want to

implement a new boolean type each time we need one. We want a generic, well-proven

one that can be reused over and over again. Thus, those generic conventional values

should be '1' for logical TRUE and '0' for logical FALSE. Also, what about the

capability to manipulate groups of boolean values as a whole? Besides adding

power, this drastically reduces name space cluttering. Such packs of booleans can

be seen as strings of bits, a handy representation for straightforward bitwise

operations. Incidentally, such strings of bits are ideally suited for

implementing sets, and the bitwise functions are directly usable for doing set

operations such as union (or), intersection (and), complement (not), and symmetric

complement (xor). Thus, an added benefit of providing support for booleans is an

almost implicit, free of charge support for sets.

One not so good solution

------------------------

One easy way to store boolean values would consist in using the VARCHAR2(n) column

type: it is expandable up to 4000 values and individual values can be easily

accessed through powerful Oracle built-in string functions. On the con side:

a) each boolean value would take up 1 or 2 characters (depending on whether a

single-byte or a double-byte character set has been selected);

b) we must make sure the chosen characters for representing the TRUE and FALSE

values are always respected regardless of character set considerations,

particularly when exporting/importing;

c) we must implement our own bitwise functions for manipulating such values or

at least providing an interface to some existing package if any;

A better solution

-----------------

Another better existing data type we can use to store boolean values is RAW(n).

This data type stores binary values such as multimedia ones, and guarantees that

no interpretation (i.e. conversions) is ever performed on them. Thus their original

value is preserved throughout their existence. The advantages are: expandability,

applicability of the built-in single-byte string functions, immunity to the

character set and national language variations including during export/import.

But the biggest incentive is the existence of the utl_raw package. Thanks to

this package, we won't have to implement the binary functions ourselves: that

package provides us with functions bit_and, bit_or, bit_xor and bit_complement

(i.e. the not operator). The package specification can be found in

$ORACLE_HOME/rdbms/admin/u

Here are the interesting functions' headers:

FUNCTION bit_and(r1 IN RAW,

r2 IN RAW) RETURN RAW;

FUNCTION bit_or(r1 IN RAW,

r2 IN RAW) RETURN RAW;

FUNCTION bit_xor(r1 IN RAW,

r2 IN RAW) RETURN RAW;

FUNCTION bit_complement(r IN RAW) RETURN RAW;

The n in RAW(n) specifies the number of allocated bytes for storage. Thus,

boolean values will be stored in bits by packs of 8. To find this size value,

one can use the following formula:

n = ceil(number_of _needed_booleans/8)

However, there are a few caveats in using RAW(n) and those functions. Firstly,

the programmer must keep a mental image of the flags and how they are mapped

into a RAW(n) string of bits. One must shift from independent, high-level single

variables to low-level concepts such as bits, bytes, words, quadwords or more.

While this is quite common for system programmers, it can be quite disconcerting

at first for database programmers and may need some practice. Secondly, the

programmer must express those strings bits in hexadecimal. This implies that bits

are divided in groups of 4, each of which is then transcribed each into a

hexadecimal digit. Moreover, since the utl_raw package's functions work on the RAW

data type, hexadecimal values must be converted from/to RAW(n) through the

function hextoraw() and rawtohex(). For example, one has to write:

MyVar := rawtohex(utl_raw.bit_and(h

Admittedly, this is not that big a problem but it clutters one's code. On the

pro side, the hextoraw() function ensures that the string literals or variables

really are hexadecimal numbers, which enforces domain integrity. If a

non-hexadecimal number is passed to hextoraw(), an ora-01465 is issued by the

function. Thirdly, the functions in this package have several quirks of their

own: one must ensure that the parameters are all the same length because these

functions work from left to right and don't left pad with 0 bits, contrary to

what would be expected by common arithmetic rules where right justification is

implicit. Consider this example:

select utl_raw.bit_or(hextoraw('3

EXAMPLE1

-------

F300

instead of the expected 3F0.

If the user makes sure both operands are the same length, then the ambiguity

is resolved:

select utl_raw.bit_or(hextoraw('3

EXAMPLE2

--------

03F0

Note that a 0 has been prefixed because hexadecimal values are manipulated

one full byte (i.e. 2 nibbles or hexadecimal digits) at a time. This is the

reason why the first case yielded F300 instead of the more obvious F00.

Another quirk is illustrated here:

select utl_raw.bit_or(hextoraw('3

EXAMPLE3

--------

03F123

Given the above, we could have expected 3F1 or 03F100. What happened here ?

We are simply hitting another peculiarity of those functions. When one argument

is shorter than the other one, evaluation simply stops and the remaining digits

of the longest arguments are appended to the result. The rule of thumb here is

therefore to make sure that both operands have the same length and that the

number of hexadecimal digits is even. This will prevent any bad surprise. Had

we respected this simple rule, the previous statement would have yielded:

select utl_raw.bit_or(hextoraw('0

EXAMPLE4

--------

0301F3

which was predictable.

Fourthly, it is important to stick to a convention about the direction of reading

the bit string, and the index numbering. Usually, the bits are read from right

to left. Bit number 0 (the least significant one) is the rightmost one, whereas

the last bit (the most significant one) is the leftmost one. This can be

disconcerting at first because the last bit is actually written first, but then

this is how we write numbers daily and we aren't that much amazed any more. The

index being 0-based should not be a surprise either because this corresponds to

the weight (i.e. the numeric base's exponent) in the positional notation.

Additional functions

--------------------

With this implementation based on RAW(n), one further issue is raised: the individual

accessibility and workability of the single variables. For this, one could use the

following functions:

PROCEDURE SetBit(R IN OUT RAW,

Index IN BINARY_INTEGER);

PROCEDURE ResetBit(R IN OUT RAW,

Index IN BINARY_INTEGER);

PROCEDURE ToggleBit(R IN OUT RAW,

Index IN BINARY_INTEGER);

FUNCTION TestBit(R IN RAW,

Index IN BINARY_INTEGER) RETURN BOOLEAN;

Note that here we can use the build-in PL/SQL boolean type because we are actually

programming and no longer dealing with storage considerations. We have thrown in

exception handling to make sure the index lays between the range 0..8 * utl_raw.length(R)

(remember: we want to access individual bits from a suite of bytes through their

0-based bit index). To simplify we have used the predefined exception

SUBSCRIPT_OUTSIDE_LIMIT . Moreover, we could add purity pragmas to allow using

these functions from within SQL statements. SetBit(), ResetBit(), ToggleBit()

and TestBit() could be implemented with bit-masking techniques, i.e. and-ing or

or-ing the passed value with a specially generated constant. A basic constructor

for doing this could be defined as follows:

FUNCTION MakeMask(NumberOfBits IN BINARY_INTEGER,

TheIndex IN BINARY_INTEGER) RETURN RAW IS

-- 0 <= TheIndex < NumberOfBits

-- otherwise exception SUBSCRIPT_OUTSIDE_LIMIT raised;

-- returns a string of NumberOfBits all zero bits but the TheIndex-th

-- one;

BEGIN

IF TheIndex < 0 OR Index >= NumberOfBits THEN

RAISE SUBSCRIPT_OUTSIDE_LIMIT;

END IF;

RETURN hextoraw(lpad(ToBase(power

2 * Ceil(NumberOfBits/8),

'0'));

END MakeMask;

where ToBase() is a function that returns a character string containing a number

in the given base (in the range 2..16). MakeMask() always returns an even number

of nibbles (i.e. generates full bytes). Here is an implementation of ToBase():

FUNCTION ToBase(N IN BINARY_INTEGER,

TheBase IN BINARY_INTEGER) RETURN STRING IS

TheNumber BINARY_INTEGER := N;

S STRING(100) := '';

Digit NUMBER(2);

BEGIN

IF TheBase < 2 OR TheBase > 16 THEN

RAISE VALUE_ERROR;

END IF;

<<Convert>>

LOOP

Digit := MOD(TheNumber, TheBase);

IF Digit >= 0 and Digit < 10 THEN

S := TO_CHAR(Digit) || S;

ELSIF Digit = 10 THEN

S := 'A' || S;

ELSIF Digit = 11 THEN

S := 'B' || S;

ELSIF Digit = 12 THEN

S := 'C' || S;

ELSIF Digit = 13 THEN

S := 'D' || S;

ELSIF Digit = 14 THEN

S := 'E' || S;

ELSIF Digit = 15 THEN

S := 'F' || S;

END IF;

TheNumber := FLOOR(TheNumber / TheBase);

EXIT Convert WHEN TheNumber = 0;

END LOOP;

RETURN S;

END ToBase;

With MakeMask() available, SetBit(), ResetBit(), ToggleBit() and TestBit()

could possibly be:

PROCEDURE SetBit(R IN OUT RAW,

Index IN BINARY_INTEGER) IS

BEGIN

R := utl_raw.bit_or(R,

MakeMask(8 * utl_raw.length(R), Index));

END SetBit;

PROCEDURE ResetBit(R IN OUT RAW,

Index IN BINARY_INTEGER) IS

BEGIN

R := utl_raw.bit_and(R,

utl_raw.bit_complement(Mak

Index)));

END ResetBit;

PROCEDURE ToggleBit(R IN OUT RAW,

Index IN BINARY_INTEGER) IS

BEGIN

R := utl_raw.bit_xor(R,

MakeMask(8 * utl_raw.length(R), Index));

END ToggleBit;

FUNCTION TestBit(R IN RAW,

Index IN BINARY_INTEGER) RETURN BOOLEAN IS

BEGIN

RETURN utl_raw.compare(

utl_raw.bit_and(R, MakeMask(8 * utl_raw.length(R), TheIndex)),

hextoraw('00'),

hextoraw('00')

) = 1;

END TestBit;

For completeness, one could define the constructors MakeResetAll() and

MakeSetAll() too:

FUNCTION MakeResetAll(NumberOfBits IN BINARY_INTEGER) RETURN RAW IS

-- returns a string of NumberOfBits 0 bits;

Temp RAW(2000);

BEGIN

Temp := MakeMask(NumberOfBits, 0);

RETURN utl_raw.bit_xor(Temp, Temp);

END MakeResetAll;

FUNCTION MakeSetAll(NumberOfBits IN BINARY_INTEGER) RETURN RAW IS

-- returns a string of NumberOfBits 1 bits;

BEGIN

RETURN utl_raw.bit_complement(Mak

END MakeSetAll;

It is also possible to define lots of other functions that work on bits, such as

logical & arithmetical shifts and rotations but this is outside the scope of this

article.

Putting the pieces together

--------------------------

At this point, we have chosen built-in data type for storage (RAW(n)), we have

decided how to identify single boolean values from the string of bits

(through a 0-based index starting from the right and increasing toward the left)

and how to individually access and edit them (through the xxxxBit() functions

above), and we are able to use an existing package for working on pairs of

strings of booleans with due cautions (package utl_raw). Much of the overall

complexity can be hidden behind a package that supersedes the utl_raw one and

adds the xxxxBit() family of functions together with other useful functions:

PACKAGE BitsOps IS

FUNCTION BitAnd(r1 IN RAW,

r2 IN RAW) RETURN RAW;

FUNCTION BitOr(r1 IN RAW,

r2 IN RAW) RETURN RAW;

FUNCTION BitXor(r1 IN RAW,

r2 IN RAW) RETURN RAW;

FUNCTION BitNot(r IN RAW) RETURN RAW;

PROCEDURE SetBit(R IN OUT RAW,

Index IN BINARY_INTEGER);

PROCEDURE ResetBit(R IN OUT RAW,

Index IN BINARY_INTEGER);

PROCEDURE ToggleBit(R IN OUT RAW,

Index IN BINARY_INTEGER);

FUNCTION TestBit(R IN RAW,

Index IN BINARY_INTEGER) RETURN BOOLEAN;

FUNCTION MakeResetAll(NumberOfBits IN BINARY_INTEGER) RETURN RAW;

FUNCTION MakeSetAll(NumberOfBits IN BINARY_INTEGER) RETURN RAW;

-- private functions, shown here for documentation purposes;

-- FUNCTION MakeMask(NumberOfBits IN BINARY_INTEGER,

-- TheIndex IN BINARY_INTEGER) RETURN RAW;

-- FUNCTION ToBase(N IN BINARY_INTEGER,

-- Base IN BINARY_INTEGER) RETURN STRING;

END BitsOps;

MakeMask() and ToBase() do not need be called by the package's clients, as they

are privately defined. Similarly, as we have previously shown that sets could internally

be implemented as strings of bits, we could implement SetOps, a set package that

closely matches the BitOps one with following respective functions:

Intersection() (BitAnd), Union() (BitOr), SymetricComplement() (BitXor),

Incl() (i.e. inclusion, SetBit), Excl() (i.e. exclusion, ResetBit),

In() (i.e. test membership, TestBit), MakeEmptySet() (MakeResetAll),

MakeFullSet() (MakeSetAll). The following table summarizes the relationship

between bit operations and set operations.

BitOps functions SetOps functions

---------------- ----------------

BitAnd() Intersection()

BitOr() Union()

BitXor() SymetricComplement()

SetBit() Incl()

ResetBit() Excl()

TestBit() In()

MakeResetAll() MakeEmptySet()

MakeSetAll() MakeFullSet()

Let's now get practical.

A concrete example

------------------

Let's pick the case of the Oracle Support employees. We all have a schedule in

which we record the incumbent tasks on a daily basis: vacations (V), on-site

travels (OS), hot-line (HL), second-level (SL), project (P), education (E),

sickness (S) and miscellaneous (M). Actually, these tasks are recorded with a

half-day granularity and there may be other tasks but let's simplify for the sake

of our example. One obvious business rule demands that on a given day one given

employee be busy on one and only one task. How to implement such requirements ?

We could define one table column per task to monitor. Each record would contain

the employee id, the date and the columns V, OS, HL, SL, P, E, S and M. This can

be expressed by the notation Schedule(Employee#, TheDate, V, OS, HL, SL, P, E, S, M).

A value 1 in, say, HL would mean that the corresponding employee has hot line that

day, and a 0 the contrary. Thus a NUMBER(1) suffices to store the task.

The table would contain 365/366 rows per employee. We are about 50 employees at Support.

That's almost 20 thousands rows per year. Even more if several years are recorded.

Now, we all know how powerful Oracle is but what if we could reduce the table to

50 rows, i.e. one employee's year per row only ? This "miracle" is possible by

storing one full year schedule into a set, i.e. one array of booleans, a RAW(46),

where 46 = ceil(366/8). The new schedule's structure becomes

Schedule_s(Employee#, V_s, OS_s, HL_s, SL_s, P_s, E_s, S_s, M_s), where the

_s suffix changes the corresponding column's meaning to 'the set of that task during

one calendar year'. The record is bigger (about 300 bytes more) but this is largely

compensated by the advantage of being able to access a full year's worth of one

employee's schedule at once. If one consider the whole data size, one can calculate

that the second case is less than 1% of the first case. If several years have

to be kept at once in the table, then a simple

ALTER TABLE

Schedule_s

MODIFY

(V_s RAW(92), OS_s RAW(92), ... M_s RAW(92))

would add another year of schedule to the table since each year takes up 46 bytes.

Obviously, the date origin (i.e. the date of the first recorded day) must be stored

somewhere, typically in some application's global parameter table. But this storage

economy comes at a price. In the first case, adding 2 days of second-level support

for employee 100, on March 22nd and March 26th could simply be done thus (let's

suppose no schedule exists for that employee at those dates yet, otherwise a

SELECT ... FOR UPDATE is necessary):

INSERT INTO

Schedule

VALUES

(:EmployeeNo,

to_date('22-03-1999', 'DD-MM-YYYY'),

0, 0, 0, 1, 0, 0, 0, 0);

INSERT INTO

Schedule

VALUES

(:EmployeeNo,

to_date('26-03-1999', 'DD-MM-YYYY'),

0, 0, 0, 1, 0, 0, 0, 0);

In the second case, and supposing the row already exists for that employee (this

would be part of an application's initialization step), one could do it as follows:

DECLARE

DateToUpdate1 DATE := to_date('22-03-1999', 'DD-MM-YYYY');

DateToUpdate2 DATE := to_date('26-03-1999', 'DD-MM-YYYY');

Emp Schedule%rowtype;

BEGIN

SELECT INTO

Emp

FROM

Schedule_s

WHERE

Employee# = :EmployeeNo

FOR UPDATE;

SetOps.Incl(Emp.SL_s,

DateToUpdate1 - TRUNC(DateToUpdate1, 'YEAR'));

SetOps.Incl(Emp.SL_s,

DateToUpdate2 - TRUNC(DateToUpdate2, 'YEAR'));

INSERT INTO

Schedule_s

VALUES

(Emp.Employee#,

Emp.V_s, Emp.OS_s, Emp.HL_s, Emp.SL_s,

Emp.P_s, Emp.E_s, Emp.S_s, Emp.M_s);

COMMIT;

END;

As it is immediately apparent, manipulating these more compact data structures is

more complex and a fully procedural approach is unavoidable. However, a full year's

data is available for the current employee after only one SELECT. More importantly,

there is no risk that an employee's day schedule is inserted more than once, contrary

to the 1st case. Enforcing the business rule that all employees are daily busy on one

and only one task category can be quite easy in the 1st case:

SELECT

Employee#, TheDate, V + OS + HL + SL + P + E + S + M

FROM

Schedule

WHERE

V + OS + HL + SL + P + E + S + M != 1

AND

TheDate BETWEEN to_date(:DateToCheck1) AND to_date(:DateToCheck2);

In the 2nd case, one needs to resort to a PL/SQL procedure:

EXECUTE

TheApp.CheckBusinessRule(t

to_date(:DateToCheck2));

CheckBusinessRule()'s implementation looks much trickier:

PROCEDURE CheckBusinessRule(DateToCh

DateToCheck2 IN DATE) IS

Emp Schedule%rowtype;

NbTasks BINARY_INTEGER;

BEGIN

FOR SELECT INTO Emp FROM Schedule_s LOOP

FOR DateToCheck IN DateToCheck1 - TRUNC(DateToCheck1, 'YEAR')

..

DateToCheck2 - TRUNC(DateToCheck2, 'YEAR')

LOOP

CountTasksAtDate(Emp,

DateToCheck1 + DateToCheck,

NbTasks);

IF NbTasks != 1 THEN

-- output the error, e.g. through dbms_output;

dbms_output.put_line(Emp.E

to_char(DateToCheck1 + DateToCheck,

'DD-MM-YYYY') || ' ' ||

to_char(NbTasks));

END IF;

END LOOP;

END LOOP;

END CheckBusinessRule;

The procedure CountTasksAtDate() is implemented as follows:

PROCEDURE CountTasksAtDate(Emp IN Schedule%rowtype,

DateToCheck IN Date,

NbTasks OUT BINARY_INTEGER) IS

Sum BINARY_INTEGER;

Index BINARY_INTEGER := DateToCheck - TRUNC(DateToCheck, 'YEAR');

BEGIN

Sum := bool_to_number(BitOps.In(E

bool_to_number(BitOps.In(E

bool_to_number(BitOps.In(E

bool_to_number(BitOps.In(E

bool_to_number(BitOps.In(E

bool_to_number(BitOps.In(E

bool_to_number(BitOps.In(E

bool_to_number(BitOps.In(E

NbTasks := Sum;

END CountTasksAtDate;

with bool_to_number() quite obviously returns 1 if its boolean argument is TRUE,

and 0 otherwise. Of course, those are only implementation examples and numerous

variations are possible. The relevant concept is that they all should be packaged

into an application-dependent module, examplified here by TheApp. Note that if

this business rule were coded as a check constraint at table creation time then it

would be automatically evaluated for each inserted or updated row. The moral

here is a general law in programming: the best data structures are always a

trade-off between space (storage requirements) and time (complexity in accessing

the data structures) in a given problem context. There is no One Size Fits All

rule. The choice heavily depends on the application's performance requirements,

or lack thereof.

An easier alternative

---------------------

An intermediate, mixed solution would consist in packing all the individual tasks

in one set of tasks. The new table structure would then be:

Schedule_s2(Employee#, TheDate, TaskSet). Enforcing the business rule would simply

consist in verifying that TaskSet, expressed as a number, is non null and a power

of 2 such as 1, 4, 8, ... etc. because such numbers have only one bit set to 1,

all the others being 0. Better yet, if one stays at the high-level concept of sets,

one would only need the SetOps.Card() cardinal function i.e. number of elements in

a set) and check that SetOps.Card(TaskSet) = 1 for a given row sub-range. Here is a

quick and dirty draft of this function:

FUNCTION Card(R IN RAW) RETURN BINARY_INTEGER IS

Cardinal BINARY_INTEGER := 0;

BEGIN

FOR I IN 0 .. 8 * utl_raw.length(R) - 1 LOOP

IF TestBits(R, I) THEN

Cardinal := Cardinal + 1;

END IF;

END LOOP;

RETURN Cardinal;

END Card;

In this implementation, checking the business rule becomes:

SELECT

Employee#, TheDate, SetOps.Card(TaskSet)

FROM

Schedule_s2

WHERE

SetOps.Card(TaskSet) != 1

AND

TheDate BETWEEN to_date(:DateToCheck1) AND to_date(:DateToCheck2);

This evidently looks simpler than the preceding implementation. Storage gain over

the standard solution is quite substantial too. Therefore, this alternative represents

a good compromise between storage efficiency and implementation complexity.

Conclusion

----------

Although there is no built-in boolean data type, it is reasonably easy to implement

one. Actually, it is more convenient to add support for packs of booleans, yielding

the immediate benefit of support for sets at almost no additional cost. Our

solution is implemented through RAW(n). We use the standard package utl_raw for

the bitwise functions it offers, and build the BitOps and SetOps packages on top

of it. The advantages of packing individual boolean values into RAWs is manipulation

power and flexibility along with storage economy and disk I/O efficiency. Yet,

this solution has one inherent drawback, which is the increased usage complexity.

However, this can be hidden inside functions and application-dependent packages

with well-defined, high-level interfaces, or alleviated by choosing intermediate,

less space efficient but simpler implementations.