How to Create User-Defined Aggregates in Oracle

How to Create User-Defined Aggregates in Oracle

Before we begin creating these things, what are user-defined aggregates? They are a feature introduced in Oracle 9i that allows a developer to create his or her own functions like "SUM", "AVG", and "COUNT". Previously the only way to achieve this kind of functionality across a set of data would be to write a pl/sql routine to iterate through a cursor.

The aggregates are defined in multiple parts, using Oracle's Object Oriented functionality. In structure it's somewhat similar to creating an externally defined function; but instead of writing a c routine as a member of an external library, first you create an object type and body using a predefined API from the Data Cartridge (ODCI) library then, like the externals, you create a function using that type to define it rather than implementing the code directly within the function.

Data Cartridge API for Aggregates

The following four functions must be implemented in your object type and body:

odciaggregateinitialize(ctx IN OUT your_aggregate_object_type) RETURN NUMBER

odciaggregateiterate(self IN OUT your_aggregate_object_type, your_parameter_to_aggregate IN some_data_type) RETURN NUMBER

odciaggregatemerge(self IN OUT your_aggregate_object_type, ctx2 IN your_aggregate_object_type) RETURN NUMBER

odciaggregateterminate(self IN your_aggregate_object_type, returnvalue OUT some_data_type, flags IN NUMBER) RETURN NUMBER

Starting simply, a NULL aggregate

Let's start with a minimal aggregate. This function doesn't really "do" anything, it simply returns NULL for any group; but it illustrates all of the required pieces.

-----------------------------------------------------------------------------------------------------------------------
                      -----------------------------------------------------------------------------------------------------------------------
                      ----    NULL Aggregate - minimal aggregate syntax, returns NULL
                      -----------------------------------------------------------------------------------------------------------------------
                      -----------------------------------------------------------------------------------------------------------------------
                      
                      CREATE OR REPLACE TYPE null_agg_type
                          AS OBJECT
                      (
                          v_dummy char(1),  -- every Oracle object type must include at least one attribute
                      
                          STATIC FUNCTION odciaggregateinitialize(ctx IN OUT null_agg_type)
                              RETURN NUMBER,
                      
                          MEMBER FUNCTION odciaggregateiterate(self IN OUT null_agg_type, p_notused IN VARCHAR2)
                              RETURN NUMBER,
                      
                          MEMBER FUNCTION odciaggregatemerge(self IN OUT null_agg_type, ctx2 IN null_agg_type)
                              RETURN NUMBER,
                      
                          MEMBER FUNCTION odciaggregateterminate(self          IN     null_agg_type,
                                                                 returnvalue      OUT VARCHAR2,
                                                                 flags         IN     NUMBER
                                                                )
                              RETURN NUMBER
                      );
                      /
                      
                      CREATE OR REPLACE TYPE BODY null_agg_type
                      IS
                          STATIC FUNCTION odciaggregateinitialize(ctx IN OUT null_agg_type)
                              RETURN NUMBER
                          IS
                          BEGIN
                                -- instantiate our type, NULL the dummy attribute
                              ctx  := null_agg_type(null);
                              RETURN odciconst.success;
                          END odciaggregateinitialize;
                      
                          MEMBER FUNCTION odciaggregateiterate(self IN OUT null_agg_type, p_notused IN VARCHAR2)
                              RETURN NUMBER
                          IS
                          BEGIN
                              -- In normal functions you would do something with the input
                              -- add it, count it, square it, etc. but not in this null-aggregate
                              RETURN odciconst.success;
                          END odciaggregateiterate;
                      
                          MEMBER FUNCTION odciaggregatemerge(self IN OUT null_agg_type, ctx2 IN null_agg_type)
                              RETURN NUMBER
                          IS
                          BEGIN
                              -- If you have 2 results sets to consolidate, for instance parallel processing of a table
                              -- you will resolve the combination here.
                              RETURN odciconst.success;
                          END odciaggregatemerge;
                      
                          MEMBER FUNCTION odciaggregateterminate(self          IN     null_agg_type,
                                                                 returnvalue      OUT VARCHAR2,
                                                                 flags         IN     NUMBER
                                                                )
                              RETURN NUMBER
                          IS
                          BEGIN
                              -- Set your return value, usually this would involve your attributes and possibly other
                              -- functions. For these simple null-aggregates we simply assign NULL and exit.
                              returnvalue := NULL;
                              RETURN odciconst.success;
                          END odciaggregateterminate;
                      END;
                      /

Open in new window

Now create a function to implement the type created above

CREATE OR REPLACE FUNCTION nullagg(p_string VARCHAR2)
                          RETURN VARCHAR2
                          PARALLEL_ENABLE
                          AGGREGATE USING null_agg_type;

Open in new window

And finally, we can use the new function just as we would other aggregates; but, in this case, it's rather anticlimactic.

select count(*), nullagg(table_name) from all_tables;

Open in new window

Ok, that's simple enough, but not useful, so let's explore some more complex functionality by replicating some of the built-in aggregates. I don't recommend these for production use, but they are helpful for illustration to explore how the functionality works when compared with familiar routines.

Writing our own counting aggregate

-----------------------------------------------------------------------------------------------------------------------
                      -----------------------------------------------------------------------------------------------------------------------
                      ----    Counting Aggregate
                      -----------------------------------------------------------------------------------------------------------------------
                      -----------------------------------------------------------------------------------------------------------------------
                      
                      CREATE OR REPLACE TYPE count_agg_type
                          AS OBJECT
                      (
                          v_result INTEGER,  -- We need a place to store our count as we iterate through the group sets
                      
                          STATIC FUNCTION odciaggregateinitialize(ctx IN OUT count_agg_type)
                              RETURN NUMBER,
                      
                          -- As with the null aggregate, we're not really using the input parameter, we're simply counting rows.
                          MEMBER FUNCTION odciaggregateiterate(self IN OUT count_agg_type, p_notused IN VARCHAR2)
                              RETURN NUMBER,
                      
                          MEMBER FUNCTION odciaggregatemerge(self IN OUT count_agg_type, ctx2 IN count_agg_type)
                              RETURN NUMBER,
                      
                          MEMBER FUNCTION odciaggregateterminate(self          IN     count_agg_type,
                                                                 returnvalue      OUT VARCHAR2,
                                                                 flags         IN     NUMBER
                                                                )
                              RETURN NUMBER
                      );
                      /
                      
                      CREATE OR REPLACE TYPE BODY count_agg_type
                      IS
                          STATIC FUNCTION odciaggregateinitialize(ctx IN OUT count_agg_type)
                              RETURN NUMBER
                          IS
                          BEGIN
                              -- At the start of our count we will, of course, have 0.
                              ctx  := count_agg_type(0);
                              RETURN odciconst.success;
                          END odciaggregateinitialize;
                      
                          MEMBER FUNCTION odciaggregateiterate(self IN OUT count_agg_type, p_notused IN VARCHAR2)
                              RETURN NUMBER
                          IS
                          BEGIN
                              -- As we iterate through each value, increment our counter
                              self.v_result  := self.v_result + 1;
                              RETURN odciconst.success;
                          END odciaggregateiterate;
                      
                          MEMBER FUNCTION odciaggregatemerge(self IN OUT count_agg_type, ctx2 IN count_agg_type)
                              RETURN NUMBER
                          IS
                          BEGIN
                              -- If we have two results, the merged result is simply the sum of both
                              self.v_result  := self.v_result + ctx2.v_result;
                              RETURN odciconst.success;
                          END odciaggregatemerge;
                      
                          MEMBER FUNCTION odciaggregateterminate(self          IN     count_agg_type,
                                                                 returnvalue      OUT VARCHAR2,
                                                                 flags         IN     NUMBER
                                                                )
                              RETURN NUMBER
                          IS
                          BEGIN
                              -- No finalizing is required when done, the count is complete, so return and exit.
                              returnvalue  := self.v_result;
                              RETURN odciconst.success;
                          END odciaggregateterminate;
                      END;
                      /

Open in new window

And, finally, create a function to use the type.

CREATE OR REPLACE FUNCTION countagg(p_string VARCHAR2)
                          RETURN VARCHAR2
                          PARALLEL_ENABLE
                          AGGREGATE USING count_agg_type;

Open in new window

Example usage:

select count(*),countagg(table_name) from all_tables;

Open in new window

Important to note, unlike Oracle's built-in function, we can't use the "*" input. That syntax is only exposed for the built-in. To get around this limit, simply pass in a non-null value (such as rowid) to count every row. On the other hand, the "*" syntax only applies to counting and no other aggregates. Since there's no good reason to rewrite COUNT, this is not a feature you should miss.

Writing a MAX aggregate for numbers

Next, we'll step up the complexity slightly by creating our own MAX function for numbers. Unfortunately we can't really do overloading as the Oracle built-ins can. User-defined aggregate types can only be assigned to standalone functions, not functions within a package. You could create several standalone functions then create a package with overloaded functions that called the appropriate standalone but that's a lot of overhead that simply creates the illusion of consolidated code. As mentioned above, this aggregate is solely for illustration of the API so you can learn from the example. In practice, you'll want to use the built-in function for efficiency.

-----------------------------------------------------------------------------------------------------------------------
                      -----------------------------------------------------------------------------------------------------------------------
                      ----    Maximum Aggregate for numbers
                      -----------------------------------------------------------------------------------------------------------------------
                      -----------------------------------------------------------------------------------------------------------------------
                      
                      CREATE OR REPLACE TYPE max_num_agg_type
                          AS OBJECT
                      (
                          v_result NUMBER,  -- We need a place to store our maximum as we iterate through the group sets
                      
                          STATIC FUNCTION odciaggregateinitialize(ctx IN OUT max_num_agg_type)
                              RETURN NUMBER,
                      
                          MEMBER FUNCTION odciaggregateiterate(self IN OUT max_num_agg_type, p_number IN NUMBER)
                              RETURN NUMBER,
                      
                          MEMBER FUNCTION odciaggregatemerge(self IN OUT max_num_agg_type, ctx2 IN max_num_agg_type)
                              RETURN NUMBER,
                      
                          MEMBER FUNCTION odciaggregateterminate(self          IN     max_num_agg_type,
                                                                 returnvalue      OUT NUMBER,
                                                                 flags         IN     NUMBER
                                                                )
                              RETURN NUMBER
                      );
                      /
                      
                      CREATE OR REPLACE TYPE BODY max_num_agg_type
                      IS
                          STATIC FUNCTION odciaggregateinitialize(ctx IN OUT max_num_agg_type)
                              RETURN NUMBER
                          IS
                          BEGIN
                              ctx  := max_num_agg_type(NULL);  -- initialize our max value to null
                              RETURN odciconst.success;
                          END odciaggregateinitialize;
                      
                          MEMBER FUNCTION odciaggregateiterate(self IN OUT max_num_agg_type, p_number IN NUMBER)
                              RETURN NUMBER
                          IS
                          BEGIN
                              IF self.v_result IS NULL
                              THEN
                                  self.v_result  := p_number;  -- the first number is automatically the maximum
                              ELSE
                                  self.v_result  := GREATEST(self.v_result, p_number);  -- compare new number to old, keep the bigger
                              END IF;
                      
                              RETURN odciconst.success;
                          END odciaggregateiterate;
                      
                          MEMBER FUNCTION odciaggregatemerge(self IN OUT max_num_agg_type, ctx2 IN max_num_agg_type)
                              RETURN NUMBER
                          IS
                          BEGIN
                              IF self.v_result IS NULL
                              THEN
                                  self.v_result  := ctx2.v_result;  -- if merging two sets, if the current is null, use the new one
                              ELSIF ctx2.v_result IS NOT NULL
                              THEN
                                  self.v_result  := GREATEST(self.v_result, ctx2.v_result); -- if both are populated then use the bigger
                              END IF;
                      
                              RETURN odciconst.success;
                          END odciaggregatemerge;
                      
                          MEMBER FUNCTION odciaggregateterminate(self          IN     max_num_agg_type,
                                                                 returnvalue      OUT NUMBER,
                                                                 flags         IN     NUMBER
                                                                )
                              RETURN NUMBER
                          IS
                          BEGIN
                              returnvalue  := self.v_result;  -- like the counting aggregate, no finalizing processes are needed
                              RETURN odciconst.success;
                          END odciaggregateterminate;
                      END;
                      /

Open in new window

And as always, create a function to implement the object

CREATE OR REPLACE FUNCTION maxnumagg(p_number NUMBER)
                          RETURN VARCHAR2
                          PARALLEL_ENABLE
                          AGGREGATE USING max_num_agg_type;

Open in new window

Example usage:

SELECT max(num_rows), maxnumagg(num_rows) FROM user_tables;

Open in new window

You could create a similar aggregate for MIN by switching GREATEST for LEAST. And you could build them for other data types by swapping the data type in the following lines:

v_result NUMBER,

MEMBER FUNCTION odciaggregateiterate(self IN OUT max_num_agg_type, p_number IN NUMBER)

returnvalue OUT NUMBER,

Aggregating Interval types

Next, we'll create a new aggregate that produces functionality not already available through an Oracle built-in function. The method is the same though, we'll mimic the SUM function, but our aggregate will support the INTERVAL type, specifically INTERVAL DAY TO SECOND.

-----------------------------------------------------------------------------------------------------------------------
                      -----------------------------------------------------------------------------------------------------------------------
                      ----    Sum Aggregate for Day to Second Intervals 
                      -----------------------------------------------------------------------------------------------------------------------
                      -----------------------------------------------------------------------------------------------------------------------
                      CREATE OR REPLACE TYPE dsinterval_agg_type
                          AS OBJECT
                      (
                          v_result INTERVAL DAY TO SECOND,  -- our working result
                      
                          STATIC FUNCTION odciaggregateinitialize(ctx IN OUT dsinterval_agg_type)
                              RETURN NUMBER,
                      
                          MEMBER FUNCTION odciaggregateiterate(
                              self         IN OUT dsinterval_agg_type,
                              p_interval   IN     INTERVAL DAY TO SECOND
                          )
                              RETURN NUMBER,
                      
                          MEMBER FUNCTION odciaggregatemerge(
                              self   IN OUT dsinterval_agg_type,
                              ctx2   IN     dsinterval_agg_type
                          )
                              RETURN NUMBER,
                      
                          MEMBER FUNCTION odciaggregateterminate(
                              self          IN     dsinterval_agg_type,
                              returnvalue      OUT INTERVAL DAY TO SECOND,
                              flags         IN     NUMBER
                          )
                              RETURN NUMBER
                      );
                      /
                      
                      CREATE OR REPLACE TYPE BODY dsinterval_agg_type
                      IS
                          STATIC FUNCTION odciaggregateinitialize(ctx IN OUT dsinterval_agg_type)
                              RETURN NUMBER
                          IS
                          BEGIN
                              ctx := dsinterval_agg_type(NULL);  -- initialize the sum to NULL
                              RETURN odciconst.success;
                          END odciaggregateinitialize;
                      
                          MEMBER FUNCTION odciaggregateiterate(
                              self         IN OUT dsinterval_agg_type,
                              p_interval   IN     INTERVAL DAY TO SECOND
                          )
                              RETURN NUMBER
                          IS
                          BEGIN
                              IF self.v_result IS NULL
                              THEN
                                  self.v_result := p_interval;  -- the first interval  is the sum
                              ELSE
                                  self.v_result := self.v_result + p_interval;  -- add each new interval to the running total
                              END IF;
                      
                              RETURN odciconst.success;
                          END odciaggregateiterate;
                      
                          MEMBER FUNCTION odciaggregatemerge(
                              self   IN OUT dsinterval_agg_type,
                              ctx2   IN     dsinterval_agg_type
                          )
                              RETURN NUMBER
                          IS
                          BEGIN
                              IF self.v_result IS NULL
                              THEN
                                  self.v_result := ctx2.v_result;  -- If our sum is NULL, use the result of the other set
                              ELSIF ctx2.v_result IS NOT NULL
                              THEN
                                  self.v_result := self.v_result + ctx2.v_result; -- If both sets have sums, add them to get the total
                              END IF;
                      
                              RETURN odciconst.success;
                          END odciaggregatemerge;
                      
                          MEMBER FUNCTION odciaggregateterminate(
                              self          IN     dsinterval_agg_type,
                              returnvalue      OUT INTERVAL DAY TO SECOND,
                              flags         IN     NUMBER
                          )
                              RETURN NUMBER
                          IS
                          BEGIN
                              returnvalue := self.v_result;   -- the iterate and merge did all the work, return the result
                              RETURN odciconst.success;
                          END odciaggregateterminate;
                      END;
                      /

Open in new window

Nothing special here, create a function using the object above.

CREATE OR REPLACE FUNCTION dsintervalagg(p_interval INTERVAL DAY TO SECOND)
                          RETURN INTERVAL DAY TO SECOND
                          PARALLEL_ENABLE
                          AGGREGATE USING dsinterval_agg_type;

Open in new window

And, finally, let's look at our results, we'll create 2 small intervals, the result will vary but should be slightly more than +0 00:00:3.42

SELECT dsintervalagg(i)
                        FROM (SELECT SYSTIMESTAMP - SYSDATE i FROM DUAL
                              UNION ALL
                              SELECT NUMTODSINTERVAL(3.42, 'second') FROM DUAL);

Open in new window

Concatenating Strings with an aggregate

For the next aggregate we'll construct, we'll tackle one of the most common questions asked for Oracle SQL. That is: "How can I combine (concatenate) strings from multiple rows into a single value?" Beginning with 11gR2, the LISTAGG aggregate is available and I recommend using that, but for 9i, 10g and 11gR1 the following function will achieve the results. Since this is a common goal, it's common to see similar functions to this around the internet, the most widely known is "stragg" written by Tom Kyte of Oracle. Others have written similar functions, with the same name (including me) but I think Tom's was the first. As you might expect from the previous examples, the API is largely fixed, so all of these string concatenation functions look and behave pretty much the same.

-----------------------------------------------------------------------------------------------------------------------
                      -----------------------------------------------------------------------------------------------------------------------
                      ----    Concatenation Aggregate for strings (a.k.a. stragg)
                      -----------------------------------------------------------------------------------------------------------------------
                      -----------------------------------------------------------------------------------------------------------------------
                      CREATE OR REPLACE TYPE concat_agg_type
                          AS OBJECT
                      (
                          -- This could be defined to 32767, but since the aggregate will be used in SQL,
                          -- the 4000 limit is appropriate since that's all that SQL varchar2 strings support
                          v_result VARCHAR2(4000),
                      
                          STATIC FUNCTION odciaggregateinitialize(ctx IN OUT concat_agg_type)
                              RETURN NUMBER,
                      
                          MEMBER FUNCTION odciaggregateiterate(self IN OUT concat_agg_type, p_string IN VARCHAR2)
                              RETURN NUMBER,
                      
                          MEMBER FUNCTION odciaggregatemerge(self IN OUT concat_agg_type, ctx2 IN concat_agg_type)
                              RETURN NUMBER,
                      
                          MEMBER FUNCTION odciaggregateterminate(self          IN     concat_agg_type,
                                                                 returnvalue      OUT VARCHAR2,
                                                                 flags         IN     NUMBER
                                                                )
                              RETURN NUMBER
                      );
                      /
                      
                      CREATE OR REPLACE TYPE BODY concat_agg_type
                      IS
                          STATIC FUNCTION odciaggregateinitialize(ctx IN OUT concat_agg_type)
                              RETURN NUMBER
                          IS
                          BEGIN
                              ctx  := concat_agg_type(NULL);  -- initialize the concatenation to NULL
                              RETURN odciconst.success;
                          END odciaggregateinitialize;
                      
                          MEMBER FUNCTION odciaggregateiterate(self IN OUT concat_agg_type, p_string IN VARCHAR2)
                              RETURN NUMBER
                          IS
                          BEGIN
                              self.v_result  := self.v_result || ',' || p_string;   -- Append a delimiter and new value
                              RETURN odciconst.success;
                          END odciaggregateiterate;
                      
                          MEMBER FUNCTION odciaggregatemerge(self IN OUT concat_agg_type, ctx2 IN concat_agg_type)
                              RETURN NUMBER
                          IS
                          BEGIN
                              -- If merging, simply concatenate them together
                              -- Since each string will either be NULL or delimiter prefixed, no need to re-delimit
                              self.v_result  := self.v_result || ctx2.v_result;
                              RETURN odciconst.success;
                          END odciaggregatemerge;
                      
                          MEMBER FUNCTION odciaggregateterminate(self          IN     concat_agg_type,
                                                                 returnvalue      OUT VARCHAR2,
                                                                 flags         IN     NUMBER
                                                                )
                              RETURN NUMBER
                          IS
                          BEGIN
                              -- Since we prefix the string initially with a comma, remove the extra here before returning
                              returnvalue  := LTRIM(self.v_result, ',');
                              RETURN odciconst.success;
                          END odciaggregateterminate;
                      END;
                      /

Open in new window

Our function could be called "stragg" to mimic other versions found on various websites, I called this one "concatagg" to maintain the convention established in the other aggregates above. My recommendation, choose a name that is self-documenting and matches your corporate naming standards.

CREATE OR REPLACE FUNCTION concatagg(p_string VARCHAR2)
                          RETURN VARCHAR2 PARALLEL_ENABLE AGGREGATE
                          USING concat_agg_type;

Open in new window

Example usage:

select concatagg(table_name) from all_tables where owner = 'SCOTT';

Open in new window

One problem with this aggregate is it can only return a varchar2 of up to 4000 characters. Even if internally we construct a 32k varchar2 we can't return it in a sql statement. 12c allows us to create 32K varchar2 in sql, but then that begs the question - "What about 33K or more?" The answer then is to return a CLOB. You could simply take the code above and swap the data type from varchar2 to clob for the internal variable but that's not a good idea. CLOB operations are expensive compared to varchar2. We want to minimize the number of times we have to touch a CLOB when building our return. So, in this version I've added a temporary variable to to hold varchar2 values until they threaten to exceed the 32K limit and only then do we concatenate to the CLOB. This allows us to do varchar2 as much as possible for fastest results while still returning a CLOB for maximum length support.

CREATE OR REPLACE TYPE CONCAT_CLOB_AGG_TYPE                                      
                          AS OBJECT
                      (
                          -- This could be defined to 32767, but since the aggregate will be used in SQL,
                          -- the 4000 limit is appropriate since that's all that SQL varchar2 strings support
                          v_temp VARCHAR2(32767),
                          v_result CLOB,
                          STATIC FUNCTION odciaggregateinitialize(ctx IN OUT concat_clob_agg_type)
                              RETURN NUMBER,
                          MEMBER FUNCTION odciaggregateiterate(self IN OUT concat_clob_agg_type, p_string IN VARCHAR2)
                              RETURN NUMBER,
                          MEMBER FUNCTION odciaggregatemerge(
                              self   IN OUT concat_clob_agg_type,
                              ctx2   IN     concat_clob_agg_type
                          )
                              RETURN NUMBER,
                          MEMBER FUNCTION odciaggregateterminate(
                              self          IN     concat_clob_agg_type,
                              returnvalue      OUT CLOB,
                              flags         IN     NUMBER
                          )
                              RETURN NUMBER
                      );
                      /
                      
                      CREATE OR REPLACE TYPE BODY CONCAT_CLOB_AGG_TYPE
                      IS
                          STATIC FUNCTION odciaggregateinitialize(ctx IN OUT concat_clob_agg_type)
                              RETURN NUMBER
                          IS
                          BEGIN
                              ctx := concat_clob_agg_type('', ''); -- initialize the concatenation to NULL   
                              RETURN odciconst.success;
                          END odciaggregateinitialize;
                      
                          MEMBER FUNCTION odciaggregateiterate(self IN OUT concat_clob_agg_type, p_string IN VARCHAR2)
                              RETURN NUMBER
                          IS
                          BEGIN
                              -- Appending to clobs is slower than appending to varchar2
                              -- so use varchar2 until you can't anymore then append one big chunk
                              self.v_temp := self.v_temp || ',' || p_string; -- Append a delimiter and new value
                      
                              IF LENGTH(self.v_temp) > 28700
                              THEN
                                  self.v_result := self.v_result || self.v_temp;
                                  self.v_temp := NULL;
                              END IF;
                      
                              RETURN odciconst.success;
                          END odciaggregateiterate;
                      
                          MEMBER FUNCTION odciaggregatemerge(
                              self   IN OUT concat_clob_agg_type,
                              ctx2   IN     concat_clob_agg_type
                          )
                              RETURN NUMBER
                          IS
                          BEGIN
                              -- If merging, simply concatenate them together
                              -- Since each string will either be NULL or delimiter prefixed, no need to re-delimit
                              self.v_result := self.v_result || self.v_temp || ctx2.v_result || ctx2.v_temp;
                              self.v_temp := NULL;
                              RETURN odciconst.success;
                          END odciaggregatemerge;
                      
                          MEMBER FUNCTION odciaggregateterminate(
                              self          IN     concat_clob_agg_type,
                              returnvalue      OUT CLOB,
                              flags         IN     NUMBER
                          )
                              RETURN NUMBER
                          IS
                          BEGIN
                              -- Since we prefix the string initially with a comma, remove the extra here before returning
                              returnvalue := LTRIM(self.v_result || self.v_temp, ',');
                              RETURN odciconst.success;
                          END odciaggregateterminate;
                      END;
                      /

Open in new window

Next, create a function to that implements the type.

CREATE OR REPLACE FUNCTION concatclobagg(p_string VARCHAR2)
                          RETURN CLOB
                          PARALLEL_ENABLE
                          AGGREGATE USING concat_clob_agg_type;

Open in new window

And finally, test it the function with a large result set.

SELECT concatclobagg(owner || '.' || object_name) FROM all_objects;

Open in new window

That works, but in 10g Oracle introduced a new aggregate function called COLLECT. I recommend using it instead of the odci aggregate. It's native code so it's a little more efficient and it allows for easier sorting of the elements within the concatenated string. COLLECT will create a collection object, you then need a function to iterate through the collection and build your final string. This is not a user-defined aggregate per se; but, in this case, it's usually going to be a better solution. So, you can use the aggregate above if you want, but mostly I include it for illustration, not as a recommendation. I'll show the COLLECT option now as well as sample usage.

First you need to create a collection type to hold your varchar2 values.

CREATE OR REPLACE TYPE vcarray AS TABLE OF VARCHAR2(4000);

Open in new window

Then you need to create a function to iterate over them. I'll use a 32K varchar2 buffer as I did in the aggregate above to boost effciency by minimizing the number of times I need to update the result clob.

CREATE OR REPLACE FUNCTION tbl2clob(p_tbl IN vcarray, p_delimiter IN VARCHAR2 DEFAULT ',')
                          RETURN CLOB
                          DETERMINISTIC
                      IS
                          v_str  VARCHAR2(32767);
                          v_clob CLOB;
                      BEGIN
                          DBMS_LOB.createtemporary(v_clob, TRUE);
                      
                          IF p_tbl.COUNT > 0
                          THEN
                              v_str := p_tbl(1);
                      
                              FOR i IN 2 .. p_tbl.COUNT
                              LOOP
                                  -- Appending to clobs is slower than appending to varchar2
                                  -- so use varchar2 until you can't anymore then append one big chunk
                      
                                  v_str := v_str || p_delimiter || p_tbl(i);
                      
                                  IF LENGTH(v_str) > 28700
                                  THEN
                                      v_clob := v_clob || v_str;
                                      v_str := NULL;
                                  END IF;
                              END LOOP;
                          END IF;
                      
                          IF v_str IS NOT NULL
                          THEN
                              v_clob := v_clob || v_str;
                          END IF;
                      
                          RETURN v_clob;
                      END;

Open in new window

Note, this function has 2 parameters. So this is another benefit to writing a non-aggregate function as a wrapper for the COLLECT aggregate. With the aggregates you can only have one parameter. The wrapping method allows you to add more options, as in this case, where I'm adding support for different delimiters.

The return value of COLLECT isn't strongly typed, so you must CAST it to a known type before using it in a function. This is why we create a collection type as the first step. I'll repeat the example I used above with the user-defined aggregate, except here I'll also include an ORDER BY clause on the COLLECT to show how to better control the output.

SELECT tbl2clob(CAST(COLLECT(owner || '.' || object_name ORDER BY owner, object_name) AS vcarray))
                        FROM all_objects;

Open in new window

Multiplying numbers with a product aggregate

Next, we'll go back to numbers and introduce another new function that isn't available in the built-ins: set multiplication. With built-ins we can add numbers, we can average numbers, we can derive variance, median and standard deviation but we can't multiply them! So, let's create a function to allow that.

-----------------------------------------------------------------------------------------------------------------------
                      -----------------------------------------------------------------------------------------------------------------------
                      ----    Product Aggregate for numbers  (multiply numbers in set, vs. sum which adds)
                      -----------------------------------------------------------------------------------------------------------------------
                      -----------------------------------------------------------------------------------------------------------------------
                      CREATE OR REPLACE TYPE product_agg_type
                          AS OBJECT
                      (
                          v_product NUMBER,  -- Store our running product
                      
                          STATIC FUNCTION odciaggregateinitialize(ctx IN OUT product_agg_type)
                              RETURN NUMBER,
                      
                          MEMBER FUNCTION odciaggregateiterate(self IN OUT product_agg_type, p_number IN NUMBER)
                              RETURN NUMBER,
                      
                          MEMBER FUNCTION odciaggregatemerge(self IN OUT product_agg_type, ctx2 IN product_agg_type)
                              RETURN NUMBER,
                      
                          MEMBER FUNCTION odciaggregateterminate(
                              self          IN     product_agg_type,
                              returnvalue      OUT NUMBER,
                              flags         IN     NUMBER
                          )
                              RETURN NUMBER
                      );
                      /
                      
                      CREATE OR REPLACE TYPE BODY product_agg_type
                      IS
                          STATIC FUNCTION odciaggregateinitialize(ctx IN OUT product_agg_type)
                              RETURN NUMBER
                          IS
                          BEGIN
                              ctx := product_agg_type(NULL);  -- Initialize the result to NULL
                              RETURN odciconst.success;
                          END odciaggregateinitialize;
                      
                          MEMBER FUNCTION odciaggregateiterate(self IN OUT product_agg_type, p_number IN NUMBER)
                              RETURN NUMBER
                          IS
                          BEGIN
                              -- Multiply each new number by the running product, if we don't have a product yet, seed it with 1.
                              self.v_product := NVL(self.v_product, 1) * p_number;
                              RETURN odciconst.success;
                          END odciaggregateiterate;
                      
                          MEMBER FUNCTION odciaggregatemerge(self IN OUT product_agg_type, ctx2 IN product_agg_type)
                              RETURN NUMBER
                          IS
                          BEGIN
                              -- Multiply the products of 2 sets, using 1 for any NULL's
                              -- Make sure at least one of the sets has a value though
                              IF COALESCE(self.v_product, ctx2.v_product) IS NOT NULL
                              THEN
                                  self.v_product := NVL(self.v_product, 1) * NVL(ctx2.v_product, 1);
                              END IF;
                              RETURN odciconst.success;
                          END odciaggregatemerge;
                      
                          MEMBER FUNCTION odciaggregateterminate(
                              self          IN     product_agg_type,
                              returnvalue      OUT NUMBER,
                              flags         IN     NUMBER
                          )
                              RETURN NUMBER
                          IS
                          BEGIN
                              returnvalue := self.v_product;
                              RETURN odciconst.success;
                          END odciaggregateterminate;
                      END;
                      /

Open in new window

Create a function to use the object.

CREATE OR REPLACE FUNCTION productagg(p_number IN NUMBER)
                          RETURN NUMBER
                          PARALLEL_ENABLE
                          AGGREGATE USING product_agg_type;

Open in new window

Example usage:

select productagg(level) from dual connect by level <= 6;

Open in new window

Aggregating Dates to detect event skew

For the next aggregate, we'll get a little more creative (some might say mathematically silly.) We'll create an Average function for DATE values. So, first, you have to ask yourself: "Given a set of dates, what would the 'average' look like? How would it be used? Would the generated date have any meaning?" For this aggregate, I'm going to define the "average date" within a set to be a point in time that best represents all of the times in the set. So, if I'm tracking events, for example: crimes by county, vehicular accidents by state, quadruplet births by nation, or sales perks given away by department; I might want to know when these things "usually" happen. Of course, this is an approximation at best, but if I've a lot of skew to my results, a function like this could be helpful. For instance, if I have 10 crimes in June and 2 crimes in November, I'd expect the result of the function to skew toward the June vales and return an "average" date indicating these particular crimes are more common in the summer. Again, this function is mostly intended for illustration, but for some purposes, you may find the approximations helpful as an indicator for areas that need more sophisticated statistical analysis. Now that we know how we are going to define what the results mean, how do we calculate such a thing? Oracle doesn't store dates as a single number, so it's not just matter of adding them up and dividing by the count as you would sales or batting averages. However, it is possible to generate a meaningful number that represents each date. By using the Julian date number of each day and adding in the fractional day amount for hours, minutes and seconds we can generate a single number that is an offset from the "beginning of time" as far as Oracle is concerned, i.e. Jan 1, 4712 BC 00:00:00. Once we have that, then the date average is just as simple as any numerical average.

-----------------------------------------------------------------------------------------------------------------------
                      -----------------------------------------------------------------------------------------------------------------------
                      ----    Average Aggregate for dates
                      -----------------------------------------------------------------------------------------------------------------------
                      -----------------------------------------------------------------------------------------------------------------------
                      
                      CREATE OR REPLACE TYPE avg_date_agg_type
                          AS OBJECT
                      (
                          v_day_sum NUMBER,  -- running total of date offsets
                          v_day_count INTEGER,  -- number of dates we've summed
                      
                          STATIC FUNCTION odciaggregateinitialize(ctx IN OUT avg_date_agg_type)
                              RETURN NUMBER,
                      
                          MEMBER FUNCTION odciaggregateiterate(self IN OUT avg_date_agg_type, p_date IN DATE)
                              RETURN NUMBER,
                      
                          MEMBER FUNCTION odciaggregatemerge(self IN OUT avg_date_agg_type, ctx2 IN avg_date_agg_type)
                              RETURN NUMBER,
                      
                          MEMBER FUNCTION odciaggregateterminate(
                              self          IN     avg_date_agg_type,
                              returnvalue      OUT DATE,
                              flags         IN     NUMBER
                          )
                              RETURN NUMBER
                      );
                      /
                      
                      CREATE OR REPLACE TYPE BODY avg_date_agg_type
                      IS
                          STATIC FUNCTION odciaggregateinitialize(ctx IN OUT avg_date_agg_type)
                              RETURN NUMBER
                          IS
                          BEGIN
                              ctx := avg_date_agg_type(0, 0);  -- initialize the sum and count both to 0
                              RETURN odciconst.success;
                          END odciaggregateinitialize;
                      
                          MEMBER FUNCTION odciaggregateiterate(self IN OUT avg_date_agg_type, p_date IN DATE)
                              RETURN NUMBER
                          IS
                          BEGIN
                              -- The TO_CHAR function with 'j' format return the Julian value
                              -- for the day portion only and drops the time
                              -- this function will add the time as a fractional day to the Julian.
                              -- Subtract one to create a delta from minimum-supported time.
                              self.v_day_sum :=
                                  self.v_day_sum + TO_NUMBER(TO_CHAR(p_date, 'j')) + (p_date - TRUNC(p_date)) - 1;
                              self.v_day_count := self.v_day_count + 1;
                      
                              RETURN odciconst.success;
                          END odciaggregateiterate;
                      
                          MEMBER FUNCTION odciaggregatemerge(self IN OUT avg_date_agg_type, ctx2 IN avg_date_agg_type)
                              RETURN NUMBER
                          IS
                          BEGIN
                              -- Add the sums and counts to generate an average over both sets
                              self.v_day_sum := self.v_day_sum + ctx2.v_day_sum;
                              self.v_day_count := self.v_day_count + ctx2.v_day_sum;
                              RETURN odciconst.success;
                          END odciaggregatemerge;
                          
                          MEMBER FUNCTION odciaggregateterminate(
                              self          IN     avg_date_agg_type,
                              returnvalue      OUT DATE,
                              flags         IN     NUMBER
                          )
                              RETURN NUMBER
                          IS
                          BEGIN
                              -- If we haven't iterated over any values, the count will be 0, so the average is NULL
                              -- we can't divide by zero anyway.  If we do have a count then generate the mean delta
                              -- and add it to the start of the Oracle calendar.
                              returnvalue :=
                                  CASE
                                      WHEN self.v_day_count > 0
                                      THEN
                                          --  Add the average delta to min-time to create the "average" date
                                          TO_DATE('1', 'j') + (self.v_day_sum / self.v_day_count)
                                      
                                      ELSE
                                          NULL
                                  END;
                              RETURN odciconst.success;
                          END odciaggregateterminate;
                      END;
                      /

Open in new window

Create a function to use the object

CREATE OR REPLACE FUNCTION avgdateagg(p_date DATE)
                          RETURN DATE
                          PARALLEL_ENABLE
                          AGGREGATE USING avg_date_agg_type;

Open in new window

Example usage: (on average, when did we analyze our tables?)

SELECT avgdateagg(last_analyzed) FROM all_tables;

Open in new window

Using an aggregate to parse strings

The next function is from a real world example asked here on Experts-Exchange. That is, given a set of strings, what are the distinct characters used in those strings? For example, given ('cadbb','dcfba','zxaf') the result would be 'abcdfxz'. I'll admit, this is an unusual function. Where this might be used? Perhaps an inventory management system with quality codes assigned to each item as a string to indicate various defect types. So you want to check within each inventory location, what are all of the defects that need to be managed; so you can assign the appropriate staff and equipment?

This aggregate has some extra complexity and requires an extra member function to determine the unique strings, this function will be used in the iterate and merge steps to find unique characters as new strings are introduced.

-----------------------------------------------------------------------------------------------------------------------
                      -----------------------------------------------------------------------------------------------------------------------
                      ---   Distinct characters 
                      -----------------------------------------------------------------------------------------------------------------------
                      -----------------------------------------------------------------------------------------------------------------------
                      CREATE OR REPLACE TYPE distinct_char_agg_type
                          AS OBJECT
                      (
                          v_result VARCHAR2(4000), -- as with the string concatenation function above, we can only return 4000 characters in SQL
                      
                          MEMBER FUNCTION distinct_chars(p_unique IN VARCHAR2, p_new IN VARCHAR2)
                              RETURN VARCHAR2
                              DETERMINISTIC,
                      
                          STATIC FUNCTION odciaggregateinitialize(p_ctx IN OUT distinct_char_agg_type)
                              RETURN NUMBER,
                      
                          MEMBER FUNCTION odciaggregateiterate(self IN OUT distinct_char_agg_type, p_string IN VARCHAR2)
                              RETURN NUMBER,
                      
                          MEMBER FUNCTION odciaggregatemerge(
                              self     IN OUT distinct_char_agg_type,
                              p_ctx2   IN     distinct_char_agg_type
                          )
                              RETURN NUMBER,
                      
                          MEMBER FUNCTION odciaggregateterminate(
                              self            IN     distinct_char_agg_type,
                              p_returnvalue      OUT VARCHAR2,
                              p_flags         IN     NUMBER
                          )
                              RETURN NUMBER
                      );
                      /
                      
                      CREATE OR REPLACE TYPE BODY distinct_char_agg_type
                      IS
                          -- Given a string of unique characters and a new string, check if the new string
                          -- introduces any new characters that need to be added to the unique set.
                          -- This function assumes the "unique" string has already been validated to be unique
                          -- The usage of this function in the rest of the object's functions ensures this is true.
                          MEMBER FUNCTION distinct_chars(p_unique IN VARCHAR2, p_new IN VARCHAR2)
                              RETURN VARCHAR2
                              DETERMINISTIC
                          IS
                              v_temp  VARCHAR2(32767) := NULL;
                              v_index INTEGER;
                              v_char  CHAR(1);
                          BEGIN
                              v_temp := p_unique;
                              -- Look at each character of the new string
                              -- if it's unique then add it via simple sorted insertion 
                              FOR i IN 1 .. NVL(LENGTH(p_new), 0)
                              LOOP
                                  v_char := SUBSTR(p_new, i, 1);
                      
                                  -- Is the new character already in our unique string?
                                  IF NVL(INSTR(v_temp, v_char), 0) = 0
                                  THEN
                                      v_index := 1;
                      
                                      -- The character is new, iterate through the unique string
                                      -- to determine where to put the new character, this creates a sorted output.
                                      WHILE v_index <= NVL(LENGTH(v_temp), 0)
                                      LOOP
                                          IF v_char < SUBSTR(v_temp, v_index, 1)
                                          THEN
                                              v_temp :=
                                                  SUBSTR(v_temp, 1, v_index - 1) || v_char || SUBSTR(v_temp, v_index);
                                              EXIT;
                                          END IF;
                      
                                          v_index := v_index + 1;
                                      END LOOP;
                      
                                      IF v_index > NVL(LENGTH(v_temp), 0)
                                      THEN
                                          v_temp := v_temp || v_char;
                                      END IF;
                                  END IF;
                              END LOOP;
                      
                              RETURN v_temp;
                          END distinct_chars;
                      
                          STATIC FUNCTION odciaggregateinitialize(p_ctx IN OUT distinct_char_agg_type)
                              RETURN NUMBER
                          IS
                          BEGIN
                              p_ctx := distinct_char_agg_type(NULL);  -- Initialize our unique set to be NULL
                              RETURN odciconst.success;
                          END odciaggregateinitialize;
                      
                          MEMBER FUNCTION odciaggregateiterate(self IN OUT distinct_char_agg_type, p_string IN VARCHAR2)
                              RETURN NUMBER
                          IS
                          BEGIN
                              -- Use or new function to add distinct characters of each new string
                              self.v_result := distinct_chars(self.v_result, p_string); 
                              RETURN odciconst.success;
                          END odciaggregateiterate;
                      
                          MEMBER FUNCTION odciaggregatemerge(
                              self     IN OUT distinct_char_agg_type,
                              p_ctx2   IN     distinct_char_agg_type
                          )
                              RETURN NUMBER
                          IS
                          BEGIN
                              -- As with iterate, simply add new characters from the other set.
                              self.v_result := distinct_chars(self.v_result, p_ctx2.v_result);  
                              RETURN odciconst.success;
                          END odciaggregatemerge;
                      
                          MEMBER FUNCTION odciaggregateterminate(
                              self            IN     distinct_char_agg_type,
                              p_returnvalue      OUT VARCHAR2,
                              p_flags         IN     NUMBER
                          )
                              RETURN NUMBER
                          IS
                          BEGIN
                              p_returnvalue := self.v_result;  -- Iterate and Merge do all the work, return the result
                              RETURN odciconst.success;
                          END odciaggregateterminate;
                      END;
                      /

Open in new window

Create a function to use our object

CREATE OR REPLACE FUNCTION distinctagg(p_input VARCHAR2)
                          RETURN VARCHAR2
                          PARALLEL_ENABLE
                          AGGREGATE USING distinct_char_agg_type;

Open in new window

Example usage:

select distinctagg(table_name) from user_tables;

Open in new window

Aggregating Dates to find common days

The next aggregate is another example of event sampling. Given a set of dates, what day of the week is the most common? This function could be useful for determining sales promotions within a week based on customer activity, or scheduling audits based on system outages. The other reason I've included this aggregate is to show one of the limitations of the Oracle object model and a possible workaround. Conceptually, this function is just a glorified COUNT aggregate, except, instead of incrementing a single counter, it increments 7 of them, one counter for each day of the week. The simplest method to implement this would be an array indexed by the day of the week, either a numeric index 0-6 or a string index based on the names of the days of the week. However, it's not legal to declare our own collection types within an object. Nor can we declare a variable based on the collection types defined within the built-in package DBMS_SQL. Instead, we have to declare one variable for each counter. For days of the week, this isn't too bad. We only need 7 of them. If we were counting hundreds of items we might want to build a package to hold and manipulate the array. Also of note is tie-breaking. An aggregate can only produce one value for a set. But, if you have more than one day of the week with the same count, which do you return? There is no universal right answer, let your business requirements determine tie-breaker rules. For my function, the weeks start on Sunday and in the event of a tie, the earliest day of the week will be returned.

-----------------------------------------------------------------------------------------------------------------------
                      -----------------------------------------------------------------------------------------------------------------------
                      ----    Most common day of the week aggregate for dates
                      -----------------------------------------------------------------------------------------------------------------------
                      -----------------------------------------------------------------------------------------------------------------------
                      
                      CREATE OR REPLACE TYPE common_dow_agg_type
                          AS OBJECT
                      (
                          v_sun_count INTEGER,
                          v_mon_count INTEGER,
                          v_tue_count INTEGER,
                          v_wed_count INTEGER,
                          v_thu_count INTEGER,
                          v_fri_count INTEGER,
                          v_sat_count INTEGER,
                      
                          STATIC FUNCTION odciaggregateinitialize(ctx IN OUT common_dow_agg_type)
                              RETURN NUMBER,
                      
                          MEMBER FUNCTION odciaggregateiterate(self IN OUT common_dow_agg_type, p_date IN DATE)
                              RETURN NUMBER,
                      
                          MEMBER FUNCTION odciaggregatemerge(
                              self   IN OUT common_dow_agg_type,
                              ctx2   IN     common_dow_agg_type
                          )
                              RETURN NUMBER,
                      
                          MEMBER FUNCTION odciaggregateterminate(
                              self          IN     common_dow_agg_type,
                              returnvalue      OUT VARCHAR2,
                              flags         IN     NUMBER
                          )
                              RETURN NUMBER
                      );
                      /
                      
                      CREATE OR REPLACE TYPE BODY common_dow_agg_type
                      IS
                          STATIC FUNCTION odciaggregateinitialize(ctx IN OUT common_dow_agg_type)
                              RETURN NUMBER
                          IS
                          BEGIN
                              -- initialize all of the counters to zero
                              ctx :=
                                  common_dow_agg_type(
                                      0,
                                      0,
                                      0,
                                      0,
                                      0,
                                      0,
                                      0
                                  );
                              RETURN odciconst.success;
                          END odciaggregateinitialize;
                      
                          MEMBER FUNCTION odciaggregateiterate(self IN OUT common_dow_agg_type, p_date IN DATE)
                              RETURN NUMBER
                          IS
                          BEGIN
                              -- increment the appropriate counter for each day of the week
                              CASE TO_CHAR(p_date, 'Dy')
                                  WHEN 'Sun'
                                  THEN
                                      self.v_sun_count := self.v_sun_count + 1;
                                  WHEN 'Mon'
                                  THEN
                                      self.v_mon_count := self.v_mon_count + 1;
                                  WHEN 'Tue'
                                  THEN
                                      self.v_tue_count := self.v_tue_count + 1;
                                  WHEN 'Wed'
                                  THEN
                                      self.v_wed_count := self.v_wed_count + 1;
                                  WHEN 'Thu'
                                  THEN
                                      self.v_thu_count := self.v_thu_count + 1;
                                  WHEN 'Fri'
                                  THEN
                                      self.v_fri_count := self.v_fri_count + 1;
                                  WHEN 'Sat'
                                  THEN
                                      self.v_sat_count := self.v_sat_count + 1;
                              END CASE;
                      
                              RETURN odciconst.success;
                          END odciaggregateiterate;
                      
                          MEMBER FUNCTION odciaggregatemerge(
                              self   IN OUT common_dow_agg_type,
                              ctx2   IN     common_dow_agg_type
                          )
                              RETURN NUMBER
                          IS
                          BEGIN
                              -- sum the counts from both sets
                              self.v_sun_count := self.v_sun_count + ctx2.v_sun_count;
                              self.v_mon_count := self.v_mon_count + ctx2.v_mon_count;
                              self.v_tue_count := self.v_tue_count + ctx2.v_tue_count;
                              self.v_wed_count := self.v_wed_count + ctx2.v_wed_count;
                              self.v_thu_count := self.v_thu_count + ctx2.v_thu_count;
                              self.v_fri_count := self.v_fri_count + ctx2.v_fri_count;
                              self.v_sat_count := self.v_sat_count + ctx2.v_sat_count;
                              RETURN odciconst.success;
                          END odciaggregatemerge;
                      
                          MEMBER FUNCTION odciaggregateterminate(
                              self          IN     common_dow_agg_type,
                              returnvalue      OUT VARCHAR2,
                              flags         IN     NUMBER
                          )
                              RETURN NUMBER
                          IS
                              v_max INTEGER
                                        := GREATEST(
                                               self.v_sun_count,
                                               self.v_mon_count,
                                               self.v_tue_count,
                                               self.v_wed_count,
                                               self.v_thu_count,
                                               self.v_fri_count,
                                               self.v_sat_count
                                           );
                          BEGIN
                              -- Return whichever day has the highest count
                              -- If there is a tie, then earliest in the week will be returned
                              returnvalue :=
                                  CASE
                                      WHEN self.v_sun_count = v_max THEN 'Sunday'
                                      WHEN self.v_mon_count = v_max THEN 'Monday'
                                      WHEN self.v_tue_count = v_max THEN 'Tuesday'
                                      WHEN self.v_wed_count = v_max THEN 'Wednesday'
                                      WHEN self.v_thu_count = v_max THEN 'Thursday'
                                      WHEN self.v_fri_count = v_max THEN 'Friday'
                                      ELSE 'Saturday'
                                  END;
                              RETURN odciconst.success;
                          END odciaggregateterminate;
                      END;
                      /

Open in new window

Create a function to use the object

CREATE OR REPLACE FUNCTION day_of_weekagg(p_date DATE)
                          RETURN VARCHAR2
                          PARALLEL_ENABLE
                          AGGREGATE USING common_dow_agg_type;

Open in new window

Example of usage:

SELECT day_of_weekagg(last_analyzed) FROM all_tables;

Open in new window

Selecting a random value from a set with an aggregate

The next two aggregates are also real requests I've received, either in EE or in my real work. Each of them returns a random value from within a set. The first is a simple version but statistically skewed. Basically it just flips a coin using the DBMS_RANDOM.RANDOM function to determine whether to keep a value or use the next one in the iteration. This skews the results in that the first value iterated will become less likely to be used and the last value will become most likely. The second is more complicated but ensures the random selection is more uniform across the set. That is, every element in the set has equal chance of being selected.

Note, by default DBMS_RANDOM does not have public EXECUTE privilege by default. If you want to use these aggregates, you'll either need to get privilege to execute it or use another random number generator.

-----------------------------------------------------------------------------------------------------------------------
                      -----------------------------------------------------------------------------------------------------------------------
                      ----    Return random value from a set (simple version)
                      -----------------------------------------------------------------------------------------------------------------------
                      -----------------------------------------------------------------------------------------------------------------------
                      
                      CREATE OR REPLACE TYPE random_agg_type
                          AS OBJECT
                      (
                          v_string VARCHAR2(4000),
                      
                          STATIC FUNCTION odciaggregateinitialize(ctx IN OUT random_agg_type)
                              RETURN NUMBER,
                      
                          MEMBER FUNCTION odciaggregateiterate(self IN OUT random_agg_type, p_string IN VARCHAR2)
                              RETURN NUMBER,
                      
                          MEMBER FUNCTION odciaggregatemerge(self IN OUT random_agg_type, ctx2 IN random_agg_type)
                              RETURN NUMBER,
                      
                          MEMBER FUNCTION odciaggregateterminate(
                              self          IN     random_agg_type,
                              returnvalue      OUT VARCHAR2,
                              flags         IN     NUMBER
                          )
                              RETURN NUMBER
                      );
                      
                      CREATE OR REPLACE TYPE BODY random_agg_type
                      IS        
                          STATIC FUNCTION odciaggregateinitialize(ctx IN OUT random_agg_type)
                              RETURN NUMBER
                          IS
                          BEGIN
                              ctx := random_agg_type(NULL);  -- initialize our random selection to NULL
                              RETURN odciconst.success;
                          END odciaggregateinitialize;
                      
                          MEMBER FUNCTION odciaggregateiterate(self IN OUT random_agg_type, p_string IN VARCHAR2)
                              RETURN NUMBER
                          IS
                          BEGIN
                              -- If we haven't set a value yet, then use the first value passed in
                              -- if we have a value "flip a coin" to decide if we use the new value
                              -- or keep the old one.   
                              IF v_string IS NULL
                              THEN
                                  v_string := p_string;
                              ELSIF DBMS_RANDOM.random < 0
                              THEN
                                  v_string := p_string;           
                              END IF;
                      
                              RETURN odciconst.success;
                          END odciaggregateiterate;
                      
                          MEMBER FUNCTION odciaggregatemerge(self IN OUT random_agg_type, ctx2 IN random_agg_type)
                              RETURN NUMBER
                          IS
                          BEGIN       
                              -- "flip a coin" to decide which value to keep
                              IF  DBMS_RANDOM.random < 0
                              THEN
                                  v_string := ctx2.v_string;
                              END IF;
                      
                              RETURN odciconst.success;
                          END odciaggregatemerge;
                          
                          MEMBER FUNCTION odciaggregateterminate(
                              self       IN     random_agg_type,
                              returnvalue     OUT VARCHAR2,
                              flags    IN     NUMBER
                          )
                              RETURN NUMBER
                          IS
                          BEGIN
                              returnvalue := v_string;
                              RETURN odciconst.success;
                          END odciaggregateterminate;
                      END;
                      /

Open in new window

Create a function to use the object

CREATE OR REPLACE FUNCTION random_value_agg(p_string VARCHAR2)
                          RETURN VARCHAR2
                          PARALLEL_ENABLE
                          AGGREGATE USING random_agg_type;

Open in new window

Example usage:

SELECT   department_id, random_value_agg(salary)
                          FROM hr.employees
                      GROUP BY department_id;

Open in new window

In the next one we still want to select a random value from our set but we want each value to be equally weighted. The simplest method would be to just put the values in a list and select randomly 1-N and take whichever value it is. Unfortunately, we won't know how many values there are until we are done and we'd have to store all of the values as we go. Instead, we'll adjust the odds of each subsequent value in the iteration down by the total number of elements seen. So, the first value will have a 100% chance of being used. The second value will have a 1/2 chance of winning over the currently selected value. The third value will have a 1/3 chance of being selected. The fourth value will have a 1/4 chance and so on.

-----------------------------------------------------------------------------------------------------------------------
                      -----------------------------------------------------------------------------------------------------------------------
                      ----    Return random value from a set (less skew version)
                      -----------------------------------------------------------------------------------------------------------------------
                      -----------------------------------------------------------------------------------------------------------------------
                      
                      CREATE OR REPLACE TYPE random_agg_type
                          AS OBJECT
                      (
                          v_string VARCHAR2(4000),  -- our current random selection
                          v_count NUMBER,  -- how many values have we iterated through
                      
                          MEMBER FUNCTION random_0n(p_n IN INTEGER)
                              RETURN INTEGER,
                      
                          STATIC FUNCTION odciaggregateinitialize(ctx IN OUT random_agg_type)
                              RETURN NUMBER,
                      
                          MEMBER FUNCTION odciaggregateiterate(self IN OUT random_agg_type, p_string IN VARCHAR2)
                              RETURN NUMBER,
                      
                          MEMBER FUNCTION odciaggregatemerge(self IN OUT random_agg_type, ctx2 IN random_agg_type)
                              RETURN NUMBER,
                      
                          MEMBER FUNCTION odciaggregateterminate(
                              self          IN     random_agg_type,
                              returnvalue      OUT VARCHAR2,
                              flags         IN     NUMBER
                          )
                              RETURN NUMBER
                      );
                      
                      CREATE OR REPLACE TYPE BODY random_agg_type
                      IS
                          -- We only want to select random values 0-N (actually N-1), but no such function exists in DBMS_RANDOM package
                          -- so we'll create one and adjust for modulo bias to ensure even distribution.
                          MEMBER FUNCTION random_0n(p_n IN INTEGER)
                              RETURN INTEGER
                          IS
                              v_biasmax INTEGER;
                              v_result  INTEGER;
                          BEGIN
                              IF p_n <= 0 OR p_n > 2 ** 32 - 1
                              THEN
                                  RAISE VALUE_ERROR;
                              END IF;
                      
                              -- To prevent modulo bias limit the random range
                              -- to the largest number evenly divisible by N
                              v_biasmax := (2 ** 32 - 1) - MOD(2 ** 32 - 1, p_n);
                      
                              -- If random number is greater than the bias limit
                              -- then pick a new number.
                              -- This "could" theoretically run forever but in practice it'll terminate
                              -- within the first or "maybe" second cycle.
                              -- DBMS_RANDOM.RANDOM is obsolete, but included here simply
                              -- for purposes of example and is compatible with all versions from 9i forward.
                              -- Due to range limitation of dbms_random.random the result will only
                              -- be from the first 4,294,967,296 values within the aggregation group.
                              -- all others will be ignored.
                              LOOP
                                  v_result := DBMS_RANDOM.random + (2 ** 31);
                                  EXIT WHEN v_result <= v_biasmax;
                              END LOOP;
                      
                              RETURN MOD(v_result, p_n);
                          END random_0n;
                      
                          STATIC FUNCTION odciaggregateinitialize(ctx IN OUT random_agg_type)
                              RETURN NUMBER
                          IS
                          BEGIN
                              ctx := random_agg_type(NULL, 0);
                              RETURN odciconst.success;
                          END odciaggregateinitialize;
                      
                          MEMBER FUNCTION odciaggregateiterate(self IN OUT random_agg_type, p_string IN VARCHAR2)
                              RETURN NUMBER
                          IS
                          BEGIN
                              -- Adjust the count for the new element
                              v_count := v_count + 1;
                      
                              -- Generate a random number 0 to N-1, if it's 0, 
                              -- then the new value becomes the current selection
                              IF random_0n(v_count) = 0
                              THEN
                                  v_string := p_string;
                              END IF;
                      
                              RETURN odciconst.success;
                          END odciaggregateiterate;
                      
                          MEMBER FUNCTION odciaggregatemerge(self IN OUT random_agg_type, ctx2 IN random_agg_type)
                              RETURN NUMBER
                          IS
                          BEGIN
                              -- Weight the selection of random value from the contexts by the relative sizes
                              -- For example:
                              --   self.v_count = 10
                              --   p_ctx2.v_count = 7
                              --  If random number from 0-16  < 7 then select the p_ctx2 value
                              --  otherwise keep the value currently in use
                              --  Note, this does have a 2^32-1 limit before there is an error
                              --  due to inability to prevent modulo bias
                              IF random_0n(v_count + ctx2.v_count) < ctx2.v_count
                              THEN
                                  v_string := ctx2.v_string;
                              END IF;
                      
                              v_count := v_count + ctx2.v_count;
                              RETURN odciconst.success;
                          END odciaggregatemerge;
                          
                          MEMBER FUNCTION odciaggregateterminate(
                              self       IN     random_agg_type,
                              returnvalue     OUT VARCHAR2,
                              flags    IN     NUMBER
                          )
                              RETURN NUMBER
                          IS
                          BEGIN
                              returnvalue := v_string;
                              RETURN odciconst.success;
                          END odciaggregateterminate;
                      END;
                      /

Open in new window

Create a function to use the object

CREATE OR REPLACE FUNCTION random_value_agg(p_string VARCHAR2)
                          RETURN VARCHAR2
                          PARALLEL_ENABLE
                          AGGREGATE USING random_agg_type;

Open in new window

Example usage:

SELECT   department_id, random_value_agg(salary)
                          FROM hr.employees
                      GROUP BY department_id;

Open in new window

Aggregating a correlated value

The last function I'll show can be used to solve another problem I see frequently on EE. That is, given a set of values, when does the maximum occur? So, it's sort of like MAX, except we don't really care what the max value is, instead, we want to know "when" that value happened. If you do want to know the actual value, you can simply query MAX along with this aggregate.

User-defined aggregates can only accept a single parameter. We'll get around this limitation by creating our own object type that has both value and date fields and declare our parameter to be of that type.

CREATE OR REPLACE TYPE number_date_type AS OBJECT
                      (
                          value NUMBER,
                          correlated_time DATE
                      );
                      /

Open in new window

Now that we have our type to hold value/time pairs, we can build our aggregate. If the maximum occurs more than one time, we'll return the date of the last occurrence.

-----------------------------------------------------------------------------------------------------------------------
                      -----------------------------------------------------------------------------------------------------------------------
                      ----    Correlated Time for Maximum Number Aggregate
                      -----------------------------------------------------------------------------------------------------------------------
                      -----------------------------------------------------------------------------------------------------------------------
                      
                      CREATE OR REPLACE TYPE cortime_max_num_agg_type
                          AS OBJECT
                      (
                          v_temp NUMBER,  -- What is the max?
                          v_time DATE,   -- When did it happen?
                      
                          STATIC FUNCTION odciaggregateinitialize(ctx IN OUT cortime_max_num_agg_type)
                              RETURN NUMBER,
                      
                          MEMBER FUNCTION odciaggregateiterate(self IN OUT cortime_max_num_agg_type, p_pair IN number_date_type)
                              RETURN NUMBER,
                      
                          MEMBER FUNCTION odciaggregatemerge(self IN OUT cortime_max_num_agg_type, ctx2 IN cortime_max_num_agg_type)
                              RETURN NUMBER,
                      
                          MEMBER FUNCTION odciaggregateterminate(self          IN     cortime_max_num_agg_type,
                                                                 returnvalue      OUT DATE,
                                                                 flags         IN     NUMBER
                                                                )
                              RETURN NUMBER
                      );
                      /
                      
                      CREATE OR REPLACE TYPE BODY cortime_max_num_agg_type
                      IS
                          STATIC FUNCTION odciaggregateinitialize(ctx IN OUT cortime_max_num_agg_type)
                              RETURN NUMBER
                          IS
                          BEGIN
                              ctx := cortime_max_num_agg_type(NULL, NULL);  -- initialize max and time to NULL
                              RETURN odciconst.success;
                          END odciaggregateinitialize;
                      
                          MEMBER FUNCTION odciaggregateiterate(
                              self     IN OUT cortime_max_num_agg_type,
                              p_pair   IN     number_date_type
                          )
                              RETURN NUMBER
                          IS
                          BEGIN
                              -- The first pass through the value and correlated time are automatically
                              -- the max.  After that, check if the new value is bigger than the old
                              -- if so, use the new pair.  If the old and new are the same value, then
                              -- keep the time that is latest.
                              IF v_temp IS NULL
                              THEN
                                  v_temp := p_pair.VALUE;
                                  v_time := p_pair.correlated_time;
                              ELSIF v_temp < p_pair.VALUE
                              THEN
                                  v_temp := p_pair.VALUE;
                                  v_time := p_pair.correlated_time;
                              ELSIF v_temp = p_pair.VALUE
                              THEN
                                  v_time := GREATEST(v_time, p_pair.correlated_time);
                              END IF;
                      
                              RETURN odciconst.success;
                          END odciaggregateiterate;
                      
                          MEMBER FUNCTION odciaggregatemerge(
                              self   IN OUT cortime_max_num_agg_type,
                              ctx2   IN     cortime_max_num_agg_type
                          )
                              RETURN NUMBER
                          IS
                          BEGIN
                              -- If the current value is null, use the other one
                              -- If the new set is greater than our current one, use the new one
                              -- If the new and current are the same, then take the latest time
                              IF v_temp IS NULL
                              THEN
                                  v_temp := ctx2.v_temp;
                                  v_time := ctx2.v_time;
                              ELSIF ctx2.v_temp > v_temp
                              THEN
                                  v_temp := ctx2.v_temp;
                                  v_time := ctx2.v_time;
                              ELSIF ctx2.v_temp = v_temp
                              THEN
                                  v_time := GREATEST(v_time, ctx2.v_time);
                              END IF;
                      
                              RETURN odciconst.success;
                          END odciaggregatemerge;
                      
                          MEMBER FUNCTION odciaggregateterminate(
                              self          IN     cortime_max_num_agg_type,
                              returnvalue      OUT DATE,
                              flags         IN     NUMBER
                          )
                              RETURN NUMBER
                          IS
                          BEGIN
                              returnvalue := self.v_time;
                              RETURN odciconst.success;
                          END odciaggregateterminate;
                      END;
                      /

Open in new window

Create a function to use our object

CREATE OR REPLACE FUNCTION correlated_timeagg(p_pair number_date_type)
                          RETURN DATE
                          PARALLEL_ENABLE
                          AGGREGATE USING cortime_max_num_agg_type;

Open in new window

Example usage (find the biggest row count and when that count was established)

SELECT max(num_rows), correlated_timeagg(number_date_type(num_rows,last_analyzed)) FROM all_tables;

Open in new window

You may note, sometimes I reference the current values of the object by referencing them directly (v_result) and sometimes I dereference them from "self" (self.v_result). This is intentional to show both are legal and you should use whichever method suits you and your corporate coding standards.

You may also note the parameter naming may not appear consistent. This was a style choice of mine. When I look at the odciaggregateiterate API, I see it as one part Oracle's (self) and one part mine (p_string, p_date, p_pair, etc.) The API forces me to declare the member functions a certain way. The parts that I don't really care about I left unprefixed and gave them the names found in the Oracle Data Cartridge Developer's Guide. The part that is "mine" I prefixed with "p_" as my own coding style for parameters. This is largely cosmetic though, just my personal preference.

For more information about the Data Cartridge and Object facilities in Oracle:

Data Cartridge Developer's Guide: http://docs.oracle.com/cd/E11882_01/appdev.112/e10765/toc.htm
Object-Relational Developer's Guide: http://docs.oracle.com/cd/E11882_01/appdev.112/e11822/toc.htm

Thank you for reading and I hope you found it helpful.
Questions and comments welcome.

out.txt

How to Create User-Defined Aggregates in Oracle

Data Cartridge API for Aggregates

Starting simply, a NULL aggregate

Writing our own counting aggregate

Writing a MAX aggregate for numbers

Aggregating Interval types

Concatenating Strings with an aggregate

Multiplying numbers with a product aggregate

Aggregating Dates to detect event skew

Using an aggregate to parse strings

Aggregating Dates to find common days

Selecting a random value from a set with an aggregate

Aggregating a correlated value

Comments (1)