angel7170 (United States of America) asked:

Function to concatenate CLOB

Hello,
I have an Oracle SQL query that joins more than 4 tables to get some data. One of them is a sub-query that concatenates rows into a column (Column 1). The "Column 1" value returned is a CLOB; I am using the XMLAGG function to build it.
The sub-query is currently filtered on VT_TEXT_TYPE. I have other columns (Column 2 to Column 14) that also concatenate VT_TEXT, each with its own VT_TEXT_TYPE filter (logic attached in the spreadsheet).
Is there a way to put these in one function and pass VT_SER_NUM and VT_TEXT_TYPE as parameters to get the concatenated VT_TEXT string as a result?
Since there are about a million rows in the table, performance is really slow even to bring back one column.
Can someone please assist?
Thank you very much
SELECT AM.AM_SER_NUM,
       VT.VT_TEXT AS COLUMN_1
  FROM AM@dbrd5 AM,
       (SELECT og_ser_num FROM OG@dbrd5 GROUP BY og_ser_num) OG,
       (SELECT iu_ser_num FROM IU@dbrd5) IU,
       (SELECT ri_ser_num FROM RI@dbrd5) RI,
       (SELECT vt_ser_num,
               EXTRACT(XMLAGG(XMLELEMENT("s", vt_text) ORDER BY vt_ent_num),
                       '/s/text()').getclobval() VT_TEXT
          FROM VT@dbrd5 VT
         WHERE REGEXP_LIKE(SUBSTR(vt.vt_text_type, 3, 2), '^[[:digit:]]+$')
           AND vt_text_type LIKE 'AT%'
         GROUP BY vt_ser_num) VT
 WHERE AM_SER_NUM = OG_SER_NUM (+)
   AND AM_SER_NUM = RI_SER_NUM (+)
   AND AM_SER_NUM = IU_SER_NUM (+)
   AND AM_SER_NUM = VT_SER_NUM (+)



Sub-query:

SELECT vt_ser_num,
       EXTRACT(XMLAGG(XMLELEMENT("s", vt_text) ORDER BY vt_ent_num),
               '/s/text()').getclobval() VT_TEXT
  FROM VT@dbrd5 VT
 WHERE REGEXP_LIKE(SUBSTR(vt.vt_text_type, 3, 2), '^[[:digit:]]+$')
   AND vt_text_type LIKE 'AT%'
 GROUP BY vt_ser_num


Logic.xlsx
slightwv (䄆 Netminder):

Just like the last question:
Please post sample data and expected results.

It might be better to create your own function.

Then you can use a simple cursor loop and dbms_lob.writeappend.

It may be faster than the XML trick from the last question.

Docs for dbms_lob.writeappend:
http://docs.oracle.com/cd/E11882_01/appdev.112/e40758/d_lob.htm#ARPLS600
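A minimal sketch of that approach, using the table and column names from the question (the function name concat_vt_text is made up here):

```sql
-- Sketch only: cursor loop appending each row's text with DBMS_LOB.WRITEAPPEND.
-- Table/column names come from the question; the function name is hypothetical.
CREATE OR REPLACE FUNCTION concat_vt_text(p_ser_num IN NUMBER,
                                          p_type    IN VARCHAR2)
    RETURN CLOB
IS
    v_result CLOB;
BEGIN
    DBMS_LOB.CREATETEMPORARY(v_result, TRUE);

    FOR r IN (SELECT vt_text
                FROM vt@dbrd5
               WHERE vt_ser_num = p_ser_num
                 AND vt_text_type LIKE p_type || '%'
               ORDER BY vt_ent_num)
    LOOP
        IF r.vt_text IS NOT NULL THEN
            DBMS_LOB.WRITEAPPEND(v_result, LENGTH(r.vt_text), r.vt_text);
        END IF;
    END LOOP;

    RETURN v_result;
END;
/
```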

Have you checked where the performance issue is?

If you are trying to pull 1 million rows across the link, it may just be the network.

You might need the function in the remote database.
I agree with Steve; your issue may be due to data transmission over the network.

Could you run the SQL on the host database instead?

If that is not possible because the host is a production database, you could replicate the same set of data to your own database, run the same query there, and see whether the results improve.
angel7170 (ASKER):
Thank you slightwv and Wasim.

Here are the record counts from each of the remote database tables.

VT = 95,511,263
AM = 10,708,390
OG = 104,980
RI = 150,158
IU = 2,150,696

As you said, it could be the network. I don't have the ability to create a function in the remote database, and I am not sure what the performance would be if I created a table in the host database with this many rows.

To answer Slightwv question: "Please post sample data and expected results".
Attached is the sample data and expected results in two different sheets.
Sample.xls
Are you looking to dynamically create columns based on the first two characters of VT_TEXT_TYPE?

If so, you can sort of stop now.  You need to know the maximum number of columns when the SQL is parsed.  You really can't dynamically create columns in a result set.

With that much data I don't think you can get around performance unless you bring the data local.

Does the data have to be 'real time'?
If not, I would look at bringing it ALL local, as suggested above, using Materialized Views:
http://docs.oracle.com/cd/B28359_01/server.111/b28324/tdpii_reppit.htm
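A sketch of what that could look like for the VT table from the question; the materialized view name and the nightly refresh schedule are placeholders:

```sql
-- Sketch only: local copy of the remote VT table, refreshed nightly around 1 AM.
-- The dblink/table/column names come from the thread; the MV name and
-- refresh schedule are placeholders to adjust for your environment.
CREATE MATERIALIZED VIEW vt_local
    BUILD IMMEDIATE
    REFRESH COMPLETE
    START WITH TRUNC(SYSDATE) + 1 + 1/24
    NEXT  TRUNC(SYSDATE) + 1 + 1/24
AS
SELECT vt_ser_num, vt_text_type, vt_ent_num, vt_text
  FROM vt@dbrd5;
```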
They are not dynamically created columns; the set of columns is fixed.

If I can create a local table before running the scripts then I should be good. It doesn't have to be real real time...
>>It is not dynamically created columns.  It is standard.

OK,
I'll work on a test case for what you posted.  I'll post it when I get it done.

>>If I can create a local table before running the scripts then I should be good. It doesn't have to be real real time...

Read up on materialized views in the link I posted.  This will allow the local version to refresh during slow hours and you will have local copies of the tables that are only as old as the last refresh.
great! Thank you.
SOLUTION from slightwv (䄆 Netminder):
Another one very similar to the function version.

The difference here is it uses the CLOBAGG function that is found all over the web.

My original function did a simple string concatenation.  CLOBAGG should be a little faster.

You'll just need to test them and see which one works best.

create or replace type clobagg_type as object(
  text clob,
  static function ODCIAggregateInitialize(sctx in out clobagg_type
                                         ) return number,
  member function ODCIAggregateIterate(self in out clobagg_type,
                                       value in clob
                                      ) return number,
  member function ODCIAggregateTerminate(self in clobagg_type,
                                         returnvalue out clob,
                                         flags in number
                                        ) return number,
  member function ODCIAggregateMerge(self in out clobagg_type,
                                     ctx2 in clobagg_type
                                    ) return number
);
/

show errors 

create or replace type body clobagg_type is
  static function ODCIAggregateInitialize(sctx in out clobagg_type
                                         ) return number is
  begin
    sctx := clobagg_type(null);
    return ODCIConst.Success;
  end;
  member function ODCIAggregateIterate(self  in out clobagg_type,
                                       value in     clob
                                      ) return number is
  begin
    self.text := self.text || value;
    return ODCIConst.Success;
  end;
  member function ODCIAggregateTerminate(self in clobagg_type,
                                         returnvalue out clob,
                                         flags in number
                                        ) return number is
  begin
    returnValue := self.text;
    return ODCIConst.Success;
  end;
  member function ODCIAggregateMerge(self in out clobagg_type,
                                     ctx2 in clobagg_type
                                    )return number is
  begin
    self.text := self.text || ctx2.text;
    return ODCIConst.Success;
  end;
end;
/

show errors 

create or replace function clobagg(input clob) return clob
  deterministic
  parallel_enable
  aggregate using clobagg_type;
/

show errors 

create or replace function myfunc2(p_ser_num in number, p_type in varchar2) return clob is
    retClob clob;
begin
    select clobagg(vt_text)
      into retClob
      from (select vt_text
              from tab1
             where vt_ser_num = p_ser_num
               and substr(vt_text_type, 1, 2) = p_type
             order by vt_ent_num);
    return retClob;
end;
/

show errors

select vt_ser_num,
	myFunc2(vt_ser_num,'AT') COLUMN_1,
	myFunc2(vt_ser_num,'CS') COLUMN_2,
	myFunc2(vt_ser_num,'DM') COLUMN_3,
	myFunc2(vt_ser_num,'DR') COLUMN_4
from
(
	select distinct vt_ser_num from tab1
)
/


Similar idea to what slightwv posted, except I changed the aggregate to accept VARCHAR2 inputs but return a CLOB.

Then I use a standard pivot technique to generate the output:


CREATE OR REPLACE TYPE vcagg_to_clob_type
    AS OBJECT
(
    v_result CLOB,
    STATIC FUNCTION odciaggregateinitialize(sctx IN OUT vcagg_to_clob_type)
        RETURN NUMBER,
    MEMBER FUNCTION odciaggregateiterate(self IN OUT vcagg_to_clob_type, p_string IN VARCHAR2)
        RETURN NUMBER,
    MEMBER FUNCTION odciaggregateterminate(
        self          IN     vcagg_to_clob_type,
        returnvalue      OUT CLOB,
        flags         IN     NUMBER
    )
        RETURN NUMBER,
    MEMBER FUNCTION odciaggregatemerge(self IN OUT vcagg_to_clob_type, ctx2 IN vcagg_to_clob_type)
        RETURN NUMBER
);
/

SHOW ERRORS

CREATE OR REPLACE TYPE BODY vcagg_to_clob_type
IS
    STATIC FUNCTION odciaggregateinitialize(sctx IN OUT vcagg_to_clob_type)
        RETURN NUMBER
    IS
    BEGIN
        sctx := vcagg_to_clob_type(NULL);
        RETURN odciconst.success;
    END;

    MEMBER FUNCTION odciaggregateiterate(self IN OUT vcagg_to_clob_type, p_string IN VARCHAR2)
        RETURN NUMBER
    IS
    BEGIN
        self.v_result := self.v_result || p_string;
        RETURN odciconst.success;
    END;

    MEMBER FUNCTION odciaggregateterminate(
        self          IN     vcagg_to_clob_type,
        returnvalue      OUT CLOB,
        flags         IN     NUMBER
    )
        RETURN NUMBER
    IS
    BEGIN
        returnvalue := self.v_result;
        RETURN odciconst.success;
    END;

    MEMBER FUNCTION odciaggregatemerge(self IN OUT vcagg_to_clob_type, ctx2 IN vcagg_to_clob_type)
        RETURN NUMBER
    IS
    BEGIN
        self.v_result := self.v_result || ctx2.v_result;
        RETURN odciconst.success;
    END;
END;
/

SHOW ERRORS

CREATE OR REPLACE FUNCTION clobagg(p_string VARCHAR2)
    RETURN CLOB
    DETERMINISTIC
    PARALLEL_ENABLE
    AGGREGATE USING vcagg_to_clob_type;
/



  SELECT vt_ser_num,
         clobagg(CASE WHEN vt_text_type LIKE 'AT%' THEN vt_text END) column_1,
         clobagg(CASE WHEN vt_text_type LIKE 'CS%' THEN vt_text END) column_2,
         clobagg(CASE WHEN vt_text_type LIKE 'DM%' THEN vt_text END) column_3,
         clobagg(CASE WHEN vt_text_type LIKE 'DR%' THEN vt_text END) column_4
    FROM tab1
GROUP BY vt_ser_num
/


ASKER CERTIFIED SOLUTION
I don't think myfunc2 in http:#a40286515 is reliable for generating a sorted list.

The aggregation has no requirement to process the subquery results in the order presented.

That's part of the reason analytic functions have an ORDER BY clause: to ensure the function results are ordered regardless of the input data.
Agreed.  I was thinking it was questionable but played with the data changing the  'insert order' and it appeared to work.

Might have just been dumb luck.
If the query runs single-threaded (i.e. non-parallel) then it will probably work most (all?) of the time; but if the query goes parallel, all bets are off.

for example:  subquery returns abcdefg

thread one gets abcd
thread two gets efg

two finishes first, and clob aggregates as:  efgabcd


or, the aggregate sorts on a hash thereby negating the alphabetic sort, regardless of how many threads are involved
Makes sense.  Agreed.
Here's another, more sophisticated aggregate that allows you to pass in an ordering sequence along with the values:


CREATE OR REPLACE TYPE string_order_type AS OBJECT
(
    str VARCHAR2(32767),
    seq INTEGER
);

CREATE OR REPLACE TYPE string_order_tab AS TABLE OF string_order_type;

CREATE OR REPLACE TYPE vcagg_to_clob_sort_type
    AS OBJECT
(
    v_strings string_order_tab,
    STATIC FUNCTION odciaggregateinitialize(sctx IN OUT vcagg_to_clob_sort_type)
        RETURN NUMBER,
    MEMBER FUNCTION odciaggregateiterate(
        self    IN OUT vcagg_to_clob_sort_type,
        p_obj   IN     string_order_type
    )
        RETURN NUMBER,
    MEMBER FUNCTION odciaggregateterminate(
        self          IN     vcagg_to_clob_sort_type,
        returnvalue      OUT CLOB,
        flags         IN     NUMBER
    )
        RETURN NUMBER,
    MEMBER FUNCTION odciaggregatemerge(
        self   IN OUT vcagg_to_clob_sort_type,
        ctx2   IN     vcagg_to_clob_sort_type
    )
        RETURN NUMBER
);
/

CREATE OR REPLACE TYPE BODY vcagg_to_clob_sort_type
IS
    STATIC FUNCTION odciaggregateinitialize(sctx IN OUT vcagg_to_clob_sort_type)
        RETURN NUMBER
    IS
    BEGIN
        sctx := vcagg_to_clob_sort_type(string_order_tab());
        RETURN odciconst.success;
    END;

    MEMBER FUNCTION odciaggregateiterate(
        self    IN OUT vcagg_to_clob_sort_type,
        p_obj   IN     string_order_type
    )
        RETURN NUMBER
    IS
    BEGIN
        self.v_strings.EXTEND;
        self.v_strings(self.v_strings.COUNT) := p_obj;
        RETURN odciconst.success;
    END;

    MEMBER FUNCTION odciaggregateterminate(
        self          IN     vcagg_to_clob_sort_type,
        returnvalue      OUT CLOB,
        flags         IN     NUMBER
    )
        RETURN NUMBER
    IS
        v_result CLOB := '';
    BEGIN
        FOR x IN (SELECT str
                    FROM TABLE(v_strings)
                  ORDER BY seq,str)
        LOOP
            v_result := v_result || x.str;
        END LOOP;

        returnvalue := v_result;
        RETURN odciconst.success;
    END;

    MEMBER FUNCTION odciaggregatemerge(
        self   IN OUT vcagg_to_clob_sort_type,
        ctx2   IN     vcagg_to_clob_sort_type
    )
        RETURN NUMBER
    IS
    BEGIN
        FOR i IN 1 .. ctx2.v_strings.COUNT
        LOOP
            self.v_strings.EXTEND;
            self.v_strings(self.v_strings.COUNT).str := ctx2.v_strings(i).str;
            self.v_strings(self.v_strings.COUNT).seq := ctx2.v_strings(i).seq;
        END LOOP;

        RETURN odciconst.success;
    END;
END;
/

-- The aggregate function itself is needed by the query below (it was not shown above):
CREATE OR REPLACE FUNCTION clobsortagg(p_obj string_order_type)
    RETURN CLOB
    PARALLEL_ENABLE
    AGGREGATE USING vcagg_to_clob_sort_type;
/



Then using it is similar to what I posted before, except you include the ordering criteria:

SELECT vt_ser_num,
       clobsortagg(
           CASE WHEN vt_text_type LIKE 'AT%' THEN string_order_type(vt_text, vt_ent_num) END
       )
           column_1,
       clobsortagg(
           CASE WHEN vt_text_type LIKE 'CS%' THEN string_order_type(vt_text, vt_ent_num) END
       )
           column_2,
       clobsortagg(
           CASE WHEN vt_text_type LIKE 'DM%' THEN string_order_type(vt_text, vt_ent_num) END
       )
           column_3,
       clobsortagg(
           CASE WHEN vt_text_type LIKE 'DR%' THEN string_order_type(vt_text, vt_ent_num) END
       )
           column_4
  FROM tab1
GROUP BY vt_ser_num


Thank you both, slightwv and sdstuber. I am trying to run both of your code samples but no luck getting any results back. The query has now been running for more than 16 hours... I even brought the remote database tables over locally and it is still slow. I hope this will work today.

Again, thanks to both of you. Without your help I wouldn't even be this far.
what does your query look like with the functions added in?
I am running the query that uses the CLOBAGG function. I have 14 similar columns, but I only added the 4 columns below to run first and see how the results look. No luck :(


SELECT vt_ser_num,
       column_1,
       column_2,
       column_3,
       column_4
  FROM (SELECT vt_ser_num,
               clobagg(
                   CASE WHEN vt_text_type LIKE 'AT%' THEN vt_text END
               )
               OVER(
                   PARTITION BY vt_ser_num
                   ORDER BY vt_ent_num
                   ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING
               )
                   column_1,
               clobagg(
                   CASE WHEN vt_text_type LIKE 'CS%' THEN vt_text END
               )
               OVER(
                   PARTITION BY vt_ser_num
                   ORDER BY vt_ent_num
                   ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING
               )
                   column_2,
               clobagg(
                   CASE WHEN vt_text_type LIKE 'DM%' THEN vt_text END
               )
               OVER(
                   PARTITION BY vt_ser_num
                   ORDER BY vt_ent_num
                   ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING
               )
                   column_3,
               clobagg(
                   CASE WHEN vt_text_type LIKE 'DR%' THEN vt_text END
               )
               OVER(
                   PARTITION BY vt_ser_num
                   ORDER BY vt_ent_num
                   ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING
               )
                   column_4,
               ROW_NUMBER() OVER(PARTITION BY vt_ser_num ORDER BY ROWNUM) rn
          FROM DQ_VT_TBL)
 WHERE rn = 1


Assuming dq_vt_tbl is the same VT table that listed the counts above:
You are wanting to concatenate all 95 million rows at once?
No, DQ_VT_TBL is limited to only the 14 VT_TEXT_TYPE values (AT%, DR%, etc.) I mentioned. The total count of this table is now 14,637,556. I am calling the function to concatenate the VT_TEXT column for each of these text types.
DQ_VT_TBL is the table I created locally to limit the rows/columns so the query doesn't have to cross the network.
Trying to concatenate 14 million rows is going to take a while.

Do you really need all 14 million at once?

I cannot imagine an application that needs 14 million rows appearing all at once.
I don't want it to concatenate all 14 million rows at once. The logic is: for each serial number, there are about 50-100 rows of each text type (AT%, DR%, DM%, etc.). I want those 50-100 rows of VT_TEXT concatenated per VT_TEXT_TYPE, per serial number (VT_SER_NUM), ordered by a sequence number (VT_ENT_NUM).

I am attaching the logic document again. I hope I am not confusing :)
Logic.xlsx
>>I don't want it to concatenate all 14 million rows at once.

There is no where clause on the inner select:
...
     column_4,
               ROW_NUMBER() OVER(PARTITION BY vt_ser_num ORDER BY ROWNUM) rn
          FROM DQ_VT_TBL)
...

So, you are running that select against ALL rows in DQ_VT_TBL.
OK, I thought the CASE statement looks for the text types that start with those prefixes and then concatenates. Does this code need to be changed?
The CASE ensures that only text of a single type is concatenated into each string.

But you still probably want to apply a filter to pull only the rows you're interested in:

SELECT vt_ser_num,
       column_1,
       column_2,
       column_3,
       column_4
  FROM (SELECT vt_ser_num,
               clobagg(
                   CASE WHEN vt_text_type LIKE 'AT%' THEN vt_text END
               )
               OVER(
                   PARTITION BY vt_ser_num
                   ORDER BY vt_ent_num
                   ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING
               )
                   column_1,
               clobagg(
                   CASE WHEN vt_text_type LIKE 'CS%' THEN vt_text END
               )
               OVER(
                   PARTITION BY vt_ser_num
                   ORDER BY vt_ent_num
                   ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING
               )
                   column_2,
               clobagg(
                   CASE WHEN vt_text_type LIKE 'DM%' THEN vt_text END
               )
               OVER(
                   PARTITION BY vt_ser_num
                   ORDER BY vt_ent_num
                   ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING
               )
                   column_3,
               clobagg(
                   CASE WHEN vt_text_type LIKE 'DR%' THEN vt_text END
               )
               OVER(
                   PARTITION BY vt_ser_num
                   ORDER BY vt_ent_num
                   ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING
               )
                   column_4,
               ROW_NUMBER() OVER(PARTITION BY vt_ser_num ORDER BY ROWNUM) rn
          FROM DQ_VT_TBL
WHERE vt_text_type like 'AT%'
or vt_text_type like 'DR%'
or vt_text_type like 'DM%'
or vt_text_type like 'CS%'
)
 WHERE rn = 1
                                          


The case looks at the first two characters of every row it is provided.

The where clause on a select limits the rows returned.

A case statement cannot reduce the number of rows returned by the select.
WHERE vt_text_type like 'AT%'
or vt_text_type like 'DR%'
or vt_text_type like 'DM%'
or vt_text_type like 'CS%'



It needs to go farther than just that.  When they add in ALL the necessary types, it will still probably pull in ALL 14 million rows.

I would expect the WHERE clause to limit based on vt_ser_num, a list of vt_ser_num's or some date range, etc...
Ok, thank you.

Now I am running it for one column ("Column 1") with the "AT%" filter applied. It has been running for more than 10 minutes... see screenshot.

Total records from the select statement is 4,958,809


SELECT vt_ser_num,
       column_1
  FROM (SELECT vt_ser_num,
               clobagg(
                   CASE WHEN vt_text_type LIKE 'AT%' THEN vt_text END
               )
               OVER(
                   PARTITION BY vt_ser_num
                   ORDER BY vt_ent_num
                   ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING
               )
                   column_1,
                    ROW_NUMBER() OVER(PARTITION BY vt_ser_num ORDER BY ROWNUM) rn
          FROM DQ_VT_TBL
WHERE vt_text_type like 'AT%'
)
 WHERE rn = 1
                 


screenshot.PNG
>>I would expect the WHERE clause to limit based on vt_ser_num, a list of vt_ser_num's or some date range, etc...

There is no date range or any other filter I could apply. It would be much easier if there were. This is a data migration project, and it is complicated to bring data from different data sources into an Oracle database.
If you have 4.9 million rows, then you'll need to call clobagg 4.9 million times, which means doing 4.9 million LOB operations as well as 4.9 million context switches.

Also, what is the execution plan of that query? Is it doing 4.9 million index lookups? Those will take a long time, but so will scanning the full table and filtering down to the 4.9 million rows you're really interested in.
>>This is a data migration project

So there will eventually be some INSERT associated with this?

Then you'll have the issue of HUGE rollback/undo to deal with.

Create a temp table of ALL vt_ser_num's to migrate.

Create a pl/sql block with the logic like this:

loop:
    grab a vt_ser_num from the temp table
    exit when the temp table is empty
    concatenate ALL the rows for that one entry
    put the migrated data where it goes
    if everything checks out, delete that vt_ser_num from the temp table
    commit every 10,000-50,000 rows based on a loop counter (whatever works best for you)
end loop
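A hedged PL/SQL sketch of that loop, reusing the clobagg function and DQ_VT_TBL from earlier in the thread; the queue and target table names (ser_num_queue, migrated_target) are hypothetical, and only one of the 14 columns is shown:

```sql
-- Sketch only: batch migration driven by a queue of serial numbers.
-- ser_num_queue and migrated_target are made-up names; adjust to your schema.
DECLARE
    v_counter PLS_INTEGER := 0;
BEGIN
    FOR r IN (SELECT vt_ser_num FROM ser_num_queue)
    LOOP
        -- concatenate ALL the rows for this one serial number
        INSERT INTO migrated_target (vt_ser_num, column_1)
        SELECT r.vt_ser_num,
               clobagg(CASE WHEN vt_text_type LIKE 'AT%' THEN vt_text END)
          FROM dq_vt_tbl
         WHERE vt_ser_num = r.vt_ser_num;

        -- if everything checks out, remove it from the queue
        DELETE FROM ser_num_queue WHERE vt_ser_num = r.vt_ser_num;

        -- keep rollback/undo small by committing in batches
        v_counter := v_counter + 1;
        IF MOD(v_counter, 10000) = 0 THEN
            COMMIT;
        END IF;
    END LOOP;

    COMMIT;
END;
/
```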