Searching through a CLOB with DBMS_LOB

I want to search NOTES column on ARREST_RECORDS from start of the string (ex: 'Notes Type:  created by:' or 'Source: LA Records' or Source: Calabassas Records, etc) and grab the value from start point till next open brace ‘[‘( bolded).

This is what I have so far, I am using a combo of SUBSTR and INSTR and I have the offset but my issue is determining the amount  until the open brace because it will vary based on column.

SELECT DBMS_LOB.SUBSTR (
          NOTES,
          ??,
            DBMS_LOB.INSTR (NOTES,
                            'Notes Type:  created by:')
          + LENGTH ('Notes Type:  created by:'))
          note_case
  FROM ARREST_RECORDS
WHERE note_type = 'Master Note';


Table:ARREST_RECORDS
Lob column: NOTES

Lob Contents


--------------------------------------------------------------[ Notes Type:  created by: Johnny American on 1/8/2015 17:15:21]--------------------------------------------------------------

What problem are you experiencing?

                  --------------------------------------------------------------[Source: LA Records created by: Kim Kardashian on 9/26/2014 7:0:50]--------------------------------------------------------------

Arrests:

Being vapid in California, on or about January

                  --------------------------------------------------------------[Source: Calabasas Records created by: JOSHUA.MARTIN on 9/26/2014 6:59:34]--------------------------------------------------------------

Riverside County Court
DBAnewbie77Asked:
Who is Participating?

[Product update] Infrastructure Analysis Tool is now available with Business Accounts.Learn More

x
I wear a lot of hats...

"The solutions and answers provided on Experts Exchange have been extremely helpful to me over the last few years. I wear a lot of hats - Developer, Database Administrator, Help Desk, etc., so I know a lot of things but not a lot about one thing. Experts Exchange gives me answers from people who do know a lot about one thing, in a easy to use platform." -Todd S.

slightwv (䄆 Netminder) Commented:
Given that input, what are the expected results?
0
DBAnewbie77Author Commented:
SELECT DBMS_LOB.SUBSTR (
          NOTES,
          ??,
            DBMS_LOB.INSTR (NOTES,
                            'Notes Type:  created by:')
          + LENGTH ('Notes Type:  created by:'))
          note_case
  FROM ARREST_RECORDS
WHERE note_type = 'Master Note';

Should return

 Notes Type:  created by Johnny American on 1/8/2015 17:15:21]--------------------------------------------------------------

What problem are you experiencing?

                  --------------------------------------------------------------
0
slightwv (䄆 Netminder) Commented:
Is this just the first step in a further requirement?

Seems odd you want to remove the opening '[' from the string and return everything until the next '['.
0
Determine the Perfect Price for Your IT Services

Do you wonder if your IT business is truly profitable or if you should raise your prices? Learn how to calculate your overhead burden with our free interactive tool and use it to determine the right price for your IT services. Download your free eBook now!

DBAnewbie77Author Commented:
Yes, it the first step..I have the remaining steps worked out
0
slightwv (䄆 Netminder) Commented:
See if this works for the original question asked.  I've included my table and data used for testing.

/*

drop table tab1 purge;
create table tab1(col1 clob);

insert into tab1 values('--------------------------------------------------------------[ Notes Type:  created by: Johnny American on 1/8/2015 17:15:21]--------------------------------------------------------------' || chr(13)||chr(10) ||
'' || chr(13)||chr(10) ||
'What problem are you experiencing?' || chr(13)||chr(10) ||
'' || chr(13)||chr(10) ||
                   '--------------------------------------------------------------[Source: LA Records created by: Kim Kardashian on 9/26/2014 7:0:50]--------------------------------------------------------------' || chr(13)||chr(10) ||
'' || chr(13)||chr(10) ||
 'Arrests:' || chr(13)||chr(10) ||
'' || chr(13)||chr(10) ||
 'Being vapid in California, on or about January ' || chr(13)||chr(10) ||
'' || chr(13)||chr(10) ||
                   '--------------------------------------------------------------[Source: Calabasas Records created by: JOSHUA.MARTIN on 9/26/2014 6:59:34]--------------------------------------------------------------' || chr(13)||chr(10) ||
'' || chr(13)||chr(10) ||
 'Riverside County Court ' || chr(13)||chr(10)
);
commit;
*/

select substr(col1,
	instr(col1, '[ ')+2,
	instr(col1,'[',1,2)-instr(col1, '[ ')-2
	) mystring
from tab1
/

Open in new window

0
slightwv (䄆 Netminder) Commented:
>>Yes, it the first step..I have the remaining steps worked out

Please tell us what they are.  The final solution might be really simple and what you get now might make it complex.

For example:  My guess is you are after the headers themselves?

If so, using the same table above try this:
with mydata as (
select regexp_substr(col1,'\[.+\]',1,level) mystring
from tab1
connect by level <= length(col1)-length(replace(col1,']'))
)
select rtrim(substr(mystring,instr(mystring,':',1,2)+2),']')
from mydata
/

Open in new window


That returns 3 rows:
Johnny American on 1/8/2015 17:15:21
Kim Kardashian on 9/26/2014 7:0:50
JOSHUA.MARTIN on 9/26/2014 6:59:34
0
DBAnewbie77Author Commented:
Thanks Slightwv that works for the first set of data, but I need it to be reusable for the next set of data starting after the [ so that's why I wasnt using the [ but the beginning text after [.  So ultimately, I need to run three queries against this column. I need the header and all the data after the header until the next header

SELECT DBMS_LOB.SUBSTR (
          NOTES,
          ???,
                            'Notes Type:'),
            DBMS_LOB.INSTR (NOTES,
                            'Notes Type: ')
          + LENGTH ('Notes Type: '))
          note_case
  FROM ARREST_RECORDS
WHERE note_type = 'Master Note';

should return

Notes Type:  created by Johnny American on 1/8/2015 17:15:21]--------------------------------------------------------------

What problem are you experiencing?

                  --------------------------------------------------------------



SELECT DBMS_LOB.SUBSTR (
          NOTES,
          ???,
                            'Source: LA Records:'),
            DBMS_LOB.INSTR (NOTES,
                            'Notes Type: ')
          + LENGTH ('Source: LA Records: '))
          note_case
  FROM ARREST_RECORDS
WHERE note_type = 'Master Note';

should return

Source: LA Records created by: Kim Kardashian on 9/26/2014 7:0:50]--------------------------------------------------------------

Arrests:

Being vapid in California, on or about January

                  -------------------------------------------------------------

and so on and so forth...
0
slightwv (䄆 Netminder) Commented:
What is the final end result?

I doubt it is just the 3 entire blocks of text.

I'll work on getting you that but once you get it and need the 'next' piece of the puzzle, the SQL could completely change.

I would rather skip to the final solution and not keep rewriting SQL from step to step because the previous SQL doesn't work well for the next step.
0
slightwv (䄆 Netminder) Commented:
Given the last set of requirements and my sample table setup above, try this:
select ltrim(regexp_substr(col1,'\[[^\[]+',1,level),'[ ')
from tab1
connect by level <= length(col1)-length(replace(col1,']'))
/

Open in new window

0

Experts Exchange Solution brought to you by

Your issues matter to us.

Facing a tech roadblock? Get the help and guidance you need from experienced professionals who care. Ask your question anytime, anywhere, with no hassle.

Start your 7-day free trial
DBAnewbie77Author Commented:
The end result is to grab each block of data and its pk and insert however many rows it returns into a temp table. in I tried your code against one row in a production table and finally stopped it after 8 mins..it wouldnt return, but running it against your table worked..
0
slightwv (䄆 Netminder) Commented:
How large is the LOB that you tried it on?

You might need to write a stored procedure to handle this for you.
0
DBAnewbie77Author Commented:
It was 975 characters, but only two headers.

For the stored procedure I am thinking

select rowid and lob from records into a cursor  
open cursor
loop through cursor
use your select statement to insert into temp table
end

Does this seem to be on the right track?
0
slightwv (䄆 Netminder) Commented:
>>It was 975 characters, but only two headers.

This should not have caused the SQL to run 'forever'.

Check the data for anything weird that might throw a wrench into it.

If you spool the LOB out to a text file, redact the important data (keeping the number of characters the same) and place it into my test case, does it run fast or 'hang'?

>>For the stored procedure I am thinking

I don't see where this would be really any different than straight SQL.

If my select breaks for one row, it would break in a cursor loop.
0
DBAnewbie77Author Commented:
I got it to work using your sql. I had to use a where clause which it didnt like, so I created a temp table with just the rows I needed and ran the sql against that and it worked perfectly.
0
DBAnewbie77Author Commented:
Slightwv was patient with me and i appreciate that! He made me look like a genius to my boss! Thanks!
0
slightwv (䄆 Netminder) Commented:
Glad it is working and always glad to help!
0
slightwv (䄆 Netminder) Commented:
>>so I created a temp table with just the rows I needed

I should also point out that you shouldn't need the temp table...

We could work on the join syntax if you want.

Worst case, CTE should work:

with mydata as (
select clob_column from master_table where something=something_else
)
select ltrim(regexp_substr(clob_column,'\[[^\[]+',1,level),'[ ')
from mydata
connect by level <= length(clob_column)-length(replace(clob_column,']'))
/
0
DBAnewbie77Author Commented:
@slightvw - i'm back. Running into an issue of the sql returning the same blurb more than once.. I created a test table with 3 rows. One row had  a clob that had 19 headers, one has 9, and one has 3 so I was expecting only 31 rows to return but instead it returned 20,721!  Not sure what happened...

select row_id, ltrim(regexp_substr(col1,'\[[^\[]+',1,level),'[ ')
from tab1
connect by level <= length(col1)-length(replace(col1,']'))
/
0
slightwv (䄆 Netminder) Commented:
Ah yes, I keep forgetting about when connect by gets the values.


Try this:
with mydata as (
select ltrim(regexp_substr(col1,'\[[^\[]+',1,column_value),'[ ') mystring
from tab1,
 table(
  cast(
  multiset(select level from dual connect by level <= length(col1)-length(replace(col1,']')))
  as sys.odcinumberlist
  )
  )
)
select rtrim(substr(mystring,instr(mystring,':',1,2)+2),']')
from mydata
/

Open in new window

0
DBAnewbie77Author Commented:
It works but its cutting off the header name for instance it is returning

Kim Kardashian on 0/26/2014 19:50:13]--------------------------------------------------------------

testing

                  --------------------------------------------------------------

instead of

Source: LA Arrest Records created by: Kim Kardashian on 0/26/2014 19:50:13]--------------------------------------------------------------

testing

                  --------------------------------------------------------------
0
slightwv (䄆 Netminder) Commented:
Simple fix.

Replace:
select rtrim(substr(mystring,instr(mystring,':',1,2)+2),']')


With
select mystring
0
DBAnewbie77Author Commented:
perfect!!!! thank you!
0
slightwv (䄆 Netminder) Commented:
Glad to help.
0
DBAnewbie77Author Commented:
@slightwv - Back again!

I am running into issues where I will have to parse again what I have already parsed. For instance after parsing this returns as one row

Source: LA Records created by: Kim Kardashian on 9/26/2014 7:0:50]--------------------------------------------------------------

Arrests:

Being vapid in California, on or about January

Source: Calabasas Records

Being Vapid in Calabassas County

Source: San Francisco Records

Being especially Vapid in Northern California

                  -------------------------------------------------------------

but I need to break that up more to get 3 separate lines, so im guessing I have to parse on the word "Source", so I tried updating the above code to use that but i cant get it to work.
0
slightwv (䄆 Netminder) Commented:
>> so im guessing I have to parse on the word "Source",

Is that not sort of what I guessed you were after many posts ago?

http:#a40701631

If not, a new requirement needs to be a new question.

In the new question please provide sample data and expected results.
0
DBAnewbie77Author Commented:
I can't parse on Source for all the records because they all dont have the word Source in the header. For instance, there are headers named Arrest Note, Source, Arrest Comments..etc..It varies..The only thing they have in common is the ], which is what we used and worked but within Arrest Note header, there are other Sources and those need to be parsed out.  let me know if I need to start another question and I will..

Arrest Note: LA Records created by: Kim Kardashian on 9/26/2014 7:0:50]--------------------------------------------------------------

Arrests:

Being vapid in California, on or about January

Source: Calabasas Records

Being Vapid in Calabassas County

Source: San Francisco Records

Being especially Vapid in Northern California

                  -------------------------------------------------------------
0
slightwv (䄆 Netminder) Commented:
>>I can't parse on Source for all the records because they all dont have the word Source in the header.

I don't see where I parsed on "source" in the code I referenced.  I looked for everything between the [] and after the ':'.


>>let me know if I need to start another question and I will..

I think this is a big enough requirement change that it should be a new question.

You might post the link to this one in the new one so the experts know the background.

Make sure you post sample data and expected results.

Very similar to the create table and insert statements I posted in http:#40701628
0
It's more than this solution.Get answers and train to solve all your tech problems - anytime, anywhere.Try it for free Edge Out The Competitionfor your dream job with proven skills and certifications.Get started today Stand Outas the employee with proven skills.Start learning today for free Move Your Career Forwardwith certification training in the latest technologies.Start your trial today
Oracle Database

From novice to tech pro — start learning today.