We help IT Professionals succeed at work.
Troubleshooting Question

I have a sql function which I would like to see if I can produce the same results in an oracle query without using a function

44 Views
Last Modified: 2020-11-13
I have this function in sql where it can determine the format of a string and would like to be able to use something similar in oracle without using a function if possible.

USE [db89]
GO
/****** Object:  UserDefinedFunction [dbo].[get_MN_Format1]    Script Date: 11/11/2020 2:25:35 PM ******/
SET ANSI_NULLS ON
GO
SET QUOTED_IDENTIFIER ON
GO

ALTER FUNCTION [dbo].[get_MN_Format1]( 
        @pStr   nvarchar(4000) 
) RETURNS nvarchar(4000) 
AS 
BEGIN 
  
DECLARE   @pos int, @sFormat  nvarchar(4000) , @candidate nvarchar(4000)
SET @pos = 1 ---Init variable to enter the loop 
SET @sFormat = '' --Init return variable  
  
WHILE @pos>0 
BEGIN  
        SET @pos = PATINDEX ( '%[^0-9a-Z]%', @pStr )---Any character not number 0-9 or letter a-Z 
        SET @candidate = CASE WHEN @pos<=0
         THEN isnull(nullif(CAST(LEN(@pStr) AS NVARCHAR(4)),'0'),'')
         ELSE isnull(nullif(CAST(@pos-1 AS NVARCHAR(4)),'0'),'')+SUBSTRING(@pStr, @pos, 1) END
        SET @sFormat = @sFormat+ @candidate
        SET @pStr= SUBSTRING(@pStr  , @pos+1 , LEN(@pStr)) ---cut to get the next string until no more format char 
END --END LOOP 
---optional not to shows the zeros -replace them with empty, I like it better 
        RETURN @sFormat; 

END
Comment
Watch Question

HainKurtSr. System Analyst
CERTIFIED EXPERT

Commented:
what is this function doing
any sample input - output make it clear

Author

Commented:
It is basically telling me the format of the string.

example:

V5G70-60008      5-5
Z3M95UC#ABA   7#3
C3Y70EC#ABJ   7#3
574121-001         6-3
L21263-AB1         6-3
L43065-DR1         6-3
2RN40AA            7
716395-002         6-3
H6A42EP#ACH   7#3
758083-161         6-3
L77030-00A         6-3
HainKurtSr. System Analyst
CERTIFIED EXPERT

Commented:
look at this one

select REGEXP_SUBSTR(code,'^([a-z|A-Z|0-9])+',1,1), 
       REGEXP_SUBSTR(code,'([^a-z|^A-Z|^0-9])'), 
       REGEXP_SUBSTR(code,'([a-z|A-Z|0-9])+',1,2)
from myTable;

CERTIFIED EXPERT
Most Valuable Expert 2012
Distinguished Expert 2019

Commented:
Using Hain's start, you can use LENGTH to get the lengths and do it without a PL/SQL block.

Question before I try to put it all together, is there ALWAYS only one special non-alphanum?

Can you have ABC#DEF-GHI or ABC$DEF^GHI*KJL*MNO?


What version of Oracle are you using?

Author

Commented:
No, it could be all numeric or a combination of alpha-numerics with a _,-,# or other special character.

What I am looking for is a select statement that for the following field, show what the format is:
Objek1                        Format
V5G70-60008             5-5
Z3M95UC#ABA          7#3
C3Y70EC#ABJ           7#3
3343389000               10
V34Y98-001                6-3
CERTIFIED EXPERT
Most Valuable Expert 2012
Distinguished Expert 2019

Commented:
As I said, just add the length to what Hain posted:
create table tab1(col1 varchar2(20));
insert into tab1 values('V5G70-60008');
insert into tab1 values('Z3M95UC#ABA');
insert into tab1 values('C3Y70EC#ABJ');
insert into tab1 values('3343389000');
insert into tab1 values('V34Y98-001');

select length(REGEXP_SUBSTR(col1,'^([a-z|A-Z|0-9])+',1,1)) ||
       REGEXP_SUBSTR(col1,'([^a-z|^A-Z|^0-9])') ||
       length(REGEXP_SUBSTR(col1,'([a-z|A-Z|0-9])+',1,2))
from tab1;

Open in new window



My output:
5-5
7#3
7#3
10
6-3

Open in new window


Please answer my questions about all possible formats?
Sean StuberDatabase Developer & Administrator
CERTIFIED EXPERT
Most Valuable Expert 2011
Top Expert 2012

Commented:
Same idea as above, but eliminating the subexpressions which add extra overhead.
For the small data sets in the examples above the extra cost won't be noticed but if you run this on thousands or millions of rows the cost will add up.  Adding cosmetic parentheses is most sql or functions is generally not a performance problem as they will usually be parsed out.  For regular expressions though, the extra parentheses forces recursive subexpression parsing - that is, a regular expression within a regular expression.  Sometimes it's necessary to use them - but not in this case.


select length(regexp_substr(objek1,'^[a-z0-9]+',1,1,'i')) ||
       regexp_substr(objek1,'[^a-z0-9]+',1,1,'i')||
       length(regexp_substr(objek1,'[a-z0-9]+',1,2,'i')) from your_table;

or, if you don't want to use case-insensitive searches

select length(regexp_substr(objek1,'^[a-zA-Z0-9]+',1,1)) ||
       regexp_substr(objek1,'[^a-zA-Z0-9]+',1,1')||
       length(regexp_substr(objek1,'[a-zA-Z0-9]+',1,2)) from your_table
Sean StuberDatabase Developer & Administrator
CERTIFIED EXPERT
Most Valuable Expert 2011
Top Expert 2012

Commented:
Despite looking more complex, this is a lot more efficient (about 1/4 the run time on my machine) than what I posted above above because it only uses a single regexp_substr call, again, with no subexpressions. And it allows for short-cutting - once the delimiter is found, the regexp doesn't need to continue parsing the remainder of the string. The call is deterministic, so Oracle can reuse the results instead of reinvoking it.

This does make the assumption though that the delimiter will only ever be 1 character (or none at all) and that 1 character will be the only non-alphanumeric character in the string.

SELECT objek1,
       CASE
           WHEN REGEXP_SUBSTR(objek1, '[^a-zA-Z0-9]') IS NOT NULL
           THEN
                  (INSTR(objek1, REGEXP_SUBSTR(objek1, '[^a-zA-Z0-9]')) - 1)
               || REGEXP_SUBSTR(objek1, '[^a-zA-Z0-9]')
               || (LENGTH(objek1) - INSTR(objek1, REGEXP_SUBSTR(objek1, '[^a-zA-Z0-9]')))
           ELSE
               TO_CHAR(LENGTH(objek1))
       END format
  FROM your_table
HainKurtSr. System Analyst
CERTIFIED EXPERT

Commented:
I dont get it why we are trying to delve into this too much?

select REGEXP_SUBSTR(str,'^([a-z|A-Z|0-9])+',1,1), 
       REGEXP_SUBSTR(str,'([^a-z|^A-Z|^0-9])'), 
       REGEXP_SUBSTR(str,'([a-z|A-Z|0-9])+',1,2),

       LENGTH(REGEXP_SUBSTR(str,'^([a-z|A-Z|0-9])+',1,1)) || 
       REGEXP_SUBSTR(str,'([^a-z|^A-Z|^0-9])') || 
       LENGTH(REGEXP_SUBSTR(str,'([a-z|A-Z|0-9])+',1,2))
 from myTable;
this looks simple enough to me,
and no need to optimize anything,
unless we have millions of records and a couple of second is really important here...


HainKurtSr. System Analyst
CERTIFIED EXPERT

Commented:
by the way, if regex is too expensive (or not available), we can use instr/replace/translate...
not sure about performance...
we can use this logic in a cte query or create a function

s:=translate(str,'abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ01234567890',
                 '???????????????????????????????????????????????????????????????');
fx:=nvl(instr(s,replace(s,'?','')),0);
dbms_output.put_line('fx='||fx);
if (fx=0) then 
  f:=length(str);
else 
  f:=(fx-1)||substr(s,fx,1)||(length(s)-fx);
end if;

CERTIFIED EXPERT
Most Valuable Expert 2012
Distinguished Expert 2019

Commented:
If they don't want a function, they probably don't want a PL/SQL block either.
CERTIFIED EXPERT
Most Valuable Expert 2012
Distinguished Expert 2019

Commented:
Just for fun, recursive CTE.

Assumptions:  
1:  Delimiter can be more than one character.
2:  There can be more than just one set of delimiters like 'a@#bb$%^ccc^&*dddd'

with getStrings(col1, col1_alpha, col1_non_alpha, counter) as (
	select
		col1,
		regexp_substr(col1,'[a-zA-Z0-9]+',1,1),
		regexp_substr(col1,'[^a-zA-Z0-9]+',1,1),
		2 counter
	from tab1
	union all
	select col1,
		regexp_substr(col1,'[a-zA-Z0-9]+',1,counter),
		regexp_substr(col1,'[^a-zA-Z0-9]+',1,counter),
		counter+1
	from getStrings
	where col1_alpha is not null
)
select col1, listagg(length(col1_alpha) || col1_non_alpha, null) within group (order by counter) result
from getStrings
group by col1
order by col1
;

Open in new window

Sean StuberDatabase Developer & Administrator
CERTIFIED EXPERT
Most Valuable Expert 2011
Top Expert 2012

Commented:

>>>and no need to optimize anything,
>>>unless we have millions of records and a couple of second is really important here... 

The cost of subexpressions is not trivial.  The difference between using them and not using them in the examples above was a factor of about 1.5x in my testing.  

I don't know about your systems but I regularly work with tables containing millions and billions of rows.  These costs add up.

It's possible the asker has a low enough volume and/or a wide enough time window that the difference between efficient and inefficient versions won't matter - but optimizing isn't just a matter of fixing a single query for a single task.  It's also about establishing example so the efficiencies can be replicated in other queries for other tasks as well.

Author

Commented:
Thanks for all of the responses, I really appreciate it.  Just a FYI, my recordset has over 15million records.

Author

Commented:
These are all of the formats that I have via the ms sql function.....

formats.txt
Sean StuberDatabase Developer & Administrator
CERTIFIED EXPERT
Most Valuable Expert 2011
Top Expert 2012

Commented:
so you do have multiple substrings?  Not just a-b  but a-b-c-d and a-b-c-d-e-f, etc?  

That makes a big difference.  slightwv's post should work.  It's expensive but it's the only one of the above that will handle more than one (not that they couldnt' be expanded)

Is there an upper limit on the number of substrings to count within a single value?


Can you provide more sample data to process to match some of these examples?

Author

Commented:
There is no upper limit.

If you need more samples, please let me know.formats.txt
CERTIFIED EXPERT
Most Valuable Expert 2012
Distinguished Expert 2019

Commented:
Try the SQL I posted in #a43192864

Author

Commented:
I am receiving this error when using #a43192864


ORA-32044: cycle detected while executing recursive WITH query
CERTIFIED EXPERT
Most Valuable Expert 2012
Distinguished Expert 2019

Commented:
That is normally when it things it has an infinite loop in the recursion.

Can you reproduce that based on any of the sample data you provided?

If not, there is likely some weird edge case in your data.  If you are unable to find it, I'll try to get time later, I'll see if I can think of what it is.

Author

Commented:
I created some sample tables (fetching 100, 1000 and 5000 rows).  It worked fine until I got to the 5000 so I am assuming there is something in the data after the 1000 records causing it but I am not sure how to find the actual culprit.
CERTIFIED EXPERT
Most Valuable Expert 2012
Distinguished Expert 2019

Commented:
Found it.  From the first results you posted I see:  "3-3.2""-3"

That means you should have data like:  "aaa-aaa.aa""-aaa"

When I test against that, I get the infinite loop.  Can you confirm the data and results are accurate?

Author

Commented:
Yes, I have formats like the following that might cause issues.

"12 "
"15 "
"1-6"
"3-3.2""-3"
"5-5"
1
1 3-3-4
1 5-5 1
1.2+2
1.4+3
1_2_2_4_4


If you look further up in the history of this request, I attached all of the formats that we have in our database table.
HainKurtSr. System Analyst
CERTIFIED EXPERT

Commented:
can you upload an excel with 2 columns (hundred rows maybe)

data + format

so, anyone who has time and patience can play with that sample data :)
HainKurtSr. System Analyst
CERTIFIED EXPERT

Commented:
also, what is max length of your data?
is there any limit in the format, maybe max 5 parts?
Sean StuberDatabase Developer & Administrator
CERTIFIED EXPERT
Most Valuable Expert 2011
Top Expert 2012

Commented:
What about an embedded function?  That is, you don't need to create a function in the database, but you can declare one as part of your sql statement.

WITH
    FUNCTION fmt(p_string IN VARCHAR2)
        RETURN VARCHAR2
    IS
        v_length   PLS_INTEGER := LENGTH(p_string);
        v_start    PLS_INTEGER := 1;
        v_index    PLS_INTEGER;
        v_result   VARCHAR2(4000);
    BEGIN
        WHILE (v_start <= v_length)
        LOOP
            v_index := REGEXP_INSTR(p_string, '[^a-zA-Z0-9]', v_start);

            IF v_index = 0
            THEN
                v_result := v_result || LENGTH(SUBSTR(p_string, v_start));
                v_start := v_length + 1;
            ELSE
                v_result :=
                    v_result || LENGTH(SUBSTR(p_string, v_start, v_index - v_start)) || SUBSTR(p_string, v_index, 1);
                v_start := v_index + 1;
            END IF;
        END LOOP;

        RETURN v_result;
    END;
SELECT objek1, fmt(objek1)
  FROM your_table;

of course, you can create the fmt function too  and then you don't need to embed it

Author

Commented:
The max length of the lookup field is 40 characters, there is no max on the format.

Here is the attachment with sample data.ee_1.xlsx
HainKurtSr. System Analyst
CERTIFIED EXPERT

Commented:
I checked max is 5, this format
 
C_VE_CMP_XXXX_WW011_2_3_4_49

Sr. System Analyst
CERTIFIED EXPERT
Commented:
This problem has been solved!
(Unlock this solution with a 7-day Free Trial)
UNLOCK SOLUTION
Sean StuberDatabase Developer & Administrator
CERTIFIED EXPERT
Most Valuable Expert 2011
Top Expert 2012

Commented:
I see an answer has been accepted  -
Even if the performance is adequate with the noted deficiencies -  please note it has an ERROR in it.

Remove the "|" characters from the [] lists to correct the logic error
You should also remove the extraneous () to correct the most significant performance issues

and, of course, if your real data exceeds the sample data, you'll have to modify the query to accomodate

Author

Commented:
Thanks Sean, I really appreciate it!!
Sean StuberDatabase Developer & Administrator
CERTIFIED EXPERT
Most Valuable Expert 2011
Top Expert 2012

Commented:
you're welcome - but note, you did not accept one of my answers.

not that you have to, you can use what you want, just make sure it actually does what you need
HainKurtSr. System Analyst
CERTIFIED EXPERT

Commented:
so here is the cleaned-up code
select code,
       LENGTH(REGEXP_SUBSTR(code,'[a-zA-Z0-9]+',1,1)) || 
       REGEXP_SUBSTR(code,'[^a-z^A-Z^0-9]+',1,1) || 
       LENGTH(REGEXP_SUBSTR(code,'[a-zA-Z0-9]+',1,2)) || 
       REGEXP_SUBSTR(code,'[^a-z^A-Z^0-9]+',1,2) || 
       LENGTH(REGEXP_SUBSTR(code,'[a-zA-Z0-9]+',1,3)) || 
       REGEXP_SUBSTR(code,'[^a-z^A-Z^0-9]+',1,3) || 
       LENGTH(REGEXP_SUBSTR(code,'[a-zA-Z0-9]+',1,4)) || 
       REGEXP_SUBSTR(code,'[^a-z^A-Z^0-9]+',1,4) || 
       LENGTH(REGEXP_SUBSTR(code,'[a-zA-Z0-9]+',1,5)) pattern
 from codes;
Sean StuberDatabase Developer & Administrator
CERTIFIED EXPERT
Most Valuable Expert 2011
Top Expert 2012

Commented:
the updated regular expressions are still wrong - sorry  I didn't list every syntax flaw.  The extra negations aren't correct either

the multiple regexp calls are also many times slower than either of the embedded function methods - as well as having an artificial cap on substring parsing.
HainKurtSr. System Analyst
CERTIFIED EXPERT

Commented:
ok, here again :)

select code,
       LENGTH(REGEXP_SUBSTR(code,'[a-zA-Z0-9]+',1,1)) || 
       REGEXP_SUBSTR(code,'[^a-zA-Z0-9]+',1,1) || 
       LENGTH(REGEXP_SUBSTR(code,'[a-zA-Z0-9]+',1,2)) || 
       REGEXP_SUBSTR(code,'[^a-zA-Z0-9]+',1,2) || 
       LENGTH(REGEXP_SUBSTR(code,'[a-zA-Z0-9]+',1,3)) || 
       REGEXP_SUBSTR(code,'[^a-zA-Z0-9]+',1,3) || 
       LENGTH(REGEXP_SUBSTR(code,'[a-zA-Z0-9]+',1,4)) || 
       REGEXP_SUBSTR(code,'[^a-zA-Z0-9]+',1,4) || 
       LENGTH(REGEXP_SUBSTR(code,'[a-zA-Z0-9]+',1,5)) pattern
 from codes;
Sean StuberDatabase Developer & Administrator
CERTIFIED EXPERT
Most Valuable Expert 2011
Top Expert 2012

Commented:
yes - that looks correct - except for the artificial limit of 5 subfields

HainKurtSr. System Analyst
CERTIFIED EXPERT

Commented:
except for the artificial limit of 5 subfields

based on actual data, 406K rows... max is 5 parts 4 delimiter

Sean StuberDatabase Developer & Administrator
CERTIFIED EXPERT
Most Valuable Expert 2011
Top Expert 2012

Commented:
yes I know, but as the asker stated there is no upper limit.  So, arbitrarily assigning one based on a sample of less than 3% of the stated volume seems dubious

The pattern is simple enough, I have to assume the asker can/will extend when they hit such data.

The bigger worry though is that it won't generate an error, it will simply spit out truncated/wrong results and people might not notice right away.
CERTIFIED EXPERT
Most Valuable Expert 2012
Distinguished Expert 2019

Commented:
@Hain,

They posted there are 15 million rows with NO upper limit.  Only delimiter is 40 characters.

They can easily have "_a_a_a_a_a_a_a_a_a_a_a_a_a_a_a_a_a_a_a_a"


The function Sean posted is better than anything I could do with a recursive query which is why I never came back.

Hard-coding a limit using your SQL is probably the worst solution of the three proposed.
HainKurtSr. System Analyst
CERTIFIED EXPERT

Commented:
The function Sean posted is better than anything I could do with a recursive query which is why I never came back.
do you mean this?

with function fmt(...)
...;
SELECT code, fmt(code) pattern
  FROM codes;
i never used this function...
our oracle versions are all 9-11 something...
this is added later, with 12+ I guess...

whats the point of inline function?
we can just add that function to database and call it anywhere...

is this faster this way? I dont think so...

only reason may be, we do not have permission to create a function...

am I wrong?
CERTIFIED EXPERT
Most Valuable Expert 2012
Distinguished Expert 2019

Commented:
>>only reason may be, we do not have permission to create a function...

I don't recall anyone stating they cannot create a function.

oracle without using a function if possible.

Likely asked because of the permissions/approvals necessary to have one created.  Using an inline function doesn't require any permissions but does require at least 12c.

>>is this faster this way? I dont think so...

I'll trust Sean's performance testing.  Feel free to post results of yours say with 200,000 or 400,000 rows?
HainKurtSr. System Analyst
CERTIFIED EXPERT

Commented:
I'll trust Sean's performance testing.  Feel free to post results of yours say with 200,000 or 400,000 rows?
I meant regular function vs inline function...
Sean StuberDatabase Developer & Administrator
CERTIFIED EXPERT
Most Valuable Expert 2011
Top Expert 2012

Commented:
yes, the inline functions are faster than compiled/regular functions because they don't have context switching.
the UDF pragma for compiled/regular functions can address this too.

But - even with a context switch, the compiled function is still many times faster than using multiple regular expressions.

I suggested it because the asker asked to not use a function so I assumed they didn't have permission to create one, so this is not only faster it also doesn't require special privileges because no function is ever created.  
Yes, the syntax is similar but it is not DDL.  Also it wasn't marginally faster, it was multiple times faster.

Here is my test harness.  
Run twice - first with a single pass of each through a table filled with all of the sample data given above.
Then again with 10 loops of each, this should let each take advantage of any caching (other than result_cache which neither are using.)

Both examples were run from my pc, but the database is an oracle free tier pdb.

SQL> set serveroutput on
SQL> DECLARE
  2      c_loop_count   CONSTANT INTEGER := 1;
  3      v_time1                 TIMESTAMP;
  4      v_time2                 TIMESTAMP;
  5      v_time3                 TIMESTAMP;
  6      v_diff1                 NUMBER;
  7      v_diff2                 NUMBER;
  8
  9      v_dummy                 NUMBER;
 10  BEGIN
 11      v_time1 := SYSTIMESTAMP;
 12
 13      FOR n IN 1 .. c_loop_count
 14      LOOP
 15          FOR x IN (SELECT objek1,
 16                           LENGTH(REGEXP_SUBSTR(objek1,'[a-zA-Z0-9]+',1,1)) ||
 17         REGEXP_SUBSTR(objek1,'[^a-zA-Z0-9]+',1,1) ||
 18         LENGTH(REGEXP_SUBSTR(objek1,'[a-zA-Z0-9]+',1,2)) ||
 19         REGEXP_SUBSTR(objek1,'[^a-zA-Z0-9]+',1,2) ||
 20         LENGTH(REGEXP_SUBSTR(objek1,'[a-zA-Z0-9]+',1,3)) ||
 21         REGEXP_SUBSTR(objek1,'[^a-zA-Z0-9]+',1,3) ||
 22         LENGTH(REGEXP_SUBSTR(objek1,'[a-zA-Z0-9]+',1,4)) ||
 23         REGEXP_SUBSTR(objek1,'[^a-zA-Z0-9]+',1,4) ||
 24         LENGTH(REGEXP_SUBSTR(objek1,'[a-zA-Z0-9]+',1,5))
 25                      FROM your_table)
 26          LOOP
 27              NULL;
 28          END LOOP;
 29      END LOOP;
 30
 31      v_time2 := SYSTIMESTAMP;
 32
 33      FOR n IN 1 .. c_loop_count
 34      LOOP
 35          FOR x IN (SELECT objek1, fmt(objek1) FROM your_table)
 36          LOOP
 37              NULL;
 38          END LOOP;
 39      END LOOP;
 40
 41      v_time3 := SYSTIMESTAMP;
 42
 43      v_diff1 :=
 44            EXTRACT(SECOND FROM (v_time2 - v_time1))
 45          + EXTRACT(MINUTE FROM (v_time2 - v_time1)) * 60
 46          + EXTRACT(HOUR FROM (v_time2 - v_time1)) * 3600
 47          + EXTRACT(DAY FROM (v_time2 - v_time1)) * 86400;
 48
 49      v_diff2 :=
 50            EXTRACT(SECOND FROM (v_time3 - v_time2))
 51          + EXTRACT(MINUTE FROM (v_time3 - v_time2)) * 60
 52          + EXTRACT(HOUR FROM (v_time3 - v_time2)) * 3600
 53          + EXTRACT(DAY FROM (v_time3 - v_time2)) * 86400;
 54
 55      DBMS_OUTPUT.put_line(RPAD('-', 80, '-'));
 56      DBMS_OUTPUT.put_line('Test1: ' || v_time1 || '   ' || v_time2 || '   ' || v_diff1);
 57      DBMS_OUTPUT.put_line('Test2: ' || v_time2 || '   ' || v_time3 || '   ' || v_diff2);
 58      DBMS_OUTPUT.put_line('Test1 took: ' || v_diff1 / v_diff2 || ' times as long as Test2');
 59      DBMS_OUTPUT.put_line(RPAD('-', 80, '-'));
 60  END;
 61  /
--------------------------------------------------------------------------------
Test1: 12-NOV-20 11.34.30.756434 PM   12-NOV-20 11.34.46.693211 PM   15.936777
Test2: 12-NOV-20 11.34.46.693211 PM   12-NOV-20 11.34.48.939193 PM   2.245982
Test1 took: 7.09568331357953892773851259716239934247 times as long as Test2
--------------------------------------------------------------------------------

PL/SQL procedure successfully completed.

SQL> DECLARE
  2      c_loop_count   CONSTANT INTEGER := 10;
  3      v_time1                 TIMESTAMP;
  4      v_time2                 TIMESTAMP;
  5      v_time3                 TIMESTAMP;
  6      v_diff1                 NUMBER;
  7      v_diff2                 NUMBER;
  8
  9      v_dummy                 NUMBER;
 10  BEGIN
 11      v_time1 := SYSTIMESTAMP;
 12
 13      FOR n IN 1 .. c_loop_count
 14      LOOP
 15          FOR x IN (SELECT objek1,
 16                           LENGTH(REGEXP_SUBSTR(objek1,'[a-zA-Z0-9]+',1,1)) ||
 17         REGEXP_SUBSTR(objek1,'[^a-zA-Z0-9]+',1,1) ||
 18         LENGTH(REGEXP_SUBSTR(objek1,'[a-zA-Z0-9]+',1,2)) ||
 19         REGEXP_SUBSTR(objek1,'[^a-zA-Z0-9]+',1,2) ||
 20         LENGTH(REGEXP_SUBSTR(objek1,'[a-zA-Z0-9]+',1,3)) ||
 21         REGEXP_SUBSTR(objek1,'[^a-zA-Z0-9]+',1,3) ||
 22         LENGTH(REGEXP_SUBSTR(objek1,'[a-zA-Z0-9]+',1,4)) ||
 23         REGEXP_SUBSTR(objek1,'[^a-zA-Z0-9]+',1,4) ||
 24         LENGTH(REGEXP_SUBSTR(objek1,'[a-zA-Z0-9]+',1,5))
 25                      FROM your_table)
 26          LOOP
 27              NULL;
 28          END LOOP;
 29      END LOOP;
 30
 31      v_time2 := SYSTIMESTAMP;
 32
 33      FOR n IN 1 .. c_loop_count
 34      LOOP
 35          FOR x IN (SELECT objek1, fmt(objek1) FROM your_table)
 36          LOOP
 37              NULL;
 38          END LOOP;
 39      END LOOP;
 40
 41      v_time3 := SYSTIMESTAMP;
 42
 43      v_diff1 :=
 44            EXTRACT(SECOND FROM (v_time2 - v_time1))
 45          + EXTRACT(MINUTE FROM (v_time2 - v_time1)) * 60
 46          + EXTRACT(HOUR FROM (v_time2 - v_time1)) * 3600
 47          + EXTRACT(DAY FROM (v_time2 - v_time1)) * 86400;
 48
 49      v_diff2 :=
 50            EXTRACT(SECOND FROM (v_time3 - v_time2))
 51          + EXTRACT(MINUTE FROM (v_time3 - v_time2)) * 60
 52          + EXTRACT(HOUR FROM (v_time3 - v_time2)) * 3600
 53          + EXTRACT(DAY FROM (v_time3 - v_time2)) * 86400;
 54
 55      DBMS_OUTPUT.put_line(RPAD('-', 80, '-'));
 56      DBMS_OUTPUT.put_line('Test1: ' || v_time1 || '   ' || v_time2 || '   ' || v_diff1);
 57      DBMS_OUTPUT.put_line('Test2: ' || v_time2 || '   ' || v_time3 || '   ' || v_diff2);
 58      DBMS_OUTPUT.put_line('Test1 took: ' || v_diff1 / v_diff2 || ' times as long as Test2');
 59      DBMS_OUTPUT.put_line(RPAD('-', 80, '-'));
 60  END;
 61  /
--------------------------------------------------------------------------------
Test1: 12-NOV-20 11.35.08.407132 PM   12-NOV-20 11.37.48.026742 PM   159.61961
Test2: 12-NOV-20 11.37.48.026742 PM   12-NOV-20 11.38.09.991098 PM   21.964356
Test1 took: 7.26721102134749591565534632565598554312 times as long as Test2
--------------------------------------------------------------------------------

PL/SQL procedure successfully completed.

SQL> select count(*) from your_table;

  COUNT(*)
----------
    426791



The "fmt" function used in the test is a compiled version of the last one I posted above.

CREATE OR REPLACE FUNCTION fmt(p_string IN VARCHAR2)
    RETURN VARCHAR2
IS
    v_cnt      PLS_INTEGER := 0;
    v_result   VARCHAR2(4000);
BEGIN
    FOR i IN 1 .. LENGTH(p_string)
    LOOP
        CASE
            WHEN INSTR('abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ01234567890', SUBSTR(p_string, i, 1)) > 0
            THEN
                v_cnt := v_cnt + 1;
            WHEN v_cnt > 0
            THEN
                v_result := v_result || TO_CHAR(v_cnt) || SUBSTR(p_string, i, 1);
                v_cnt := 0;
            ELSE
                v_result := v_result || SUBSTR(p_string, i, 1);
                v_cnt := 0;
        END CASE;
    END LOOP;

    IF v_cnt > 0
    THEN
        v_result := v_result || TO_CHAR(v_cnt);
    END IF;

    RETURN v_result;
END;

The inline functions defined in a WITH clause are a 12cR1 feature; but I wouldn't call that "new" anymore.
12cR1 was released a little over 7 years ago.


Sean StuberDatabase Developer & Administrator
CERTIFIED EXPERT
Most Valuable Expert 2011
Top Expert 2012

Commented:
Here's more info about the feature with links to other test cases
http://stevenfeuersteinonplsql.blogspot.com/2017/03/speed-up-execution-of-your-functions.html
HainKurtSr. System Analyst
CERTIFIED EXPERT

Commented:
I found an Oracle version 19 in our dev environment :)
and
tested inline function with 5 part REGEXP_SUBSTR, with 800K rows...

like

with function fmt(...) ...;
     s as (...)
select * from s where pattern='111';

vs

with s as (... sql with regexp_substr here ...)
select * from s where pattern='111';

result is like 2-3 s vs 19-20 s (inline function is like 6-10 times faster)...
HainKurtSr. System Analyst
CERTIFIED EXPERT

Commented:
and here another version without using REGEX

LOGIC: data converted into 0s and delimiters

C_VE_CMP_XXXX_WW01 >>> 0_00_000_0000_0000

then loops, counts 0 and when encounters non zero adds it to result

WITH FUNCTION fmt (p_string IN VARCHAR2) RETURN VARCHAR2
     IS
       v_length   PLS_INTEGER := LENGTH (p_string);
       v_string   VARCHAR2 (40);
       v_result   VARCHAR2 (40);
       v_c        CHAR (1);
       v_p        PLS_INTEGER := 0;
     BEGIN
       v_string := TRANSLATE (p_string,
                   'abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789',
                   '00000000000000000000000000000000000000000000000000000000000000');

       FOR i IN 1 .. v_length
       LOOP
         v_c := SUBSTR (v_string, i, 1);

         IF v_c <> '0'
         THEN
           IF v_p <> 0
           THEN
             v_result := v_result || v_p || v_c;
             v_p := 0;
           ELSE
             v_result := v_result || v_c;
           END IF;
         ELSE
           v_p := v_p + 1;
         END IF;
       END LOOP;

       IF v_c = '0'
       THEN
         v_result := v_result || v_p;
       END IF;

       RETURN v_result;
     END;
Sean StuberDatabase Developer & Administrator
CERTIFIED EXPERT
Most Valuable Expert 2011
Top Expert 2012

Commented:
Hain, I specifically didn't use TRANSLATE in any of my posted solutions because it slows down the functions.  It's still faster than regexp, but since a TRANSLATE call must do a write, it's extra work compared to only reading as in the INSTR.

So I would expect it to be slower - however, when I tested it on the provided sample data it failed trying to do all of them.
If I skip the error it is slower by a little more than double.

If however, you're looking to make some improvements on the fmt function I posted, I suggest rearranging the INSTR search string so the lower case letters are searched last.  This is based on the sample data which only showed upper case.  If that pattern holds, or is at least the majority of cases, then searching upper case first will be faster.

Doing this, I shaved about 11% off my previous execution time, and it makes it about 3 times faster than the TRANSLATE.

CREATE OR REPLACE FUNCTION fmt(p_string IN VARCHAR2)
    RETURN VARCHAR2
IS
    v_cnt      PLS_INTEGER := 0;
    v_result   VARCHAR2(4000);
BEGIN
    FOR i IN 1 .. LENGTH(p_string)
    LOOP
        CASE
            WHEN INSTR('ABCDEFGHIJKLMNOPQRSTUVWXYZ01234567890abcdefghijklmnopqrstuvwxyz', SUBSTR(p_string, i, 1)) > 0
            THEN
                v_cnt := v_cnt + 1;
            WHEN v_cnt > 0
            THEN
                v_result := v_result || TO_CHAR(v_cnt) || SUBSTR(p_string, i, 1);
                v_cnt := 0;
            ELSE
                v_result := v_result || SUBSTR(p_string, i, 1);
                v_cnt := 0;
        END CASE;
    END LOOP;


    IF v_cnt > 0
    THEN
        v_result := v_result || TO_CHAR(v_cnt);
    END IF;


    RETURN v_result;
END;
/

I hope that helps.
You can keep pursuing it if you'd like, but I'm done with it unless the asker comes back.
If you do keep going, please test and make sure you're moving the thread forward with improvements over what came before and not just variants.
I'm sure we could all come up with dozens of variants that appear to work, but if they aren't actually better, then why clutter the thread with the noise.

HainKurtSr. System Analyst
CERTIFIED EXPERT

Commented:
I did not try it on sample data but I used some address column on our member database with 800K+ records...
and I did not notice any significant difference...
both around 2-3 seconds, where as regex solution takes 18-19 seconds...

can you give me a sample data, so I can check the logic