William Peck
asked on
using regexp_instr to compare values, need a little help in Oracle Sql
I used Regexp_substr to good effect recently (with EE tutoring), now I need a little help with regexp_instr.
Here's the neat trick that EE helped me with:
regexp_substr(REPLACE(p_mt g_time_in, ' ',''),'([MWTRFS]+[0-9]+,?) +')
this is a scheduling application at a school, with class periods 1 - 10, meeting on Mon - Sat. the regexp_substr took these values to convert:
MTWR1256,FBYARRANG produces MTWR1256 (stripping out BYARRANG and also the "F" because it's not followed by 123456789 (or 10).
Now I want to compare values and say "Ok" or "Bad !". Here are a bunch of examples:
A B C
F12 F1 good
F12 F2 good
A B C
MW6,BY ARRANGEMENT M6 good
MW6,BY ARRANGEMENT W6 good
MW6,BY ARRANGEMENT R6 bad
A B C
MWF2,R34 M2 good
MWF2,R34 W2 good
MWF2,R34 R3 good
MWF2,R34 R4 good
MWF2,R34 M3 bad
MWF2,R34 W3 bad
A B C
T10,R9 T10 good
T10,R9 T5 BAD
T10,R9 T6 BAD
T10,R9 R3 BAD
T10,R9 R4 BAD
T10,R9 R9 good
T10,R9 R10 BAD - 10 is not preceded by R !
T10,R9 T9 BAD - 9 is not preceded by T since R breaks the pattern !
column B is always the day (M,T,W,R,F,S) followed by the period (1 - 10)
so here are some rules
if column B is found in Column A, it's good (F1 = F1)
you can find other letters before the number (MWF2,R34 | M2 good !)
you can find other numbers after the letter (F12 | F2 good !)
but MWF2,R34 | M3 bad M3 isn't found ! only R3 and R4 would be good
MWF23,R34 | M3 ok
M2 ok
M4 bad ! because of the intervening R
R2 bad ! there's no 2 after R
R3 ok
R4 ok
F4 bad ! because of the intervening R
F3 ok
F2 ok
also, the comma is irrelevant.
Here's the neat trick that EE helped me with:
regexp_substr(REPLACE(p_mt
this is a scheduling application at a school, with class periods 1 - 10, meeting on Mon - Sat. the regexp_substr took these values to convert:
MTWR1256,FBYARRANG produces MTWR1256 (stripping out BYARRANG and also the "F" because it's not followed by 123456789 (or 10).
Now I want to compare values and say "Ok" or "Bad !". Here are a bunch of examples:
A B C
F12 F1 good
F12 F2 good
A B C
MW6,BY ARRANGEMENT M6 good
MW6,BY ARRANGEMENT W6 good
MW6,BY ARRANGEMENT R6 bad
A B C
MWF2,R34 M2 good
MWF2,R34 W2 good
MWF2,R34 R3 good
MWF2,R34 R4 good
MWF2,R34 M3 bad
MWF2,R34 W3 bad
A B C
T10,R9 T10 good
T10,R9 T5 BAD
T10,R9 T6 BAD
T10,R9 R3 BAD
T10,R9 R4 BAD
T10,R9 R9 good
T10,R9 R10 BAD - 10 is not preceded by R !
T10,R9 T9 BAD - 9 is not preceded by T since R breaks the pattern !
column B is always the day (M,T,W,R,F,S) followed by the period (1 - 10)
so here are some rules
if column B is found in Column A, it's good (F1 = F1)
you can find other letters before the number (MWF2,R34 | M2 good !)
you can find other numbers after the letter (F12 | F2 good !)
but MWF2,R34 | M3 bad M3 isn't found ! only R3 and R4 would be good
MWF23,R34 | M3 ok
M2 ok
M4 bad ! because of the intervening R
R2 bad ! there's no 2 after R
R3 ok
R4 ok
F4 bad ! because of the intervening R
F3 ok
F2 ok
also, the comma is irrelevant.
SOLUTION
membership
This solution is only available to members.
To access this solution, you must be a member of Experts Exchange.
ASKER
Thank you both ! I can't test it until tomorrow, but I'll do that first thing,
just a note on usage - slightwv's assumes the data has already been cleansed.
mine assumes the data is still dirty and will need to be cleansed as part of the check, so you can't use exactly the same input data for both
mine assumes the data is still dirty and will need to be cleansed as part of the check, so you can't use exactly the same input data for both
sdstuber's post is the way to go.
I also missed the possibility of:
M1,T6,WR89,F3,S10
I took just TWO values based on the data you provided. I wasn't thinking about the previous questions and additional possibilities.
Personally, instead of writing a bunch of 'magic' SQL for a bad design, I would change the design and normalize the design. It would probably save you a LOT of headaches.
I also missed the possibility of:
M1,T6,WR89,F3,S10
I took just TWO values based on the data you provided. I wasn't thinking about the previous questions and additional possibilities.
Personally, instead of writing a bunch of 'magic' SQL for a bad design, I would change the design and normalize the design. It would probably save you a LOT of headaches.
ASKER
sdstuber, the data has been cleaned (at least the comparison side - T1, M2, etc.). I can use last week's lesson to clean the left side - MW6,BY ARRANGEMENT.
slightwv - thanks as well. sorry, I forgot to include a wild (but valid) example, like M1,T6,WR89,F3,S10. There are several "triple" combinations as well.
>>instead of writing a bunch of 'magic' SQL for a bad design, I would change the design and normalize the design.
- I don't think I can change anything. For the previous question where I asked about regexp_substr, I was taking the input of M1,T6,WR89,F3,S10 and producing
1 on Mondays
6 on Tuesdays
89 on Weds and Thurs
etc.
So what this query is doing is validating the output, there are like 10,000 results of courses, dates, etc.
If I can get this post working, it's an independent - and automated - validation of the program that produced the periods for the given days.
I probably won't have time to fiddle with it tonight, but I'll be at it first thing tomorrow.
Even if I can handle 80% of the combinations, that's going to make validation a lot easier in that I'd only have to review 20%.
slightwv - thanks as well. sorry, I forgot to include a wild (but valid) example, like M1,T6,WR89,F3,S10. There are several "triple" combinations as well.
>>instead of writing a bunch of 'magic' SQL for a bad design, I would change the design and normalize the design.
- I don't think I can change anything. For the previous question where I asked about regexp_substr, I was taking the input of M1,T6,WR89,F3,S10 and producing
1 on Mondays
6 on Tuesdays
89 on Weds and Thurs
etc.
So what this query is doing is validating the output, there are like 10,000 results of courses, dates, etc.
If I can get this post working, it's an independent - and automated - validation of the program that produced the periods for the given days.
I probably won't have time to fiddle with it tonight, but I'll be at it first thing tomorrow.
Even if I can handle 80% of the combinations, that's going to make validation a lot easier in that I'd only have to review 20%.
>>- I don't think I can change anything.
Think about this:
Just because the inputs and outputs are 'fixed' doesn't mean there isn't a better way between those boundaries.
There are a few ways to 'mask' the underlying architecture.
Off the top of my head I don't have the sql to turn a table like:
id,day,period
1,M,1
1,T,1
1,R,3
2,M,1
2,T,1
2,W,1
into:
MT1,R3
MTW1
but it should be possible. You can parse the inputs into the normalized table.
This should make things much easier in the reporting/validation/etc.. . fronts.
Think about this:
Just because the inputs and outputs are 'fixed' doesn't mean there isn't a better way between those boundaries.
There are a few ways to 'mask' the underlying architecture.
Off the top of my head I don't have the sql to turn a table like:
id,day,period
1,M,1
1,T,1
1,R,3
2,M,1
2,T,1
2,W,1
into:
MT1,R3
MTW1
but it should be possible. You can parse the inputs into the normalized table.
This should make things much easier in the reporting/validation/etc..
ASKER
Ok, I'll mull that over too, thanks for following up.
if your data is already clean then it's even easier, same idea, just remove the cleaningsubquery from my first post
SELECT a,
b,
MAX(CASE
WHEN REGEXP_REPLACE(REGEXP_SUBS TR(a,
'[^,]+',
1,
COLUMN_VALUE),
'[^' || b || ']') = b
THEN
'good'
ELSE
'bad'
END)
c
FROM yourtable x,
TABLE( SELECT COLLECT(LEVEL)
FROM DUAL
CONNECT BY LEVEL <= LENGTH(x.a) - LENGTH(REPLACE(x.a, ',')) + 1)
GROUP BY a, b
SELECT a,
b,
MAX(CASE
WHEN REGEXP_REPLACE(REGEXP_SUBS
'[^,]+',
1,
COLUMN_VALUE),
'[^' || b || ']') = b
THEN
'good'
ELSE
'bad'
END)
c
FROM yourtable x,
TABLE( SELECT COLLECT(LEVEL)
FROM DUAL
CONNECT BY LEVEL <= LENGTH(x.a) - LENGTH(REPLACE(x.a, ',')) + 1)
GROUP BY a, b
ASKER
Ok, thanks ! I'll be on this tomorrow :-)
ASKER
Holy cow, it's working !!!! great team effort :-)
Slightwv, I used your table creation and popped in the data
sdstuber, I loaded slightwv's data to [my_table], and ran your query as is - bingo ! it worked with test data.
I've loaded my (complete set of) test data to [my_table], and am checking it now - so far so good ! There are some bad combinations, but that looks accurate as far as the data is concerned - so far the query is looking good . . . have to spot check 1,110 results, but that won't take too awful long.
Slightwv, I used your table creation and popped in the data
sdstuber, I loaded slightwv's data to [my_table], and ran your query as is - bingo ! it worked with test data.
I've loaded my (complete set of) test data to [my_table], and am checking it now - so far so good ! There are some bad combinations, but that looks accurate as far as the data is concerned - so far the query is looking good . . . have to spot check 1,110 results, but that won't take too awful long.
>>great team effort :-)
Thanks for partial inclusion here but I was WAY off on this one. sdstuber's method is the way to go here!
Thanks for partial inclusion here but I was WAY off on this one. sdstuber's method is the way to go here!
ASKER
Now if only I could understand what it's doing internally . . . LOL . . .
ASKER
550 checked, 550 accurate, including correctly identifying some bad combinations . . . 550 to go.
I will also check some of the wild ones, but all the triple ones have come back accurate, though all are good. So I'll pop in something bad to confirm.
But so far these all look good
I will also check some of the wild ones, but all the triple ones have come back accurate, though all are good. So I'll pop in something bad to confirm.
But so far these all look good
MW56,R123,F34 M5 good
MW56,R123,F34 M6 good
MW56,R123,F34 W5 good
MW56,R123,F34 W6 good
MW56,R123,F34 R1 good
MW56,R123,F34 R2 good
MW56,R123,F34 R3 good
MW56,R123,F34 F3 good
MW56,R123,F34 F4 good
.
MT567 W5 bad
MT567 W6 bad
MT567 W7 bad
.
MF1,MWF3 M1 good
MF1,MWF3 M3 good
MF1,MWF3 M6 bad
MF1,MWF3 T6 bad
MF1,MWF3 W3 good
MF1,MWF3 W6 bad
MF1,MWF3 R6 bad
MF1,MWF3 F1 good
MF1,MWF3 F3 good
MF1,MWF3 F6 bad
SOLUTION
membership
This solution is only available to members.
To access this solution, you must be a member of Experts Exchange.
ASKER
sdstuber, thanks ! I will definitely study this, and thanks for breaking it down.
ASKER CERTIFIED SOLUTION
membership
This solution is only available to members.
To access this solution, you must be a member of Experts Exchange.
ASKER
Ok, thanks for the followup !
I will finish testing the current logic and then fold in the "10" logic.
Bill
I will finish testing the current logic and then fold in the "10" logic.
Bill
Had some time to elaborate on what I posted back in http:#a40369773
Normalize the data, mask the normalization with a view.
The only thing it doesn't have is parsing the inputs to insert into the base tables.
Read up on INSTEAD OF triggers:
http://docs.oracle.com/cd/E11882_01/appdev.112/e10766/tdddg_triggers.htm#TDDDG52800
This is pretty quick and dirty but shows the abilities (I think).
Normalize the data, mask the normalization with a view.
The only thing it doesn't have is parsing the inputs to insert into the base tables.
Read up on INSTEAD OF triggers:
http://docs.oracle.com/cd/E11882_01/appdev.112/e10766/tdddg_triggers.htm#TDDDG52800
This is pretty quick and dirty but shows the abilities (I think).
drop table event_days purge;
create table event_days(event_id number, series_id number, day char(1));
drop table event_periods purge;
create table event_periods(event_id number, series_id number, period number);
--MWF23,R34,T10,F5
insert into event_days values(1,1,'M');
insert into event_days values(1,1,'W');
insert into event_days values(1,1,'F');
insert into event_days values(1,2,'R');
insert into event_days values(1,3,'T');
insert into event_days values(1,4,'F');
insert into event_periods values(1,1,2);
insert into event_periods values(1,1,3);
insert into event_periods values(1,2,3);
insert into event_periods values(1,2,4);
insert into event_periods values(1,3,10);
insert into event_periods values(1,4,5);
--MTWR1256
insert into event_days values(2,1,'M');
insert into event_days values(2,1,'T');
insert into event_days values(2,1,'W');
insert into event_days values(2,1,'R');
insert into event_periods values(2,1,1);
insert into event_periods values(2,1,2);
insert into event_periods values(2,1,5);
insert into event_periods values(2,1,6);
commit;
create or replace view event_vw as
select ed.event_id, listagg(days_for_event || periods_for_event,',') within group(order by case substr(days_for_event,1,1) when 'M' then 1 when 'T' then 2 when 'W' then 3 when 'R' then 4 when 'F' then 5 when 'S' then 6 end) event_list
from
(
select event_id, series_id,
listagg(day) within group (order by case day when 'M' then 1 when 'T' then 2 when 'W' then 3 when 'R' then 4 when 'F' then 5 when 'S' then 6 end) days_for_event
from event_days ed
group by ed.event_id,ed.series_id
) ed
join
(
select event_id, series_id,
listagg(period) within group (order by period) periods_for_event
from event_periods ep
group by ep.event_id,ep.series_id
) ep
on ed.event_id=ep.event_id and ed.series_id=ep.series_id
group by ed.event_id
/
select * from event_vw;
It also doesn't have the primary keys but that is ALL columns on both tables.
This will keep 'accidents' from happening.
It would also make the answer to this question a LOT easier.
With the primary keys, this record would never enter the tables:
MMTW224
This will keep 'accidents' from happening.
It would also make the answer to this question a LOT easier.
With the primary keys, this record would never enter the tables:
MMTW224
ASKER
I think it's good . . .
Results prior to latest change:
With the fix, it looks good . . .
sdstuber, with your latest update, it didn't run out of the box . . . it squawked on column "c". So I commented out "c" and it worked . . .
Results prior to latest change:
M1,WF1 M1 good
M1,WF1 M10 bad
M1,WF1 W1 good
M1,WF1 F10 bad
M10,WF10 M1 good <-- this should not be good !
M10,WF10 M10 good
M10,WF10 W1 good <-- this should not be good !
M10,WF10 W10 good
With the fix, it looks good . . .
M1,WF1 F10 bad
M1,WF1 M1 good
M1,WF1 M10 bad
M1,WF1 W1 good
M10,WF10 M1 bad <-- correct !
M10,WF10 M10 good
M10,WF10 W1 bad <-- correct !
M10,WF10 W10 good
sdstuber, with your latest update, it didn't run out of the box . . . it squawked on column "c". So I commented out "c" and it worked . . .
I included the manually derived "good" and"bad" into my local copy of your sample data.
That's how I checked for correctness.
I would look for anything where "c" != myresult
You real table, obviously wouldn't have that, but it's not really needed, it was only for debugging.
That's how I checked for correctness.
I would look for anything where "c" != myresult
You real table, obviously wouldn't have that, but it's not really needed, it was only for debugging.
ASKER
that's what I thought. It's working now, so I'm moving to get it to work against my real table.
Thx.
Thx.
ASKER
I am working to implement this, but it probably won't happen until tomorrow.
Looks like this hit the mark, though, so thanks again :-)
Looks like this hit the mark, though, so thanks again :-)
ASKER
I'm going to close this, as I tested it and it works great. I still have to implement everything, this might spill into next week.
But this was really fantastic, I haven't presented to the client yet, I'll let you know when I do :-)
But this was really fantastic, I haven't presented to the client yet, I'll let you know when I do :-)
ASKER
super-fantastic !
ASKER
slightwv, I'm sorry, I missed a couple of your posts:
Normalize the data, mask the normalization with a view.
- good link on INSTEAD OF triggers, thx.
- in your query, it's squawking, saying that "from keyword is not found where expected", and then the cursor rests on "group" (even in the inside queries).
>>With the primary keys, this record would never enter the tables: MMTW224
- yes, but in our system that COULD be recorded ! since user enters the data, they could fat-finger it . . . :-(
Normalize the data, mask the normalization with a view.
- good link on INSTEAD OF triggers, thx.
- in your query, it's squawking, saying that "from keyword is not found where expected", and then the cursor rests on "group" (even in the inside queries).
>>With the primary keys, this record would never enter the tables: MMTW224
- yes, but in our system that COULD be recorded ! since user enters the data, they could fat-finger it . . . :-(
Let me guess, database isn't 11g or above?
If 10g, I'll have to tweak it a little and am willing to do so if you think normalization is an option. If you don't think this is a possibility, I would rather not take the time.
I'm thinking not since you stated:
I wonder if the app is trapping errors in such a way that the database not accepting the data, would nicely tell the end user they are entering something 'bad'?
If 10g, I'll have to tweak it a little and am willing to do so if you think normalization is an option. If you don't think this is a possibility, I would rather not take the time.
I'm thinking not since you stated:
but in our system that COULD be recorded
I wonder if the app is trapping errors in such a way that the database not accepting the data, would nicely tell the end user they are entering something 'bad'?
ASKER
yes, d.b. is not 11g+, but we are moving there at the moment, but not next week . . .
Normalization is not an option, but I am interested in learning the tricks in your query . . . all of which I have never seen . . .
>>I wonder if the app is trapping errors in such a way that the database not accepting the data, would nicely tell the end user they are entering something 'bad'?
- definitely a good idea ! I have already commented on that in the course of my analysis and testing.
Normalization is not an option, but I am interested in learning the tricks in your query . . . all of which I have never seen . . .
>>I wonder if the app is trapping errors in such a way that the database not accepting the data, would nicely tell the end user they are entering something 'bad'?
- definitely a good idea ! I have already commented on that in the course of my analysis and testing.
>>but I am interested in learning the tricks in your query
Feel free to self-study and ask anything that you don't understand about what I posted.
LISTAGG is a string aggregator and is new to 11g.
It concatenates values together with whatever character you give it. In the example I posted above, I didn't provide a character so it defaulted to null.
It makes creating a CSV much easier than old methods.
>>Normalization is not an option,
Then some more experimenting for you to do.
Prior to 11g, there were several tricks:
http://www.oracle-base.com/articles/misc/string-aggregation-techniques.php
In that link above, forget WM_CONCAT is even mentioned... Seriously, purge it from your mind!!! Don't even think about EVER using it!!!
There is also an XML trick that the other Expert here, sdstuber, enlightened me to many years ago. I actually prefer it over the User Defined Function (UDF) although the UDF is supposed to be faster.
https://www.experts-exchange.com/questions/24914739/In-Oracle-how-can-I-concatenate-multiple-results-returned-from-select-statement-into-a-comma-separated-string.html?anchorAnswerId=25864822#a25864822
Feel free to self-study and ask anything that you don't understand about what I posted.
LISTAGG is a string aggregator and is new to 11g.
It concatenates values together with whatever character you give it. In the example I posted above, I didn't provide a character so it defaulted to null.
It makes creating a CSV much easier than old methods.
>>Normalization is not an option,
Then some more experimenting for you to do.
Prior to 11g, there were several tricks:
http://www.oracle-base.com/articles/misc/string-aggregation-techniques.php
In that link above, forget WM_CONCAT is even mentioned... Seriously, purge it from your mind!!! Don't even think about EVER using it!!!
There is also an XML trick that the other Expert here, sdstuber, enlightened me to many years ago. I actually prefer it over the User Defined Function (UDF) although the UDF is supposed to be faster.
https://www.experts-exchange.com/questions/24914739/In-Oracle-how-can-I-concatenate-multiple-results-returned-from-select-statement-into-a-comma-separated-string.html?anchorAnswerId=25864822#a25864822
if 10g then use the COLLECT builtin along with a function to write the collection into a string. Much simpler and much more efficient.
But, the XML method has the advantage that you don't need to create any other objects. Which, if you don't have permissions might make it the only option.
But, the XML method has the advantage that you don't need to create any other objects. Which, if you don't have permissions might make it the only option.
ASKER
excellent, thanks to both of you for the follow-up. I am fighting to get back to this but getting caught in other (related) priorities. But I will be back for sure.
ASKER
I did get it working with my own data and everything works ! I have to sort through some details, but I'm good to go with this post . . . thanks again :-)
ASKER
Gentlemen,
complete success :-) I implemented, ran my test, and no "bad" records. The logic I tested included the prev post where I got regexp_substr tips, plus this one.
I'm expecting some kudo's next week when I show the boss . . .
and a good end to the week !
complete success :-) I implemented, ran my test, and no "bad" records. The logic I tested included the prev post where I got regexp_substr tips, plus this one.
I'm expecting some kudo's next week when I show the boss . . .
and a good end to the week !
Open in new window
or, if the max number of csv values in A is 2 then try this...
Open in new window