Welcome to Experts Exchange

Add your voice to the tech community where 5M+ people, just like you, are talking about what matters.

  • Help others & share knowledge
  • Earn cash & points
  • Learn & ask questions
Solved

need help with REGEXP_SUBSTR or REGEXP_INSTR

Posted on 2014-10-27
15
270 Views
Last Modified: 2014-10-28
I need some help understanding REGEXP_INSTR / SUBSTR. I got excellent assistance here, and now I want to learn how to do it myself . . .

The pattern I'm looking for is this: M6T6W6R6, where the numbers are interspersed among the letters
- M = Mon, T = Tues, R = Thurs, etc. The numbers are class periods (1 - 10).

So I want to flag the above as "bad".

but these are good, because the numbers (periods) are not mixed in among the letters (days)
- MR567
- MTRF123456
- MF123

if this can be done, give me a tiny hint and then see if I can get it.
0
Comment
Question by:Gadsden Consulting
  • 8
  • 7
15 Comments
 
LVL 74

Accepted Solution

by:
sdstuber earned 500 total points
ID: 40406323
you need a pattern of letters '[MTWRF]+' followed by numbers '[0-9]+'

to ensure you don't have repeats, then use begin/end markers for the line.

That is, '^'  means the beginning of the line,  '$' means the end of the line.
0
 

Author Comment

by:Gadsden Consulting
ID: 40406345
ok, thanks, I'll give it a try . . .
0
 
LVL 74

Expert Comment

by:sdstuber
ID: 40406364
note, my notes were for identification of "good" records via something like   regexp_like(.......)

to find bad records you would do NOT regexp_like(.......)
0
PRTG Network Monitor: Intuitive Network Monitoring

Network Monitoring is essential to ensure that computer systems and network devices are running. Use PRTG to monitor LANs, servers, websites, applications and devices, bandwidth, virtual environments, remote systems, IoT, and many more. PRTG is easy to set up & use.

 
LVL 74

Expert Comment

by:sdstuber
ID: 40406371
if you want to find the specific pattern of repeated  letter/number pairs, you could do that too, but differently.

Either looking for letter,number,letter,number  or    letter,number 2 or more times
0
 

Author Comment

by:Gadsden Consulting
ID: 40406440
well, look at that . . .
SELECT DISTINCT a, Regexp_instr(a,'(^[MWTRFS]+[0-9]+$)') AS Valid_Mtg_Time
FROM tab1
WHERE INSTR(a,',') = 0
   AND a IN ('M5T34R56','WF567','MWF')  

results

WF567	1
MWF	0
M5T34R56	0

Open in new window


great !
0
 

Author Comment

by:Gadsden Consulting
ID: 40406446
sdstuber,

thanks for the other tips.

I need to study the book to see if I can see where they're telling me what you indicated. I just played around with previous pattern matches you you and slightwv posted here, until it worked. So I want to make sure I can turn to the reference and find the solution.
0
 
LVL 74

Expert Comment

by:sdstuber
ID: 40406463
appendix D of the 11gR2  SQL reference covers the Oracle regexp syntax.

http://docs.oracle.com/cd/E11882_01/server.112/e41084/ap_posix.htm#SQLRF020


it's a shorter read than other books, and it's oracle specific.
0
 

Author Comment

by:Gadsden Consulting
ID: 40407059
sdstuber,

the Appendix D link I don't think has what I'm looking for . . . I didn't see anything actually describing how to use REGEXP.

I found this which is closer to what I was looking for, although I would never have been able to solve this problem by reading these pages.

In Jason Price's "Oracle Database 11g SQL", he says
" '^' matches the beginning of a string ... and $ matches the end position of a string" whereas you said
" '^'  means the beginning of the line,  '$' means the end of the line"

- is that the same ?

Either way, I would have never been able to figure this out on my own - it doesn't "click".
0
 
LVL 74

Assisted Solution

by:sdstuber
sdstuber earned 500 total points
ID: 40407132
>> - is that the same ?

unfortunately I have to say "it depends"

Oracle's regexp parser can support multi-line strings.

For example:

"This string
is on
three lines"

The parser can read that as one string of text, or as a string of three lines by recognizing the line breaks.

Thus  ^$ will either refer to the begin and end of the entire string in the former case
or  ^$ will refer to the begin and end of a single line within the string in the latter case.


>>> Either way, I would have never been able to figure this out on my own - it doesn't "click".

regexp's are tricky.  There often multiple ways to write the same thing.  I already gave one difference above with the regexp_like vs NOT regexp_like

Here's another way... rather than searching for a string of exactly one group of letters followed by numbers  you could have searched for the SECOND instance of a letter/number group.  For a legal string you wouldn't find one, for the illegal strings you were looking for you would find them.

The "trick" really, is in defining the pattern well.  Not the regexp, but the actual in-words description of what you are looking for.

Once you have that it's usually MUCH easier to write the regexp as a simple left-to-right search.

Your pattern is:

The string starts with (^) one or more  letters ( [MTRF] ) then those are followed by one or more numbers ( [0-9] ) and then the string ends ($).

Put that all together and you get...

^[MTWRFS]+[0-9]+$

If you defined your pattern differently, you'd get a different regexp.

The SQL appendix won't show you how to use them, but it includes the syntax pieces you need to know, and also, if you're reading other books, the absence of a feature (like forward references) in the appendix indicates Oracle does NOT support it (yet)
0
 

Author Comment

by:Gadsden Consulting
ID: 40407301
thanks a lot. I will review this in more detail tomorrow, and sleep on it tonight . . .
0
 

Author Comment

by:Gadsden Consulting
ID: 40408461
sdstuber,

thanks again, very helpful.

>>Here's another way... rather than searching for a string of exactly one group of letters followed by numbers  you could have searched for the SECOND instance of a letter/number group. For a legal string you wouldn't find one, for the illegal strings you were looking for you would find them.
-- good idea, but I couldn't get that to work, see below.

>>The "trick" really, is in defining the pattern well.  Not the regexp, but the actual in-words description of what you are looking for.
-- good point

-- so it works, but a couple of questions
    1. The pattern I'm looking for is "The string starts with (^) one or more  letters ( [MTRF] ) then those are followed by one or more numbers ( [0-9] ) and then the string ends ($).", so this is defined by ^[MWTRFS]+[0-9]+$
            - What if I wanted to make this legal: M5T34R56 (numbers follow letters), but MWF is illegal ?

   2. Why does this (Regexp_instr(a,'F5') return 2, where a = 'F5' ?

   3. Is a string denoted by a comma ? So to look for a comma, you backslash it ?

SELECT DISTINCT a, Regexp_instr(a,'(^[MWTRFS]+[0-9]+$)') AS "Pattern_Exists_Once",
            Regexp_instr(a,'F5') AS F5_found
FROM tab1
WHERE INSTR(a,',') = 0
   AND a IN ('M5T34R56','WF567','MWF','F123')

               Legal   F5_Found
WF567	         1	2
MWF	         0	0
F123	         1	0
M5T34R56	 0	0
 

Open in new window

 
And say I wanted to make MWF5T34R56 legal ? The pattern is "any letter (or multiples of) which is M,T,W,R,F,S followed by any number of 1,2,3,4,5,6,7,8,9,10" and this pattern can repeat ?

So only MWF above is illegal since it's not followed by numbers, but MWF1 is good.
0
 
LVL 74

Assisted Solution

by:sdstuber
sdstuber earned 500 total points
ID: 40408536
1 - see below

2 - It doesn't,  I get 1 when I run this...

SELECT REGEXP_INSTR(a, 'F5') FROM (SELECT 'F5' a FROM DUAL)

3 - no, commas aren't significant characters except in counters like {3,7} - meaning, the previous pattern will be found 3 to 7 times sequentially;  but commas in the middle of a string don't matter.

However, your sample data doesn't have any commas

 AND a IN ('M5T34R56','WF567','MWF','F123')  - that condition alone means your data won't have any commas, the INSTR isn't really needed since it's redundant with the IN clause.

4 - "make MWF5T34R56 legal"   - since you want a repeating pattern, wrap the entire repeatable part in parentheses and then use + to indicate it will be found 1 or more times.

'([MTWRFS]+[0-9]+)+'


for example...


WITH tab1
     AS (SELECT 'M5T34R56' a FROM DUAL
         UNION ALL
         SELECT 'WF567' FROM DUAL
         UNION ALL
         SELECT 'MWF' FROM DUAL
         UNION ALL
         SELECT 'F123' FROM DUAL)
SELECT a,
       REGEXP_INSTR(a, '(^[MWTRFS]+[0-9]+$)') AS "Pattern_Exists_Once",
       REGEXP_INSTR(a, 'F5') AS f5_found,
       REGEXP_INSTR(a, '([MTWRFS]+[0-9]+)+') AS is_legal
  FROM tab1
0
 

Author Comment

by:Gadsden Consulting
ID: 40408644
sdstuber,

thanks again.

>>AND a IN ('M5T34R56','WF567','MWF','F123')  - that condition alone means your data won't have any commas, the INSTR isn't really needed since it's redundant with the IN clause.
- right, I was first limiting myself to no commas with INSTR(seti.mtg_time,',') = 0 then just further limited to the IN clause

>>I get 1 when I run this... SELECT REGEXP_INSTR(a, 'F5') FROM (SELECT 'F5' a FROM DUAL)
-- but this returns 2: SELECT REGEXP_INSTR(a, 'F5') FROM (SELECT 'WF567' a FROM DUAL)

>>since you want a repeating pattern, wrap the entire repeatable part in parentheses and then use + to indicate it will be found 1 or more times.
-- great ! that's really helpful
-- but this also returned legal (1)
REGEXP_INSTR(a, '([MWTRFS]+[0-9]+)') AS "Pattern_Exists_Multiple"
whereas you have an extra "+"
REGEXP_INSTR(a, '([MTWRFS]+[0-9]+)+') AS is_legal
0
 
LVL 74

Expert Comment

by:sdstuber
ID: 40408679
>>>  but this returns 2: SELECT REGEXP_INSTR(a, 'F5') FROM (SELECT 'WF567' a FROM DUAL)

of course,  F5 is found at the 2nd character of WF567

INSTR isn't a boolean function of 0/1 it returns the position of the string if it finds it, 0 if not

"Pattern_Exists_Multiple" - that's a bad name,  the INSTR looks for the first time that expression is found, which is at character 1, the fact that there are repeats is meaningless

REGEXP_INSTR(a, '([MTWRFS]+[0-9]+)+')  -- this checks explicitly for repeats.

to be even more explicit, add the ^$ to indicate the pattern must be the entire string.

if not

then  '############MTRWF123@@@@@@@@@@@@@@'  would be a legal string because a legal pattern can be found inside that string


your original question has been answered, please close this one and open new questions rather than adding on here.
0
 

Author Comment

by:Gadsden Consulting
ID: 40408763
>>of course,  F5 is found at the 2nd character of WF567
- oops, I was focusing on the 1 for the entire pattern and had a brain freeze on that . . .

>>your original question has been answered, please close this one and open new questions rather than adding on here.
-- sure
0

Featured Post

Free Tool: Postgres Monitoring System

A PHP and Perl based system to collect and display usage statistics from PostgreSQL databases.

One of a set of tools we are providing to everyone as a way of saying thank you for being a part of the community.

Question has a verified solution.

If you are experiencing a similar issue, please ask a related question

Suggested Solutions

Title # Comments Views Activity
Creation date for a PDB 5 39
Filtering characters in an SQL field 2 16
subtr returning incorrect value 8 25
Trouble with <> 2 20
'Between' is such a common word we rarely think about it but in SQL it has a very specific definition we should be aware of. While most database vendors will have their own unique phrases to describe it (see references at end) the concept in common …
From implementing a password expiration date, to datatype conversions and file export options, these are some useful settings I've found in Jasper Server.
This video shows how to Export data from an Oracle database using the Datapump Export Utility.  The corresponding Datapump Import utility is also discussed and demonstrated.
This video shows how to configure and send email from and Oracle database using both UTL_SMTP and UTL_MAIL, as well as comparing UTL_SMTP to a manual SMTP conversation with a mail server.

839 members asked questions and received personalized solutions in the past 7 days.

Join the community of 500,000 technology professionals and ask your questions.

Join & Ask a Question