Check patter of string and return first part

I have a Docuemnt No in following sequence
     DocuemntType-AreaNo-DepartMentCode-SequenceNo

    Typically a  DocuemntNumber wil be    Type1-01-001-123
     But there is bad data also so some of them may be just 01-003-234  ( No type) or 01/004/123 ( with /)  ..e.t.c
     I need the oracle query for following
          Check if the pattern of docuemnt number is correct  ( That is 3 hiphens) .
             If the pattern is correct , return the Docuemnt type  (First part of the DocuemntNo)
             Else return ''
Sam OZAsked:
Who is Participating?
 
Alexander Eßer [Alex140181]Connect With a Mentor Software DeveloperCommented:
REGEXP_SUBSTR would be the weapon of choice ;-)
regexp_substr(doc_no, '^(.*)(-\d*-\d*-\d*)$', 1, 1, null, 1)

Open in new window


Sample:
select doc_no,
       regexp_count(doc_no, '^.*-\d*-\d*-\d*$') pattern_count,
       case
         when regexp_count(doc_no, '^.*-\d*-\d*-\d*$') > 0 then
          regexp_substr(doc_no, '^(.*)(-\d*-\d*-\d*)$', 1, 1, null, 1)
         else
          ''
       end doc_return
  from (select 'Type1-01-001-123' doc_no
          from dual
        union
        select '01-003-234' doc_no
          from dual
        union
        select '01/004/123' doc_no
          from dual);

Open in new window

0
 
sdstuberCommented:
What identifies a document type?  The assumption in the post above is it's anything before the first hyphen as long as there are 3 hyphens and no non-numerics after the first hyphen
So  'this---'  would return 'this' as the answer using the query above.  Is that correct?

If it is, then I suggest removing the extra parentheses in the return value because sub-expressions are expensive.
Also, I would eliminate the case statement entirely as it's simply extra processing for no functional value.

So, combining both of those suggestions, the doc_return function call would simply be:

REGEXP_SUBSTR(doc_no, '^(.*)-\d*-\d*-\d*$', 1,1,NULL,1)

Again though, this assumes the "document type" is as I've described it.  If that's not the case, please specify and provide sample data and expected results.

I will take a guess that maybe you want to ensure there are at least one digit in each of the AreaNo-DepartMentCode-SequenceNo portions and one character in the document type
If so then replace the * with +

REGEXP_SUBSTR(doc_no, '^(.+)-\d+-\d+-\d+$', 1,1,NULL,1)
0
 
Alexander Eßer [Alex140181]Software DeveloperCommented:
I agree with the "useless" REGEXP_COUNT ;-)
But you know, you're still using the expensive sub-expressions with "REGEXP_SUBSTR(doc_no, '^(.*)-\d*-\d*-\d*$', 1,1,NULL,1)" or "REGEXP_SUBSTR(doc_no, '^(.+)-\d+-\d+-\d+$', 1,1,NULL,1)".
As always: if this isn't huge big data whole lotta noise data processing and stuff, the expensive part is irrelevant. BUT, if you need to process huge amounts of data (maybe even in a time critical manner), then you should re-think the overall approach ;-)
0
Cloud Class® Course: CompTIA Healthcare IT Tech

This course will help prep you to earn the CompTIA Healthcare IT Technician certification showing that you have the knowledge and skills needed to succeed in installing, managing, and troubleshooting IT systems in medical and clinical settings.

 
awking00Commented:
If the pattern is truly simply three hyphens -
select decode(regexp_count(doc_no,'-'),3,substr(doc_no,1,instr(doc_no,'-') - 1))
from yourtable;
0
 
Sam OZAuthor Commented:
Hi  Alex (Alexander Eßer)
    Your query works. But I can have  values like Type1-01-001-123/01   which are also good.  Can you please modify query ?

Thansk all for your reply
0
 
Alexander Eßer [Alex140181]Software DeveloperCommented:
regexp_substr(doc_no, '^(.*)(-\d*-\d*-\d*)', 1, 1, null, 1)

Open in new window

BUT it all depends on what you want to achieve. I strongly recommend that you take one step back from your current level/view of detailness and re-think (like stdstuber already mentioned in a similar manner).
Your first question stated a file pattern abc, now you say "no, it's also xyz..." That's NOT the way you develop!
0
 
awking00Commented:
>>But I can have  values like Type1-01-001-123/01   which are also good<<
My query works with that value so the question is, "Is the desired result truly, for any documentno with three hyphens, return the substring up to the first hyphen"? If not, we would need to know all of the patterns that qualify for returning the first part.
0
 
Sam OZAuthor Commented:
Hi Alex and all,
   Sorry for delay in getting back
       There is only one condition  . If  Document pattern is  DocuemntType-AreaNo-DepartMentCode-SequenceNo  return DocumentType
Else return ''

Only thing is Type1-01-001-123/01 is a good pattern . I am checking only the hiphens and not other characters
0
 
awking00Commented:
I'll reiterate, my query will treat Type1-01-001-123/01 as a good pattern and will retrieve the first part of every column that has 3 hyphens. What else do you need?
0
 
Alexander Eßer [Alex140181]Software DeveloperCommented:
I'm really looking forward to how this is going to end...
0
Question has a verified solution.

Are you are experiencing a similar issue? Get a personalized answer when you ask a related question.

Have a better answer? Share it in a comment.

All Courses

From novice to tech pro — start learning today.