Sam OZ
asked on
Check patter of string and return first part
I have a Docuemnt No in following sequence
DocuemntType-AreaNo-Depart MentCode-S equenceNo
Typically a DocuemntNumber wil be Type1-01-001-123
But there is bad data also so some of them may be just 01-003-234 ( No type) or 01/004/123 ( with /) ..e.t.c
I need the oracle query for following
Check if the pattern of docuemnt number is correct ( That is 3 hiphens) .
If the pattern is correct , return the Docuemnt type (First part of the DocuemntNo)
Else return ''
DocuemntType-AreaNo-Depart
Typically a DocuemntNumber wil be Type1-01-001-123
But there is bad data also so some of them may be just 01-003-234 ( No type) or 01/004/123 ( with /) ..e.t.c
I need the oracle query for following
Check if the pattern of docuemnt number is correct ( That is 3 hiphens) .
If the pattern is correct , return the Docuemnt type (First part of the DocuemntNo)
Else return ''
ASKER CERTIFIED SOLUTION
membership
This solution is only available to members.
To access this solution, you must be a member of Experts Exchange.
I agree with the "useless" REGEXP_COUNT ;-)
But you know, you're still using the expensive sub-expressions with "REGEXP_SUBSTR(doc_no, '^(.*)-\d*-\d*-\d*$', 1,1,NULL,1)" or "REGEXP_SUBSTR(doc_no, '^(.+)-\d+-\d+-\d+$', 1,1,NULL,1)".
As always: if this isn't huge big data whole lotta noise data processing and stuff, the expensive part is irrelevant. BUT, if you need to process huge amounts of data (maybe even in a time critical manner), then you should re-think the overall approach ;-)
But you know, you're still using the expensive sub-expressions with "REGEXP_SUBSTR(doc_no, '^(.*)-\d*-\d*-\d*$', 1,1,NULL,1)" or "REGEXP_SUBSTR(doc_no, '^(.+)-\d+-\d+-\d+$', 1,1,NULL,1)".
As always: if this isn't huge big data whole lotta noise data processing and stuff, the expensive part is irrelevant. BUT, if you need to process huge amounts of data (maybe even in a time critical manner), then you should re-think the overall approach ;-)
If the pattern is truly simply three hyphens -
select decode(regexp_count(doc_no ,'-'),3,su bstr(doc_n o,1,instr( doc_no,'-' ) - 1))
from yourtable;
select decode(regexp_count(doc_no
from yourtable;
ASKER
Hi Alex (Alexander Eßer)
Your query works. But I can have values like Type1-01-001-123/01 which are also good. Can you please modify query ?
Thansk all for your reply
Your query works. But I can have values like Type1-01-001-123/01 which are also good. Can you please modify query ?
Thansk all for your reply
regexp_substr(doc_no, '^(.*)(-\d*-\d*-\d*)', 1, 1, null, 1)
BUT it all depends on what you want to achieve. I strongly recommend that you take one step back from your current level/view of detailness and re-think (like stdstuber already mentioned in a similar manner). Your first question stated a file pattern abc, now you say "no, it's also xyz..." That's NOT the way you develop!
>>But I can have values like Type1-01-001-123/01 which are also good<<
My query works with that value so the question is, "Is the desired result truly, for any documentno with three hyphens, return the substring up to the first hyphen"? If not, we would need to know all of the patterns that qualify for returning the first part.
My query works with that value so the question is, "Is the desired result truly, for any documentno with three hyphens, return the substring up to the first hyphen"? If not, we would need to know all of the patterns that qualify for returning the first part.
ASKER
Hi Alex and all,
Sorry for delay in getting back
There is only one condition . If Document pattern is DocuemntType-AreaNo-Depart MentCode-S equenceNo return DocumentType
Else return ''
Only thing is Type1-01-001-123/01 is a good pattern . I am checking only the hiphens and not other characters
Sorry for delay in getting back
There is only one condition . If Document pattern is DocuemntType-AreaNo-Depart
Else return ''
Only thing is Type1-01-001-123/01 is a good pattern . I am checking only the hiphens and not other characters
I'll reiterate, my query will treat Type1-01-001-123/01 as a good pattern and will retrieve the first part of every column that has 3 hyphens. What else do you need?
I'm really looking forward to how this is going to end...
So 'this---' would return 'this' as the answer using the query above. Is that correct?
If it is, then I suggest removing the extra parentheses in the return value because sub-expressions are expensive.
Also, I would eliminate the case statement entirely as it's simply extra processing for no functional value.
So, combining both of those suggestions, the doc_return function call would simply be:
REGEXP_SUBSTR(doc_no, '^(.*)-\d*-\d*-\d*$', 1,1,NULL,1)
Again though, this assumes the "document type" is as I've described it. If that's not the case, please specify and provide sample data and expected results.
I will take a guess that maybe you want to ensure there are at least one digit in each of the AreaNo-DepartMentCode-Sequ
If so then replace the * with +
REGEXP_SUBSTR(doc_no, '^(.+)-\d+-\d+-\d+$', 1,1,NULL,1)