Solved

Trapping unknown characters in a column

Posted on 2014-02-05
24
513 Views
Last Modified: 2014-03-18
I have a column which has characters which are unknown - may be control characters or some other junk. I would like to find the specific row with the column with this problem. When I do a function such as to_number or to_date, it gives an error but doesn't tell the exact row. I have a counter column which I can use which does take some time. The dump function does the same thing, that is, errors out. The error is ORA-01841 when inserting into a date column. Is there a way to run one SELECT and find the specific row? Something like:

select * from Tbl1 where instr(dump(Col1),'20')=0

This column is supposed to have an 8 digit number only which gets translated into date.
0
Comment
Question by:soccerplayer
  • 7
  • 5
  • 4
  • +1
24 Comments
 
LVL 13

Assisted Solution

by:Alexander Eßer [Alex140181]
Alexander Eßer [Alex140181] earned 250 total points
ID: 39837457
You could use regular expressions for this purpose, as shown here: http://psoug.org/reference/regexp.html

check for [:digit:] or count of [:cntrl:] = 0 (or something else that suites your needs)
0
 
LVL 76

Assisted Solution

by:slightwv (䄆 Netminder)
slightwv (䄆 Netminder) earned 250 total points
ID: 39837627
>> The error is ORA-01841 when inserting into a date column.

Inserting is different than locating columns with 'bad' characters.  Which are you wanting help with:  How to locate bad characters in existing data or how to fix the ORA-01841 when inserting?

If it is finding 'bad' data then Alex is correct.  I would use regexp_like.

You just need to define what is 'bad' for us.

If 'good' is alpha-numeric then try this:

drop table tab1 purge;
create table tab1(col1 char(1));

insert into tab1 values('a');
insert into tab1 values('1');
--Insert the 'bad' one
insert into tab1 values(chr(1));
commit;

select rowid from tab1 where regexp_like(col1,'[^[:alnum:]]');

Open in new window

0
 

Accepted Solution

by:
soccerplayer earned 0 total points
ID: 39839465
I would do a SQL such as:

select rowid from Tbl1 where regexp_like(Col1,'[^[:cntrl:]]');

Problem is, there might be characters which do not fall into any of these sets. In that case, the SQL would bomb without telling me the specific row.
0
 
LVL 76

Expert Comment

by:slightwv (䄆 Netminder)
ID: 39839502
>>select rowid from Tbl1 where regexp_like(Col1,'[^[:cntrl:]]');

That will return rowids for rows that do not contain a character in the cntrl class.

>>there might be characters which do not fall into any of these sets.

You need to decide what is considered 'good' and what is considered 'bad'.  You may not be able to use a pre-defined class or maybe add specific additional characters along with a class.

For example:  Say 'good' characters are numbers and a lower case 'x'.  Everything else is 'bad'.

This catches it:
drop table tab1 purge;
create table tab1(col1 varchar2(10));

insert into tab1 values('1');
insert into tab1 values('x');
insert into tab1 values('1x');
--Insert the 'bad' one
insert into tab1 values('a');
commit;

select rowid from tab1 where regexp_like(col1,'[^[:digit:]x]');

Open in new window


If you cannot figure out the regular expression, post your requirements on what is good and bad and we can help.
0
 
LVL 31

Expert Comment

by:awking00
ID: 39839530
>>This column is supposed to have an 8 digit number only which gets translated into date.<<
What is the format supposed to be that gets translated?
0
 
LVL 76

Expert Comment

by:slightwv (䄆 Netminder)
ID: 39839558
I suppose I should follow up with this:
When deciding if you want to look for absence of good characters or look specifically for bad characters, choose the smaller of the two lists.

For example:  If 'x' is the only bad character it would be better to look for 'x' as opposed to look for everything except 'x'.
0
 

Author Comment

by:soccerplayer
ID: 39839631
Okay. I would do a SQL like this:

select rowid from Tbl1 where regexp_like(Col1,'[^[:digit:]]');

Does the ^ character mean "not"?

In this URL: http://psoug.org/reference/regexp.html, it says "Anchor the expression to the start of a line".
0
 
LVL 76

Expert Comment

by:slightwv (䄆 Netminder)
ID: 39839660
>>I would do a SQL like this:

That will return rowid's of any row that does not contain a character in the 'digit' class.

>>Does the ^ character mean "not"?

When inside the square brackets it is 'not'.  When outside it is start of line.
0
 

Author Comment

by:soccerplayer
ID: 39839677
This information is not there in the URL above. Is there another place you could recommend that I could look this up?
0
PRTG Network Monitor: Intuitive Network Monitoring

Network Monitoring is essential to ensure that computer systems and network devices are running. Use PRTG to monitor LANs, servers, websites, applications and devices, bandwidth, virtual environments, remote systems, IoT, and many more. PRTG is easy to set up & use.

 
LVL 31

Expert Comment

by:awking00
ID: 39839690
I think the problem may not necessarily be "unknown" characters, but characters that don't match the format mask that is being used to do the translation into date. Dates can take many formats but they don't follow a specific pattern. It would likely take a complex set of regular expressions to test the existence of non-matching patterns (e.g. assuming a yyyymmdd pattern, when two characters represent a month, if they are 02, then the characters representing the day can be 28 if the characters representing the year are not evenly divisible by 4, and can be 29 if the characters representing the year are divisible by 4 as long as they are not evenly divisible by 100, 200, or 300). I assume it could still be done, but you can see the complexity involved.
0
 
LVL 76

Expert Comment

by:slightwv (䄆 Netminder)
ID: 39839712
>>Is there another place you could recommend that I could look this up?

Here's the doc link:
Non-Matching Character List

http://docs.oracle.com/cd/B19306_01/appdev.102/b14251/adfns_regexp.htm#i1007670

Regular expressions are VERY powerful and VERY cryptic.  That makes them difficult to learn.

My favorite book on them is:
http://shop.oreilly.com/product/9780596528126.do

You won't be able to do everything in that book with Oracle regular expressions but you can do a lot of it.

The reason:  Everyone implements regular expressions in different ways.
0
 
LVL 31

Expert Comment

by:awking00
ID: 39840032
The following will produce every record and its rowid where the date field (col1) does not match the yyyymmdd format. It can be modified for other formats if needed.
With cte as
(select row_id from
 (select col1,
        case when   to_number(substr(col1,5,2)) = '02' and
                    to_number(substr(col1,7,2)) > '29'
             then 'Bad'
             when (mod(to_number(substr(col1,1,4)),4) != 0 or
                   mod(to_number(substr(col1,1,4)),100) = 0) and
                   (to_number(substr(col1,5,2)) = '02' and
                    to_number(substr(col1,7,2)) > '28')
             then 'Bad'
             when not regexp_like(col1,'(19|20)[0-9]{2}(0[1-9]|1[012])(0[1-9]|[12][0-9]|3[01])')
             then 'Bad'
             else 'Okay'
        end valid
  ,rowid row_id from tbl1)
 where valid = 'Bad')
select cte.row_id, t1.*
from tbl1 t1, cte
where cte.row_id = t1.towid;
0
 
LVL 76

Expert Comment

by:slightwv (䄆 Netminder)
ID: 39840595
Depending on just how bad the data is, even the above post can have problems. It assumes ONLY 8 numeric characters.

If the field is declared as varcahr2(10) and someone can fill up the entire field, it can be 'fooled' into thinking it is valid.

Granted, you could add a TON of other cases to the case statement but I feel the best is let Oracle tell you what is and isn't a valid date with a user defined function.

The example below shows here the above post will show bad data as valid and the function working.

Now, if you can have multiple strings that need to be considered 'valid',  there is a tweak to the function that loops through different formats but even this method can be fooled.

For example:  User enters data in yyyyddmm format, the database expects yyyymmdd format.  What does 20140102 become?  It is considered 'valid' given BOTH formats above but mean totally different dates.


drop table tab1 purge;
create table tab1 (col1 varchar2(10));

insert into tab1 values('9920140229');
insert into tab1 values('20140101');
commit;


create or replace function validdate(p_str in varchar2)
return number
is
	v_date date;
begin
	v_date := to_date(p_str,'yyyymmdd');
	return 1;
	exception when others then return 0;
end;
/

show errors


-- above example returns no rows, therefore nothing 'bad'.
With cte as
(select row_id from
 (select col1,
        case when   to_number(substr(col1,5,2)) = '02' and
                    to_number(substr(col1,7,2)) > '29'
             then 'Bad'
             when (mod(to_number(substr(col1,1,4)),4) != 0 or
                   mod(to_number(substr(col1,1,4)),100) = 0) and
                   (to_number(substr(col1,5,2)) = '02' and
                    to_number(substr(col1,7,2)) > '28')
             then 'Bad'
             when not regexp_like(col1,'(19|20)[0-9]{2}(0[1-9]|1[012])(0[1-9]|[12][0-9]|3[01])')
             then 'Bad'
             else 'Okay'
        end valid
  ,rowid row_id from tab1)
 where valid = 'Bad')
select cte.row_id, t1.*
from tab1 t1, cte
where cte.row_id = t1.rowid; 



select row_id, col1 from (
	select rowid row_id, col1, validdate(col1) validdate from tab1
)
where validdate=0
/

Open in new window

0
 
LVL 76

Expert Comment

by:slightwv (䄆 Netminder)
ID: 39883800
soccerplayer,

Even though you seem to have gone with regular expressions and Alex was the first to suggest them, he really never posted anything as a follow up.

I really don't feel he should receive all the points.

Please explain why you closed this question the way you did.  If you don't, I'll ask a Moderator to take a look at the question.
0
 
LVL 13

Expert Comment

by:Alexander Eßer [Alex140181]
ID: 39885022
I agree with slightwv, so please re-open & split the points.
0
 
LVL 13

Expert Comment

by:Alexander Eßer [Alex140181]
ID: 39901367
nice SPLIT btw....
0
 

Author Comment

by:soccerplayer
ID: 39935705
I've requested that this question be closed as follows:

Accepted answer: 0 points for soccerplayer's comment #a39839465
Assisted answer: 250 points for Alex140181's comment #a39837457
Assisted answer: 250 points for slightwv's comment #a39837627

for the following reason:

Thanks to both of you.
0
 
LVL 13

Expert Comment

by:Alexander Eßer [Alex140181]
ID: 39936217
Yet no split... Try again ;-)
0
 
LVL 13

Expert Comment

by:Alexander Eßer [Alex140181]
ID: 39936228
eenookami,

I am really not after the points, but see ID 39935705:
Accepted answer: 0 points for soccerplayer's comment #a39839465
Assisted answer: 250 points for Alex140181's comment #a39837457
Assisted answer: 250 points for slightwv's comment #a39837627

But he only accepted ID 39883800?!?
0

Featured Post

PRTG Network Monitor: Intuitive Network Monitoring

Network Monitoring is essential to ensure that computer systems and network devices are running. Use PRTG to monitor LANs, servers, websites, applications and devices, bandwidth, virtual environments, remote systems, IoT, and many more. PRTG is easy to set up & use.

Join & Write a Comment

Suggested Solutions

Title # Comments Views Activity
Need help with Oracle syntax 4 42
Oracle Database Upgrade 13 41
null value 15 65
Oracle Subquery bad Join 11 43
How to Create User-Defined Aggregates in Oracle Before we begin creating these things, what are user-defined aggregates?  They are a feature introduced in Oracle 9i that allows a developer to create his or her own functions like "SUM", "AVG", and…
This post first appeared at Oracleinaction  (http://oracleinaction.com/undo-and-redo-in-oracle/)by Anju Garg (Myself). I  will demonstrate that undo for DML’s is stored both in undo tablespace and online redo logs. Then, we will analyze the reaso…
This video shows how to Export data from an Oracle database using the Original Export Utility.  The corresponding Import utility, which works the same way is referenced, but not demonstrated.
This video explains what a user managed backup is and shows how to take one, providing a couple of simple example scripts.

758 members asked questions and received personalized solutions in the past 7 days.

Join the community of 500,000 technology professionals and ask your questions.

Join & Ask a Question

Need Help in Real-Time?

Connect with top rated Experts

20 Experts available now in Live!

Get 1:1 Help Now