diannagibbs
asked on
Oracle TO_NUMBER from varchar2 datatype
I'm trying to write a select for a report to clean up bad data. The source is varchar2 and the target is varchar2 but goal is to load into a number datatype. I have used CASE for the most part but when I add the ELSE to give me the good numbers, it fails. So, I have read that probably reading from bottom up so this will not work. Any ideas? I was also thinking spaces but I've tried TRIM, LTRIM,RTRIM, DECODE and TRANSLATE and nothing works. Any direction appreciated.
SQL> select rowid DI_ROW_ID, NAME_FIRST_MIDDLE_LAST, PROVIDERS_DATA_DZ_COL,
2 to_number(CASE
3 WHEN PROVIDERS_DATA_DZ_COL like UPPER('%A%') then 9999999999
4 WHEN PROVIDERS_DATA_DZ_COL like UPPER('%B%') then 9999999999
5 WHEN PROVIDERS_DATA_DZ_COL like UPPER('%C%') then 9999999999
6 WHEN PROVIDERS_DATA_DZ_COL like UPPER('%D%') then 9999999999
7 WHEN PROVIDERS_DATA_DZ_COL like UPPER('%E%') then 9999999999
8 WHEN PROVIDERS_DATA_DZ_COL like UPPER('%F%') then 9999999999
9 WHEN PROVIDERS_DATA_DZ_COL like UPPER('%G%') then 9999999999
10 WHEN PROVIDERS_DATA_DZ_COL like UPPER('%H%') then 9999999999
11 WHEN PROVIDERS_DATA_DZ_COL like UPPER('%I%') then 9999999999
12 WHEN PROVIDERS_DATA_DZ_COL like UPPER('%J%') then 9999999999
13 WHEN PROVIDERS_DATA_DZ_COL like UPPER('%K%') then 9999999999
14 WHEN PROVIDERS_DATA_DZ_COL like UPPER('%L%') then 9999999999
15 WHEN PROVIDERS_DATA_DZ_COL like UPPER('%M%') then 9999999999
16 WHEN PROVIDERS_DATA_DZ_COL like UPPER('%N%') then 9999999999
17 WHEN PROVIDERS_DATA_DZ_COL like UPPER('%O%') then 9999999999
18 WHEN PROVIDERS_DATA_DZ_COL like UPPER('%P%') then 9999999999
19 WHEN PROVIDERS_DATA_DZ_COL like UPPER('%Q%') then 9999999999
20 WHEN PROVIDERS_DATA_DZ_COL like UPPER('%R%') then 9999999999
21 WHEN PROVIDERS_DATA_DZ_COL like UPPER('%S%') then 9999999999
22 WHEN PROVIDERS_DATA_DZ_COL like UPPER('%T%') then 9999999999
23 WHEN PROVIDERS_DATA_DZ_COL like UPPER('%U%') then 9999999999
24 WHEN PROVIDERS_DATA_DZ_COL like UPPER('%V%') then 9999999999
25 WHEN PROVIDERS_DATA_DZ_COL like UPPER('%W%') then 9999999999
26 WHEN PROVIDERS_DATA_DZ_COL like UPPER('%X%') then 9999999999
27 WHEN PROVIDERS_DATA_DZ_COL like UPPER('%Y%') then 9999999999
28 WHEN PROVIDERS_DATA_DZ_COL like UPPER('%Z%') then 9999999999
29 WHEN PROVIDERS_DATA_DZ_COL like UPPER('%.%') then 9999999999
30 WHEN PROVIDERS_DATA_DZ_COL like UPPER('%?%') then 9999999999
31 WHEN PROVIDERS_DATA_DZ_COL like UPPER('%_%') then 9999999999
32 WHEN PROVIDERS_DATA_DZ_COL like UPPER('%/%') then 9999999999
33 WHEN PROVIDERS_DATA_DZ_COL like UPPER('%*%') then 9999999999
34 WHEN PROVIDERS_DATA_DZ_COL like UPPER('%@%') then 9999999999
35 WHEN PROVIDERS_DATA_DZ_COL like UPPER('%-%') then 9999999999
36 WHEN PROVIDERS_DATA_DZ_COL like UPPER(' ') then 9999999999
37 WHEN PROVIDERS_DATA_DZ_COL IS null then 9999999999
38 ELSE PROVIDERS_DATA_DZ_COL
39 END) DI_PROVIDERS_FIX
40 from DI_PROVIDERS
41 where rownum < 2
42 order by 2
43 /
ELSE PROVIDERS_DATA_DZ_COL
*
ERROR at line 38:
ORA-00932: inconsistent datatypes: expected NUMBER got CHAR
SQL> select PROVIDERS_DATA_DZ_COL from DI_PROVIDERS where rownum < 2;
PROVIDERS_DATA_DZ_COL
-------------------------- ---------- ----
962399
SQL> select rowid DI_ROW_ID, NAME_FIRST_MIDDLE_LAST, PROVIDERS_DATA_DZ_COL,
2 to_number(CASE
3 WHEN PROVIDERS_DATA_DZ_COL like UPPER('%A%') then 9999999999
4 WHEN PROVIDERS_DATA_DZ_COL like UPPER('%B%') then 9999999999
5 WHEN PROVIDERS_DATA_DZ_COL like UPPER('%C%') then 9999999999
6 WHEN PROVIDERS_DATA_DZ_COL like UPPER('%D%') then 9999999999
7 WHEN PROVIDERS_DATA_DZ_COL like UPPER('%E%') then 9999999999
8 WHEN PROVIDERS_DATA_DZ_COL like UPPER('%F%') then 9999999999
9 WHEN PROVIDERS_DATA_DZ_COL like UPPER('%G%') then 9999999999
10 WHEN PROVIDERS_DATA_DZ_COL like UPPER('%H%') then 9999999999
11 WHEN PROVIDERS_DATA_DZ_COL like UPPER('%I%') then 9999999999
12 WHEN PROVIDERS_DATA_DZ_COL like UPPER('%J%') then 9999999999
13 WHEN PROVIDERS_DATA_DZ_COL like UPPER('%K%') then 9999999999
14 WHEN PROVIDERS_DATA_DZ_COL like UPPER('%L%') then 9999999999
15 WHEN PROVIDERS_DATA_DZ_COL like UPPER('%M%') then 9999999999
16 WHEN PROVIDERS_DATA_DZ_COL like UPPER('%N%') then 9999999999
17 WHEN PROVIDERS_DATA_DZ_COL like UPPER('%O%') then 9999999999
18 WHEN PROVIDERS_DATA_DZ_COL like UPPER('%P%') then 9999999999
19 WHEN PROVIDERS_DATA_DZ_COL like UPPER('%Q%') then 9999999999
20 WHEN PROVIDERS_DATA_DZ_COL like UPPER('%R%') then 9999999999
21 WHEN PROVIDERS_DATA_DZ_COL like UPPER('%S%') then 9999999999
22 WHEN PROVIDERS_DATA_DZ_COL like UPPER('%T%') then 9999999999
23 WHEN PROVIDERS_DATA_DZ_COL like UPPER('%U%') then 9999999999
24 WHEN PROVIDERS_DATA_DZ_COL like UPPER('%V%') then 9999999999
25 WHEN PROVIDERS_DATA_DZ_COL like UPPER('%W%') then 9999999999
26 WHEN PROVIDERS_DATA_DZ_COL like UPPER('%X%') then 9999999999
27 WHEN PROVIDERS_DATA_DZ_COL like UPPER('%Y%') then 9999999999
28 WHEN PROVIDERS_DATA_DZ_COL like UPPER('%Z%') then 9999999999
29 WHEN PROVIDERS_DATA_DZ_COL like UPPER('%.%') then 9999999999
30 WHEN PROVIDERS_DATA_DZ_COL like UPPER('%?%') then 9999999999
31 WHEN PROVIDERS_DATA_DZ_COL like UPPER('%_%') then 9999999999
32 WHEN PROVIDERS_DATA_DZ_COL like UPPER('%/%') then 9999999999
33 WHEN PROVIDERS_DATA_DZ_COL like UPPER('%*%') then 9999999999
34 WHEN PROVIDERS_DATA_DZ_COL like UPPER('%@%') then 9999999999
35 WHEN PROVIDERS_DATA_DZ_COL like UPPER('%-%') then 9999999999
36 WHEN PROVIDERS_DATA_DZ_COL like UPPER(' ') then 9999999999
37 WHEN PROVIDERS_DATA_DZ_COL IS null then 9999999999
38 ELSE PROVIDERS_DATA_DZ_COL
39 END) DI_PROVIDERS_FIX
40 from DI_PROVIDERS
41 where rownum < 2
42 order by 2
43 /
ELSE PROVIDERS_DATA_DZ_COL
*
ERROR at line 38:
ORA-00932: inconsistent datatypes: expected NUMBER got CHAR
SQL> select PROVIDERS_DATA_DZ_COL from DI_PROVIDERS where rownum < 2;
PROVIDERS_DATA_DZ_COL
--------------------------
962399
ASKER
That resolved the error but I would think I would see my PROVIDERS_DATA_DZ_COL value in the fix field, not 9999999999? Which is my goal. The FIX_DATA field with all the nine's would be the data to correct. I feel like I'm missing something simple.
DI_ROW_ID |NAME_FIRST_MIDDLE_LAST |PROVIDERS_DATA_DZ_COL | FIX_DATA
------------------|------- ---------- ---------- ---------- ---|------ ---------- ---------- ---------- ----|----- -----
AAZ9BIAL0AAAAMjAAA|Test Patient |962399 |9999999999
DI_ROW_ID |NAME_FIRST_MIDDLE_LAST |PROVIDERS_DATA_DZ_COL | FIX_DATA
------------------|-------
AAZ9BIAL0AAAAMjAAA|Test Patient |962399 |9999999999
select case when regexp_replace(providers_d ata_dz_col ,'[[:digit :]]','') is null
then 9999999999
else to_number(providers_data_d z_col)
end fix_data
then 9999999999
else to_number(providers_data_d
end fix_data
the regexp_replace above assumes your providers_data_dz_col is an integer if it's numeric.
rather than changing the value and looking at the result you could use regexp_like to simply evaluate it as-is
case when not regexp_like(providers_data _dz_col,'[ [:digit:]] ')
then 9999999999
else to_number(providers_data_d z_col)
end
if your value might not be an integer then try this...
case when not regexp_like(providers_data _dz_col,'^ (\+|\-)?[0 -9]*\.?[0- 9]*$')
then 9999999999
else to_number(providers_data_d z_col)
end
rather than changing the value and looking at the result you could use regexp_like to simply evaluate it as-is
case when not regexp_like(providers_data
then 9999999999
else to_number(providers_data_d
end
if your value might not be an integer then try this...
case when not regexp_like(providers_data
then 9999999999
else to_number(providers_data_d
end
In your original query the
WHEN PROVIDERS_DATA_DZ_COL like UPPER('%_%') then 9999999999
is the cause of the 9999999999 because _ means any single character in the like operation
you can change it to
WHEN PROVIDERS_DATA_DZ_COL like UPPER('%\_%') escape '\' then 9999999999
the regexp_like is of course a much nicer solution ,
though it lacks some natural readability.
WHEN PROVIDERS_DATA_DZ_COL like UPPER('%_%') then 9999999999
is the cause of the 9999999999 because _ means any single character in the like operation
you can change it to
WHEN PROVIDERS_DATA_DZ_COL like UPPER('%\_%') escape '\' then 9999999999
the regexp_like is of course a much nicer solution ,
though it lacks some natural readability.
ASKER
Thanks for all the great solutions, but I'm still not seeing what I would expect. I would expect my PROVIDERS_DATA_DZ_COL value to be in my FIX column. So I'm guessing that there are spaces in this value and that is why it is not working correctly? As I've mentioned, I've tried to use TRIM, LTRIM, RTIM, TRANSLAATE, SET TRIM ON, etc. None of these seems to working for me.
PROVIDERS_DATA_DZ_COL FIX
-------------------------- ---- --------------------
962399 9999999999
900324 9999999999
936633 9999999999
911941 9999999999
PROVIDERS_DATA_DZ_COL FIX
--------------------------
962399 9999999999
900324 9999999999
936633 9999999999
911941 9999999999
where/when are you applying trim
something like this?
case when not regexp_like(trim(providers _data_dz_c ol),'[[:di git:]]')
then 9999999999
else to_number(trim(providers_d ata_dz_col ))
end
something like this?
case when not regexp_like(trim(providers
then 9999999999
else to_number(trim(providers_d
end
if you want to see invisible characters in that column use DUMP
select PROVIDERS_DATA_DZ_COL,dump (PROVIDERS _DATA_DZ_C OL) from DI_PROVIDERS
where rownum < 10
select PROVIDERS_DATA_DZ_COL,dump
where rownum < 10
ASKER
LTRIM, RTIM example:
SQL> select
2 PROVIDERS_DATA_DZ_COL,
3 (case when not regexp_like(LTRIM(RTRIM(pr oviders_da ta_dz_col) ),'^(\+|\- )?[0-9]*\. ?[0-9]*$\' )
4 then 9999999999
5 else to_number(providers_data_d z_col)
6 end ) FIX
7 from DI_MIDAS_PROVIDERS where rownum < 5;
PROVIDERS_DATA_DZ_COL FIX
-------------------------- ---- --------------------
962399 9999999999
900324 9999999999
936633 9999999999
911941 9999999999
SQL> select
2 PROVIDERS_DATA_DZ_COL,
3 (case when not regexp_like(LTRIM(RTRIM(pr
4 then 9999999999
5 else to_number(providers_data_d
6 end ) FIX
7 from DI_MIDAS_PROVIDERS where rownum < 5;
PROVIDERS_DATA_DZ_COL FIX
--------------------------
962399 9999999999
900324 9999999999
936633 9999999999
911941 9999999999
what does DUMP show you?
ASKER
This DUMP is very, interesting and found my issue. What would ^M,^J be? Is that when you move data as binary instead of ascii?
PROVIDERS_DATA_DZ_COL DUMP(PROVIDERS_DATA_DZ_COL ,1017)
-------------------------- ---------- ---- -------------------------- ---------- ---------- ---------- -----
962399 Typ=1 Len=8 CharacterSet=US7ASCII: 9,6,2,3,9,9,^M,^J
900324 Typ=1 Len=8 CharacterSet=US7ASCII: 9,0,0,3,2,4,^M,^J
936633 Typ=1 Len=8 CharacterSet=US7ASCII: 9,3,6,6,3,3,^M,^J
911941 Typ=1 Len=8 CharacterSet=US7ASCII: 9,1,1,9,4,1,^M,^J
9119587 Typ=1 Len=9 CharacterSet=US7ASCII: 9,1,1,9,5,8,7,^M,^J
915623 Typ=1 Len=8 CharacterSet=US7ASCII: 9,1,5,6,2,3,^M,^J
014159 Typ=1 Len=8 CharacterSet=US7ASCII: 0,1,4,1,5,9,^M,^J
13755 Typ=1 Len=7 CharacterSet=US7ASCII: 1,3,7,5,5,^M,^J
936369 Typ=1 Len=8 CharacterSet=US7ASCII: 9,3,6,3,6,9,^M,^J
PROVIDERS_DATA_DZ_COL DUMP(PROVIDERS_DATA_DZ_COL
--------------------------
962399 Typ=1 Len=8 CharacterSet=US7ASCII: 9,6,2,3,9,9,^M,^J
900324 Typ=1 Len=8 CharacterSet=US7ASCII: 9,0,0,3,2,4,^M,^J
936633 Typ=1 Len=8 CharacterSet=US7ASCII: 9,3,6,6,3,3,^M,^J
911941 Typ=1 Len=8 CharacterSet=US7ASCII: 9,1,1,9,4,1,^M,^J
9119587 Typ=1 Len=9 CharacterSet=US7ASCII: 9,1,1,9,5,8,7,^M,^J
915623 Typ=1 Len=8 CharacterSet=US7ASCII: 9,1,5,6,2,3,^M,^J
014159 Typ=1 Len=8 CharacterSet=US7ASCII: 0,1,4,1,5,9,^M,^J
13755 Typ=1 Len=7 CharacterSet=US7ASCII: 1,3,7,5,5,^M,^J
936369 Typ=1 Len=8 CharacterSet=US7ASCII: 9,3,6,3,6,9,^M,^J
here's a small example showing the code does work note the "9" in the 1234 column
SQL> create table di_midas_providers (PROVIDERS_DATA_DZ_COL varchar2(50));
Table created.
SQL> insert into di_midas_providers (PROVIDERS_DATA_DZ_COL) values ('962399');
1 row created.
SQL> insert into di_midas_providers (PROVIDERS_DATA_DZ_COL) values ('900324');
1 row created.
SQL> insert into di_midas_providers (PROVIDERS_DATA_DZ_COL) values ('936633');
1 row created.
SQL> insert into di_midas_providers (PROVIDERS_DATA_DZ_COL) values ('911941');
1 row created.
SQL> insert into di_midas_providers (PROVIDERS_DATA_DZ_COL) values ('invalid');
1 row created.
SQL> insert into di_midas_providers (PROVIDERS_DATA_DZ_COL) values ('1234'||chr(9));
1 row created.
SQL> SELECT providers_data_dz_col,
2 DUMP(providers_data_dz_col) dump_output,
3 (CASE
4 WHEN NOT REGEXP_LIKE(LTRIM(RTRIM(providers_data_dz_col)), '^(\+|\-)?[0-9]*\.?[0-9]*$\')
5 THEN
6 9999999999
7 ELSE
8 TO_NUMBER(providers_data_dz_col)
9 END)
10 fix
11 FROM di_midas_providers;
PROVIDERS_DATA_DZ_COL DUMP_OUTPUT FIX
-------------------------------------------------- -------------------------------------------------- ----------
962399 Typ=1 Len=6: 57,54,50,51,57,57 962399
900324 Typ=1 Len=6: 57,48,48,51,50,52 900324
936633 Typ=1 Len=6: 57,51,54,54,51,51 936633
911941 Typ=1 Len=6: 57,49,49,57,52,49 911941
invalid Typ=1 Len=7: 105,110,118,97,108,105,100 9999999999
1234 Typ=1 Len=5: 49,50,51,52,9 9999999999
6 rows selected.
^M = chr(13) = carriage return
^J=chr(10) = new line
^J=chr(10) = new line
ASKER CERTIFIED SOLUTION
membership
This solution is only available to members.
To access this solution, you must be a member of Experts Exchange.
Strangely enough, if there is a linefeed but no carriage return, the expression will match, but the to_number function will fail.
SQL> select '-123.456'||chr(10) as prov_data,
2 case when not regexp_like('-123.456'||ch r(10),'^(\ +|\-)?[0-9 ]*\.?[0-9] *$')
3 then 9999999999
4 else to_number('-123.456') ==> without the linefeed
5 end fix
6 from dual;
PROV_DATA FIX
--------- ----------
-123.456 -123.456
SQL> select '-123.456'||chr(10) as prov_data,
2 case when not regexp_like('-123.456'||ch r(10),'^(\ +|\-)?[0-9 ]*\.?[0-9] *$')
3 then 9999999999
4 else to_number('-123.456'||chr( 10))
5 end fix
6 from dual;
else to_number('-123.456'||chr( 10)) ==> with the linefeed
*
ERROR at line 4:
ORA-01722: invalid number
However, the replace function (could also use translate) will remedy that in any case. I just found that match somewhat surprising.
SQL> select '-123.456'||chr(10) as prov_data,
2 case when not regexp_like('-123.456'||ch
3 then 9999999999
4 else to_number('-123.456') ==> without the linefeed
5 end fix
6 from dual;
PROV_DATA FIX
--------- ----------
-123.456 -123.456
SQL> select '-123.456'||chr(10) as prov_data,
2 case when not regexp_like('-123.456'||ch
3 then 9999999999
4 else to_number('-123.456'||chr(
5 end fix
6 from dual;
else to_number('-123.456'||chr(
*
ERROR at line 4:
ORA-01722: invalid number
However, the replace function (could also use translate) will remedy that in any case. I just found that match somewhat surprising.
check the previous post where I explicitly check for any combination of space, carriage return or new line at the end of the string
ASKER
This is exactly what I was looking for - thanks so much!!!!!
I did check the previous post, which is why I said the replace function will remedy that in any case. I should have said the replace function(s) that sdstuber employed will remedy that in any case. I was just pointing out the fact that I thouhght it was unusual that the regexp_like(expression,'^(\+|\-)?[0-9]*\.?[0-9]* $') would match any expression
that contained a linefeed as long as the rest of the expression could be converted to a number.
that contained a linefeed as long as the rest of the expression could be converted to a number.
and then remove the to_number around the CASE