Link to home
Start Free TrialLog in
Avatar of diannagibbs
diannagibbsFlag for United States of America

asked on

Oracle TO_NUMBER from varchar2 datatype

I'm trying to write a select for a report to clean up bad data.  The source is varchar2 and the target is varchar2 but goal is to load into a number datatype.  I have used CASE for the most part but when I add the ELSE to give me the good numbers, it fails.  So, I have read that probably reading from bottom up so this will not work.  Any ideas?  I was also thinking spaces but I've tried TRIM, LTRIM,RTRIM, DECODE and TRANSLATE and nothing works.  Any direction appreciated.

  2  to_number(CASE
  3  WHEN PROVIDERS_DATA_DZ_COL like UPPER('%A%') then 9999999999
  4  WHEN PROVIDERS_DATA_DZ_COL like UPPER('%B%') then 9999999999
  5  WHEN PROVIDERS_DATA_DZ_COL like UPPER('%C%') then 9999999999
  6  WHEN PROVIDERS_DATA_DZ_COL like UPPER('%D%') then 9999999999
  7  WHEN PROVIDERS_DATA_DZ_COL like UPPER('%E%') then 9999999999
  8  WHEN PROVIDERS_DATA_DZ_COL like UPPER('%F%') then 9999999999
  9  WHEN PROVIDERS_DATA_DZ_COL like UPPER('%G%') then 9999999999
 10  WHEN PROVIDERS_DATA_DZ_COL like UPPER('%H%') then 9999999999
 11  WHEN PROVIDERS_DATA_DZ_COL like UPPER('%I%') then 9999999999
 12  WHEN PROVIDERS_DATA_DZ_COL like UPPER('%J%') then 9999999999
 13  WHEN PROVIDERS_DATA_DZ_COL like UPPER('%K%') then 9999999999
 14  WHEN PROVIDERS_DATA_DZ_COL like UPPER('%L%') then 9999999999
 15  WHEN PROVIDERS_DATA_DZ_COL like UPPER('%M%') then 9999999999
 16  WHEN PROVIDERS_DATA_DZ_COL like UPPER('%N%') then 9999999999
 17  WHEN PROVIDERS_DATA_DZ_COL like UPPER('%O%') then 9999999999
 18  WHEN PROVIDERS_DATA_DZ_COL like UPPER('%P%') then 9999999999
 19  WHEN PROVIDERS_DATA_DZ_COL like UPPER('%Q%') then 9999999999
 20  WHEN PROVIDERS_DATA_DZ_COL like UPPER('%R%') then 9999999999
 21  WHEN PROVIDERS_DATA_DZ_COL like UPPER('%S%') then 9999999999
 22  WHEN PROVIDERS_DATA_DZ_COL like UPPER('%T%') then 9999999999
 23  WHEN PROVIDERS_DATA_DZ_COL like UPPER('%U%') then 9999999999
 24  WHEN PROVIDERS_DATA_DZ_COL like UPPER('%V%') then 9999999999
 25  WHEN PROVIDERS_DATA_DZ_COL like UPPER('%W%') then 9999999999
 26  WHEN PROVIDERS_DATA_DZ_COL like UPPER('%X%') then 9999999999
 27  WHEN PROVIDERS_DATA_DZ_COL like UPPER('%Y%') then 9999999999
 28  WHEN PROVIDERS_DATA_DZ_COL like UPPER('%Z%') then 9999999999
 29  WHEN PROVIDERS_DATA_DZ_COL like UPPER('%.%') then 9999999999
 30  WHEN PROVIDERS_DATA_DZ_COL like UPPER('%?%') then 9999999999
 31  WHEN PROVIDERS_DATA_DZ_COL like UPPER('%_%') then 9999999999
 32  WHEN PROVIDERS_DATA_DZ_COL like UPPER('%/%') then 9999999999
 33  WHEN PROVIDERS_DATA_DZ_COL like UPPER('%*%') then 9999999999
 34  WHEN PROVIDERS_DATA_DZ_COL like UPPER('%@%') then 9999999999
 35  WHEN PROVIDERS_DATA_DZ_COL like UPPER('%-%') then 9999999999
 36  WHEN PROVIDERS_DATA_DZ_COL like UPPER(' ') then 9999999999
 37  WHEN PROVIDERS_DATA_DZ_COL IS null then 9999999999
 40  from DI_PROVIDERS
 41  where rownum < 2
 42  order by 2
 43  /
ERROR at line 38:
ORA-00932: inconsistent datatypes: expected NUMBER got CHAR

SQL> select PROVIDERS_DATA_DZ_COL from DI_PROVIDERS where rownum < 2;

Avatar of Sean Stuber
Sean Stuber


and then remove the to_number around the CASE
Avatar of diannagibbs


That resolved the error but I would think I would see my PROVIDERS_DATA_DZ_COL value in the fix field, not 9999999999?    Which is my goal.  The FIX_DATA field with all the nine's would be the data to correct.  I feel like I'm missing something simple.

DI_ROW_ID         |NAME_FIRST_MIDDLE_LAST                  |PROVIDERS_DATA_DZ_COL                   |  FIX_DATA
AAZ9BIAL0AAAAMjAAA|Test Patient                      |962399                                  |9999999999
select case when regexp_replace(providers_data_dz_col,'[[:digit:]]','') is null
                   then 9999999999
                   else to_number(providers_data_dz_col)
           end fix_data
the regexp_replace above assumes your providers_data_dz_col is an integer if it's numeric.

rather than changing the value and looking at the result you could use regexp_like to simply evaluate it as-is

 case when not regexp_like(providers_data_dz_col,'[[:digit:]]')
                   then 9999999999
                   else to_number(providers_data_dz_col)

if your value might not be an integer then try this...

 case when not regexp_like(providers_data_dz_col,'^(\+|\-)?[0-9]*\.?[0-9]*$')
                   then 9999999999
                   else to_number(providers_data_dz_col)
In your original query the
WHEN PROVIDERS_DATA_DZ_COL like UPPER('%_%') then 9999999999
is the cause of the  9999999999 because _ means any single character in the like operation
you can change it to
WHEN PROVIDERS_DATA_DZ_COL like UPPER('%\_%') escape '\' then 9999999999

the regexp_like is of course a much nicer solution ,
though it lacks some natural  readability.
Thanks for all the great solutions, but I'm still not seeing what I would expect.  I would expect my PROVIDERS_DATA_DZ_COL  value to be in my FIX column.  So I'm guessing that there are spaces in this value and that is why it is not working correctly?  As I've mentioned, I've tried to use TRIM, LTRIM, RTIM, TRANSLAATE, SET TRIM ON, etc.  None of these seems to working for me.

PROVIDERS_DATA_DZ_COL                           FIX
------------------------------ --------------------
962399                                   9999999999
900324                                   9999999999
936633                                   9999999999
911941                                   9999999999
where/when are you applying trim

something like this?

case when not regexp_like(trim(providers_data_dz_col),'[[:digit:]]')
                   then 9999999999
                   else to_number(trim(providers_data_dz_col))
if you want to see invisible characters in that column use DUMP

where rownum < 10
LTRIM, RTIM example:
SQL> select
  3  (case when not regexp_like(LTRIM(RTRIM(providers_data_dz_col)),'^(\+|\-)?[0-9]*\.?[0-9]*$\')
  4                     then 9999999999
  5                     else to_number(providers_data_dz_col)
  6  end ) FIX
  7  from DI_MIDAS_PROVIDERS where rownum < 5;

PROVIDERS_DATA_DZ_COL                           FIX
------------------------------ --------------------
962399                                   9999999999
900324                                   9999999999
936633                                   9999999999
911941                                   9999999999
what does DUMP show you?
This DUMP is very, interesting and found my issue.  What would ^M,^J be?  Is that when you move data as binary instead of ascii?

---------------------------------------- -------------------------------------------------------------
962399                                   Typ=1 Len=8 CharacterSet=US7ASCII: 9,6,2,3,9,9,^M,^J
900324                                   Typ=1 Len=8 CharacterSet=US7ASCII: 9,0,0,3,2,4,^M,^J
936633                                   Typ=1 Len=8 CharacterSet=US7ASCII: 9,3,6,6,3,3,^M,^J
911941                                   Typ=1 Len=8 CharacterSet=US7ASCII: 9,1,1,9,4,1,^M,^J
9119587                                  Typ=1 Len=9 CharacterSet=US7ASCII: 9,1,1,9,5,8,7,^M,^J
915623                                   Typ=1 Len=8 CharacterSet=US7ASCII: 9,1,5,6,2,3,^M,^J
014159                                   Typ=1 Len=8 CharacterSet=US7ASCII: 0,1,4,1,5,9,^M,^J
13755                                    Typ=1 Len=7 CharacterSet=US7ASCII: 1,3,7,5,5,^M,^J
936369                                   Typ=1 Len=8 CharacterSet=US7ASCII: 9,3,6,3,6,9,^M,^J
here's a small example showing the code does work  note the "9" in the 1234 column

SQL> create table di_midas_providers (PROVIDERS_DATA_DZ_COL varchar2(50));

Table created.

SQL> insert into di_midas_providers (PROVIDERS_DATA_DZ_COL) values ('962399');

1 row created.

SQL> insert into di_midas_providers (PROVIDERS_DATA_DZ_COL) values ('900324');

1 row created.

SQL> insert into di_midas_providers (PROVIDERS_DATA_DZ_COL) values ('936633');

1 row created.

SQL> insert into di_midas_providers (PROVIDERS_DATA_DZ_COL) values ('911941');

1 row created.

SQL> insert into di_midas_providers (PROVIDERS_DATA_DZ_COL) values ('invalid');

1 row created.

SQL> insert into di_midas_providers (PROVIDERS_DATA_DZ_COL) values ('1234'||chr(9));

1 row created.

SQL> SELECT providers_data_dz_col,
  2         DUMP(providers_data_dz_col) dump_output,
  3         (CASE
  4              WHEN NOT REGEXP_LIKE(LTRIM(RTRIM(providers_data_dz_col)), '^(\+|\-)?[0-9]*\.?[0-9]*$\')
  5              THEN
  6                  9999999999
  7              ELSE
  8                  TO_NUMBER(providers_data_dz_col)
  9          END)
 10             fix
 11    FROM di_midas_providers;

PROVIDERS_DATA_DZ_COL                              DUMP_OUTPUT                                            FIX
-------------------------------------------------- -------------------------------------------------- ----------
962399                                             Typ=1 Len=6: 57,54,50,51,57,57                      962399
900324                                             Typ=1 Len=6: 57,48,48,51,50,52                      900324
936633                                             Typ=1 Len=6: 57,51,54,54,51,51                      936633
911941                                             Typ=1 Len=6: 57,49,49,57,52,49                      911941
invalid                                            Typ=1 Len=7: 105,110,118,97,108,105,100            9999999999
1234                                                  Typ=1 Len=5: 49,50,51,52,9                      9999999999

6 rows selected.

Open in new window

^M = chr(13) =  carriage return

^J=chr(10) = new line
Avatar of Sean Stuber
Sean Stuber

Link to home
This solution is only available to members.
To access this solution, you must be a member of Experts Exchange.
Start Free Trial
Strangely enough, if there is a linefeed but no carriage return, the expression will match, but the to_number function will fail.
SQL> select '-123.456'||chr(10) as prov_data,
  2  case when not regexp_like('-123.456'||chr(10),'^(\+|\-)?[0-9]*\.?[0-9]*$')
  3       then 9999999999
  4       else to_number('-123.456')  ==> without the linefeed
  5  end fix
  6  from dual;

--------- ----------
-123.456    -123.456

SQL> select '-123.456'||chr(10) as prov_data,
  2  case when not regexp_like('-123.456'||chr(10),'^(\+|\-)?[0-9]*\.?[0-9]*$')
  3       then 9999999999
  4       else to_number('-123.456'||chr(10))
  5  end fix
  6  from dual;
     else to_number('-123.456'||chr(10)) ==> with the linefeed
ERROR at line 4:
ORA-01722: invalid number

However, the replace function (could also use translate) will remedy that in any case. I just found that match somewhat surprising.
check  the previous post where I explicitly check for any combination of space, carriage return or new line at the end of the string
This is exactly what I was looking for - thanks so much!!!!!
I did check the previous post, which is why I said the replace function will remedy that in any case. I should have said the replace function(s) that sdstuber employed will remedy that in any case. I was just pointing out the fact that I thouhght it was unusual that the regexp_like(expression,'^(\+|\-)?[0-9]*\.?[0-9]*$') would match any expression
that contained a linefeed as long as the rest of the expression could be converted to a number.