asked on

return value not equal to another column

hi i have a table which got duplicate column

empno_db1

empno_db2

i what to write a query which will return value

when empno_db1 <> empno_db2

table script

  create table empt(empno_db1 number(6),  
                                empno_db2 number(6),  
                                deptno_db1 varchar(30),  
                                depatno_db2 varchar(30),  
                                transDate_db1 date,  
                                transDate_db2  date,  
                                receiptdate_db1 date,  
                                receiptdate_db2 date)

Open in new window

value in table

empno_db1 empno_db2 deptno_db1 depatno_db2 transDate_db1 transDate_db2 receiptdate_db1 receiptdate_db2

001 001 100 100 11/12/2015 11/12/2015 12/12/2015 12/12/2015

002 002 001 002 13/12/2015 09/12/2015 12/12/2015 12/12/2015

i only what to return value like this

empno_db1 empno_db2 deptno_db1 depatno_db2 transDate_db1 transDate_db2 receiptdate_db1 receiptdate_db2

002 002 001 002 13/12/2015 09/12/2015 12/12/2015 12/12/2015

Mlanda T

select empno_db1, empno_db2, deptno_db1, depatno_db2, transDate_db1, transDate_db2, receiptdate_db1, receiptdate_db2
from empt
where empno_db1 <> empno_db2

Open in new window

This query would do it. However, in your example of what you want to return... both empno_db1 and empno_db2 seem to have a value of 002???... which actually means the query should not return this row?

chalie001

ASKER

i have 50 columns with about 100 000 records

Mlanda T

Am not sure what the implication of your statement is.... the number of rows should not matter to the SELECT. You can list all your columns on the SELECT statement as shown.

chalie001

ASKER

wht if the other columns got null

must i do this in all 50 columns
where empno_db1 <> empno_db2

chalie001

ASKER

i also what to check all other columns not just empno_db1 empno_db2

Mlanda T

Ah! Ok. Had not realised you wanted to do the rest of the columns. You can use the NVL function to help deal with NULL values. The NVL function converts a NULL to a known value. NVL(x, value) returns value if x is null, otherwise, x is returned. Note that 'value' has to be consistent with the data type of that column. Add an OR condition for each pair of columns you want to compare.

select empno_db1, empno_db2, deptno_db1, depatno_db2, transDate_db1, transDate_db2, receiptdate_db1, receiptdate_db2
from empt
where NVL(empno_db1, 0) <> NVL(empno_db2, 0)
     or NVL(deptno_db1, '') <> NVL(deptno_db2, '')
     or NVL(transDate_db1, to_date('19000101','YYYYMMDD')) <> NVL(transDate_db2, to_date('19000101','YYYYMMDD'))

Open in new window

chalie001

ASKER

i have 50 columns with about 100 000 records, i can't hard code date values there is a lot

Gerwin Jansen

Your requirements are unclear, in the question you say:

1 i what to write a query which will return value when empno_db1 <> empno_db2

in a later comment you say:

2 i also what to check all other columns not just empno_db1 empno_db2

The solution to your first question is:

select * from empt where empno_db1 <> empno_db2;

What do you want with the other columns? Which other columns do you want to compare against which (other) columns?

chalie001

ASKER

something like this
where empno_db1 <> empno_db2
or (empno_db1 is null and empno_db2 is not null)
or (empno_db1 is not null and empno_db2 is null)

but must i do this to all 50 columns

chalie001

ASKER

this other columns
deptno_db1, depatno_db2, transDate_db1, transDate_db2, receiptdate_db1, receiptdate_db

Gerwin Jansen

Ok, so empno, deptno, transDate and receipt date - these are 8 columns, how to we get to 50?

Mlanda T

The version I gave you solves any issues related to having a null in either one or both columns for any of the comparisons you need to do. By using the NLV function, you simplify your query structure. There is a slight performance knock though. But it works well. You then don't need to do this

where empno_db1 <> empno_db2
or (empno_db1 is null and empno_db2 is not null)
or (empno_db1 is not null and empno_db2 is null)

Open in new window

slightwv (䄆 Netminder)

>> NVL(deptno_db1, '') <> NVL(deptno_db2, '')
>>By using the NLV function, you simplify your query structure.

This will NEVER work. Two empty single quotes in Oracle is the same as a null.

So NVL(deptno_db1, '') will return a null if deptno_db1 is null. Then a null cannot '=' or '<>' anything so the statement is invalid.

>>but must i do this to all 50 columns

Is there a reason you do not use the MINUS that was suggested in several of your previous columns?

Hopefully you now see how difficult it is to code for ALL the columns when they are side-by-side.

PortletPaul

a "report" of 100 columns (50 columns for each table side by side) is just not going to be worth the effort you will have to devote the query that produces it.

What will you do with this output?

I also recommend you don't try NVL(column,'') = ''
it will not work e.g.

**Oracle 11g R2 Schema Setup**:

    CREATE TABLE test
        (ID int, ASTRING VARCHAR(20))
        ;
    INSERT ALL 
        INTO test (ID, ASTRING) 
             VALUES (1,'')
        INTO test (ID, ASTRING) 
             VALUES (2,NULL)
    SELECT * FROM dual
    ;
    
**Query 1**:

    select *
    from test
    where NVL(astring,'') IS NULL
    

**[Results][2]**:
    | ID | ASTRING |
    |----|---------|
    |  1 |  (null) |
    |  2 |  (null) |
**Query 2**:

    
    
    select *
    from test
    where NVL(astring,'') = ''
    

**[Results][3]**:

  [1]: http://sqlfiddle.com/#!4/490bf/1
  [2]: http://sqlfiddle.com/#!4/490bf/1/0
  [3]: http://sqlfiddle.com/#!4/490bf/1/1

Open in new window

Mlanda T

This will NEVER work. Two empty single quotes in Oracle is the same as a null.

With a little ingenuity, one can make NLV work. If '' is null, then just any other string marker which you do not have in your data. Use NLV(field, 'GruHasABigNose') if it helps. The idea is to eliminate the nulls to make life easier. It is a model worth building on.

Mlanda T

Fiddle Playground: http://www.sqlfiddle.com/#!4/3e45c/11

Create table

CREATE TABLE test (
     ID int, 
     ValA1 VARCHAR(20), ValA2 VARCHAR(20),
     ValB1 NUMERIC(10), ValB2 NUMERIC(10),
     ValC1 DATE, ValC2 DATE
);

Open in new window

Create dummy data and test with queries.
Still using the NVL function. Am using a SYS_ID to generate a unique identifier and using that in the NLV, since the experts said that NVL(fld, '') will not work.

INSERT ALL 
INTO test (ID, ValA1, ValA2, ValB1, ValB2, ValC1, ValC2) VALUES( 1,'', 'Alpha', 12, 14, TO_DATE('2015-01-01', 'yyyy-mm-dd'), TO_DATE('2015-02-01', 'yyyy-mm-dd') )
INTO test (ID, ValA1, ValA2, ValB1, ValB2, ValC1, ValC2) VALUES( 2,'Alpha', 'Alpha', 12, 12, null, TO_DATE('2015-02-01', 'yyyy-mm-dd') )
INTO test (ID, ValA1, ValA2, ValB1, ValB2, ValC1, ValC2) VALUES( 3,'Omega', 'Alpha', 12, 14, TO_DATE('2015-02-01', 'yyyy-mm-dd'), TO_DATE('2015-02-01', 'yyyy-mm-dd') )
INTO test (ID, ValA1, ValA2, ValB1, ValB2, ValC1, ValC2) VALUES( 4, NULL, 'Alpha', null, null, null, null )
INTO test (ID, ValA1, ValA2, ValB1, ValB2, ValC1, ValC2) VALUES( 5,'Omega', 'Omega', 0, 14, TO_DATE('2015-01-01', 'yyyy-mm-dd'), null )
INTO test (ID, ValA1, ValA2, ValB1, ValB2, ValC1, ValC2) VALUES( 6,'Theta', 'Theta', null, 13, TO_DATE('2015-02-01', 'yyyy-mm-dd'), TO_DATE('2015-02-01', 'yyyy-mm-dd') )
INTO test (ID, ValA1, ValA2, ValB1, ValB2, ValC1, ValC2) VALUES( 7,'Beta', 'Alpha', 12, 14, null, TO_DATE('2015-02-01', 'yyyy-mm-dd') )
INTO test (ID, ValA1, ValA2, ValB1, ValB2, ValC1, ValC2) VALUES( 8,'Beta', 'Beta', 2, 14, TO_DATE('2015-01-01', 'yyyy-mm-dd'), TO_DATE('2015-02-01', 'yyyy-mm-dd') )
INTO test (ID, ValA1, ValA2, ValB1, ValB2, ValC1, ValC2) VALUES( 9,null,null, null, 14, TO_DATE('2015-02-01', 'yyyy-mm-dd'), TO_DATE('2015-02-01', 'yyyy-mm-dd') )
INTO test (ID, ValA1, ValA2, ValB1, ValB2, ValC1, ValC2) VALUES( 10,'Match', 'Match', 5, 5, TO_DATE('2015-02-01', 'yyyy-mm-dd'), TO_DATE('2015-02-01', 'yyyy-mm-dd') )
INTO test (ID, ValA1, ValA2, ValB1, ValB2, ValC1, ValC2) VALUES( 11,'Match', 'Match', 5, 5, TO_DATE('2015-02-01', 'yyyy-mm-dd'), TO_DATE('2015-02-01', 'yyyy-mm-dd') )
INTO test (ID, ValA1, ValA2, ValB1, ValB2, ValC1, ValC2) VALUES( 12,'Match', 'Match', 5, 5, TO_DATE('2015-02-01', 'yyyy-mm-dd'), TO_DATE('2015-02-01', 'yyyy-mm-dd') )
SELECT * FROM dual;

/*Creating a column to flag mismatches. Returns all data*/
SELECT 
    ID,
    ValA1, ValA2, CASE WHEN NVL(ValA1, '57831A90-6723-432B-82D3-3ED93C835FFA') != NVL(ValA2, '57831A90-6723-432B-82D3-3ED93C835FFA') THEN 1 ELSE 0 END as ValAResult,
    ValB1, ValB2, CASE WHEN NVL(ValB1, 0) != NVL(ValB2, 0) THEN 1 ELSE 0 END as ValBResult,
    ValC1, ValC2, CASE WHEN NVL(ValC1, TO_DATE('1900-01-01', 'yyyy-mm-dd')) != NVL(ValC2, TO_DATE('1900-01-01', 'yyyy-mm-dd')) THEN 1 ELSE 0 END as ValCResult
FROM test;

/*Filter data to return only where there is a mismatch*/
SELECT 
    ID,
    ValA1, ValA2,
    ValB1, ValB2,
    ValC1, ValC2
FROM test
WHERE
    CASE WHEN NVL(ValA1, '57831A90-6723-432B-82D3-3ED93C835FFA') != NVL(ValA2, '57831A90-6723-432B-82D3-3ED93C835FFA') THEN 1 ELSE 0 END = 1
    OR CASE WHEN NVL(ValB1, 0) != NVL(ValB2, 0) THEN 1 ELSE 0 END = 1
    OR CASE WHEN NVL(ValC1, TO_DATE('1900-01-01', 'yyyy-mm-dd')) != NVL(ValC2, TO_DATE('1900-01-01', 'yyyy-mm-dd')) THEN 1 ELSE 0 END = 1;

Open in new window

chalie001

ASKER

the reason am not using minus is i what to see both data from both diffirent database so i can make decision which data to update in production the minus just show the diffirent values but does not show what the other column data it is

johnsone

To me, the issue here is generating the where clause. It seems like a daunting task to do by hand. However, there are really 3 things you need to generate:

1.

column1 != column2

2.

column1 is null and column2 is not null

3.

column1 is not null and column2 is nullSo, why do it by hand? Have the database generate those statements for you. This code should generate those 3 clauses for you.

SELECT 'or ' 
       ||column_name 
       || '_db1 != ' 
       || column_name 
       || '_db2' 
FROM   (SELECT DISTINCT Substr(column_name, 1, Instr(column_name, '_', -1) - 1) 
                               column_name 
        FROM   user_tab_columns 
        WHERE  table_name = 'EMPT') c 
UNION ALL 
SELECT 'or ' 
       ||column_name 
       || '_db1 is null and ' 
       || column_name 
       || '_db2 is not null' 
FROM   (SELECT DISTINCT Substr(column_name, 1, Instr(column_name, '_', -1) - 1) 
                               column_name 
        FROM   user_tab_columns 
        WHERE  table_name = 'EMPT') c 
UNION ALL 
SELECT 'or ' 
       ||column_name 
       || '_db1 is not null and ' 
       || column_name 
       || '_db2 is null' 
FROM   (SELECT DISTINCT Substr(column_name, 1, Instr(column_name, '_', -1) - 1) 
                               column_name 
        FROM   user_tab_columns 
        WHERE  table_name = 'EMPT') c;

Open in new window

You hit USER_TAB_COLUMNS multiple times, but this is a one time thing and I don't think performance on generating the where clause is an issue. You are just going to have to change the word OR to WHERE on the first line and then add your SELECT and FROM clauses to the front.

From the sample table that is posted in the original question, that query generates these clauses:

'OR'||COLUMN_NAME||'_DB1!='||COLUMN_NAME||'_DB2'
--------------------------------------------------------------------------------
or TRANSDATE_db1 != TRANSDATE_db2
or RECEIPTDATE_db1 != RECEIPTDATE_db2
or DEPTNO_db1 != DEPTNO_db2
or DEPATNO_db1 != DEPATNO_db2
or EMPNO_db1 != EMPNO_db2
or TRANSDATE_db1 is null and TRANSDATE_db2 is not null
or RECEIPTDATE_db1 is null and RECEIPTDATE_db2 is not null
or DEPTNO_db1 is null and DEPTNO_db2 is not null
or DEPATNO_db1 is null and DEPATNO_db2 is not null
or EMPNO_db1 is null and EMPNO_db2 is not null
or TRANSDATE_db1 is not null and TRANSDATE_db2 is null
or RECEIPTDATE_db1 is not null and RECEIPTDATE_db2 is null
or DEPTNO_db1 is not null and DEPTNO_db2 is null
or DEPATNO_db1 is not null and DEPATNO_db2 is null
or EMPNO_db1 is not null and EMPNO_db2 is null

15 rows selected.

Open in new window

slightwv (䄆 Netminder)

>>With a little ingenuity, one can make NLV work

I do not disagree. I was just pointing out what was posted, as-is, would never work. If you use your 'trick' you just need to make sure the value you pick as filler will NEVER appear in the data itself. Not impossible but it takes some thought.

>>the reason am not using minus is i what to see both data from both diffirent database

What does your join look like?
What happens when db1 has a row db2 doesn't and db2 has a row db1 doesn't? Does what you have capture both ways?

Mlanda T

Not impossible but it takes some thought.

How well does a SYS_ID score here? Theoretically, they are globally unique. My Oracle skills are not 100%, but one could declare a variable, initialized to a SYS_ID and pass that around.

the reason am not using minus is i what to see both data from both diffirent database

This suggests a different table structure from the one posted in the original script (with the question) where the data columns are sitting in the same table. Either way, the only way to fulfill all the objectives laid out here is to use value comparisons as opposed to the MINUS approach.

slightwv (䄆 Netminder)

>>but one could declare a variable, initialized to a SYS_ID and pass that around.

I guess you mean SYS_GUID. That will work if you declare a variable. That requires code and not straight SQL.

Depending on how my question about missing rows is answered, everything so far can be done with straight SQL. Messy, yes, but it keeps everything in the SQL engine.

PortletPaul

>>"i what to see both data from both different database so i can make decision which data to update in production"

I would just like to suggest (again) that a "report" of 100 columns is quite unusable, you will spend a lot of time trying to figure out which of the columns is different. You are very likely to make mistakes using such a layout.

I would recommend breaking this into several reports that are easier to read and use (less columns). You might divide then by data types perhaps? (strings, dates, numbers and so on)

One other thing that concerns me, clearly the sample data and table structure (in the question) is way different to the actual situation. What I do not see in the sample is a unique row identifier. If you intend to update a production table, how will you ensure that you can update the correct row?

Please think through how you are going to perform the update. (e.g. will you create a table containing the wanted values?)

PortletPaul

on the mater of using NVL() instead of OR column IS NULL

I would NOT use NVL() here no matter how clever you get with the substitute value. As Johnsone has rightly pointed out you can generate the SQL anyway.

My viewpoint on this is drawn from a general dislike of using functions in the where clause as they are not "sargable" (they remove access to indexes/make the query slower) and most functions impose extra calculations for each row. All one needs to do is use OR column IS NULL to avoid those issues.

chalie001

ASKER

am returning 0 rows is this query

SELECT 'or ' 
       ||column_name 
       || '_S != ' 
       || column_name 
       || '_T' 
FROM   (SELECT DISTINCT Substr(column_name, 1, Instr(column_name, '_', -1) - 1) 
                               column_name 
        FROM   user_tab_columns 
        WHERE  table_name = 'mr_er_core_temp') c 
UNION ALL 
SELECT 'or ' 
       ||column_name 
       || '_S is null and ' 
       || column_name 
       || '_T is not null' 
FROM   (SELECT DISTINCT Substr(column_name, 1, Instr(column_name, '_', -1) - 1) 
                               column_name 
        FROM   user_tab_columns 
        WHERE  table_name = 'mr_er_core_temp') c 
UNION ALL 
SELECT 'or ' 
       ||column_name 
       || '_S is not null and ' 
       || column_name 
       || '_T is null' 
FROM   (SELECT DISTINCT Substr(column_name, 1, Instr(column_name, '_', -1) - 1) 
                               column_name 
        FROM   user_tab_columns 
        WHERE  table_name = 'mr_er_core_temp') c;

Open in new window

PortletPaul

The table name needs to be in UPPERCASE

SELECT DISTINCT Substr(column_name, 1, Instr(column_name, '_', -1) - 1)
column_name
FROM user_tab_columns
WHERE table_name = 'MR_ER_CORE_TEMP'

chalie001

ASKER

this query is returning everything if i select * from MR_ER_CORE_TEMP i get the ame data as am using OR from the above query

chalie001

ASKER

if value are the same i don't what to return tham only value which diffier must be return

ASKER CERTIFIED SOLUTION

johnsone

membership

This solution is only available to members.

To access this solution, you must be a member of Experts Exchange.

Start Free Trial

chalie001

ASKER

correct