huzefaq
asked on
Oracle SQL distinct query
Hello All,
I have two tables in oracle database.
1) MOTDC_STG_SHIP_CONF_HEADER _TMP
2) MOTDC_STG_SHIP_CONF_HEADER
Both tables are identical, only first table (MOTDC_STG_SHIP_CONF_HEADE R_TMP) does not have any unique constraint and has lots of duplicate data. And 2nd Main table has unique contraint on transaction_id, CONTAINER_ID, CONTAINER_TYPE, and does not have duplicate data.
I want to select all distinct records based on transaction_id, CONTAINER_ID, CONTAINER_TYPE from MOTDC_STG_SHIP_CONF_HEADER _TMP table and insert into the MOTDC_STG_SHIP_CONF_HEADER table.
Schema for MOTDC_STG_SHIP_CONTAINER
CREATE TABLE MOTDC_STG_SHIP_CONTAINER
(
TRANSACTION_ID NUMBER,
CONTAINER_TYPE VARCHAR2(10 BYTE),
CONTAINER_ID VARCHAR2(40 BYTE),
CONTAINER_LENGTH NUMBER(9,3),
CONTAINER_WIDTH NUMBER(9,3),
CONTAINER_HEIGHT NUMBER(9,3),
CONTAINER_WEIGHT NUMBER(9,3) NOT NULL,
CONTAINER_PACK_TYPE VARCHAR2(10 BYTE),
CONTAINER_TRK_NUM VARCHAR2(50 BYTE) NOT NULL,
CREATION_DATE DATE NOT NULL,
CREATED_BY NUMBER NOT NULL,
LAST_UPDATE_DATE DATE NOT NULL,
LAST_UPDATED_BY NUMBER NOT NULL
)
Schema for MOTDC_STG_SHIP_CONTAINER_T MP
CREATE TABLE MOTDC_STG_SHIP_CONTAINER_T MP
(
TRANSACTION_ID NUMBER,
CONTAINER_TYPE VARCHAR2(10 BYTE),
CONTAINER_ID VARCHAR2(40 BYTE),
CONTAINER_LENGTH NUMBER(9,3),
CONTAINER_WIDTH NUMBER(9,3),
CONTAINER_HEIGHT NUMBER(9,3),
CONTAINER_WEIGHT NUMBER(9,3) NOT NULL,
CONTAINER_PACK_TYPE VARCHAR2(10 BYTE),
CONTAINER_TRK_NUM VARCHAR2(50 BYTE) NOT NULL,
CREATION_DATE DATE NOT NULL,
CREATED_BY NUMBER NOT NULL,
LAST_UPDATE_DATE DATE NOT NULL,
LAST_UPDATED_BY NUMBER NOT NULL
)
I have two tables in oracle database.
1) MOTDC_STG_SHIP_CONF_HEADER
2) MOTDC_STG_SHIP_CONF_HEADER
Both tables are identical, only first table (MOTDC_STG_SHIP_CONF_HEADE
I want to select all distinct records based on transaction_id, CONTAINER_ID, CONTAINER_TYPE from MOTDC_STG_SHIP_CONF_HEADER
Schema for MOTDC_STG_SHIP_CONTAINER
CREATE TABLE MOTDC_STG_SHIP_CONTAINER
(
TRANSACTION_ID NUMBER,
CONTAINER_TYPE VARCHAR2(10 BYTE),
CONTAINER_ID VARCHAR2(40 BYTE),
CONTAINER_LENGTH NUMBER(9,3),
CONTAINER_WIDTH NUMBER(9,3),
CONTAINER_HEIGHT NUMBER(9,3),
CONTAINER_WEIGHT NUMBER(9,3) NOT NULL,
CONTAINER_PACK_TYPE VARCHAR2(10 BYTE),
CONTAINER_TRK_NUM VARCHAR2(50 BYTE) NOT NULL,
CREATION_DATE DATE NOT NULL,
CREATED_BY NUMBER NOT NULL,
LAST_UPDATE_DATE DATE NOT NULL,
LAST_UPDATED_BY NUMBER NOT NULL
)
Schema for MOTDC_STG_SHIP_CONTAINER_T
CREATE TABLE MOTDC_STG_SHIP_CONTAINER_T
(
TRANSACTION_ID NUMBER,
CONTAINER_TYPE VARCHAR2(10 BYTE),
CONTAINER_ID VARCHAR2(40 BYTE),
CONTAINER_LENGTH NUMBER(9,3),
CONTAINER_WIDTH NUMBER(9,3),
CONTAINER_HEIGHT NUMBER(9,3),
CONTAINER_WEIGHT NUMBER(9,3) NOT NULL,
CONTAINER_PACK_TYPE VARCHAR2(10 BYTE),
CONTAINER_TRK_NUM VARCHAR2(50 BYTE) NOT NULL,
CREATION_DATE DATE NOT NULL,
CREATED_BY NUMBER NOT NULL,
LAST_UPDATE_DATE DATE NOT NULL,
LAST_UPDATED_BY NUMBER NOT NULL
)
taking distinct data from _TMP table ( which has duplicates ) to _HEADER table is easy but you need to tell what values should
go into all other fields apart from transaction_id, CONTAINER_ID, CONTAINER_TYPE when insert into _HEADER table.
let us say if there are 2 records in _TMP with same transaction_id, CONTAINER_ID, CONTAINER_TYPE in _TMP table but other fields have
different values ( say container_weight has 100 for 1 record and 300 for 2nd record ). which value should we use for the record
which we are going to insert into _HEADER table.
similary, you need to tell based on what logic should we pick up the rest of the fields like as i said above.
Thanks
go into all other fields apart from transaction_id, CONTAINER_ID, CONTAINER_TYPE when insert into _HEADER table.
let us say if there are 2 records in _TMP with same transaction_id, CONTAINER_ID, CONTAINER_TYPE in _TMP table but other fields have
different values ( say container_weight has 100 for 1 record and 300 for 2nd record ). which value should we use for the record
which we are going to insert into _HEADER table.
similary, you need to tell based on what logic should we pick up the rest of the fields like as i said above.
Thanks
the below will work as long as we do not get more than 1 record for the same LAST_UPDATE_DATE and for the same TRANSACTION_ID, CONTAINER_TYPE, CONTAINER_ID.
If we have 2 records with same value to all the 4 fields then the insert will fail because of unique constraint error in the table which we are inserting.
INSERT INTO MOTDC_STG_SHIP_CONTAINER
SELECT a.* FROM
MOTDC_STG_SHIP_CONTAINER_T MP a
JOIN (SELECT MAX(LAST_UPDATE_DATE) as last, TRANSACTION_ID, CONTAINER_TYPE, CONTAINER_ID FROM MOTDC_STG_SHIP_CONTAINER_T MP GROUP BY TRANSACTION_ID, CONTAINER_TYPE, CONTAINER_ID) b on a.TRANSACTION_ID = b.TRANSACTION_ID AND a.CONTAINER_TYPE = b.CONTAINER_TYPE AND a.CONTAINER_ID = b.CONTAINER_ID and a.LAST_UPDATE_DATE = b.last
If we have 2 records with same value to all the 4 fields then the insert will fail because of unique constraint error in the table which we are inserting.
INSERT INTO MOTDC_STG_SHIP_CONTAINER
SELECT a.* FROM
MOTDC_STG_SHIP_CONTAINER_T
JOIN (SELECT MAX(LAST_UPDATE_DATE) as last, TRANSACTION_ID, CONTAINER_TYPE, CONTAINER_ID FROM MOTDC_STG_SHIP_CONTAINER_T
ASKER CERTIFIED SOLUTION
membership
This solution is only available to members.
To access this solution, you must be a member of Experts Exchange.
I wouldn't do a double query on your temp table. Simple analytics will let you pick one row from each duplicate and copy it over.
You didn't specify which rows you wanted to keep out of your duplicates though
so for my example below, I simply pick the first one, based on last_update_date, for each
transaction_id,container_i d,containe r_type combination.
This should run more efficiently than a self join or query with requery
INSERT INTO motdc_stg_ship_container
(transaction_id, container_type, container_id, container_length,
container_width, container_height, container_weight,
container_pack_type, container_trk_num, creation_date,
created_by, last_update_date, last_updated_by)
SELECT transaction_id, container_type, container_id, container_length,
container_width, container_height, container_weight,
container_pack_type, container_trk_num, creation_date, created_by,
last_update_date, last_updated_by
FROM (SELECT transaction_id, container_type, container_id,
container_length, container_width, container_height,
container_weight, container_pack_type, container_trk_num,
creation_date, created_by, last_update_date,
last_updated_by,
ROW_NUMBER() OVER(PARTITION BY transaction_id, container_id, container_type ORDER BY last_update_date)
rn
FROM motdc_stg_ship_container_t mp)
WHERE rn = 1
You didn't specify which rows you wanted to keep out of your duplicates though
so for my example below, I simply pick the first one, based on last_update_date, for each
transaction_id,container_i
This should run more efficiently than a self join or query with requery
INSERT INTO motdc_stg_ship_container
(transaction_id, container_type, container_id, container_length,
container_width, container_height, container_weight,
container_pack_type, container_trk_num, creation_date,
created_by, last_update_date, last_updated_by)
SELECT transaction_id, container_type, container_id, container_length,
container_width, container_height, container_weight,
container_pack_type, container_trk_num, creation_date, created_by,
last_update_date, last_updated_by
FROM (SELECT transaction_id, container_type, container_id,
container_length, container_width, container_height,
container_weight, container_pack_type, container_trk_num,
creation_date, created_by, last_update_date,
last_updated_by,
ROW_NUMBER() OVER(PARTITION BY transaction_id, container_id, container_type ORDER BY last_update_date)
rn
FROM motdc_stg_ship_container_t
WHERE rn = 1
SELECT a.* FROM
MOTDC_STG_SHIP_CONTAINER_T
JOIN (SELECT MAX(LAST_UPDATE_DATE) as last, TRANSACTION_ID, CONTAINER_TYPE, CONTAINER_ID FROM MOTDC_STG_SHIP_CONTAINER_T