Link to home
Start Free TrialLog in
Avatar of huzefaq
huzefaq

asked on

Oracle SQL distinct query

Hello All,

I have two tables in oracle database.
1) MOTDC_STG_SHIP_CONF_HEADER_TMP
2) MOTDC_STG_SHIP_CONF_HEADER

Both tables are identical, only first table (MOTDC_STG_SHIP_CONF_HEADER_TMP)  does not have any unique constraint and has lots of duplicate data.  And 2nd Main table has unique contraint on transaction_id, CONTAINER_ID, CONTAINER_TYPE, and does not have duplicate data.  

I want to select all distinct records based on transaction_id, CONTAINER_ID, CONTAINER_TYPE from MOTDC_STG_SHIP_CONF_HEADER_TMP table and insert into the MOTDC_STG_SHIP_CONF_HEADER table.

Schema for MOTDC_STG_SHIP_CONTAINER

CREATE TABLE MOTDC_STG_SHIP_CONTAINER
(
  TRANSACTION_ID       NUMBER,
  CONTAINER_TYPE       VARCHAR2(10 BYTE),
  CONTAINER_ID         VARCHAR2(40 BYTE),
  CONTAINER_LENGTH     NUMBER(9,3),
  CONTAINER_WIDTH      NUMBER(9,3),
  CONTAINER_HEIGHT     NUMBER(9,3),
  CONTAINER_WEIGHT     NUMBER(9,3)              NOT NULL,
  CONTAINER_PACK_TYPE  VARCHAR2(10 BYTE),
  CONTAINER_TRK_NUM    VARCHAR2(50 BYTE)        NOT NULL,
  CREATION_DATE        DATE                     NOT NULL,
  CREATED_BY           NUMBER                   NOT NULL,
  LAST_UPDATE_DATE     DATE                     NOT NULL,
  LAST_UPDATED_BY      NUMBER                   NOT NULL
)

Schema for MOTDC_STG_SHIP_CONTAINER_TMP

CREATE TABLE MOTDC_STG_SHIP_CONTAINER_TMP
(
  TRANSACTION_ID       NUMBER,
  CONTAINER_TYPE       VARCHAR2(10 BYTE),
  CONTAINER_ID         VARCHAR2(40 BYTE),
  CONTAINER_LENGTH     NUMBER(9,3),
  CONTAINER_WIDTH      NUMBER(9,3),
  CONTAINER_HEIGHT     NUMBER(9,3),
  CONTAINER_WEIGHT     NUMBER(9,3)              NOT NULL,
  CONTAINER_PACK_TYPE  VARCHAR2(10 BYTE),
  CONTAINER_TRK_NUM    VARCHAR2(50 BYTE)        NOT NULL,
  CREATION_DATE        DATE                     NOT NULL,
  CREATED_BY           NUMBER                   NOT NULL,
  LAST_UPDATE_DATE     DATE                     NOT NULL,
  LAST_UPDATED_BY      NUMBER                   NOT NULL
)

Avatar of mherchl
mherchl
Flag of Slovakia image

INSERT INTO MOTDC_STG_SHIP_CONTAINER
SELECT a.* FROM
MOTDC_STG_SHIP_CONTAINER_TMP a
JOIN (SELECT MAX(LAST_UPDATE_DATE) as last, TRANSACTION_ID, CONTAINER_TYPE,  CONTAINER_ID FROM MOTDC_STG_SHIP_CONTAINER_TMP GROUP BY TRANSACTION_ID, CONTAINER_TYPE,  CONTAINER_ID) b on a.TRANSACTION_ID = b.TRANSACTION_ID AND a.CONTAINER_TYPE = b.CONTAINER_TYPE AND a.CONTAINER_ID = b.CONTAINER_ID and a.LAST_UPDATE_DATE = b.last


Avatar of Naveen Kumar
taking distinct data from _TMP table ( which has duplicates ) to _HEADER table is easy but you need to tell what values should
go into all other fields apart from  transaction_id, CONTAINER_ID, CONTAINER_TYPE when insert into _HEADER table.

let us say if there are 2 records in _TMP with same transaction_id, CONTAINER_ID, CONTAINER_TYPE in _TMP table but other fields have
different values ( say container_weight has 100 for 1 record and 300 for 2nd record ). which value should we use for the record
which we are going to insert into _HEADER table.

similary, you need to tell based on what logic should we pick up the rest of the fields like as i said above.

Thanks
the below will work as long as we do not get more than 1 record for the same LAST_UPDATE_DATE and for the same TRANSACTION_ID, CONTAINER_TYPE,  CONTAINER_ID.

If we have 2 records with same value to all the 4 fields then the insert will fail because of unique constraint error in the table which we are inserting.

INSERT INTO MOTDC_STG_SHIP_CONTAINER
SELECT a.* FROM
MOTDC_STG_SHIP_CONTAINER_TMP a
JOIN (SELECT MAX(LAST_UPDATE_DATE) as last, TRANSACTION_ID, CONTAINER_TYPE,  CONTAINER_ID FROM MOTDC_STG_SHIP_CONTAINER_TMP GROUP BY TRANSACTION_ID, CONTAINER_TYPE,  CONTAINER_ID) b on a.TRANSACTION_ID = b.TRANSACTION_ID AND a.CONTAINER_TYPE = b.CONTAINER_TYPE AND a.CONTAINER_ID = b.CONTAINER_ID and a.LAST_UPDATE_DATE = b.last
ASKER CERTIFIED SOLUTION
Avatar of konektor
konektor
Flag of Czechia image

Link to home
membership
This solution is only available to members.
To access this solution, you must be a member of Experts Exchange.
Start Free Trial
Avatar of Sean Stuber
Sean Stuber

I wouldn't do a double query on your temp table.  Simple analytics will let you pick one row from each duplicate and copy it over.

You didn't specify which rows you wanted to keep out of your duplicates though
so for my example below, I simply pick the first one, based on last_update_date, for each
transaction_id,container_id,container_type combination.

This should run more efficiently than a self join or query with requery


INSERT INTO motdc_stg_ship_container
            (transaction_id, container_type, container_id, container_length,
             container_width, container_height, container_weight,
             container_pack_type, container_trk_num, creation_date,
             created_by, last_update_date, last_updated_by)
    SELECT transaction_id, container_type, container_id, container_length,
           container_width, container_height, container_weight,
           container_pack_type, container_trk_num, creation_date, created_by,
           last_update_date, last_updated_by
      FROM (SELECT transaction_id, container_type, container_id,
                   container_length, container_width, container_height,
                   container_weight, container_pack_type, container_trk_num,
                   creation_date, created_by, last_update_date,
                   last_updated_by,
                   ROW_NUMBER() OVER(PARTITION BY transaction_id, container_id, container_type ORDER BY last_update_date)
                                                                           rn
              FROM motdc_stg_ship_container_tmp)
     WHERE rn = 1