Avatar of Culwatrnca11
Culwatrnca11
Flag for United States of America asked on

How do you remove duplicate lines in a Teradata query?

I have a report I would like to remove duplicates "Claim_Id".   I end up having to export the report and remove duplicates in excel. Problem is I am generating way to much data because of the duplicates, how do I add code to my report to remove these when the report is executed?

clm.internal_id AS "Claim ID"      

Thank you
* TeradataSQLMySQL Server

Avatar of undefined
Last Comment
PortletPaul

8/22/2022 - Mon
Robert

Don't know if this is helpful as I'm not familiar with Teradata  but to do this in SQL query you would just do
Select distinct clm.internal_id AS "Claim ID" FROM "Table Name" Where "Criteria" 

Open in new window

Culwatrnca11

ASKER
Thanks for the input Robert, I am not sure how to work that into my query. this is my first time trying to remove duplicates.
PortletPaul

"duplicate" (very unfortunately) can mean different things to different people. SQL treats duplicates BY ROW so in the following table there are no duplicates:
+---------+---------+-----------------+--------------+
| ClaimID |  Col2   |      Col3       | NumberColumn |
+---------+---------+-----------------+--------------+
| abcd    | Value 2 | 123A            |         10.0 |
| abcd    | cols    | maybe some text |     -2,027.1 |
| abcd    | another | more here       |              |
+---------+---------+-----------------+--------------+

Open in new window

What I sense you are looking for is to suppress the repetition of values in a column so some value is visible only once (per grouping of data), something like the following:
+---------+---------+-----------------+--------------+
| ClaimID |  Col2   |      Col3       | NumberColumn |
+---------+---------+-----------------+--------------+
| abcd    | Value 2 | 123A            |         10.0 |
|         | cols    | maybe some text |     -2,027.1 |
|         | another | more here       |              |
+---------+---------+-----------------+--------------+

Open in new window

Is the second table closer to you need?
What product are you using to produce the report?


Please provide you current query.
Experts Exchange is like having an extremely knowledgeable team sitting and waiting for your call. Couldn't do my job half as well as I do without it!
James Murphy
Culwatrnca11

ASKER
Paul,

Thanks for the feedback, I have attached my code.
TapUZData-Claims-Report-custom---exp.sql
PortletPaul

Can you answer the 2 questions please.

Is table2 something like your objective? (i.e. you want to suppress value repetition in a column)
What product(s) are you using to produce the report?
Culwatrnca11

ASKER
Paul,

The your first query is what is happening, my objective is to have just one column to determine a volume. Does that help?
⚡ FREE TRIAL OFFER
Try out a week of full access for free.
Find out why thousands trust the EE community with their toughest problems.
Culwatrnca11

ASKER
this table (last select column)     cml.adj_table_name       if left in will generate duplicate claim_IDs but I need the table left in but only want to see one line per claim_id
PortletPaul

Please note that queries involving multiple joins and many columns in the select list are NOT good candidates for "select distinct", in fact the reverse is true.  That single extra word "distinct" can be the cause of extreme slowness in such a query.

If the column cml.adj_table_name is causing unwanted repetition in the query result, then find some way to reduce the rows coming from that table e.g.

LEFT OUTER JOIN (
            select distinct 
                  adj_table_cml_id, adj_table_name
            from Clarity_CML
            ) cml ON enc.ben_adj_table_id = cml.adj_table_cml_id     --pull in adjudication table

Open in new window

SELECT
      clm.internal_id                                                          AS "claim id"
    , cov.group_num
    , PATIENT.PAT_MRN_ID                                                       AS pat_mrn
    ,EXTRACT(YEAR FROM CURRENT_DATE) - EXTRACT(YEAR FROM PATIENT.BIRTH_DATE) -
             CASE WHEN CURRENT_DATE (FORMAT 'MMDD') (CHAR(4)) <
                     PATIENT.BIRTH_DATE (FORMAT 'MMDD')  (CHAR(4)) THEN 1
              ELSE 0 END                                                       AS AGE
    , CASE ap.status_c WHEN '1' THEN 'New' WHEN '2' THEN 'Pending' WHEN '3' THEN 'Denied'   		-- remove for alex group
      WHEN '4' THEN 'Clean' WHEN '5' THEN 'Void'   		    --remove for alex group
      ELSE ' ' END                                                             AS "claim status"
    , ven.vendor_name   --When pulling for all Vendors
    , CASE      -- NCAL Only   Internal External
      WHEN (ven.rpt_grpr_03 = 'NC8'
                        OR ven.rpt_grpr_04 = 'RESOLUTE' 
                        OR ven.vendor_name LIKE '%AMERICAN%SPECIALTY%HEALTH%PLAN%OF%CA%') 
                        AND COALESCE(ap.tot_net_payable, 0) <= 0 
            THEN 'Encounters' ELSE 'Claims'
      END                                                                      AS intext_nc
    , UPPER(spec.name)                                                         AS "specialty"
    , epp.benefit_plan_name                                                    AS "benefit plan"
    , epm.payor_name                                                           AS "payor"
    , pln.plan_grp_name                                                        AS "plan group"
    , ven.vendor_name                                                          AS "vendor"
    , lob.lob_name                                                             AS "lob name"
    , pos.pos_name                                                             AS "place of service"
    , ser.prov_type                                                            AS "provider type"
    , ap.service_start_date                                                    AS "svc date"
    , ap.service_end_date                                                      AS "svc end date"
    , ap.date_received                                                         AS "date received"
    , (CURRENT_DATE - ap.date_received - 1)                                    AS "claim age"
    , ap.admission_type_c                                                      AS "admission type"
    , COALESCE(zc_cf.NAME, 'UNKNOWN')                                          AS claim_type
    , eap1.eap_type_of_ser_c
    , cml.adj_table_name   -- ADJUDICATION TABLE INFO
FROM ap_claim ap
INNER JOIN clm_map clm ON ap.claim_id = clm.cid
LEFT OUTER JOIN ap_claim_dofr_chks dofr ON ap.claim_id = dofr.claim_id AND dofr.line = 1
LEFT OUTER JOIN zc_check_sts chk ON dofr.pmt_info_chk_sts_c = chk.check_status_c
LEFT OUTER JOIN ap_check ckr ON dofr.pmt_info_chk_id = ckr.check_id
LEFT OUTER JOIN clarity_vendor ven ON ap.vendor_id = ven.vendor_id
LEFT OUTER JOIN zc_specialty spec ON ap.assoc_spec_c = spec.specialty_c
/*LEFT OUTER JOIN ib_receiver ib ON ap.inbasket_message_id = ib.msg_id		
LEFT OUTER JOIN clarity_hip hip ON ib.registry_id = hip.registry_id	*/
LEFT OUTER JOIN ZC_CLAIM_FORMAT zc_cf ON ap.CLAIM_FORMAT_C = zc_cf.CLAIM_FORMAT_C
LEFT OUTER JOIN coverage cov ON ap.coverage_id = cov.coverage_id
LEFT OUTER JOIN clarity_epp epp ON ap.plan_id = epp.benefit_plan_id
LEFT OUTER JOIN clarity_epm epm ON ap.payor_id = epm.payor_id
LEFT OUTER JOIN plan_grp pln ON ap.plan_group_id = pln.plan_grp_id
LEFT OUTER JOIN clarity_lob lob ON ap.clm_lob_id = lob.lob_id
LEFT OUTER JOIN clarity_pos pos ON ap.loc_id = pos.pos_id
LEFT OUTER JOIN clarity_ser ser ON ap.prov_id = ser.prov_id
LEFT OUTER JOIN clarity_emp emp ON ap.entry_user_id = emp.user_id
LEFT OUTER JOIN ap_claim_2 ap2 ON ap.claim_id = ap2.claim_id
LEFT OUTER JOIN zc_meth_to_pay_clm zc_mtpc ON ap.meth_to_pay_clm_c = zc_mtpc.meth_to_pay_clm_c
LEFT OUTER JOIN patient ON ap.pat_id = patient.pat_id
LEFT OUTER JOIN ap_claim_px px1 ON ap.claim_id = px1.claim_id
LEFT OUTER JOIN ap_claim_dx dx1 ON ap.claim_id = dx1.claim_id
LEFT OUTER JOIN clarity_edg edg1 ON dx1.dx_id = edg1.dx_id
LEFT OUTER JOIN ap_claim_px_2 px2 ON ap.claim_id = px2.claim_id
LEFT OUTER JOIN clarity_eap eap1 ON px1.proc_id = eap1.proc_id
LEFT OUTER JOIN PAT_ENC enc ON patient.PAT_ID = enc.pat_id
LEFT OUTER JOIN clarity_prc prc ON enc.appt_prc_id = prc.prc_id      --pull in Benefit Group
LEFT OUTER JOIN (
            select distinct 
                  adj_table_cml_id, adj_table_name
            from Clarity_CML
            ) cml ON enc.ben_adj_table_id = cml.adj_table_cml_id     --pull in adjudication table
LEFT OUTER JOIN (
      SELECT
        pxx.POS_TYPE_C
      , pxx.CLAIM_ID
      , MAX (CASE WHEN pxx.LINE = 1 THEN c.PROC_CODE | | '-' | | c.PROC_NAME ELSE NULL END) AS CPT_CODE_1
      , MAX (CASE WHEN pxx.LINE = 2 THEN c.PROC_CODE | | '-' | | c.PROC_NAME ELSE NULL END) AS CPT_CODE_2
      , MAX (CASE WHEN pxx.LINE = 3 THEN c.PROC_CODE | | '-' | | c.PROC_NAME ELSE NULL END) AS CPT_CODE_3
      , MAX (CASE WHEN pxx.LINE = 4 THEN c.PROC_CODE | | '-' | | c.PROC_NAME ELSE NULL END) AS CPT_CODE_4
      , MAX (CASE WHEN pxx.LINE = 5 THEN c.PROC_CODE | | '-' | | c.PROC_NAME ELSE NULL END) AS CPT_CODE_5
      , MAX (CASE WHEN pxx.LINE = 6 THEN c.PROC_CODE | | '-' | | c.PROC_NAME ELSE NULL END) AS CPT_CODE_6
      , MAX (CASE WHEN pxx.LINE = 7 THEN c.PROC_CODE | | '-' | | c.PROC_NAME ELSE NULL END) AS CPT_CODE_7
      , MAX (CASE WHEN pxx.LINE = 8 THEN c.PROC_CODE | | '-' | | c.PROC_NAME ELSE NULL END) AS CPT_CODE_8
      , MAX (CASE WHEN pxx.LINE = 9 THEN c.PROC_CODE | | '-' | | c.PROC_NAME ELSE NULL END) AS CPT_CODE_9
      , MAX (CASE WHEN pxx.LINE = 10 THEN c.PROC_CODE | | '-' | | c.PROC_NAME ELSE NULL END) AS CPT_CODE_10
      FROM AP_CLAIM_PX pxx
      INNER JOIN AP_CLAIM b ON pxx.CLAIM_ID = b.CLAIM_ID --AND B.STATUS_C = 2
      INNER JOIN CLARITY_EAP c ON pxx.PROC_ID = c.PROC_ID
      GROUP BY
            pxx.CLAIM_ID
          , pxx.POS_TYPE_C
      ) eap ON ap.CLAIM_ID = eap.CLAIM_ID
LEFT JOIN (
      SELECT
            pxx.CLAIM_ID
          , MAX (CASE WHEN pxx.LINE = 1 THEN f.REF_BILL_CODE | | ' - ' | | f.DX_NAME ELSE NULL END) AS CLM_DX_1
      , MAX (CASE WHEN pxx.LINE = 2 THEN f.REF_BILL_CODE | | ' - ' | | f.DX_NAME ELSE NULL END) AS CLM_DX_2
      , MAX (CASE WHEN pxx.LINE = 3 THEN f.REF_BILL_CODE | | ' - ' | | f.DX_NAME ELSE NULL END) AS CLM_DX_3
      , MAX (CASE WHEN pxx.LINE = 4 THEN f.REF_BILL_CODE | | ' - ' | | f.DX_NAME ELSE NULL END) AS CLM_DX_4
      , MAX (CASE WHEN pxx.LINE = 5 THEN f.REF_BILL_CODE | | ' - ' | | f.DX_NAME ELSE NULL END) AS CLM_DX_5
      , MAX (CASE WHEN pxx.LINE = 6 THEN f.REF_BILL_CODE | | ' - ' | | f.DX_NAME ELSE NULL END) AS CLM_DX_6
      , MAX (CASE WHEN pxx.LINE = 7 THEN f.REF_BILL_CODE | | ' - ' | | f.DX_NAME ELSE NULL END) AS CLM_DX_7
      , MAX (CASE WHEN pxx.LINE = 8 THEN f.REF_BILL_CODE | | ' - ' | | f.DX_NAME ELSE NULL END) AS CLM_DX_8
      , MAX (CASE WHEN pxx.LINE = 9 THEN f.REF_BILL_CODE | | ' - ' | | f.DX_NAME ELSE NULL END) AS CLM_DX_9
      , MAX (CASE WHEN pxx.LINE = 10 THEN f.REF_BILL_CODE | | ' - ' | | f.DX_NAME ELSE NULL END) AS CLM_DX_10
      FROM AP_CLAIM_DX pxx
      INNER JOIN AP_CLAIM b ON pxx.CLAIM_ID = b.CLAIM_ID        --AND B.STATUS_C = 2
      INNER JOIN CLARITY_EDG f ON pxx.DX_ID = f.DX_ID
      GROUP BY
            pxx.CLAIM_ID
      ) dx ON ap.CLAIM_ID = dx.CLAIM_ID
LEFT OUTER JOIN ZC_POS_TYPE pt ON eap.POS_TYPE_C = pt.POS_TYPE_C  ---WW
LEFT OUTER JOIN CL_UB_REV_CODE rev ON px1.UBC_REVENUE_CODE_ID = rev.UB_REV_CODE_ID
WHERE ap.SERV_DATE BETWEEN '10/01/2017' AND CURRENT_DATE    --'01/01/2017'  
AND ap.status_c IN ('1', '4', '5')
AND cml.adj_table_name LIKE ANY ('PROVAHUC%' , 'PROVPUC%')
AND INTEXT_NC = 'Encounters'
ORDER BY
      clm.internal_id ASC;

Open in new window

Personally I would never use "select distinct" on the top of a query using >30 joins. I would suggest you remove unnecessary "noise" from your query, and then "start small"  to work out what joins I really do need and which ones of those need to be converted into a subquery that produces less rows.

Regarding this: "I have a report I would like to remove duplicates "Claim_Id""

Your query starts and ends with:
SELECT clm.internal_id  AS "claim id"
ORDER BY clm.internal_id ASC;

The repetition of clm.internal_id on every row is NECESSARY to produce the wanted order
Culwatrnca11

ASKER
Made your suggested changes. still seeing duplicate claim_IDs
Experts Exchange has (a) saved my job multiple times, (b) saved me hours, days, and even weeks of work, and often (c) makes me look like a superhero! This place is MAGIC!
Walt Forbes
ASKER CERTIFIED SOLUTION
PortletPaul

THIS SOLUTION ONLY AVAILABLE TO MEMBERS.
View this solution by signing up for a free trial.
Members can start a 7-Day free trial and enjoy unlimited access to the platform.
See Pricing Options
Start Free Trial
GET A PERSONALIZED SOLUTION
Ask your own question & get feedback from real experts
Find out why thousands trust the EE community with their toughest problems.