SQL Server

SELECT INST_KEY, ANTAL_KONTI, ANTAL_BTO_NYUDLAAN,
ANTAL_FORTIDINDFRI, BTO_NYUDLAAN, FORTIDINDFRI,
A.INDEKS_FAKTOR_H,
A.INDEKS_FAKTOR_K,
ORDINAERE_AFDRAG, UDBYTTE,
SIKKERHED_BELOEB, RENTER, BIDRAG_ULTIMO, EFF_RENTE_FORR_OMF, EFF_RENTE_ULTIMO,
AAO_belob AS AAO_BELOEB,
FORR_OMFANG, GNS_VAL, PAA_LOEBNE_RENTER, A.NOM_RENTE_SATS,
PRIMO, BOERSKURS_REG, NETTO_TRANS, TAB, VALUTA_REG, REKLASS,
VAERDI_REG, ULTIMO, PRIMO_MV, REKLASS_MV, TAB_MV,
VALUTA_REG_MV, NETTO_TRANS_MV, ULTIMO_MV, BOERSKURS_REG_MV,
PRIMO_NOM, NETTO_TRANS_NOM, REKLASS_NOM, TAB_NOM, VALUTA_REG_NOM,
ULTIMO_NOM, PRIMO_ANTAL, NETTO_TRANS_ANTAL, ULTIMO_ANTAL,
A.EFF_DT, KTTP,
LOAN_PURPOSE, IDKT_START_DT, REDEEMABLE_CODE,
NOM_INDEKS_MK, RENTE_SKIFT_DT, MDR_TIL_RENTESKIFT, STEPUP_DT, UDLOEB_DT,
A.INTERNET_ACCESS_MK,
ULTIMO_OPRVAL_MV, ULTIMO_OPRVAL_NOM, PRIMO_OPRVAL_MV,
PRIMO_OPRVAL_NOM, AFDRAG_DT, AFDRAG_DT_MK, NET_OPRVAL_NOM,
NET_OPRVAL_NOM_MK, NET_OPRVAL_MV, NET_OPRVAL_MV_MK, TAB_OPRVAL_NOM,
TAB_OPRVAL_MV, B_REG_OPRVAL, B_REG_OPRVAL_MK, VAL_REG_OPR_MK, NET_TRANS_JUST_MV, NET_TRANS_JUST_NOM, NET_TRANS_JUST,
ULTIMO_KORR_MV, ULTIMO_KORR_NOM, VAL_KURS_PRIMO,
VAL_KURS_PRIMO_MK, VAL_KURS_ULTIMO, VAL_KURS_ULTIMO_MK,
SEKTOR_INFO, CORR_MK, CORR_UPDATED_MK, VAL_KURS_GNS, REPORTING_DT, ONE_DAY_MAT_MK, ULTIMO_OPRVAL,
LTrim(RTrim(FULLPDST)) AS FULLPDST,
afdrag_opr, b_reg_oprval_mv, b_reg_oprval_mv_mk,
deriv_type, indeks_mk, koeb_salg_mk, new_business_cause,
payment_mk, position, ppstm_papir_type, uafviklet_mk, ultimo_korr,
used_in_repo_mk, val_reg_opr, val_reg_opr_mv_mk, val_reg_opr_mv,
LTrim(RTrim(A.IDKT)) AS IDKT,
LTrim(RTrim(VALUTA)) AS VALUTA,
LTrim(RTrim(AMOUNT_TYPE)) AS AMOUNT_TYPE,
LTrim(RTrim(EXT_CONV_MK)) AS EXT_CONV_MK,
LTrim(RTrim(NEW_BUS_MK)) AS NEW_BUS_MK,
LTrim(RTrim(KNID)) AS KNID,
LTrim(RTrim(REG_NO)) AS REG_NO,
opts_acc,
opts_cust ,
SURR_ID,
LTrim(RTrim(BRANCHE)) AS BRANCHE ,
LTrim(RTrim(CVR)) AS CVR,
LTrim(RTrim(KOMMUNE)) AS KOMMUNE,
LTrim(RTrim(LAND)) AS LAND,
SubString(LTrim(RTrim(LAND_M)), 1,2) AS LAND_M,
LTrim(RTrim(SEKTOR)) AS SEKTOR ,
LTrim(RTrim(SEKTOR_2)) AS SEKTOR_2 ,
LTrim(RTrim(SEKTOR_M1)) AS  SEKTOR_M1,
LTrim(RTrim(KNGR)) AS KNGR,
LTrim(RTrim(A.FAST_RENTE_MK)) AS FAST_RENTE_MK,
LTrim(RTrim(A.SIKKERHED_KD)) AS SIKKERHED_KD ,
EXCHANGE_M_FACTOR,
B.SURR_ID_ACC  AS  SURR_ID_ACC  
FROM [RHRH_MFI3].[dim].BOD_RHB_GFS_REPORTING A
INNER JOIN
[RHRH_MFI3].[dim].BOD_ACC_DIM  B
ON
A.IDKT = B.IDKT and A.opts_acc = B.EFF_DT
WHERE
A.LoadStatusId = 415763642
and SURR_ID in (select SURR_ID from [RHRH_MFI3].[dim].BOD_RHB_GFS_REPORTING
                              where LoadStatusId = 415763642
                              group by surr_id
                              having count(*) = 1)
Hi Experts,

This table BOD_RHB_GFS_REPORTING has these many rows.220298911.
Iam trying to find the duplicates as well is there any other way to write this query and find the duplicates as well.
Thanks in advance.
SandeepiiiAsked:
Who is Participating?
 
PortletPaulConnect With a Mentor Commented:
From a reporting table of 220m rows and 193 columns, a query of 126 columns, plus an inner join to some other table, then some where conditions involving on the reporting table.

Can all of that, plus detection of 'duplicates' (in the reporting table) be included in the one query?

I wouldn't recommend it.
If you believe there are duplicates in the reporting table I'd be wanting to figure that out independently. As others have indicated above you haven't really defined what you mean by duplicate. Is it any 2+ rows that share exactly the same information across all 193 columns? if it isn't that, well, what set of conditions do you consider to be a duplicate?

{+edit - sorry}
WHERE A.LoadStatusId = 415763642
      AND SURR_ID IN (
            SELECT SURR_ID
            FROM [RHRH_MFI3].[dim].BOD_RHB_GFS_REPORTING
            WHERE LoadStatusId = 415763642
            GROUP BY surr_id
            HAVING count(*) > 1 /*any results from this? */
            )

Do you get any results from your existing query if the HAVING condition is changed?
Is this what you consider to be the duplication?
0
 
didnthaveanameCommented:
Can you give us the definition of the BOD_RHB_GFS_REPORTING table as well as an example of 2 rows that are duplicates? (I am thinking a self join would work well for this, but it's tough to say without knowing what constitutes a duplicate and what the structure of the table is).  Thanks!
0
 
SandeepiiiAuthor Commented:
well generally there won't be any duplicates ,but we just try to find them.
here is the table definition,i think putting index on SURR_ID,LoadStatusId colu,mn can help.
but is it okay if i ,make non clustered index or non clustered on these column.
Thanks for your reply and answer.
Its joining with table BOD_ACC_DIM which is having these many rows 44633965
definition.xlsx
0
Keep up with what's happening at Experts Exchange!

Sign up to receive Decoded, a new monthly digest with product updates, feature release info, continuing education opportunities, and more.

 
didnthaveanameCommented:
Is there/what is the primary key on this table?  And just for clarification: you're looking for instances where the exact same data, aside from the primary key, exists, correct?
0
 
SandeepiiiAuthor Commented:
Is it any 2+ rows that share exactly the same information across all 193 columns--Yes this the meaning of duplicate.
Thanks for your comments.
0
 
PortletPaulCommented:
220m rows and 193 columns
compare all fields of all rows for full-row duplication

Will you start this query on Friday and hope it's finished by Monday morning?
(i.e. this is not going to be quick or easy)
I'm not sure I want to prepare this query - it's going to be very slow.


is SURR_ID a unique primary key?  (it is non-null)
what Indexes exist on the table? [BOD_RHB_GFS_REPORTING]

what makes you believe there are duplicates? (is there any proof they exist?)
0
 
SandeepiiiAuthor Commented:
thanks,
right now we are creating
CREATE CLUSTERED INDEX [ixcBOD_LoadStatusIdsurr_id] ON [DIM].[BOD_RHB_GFS_REPORTING]
(
      [LoadStatusId] ASC,
      [surr_id]ASC
)WITH (PAD_INDEX  = OFF, STATISTICS_NORECOMPUTE  = OFF, SORT_IN_TEMPDB = ON, IGNORE_DUP_KEY = OFF, DROP_EXISTING = OFF, ONLINE = OFF, ALLOW_ROW_LOCKS  = ON, ALLOW_PAGE_LOCKS  = ON) ON [PRIMARY]
GO
and its taking hell lots of time.We are hoping after index the query might speed up atleast by 50% of the time.
0
 
PortletPaulCommented:
For a reporting table, which contains a non-null date (eff_dt) - it would seem to me you are likely to need an index on at least that  as well (for date range filtering as an example)

There are several other date fields (e.g. reporting_dt) which might be even more important for filtering - but it is nullable. I don't recognize many of the field names so I really can't comment in any depth.

to be honest, it surprises me you are only now adding indexes
and, you still haven't indicated if surr_id is unique or not (I get the feeling it isn't) {+edit}
0
 
SandeepiiiAuthor Commented:
Hi thanks for your reply ,i was able to create indexes as i told in above query .Took almost 3hrs.Now query  running lot better(30-35min) and its one of the project which have run into trouble and they want instant solution.As you know we dba are always in support when ever there is a trouble and have to give some instant solution.!!!
Thanks for the inputs experts.
0
 
Anthony PerkinsCommented:
As you know we dba are always in support when ever there is a trouble and have to give some instant solution.!!!
That is not entirely true.  A true DBA should only support good solutions and encourage users away from bad approaches, such as the one you have here.
0
 
SandeepiiiAuthor Commented:
I agree with you entirely,thats why iam writing here and taking all your experts advise.
Thanks for your time and advise.
0
 
SandeepiiiAuthor Commented:
Solution was nice.
0
All Courses

From novice to tech pro — start learning today.