• Status: Solved
  • Priority: Medium
  • Security: Public
  • Views: 297
  • Last Modified:

SQL Server

SELECT INST_KEY, ANTAL_KONTI, ANTAL_BTO_NYUDLAAN,
ANTAL_FORTIDINDFRI, BTO_NYUDLAAN, FORTIDINDFRI,
A.INDEKS_FAKTOR_H,
A.INDEKS_FAKTOR_K,
ORDINAERE_AFDRAG, UDBYTTE,
SIKKERHED_BELOEB, RENTER, BIDRAG_ULTIMO, EFF_RENTE_FORR_OMF, EFF_RENTE_ULTIMO,
AAO_belob AS AAO_BELOEB,
FORR_OMFANG, GNS_VAL, PAA_LOEBNE_RENTER, A.NOM_RENTE_SATS,
PRIMO, BOERSKURS_REG, NETTO_TRANS, TAB, VALUTA_REG, REKLASS,
VAERDI_REG, ULTIMO, PRIMO_MV, REKLASS_MV, TAB_MV,
VALUTA_REG_MV, NETTO_TRANS_MV, ULTIMO_MV, BOERSKURS_REG_MV,
PRIMO_NOM, NETTO_TRANS_NOM, REKLASS_NOM, TAB_NOM, VALUTA_REG_NOM,
ULTIMO_NOM, PRIMO_ANTAL, NETTO_TRANS_ANTAL, ULTIMO_ANTAL,
A.EFF_DT, KTTP,
LOAN_PURPOSE, IDKT_START_DT, REDEEMABLE_CODE,
NOM_INDEKS_MK, RENTE_SKIFT_DT, MDR_TIL_RENTESKIFT, STEPUP_DT, UDLOEB_DT,
A.INTERNET_ACCESS_MK,
ULTIMO_OPRVAL_MV, ULTIMO_OPRVAL_NOM, PRIMO_OPRVAL_MV,
PRIMO_OPRVAL_NOM, AFDRAG_DT, AFDRAG_DT_MK, NET_OPRVAL_NOM,
NET_OPRVAL_NOM_MK, NET_OPRVAL_MV, NET_OPRVAL_MV_MK, TAB_OPRVAL_NOM,
TAB_OPRVAL_MV, B_REG_OPRVAL, B_REG_OPRVAL_MK, VAL_REG_OPR_MK, NET_TRANS_JUST_MV, NET_TRANS_JUST_NOM, NET_TRANS_JUST,
ULTIMO_KORR_MV, ULTIMO_KORR_NOM, VAL_KURS_PRIMO,
VAL_KURS_PRIMO_MK, VAL_KURS_ULTIMO, VAL_KURS_ULTIMO_MK,
SEKTOR_INFO, CORR_MK, CORR_UPDATED_MK, VAL_KURS_GNS, REPORTING_DT, ONE_DAY_MAT_MK, ULTIMO_OPRVAL,
LTrim(RTrim(FULLPDST)) AS FULLPDST,
afdrag_opr, b_reg_oprval_mv, b_reg_oprval_mv_mk,
deriv_type, indeks_mk, koeb_salg_mk, new_business_cause,
payment_mk, position, ppstm_papir_type, uafviklet_mk, ultimo_korr,
used_in_repo_mk, val_reg_opr, val_reg_opr_mv_mk, val_reg_opr_mv,
LTrim(RTrim(A.IDKT)) AS IDKT,
LTrim(RTrim(VALUTA)) AS VALUTA,
LTrim(RTrim(AMOUNT_TYPE)) AS AMOUNT_TYPE,
LTrim(RTrim(EXT_CONV_MK)) AS EXT_CONV_MK,
LTrim(RTrim(NEW_BUS_MK)) AS NEW_BUS_MK,
LTrim(RTrim(KNID)) AS KNID,
LTrim(RTrim(REG_NO)) AS REG_NO,
opts_acc,
opts_cust ,
SURR_ID,
LTrim(RTrim(BRANCHE)) AS BRANCHE ,
LTrim(RTrim(CVR)) AS CVR,
LTrim(RTrim(KOMMUNE)) AS KOMMUNE,
LTrim(RTrim(LAND)) AS LAND,
SubString(LTrim(RTrim(LAND_M)), 1,2) AS LAND_M,
LTrim(RTrim(SEKTOR)) AS SEKTOR ,
LTrim(RTrim(SEKTOR_2)) AS SEKTOR_2 ,
LTrim(RTrim(SEKTOR_M1)) AS  SEKTOR_M1,
LTrim(RTrim(KNGR)) AS KNGR,
LTrim(RTrim(A.FAST_RENTE_MK)) AS FAST_RENTE_MK,
LTrim(RTrim(A.SIKKERHED_KD)) AS SIKKERHED_KD ,
EXCHANGE_M_FACTOR,
B.SURR_ID_ACC  AS  SURR_ID_ACC  
FROM [RHRH_MFI3].[dim].BOD_RHB_GFS_REPORTING A
INNER JOIN
[RHRH_MFI3].[dim].BOD_ACC_DIM  B
ON
A.IDKT = B.IDKT and A.opts_acc = B.EFF_DT
WHERE
A.LoadStatusId = 415763642
and SURR_ID in (select SURR_ID from [RHRH_MFI3].[dim].BOD_RHB_GFS_REPORTING
                              where LoadStatusId = 415763642
                              group by surr_id
                              having count(*) = 1)
Hi Experts,

This table BOD_RHB_GFS_REPORTING has these many rows.220298911.
Iam trying to find the duplicates as well is there any other way to write this query and find the duplicates as well.
Thanks in advance.
0
Sandeepiii
Asked:
Sandeepiii
  • 6
  • 3
  • 2
  • +1
1 Solution
 
didnthaveanameCommented:
Can you give us the definition of the BOD_RHB_GFS_REPORTING table as well as an example of 2 rows that are duplicates? (I am thinking a self join would work well for this, but it's tough to say without knowing what constitutes a duplicate and what the structure of the table is).  Thanks!
0
 
SandeepiiiAuthor Commented:
well generally there won't be any duplicates ,but we just try to find them.
here is the table definition,i think putting index on SURR_ID,LoadStatusId colu,mn can help.
but is it okay if i ,make non clustered index or non clustered on these column.
Thanks for your reply and answer.
Its joining with table BOD_ACC_DIM which is having these many rows 44633965
definition.xlsx
0
 
didnthaveanameCommented:
Is there/what is the primary key on this table?  And just for clarification: you're looking for instances where the exact same data, aside from the primary key, exists, correct?
0
Cloud Class® Course: CompTIA Cloud+

The CompTIA Cloud+ Basic training course will teach you about cloud concepts and models, data storage, networking, and network infrastructure.

 
PortletPaulfreelancerCommented:
From a reporting table of 220m rows and 193 columns, a query of 126 columns, plus an inner join to some other table, then some where conditions involving on the reporting table.

Can all of that, plus detection of 'duplicates' (in the reporting table) be included in the one query?

I wouldn't recommend it.
If you believe there are duplicates in the reporting table I'd be wanting to figure that out independently. As others have indicated above you haven't really defined what you mean by duplicate. Is it any 2+ rows that share exactly the same information across all 193 columns? if it isn't that, well, what set of conditions do you consider to be a duplicate?

{+edit - sorry}
WHERE A.LoadStatusId = 415763642
      AND SURR_ID IN (
            SELECT SURR_ID
            FROM [RHRH_MFI3].[dim].BOD_RHB_GFS_REPORTING
            WHERE LoadStatusId = 415763642
            GROUP BY surr_id
            HAVING count(*) > 1 /*any results from this? */
            )

Do you get any results from your existing query if the HAVING condition is changed?
Is this what you consider to be the duplication?
0
 
SandeepiiiAuthor Commented:
Is it any 2+ rows that share exactly the same information across all 193 columns--Yes this the meaning of duplicate.
Thanks for your comments.
0
 
PortletPaulfreelancerCommented:
220m rows and 193 columns
compare all fields of all rows for full-row duplication

Will you start this query on Friday and hope it's finished by Monday morning?
(i.e. this is not going to be quick or easy)
I'm not sure I want to prepare this query - it's going to be very slow.


is SURR_ID a unique primary key?  (it is non-null)
what Indexes exist on the table? [BOD_RHB_GFS_REPORTING]

what makes you believe there are duplicates? (is there any proof they exist?)
0
 
SandeepiiiAuthor Commented:
thanks,
right now we are creating
CREATE CLUSTERED INDEX [ixcBOD_LoadStatusIdsurr_id] ON [DIM].[BOD_RHB_GFS_REPORTING]
(
      [LoadStatusId] ASC,
      [surr_id]ASC
)WITH (PAD_INDEX  = OFF, STATISTICS_NORECOMPUTE  = OFF, SORT_IN_TEMPDB = ON, IGNORE_DUP_KEY = OFF, DROP_EXISTING = OFF, ONLINE = OFF, ALLOW_ROW_LOCKS  = ON, ALLOW_PAGE_LOCKS  = ON) ON [PRIMARY]
GO
and its taking hell lots of time.We are hoping after index the query might speed up atleast by 50% of the time.
0
 
PortletPaulfreelancerCommented:
For a reporting table, which contains a non-null date (eff_dt) - it would seem to me you are likely to need an index on at least that  as well (for date range filtering as an example)

There are several other date fields (e.g. reporting_dt) which might be even more important for filtering - but it is nullable. I don't recognize many of the field names so I really can't comment in any depth.

to be honest, it surprises me you are only now adding indexes
and, you still haven't indicated if surr_id is unique or not (I get the feeling it isn't) {+edit}
0
 
SandeepiiiAuthor Commented:
Hi thanks for your reply ,i was able to create indexes as i told in above query .Took almost 3hrs.Now query  running lot better(30-35min) and its one of the project which have run into trouble and they want instant solution.As you know we dba are always in support when ever there is a trouble and have to give some instant solution.!!!
Thanks for the inputs experts.
0
 
Anthony PerkinsCommented:
As you know we dba are always in support when ever there is a trouble and have to give some instant solution.!!!
That is not entirely true.  A true DBA should only support good solutions and encourage users away from bad approaches, such as the one you have here.
0
 
SandeepiiiAuthor Commented:
I agree with you entirely,thats why iam writing here and taking all your experts advise.
Thanks for your time and advise.
0
 
SandeepiiiAuthor Commented:
Solution was nice.
0
Question has a verified solution.

Are you are experiencing a similar issue? Get a personalized answer when you ask a related question.

Have a better answer? Share it in a comment.

Join & Write a Comment

Featured Post

Free Tool: SSL Checker

Scans your site and returns information about your SSL implementation and certificate. Helpful for debugging and validating your SSL configuration.

One of a set of tools we are providing to everyone as a way of saying thank you for being a part of the community.

  • 6
  • 3
  • 2
  • +1
Tackle projects and never again get stuck behind a technical roadblock.
Join Now