Solved

SQL Server

Posted on 2013-06-19
12
254 Views
Last Modified: 2013-06-24
SELECT INST_KEY, ANTAL_KONTI, ANTAL_BTO_NYUDLAAN,
ANTAL_FORTIDINDFRI, BTO_NYUDLAAN, FORTIDINDFRI,
A.INDEKS_FAKTOR_H,
A.INDEKS_FAKTOR_K,
ORDINAERE_AFDRAG, UDBYTTE,
SIKKERHED_BELOEB, RENTER, BIDRAG_ULTIMO, EFF_RENTE_FORR_OMF, EFF_RENTE_ULTIMO,
AAO_belob AS AAO_BELOEB,
FORR_OMFANG, GNS_VAL, PAA_LOEBNE_RENTER, A.NOM_RENTE_SATS,
PRIMO, BOERSKURS_REG, NETTO_TRANS, TAB, VALUTA_REG, REKLASS,
VAERDI_REG, ULTIMO, PRIMO_MV, REKLASS_MV, TAB_MV,
VALUTA_REG_MV, NETTO_TRANS_MV, ULTIMO_MV, BOERSKURS_REG_MV,
PRIMO_NOM, NETTO_TRANS_NOM, REKLASS_NOM, TAB_NOM, VALUTA_REG_NOM,
ULTIMO_NOM, PRIMO_ANTAL, NETTO_TRANS_ANTAL, ULTIMO_ANTAL,
A.EFF_DT, KTTP,
LOAN_PURPOSE, IDKT_START_DT, REDEEMABLE_CODE,
NOM_INDEKS_MK, RENTE_SKIFT_DT, MDR_TIL_RENTESKIFT, STEPUP_DT, UDLOEB_DT,
A.INTERNET_ACCESS_MK,
ULTIMO_OPRVAL_MV, ULTIMO_OPRVAL_NOM, PRIMO_OPRVAL_MV,
PRIMO_OPRVAL_NOM, AFDRAG_DT, AFDRAG_DT_MK, NET_OPRVAL_NOM,
NET_OPRVAL_NOM_MK, NET_OPRVAL_MV, NET_OPRVAL_MV_MK, TAB_OPRVAL_NOM,
TAB_OPRVAL_MV, B_REG_OPRVAL, B_REG_OPRVAL_MK, VAL_REG_OPR_MK, NET_TRANS_JUST_MV, NET_TRANS_JUST_NOM, NET_TRANS_JUST,
ULTIMO_KORR_MV, ULTIMO_KORR_NOM, VAL_KURS_PRIMO,
VAL_KURS_PRIMO_MK, VAL_KURS_ULTIMO, VAL_KURS_ULTIMO_MK,
SEKTOR_INFO, CORR_MK, CORR_UPDATED_MK, VAL_KURS_GNS, REPORTING_DT, ONE_DAY_MAT_MK, ULTIMO_OPRVAL,
LTrim(RTrim(FULLPDST)) AS FULLPDST,
afdrag_opr, b_reg_oprval_mv, b_reg_oprval_mv_mk,
deriv_type, indeks_mk, koeb_salg_mk, new_business_cause,
payment_mk, position, ppstm_papir_type, uafviklet_mk, ultimo_korr,
used_in_repo_mk, val_reg_opr, val_reg_opr_mv_mk, val_reg_opr_mv,
LTrim(RTrim(A.IDKT)) AS IDKT,
LTrim(RTrim(VALUTA)) AS VALUTA,
LTrim(RTrim(AMOUNT_TYPE)) AS AMOUNT_TYPE,
LTrim(RTrim(EXT_CONV_MK)) AS EXT_CONV_MK,
LTrim(RTrim(NEW_BUS_MK)) AS NEW_BUS_MK,
LTrim(RTrim(KNID)) AS KNID,
LTrim(RTrim(REG_NO)) AS REG_NO,
opts_acc,
opts_cust ,
SURR_ID,
LTrim(RTrim(BRANCHE)) AS BRANCHE ,
LTrim(RTrim(CVR)) AS CVR,
LTrim(RTrim(KOMMUNE)) AS KOMMUNE,
LTrim(RTrim(LAND)) AS LAND,
SubString(LTrim(RTrim(LAND_M)), 1,2) AS LAND_M,
LTrim(RTrim(SEKTOR)) AS SEKTOR ,
LTrim(RTrim(SEKTOR_2)) AS SEKTOR_2 ,
LTrim(RTrim(SEKTOR_M1)) AS  SEKTOR_M1,
LTrim(RTrim(KNGR)) AS KNGR,
LTrim(RTrim(A.FAST_RENTE_MK)) AS FAST_RENTE_MK,
LTrim(RTrim(A.SIKKERHED_KD)) AS SIKKERHED_KD ,
EXCHANGE_M_FACTOR,
B.SURR_ID_ACC  AS  SURR_ID_ACC  
FROM [RHRH_MFI3].[dim].BOD_RHB_GFS_REPORTING A
INNER JOIN
[RHRH_MFI3].[dim].BOD_ACC_DIM  B
ON
A.IDKT = B.IDKT and A.opts_acc = B.EFF_DT
WHERE
A.LoadStatusId = 415763642
and SURR_ID in (select SURR_ID from [RHRH_MFI3].[dim].BOD_RHB_GFS_REPORTING
                              where LoadStatusId = 415763642
                              group by surr_id
                              having count(*) = 1)
Hi Experts,

This table BOD_RHB_GFS_REPORTING has these many rows.220298911.
Iam trying to find the duplicates as well is there any other way to write this query and find the duplicates as well.
Thanks in advance.
0
Comment
Question by:Sandeepiii
  • 6
  • 3
  • 2
  • +1
12 Comments
 
LVL 8

Expert Comment

by:didnthaveaname
ID: 39259683
Can you give us the definition of the BOD_RHB_GFS_REPORTING table as well as an example of 2 rows that are duplicates? (I am thinking a self join would work well for this, but it's tough to say without knowing what constitutes a duplicate and what the structure of the table is).  Thanks!
0
 

Author Comment

by:Sandeepiii
ID: 39259818
well generally there won't be any duplicates ,but we just try to find them.
here is the table definition,i think putting index on SURR_ID,LoadStatusId colu,mn can help.
but is it okay if i ,make non clustered index or non clustered on these column.
Thanks for your reply and answer.
Its joining with table BOD_ACC_DIM which is having these many rows 44633965
definition.xlsx
0
 
LVL 8

Expert Comment

by:didnthaveaname
ID: 39259876
Is there/what is the primary key on this table?  And just for clarification: you're looking for instances where the exact same data, aside from the primary key, exists, correct?
0
PRTG Network Monitor: Intuitive Network Monitoring

Network Monitoring is essential to ensure that computer systems and network devices are running. Use PRTG to monitor LANs, servers, websites, applications and devices, bandwidth, virtual environments, remote systems, IoT, and many more. PRTG is easy to set up & use.

 
LVL 48

Accepted Solution

by:
PortletPaul earned 500 total points
ID: 39261187
From a reporting table of 220m rows and 193 columns, a query of 126 columns, plus an inner join to some other table, then some where conditions involving on the reporting table.

Can all of that, plus detection of 'duplicates' (in the reporting table) be included in the one query?

I wouldn't recommend it.
If you believe there are duplicates in the reporting table I'd be wanting to figure that out independently. As others have indicated above you haven't really defined what you mean by duplicate. Is it any 2+ rows that share exactly the same information across all 193 columns? if it isn't that, well, what set of conditions do you consider to be a duplicate?

{+edit - sorry}
WHERE A.LoadStatusId = 415763642
      AND SURR_ID IN (
            SELECT SURR_ID
            FROM [RHRH_MFI3].[dim].BOD_RHB_GFS_REPORTING
            WHERE LoadStatusId = 415763642
            GROUP BY surr_id
            HAVING count(*) > 1 /*any results from this? */
            )

Do you get any results from your existing query if the HAVING condition is changed?
Is this what you consider to be the duplication?
0
 

Author Comment

by:Sandeepiii
ID: 39261579
Is it any 2+ rows that share exactly the same information across all 193 columns--Yes this the meaning of duplicate.
Thanks for your comments.
0
 
LVL 48

Expert Comment

by:PortletPaul
ID: 39261695
220m rows and 193 columns
compare all fields of all rows for full-row duplication

Will you start this query on Friday and hope it's finished by Monday morning?
(i.e. this is not going to be quick or easy)
I'm not sure I want to prepare this query - it's going to be very slow.


is SURR_ID a unique primary key?  (it is non-null)
what Indexes exist on the table? [BOD_RHB_GFS_REPORTING]

what makes you believe there are duplicates? (is there any proof they exist?)
0
 

Author Comment

by:Sandeepiii
ID: 39261783
thanks,
right now we are creating
CREATE CLUSTERED INDEX [ixcBOD_LoadStatusIdsurr_id] ON [DIM].[BOD_RHB_GFS_REPORTING]
(
      [LoadStatusId] ASC,
      [surr_id]ASC
)WITH (PAD_INDEX  = OFF, STATISTICS_NORECOMPUTE  = OFF, SORT_IN_TEMPDB = ON, IGNORE_DUP_KEY = OFF, DROP_EXISTING = OFF, ONLINE = OFF, ALLOW_ROW_LOCKS  = ON, ALLOW_PAGE_LOCKS  = ON) ON [PRIMARY]
GO
and its taking hell lots of time.We are hoping after index the query might speed up atleast by 50% of the time.
0
 
LVL 48

Expert Comment

by:PortletPaul
ID: 39261970
For a reporting table, which contains a non-null date (eff_dt) - it would seem to me you are likely to need an index on at least that  as well (for date range filtering as an example)

There are several other date fields (e.g. reporting_dt) which might be even more important for filtering - but it is nullable. I don't recognize many of the field names so I really can't comment in any depth.

to be honest, it surprises me you are only now adding indexes
and, you still haven't indicated if surr_id is unique or not (I get the feeling it isn't) {+edit}
0
 

Author Comment

by:Sandeepiii
ID: 39262227
Hi thanks for your reply ,i was able to create indexes as i told in above query .Took almost 3hrs.Now query  running lot better(30-35min) and its one of the project which have run into trouble and they want instant solution.As you know we dba are always in support when ever there is a trouble and have to give some instant solution.!!!
Thanks for the inputs experts.
0
 
LVL 75

Expert Comment

by:Anthony Perkins
ID: 39264812
As you know we dba are always in support when ever there is a trouble and have to give some instant solution.!!!
That is not entirely true.  A true DBA should only support good solutions and encourage users away from bad approaches, such as the one you have here.
0
 

Author Comment

by:Sandeepiii
ID: 39264987
I agree with you entirely,thats why iam writing here and taking all your experts advise.
Thanks for your time and advise.
0
 

Author Closing Comment

by:Sandeepiii
ID: 39271035
Solution was nice.
0

Featured Post

Optimizing Cloud Backup for Low Bandwidth

With cloud storage prices going down a growing number of SMBs start to use it for backup storage. Unfortunately, business data volume rarely fits the average Internet speed. This article provides an overview of main Internet speed challenges and reveals backup best practices.

Question has a verified solution.

If you are experiencing a similar issue, please ask a related question

Suggested Solutions

Why is this different from all of the other step by step guides?  Because I make a living as a DBA and not as a writer and I lived through this experience. Defining the name: When I talk to people they say different names on this subject stuff l…
Occasionally there is a need to clean table columns, especially if you have inherited legacy data. There are obviously many ways to accomplish that, including elaborate UPDATE queries with anywhere from one to numerous REPLACE functions (even within…
Via a live example combined with referencing Books Online, show some of the information that can be extracted from the Catalog Views in SQL Server.
Viewers will learn how the fundamental information of how to create a table.

828 members asked questions and received personalized solutions in the past 7 days.

Join the community of 500,000 technology professionals and ask your questions.

Join & Ask a Question