Solved

SQL Server

Posted on 2013-06-19
12
244 Views
Last Modified: 2013-06-24
SELECT INST_KEY, ANTAL_KONTI, ANTAL_BTO_NYUDLAAN,
ANTAL_FORTIDINDFRI, BTO_NYUDLAAN, FORTIDINDFRI,
A.INDEKS_FAKTOR_H,
A.INDEKS_FAKTOR_K,
ORDINAERE_AFDRAG, UDBYTTE,
SIKKERHED_BELOEB, RENTER, BIDRAG_ULTIMO, EFF_RENTE_FORR_OMF, EFF_RENTE_ULTIMO,
AAO_belob AS AAO_BELOEB,
FORR_OMFANG, GNS_VAL, PAA_LOEBNE_RENTER, A.NOM_RENTE_SATS,
PRIMO, BOERSKURS_REG, NETTO_TRANS, TAB, VALUTA_REG, REKLASS,
VAERDI_REG, ULTIMO, PRIMO_MV, REKLASS_MV, TAB_MV,
VALUTA_REG_MV, NETTO_TRANS_MV, ULTIMO_MV, BOERSKURS_REG_MV,
PRIMO_NOM, NETTO_TRANS_NOM, REKLASS_NOM, TAB_NOM, VALUTA_REG_NOM,
ULTIMO_NOM, PRIMO_ANTAL, NETTO_TRANS_ANTAL, ULTIMO_ANTAL,
A.EFF_DT, KTTP,
LOAN_PURPOSE, IDKT_START_DT, REDEEMABLE_CODE,
NOM_INDEKS_MK, RENTE_SKIFT_DT, MDR_TIL_RENTESKIFT, STEPUP_DT, UDLOEB_DT,
A.INTERNET_ACCESS_MK,
ULTIMO_OPRVAL_MV, ULTIMO_OPRVAL_NOM, PRIMO_OPRVAL_MV,
PRIMO_OPRVAL_NOM, AFDRAG_DT, AFDRAG_DT_MK, NET_OPRVAL_NOM,
NET_OPRVAL_NOM_MK, NET_OPRVAL_MV, NET_OPRVAL_MV_MK, TAB_OPRVAL_NOM,
TAB_OPRVAL_MV, B_REG_OPRVAL, B_REG_OPRVAL_MK, VAL_REG_OPR_MK, NET_TRANS_JUST_MV, NET_TRANS_JUST_NOM, NET_TRANS_JUST,
ULTIMO_KORR_MV, ULTIMO_KORR_NOM, VAL_KURS_PRIMO,
VAL_KURS_PRIMO_MK, VAL_KURS_ULTIMO, VAL_KURS_ULTIMO_MK,
SEKTOR_INFO, CORR_MK, CORR_UPDATED_MK, VAL_KURS_GNS, REPORTING_DT, ONE_DAY_MAT_MK, ULTIMO_OPRVAL,
LTrim(RTrim(FULLPDST)) AS FULLPDST,
afdrag_opr, b_reg_oprval_mv, b_reg_oprval_mv_mk,
deriv_type, indeks_mk, koeb_salg_mk, new_business_cause,
payment_mk, position, ppstm_papir_type, uafviklet_mk, ultimo_korr,
used_in_repo_mk, val_reg_opr, val_reg_opr_mv_mk, val_reg_opr_mv,
LTrim(RTrim(A.IDKT)) AS IDKT,
LTrim(RTrim(VALUTA)) AS VALUTA,
LTrim(RTrim(AMOUNT_TYPE)) AS AMOUNT_TYPE,
LTrim(RTrim(EXT_CONV_MK)) AS EXT_CONV_MK,
LTrim(RTrim(NEW_BUS_MK)) AS NEW_BUS_MK,
LTrim(RTrim(KNID)) AS KNID,
LTrim(RTrim(REG_NO)) AS REG_NO,
opts_acc,
opts_cust ,
SURR_ID,
LTrim(RTrim(BRANCHE)) AS BRANCHE ,
LTrim(RTrim(CVR)) AS CVR,
LTrim(RTrim(KOMMUNE)) AS KOMMUNE,
LTrim(RTrim(LAND)) AS LAND,
SubString(LTrim(RTrim(LAND_M)), 1,2) AS LAND_M,
LTrim(RTrim(SEKTOR)) AS SEKTOR ,
LTrim(RTrim(SEKTOR_2)) AS SEKTOR_2 ,
LTrim(RTrim(SEKTOR_M1)) AS  SEKTOR_M1,
LTrim(RTrim(KNGR)) AS KNGR,
LTrim(RTrim(A.FAST_RENTE_MK)) AS FAST_RENTE_MK,
LTrim(RTrim(A.SIKKERHED_KD)) AS SIKKERHED_KD ,
EXCHANGE_M_FACTOR,
B.SURR_ID_ACC  AS  SURR_ID_ACC  
FROM [RHRH_MFI3].[dim].BOD_RHB_GFS_REPORTING A
INNER JOIN
[RHRH_MFI3].[dim].BOD_ACC_DIM  B
ON
A.IDKT = B.IDKT and A.opts_acc = B.EFF_DT
WHERE
A.LoadStatusId = 415763642
and SURR_ID in (select SURR_ID from [RHRH_MFI3].[dim].BOD_RHB_GFS_REPORTING
                              where LoadStatusId = 415763642
                              group by surr_id
                              having count(*) = 1)
Hi Experts,

This table BOD_RHB_GFS_REPORTING has these many rows.220298911.
Iam trying to find the duplicates as well is there any other way to write this query and find the duplicates as well.
Thanks in advance.
0
Comment
Question by:Sandeepiii
  • 6
  • 3
  • 2
  • +1
12 Comments
 
LVL 8

Expert Comment

by:didnthaveaname
Comment Utility
Can you give us the definition of the BOD_RHB_GFS_REPORTING table as well as an example of 2 rows that are duplicates? (I am thinking a self join would work well for this, but it's tough to say without knowing what constitutes a duplicate and what the structure of the table is).  Thanks!
0
 

Author Comment

by:Sandeepiii
Comment Utility
well generally there won't be any duplicates ,but we just try to find them.
here is the table definition,i think putting index on SURR_ID,LoadStatusId colu,mn can help.
but is it okay if i ,make non clustered index or non clustered on these column.
Thanks for your reply and answer.
Its joining with table BOD_ACC_DIM which is having these many rows 44633965
definition.xlsx
0
 
LVL 8

Expert Comment

by:didnthaveaname
Comment Utility
Is there/what is the primary key on this table?  And just for clarification: you're looking for instances where the exact same data, aside from the primary key, exists, correct?
0
 
LVL 48

Accepted Solution

by:
PortletPaul earned 500 total points
Comment Utility
From a reporting table of 220m rows and 193 columns, a query of 126 columns, plus an inner join to some other table, then some where conditions involving on the reporting table.

Can all of that, plus detection of 'duplicates' (in the reporting table) be included in the one query?

I wouldn't recommend it.
If you believe there are duplicates in the reporting table I'd be wanting to figure that out independently. As others have indicated above you haven't really defined what you mean by duplicate. Is it any 2+ rows that share exactly the same information across all 193 columns? if it isn't that, well, what set of conditions do you consider to be a duplicate?

{+edit - sorry}
WHERE A.LoadStatusId = 415763642
      AND SURR_ID IN (
            SELECT SURR_ID
            FROM [RHRH_MFI3].[dim].BOD_RHB_GFS_REPORTING
            WHERE LoadStatusId = 415763642
            GROUP BY surr_id
            HAVING count(*) > 1 /*any results from this? */
            )

Do you get any results from your existing query if the HAVING condition is changed?
Is this what you consider to be the duplication?
0
 

Author Comment

by:Sandeepiii
Comment Utility
Is it any 2+ rows that share exactly the same information across all 193 columns--Yes this the meaning of duplicate.
Thanks for your comments.
0
 
LVL 48

Expert Comment

by:PortletPaul
Comment Utility
220m rows and 193 columns
compare all fields of all rows for full-row duplication

Will you start this query on Friday and hope it's finished by Monday morning?
(i.e. this is not going to be quick or easy)
I'm not sure I want to prepare this query - it's going to be very slow.


is SURR_ID a unique primary key?  (it is non-null)
what Indexes exist on the table? [BOD_RHB_GFS_REPORTING]

what makes you believe there are duplicates? (is there any proof they exist?)
0
Free Gift Card with Acronis Backup Purchase!

Backup any data in any location: local and remote systems, physical and virtual servers, private and public clouds, Macs and PCs, tablets and mobile devices, & more! For limited time only, buy any Acronis backup products and get a FREE Amazon/Best Buy gift card worth up to $200!

 

Author Comment

by:Sandeepiii
Comment Utility
thanks,
right now we are creating
CREATE CLUSTERED INDEX [ixcBOD_LoadStatusIdsurr_id] ON [DIM].[BOD_RHB_GFS_REPORTING]
(
      [LoadStatusId] ASC,
      [surr_id]ASC
)WITH (PAD_INDEX  = OFF, STATISTICS_NORECOMPUTE  = OFF, SORT_IN_TEMPDB = ON, IGNORE_DUP_KEY = OFF, DROP_EXISTING = OFF, ONLINE = OFF, ALLOW_ROW_LOCKS  = ON, ALLOW_PAGE_LOCKS  = ON) ON [PRIMARY]
GO
and its taking hell lots of time.We are hoping after index the query might speed up atleast by 50% of the time.
0
 
LVL 48

Expert Comment

by:PortletPaul
Comment Utility
For a reporting table, which contains a non-null date (eff_dt) - it would seem to me you are likely to need an index on at least that  as well (for date range filtering as an example)

There are several other date fields (e.g. reporting_dt) which might be even more important for filtering - but it is nullable. I don't recognize many of the field names so I really can't comment in any depth.

to be honest, it surprises me you are only now adding indexes
and, you still haven't indicated if surr_id is unique or not (I get the feeling it isn't) {+edit}
0
 

Author Comment

by:Sandeepiii
Comment Utility
Hi thanks for your reply ,i was able to create indexes as i told in above query .Took almost 3hrs.Now query  running lot better(30-35min) and its one of the project which have run into trouble and they want instant solution.As you know we dba are always in support when ever there is a trouble and have to give some instant solution.!!!
Thanks for the inputs experts.
0
 
LVL 75

Expert Comment

by:Anthony Perkins
Comment Utility
As you know we dba are always in support when ever there is a trouble and have to give some instant solution.!!!
That is not entirely true.  A true DBA should only support good solutions and encourage users away from bad approaches, such as the one you have here.
0
 

Author Comment

by:Sandeepiii
Comment Utility
I agree with you entirely,thats why iam writing here and taking all your experts advise.
Thanks for your time and advise.
0
 

Author Closing Comment

by:Sandeepiii
Comment Utility
Solution was nice.
0

Featured Post

How your wiki can always stay up-to-date

Quip doubles as a “living” wiki and a project management tool that evolves with your organization. As you finish projects in Quip, the work remains, easily accessible to all team members, new and old.
- Increase transparency
- Onboard new hires faster
- Access from mobile/offline

Join & Write a Comment

Ever needed a SQL 2008 Database replicated/mirrored/log shipped on another server but you can't take the downtime inflicted by initial snapshot or disconnect while T-logs are restored or mirror applied? You can use SQL Server Initialize from Backup…
The Delta outage: 650 cancelled flights, more than 1200 delayed flights, thousands of frustrated customers, tens of millions of dollars in damages – plus untold reputational damage to one of the world’s most trusted airlines. All due to a catastroph…
Via a live example, show how to backup a database, simulate a failure backup the tail of the database transaction log and perform the restore.
Via a live example, show how to setup several different housekeeping processes for a SQL Server.

744 members asked questions and received personalized solutions in the past 7 days.

Join the community of 500,000 technology professionals and ask your questions.

Join & Ask a Question

Need Help in Real-Time?

Connect with top rated Experts

13 Experts available now in Live!

Get 1:1 Help Now