Avatar of n_srikanth4
n_srikanth4
Flag for India asked on

Having duplicates with NOLOCK

Hi Team,

     I am getting duplicate records because of NOLOCk , I understand that this is doing a uncommitted (or) dirty reads and ending up with duplicates.
Could you please suggest the Isloation level to be used as I am querying the OLTP Source  table below (BASE_ASSIGNMENTS) which is continuously being updated all the time. I am reading this table and perform the ETL operations and store the aggregated data in the warehouse . Before I store in the warehouse , I do the ETL staging operation where I encounter the duplicate issue .

Please correct this query (used to load Staging  from source table  ) and help me with the solution.

SELECT DISTINCT 21 DATASOURCE_ID, B.KVISUMMARYINTID KVISUMMARYINTID, CAST(B.WAREHOUSE_ID AS NVARCHAR(32)) WH_ID,
FROM BASE_ASSIGNMENTS B WITH (NOLOCK)
LEFT OUTER JOIN
(SELECT ASSIGN_NUM, MIN(TRNDTE) TRNDTE FROM KVI_SUMMARY_HST WITH (NOLOCK) GROUP BY ASSIGN_NUM
) H
 ON B.ASSIGN_NUM = H.ASSIGN_NUM WHERE B.REPORT_DATE BETWEEN '2016-01-24T00:00:00' AND '2016-01-26T23:59:59'

Thanks,
Sreekanth.
DatabasesMicrosoft SQL ServerMicrosoft SQL Server 2005Microsoft SQL Server 2008SQL

Avatar of undefined
Last Comment
n_srikanth4

8/22/2022 - Mon
ste5an

So you're getting those dups while reading the staging table?

In the ETL process there should be no need for NOLOCK at all, cause the E-T-L is an ordered process with discrete steps.
n_srikanth4

ASKER
Hi Ste5an ,

   I am getting the duplicate records while reading the data from the source and loading to the stage table using the query below :

Query :

SELECT DISTINCT 21 DATASOURCE_ID, B.KVISUMMARYINTID KVISUMMARYINTID, CAST(B.WAREHOUSE_ID AS NVARCHAR(32)) WH_ID,
FROM BASE_ASSIGNMENTS B WITH (NOLOCK)
LEFT OUTER JOIN
(SELECT ASSIGN_NUM, MIN(TRNDTE) TRNDTE FROM KVI_SUMMARY_HST WITH (NOLOCK) GROUP BY ASSIGN_NUM
) H
 ON B.ASSIGN_NUM = H.ASSIGN_NUM WHERE B.REPORT_DATE BETWEEN '2016-01-24T00:00:00' AND '2016-01-26T23:59:59'


Thanks,
Sreekanth.
ASKER CERTIFIED SOLUTION
PortletPaul

THIS SOLUTION ONLY AVAILABLE TO MEMBERS.
View this solution by signing up for a free trial.
Members can start a 7-Day free trial and enjoy unlimited access to the platform.
See Pricing Options
Start Free Trial
GET A PERSONALIZED SOLUTION
Ask your own question & get feedback from real experts
Find out why thousands trust the EE community with their toughest problems.
ste5an

So the source is a different system. In this case don't use NOLOCK or READ UNCOMMITED.

What's wrong with READ COMMITED?

Alternatives are SNAPSHOT and RCSI.
This is the best money I have ever spent. I cannot not tell you how many times these folks have saved my bacon. I learn so much from the contributors.
rwheeler23
n_srikanth4

ASKER
Hi Ste5an ,

 Could you please provide me an example queries for READ COMMITED including the alternatives SNAPSHOT and RCSI . What is the best one to use in this scenario where the source tables are always updating and I am reading the data from the source at the same time  and I want to ensure that I want to avoid reading the same data twice  to avoid duplicates while reading the data from the source .

Thanks,
Sreekanth.
ste5an

What does "source tables are always updating" mean?
n_srikanth4

ASKER
Ste5an , source tables are always updating means they are transactional OLTP systems which are updated all the time and I am the consumer of the data and I read the transactional tables while they are being updated .
⚡ FREE TRIAL OFFER
Try out a week of full access for free.
Find out why thousands trust the EE community with their toughest problems.
ste5an

For a normal OLTP using READ COMMITED is imho sufficient.

The problem is your definition of "always" and "all the time".

Without a concrete analysis of your system it's hard to tell. There is for sure a tipping point, where using log shipping to a second instance or stream processing must be considered.
n_srikanth4

ASKER
ste5an, Could you please give a simple example(query) with regards to my scenario how the READ COMMITED should work (or) considering any other aletrnatives like using sharedlocks\log shipping\stream processing , if so how to use them ?
ste5an

Don't use NOLOCK and check your used isolation level. When you don't have specified one, then it's READ COMMITED. You can run DBCC USEROPTIONS; to check it.

For the usage of different isolation levels or strategies an concrete analysis is required, but this is out-of-scope for at least of me.
Experts Exchange has (a) saved my job multiple times, (b) saved me hours, days, and even weeks of work, and often (c) makes me look like a superhero! This place is MAGIC!
Walt Forbes
Vitor Montalvão

n_srikanth4, did you try to run the process without the NOLOCK hint as others Experts already suggested?
n_srikanth4

ASKER
good