Link to home
Start Free TrialLog in
Avatar of n_srikanth4
n_srikanth4Flag for India

asked on

Having duplicates with NOLOCK

Hi Team,

     I am getting duplicate records because of NOLOCk , I understand that this is doing a uncommitted (or) dirty reads and ending up with duplicates.
Could you please suggest the Isloation level to be used as I am querying the OLTP Source  table below (BASE_ASSIGNMENTS) which is continuously being updated all the time. I am reading this table and perform the ETL operations and store the aggregated data in the warehouse . Before I store in the warehouse , I do the ETL staging operation where I encounter the duplicate issue .

Please correct this query (used to load Staging  from source table  ) and help me with the solution.

SELECT DISTINCT 21 DATASOURCE_ID, B.KVISUMMARYINTID KVISUMMARYINTID, CAST(B.WAREHOUSE_ID AS NVARCHAR(32)) WH_ID,
FROM BASE_ASSIGNMENTS B WITH (NOLOCK)
LEFT OUTER JOIN
(SELECT ASSIGN_NUM, MIN(TRNDTE) TRNDTE FROM KVI_SUMMARY_HST WITH (NOLOCK) GROUP BY ASSIGN_NUM
) H
 ON B.ASSIGN_NUM = H.ASSIGN_NUM WHERE B.REPORT_DATE BETWEEN '2016-01-24T00:00:00' AND '2016-01-26T23:59:59'

Thanks,
Sreekanth.
Avatar of ste5an
ste5an
Flag of Germany image

So you're getting those dups while reading the staging table?

In the ETL process there should be no need for NOLOCK at all, cause the E-T-L is an ordered process with discrete steps.
Avatar of n_srikanth4

ASKER

Hi Ste5an ,

   I am getting the duplicate records while reading the data from the source and loading to the stage table using the query below :

Query :

SELECT DISTINCT 21 DATASOURCE_ID, B.KVISUMMARYINTID KVISUMMARYINTID, CAST(B.WAREHOUSE_ID AS NVARCHAR(32)) WH_ID,
FROM BASE_ASSIGNMENTS B WITH (NOLOCK)
LEFT OUTER JOIN
(SELECT ASSIGN_NUM, MIN(TRNDTE) TRNDTE FROM KVI_SUMMARY_HST WITH (NOLOCK) GROUP BY ASSIGN_NUM
) H
 ON B.ASSIGN_NUM = H.ASSIGN_NUM WHERE B.REPORT_DATE BETWEEN '2016-01-24T00:00:00' AND '2016-01-26T23:59:59'


Thanks,
Sreekanth.
ASKER CERTIFIED SOLUTION
Avatar of PortletPaul
PortletPaul
Flag of Australia image

Link to home
membership
This solution is only available to members.
To access this solution, you must be a member of Experts Exchange.
Start Free Trial
So the source is a different system. In this case don't use NOLOCK or READ UNCOMMITED.

What's wrong with READ COMMITED?

Alternatives are SNAPSHOT and RCSI.
Hi Ste5an ,

 Could you please provide me an example queries for READ COMMITED including the alternatives SNAPSHOT and RCSI . What is the best one to use in this scenario where the source tables are always updating and I am reading the data from the source at the same time  and I want to ensure that I want to avoid reading the same data twice  to avoid duplicates while reading the data from the source .

Thanks,
Sreekanth.
What does "source tables are always updating" mean?
Ste5an , source tables are always updating means they are transactional OLTP systems which are updated all the time and I am the consumer of the data and I read the transactional tables while they are being updated .
For a normal OLTP using READ COMMITED is imho sufficient.

The problem is your definition of "always" and "all the time".

Without a concrete analysis of your system it's hard to tell. There is for sure a tipping point, where using log shipping to a second instance or stream processing must be considered.
ste5an, Could you please give a simple example(query) with regards to my scenario how the READ COMMITED should work (or) considering any other aletrnatives like using sharedlocks\log shipping\stream processing , if so how to use them ?
Don't use NOLOCK and check your used isolation level. When you don't have specified one, then it's READ COMMITED. You can run DBCC USEROPTIONS; to check it.

For the usage of different isolation levels or strategies an concrete analysis is required, but this is out-of-scope for at least of me.
n_srikanth4, did you try to run the process without the NOLOCK hint as others Experts already suggested?
good