n_srikanth4
asked on
Having duplicates with NOLOCK
Hi Team,
I am getting duplicate records because of NOLOCK. I understand that this hint performs uncommitted (dirty) reads, which can end up returning duplicates.
Could you please suggest the isolation level to use? I am querying the OLTP source table below (BASE_ASSIGNMENTS), which is continuously being updated. I read this table, perform the ETL operations, and store the aggregated data in the warehouse. Before storing in the warehouse, I run the ETL staging step, where I encounter the duplicate issue.
Please correct this query (used to load staging from the source table) and help me with a solution.
SELECT DISTINCT 21 DATASOURCE_ID, B.KVISUMMARYINTID KVISUMMARYINTID, CAST(B.WAREHOUSE_ID AS NVARCHAR(32)) WH_ID
FROM BASE_ASSIGNMENTS B WITH (NOLOCK)
LEFT OUTER JOIN
(SELECT ASSIGN_NUM, MIN(TRNDTE) TRNDTE FROM KVI_SUMMARY_HST WITH (NOLOCK) GROUP BY ASSIGN_NUM
) H
ON B.ASSIGN_NUM = H.ASSIGN_NUM
WHERE B.REPORT_DATE BETWEEN '2016-01-24T00:00:00' AND '2016-01-26T23:59:59'
Thanks,
Sreekanth.
ASKER
Hi Ste5an ,
I am getting duplicate records while reading the data from the source and loading it into the stage table using the query below:
Query:
SELECT DISTINCT 21 DATASOURCE_ID, B.KVISUMMARYINTID KVISUMMARYINTID, CAST(B.WAREHOUSE_ID AS NVARCHAR(32)) WH_ID
FROM BASE_ASSIGNMENTS B WITH (NOLOCK)
LEFT OUTER JOIN
(SELECT ASSIGN_NUM, MIN(TRNDTE) TRNDTE FROM KVI_SUMMARY_HST WITH (NOLOCK) GROUP BY ASSIGN_NUM
) H
ON B.ASSIGN_NUM = H.ASSIGN_NUM
WHERE B.REPORT_DATE BETWEEN '2016-01-24T00:00:00' AND '2016-01-26T23:59:59'
Thanks,
Sreekanth.
ASKER CERTIFIED SOLUTION
So the source is a different system. In that case, don't use NOLOCK or READ UNCOMMITTED.
What's wrong with READ COMMITTED?
Alternatives are SNAPSHOT and RCSI.
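To make the three options concrete, here is a minimal sketch of how each is enabled or selected in T-SQL. The database name MyDb is a placeholder; the ALTER DATABASE statements require appropriate permissions and should be tested before changing a production source system.

```sql
-- READ COMMITTED (the default): simply drop the WITH (NOLOCK) hints,
-- or set the level explicitly for the session:
SET TRANSACTION ISOLATION LEVEL READ COMMITTED;

-- SNAPSHOT isolation: must first be enabled once per database,
-- then requested per session; reads see a consistent point-in-time view.
ALTER DATABASE MyDb SET ALLOW_SNAPSHOT_ISOLATION ON;
SET TRANSACTION ISOLATION LEVEL SNAPSHOT;

-- RCSI (READ COMMITTED SNAPSHOT): a database-level switch; existing
-- READ COMMITTED queries then read row versions instead of taking
-- shared locks, so readers no longer block behind writers.
ALTER DATABASE MyDb SET READ_COMMITTED_SNAPSHOT ON;
```

Note that both SNAPSHOT and RCSI use the version store in tempdb, so they add some tempdb load on the source server.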
ASKER
Hi Ste5an ,
Could you please provide example queries for READ COMMITTED, including the alternatives SNAPSHOT and RCSI? Which is best in this scenario, where the source tables are always being updated while I am reading from them, and I want to avoid reading the same data twice so as not to produce duplicates?
Thanks,
Sreekanth.
What does "source tables are always updating" mean?
ASKER
Ste5an, "source tables are always updating" means they are transactional OLTP systems that are updated all the time; I am a consumer of the data and I read the transactional tables while they are being updated.
For a normal OLTP system, using READ COMMITTED is imho sufficient.
The problem is your definition of "always" and "all the time".
Without a concrete analysis of your system it's hard to tell. There is for sure a tipping point where log shipping to a second instance or stream processing must be considered.
ASKER
ste5an, could you please give a simple example (query) for my scenario of how READ COMMITTED should work, or of other alternatives such as shared locks, log shipping, or stream processing? If so, how do I use them?
Don't use NOLOCK, and check which isolation level you are actually using. When you haven't specified one, it's READ COMMITTED. You can run DBCC USEROPTIONS; to check it.
Choosing among the different isolation levels or strategies requires a concrete analysis, which is out of scope, at least for me.
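Putting the two suggestions together, a session might first verify its settings and then run the staging query without hints. This is a sketch; the exact rows returned by DBCC USEROPTIONS vary by SQL Server version and session settings.

```sql
-- Verify the session's effective settings; the "isolation level" row
-- should read "read committed" when no level has been set explicitly.
DBCC USEROPTIONS;

-- The staging query from the question with the NOLOCK hints removed,
-- running under the default READ COMMITTED level:
SELECT DISTINCT 21 DATASOURCE_ID,
       B.KVISUMMARYINTID KVISUMMARYINTID,
       CAST(B.WAREHOUSE_ID AS NVARCHAR(32)) WH_ID
FROM BASE_ASSIGNMENTS B
LEFT OUTER JOIN
  (SELECT ASSIGN_NUM, MIN(TRNDTE) TRNDTE
   FROM KVI_SUMMARY_HST
   GROUP BY ASSIGN_NUM) H
  ON B.ASSIGN_NUM = H.ASSIGN_NUM
WHERE B.REPORT_DATE BETWEEN '2016-01-24T00:00:00' AND '2016-01-26T23:59:59';
```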
n_srikanth4, did you try to run the process without the NOLOCK hint, as other Experts have already suggested?
ASKER
good
In the ETL process there should be no need for NOLOCK at all, because E-T-L is an ordered process with discrete steps.
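One way to make the Extract step a single discrete, repeatable operation is to wrap it in one snapshot transaction, so the whole load sees one consistent point-in-time view of the source. This is only a sketch: the target table STG_ASSIGNMENTS is hypothetical, and SNAPSHOT isolation must already be enabled on the source database.

```sql
-- Run the whole extract as one snapshot transaction: every row read
-- reflects the same point in time, so concurrent updates on the OLTP
-- tables cannot produce duplicates within this load.
SET TRANSACTION ISOLATION LEVEL SNAPSHOT;
BEGIN TRANSACTION;

-- STG_ASSIGNMENTS is a placeholder name for the staging table.
INSERT INTO STG_ASSIGNMENTS (DATASOURCE_ID, KVISUMMARYINTID, WH_ID)
SELECT DISTINCT 21,
       B.KVISUMMARYINTID,
       CAST(B.WAREHOUSE_ID AS NVARCHAR(32))
FROM BASE_ASSIGNMENTS B
LEFT OUTER JOIN
  (SELECT ASSIGN_NUM, MIN(TRNDTE) TRNDTE
   FROM KVI_SUMMARY_HST
   GROUP BY ASSIGN_NUM) H
  ON B.ASSIGN_NUM = H.ASSIGN_NUM
-- A half-open date range avoids missing rows whose time component
-- falls between 23:59:59 and midnight.
WHERE B.REPORT_DATE >= '2016-01-24T00:00:00'
  AND B.REPORT_DATE <  '2016-01-27T00:00:00';

COMMIT;
```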