Having duplicates with NOLOCK

Hi Team,

     I am getting duplicate records because of NOLOCk , I understand that this is doing a uncommitted (or) dirty reads and ending up with duplicates.
Could you please suggest the Isloation level to be used as I am querying the OLTP Source  table below (BASE_ASSIGNMENTS) which is continuously being updated all the time. I am reading this table and perform the ETL operations and store the aggregated data in the warehouse . Before I store in the warehouse , I do the ETL staging operation where I encounter the duplicate issue .

Please correct this query (used to load Staging  from source table  ) and help me with the solution.

SELECT DISTINCT 21 DATASOURCE_ID, B.KVISUMMARYINTID KVISUMMARYINTID, CAST(B.WAREHOUSE_ID AS NVARCHAR(32)) WH_ID,
FROM BASE_ASSIGNMENTS B WITH (NOLOCK)
LEFT OUTER JOIN
(SELECT ASSIGN_NUM, MIN(TRNDTE) TRNDTE FROM KVI_SUMMARY_HST WITH (NOLOCK) GROUP BY ASSIGN_NUM
) H
 ON B.ASSIGN_NUM = H.ASSIGN_NUM WHERE B.REPORT_DATE BETWEEN '2016-01-24T00:00:00' AND '2016-01-26T23:59:59'

Thanks,
Sreekanth.
n_srikanth4Asked:
Who is Participating?
I wear a lot of hats...

"The solutions and answers provided on Experts Exchange have been extremely helpful to me over the last few years. I wear a lot of hats - Developer, Database Administrator, Help Desk, etc., so I know a lot of things but not a lot about one thing. Experts Exchange gives me answers from people who do know a lot about one thing, in a easy to use platform." -Todd S.

ste5anSenior DeveloperCommented:
So you're getting those dups while reading the staging table?

In the ETL process there should be no need for NOLOCK at all, cause the E-T-L is an ordered process with discrete steps.
n_srikanth4Author Commented:
Hi Ste5an ,

   I am getting the duplicate records while reading the data from the source and loading to the stage table using the query below :

Query :

SELECT DISTINCT 21 DATASOURCE_ID, B.KVISUMMARYINTID KVISUMMARYINTID, CAST(B.WAREHOUSE_ID AS NVARCHAR(32)) WH_ID,
FROM BASE_ASSIGNMENTS B WITH (NOLOCK)
LEFT OUTER JOIN
(SELECT ASSIGN_NUM, MIN(TRNDTE) TRNDTE FROM KVI_SUMMARY_HST WITH (NOLOCK) GROUP BY ASSIGN_NUM
) H
 ON B.ASSIGN_NUM = H.ASSIGN_NUM WHERE B.REPORT_DATE BETWEEN '2016-01-24T00:00:00' AND '2016-01-26T23:59:59'


Thanks,
Sreekanth.
PortletPaulEE Topic AdvisorCommented:
Do NOT use "with (nolock)", just remove it from the query.

Also, you should NOT use "between" for your date range. The way you are currently using it creates a one second gap. Instead of
BETWEEN '2016-01-24T00:00:00' AND '2016-01-26T23:59:59'
use
column >= '20160124'  AND column < '20160127' --YYYYMMDD is the safest date format in SQL Server
SELECT DISTINCT
      21 DATASOURCE_ID
    , B.KVISUMMARYINTID KVISUMMARYINTID
    , CAST(B.WAREHOUSE_ID AS nvarchar(32)) WH_ID
   ....
   , H.TRNDTE
FROM BASE_ASSIGNMENTS B
      LEFT OUTER JOIN (
            SELECT
                  ASSIGN_NUM
                , MIN(TRNDTE) TRNDTE
            FROM KVI_SUMMARY_HST WITH (NOLOCK)
            GROUP BY
                  ASSIGN_NUM
      ) H ON B.ASSIGN_NUM = H.ASSIGN_NUM
WHERE B.REPORT_DATE >= '20160124'
      AND B.REPORT_DATE < '20160127'

Open in new window

for more on the between topic please see: "Beware of Between"
I assume H.TRNDTE is actually used in the query otherwise you should just stop using the derived table

Experts Exchange Solution brought to you by

Your issues matter to us.

Facing a tech roadblock? Get the help and guidance you need from experienced professionals who care. Ask your question anytime, anywhere, with no hassle.

Start your 7-day free trial
Your Guide to Achieving IT Business Success

The IT Service Excellence Tool Kit has best practices to keep your clients happy and business booming. Inside, you’ll find everything you need to increase client satisfaction and retention, become more competitive, and increase your overall success.

ste5anSenior DeveloperCommented:
So the source is a different system. In this case don't use NOLOCK or READ UNCOMMITED.

What's wrong with READ COMMITED?

Alternatives are SNAPSHOT and RCSI.
n_srikanth4Author Commented:
Hi Ste5an ,

 Could you please provide me an example queries for READ COMMITED including the alternatives SNAPSHOT and RCSI . What is the best one to use in this scenario where the source tables are always updating and I am reading the data from the source at the same time  and I want to ensure that I want to avoid reading the same data twice  to avoid duplicates while reading the data from the source .

Thanks,
Sreekanth.
ste5anSenior DeveloperCommented:
What does "source tables are always updating" mean?
n_srikanth4Author Commented:
Ste5an , source tables are always updating means they are transactional OLTP systems which are updated all the time and I am the consumer of the data and I read the transactional tables while they are being updated .
ste5anSenior DeveloperCommented:
For a normal OLTP using READ COMMITED is imho sufficient.

The problem is your definition of "always" and "all the time".

Without a concrete analysis of your system it's hard to tell. There is for sure a tipping point, where using log shipping to a second instance or stream processing must be considered.
n_srikanth4Author Commented:
ste5an, Could you please give a simple example(query) with regards to my scenario how the READ COMMITED should work (or) considering any other aletrnatives like using sharedlocks\log shipping\stream processing , if so how to use them ?
ste5anSenior DeveloperCommented:
Don't use NOLOCK and check your used isolation level. When you don't have specified one, then it's READ COMMITED. You can run DBCC USEROPTIONS; to check it.

For the usage of different isolation levels or strategies an concrete analysis is required, but this is out-of-scope for at least of me.
Vitor MontalvãoMSSQL Senior EngineerCommented:
n_srikanth4, did you try to run the process without the NOLOCK hint as others Experts already suggested?
n_srikanth4Author Commented:
good
It's more than this solution.Get answers and train to solve all your tech problems - anytime, anywhere.Try it for free Edge Out The Competitionfor your dream job with proven skills and certifications.Get started today Stand Outas the employee with proven skills.Start learning today for free Move Your Career Forwardwith certification training in the latest technologies.Start your trial today
Microsoft SQL Server 2008

From novice to tech pro — start learning today.