Solved

working table violations

Posted on 2008-06-11
Last Modified: 2008-06-16
I have a proc which bulk loads a ton of data into two tables daily (see below).  Unfortunately, I have to change it to write to a working table FIRST, and then to the perm table.   This is because the datasource is having issues compiling the records, and it continually violates the PK.  So...

1. I need to bulk insert into working first.
2. I will then copy all distinct values from working into perm table.
3. I will then purge all records from working EXCEPT for the violations.

This last step is purely due to the volume of data (3 to 8M records nightly); I will retain those 'violations' for 24 hrs so that whoever needs to can review them and, hopefully, resolve the reason for their existence.

Then, tomorrow night when the job runs again (invokes the procedure), I will purge that working table with a new first step in the proc (truncate table working).
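
In outline, the nightly flow looks something like this (a rough sketch only - the is_violation flag column is hypothetical here, and the real tables and files are in the proc below):

-- step 0 (tomorrow's new first step): clear yesterday's leftovers
TRUNCATE TABLE database.dbo.table1Working

-- step 1: raw load into working, where there is no PK to violate
BULK INSERT database.dbo.table1Working FROM '\\ServerName\ShareName\file1.csv' WITH (FORMATFILE = '...')

-- mark the violations: keys that already live in the perm table
UPDATE w SET is_violation = 1
FROM database.dbo.table1Working w
JOIN database.dbo.table1 o
  ON o.OrderNo = w.OrderNo AND o.MIndex = w.MIndex

-- step 2: copy only the clean rows into perm
INSERT INTO database.dbo.table1 (/* column list */)
SELECT /* column list */ FROM database.dbo.table1Working WHERE is_violation = 0

-- step 3: purge everything but the violations
DELETE FROM database.dbo.table1Working WHERE is_violation = 0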

My PK is non-clustered, on (OrderNo, MIndex).

Example failure - joeApp wrote two records to the perm table back on 5/18 with this orderno, mindex:
2016, 0
2016, 1

They attempted to write two new records today with the same OrderNo, MIndex.   I know I could alter the PK to include date or something, and that would prevent this type of failure, but I'd really rather avoid that if I can.  They say this problem is 'likely every now and then', and I need to work around it.

All of this is done in one procedure, which is cool; I just need to change the proc to account for steps 2 and 3.  I am concerned about the batched insert from working into perm, to avoid blowing the tran log, and I am concerned about the best way to purge everything BUT the violations.

I was wondering if anybody has any advice or suggestions on either.

CREATE PROC dbo.procname
AS
SET NOCOUNT ON

DECLARE @maxid1  int
DECLARE @maxid2  int
SELECT @maxid1 = IsNull (max(eventID), 0) from database.dbo.table1
SELECT @maxid2 = IsNull (max(eventID), 0) from database.dbo.table2

BEGIN
   BEGIN
   DECLARE @fileExists1 INT
   EXEC master.dbo.xp_fileexist '\\ServerName\ShareName\file1.csv', @fileExists1 OUTPUT
   IF (@fileExists1 = 1)
     BEGIN
       BULK INSERT database.dbo.table1
       FROM '\\ServerName\ShareName\file1.csv'
       WITH(FORMATFILE = 'E:\MSSQL\Tools\file1.fmt',FIELDTERMINATOR = '\t',ROWTERMINATOR = '\r',FIRSTROW=1,TABLOCK,ROWS_PER_BATCH = 100000)
     END
     ELSE
      PRINT 'file1.csv does not exist.'
     END

BEGIN
  DECLARE @fileExists32 INT
  EXEC master.dbo.xp_fileExist '\\ServerName\ShareName\file2.csv',@fileExists32 OUTPUT
  IF @fileExists32 = 1
     BEGIN
       BULK INSERT database.dbo.table2
       FROM '\\ServerName\ShareName\file2.csv'
       WITH(FORMATFILE = 'E:\MSSQL\Tools\file2.fmt',FIELDTERMINATOR = ',',ROWTERMINATOR = '\r',FIRSTROW=1,TABLOCK,ROWS_PER_BATCH = 100000)
    END
    ELSE
        PRINT 'file2.csv does not exist.'
    END

SET NOCOUNT ON
INSERT database.dbo.ProcessLog (EventIDStart,EventIdEnd,InsertTime)
SELECT @maxid1 + 1, (select IsNull (max(EventID), 0)  from database.dbo.table1),  getdate()
SET NOCOUNT OFF

SET NOCOUNT ON
INSERT database.dbo.ProcessLog (EventIDStart,EventIdEnd,InsertTime)
SELECT @maxid2 + 1, (select IsNull (max(EventID), 0)  from database.dbo.table2),  getdate()
SET NOCOUNT OFF

EXEC master..xp_cmdshell '\\ServerName\E$\MSSQL\Tools\databaseName.bat', no_output
END

SET NOCOUNT OFF
GO

Question by:dbaSQL

Author Comment by:dbaSQL

1. Easy - changed the bulk insert to target the working table.
2. Copy all distinct values from the working table into perm --- I am certain this could be smarter; I am not even sure my left join is correct.  Basically, there may be existing values, so I'm trying to insert into table1 from table1working where working.orderno isn't already in table1 (you know... the new stuff).
I would love to hear suggestions.  This is what I've got - I have changed the proc I posted up there so the bulk insert goes into the working tables, and each is followed by one of these:

set rowcount 100000
declare @rc1 int
set @rc1 = 100000
while @rc1 = 100000
begin
    begin transaction

    INSERT INTO database.dbo.table1
    (EventTime,MsgType,MIndex,OrderNo,ClorID,ExchangeID,GWID,EndPoint,Trader,Acct,Client,Symbol,Price,Side,Quantity,TimeInForce,ExecInstruction)
    SELECT DISTINCT w.EventTime,w.MsgType,w.MIndex,w.OrderNo,w.ClorID,w.ExchangeID,w.GWID,w.EndPoint,w.Trader,w.Acct,w.Client,w.Symbol,w.Price,w.Side,w.Quantity,w.TimeInForce,w.ExecInstruction
    FROM database.dbo.table1working w WITH (TABLOCKX)
    LEFT JOIN database.dbo.table1 o
        ON w.OrderNo = o.OrderNo
    WHERE o.OrderNo IS NULL

    select @rc1 = @@rowcount
    commit
end

Expert Comment by:Mark Wills

That is actually pretty good. I wouldn't do too many things differently. The left join is correct - so long as there aren't line numbers within the OrderNo; otherwise you might need to extend it a bit more (often there are multiple lines per order - might not be that way for you).

You should really capture the exceptions, and possibly come up with a "rule" as to what to do with dupes - such as "last one wins" - at least that way it is a bit more predictable. In that regard, you may want an order by or equivalent criteria (such as max(id) or max(date) from the working table).
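
For instance, a "last one wins" pick might look like this (a sketch only - it assumes the working table carries the same EventID identity as the perm table, so the highest EventID per key is the most recent arrival):

 -- of several working rows sharing an (orderno, mindex) key, keep only the latest
 SELECT w.*
 FROM table1working w
 WHERE w.eventid = (SELECT MAX(d.eventid)
                    FROM table1working d
                    WHERE d.orderno = w.orderno
                      AND d.mindex = w.mindex)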

As a rule of thumb, always best to use a "staging" table, do the validation on it, flag / mask exceptions somehow, then update "clean" data, or fail the batch.

What is it that is concerning you - or why do you have doubts?


Author Comment by:dbaSQL

>>As a rule of thumb, always best to use a "staging" table, do the validation on it, flag / mask exceptions somehow, then update "clean" data, or fail the batch.
Agreed.  That's why I gotta do this.

>>(often there are multiple lines per order - might not be that way for you).
There are, Mark.  For example, the two that prompted me to make this change --

orderno, mindex
2016, 0
2016, 1

one orderno can be present up to five times (five different possible mindexes)

This may be problematic, yes?  I mean, the way I have it now, I'm only looking at orderno, not mindex.  Shouldn't I change this?

Also, given the volume of data, this thing is waaaay too slow.  Last night I figured I'd modify it a tiny bit more by first dropping the indices, running the bulk insert into working and the copy into perm, then rebuilding the indices.  I really would like to do this all within one procedure, because it's not me that fires this -- it is fired via the front end, and I want them to invoke only one proc to get the job done.  I am having massive syntax problems getting the indexes dropped, doing the work, and recreating them, all within one procedure.

Are you able to advise?

Expert Comment by:Mark Wills

Yes, it is problematic - as soon as line 1 (ie mindex = 0) is inserted, the orderno will exist, and the join will stop populating the remaining lines...

Yep, can advise... Will have a little think first...

How much data is there? Shouldn't be too bad... Do you want to post some data structures and maybe the raw file (and the format)? Might make it easier...

Author Comment by:dbaSQL

um... how should I change it to account for mindex, too?
data -- two tables -- one averages about 2 to 4M daily; the other is upwards of 4 to 6M daily.
I will genericize the table defs and attach shortly.  The raw files are too huge.  What do you think about that?

Author Comment by:dbaSQL

These are the two tables.  But more specifically, what about the drop/recreation of the indices before and after the data load?  I'm trying it, but the syntax isn't accurate.

--this is the bigger one (4 to 6M daily)
CREATE TABLE [dbo].[table1] (
      [InsertTime] [datetime] NOT NULL ,
      [EventID] [bigint] IDENTITY (1, 1) NOT NULL ,
      [EventTime] [datetime] NOT NULL ,
      [MsgType] [char] (1) COLLATE SQL_Latin1_General_CP1_CI_AS NOT NULL ,
      [MIndex] [int] NOT NULL ,
      [OrderNo] [char] (16) COLLATE SQL_Latin1_General_CP1_CI_AS NOT NULL ,
      [ClorID] [varchar] (100) COLLATE SQL_Latin1_General_CP1_CI_AS NOT NULL ,
      [ExchangeID] [varchar] (55) COLLATE SQL_Latin1_General_CP1_CI_AS NOT NULL ,
      [GWID] [int] NOT NULL ,
      [EndPoint] [varchar] (8) COLLATE SQL_Latin1_General_CP1_CI_AS NOT NULL ,
      [Trader] [varchar] (16) COLLATE SQL_Latin1_General_CP1_CI_AS NOT NULL ,
      [Acct] [varchar] (16) COLLATE SQL_Latin1_General_CP1_CI_AS NOT NULL ,
      [Client] [varchar] (40) COLLATE SQL_Latin1_General_CP1_CI_AS NOT NULL ,
      [Symbol] [varchar] (16) COLLATE SQL_Latin1_General_CP1_CI_AS NOT NULL ,
      [Price] [decimal](18, 8) NOT NULL ,
      [Side] [char] (1) COLLATE SQL_Latin1_General_CP1_CI_AS NOT NULL ,
      [Quantity] [bigint] NOT NULL ,
      [TimeInForce] [varchar] (5) COLLATE SQL_Latin1_General_CP1_CI_AS NOT NULL ,
      [ExecInstruction] [varchar] (10) COLLATE SQL_Latin1_General_CP1_CI_AS NULL
) ON [mydb_Data]
GO

--this is the smaller one (2 to 4M daily)
CREATE TABLE [dbo].[table2] (
      [InsertTime] [datetime] NOT NULL ,
      [EventID] [bigint] IDENTITY (1, 1) NOT NULL ,
      [EventTime] [datetime] NOT NULL ,
      [MsgType] [char] (1) COLLATE SQL_Latin1_General_CP1_CI_AS NOT NULL ,
      [MIndex] [int] NOT NULL ,
      [OrderNo] [char] (16) COLLATE SQL_Latin1_General_CP1_CI_AS NOT NULL ,
      [ClorID] [varchar] (100) COLLATE SQL_Latin1_General_CP1_CI_AS NOT NULL ,
      [GWID] [int] NOT NULL
) ON [mydb_Data]
GO

Author Comment by:dbaSQL

I've even tried creating two smaller procs -- usp_dropIndexes, usp_createIndexes -- that I would just fire off in the proc I originally posted (the drop up top, the create down below).

It won't work.
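
One likely culprit, for what it's worth: GO is not a T-SQL statement - it is a batch separator recognized by the client tools - so the first GO inside a CREATE PROC ends the procedure definition right there, and everything after it compiles as separate batches. Inside a single proc, the index DDL can simply run as sequential statements with no GO at all (a sketch, using the names from above):

 -- inside one procedure: plain sequential statements, no GO separators
 IF EXISTS (SELECT name FROM sysindexes WHERE name = 'idx_table1_OrderNo')
     DROP INDEX table1.idx_table1_OrderNo
 -- ... bulk load and copy steps here ...
 CREATE INDEX idx_table1_OrderNo ON dbo.table1(OrderNo) WITH FILLFACTOR = 80 ON [database_Data]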

Author Comment by:dbaSQL

This is my original effort, w/out the two smaller procs -- can you help me clean this up, Mark?

CREATE PROC [dbo].[usp_loaddatabase2]
AS
SET NOCOUNT ON
-- Drop the indexes
  DROP INDEX table1.idx_table1_EventTime
  DROP INDEX table1.idx_table1_InsertTime
  DROP INDEX table1.idx_table1_OrderNo
  ALTER TABLE table1 DROP CONSTRAINT PK_table1_OrderNo_MIndex

BEGIN
  BEGIN
  DECLARE @fileExists1 INT
  EXEC master.dbo.xp_fileexist '\\servername\database\file1.csv', @fileExists1 OUTPUT
  IF (@fileExists1 = 1)
  BEGIN
    BULK INSERT database.dbo.table1Working
    FROM '\\servername\database\file1.csv'
WITH(FORMATFILE = 'E:\MSSQL\Tools\orders.fmt',FIELDTERMINATOR = '\t',ROWTERMINATOR = '\r',FIRSTROW=1,TABLOCK,ROWS_PER_BATCH = 100000)
  END
  ELSE
     PRINT 'file1.csv does not exist.'
END

   --Insert rows into table1 from table1Working where OrderNo, MIndex doesnt already exist
   --Set rowcount to 100000 to limit number of inserts per batch
    set rowcount 100000
      declare @rc1 int
      set @rc1 = 100000
      while @rc1 =100000
      begin
       begin transaction
          
     INSERT INTO database.dbo.table1 (EventTime,MsgType,MIndex,OrderNo,ClorID,ExchangeID,GWID,EndPoint,trader,acct,
Client,Symbol,Price,Side,Quantity,TimeInForce,ExecInstruction)
 SELECT DISTINCT w.EventTime,w.MsgType,w.MIndex,w.OrderNo,w.ClorID,w.ExchangeID,w.GWID,w.EndPoint,w.trader,w.acct,w.Client,w.Symbol,w.Price,w.Side,w.Quantity,w.TimeInForce,w.ExecInstruction
 FROM database.dbo.table1Working w WITH (TABLOCKX)
   LEFT JOIN database.dbo.table1 o
       ON w.OrderNo = o.OrderNo
     WHERE o.OrderNo IS NULL
   
  --Get number of rows updated, batch process will continue until less than 100000
     select @rc1=@@rowcount
     --Commit the transaction
     commit
   end

-- Drop the indexes  exec sp_helpindex table2
  DROP INDEX table2.idx_table2_EventTime
  DROP INDEX table2.idx_table2_InsertTime
  DROP INDEX table2.idx_table2_OrderNo
  ALTER TABLE table2 DROP CONSTRAINT PK_table2_OrderNo_MIndex

 BEGIN
 DECLARE @fileExists32 INT
 EXEC master.dbo.xp_fileExist '\\servername\database\file2.csv',@fileExists32 OUTPUT
  IF @fileExists32 = 1
 BEGIN
  BULK INSERT database.dbo.table2Working
  FROM '\\servername\database\file2.csv'
WITH(FORMATFILE = 'E:\MSSQL\Tools\cancels.fmt',FIELDTERMINATOR = ',',ROWTERMINATOR = '\r',FIRSTROW=1,TABLOCK,ROWS_PER_BATCH = 100000)
 END
 ELSE
    PRINT 'file2.csv does not exist.'
 END

   --Insert rows into table2 from table2Working where OrderNo, MIndex doesnt already exist
   --Set rowcount to 100000 to limit number of inserts per batch
    set rowcount 100000
      declare @rc2 int
      set @rc2 = 100000
      while @rc2 =100000
      begin
       begin transaction
      INSERT INTO database.dbo.table2 (EventTime,MsgType,MIndex,OrderNo,ClorID,GWID)
  SELECT DISTINCT w.EventTime,w.MsgType,w.MIndex,w.OrderNo,w.ClorID,w.GWID
  FROM database.dbo.table2Working w WITH (TABLOCKX)
   LEFT JOIN database.dbo.table2 o
       ON w.OrderNo = o.OrderNo
  WHERE o.OrderNo IS NULL

      --Get number of rows updated
      --Process will continue until less than 50000
   
     select @rc2=@@rowcount
     --Commit the transaction
     commit

   end
END
--recreate the indices
 CREATE CLUSTERED INDEX [idx_table1_EventTime] ON [dbo].[table1]([EventTime]) WITH  FILLFACTOR = 80 ON [database_Data]
GO
 CREATE INDEX [idx_table1_InsertTime] ON [dbo].[table1]([InsertTime]) WITH  FILLFACTOR = 80 ON [database_Data]
GO
 CREATE INDEX [idx_table1_OrderNo] ON [dbo].[table1]([OrderNo]) WITH  FILLFACTOR = 80 ON [database_Data]
GO
 CREATE CLUSTERED INDEX [idx_table2_EventTime] ON [dbo].[table2]([EventTime]) WITH  FILLFACTOR = 80 ON [database_Data]
GO
 CREATE INDEX [idx_table2_InsertTime] ON [dbo].[table2]([InsertTime]) WITH  FILLFACTOR = 80 ON [database_Data]
GO
 CREATE INDEX [idx_table2_OrderNo] ON [dbo].[table2]([OrderNo]) WITH  FILLFACTOR = 80 ON [database_Data]
GO
ALTER TABLE [dbo].[table1] ADD
      CONSTRAINT [PK_table1_OrderNo_MIndex] PRIMARY KEY  NONCLUSTERED
      (
            [OrderNo],
            [MIndex]
      ) WITH  FILLFACTOR = 80  ON [database_Data]
GO
ALTER TABLE [dbo].[table2] ADD
      CONSTRAINT [PK_table2_OrderNo_MIndex] PRIMARY KEY  NONCLUSTERED
      (
            [OrderNo],
            [MIndex]
      ) WITH  FILLFACTOR = 80  ON [database_Data]
GO


    DECLARE @maxid1  int
   DECLARE @maxid2  int
   SELECT @maxid1 = IsNull (max(eventID), 0) from database.dbo.table1
   SELECT @maxid2 = IsNull (max(eventID), 0) from database.dbo.table2

SET NOCOUNT ON
 INSERT database.dbo.ProcessLog (EventIDStart,EventIdEnd,InsertTime)
 SELECT @maxid1 + 1, (select IsNull (max(EventID), 0)  from database.dbo.table1),  getdate()
SET NOCOUNT OFF

SET NOCOUNT ON
 INSERT database.dbo.ProcessLog (EventIDStart,EventIdEnd,InsertTime)
 SELECT @maxid2 + 1, (select IsNull (max(EventID), 0)  from database.dbo.table2),  getdate()
SET NOCOUNT OFF

EXEC master..xp_cmdshell '\\servername\E$\MSSQL\Tools\database.bat', no_output

SET NOCOUNT OFF
GO

Author Comment by:dbaSQL

OK, last big post like this, I promise.  Here is the 'new' version of the hopefully-new proc.  I added existence checks on both the drop and the recreation of the indices.  I am getting ready to attempt a run in the dev bed, but I still haven't accounted for the mindex in my left join.
What do you think?  (very concerned)

CREATE PROC [dbo].[usp_loadDatabase2]
AS
SET NOCOUNT ON

-- Drop the indexes
IF EXISTS (SELECT name from sysindexes WHERE name = 'idx_table1_EventTime')
  DROP INDEX table1.idx_table1_EventTime
IF EXISTS (SELECT name from sysindexes WHERE name = 'idx_table1_InsertTime')
  DROP INDEX table1.idx_table1_InsertTime
IF EXISTS (SELECT name from sysindexes WHERE name = 'idx_table1_OrderNo')
  DROP INDEX table1.idx_table1_OrderNo
IF EXISTS (SELECT name from sysindexes WHERE name = 'PK_table1_OrderNo_MIndex')
  ALTER TABLE table1 DROP CONSTRAINT PK_table1_OrderNo_MIndex
IF EXISTS (SELECT name from sysindexes WHERE name = 'idx_table2_EventTime')
  DROP INDEX table2.idx_table2_EventTime
IF EXISTS (SELECT name from sysindexes WHERE name = 'idx_table2_InsertTime')
  DROP INDEX table2.idx_table2_InsertTime
IF EXISTS (SELECT name from sysindexes WHERE name = 'idx_table2_OrderNo')
  DROP INDEX table2.idx_table2_OrderNo
IF EXISTS (SELECT name from sysindexes WHERE name = 'PK_table2_OrderNo_MIndex')
  ALTER TABLE table2 DROP CONSTRAINT PK_table2_OrderNo_MIndex

BEGIN
      BEGIN
      --load into table1Working
      DECLARE @fileExists1 INT
      EXEC master.dbo.xp_fileexist '\\SERVERNAME\Database\file1.csv', @fileExists1 OUTPUT
      IF (@fileExists1 = 1)
      BEGIN
            BULK INSERT Database.dbo.table1Working
             FROM '\\SERVERNAME\Database\file1.csv'
            WITH(FORMATFILE = 'D:\MSSQL\Tools\orders.fmt',FIELDTERMINATOR = '\t',ROWTERMINATOR = '\r',FIRSTROW=1,TABLOCK,ROWS_PER_BATCH = 100000)
      END
      ELSE
      PRINT 'file1.csv does not exist.'
END

   --Insert rows into table1 from table1Working where OrderNo, MIndex doesnt already exist
   --Set rowcount to 100000 to limit number of inserts per batch
    set rowcount 100000
      declare @rc1 int
      set @rc1 = 100000
      while @rc1 =100000
      begin
       begin transaction
          
     INSERT INTO Database.dbo.table1 (EventTime,MsgType,MIndex,OrderNo,ClorID,ExchangeID,GWID,EndPoint,trader,acct,
   Client,Symbol,Price,Side,Quantity,TimeInForce,ExecInstruction)
 SELECT DISTINCT w.EventTime,w.MsgType,w.MIndex,w.OrderNo,w.ClorID,w.ExchangeID,w.GWID,w.EndPoint,w.trader,w.acct,w.Client, w.Symbol,w.Price,w.Side,w.Quantity,w.TimeInForce,w.ExecInstruction
FROM Database.dbo.table1Working w WITH (TABLOCKX)
   LEFT JOIN Database.dbo.table1 o
       ON w.OrderNo = o.OrderNo
      WHERE o.OrderNo IS NULL

     --Get number of rows updated, batch process will continue until less than 100000
     select @rc1=@@rowcount
     --Commit the transaction
     commit
   end


      BEGIN
    --load into table2working
      DECLARE @fileExists32 INT
      EXEC master.dbo.xp_fileExist '\\SERVERNAME\Database\file2.csv',@fileExists32 OUTPUT
      IF @fileExists32 = 1
      BEGIN
            BULK INSERT Database.dbo.table2Working
             FROM '\\SERVERNAME\Database\file2.csv'
            WITH(FORMATFILE = 'D:\MSSQL\Tools\cancels.fmt',FIELDTERMINATOR = ',',ROWTERMINATOR = '\r',FIRSTROW=1,TABLOCK,ROWS_PER_BATCH = 100000)
      END
      ELSE
      PRINT 'file2.csv does not exist.'
      END

   --Insert rows into table2 from table2Working where OrderNo, MIndex doesnt already exist
   --Set rowcount to 100000 to limit number of inserts per batch
    set rowcount 100000
      declare @rc2 int
      set @rc2 = 100000
      while @rc2 =100000
      begin
       begin transaction
      INSERT INTO Database.dbo.table2 (EventTime,MsgType,MIndex,OrderNo,ClorID,GWID)
        SELECT DISTINCT w.EventTime,w.MsgType,w.MIndex,w.OrderNo,w.ClorID,w.GWID
        FROM Database.dbo.table2Working w WITH (TABLOCKX)
         LEFT JOIN Database.dbo.table2 o
       ON w.OrderNo = o.OrderNo
        WHERE o.OrderNo IS NULL
      --Get number of rows updated
      --Process will continue until less than 50000
   
     select @rc2=@@rowcount
     --Commit the transaction
     commit

   end
END

--recreate the indices
 IF NOT EXISTS (SELECT name from sysindexes WHERE name = 'idx_table1_EventTime')
CREATE CLUSTERED INDEX [idx_table1_EventTime] ON [dbo].[table1]([EventTime]) WITH  FILLFACTOR = 80 ON [Database_Data]
 IF NOT EXISTS (SELECT name FROM sysindexes WHERE name = 'idx_table1_InsertTime')
CREATE INDEX [idx_table1_InsertTime] ON [dbo].[table1]([InsertTime]) WITH  FILLFACTOR = 80 ON [Database_Data]
 IF NOT EXISTS (SELECT name FROM sysindexes WHERE name = 'idx_table1_OrderNo')
CREATE INDEX [idx_table1_OrderNo] ON [dbo].[table1]([OrderNo]) WITH  FILLFACTOR = 80 ON [Database_Data]
 IF NOT EXISTS (SELECT name FROM sysindexes WHERE name = 'idx_table2_EventTime')
CREATE CLUSTERED INDEX [idx_table2_EventTime] ON [dbo].[table2]([EventTime]) WITH  FILLFACTOR = 80 ON [Database_Data]
 IF NOT EXISTS (SELECT name FROM sysindexes WHERE name = 'idx_table2_InsertTime')
CREATE INDEX [idx_table2_InsertTime] ON [dbo].[table2]([InsertTime]) WITH  FILLFACTOR = 80 ON [Database_Data]
 IF NOT EXISTS (SELECT name FROM sysindexes WHERE name = 'idx_table2_OrderNo')
CREATE INDEX [idx_table2_OrderNo] ON [dbo].[table2]([OrderNo]) WITH  FILLFACTOR = 80 ON [Database_Data]
 IF NOT EXISTS (SELECT name FROM sysindexes WHERE name = 'PK_table1_OrderNo_MIndex')
ALTER TABLE [dbo].[table1] ADD
      CONSTRAINT [PK_table1_OrderNo_MIndex] PRIMARY KEY  NONCLUSTERED
      (
            [OrderNo],
            [MIndex]
      ) WITH  FILLFACTOR = 80  ON [Database_Data]
 IF NOT EXISTS (SELECT name FROM sysindexes WHERE name = 'PK_table2_OrderNo_MIndex')
ALTER TABLE [dbo].[table2] ADD
      CONSTRAINT [PK_table2_OrderNo_MIndex] PRIMARY KEY  NONCLUSTERED
      (
            [OrderNo],
            [MIndex]
      ) WITH  FILLFACTOR = 80  ON [Database_Data]

      --load auditing data into processlog
    DECLARE @maxid1  int
      DECLARE @maxid2  int
      SELECT @maxid1 = IsNull (max(eventID), 0) from Database.dbo.table1
      SELECT @maxid2 = IsNull (max(eventID), 0) from Database.dbo.table2

      SET NOCOUNT ON
       INSERT Database.dbo.ProcessLog (EventIDStart,EventIdEnd,InsertTime)
SELECT @maxid1 + 1, (select IsNull (max(EventID), 0)  from Database.dbo.table1),  getdate()
      SET NOCOUNT OFF

      SET NOCOUNT ON
       INSERT Database.dbo.ProcessLog (EventIDStart,EventIdEnd,InsertTime)
 SELECT @maxid2 + 1, (select IsNull (max(EventID), 0)  from Database.dbo.table2),  getdate()
      SET NOCOUNT OFF

      --exec the bat file to rename/relocate the current .csv's.
      EXEC master..xp_cmdshell '\\SERVERNAME\D$\MSSQL\Tools\Database.bat', no_output

SET NOCOUNT OFF
GO

Author Comment by:dbaSQL

simply like this?

 LEFT JOIN database.dbo.table1 o
       ON w.OrderNo = o.OrderNo
       AND w.MIndex = o.MIndex
       WHERE o.OrderNo IS NULL
       AND o.MIndex IS NULL

Expert Comment by:Mark Wills

OK, I am back - stop changing / posting for a minute, I am looking...

4 to 6M - is that million records or megabytes? I regularly load 10-meg data files and that should be minutes...

the new left join is fine, might try something a bit different though...

Author Comment by:dbaSQL

million, sorry...

Expert Comment by:Mark Wills

No problems - how about copying maybe a page full? Just want to be able to run and debug, so it doesn't have to be millions, just a couple of handfuls...

6 million is a fair few, but nothing that couldn't be handled; admittedly, my 12 million is weekly, not nightly....


Cannot see any truncate / create table - what happens to "yesterday's" records?


Expert Comment by:Mark Wills

OK, can you tell me about the indexes you have on table1 and table2 - particularly: eventtime, inserttime and orderno?

orderno is probably a waste because of the PK; eventtime as clustered?; inserttime - maybe because of "other" requirements?

Expert Comment by:Mark Wills

You will need to keep the PK - it is the basis of the join.

I assume that table1working is a replica of table1 - is that correct?

Author Comment by:dbaSQL

>>Cannot see any truncate / create table - what happens to "yesterday's" records?

oh yeah.... I wish.   Data is retained in table1 and table2 'indefinitely'.  They bought me 10TB of disk to sit behind it, and they do not want to hear me whine.... :-)

Meaning, I cannot archive, purge, anything....

The initial bulk insert into working is purely to handle possible PK violations:
- tonight, I bulk insert into working
- copy only legitimately new records to the perm table
- purge all data but the violations from the working table

Tomorrow, I start by truncating the working table, then I do it all over again.

Do you have an email I could use to send a txt file?

Expert Comment by:Mark Wills

yep, on my bio, or, run this...

select replace(replace('mark[dotwillsatbigpond[dotcom','[dot','.'),'at','@')


at least they bought you some disk!!

Definitely need to check those indexes and definitely do not drop the PK...

Author Comment by:dbaSQL

oh Mark, that is great.  I'm looking at that thing trying to figure out how it relates to my data... and it's your email.  Totally cool.

the other indexes - I can drop them, yes?
and why do you say not the PK?  oh, wait, never mind... I know.

how much data?

Expert Comment by:Mark Wills

It is a bit nerdy - but you do not get e-mails from scanners :)

As much as is easy for you - zip it up, and keeping it under a couple of meg will be fine... Will want a few different orders at least; I can then copy and paste a few times to get some dupes.

How are you determining that a particular order was a "violation" after it has been posted?






Author Comment by:dbaSQL

>>OK, can you tell me about the indexes you have on table1 and table2 - particularly: eventtime, inserttime and orderno?
>>orderno is probably a waste because of the PK; eventtime as clustered?; inserttime - maybe because of "other" requirements?

clustered on eventtime because it is paramount - first and foremost in all retrievals
inserttime, simply because a lot/all of my maintenance tasks go in by inserttime only
orderno - I thought it would be helpful; the two tables are joined in a view to present all current-day data by orderno

please recommend anything you like on the indices

I'm just going to pump a bit of data into a txt file - one day only, but not the whole day (way too big)

Author Comment by:dbaSQL

Violators are based purely on the PK -- if the orderno/mindex pre-exists, it's a violator (see from above):

Example failure - joeApp wrote two records to the perm table back on 5/18 with this orderno, mindex:
2016, 0
2016, 1

They attempted to write two new records today with the same OrderNo, MIndex.   I know I could alter the PK to include date or something, and that would prevent this type of failure, but I'd really rather avoid that if I can.  They say this problem is 'likely every now and then', and I need to work around it.
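
In query form, that check is just a join on the two key columns (a sketch, using the table names from earlier in the thread):

 -- tonight's working rows whose (OrderNo, MIndex) already exists in the perm table
 SELECT w.*
 FROM database.dbo.table1Working w
 JOIN database.dbo.table1 o
   ON o.OrderNo = w.OrderNo
  AND o.MIndex = w.MIndex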



zipping/emailing now

Expert Comment by:Mark Wills

Hi,

Could you please try the following (small sample first, then a big one - need timing, need PK indexes in place):

SELECT a.* FROM OPENROWSET( BULK '\\SERVERNAME\Database\file2.csv', FORMATFILE = 'D:\MSSQL\Tools\orders.fmt', firstrow = 1) AS a
where not exists (select NULL from table1 where orderno = a.orderno and mindex = a.msgindex)

if the ONLY error is the dupe PK, then we might be able to go straight into table1...

e.g.

SELECT a.* into todays_dupes FROM OPENROWSET( BULK '\\SERVERNAME\Database\file2.csv', FORMATFILE = 'D:\MSSQL\Tools\orders.fmt', firstrow = 1) AS a
where exists (select NULL from table1 where orderno = a.orderno and mindex = a.msgindex)

--then do the insert

insert table1 (...........columns............)
SELECT a.* FROM OPENROWSET( BULK '\\SERVERNAME\Database\file2.csv', FORMATFILE = 'D:\MSSQL\Tools\orders.fmt', firstrow = 1) AS a
where not exists (select NULL from table1 where orderno = a.orderno and mindex = a.msgindex)

Author Comment by:dbaSQL

Msg 156, Level 15, State 1, Line 1
Incorrect syntax near the keyword 'BULK'.

Expert Comment by:Mark Wills

Ummmm.... what version of SQL are you using?

Author Comment by:dbaSQL

v2000 - sorry, I should have mentioned that.  v2005 is coming, but not any time super soon.

Expert Comment by:Mark Wills

Need to check fairly carefully - I had to change a few things to match local names, some nulls, sample data (delimited by comma) etc, and put in a few messages as well. Think I put it all back correctly ;) Especially check FIELDTERMINATOR and FIRSTROW (the sample data was delimited by comma, and had header rows).

So, I added a new column to the end of table1working and table2working:     status_flag int default 0
It is used to flag the dupes (ie status_flag = 1).
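
The column add itself would be something along these lines (a sketch - the exact DDL isn't shown in the thread; NOT NULL is an assumption to keep the status_flag = 0 filter simple):

 ALTER TABLE Database.dbo.table1Working ADD status_flag int NOT NULL DEFAULT 0
 ALTER TABLE Database.dbo.table2Working ADD status_flag int NOT NULL DEFAULT 0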

The only indexes (which need to be retained) for Table1 and Table2 are :

IF NOT EXISTS (SELECT name from sysindexes WHERE name = 'idx_table1_EventTime')
   CREATE INDEX [idx_table1_EventTime] ON [dbo].[table1]([EventTime]) WITH  FILLFACTOR = 90 ON [Database_Data]
IF NOT EXISTS (SELECT name FROM sysindexes WHERE name = 'idx_table1_InsertTime')
   CREATE INDEX [idx_table1_InsertTime] ON [dbo].[table1]([InsertTime]) WITH  FILLFACTOR = 90 ON [Database_Data]
IF NOT EXISTS (SELECT name FROM sysindexes WHERE name = 'PK_table1_OrderNo_MIndex')
   ALTER TABLE [dbo].[table1] ADD
      CONSTRAINT [PK_table1_OrderNo_MIndex] PRIMARY KEY  NONCLUSTERED
      (
            [OrderNo],
            [MIndex]
      ) WITH  FILLFACTOR = 90  ON [Database_Data]

IF NOT EXISTS (SELECT name FROM sysindexes WHERE name = 'idx_table2_EventTime')
   CREATE INDEX [idx_table2_EventTime] ON [dbo].[table2]([EventTime]) WITH  FILLFACTOR = 90 ON [Database_Data]
IF NOT EXISTS (SELECT name FROM sysindexes WHERE name = 'idx_table2_InsertTime')
   CREATE INDEX [idx_table2_InsertTime] ON [dbo].[table2]([InsertTime]) WITH  FILLFACTOR = 90 ON [Database_Data]
IF NOT EXISTS (SELECT name FROM sysindexes WHERE name = 'PK_table2_OrderNo_MIndex')
   ALTER TABLE [dbo].[table2] ADD
      CONSTRAINT [PK_table2_OrderNo_MIndex] PRIMARY KEY  NONCLUSTERED
      (
            [OrderNo],
            [MIndex]
      ) WITH  FILLFACTOR = 90  ON [Database_Data]


CREATE PROC [dbo].[usp_loadDatabase2]
AS
BEGIN
    SET NOCOUNT OFF

print 'Load File 1 ' + convert(varchar(30),getdate(),121)
    --load into Database.dbo.table1Working
    DECLARE @fileExists1 INT
    EXEC master.dbo.xp_fileexist '\\SERVERNAME\Database\file1.csv', @fileExists1 OUTPUT
    IF (@fileExists1 = 1)
    BEGIN
        BULK INSERT Database.dbo.table1Working
        FROM '\\SERVERNAME\Database\file1.csv'
        WITH(FORMATFILE = 'D:\MSSQL\Tools\orders.fmt',FIELDTERMINATOR = ',',ROWTERMINATOR = '\r',FIRSTROW=2,TABLOCK,ROWS_PER_BATCH = 100000)
    END
    ELSE PRINT 'file1.csv does not exist.'

print 'Check Internal Dupes Table1Working ' + convert(varchar(30),getdate(),121)
    --Check and flag any dupes... Internally first (keeping the most recent)
    update table1working set status_flag = 1
    where table1working.eventid <> (select max(d.eventid) from table1working d where d.orderno = table1working.orderno and d.mindex = table1working.mindex)

print 'Check Prior Loaded Dupes Table1Working ' + convert(varchar(30),getdate(),121)
    --Check and flag any dupes... Then Compare with previously loaded
    update table1working set status_flag = 1
    from table1working
    INNER JOIN table1 D ON D.OrderNo = table1working.OrderNo and D.MIndex = table1working.MIndex

print 'Insert From Table1Working ' + convert(varchar(30),getdate(),121)
    --Insert rows into Database.dbo.table1 from Database.dbo.table1Working where OrderNo, mindex doesn't already exist...
    --Set rowcount to 100000 to limit number of inserts per batch
    set rowcount 100000
    declare @rc1 int
    set @rc1 = 100000
    while @rc1 = 100000
    begin
        begin transaction

        INSERT INTO Database.dbo.table1 (EventTime,MsgType,MIndex,OrderNo,ClorID,ExchangeID,GWID,EndPoint,trader,acct,Client,Symbol,Price,Side,Quantity,TimeInForce,ExecInstruction)
        SELECT w.EventTime,w.MsgType,w.MIndex,w.OrderNo,w.ClorID,w.ExchangeID,w.GWID,w.EndPoint,w.trader,w.acct,w.Client,w.Symbol,w.Price,w.Side,w.Quantity,w.TimeInForce,w.ExecInstruction
        FROM Database.dbo.table1Working w WITH (NOLOCK)
        LEFT JOIN Database.dbo.table1 o ON w.OrderNo = o.OrderNo and w.MIndex = o.MIndex
        WHERE o.OrderNo IS NULL and w.status_flag = 0

        --Get number of rows updated, batch process will continue until less than 100000
        select @rc1 = @@rowcount
        --Commit the transaction
        commit
    end

print 'Load File 2 ' + convert(varchar(30),getdate(),121)
    --load into table2working
    DECLARE @fileExists32 INT
    EXEC master.dbo.xp_fileExist '\\SERVERNAME\Database\file2.csv',@fileExists32 OUTPUT
    IF (@fileExists32 = 1)
    BEGIN
        BULK INSERT Database.dbo.table2Working
        FROM '\\SERVERNAME\Database\file2.csv'
        WITH(FORMATFILE = 'D:\MSSQL\Tools\cancels.fmt',FIELDTERMINATOR = ',',ROWTERMINATOR = '\r',FIRSTROW=2,TABLOCK,ROWS_PER_BATCH = 100000)
    END
    ELSE PRINT 'file2.csv does not exist.'

print 'Check Internal Dupes Table2Working ' + convert(varchar(30),getdate(),121)
    --Check and flag any dupes... Internally first (keeping the most recent)
    update table2working set status_flag = 1
    where table2working.eventid <> (select max(d.eventid) from table2working d where d.orderno = table2working.orderno and d.mindex = table2working.mindex)

print 'Check Prior Loaded Dupes Table2Working ' + convert(varchar(30),getdate(),121)
    --Check and flag any dupes... Then Compare with previously loaded
    update table2working set status_flag = 1
    from table2working
    INNER JOIN table2 D ON D.OrderNo = table2working.OrderNo and D.MIndex = table2working.MIndex

print 'Insert From Table2Working ' + convert(varchar(30),getdate(),121)
    --Insert rows into table2 from table2Working where OrderNo, mindex doesn't already exist
    --Set rowcount to 100000 to limit number of inserts per batch
    set rowcount 100000
    declare @rc2 int
    set @rc2 = 100000
    while @rc2 = 100000
    begin
        begin transaction

        INSERT INTO Database.dbo.table2 (EventTime,MsgType,MIndex,OrderNo,ClorID,GWID)
        SELECT w.EventTime,w.MsgType,w.MIndex,w.OrderNo,w.ClorID,w.GWID
        FROM Database.dbo.table2Working w WITH (NOLOCK)
        LEFT JOIN Database.dbo.table2 o ON w.OrderNo = o.OrderNo and w.MIndex = o.MIndex
        WHERE o.OrderNo IS NULL and w.status_flag = 0

        --Get number of rows updated, batch process will continue until less than 100000
        select @rc2 = @@rowcount
        --Commit the transaction
        commit transaction
    end

print 'load auditing data into processing ' + convert(varchar(30),getdate(),121)
    --load auditing data into processlog
    DECLARE @maxid1  int
    DECLARE @maxid2  int
    SELECT @maxid1 = IsNull (max(eventID), 0) from Database.dbo.table1
    SELECT @maxid2 = IsNull (max(eventID), 0) from Database.dbo.table2

    SET NOCOUNT ON
    INSERT Database.dbo.ProcessLog (EventIDStart,EventIdEnd,InsertTime)
    SELECT @maxid1 + 1, (select IsNull (max(EventID), 0) from Database.dbo.table1), getdate()
    INSERT Database.dbo.ProcessLog (EventIDStart,EventIdEnd,InsertTime)
    SELECT @maxid2 + 1, (select IsNull (max(EventID), 0) from Database.dbo.table2), getdate()
    SET NOCOUNT OFF

print 'Final Archive of raw data ' + convert(varchar(30),getdate(),121)
    --exec the bat file to rename/relocate the current .csv's.
    EXEC master..xp_cmdshell '\\SERVERNAME\D$\MSSQL\Tools\Database.bat', no_output
END



Accepted Solution by:Mark Wills (earned 500 total points)

OK, a couple of refinements...

Basically, if we check the several million rows in the new workingN tables for dupes up front, then most of the dupe resolution happens before the insert, on two static tables where the join syntax works a lot more efficiently, instead of being left entirely to a join against a dynamically growing table of considerably more rows. So we update the new status_flag to show the dupes, and then ignore them for the insert.

The only other thing to consider is dropping the indexes; there is a comment marking where that should / could happen on either side of each insert. We want the PK index in place at the time of updating the dupes, for that join to work as well as possible.


CREATE PROC [dbo].[usp_loadDatabase2]
AS
BEGIN
    SET NOCOUNT OFF

print 'Load File 1 ' + convert(varchar(30),getdate(),121)
    --load into Database.dbo.table1Working
    DECLARE @fileExists1 INT
    EXEC master.dbo.xp_fileexist '\\SERVERNAME\Database\file1.csv', @fileExists1 OUTPUT
    IF (@fileExists1 = 1)
    BEGIN
        BULK INSERT Database.dbo.table1Working
        FROM '\\SERVERNAME\Database\file1.csv'
        WITH(FORMATFILE = 'D:\MSSQL\Tools\orders.fmt',FIELDTERMINATOR = ',',ROWTERMINATOR = '\r',FIRSTROW=2,TABLOCK,ROWS_PER_BATCH = 100000)
    END
    ELSE PRINT 'file1.csv does not exist.'

print 'Check Internal Dupes Table1Working ' + convert(varchar(30),getdate(),121)
    --Check and flag any dupes... Internally first (keeping the most recent)
    update table1working set status_flag = 1
    where table1working.eventid <> (select max(d.eventid) from table1working d where d.orderno = table1working.orderno and d.mindex = table1working.mindex)

print 'Check Prior Loaded Dupes Table1Working ' + convert(varchar(30),getdate(),121)
    --Check and flag any dupes... Then Compare with previously loaded
    update table1working set status_flag = 1
    from table1working
    INNER JOIN table1 D ON D.OrderNo = table1working.OrderNo and D.MIndex = table1working.MIndex

-------------------------------------------
/* drop indexes on table1 here if needed */
-------------------------------------------

print 'Insert From Table1Working ' + convert(varchar(30),getdate(),121)
    --Insert rows into Database.dbo.table1 from Database.dbo.table1Working, skipping flagged dupes
    --Set rowcount to 100000 to limit number of inserts per batch
    --(the left join keeps each pass from re-reading rows already copied by an earlier batch)
    set rowcount 100000
    declare @rc1 int
    set @rc1 = 100000
    while @rc1 = 100000
    begin
        begin transaction

        INSERT INTO Database.dbo.table1 (EventTime,MsgType,MIndex,OrderNo,ClorID,ExchangeID,GWID,EndPoint,trader,acct,Client,Symbol,Price,Side,Quantity,TimeInForce,ExecInstruction)
        SELECT w.EventTime,w.MsgType,w.MIndex,w.OrderNo,w.ClorID,w.ExchangeID,w.GWID,w.EndPoint,w.trader,w.acct,w.Client,w.Symbol,w.Price,w.Side,w.Quantity,w.TimeInForce,w.ExecInstruction
        FROM Database.dbo.table1Working w WITH (NOLOCK)
        LEFT JOIN Database.dbo.table1 o ON w.OrderNo = o.OrderNo and w.MIndex = o.MIndex
        WHERE o.OrderNo IS NULL and w.status_flag = 0

        --Get number of rows updated, batch process will continue until less than 100000
        select @rc1 = @@rowcount
        --Commit the transaction
        commit
    end

----------------------------------------------
/* Rebuild indexes on table1 here if needed */
----------------------------------------------
print 'Rebuild Table1 Indexes if needed ' + convert(varchar(30),getdate(),121)
    IF NOT EXISTS (SELECT name from sysindexes WHERE name = 'idx_table1_EventTime')
       CREATE INDEX [idx_table1_EventTime] ON [dbo].[table1]([EventTime]) WITH FILLFACTOR = 90 ON [Database_Data]
    IF NOT EXISTS (SELECT name FROM sysindexes WHERE name = 'idx_table1_InsertTime')
       CREATE INDEX [idx_table1_InsertTime] ON [dbo].[table1]([InsertTime]) WITH FILLFACTOR = 90 ON [Database_Data]
    IF NOT EXISTS (SELECT name FROM sysindexes WHERE name = 'PK_table1_OrderNo_MIndex')
       ALTER TABLE [dbo].[table1] ADD CONSTRAINT [PK_table1_OrderNo_MIndex] PRIMARY KEY NONCLUSTERED
       (
            [OrderNo],
            [MIndex]
       ) WITH FILLFACTOR = 90 ON [Database_Data]

print 'Load File 2 ' + convert(varchar(30),getdate(),121)
    --load into table2working
    DECLARE @fileExists32 INT
    EXEC master.dbo.xp_fileExist '\\SERVERNAME\Database\file2.csv',@fileExists32 OUTPUT
    IF (@fileExists32 = 1)
    BEGIN
        BULK INSERT Database.dbo.table2Working
        FROM '\\SERVERNAME\Database\file2.csv'
        WITH(FORMATFILE = 'D:\MSSQL\Tools\cancels.fmt',FIELDTERMINATOR = ',',ROWTERMINATOR = '\r',FIRSTROW=2,TABLOCK,ROWS_PER_BATCH = 100000)
    END
    ELSE PRINT 'file2.csv does not exist.'

print 'Check Internal Dupes Table2Working ' + convert(varchar(30),getdate(),121)
    --Check and flag any dupes... Internally first (keeping the most recent)
    update table2working set status_flag = 1
    where table2working.eventid <> (select max(d.eventid) from table2working d where d.orderno = table2working.orderno and d.mindex = table2working.mindex)

print 'Check Prior Loaded Dupes Table2Working ' + convert(varchar(30),getdate(),121)
    --Check and flag any dupes... Then Compare with previously loaded
    update table2working set status_flag = 1
    from table2working
    INNER JOIN table2 D ON D.OrderNo = table2working.OrderNo and D.MIndex = table2working.MIndex

-------------------------------------------
/* drop indexes on table2 here if needed */
-------------------------------------------

print 'Insert From Table2Working ' + convert(varchar(30),getdate(),121)
    --Insert rows into table2 from table2Working, skipping flagged dupes
    --Set rowcount to 100000 to limit number of inserts per batch
    --(the left join keeps each pass from re-reading rows already copied by an earlier batch)
    set rowcount 100000
    declare @rc2 int
    set @rc2 = 100000
    while @rc2 = 100000
    begin
        begin transaction

        INSERT INTO Database.dbo.table2 (EventTime,MsgType,MIndex,OrderNo,ClorID,GWID)
        SELECT w.EventTime,w.MsgType,w.MIndex,w.OrderNo,w.ClorID,w.GWID
        FROM Database.dbo.table2Working w WITH (NOLOCK)
        LEFT JOIN Database.dbo.table2 o ON w.OrderNo = o.OrderNo and w.MIndex = o.MIndex
        WHERE o.OrderNo IS NULL and w.status_flag = 0

        --Get number of rows updated, batch process will continue until less than 100000
        select @rc2 = @@rowcount
        --Commit the transaction
        commit transaction
    end

----------------------------------------------
/* rebuild indexes on table2 here if needed */
----------------------------------------------
print 'Rebuild Table2 Indexes if needed ' + convert(varchar(30),getdate(),121)
    IF NOT EXISTS (SELECT name FROM sysindexes WHERE name = 'idx_table2_EventTime')
       CREATE INDEX [idx_table2_EventTime] ON [dbo].[table2]([EventTime]) WITH FILLFACTOR = 90 ON [Database_Data]
    IF NOT EXISTS (SELECT name FROM sysindexes WHERE name = 'idx_table2_InsertTime')
       CREATE INDEX [idx_table2_InsertTime] ON [dbo].[table2]([InsertTime]) WITH FILLFACTOR = 90 ON [Database_Data]
    IF NOT EXISTS (SELECT name FROM sysindexes WHERE name = 'PK_table2_OrderNo_MIndex')
       ALTER TABLE [dbo].[table2] ADD CONSTRAINT [PK_table2_OrderNo_MIndex] PRIMARY KEY NONCLUSTERED
       (
            [OrderNo],
            [MIndex]
       ) WITH FILLFACTOR = 90 ON [Database_Data]

print 'load auditing data into processing ' + convert(varchar(30),getdate(),121)
    --load auditing data into processlog
    DECLARE @maxid1  int
    DECLARE @maxid2  int
    SELECT @maxid1 = IsNull (max(eventID), 0) from Database.dbo.table1
    SELECT @maxid2 = IsNull (max(eventID), 0) from Database.dbo.table2

    SET NOCOUNT ON
    INSERT Database.dbo.ProcessLog (EventIDStart,EventIdEnd,InsertTime)
    SELECT @maxid1 + 1, (select IsNull (max(EventID), 0) from Database.dbo.table1), getdate()
    INSERT Database.dbo.ProcessLog (EventIDStart,EventIdEnd,InsertTime)
    SELECT @maxid2 + 1, (select IsNull (max(EventID), 0) from Database.dbo.table2), getdate()
    SET NOCOUNT OFF

print 'Final Archive of raw data ' + convert(varchar(30),getdate(),121)
    --exec the bat file to rename/relocate the current .csv's.
    EXEC master..xp_cmdshell '\\SERVERNAME\D$\MSSQL\Tools\Database.bat', no_output
END



Expert Comment by:Mark Wills

Oh, and then the violations are easily found using:

 select * from table1working where status_flag = 1

or

 select * from table2working where status_flag = 1
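
That also covers step 3 of the original plan - purge everything BUT the violations - which the proc above leaves out (a sketch; run nightly after the load):

 -- keep only the flagged violations for the 24-hour review window
 delete from table1working where status_flag = 0
 delete from table2working where status_flag = 0

 -- and tomorrow night's run can then simply start with
 truncate table table1working
 truncate table table2working

For multi-million-row nights, the two deletes could be batched with the same SET ROWCOUNT pattern used for the inserts, to keep the tran log in check.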

Author Comment by:dbaSQL

Fabulous!  The refined version is the one I've used.  Very accurate results, very acceptable runtime.

I have run it several times with small files and with very large files (6.3M+ records) ---
None of my tests has run longer than 18 minutes (way down from the hour-plus I was seeing before).  None of the PK violations causes a failure -- they are simply flagged and worked around with ease -- and they are retained for review in the working tables, such that the powers that be are (hopefully) able to correct them, or at least prevent them from recurring.

Possibly I was over-indexed before -- maybe that (among other things) was upping my runtime; hard to say.  What I've got now is beautiful.  I have run it on three different servers w/out error.

Excellent, Mark Wills.  I am very grateful for your time, your input, and your expert advice.
