rrollinsny (United States of America) asked on
Very Fast Sproc that will update records

I have a tmptable that I import a txt file into. This table may hold anywhere from 20,000 to 2 million records; it varies by day. What I need to do is set a flag field on the records, in this order (a sketch of what I mean by the flag passes follows the table definitions below):

1.) Check for duplicate records that came in.
2.) Check a field for a certain value. This is actually just the first 3 digits of the field: I am testing to make sure the phone number has a valid US area code. The field is simply the 10-digit phone number.
3.) Check to make sure the records that made it this far without being flagged are not in another table.
4.) Serialize the records. (I have an SP to do this currently.)

The tmptable has about 173 fields in it, but the two important ones are serialid (the key field for the master table I append this data into later) and ophone (the phone number field, which needs to pass the 4 steps above).

tmptable
serialid numeric(15)
ophone nvarchar(10)

tbl2 (Contains about 1 million distinct phone numbers)
phonenumber nvarchar(10)
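
To make this concrete, the flag passes for steps 2 and 3 would look roughly like this (only a sketch: flagfield and the AreaCodes lookup table are placeholder names, not real objects in my database):

-- Step 2: flag rows whose first 3 digits are not a valid US area code
UPDATE t
SET t.flagfield = 1
FROM tmptable t
WHERE NOT EXISTS (SELECT 'X' FROM AreaCodes a
                  WHERE a.AreaCode = LEFT(t.ophone, 3))

-- Step 3: flag remaining rows whose phone number is already in tbl2
UPDATE t
SET t.flagfield = 1
FROM tmptable t
WHERE t.flagfield IS NULL
  AND EXISTS (SELECT 'X' FROM tbl2 p
              WHERE p.phonenumber = t.ophone)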

The serialize SP is called Seializ. It simply gets the last serial number in the master table and starts with the next number, updating each serialid in sequence until it hits the end of the table.
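
For reference, the whole SP amounts to something like this one-pass numbering trick (mastertable is a stand-in name for my master table):

DECLARE @last numeric(15)
SELECT @last = ISNULL(MAX(serialid), 0) FROM mastertable

-- Hand out @last+1, @last+2, ... to every row in a single update
UPDATE tmptable
SET @last = serialid = @last + 1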

I would like to be able to call this from a VB app that I run. Currently I am using ADO to do all of this, and it is SLOW. Any help will be greatly appreciated. Thanks in advance!

ASKER CERTIFIED SOLUTION
rafrancisco
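
(The accepted solution proposed creating a new table at each step rather than flagging records in place. Judging from the follow-ups below, step 1 was along these lines:)

-- Step 1: strip duplicates by selecting only the distinct rows into a new table
SELECT DISTINCT *
INTO YourNewTable
FROM tmptable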
rrollinsny (ASKER)
rafrancisco, thanks for the quick response. I am looking more for a one-shot deal on this. I like your option, but doesn't moving or copying the data around so much cause a time delay? I just want to flag the records in a single pass through the table, since I thought that would be faster than moving the data around so much.
rafrancisco
Based on experience, creating new tables is faster than updating the same table multiple times. As records are updated, the table becomes fragmented and subsequent updates get slower.

Steps 2 and 3 can actually be combined, so that only 2 extra tables are created:

SELECT IDENTITY(INT, 1, 1) AS SerialNumber, *
INTO YourNewTable2
FROM YourNewTable A
WHERE EXISTS (SELECT 'X' FROM AreaCodeTable B WHERE B.AreaCode = LEFT(A.[OPhone], 3))
  AND NOT EXISTS (SELECT 'X' FROM AllPhoneNumbers C WHERE A.[OPhone] = C.[PhoneNumber])
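
Note that AreaCodeTable is not in your schema above; it is a table you would create yourself, as simple as one column holding the valid US area codes, and AllPhoneNumbers here plays the role of your tbl2. For illustration (names assumed):

CREATE TABLE AreaCodeTable (AreaCode nvarchar(3) PRIMARY KEY)
INSERT INTO AreaCodeTable (AreaCode) VALUES ('212')
-- ... one row per valid US area code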
rrollinsny (ASKER)

When I tested this (just the first part), I had the following results:
1.) I tested the import and DISTINCT on 600,000 records in the table. These 600,000 were simply 30,000 unique records, each duplicated 20 times. It ran in about 28 seconds. All I did was run your statement above, drop the original table, and then move the distinct data back using the same statement, this time into the original table name. Here is the code for that:

-- Copy only the distinct rows into a scratch table
SELECT DISTINCT * INTO newtmp FROM newimport
GO
DROP TABLE newimport
GO
-- Move the distinct rows back under the original table name
SELECT DISTINCT * INTO newimport FROM newtmp
GO
DROP TABLE newtmp

When I ran this exact same code on the table that had about 374,000 records, each duplicated once (about 187,000 unique), it took nearly 5 minutes.

Is there anything I am doing wrong here? In a production environment I would not move the data back to the first table, as that is not really needed; I am simply testing the speed of the procedure right now.
rafrancisco

Sorry for the delay. I don't see anything wrong. The second query took longer because it processed more unique records, about 187,000 of them. The run time will not be exactly proportional, but as a rough guide: 28 seconds for 30,000 unique records scales to roughly 3 minutes for 187,000 (28 × 187,000 / 30,000 ≈ 175 seconds), so a time closer to 5 minutes on the larger table is not surprising.

Also, the time difference in your testing does not depend on the total number of records being processed but on the total number of UNIQUE records.

I hope you are making progress with this, and I hope this helps.
rrollinsny (ASKER)

Thanks for all your help. I have everything working great now, and it is much faster than ADO! I may post another question soon, as I now need to track counts for all the records I am "removing" when getting the unique records.
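
One way I am thinking of getting that count is to compare row counts before the original table is dropped (just a sketch using the test table names from above):

-- Rows removed by the DISTINCT step = total rows minus distinct rows
SELECT (SELECT COUNT(*) FROM newimport) - (SELECT COUNT(*) FROM newtmp) AS RemovedCount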