Fastest way to copy rows from one table to another

I have a database with a table that has 8 million rows in it, table was badly designed with no primary key not an identity column, I have some dat values that are entered into a varchar column, I need to query this table for records between certain years. The entries in the db at this point for the datatime are entered as a varchar with the following format 'yyyyMMddhhmm: no seconds and no seperators '200909301725' instead of '2009-09-30 17:25",
My question is how can I insert a primart key, and an indentity column and convert the values of what is supposed to be datetime ti an actual datetime and what do I need to index inorder to have the query exccecute fast enough when having a condition on a year where date>'2009-01-01T00:00:00' , pleasde note that each year has about 500,000 records that will be retrieved.

Thank you
Who is Participating?
I wear a lot of hats...

"The solutions and answers provided on Experts Exchange have been extremely helpful to me over the last few years. I wear a lot of hats - Developer, Database Administrator, Help Desk, etc., so I know a lot of things but not a lot about one thing. Experts Exchange gives me answers from people who do know a lot about one thing, in a easy to use platform." -Todd S.

i would use DTS to import to a new table that has an identity column, then use a function to convert the string to a date
xav056Author Commented:
should I not convert the strring to date before importing, I need to change the column type from varchar(20) to datetime, I do not thing a simple alter function would do the trick as the data in the column is not properly formatted,
First order of business:  backup your original table.

Second order of business: determine the best primary key and add it

   Alter table Yourtable add constraint YourConstraintName Primary Key (col1, col2,...)

Third order of business: add the new date column

   Alter table Yourtable add YourDateColumnName datetime

Update the date column:
   Update YourTable
       set YourDateColumnName = convert(DATETIME,expression,120)
   For expression, substitute an expression that converts your varchar date to this format "yyyy-mm-dd hh:mi:ss".  something like this:

   substring(yourdate,1,4) + '-' + substring(yourdate,5,2) + '-' + substring(yourdate,7,2) + ' ' + substring(yourdate,9,2) + ':' + substring(yourdate(11,2) + ':00'

Fourth order of business
    delete original date column
          alter table yourtable delete column yourdate

Fifth order of business
     add index on date
         create index YourIndexName on Yourtablename (YourDateColumnName)

Sixth order of business:
     add identity column: similar to step 3, except with identity specification.  But then again, do you really need an identity column?  If so, why?  And after determining the reason, add indexes and/or unique key to support it.



Experts Exchange Solution brought to you by

Your issues matter to us.

Facing a tech roadblock? Get the help and guidance you need from experienced professionals who care. Ask your question anytime, anywhere, with no hassle.

Start your 7-day free trial
The Ultimate Tool Kit for Technolgy Solution Provi

Broken down into practical pointers and step-by-step instructions, the IT Service Excellence Tool Kit delivers expert advice for technology solution providers. Get your free copy for valuable how-to assets including sample agreements, checklists, flowcharts, and more!

you could create a second table with the same columns, but with an identity column, and the column as a date, rather than a varchar; then select into it while casting/converting the string to a date
>should I not convert the strring to date before importing, I need to change the column type from varchar(20) to datetime, I do not thing a simple alter function would do the trick as the data in the column is not properly formatted

You can convert from string to date WHILE importing or updating the new datetime column.  Alter will not work.  

The conversion from varchar to date requires the varchar to first be reformatted to one of the acceptable date-string formats.  See BOL for the CONVERT function. But given the format you are starting with, that is not too difficult.  (Now is time for prayers that your date formatting is consistent across all rows.  In my experience, there's a snowball's chance in hell that all your dates are expressed properly).  
xav056Author Commented:
I need to add the identity colimn to use it as a primary key, Ii do not really have a godd primary key to use in my table(not even a combination of columns)
the qusetion here how can I add the identity column the fastest way I have 8 million records and it seems that it is taking for ever to do,
same thing for populatinf the date, whats the fastest way to do it

Thank you
xav056Author Commented:
that was what I initially though off, but again its taking too much time to insert 8 million records using insert into table 1 (col1,col2)
select (col1,col2) from table 2

thank you
> I have 8 million records and it seems that it is taking for ever to do

   Is there some reason you need to do this all in one pass? It's not going to be instantaneous no matter how you  
   slice it.  When I work with extremely large tables, I break things up into smaller chunks.  So the logs, etc..
   don't get bogged down.  I set up a loop and inside it process chunks of data for some data determined range
    (1mo, 6mo, ... whatever makes the most sense).  Each iteration processes the current chunk, then repeats until
   everything is processed.  Not very elegant, but it's worked well for me in the past with this type of one-time

xav056Author Commented:
K I managed to transfer the data it actually took 110 minutes to copy the 8 million records from one table without an indentity column to one with an identity column, I also created new datecolumn as suggested by dqmq and did the conversion from varchar to date, that took 12 minutes to complete for the table,
Now I need help with setting up the indexes for querying the data fast,
I would like to select
SElect col1,col2,col3,col4,col5,col6,col7
from table
where date is between 'date1' and 'date2'

I added non clustered index on date assuming that date is col7
If I do select col7
from table
where date netween 'date1' and date2
I get result for 500,000 in 5 seconds which is not too bad
but If i include in my select statement more than the date col, 7 columns as in the sample above the quesry takes about 3 minutes, which is too much
How can I set up my indexes to get the query to run fast

Adding more columns in the select clause does NOT ordinarily impact performance (unless those columns are also used in the where clause).  I'd start by adding one column at a time until you find the culprit.  Then explore the nature of that column.

Also, you can show the execution plan to see what indexes are used by your query.
One more thing (a rant, I suppose).

A relation (table is a close approximation) is supposed to represent business assertions. In fact, every row should represent exactly one business assertion. That business assertion must be uniquely identifiable by a primary key.  If not, you are in BIG doo.  

Now, I think you know that, because of your initiatlve to add the identity column.  However, the identity column is totally meaningless in business terms and therefore really can't improve the business assertion represented by the row and really can't serve the intended purpose of the primary key. There are some good reasons for identity columns, but needing a primary key is usually not one of them.  After more than 15 years in the business, I cannot remember a situation where a unique key other than an identity column wasn't important.

In other words, if the rows in your table are not unique, then the table does not represent a business relation.  Arbtrarily tossing in a meaningless column to make the rows unqiue does not address that problem.

Now, I'll get off the soap box.
xav056Author Commented:
Not quite sure why the identity column is a bad idea,
Lets say I want to have foreign constraints
One table has 8 columns where 4 columsn combined together could be used as a primary key, When I want to make sure that another table would lookup against a certain primary key, my intuition would tell me to add a foreighn key constraint but if i have a coposite primary key IO would have to repeat the 4 columns from table 1 and insert them in table 2, would it not be better in such a scenario to insert an identity column in table 1 and reference that as foreign key in table 2?
Thank you
Let's be clear: the debate is not about identity columns, it's about surrogate keys.  The pros/cons of assigning the identity property to a surrogate key is another discussion entirely.

I didn't mean to imply that surrogate keys are bad; for sure they have their place.  One good reason for a surrogate key is if the natural key is to wide.  By that I mean, one column with too many characters or a composite key with too many columns.  One the one hand, joins are simpler and more efficient when they involve a single column.  On the other, joins are often unnecessary when columns of the parent table propogate to the child.  

But more importantly, natural keys convey a business rule.  If you replace a natural key with a surrogate key, then you lose that business rule.  I use surrogate keys when they make sense, but I NEVER drop the natural key when I do so.  Why? Because the natural key is important to the integrity of the data and helps the database faithfully represent the business.  In your case, absence of a natural key suggests the table design does not represent a well-defined, identifiable "thing" to the business.  Just throwing on a surrogate key does not change that.

I'm not opposing the surrogate key so much as encouraging you to discover and assert an alternate natural key.  If you do not do that, you are likely to eventually have bad data.  Either multiple rows for the same thing, or one row for multiple things, or a right row and a wrong row and the inability to tell them apart.

When you mentioned that your table did not have a set of columns that uniquely identify a row, red flags went up.  That's a concern indeed, but it's naive to think adding a surrogate key addresses it.

Do I make sense?  



It's more than this solution.Get answers and train to solve all your tech problems - anytime, anywhere.Try it for free Edge Out The Competitionfor your dream job with proven skills and certifications.Get started today Stand Outas the employee with proven skills.Start learning today for free Move Your Career Forwardwith certification training in the latest technologies.Start your trial today
Microsoft SQL Server

From novice to tech pro — start learning today.