treehouse2008 asked:
How to select a large number of records in SQL Server

Select records from a table with 1.2 million records; the returned set is about 1.1 million records.
Using the normal "Select * from table1 where field1 = xxx" takes about 9 seconds, which is too long for the application.
I have an index on "field1".
Is there any way to improve the performance? The ideal speed would be within 1 second.
aibusinesssolutions:

If you use Select [field1], [field2], [field3] instead of Select *, it will speed it up quite a bit.

Also, are there any binary or image columns in the table?  Are you connecting to a local server or a remote one?
treehouse2008 (Asker):
Actually I only need one field, that is: "select field1 from table1 where field1 = xxx".
This field is int.
The server and my machine are on the same LAN.
There is a binary field in the table, but I do not need to select it. I only need the one field, which is int and is the primary key.
Ok, if you had select * before, then changing that to select field1 would speed it up drastically.

Also, what type of data field is the field you are querying?
Can you please post your exact query?

As a first suggestion, the following query should help it execute faster:
select field1, field2 from table1 WITH (NOLOCK) where field1 = 'xxx'
In this case, I always select only one field, which is int and is the primary key.
Sharath S:
Is the column in the SELECT clause the same as the one in the WHERE clause? If yes, you will end up with xxx in the query result as many times as it is repeated in your table:
select field1 from table1 where field1 = xxx
My exact query is simple:
Select field1 from table1 WITH (NOLOCK) where field1 = 'xxx'
The table has about 1.2 million records.
The returned set is about 1.1 million records.
 
Sorry. The exact query should be:
Select field1 from table1 WITH (NOLOCK) where field2 = 'xxx'
The SELECT column is not the same as the WHERE field.
 
Ok, what type of data field is field2?  Is it indexed?
I added an index on field2 (the WHERE clause column).
field2 is also an int.
My point is if you run that query, you will get the below result.
field1
-------
xxx
xxx
xxx
....
1.1 million times
Where are you using this value? You already know the value is xxx; you would only want to know how many times it is repeated in your table.
You can try it like this:
select count(field1) field1_count from table1 WITH (NOLOCK) where field1 = xxx
I guess xxx is a numeric value, so I didn't enclose it in single quotes. The above query will give you the number of times the value xxx appears in field1 in your table; you can use that value further in your application.
Please ignore my previous post (I mistakenly assumed the columns in the SELECT and WHERE clauses were the same).
My query is:
select field1 from table1 WITH (NOLOCK) where field2 = xxx
The "field1" in the SELECT clause is not the same as the "field2" in the WHERE clause.
I made a mistake when I first posted my query; sorry for that.

Select field1 from table1 WITH (NOLOCK) where field2 = xxx
Add a non-clustered index on field2 and your query performance will change drastically.
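For example, a minimal sketch (the index name is my own):

CREATE NONCLUSTERED INDEX IX_table1_field2 ON table1 (field2);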
I did add a non-clustered index on field2.
SQL Server provides a wonderful facility known as the Index Tuning Wizard which greatly enhances the index selection process. To use this tool, first use SQL Profiler to capture a trace of the activity for which you wish to optimize performance. You may wish to run the trace for an extended period of time to capture a wide range of activity. Then, using Enterprise Manager, start the Index Tuning Wizard and instruct it to recommend indexes based upon the captured trace. It will not only suggest appropriate columns for queries but also provide you with an estimate of the performance increase you'll experience after making those changes!
>> I did add non-clustered index on field2
what is the query run time now?
Thank you for your advice. But in my case only two fields are involved, and both of them have indexes.
The 9 seconds in my original question is the query run time after I added the non-clustered index on field2.
Does it take 9 seconds when running the query in SQL Profiler, or are you pulling the data into some other software?
I ran it in Management Studio and used Profiler to trace it; Profiler shows the same query run time.
ASKER CERTIFIED SOLUTION from Sharath S (available to members only)
SOLUTION (available to members only)
Can some kind of temp table help?
You would still need to select into the temp table; what are you wanting to do, exactly?

I can't see needing to return 1.1 million records over and over.
As to why I need to return 1.1 million: it's too complicated to explain in a few words here. It's related to the whole application.
I suggest using some sort of pagination then, if you are going to display result sets in your application. Storing 1.1 million records worth of data in a variable in an application wouldn't be good at all for memory usage.
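For example, a rough paging sketch using ROW_NUMBER() (SQL Server 2005+), reusing the thread's table and field names:

SELECT field1
FROM (SELECT field1, ROW_NUMBER() OVER (ORDER BY field1) AS rn
      FROM table1
      WHERE field2 = xxx) AS t
WHERE t.rn BETWEEN 1 AND 1000;  -- fetch one 1000-row page at a time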
>> As to why I need to return 1.1 million: it's too complicated to explain in a few words here. It's related to the whole application.
That's up to you and your application. But I think that is about the best time for your query. Creating a temp table won't solve your problem, as it may add extra seconds of execution.
My task is only part of a whole project: to retrieve the 1.1 million records with good performance. Other things are beyond my scope.
Thank you both. Yeah, the best solution may be not to retrieve the 1.1 million records at all.
If there are 1.2 million records, and your query "where field2 = xxx" returns 1.1 million, does that mean they all have the same value for field2 except for the other 0.1 million?

And yes, there is no other way to increase the speed, besides increasing processing power and the amount of RAM in your machine.  1.1 million records uses a lot of overhead.
I do have a suggestion... if you just need the IDs for those 1.1 million records, you could write a stored procedure to put the IDs in a table as a single comma-delimited string. You could then query that single string as one variable and split the data into an array in your application.
The problem is that before I "put the IDs in a table as a single comma delimited string",  I need to get the 1.1 million first.
Yes, I understood. Putting 1.1 million IDs in a comma-delimited string is not a good idea.
>>The problem is that before I "put the IDs in a table as a single comma delimited string",  I need to get the 1.1 million first.

It may or may not run faster, I'm not sure, you can try it out and see how fast it is.
DECLARE @IdCSV VARCHAR(MAX)
-- build one comma-delimited string of all matching field1 values
SELECT @IdCSV = COALESCE(@IdCSV + ',', '') + CAST(field1 AS VARCHAR(12))
FROM table1
WHERE field2 = xxx

SELECT @IdCSV

@Sharath: Pulling 1 string instead of 1.1 million records through a SELECT command may be faster; it was just a suggestion.
SOLUTION (available to members only)
aibusinesssolutions - I don't think so. The asker can run both queries and post the query times.
Would be inclined to create a covering index on field2 that includes field1:

create nonclustered index idx_mytable_field2 on mytable
(field2)
include
(field1);

Then the "select field1 from mytable with (nolock) where field2 = 'xxx' " should be completely handled within the index and not needing to go to the underlying data.

Going to be hard for any user to do anything meaningful with 1.1 million rows - are there any dupes? Maybe "select distinct field1 from mytable with (nolock) where field2 = 'xxx'" would work better in that case.

If there are 1.1 million rows for field2, then how big is the dataset / table, and is field2 reasonably unique? Is there anything else that might help, such as a period of time?

Hang on - the table has 1.2 million rows and "where field2 = 'xxx'" returns 1.1 million?

Gosh, Batman - or Holy Auto Scan - field2 is not a good candidate for an index...

Can you show the structure of the table?
I ran
SELECT @IdCSV = COALESCE(@IdCSV + ',', '') + CAST(field1 AS VARCHAR(12))
FROM table1 WHERE field2 = xxx
It took more than 1 minute (I canceled the query after 1 minute had passed).
Also, VARCHAR(MAX) can only handle at most 8000 bytes.
No, varchar(MAX) can handle 2 GB - you can only see 8000 (an SSMS / query results display limit, and some functions).
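A quick way to see that for yourself (a sketch with a throwaway variable):

DECLARE @s VARCHAR(MAX);
SET @s = REPLICATE(CAST('x' AS VARCHAR(MAX)), 100000);  -- cast so REPLICATE returns varchar(max)
SELECT LEN(@s);  -- 100000, well past the 8000-byte varchar(n) limit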

That is a fair bit different to : select field1 from mytable where field2 = xxx

The convert to varchar will slow it down, and the coalesce will slow it down even more.

mark_wills:
You are right, field2 is not a good candidate for an index. I did create some indexes that combine field2 with other fields, and that helped a little; the 9 seconds is the time after I added those combined indexes.
Field1 has no dupes. It's the primary key.
The 1.1 million results are just going to be put in a table variable, which will be used by other modules.
Field2 is not unique. Among the 1.2 million records, 1.1 million of them have the same field2. That is what I want to get.
There are many fields in the table, but all I care about is field1 and field2.
SOLUTION (available to members only)
Wow, it's amazing. It only took about 1.5 seconds. Can't believe it.
mark_wills:   Thank you.
Could you please  give me a script on how to put the result XML into a table variable? Thanks.
OK, so to get a bit more like a csv...

select (select ','+convert(varchar,[task]) from tablea for xml path('')) as idcsv

You will need to tidy up either the leading (or, if you prefer, the trailing) comma. Sometimes you can get the top 1 and put that in front and then the rest as the concatenated string, or leave it at the end and let it be NULL, or substring it off, etc...
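For instance, one common way to strip the leading comma is STUFF (a sketch, with the thread's table/field names substituted in):

SELECT STUFF((SELECT ',' + CONVERT(VARCHAR(12), field1)
              FROM table1 WITH (NOLOCK)
              WHERE field2 = xxx
              FOR XML PATH('')), 1, 1, '') AS idcsv;  -- remove the first character (the leading comma)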

mark_wills:
I ran "select (select ','+convert(varchar,[task]) from tablea for xml path('')) as idcsv"; it took more than 3 seconds.
Besides, I want to put the 1.1 million results in a table variable, like:
Declare @temp Table (field1 Int Primary Key)
That is, @temp will have 1.1 million records.

Thanks.
OK, then there is no need to concatenate, and it is no longer a simple query - we now need to insert... and that will take a lot longer than 1.5 seconds.


declare @tbl table (field1 int)  -- it is a memory table - no need for a primary key?!
insert @tbl
select field1 from t_million_rows t where t.field2 = 2
Bit puzzled...

first it was a select,
then a concatenation,
now an insert...

Kind of an interesting application you've got going there...
declare @tbl table (field1 int)        
insert @tbl
select field1 from t_million_rows t where t.field2 = 2
This is what I originally did. It took about 9 seconds; the time is the same whether or not @tbl is defined with a primary key.


Do not think it is going to happen any faster - you are doing row-wise inserts.

t_million_rows for me has two million rows and it takes around 17 seconds - on a laptop.

We had one question here on EE where they needed to load two files, one 6 million rows and the other 2 million, via bulk insert - it all happened in less than 30 seconds - but it was a big machine.

So, it kind of indicates that 9 seconds would seem to be reasonable.

Is there a more lateral solution? Like getting rid of the 100,000 non-matching rows - rather, moving them somewhere else?
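A rough sketch of that "move them elsewhere" idea (table1_other is a hypothetical side table; if the main table then holds only matching rows, the WHERE clause disappears entirely):

-- park the ~100,000 non-matching rows in a side table
SELECT field1, field2 INTO table1_other FROM table1 WHERE field2 <> xxx;
DELETE FROM table1 WHERE field2 <> xxx;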
Have you tried the covering index or indexed view solution? Either of those creates a very narrow structure that your query can access in lieu of the larger table. Let's say rows in your table are 80 bytes; leaf rows in the index will be 8 bytes. That reduces I/O by a factor of 10!
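A minimal indexed-view sketch (assuming the table lives in dbo, and that field1 is unique, as the thread says):

CREATE VIEW dbo.v_table1_narrow WITH SCHEMABINDING AS
SELECT field1, field2 FROM dbo.table1;
GO
-- materializes the view; the first index must be unique and clustered
CREATE UNIQUE CLUSTERED INDEX IX_v_table1_narrow ON dbo.v_table1_narrow (field1);
-- on non-Enterprise editions, query the view WITH (NOEXPAND) so the index is used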
Think that covering index is still a good thing to try - so long as the query optimiser doesn't step in and decide a table scan is better. In 2008 at least we can tell it which index to use; in 2005 we have to do the plan-guide thingy - never had too much joy with that...
>so long as the query optimiser doesn't step in and decide a table scan is better.

That's true and partly why I prefer to try an indexed view.  I doubt the optimizer would consider a table scan when all of the required columns are materialized in the view.
good call - might be worthwhile...
Interesting, but I don't see how even a materialized view would help much in this situation.

I wouldn't think reading 1.1M rows is going to be that much faster than reading 1.2M rows by ANY method, since the difference in rows is less than 10%.
Hi,

First of all, scan your table for unused indexes and drop them; a sketch for finding candidates is below.
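A rough way to spot unused indexes (a sketch against sys.dm_db_index_usage_stats, SQL Server 2005+; note the counters reset at each instance restart):

SELECT o.name AS table_name, i.name AS index_name,
       s.user_seeks, s.user_scans, s.user_lookups, s.user_updates
FROM sys.indexes i
JOIN sys.objects o ON o.object_id = i.object_id
LEFT JOIN sys.dm_db_index_usage_stats s
       ON s.object_id = i.object_id
      AND s.index_id = i.index_id
      AND s.database_id = DB_ID()
WHERE o.type = 'U'
ORDER BY ISNULL(s.user_seeks + s.user_scans + s.user_lookups, 0);  -- least-used first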

Also, as noted above, select the specific fields you need in place of *.

Thanks