Solved

Can I create an index on a text field?

Posted on 2011-02-10
15
383 Views
Last Modified: 2012-06-21
Hi,
I have a database query that is extremely slow. In truth I have to make a query on a text field which is not a key. Below is the create statement for the table.

CREATE TABLE [dbo].[myTable](
	[var1] [varchar](20) NOT NULL,
	[var2] [int] NULL,
	[var3] [float] NULL,
	[var4] [float] NULL,
	[var5] [varchar](32) NULL,
	[var6] [varchar](32) NULL,
	[var7] [varchar](64) NULL,
	[var8] [float] NULL,
	[var9] [varchar](10) NULL,
	[var10] [varchar](200) NULL,
	[var11] [varchar](15) NULL,
	[var12] [int] NULL
) ON [PRIMARY]



I perform my query using a CRecordset. I need to perform the query below:
"Select * from myTable where var7 LIKE %pattern% and var1 > intval1 and val1 < intval2". In the code it looks a bit like this:

		
CString Filter;
Filter.Format("var7 LIKE '%%%s%%' AND (val2 > %d) AND (val1 <= %d)",
						pattern, intval1, intval2);

		m_log.m_strFilter = Filter;



Can you tell me how I can optimize the table to make my query faster? I'm open to ideas such as adding indexes, but I don't know how best to do it. Please help me.
0
Comment
Question by:rjorge
15 Comments
 
LVL 32

Expert Comment

by:ewangoya
ID: 34866873
Add indexes to the fields that you are going to search on frequently; in this case, var7.
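For SQL Server, that suggestion could be sketched like this (the index name is illustrative). Note that an index on var7 only allows a seek when the pattern has no leading wildcard:

```sql
-- Illustrative, assuming SQL Server: a non-clustered index on the
-- searched column. A LIKE with a leading wildcard ('%pattern%')
-- still cannot seek on it; only 'pattern%'-style searches can.
CREATE NONCLUSTERED INDEX IX_myTable_var7
    ON [dbo].[myTable] ([var7]);
```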
0
 
LVL 32

Expert Comment

by:phoffric
ID: 34867333
In a more complicated scenario, I was tasked to improve the speed of an hourly report that took three days (actually, I had predicted a stress-test disaster, but no one believed me until 6 months later when the product was deployed). The constraint was that I could not add indexes to the tables, because these tables were fed with continuous high-speed real-time data, and adding indexes would naturally cause a slow-down to the point that the queues would back up and data would be lost.

Given this constraint, I analyzed the query plan and changed the one SQL query into a dozen in one transaction. The first step was to use the primary key in the base table to push the selected data into another temporary table (with 1 or 2 keys). This temporary table was 1/100th the size of the original table, so already there was an improvement.

I created other tables, sometimes just a keyed two-column table with a small number of records. This table was frequently used and would be cached. In general, I focused on breaking down the problem so that the most likely data could be cached.

Doing this requires some custom knowledge of the nature of the data: the likelihood of a field falling within some interval, and the relationships between columns.

When you analyze the data and the flow of the SQL engine, you can see better why a query takes so long. I got that report down to 20 minutes per hour, and this was done while the new data was flowing in.
0
 
LVL 75

Expert Comment

by:Anthony Perkins
ID: 34867820
>>Can you tell me how can I optimize the table in order to make my query faster? I'm open to ideas such as adding indexes but I don't know how best to do it. <<
While you can add an index on var7, it will make zero difference, as you are using a LIKE condition with a wildcard at the front.  In other words, the following condition cannot use an index:
WHERE var7 LIKE '%pattern%'

0
 
LVL 75

Expert Comment

by:Anthony Perkins
ID: 34867840
Here is what you can do:
First, read up as much as you can on Full-Text Search.  If you feel it is something you can use, then go ahead and set it up in a test environment.  Once it is set up and the catalog is populated, you will find the following in general far more efficient:
WHERE CONTAINS(var7, '"pattern*"')

Notice that there is no support for a leading wildcard.  This may not be as bad as you think.  Read up, and if you have any questions, let us know.
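A minimal sketch of that setup, assuming SQL Server 2005 or later (all names are illustrative; full-text indexing requires a single-column unique index on the table, which myTable as posted does not have):

```sql
-- Sketch only. Full-text indexing needs a unique key column;
-- here we assume var1 happens to be unique.
CREATE UNIQUE INDEX UX_myTable_var1 ON [dbo].[myTable] ([var1]);

CREATE FULLTEXT CATALOG myCatalog;

CREATE FULLTEXT INDEX ON [dbo].[myTable] ([var7])
    KEY INDEX UX_myTable_var1
    ON myCatalog;

-- Prefix searches can then use the full-text index:
SELECT * FROM [dbo].[myTable]
WHERE CONTAINS(var7, '"pattern*"');
```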
0
 
LVL 32

Expert Comment

by:phoffric
ID: 34867894
Now, if you know that the filter "(val2 > %d) AND (val1 <= %d)" will reduce the result set significantly, and if you have an index on val2 or val1 (use the index that produces the fewest rows; and if you know that val2 is unique, make it a unique index), then you could create a temporary table, TEMP2, consisting of just val1, val2, and var7.

Then apply the single filter "var7 LIKE '%%%s%%'" on TEMP2 to yield TEMP3 (same structure as TEMP2). Or you could delete the TEMP2 rows that do not meet the var7 criteria, but that could actually be a more expensive operation than creating TEMP3 with the single filter.

Since the above operations are (hopefully) on a limited set of rows, the pattern filter will not take too long.

Then just select * from the original table, joining it with TEMP3 on val2 = val2, val1 = val1, and var7 = var7.
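A sketch of those steps in T-SQL, using the column names as written above (the temp-table names and parameter markers are illustrative):

```sql
-- Step 1: narrow the data with the (ideally indexed) range filter.
SELECT val1, val2, var7
INTO #TEMP2
FROM [dbo].[myTable]
WHERE val2 > @intval1 AND val1 <= @intval2;

-- Step 2: apply the expensive LIKE only to the reduced set.
SELECT val1, val2, var7
INTO #TEMP3
FROM #TEMP2
WHERE var7 LIKE '%' + @pattern + '%';

-- Step 3: join back to the base table for the full rows.
SELECT t.*
FROM [dbo].[myTable] t
JOIN #TEMP3 f
  ON t.val1 = f.val1 AND t.val2 = f.val2 AND t.var7 = f.var7;
```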
0
 

Author Comment

by:rjorge
ID: 34868537
@phoffric and all.
I made a small typo: in the code part it should be

Select * from myTable where var7 LIKE %pattern% and var2 > intval1 and var2 < intval2

which also means the filter for the CRecordset would change to

CString Filter;
Filter.Format("var7 LIKE '%%%s%%' AND (var2 > %d) AND (var2 <= %d)",
						pattern, intval1, intval2);



Looking at your suggestions, it does look like my typo didn't affect them, except for phoffric's. But feel free to comment if it changes your suggestion.

I need a day to go over your suggestions, as I feel the solution is already in your answers.

Thanks,
Rui



0
 
LVL 32

Accepted Solution

by:
phoffric earned 500 total points
ID: 34868728
Your last query makes better sense. I was thinking of asking about your OP, but thought you had copied and pasted. If you change to the following, you may see slightly better performance:
Select * from myTable where var2 BETWEEN intval1 AND intval2 AND var7 LIKE '%pattern%'


Even if there is no performance improvement (thanks to a good SQL planner/optimizer), I think the intention of the filter is better communicated.

I would consider adding an index on var2 and would expect to see performance improvements.

You can also try creating a temporary table, TEMP2 (consisting of just var2 and var7), copied from myTable just using the filter:
       var2 between intval1 and intval2

The data in this table is very narrow, and if the result set is not large, then TEMP2 will be entirely cached.

You could then select from myTable joined with TEMP2 on var2 and var7, using the additional filter TEMP2.var7 LIKE '%pattern%' (rather than myTable.var7 LIKE '%pattern%').
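In T-SQL, that sequence might look like this (a sketch; the temp-table name and parameter markers are illustrative):

```sql
-- Narrow first on the (ideally indexed) var2 range; the result
-- set is narrow and, if small, should stay cached.
SELECT var2, var7
INTO #TEMP2
FROM [dbo].[myTable]
WHERE var2 BETWEEN @intval1 AND @intval2;

-- Join back, applying the LIKE only to the small set.
SELECT t.*
FROM [dbo].[myTable] t
JOIN #TEMP2 f
  ON t.var2 = f.var2 AND t.var7 = f.var7
WHERE f.var7 LIKE '%' + @pattern + '%';
```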
0

 
LVL 32

Expert Comment

by:sarabande
ID: 34869427
the var7 text field is at most 64 characters, so it doesn't seem to be really free text. if it is more a list of keywords for search purposes, you should think of making a second table with only two fields, <keyword> and <var1>, where the first gets an index and the second is assumed to be the primary key of mytable. when you store a record to mytable, you would parse var7 beforehand and additionally store all (key)words in the second table.

your select statement would turn into

  select a.var1, a.var2, ..., a.var12 from mytable a, mykeys b
  where   b.key = 'pattern'
         and a.var1 = b.var1
         and a.var2 between %intval1% and %intval2%

the above would be much faster than any wildcard search, given that pattern is a restrictive criterion and var1 is the primary key of mytable.
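A sketch of that keyword side table, assuming SQL Server (all names are illustrative):

```sql
-- Illustrative: one row per (keyword, parent-row key) pair,
-- filled by parsing var7 whenever a row is stored in mytable.
CREATE TABLE [dbo].[mykeys](
    [keyword] [varchar](64) NOT NULL,
    [var1]    [varchar](20) NOT NULL  -- matches mytable.var1
);

CREATE NONCLUSTERED INDEX IX_mykeys_keyword
    ON [dbo].[mykeys] ([keyword]);
```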
 
Sara              
0
 

Author Comment

by:rjorge
ID: 34878147
@phoffric

Sorry if the question is too basic, but can you tell me how I could create the temporary table TEMP2?


Regards,
R
0
 
LVL 32

Expert Comment

by:phoffric
ID: 34878237
What performance change occurred (for better or for worse) when you used the filter described in http:#34868728 as well as adding an index on var2? I was hoping that this would improve your performance.

What was the timing before and after you made these changes?

How many rows are in myTable, and how many rows does the where var2 BETWEEN intval1 and intval2 typically return for your typical values of intval1 and intval2?
0
 
LVL 32

Expert Comment

by:phoffric
ID: 34878280
The rules for creating a temp table may change depending upon which database you use. Here is a Temporary Tables link and a CREATE TEMPORARY TABLE tmp link that shows how to do it.
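For SQL Server specifically, a local temporary table (the # prefix makes it private to the session) can be created either explicitly or with SELECT ... INTO; a sketch, with illustrative names:

```sql
-- Option 1: define explicitly, then fill.
CREATE TABLE #TEMP2 (var2 int, var7 varchar(64));
INSERT INTO #TEMP2 (var2, var7)
SELECT var2, var7 FROM [dbo].[myTable]
WHERE var2 BETWEEN @intval1 AND @intval2;

DROP TABLE #TEMP2;

-- Option 2: create and fill in one step.
SELECT var2, var7
INTO #TEMP2
FROM [dbo].[myTable]
WHERE var2 BETWEEN @intval1 AND @intval2;
```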
0
 

Author Comment

by:rjorge
ID: 34878494
@phoffric

Unfortunately, I was not able to try your previous comments: the target DB my client provided is bigger than my available space. I'll try the index and put my feedback here in a couple of hours.
0
 
LVL 32

Expert Comment

by:phoffric
ID: 34878723
Ok, I'll check back tonight to see if I can help. In the meantime, here are some questions that may prove helpful for future answers.

What database are you using - SQL Server 2005 or MS SQL Server or ?

How long does the current query take without changes? What is the max time the query should take?

What indexes/keys do you already have on myTable?

Are there any performance-related issues re: insertion/deletion into myTable? Adding keys/indexes will degrade that performance.

If you can provide the create sql for myTable, that may prove helpful.

I expect that adding an index on var2 will be sufficient to improve your performance. If var2 is unique, you can make it a Primary Key.

If foreign keys are not permitted on your temporary tables, then it may be beneficial to create a permanent auxiliary temp table with foreign key to myTable.
0
 

Author Comment

by:rjorge
ID: 34892605
Ok, a couple of days later than promised but here's my feedback.

The table that I described in my first comment already had a non-clustered composite index on (var1, var2, var7), all ascending. I've added a non-clustered index on var2.
Running the query now resulted in a nearly 10x faster query.
It's truly great.
0
 

Author Closing Comment

by:rjorge
ID: 34892625
Really good follow-up. A true wealth of expertise in this crowd.
0
