Link to home
Start Free TrialLog in
Avatar of rjorge

asked on

Can I create an index on a text field?

I have a database query that is extremely slow. In truth I have to make a query on a text field which is not a key. Below is the create statement for the table.

CREATE TABLE [dbo].[myTable](
	[var1] [varchar](20) NOT NULL,
	[va2r] [int] NULL,
	[var3] [float] NULL,
	[var4] [float] NULL,
	[var5] [varchar](32) NULL,
	[var6] [varchar](32) NULL,
	[b][var7[/b]] [varchar](64) NULL,
	[var8] [float] NULL,
	[var9] [varchar](10) NULL,
	[var10] [varchar](200) NULL,
	[var11] [varchar](15) NULL,
	[var12] [int] NULL

Open in new window

I perform my query using CRecord set. I need to perform a query the query below.
"Select * from myTable where var7 LIKE %pattern% and var1 > intval1 and val1 < intval2". In the code it looks a bit like this:

CString Filter.Format("var7 LIKE '%%%s%%' AND (val2 > %d) AND (val1 <= %d)", 
						pattern, intval1, intval2);
		m_log.m_strFilter = Filter;

Open in new window

Can you tell me how can I optimize the table in order to make my query faster? I'm open to ideas such as adding indexes but I don't know how best to do it. Please help me.
Avatar of Ephraim Wangoya
Ephraim Wangoya
Flag of United States of America image

Add indexes to fields that you are going to use for searching frequently in this case var7
Avatar of phoffric

In a more complicated scenario, I was tasked to improve the speed of an hourly report that took three days (actually I had predicted a stress test disaster, but no one believed me until 6 months later when the product was deployed). The constraint was that I could not add indexes to the tables. Reason was that these tables were fed with continuous high speed real-time data, and adding indexes would naturally cause a slow-down to the point that the queues would back-up and data would be lost.

Given this restraint, I analyzed the query plan and changed the one sql query into a dozen in one transaction. The first step was to use the primary key in the base table to push the selected data into another temporary table (with 1 or 2 keys). Now this temporary table was 1/100th the size of the original table, so already there is an improvement.

I created other tables, sometimes just a table with only two columns with keys having a small number of records. This table was frequently used and would be cached. In general I focussed on trying to break down the problem so that I could cache the most likely data.

To do this requires some custom knowlege of the nature of the data; like what is the likelihood of a field being contained within some interval, and the relationship between columns.

When you analyze the data, and the flow of the sql engine, you can then see better why a query takes so long. I got that report down to 20 minutes on the hour, and this was done while the new data was flowing in.
>>Can you tell me how can I optimize the table in order to make my query faster? I'm open to ideas such as adding indexes but I don't know how best to do it. <<
While you can add an index on var7 it will make zero difference, as you are using a LIKE condition with a wildcard in the front.  In other words the following condition cannot use an index:
WHERE var7 LIKE %pattern%

Here is what you can do:
First read up as much as you can on Full-Text Search.  If you feel it is something that you can use, than go ahead and set it up in a test environment.  Once it is setup and the catalog is populated you will find the following in general far more efficient:
WHERE CONTAINS(var7, '"pattern*"')

Notice that there is no support for a leading wildcard.  This may not be as bad as you think.  Read up, and if you have any question let us know.
Now, if you know that the filter "(val2 > %d) AND (val1 <= %d)" will reduce the resultant set significantly, and if you have an index on val2 or val1 (use the index which produces the least number of rows; and if you known that val2 is unique, then make it a unique index), then you could create a temporary table, TEMP2, consisting of just val1, val2, and var7.

Then apply the single filter "var7 LIKE '%%%s%%' on TEMP2 to yield TEMP3 (same structure as TEMP3). Or you could delete TEMP2 rows that does not meet the var7 criteria, but that could actually be a more expensive operation than creating TEMP3 with the single filter.

Since the above operations is (hopefully) on a limited set of rows, the pattern filter will not take too long.

Then just select * from original table using join of original table with TEMP3 setting val2 = val2 and val1 = val1 and var7 = var7 on the two tables.
Avatar of rjorge


@phoffric and all.
 I made a small typo: In the code part it should be

Select * from myTable where var7 LIKE %pattern% and var2 > intval1 and var2 < intval2".

Which also means the filter for the filter for the CRecordset woudl change to

CString Filter.Format("var7 LIKE '%%%s%%' AND (var2 > %d) AND (var2 <= %d)", 
						pattern, intval1, intval2);

Open in new window

Looking at your suggestions it does look that except for phoffric, my typo didn't affect your suggestions. But feel free to comment if itchange syour suggestion.

I need a day to go over your suggestions, as I feel the solution is already in your answers.


Avatar of phoffric

Link to home
This solution is only available to members.
To access this solution, you must be a member of Experts Exchange.
Start Free Trial
the var7 text field is maximum 64 character. so it doesn't seem to be really free text. if it is more a list of keywords for search purpose you should think on making a second table with only two fields <keyword> and <var1> where the first gets an index and the second is assumed to be the primary key of mytable. when you store a record to mytable you would parse the var7 before and additionally store all (key)words to the second table.

your select statement would turn to

  select a.var1,a.var2, ..., a.var12 from mytable a, mykeys b
  where   b.key = '%pattern%'      
         and a.var1 = b.var1
         and a.var2 between %intval1% and %intval2%

the above would much faster than any wildcard search given that pattern is a constrictive criteria and var1 is primary key of mytable.
Avatar of rjorge



Sorry if the question is too basic, but can you tell how could I create the temporary Table TEMP2?

What performance change occurred (for better or for worse) when you used the filter described in http:#34868728 as well as adding an index on var2? I was hoping that this would improve your performance.

What was the timing before and after you made these changes?

How many rows are in myTable, and how many rows does the where var2 BETWEEN intval1 and intval2 typically return for your typical values of intval1 and intval2?
The rules for creating a temp table may change depending upon which database you use. Here is a Temporary Tables link and a CREATE TEMPORARY TABLE tmp link that shows how to do it.
Avatar of rjorge



Unfortunately I wa snot able to try your previous comments. The target DB my client provided me is bigger than my available space. I'll provide more feedback in a couple of hours. I'll try the index and put my feedback here in a couple of hours.
Ok, I'll check back tonight to see if I can help. In the meantime, here are some questions that may prove helpful for future answers.

What database are you using - SQL Server 2005 or MS SQL Server or ?

How long does the current query take without changes? What is the max time the query should take?

What indexes/keys do you already have on myTable?

Are there any performance related issues re: insertion/deletion into myTable, as adding keys/indexes will degrade that performance?

If you can provide the create sql for myTable, that may prove helpful.

I expect that adding an index on var2 will be sufficient to improve your performance. If var2 is unique, you can make it a Primary Key.

If foreign keys are not permitted on your temporary tables, then it may be beneficial to create a permanent auxiliary temp table with foreign key to myTable.
Avatar of rjorge


Ok, a couple of days later than promised but here's my feedback.

The table that I described on my first comment already had a index (non-clustered) composed (var1, var2, var7) all ascending. I've added an  (non-clustered) index on var2.
 Running the query now resulted in nearly a 10x faster query.
It's truly great.
Avatar of rjorge


Really good follow-up. A true wealth of expertise in this crowd.