Improve company productivity with a Business Account.Sign Up


Can I create an index on a text field?

Posted on 2011-02-10
Medium Priority
Last Modified: 2012-06-21
I have a database query that is extremely slow. In truth I have to make a query on a text field which is not a key. Below is the create statement for the table.

CREATE TABLE [dbo].[myTable](
	[var1] [varchar](20) NOT NULL,
	[va2r] [int] NULL,
	[var3] [float] NULL,
	[var4] [float] NULL,
	[var5] [varchar](32) NULL,
	[var6] [varchar](32) NULL,
	[b][var7[/b]] [varchar](64) NULL,
	[var8] [float] NULL,
	[var9] [varchar](10) NULL,
	[var10] [varchar](200) NULL,
	[var11] [varchar](15) NULL,
	[var12] [int] NULL

Open in new window

I perform my query using CRecord set. I need to perform a query the query below.
"Select * from myTable where var7 LIKE %pattern% and var1 > intval1 and val1 < intval2". In the code it looks a bit like this:

CString Filter.Format("var7 LIKE '%%%s%%' AND (val2 > %d) AND (val1 <= %d)", 
						pattern, intval1, intval2);
		m_log.m_strFilter = Filter;

Open in new window

Can you tell me how can I optimize the table in order to make my query faster? I'm open to ideas such as adding indexes but I don't know how best to do it. Please help me.
Question by:rjorge
  • 6
  • 5
  • 2
  • +2
LVL 32

Expert Comment

by:Ephraim Wangoya
ID: 34866873
Add indexes to fields that you are going to use for searching frequently in this case var7
LVL 33

Expert Comment

ID: 34867333
In a more complicated scenario, I was tasked to improve the speed of an hourly report that took three days (actually I had predicted a stress test disaster, but no one believed me until 6 months later when the product was deployed). The constraint was that I could not add indexes to the tables. Reason was that these tables were fed with continuous high speed real-time data, and adding indexes would naturally cause a slow-down to the point that the queues would back-up and data would be lost.

Given this restraint, I analyzed the query plan and changed the one sql query into a dozen in one transaction. The first step was to use the primary key in the base table to push the selected data into another temporary table (with 1 or 2 keys). Now this temporary table was 1/100th the size of the original table, so already there is an improvement.

I created other tables, sometimes just a table with only two columns with keys having a small number of records. This table was frequently used and would be cached. In general I focussed on trying to break down the problem so that I could cache the most likely data.

To do this requires some custom knowlege of the nature of the data; like what is the likelihood of a field being contained within some interval, and the relationship between columns.

When you analyze the data, and the flow of the sql engine, you can then see better why a query takes so long. I got that report down to 20 minutes on the hour, and this was done while the new data was flowing in.
LVL 75

Expert Comment

by:Anthony Perkins
ID: 34867820
>>Can you tell me how can I optimize the table in order to make my query faster? I'm open to ideas such as adding indexes but I don't know how best to do it. <<
While you can add an index on var7 it will make zero difference, as you are using a LIKE condition with a wildcard in the front.  In other words the following condition cannot use an index:
WHERE var7 LIKE %pattern%

What Kind of Coding Program is Right for You?

There are many ways to learn to code these days. From coding bootcamps like Flatiron School to online courses to totally free beginner resources. The best way to learn to code depends on many factors, but the most important one is you. See what course is best for you.

LVL 75

Expert Comment

by:Anthony Perkins
ID: 34867840
Here is what you can do:
First read up as much as you can on Full-Text Search.  If you feel it is something that you can use, than go ahead and set it up in a test environment.  Once it is setup and the catalog is populated you will find the following in general far more efficient:
WHERE CONTAINS(var7, '"pattern*"')

Notice that there is no support for a leading wildcard.  This may not be as bad as you think.  Read up, and if you have any question let us know.
LVL 33

Expert Comment

ID: 34867894
Now, if you know that the filter "(val2 > %d) AND (val1 <= %d)" will reduce the resultant set significantly, and if you have an index on val2 or val1 (use the index which produces the least number of rows; and if you known that val2 is unique, then make it a unique index), then you could create a temporary table, TEMP2, consisting of just val1, val2, and var7.

Then apply the single filter "var7 LIKE '%%%s%%' on TEMP2 to yield TEMP3 (same structure as TEMP3). Or you could delete TEMP2 rows that does not meet the var7 criteria, but that could actually be a more expensive operation than creating TEMP3 with the single filter.

Since the above operations is (hopefully) on a limited set of rows, the pattern filter will not take too long.

Then just select * from original table using join of original table with TEMP3 setting val2 = val2 and val1 = val1 and var7 = var7 on the two tables.

Author Comment

ID: 34868537
@phoffric and all.
 I made a small typo: In the code part it should be

Select * from myTable where var7 LIKE %pattern% and var2 > intval1 and var2 < intval2".

Which also means the filter for the filter for the CRecordset woudl change to

CString Filter.Format("var7 LIKE '%%%s%%' AND (var2 > %d) AND (var2 <= %d)", 
						pattern, intval1, intval2);

Open in new window

Looking at your suggestions it does look that except for phoffric, my typo didn't affect your suggestions. But feel free to comment if itchange syour suggestion.

I need a day to go over your suggestions, as I feel the solution is already in your answers.


LVL 33

Accepted Solution

phoffric earned 2000 total points
ID: 34868728
Your last query makes better sense. I was thinking about asking about your OP but thought you had copied and pasted. If you change to the following, you may see slightly better performance:
Select * from myTable where var2 BETWEEN intval1 and intval2 AND var7 LIKE %pattern%

Open in new window

Even if there is no performance improvement (thanks to a good sql planner/optimizer), I think the intention of the filter is better communicated.

I would consider adding an index on var2 and would expect to see performance improvements.

You can also try creating a temporary table, TEMP2 (consisting of just var2 and var7), copied from myTable just using the filter:
       var2 between intval1 and intval2

The data in this table is very narrow, and if the result set is not large, then TEMP2 will be entirely cached.

You could then select from myTable joined with TEMP2's var2 and var7 using the additional filter that TEMP2.var7 LIKE %pattern% (rather than myTable.var7 LIKE %pattern%)
LVL 35

Expert Comment

ID: 34869427
the var7 text field is maximum 64 character. so it doesn't seem to be really free text. if it is more a list of keywords for search purpose you should think on making a second table with only two fields <keyword> and <var1> where the first gets an index and the second is assumed to be the primary key of mytable. when you store a record to mytable you would parse the var7 before and additionally store all (key)words to the second table.

your select statement would turn to

  select a.var1,a.var2, ..., a.var12 from mytable a, mykeys b
  where   b.key = '%pattern%'      
         and a.var1 = b.var1
         and a.var2 between %intval1% and %intval2%

the above would much faster than any wildcard search given that pattern is a constrictive criteria and var1 is primary key of mytable.

Author Comment

ID: 34878147

Sorry if the question is too basic, but can you tell how could I create the temporary Table TEMP2?

LVL 33

Expert Comment

ID: 34878237
What performance change occurred (for better or for worse) when you used the filter described in http:#34868728 as well as adding an index on var2? I was hoping that this would improve your performance.

What was the timing before and after you made these changes?

How many rows are in myTable, and how many rows does the where var2 BETWEEN intval1 and intval2 typically return for your typical values of intval1 and intval2?
LVL 33

Expert Comment

ID: 34878280
The rules for creating a temp table may change depending upon which database you use. Here is a Temporary Tables link and a CREATE TEMPORARY TABLE tmp link that shows how to do it.

Author Comment

ID: 34878494

Unfortunately I wa snot able to try your previous comments. The target DB my client provided me is bigger than my available space. I'll provide more feedback in a couple of hours. I'll try the index and put my feedback here in a couple of hours.
LVL 33

Expert Comment

ID: 34878723
Ok, I'll check back tonight to see if I can help. In the meantime, here are some questions that may prove helpful for future answers.

What database are you using - SQL Server 2005 or MS SQL Server or ?

How long does the current query take without changes? What is the max time the query should take?

What indexes/keys do you already have on myTable?

Are there any performance related issues re: insertion/deletion into myTable, as adding keys/indexes will degrade that performance?

If you can provide the create sql for myTable, that may prove helpful.

I expect that adding an index on var2 will be sufficient to improve your performance. If var2 is unique, you can make it a Primary Key.

If foreign keys are not permitted on your temporary tables, then it may be beneficial to create a permanent auxiliary temp table with foreign key to myTable.

Author Comment

ID: 34892605
Ok, a couple of days later than promised but here's my feedback.

The table that I described on my first comment already had a index (non-clustered) composed (var1, var2, var7) all ascending. I've added an  (non-clustered) index on var2.
 Running the query now resulted in nearly a 10x faster query.
It's truly great.

Author Closing Comment

ID: 34892625
Really good follow-up. A true wealth of expertise in this crowd.

Featured Post

Build your data science skills into a career

Are you ready to take your data science career to the next step, or break into data science? With Springboard’s Data Science Career Track, you’ll master data science topics, have personalized career guidance, weekly calls with a data science expert, and a job guarantee.

Question has a verified solution.

Are you are experiencing a similar issue? Get a personalized answer when you ask a related question.

Have a better answer? Share it in a comment.

Join & Write a Comment

Ready to get certified? Check out some courses that help you prepare for third-party exams.
This shares a stored procedure to retrieve permissions for a given user on the current database or across all databases on a server.
Viewers will learn how to use the UPDATE and DELETE statements to change or remove existing data from their tables. Make a table: Update a specific column given a specific row using the UPDATE statement: Remove a set of values using the DELETE s…
Stellar Phoenix SQL Database Repair software easily fixes the suspect mode issue of SQL Server database. It is a simple process to bring the database from suspect mode to normal mode. Check out the video and fix the SQL database suspect mode problem.

606 members asked questions and received personalized solutions in the past 7 days.

Join the community of 500,000 technology professionals and ask your questions.

Join & Ask a Question