Fast search of 13 char indexed text field for 1 Million+ record table in Mysql InnoDB table

Posted on 2013-12-27
Last Modified: 2014-01-12
I have a table with 1 Million+ records that I need to search with a 13 char string.

Right now, the search is prohibitively long (3.5 seconds to retrieve a record) with about 850k records.

It's an InnoDB Database and the search field is always 13 chars long and a standard (not Primary) index with no special settings.

This is on a Linux Server.
Question by:genepilot2
  • 5
  • 3
LVL 34

Expert Comment

ID: 39743643
So you have one specific field that is always 13 characters long, and you're always searching this field with a 13 character search string?

With that kind of setup, a million records shouldn't take more than a couple seconds to search, so I wonder:

1. Are you pulling back more data than you need? If your SELECT results in lots of records containing hundreds of kilobytes of data, part of the "slowness" might be data transfer overhead.

2. Is the field definition longer than 13 characters and if so, did you specify the index length? For that matter, have you tried different index lengths?

3. Is the database sitting on a slow or busy hard drive? I/O tends to be the bottleneck most of the time, and if the hard drive controller is busy serving data to something else, it's going to take time to even get to the search request.

4. If the data in the search column changes infrequently (e.g. daily), you might consider a separate, full-text search engine like Sphinx or Solr, which can get you very specific results back in a 1/10th of a second (the caveat is that they have big indexes and the indexing is a slow operation that tends to be something you do daily). This seems like overkill for your situation.

5. If nothing else helps (also see my article below), a potentially radical move would be to split your 13-character string up into two columns (e.g. 3 character and 9 character columns), with a multi-column index. Depending on what your data looks like, this might help MySQL look up and organize data faster.

Check out my article on this topic:
LVL 108

Expert Comment

by:Ray Paseur
ID: 39743685
Please post the CREATE TABLE statement and the SQL statement, thanks.

Author Comment

ID: 39744232
To answer gr8gonzo's questions:
1. The data is about 1.4 GB in size in the database, but there is 4GB of RAM available for the Database.
2. The field is a varchar(13) and the data is always 13 digits long, all numeric.
3. This index is used heavily so the index will always be in memory.
4. This field doesn't change for existing records, but new records are added daily (about 10-20k per day as visitors ask for new books that aren't already in the system) during the busy period.
5. Splitting the index is one possibility, is a numerical index much faster than a string index?  I know that a 13 digit number is too big for a 32-bit server to use with PHP even if mysql can handle that size with it's largest int.

LVL 108

Expert Comment

by:Ray Paseur
ID: 39744246
We can create test cases if you post the CREATE TABLE statement and the SQL statement that is slow.  With a little test data we might be able to create some simulations, right? It will be easier to help if we have that small advantage.
Efficient way to get backups off site to Azure

This user guide provides instructions on how to deploy and configure both a StoneFly Scale Out NAS Enterprise Cloud Drive virtual machine and Veeam Cloud Connect in the Microsoft Azure Cloud.

LVL 34

Accepted Solution

gr8gonzo earned 500 total points
ID: 39744332
1. Follow my article, especially the part on tuning the configuration. RAM is good for caching data and query caching, but if you're getting hit with different IDs to search each time, then your query cache isn't going to help you as much.

2. A numerical index definitely has the POTENTIAL to be faster, but it's not necessarily true. Depending on your data, the split-column approach might definitely assist MySQL, though. If you can, just dupe the database, create two columns and split the data into the two columns so you have records like:

OLD: "1234567890123" (1 varchar column)
NEW: 1234 and 567890123 (2 int columns)

A simple query with SUBSTRING() and CAST() should populate the columns quickly so you can run your tests, but make sure you do it on a copy of the database, not on your original.

Then create a multi-column index on the two columns and run your SELECT (and make sure you EXPLAIN it so you can make sure your new index is used).

You don't need PHP to understand the full 13 characters as a big number - just substr() it to get the different parts:

$old = "1234567890123";
$part1 = substr($old,0,4);
$part2 = substr($old,4);
$query = "SELECT fields FROM table WHERE bookID1 = {$part1} AND bookID2 = {$part2}";
LVL 34

Expert Comment

ID: 39744336
Also, don't forget to run maintenance on the table. And if you can afford to do so, consider a dedicated SSD drive for the database. A couple hundred bucks nowadays can buy you a couple of moderately-sized SSDs that you can put in RAID 1 (mirroring for safety) and get you a huge boost in DB performance. If you're curious about how much difference I/O can make, you can always create a temporary RAMdisk and copy your database to it, then run queries against it (but that's just for proof of concept - never put production data onto a RAMdisk since any reboot or power loss means that RAMdisk is gone - it's just good for testing before you spend any money). A RAMdisk will be faster than an SSD, but you should still get an idea of the possibilities.

Finally, when you test, don't test from within your application (just in case code is slowing anything down). Open up phpMyAdmin or HeidiSQL or the shell or whatever you use and run the query directly to make sure you're getting accurate performance readings.
LVL 34

Expert Comment

ID: 39744338
And Ray is right - test data will help us help you, too, if you can share that!
LVL 34

Expert Comment

ID: 39746528
I just created a test database with a varchar(13) field called "serial":  
CREATE TABLE `books` (
	`serial` VARCHAR(13) NOT NULL DEFAULT '0',
	INDEX `serial` (`serial`)

Open in new window

Then I inserted about 2 million rows into it (about 200 megs of data). I pulled a random serial number from around the middle of the database (about 500,000 rows in), and performed a search on it:
SELECT * FROM books WHERE serial = '4397388423728'

Open in new window

As expected, the record came back lightning-fast - 0.062 seconds.

The only way I could get MySQL to run a really long query was to use LIKE with wildcards, like this:

SELECT * FROM books WHERE serial LIKE '%4397388423728%'

Open in new window

That resulted in the same record but in about 9 seconds. That said, I'm hoping that is not what you're doing, since you indicated that your search field is 13 chars long and the search string is 13 chars long, so you should only ever be doing a complete equals operator.
LVL 108

Expert Comment

by:Ray Paseur
ID: 39775284
No CREATE TABLE statement and no test data...?  To make your question and answer dialogs more effective you might want to learn about the SSCCE.  It's the process and tool professionals use when they want to get help.  You'll be amazed how much faster you get good results when you do that!

Best of luck with your project, ~Ray

Featured Post

Complete VMware vSphere® ESX(i) & Hyper-V Backup

Capture your entire system, including the host, with patented disk imaging integrated with VMware VADP / Microsoft VSS and RCT. RTOs is as low as 15 seconds with Acronis Active Restore™. You can enjoy unlimited P2V/V2V migrations from any source (even from a different hypervisor)

Question has a verified solution.

If you are experiencing a similar issue, please ask a related question

Suggested Solutions

More Fun with XML and MySQL – Parsing Delimited String with a Single SQL Statement Are you ready for another of my SQL tidbits?  Hopefully so, as in this adventure, I will be covering a topic that comes up a lot which is parsing a comma (or other…
All XML, All the Time; More Fun MySQL Tidbits – Dynamically Generate XML via Stored Procedure in MySQL Extensible Markup Language (XML) and database systems, a marriage we are seeing more and more of.  So the topics of parsing and manipulating XM…
Here's a very brief overview of the methods PRTG Network Monitor ( offers for monitoring bandwidth, to help you decide which methods you´d like to investigate in more detail.  The methods are covered in more detail in o…
With Secure Portal Encryption, the recipient is sent a link to their email address directing them to the email laundry delivery page. From there, the recipient will be required to enter a user name and password to enter the page. Once the recipient …

930 members asked questions and received personalized solutions in the past 7 days.

Join the community of 500,000 technology professionals and ask your questions.

Join & Ask a Question

Need Help in Real-Time?

Connect with top rated Experts

16 Experts available now in Live!

Get 1:1 Help Now