[Last Call] Learn about multicloud storage options and how to improve your company's cloud strategy. Register Now


Fast search of 13 char indexed text field for 1 Million+ record table in Mysql InnoDB table

Posted on 2013-12-27
Medium Priority
Last Modified: 2014-01-12
I have a table with 1 Million+ records that I need to search with a 13 char string.

Right now, the search is prohibitively long (3.5 seconds to retrieve a record) with about 850k records.

It's an InnoDB Database and the search field is always 13 chars long and a standard (not Primary) index with no special settings.

This is on a Linux Server.
Question by:genepilot2
Welcome to Experts Exchange

Add your voice to the tech community where 5M+ people just like you are talking about what matters.

  • Help others & share knowledge
  • Earn cash & points
  • Learn & ask questions
  • 5
  • 3
LVL 35

Expert Comment

ID: 39743643
So you have one specific field that is always 13 characters long, and you're always searching this field with a 13 character search string?

With that kind of setup, a million records shouldn't take more than a couple seconds to search, so I wonder:

1. Are you pulling back more data than you need? If your SELECT results in lots of records containing hundreds of kilobytes of data, part of the "slowness" might be data transfer overhead.

2. Is the field definition longer than 13 characters and if so, did you specify the index length? For that matter, have you tried different index lengths?

3. Is the database sitting on a slow or busy hard drive? I/O tends to be the bottleneck most of the time, and if the hard drive controller is busy serving data to something else, it's going to take time to even get to the search request.

4. If the data in the search column changes infrequently (e.g. daily), you might consider a separate, full-text search engine like Sphinx or Solr, which can get you very specific results back in a 1/10th of a second (the caveat is that they have big indexes and the indexing is a slow operation that tends to be something you do daily). This seems like overkill for your situation.

5. If nothing else helps (also see my article below), a potentially radical move would be to split your 13-character string up into two columns (e.g. 3 character and 9 character columns), with a multi-column index. Depending on what your data looks like, this might help MySQL look up and organize data faster.

Check out my article on this topic:
LVL 111

Expert Comment

by:Ray Paseur
ID: 39743685
Please post the CREATE TABLE statement and the SQL statement, thanks.

Author Comment

ID: 39744232
To answer gr8gonzo's questions:
1. The data is about 1.4 GB in size in the database, but there is 4GB of RAM available for the Database.
2. The field is a varchar(13) and the data is always 13 digits long, all numeric.
3. This index is used heavily so the index will always be in memory.
4. This field doesn't change for existing records, but new records are added daily (about 10-20k per day as visitors ask for new books that aren't already in the system) during the busy period.
5. Splitting the index is one possibility, is a numerical index much faster than a string index?  I know that a 13 digit number is too big for a 32-bit server to use with PHP even if mysql can handle that size with it's largest int.

Prepare for your VMware VCP6-DCV exam.

Josh Coen and Jason Langer have prepared the latest edition of VCP study guide. Both authors have been working in the IT field for more than a decade, and both hold VMware certifications. This 163-page guide covers all 10 of the exam blueprint sections.

LVL 111

Expert Comment

by:Ray Paseur
ID: 39744246
We can create test cases if you post the CREATE TABLE statement and the SQL statement that is slow.  With a little test data we might be able to create some simulations, right? It will be easier to help if we have that small advantage.
LVL 35

Accepted Solution

gr8gonzo earned 2000 total points
ID: 39744332
1. Follow my article, especially the part on tuning the configuration. RAM is good for caching data and query caching, but if you're getting hit with different IDs to search each time, then your query cache isn't going to help you as much.

2. A numerical index definitely has the POTENTIAL to be faster, but it's not necessarily true. Depending on your data, the split-column approach might definitely assist MySQL, though. If you can, just dupe the database, create two columns and split the data into the two columns so you have records like:

OLD: "1234567890123" (1 varchar column)
NEW: 1234 and 567890123 (2 int columns)

A simple query with SUBSTRING() and CAST() should populate the columns quickly so you can run your tests, but make sure you do it on a copy of the database, not on your original.

Then create a multi-column index on the two columns and run your SELECT (and make sure you EXPLAIN it so you can make sure your new index is used).

You don't need PHP to understand the full 13 characters as a big number - just substr() it to get the different parts:

$old = "1234567890123";
$part1 = substr($old,0,4);
$part2 = substr($old,4);
$query = "SELECT fields FROM table WHERE bookID1 = {$part1} AND bookID2 = {$part2}";
LVL 35

Expert Comment

ID: 39744336
Also, don't forget to run maintenance on the table. And if you can afford to do so, consider a dedicated SSD drive for the database. A couple hundred bucks nowadays can buy you a couple of moderately-sized SSDs that you can put in RAID 1 (mirroring for safety) and get you a huge boost in DB performance. If you're curious about how much difference I/O can make, you can always create a temporary RAMdisk and copy your database to it, then run queries against it (but that's just for proof of concept - never put production data onto a RAMdisk since any reboot or power loss means that RAMdisk is gone - it's just good for testing before you spend any money). A RAMdisk will be faster than an SSD, but you should still get an idea of the possibilities.

Finally, when you test, don't test from within your application (just in case code is slowing anything down). Open up phpMyAdmin or HeidiSQL or the shell or whatever you use and run the query directly to make sure you're getting accurate performance readings.
LVL 35

Expert Comment

ID: 39744338
And Ray is right - test data will help us help you, too, if you can share that!
LVL 35

Expert Comment

ID: 39746528
I just created a test database with a varchar(13) field called "serial":  
CREATE TABLE `books` (
	`serial` VARCHAR(13) NOT NULL DEFAULT '0',
	INDEX `serial` (`serial`)

Open in new window

Then I inserted about 2 million rows into it (about 200 megs of data). I pulled a random serial number from around the middle of the database (about 500,000 rows in), and performed a search on it:
SELECT * FROM books WHERE serial = '4397388423728'

Open in new window

As expected, the record came back lightning-fast - 0.062 seconds.

The only way I could get MySQL to run a really long query was to use LIKE with wildcards, like this:

SELECT * FROM books WHERE serial LIKE '%4397388423728%'

Open in new window

That resulted in the same record but in about 9 seconds. That said, I'm hoping that is not what you're doing, since you indicated that your search field is 13 chars long and the search string is 13 chars long, so you should only ever be doing a complete equals operator.
LVL 111

Expert Comment

by:Ray Paseur
ID: 39775284
No CREATE TABLE statement and no test data...?  To make your question and answer dialogs more effective you might want to learn about the SSCCE.  It's the process and tool professionals use when they want to get help.  You'll be amazed how much faster you get good results when you do that!

Best of luck with your project, ~Ray

Featured Post

Free learning courses: Active Directory Deep Dive

Get a firm grasp on your IT environment when you learn Active Directory best practices with Veeam! Watch all, or choose any amount, of this three-part webinar series to improve your skills. From the basics to virtualization and backup, we got you covered.

Question has a verified solution.

If you are experiencing a similar issue, please ask a related question

As a database administrator, you may need to audit your table(s) to determine whether the data types are optimal for your real-world data needs.  This Article is intended to be a resource for such a task. Preface The other day, I was involved …
In this blog post, we’ll look at how using thread_statistics can cause high memory usage.
In this video, Percona Solution Engineer Dimitri Vanoverbeke discusses why you want to use at least three nodes in a database cluster. To discuss how Percona Consulting can help with your design and architecture needs for your database and infras…
In this video, Percona Solution Engineer Rick Golba discuss how (and why) you implement high availability in a database environment. To discuss how Percona Consulting can help with your design and architecture needs for your database and infrastr…
Suggested Courses

650 members asked questions and received personalized solutions in the past 7 days.

Join the community of 500,000 technology professionals and ask your questions.

Join & Ask a Question