Solved

Fast search of a 13-char indexed text field in a 1 million+ record MySQL InnoDB table

Posted on 2013-12-27
611 Views
Last Modified: 2014-01-12
I have a table with 1 Million+ records that I need to search with a 13 char string.

Right now, the search is prohibitively slow (3.5 seconds to retrieve a record) with about 850k records.

It's an InnoDB database, and the search field is always 13 chars long with a standard (non-primary) index and no special settings.

This is on a Linux Server.
Question by:genepilot2
9 Comments
 
LVL 34

Expert Comment

by:gr8gonzo
So you have one specific field that is always 13 characters long, and you're always searching this field with a 13 character search string?

With that kind of setup, a million records shouldn't take more than a couple seconds to search, so I wonder:

1. Are you pulling back more data than you need? If your SELECT results in lots of records containing hundreds of kilobytes of data, part of the "slowness" might be data transfer overhead.

2. Is the field definition longer than 13 characters and if so, did you specify the index length? For that matter, have you tried different index lengths?

3. Is the database sitting on a slow or busy hard drive? I/O tends to be the bottleneck most of the time, and if the hard drive controller is busy serving data to something else, it's going to take time to even get to the search request.

4. If the data in the search column changes infrequently (e.g. daily), you might consider a separate full-text search engine like Sphinx or Solr, which can get you very specific results back in a tenth of a second (the caveat is that their indexes are big and indexing is slow, so it tends to be a daily batch job). This seems like overkill for your situation.

5. If nothing else helps (also see my article below), a potentially radical move would be to split your 13-character string up into two columns (e.g. a 4-character and a 9-character column), with a multi-column index. Depending on what your data looks like, this might help MySQL look up and organize data faster.
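For point 2, MySQL lets you index just a prefix of a longer VARCHAR column. A minimal sketch, with an illustrative table and column (not from the question itself):

```sql
-- If the column were, say, VARCHAR(50) but lookups always use 13 chars,
-- indexing only the first 13 characters keeps the index small and fast:
ALTER TABLE books ADD INDEX idx_code_13 (code(13));
```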


Check out my article on this topic:
http://www.experts-exchange.com/Database/MySQL/A_1250-3-Ways-to-Speed-Up-MySQL.html
 
LVL 108

Expert Comment

by:Ray Paseur
Please post the CREATE TABLE statement and the SQL statement, thanks.
 

Author Comment

by:genepilot2
To answer gr8gonzo's questions:
1. The data is about 1.4 GB in size in the database, but there is 4GB of RAM available for the Database.
2. The field is a varchar(13) and the data is always 13 digits long, all numeric.
3. This index is used heavily so the index will always be in memory.
4. This field doesn't change for existing records, but new records are added daily (about 10-20k per day as visitors ask for new books that aren't already in the system) during the busy period.
5. Splitting the index is one possibility. Is a numerical index much faster than a string index? I know that a 13-digit number is too big for 32-bit PHP to handle as an integer, even if MySQL's largest int type can hold it.

Thanks,
Brett
 
LVL 108

Expert Comment

by:Ray Paseur
We can create test cases if you post the CREATE TABLE statement and the SQL statement that is slow.  With a little test data we might be able to create some simulations, right? It will be easier to help if we have that small advantage.
 
LVL 34

Accepted Solution

by:gr8gonzo (earned 500 total points)
1. Follow my article, especially the part on tuning the configuration. RAM is good for caching data and query caching, but if you're getting hit with different IDs to search each time, then your query cache isn't going to help you as much.

2. A numerical index definitely has the POTENTIAL to be faster, but it's not guaranteed. Depending on your data, though, the split-column approach may well assist MySQL. If you can, just dupe the database, create two columns, and split the data into the two columns so you have records like:

OLD: "1234567890123" (1 varchar column)
NEW: 1234 and 567890123 (2 int columns)

A simple query with SUBSTRING() and CAST() should populate the columns quickly so you can run your tests, but make sure you do it on a copy of the database, not on your original.

Then create a multi-column index on the two columns and run your SELECT (and make sure you EXPLAIN it so you can make sure your new index is used).
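A hedged SQL sketch of that workflow (the `books`/`serial` names follow the test schema later in this thread, and `bookID1`/`bookID2` match the PHP snippet in this comment; adjust to your actual schema):

```sql
-- Run this on a COPY of the database, not the original.
ALTER TABLE books
    ADD COLUMN bookID1 INT UNSIGNED NOT NULL DEFAULT 0,
    ADD COLUMN bookID2 INT UNSIGNED NOT NULL DEFAULT 0;

-- Split the 13-char serial into a 4-digit and a 9-digit part
UPDATE books
   SET bookID1 = CAST(SUBSTRING(serial, 1, 4) AS UNSIGNED),
       bookID2 = CAST(SUBSTRING(serial, 5) AS UNSIGNED);

-- Multi-column index over both halves
ALTER TABLE books ADD INDEX idx_split (bookID1, bookID2);

-- Confirm the new index is actually chosen
EXPLAIN SELECT * FROM books WHERE bookID1 = 1234 AND bookID2 = 567890123;
```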

You don't need PHP to understand the full 13 characters as a big number - just substr() it to get the different parts:

$old = "1234567890123";
// First 4 digits -> bookID1, remaining 9 digits -> bookID2
$part1 = (int) substr($old, 0, 4);
$part2 = (int) substr($old, 4);
// Casting to int also keeps the interpolated values SQL-safe
$query = "SELECT fields FROM table WHERE bookID1 = {$part1} AND bookID2 = {$part2}";
...
 
LVL 34

Expert Comment

by:gr8gonzo
Also, don't forget to run maintenance on the table.

And if you can afford to do so, consider a dedicated SSD drive for the database. A couple hundred bucks nowadays can buy you a couple of moderately-sized SSDs that you can put in RAID 1 (mirroring for safety) and get a huge boost in DB performance.

If you're curious about how much difference I/O can make, you can always create a temporary RAMdisk, copy your database to it, and run queries against it. That's just for proof of concept - never put production data onto a RAMdisk, since any reboot or power loss means the RAMdisk is gone - it's only good for testing before you spend any money. A RAMdisk will be faster than an SSD, but it should still give you an idea of the possibilities.
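For the maintenance step, a sketch against the `books` test table used later in this thread (statements are standard MySQL; table name is yours to substitute):

```sql
-- Refresh index statistics so the optimizer has current cardinality data
ANALYZE TABLE books;

-- Rebuild the table and its indexes (on InnoDB this recreates the table)
OPTIMIZE TABLE books;
```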

Finally, when you test, don't test from within your application (just in case code is slowing anything down). Open up phpMyAdmin or HeidiSQL or the shell or whatever you use and run the query directly to make sure you're getting accurate performance readings.
 
LVL 34

Expert Comment

by:gr8gonzo
And Ray is right - test data will help us help you, too, if you can share that!
 
LVL 34

Expert Comment

by:gr8gonzo
I just created a test database with a varchar(13) field called "serial":  
CREATE TABLE `books` (
	`serial` VARCHAR(13) NOT NULL DEFAULT '0',
	INDEX `serial` (`serial`)
)
COLLATE='utf8_general_ci'
ENGINE=InnoDB;

Then I inserted about 2 million rows into it (about 200 megs of data). I pulled a random serial number from around the middle of the database (about 500,000 rows in), and performed a search on it:
SELECT * FROM books WHERE serial = '4397388423728'


As expected, the record came back lightning-fast - 0.062 seconds.

The only way I could get MySQL to run a really long query was to use LIKE with wildcards, like this:

SELECT * FROM books WHERE serial LIKE '%4397388423728%'


That resulted in the same record, but in about 9 seconds. That said, I'm hoping that's not what you're doing: since your search field is 13 chars long and the search string is 13 chars long, you should only ever need an exact equality comparison.
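The difference shows up in EXPLAIN; a sketch against the test table above (exact output varies by MySQL version):

```sql
-- Equality lookup: EXPLAIN should report type=ref using the `serial` index
EXPLAIN SELECT * FROM books WHERE serial = '4397388423728';

-- A leading-wildcard LIKE can't use the B-tree index, so expect a full scan
EXPLAIN SELECT * FROM books WHERE serial LIKE '%4397388423728%';
```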
 
LVL 108

Expert Comment

by:Ray Paseur
No CREATE TABLE statement and no test data...?  To make your question and answer dialogs more effective you might want to learn about the SSCCE.  It's the process and tool professionals use when they want to get help.  You'll be amazed how much faster you get good results when you do that!

Best of luck with your project, ~Ray