How do I optimize for full table scan on a 1.2Million row table?

MySQL - Percona Edition.

Full table scan is taking over 1 second and crashing our backend servers, the table has about 1.2 million records, if the IP is located then it returns very quickly and has no issues, eventually we hit max connections on the server if it can't find it as it does a full table scan. This server is beefy, 128GB of ram + 800GB SSD and dual hexacore processors

SELECT *
  FROM geoip_proxy_ipv4_09_2018
  WHERE INET_ATON('192.168.0.1') BETWEEN ip_from AND ip_to
  LIMIT 1

Open in new window


The table structure is this:
CREATE TABLE IF NOT EXISTS `geoip_proxy_ipv4_09_2018` (
  `ip_from` decimal(10,0) NOT NULL,
  `ip_to` decimal(10,0) NOT NULL,
  `proxy_type` varchar(3) NOT NULL,
  `country_code` varchar(2) NOT NULL,
  `country_name` varchar(128) NOT NULL,
  `region_name` varchar(128) NOT NULL,
  `city_name` varchar(128) NOT NULL,
  UNIQUE KEY `idx_ip_from_ip_to` (`ip_from`,`ip_to`),
  KEY `ip_to` (`ip_to`),
  KEY `ip_from` (`ip_from`)
) ENGINE=MyISAM DEFAULT CHARSET=latin1;

Open in new window


These are the indexes:
Indexes
Keyname	Type	Unique	Packed	Column	Cardinality	Collation	Null	Comment
idx_ip_from_ip_to	BTREE	Yes	No	ip_from	1254839	A	No	
ip_to	1254839	A	No
ip_to	BTREE	No	No	ip_to	1254839	A	No	
ip_from	BTREE	No	No	ip_from	1254839	A	No	

Open in new window


If we do 800 requests per second (lookups) our database server and our backend servers max out their connections waiting for response from backend server... thoughts?
Mario BernheimCTOAsked:
Who is Participating?
I wear a lot of hats...

"The solutions and answers provided on Experts Exchange have been extremely helpful to me over the last few years. I wear a lot of hats - Developer, Database Administrator, Help Desk, etc., so I know a lot of things but not a lot about one thing. Experts Exchange gives me answers from people who do know a lot about one thing, in a easy to use platform." -Todd S.

nociSoftware EngineerCommented:
Hm....
Few questions...
Why the index on ip_from..., the index idx_ip_from_ip_to will be used for select with ip_from as well... (try the explain  statement)...
(Now you have one index that will never be used but will need to be updated on changes.).

The INET_ATON delivers an INT UNSIGNED ,......  decimal != INT.... So why used decimal type.
INET_ATON will result in a 4 byte unsigned integer.. (max 9 digits).
DECIMAL is a numeric string (with up to 65 digits...) this is not the same as a binary integer, or even mapped to an integer type.

From the manual:
To store values generated by INET_ATON(), use an INT UNSIGNED column rather than INT, which is signed. If you use a signed column, values corresponding to IP addresses for which the first octet is greater than 127 cannot be stored correctly. See Section 11.2.6, “Out-of-Range and Overflow Handling”.


Please try to use the same types for ip_from & ip_to as INET_ATON gives that at least prevents type conversions, and possibly will get the index to actually get used.
Mario BernheimCTOAuthor Commented:
Thanks, Ive gone ahead and done those changes... the change on the type and kept one index on the from and to columns... unfortunately, when the situation persists... while doing a full table scan it will hang all backend servers and they will eventualy  max out and start swapping... the database server also runs out of resources if I leave the benchmark on too long...
Mario BernheimCTOAuthor Commented:
Hello, this seemed to fix the issue...

SELECT proxy_type FROM (SELECT proxy_type, `ip_from` FROM `geoip_proxy_ipv4_09_2018` WHERE ip_to >= INET_ATON('173.245.203.9') LIMIT 1) AS mytable WHERE ip_from <= INET_ATON('173.245.203.9')

Open in new window


The queries will take about  (Query took 0.0003 sec) on results where it cant find the IP... question is... WHY?
Active Protection takes the fight to cryptojacking

While there were several headline-grabbing ransomware attacks during in 2017, another big threat started appearing at the same time that didn’t get the same coverage – illicit cryptomining.

Steve BinkCommented:
MySQL's optimization docs don't really address range queries in which the constant value is the target, but it does provide some hints on how this would be optimized.  Note that this is a little educated guess-work on my part, based on notes in https://dev.mysql.com/doc/refman/5.7/en/range-optimization.html.  We might have more insight based on the EXPLAIN results.

MySQL only uses a single index to generate the initial scan of a table.  So initially, MySQL is going to choose between the `ip_to` and `ip_from` indexes.  Also, MySQL collapses range scan requirements into the "greatest common denominator" (my term), which is almost always more general than the actual WHERE clause.  In this example, in which C is the searched constant, A is ip_from, and B is ip_to:
[...] WHERE C BETWEEN A AND B
/* Refactored = early step of optimization */
WHERE A <= C AND C <= B

Open in new window

Here it becomes a little more clear what's going to happen.  MySQL can only choose one of the indexes available, so it will scan the index of the table to "guess" which predicate will yield less rows for the first part of row selection.  I.e, are there less rows in "A<=C" or "C<=B"?  Whichever it is, MySQL will pre-select all the matching rows for secondary filtering against the entire WHERE clause.  In cases where your target is in the middle of the domain range, or in which MySQL guesses incorrectly, it could be pre-selecting half (or slightly more) of the table.  The secondary filtering is a row-by-row comparison of all predicates.  If your pre-selected rows constitutes a large enough bulk, MySQL may resort to file-based filtering/sorting at this point.  I'm guessing that is happening, since you described your server's thrashing and swapping.

With the second query, the benefits are immediately clear in that MySQL can now use both indexes, though it uses them consecutively.  The first query will result in a simple predicate of "C<=B", which will use the `ip_to` index.  The results of this move into the outer query, with another simple predicate of "B<=A", which will use the `ip_from` index.  Both of these are strictly index scans, which happens very quickly.

Again, this is a little guess-work, but it makes sense in the context of how MySQL internally optimizes queries.  I would imagine that your EXPLAIN results would show a similar scenario to what I've described.
nociSoftware EngineerCommented:
You already did find the remaining issue, indexes not being used. The bracketed select is an optimisation for the between that only performs better.

Experts Exchange Solution brought to you by

Your issues matter to us.

Facing a tech roadblock? Get the help and guidance you need from experienced professionals who care. Ask your question anytime, anywhere, with no hassle.

Start your 7-day free trial
Mario BernheimCTOAuthor Commented:
thanks everyone
It's more than this solution.Get answers and train to solve all your tech problems - anytime, anywhere.Try it for free Edge Out The Competitionfor your dream job with proven skills and certifications.Get started today Stand Outas the employee with proven skills.Start learning today for free Move Your Career Forwardwith certification training in the latest technologies.Start your trial today
Databases

From novice to tech pro — start learning today.