MySQL >  Index question

Posted on 2012-09-14
Last Modified: 2012-09-19
Using MySQL + an ETL tool to parse large amounts of XML.  I end up with a table that holds variables and values of those variables that are found in the XML.  I already have an index on the Variable column, but wondering if there is a way to optimize queries on the VariableValue column.  The problem is that based on the 'Variable' name, the 'VariableValue' value may or may not contain a value that would otherwise be good to index on.  For example:

[Variable]               [VariableValue]
State                      TX
CustomerType       New
CustomerType       Existing
State                      FL
Address                 123 Whatever Street
OrderNumber       13456789

In the table above, I think it would make sense to want to index State (will hold 1 of 50 values) and Customer Type (will hold 1 of 2 values), but Address values, and Order Number values will always hold 1 of infinite values and it wouldn't make much sense to index those.

Is there a good way to tackle these besides building new tables/fields and doing some kind of weird mapping?  I wonder if indexing on the first X bytes of VariableValue would make sense.  I'd love your advice.
Question by:ducky801
    LVL 25

    Assisted Solution

    Curious structure ... how is the data accessed?  Indexes are used to speed the retrieval of data - are you searching on just [VariableValue] or both [VariableValue] and [Variable]?  I would guess both as the value has little meaning without [Variable] value to give it context.  Do you ever lookup based on just [Variable]?

    Can you index on certain rows ... no, it is the entire table or nothing.
    Can you index only a portion (substring) of a column ... yes; but I am not sure that would be of much benefit ... I am assuming [VariableValue] is varchar.

    My best guess would be to index both columns ... I am not sure which should be first in the index however.  I need to understand more about the data access.
    LVL 26

    Accepted Solution

    You either index the column or not, you can't make indexes on N characters unless you create a third column and populate it with the first N characters of the Value column but I don't think is worth the effort.

    If you create index on the second column it should be part of a composite index with the variable column and the second in position. If you will make queries based on the value only then you need 2 indexes:

    [Variable] + [VariableValue]

    This way you're covered regardless of how you make the searches and for sure will help with the retrieval speed. I wouldn't be worried of the values themselves as log as the column is varchar anyway.
    LVL 26

    Assisted Solution

    If you are worried by the size of the index on the Value column, but i don't think you should, you can create a third column and populate it with the SOUNDEX([VariableValue]). Later you will do queries based on the soundex of a value but it will return all the values that "sound" the same. The index will be considerably less in size but not as exact.

    Here about soundex function:
    LVL 5

    Author Closing Comment

    Thanks guys.  You've given me some ideas. I think what I'll end up doing is adding a column called VarValueIndexed and have that value be the same as VariableValue in cases where we're dealing with a limited number of values (State, CustomerType, etc), and null for the rest.

    To answer your question about access, lwadwell:  "Lots and lots of subqueries"

    The challenge is that I'm re-structuring this data, not always knowing what I'll end up using it for, and the variable names could change pretty easily without me knowing.  Thanks again for your help!

    Write Comment

    Please enter a first name

    Please enter a last name

    We will never share this with anyone.

    Featured Post

    Top 6 Sources for Identifying Threat Actor TTPs

    Understanding your enemy is essential. These six sources will help you identify the most popular threat actor tactics, techniques, and procedures (TTPs).

    More Fun with XML and MySQL – Parsing Delimited String with a Single SQL Statement Are you ready for another of my SQL tidbits?  Hopefully so, as in this adventure, I will be covering a topic that comes up a lot which is parsing a comma (or other…
    Both Easy and Powerful How easy is PHP? (  Very easy.  It has been described as "a programming language even my grandmother can use." How powerful is PHP?  http://en.wikiped…
    This video discusses moving either the default database or any database to a new volume.
    This video gives you a great overview about bandwidth monitoring with SNMP and WMI with our network monitoring solution PRTG Network Monitor ( If you're looking for how to monitor bandwidth using netflow or packet s…

    794 members asked questions and received personalized solutions in the past 7 days.

    Join the community of 500,000 technology professionals and ask your questions.

    Join & Ask a Question

    Need Help in Real-Time?

    Connect with top rated Experts

    17 Experts available now in Live!

    Get 1:1 Help Now