Learn how to a build a cloud-first strategyRegister Now

  • Status: Solved
  • Priority: Medium
  • Security: Public
  • Views: 370
  • Last Modified:

MySQL > Index question

Using MySQL + an ETL tool to parse large amounts of XML.  I end up with a table that holds variables and values of those variables that are found in the XML.  I already have an index on the Variable column, but wondering if there is a way to optimize queries on the VariableValue column.  The problem is that based on the 'Variable' name, the 'VariableValue' value may or may not contain a value that would otherwise be good to index on.  For example:

[Variable]               [VariableValue]
State                      TX
CustomerType       New
CustomerType       Existing
State                      FL
Address                 123 Whatever Street
OrderNumber       13456789

In the table above, I think it would make sense to want to index State (will hold 1 of 50 values) and Customer Type (will hold 1 of 2 values), but Address values, and Order Number values will always hold 1 of infinite values and it wouldn't make much sense to index those.

Is there a good way to tackle these besides building new tables/fields and doing some kind of weird mapping?  I wonder if indexing on the first X bytes of VariableValue would make sense.  I'd love your advice.
  • 2
3 Solutions
Curious structure ... how is the data accessed?  Indexes are used to speed the retrieval of data - are you searching on just [VariableValue] or both [VariableValue] and [Variable]?  I would guess both as the value has little meaning without [Variable] value to give it context.  Do you ever lookup based on just [Variable]?

Can you index on certain rows ... no, it is the entire table or nothing.
Can you index only a portion (substring) of a column ... yes; but I am not sure that would be of much benefit ... I am assuming [VariableValue] is varchar.

My best guess would be to index both columns ... I am not sure which should be first in the index however.  I need to understand more about the data access.
You either index the column or not, you can't make indexes on N characters unless you create a third column and populate it with the first N characters of the Value column but I don't think is worth the effort.

If you create index on the second column it should be part of a composite index with the variable column and the second in position. If you will make queries based on the value only then you need 2 indexes:

[Variable] + [VariableValue]

This way you're covered regardless of how you make the searches and for sure will help with the retrieval speed. I wouldn't be worried of the values themselves as log as the column is varchar anyway.
If you are worried by the size of the index on the Value column, but i don't think you should, you can create a third column and populate it with the SOUNDEX([VariableValue]). Later you will do queries based on the soundex of a value but it will return all the values that "sound" the same. The index will be considerably less in size but not as exact.

Here about soundex function:

ducky801Author Commented:
Thanks guys.  You've given me some ideas. I think what I'll end up doing is adding a column called VarValueIndexed and have that value be the same as VariableValue in cases where we're dealing with a limited number of values (State, CustomerType, etc), and null for the rest.

To answer your question about access, lwadwell:  "Lots and lots of subqueries"

The challenge is that I'm re-structuring this data, not always knowing what I'll end up using it for, and the variable names could change pretty easily without me knowing.  Thanks again for your help!

Featured Post

Hire Technology Freelancers with Gigs

Work with freelancers specializing in everything from database administration to programming, who have proven themselves as experts in their field. Hire the best, collaborate easily, pay securely, and get projects done right.

  • 2
Tackle projects and never again get stuck behind a technical roadblock.
Join Now