Solved

Optimize MySQL Database

Posted on 2014-04-10
5
346 Views
Last Modified: 2014-04-14
Sorry for the long winded explanation, but, here goes...

I receive data hourly from a third party which is a zip file containing multiple .csv files. I've written a script which extracts the .csv files, creates tables for each file and loads records from the .csv files. The PHP script recognizes when a new column is added and alters the table if necessary. It also recognizes specific columns and creates indexes on those columns. For each campaign being monitored, it will create eleven different tables most of which are benign dictionaries containing fewer than 20 records. Three of the tables are not so benign.

One table contains all the contacts involved in a campaign and specific static information for each record. This table may easily contain more than 100K records.

Another table contains all the events for the campaign. Similar to a running activity log. Some events could generate as many as 15-20 records. Each event has an event code which can be cross-referenced in the Event-dictionary table. Each event also contains a Contact_ID which references a record in the contacts table. Both these columns are indexed. This table easily contains more than 1 million records.

One of the events is an optout event.

I frequently run queries to determine the number of contacts who've performed a specific event. I need to remove from this count those who've opted out as well as those who've been deactivated. The deactivated contact ids are in a separate table which has only one column, Contact_ID.

My query selects columns from the contact table for all Contact_IDs in a sub-query of Contact_IDs who've performed a specific event, minus all the Contact_IDs who've performed the optout event, minus all the Contact_IDs in the disabled table.
SELECT ...cols... 
FROM contacts c, data d 
WHERE c.Contact_ID=d.Contact_ID 
AND d.Event=eventcode 
AND NOT Contact_ID IN (
	SELECT Contact_ID FROM data WHERE Event=optout
) 
AND NOT Contact_ID IN (
	SELECT Contact_ID FROM disabled
)
GROUP BY Contact_ID

Open in new window

Since contacts may trigger the event multiple times and since date is one of the columns I'm selecting, I have to use a group by clause. Distinct doesn't work since the date changes for each time they trigger the event.

This query takes so long to complete that PHP drops the connection.

So I decided to create a view of all the Contact_IDs who've opted out (the third not-so-benign table). But after creating these views for six campaigns, mysql was so slow that it took 10 minutes for phpmyadmin to display the list of tables for that database.

If I remove the clause that eliminates the optout records (lines 5-7), the query takes less than a second to execute. With that clause, it takes nearly 60 seconds.

I've removed the views of optout Contact_IDs. Please help me optimize the performance of this database or the individual query.
0
Comment
Question by:Kim Walker
  • 3
  • 2
5 Comments
 
LVL 34

Expert Comment

by:Dan Craciun
ID: 39993304
1. Do you have an index on data.Event?
2. If the majority of the Event values are optout, then it's faster to select those that are not.
3. Try using a JOIN. Maybe something like this (untested):

SELECT ...cols...
    FROM contacts c
    LEFT JOIN data d
    ON c.Contact_ID=d.Contact_ID
    RIGHT JOIN (SELECT Contact_ID FROM data WHERE Event !=optout) q
    ON c.Contact_ID = q.Contact_ID
    WHERE Contact_ID NOT IN (SELECT Contact_ID FROM disabled)
    AND d.Event=eventcode
    GROUP BY Contact_ID
0
 
LVL 21

Author Comment

by:Kim Walker
ID: 39993962
1. Yes, data.Event is indexed
2. Quite the opposite, optouts are probably only about 1% of the Event values
3. The problem with a join is that each contact could have hundreds of records in the data table and many different events. I'm looking for unique contacts who have one particular Event value in one record if they don't also have an optout Event value in a different record.
0
 
LVL 34

Accepted Solution

by:
Dan Craciun earned 500 total points
ID: 39995745
SELECT ...cols...
FROM contacts c, data d
WHERE c.Contact_ID=d.Contact_ID
is the same as
SELECT ...cols...
    FROM contacts c
    LEFT JOIN data d
    ON c.Contact_ID=d.Contact_ID
Well, technically there are differences (cartesian product vs join), but the result is the same, the time to execute is the same and the memory usage is the same. At least from my tests. And I find joins easier to read.

NOT IN is about the slowest query I can think of. You're better off just doing a join and then filtering on NULL values.
So instead of
SELECT columns FROM table WHERE column NOT IN (SELECT column FROM another_table)
it's faster to do
SELECT columns FROM table t1
LEFT JOIN another_table t2 ON t1.column = t2.column
WHERE t2.column IS NULL
0
 
LVL 21

Author Closing Comment

by:Kim Walker
ID: 39998935
Dan Craciun you are exactly right. I just had to figure out how to join two queries from the same table. This reduced my query time from 25+ seconds to <1 sec. Thanks.
0
 
LVL 34

Expert Comment

by:Dan Craciun
ID: 39998999
Glad I could help!
0

Featured Post

How to improve team productivity

Quip adds documents, spreadsheets, and tasklists to your Slack experience
- Elevate ideas to Quip docs
- Share Quip docs in Slack
- Get notified of changes to your docs
- Available on iOS/Android/Desktop/Web
- Online/Offline

Join & Write a Comment

I imagine that there are some, like me, who require a way of getting currency exchange rates for implementation in web project from time to time, so I thought I would share a solution that I have developed for this purpose. It turns out that Yaho…
Nothing in an HTTP request can be trusted, including HTTP headers and form data.  A form token is a tool that can be used to guard against request forgeries (CSRF).  This article shows an improved approach to form tokens, making it more difficult to…
The viewer will learn how to count occurrences of each item in an array.
The viewer will learn how to create a basic form using some HTML5 and PHP for later processing. Set up your basic HTML file. Open your form tag and set the method and action attributes.: (CODE) Set up your first few inputs one for the name and …

708 members asked questions and received personalized solutions in the past 7 days.

Join the community of 500,000 technology professionals and ask your questions.

Join & Ask a Question

Need Help in Real-Time?

Connect with top rated Experts

17 Experts available now in Live!

Get 1:1 Help Now