Sorry for the long winded explanation, but, here goes...
I receive data hourly from a third party which is a zip file containing multiple .csv files. I've written a script which extracts the .csv files, creates tables for each file and loads records from the .csv files. The PHP script recognizes when a new column is added and alters the table if necessary. It also recognizes specific columns and creates indexes on those columns. For each campaign being monitored, it will create eleven different tables most of which are benign dictionaries containing fewer than 20 records. Three of the tables are not so benign.
One table contains all the contacts involved in a campaign and specific static information for each record. This table may easily contain more than 100K records.
Another table contains all the events for the campaign. Similar to a running activity log. Some events could generate as many as 15-20 records. Each event has an event code which can be cross-referenced in the Event-dictionary table. Each event also contains a Contact_ID which references a record in the contacts table. Both these columns are indexed. This table easily contains more than 1 million records.
One of the events is an optout event.
I frequently run queries to determine the number of contacts who've performed a specific event. I need to remove from this count those who've opted out as well as those who've been deactivated. The deactivated contact ids are in a separate table which has only one column, Contact_ID.
My query selects columns from the contact table for all Contact_IDs in a sub-query of Contact_IDs who've performed a specific event, minus all the Contact_IDs who've performed the optout event, minus all the Contact_IDs in the disabled table.
FROM contacts c, data d
AND NOT Contact_ID IN (
SELECT Contact_ID FROM data WHERE Event=optout
AND NOT Contact_ID IN (
SELECT Contact_ID FROM disabled
GROUP BY Contact_ID
Since contacts may trigger the event multiple times and since date is one of the columns I'm selecting, I have to use a group by clause. Distinct doesn't work since the date changes for each time they trigger the event.
This query takes so long to complete that PHP drops the connection.
So I decided to create a view of all the Contact_IDs who've opted out (the third not-so-benign table). But after creating these views for six campaigns, mysql was so slow that it took 10 minutes for phpmyadmin to display the list of tables for that database.
If I remove the clause that eliminates the optout records (lines 5-7), the query takes less than a second to execute. With that clause, it takes nearly 60 seconds.
I've removed the views of optout Contact_IDs. Please help me optimize the performance of this database or the individual query.