Speeding up a very fast complex query

Hello everyone,

I have what I think might be a tough challenge.  In essence I have 3 very large tables:

extraction_file_detail
extract_document_list
email.dirs

email.dirs.notesid = extraction_file_detail.notesid
extraction_file_detail.f_docnumber = extract_document_list.f_docnumber

I have data in email.dirs that I need to map to data in extract_document_list.

Here are a few attempts:

EXPLAIN select * from p2.extract_document_list where f_docnumber in 
    (select f_docnumber
        from p2.extraction_file_detail  e
            inner join email.dirs d on d.notesid = e.notesid
            and d.flag = 3);

Open in new window


EXPLAIN select * from p2.extract_document_list 
where f_docnumber in (select f_docnumber from p2.extraction_file_detail e 
                        where exists (select * from email.dirs d
                        where d.notesid = e.notesid
                        and d.flag = 3));

Open in new window


And a few others of simliar types, all of which result in extremely slow response.

I can tell you that, with exception of flag, all of the fields I am searching on are primary keys.  This is why I thought the searches would be faster.

So my questions are, is there a better way to search? I have tried JOIN, EXISTS, IN, all 3 I never got to actually finish before giving up and stopping the query.  Or, do I need to think about how the database is optimized?

Any assistance would be very much appreciated.

dbridleAsked:
Who is Participating?
 
arnoldCommented:
The following is based on the toad for mysql.
SELECT  `extract_document_list`.F_DOCNUMBER
	, `extract_document_list`.F_DOCCLASSNUMBER
	, `extract_document_list`.export_flag
	, `extract_document_list`.conversion_flag
	, `extract_document_list`.import_flag
	, `extract_document_list`.F_ENTRYDATE
	, `extract_document_list`.encrypted
	, `extract_document_list`.emailFrom
FROM 
	extract_document_list `extract_document_list` CROSS JOIN 
	extraction_file_detail `extraction_file_detail`
	INNER JOIN
	dirs `dirs`
	ON (`extraction_file_detail`.notesID = `dirs`.notesid) 
WHERE (`dirs`.flag = '3')

Open in new window

0
 
arnoldCommented:
Please post the create table <tablename>

you could get http://dev.mysql.com/downloads/workbench/5.2.html or toad for mysql
And use that to formulate the query using the query designer.

0
 
simonpaul64Commented:
Your third table Email.dirs sounds like a table.fieldname reference to me. Not sure that is right - so I will reference it as emailtable and you can check if it is correct.

I would do something like

Select list.*
from extract_document_list list
INNER JOIN extraction_file_detail det on
INNER JOIN email.dirs emailtable on emailtable.notesid = det.notesid
WHERE and emailtable.flag = 3

 
0
Network Scalability - Handle Complex Environments

Monitor your entire network from a single platform. Free 30 Day Trial Now!

 
JoeNuvoCommented:
EXPLAIN select edl.*
from
      email.dirs d
      INNER JOIN p2.extraction_file_detail efd ON d.notesid = efd.notesid
      INNER JOIN p2.extract_document_list edl ON efd.f_docnumber = edl.f_docnumber
WHERE
      d.flag = 3

should have index/primary key for
1. p2.extract_document_list
    - f_docnumber   (maybe primary key?)
2. p2.extraction_file_detail
    - notesid   (maybe primary key?)
    - f_docnumber
3. email.dirs
    - flag

so, the search will narrow down by flag to obtain list of notesid from email.dirs
and then it will uses index of notesid to narrow down f_docnumber of p2.extraction_file_detail
and go get the target data of p2.extract_document_list base on f_docnumber
0
 
simonpaul64Commented:
Just noticed mine has an unwanted and in the where clause

Select list.*
from extract_document_list list
INNER JOIN extraction_file_detail det on
INNER JOIN email.dirs emailtable on emailtable.notesid = det.notesid
WHERE emailtable.flag = 3
0
 
dbridleAuthor Commented:
Before I start pasting tables, that I would need to blank out some of the field names, I was wondering if we could work off of simonpauls sql statement?

Select list.*
from extract_document_list list
INNER JOIN extraction_file_detail det on
INNER JOIN email.dirs emailtable on emailtable.notesid = det.notesid
WHERE emailtable.flag = 3

is giving an error on the second INNER

You have an error in your SQL syntax; check the manual that corresponds to your MySQL server version for the right syntax to use near 'INNER' at line 4
0
 
dbridleAuthor Commented:
I got this working and is now running, but I suspect it is going to be very slow.

@JoeNuvo

All of the fields you listed are primary keys, except flag.
0
 
simonpaul64Commented:
OK the second inner relates to the table that I said looks wrong. I imagine you would get a similar error if you tried to do the following statement you would also have a problem

 select * from email.dirs

email.dirs isn't a tablename it is referencing a field (dirs) in the table email. So - if your table is actually called email, then the follwoing should work

Select list.*
from extract_document_list list
INNER JOIN extraction_file_detail det on
INNER JOIN email emailtable on emailtable.notesid = det.notesid
WHERE emailtable.flag = 3



0
 
dbridleAuthor Commented:
I think you posted at the same time I did :)

email is the schema, dirs is the table name, notesid one of 2 primary keys.

Here are the create statements asked for:

email.dirs

delimiter $$

CREATE TABLE `dirs` (
  `iddirs` int(10) unsigned NOT NULL AUTO_INCREMENT,
  `dirname` varchar(500) DEFAULT NULL,
  `notesid` varchar(50) NOT NULL,
  `flag` int(1) DEFAULT NULL,
  PRIMARY KEY (`iddirs`,`notesid`)
) ENGINE=InnoDB AUTO_INCREMENT=1779387 DEFAULT CHARSET=latin1$$

Open in new window


p2.extraction_file_detail:

delimiter $$

CREATE TABLE `extraction_file_detail` (
  `mime_type` varchar(50) DEFAULT NULL,
  `path` varchar(1024) DEFAULT NULL,
  `notesID` varchar(255) NOT NULL DEFAULT '0',
  `F_DOCNUMBER` int(10) NOT NULL,
  `dxlConvertedPath` varchar(1024) DEFAULT NULL,
  `fileID` int(7) DEFAULT NULL,
  PRIMARY KEY (`F_DOCNUMBER`,`notesID`)
) ENGINE=InnoDB DEFAULT CHARSET=latin1$$

Open in new window


and finally extract_document_list:

delimiter $$

CREATE TABLE `extract_document_list` (
  `F_DOCNUMBER` int(10) unsigned NOT NULL,
  `F_DOCCLASSNUMBER` int(11) DEFAULT NULL,
  `export_flag` int(1) DEFAULT NULL,
  `conversion_flag` int(1) DEFAULT NULL,
  `import_flag` int(1) DEFAULT NULL,
  `F_ENTRYDATE` int(10) DEFAULT NULL,
  `encrypted` int(1) DEFAULT NULL,
  `emailFrom` varchar(255) DEFAULT NULL,
  PRIMARY KEY (`F_DOCNUMBER`)
) ENGINE=InnoDB DEFAULT CHARSET=latin1$$

Open in new window

0
 
simonpaul64Commented:
Regarding the performance, yep you are joining three tables and the only criteria you are using is "emailtable.flag = 3". Depending on how selective that is, that might mean you are retrieving a result set as large as the largest table. Are there any more criteria you can add?
0
 
dbridleAuthor Commented:
Sorry one more thing.  Here is the completed SQL Statement:

Select doc.*
from p2.extract_document_list doc
INNER JOIN p2.extraction_file_detail det on doc.f_docnumber = det.f_docnumber
INNER JOIN email.dirs emailtable on emailtable.notesid = det.notesid
WHERE emailtable.flag = 3;

Open in new window


It ran and took 505 seconds to run.  I'm hoping to reduce that some, and I think thats my question in all of this :)
0
 
dbridleAuthor Commented:
@ Simon, not really.  Thankfully I know that flag 3 is a small slice of the pie so to speak.  About 5% of all records.  Is there a way that I can optomize that particular search?
0
 
JoeNuvoCommented:
IMO,

1. you should create index ( flag ) on table dirs: this will speed up row filtering
2. you should create index ( notesID ) on table extraction_file_detail
    since primary key order is F_DOCNUMBER, notesID,  utilize for this query is not much effective/no effect
    if it's reverse (notesID, F_DOCNUMBER), it will be ok on this query.
    but it may have negative effect on other query of yr system. So, new index should be better

only table extract_document_list is ok

give it a try if you could.
0
 
dbridleAuthor Commented:
Arnold this query completed in 3.59 seconds which is excellent.  

I have a quick question about these indexes.  When I alter the table in mysql workbench, there is one primarykey with multiple indexes.  For the notesid, I have created a seperate index for it.  I assume that multiple indexes in a primary key is not so fast.  What i have ended up with is as follows:

PRIMARY (Primary Key) - notesid, f_docnumber
SECONDARY (Index) - notesid

Is this right? Or do i need to somehow drop the notesid out of the primary key?
0
 
wolfgang_93Commented:
There is a possible way to do even better -- subsecond response most of the time --
but only if queries are done more frequently than tables being updated:

Simply turn on the query cache. I do this on some of our servers by simply putting 2 lines
like this in the my.cnf configuration file and rebooting the MySQL server:

query_cache_type = 1
query_cache_size = 100M

The way a query cache works in MySQL is that it will maintain a cache of query results.
The next time the same query is issued, if the tables involved have not changed, it
will fetch the result from cache rather than re-execute the query -- which is done in
subsecond time. Otherwise it will execute the query per method described above and
update the query cache.

Think of the saving in overall query time if even 20% of the queries are fetched from
cache versus the minimal overhead that MySQL imposes by updating the query
cache.
0
 
dbridleAuthor Commented:
Hopefully this thread will be as helpful to others as it was for me, doing this alone would have been a daunting task!
0
 
arnoldCommented:
As a primary key, it should have a index for the column. This index often enforces the no duplicate (unique)
The second index notesid, f_docnumber depending on how you define it can enforce that there will not be data errors i.e. two entries with notesid with the same f_docnumber.  I.e. a last ditch data validation/integrity mechanism.

http://www.quest.com/toad-for-mysql/
Each tool has its advantages.  It also has an ER tool that you might want to use in the future i.e. use foreign keys/constraints such that the database will enforce data integrity.
0
Question has a verified solution.

Are you are experiencing a similar issue? Get a personalized answer when you ask a related question.

Have a better answer? Share it in a comment.

All Courses

From novice to tech pro — start learning today.