Link to home
Start Free TrialLog in
Avatar of Gokuld
Gokuld

asked on

Help to improve performance of a Join query

Hi everyone,

I have Two tables namely Clients and Contacts. Clients contain details of companies and contacts hold the contact person information of those companies. The contact persons may be many and so there is possibility of many records and in some cases no contacts also. So, there are 4000 records in Clients and around 3500 records in Contacts table. To get both the information, I run the query as

select cl.*, co.FirstName, co.LastName from clients cl Left Join contacts co on (cl.CompanyId = co.CompanyId) group by cl.companyId order by cl.companyname

My problem is :

1. MySql takes 19 seconds to compile the query... Is there a possibility to improve the performance of this query?

2. The Join returns all the multiple records for a particular company. But I would like to have just one record - a distinct record for each company.... Could anyone help me? It is urgent....
Avatar of gnudiff
gnudiff


1) i am surprised you are not getting some kind of "GROUP BY must be used with aggregate clause" error.

you are grouping by companyId, but not selecting, depending on what criteria a single row of that company should be retrieved (which is what GROUP BY is for).

2) try "SELECT DISTINCT cl.*, co.FirstName, co.LastName from clients cl Left Join contacts co on (cl.CompanyId = co.CompanyId) group by cl.companyId order by cl.companyname ".  I am suspicious of your table structure :). See if it helps.
Avatar of Gokuld

ASKER

#
# Table structure for table 'clients'
#

DROP TABLE IF EXISTS `clients`;
CREATE TABLE `clients` (
  `CompanyID` varchar(20) default NULL,
  `UserID` varchar(20) default NULL,
  `CompanyName` varchar(50) default NULL,
  `Address` varchar(255) default NULL,
  `Phone` varchar(25) default NULL,
  `Fax` varchar(25) default NULL,
  `EMail` varchar(25) default NULL,
  `MarketSector` varchar(50) default NULL,
  `Size` varchar(50) default NULL,
  `County` varchar(50) default NULL,
  `Postcode` varchar(50) default NULL,
  `Country` varchar(50) default NULL,
  `Dept` varchar(25) default NULL,
  `Branch` varchar(25) default NULL,
  `AccountManager` varchar(50) default NULL,
  `Category` varchar(50) default NULL,
  `CustType` varchar(255) default NULL,
  `WebSite` varchar(50) default NULL,
  `ModifiedDate` datetime default NULL,
  `Status` int(11) default NULL
) TYPE=MyISAM;



#
# Table structure for table 'contacts'
#

DROP TABLE IF EXISTS `contacts`;
CREATE TABLE `contacts` (
  `ContactID` varchar(20) default NULL,
  `UserID` varchar(50) default NULL,
  `CompanyID` varchar(20) default NULL,
  `Title` varchar(20) default NULL,
  `FirstName` varchar(50) default NULL,
  `LastName` varchar(50) default NULL,
  `Department` varchar(50) default NULL,
  `Phone` varchar(50) default NULL,
  `Extension` varchar(50) default NULL,
  `Fax` varchar(50) default NULL,
  `Mobile` varchar(50) default NULL,
  `JobTitle` varchar(50) default NULL,
  `EMail` varchar(50) default NULL,
  `ModifiedDate` datetime default NULL
) TYPE=MyISAM;

This is the table structure.... As you have said, MySQL didn't give out any errors..... I tried out your query and it took 18.5 seconds... almost the same as my query did...
ASKER CERTIFIED SOLUTION
Avatar of gnudiff
gnudiff

Link to home
membership
This solution is only available to members.
To access this solution, you must be a member of Experts Exchange.
Start Free Trial
Actually the HAVING clause could be rewritten better, because HAVING is happening AFTER grouping.

SELECT DISTINCT cl.companyName, co.FirstName, co.LastName, max(co.modifiedDate)
FROM clients cl LEFT JOIN contacts co ON (cl.CompanyId = co.CompanyId)
GROUP BY cl.companyId, co.FirstName,co.LastName
ORDER BY cl.companyname

Tested this with a couple temporary tables - works for me.
Avatar of Gokuld

ASKER

The query works fine... the time taken is 19.94 seconds.... I actually want to improve the peroformance of the query drastically....

The fact that it works does not mean that it wouldn't work faster, if it was written differently.

What about creating indexes, as I wrote in the first part of the suggestion?
Avatar of Gokuld

ASKER

gnudiff...

I think I didn't hurt you in any sense... I have added indexes... but the performance improved only a second earlier and not much.... I have discovered that joins are slow with MySQL
> I think I didn't hurt you in any sense...
No, you didn't. :)
The rewriting of query was due to your question about getting multiple records for the company, when you want only one; so, saying that the original query "works" was kind of distractive. ;)

>I have added indexes... but the performance improved only a second earlier and not much.... I have discovered that joins are slow with MySQL

Well, it could be the case, yes, however 4000 & 3000 row tables should not be very much. 15-20 seconds is a lot of time for a 4000 row query.

But then again - it might depend on the computer, which runs MySQL - how fast is it, how much RAM it has, and whetherit is running anything else resource intensive, besides MySQL.

Also, using symbolic field as a primary key used in JOINs, can be _quite_ quite slowing down, compared to using numbers, especially for large tables:

CREATE TABLE `clients` (
  `CompanyID` varchar(20) default NUL

CREATE TABLE `contacts` (
  `ContactID` varchar(20) default NULL,
...
  `CompanyID` varchar(20) default NULL

all of those should optimally read "integer" instead of "varchar(20)",
and clients.CompanyID should be NOT NULL, instead of "default NULL"


You can change that on running table in MySQL, I think, but it is risky -
1) it won't work correctly, if CompanyIDs can actually contain not only numbers,
2) it could theoretically break things which try to INSERT / SELECT that data = need to look deeper into the actual DB code.

Still, if it could be done, I believe it could speed the things up quite a lot.

Because, the way it is done currently, a CompanyID, let's say 123456 is actually stored as a _string_
"123456", which contains at least 6 bytes (x2 if string is Unicode).

The same value, but stored as a NUMBER 1234 would be 4 bytes. More for larger numbers of course.

When doing JOINs, it is obviously usually faster to compare numbers than variable length strings, hence I suspect, much of your slowdown on such small table sizes (4000 rows is peanuts, for databases generally).