Solved

Help to improve performance of a Join query

Posted on 2004-10-10
8
499 Views
Last Modified: 2008-03-17
Hi everyone,

I have Two tables namely Clients and Contacts. Clients contain details of companies and contacts hold the contact person information of those companies. The contact persons may be many and so there is possibility of many records and in some cases no contacts also. So, there are 4000 records in Clients and around 3500 records in Contacts table. To get both the information, I run the query as

select cl.*, co.FirstName, co.LastName from clients cl Left Join contacts co on (cl.CompanyId = co.CompanyId) group by cl.companyId order by cl.companyname

My problem is :

1. MySql takes 19 seconds to compile the query... Is there a possibility to improve the performance of this query?

2. The Join returns all the multiple records for a particular company. But I would like to have just one record - a distinct record for each company.... Could anyone help me? It is urgent....
0
Comment
Question by:Gokuld
[X]
Welcome to Experts Exchange

Add your voice to the tech community where 5M+ people just like you are talking about what matters.

  • Help others & share knowledge
  • Earn cash & points
  • Learn & ask questions
  • 5
  • 3
8 Comments
 
LVL 3

Expert Comment

by:gnudiff
ID: 12284204

1) i am surprised you are not getting some kind of "GROUP BY must be used with aggregate clause" error.

you are grouping by companyId, but not selecting, depending on what criteria a single row of that company should be retrieved (which is what GROUP BY is for).

2) try "SELECT DISTINCT cl.*, co.FirstName, co.LastName from clients cl Left Join contacts co on (cl.CompanyId = co.CompanyId) group by cl.companyId order by cl.companyname ".  I am suspicious of your table structure :). See if it helps.
0
 

Author Comment

by:Gokuld
ID: 12285219
#
# Table structure for table 'clients'
#

DROP TABLE IF EXISTS `clients`;
CREATE TABLE `clients` (
  `CompanyID` varchar(20) default NULL,
  `UserID` varchar(20) default NULL,
  `CompanyName` varchar(50) default NULL,
  `Address` varchar(255) default NULL,
  `Phone` varchar(25) default NULL,
  `Fax` varchar(25) default NULL,
  `EMail` varchar(25) default NULL,
  `MarketSector` varchar(50) default NULL,
  `Size` varchar(50) default NULL,
  `County` varchar(50) default NULL,
  `Postcode` varchar(50) default NULL,
  `Country` varchar(50) default NULL,
  `Dept` varchar(25) default NULL,
  `Branch` varchar(25) default NULL,
  `AccountManager` varchar(50) default NULL,
  `Category` varchar(50) default NULL,
  `CustType` varchar(255) default NULL,
  `WebSite` varchar(50) default NULL,
  `ModifiedDate` datetime default NULL,
  `Status` int(11) default NULL
) TYPE=MyISAM;



#
# Table structure for table 'contacts'
#

DROP TABLE IF EXISTS `contacts`;
CREATE TABLE `contacts` (
  `ContactID` varchar(20) default NULL,
  `UserID` varchar(50) default NULL,
  `CompanyID` varchar(20) default NULL,
  `Title` varchar(20) default NULL,
  `FirstName` varchar(50) default NULL,
  `LastName` varchar(50) default NULL,
  `Department` varchar(50) default NULL,
  `Phone` varchar(50) default NULL,
  `Extension` varchar(50) default NULL,
  `Fax` varchar(50) default NULL,
  `Mobile` varchar(50) default NULL,
  `JobTitle` varchar(50) default NULL,
  `EMail` varchar(50) default NULL,
  `ModifiedDate` datetime default NULL
) TYPE=MyISAM;

This is the table structure.... As you have said, MySQL didn't give out any errors..... I tried out your query and it took 18.5 seconds... almost the same as my query did...
0
 
LVL 3

Accepted Solution

by:
gnudiff earned 100 total points
ID: 12286421
Okay, this is much better amount of data about the question. :)

> 1. MySql takes 19 seconds to compile the query... Is there a possibility to improve the performance of this query?

Do you have indexes defined? From the dump, it seems not.
For a normal SQL table form:
- I would  assume CompanyID should be unique for the clients table,
- clients.CompanyID should really not have possible NULL values. Dunno about MySQL, but I know that if you can specify field as NOT NULL, it speeds up queries.

You could try these:

CREATE UNIQUE INDEX clientIDs ON clients(companyID);
CREATE INDEX companycontacts ON contacts(companyID);

This should speed up the above query a lot.

Also note, that if you really use numbers for companyID, it is a waste to store the field as varchar, and this can slow things down considerably too. Same goes for other ID fields: ContactID, UserID.


> 2. The Join returns all the multiple records for a particular company. But I would like to have just one record - a distinct record for each company.... Could anyone help me? It is urgent....

Sure, you can, you just have to specify - _which_ contact should be selected for the particular company, somehow.
Let's say you want that contact for the company, who has been modified most recently:

SELECT DISTINCT cl.companyName, co.FirstName, co.LastName
FROM clients cl LEFT JOIN contacts co ON (cl.CompanyId = co.CompanyId)
GROUP BY cl.companyId HAVING co.modifiedDate = max(co.modifiedDate) OR co.modifiedDate IS NULL
ORDER BY cl.companyname

Something like that...
0
The Ultimate Checklist to Optimize Your Website

Websites are getting bigger and complicated by the day. Video, images, custom fonts are all great for showcasing your product/service. But the price to pay in terms of reduced page load times and ultimately, decreased sales, can lead to some difficult decisions about what to cut.

 
LVL 3

Expert Comment

by:gnudiff
ID: 12286448
Actually the HAVING clause could be rewritten better, because HAVING is happening AFTER grouping.

SELECT DISTINCT cl.companyName, co.FirstName, co.LastName, max(co.modifiedDate)
FROM clients cl LEFT JOIN contacts co ON (cl.CompanyId = co.CompanyId)
GROUP BY cl.companyId, co.FirstName,co.LastName
ORDER BY cl.companyname

Tested this with a couple temporary tables - works for me.
0
 

Author Comment

by:Gokuld
ID: 12307051
The query works fine... the time taken is 19.94 seconds.... I actually want to improve the peroformance of the query drastically....
0
 
LVL 3

Expert Comment

by:gnudiff
ID: 12307887

The fact that it works does not mean that it wouldn't work faster, if it was written differently.

What about creating indexes, as I wrote in the first part of the suggestion?
0
 

Author Comment

by:Gokuld
ID: 12308308
gnudiff...

I think I didn't hurt you in any sense... I have added indexes... but the performance improved only a second earlier and not much.... I have discovered that joins are slow with MySQL
0
 
LVL 3

Expert Comment

by:gnudiff
ID: 12308853
> I think I didn't hurt you in any sense...
No, you didn't. :)
The rewriting of query was due to your question about getting multiple records for the company, when you want only one; so, saying that the original query "works" was kind of distractive. ;)

>I have added indexes... but the performance improved only a second earlier and not much.... I have discovered that joins are slow with MySQL

Well, it could be the case, yes, however 4000 & 3000 row tables should not be very much. 15-20 seconds is a lot of time for a 4000 row query.

But then again - it might depend on the computer, which runs MySQL - how fast is it, how much RAM it has, and whetherit is running anything else resource intensive, besides MySQL.

Also, using symbolic field as a primary key used in JOINs, can be _quite_ quite slowing down, compared to using numbers, especially for large tables:

CREATE TABLE `clients` (
  `CompanyID` varchar(20) default NUL

CREATE TABLE `contacts` (
  `ContactID` varchar(20) default NULL,
...
  `CompanyID` varchar(20) default NULL

all of those should optimally read "integer" instead of "varchar(20)",
and clients.CompanyID should be NOT NULL, instead of "default NULL"


You can change that on running table in MySQL, I think, but it is risky -
1) it won't work correctly, if CompanyIDs can actually contain not only numbers,
2) it could theoretically break things which try to INSERT / SELECT that data = need to look deeper into the actual DB code.

Still, if it could be done, I believe it could speed the things up quite a lot.

Because, the way it is done currently, a CompanyID, let's say 123456 is actually stored as a _string_
"123456", which contains at least 6 bytes (x2 if string is Unicode).

The same value, but stored as a NUMBER 1234 would be 4 bytes. More for larger numbers of course.

When doing JOINs, it is obviously usually faster to compare numbers than variable length strings, hence I suspect, much of your slowdown on such small table sizes (4000 rows is peanuts, for databases generally).
0

Featured Post

Turn Insights into Action

Communication across every corner of your business is essential to increase the velocity of your application delivery and support pipeline. Automate, standardize, and contextualize your communication processes with xMatters.

Question has a verified solution.

If you are experiencing a similar issue, please ask a related question

I have been using r1soft Continuous Data Protection (http://www.r1soft.com/linux-cdp/) for many years now with the mySQL Addon and wanted to share a trick I have used several times. For those of us that don't have the luxury of using all transact…
Introduction This article is intended for those who are new to PHP error handling (https://www.experts-exchange.com/articles/11769/And-by-the-way-I-am-New-to-PHP.html).  It addresses one of the most common problems that plague beginning PHP develop…
If you're a developer or IT admin, you’re probably tasked with managing multiple websites, servers, applications, and levels of security on a daily basis. While this can be extremely time consuming, it can also be frustrating when systems aren't wor…
Michael from AdRem Software explains how to view the most utilized and worst performing nodes in your network, by accessing the Top Charts view in NetCrunch network monitor (https://www.adremsoft.com/). Top Charts is a view in which you can set seve…

717 members asked questions and received personalized solutions in the past 7 days.

Join the community of 500,000 technology professionals and ask your questions.

Join & Ask a Question