Solved

Help to improve performance of a Join query

Posted on 2004-10-10
495 Views
Last Modified: 2008-03-17
Hi everyone,

I have two tables, Clients and Contacts. Clients holds the details of companies and Contacts holds the contact person information for those companies. A company may have many contact persons, so there can be many Contacts records per company, and in some cases none at all. There are about 4,000 records in Clients and around 3,500 in Contacts. To get both sets of information, I run this query:

select cl.*, co.FirstName, co.LastName from clients cl Left Join contacts co on (cl.CompanyId = co.CompanyId) group by cl.companyId order by cl.companyname

My problems are:

1. MySQL takes 19 seconds to run the query. Is there a possibility to improve the performance of this query?

2. The join returns all the multiple contact records for a particular company, but I would like to have just one record - a distinct record for each company. Could anyone help me? It is urgent.
Question by:Gokuld
8 Comments
 

Expert Comment

by:gnudiff
ID: 12284204

1) I am surprised you are not getting some kind of "GROUP BY must be used with aggregate clause" error.

You are grouping by companyId, but not specifying by what criteria a single row for that company should be selected (which is what GROUP BY is for).

2) Try "SELECT DISTINCT cl.*, co.FirstName, co.LastName FROM clients cl LEFT JOIN contacts co ON (cl.CompanyId = co.CompanyId) GROUP BY cl.companyId ORDER BY cl.companyname" and see if it helps. I am suspicious of your table structure :).
 

Author Comment

by:Gokuld
ID: 12285219
#
# Table structure for table 'clients'
#

DROP TABLE IF EXISTS `clients`;
CREATE TABLE `clients` (
  `CompanyID` varchar(20) default NULL,
  `UserID` varchar(20) default NULL,
  `CompanyName` varchar(50) default NULL,
  `Address` varchar(255) default NULL,
  `Phone` varchar(25) default NULL,
  `Fax` varchar(25) default NULL,
  `EMail` varchar(25) default NULL,
  `MarketSector` varchar(50) default NULL,
  `Size` varchar(50) default NULL,
  `County` varchar(50) default NULL,
  `Postcode` varchar(50) default NULL,
  `Country` varchar(50) default NULL,
  `Dept` varchar(25) default NULL,
  `Branch` varchar(25) default NULL,
  `AccountManager` varchar(50) default NULL,
  `Category` varchar(50) default NULL,
  `CustType` varchar(255) default NULL,
  `WebSite` varchar(50) default NULL,
  `ModifiedDate` datetime default NULL,
  `Status` int(11) default NULL
) TYPE=MyISAM;



#
# Table structure for table 'contacts'
#

DROP TABLE IF EXISTS `contacts`;
CREATE TABLE `contacts` (
  `ContactID` varchar(20) default NULL,
  `UserID` varchar(50) default NULL,
  `CompanyID` varchar(20) default NULL,
  `Title` varchar(20) default NULL,
  `FirstName` varchar(50) default NULL,
  `LastName` varchar(50) default NULL,
  `Department` varchar(50) default NULL,
  `Phone` varchar(50) default NULL,
  `Extension` varchar(50) default NULL,
  `Fax` varchar(50) default NULL,
  `Mobile` varchar(50) default NULL,
  `JobTitle` varchar(50) default NULL,
  `EMail` varchar(50) default NULL,
  `ModifiedDate` datetime default NULL
) TYPE=MyISAM;

This is the table structure. As you said, MySQL didn't give any errors. I tried your query and it took 18.5 seconds - almost the same as my original query.
 

Accepted Solution

by: gnudiff (earned 100 total points)
ID: 12286421
Okay, that is a much better amount of data to work with. :)

> 1. MySQL takes 19 seconds to run the query. Is there a possibility to improve the performance of this query?

Do you have any indexes defined? From the dump, it seems not.
For a normal SQL table design:
- I would assume CompanyID should be unique in the clients table;
- clients.CompanyID should really not allow NULL values. I don't know about MySQL specifically, but in general declaring a field NOT NULL speeds up queries.

You could try these:

CREATE UNIQUE INDEX clientIDs ON clients(companyID);
CREATE INDEX companycontacts ON contacts(companyID);

This should speed up the above query a lot.
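
To check whether the indexes are actually being used, you can prefix the query with EXPLAIN (exact output columns depend on your MySQL version) - the "key" column should show clientIDs / companycontacts instead of NULL, and the "rows" estimates should drop sharply:

# Quick sanity check that the optimizer picks up the new indexes.
EXPLAIN
SELECT cl.*, co.FirstName, co.LastName
FROM clients cl
LEFT JOIN contacts co ON (cl.CompanyID = co.CompanyID)
ORDER BY cl.CompanyName;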

Also note that if you really use numbers for companyID, it is a waste to store the field as varchar, and this can slow things down considerably too. The same goes for the other ID fields: ContactID and UserID.


> 2. The join returns all the multiple contact records for a particular company, but I would like to have just one record - a distinct record for each company. Could anyone help me? It is urgent.

Sure, you can - you just have to specify somehow _which_ contact should be selected for each particular company.
Let's say you want, for each company, the contact whose record has been modified most recently:

SELECT DISTINCT cl.companyName, co.FirstName, co.LastName
FROM clients cl LEFT JOIN contacts co ON (cl.CompanyId = co.CompanyId)
GROUP BY cl.companyId HAVING co.modifiedDate = max(co.modifiedDate) OR co.modifiedDate IS NULL
ORDER BY cl.companyname

Something like that...
0
 
LVL 3

Expert Comment

by:gnudiff
ID: 12286448
Actually, the HAVING clause could be written better, because HAVING is applied AFTER the grouping.

SELECT DISTINCT cl.companyName, co.FirstName, co.LastName, max(co.modifiedDate)
FROM clients cl LEFT JOIN contacts co ON (cl.CompanyId = co.CompanyId)
GROUP BY cl.companyId, co.FirstName,co.LastName
ORDER BY cl.companyname

Tested this with a couple temporary tables - works for me.
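
If you need exactly one contact row per company - say the most recently modified one - and your MySQL supports subqueries in the FROM clause (4.1+, I believe), another option is to join against a derived table of per-company maximums. Untested against your data, and "latest" is just my alias:

SELECT cl.CompanyName, co.FirstName, co.LastName
FROM clients cl
# derived table: newest ModifiedDate per company
LEFT JOIN (
    SELECT CompanyID, MAX(ModifiedDate) AS LastModified
    FROM contacts
    GROUP BY CompanyID
) latest ON latest.CompanyID = cl.CompanyID
LEFT JOIN contacts co ON co.CompanyID = cl.CompanyID
                     AND co.ModifiedDate = latest.LastModified
ORDER BY cl.CompanyName;

# Note: if two contacts of the same company share the exact same ModifiedDate,
# you will still get two rows - add a tie-breaker (e.g. on ContactID) if that matters.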
 

Author Comment

by:Gokuld
ID: 12307051
The query works fine, but the time taken is 19.94 seconds. I actually want to improve the performance of the query drastically.
 

Expert Comment

by:gnudiff
ID: 12307887

The fact that it works does not mean that it wouldn't work faster if it were written differently.

What about creating the indexes, as I wrote in the first part of my suggestion?
 

Author Comment

by:Gokuld
ID: 12308308
gnudiff...

I hope I didn't offend you in any way. I have added the indexes, but the performance improved by only about a second, not much more. I have found that joins are just slow with MySQL.
 

Expert Comment

by:gnudiff
ID: 12308853
> I hope I didn't offend you in any way.
No, you didn't. :)
The rewriting of the query was in response to your question about getting multiple records for a company when you want only one; so saying that the original query "works" was a little confusing. ;)

> I have added the indexes, but the performance improved by only about a second... I have found that joins are just slow with MySQL

Well, that could be the case, yes; however, tables of 4000 and 3500 rows are not very much data. 15-20 seconds is a lot of time for a query over 4000 rows.

But then again, it might depend on the machine running MySQL - how fast it is, how much RAM it has, and whether it is running anything else resource-intensive besides MySQL.

Also, using a string field as the key for JOINs can slow things down _quite_ a lot compared to using numbers, especially for large tables:

CREATE TABLE `clients` (
  `CompanyID` varchar(20) default NULL,
...

CREATE TABLE `contacts` (
  `ContactID` varchar(20) default NULL,
...
  `CompanyID` varchar(20) default NULL,

All of those should optimally read "integer" instead of "varchar(20)", and clients.CompanyID should be NOT NULL instead of "default NULL".


You can change that on a live table in MySQL, I think, but it is risky:
1) it won't work correctly if CompanyIDs can actually contain anything other than numbers;
2) it could theoretically break code that INSERTs / SELECTs that data - you would need to look deeper into the actual application code.

Still, if it can be done, I believe it could speed things up quite a lot.

Because, the way it is done currently, a CompanyID of, say, 123456 is actually stored as the _string_ "123456", which takes at least 6 bytes (x2 if the string is Unicode).

The same value stored as a NUMBER would be just 4 bytes in a plain INT.

When doing JOINs it is usually faster to compare fixed-size numbers than variable-length strings, so I suspect this accounts for much of your slowdown at such small table sizes (4000 rows is peanuts for databases, generally).
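
As a sketch, the conversion I have in mind would look something like the statements below - back the tables up first, and note that any non-numeric or NULL ID values would silently become 0:

# Assumes every CompanyID / ContactID really is numeric.
ALTER TABLE clients  MODIFY CompanyID INT NOT NULL;
ALTER TABLE contacts MODIFY CompanyID INT NOT NULL;
ALTER TABLE contacts MODIFY ContactID INT NOT NULL;
# Afterwards, check that the indexes from my earlier comment still exist on the converted columns.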
