Solved

Help to improve performance of a Join query

Posted on 2004-10-10
8
494 Views
Last Modified: 2008-03-17
Hi everyone,

I have Two tables namely Clients and Contacts. Clients contain details of companies and contacts hold the contact person information of those companies. The contact persons may be many and so there is possibility of many records and in some cases no contacts also. So, there are 4000 records in Clients and around 3500 records in Contacts table. To get both the information, I run the query as

select cl.*, co.FirstName, co.LastName from clients cl Left Join contacts co on (cl.CompanyId = co.CompanyId) group by cl.companyId order by cl.companyname

My problem is :

1. MySql takes 19 seconds to compile the query... Is there a possibility to improve the performance of this query?

2. The Join returns all the multiple records for a particular company. But I would like to have just one record - a distinct record for each company.... Could anyone help me? It is urgent....
0
Comment
Question by:Gokuld
  • 5
  • 3
8 Comments
 
LVL 3

Expert Comment

by:gnudiff
ID: 12284204

1) i am surprised you are not getting some kind of "GROUP BY must be used with aggregate clause" error.

you are grouping by companyId, but not selecting, depending on what criteria a single row of that company should be retrieved (which is what GROUP BY is for).

2) try "SELECT DISTINCT cl.*, co.FirstName, co.LastName from clients cl Left Join contacts co on (cl.CompanyId = co.CompanyId) group by cl.companyId order by cl.companyname ".  I am suspicious of your table structure :). See if it helps.
0
 

Author Comment

by:Gokuld
ID: 12285219
#
# Table structure for table 'clients'
#

DROP TABLE IF EXISTS `clients`;
CREATE TABLE `clients` (
  `CompanyID` varchar(20) default NULL,
  `UserID` varchar(20) default NULL,
  `CompanyName` varchar(50) default NULL,
  `Address` varchar(255) default NULL,
  `Phone` varchar(25) default NULL,
  `Fax` varchar(25) default NULL,
  `EMail` varchar(25) default NULL,
  `MarketSector` varchar(50) default NULL,
  `Size` varchar(50) default NULL,
  `County` varchar(50) default NULL,
  `Postcode` varchar(50) default NULL,
  `Country` varchar(50) default NULL,
  `Dept` varchar(25) default NULL,
  `Branch` varchar(25) default NULL,
  `AccountManager` varchar(50) default NULL,
  `Category` varchar(50) default NULL,
  `CustType` varchar(255) default NULL,
  `WebSite` varchar(50) default NULL,
  `ModifiedDate` datetime default NULL,
  `Status` int(11) default NULL
) TYPE=MyISAM;



#
# Table structure for table 'contacts'
#

DROP TABLE IF EXISTS `contacts`;
CREATE TABLE `contacts` (
  `ContactID` varchar(20) default NULL,
  `UserID` varchar(50) default NULL,
  `CompanyID` varchar(20) default NULL,
  `Title` varchar(20) default NULL,
  `FirstName` varchar(50) default NULL,
  `LastName` varchar(50) default NULL,
  `Department` varchar(50) default NULL,
  `Phone` varchar(50) default NULL,
  `Extension` varchar(50) default NULL,
  `Fax` varchar(50) default NULL,
  `Mobile` varchar(50) default NULL,
  `JobTitle` varchar(50) default NULL,
  `EMail` varchar(50) default NULL,
  `ModifiedDate` datetime default NULL
) TYPE=MyISAM;

This is the table structure.... As you have said, MySQL didn't give out any errors..... I tried out your query and it took 18.5 seconds... almost the same as my query did...
0
 
LVL 3

Accepted Solution

by:
gnudiff earned 100 total points
ID: 12286421
Okay, this is much better amount of data about the question. :)

> 1. MySql takes 19 seconds to compile the query... Is there a possibility to improve the performance of this query?

Do you have indexes defined? From the dump, it seems not.
For a normal SQL table form:
- I would  assume CompanyID should be unique for the clients table,
- clients.CompanyID should really not have possible NULL values. Dunno about MySQL, but I know that if you can specify field as NOT NULL, it speeds up queries.

You could try these:

CREATE UNIQUE INDEX clientIDs ON clients(companyID);
CREATE INDEX companycontacts ON contacts(companyID);

This should speed up the above query a lot.

Also note, that if you really use numbers for companyID, it is a waste to store the field as varchar, and this can slow things down considerably too. Same goes for other ID fields: ContactID, UserID.


> 2. The Join returns all the multiple records for a particular company. But I would like to have just one record - a distinct record for each company.... Could anyone help me? It is urgent....

Sure, you can, you just have to specify - _which_ contact should be selected for the particular company, somehow.
Let's say you want that contact for the company, who has been modified most recently:

SELECT DISTINCT cl.companyName, co.FirstName, co.LastName
FROM clients cl LEFT JOIN contacts co ON (cl.CompanyId = co.CompanyId)
GROUP BY cl.companyId HAVING co.modifiedDate = max(co.modifiedDate) OR co.modifiedDate IS NULL
ORDER BY cl.companyname

Something like that...
0
 
LVL 3

Expert Comment

by:gnudiff
ID: 12286448
Actually the HAVING clause could be rewritten better, because HAVING is happening AFTER grouping.

SELECT DISTINCT cl.companyName, co.FirstName, co.LastName, max(co.modifiedDate)
FROM clients cl LEFT JOIN contacts co ON (cl.CompanyId = co.CompanyId)
GROUP BY cl.companyId, co.FirstName,co.LastName
ORDER BY cl.companyname

Tested this with a couple temporary tables - works for me.
0
How your wiki can always stay up-to-date

Quip doubles as a “living” wiki and a project management tool that evolves with your organization. As you finish projects in Quip, the work remains, easily accessible to all team members, new and old.
- Increase transparency
- Onboard new hires faster
- Access from mobile/offline

 

Author Comment

by:Gokuld
ID: 12307051
The query works fine... the time taken is 19.94 seconds.... I actually want to improve the peroformance of the query drastically....
0
 
LVL 3

Expert Comment

by:gnudiff
ID: 12307887

The fact that it works does not mean that it wouldn't work faster, if it was written differently.

What about creating indexes, as I wrote in the first part of the suggestion?
0
 

Author Comment

by:Gokuld
ID: 12308308
gnudiff...

I think I didn't hurt you in any sense... I have added indexes... but the performance improved only a second earlier and not much.... I have discovered that joins are slow with MySQL
0
 
LVL 3

Expert Comment

by:gnudiff
ID: 12308853
> I think I didn't hurt you in any sense...
No, you didn't. :)
The rewriting of query was due to your question about getting multiple records for the company, when you want only one; so, saying that the original query "works" was kind of distractive. ;)

>I have added indexes... but the performance improved only a second earlier and not much.... I have discovered that joins are slow with MySQL

Well, it could be the case, yes, however 4000 & 3000 row tables should not be very much. 15-20 seconds is a lot of time for a 4000 row query.

But then again - it might depend on the computer, which runs MySQL - how fast is it, how much RAM it has, and whetherit is running anything else resource intensive, besides MySQL.

Also, using symbolic field as a primary key used in JOINs, can be _quite_ quite slowing down, compared to using numbers, especially for large tables:

CREATE TABLE `clients` (
  `CompanyID` varchar(20) default NUL

CREATE TABLE `contacts` (
  `ContactID` varchar(20) default NULL,
...
  `CompanyID` varchar(20) default NULL

all of those should optimally read "integer" instead of "varchar(20)",
and clients.CompanyID should be NOT NULL, instead of "default NULL"


You can change that on running table in MySQL, I think, but it is risky -
1) it won't work correctly, if CompanyIDs can actually contain not only numbers,
2) it could theoretically break things which try to INSERT / SELECT that data = need to look deeper into the actual DB code.

Still, if it could be done, I believe it could speed the things up quite a lot.

Because, the way it is done currently, a CompanyID, let's say 123456 is actually stored as a _string_
"123456", which contains at least 6 bytes (x2 if string is Unicode).

The same value, but stored as a NUMBER 1234 would be 4 bytes. More for larger numbers of course.

When doing JOINs, it is obviously usually faster to compare numbers than variable length strings, hence I suspect, much of your slowdown on such small table sizes (4000 rows is peanuts, for databases generally).
0

Featured Post

Zoho SalesIQ

Hassle-free live chat software re-imagined for business growth. 2 users, always free.

Join & Write a Comment

I use MySQL for many of my development projects in a Windows environment. To manage my databases (and perform queries) for years I used a tool called MySQL administrator.  This tool has since been replaced by MySQL Workbench. So I decided to m…
I have been using r1soft Continuous Data Protection (http://www.r1soft.com/linux-cdp/) for many years now with the mySQL Addon and wanted to share a trick I have used several times. For those of us that don't have the luxury of using all transact…
In this tutorial you'll learn about bandwidth monitoring with flows and packet sniffing with our network monitoring solution PRTG Network Monitor (https://www.paessler.com/prtg). If you're interested in additional methods for monitoring bandwidt…
When you create an app prototype with Adobe XD, you can insert system screens -- sharing or Control Center, for example -- with just a few clicks. This video shows you how. You can take the full course on Experts Exchange at http://bit.ly/XDcourse.

762 members asked questions and received personalized solutions in the past 7 days.

Join the community of 500,000 technology professionals and ask your questions.

Join & Ask a Question

Need Help in Real-Time?

Connect with top rated Experts

22 Experts available now in Live!

Get 1:1 Help Now