?
Solved

Help to improve performance of a Join query

Posted on 2004-10-10
8
Medium Priority
?
500 Views
Last Modified: 2008-03-17
Hi everyone,

I have Two tables namely Clients and Contacts. Clients contain details of companies and contacts hold the contact person information of those companies. The contact persons may be many and so there is possibility of many records and in some cases no contacts also. So, there are 4000 records in Clients and around 3500 records in Contacts table. To get both the information, I run the query as

select cl.*, co.FirstName, co.LastName from clients cl Left Join contacts co on (cl.CompanyId = co.CompanyId) group by cl.companyId order by cl.companyname

My problem is :

1. MySql takes 19 seconds to compile the query... Is there a possibility to improve the performance of this query?

2. The Join returns all the multiple records for a particular company. But I would like to have just one record - a distinct record for each company.... Could anyone help me? It is urgent....
0
Comment
Question by:Gokuld
[X]
Welcome to Experts Exchange

Add your voice to the tech community where 5M+ people just like you are talking about what matters.

  • Help others & share knowledge
  • Earn cash & points
  • Learn & ask questions
  • 5
  • 3
8 Comments
 
LVL 3

Expert Comment

by:gnudiff
ID: 12284204

1) i am surprised you are not getting some kind of "GROUP BY must be used with aggregate clause" error.

you are grouping by companyId, but not selecting, depending on what criteria a single row of that company should be retrieved (which is what GROUP BY is for).

2) try "SELECT DISTINCT cl.*, co.FirstName, co.LastName from clients cl Left Join contacts co on (cl.CompanyId = co.CompanyId) group by cl.companyId order by cl.companyname ".  I am suspicious of your table structure :). See if it helps.
0
 

Author Comment

by:Gokuld
ID: 12285219
#
# Table structure for table 'clients'
#

DROP TABLE IF EXISTS `clients`;
CREATE TABLE `clients` (
  `CompanyID` varchar(20) default NULL,
  `UserID` varchar(20) default NULL,
  `CompanyName` varchar(50) default NULL,
  `Address` varchar(255) default NULL,
  `Phone` varchar(25) default NULL,
  `Fax` varchar(25) default NULL,
  `EMail` varchar(25) default NULL,
  `MarketSector` varchar(50) default NULL,
  `Size` varchar(50) default NULL,
  `County` varchar(50) default NULL,
  `Postcode` varchar(50) default NULL,
  `Country` varchar(50) default NULL,
  `Dept` varchar(25) default NULL,
  `Branch` varchar(25) default NULL,
  `AccountManager` varchar(50) default NULL,
  `Category` varchar(50) default NULL,
  `CustType` varchar(255) default NULL,
  `WebSite` varchar(50) default NULL,
  `ModifiedDate` datetime default NULL,
  `Status` int(11) default NULL
) TYPE=MyISAM;



#
# Table structure for table 'contacts'
#

DROP TABLE IF EXISTS `contacts`;
CREATE TABLE `contacts` (
  `ContactID` varchar(20) default NULL,
  `UserID` varchar(50) default NULL,
  `CompanyID` varchar(20) default NULL,
  `Title` varchar(20) default NULL,
  `FirstName` varchar(50) default NULL,
  `LastName` varchar(50) default NULL,
  `Department` varchar(50) default NULL,
  `Phone` varchar(50) default NULL,
  `Extension` varchar(50) default NULL,
  `Fax` varchar(50) default NULL,
  `Mobile` varchar(50) default NULL,
  `JobTitle` varchar(50) default NULL,
  `EMail` varchar(50) default NULL,
  `ModifiedDate` datetime default NULL
) TYPE=MyISAM;

This is the table structure.... As you have said, MySQL didn't give out any errors..... I tried out your query and it took 18.5 seconds... almost the same as my query did...
0
 
LVL 3

Accepted Solution

by:
gnudiff earned 300 total points
ID: 12286421
Okay, this is much better amount of data about the question. :)

> 1. MySql takes 19 seconds to compile the query... Is there a possibility to improve the performance of this query?

Do you have indexes defined? From the dump, it seems not.
For a normal SQL table form:
- I would  assume CompanyID should be unique for the clients table,
- clients.CompanyID should really not have possible NULL values. Dunno about MySQL, but I know that if you can specify field as NOT NULL, it speeds up queries.

You could try these:

CREATE UNIQUE INDEX clientIDs ON clients(companyID);
CREATE INDEX companycontacts ON contacts(companyID);

This should speed up the above query a lot.

Also note, that if you really use numbers for companyID, it is a waste to store the field as varchar, and this can slow things down considerably too. Same goes for other ID fields: ContactID, UserID.


> 2. The Join returns all the multiple records for a particular company. But I would like to have just one record - a distinct record for each company.... Could anyone help me? It is urgent....

Sure, you can, you just have to specify - _which_ contact should be selected for the particular company, somehow.
Let's say you want that contact for the company, who has been modified most recently:

SELECT DISTINCT cl.companyName, co.FirstName, co.LastName
FROM clients cl LEFT JOIN contacts co ON (cl.CompanyId = co.CompanyId)
GROUP BY cl.companyId HAVING co.modifiedDate = max(co.modifiedDate) OR co.modifiedDate IS NULL
ORDER BY cl.companyname

Something like that...
0
Optimize your web performance

What's in the eBook?
- Full list of reasons for poor performance
- Ultimate measures to speed things up
- Primary web monitoring types
- KPIs you should be monitoring in order to increase your ROI

 
LVL 3

Expert Comment

by:gnudiff
ID: 12286448
Actually the HAVING clause could be rewritten better, because HAVING is happening AFTER grouping.

SELECT DISTINCT cl.companyName, co.FirstName, co.LastName, max(co.modifiedDate)
FROM clients cl LEFT JOIN contacts co ON (cl.CompanyId = co.CompanyId)
GROUP BY cl.companyId, co.FirstName,co.LastName
ORDER BY cl.companyname

Tested this with a couple temporary tables - works for me.
0
 

Author Comment

by:Gokuld
ID: 12307051
The query works fine... the time taken is 19.94 seconds.... I actually want to improve the peroformance of the query drastically....
0
 
LVL 3

Expert Comment

by:gnudiff
ID: 12307887

The fact that it works does not mean that it wouldn't work faster, if it was written differently.

What about creating indexes, as I wrote in the first part of the suggestion?
0
 

Author Comment

by:Gokuld
ID: 12308308
gnudiff...

I think I didn't hurt you in any sense... I have added indexes... but the performance improved only a second earlier and not much.... I have discovered that joins are slow with MySQL
0
 
LVL 3

Expert Comment

by:gnudiff
ID: 12308853
> I think I didn't hurt you in any sense...
No, you didn't. :)
The rewriting of query was due to your question about getting multiple records for the company, when you want only one; so, saying that the original query "works" was kind of distractive. ;)

>I have added indexes... but the performance improved only a second earlier and not much.... I have discovered that joins are slow with MySQL

Well, it could be the case, yes, however 4000 & 3000 row tables should not be very much. 15-20 seconds is a lot of time for a 4000 row query.

But then again - it might depend on the computer, which runs MySQL - how fast is it, how much RAM it has, and whetherit is running anything else resource intensive, besides MySQL.

Also, using symbolic field as a primary key used in JOINs, can be _quite_ quite slowing down, compared to using numbers, especially for large tables:

CREATE TABLE `clients` (
  `CompanyID` varchar(20) default NUL

CREATE TABLE `contacts` (
  `ContactID` varchar(20) default NULL,
...
  `CompanyID` varchar(20) default NULL

all of those should optimally read "integer" instead of "varchar(20)",
and clients.CompanyID should be NOT NULL, instead of "default NULL"


You can change that on running table in MySQL, I think, but it is risky -
1) it won't work correctly, if CompanyIDs can actually contain not only numbers,
2) it could theoretically break things which try to INSERT / SELECT that data = need to look deeper into the actual DB code.

Still, if it could be done, I believe it could speed the things up quite a lot.

Because, the way it is done currently, a CompanyID, let's say 123456 is actually stored as a _string_
"123456", which contains at least 6 bytes (x2 if string is Unicode).

The same value, but stored as a NUMBER 1234 would be 4 bytes. More for larger numbers of course.

When doing JOINs, it is obviously usually faster to compare numbers than variable length strings, hence I suspect, much of your slowdown on such small table sizes (4000 rows is peanuts, for databases generally).
0

Featured Post

Simplifying Server Workload Migrations

This use case outlines the migration challenges that organizations face and how the Acronis AnyData Engine supports physical-to-physical (P2P), physical-to-virtual (P2V), virtual to physical (V2P), and cross-virtual (V2V) migration scenarios to address these challenges.

Question has a verified solution.

If you are experiencing a similar issue, please ask a related question

Does the idea of dealing with bits scare or confuse you? Does it seem like a waste of time in an age where we all have terabytes of storage? If so, you're missing out on one of the core tools in every professional programmer's toolbox. Learn how to …
This post looks at MongoDB and MySQL, and covers high-level MongoDB strengths, weaknesses, features, and uses from the perspective of an SQL user.
In this video, Percona Solution Engineer Rick Golba discuss how (and why) you implement high availability in a database environment. To discuss how Percona Consulting can help with your design and architecture needs for your database and infrastr…
In this video, Percona Solutions Engineer Barrett Chambers discusses some of the basic syntax differences between MySQL and MongoDB. To learn more check out our webinar on MongoDB administration for MySQL DBA: https://www.percona.com/resources/we…
Suggested Courses

770 members asked questions and received personalized solutions in the past 7 days.

Join the community of 500,000 technology professionals and ask your questions.

Join & Ask a Question