Solved

MySQL: Left Join on Two Fields Problem

Posted on 2014-02-13
14
670 Views
Last Modified: 2014-02-14
Hi

Below is a snippet of code that provides intermittent results.

The first two Left Joins Work Fine.

The last one gives intermittent results. (ie: 50% of the time it returns the correct result the other 50% it returns NULL when there is definitely data in the related table)

It's worth noting that there is a UNIQUE KEY constraint on
 and [codes_region.region_code]

Any ideas what the problem is?

[code]
FROM log
LEFT OUTER JOIN ip 
ON log.ip = ip.ip

LEFT OUTER JOIN codes_country 
ON ip.country_code = codes_country.code

LEFT OUTER JOIN codes_region 
ON ip.country_code = codes_region.country_code 
AND ip.region_code = codes_region.region_code

Open in new window

0
Comment
Question by:AdrianSmithUK
  • 8
  • 5
14 Comments
 
LVL 108

Expert Comment

by:Ray Paseur
ID: 39855785
I don't really have an answer, but I'd like to ask for a little more information so we can complete the SSCCE.  Please show us the table structures via the CREATE TABLE statements, some test data so we can reproduce your results and the complete query that fails.  Armed with that we may be able to help you get it working right.  Thanks, ~Ray
0
 

Author Comment

by:AdrianSmithUK
ID: 39855935
Hi Ray

I've slimmed the files down as much as possible.

Here are the create table statements

CREATE TABLE `log` (
  `id` int(11) NOT NULL AUTO_INCREMENT,
  `ip` varchar(40) DEFAULT NULL,
  PRIMARY KEY (`id`)
);

CREATE TABLE `ip` (
  `id` int(11) NOT NULL AUTO_INCREMENT,
  `ip` varchar(128) NOT NULL DEFAULT '0',
  `country_code` varchar(128) DEFAULT '0',
  `region_code` varchar(128) DEFAULT '0',
  PRIMARY KEY (`id`),
  UNIQUE KEY `ip` (`ip`)
);

CREATE TABLE `codes_country` (
  `id` int(11) NOT NULL AUTO_INCREMENT,
  `code` varchar(2) DEFAULT NULL,
  `name` varchar(128) DEFAULT NULL,
  PRIMARY KEY (`id`),
  UNIQUE KEY `code_UNIQUE` (`code`)
);

CREATE TABLE `codes_region` (
  `id` int(11) NOT NULL AUTO_INCREMENT,
  `country_code` varchar(2) DEFAULT NULL,
  `region_code` varchar(2) DEFAULT NULL,
  `region_name` varchar(128) DEFAULT NULL,
  PRIMARY KEY (`id`),
  UNIQUE KEY `country_code_unique` (`country_code`,`region_code`)
);

Open in new window


Here is the query:

SELECT
log.ip,
ip.country_code,
codes_country.name,
ip.region_code,
codes_region.region_name
LEFT OUTER JOIN ip ON log.ip = ip.ip
LEFT OUTER JOIN codes_country ON ip.country_code = codes_country.code
LEFT OUTER JOIN codes_region ON ip.country_code = codes_region.country_code AND ip.region_code = codes_region.region_code
WHERE 1;

Open in new window



Here a screenshot showing the results:

Notice AR/7 and AR/13

Here is a screenshot showing the region codes.

 Notice AR/7 and AR/13
Hopefully you can see that AR/13 and AR/7 are identical scenarios, yet AR/13 produces the correct result and AR/7 does not.

Does that help?

Kind Regards,
Adrian
0
 
LVL 59

Expert Comment

by:Kevin Cross
ID: 39856249
I think the issue is you need an INNER JOIN between the codes_region and the ip tables; therefore, you can try like this.
...
FROM log
LEFT OUTER JOIN (
    ip JOIN codes_country ON ip.country_code = codes_country.code
    JOIN codes_region ON ip.country_code = codes_region.country_code 
     AND ip.region_code = codes_region.region_code
) ON log.ip = ip.ip
...

Open in new window


You still get the OUTER JOIN to log, but if you do have a match to ip table there must be a match in the codes_country and codes_region table.  If you may have an ip without a country, then you may need to go another level by making sure that codes_country.code = codes_region.country_code.  However, I would try what I just sent first, then we can adjust based on the data.
0
 

Author Comment

by:AdrianSmithUK
ID: 39856293
Hi Kevin

Many thanks for the suggestion.

Unfortunately the snippet threw up hundreds of NULL values in the [country_code] column.

In the previous example the [country_code] column was 100% full (as expected).

Here is a screen shot:

Notice the NULLs in  the [country_code] column.
0
 
LVL 59

Expert Comment

by:Kevin Cross
ID: 39856305
Okay, so you definitely have IP addresses without codes_regions.
Try:
...
FROM log
LEFT OUTER JOIN (
    ip JOIN codes_country ON ip.country_code = codes_country.code
    LEFT OUTER JOIN codes_region 
      ON ip.country_code = codes_region.country_code 
     AND ip.region_code = codes_region.region_code
) ON log.ip = ip.ip
...

Open in new window


EDIT: if you still get inconsistent results, make sure that the region_code is consistent in the ip table with the value in the codes_region table.  If you provide sample data of the rows that are failing with the supporting entries in the other tables along with the expected results, we can work through the solution with you.
0
 

Author Comment

by:AdrianSmithUK
ID: 39856320
Unfortunately still a problem. Notice that AR/7 is still NULL

AR/7 = NULL
0
 
LVL 59

Expert Comment

by:Kevin Cross
ID: 39856339
Notice the region code says '07' versus 7.  Check that the data types are the same between the two tables.
0
How to improve team productivity

Quip adds documents, spreadsheets, and tasklists to your Slack experience
- Elevate ideas to Quip docs
- Share Quip docs in Slack
- Get notified of changes to your docs
- Available on iOS/Android/Desktop/Web
- Online/Offline

 

Author Comment

by:AdrianSmithUK
ID: 39856376
Did you miss the create statements at the top of this post.
0
 
LVL 59

Expert Comment

by:Kevin Cross
ID: 39856415
Yes, sorry.

`country_code` varchar(128) DEFAULT '0'
  `region_code` varchar(128) DEFAULT '0'

In your other two tables, though, you define:
`country_code` varchar(2) DEFAULT NULL
`region_code` varchar(2) DEFAULT NULL

Therefore, i would expect the columns in the ip table to match as well as, optionally, define REFERENCES to the respective country and region table.

The reason they are not matching is you have different data - '07' != '7'.  Maybe you intended for all the region codes to be two digits exactly (e.g., '05', '10', et cetera), so one fix is to update the region table since it likely has less records given the ip and other referencing tables consistently use two digits already.  If they have some records with '7' and some with '07', you have to fix the existing data and any automated processes to make sure you get consistency.
0
 

Author Comment

by:AdrianSmithUK
ID: 39856455
Good Observation. Give me 20 mins and I'll fix it.
0
 

Author Comment

by:AdrianSmithUK
ID: 39856563
OK - I've cleaned it up:

Here are the new create statements:

CREATE TABLE `codes_country` (
  `id` int(11) NOT NULL AUTO_INCREMENT,
  `code` varchar(2) DEFAULT NULL,
  `name` varchar(128) DEFAULT NULL,
  PRIMARY KEY (`id`),
  UNIQUE KEY `code_unique` (`code`)
) ENGINE=InnoDB AUTO_INCREMENT=255 DEFAULT CHARSET=latin1;

CREATE TABLE `codes_region` (
  `id` int(11) NOT NULL AUTO_INCREMENT,
  `country_code` varchar(2) DEFAULT NULL,
  `region_code` varchar(2) DEFAULT NULL,
  `region_name` varchar(128) DEFAULT NULL,
  PRIMARY KEY (`id`),
  UNIQUE KEY `country_code_unique` (`country_code`,`region_code`)
) ENGINE=InnoDB AUTO_INCREMENT=4063 DEFAULT CHARSET=latin1;

CREATE TABLE `ip` (
  `id` int(11) NOT NULL AUTO_INCREMENT,
  `ip` varchar(40) DEFAULT '0',
  `country_code` varchar(2) DEFAULT NULL,
  `region_code` varchar(2) DEFAULT NULL,
  PRIMARY KEY (`id`),
  UNIQUE KEY `ip` (`ip`)
) ENGINE=MyISAM AUTO_INCREMENT=3043 DEFAULT CHARSET=latin1;

CREATE TABLE `log` (
  `id` int(11) NOT NULL AUTO_INCREMENT,
  `ip` varchar(40) DEFAULT NULL,
  PRIMARY KEY (`id`),
  UNIQUE KEY `uid` (`uid`,`instance`)
) ENGINE=MyISAM AUTO_INCREMENT=6622 DEFAULT CHARSET=latin1;

Open in new window


And here is the screenshot of the results.

AR/7 is NULL
Notice AR/7 is unfortunately still NULL ... grrr!

I also tried your previous snippet and still same results.

Sorry Kevin, I've got to leave for the evening now and I can't test anything at home because I'm using an EC2 server that won't let me in, so unfortunately I'll have to come back to this tomorrow.

Thanks for all the help so far.

Kind Regards,
Adrian
0
 
LVL 59

Accepted Solution

by:
Kevin Cross earned 500 total points
ID: 39856570
Cleaning up the create statements is one thing.  You have to clean the data.  AR/7 is not the same as AR/07.
0
 

Author Comment

by:AdrianSmithUK
ID: 39856586
Genius!!!

I bet that's the problem.

That's why AR/13 works and not AR/7

I'll test it first thing in the morning and if that's the problem I'll close the question.

Many thanks.
Adrian
0
 

Author Closing Comment

by:AdrianSmithUK
ID: 39858455
The original code was fine.

As Kevin suggested, it was the fact that one table had two char codes (eg: 07) and the other had only one (eg: 7).

Many thanks Kevin.
0

Featured Post

Why You Should Analyze Threat Actor TTPs

After years of analyzing threat actor behavior, it’s become clear that at any given time there are specific tactics, techniques, and procedures (TTPs) that are particularly prevalent. By analyzing and understanding these TTPs, you can dramatically enhance your security program.

Join & Write a Comment

Entity Framework is a powerful tool to help you interact with the DataBase but still doesn't help much when we have a Stored Procedure that returns more than one resultset. The solution takes some of out-of-the-box thinking; read on!
Since pre-biblical times, humans have sought ways to keep secrets, and share the secrets selectively.  This article explores the ways PHP can be used to hide and encrypt information.
The viewer will learn how to dynamically set the form action using jQuery.
The viewer will learn how to create and use a small PHP class to apply a watermark to an image. This video shows the viewer the setup for the PHP watermark as well as important coding language. Continue to Part 2 to learn the core code used in creat…

707 members asked questions and received personalized solutions in the past 7 days.

Join the community of 500,000 technology professionals and ask your questions.

Join & Ask a Question

Need Help in Real-Time?

Connect with top rated Experts

12 Experts available now in Live!

Get 1:1 Help Now