Solved

Query to compare data sets

Posted on 2011-09-21
14
384 Views
Last Modified: 2012-05-12
What I am trying to do is compare two data sets and only return rows that have differing values or are add/deleted  example:

Series    item num    price
TTT    1234    1.23
TTT    3456    2.45
TTT    6789    6.78
TTT    8765    9.01
ZZZ    1234    1.23
ZZZ    3456    2.45
ZZZ    6789    6.78
ZZZ    4321    7.89
ZZZ    8765    9.02
ZZZ    9999    9.88

I want to compare series ZZZ in the same table to previous series TTT and find what has been added, removed, and changed so the quries would produce

Deleted:  ZZZ    4321    7.89
Added: ZZZ    9999    9.88
Changed:  TTT    8765    9.01, ZZZ    8765    9.02

I know how to do basic queries but I think this would require joins of some sort.
0
Comment
Question by:jason987
  • 7
  • 5
  • 2
14 Comments
 
LVL 3

Expert Comment

by:dkellner
ID: 36575732
Not completely clear but if I get it right, you're looking for this:

(assume you have a MyTable with a field ListNr that's 1 for the first recordset and 2 for the second; there are some kind of products in the table and they're identified by a productID...  then you can adapt this to your actual situation and queries)

So, the trick is to join the same table to itself:

SELECT * FROM MyTable AS T1
    LEFT JOIN MyTable AS T2 ON T1.productID=T2.productID
    WHERE 1
    AND T1.ListNr=1
    AND T2.ListNr=2
    AND T1.productID IS NOT NULL
    AND T2.productID IS NULL
;

Open in new window


Wonder if this helps and the logic is clear.  I didn't test this example as it's time consuming to build the necessary tables but it illustrates the way to go.
0
 
LVL 5

Author Comment

by:jason987
ID: 36575867
I made a table using those field names and ran your query and it returned an empty set.
0
 
LVL 3

Expert Comment

by:dkellner
ID: 36575893
Yes because it's only the way to do it.
But if you give me an sql export I'll try to tell what's missing.
0
 
LVL 5

Author Comment

by:jason987
ID: 36575975

SET SQL_MODE="NO_AUTO_VALUE_ON_ZERO";


/*!40101 SET @OLD_CHARACTER_SET_CLIENT=@@CHARACTER_SET_CLIENT */;
/*!40101 SET @OLD_CHARACTER_SET_RESULTS=@@CHARACTER_SET_RESULTS */;
/*!40101 SET @OLD_COLLATION_CONNECTION=@@COLLATION_CONNECTION */;
/*!40101 SET NAMES utf8 */;


-- Table structure for table `test`
--

CREATE TABLE IF NOT EXISTS `test` (
  `ListNr` varchar(11) NOT NULL,
  `productID` varchar(256) NOT NULL,
  `PRICE` decimal(11,8) NOT NULL
) ENGINE=MyISAM DEFAULT CHARSET=latin1;

--
-- Dumping data for table `test`
--

INSERT INTO `test` (`ListNr`, `productID`, `PRICE`) VALUES
('TTT', '1234', '1.23000000'),
('TTT', '3456', '2.45000000'),
('TTT', '6789', '6.78000000'),
('TTT', '8765', '9.01000000'),
('ZZZ', '1234', '1.23000000'),
('ZZZ', '3456', '2.45000000'),
('ZZZ', '6789', '6.78000000'),
('ZZZ', '4321', '7.89000000'),
('ZZZ', '8765', '9.02000000'),
('ZZZ', '9999', '9.00000000');

Open in new window

0
 
LVL 3

Expert Comment

by:dkellner
ID: 36576255
Okay, now my query tried to find all the rows missing from list 2, existing in list 1 only. There are no such rows as I see. You need several queries for several kinds of differences, like "in list1 but not in list2" (this is what you have now), "in list2 but not in list1" (this would return a nonzero result; swap t1/t2 in my last 2 lines in query), and "in list1 and in list2 but not the same data in other fields", which I have not implemented in detail but it's quite obvious.
0
 
LVL 5

Author Comment

by:jason987
ID: 36576348
I ran both:

SELECT * FROM test AS T1
   LEFT JOIN test AS T2 ON T1.productID=T2.productID
   WHERE 1
   AND T1.ListNr='ZZZ'
   AND T2.ListNr='TTT'

   AND T1.productID IS NOT NULL
   AND T2.productID IS NULL

and

SELECT * FROM test AS T1
   LEFT JOIN test AS T2 ON T1.productID=T2.productID
   WHERE 1
   AND T1.ListNr='TTT'
   AND T2.ListNr='ZZZ'
   AND T1.productID IS NOT NULL
   AND T2.productID IS NULL

Both returned nothing.  Am I misunderstanding you?
0
 
LVL 3

Expert Comment

by:dkellner
ID: 36576992
No, sorry, I'm telling the wrong thing.
You can't do this left join thing with both lists in one table because join will always find the product in the same list and say "hey, I've got it".  Won't be NULL.  Damn thing.

Okay then.
Shall we use temporary tables or solve the problem in PHP?

TMP tables will make this join possible but you must do ay INSERT ... SELECT for each comparison which is rarely wise.  With PHP you'll have to fetch both record sets and do the comparison in the result arrays - which is easy unless you have millions of records.

OR, maybe someone else has a better solution.

Which way to go?
0
Complete VMware vSphere® ESX(i) & Hyper-V Backup

Capture your entire system, including the host, with patented disk imaging integrated with VMware VADP / Microsoft VSS and RCT. RTOs is as low as 15 seconds with Acronis Active Restore™. You can enjoy unlimited P2V/V2V migrations from any source (even from a different hypervisor)

 
LVL 5

Author Comment

by:jason987
ID: 36577074
Yea I'm on the verge of the PHP solution which I can do but I hoped to make something I could distribute easily.
0
 
LVL 3

Expert Comment

by:dkellner
ID: 36577086
Okay, sorry again.
0
 
LVL 5

Author Comment

by:jason987
ID: 36577110
No need to apologize I appreciate the help.
0
 
LVL 4

Accepted Solution

by:
Fugas earned 500 total points
ID: 36579117
Here is a simple query. If you have  many records, and it takes long time to execute, it is possible to optimize it.
select *,'not in ZZZ' as result from test where listNr='TTT' and productID not in (select productID from test where ListNr='ZZZ')
union 
select *,'not in TTT' as result from test where listNr='ZZZ' and productID not in (select productID from test where ListNr='TTT')

Open in new window

0
 
LVL 5

Author Comment

by:jason987
ID: 36584274
THanks that seems to do the trick.  I can't get it to work on my large DB for some reason but that is probably my fault.  What would be an optimization for that query?
0
 
LVL 5

Author Comment

by:jason987
ID: 36584284
Also, would you know how to get changed prices?
0
 
LVL 4

Expert Comment

by:Fugas
ID: 36585334
For optimalization, I have to see, how many records you have in the table and the count of those categories like 'TTT' and 'ZZZ'. But now is the query ok for less than a milion records.
0

Featured Post

How your wiki can always stay up-to-date

Quip doubles as a “living” wiki and a project management tool that evolves with your organization. As you finish projects in Quip, the work remains, easily accessible to all team members, new and old.
- Increase transparency
- Onboard new hires faster
- Access from mobile/offline

Join & Write a Comment

Creating and Managing Databases with phpMyAdmin in cPanel.
Password hashing is better than message digests or encryption, and you should be using it instead of message digests or encryption.  Find out why and how in this article, which supplements the original article on PHP Client Registration, Login, Logo…
Video by: Steve
Using examples as well as descriptions, step through each of the common simple join types, explaining differences in syntax, differences in expected outputs and showing how the queries run along with the actual outputs based upon a simple set of dem…
Polish reports in Access so they look terrific. Take yourself to another level. Equations, Back Color, Alternate Back Color. Write easy VBA Code. Tighten space to use less pages. Launch report from a menu, considering criteria only when it is filled…

758 members asked questions and received personalized solutions in the past 7 days.

Join the community of 500,000 technology professionals and ask your questions.

Join & Ask a Question

Need Help in Real-Time?

Connect with top rated Experts

19 Experts available now in Live!

Get 1:1 Help Now