Want to protect your cyber security and still get fast solutions? Ask a secure question today.Go Premium

x
?
Solved

Self-Join with Current and Previous Year's Data

Posted on 2011-05-13
15
Medium Priority
?
509 Views
Last Modified: 2012-08-13
Im trying to eliminate the duplicates from this attached query;
I have two tables: prev and curr, and I get the correct data for the most part- 20 records.
This query brings back the correct desired resultset -see image- work_qry
But as soon as I bring in the mileage fields from previous and current tables, this resultset grows to about 700 records.
This is because of the vehicle_id field as you can see in attachment 2. - many_qry
I get back one mileage record for each type of fuel and vehicle.
How can I eliminate these dupes and get the correct results?
Thanks
John
SELECT  
		DISTINCT B.OGS_DOB_CLASS_FW AS currClass,
		A.D750_FUEL_TYPE AS prevFuel,
		B.D750_FUEL_TYPE AS currFuel,
		A.D750_VEHICLE_TYPE AS prevVeh, 
		B.D750_VEHICLE_TYPE AS currVeh,
		A.VEHICLE_COUNT_FW AS prevCT, 
		B.VEHICLE_COUNT_FW AS currCT,
		ISNULL(B.VEHICLE_COUNT_FW,0) - 

ISNULL(A.VEHICLE_COUNT_FW,0) AS NetVehCT,
		A.MONTH_FW AS prevMo, 
		B.MONTH_FW AS currMo,
		A.YEAR_FW AS prevYr, 
		B.YEAR_FW AS currYr 
--
FROM 
(
Select	
	DISTINCT D.[OGS_DOB_CLASS_FW] AS OGS_DOB_CLASS_FW
	,C.VEHICLE_COUNT_FW AS VEHICLE_COUNT_FW 
	,C.D750_FUEL_TYPE AS D750_FUEL_TYPE
	, C.D750_VEHICLE_TYPE AS D750_VEHICLE_TYPE
	, C.MONTH_FW AS MONTH_FW 
	, C.YEAR_FW AS YEAR_FW 
	, D.VEHICLE_ID_FW AS VEHICLE_ID_FW 
	, D.MILEAGE_END_REPORT_PERIOD_FW AS 

MILEAGE_END_REPORT_PERIOD_FW
	, D.MODEL_YEAR_FW AS MODEL_YEAR_FW

FROM  
dbo.OGS_D750_STAT_GRID_FW C 
INNER JOIN 
dbo.OGS_D750_DETAIL_FW D
ON C.[MONTH_FW] = D.[MONTH_FW]
AND C.[YEAR_FW] = D.[YEAR_FW]
WHERE (C.MONTH_FW + C.YEAR_FW) IN (92010)
AND (D.MONTH_FW + D.YEAR_FW) IN (92010)
AND C.[DOB_CLASS_FW]=D.[OGS_DOB_CLASS_FW]  

--WHERE (D.MONTH_FW + D.YEAR_FW) IN (@startdate) 
--AND (C.MONTH_FW + C.YEAR_FW) IN (@startdate)
GROUP BY 	
D.OGS_DOB_CLASS_FW,
C.[DOB_CLASS_FW],
C.[D750_FUEL_TYPE],
C.[D750_VEHICLE_TYPE],
C.MONTH_FW,
C.YEAR_FW,
D.MILEAGE_END_REPORT_PERIOD_FW,
C.VEHICLE_COUNT_FW, 
D.VEHICLE_ID_FW,
D.MODEL_YEAR_FW 
) A 
-- 
--UNION ALL 
INNER JOIN 
(
SELECT
--* 	
	  DISTINCT E.[OGS_DOB_CLASS_FW] AS OGS_DOB_CLASS_FW 
	, D.VEHICLE_COUNT_FW AS VEHICLE_COUNT_FW 
	, D.D750_FUEL_TYPE AS D750_FUEL_TYPE
	, D.D750_VEHICLE_TYPE AS D750_VEHICLE_TYPE
	, D.MONTH_FW AS MONTH_FW 
	, D.YEAR_FW AS YEAR_FW 
	, E.MILEAGE_END_REPORT_PERIOD_FW AS 

MILEAGE_END_REPORT_PERIOD_FW
	, E.VEHICLE_ID_FW AS VEHICLE_ID_FW 
	, E.MODEL_YEAR_FW AS MODEL_YEAR_FW
----
FROM  
dbo.OGS_D750_STAT_GRID_FW D 
INNER JOIN 
dbo.OGS_D750_DETAIL_FW E
ON E.[MONTH_FW] = D.[MONTH_FW]
AND E.[YEAR_FW] = D.[YEAR_FW]
AND E.[OGS_DOB_CLASS_FW]=D.[DOB_CLASS_FW]  
WHERE (D.MONTH_FW + D.YEAR_FW) IN (42011)
AND (E.MONTH_FW + E.YEAR_FW) IN (42011)
--WHERE (D.MONTH_FW + D.YEAR_FW) IN (@enddate) 
--AND (E.MONTH_FW + E.YEAR_FW) IN (@enddate) 

GROUP BY 	
E.[OGS_DOB_CLASS_FW],
D.[DOB_CLASS_FW],
D.[D750_FUEL_TYPE],
D.[D750_VEHICLE_TYPE],
D.MONTH_FW,
D.YEAR_FW,
D.VEHICLE_COUNT_FW, 
E.VEHICLE_ID_FW,
E.MILEAGE_END_REPORT_PERIOD_FW,
E.MODEL_YEAR_FW 
) B 
--
ON A.OGS_DOB_CLASS_FW=B.OGS_DOB_CLASS_FW
AND A.VEHICLE_ID_FW=B.VEHICLE_ID_FW 
AND A.D750_FUEL_TYPE=B.D750_FUEL_TYPE 
AND A.D750_VEHICLE_TYPE=B.D750_VEHICLE_TYPE

Open in new window

work-qry.jpg
many-qry.jpg
0
Comment
Question by:jtrapat1
  • 7
  • 5
  • 3
15 Comments
 
LVL 61

Assisted Solution

by:Kevin Cross
Kevin Cross earned 400 total points
ID: 35755018
Hi.  You will probably need to wrap the mileage column in an aggregate versus adding it in GROUP BY.  I might be missing it above, but I don't see prevMiles column in query.

If it is MILEAGE_END_REPORT_PERIOD_FW for example, remove that column from GROUP BY and use MAX(MILEAGE_END_REPORT_PERIOD_FW).
0
 
LVL 42

Expert Comment

by:dqmq
ID: 35755073
Which field contains the mileage?

What are the "correct results" when mileage is shown.  By that I mean, for each of your desired rows you have multiple mileage numbers.  So, what to you want to see:  max, min, average, sum???

0
 

Author Comment

by:jtrapat1
ID: 35755187
Hi-
the field containing the mileage is prevMiles in the screenshot above;
As you can see, the mileage repeats after the 18th record when there is a change in fuel type;
I would like to return the difference between the current and previous mileages-
So, (B.MILEAGE_END_REPORT_PERIOD_FW -A.MILEAGE_END_REPORT_PERIOD_FW )
And this value should be based on Class.
So, I could next display mileage traveled in one year based on class.

Thanks
John
0
Technology Partners: We Want Your Opinion!

We value your feedback.

Take our survey and automatically be enter to win anyone of the following:
Yeti Cooler, Amazon eGift Card, and Movie eGift Card!

 
LVL 61

Expert Comment

by:Kevin Cross
ID: 35755306
I saw the prevMiles in the image, but you don't actually have that in your SELECT statement in code snippet, so just needed confirmation that was the column we are talking about.  It appears you have multiple miles reporting that matches a given month/year combination so you need to remove the mileage column from GROUP BY and aggregate it.  If the mileage is really the ODOMETER reading, then you can do (MAX(MILEAGE_END_REPORT_PERIOD_FW)-MIN(MILEAGE_END_REPORT_PERIOD_FW) AS Mileage) then in the other query do (B.Mileage-A.Mileage).
0
 
LVL 42

Expert Comment

by:dqmq
ID: 35755507
Thank you.  we now know what column contains the mileage.

>>I would like to return the difference between the current and previous mileages

Note the plural in "mileages".  As such, that does not resolve the problem:  you have just as many mileage differences for each row as you have mileages.


>>So, I could next display mileage traveled in one year based on class.
Does that mean you only care about one row per year from the history table? Can we just use the min() mileage from the history table?


BTW, using DISTINCT is rarely a good practice. In this case it is masking fundamental flaws in your sub-queries.  By that I mean, if you remove DISTINCT, the dup rows show up even without the mileage column in the results.  I encourage you to focus on the GROUP BY clause to aggregate your sub-queries to the logically correct results so that the duplicates do not appear.    


 
0
 

Author Comment

by:jtrapat1
ID: 35755519
Hi-
thanks for the help-
But I still dont get the correct results
Attached is my select -
I get 0 amounts for my mileage calculations.
You are correct -these are odometers at the end of the start month/year and end month/year periods.
What am I doing wrong here?
Thanks
John
SELECT  
		DISTINCT B.OGS_DOB_CLASS_FW AS currClass,
		A.D750_FUEL_TYPE AS prevFuel,
		B.D750_FUEL_TYPE AS currFuel,
		B.Mileage-A.Mileage ,
		A.D750_VEHICLE_TYPE AS prevVeh, 
		B.D750_VEHICLE_TYPE AS currVeh,
		A.VEHICLE_COUNT_FW AS prevCT, 
		B.VEHICLE_COUNT_FW AS currCT,
		ISNULL(B.VEHICLE_COUNT_FW,0) - ISNULL(A.VEHICLE_COUNT_FW,0) AS NetVehCT,
		A.MONTH_FW AS prevMo, 
		B.MONTH_FW AS currMo,
		A.YEAR_FW AS prevYr, 
		B.YEAR_FW AS currYr 
--
FROM 
(
Select	
	DISTINCT D.[OGS_DOB_CLASS_FW] AS OGS_DOB_CLASS_FW
	,C.VEHICLE_COUNT_FW AS VEHICLE_COUNT_FW 
	,C.D750_FUEL_TYPE AS D750_FUEL_TYPE
	, C.D750_VEHICLE_TYPE AS D750_VEHICLE_TYPE
	, C.MONTH_FW AS MONTH_FW 
	, C.YEAR_FW AS YEAR_FW 
	, D.VEHICLE_ID_FW AS VEHICLE_ID_FW
	, MAX(D.MILEAGE_END_REPORT_PERIOD_FW)-MIN(D.MILEAGE_END_REPORT_PERIOD_FW) AS Mileage 
	, D.MODEL_YEAR_FW AS MODEL_YEAR_FW

FROM  
dbo.OGS_D750_STAT_GRID_FW C 
INNER JOIN 
dbo.OGS_D750_DETAIL_FW D
ON C.[MONTH_FW] = D.[MONTH_FW]
AND C.[YEAR_FW] = D.[YEAR_FW]
AND C.[DOB_CLASS_FW]=D.[OGS_DOB_CLASS_FW]  
WHERE (C.MONTH_FW + C.YEAR_FW) IN (92010)
AND (D.MONTH_FW + D.YEAR_FW) IN (92010)

--WHERE (D.MONTH_FW + D.YEAR_FW) IN (@startdate) 
--AND (C.MONTH_FW + C.YEAR_FW) IN (@startdate)
GROUP BY 	
D.OGS_DOB_CLASS_FW,
C.[DOB_CLASS_FW],
C.[D750_FUEL_TYPE],
C.[D750_VEHICLE_TYPE],
C.MONTH_FW,
C.YEAR_FW,
C.VEHICLE_COUNT_FW, 
D.VEHICLE_ID_FW,
D.MODEL_YEAR_FW 
) A 
-- 
--UNION ALL 
INNER JOIN 
(
SELECT
--* 	
	  DISTINCT E.[OGS_DOB_CLASS_FW] AS OGS_DOB_CLASS_FW 
	, D.VEHICLE_COUNT_FW AS VEHICLE_COUNT_FW 
	, D.D750_FUEL_TYPE AS D750_FUEL_TYPE
	, D.D750_VEHICLE_TYPE AS D750_VEHICLE_TYPE
	, D.MONTH_FW AS MONTH_FW 
	, D.YEAR_FW AS YEAR_FW 
	, MAX(E.MILEAGE_END_REPORT_PERIOD_FW)-MIN(E.MILEAGE_END_REPORT_PERIOD_FW) AS Mileage 
	, E.VEHICLE_ID_FW AS VEHICLE_ID_FW 
	, E.MODEL_YEAR_FW AS MODEL_YEAR_FW
----
FROM  
dbo.OGS_D750_STAT_GRID_FW D 
INNER JOIN 
dbo.OGS_D750_DETAIL_FW E
ON E.[MONTH_FW] = D.[MONTH_FW]
AND E.[YEAR_FW] = D.[YEAR_FW]
AND E.[OGS_DOB_CLASS_FW]=D.[DOB_CLASS_FW]  
WHERE (D.MONTH_FW + D.YEAR_FW) IN (42011)
AND (E.MONTH_FW + E.YEAR_FW) IN (42011)
--WHERE (D.MONTH_FW + D.YEAR_FW) IN (@enddate) 
--AND (E.MONTH_FW + E.YEAR_FW) IN (@enddate) 

GROUP BY 	
E.[OGS_DOB_CLASS_FW],
D.[DOB_CLASS_FW],
D.[D750_FUEL_TYPE],
D.[D750_VEHICLE_TYPE],
D.MONTH_FW,
D.YEAR_FW,
D.VEHICLE_COUNT_FW, 
E.VEHICLE_ID_FW,
E.MODEL_YEAR_FW 
) B 
--
ON A.OGS_DOB_CLASS_FW=B.OGS_DOB_CLASS_FW
AND A.VEHICLE_ID_FW=B.VEHICLE_ID_FW 
AND A.D750_FUEL_TYPE=B.D750_FUEL_TYPE 
AND A.D750_VEHICLE_TYPE=B.D750_VEHICLE_TYPE

Open in new window

0
 

Author Comment

by:jtrapat1
ID: 35755560
dgmg:
I would like to use this attached code because I have to display a table below my matrix on SSRS:
And I thought this was the proper way to check the ranges.
Then I would return kind of a summary table-
One row returned for each class based on month/year.
thanks for the DISTINCT mention;
Im desperate and trying everything to eliminate the dupes.

John

needed.txt
0
 
LVL 42

Accepted Solution

by:
dqmq earned 1600 total points
ID: 35755871
You are still missing the fundamental problem.  Let me try to explain.

You want to summarize, let's say by class/month/year.
You are joining two result sets A and B.

One of your result sets (probably the history side) has multiple rows per class/month/year each having a different mileage.  Let's illustrate with a single class/month/year

resultset B
------------
c/m/y    mileage  
1           300


resultset A
------------
c/m/y    mileage  
1           300
1           291
1           283


After joining, you have this:
result set a inner join b
--------------------------
c/m/y   b.mileage    a.mileage  a-mileage - b.mileage
1          300             300           0
1          300             291           9
1          300             283           17

The solution is to find the side (a or b) that is producing the duplicate rows per c/m/y and summarize that side before the join.

To do that, you must make a business decision about how to aggregate the multiple rows.  Let's say we want  to use the lowest mileage.  Then, your result sets look like this:

resultset B
------------
c/m/y    mileage  
1           300


resultset A
------------
c/m/y    min(mileage)  
1           283


And after joining those result sets:
c/m/y   b.mileage    a.mileage  a-mileage - b.mileage
1          300             283           17

Thus your duplication is gone.

In short:
Aggregate both A and B queries to the level of summarization you desire using GROUP BY
    get the A select to run independently without DISTINCT and no dups
    get the B select to run independently without DISTNICT and no dups
   
    note:  DO NOT INCLUDE the mileage columns in the GROUP BY.
    note:  DO NOT INCLUDE the mileage columns in the GROUP BY.
    note:  DO NOT INCLUDE the mileage columns in the GROUP BY.

    In case you didn't notice, the above is important.  It will force you to aggregate the independent result sets properly.







 
 

 























0
 
LVL 61

Expert Comment

by:Kevin Cross
ID: 35755965
dqmg,

I will leave you to it.  I think we are on the same page as to what the potential issue is, but you have explained it much better.  So aside from this post, I will avoid posting anything to clutter discussion.
Good luck!

Regards,
Kevin
0
 

Author Comment

by:jtrapat1
ID: 35756367
thanks for all the help-
with my code, i have a vehicle_id in my history table and so that is why i have the mileage dupes;
how can i eliminate this?
i tried to write a subquery like:
 where vehicle_id in
(select max(vehicle_id) from ..)
but  i couldnt get that part to work either.
Even if i join the previous and current queries on vehicle_id - i still get multiple mileages since they are coming from different month/year combinations.

john
0
 

Author Comment

by:jtrapat1
ID: 35757480
Thank you both for all the help
I think I understand your explanation and my screw-ups-
Ill close the question soon-
I think I can fix The vehicle_id problem if I could modify the query to accept another date range.
Is this possible?
I would only have to modify the "B" portion of the join, meaning the Current data.
Instead of using the input parameters of start and end date - i would need to use:
Current Date and subtract one year as my date range.
Could I just JOIN the same tables again as an alias based on this date?
And then do the subtraction at the outermost level?

Before I do this i want to make sure my totals dont explode again!

thanks again for all the help.
john
0
 
LVL 42

Expert Comment

by:dqmq
ID: 35758031
Confusing me. Figuring out current date - 1 year is easy enough, giving you one date range.  But your result set as a begin mo/year and and end mo/year giving another date range.  Not sure what dates you want within what dates.  
0
 

Author Comment

by:jtrapat1
ID: 35764188
dgmg-
Its confusing me too- but ill try to explain the best i can-
I accept a start and end date range for the entire report.
I have to take it in month/year format (no day) because of the database ive been given.
there are three separate sections on this one report-
the first has three matrices -
which compare data from the two parameter dates given.
the second section is basically a summary section that I figured I would do in a table.
this includes the annual mileage per year (for vehicle and staying within class) and is compared with the mileage of one year ago.
As you mention, if I could figure it out simply by doing current year -1 that would be great, but i dont see how that would work.
Since im already using previous and current in my query.
If I could somehow Group the date range results, that would be great.
But I dont know how.
And the third part:
Will probably require another subquery, since I need to find the missing mileage values from the date range entered by the user.
Because some of the data has Start Mileage but no end mileage and vice-versa.
So, I thought I would have to write a query such as (NOT IN) or something like this.
Hope I explained this well.

thanks again
john
0
 
LVL 42

Expert Comment

by:dqmq
ID: 35764473
I think we've wandered too far off the trail and are so deep in the brush that the destination is no longer in sight. Started out trying to eliminate "duplicate" rows, then filtering by date ranges, and now missing values.  

I do have a couple final suggestions.  

1.  Convert your Year/month strings to a single datetime field for comparison purposes.  I suggest writing a simple reusable UDF (probably a single statement using the CONVERT FUNCTION) to accomplish that.  

2.  Isolate your issues in separate questions.  Get the UDF working first and if you need help post another question specific to that.  Then work on your curr data query and prev data query indpendently until each returns the desired results.  You may want to save the historical side to a view.  If needed separate question for that.  And so on.  Divide and Conquer.

I will not object if you request to close this question for 0 points

0
 

Author Closing Comment

by:jtrapat1
ID: 35765584
Thanks for the help-
Youre right; i was trying to avoid posting multiple questions in one issue but the problem seemed difficult to explain.
Ill close this question and open another one concerning the multiple dates with one select query.
thanks again.

John
0

Featured Post

What does it mean to be "Always On"?

Is your cloud always on? With an Always On cloud you won't have to worry about downtime for maintenance or software application code updates, ensuring that your bottom line isn't affected.

Question has a verified solution.

If you are experiencing a similar issue, please ask a related question

A Stored Procedure in Microsoft SQL Server is a powerful feature that it can be used to execute the Data Manipulation Language (DML) or Data Definition Language (DDL). Depending on business requirements, a single Stored Procedure can return differe…
MSSQL DB-maintenance also needs implementation of multiple activities. However, unprecedented errors can hamper the database management. In that case, deploying Stellar SQL Database Toolkit ensures fast and accurate database and backup repair as wel…
Via a live example, show how to extract information from SQL Server on Database, Connection and Server properties
Via a live example, show how to setup several different housekeeping processes for a SQL Server.

571 members asked questions and received personalized solutions in the past 7 days.

Join the community of 500,000 technology professionals and ask your questions.

Join & Ask a Question