Link to home
Start Free TrialLog in
Avatar of jtrapat1
jtrapat1

asked on

Self-Join with Current and Previous Year's Data

Im trying to eliminate the duplicates from this attached query;
I have two tables: prev and curr, and I get the correct data for the most part- 20 records.
This query brings back the correct desired resultset -see image- work_qry
But as soon as I bring in the mileage fields from previous and current tables, this resultset grows to about 700 records.
This is because of the vehicle_id field as you can see in attachment 2. - many_qry
I get back one mileage record for each type of fuel and vehicle.
How can I eliminate these dupes and get the correct results?
Thanks
John
SELECT  
		DISTINCT B.OGS_DOB_CLASS_FW AS currClass,
		A.D750_FUEL_TYPE AS prevFuel,
		B.D750_FUEL_TYPE AS currFuel,
		A.D750_VEHICLE_TYPE AS prevVeh, 
		B.D750_VEHICLE_TYPE AS currVeh,
		A.VEHICLE_COUNT_FW AS prevCT, 
		B.VEHICLE_COUNT_FW AS currCT,
		ISNULL(B.VEHICLE_COUNT_FW,0) - 

ISNULL(A.VEHICLE_COUNT_FW,0) AS NetVehCT,
		A.MONTH_FW AS prevMo, 
		B.MONTH_FW AS currMo,
		A.YEAR_FW AS prevYr, 
		B.YEAR_FW AS currYr 
--
FROM 
(
Select	
	DISTINCT D.[OGS_DOB_CLASS_FW] AS OGS_DOB_CLASS_FW
	,C.VEHICLE_COUNT_FW AS VEHICLE_COUNT_FW 
	,C.D750_FUEL_TYPE AS D750_FUEL_TYPE
	, C.D750_VEHICLE_TYPE AS D750_VEHICLE_TYPE
	, C.MONTH_FW AS MONTH_FW 
	, C.YEAR_FW AS YEAR_FW 
	, D.VEHICLE_ID_FW AS VEHICLE_ID_FW 
	, D.MILEAGE_END_REPORT_PERIOD_FW AS 

MILEAGE_END_REPORT_PERIOD_FW
	, D.MODEL_YEAR_FW AS MODEL_YEAR_FW

FROM  
dbo.OGS_D750_STAT_GRID_FW C 
INNER JOIN 
dbo.OGS_D750_DETAIL_FW D
ON C.[MONTH_FW] = D.[MONTH_FW]
AND C.[YEAR_FW] = D.[YEAR_FW]
WHERE (C.MONTH_FW + C.YEAR_FW) IN (92010)
AND (D.MONTH_FW + D.YEAR_FW) IN (92010)
AND C.[DOB_CLASS_FW]=D.[OGS_DOB_CLASS_FW]  

--WHERE (D.MONTH_FW + D.YEAR_FW) IN (@startdate) 
--AND (C.MONTH_FW + C.YEAR_FW) IN (@startdate)
GROUP BY 	
D.OGS_DOB_CLASS_FW,
C.[DOB_CLASS_FW],
C.[D750_FUEL_TYPE],
C.[D750_VEHICLE_TYPE],
C.MONTH_FW,
C.YEAR_FW,
D.MILEAGE_END_REPORT_PERIOD_FW,
C.VEHICLE_COUNT_FW, 
D.VEHICLE_ID_FW,
D.MODEL_YEAR_FW 
) A 
-- 
--UNION ALL 
INNER JOIN 
(
SELECT
--* 	
	  DISTINCT E.[OGS_DOB_CLASS_FW] AS OGS_DOB_CLASS_FW 
	, D.VEHICLE_COUNT_FW AS VEHICLE_COUNT_FW 
	, D.D750_FUEL_TYPE AS D750_FUEL_TYPE
	, D.D750_VEHICLE_TYPE AS D750_VEHICLE_TYPE
	, D.MONTH_FW AS MONTH_FW 
	, D.YEAR_FW AS YEAR_FW 
	, E.MILEAGE_END_REPORT_PERIOD_FW AS 

MILEAGE_END_REPORT_PERIOD_FW
	, E.VEHICLE_ID_FW AS VEHICLE_ID_FW 
	, E.MODEL_YEAR_FW AS MODEL_YEAR_FW
----
FROM  
dbo.OGS_D750_STAT_GRID_FW D 
INNER JOIN 
dbo.OGS_D750_DETAIL_FW E
ON E.[MONTH_FW] = D.[MONTH_FW]
AND E.[YEAR_FW] = D.[YEAR_FW]
AND E.[OGS_DOB_CLASS_FW]=D.[DOB_CLASS_FW]  
WHERE (D.MONTH_FW + D.YEAR_FW) IN (42011)
AND (E.MONTH_FW + E.YEAR_FW) IN (42011)
--WHERE (D.MONTH_FW + D.YEAR_FW) IN (@enddate) 
--AND (E.MONTH_FW + E.YEAR_FW) IN (@enddate) 

GROUP BY 	
E.[OGS_DOB_CLASS_FW],
D.[DOB_CLASS_FW],
D.[D750_FUEL_TYPE],
D.[D750_VEHICLE_TYPE],
D.MONTH_FW,
D.YEAR_FW,
D.VEHICLE_COUNT_FW, 
E.VEHICLE_ID_FW,
E.MILEAGE_END_REPORT_PERIOD_FW,
E.MODEL_YEAR_FW 
) B 
--
ON A.OGS_DOB_CLASS_FW=B.OGS_DOB_CLASS_FW
AND A.VEHICLE_ID_FW=B.VEHICLE_ID_FW 
AND A.D750_FUEL_TYPE=B.D750_FUEL_TYPE 
AND A.D750_VEHICLE_TYPE=B.D750_VEHICLE_TYPE

Open in new window

work-qry.jpg
many-qry.jpg
SOLUTION
Avatar of Kevin Cross
Kevin Cross
Flag of United States of America image

Link to home
membership
This solution is only available to members.
To access this solution, you must be a member of Experts Exchange.
Start Free Trial
Which field contains the mileage?

What are the "correct results" when mileage is shown.  By that I mean, for each of your desired rows you have multiple mileage numbers.  So, what to you want to see:  max, min, average, sum???

Avatar of jtrapat1
jtrapat1

ASKER

Hi-
the field containing the mileage is prevMiles in the screenshot above;
As you can see, the mileage repeats after the 18th record when there is a change in fuel type;
I would like to return the difference between the current and previous mileages-
So, (B.MILEAGE_END_REPORT_PERIOD_FW -A.MILEAGE_END_REPORT_PERIOD_FW )
And this value should be based on Class.
So, I could next display mileage traveled in one year based on class.

Thanks
John
I saw the prevMiles in the image, but you don't actually have that in your SELECT statement in code snippet, so just needed confirmation that was the column we are talking about.  It appears you have multiple miles reporting that matches a given month/year combination so you need to remove the mileage column from GROUP BY and aggregate it.  If the mileage is really the ODOMETER reading, then you can do (MAX(MILEAGE_END_REPORT_PERIOD_FW)-MIN(MILEAGE_END_REPORT_PERIOD_FW) AS Mileage) then in the other query do (B.Mileage-A.Mileage).
Thank you.  we now know what column contains the mileage.

>>I would like to return the difference between the current and previous mileages

Note the plural in "mileages".  As such, that does not resolve the problem:  you have just as many mileage differences for each row as you have mileages.


>>So, I could next display mileage traveled in one year based on class.
Does that mean you only care about one row per year from the history table? Can we just use the min() mileage from the history table?


BTW, using DISTINCT is rarely a good practice. In this case it is masking fundamental flaws in your sub-queries.  By that I mean, if you remove DISTINCT, the dup rows show up even without the mileage column in the results.  I encourage you to focus on the GROUP BY clause to aggregate your sub-queries to the logically correct results so that the duplicates do not appear.    


 
Hi-
thanks for the help-
But I still dont get the correct results
Attached is my select -
I get 0 amounts for my mileage calculations.
You are correct -these are odometers at the end of the start month/year and end month/year periods.
What am I doing wrong here?
Thanks
John
SELECT  
		DISTINCT B.OGS_DOB_CLASS_FW AS currClass,
		A.D750_FUEL_TYPE AS prevFuel,
		B.D750_FUEL_TYPE AS currFuel,
		B.Mileage-A.Mileage ,
		A.D750_VEHICLE_TYPE AS prevVeh, 
		B.D750_VEHICLE_TYPE AS currVeh,
		A.VEHICLE_COUNT_FW AS prevCT, 
		B.VEHICLE_COUNT_FW AS currCT,
		ISNULL(B.VEHICLE_COUNT_FW,0) - ISNULL(A.VEHICLE_COUNT_FW,0) AS NetVehCT,
		A.MONTH_FW AS prevMo, 
		B.MONTH_FW AS currMo,
		A.YEAR_FW AS prevYr, 
		B.YEAR_FW AS currYr 
--
FROM 
(
Select	
	DISTINCT D.[OGS_DOB_CLASS_FW] AS OGS_DOB_CLASS_FW
	,C.VEHICLE_COUNT_FW AS VEHICLE_COUNT_FW 
	,C.D750_FUEL_TYPE AS D750_FUEL_TYPE
	, C.D750_VEHICLE_TYPE AS D750_VEHICLE_TYPE
	, C.MONTH_FW AS MONTH_FW 
	, C.YEAR_FW AS YEAR_FW 
	, D.VEHICLE_ID_FW AS VEHICLE_ID_FW
	, MAX(D.MILEAGE_END_REPORT_PERIOD_FW)-MIN(D.MILEAGE_END_REPORT_PERIOD_FW) AS Mileage 
	, D.MODEL_YEAR_FW AS MODEL_YEAR_FW

FROM  
dbo.OGS_D750_STAT_GRID_FW C 
INNER JOIN 
dbo.OGS_D750_DETAIL_FW D
ON C.[MONTH_FW] = D.[MONTH_FW]
AND C.[YEAR_FW] = D.[YEAR_FW]
AND C.[DOB_CLASS_FW]=D.[OGS_DOB_CLASS_FW]  
WHERE (C.MONTH_FW + C.YEAR_FW) IN (92010)
AND (D.MONTH_FW + D.YEAR_FW) IN (92010)

--WHERE (D.MONTH_FW + D.YEAR_FW) IN (@startdate) 
--AND (C.MONTH_FW + C.YEAR_FW) IN (@startdate)
GROUP BY 	
D.OGS_DOB_CLASS_FW,
C.[DOB_CLASS_FW],
C.[D750_FUEL_TYPE],
C.[D750_VEHICLE_TYPE],
C.MONTH_FW,
C.YEAR_FW,
C.VEHICLE_COUNT_FW, 
D.VEHICLE_ID_FW,
D.MODEL_YEAR_FW 
) A 
-- 
--UNION ALL 
INNER JOIN 
(
SELECT
--* 	
	  DISTINCT E.[OGS_DOB_CLASS_FW] AS OGS_DOB_CLASS_FW 
	, D.VEHICLE_COUNT_FW AS VEHICLE_COUNT_FW 
	, D.D750_FUEL_TYPE AS D750_FUEL_TYPE
	, D.D750_VEHICLE_TYPE AS D750_VEHICLE_TYPE
	, D.MONTH_FW AS MONTH_FW 
	, D.YEAR_FW AS YEAR_FW 
	, MAX(E.MILEAGE_END_REPORT_PERIOD_FW)-MIN(E.MILEAGE_END_REPORT_PERIOD_FW) AS Mileage 
	, E.VEHICLE_ID_FW AS VEHICLE_ID_FW 
	, E.MODEL_YEAR_FW AS MODEL_YEAR_FW
----
FROM  
dbo.OGS_D750_STAT_GRID_FW D 
INNER JOIN 
dbo.OGS_D750_DETAIL_FW E
ON E.[MONTH_FW] = D.[MONTH_FW]
AND E.[YEAR_FW] = D.[YEAR_FW]
AND E.[OGS_DOB_CLASS_FW]=D.[DOB_CLASS_FW]  
WHERE (D.MONTH_FW + D.YEAR_FW) IN (42011)
AND (E.MONTH_FW + E.YEAR_FW) IN (42011)
--WHERE (D.MONTH_FW + D.YEAR_FW) IN (@enddate) 
--AND (E.MONTH_FW + E.YEAR_FW) IN (@enddate) 

GROUP BY 	
E.[OGS_DOB_CLASS_FW],
D.[DOB_CLASS_FW],
D.[D750_FUEL_TYPE],
D.[D750_VEHICLE_TYPE],
D.MONTH_FW,
D.YEAR_FW,
D.VEHICLE_COUNT_FW, 
E.VEHICLE_ID_FW,
E.MODEL_YEAR_FW 
) B 
--
ON A.OGS_DOB_CLASS_FW=B.OGS_DOB_CLASS_FW
AND A.VEHICLE_ID_FW=B.VEHICLE_ID_FW 
AND A.D750_FUEL_TYPE=B.D750_FUEL_TYPE 
AND A.D750_VEHICLE_TYPE=B.D750_VEHICLE_TYPE

Open in new window

dgmg:
I would like to use this attached code because I have to display a table below my matrix on SSRS:
And I thought this was the proper way to check the ranges.
Then I would return kind of a summary table-
One row returned for each class based on month/year.
thanks for the DISTINCT mention;
Im desperate and trying everything to eliminate the dupes.

John

needed.txt
ASKER CERTIFIED SOLUTION
Link to home
membership
This solution is only available to members.
To access this solution, you must be a member of Experts Exchange.
Start Free Trial
dqmg,

I will leave you to it.  I think we are on the same page as to what the potential issue is, but you have explained it much better.  So aside from this post, I will avoid posting anything to clutter discussion.
Good luck!

Regards,
Kevin
thanks for all the help-
with my code, i have a vehicle_id in my history table and so that is why i have the mileage dupes;
how can i eliminate this?
i tried to write a subquery like:
 where vehicle_id in
(select max(vehicle_id) from ..)
but  i couldnt get that part to work either.
Even if i join the previous and current queries on vehicle_id - i still get multiple mileages since they are coming from different month/year combinations.

john
Thank you both for all the help
I think I understand your explanation and my screw-ups-
Ill close the question soon-
I think I can fix The vehicle_id problem if I could modify the query to accept another date range.
Is this possible?
I would only have to modify the "B" portion of the join, meaning the Current data.
Instead of using the input parameters of start and end date - i would need to use:
Current Date and subtract one year as my date range.
Could I just JOIN the same tables again as an alias based on this date?
And then do the subtraction at the outermost level?

Before I do this i want to make sure my totals dont explode again!

thanks again for all the help.
john
Confusing me. Figuring out current date - 1 year is easy enough, giving you one date range.  But your result set as a begin mo/year and and end mo/year giving another date range.  Not sure what dates you want within what dates.  
dgmg-
Its confusing me too- but ill try to explain the best i can-
I accept a start and end date range for the entire report.
I have to take it in month/year format (no day) because of the database ive been given.
there are three separate sections on this one report-
the first has three matrices -
which compare data from the two parameter dates given.
the second section is basically a summary section that I figured I would do in a table.
this includes the annual mileage per year (for vehicle and staying within class) and is compared with the mileage of one year ago.
As you mention, if I could figure it out simply by doing current year -1 that would be great, but i dont see how that would work.
Since im already using previous and current in my query.
If I could somehow Group the date range results, that would be great.
But I dont know how.
And the third part:
Will probably require another subquery, since I need to find the missing mileage values from the date range entered by the user.
Because some of the data has Start Mileage but no end mileage and vice-versa.
So, I thought I would have to write a query such as (NOT IN) or something like this.
Hope I explained this well.

thanks again
john
I think we've wandered too far off the trail and are so deep in the brush that the destination is no longer in sight. Started out trying to eliminate "duplicate" rows, then filtering by date ranges, and now missing values.  

I do have a couple final suggestions.  

1.  Convert your Year/month strings to a single datetime field for comparison purposes.  I suggest writing a simple reusable UDF (probably a single statement using the CONVERT FUNCTION) to accomplish that.  

2.  Isolate your issues in separate questions.  Get the UDF working first and if you need help post another question specific to that.  Then work on your curr data query and prev data query indpendently until each returns the desired results.  You may want to save the historical side to a view.  If needed separate question for that.  And so on.  Divide and Conquer.

I will not object if you request to close this question for 0 points

Thanks for the help-
Youre right; i was trying to avoid posting multiple questions in one issue but the problem seemed difficult to explain.
Ill close this question and open another one concerning the multiple dates with one select query.
thanks again.

John