mskitten
asked on
Group by one column only
Hi,
I have a query that's using an implicit join. I would like to group the results by JOB TYPE only, listing the associated LOCATION and NETWORK number, and COST sum based on JOB TYPE. In order to get results without error, I have to add c.afenumber and a.wellname to the GROUP BY clause. I only want to group by b.wvtyp, since I'd like to see the duplicates for LOCATION. For example, I'd like to see:
LOCATION NETWORK COST JOB TYPE
11111 net1 30000 jobtype1
11111 net2 50000 jobtype2
There are typically more than one Network numbers associated to a single location, which is why I'd like to see the Location duplicates. If I group by LOCATION, NETWORK, and JOB TYPE, I get the following:
LOCATION NETWORK COST JOB TYPE
11111 net1 30000 jobtype1
22222 net2 50000 jobtype1
44444 net6 20000 jobtype3
It will only show me one NETWORK number for each LOCATION, since the LOCATION doesn't display the duplicates.
I hope I'm making sense, since I'm pretty new to SQL. Here is the code (without the working group by):
SELECT
SUBSTR(a.wellname,1,30) "LOCATION",
SUBSTR(c.afenumber,1,15) "NETWORK",
SUM(e.cost) as COST,
b.wvtyp "JOB TYPE"
FROM
wvwellheader a,
wvjob b,
wvjobafe c,
wvjobreportcostgen e
WHERE
a.idwell = b.idwell
AND b.idwell = c.idwell
AND c.idwell = e.idwell
AND b.idrec = c.idrecparent
AND c.afenumber IS NOT NULL
AND b.DTTMSTART > to_date( '01/JUN/2008', 'DD/MON/YYYY')
AND c.syscreatedate =
( SELECT MAX(f.syscreatedate)
FROM wvjobafe f
WHERE f.idwell = c.idwell)
GROUP BY
b.wvtyp
ORDER BY
a.wellname
I hope I've provided some clarity in regards to what I'm trying to accomplish
Thanks!
I have a query that's using an implicit join. I would like to group the results by JOB TYPE only, listing the associated LOCATION and NETWORK number, and COST sum based on JOB TYPE. In order to get results without error, I have to add c.afenumber and a.wellname to the GROUP BY clause. I only want to group by b.wvtyp, since I'd like to see the duplicates for LOCATION. For example, I'd like to see:
LOCATION NETWORK COST JOB TYPE
11111 net1 30000 jobtype1
11111 net2 50000 jobtype2
There are typically more than one Network numbers associated to a single location, which is why I'd like to see the Location duplicates. If I group by LOCATION, NETWORK, and JOB TYPE, I get the following:
LOCATION NETWORK COST JOB TYPE
11111 net1 30000 jobtype1
22222 net2 50000 jobtype1
44444 net6 20000 jobtype3
It will only show me one NETWORK number for each LOCATION, since the LOCATION doesn't display the duplicates.
I hope I'm making sense, since I'm pretty new to SQL. Here is the code (without the working group by):
SELECT
SUBSTR(a.wellname,1,30) "LOCATION",
SUBSTR(c.afenumber,1,15) "NETWORK",
SUM(e.cost) as COST,
b.wvtyp "JOB TYPE"
FROM
wvwellheader a,
wvjob b,
wvjobafe c,
wvjobreportcostgen e
WHERE
a.idwell = b.idwell
AND b.idwell = c.idwell
AND c.idwell = e.idwell
AND b.idrec = c.idrecparent
AND c.afenumber IS NOT NULL
AND b.DTTMSTART > to_date( '01/JUN/2008', 'DD/MON/YYYY')
AND c.syscreatedate =
( SELECT MAX(f.syscreatedate)
FROM wvjobafe f
WHERE f.idwell = c.idwell)
GROUP BY
b.wvtyp
ORDER BY
a.wellname
I hope I've provided some clarity in regards to what I'm trying to accomplish
Thanks!
use SUM() OVER (PARTITION BY ) you don't need a groub by and in the PARTITION BY you specify what you want to sum by ...
SELECT
SUBSTR(a.wellname,1,30) "LOCATION",
SUBSTR(c.afenumber,1,15) "NETWORK",
SUM(e.cost) OVER (PARTITION BY b.wvtyp) as COST,
b.wvtyp "JOB TYPE"
FROM
wvwellheader a,
wvjob b,
wvjobafe c,
wvjobreportcostgen e
WHERE
a.idwell = b.idwell
AND b.idwell = c.idwell
AND c.idwell = e.idwell
AND b.idrec = c.idrecparent
AND c.afenumber IS NOT NULL
AND b.DTTMSTART > to_date( '01/JUN/2008', 'DD/MON/YYYY')
AND c.syscreatedate =
( SELECT MAX(f.syscreatedate)
FROM wvjobafe f
WHERE f.idwell = c.idwell)
ORDER BY
a.wellname
mskitten,
There's either a problem with your example or your explanation. As imran_fast has noted (although there's an extra comma before the Order-by), the grouping of the three columns will give you every unique combination of the three that actually exists in the data and the attendant sum. It will not sum the two LOCATION=11111 as you had in your example (not that the sum you had was a sum either). So, I'm not sure what you were trying to accomplish. You may have to show us the actual SQL and data.
Good luck!
There's either a problem with your example or your explanation. As imran_fast has noted (although there's an extra comma before the Order-by), the grouping of the three columns will give you every unique combination of the three that actually exists in the data and the attendant sum. It will not sum the two LOCATION=11111 as you had in your example (not that the sum you had was a sum either). So, I'm not sure what you were trying to accomplish. You may have to show us the actual SQL and data.
Good luck!
ASKER
Hello imran_fast,
When I try to run your script, I get an error at the ORDER BY clause, saying that a.wellname is not a GROUP BY expression.
========================== ========== ========== ========== ========== ========== ====
Hello Geert Gruwez,
When I try your script, something seems to go wrong. I return many duplicate lines of the exact same data:
ROW LOCATION NETWORK COST JOB TYPE
1 CRESTAR 6-18-76 VCCU001307 244733.13 WELL INTERVENTION
2 CRESTAR 6-18-76 VCCU001307 244733.13 WELL INTERVENTION
3 CRESTAR 6-18-76 VCCU001307 244733.13 WELL INTERVENTION
4 CRESTAR 6-18-76 VCCU001307 244733.13 WELL INTERVENTION
5 CRESTAR 6-18-76 VCCU001307 244733.13 WELL INTERVENTION
...........
========================== ========== ========== ========== ========== ========== ====
Hello DrSQL,
What I'd like to accomplish would look like the following:
ROW LOCATION NETWORK COST JOB TYPE
1 CRESTAR 6-18-76 VCCU001307 244733.13 WELL INTERVENTION
2 CRESTAR 6-18-76 VCCW036457 513651.26 DRILLING
3 CRESTAR 6-18-76 NOOWI63242 384733.82 COMPLETIONS
4 CTBK 10-33-65-9 XCCTT013083 244733.13 ABANDONMENT
5 CTBK 10-33-65-9 PLNIT3125348 244733.13 DRILLING
...........
As you can see, in rows 1-3, the same LOCATION is displayed, as well as the unique NETWORK and COST(sum) for each related JOB TYPE that was assigned to that specific LOCATION. Same applies for rows 4-5.
Each LOCATION will have at least one JOB TYPE that has been assigned a unique NETWORK number, meaning that if there is more than one JOB TYPE per LOCATION (which is usually the case), the NETWORK numbers will be different (since two different JOB TYPES can't be assigned the same NETWORK) .
I tried to accomplish this output with the script I posted, however, I don't wan't to group the LOCATION, COST, or NETWORK. I'd only like to GROUP the JOB TYPE and then ORDER BY LOCATION.
In my mind, if I GROUP the JOB TYPE, it should display the corresponding NETWORK, COST total, and LOCATION it's associated with, however, I'm unable to do this because I keep getting a "not a GROUP BY expression" if I don't add the NETWORK and LOCATION to the GROUP BY clause.
If I do add the NETWORK and LOCATION to the GROUP BY clause, I return something like this:
ROW LOCATION NETWORK COST JOB TYPE
1 NOEL A-94-I/93-P 10223536 2238526.44 COMPLETIONS
2 PARKLDNE 1-21-14-26 10218612 1447410.51 ABANDONMENT
3 VIK-KINS 2-15-47-10 10213789 298955.28 WELL INTERVENTION
It only gives me PARTIAL information for what I need. Each LOCATION above has more than one JOB TYPE, but because I had to group the LOCATION for the script to run, it returned unique LOCATION results.
Does this help any?
Thanks so much!
When I try to run your script, I get an error at the ORDER BY clause, saying that a.wellname is not a GROUP BY expression.
==========================
Hello Geert Gruwez,
When I try your script, something seems to go wrong. I return many duplicate lines of the exact same data:
ROW LOCATION NETWORK COST JOB TYPE
1 CRESTAR 6-18-76 VCCU001307 244733.13 WELL INTERVENTION
2 CRESTAR 6-18-76 VCCU001307 244733.13 WELL INTERVENTION
3 CRESTAR 6-18-76 VCCU001307 244733.13 WELL INTERVENTION
4 CRESTAR 6-18-76 VCCU001307 244733.13 WELL INTERVENTION
5 CRESTAR 6-18-76 VCCU001307 244733.13 WELL INTERVENTION
...........
==========================
Hello DrSQL,
What I'd like to accomplish would look like the following:
ROW LOCATION NETWORK COST JOB TYPE
1 CRESTAR 6-18-76 VCCU001307 244733.13 WELL INTERVENTION
2 CRESTAR 6-18-76 VCCW036457 513651.26 DRILLING
3 CRESTAR 6-18-76 NOOWI63242 384733.82 COMPLETIONS
4 CTBK 10-33-65-9 XCCTT013083 244733.13 ABANDONMENT
5 CTBK 10-33-65-9 PLNIT3125348 244733.13 DRILLING
...........
As you can see, in rows 1-3, the same LOCATION is displayed, as well as the unique NETWORK and COST(sum) for each related JOB TYPE that was assigned to that specific LOCATION. Same applies for rows 4-5.
Each LOCATION will have at least one JOB TYPE that has been assigned a unique NETWORK number, meaning that if there is more than one JOB TYPE per LOCATION (which is usually the case), the NETWORK numbers will be different (since two different JOB TYPES can't be assigned the same NETWORK) .
I tried to accomplish this output with the script I posted, however, I don't wan't to group the LOCATION, COST, or NETWORK. I'd only like to GROUP the JOB TYPE and then ORDER BY LOCATION.
In my mind, if I GROUP the JOB TYPE, it should display the corresponding NETWORK, COST total, and LOCATION it's associated with, however, I'm unable to do this because I keep getting a "not a GROUP BY expression" if I don't add the NETWORK and LOCATION to the GROUP BY clause.
If I do add the NETWORK and LOCATION to the GROUP BY clause, I return something like this:
ROW LOCATION NETWORK COST JOB TYPE
1 NOEL A-94-I/93-P 10223536 2238526.44 COMPLETIONS
2 PARKLDNE 1-21-14-26 10218612 1447410.51 ABANDONMENT
3 VIK-KINS 2-15-47-10 10213789 298955.28 WELL INTERVENTION
It only gives me PARTIAL information for what I need. Each LOCATION above has more than one JOB TYPE, but because I had to group the LOCATION for the script to run, it returned unique LOCATION results.
Does this help any?
Thanks so much!
mskitten,
I think you misunderstand the meaning of the term "Group BY". It doesn't mean the thing you want to organize by, it means these are the items where I want to see EVERY combination that exists in the data, anything else I only want to see the min, max, sum, or average. So, you DO need to group by all three (as selected) in order to see what you want. The solution is very similar to what imran_fast had, there were just a couple of syntax errors. Here's what you should need:
Good luck!
I think you misunderstand the meaning of the term "Group BY". It doesn't mean the thing you want to organize by, it means these are the items where I want to see EVERY combination that exists in the data, anything else I only want to see the min, max, sum, or average. So, you DO need to group by all three (as selected) in order to see what you want. The solution is very similar to what imran_fast had, there were just a couple of syntax errors. Here's what you should need:
Good luck!
SELECT
SUBSTR(a.wellname,1,30) "LOCATION",
SUBSTR(c.afenumber,1,15) "NETWORK",
SUM(e.cost) as COST,
b.wvtyp "JOB TYPE"
FROM wvwellheader a,
wvjob b,
wvjobafe c,
wvjobreportcostgen e
WHERE
a.idwell = b.idwell
AND c.idwell = b.idwell
AND e.idwell = b.idwell
AND c.idrec = b.idrecparent
AND c.afenumber IS NOT NULL
AND b.DTTMSTART > to_date( '01/JUN/2008', 'DD/MON/YYYY')
AND c.syscreatedate =
( SELECT MAX(f.syscreatedate)
FROM wvjobafe f
WHERE f.idwell = b.idwell)
GROUP BY
b.wvtyp,
SUBSTR(a.wellname,1,30) ,
SUBSTR(c.afenumber,1,15)
ORDER BY
SUBSTR(a.wellname,1,30) ,
SUBSTR(c.afenumber,1,15)
I couldn't resist adjusting your join - it's usually best to keep using the same table for the join condition rather than make it a string of a to b to c to e. The issue you may have been having is that the group by has to be EXACTLY the same as what is selected in most cases. And the Order by has to be something selected, not solting USED by a function in the select. If this still isn't what you're looking for - and I understand if you're skeptical - we can try using analytical windowing functions, but I'm going to need some sample data.
SOLUTION
membership
This solution is only available to members.
To access this solution, you must be a member of Experts Exchange.
this was the first step, you need to verify if this dataset is correct
off course this is without the SUM
off course this is without the SUM
ASKER
Hello Geert,
I apologize for my delayed response. I was away.
I tried your script, but changed the Joins around a bit. I think the Joins were where most of my data issues came from.
Using the script I've pasted below, I have been able to generate correct results (but only half). For example;
LOCATION NETWORK COST JOB TYPE
AMOCO CESS 14-11MU VCCU001226 56464.1 WELL INTERVENTION
56464.1 is the SUM of the particular WELL INTERVENTION JOB that was assigned the VCCU001226 NETWORK number for the AMOCO LOCATION (I calculated the sum manually, of course).
The costs that were returned for this job are correct, which is really good! However, there are 3 other JOB TYPES associated to the VCCU001226 NETWORK number. It seems like the script is only pulling the last JOB from that site.
What I'd like to pull, is the COST sum for each JOB TYPE that is associated with the same NETWORK number. In somecases there are multiple JOBs of the same TYPE, AND use the same NETWORK. What I'd like to achieve is to have the script display the sum for the TYPE in general, rather than the last JOB entered. I'm thinking that this may also be a JOIN issue.
SELECT
a.wellname,
SUBSTR(c.afenumber,1,15) NETWORK,
e.cost COST,
b.wvtyp JOB
FROM wvwellheader a
JOIN wvjob b ON a.idwell = b.idwell
JOIN wvjobafe c ON b.idrec = c.idrecparent
JOIN wvjobreport d ON b.idrec = d.idrecparent
JOIN wvjobreportcostgen e ON d.idrec = e.idrecparent
JOIN (SELECT idwell, MAX(syscreatedate) maxsyscreatedate FROM wvjobafe GROUP BY idwell) f
ON f.idwell = c.idwell AND c.syscreatedate = f.maxsyscreatedate
WHERE c.afenumber IS NOT NULL
AND b.DTTMSTART > to_date( '01/JUN/2008', 'DD/MON/YYYY')
ORDER BY a.wellname
This is what the hierarchy looks like if it will help.
a.wellname (LOCATION)
idwell
|
------- b.wvjob (JOB)
idwell
idrec
|
---------------- c.wvjobafe (NETWORK)
idwell
idrec
idrecparent
|
-----------------d. wvjobreport (DAILY ENTRY - Just used for Join)
idwell
idrec
idrecparent
|
------------------------- e.wvjobreportcostgen ($$)
idwell
idrec
idrecparent
I apologize for my delayed response. I was away.
I tried your script, but changed the Joins around a bit. I think the Joins were where most of my data issues came from.
Using the script I've pasted below, I have been able to generate correct results (but only half). For example;
LOCATION NETWORK COST JOB TYPE
AMOCO CESS 14-11MU VCCU001226 56464.1 WELL INTERVENTION
56464.1 is the SUM of the particular WELL INTERVENTION JOB that was assigned the VCCU001226 NETWORK number for the AMOCO LOCATION (I calculated the sum manually, of course).
The costs that were returned for this job are correct, which is really good! However, there are 3 other JOB TYPES associated to the VCCU001226 NETWORK number. It seems like the script is only pulling the last JOB from that site.
What I'd like to pull, is the COST sum for each JOB TYPE that is associated with the same NETWORK number. In somecases there are multiple JOBs of the same TYPE, AND use the same NETWORK. What I'd like to achieve is to have the script display the sum for the TYPE in general, rather than the last JOB entered. I'm thinking that this may also be a JOIN issue.
SELECT
a.wellname,
SUBSTR(c.afenumber,1,15) NETWORK,
e.cost COST,
b.wvtyp JOB
FROM wvwellheader a
JOIN wvjob b ON a.idwell = b.idwell
JOIN wvjobafe c ON b.idrec = c.idrecparent
JOIN wvjobreport d ON b.idrec = d.idrecparent
JOIN wvjobreportcostgen e ON d.idrec = e.idrecparent
JOIN (SELECT idwell, MAX(syscreatedate) maxsyscreatedate FROM wvjobafe GROUP BY idwell) f
ON f.idwell = c.idwell AND c.syscreatedate = f.maxsyscreatedate
WHERE c.afenumber IS NOT NULL
AND b.DTTMSTART > to_date( '01/JUN/2008', 'DD/MON/YYYY')
ORDER BY a.wellname
This is what the hierarchy looks like if it will help.
a.wellname (LOCATION)
idwell
|
------- b.wvjob (JOB)
idwell
idrec
|
---------------- c.wvjobafe (NETWORK)
idwell
idrec
idrecparent
|
-----------------d. wvjobreport (DAILY ENTRY - Just used for Join)
idwell
idrec
idrecparent
|
------------------------- e.wvjobreportcostgen ($$)
idwell
idrec
idrecparent
ASKER
So the idrecparent is the child of idrec. Sounds backwards, I know.
ASKER
Sorry, this is how it should look!
a.wvwellheader(LOCATION)
idwell
|
------- b.wvjob (JOB)
idwell
idrec
|
---------------- c.wvjobafe (NETWORK)
| idwell
| idrec
| idrecparent
|
-----------------d. wvjobreport (DAILY ENTRY - Just used for Join)
idwell
idrec
idrecparent
|
------------------------- e.wvjobreportcostgen ($$)
idwell
idrec
idrecparent
a.wvwellheader(LOCATION)
idwell
|
------- b.wvjob (JOB)
idwell
idrec
|
---------------- c.wvjobafe (NETWORK)
| idwell
| idrec
| idrecparent
|
-----------------d. wvjobreport (DAILY ENTRY - Just used for Join)
idwell
idrec
idrecparent
|
------------------------- e.wvjobreportcostgen ($$)
idwell
idrec
idrecparent
Here's why it is only getting the most recent JOB: "c.syscreatedate = f.maxsyscreatedate" You didn't mention the "f" table, what is its purpose here? It should either be "c.syscreatedate <= f.maxsyscreatedate" or it should be removed.
Good luck!
Good luck!
ASKER
Sorry about that. I removed that whole section completely, so now it's just
SELECT
a.wellname,
SUBSTR(c.afenumber,1,15) NETWORK,
e.cost COST,
b.wvtyp JOB
FROM wvwellheader a
JOIN wvjob b ON a.idwell = b.idwell
JOIN wvjobafe c ON b.idrec = c.idrecparent
JOIN wvjobreport d ON b.idrec = d.idrecparent
JOIN wvjobreportcostgen e ON d.idrec = e.idrecparent
WHERE c.afenumber IS NOT NULL
AND b.DTTMSTART > to_date( '01/JUN/2008', 'DD/MON/YYYY')
ORDER BY a.wellname
I'm still only retrieving the most recent JOB . I really do apologize for my lack of understandig here, and just wanted to let you know that I do really appreciate all the help!
SELECT
a.wellname,
SUBSTR(c.afenumber,1,15) NETWORK,
e.cost COST,
b.wvtyp JOB
FROM wvwellheader a
JOIN wvjob b ON a.idwell = b.idwell
JOIN wvjobafe c ON b.idrec = c.idrecparent
JOIN wvjobreport d ON b.idrec = d.idrecparent
JOIN wvjobreportcostgen e ON d.idrec = e.idrecparent
WHERE c.afenumber IS NOT NULL
AND b.DTTMSTART > to_date( '01/JUN/2008', 'DD/MON/YYYY')
ORDER BY a.wellname
I'm still only retrieving the most recent JOB . I really do apologize for my lack of understandig here, and just wanted to let you know that I do really appreciate all the help!
ASKER CERTIFIED SOLUTION
membership
This solution is only available to members.
To access this solution, you must be a member of Experts Exchange.
ASKER
I think this is it!! I changed a few things around and now it seems to be showing what I want! Here is the sript:
SELECT
a.wellname,
SUBSTR(c.afenumber,1,15) NETWORK,
SUM(e.cost) COST,
b.wvtyp JOB
FROM wvwellheader a
JOIN wvjob b ON a.idwell = b.idwell
JOIN wvjobafe c ON b.idrec = c.idrecparent
JOIN wvjobreport d ON b.idrec = d.idrecparent
JOIN wvjobreportcostgen e ON d.idrec = e.idrecparent
WHERE c.afenumber IS NOT NULL
AND b.DTTMSTART > to_date( '01/MAY/2007', 'DD/MON/YYYY')
GROUP BY a.wellname, c.afenumber, b.wvtyp
ORDER BY a.wellname
Thank you both very much for all of your help!!! :D
SELECT
a.wellname,
SUBSTR(c.afenumber,1,15) NETWORK,
SUM(e.cost) COST,
b.wvtyp JOB
FROM wvwellheader a
JOIN wvjob b ON a.idwell = b.idwell
JOIN wvjobafe c ON b.idrec = c.idrecparent
JOIN wvjobreport d ON b.idrec = d.idrecparent
JOIN wvjobreportcostgen e ON d.idrec = e.idrecparent
WHERE c.afenumber IS NOT NULL
AND b.DTTMSTART > to_date( '01/MAY/2007', 'DD/MON/YYYY')
GROUP BY a.wellname, c.afenumber, b.wvtyp
ORDER BY a.wellname
Thank you both very much for all of your help!!! :D
ASKER
125 points each for Geert and DrSQL
Open in new window