asked on

Group by one column only

Hi,

I have a query that's using an implicit join. I would like to group the results by JOB TYPE only, listing the associated LOCATION and NETWORK number, and COST sum based on JOB TYPE. In order to get results without error, I have to add c.afenumber and a.wellname to the GROUP BY clause. I only want to group by b.wvtyp, since I'd like to see the duplicates for LOCATION. For example, I'd like to see:

LOCATION NETWORK COST JOB TYPE
11111 net1 30000 jobtype1
11111 net2 50000 jobtype2

There are typically more than one Network numbers associated to a single location, which is why I'd like to see the Location duplicates. If I group by LOCATION, NETWORK, and JOB TYPE, I get the following:

LOCATION NETWORK COST JOB TYPE
11111 net1 30000 jobtype1
22222 net2 50000 jobtype1
44444 net6 20000 jobtype3

It will only show me one NETWORK number for each LOCATION, since the LOCATION doesn't display the duplicates.

I hope I'm making sense, since I'm pretty new to SQL. Here is the code (without the working group by):

SELECT
SUBSTR(a.wellname,1,30) "LOCATION",
SUBSTR(c.afenumber,1,15) "NETWORK",
SUM(e.cost) as COST,
b.wvtyp "JOB TYPE"

FROM
wvwellheader a,
wvjob b,
wvjobafe c,
wvjobreportcostgen e

WHERE
a.idwell = b.idwell
AND b.idwell = c.idwell
AND c.idwell = e.idwell
AND b.idrec = c.idrecparent
AND c.afenumber IS NOT NULL
AND b.DTTMSTART > to_date( '01/JUN/2008', 'DD/MON/YYYY')
AND c.syscreatedate =

( SELECT MAX(f.syscreatedate)
FROM wvjobafe f
WHERE f.idwell = c.idwell)

GROUP BY
b.wvtyp

ORDER BY
a.wellname

I hope I've provided some clarity in regards to what I'm trying to accomplish

Thanks!

imran_fast

Did you try this

SELECT
SUBSTR(a.wellname,1,30) "LOCATION",
SUBSTR(c.afenumber,1,15) "NETWORK",
SUM(e.cost) as COST,
b.wvtyp "JOB TYPE"
 
 
FROM
wvwellheader a,
wvjob b,
wvjobafe c,
wvjobreportcostgen e
 
WHERE
a.idwell = b.idwell
AND b.idwell = c.idwell
AND c.idwell = e.idwell
AND b.idrec = c.idrecparent
AND c.afenumber IS NOT NULL
AND b.DTTMSTART > to_date( '01/JUN/2008', 'DD/MON/YYYY')
AND c.syscreatedate =
 
                                     ( SELECT MAX(f.syscreatedate)
                                       FROM wvjobafe f
                                       WHERE f.idwell = c.idwell)
 
 
GROUP BY
b.wvtyp,
SUBSTR(a.wellname,1,30) ,
SUBSTR(c.afenumber,1,15) ,
 
ORDER BY
a.wellname

Open in new window

Geert G

use SUM() OVER (PARTITION BY ) you don't need a groub by and in the PARTITION BY you specify what you want to sum by ...

SELECT
SUBSTR(a.wellname,1,30) "LOCATION",
SUBSTR(c.afenumber,1,15) "NETWORK",
SUM(e.cost) OVER (PARTITION BY b.wvtyp) as COST,
b.wvtyp "JOB TYPE"
FROM
wvwellheader a,
wvjob b,
wvjobafe c,
wvjobreportcostgen e
WHERE
a.idwell = b.idwell
AND b.idwell = c.idwell
AND c.idwell = e.idwell
AND b.idrec = c.idrecparent
AND c.afenumber IS NOT NULL
AND b.DTTMSTART > to_date( '01/JUN/2008', 'DD/MON/YYYY')
AND c.syscreatedate =
                                     ( SELECT MAX(f.syscreatedate)
                                       FROM wvjobafe f
                                       WHERE f.idwell = c.idwell)
ORDER BY
a.wellname

Open in new window

DrSQL - Scott Anderson

mskitten,
There's either a problem with your example or your explanation. As imran_fast has noted (although there's an extra comma before the Order-by), the grouping of the three columns will give you every unique combination of the three that actually exists in the data and the attendant sum. It will not sum the two LOCATION=11111 as you had in your example (not that the sum you had was a sum either). So, I'm not sure what you were trying to accomplish. You may have to show us the actual SQL and data.

Good luck!

mskitten

ASKER

Hello imran_fast,

When I try to run your script, I get an error at the ORDER BY clause, saying that a.wellname is not a GROUP BY expression.

================================================================================

Hello Geert Gruwez,

When I try your script, something seems to go wrong. I return many duplicate lines of the exact same data:

ROW LOCATION NETWORK COST JOB TYPE
1 CRESTAR 6-18-76        VCCU001307 244733.13 WELL INTERVENTION
2 CRESTAR 6-18-76        VCCU001307 244733.13 WELL INTERVENTION
3 CRESTAR 6-18-76        VCCU001307 244733.13 WELL INTERVENTION
4 CRESTAR 6-18-76        VCCU001307 244733.13 WELL INTERVENTION
5 CRESTAR 6-18-76        VCCU001307 244733.13 WELL INTERVENTION
...........

================================================================================

Hello DrSQL,

What I'd like to accomplish would look like the following:

ROW LOCATION NETWORK COST JOB TYPE
1 CRESTAR 6-18-76        VCCU001307 244733.13 WELL INTERVENTION
2 CRESTAR 6-18-76        VCCW036457 513651.26 DRILLING
3 CRESTAR 6-18-76        NOOWI63242 384733.82 COMPLETIONS
4 CTBK 10-33-65-9 XCCTT013083 244733.13 ABANDONMENT
5 CTBK 10-33-65-9 PLNIT3125348 244733.13 DRILLING
...........

As you can see, in rows 1-3, the same LOCATION is displayed, as well as the unique NETWORK and COST(sum) for each related JOB TYPE that was assigned to that specific LOCATION. Same applies for rows 4-5.

Each LOCATION will have at least one JOB TYPE that has been assigned a unique NETWORK number, meaning that if there is more than one JOB TYPE per LOCATION (which is usually the case), the NETWORK numbers will be different (since two different JOB TYPES can't be assigned the same NETWORK) .

I tried to accomplish this output with the script I posted, however, I don't wan't to group the LOCATION, COST, or NETWORK. I'd only like to GROUP the JOB TYPE and then ORDER BY LOCATION.

In my mind, if I GROUP the JOB TYPE, it should display the corresponding NETWORK, COST total, and LOCATION it's associated with, however, I'm unable to do this because I keep getting a "not a GROUP BY expression" if I don't add the NETWORK and LOCATION to the GROUP BY clause.

If I do add the NETWORK and LOCATION to the GROUP BY clause, I return something like this:

ROW LOCATION NETWORK COST JOB TYPE
1 NOEL A-94-I/93-P 10223536 2238526.44 COMPLETIONS
2 PARKLDNE 1-21-14-26 10218612 1447410.51 ABANDONMENT
3 VIK-KINS 2-15-47-10 10213789 298955.28 WELL INTERVENTION

It only gives me PARTIAL information for what I need. Each LOCATION above has more than one JOB TYPE, but because I had to group the LOCATION for the script to run, it returned unique LOCATION results.

Does this help any?
Thanks so much!

DrSQL - Scott Anderson

mskitten,
I think you misunderstand the meaning of the term "Group BY". It doesn't mean the thing you want to organize by, it means these are the items where I want to see EVERY combination that exists in the data, anything else I only want to see the min, max, sum, or average. So, you DO need to group by all three (as selected) in order to see what you want. The solution is very similar to what imran_fast had, there were just a couple of syntax errors. Here's what you should need:

Good luck!

SELECT
	SUBSTR(a.wellname,1,30) "LOCATION",
	SUBSTR(c.afenumber,1,15) "NETWORK",
	SUM(e.cost) as COST,
	b.wvtyp "JOB TYPE"
  FROM	wvwellheader a,
	wvjob b,
	wvjobafe c,
	wvjobreportcostgen e
WHERE
    a.idwell = b.idwell
AND c.idwell = b.idwell
AND e.idwell = b.idwell
AND c.idrec = b.idrecparent
AND c.afenumber IS NOT NULL
AND b.DTTMSTART > to_date( '01/JUN/2008', 'DD/MON/YYYY')
AND c.syscreatedate =
                                      ( SELECT MAX(f.syscreatedate)
                                       FROM wvjobafe f
                                       WHERE f.idwell = b.idwell) 
GROUP BY
  b.wvtyp,
  SUBSTR(a.wellname,1,30) ,
  SUBSTR(c.afenumber,1,15)
ORDER BY
 SUBSTR(a.wellname,1,30) ,
 SUBSTR(c.afenumber,1,15)
 
I couldn't resist adjusting your join - it's usually best to keep using the same table for the join condition rather than make it a string of a to b to c to e.  The issue you may have been having is that the group by has to be EXACTLY the same as what is selected in most cases.  And the Order by has to be something selected, not solting USED by a function in the select.  If this still isn't what you're looking for - and I understand if you're skeptical - we can try using analytical windowing functions, but I'm going to need some sample data.

Open in new window

SOLUTION

Geert G

membership

This solution is only available to members.

To access this solution, you must be a member of Experts Exchange.

Start Free Trial

Geert G

this was the first step, you need to verify if this dataset is correct
off course this is without the SUM

mskitten

ASKER

Hello Geert,
I apologize for my delayed response. I was away.

I tried your script, but changed the Joins around a bit. I think the Joins were where most of my data issues came from.

Using the script I've pasted below, I have been able to generate correct results (but only half). For example;

LOCATION NETWORK COST JOB TYPE
AMOCO CESS 14-11MU VCCU001226 56464.1 WELL INTERVENTION

56464.1 is the SUM of the particular WELL INTERVENTION JOB that was assigned the VCCU001226 NETWORK number for the AMOCO LOCATION (I calculated the sum manually, of course).

The costs that were returned for this job are correct, which is really good! However, there are 3 other JOB TYPES associated to the VCCU001226 NETWORK number. It seems like the script is only pulling the last JOB from that site.

What I'd like to pull, is the COST sum for each JOB TYPE that is associated with the same NETWORK number. In somecases there are multiple JOBs of the same TYPE, AND use the same NETWORK. What I'd like to achieve is to have the script display the sum for the TYPE in general, rather than the last JOB entered. I'm thinking that this may also be a JOIN issue.

SELECT
a.wellname,
SUBSTR(c.afenumber,1,15) NETWORK,
e.cost COST,
b.wvtyp JOB

FROM wvwellheader a
JOIN wvjob b ON a.idwell = b.idwell
JOIN wvjobafe c ON b.idrec = c.idrecparent
JOIN wvjobreport d ON b.idrec = d.idrecparent
JOIN wvjobreportcostgen e ON d.idrec = e.idrecparent

JOIN (SELECT idwell, MAX(syscreatedate) maxsyscreatedate FROM wvjobafe GROUP BY idwell) f
ON f.idwell = c.idwell AND c.syscreatedate = f.maxsyscreatedate
WHERE c.afenumber IS NOT NULL
AND b.DTTMSTART > to_date( '01/JUN/2008', 'DD/MON/YYYY')
ORDER BY a.wellname

This is what the hierarchy looks like if it will help.

a.wellname (LOCATION)
idwell
|
------- b.wvjob (JOB)
idwell
idrec
|
---------------- c.wvjobafe (NETWORK)
idwell
idrec
idrecparent
|
-----------------d. wvjobreport (DAILY ENTRY - Just used for Join)
idwell
idrec
idrecparent
|
------------------------- e.wvjobreportcostgen ($$)
idwell
idrec
idrecparent

mskitten

ASKER

So the idrecparent is the child of idrec. Sounds backwards, I know.

mskitten

ASKER

DrSQL - Scott Anderson

Here's why it is only getting the most recent JOB: "c.syscreatedate = f.maxsyscreatedate" You didn't mention the "f" table, what is its purpose here? It should either be "c.syscreatedate <= f.maxsyscreatedate" or it should be removed.

Good luck!

mskitten

ASKER

Sorry about that. I removed that whole section completely, so now it's just

SELECT
a.wellname,
SUBSTR(c.afenumber,1,15) NETWORK,
e.cost COST,
b.wvtyp JOB

FROM wvwellheader a
JOIN wvjob b ON a.idwell = b.idwell
JOIN wvjobafe c ON b.idrec = c.idrecparent
JOIN wvjobreport d ON b.idrec = d.idrecparent
JOIN wvjobreportcostgen e ON d.idrec = e.idrecparent

WHERE c.afenumber IS NOT NULL
AND b.DTTMSTART > to_date( '01/JUN/2008', 'DD/MON/YYYY')
ORDER BY a.wellname

I'm still only retrieving the most recent JOB . I really do apologize for my lack of understandig here, and just wanted to let you know that I do really appreciate all the help!

ASKER CERTIFIED SOLUTION

DrSQL - Scott Anderson

membership

This solution is only available to members.

To access this solution, you must be a member of Experts Exchange.

Start Free Trial

mskitten

ASKER

I think this is it!! I changed a few things around and now it seems to be showing what I want! Here is the sript:

SELECT
a.wellname,
SUBSTR(c.afenumber,1,15) NETWORK,
SUM(e.cost) COST,
b.wvtyp JOB

FROM wvwellheader a
JOIN wvjob b ON a.idwell = b.idwell
JOIN wvjobafe c ON b.idrec = c.idrecparent
JOIN wvjobreport d ON b.idrec = d.idrecparent
JOIN wvjobreportcostgen e ON d.idrec = e.idrecparent

WHERE c.afenumber IS NOT NULL
AND b.DTTMSTART > to_date( '01/MAY/2007', 'DD/MON/YYYY')

GROUP BY a.wellname, c.afenumber, b.wvtyp
ORDER BY a.wellname

Thank you both very much for all of your help!!! :D

mskitten

ASKER

125 points each for Geert and DrSQL