Link to home
Start Free TrialLog in
Avatar of mskitten
mskitten

asked on

Group by one column only

Hi,

I have a query that's using an implicit join. I would like to group the results by JOB TYPE only, listing the associated LOCATION and NETWORK number, and COST sum based on JOB TYPE. In order to get results without error, I have to add c.afenumber and a.wellname to the GROUP BY clause. I only want to group by b.wvtyp, since I'd like to see the duplicates for LOCATION. For example, I'd like to see:

LOCATION            NETWORK          COST             JOB TYPE
11111                       net1                30000                 jobtype1
11111                       net2                50000                 jobtype2

There are typically more than one Network numbers associated to a single location, which is why I'd like to see the Location duplicates. If I group by LOCATION, NETWORK, and JOB TYPE, I get the following:

LOCATION            NETWORK          COST             JOB TYPE
11111                       net1                30000                 jobtype1
22222                       net2                50000                 jobtype1
44444                       net6                20000                 jobtype3

It will only show me one NETWORK number for each LOCATION, since the LOCATION doesn't display the duplicates.

I hope I'm making sense, since I'm pretty new to SQL. Here is the code (without the working group by):

SELECT
SUBSTR(a.wellname,1,30) "LOCATION",
SUBSTR(c.afenumber,1,15) "NETWORK",
SUM(e.cost) as COST,
b.wvtyp "JOB TYPE"


FROM
wvwellheader a,
wvjob b,
wvjobafe c,
wvjobreportcostgen e

WHERE
a.idwell = b.idwell
AND b.idwell = c.idwell
AND c.idwell = e.idwell
AND b.idrec = c.idrecparent
AND c.afenumber IS NOT NULL
AND b.DTTMSTART > to_date( '01/JUN/2008', 'DD/MON/YYYY')
AND c.syscreatedate =

                                     ( SELECT MAX(f.syscreatedate)
                                       FROM wvjobafe f
                                       WHERE f.idwell = c.idwell)


GROUP BY
b.wvtyp

ORDER BY
a.wellname

I hope I've provided some clarity in regards to what I'm trying to accomplish

Thanks!
Avatar of imran_fast
imran_fast

Did you try this
SELECT
SUBSTR(a.wellname,1,30) "LOCATION",
SUBSTR(c.afenumber,1,15) "NETWORK",
SUM(e.cost) as COST,
b.wvtyp "JOB TYPE"
 
 
FROM
wvwellheader a,
wvjob b,
wvjobafe c,
wvjobreportcostgen e
 
WHERE
a.idwell = b.idwell
AND b.idwell = c.idwell
AND c.idwell = e.idwell
AND b.idrec = c.idrecparent
AND c.afenumber IS NOT NULL
AND b.DTTMSTART > to_date( '01/JUN/2008', 'DD/MON/YYYY')
AND c.syscreatedate =
 
                                     ( SELECT MAX(f.syscreatedate)
                                       FROM wvjobafe f
                                       WHERE f.idwell = c.idwell)
 
 
GROUP BY
b.wvtyp,
SUBSTR(a.wellname,1,30) ,
SUBSTR(c.afenumber,1,15) ,
 
ORDER BY
a.wellname

Open in new window

Avatar of Geert G
use SUM() OVER (PARTITION BY ) you don't need a groub by and in the PARTITION BY you specify what you want to sum by ...

SELECT
SUBSTR(a.wellname,1,30) "LOCATION",
SUBSTR(c.afenumber,1,15) "NETWORK",
SUM(e.cost) OVER (PARTITION BY b.wvtyp) as COST,
b.wvtyp "JOB TYPE"
FROM
wvwellheader a,
wvjob b,
wvjobafe c,
wvjobreportcostgen e
WHERE
a.idwell = b.idwell
AND b.idwell = c.idwell
AND c.idwell = e.idwell
AND b.idrec = c.idrecparent
AND c.afenumber IS NOT NULL
AND b.DTTMSTART > to_date( '01/JUN/2008', 'DD/MON/YYYY')
AND c.syscreatedate =
                                     ( SELECT MAX(f.syscreatedate)
                                       FROM wvjobafe f
                                       WHERE f.idwell = c.idwell)
ORDER BY
a.wellname

Open in new window

mskitten,
    There's either a problem with your example or your explanation.  As imran_fast has noted (although there's an extra comma before the Order-by), the grouping of the three columns will give you every unique combination of the three that actually exists in the data and the attendant sum.  It will not sum the two LOCATION=11111 as you had in your example (not that the sum you had was a sum either).  So, I'm not sure what you were trying to accomplish.  You may have to show us the actual SQL and data.

Good luck!
Avatar of mskitten

ASKER

Hello imran_fast,

When I try to run your script, I get an error at the ORDER BY clause, saying that a.wellname is not a GROUP BY expression.

================================================================================

Hello Geert Gruwez,

When I try your script, something seems to go wrong. I return many duplicate lines of the exact same data:

ROW    LOCATION                         NETWORK                   COST             JOB TYPE
1          CRESTAR 6-18-76                 VCCU001307              244733.13      WELL INTERVENTION
2          CRESTAR 6-18-76                 VCCU001307              244733.13      WELL INTERVENTION
3          CRESTAR 6-18-76                 VCCU001307              244733.13      WELL INTERVENTION
4          CRESTAR 6-18-76                 VCCU001307              244733.13      WELL INTERVENTION
5          CRESTAR 6-18-76                 VCCU001307              244733.13      WELL INTERVENTION
...........

================================================================================

Hello DrSQL,

What I'd like to accomplish would look like the following:

ROW    LOCATION                         NETWORK                   COST             JOB TYPE
1          CRESTAR 6-18-76                 VCCU001307              244733.13      WELL INTERVENTION
2          CRESTAR 6-18-76                 VCCW036457             513651.26      DRILLING
3          CRESTAR 6-18-76                  NOOWI63242             384733.82      COMPLETIONS
4          CTBK 10-33-65-9              XCCTT013083            244733.13      ABANDONMENT
5          CTBK 10-33-65-9              PLNIT3125348            244733.13      DRILLING
...........

As you can see, in rows 1-3, the same LOCATION is displayed, as well as the unique NETWORK and COST(sum) for each related JOB TYPE that was assigned to that specific LOCATION. Same applies for rows 4-5.  

Each LOCATION will have at least one JOB TYPE that has been assigned a unique NETWORK number, meaning that if there is more than one JOB TYPE per LOCATION (which is usually the case), the NETWORK numbers will be different (since two different JOB TYPES can't be assigned the same NETWORK) .

I tried to accomplish this output with the script I posted, however, I don't wan't to group the LOCATION, COST, or NETWORK. I'd only like to GROUP the JOB TYPE and then ORDER BY LOCATION.

In my mind, if I GROUP the JOB TYPE, it should display the corresponding NETWORK, COST total, and LOCATION it's associated with, however, I'm unable to do this because I keep getting a "not a GROUP BY expression" if I don't add the NETWORK and LOCATION to the GROUP BY clause.

If I do add the NETWORK and LOCATION to the GROUP BY clause, I return something like this:

ROW      LOCATION                                     NETWORK      COST                   JOB TYPE
1            NOEL A-94-I/93-P                          10223536       2238526.44         COMPLETIONS
2            PARKLDNE 1-21-14-26                 10218612       1447410.51         ABANDONMENT
3            VIK-KINS 2-15-47-10                     10213789       298955.28           WELL INTERVENTION

It only gives me PARTIAL information for what I need.  Each LOCATION above has more than one JOB TYPE, but because I had to group the LOCATION for the script to run, it returned unique LOCATION results.

Does this help any?  
Thanks so much!
mskitten,
    I think you misunderstand the meaning of the term "Group BY".  It doesn't mean the thing you want to organize by, it means these are the items where I want to see EVERY combination that exists in the data, anything else I only want to see the min, max, sum, or average.  So, you DO need to group by all three (as selected) in order to see what you want.  The solution is very similar to what imran_fast had, there were just a couple of syntax errors.  Here's what you should need:

Good luck!

SELECT
	SUBSTR(a.wellname,1,30) "LOCATION",
	SUBSTR(c.afenumber,1,15) "NETWORK",
	SUM(e.cost) as COST,
	b.wvtyp "JOB TYPE"
  FROM	wvwellheader a,
	wvjob b,
	wvjobafe c,
	wvjobreportcostgen e
WHERE
    a.idwell = b.idwell
AND c.idwell = b.idwell
AND e.idwell = b.idwell
AND c.idrec = b.idrecparent
AND c.afenumber IS NOT NULL
AND b.DTTMSTART > to_date( '01/JUN/2008', 'DD/MON/YYYY')
AND c.syscreatedate =
                                      ( SELECT MAX(f.syscreatedate)
                                       FROM wvjobafe f
                                       WHERE f.idwell = b.idwell) 
GROUP BY
  b.wvtyp,
  SUBSTR(a.wellname,1,30) ,
  SUBSTR(c.afenumber,1,15)
ORDER BY
 SUBSTR(a.wellname,1,30) ,
 SUBSTR(c.afenumber,1,15)
 
I couldn't resist adjusting your join - it's usually best to keep using the same table for the join condition rather than make it a string of a to b to c to e.  The issue you may have been having is that the group by has to be EXACTLY the same as what is selected in most cases.  And the Order by has to be something selected, not solting USED by a function in the select.  If this still isn't what you're looking for - and I understand if you're skeptical - we can try using analytical windowing functions, but I'm going to need some sample data.

Open in new window

SOLUTION
Avatar of Geert G
Geert G
Flag of Belgium image

Link to home
membership
This solution is only available to members.
To access this solution, you must be a member of Experts Exchange.
Start Free Trial
this was the first step, you need to verify if this dataset is correct
off course this is without the SUM
Hello Geert,
I apologize for my delayed response. I was away.

I tried your script, but changed the Joins around a bit. I think the Joins were where most of my data issues came from.

Using the script I've pasted below, I have been able to generate correct  results (but only half).  For example;

LOCATION                          NETWORK                        COST            JOB TYPE
AMOCO CESS 14-11MU      VCCU001226                    56464.1         WELL INTERVENTION

56464.1  is the SUM of the particular WELL INTERVENTION JOB that was assigned the VCCU001226 NETWORK number for the AMOCO LOCATION (I calculated the sum manually, of course).

The costs that were returned for this job are correct, which is really good! However, there are 3 other JOB TYPES associated to the VCCU001226 NETWORK number. It seems like the script is only pulling the last JOB from that site.

What I'd like to pull, is the COST sum for each JOB TYPE  that is associated with the same NETWORK number. In somecases there are multiple JOBs of the same TYPE, AND use the same NETWORK. What I'd like to achieve is to have the script display the sum for the TYPE in general, rather than the last JOB entered. I'm thinking that this may also be a JOIN issue.

SELECT
a.wellname,
SUBSTR(c.afenumber,1,15) NETWORK,
e.cost COST,
b.wvtyp JOB
 
FROM wvwellheader a
   JOIN wvjob b ON a.idwell = b.idwell
   JOIN wvjobafe c ON b.idrec = c.idrecparent
   JOIN wvjobreport d ON b.idrec = d.idrecparent
   JOIN wvjobreportcostgen e ON d.idrec = e.idrecparent  
   
  JOIN (SELECT idwell, MAX(syscreatedate) maxsyscreatedate FROM wvjobafe GROUP BY idwell) f
    ON f.idwell = c.idwell AND c.syscreatedate = f.maxsyscreatedate
WHERE c.afenumber IS NOT NULL
  AND b.DTTMSTART > to_date( '01/JUN/2008', 'DD/MON/YYYY')
ORDER BY a.wellname

This is what the hierarchy looks like if it will help.

a.wellname (LOCATION)
idwell
   |
   -------   b.wvjob (JOB)
               idwell
               idrec
                    |
                    ---------------- c.wvjobafe (NETWORK)
                                           idwell
                                           idrec
                                           idrecparent
                                                   |
                                                   -----------------d. wvjobreport (DAILY ENTRY - Just used for Join)
                                                                           idwell
                                                                           idrec
                                                                           idrecparent
                                                                                      |
                                                                                      ------------------------- e.wvjobreportcostgen ($$)
                                                                                                                         idwell
                                                                                                                         idrec
                                                                                                                         idrecparent



      
So the idrecparent is the child of idrec. Sounds backwards, I know.
Sorry, this is how it should look!

a.wvwellheader(LOCATION)
idwell
   |
   -------   b.wvjob (JOB)
               idwell
               idrec
                    |
                    ---------------- c.wvjobafe (NETWORK)
                    |                      idwell
                    |                      idrec
                    |                      idrecparent
                    |                              
                     -----------------d. wvjobreport (DAILY ENTRY - Just used for Join)
                                                                           idwell
                                                                           idrec
                                                                           idrecparent
                                                                                      |
                                                                                      ------------------------- e.wvjobreportcostgen ($$)
                                                                                                                         idwell
                                                                                                                         idrec
                                                                                                                         idrecparent
Here's why it is only getting the most recent JOB: "c.syscreatedate = f.maxsyscreatedate"  You didn't mention the "f" table, what is its purpose here?  It should either be "c.syscreatedate <= f.maxsyscreatedate" or it should be removed.

Good luck!
Sorry about that. I removed that whole section completely, so now it's just

SELECT
a.wellname,
SUBSTR(c.afenumber,1,15) NETWORK,
e.cost COST,
b.wvtyp JOB
 
FROM wvwellheader a
   JOIN wvjob b ON a.idwell = b.idwell
   JOIN wvjobafe c ON b.idrec = c.idrecparent
   JOIN wvjobreport d ON b.idrec = d.idrecparent
   JOIN wvjobreportcostgen e ON d.idrec = e.idrecparent
   
WHERE c.afenumber IS NOT NULL
  AND b.DTTMSTART > to_date( '01/JUN/2008', 'DD/MON/YYYY')
ORDER BY a.wellname

I'm still only retrieving the most recent JOB . I really do apologize for my lack of understandig here, and just wanted to let you know that I do really appreciate all the help!
ASKER CERTIFIED SOLUTION
Link to home
membership
This solution is only available to members.
To access this solution, you must be a member of Experts Exchange.
Start Free Trial
I think this is it!! I changed a few things around and now it seems to be showing what I want! Here is the sript:

SELECT
a.wellname,
SUBSTR(c.afenumber,1,15) NETWORK,
SUM(e.cost) COST,
b.wvtyp JOB

FROM wvwellheader a
JOIN wvjob b ON a.idwell = b.idwell
JOIN wvjobafe c ON b.idrec = c.idrecparent
JOIN wvjobreport d ON b.idrec = d.idrecparent
JOIN wvjobreportcostgen e ON d.idrec = e.idrecparent

WHERE c.afenumber IS NOT NULL
AND b.DTTMSTART > to_date( '01/MAY/2007', 'DD/MON/YYYY')

GROUP BY a.wellname, c.afenumber, b.wvtyp
ORDER BY a.wellname

Thank you both very much for all of your help!!!  :D
125 points each for Geert and DrSQL