Solved

Group by one column only

Posted on 2008-06-24
18
1,589 Views
Last Modified: 2013-12-18
Hi,

I have a query that's using an implicit join. I would like to group the results by JOB TYPE only, listing the associated LOCATION and NETWORK number, and COST sum based on JOB TYPE. In order to get results without error, I have to add c.afenumber and a.wellname to the GROUP BY clause. I only want to group by b.wvtyp, since I'd like to see the duplicates for LOCATION. For example, I'd like to see:

LOCATION            NETWORK          COST             JOB TYPE
11111                       net1                30000                 jobtype1
11111                       net2                50000                 jobtype2

There are typically more than one Network numbers associated to a single location, which is why I'd like to see the Location duplicates. If I group by LOCATION, NETWORK, and JOB TYPE, I get the following:

LOCATION            NETWORK          COST             JOB TYPE
11111                       net1                30000                 jobtype1
22222                       net2                50000                 jobtype1
44444                       net6                20000                 jobtype3

It will only show me one NETWORK number for each LOCATION, since the LOCATION doesn't display the duplicates.

I hope I'm making sense, since I'm pretty new to SQL. Here is the code (without the working group by):

SELECT
SUBSTR(a.wellname,1,30) "LOCATION",
SUBSTR(c.afenumber,1,15) "NETWORK",
SUM(e.cost) as COST,
b.wvtyp "JOB TYPE"


FROM
wvwellheader a,
wvjob b,
wvjobafe c,
wvjobreportcostgen e

WHERE
a.idwell = b.idwell
AND b.idwell = c.idwell
AND c.idwell = e.idwell
AND b.idrec = c.idrecparent
AND c.afenumber IS NOT NULL
AND b.DTTMSTART > to_date( '01/JUN/2008', 'DD/MON/YYYY')
AND c.syscreatedate =

                                     ( SELECT MAX(f.syscreatedate)
                                       FROM wvjobafe f
                                       WHERE f.idwell = c.idwell)


GROUP BY
b.wvtyp

ORDER BY
a.wellname

I hope I've provided some clarity in regards to what I'm trying to accomplish

Thanks!
0
Comment
Question by:mskitten
  • 7
  • 4
  • 3
  • +1
18 Comments
 
LVL 28

Expert Comment

by:imran_fast
ID: 21863030
Did you try this
SELECT
SUBSTR(a.wellname,1,30) "LOCATION",
SUBSTR(c.afenumber,1,15) "NETWORK",
SUM(e.cost) as COST,
b.wvtyp "JOB TYPE"
 
 
FROM
wvwellheader a,
wvjob b,
wvjobafe c,
wvjobreportcostgen e
 
WHERE
a.idwell = b.idwell
AND b.idwell = c.idwell
AND c.idwell = e.idwell
AND b.idrec = c.idrecparent
AND c.afenumber IS NOT NULL
AND b.DTTMSTART > to_date( '01/JUN/2008', 'DD/MON/YYYY')
AND c.syscreatedate =
 
                                     ( SELECT MAX(f.syscreatedate)
                                       FROM wvjobafe f
                                       WHERE f.idwell = c.idwell)
 
 
GROUP BY
b.wvtyp,
SUBSTR(a.wellname,1,30) ,
SUBSTR(c.afenumber,1,15) ,
 
ORDER BY
a.wellname

Open in new window

0
 
LVL 37

Expert Comment

by:Geert Gruwez
ID: 21864728
use SUM() OVER (PARTITION BY ) you don't need a groub by and in the PARTITION BY you specify what you want to sum by ...

SELECT
SUBSTR(a.wellname,1,30) "LOCATION",
SUBSTR(c.afenumber,1,15) "NETWORK",
SUM(e.cost) OVER (PARTITION BY b.wvtyp) as COST,
b.wvtyp "JOB TYPE"
FROM
wvwellheader a,
wvjob b,
wvjobafe c,
wvjobreportcostgen e
WHERE
a.idwell = b.idwell
AND b.idwell = c.idwell
AND c.idwell = e.idwell
AND b.idrec = c.idrecparent
AND c.afenumber IS NOT NULL
AND b.DTTMSTART > to_date( '01/JUN/2008', 'DD/MON/YYYY')
AND c.syscreatedate =
                                     ( SELECT MAX(f.syscreatedate)
                                       FROM wvjobafe f
                                       WHERE f.idwell = c.idwell)
ORDER BY
a.wellname

Open in new window

0
 
LVL 22

Expert Comment

by:DrSQL
ID: 21864765
mskitten,
    There's either a problem with your example or your explanation.  As imran_fast has noted (although there's an extra comma before the Order-by), the grouping of the three columns will give you every unique combination of the three that actually exists in the data and the attendant sum.  It will not sum the two LOCATION=11111 as you had in your example (not that the sum you had was a sum either).  So, I'm not sure what you were trying to accomplish.  You may have to show us the actual SQL and data.

Good luck!
0
MIM Survival Guide for Service Desk Managers

Major incidents can send mastered service desk processes into disorder. Systems and tools produce the data needed to resolve these incidents, but your challenge is getting that information to the right people fast. Check out the Survival Guide and begin bringing order to chaos.

 

Author Comment

by:mskitten
ID: 21870974
Hello imran_fast,

When I try to run your script, I get an error at the ORDER BY clause, saying that a.wellname is not a GROUP BY expression.

================================================================================

Hello Geert Gruwez,

When I try your script, something seems to go wrong. I return many duplicate lines of the exact same data:

ROW    LOCATION                         NETWORK                   COST             JOB TYPE
1          CRESTAR 6-18-76                 VCCU001307              244733.13      WELL INTERVENTION
2          CRESTAR 6-18-76                 VCCU001307              244733.13      WELL INTERVENTION
3          CRESTAR 6-18-76                 VCCU001307              244733.13      WELL INTERVENTION
4          CRESTAR 6-18-76                 VCCU001307              244733.13      WELL INTERVENTION
5          CRESTAR 6-18-76                 VCCU001307              244733.13      WELL INTERVENTION
...........

================================================================================

Hello DrSQL,

What I'd like to accomplish would look like the following:

ROW    LOCATION                         NETWORK                   COST             JOB TYPE
1          CRESTAR 6-18-76                 VCCU001307              244733.13      WELL INTERVENTION
2          CRESTAR 6-18-76                 VCCW036457             513651.26      DRILLING
3          CRESTAR 6-18-76                  NOOWI63242             384733.82      COMPLETIONS
4          CTBK 10-33-65-9              XCCTT013083            244733.13      ABANDONMENT
5          CTBK 10-33-65-9              PLNIT3125348            244733.13      DRILLING
...........

As you can see, in rows 1-3, the same LOCATION is displayed, as well as the unique NETWORK and COST(sum) for each related JOB TYPE that was assigned to that specific LOCATION. Same applies for rows 4-5.  

Each LOCATION will have at least one JOB TYPE that has been assigned a unique NETWORK number, meaning that if there is more than one JOB TYPE per LOCATION (which is usually the case), the NETWORK numbers will be different (since two different JOB TYPES can't be assigned the same NETWORK) .

I tried to accomplish this output with the script I posted, however, I don't wan't to group the LOCATION, COST, or NETWORK. I'd only like to GROUP the JOB TYPE and then ORDER BY LOCATION.

In my mind, if I GROUP the JOB TYPE, it should display the corresponding NETWORK, COST total, and LOCATION it's associated with, however, I'm unable to do this because I keep getting a "not a GROUP BY expression" if I don't add the NETWORK and LOCATION to the GROUP BY clause.

If I do add the NETWORK and LOCATION to the GROUP BY clause, I return something like this:

ROW      LOCATION                                     NETWORK      COST                   JOB TYPE
1            NOEL A-94-I/93-P                          10223536       2238526.44         COMPLETIONS
2            PARKLDNE 1-21-14-26                 10218612       1447410.51         ABANDONMENT
3            VIK-KINS 2-15-47-10                     10213789       298955.28           WELL INTERVENTION

It only gives me PARTIAL information for what I need.  Each LOCATION above has more than one JOB TYPE, but because I had to group the LOCATION for the script to run, it returned unique LOCATION results.

Does this help any?  
Thanks so much!
0
 
LVL 22

Expert Comment

by:DrSQL
ID: 21871276
mskitten,
    I think you misunderstand the meaning of the term "Group BY".  It doesn't mean the thing you want to organize by, it means these are the items where I want to see EVERY combination that exists in the data, anything else I only want to see the min, max, sum, or average.  So, you DO need to group by all three (as selected) in order to see what you want.  The solution is very similar to what imran_fast had, there were just a couple of syntax errors.  Here's what you should need:

Good luck!

SELECT
	SUBSTR(a.wellname,1,30) "LOCATION",
	SUBSTR(c.afenumber,1,15) "NETWORK",
	SUM(e.cost) as COST,
	b.wvtyp "JOB TYPE"
  FROM	wvwellheader a,
	wvjob b,
	wvjobafe c,
	wvjobreportcostgen e
WHERE
    a.idwell = b.idwell
AND c.idwell = b.idwell
AND e.idwell = b.idwell
AND c.idrec = b.idrecparent
AND c.afenumber IS NOT NULL
AND b.DTTMSTART > to_date( '01/JUN/2008', 'DD/MON/YYYY')
AND c.syscreatedate =
                                      ( SELECT MAX(f.syscreatedate)
                                       FROM wvjobafe f
                                       WHERE f.idwell = b.idwell) 
GROUP BY
  b.wvtyp,
  SUBSTR(a.wellname,1,30) ,
  SUBSTR(c.afenumber,1,15)
ORDER BY
 SUBSTR(a.wellname,1,30) ,
 SUBSTR(c.afenumber,1,15)
 
I couldn't resist adjusting your join - it's usually best to keep using the same table for the join condition rather than make it a string of a to b to c to e.  The issue you may have been having is that the group by has to be EXACTLY the same as what is selected in most cases.  And the Order by has to be something selected, not solting USED by a function in the select.  If this still isn't what you're looking for - and I understand if you're skeptical - we can try using analytical windowing functions, but I'm going to need some sample data.

Open in new window

0
 
LVL 37

Assisted Solution

by:Geert Gruwez
Geert Gruwez earned 125 total points
ID: 21872770
if you list for each table what the next table is coupled without any other where elements we can start from there and build the query.
and if possible for each table what specific records you want

Sometimes it's difficult to see the forest between trees ...

I tend to use ANSI joins to make things easier to read.
You get to see what table is matched to what other table with what keys ...

example as I understand your querie from the top sample:
The WHERE becomes shorter like this

SELECT 
  SUBSTR(a.wellname,1,30) "LOCATION",
  SUBSTR(c.afenumber,1,15) "NETWORK",
  e.COST,
  b.wvtyp "JOB TYPE"
FROM wvwellheader a
  JOIN wvjob b ON a.idwell = b.idwell
  JOIN wvjobafe c ON b.idwell = c.idwell AND c.idrecparent = b.idrec
  JOIN wvjobreportcostgen e ON c.idwell = e.idwell
  JOIN (SELECT idwell, MAX(syscreatedate) maxsyscreatedate FROM wvjobafe GROUP BY idwell) f 
    ON f.idwell = c.idwell AND c.syscreatedate = f.maxsyscreatedate
WHERE c.afenumber IS NOT NULL
  AND b.DTTMSTART > to_date( '01/JUN/2008', 'DD/MON/YYYY')
ORDER BY a.wellname

Open in new window

0
 
LVL 37

Expert Comment

by:Geert Gruwez
ID: 21872772
this was the first step, you need to verify if this dataset is correct
off course this is without the SUM
0
 

Author Comment

by:mskitten
ID: 21920521
Hello Geert,
I apologize for my delayed response. I was away.

I tried your script, but changed the Joins around a bit. I think the Joins were where most of my data issues came from.

Using the script I've pasted below, I have been able to generate correct  results (but only half).  For example;

LOCATION                          NETWORK                        COST            JOB TYPE
AMOCO CESS 14-11MU      VCCU001226                    56464.1         WELL INTERVENTION

56464.1  is the SUM of the particular WELL INTERVENTION JOB that was assigned the VCCU001226 NETWORK number for the AMOCO LOCATION (I calculated the sum manually, of course).

The costs that were returned for this job are correct, which is really good! However, there are 3 other JOB TYPES associated to the VCCU001226 NETWORK number. It seems like the script is only pulling the last JOB from that site.

What I'd like to pull, is the COST sum for each JOB TYPE  that is associated with the same NETWORK number. In somecases there are multiple JOBs of the same TYPE, AND use the same NETWORK. What I'd like to achieve is to have the script display the sum for the TYPE in general, rather than the last JOB entered. I'm thinking that this may also be a JOIN issue.

SELECT
a.wellname,
SUBSTR(c.afenumber,1,15) NETWORK,
e.cost COST,
b.wvtyp JOB
 
FROM wvwellheader a
   JOIN wvjob b ON a.idwell = b.idwell
   JOIN wvjobafe c ON b.idrec = c.idrecparent
   JOIN wvjobreport d ON b.idrec = d.idrecparent
   JOIN wvjobreportcostgen e ON d.idrec = e.idrecparent  
   
  JOIN (SELECT idwell, MAX(syscreatedate) maxsyscreatedate FROM wvjobafe GROUP BY idwell) f
    ON f.idwell = c.idwell AND c.syscreatedate = f.maxsyscreatedate
WHERE c.afenumber IS NOT NULL
  AND b.DTTMSTART > to_date( '01/JUN/2008', 'DD/MON/YYYY')
ORDER BY a.wellname

This is what the hierarchy looks like if it will help.

a.wellname (LOCATION)
idwell
   |
   -------   b.wvjob (JOB)
               idwell
               idrec
                    |
                    ---------------- c.wvjobafe (NETWORK)
                                           idwell
                                           idrec
                                           idrecparent
                                                   |
                                                   -----------------d. wvjobreport (DAILY ENTRY - Just used for Join)
                                                                           idwell
                                                                           idrec
                                                                           idrecparent
                                                                                      |
                                                                                      ------------------------- e.wvjobreportcostgen ($$)
                                                                                                                         idwell
                                                                                                                         idrec
                                                                                                                         idrecparent



      
0
 

Author Comment

by:mskitten
ID: 21920650
So the idrecparent is the child of idrec. Sounds backwards, I know.
0
 

Author Comment

by:mskitten
ID: 21920736
Sorry, this is how it should look!

a.wvwellheader(LOCATION)
idwell
   |
   -------   b.wvjob (JOB)
               idwell
               idrec
                    |
                    ---------------- c.wvjobafe (NETWORK)
                    |                      idwell
                    |                      idrec
                    |                      idrecparent
                    |                              
                     -----------------d. wvjobreport (DAILY ENTRY - Just used for Join)
                                                                           idwell
                                                                           idrec
                                                                           idrecparent
                                                                                      |
                                                                                      ------------------------- e.wvjobreportcostgen ($$)
                                                                                                                         idwell
                                                                                                                         idrec
                                                                                                                         idrecparent
0
 
LVL 22

Expert Comment

by:DrSQL
ID: 21920813
Here's why it is only getting the most recent JOB: "c.syscreatedate = f.maxsyscreatedate"  You didn't mention the "f" table, what is its purpose here?  It should either be "c.syscreatedate <= f.maxsyscreatedate" or it should be removed.

Good luck!
0
 

Author Comment

by:mskitten
ID: 21921330
Sorry about that. I removed that whole section completely, so now it's just

SELECT
a.wellname,
SUBSTR(c.afenumber,1,15) NETWORK,
e.cost COST,
b.wvtyp JOB
 
FROM wvwellheader a
   JOIN wvjob b ON a.idwell = b.idwell
   JOIN wvjobafe c ON b.idrec = c.idrecparent
   JOIN wvjobreport d ON b.idrec = d.idrecparent
   JOIN wvjobreportcostgen e ON d.idrec = e.idrecparent
   
WHERE c.afenumber IS NOT NULL
  AND b.DTTMSTART > to_date( '01/JUN/2008', 'DD/MON/YYYY')
ORDER BY a.wellname

I'm still only retrieving the most recent JOB . I really do apologize for my lack of understandig here, and just wanted to let you know that I do really appreciate all the help!
0
 
LVL 22

Accepted Solution

by:
DrSQL earned 125 total points
ID: 21922234
I don't know the dat eof the prior jobs, but if they are before 1/JUN/2008, then you are excluding them.  If the problem is the relationship between wvjob (b) and wvbjobafe (c) we might need more information.  What I see right now is that idrecparent is always a foreign key back to the next higher table in the hierarchy except for a and b.  Not really a relational design, but we can work with it.

We need a join that gets all of the records into opplace and then we can group them:

select wellname, network, sum(cost) cost, job
from (
    SELECT
              a.wellname,
              SUBSTR(c.afenumber,1,15) NETWORK,
              e.cost COST,
              b.wvtyp JOB
                FROM wvwellheader a, wvjob b, wvjobafe c, wvjobreportcostgen e
               where a.idwell = b.idwell and
                          b.idrec = c.idrecparent and
                          c.afenumber is not null and
                          b.DTTMSTART > to_date( '01/JUN/2008', 'DD/MON/YYYY') -- make sure this is after all but the last job
                          e.recidparent in (select idrec from wvjobreport d
                                                      where idrecparent = b.idrec)
        )
GROUP by wellname, network, job
order by wellname,network,job
/

If this doesn't work we'll need you to state the relationship between the tables:

Example:
   There is one wellheader for one or more jobs (connected by idwell)
   There is one job for zero or more jobafes (connected by job idrec = jobafe idrecparent)
   etc.

Good luck!
0
 

Author Comment

by:mskitten
ID: 21948554
I think this is it!! I changed a few things around and now it seems to be showing what I want! Here is the sript:

SELECT
a.wellname,
SUBSTR(c.afenumber,1,15) NETWORK,
SUM(e.cost) COST,
b.wvtyp JOB

FROM wvwellheader a
JOIN wvjob b ON a.idwell = b.idwell
JOIN wvjobafe c ON b.idrec = c.idrecparent
JOIN wvjobreport d ON b.idrec = d.idrecparent
JOIN wvjobreportcostgen e ON d.idrec = e.idrecparent

WHERE c.afenumber IS NOT NULL
AND b.DTTMSTART > to_date( '01/MAY/2007', 'DD/MON/YYYY')

GROUP BY a.wellname, c.afenumber, b.wvtyp
ORDER BY a.wellname

Thank you both very much for all of your help!!!  :D
0
 

Author Comment

by:mskitten
ID: 21948574
125 points each for Geert and DrSQL
0

Featured Post

Industry Leaders: We Want Your Opinion!

We value your feedback.

Take our survey and automatically be enter to win anyone of the following:
Yeti Cooler, Amazon eGift Card, and Movie eGift Card!

Question has a verified solution.

If you are experiencing a similar issue, please ask a related question

Suggested Solutions

Title # Comments Views Activity
add more rows to hierarchy 3 34
oracle query 3 26
Display field if column exists 7 33
Converting Stored Procedure to SQL Statement 5 40
How to Unravel a Tricky Query Introduction If you browse through the Oracle zones or any of the other database-related zones you'll come across some complicated solutions and sometimes you'll just have to wonder how anyone came up with them.  …
When it comes to protecting Oracle Database servers and systems, there are a ton of myths out there. Here are the most common.
Via a live example show how to connect to RMAN, make basic configuration settings changes and then take a backup of a demo database
Video by: Steve
Using examples as well as descriptions, step through each of the common simple join types, explaining differences in syntax, differences in expected outputs and showing how the queries run along with the actual outputs based upon a simple set of dem…

749 members asked questions and received personalized solutions in the past 7 days.

Join the community of 500,000 technology professionals and ask your questions.

Join & Ask a Question