Solved

Group by one column only

Posted on 2008-06-24
18
1,583 Views
Last Modified: 2013-12-18
Hi,

I have a query that's using an implicit join. I would like to group the results by JOB TYPE only, listing the associated LOCATION and NETWORK number, and COST sum based on JOB TYPE. In order to get results without error, I have to add c.afenumber and a.wellname to the GROUP BY clause. I only want to group by b.wvtyp, since I'd like to see the duplicates for LOCATION. For example, I'd like to see:

LOCATION            NETWORK          COST             JOB TYPE
11111                       net1                30000                 jobtype1
11111                       net2                50000                 jobtype2

There are typically more than one Network numbers associated to a single location, which is why I'd like to see the Location duplicates. If I group by LOCATION, NETWORK, and JOB TYPE, I get the following:

LOCATION            NETWORK          COST             JOB TYPE
11111                       net1                30000                 jobtype1
22222                       net2                50000                 jobtype1
44444                       net6                20000                 jobtype3

It will only show me one NETWORK number for each LOCATION, since the LOCATION doesn't display the duplicates.

I hope I'm making sense, since I'm pretty new to SQL. Here is the code (without the working group by):

SELECT
SUBSTR(a.wellname,1,30) "LOCATION",
SUBSTR(c.afenumber,1,15) "NETWORK",
SUM(e.cost) as COST,
b.wvtyp "JOB TYPE"


FROM
wvwellheader a,
wvjob b,
wvjobafe c,
wvjobreportcostgen e

WHERE
a.idwell = b.idwell
AND b.idwell = c.idwell
AND c.idwell = e.idwell
AND b.idrec = c.idrecparent
AND c.afenumber IS NOT NULL
AND b.DTTMSTART > to_date( '01/JUN/2008', 'DD/MON/YYYY')
AND c.syscreatedate =

                                     ( SELECT MAX(f.syscreatedate)
                                       FROM wvjobafe f
                                       WHERE f.idwell = c.idwell)


GROUP BY
b.wvtyp

ORDER BY
a.wellname

I hope I've provided some clarity in regards to what I'm trying to accomplish

Thanks!
0
Comment
Question by:mskitten
  • 7
  • 4
  • 3
  • +1
18 Comments
 
LVL 28

Expert Comment

by:imran_fast
ID: 21863030
Did you try this
SELECT

SUBSTR(a.wellname,1,30) "LOCATION",

SUBSTR(c.afenumber,1,15) "NETWORK",

SUM(e.cost) as COST,

b.wvtyp "JOB TYPE"
 
 

FROM

wvwellheader a,

wvjob b,

wvjobafe c,

wvjobreportcostgen e
 

WHERE

a.idwell = b.idwell

AND b.idwell = c.idwell

AND c.idwell = e.idwell

AND b.idrec = c.idrecparent

AND c.afenumber IS NOT NULL

AND b.DTTMSTART > to_date( '01/JUN/2008', 'DD/MON/YYYY')

AND c.syscreatedate =
 

                                     ( SELECT MAX(f.syscreatedate)

                                       FROM wvjobafe f

                                       WHERE f.idwell = c.idwell)
 
 

GROUP BY

b.wvtyp,

SUBSTR(a.wellname,1,30) ,

SUBSTR(c.afenumber,1,15) ,
 

ORDER BY

a.wellname

Open in new window

0
 
LVL 36

Expert Comment

by:Geert Gruwez
ID: 21864728
use SUM() OVER (PARTITION BY ) you don't need a groub by and in the PARTITION BY you specify what you want to sum by ...

SELECT

SUBSTR(a.wellname,1,30) "LOCATION",

SUBSTR(c.afenumber,1,15) "NETWORK",

SUM(e.cost) OVER (PARTITION BY b.wvtyp) as COST,

b.wvtyp "JOB TYPE"

FROM

wvwellheader a,

wvjob b,

wvjobafe c,

wvjobreportcostgen e

WHERE

a.idwell = b.idwell

AND b.idwell = c.idwell

AND c.idwell = e.idwell

AND b.idrec = c.idrecparent

AND c.afenumber IS NOT NULL

AND b.DTTMSTART > to_date( '01/JUN/2008', 'DD/MON/YYYY')

AND c.syscreatedate =

                                     ( SELECT MAX(f.syscreatedate)

                                       FROM wvjobafe f

                                       WHERE f.idwell = c.idwell)

ORDER BY

a.wellname

Open in new window

0
 
LVL 22

Expert Comment

by:DrSQL
ID: 21864765
mskitten,
    There's either a problem with your example or your explanation.  As imran_fast has noted (although there's an extra comma before the Order-by), the grouping of the three columns will give you every unique combination of the three that actually exists in the data and the attendant sum.  It will not sum the two LOCATION=11111 as you had in your example (not that the sum you had was a sum either).  So, I'm not sure what you were trying to accomplish.  You may have to show us the actual SQL and data.

Good luck!
0
 

Author Comment

by:mskitten
ID: 21870974
Hello imran_fast,

When I try to run your script, I get an error at the ORDER BY clause, saying that a.wellname is not a GROUP BY expression.

================================================================================

Hello Geert Gruwez,

When I try your script, something seems to go wrong. I return many duplicate lines of the exact same data:

ROW    LOCATION                         NETWORK                   COST             JOB TYPE
1          CRESTAR 6-18-76                 VCCU001307              244733.13      WELL INTERVENTION
2          CRESTAR 6-18-76                 VCCU001307              244733.13      WELL INTERVENTION
3          CRESTAR 6-18-76                 VCCU001307              244733.13      WELL INTERVENTION
4          CRESTAR 6-18-76                 VCCU001307              244733.13      WELL INTERVENTION
5          CRESTAR 6-18-76                 VCCU001307              244733.13      WELL INTERVENTION
...........

================================================================================

Hello DrSQL,

What I'd like to accomplish would look like the following:

ROW    LOCATION                         NETWORK                   COST             JOB TYPE
1          CRESTAR 6-18-76                 VCCU001307              244733.13      WELL INTERVENTION
2          CRESTAR 6-18-76                 VCCW036457             513651.26      DRILLING
3          CRESTAR 6-18-76                  NOOWI63242             384733.82      COMPLETIONS
4          CTBK 10-33-65-9              XCCTT013083            244733.13      ABANDONMENT
5          CTBK 10-33-65-9              PLNIT3125348            244733.13      DRILLING
...........

As you can see, in rows 1-3, the same LOCATION is displayed, as well as the unique NETWORK and COST(sum) for each related JOB TYPE that was assigned to that specific LOCATION. Same applies for rows 4-5.  

Each LOCATION will have at least one JOB TYPE that has been assigned a unique NETWORK number, meaning that if there is more than one JOB TYPE per LOCATION (which is usually the case), the NETWORK numbers will be different (since two different JOB TYPES can't be assigned the same NETWORK) .

I tried to accomplish this output with the script I posted, however, I don't wan't to group the LOCATION, COST, or NETWORK. I'd only like to GROUP the JOB TYPE and then ORDER BY LOCATION.

In my mind, if I GROUP the JOB TYPE, it should display the corresponding NETWORK, COST total, and LOCATION it's associated with, however, I'm unable to do this because I keep getting a "not a GROUP BY expression" if I don't add the NETWORK and LOCATION to the GROUP BY clause.

If I do add the NETWORK and LOCATION to the GROUP BY clause, I return something like this:

ROW      LOCATION                                     NETWORK      COST                   JOB TYPE
1            NOEL A-94-I/93-P                          10223536       2238526.44         COMPLETIONS
2            PARKLDNE 1-21-14-26                 10218612       1447410.51         ABANDONMENT
3            VIK-KINS 2-15-47-10                     10213789       298955.28           WELL INTERVENTION

It only gives me PARTIAL information for what I need.  Each LOCATION above has more than one JOB TYPE, but because I had to group the LOCATION for the script to run, it returned unique LOCATION results.

Does this help any?  
Thanks so much!
0
 
LVL 22

Expert Comment

by:DrSQL
ID: 21871276
mskitten,
    I think you misunderstand the meaning of the term "Group BY".  It doesn't mean the thing you want to organize by, it means these are the items where I want to see EVERY combination that exists in the data, anything else I only want to see the min, max, sum, or average.  So, you DO need to group by all three (as selected) in order to see what you want.  The solution is very similar to what imran_fast had, there were just a couple of syntax errors.  Here's what you should need:

Good luck!

SELECT

	SUBSTR(a.wellname,1,30) "LOCATION",

	SUBSTR(c.afenumber,1,15) "NETWORK",

	SUM(e.cost) as COST,

	b.wvtyp "JOB TYPE"

  FROM	wvwellheader a,

	wvjob b,

	wvjobafe c,

	wvjobreportcostgen e

WHERE

    a.idwell = b.idwell

AND c.idwell = b.idwell

AND e.idwell = b.idwell

AND c.idrec = b.idrecparent

AND c.afenumber IS NOT NULL

AND b.DTTMSTART > to_date( '01/JUN/2008', 'DD/MON/YYYY')

AND c.syscreatedate =

                                      ( SELECT MAX(f.syscreatedate)

                                       FROM wvjobafe f

                                       WHERE f.idwell = b.idwell) 

GROUP BY

  b.wvtyp,

  SUBSTR(a.wellname,1,30) ,

  SUBSTR(c.afenumber,1,15)

ORDER BY

 SUBSTR(a.wellname,1,30) ,

 SUBSTR(c.afenumber,1,15)
 

I couldn't resist adjusting your join - it's usually best to keep using the same table for the join condition rather than make it a string of a to b to c to e.  The issue you may have been having is that the group by has to be EXACTLY the same as what is selected in most cases.  And the Order by has to be something selected, not solting USED by a function in the select.  If this still isn't what you're looking for - and I understand if you're skeptical - we can try using analytical windowing functions, but I'm going to need some sample data.

Open in new window

0
 
LVL 36

Assisted Solution

by:Geert Gruwez
Geert Gruwez earned 125 total points
ID: 21872770
if you list for each table what the next table is coupled without any other where elements we can start from there and build the query.
and if possible for each table what specific records you want

Sometimes it's difficult to see the forest between trees ...

I tend to use ANSI joins to make things easier to read.
You get to see what table is matched to what other table with what keys ...

example as I understand your querie from the top sample:
The WHERE becomes shorter like this

SELECT 

  SUBSTR(a.wellname,1,30) "LOCATION",

  SUBSTR(c.afenumber,1,15) "NETWORK",

  e.COST,

  b.wvtyp "JOB TYPE"

FROM wvwellheader a

  JOIN wvjob b ON a.idwell = b.idwell

  JOIN wvjobafe c ON b.idwell = c.idwell AND c.idrecparent = b.idrec

  JOIN wvjobreportcostgen e ON c.idwell = e.idwell

  JOIN (SELECT idwell, MAX(syscreatedate) maxsyscreatedate FROM wvjobafe GROUP BY idwell) f 

    ON f.idwell = c.idwell AND c.syscreatedate = f.maxsyscreatedate

WHERE c.afenumber IS NOT NULL

  AND b.DTTMSTART > to_date( '01/JUN/2008', 'DD/MON/YYYY')

ORDER BY a.wellname

Open in new window

0
 
LVL 36

Expert Comment

by:Geert Gruwez
ID: 21872772
this was the first step, you need to verify if this dataset is correct
off course this is without the SUM
0
How your wiki can always stay up-to-date

Quip doubles as a “living” wiki and a project management tool that evolves with your organization. As you finish projects in Quip, the work remains, easily accessible to all team members, new and old.
- Increase transparency
- Onboard new hires faster
- Access from mobile/offline

 

Author Comment

by:mskitten
ID: 21920521
Hello Geert,
I apologize for my delayed response. I was away.

I tried your script, but changed the Joins around a bit. I think the Joins were where most of my data issues came from.

Using the script I've pasted below, I have been able to generate correct  results (but only half).  For example;

LOCATION                          NETWORK                        COST            JOB TYPE
AMOCO CESS 14-11MU      VCCU001226                    56464.1         WELL INTERVENTION

56464.1  is the SUM of the particular WELL INTERVENTION JOB that was assigned the VCCU001226 NETWORK number for the AMOCO LOCATION (I calculated the sum manually, of course).

The costs that were returned for this job are correct, which is really good! However, there are 3 other JOB TYPES associated to the VCCU001226 NETWORK number. It seems like the script is only pulling the last JOB from that site.

What I'd like to pull, is the COST sum for each JOB TYPE  that is associated with the same NETWORK number. In somecases there are multiple JOBs of the same TYPE, AND use the same NETWORK. What I'd like to achieve is to have the script display the sum for the TYPE in general, rather than the last JOB entered. I'm thinking that this may also be a JOIN issue.

SELECT
a.wellname,
SUBSTR(c.afenumber,1,15) NETWORK,
e.cost COST,
b.wvtyp JOB
 
FROM wvwellheader a
   JOIN wvjob b ON a.idwell = b.idwell
   JOIN wvjobafe c ON b.idrec = c.idrecparent
   JOIN wvjobreport d ON b.idrec = d.idrecparent
   JOIN wvjobreportcostgen e ON d.idrec = e.idrecparent  
   
  JOIN (SELECT idwell, MAX(syscreatedate) maxsyscreatedate FROM wvjobafe GROUP BY idwell) f
    ON f.idwell = c.idwell AND c.syscreatedate = f.maxsyscreatedate
WHERE c.afenumber IS NOT NULL
  AND b.DTTMSTART > to_date( '01/JUN/2008', 'DD/MON/YYYY')
ORDER BY a.wellname

This is what the hierarchy looks like if it will help.

a.wellname (LOCATION)
idwell
   |
   -------   b.wvjob (JOB)
               idwell
               idrec
                    |
                    ---------------- c.wvjobafe (NETWORK)
                                           idwell
                                           idrec
                                           idrecparent
                                                   |
                                                   -----------------d. wvjobreport (DAILY ENTRY - Just used for Join)
                                                                           idwell
                                                                           idrec
                                                                           idrecparent
                                                                                      |
                                                                                      ------------------------- e.wvjobreportcostgen ($$)
                                                                                                                         idwell
                                                                                                                         idrec
                                                                                                                         idrecparent



      
0
 

Author Comment

by:mskitten
ID: 21920650
So the idrecparent is the child of idrec. Sounds backwards, I know.
0
 

Author Comment

by:mskitten
ID: 21920736
Sorry, this is how it should look!

a.wvwellheader(LOCATION)
idwell
   |
   -------   b.wvjob (JOB)
               idwell
               idrec
                    |
                    ---------------- c.wvjobafe (NETWORK)
                    |                      idwell
                    |                      idrec
                    |                      idrecparent
                    |                              
                     -----------------d. wvjobreport (DAILY ENTRY - Just used for Join)
                                                                           idwell
                                                                           idrec
                                                                           idrecparent
                                                                                      |
                                                                                      ------------------------- e.wvjobreportcostgen ($$)
                                                                                                                         idwell
                                                                                                                         idrec
                                                                                                                         idrecparent
0
 
LVL 22

Expert Comment

by:DrSQL
ID: 21920813
Here's why it is only getting the most recent JOB: "c.syscreatedate = f.maxsyscreatedate"  You didn't mention the "f" table, what is its purpose here?  It should either be "c.syscreatedate <= f.maxsyscreatedate" or it should be removed.

Good luck!
0
 

Author Comment

by:mskitten
ID: 21921330
Sorry about that. I removed that whole section completely, so now it's just

SELECT
a.wellname,
SUBSTR(c.afenumber,1,15) NETWORK,
e.cost COST,
b.wvtyp JOB
 
FROM wvwellheader a
   JOIN wvjob b ON a.idwell = b.idwell
   JOIN wvjobafe c ON b.idrec = c.idrecparent
   JOIN wvjobreport d ON b.idrec = d.idrecparent
   JOIN wvjobreportcostgen e ON d.idrec = e.idrecparent
   
WHERE c.afenumber IS NOT NULL
  AND b.DTTMSTART > to_date( '01/JUN/2008', 'DD/MON/YYYY')
ORDER BY a.wellname

I'm still only retrieving the most recent JOB . I really do apologize for my lack of understandig here, and just wanted to let you know that I do really appreciate all the help!
0
 
LVL 22

Accepted Solution

by:
DrSQL earned 125 total points
ID: 21922234
I don't know the dat eof the prior jobs, but if they are before 1/JUN/2008, then you are excluding them.  If the problem is the relationship between wvjob (b) and wvbjobafe (c) we might need more information.  What I see right now is that idrecparent is always a foreign key back to the next higher table in the hierarchy except for a and b.  Not really a relational design, but we can work with it.

We need a join that gets all of the records into opplace and then we can group them:

select wellname, network, sum(cost) cost, job
from (
    SELECT
              a.wellname,
              SUBSTR(c.afenumber,1,15) NETWORK,
              e.cost COST,
              b.wvtyp JOB
                FROM wvwellheader a, wvjob b, wvjobafe c, wvjobreportcostgen e
               where a.idwell = b.idwell and
                          b.idrec = c.idrecparent and
                          c.afenumber is not null and
                          b.DTTMSTART > to_date( '01/JUN/2008', 'DD/MON/YYYY') -- make sure this is after all but the last job
                          e.recidparent in (select idrec from wvjobreport d
                                                      where idrecparent = b.idrec)
        )
GROUP by wellname, network, job
order by wellname,network,job
/

If this doesn't work we'll need you to state the relationship between the tables:

Example:
   There is one wellheader for one or more jobs (connected by idwell)
   There is one job for zero or more jobafes (connected by job idrec = jobafe idrecparent)
   etc.

Good luck!
0
 

Author Comment

by:mskitten
ID: 21948554
I think this is it!! I changed a few things around and now it seems to be showing what I want! Here is the sript:

SELECT
a.wellname,
SUBSTR(c.afenumber,1,15) NETWORK,
SUM(e.cost) COST,
b.wvtyp JOB

FROM wvwellheader a
JOIN wvjob b ON a.idwell = b.idwell
JOIN wvjobafe c ON b.idrec = c.idrecparent
JOIN wvjobreport d ON b.idrec = d.idrecparent
JOIN wvjobreportcostgen e ON d.idrec = e.idrecparent

WHERE c.afenumber IS NOT NULL
AND b.DTTMSTART > to_date( '01/MAY/2007', 'DD/MON/YYYY')

GROUP BY a.wellname, c.afenumber, b.wvtyp
ORDER BY a.wellname

Thank you both very much for all of your help!!!  :D
0
 

Author Comment

by:mskitten
ID: 21948574
125 points each for Geert and DrSQL
0

Featured Post

Find Ransomware Secrets With All-Source Analysis

Ransomware has become a major concern for organizations; its prevalence has grown due to past successes achieved by threat actors. While each ransomware variant is different, we’ve seen some common tactics and trends used among the authors of the malware.

Join & Write a Comment

Introduction A previously published article on Experts Exchange ("Joins in Oracle", http://www.experts-exchange.com/Database/Oracle/A_8249-Joins-in-Oracle.html) makes a statement about "Oracle proprietary" joins and mixes the join syntax with gen…
Background In several of the companies I have worked for, I noticed that corporate reporting is off loaded from the production database and done mainly on a clone database which needs to be kept up to date daily by various means, be it a logical…
This video explains at a high level with the mandatory Oracle Memory processes are as well as touching on some of the more common optional ones.
This video shows how to Export data from an Oracle database using the Datapump Export Utility.  The corresponding Datapump Import utility is also discussed and demonstrated.

757 members asked questions and received personalized solutions in the past 7 days.

Join the community of 500,000 technology professionals and ask your questions.

Join & Ask a Question

Need Help in Real-Time?

Connect with top rated Experts

22 Experts available now in Live!

Get 1:1 Help Now