Solved

SQL select first and last records from big resultset (DB2 and Oracle)

Posted on 2010-11-13
10
1,545 Views
Last Modified: 2012-08-14
Hi.

Seems I need a complex thing - a way to select only FIRST and LAST records from big resultset.

The query itself is unpredictable - it is built dynamically based on some set of rules. Only the known thing is - it is always selecting data from ~2-4 tables: ORDER, SUBORDER, BATCH, BATCHITEM.

So, query could looks like:

SELECT s.ordnr, s.subOrdId, s.calloff, o.seqnr, ...
  FROM SubOrder S, Order O
 WHERE s.dept = ? AND s.location = ?
   AND o.dept = s.dept AND o.location = s.location
   AND o.ordnr = s.ordnr
   AND o.status = ?  AND o.pline = ? AND ...
   AND NOT EXISTS ( SELECT i.ordnr FROM BatchItem I, Batch T WHERE .... )
 ORDER BY s.refNo, s.calloff
BROWSE ACCESS

The WHERE clause is unpredictable (it could be any - depends on configuration) - could be any combination of fields in ORDER or SUBORDER tables.
And the ORDER BY clause is unpredictable - could be any (fields used in ORDER BY usually is not part of key) and there could be duplicate values, so values used in ORDER BY are not unique for 100%. Better to say - one of fields in ORDER BY could be unique but it is not predictable what of.
Such resutset could have a big amount of data and the thing I need - to TAKE ONLY THE FIRST and THE LAST records from such resultset.

And SQL should works for DB2 (target version is 8.2+) and for Oracle (target version is 10gR2+).

My concerns are:

1) loading all the data could be a "heavy" operation (consume memory and affects system performance)
2) executing query with the same WHERE + ORDER BY more than 1 time could be a "heavy" operation

So, that is why I'm thinking to take only first and last records from a resultset.

The first thing I have tried - replaced list of fields to select with aggregate functions like count(*), min(ordnr), max(ordnr) but it fail because of ORDER BY. And seems nor DB2 nor Oracle has FIRST() or LAST() functions :-\

Do you have any hints/ideas on how to solve this task?

Regards,
Dmitry.
0
Comment
Question by:Dmitry_Bond
10 Comments
 
LVL 8

Accepted Solution

by:
raulggonzalez earned 125 total points
ID: 34126584
Hi,

You can use

WITH CTE AS
(
HERE YOUR QUERY
)

And after thartt you can do

SELECT field1, field2, fieldn, MIN(fieldX) AS ColName
FROM CTE
GROUP BY field1, field2, fieldn
UNION
SELECT field1, field2, fieldn, MIN(fieldX) AS ColName
FROM CTE
GROUP BY field1, field2, fieldn


Good luck
0
 
LVL 37

Assisted Solution

by:momi_sabag
momi_sabag earned 125 total points
ID: 34126938
well, you can pull it off in one query in the following way, but do take into mind that it will be very slow (unless you have an index on every combination of the order by clause)

select * from (
select *, min(rown) over() minr, max(rown) ovre() maxr
from (select row_number() over(order by -your_order-by-columns) rown, rest-of-your-columns
          from ... -rest-of-your-select
         ) t
) t2
where rown = minr
or        rown = maxr
0
 
LVL 6

Expert Comment

by:dragos_craciun
ID: 34131472
The answer is very easy IF you have appropriate indexes.

The idea is to do something like this (for Oracle):

select * from (your query order by your criteria) where rownum = 1
union
select * from (your query order by reversed order ) where rownum = 1

There are several requirements:
a) your query has to NOT contain any group by ( this has to consume all data and will take a long time )
b) you need to have the appropriate indexes in place so the query plan can use only NESTED LOOPS. You may need to force this using hints like FIRST ROWS or USE_NL

For DB2 ... I have no idea unfortunately
0
 
LVL 6

Expert Comment

by:dragos_craciun
ID: 34131494
When I said reversed order I mean something like this:

Normal: ORDER BY COL1 ASC, COL2 ASC, COL3 ASC

Reverse: ORDER BY COL1 DESC, COL2 DESC, COL3 DESC

So the order of the columns in the ORDER BY clause does not change, you only replace ASC with DESC and DESC with ASC
0
 
LVL 8

Assisted Solution

by:mustaccio
mustaccio earned 125 total points
ID: 34142114
> My concerns are:
>
> 1) loading all the data could be a "heavy" operation (consume memory and affects system
> performance)
> 2) executing query with the same WHERE + ORDER BY more than 1 time could be a "heavy"
> operation
>
> So, that is why I'm thinking to take only first and last records from a resultset.

Records in a table or a result set have no intrinsic ordering. Therefore, "first" and "last" have no meaning unless you specify the order, in which case you pay the price of performing a sort.

Here's one option that should run in both DB2 8.2 and Oracle 10g R2
select * from (
select allyourcolumns
, row_number() over (order by whatever asc) rn1
, row_number() over (order by whatever desc) rn2
where yoursearchcriteria
) t
where rn1 = 1 or rn2 = 1

Open in new window

0
Threat Intelligence Starter Resources

Integrating threat intelligence can be challenging, and not all companies are ready. These resources can help you build awareness and prepare for defense.

 
LVL 6

Expert Comment

by:dragos_craciun
ID: 34143947
Hi Dmitry,

Your comments made me ask myself a question:
Do you really need the "first" and "last" records? It does not seem logical.
Why do you need these?
Having some arbitrary rows are not enough?
0
 

Author Comment

by:Dmitry_Bond
ID: 34178924
I need to gather statistic for a "pipeline" of customer orders, so need only first and last items from a long reseltset (sorted in a special way).
0
 
LVL 6

Expert Comment

by:dragos_craciun
ID: 34179935
In order to get first and last record very fast you need to have an index, but also you need to to meet certain requirements.

For example: I have a table of invoices, and I need to find first and last of them ordered by date.
You add an index on date and you write something like (ORACLE):

SELECT * from INVOICES WHERE DATE = (SELECT MIN(DATE) FROM INVOICES)
and
SELECT * from INVOICES WHERE DATE = (SELECT MAX(DATE) FROM INVOICES)

if you have the index and properly gathered statistics it will be very fast. You need to have good statistics otherwise the server may choose a bad plan and it will be very slow.

If you have two columns that tell your order you need to have an index on both of them.

You said you can have multiple variants of the query. You need to have the appropriate indexes for every possible query.

Now, if your business rules requires you to take first and last that also meet some other criteria everything becomes more complicated.

If you can share with us more details about the business problem maybe we can find a better approach.

For example Oracle ( I don't know about DB2 ) store a lot of statistics about the data (if you gather stats). These are accessible and may be useful for you, as you said you need some "statistics"
0
 

Author Comment

by:Dmitry_Bond
ID: 34180067
I know the SQL on a bit higher level than you suggesting solution for. Unfortunately it is not a solution in my case! If things would be such simple I even not asked questions here. :)

So, please read carefully original question - you can see that there lot of different conditions in a WHERE clause, also - it is selecting data not from the one table, also - there is mixed ORDER BY, so the variant with MIN()/MAX() by a single field (as you suggest) IS NOT THE SOLUTION AT ALL.

Btw, if you look at http://www.w3schools.com/Sql/sql_functions.asp you can see FIRST() and LAST() SQL aggregate functions listed there. Unfortunately, that is related to some "other" SQL than Oracle and DB2 supports. :-\

PS. Sorry, for delay with replies, I got a urgent piece of work, so going to return to this SQL question in 2-3 days.
0
 
LVL 6

Assisted Solution

by:dragos_craciun
dragos_craciun earned 125 total points
ID: 34183014
You are right about the min/max.
Depending on what you need you can write something like:

SELECT * from INVOICES order by date asc

and fetch only the first row.

then you use "DESC" and fetch again only one row.

But there are some possible issues depending on language used for front-end or application server ( C++, C#, Java etc. ) and the libraries you use.

Also, the query plan has to use only nested loops.
0

Featured Post

Find Ransomware Secrets With All-Source Analysis

Ransomware has become a major concern for organizations; its prevalence has grown due to past successes achieved by threat actors. While each ransomware variant is different, we’ve seen some common tactics and trends used among the authors of the malware.

Join & Write a Comment

APEX (Application Express) is used to develop a web application from Oracle. SQL Workshop is one of the tools that comes with Oracle APEX to query or modify the database objects or to make any changes to the structure.
SQL Command Tool comes with APEX under SQL Workshop. It helps us to make changes on the database directly using a graphical user interface. This helps us writing any SQL/ PLSQL queries and execute it on the database and we can create any database ob…
This video shows setup options and the basic steps and syntax for duplicating (cloning) a database from one instance to another. Examples are given for duplicating to the same machine and to different machines
This videos aims to give the viewer a basic demonstration of how a user can query current session information by using the SYS_CONTEXT function

757 members asked questions and received personalized solutions in the past 7 days.

Join the community of 500,000 technology professionals and ask your questions.

Join & Ask a Question

Need Help in Real-Time?

Connect with top rated Experts

22 Experts available now in Live!

Get 1:1 Help Now