asked on

optimize

hi guys

any one knows what optimize means

select * from ...

where ...

ORDER BY ID
OPTIMIZE FOR 1 ROW // what does this mean??
FETCH FIRST 10 ROWS ONLY

any information appreciated

thanks
J

wbhinson

DB2 usually optimizes queries to retrieve all rows that qualify. But sometimes you want to retrieve only the first row or first few rows.

Use OPTIMIZE FOR 1 ROW is used to influence the access path. OPTIMIZE FOR 1 ROW tells DB2 to select an access path that returns the first qualifying row quickly. It should be used typically for small data sets.

Read more here: http://publib.boulder.ibm.com/infocenter/dzichelp/v2r2/index.jsp?topic=/com.ibm.db2.doc.apsg/qipfrr.htm

jaggernat

ASKER

>>>OPTIMIZE FOR 1 ROW tells DB2 to select an access path that returns the first qualifying row quickly

what happens if first qualifying row is returned quickly? i mean how does it improve performace.
why cant i use 'OPTIMIZE FOR 1 ROW ' for large result sets or can i?

thanks for responding!
J

SOLUTION

wbhinson

membership

This solution is only available to members.

To access this solution, you must be a member of Experts Exchange.

Start Free Trial

Kent Olsen

Hi J,

I don't get anything following your link. The page never displays.

Hi jaggernat,

Here's another link. I assume that it contains something very similar to what J posted. It's got a good desription of how/why OPTIMIZE works, though it may be just a tad technical.

Good Luck,
Kent

wbhinson

From IBM's web site:

OPTIMIZE FOR n ROWS Clause

The OPTIMIZE FOR clause declares the intent to retrieve only a subset of the result or to give priority to retrieving only the first few rows. The optimizer can then prefer access plans that minimize the response time for retrieving the first few rows. In addition, the number of rows that are sent to the client as a single block are bounded by the value of "n" in the OPTIMIZE FOR clause. Thus the OPTIMIZE FOR clause affects both how the server retrieves the qualifying rows from the database by the server, and how it returns the qualifying rows to the client.

For example, suppose you are querying the employee table for the employees with the highest salary on a regular basis.

SELECT LASTNAME,FIRSTNAME,EMPNO,SALARY
FROM EMPLOYEE
ORDER BY SALARY DESC You have defined a descending index on the SALARY column. However, since employees are ordered by employee number, the salary index is likely to be very poorly clustered. To avoid many random synchronous I/Os, the optimizer would probably choose to use the list prefetch access method, which requires sorting the row identifiers of all rows that qualify. This sort causes a delay before the first qualifying rows can be returned to the application. To prevent this delay, add the OPTIMIZE FOR clause to the statement as follows:

SELECT LASTNAME,FIRSTNAME,EMPNO,SALARY
FROM EMPLOYEE
ORDER BY SALARY DESC
OPTIMIZE FOR 20 ROWS In this case, the optimizer probably chooses to use the SALARY index directly because only the twenty employees with the highest salaries are retrieved. Regardless of how many rows might be blocked, a block of rows is returned to the client every twenty rows.

With the OPTIMIZE FOR clause the optimizer favors access plans that avoid bulk operations or interrupt the flow of rows, such as sorts. You are most likely to influence an access path by using OPTIMIZE FOR 1 ROW. Using this clause might have the following effects:

Join sequences with composite inner tables are less likely because they require a temporary table.
The join method might change. A nested loop join is the most likely choice, because it has low overhead cost and is usually more efficient to retrieve a few rows.
An index that matches the ORDER BY clause is more likely because no sort is required for the ORDER BY.
List prefetch is less likely because this access method requires a sort.
Sequential prefetch is less likely because of the understanding that only a small number of rows is required.
In a join query, the table with the columns in the ORDER BY clause is likely to be picked as the outer table if an index on the outer table provides the ordering needed for the ORDER BY clause.
Although the OPTIMIZE FOR clause applies to all optimization levels, it works best for optimization class 3 and higher because classes below 3 use Greedy join enumeration method. This method sometimes results in access plans for multi-table joins that do not lend themselves to quick retrieval of the first few rows.

The OPTIMIZE FOR clause does not prevent you from retrieving all the qualifying rows. If you do retrieve all qualifying rows, the total elapsed time might be significantly greater than if the optimizer had optimized for the entire answer set.

If a packaged application uses the call level interface (DB2 CLI or ODBC), you can use the OPTIMIZEFORNROWS keyword in the db2cli.ini configuration file to have DB2 CLI automatically append an OPTIMIZE FOR clause to the end of each query statement.

When data is selected from nicknames, results may vary depending on data source support. If the data source referenced by the nickname supports the OPTIMIZE FOR clause and the DB2 optimizer pushes down the entire query to the data source, then the clause is generated in the remote SQL sent to the data source. If the data source does not support this clause or if the optimizer decides that the least-cost plan is local execution, the OPTIMIZE FOR clause is applied locally. In this case, the DB2 optimizer prefers access plans that minimize the response time for retrieving the first few rows of a query, but the options available to the optimizer for generating plans are slightly limited and performance gains from the OPTIMIZE FOR clause may be negligible.

If both the FETCH FIRST clause and the OPTIMIZE FOR clause are specified, the lower of the two values affects the communications buffer size. The two values are considered independent of each other for optimization purposes.

Kent Olsen

Hi wbhinson,

That's the exact verbage from the link that I THOUGHT that I'd posted. :)

It's shaping up to be a long "rest of the week".......

Kent

jaggernat

ASKER

ok, i got it guys
but i still dont understand one thing. When to use OPTIMIZE FOR 1 ROW and when to not?

I am working on a web application where in my back end i am writing a select query and fetching only the first 200 records and displaying those 200 records on screen.

and hence this is how my query looks
select * from ...
where ...
ORDER BY ID
OPTIMIZE FOR 1 ROW
FETCH FIRST 201 ROWS ONLY

Therefore, eventhough the query might return 4000 rows but i am fetching the first 200 only
so is it a good idea to use OPTIMIZE FOR 1 ROW in above query.

In what scenarios do i AVOID using it? when i have large result sets?not sure..

thanks

SOLUTION

Member_2_276102

membership

This solution is only available to members.

To access this solution, you must be a member of Experts Exchange.

Start Free Trial

wbhinson

How about this link for a better explanation:

http://publib.boulder.ibm.com/infocenter/dzichelp/v2r2/index.jsp?topic=/com.ibm.db29.doc.pmtg/optnrow.htm

B

ghp7000

you can use the OPTIMIZE clause in small and large result sets, the impact will be more noticeable in large result sets---why, you may ask-the reason is simple
if you ask for all qualifying rows for a large result set, the dbm will most likely have to perform a sort or use a temp table to write out the result set before returning the data to the user-this means cpu overhead and disk i/o, not to mention bufferpool activity, agent utilization, page swaping etc etc. All of this is avoided when the optimize clause is specified becausethe result is immediately returned to the user as soon as that first row is retrieved, it is similar to using fetch first n rows only, except that with fetch, the query is processed in full and only the first n rows are returned.

ASKER CERTIFIED SOLUTION

Kent Olsen

membership

This solution is only available to members.

To access this solution, you must be a member of Experts Exchange.

Start Free Trial

ghp7000

hmmm, forget about my last post, the information is incorrect. Better you read this link:
http://publib.boulder.ibm.com/infocenter/db2luw/v8//topic/com.ibm.db2.udb.doc/admin/r0000879.htm?resultof=%22%6f%70%74%69%6d%69%7a%65%22%20%22%6f%70%74%69%6d%22%20%22%66%6f%72%22%20%22%6e%22%20%22%72%6f%77%73%22%20%22%72%6f%77%22%20

ghp7000

in other words, optimize is related to row blocking, that means fetching of the data buffers and is tied to the communication buffers, but I would think that it does also have an impact on the bufferpool.