Link to home
Start Free TrialLog in
Avatar of iBinc
iBincFlag for United States of America

asked on

Oracle Mod Function to Break up Large Table

Hi,

My question is general ... not specific.

Using the MOD function in Oracle sql, and returning ROWID to uniquely identify a row for update later, can you use a non-unique indexed column within the mod function and capture all rows within the table?
ASKER CERTIFIED SOLUTION
Avatar of slightwv (䄆 Netminder)
slightwv (䄆 Netminder)

Link to home
membership
This solution is only available to members.
To access this solution, you must be a member of Experts Exchange.
Start Free Trial
Avatar of iBinc

ASKER

where mod(nonunique numeric column,4) = from 1 to 4  to retrieve one of the 4 sections
can you show few sample rows and expected output..

Do you want first row output of the 4 rows.. you have ...Cant we use ROW_NUMBER() OVER (PARTITION BY....)  rnk ...and then rnk = 1
Avatar of iBinc

ASKER

Maybe, but I am looking for an explanation why we can't use MOD function and if a non-unique index can be used.
it depends on what you are doing. MOD will give you the remainder after the division thats all. So you may need to use this MOD function in combination with other or some ranking function to achieve what you are doing.
Avatar of iBinc

ASKER

I don't think that is accurate. It does work, however I believe it is slower than possibly other methods. I'm looking for someone with knowledge about this function specifically.
Avatar of iBinc

ASKER

The NTILE analytic function is much quicker than mod... I believe because pulling 4 using mod is doing full table scan 4 times but not sure. Same results in about 1/3 the time.
Avatar of Sean Stuber
Sean Stuber

mod(nonunique numeric column,4)   --- this isn't going to split your table into approximately equal groups

Is that your goal?  The very fact that your column is non-unique should make that obvious.

Here's a simple example.

MY_NONUNIQUE_COLUMN
-----------------------------------------
1
1
1
1
1

All of those rows will be put into the same group based on mod(MY_NONUNIQUE_COLUMN,4)
>>> I believe because pulling 4 using mod is doing full table scan 4 times but not sure.

It would not, at least not with any normal usage.

You could contruct a query to force it to, but by default, that will not happen
>>>The NTILE analytic function is much quicker than mod...

You are probably experiencing a benefit from caching rather than an actual speed boost from the function itself.

In my  test, mod was faster, but not so much that I'd rely on that as benchmark for all use cases.

SQL> select rowid rid, ntile(4) over(order by n) from sample_data;

140640 rows selected.

Elapsed: 00:00:05.03

SQL> select rowid rid, mod(n,4) from sample_data;

140640 rows selected.

Elapsed: 00:00:04.94

Open in new window

Avatar of iBinc

ASKER

Well just tested and it does divide in 4 very close to equal groups.... and turns out using mod is much quicker than using the analytic function.
>>> using mod is much quicker than using the analytic function.

if that is a reliable result it's not because of the function, it's likely because you're skipping a sorting step which ntile requires, but mod does not.

If your groups are nearly even then your data must be evenly distributed as well.
As a general rule for splitting data into groups though,  MOD is not reliable, but NTILE is
Avatar of iBinc

ASKER

using mod took 4 min on 4 million rows... using the ntile took 12 minutes.
again, not likely due to the function, but extra sorting.

post your testing statements and their respective explain plans
We cannot say that NTILE is faster than MOD. Both are used for different purposes. Also it all depends on how you are using , what indexes you have , what you are trying to achieve? There are things that you can only achieve with MOD like remainder after division same  there are things which can only be achieved by NTILE/ROW_NUMBER etc... So it is not fair to compare these functions. Also note that for grouping/ranking etc..these function(ROW_NUMBER,RANK,DENSE rank ) are widely used to solve business problems.
Avatar of iBinc

ASKER

Awarding points to slightwv because he did supply another solution even though that solution was not optimum for this case.
>>> Was the quickest.

think about that for a second...

mod(ora_hash())  is faster than mod()   ?

Calling MOD plus another function is faster than just calling MOD?  Some how the addition of extra work causes MOD to accelerate?
That doesn't make sense.

I believe you when you say you see faster times,  I don't believe though that you have constructed a valid test to make reliable benchmarking decisions.
Avatar of iBinc

ASKER

You misunderstood. I said MOD was faster than the solution given to me by the experts, not that the hash mod did. You read out of context.
Avatar of iBinc

ASKER

To be specific NTILE was the comparison and I'm not stating in all cases. I'm the one asking the question for a solution to MY problem. I found what worked for my situation with the optimum results.
>>> I ended up using this solution.  .... was the quickest

I'm not sure how I could read that post in any way other than what you said.
The solution you linked to in the post you said you used was the mod(hash())  which is, obviously, going to be slower than just mod().
The context was exactly one post.

So, if you ended up NOT using the post you linked to, but rather used MOD, that's GREAT. because MOD will be faster than MOD(HASH()).
So as long as you recognize the limitations of mod with respect to reliability and those limitations are acceptable, it's perfectly acceptable to use it.

>> I'm the one asking the question for a solution to MY problem.

You don't need to defend.  My comments were solely intended to make sure you understood the fastest answer as well as the most reliable answer and why they are not necessarily the same and why either is ok as long as the requirements are met.
Avatar of iBinc

ASKER

That's cool. I found the best solution for my needs which was the purpose.