Oracle Mod Function to Break up Large Table

Hi,

My question is general ... not specific.

Using the MOD function in Oracle sql, and returning ROWID to uniquely identify a row for update later, can you use a non-unique indexed column within the mod function and capture all rows within the table?
iBincAsked:
Who is Participating?
 
slightwv (䄆 Netminder)Connect With a Mentor Commented:
I don't think I understand the question.

If you are wanting to break up a large table into "buckets" to process them in smaller batches I probably wouldn't use MOD.  I would look at NTILE:  https://docs.oracle.com/database/122/SQLRF/NTILE.htm#SQLRF00680

I'm not understanding the part about the indexed columns.  Can you explain that a little more?
0
 
iBincAuthor Commented:
where mod(nonunique numeric column,4) = from 1 to 4  to retrieve one of the 4 sections
0
 
Pawan KumarDatabase ExpertCommented:
can you show few sample rows and expected output..

Do you want first row output of the 4 rows.. you have ...Cant we use ROW_NUMBER() OVER (PARTITION BY....)  rnk ...and then rnk = 1
0
Cloud Class® Course: C++ 11 Fundamentals

This course will introduce you to C++ 11 and teach you about syntax fundamentals.

 
iBincAuthor Commented:
Maybe, but I am looking for an explanation why we can't use MOD function and if a non-unique index can be used.
0
 
Pawan KumarDatabase ExpertCommented:
it depends on what you are doing. MOD will give you the remainder after the division thats all. So you may need to use this MOD function in combination with other or some ranking function to achieve what you are doing.
0
 
iBincAuthor Commented:
I don't think that is accurate. It does work, however I believe it is slower than possibly other methods. I'm looking for someone with knowledge about this function specifically.
0
 
iBincAuthor Commented:
The NTILE analytic function is much quicker than mod... I believe because pulling 4 using mod is doing full table scan 4 times but not sure. Same results in about 1/3 the time.
0
 
sdstuberCommented:
mod(nonunique numeric column,4)   --- this isn't going to split your table into approximately equal groups

Is that your goal?  The very fact that your column is non-unique should make that obvious.

Here's a simple example.

MY_NONUNIQUE_COLUMN
-----------------------------------------
1
1
1
1
1

All of those rows will be put into the same group based on mod(MY_NONUNIQUE_COLUMN,4)
0
 
sdstuberCommented:
>>> I believe because pulling 4 using mod is doing full table scan 4 times but not sure.

It would not, at least not with any normal usage.

You could contruct a query to force it to, but by default, that will not happen
0
 
sdstuberCommented:
>>>The NTILE analytic function is much quicker than mod...

You are probably experiencing a benefit from caching rather than an actual speed boost from the function itself.

In my  test, mod was faster, but not so much that I'd rely on that as benchmark for all use cases.

SQL> select rowid rid, ntile(4) over(order by n) from sample_data;

140640 rows selected.

Elapsed: 00:00:05.03

SQL> select rowid rid, mod(n,4) from sample_data;

140640 rows selected.

Elapsed: 00:00:04.94

Open in new window

0
 
iBincAuthor Commented:
Well just tested and it does divide in 4 very close to equal groups.... and turns out using mod is much quicker than using the analytic function.
0
 
sdstuberCommented:
>>> using mod is much quicker than using the analytic function.

if that is a reliable result it's not because of the function, it's likely because you're skipping a sorting step which ntile requires, but mod does not.

If your groups are nearly even then your data must be evenly distributed as well.
As a general rule for splitting data into groups though,  MOD is not reliable, but NTILE is
0
 
iBincAuthor Commented:
using mod took 4 min on 4 million rows... using the ntile took 12 minutes.
0
 
sdstuberCommented:
again, not likely due to the function, but extra sorting.

post your testing statements and their respective explain plans
0
 
Pawan KumarDatabase ExpertCommented:
We cannot say that NTILE is faster than MOD. Both are used for different purposes. Also it all depends on how you are using , what indexes you have , what you are trying to achieve? There are things that you can only achieve with MOD like remainder after division same  there are things which can only be achieved by NTILE/ROW_NUMBER etc... So it is not fair to compare these functions. Also note that for grouping/ranking etc..these function(ROW_NUMBER,RANK,DENSE rank ) are widely used to solve business problems.
0
 
iBincAuthor Commented:
Awarding points to slightwv because he did supply another solution even though that solution was not optimum for this case.
0
 
sdstuberCommented:
>>> Was the quickest.

think about that for a second...

mod(ora_hash())  is faster than mod()   ?

Calling MOD plus another function is faster than just calling MOD?  Some how the addition of extra work causes MOD to accelerate?
That doesn't make sense.

I believe you when you say you see faster times,  I don't believe though that you have constructed a valid test to make reliable benchmarking decisions.
0
 
iBincAuthor Commented:
You misunderstood. I said MOD was faster than the solution given to me by the experts, not that the hash mod did. You read out of context.
0
 
iBincAuthor Commented:
To be specific NTILE was the comparison and I'm not stating in all cases. I'm the one asking the question for a solution to MY problem. I found what worked for my situation with the optimum results.
0
 
sdstuberCommented:
>>> I ended up using this solution.  .... was the quickest

I'm not sure how I could read that post in any way other than what you said.
The solution you linked to in the post you said you used was the mod(hash())  which is, obviously, going to be slower than just mod().
The context was exactly one post.

So, if you ended up NOT using the post you linked to, but rather used MOD, that's GREAT. because MOD will be faster than MOD(HASH()).
So as long as you recognize the limitations of mod with respect to reliability and those limitations are acceptable, it's perfectly acceptable to use it.

>> I'm the one asking the question for a solution to MY problem.

You don't need to defend.  My comments were solely intended to make sure you understood the fastest answer as well as the most reliable answer and why they are not necessarily the same and why either is ok as long as the requirements are met.
0
 
iBincAuthor Commented:
That's cool. I found the best solution for my needs which was the purpose.
0
Question has a verified solution.

Are you are experiencing a similar issue? Get a personalized answer when you ask a related question.

Have a better answer? Share it in a comment.

All Courses

From novice to tech pro — start learning today.