Given integers X <= Y
What is the smallest N >= X such that Y is evenly divisible by N.

Here are few SQL attempts.

Very bad...

WITH target AS (SELECT 15000 y FROM DUAL), factors AS (SELECT n FROM ( SELECT y, LEVEL n FROM target CONNECT BY LEVEL <= y) WHERE MOD(y, n) = 0), numbers AS ( SELECT y, LEVEL x FROM target CONNECT BY LEVEL <= y) SELECT x, MIN(n) FROM numbers, factors WHERE n >= xGROUP BY xORDER BY x;

These are better, note performance of which is best varies between 11g and 12c.

WITH target AS (SELECT 15000 y FROM DUAL), factors AS ( SELECT y, COLLECT(CASE WHEN MOD(y, LEVEL) = 0 THEN LEVEL END) factors FROM target CONNECT BY LEVEL <= y), numbers AS ( SELECT LEVEL x, factors FROM factors CONNECT BY LEVEL <= y)SELECT x, (SELECT MIN(COLUMN_VALUE) FROM TABLE(factors) WHERE COLUMN_VALUE >= x) n FROM numbers;

WITH target AS (SELECT 15000 y FROM DUAL), numbers AS ( SELECT LEVEL x, CASE WHEN MOD(y, LEVEL) = 0 THEN LEVEL END n FROM target CONNECT BY LEVEL <= y) SELECT x, FIRST_VALUE(CASE WHEN n >= x THEN n END IGNORE NULLS) OVER(ORDER BY x ROWS BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING) FROM numbersORDER BY x;

WITH target AS (SELECT 15000 y FROM DUAL), numbers AS ( SELECT LEVEL x, CASE WHEN MOD(y, LEVEL) = 0 THEN LEVEL END n FROM target CONNECT BY LEVEL <= y) SELECT x, MIN(n) OVER(ORDER BY x ROWS BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING) FROM numbersORDER BY x;

What I'm looking for is something more efficient than what I already have.

I won't award points for effort.
If your query is demonstrably worse than any of these or any other posts that precede it please don't dilute the thread by posting it.

If you post a link, I will ignore it unless there is sufficient text posted along with it that the post can stand on its own. I'll simply delete it if egregious enough.

My measuring stick for superior/inferior is v$mystat. In particular the db time, gets, cpu, sorts and memory statistics. (i.e. autotrace but a little more thorough.)

If you don't know SQL but want to post the description of an algorithm; that's fine. I don't mind trying to implement it in SQL. Once implemented, if it's not as good as what's already been offered I'll say so.

declare @x int=5declare @y int=22declare @n int=6;with t as(select @x a,@y%@x bunion allselect a+1,@y%(a+1) from t where a<@y)select min(a) minNumber from t where b=0 and (@y/a)%2=0

declare @x int=5declare @y int=22declare @n int=6;with t as(select @x aunion allselect a+1 from t where a<@y)select min(a) minNumber from t where @y%a=0 and (@y/a)%2=0

The queries in the original post generated the correct N for all X 1..15000.
I didn't specify that as a requirement though.

However, it's an easy modification of one of my original queries to produce the same kind of single N result.

I just tested both of the t-sql snippets above when converted to oracle sql syntax. They appear to be identical in that the minor differences in execution time and resource consumption can be accounted for in standard variance of a live system.

However, in comparing 100 executions of this translated version of one of the above....

WITH t(a, b) AS (SELECT 67 a, MOD(15000, 67) b FROM DUAL UNION ALL SELECT a + 1, MOD(15000, (a + 1)) FROM t WHERE a < 15000)SELECT MIN(a) minnumber FROM t WHERE b = 0 AND MOD(15000 / a, 2) = 0;Test duration: +000000000 00:00:06.312939000STAT...CPU used by this session 630STAT...sorts (memory) 1,493,500STAT...sorts (rows) 1,493,400Total Latches: 4,676,314

to 100 executions of this alteration of one of my original queries...

WITH target AS (SELECT 15000 y FROM DUAL), numbers AS (SELECT LEVEL x, CASE WHEN MOD(y, LEVEL) = 0 THEN LEVEL END n FROM target CONNECT BY LEVEL < y) SELECT * FROM (SELECT x, FIRST_VALUE(CASE WHEN n >= x THEN n END IGNORE NULLS) OVER(ORDER BY x ROWS BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING) FROM numbers) WHERE x = 67;Test duration: +000000000 00:00:02.021022000STAT...CPU used by this session 202STAT...sorts (memory) 200STAT...sorts (rows) 1,500,000Total Latches: 484

While this particular path is not nearly as efficient, I do appreciate the idea of a recursive CTE as opposed to the connect by level method of generating values.
I will continue to explore that and see if I can come up with something better.

0

At Springboard, we know how to get you a job in data science. With Springboard’s Data Science Career Track, you’ll master data science with a curriculum built by industry experts. You’ll work on real projects, and get 1-on-1 mentorship from a data scientist.

The recursive with clause is the big performance killer. Simply generating numbers 1-15000 via the recursive cte vs connect by level add significant weight.

So, continuing to adapt my other queries to use the recursion, I reversed the t-sql above to use the oracle connect by syntax with significant results.

WITH t
AS (SELECT 67 + LEVEL - 1 a, MOD(15000, 67 + LEVEL - 1) b
FROM DUAL
CONNECT BY LEVEL <= 15000)
SELECT MIN(a) minnumber
FROM t
WHERE b = 0 AND MOD(15000 / a, 2) = 0

running 100 times...

Test duration: +000000000 00:00:01.006264000STAT...CPU used by this session 101STAT...sorts (memory) 100STAT...sorts (rows) 100Total Latches: 542

Sounds good. Unfortunately, I do not have Oracle. There is another table in SQL server that can give numbers for comparison. But I am sure it does not exist in Oracle so did not propose. Essentially, min should do the trick. Let me see if I can think of anything better in SQL that you can adopt/adapt.

Starting with my number (67 in these test cases) I simply walk forward until I reach my N and then stop, then take whatever the last number I found.

It's dirt dumb but also the most efficient thus far because I'm not trying to find all of the factors, and then sifting through to find the one I want. I arrange the search so the first one I find will be the N I'm looking for.

SELECT MAX(a) FROM ( SELECT 67 + LEVEL - 1 a FROM DUAL CONNECT BY LEVEL <= 15000 AND MOD(15000, 67 + LEVEL - 2) != 0)

This would be faster only if N is closer to lower bound.

no, that was my thought too and why I didn't do that originally.

but: if X = 1 or X = 15000 or X is anything in between, all of the posts above generate all of the factors between X and 15000 by "walking" from X to each new factor.

So, that effort is already being taken by the previous ideas; but the problem with them is that after walking from X to N, they keep on walking looking for other factors that we already know we're going to throw away.

100 executions of the previous best
(note I removed the extra mod, it wasn't needed and, in this case resulted in wrong results anyway)

WITH t AS ( SELECT 7501 + LEVEL - 1 a, MOD(15000, 7501 + LEVEL - 1) b FROM DUAL CONNECT BY LEVEL <= 15000) SELECT MIN(a) minnumber FROM t WHERE b = 0Test duration: +000000000 00:00:03.609293000Total Latches: 580

rather than walking up from X to N
walk down from Y/X to Y/N

WITH setup AS (SELECT 15000 y, 7501 x FROM DUAL), divisors AS ( SELECT y, FLOOR(y / x) - LEVEL + 1 d FROM setup CONNECT BY MOD(y, FLOOR(y / x) - LEVEL + 2) != 0) SELECT y / MIN(d) FROM divisors GROUP BY y;Test duration: +000000000 00:00:00.005595000STAT...CPU used by this session 1Total Latches: 158

WITH setup AS (SELECT 15000 y, 67 x FROM DUAL), divisors AS ( SELECT y, FLOOR(y / x) - LEVEL + 1 d FROM setup CONNECT BY MOD(y, FLOOR(y / x) - LEVEL + 2) != 0) SELECT y / MIN(d) FROM divisors GROUP BY y;Test duration: +000000000 00:00:00.010635000STAT...CPU used by this session 2Total Latches: 157

As expected the scaling is different here. Depending on the actual Y and X values the number of steps UP or DOWN to walk will be more or less expensive.

I haven't determined if there is an efficient way to pick before starting to walk or if that effort will be greater than the savings of picking the wrong one.

Most of the time walking down will be the shorter path so I'll default to that for now.

I am not certain that I grasp completely what you are trying to do.
Let's say Y = 1024
And x = 32

You are looking for the SQL that would return N=64?
(the next factor of 1024 that is bigger than 32)

Is that right?

Because if that is what you are after, then a table of Prime numbers that are less than Y, if statically entered would probably aid in efficiency. You'd start factoring from 2 up until you got to X. From there, you'd multiply factors together until you got something greater than X

no, N >= X So in your example N=32 because 32 is a factor of 1024

Generating all factors by multiplying combinations of primes sounds good, the problem is either generating the list of primes or reading them from a source. You'd either have to know already how many primes to use or you'd have to include a square root calculation to make sure you didn't get too many. Furthermore, simply having a list of primes isn't enough, you also have to know how many times each prime can be included in factors. For 1024: 2 is the only prime factor, but it's a factor 10 times.
I happen to know that but determining all prime factors and the number of each in all combinations of subfactors is not trivial for a general solution.

Dynamically multiplying within sql is not a cheap operation either.

Here's my best attempt at implementing your algorithm.
Rather than introduce physical IO, I took advantage of the fact that I know 2 is the only prime factor of 1024. Obviously this is impractical as a general solution but I wanted to give this the best chance possible.

Here though is where things start to go bad. Generating all of the factor combinations is not trivial. Fortunately Oracle has the POWERMULTISET function to do that for us; but it requires a collection object to operate on. So I built that into the starting point.

Next though is the actual multiplication. There is no direct means of multiplying all numbers in a set. So we use a math fact that EXP(SUM(LN())) will be the product. This is convenient syntax but is a lot of work for the cpu. We could write a separate pl/sql function or a user defined aggregate but those would be even more expensive than using the built in functions.

Once we have all the factor product combinations then it is a simple matter to compare those to 32 and pick the smallest product that is a factor of 1024.

WITH prime_factors AS (SELECT ora_mining_number_nt(2,2,2,2,2,2,2,2,2,2) f from dual)SELECT MIN(f) FROM (SELECT rn, FLOOR(EXP(SUM(LN(y.COLUMN_VALUE)))) f FROM (SELECT ROWNUM rn, COLUMN_VALUE s FROM TABLE(POWERMULTISET((SELECT f FROM prime_factors)))) x, TABLE(x.s) y GROUP BY rn) WHERE f >= 32 AND MOD(1024, f) = 0

However, as anticipated, even with an artificial head start this is simply too expensive and takes nearly ten seconds to complete the test run of 100 executions.

Test duration: +000000000 00:00:09.772921000STAT...CPU used by this session 975Total Latches: 7,949

I welcome suggestions on how to implement this method more efficiently; but I have my doubts it will ever be able to compete because it has too many expensive preliminary steps to create the data before it even starts evaluating.

I'd like to understand:
do the rules of this question exclude for loops?
what is this for? (a business problem or a theory exercise)
& what are "the expected results" (are you really looking for 150,000 rows of 2 columns?)

based on 3rd code block in question just nesting that final select and summarizing

WITH target AS ( SELECT 15000 y FROM DUAL ), numbers AS ( SELECT LEVEL x, CASE WHEN MOD(y, LEVEL) = 0 THEN LEVEL END AS n FROM target CONNECT BY LEVEL <= y )SELECT n , min(x) , max(x) , count(*)FROM ( SELECT x , FIRST_VALUE(CASE WHEN n >= x THEN n END IGNORE NULLS) OVER ( ORDER BY x ROWS BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING ) AS n FROM numbers )GROUP BY nORDER BY n DESC;

If you can get pl/sql to generate an answer with loops faster and with less cpu/memory or other resource consumption then by all means go for it.

The final result must be sql accessible though, so if you generate a function
that I call by "select your_function(x,y) from dual" - the cost of the sql & pl/sql context switch (albeit minor) would be part of the cost of the algorithm.

I don't understand the intent of your posted code. I can, obviously, see the results and understand the logic; but given the examples already posted. I'm not sure why you're pursuing a code path with those extra calculations. It doesn't produce anything at all like the results I've posted. You don't need to explain it though, it doesn't produce what I'm looking for.

I currently have no business use. However a couple of place it could apply would be tiling rectangular areas or determining a fixed width for text given a minimum width requirement.

For example: I have 120 one-foot square tiles to layout for an area. I have a 7-foot long object that could be anywhere in my tiled area. What is the minimum width I can use and generate a rectangular area? Answer: 8. That's a simple one, most could probably do it in their head, I'm looking for the most efficient means of generating that answer for a general case.

Expected results: The question pretty much says it. Given X and Y, find N.

My original queries in my question listed the answer N for each X of 1 to Y.
And that was what I was looking for.

However I didn't actually specify generating everything as a requirement and sameer2010, the first responder, proposed a solution that would solve for a specific X.

This a more practical use case and is the target I've been pursuing as well since then.

If you'd like to attempt an answer, my post http:#a39471023 is currently the best general solution for finding the correct N for a given X and Y.

Do note, as I mentioned above, it is possible that for a minority of X for a given Y, it is possible that searching UP will be more efficient than searching DOWN for N. So, if you find something that happens to work well for some values but is worse than the above for most others then it's not really a candidate.

I am testing all queries with the literal values embedded. This is, of course, somewhat of a cheat because the final solution, if implemented in real code would use bind variables. However, this is an exercise in curiosity so I'm willing to accept that and I'd like to give each proposed solution its best possible chance and giving the optimizer all required information is as good as we can do.

I have been using Y=15000, X=67, with expected result of 75 in my test cases to provide a common baseline. I do test other values as well. If I see a significant performance variance by using other values I'll note it in my responses.

Thanks, got it (in spades:) - wasn't expecting such a comprehensive reply, but appreciated.
Only purpose of the result was to indicate 'waste' and I wasn't just repeating your own code as my suggestion either in case that wasn't clear.

working 'up' to a minimum makes sense too. Cheers, Paul.

Note for anyone using sqlfiddle, be sure to post your code in the thread as PortletPaul did in his example. An answer that only exists via link isn't much use if the link goes stale.

Also, all of the answers above finish sub-second. Even the worst of them is only about a tenth of a second. So, the network latency and lag as well as html rendering time will completely negate tracing of efficiencies.

Using that or similar sites to test your method works is great; but since I already know how to get the correct answer, simply working isn't sufficient. I will only accept the answer that is demonstrably and repeatably superior to all others.

I don't want to discourage participation; but I don't want to delude anyone into thinking that just because your answer is correct and "fast" that it will be accepted.

I've already demonstrated that I can write fast but inferior methods in multiple ways. :)
I don't need to know additional ways to NOT do it.
So now I'm only looking for the best.

Also note I will be testing the code on various machines of wildly different computing capacity with versions spanning 11.2.0.1 to 12.1.0.1. If you happen to find an answer that works well on one version but not on another that can still be a candidate answer. The current best version is the best on everything I've tried; but my original "best" answers differed based on version 11 vs 12.

Alright.
I thought about it and played with it.
Lists of prime numbers were interesting, but not useful.
On the other hand, I generated a table of factors for the first 1,000,000 integers.
Took ~6 minutes in MS Access VBA
Here it is.
It's a big file, though.
41 MB

Haul it down and mount it up.
Index both columns, if you'd like.

At that point
SELECT TOP 1 TheFactor from tblOrderedfactors where theNumber = 15000 and TheFactor >=67;"
runs in an execution time that my desktop cannot effectively measure.

Theoretically, I'd think this would have to be the most efficient manner in terms of CPU time and execution time -- but who knows, maybe the disk time, and CPU time to get it from disk would be more expensive than computing it.

I doubt it, but until you conduct the experiment, you won't know for sure.

Factors for the first 1M integers generates an 11M row table. If that's overkill (and it is, but it is fun to generate tables that big!) let me know what upper bound you'd prefer and I'll generate and upload it. OrderFactors1M.zip

Running it over in my head, and grinding a few numbers, I now doubt very much that an algorithm to strictly calculate N by walking is the best solution. Worst case scenario for 1M factors is Y = 999998 and X = 3.

499999 is a prime number, so a straight walk would run through almost a half-million operations before coming to a solution.

And that isn't really an edge case. Of the numbers between 1 and 1M, 553531 of them have 5 factors or less, so more often than not, walking is going to very expensive.

(Generating the table once for use in the general case is likely the most efficient solution. Think how long generating ALL the solutions for Y = 1 to 15000 and X = 1 to 7500 would take! Straight select from a generated table would be the winner.)

I am thinking that, if a person were to pursue an algorithmic solution, a progression using a table of primes might be more efficient in the general case than walking. A person is looking for the smallest upper bound ( that can be cheaply discovered) that N can be, which would either be the next prime number large than X or the product of all the prime factors of Y incrementally factored out at each step. From there, you'd walk down to X. Such an algorithm would minimize the number of steps taken in the general case. You'd start by loading Y into a variable. Check if Y is prime. There's no point in doing anything else if it is! You'd divide by the first prime (2) until there is a remainder, or the result < X, in which case multiply the smallest factored prime back in, and the result is the upper bound for N. If you succeed in factoring out primes, multiply each of them together. If that product > X, that is the upper bound for N. If the result or the product is X, then the solution is found. If a remainder is found when attempting to factor, check if the original number is prime. Stop if it is. Further factoring will be useless.

Now the question becomes what to do on the iterations. To work from the top end, would require finding the largest prime < TheResult/TheNextSmallPrime. Which is just factoring from the low end, anyway, so that doesn't make any sense. Working the low end IS simultaneously working the high-end. So what about working the middle? Find the square root of the result, and test the first prime smaller than it and the first prime larger than it.

Primes thin out moving away from zero. The efficacy of working the middles as a means to cut the walk down is going to be a function of the factoring result. The larger the result, the more likely a walk from zero up is will take more steps. So an algorithm would need to work that in. Probably a result > 2000 is when a shot at the middle (because square root is expensive) might become worthwhile.

I had a lookup table already and while I have plenty of space to store it I was looking for something a little more generic. I failed to specify this in my question though so I'll give credit for it unless something else runs better.

The table based lookup IS faster but requires more latches across the board, but far less than the worst cases.

I'll work up some kind of summary to display. I don't have a good way to show the pattern across results.

I already have all primes up to 1000000 in a table as well. That's not really helpful though since I still need to assemble them into factors for a given Y. I tried the prime assembly before and it's awful.

I do have another problem though. I tried solving for larger Y values and the table simply ran out of values and the connect by ran out of memory.

Of course, I could make the table bigger and I could add more memory or, bite the bullet and go back to the recursive with clause. It's more expensive in processing but tends to not hit the memory limits as soon as the connect by.

The other option would be to go with pl/sql. I'll try that too; but I'm thinking the question may be winding to a conclusion because it's looking like these implementations are forcing me to compromise.

>>I'm thinking the question may be winding to a conclusion
This is a suggestion...
have you thought about using MODEL?

It's not something I have used and was "thinking of investigating" but I've seen claims that suggest it may be fast, and the problem type does seem to suit a "formula".

& is java out of the question? (not volunteering, just hinting)

java is acceptable but a problem would be measuring it accurately against the sql operations.

v$mystat and v$latch give me solid metrics to compare in an apples-to-apples way.

I can compare overall execution time; but I am interested in how expensive it is to run.
I'm open to suggestions on how to compare reliably, especially since we're already in the realm of micro-to-milliseconds.

I'm leaning toward a pl/sql solution because it just occurred to me that whatever the final best method is, I'll probably wrap it in a function anyway. It's possible the pl/sql will simply be

begin select .... into THE_ANSWER from....; return THE_ANSWER; end;

but, knowing there will be at least one layer of pl/sql involved; if anyone was discouraged from trying that before; please do give it a shot.

I'm willing to give MODEL a try, if nothing else to try to resolve the memory errors when solving large values with connect by.

As it stands now, a precalculated lookup table is still the fastest so far but sometimes heavier, sometimes lighter on resource consumption with the obvious limitation that it's only useful up to its largest value.

:)
With an algorithm in place to calculate a table of whatever size you might want on a non-production machine, it becomes a question of disk space.

I don't do LISP, but the underpinnings of this are very LISP-y. The factors of 1024 are a function of 512, 512's of 256, 256's of 128...and so on. LISP is very good with recursive stuff like that. You may want to look up a place to post LISP questions and pose this problem.

It also is heavily involved with number theory, prime factorization and all being the root of what needs doing. Posing the question in a mathematics forum may also yield some interesting results.

I am going to keep playing with it. It's pretty clear that a straight walk will have occasions where it will be a hideously poor solution. An algorithm that makes use of a table of prime numbers and perhaps a hint table of numbers whose first prime factor is than some arbitrary value may perform reasonably well. I have to build that first, though, before you can test it

It's already cross-posted to the math & science topic area.

I haven't touched lisp in 20+ years; but even if I were proficient in it I don't think it'd be a good fit here because I'm looking for something that can can be invoked from the database.

I suppose it might be possible to write an external procedure call via a compiled library; but if I go that route, I'd just write it in c and I can definitely recurse in c efficiently; but I'd probably implement it with iteration instead simply to avoid stack overhead.

This method is so fast that there is no reason to store a large list of primes on a computer--an efficient implementation can find them faster than a computer can read from a disk.

Now, I poked into these webpages AFTER I built the upcoming sample, and earlier you had mentioned a square root calculation--and that turns out to be a critical bit. Take 100M. The square root of that is 10K. There are 1230 prime numbers between 2 and 10007 -- which is makes for a fairly small table. From there, you need a routine to systematically factor out and record prime numbers starting at 2, until either there's nothing left to factor, or you run out of primes. Any number that doesn't have a factor less that it's own square root is going to be prime. Then you need a routine to create the compound factors and it's off to the races

So, with a few lines of VBA and a table of 1230 rows, you've got something that can answer your question in general for Y <100M. Not bad. I fired a random number at it, 23145691 -- figuring that ~23M ought to be a good test. Turns out that number is prime.

My system isn't able to measure precisely enough to tell me how long it takes to execute. Zero is what it says. VBA code below. Command8_Click fires it off. Text1 is Y. Text3 is X
Text5 is N.

Private Sub Command8_Click()Dim db As DatabaseDim rs As RecordsetDim thefactors() As LongDim x As IntegerDim y As IntegerDim Candidate As DoubleDim Test As DoubleDim BiggerThanN As LongDim TheStart As DoubleDim TheEnd As DoubleTheStart = NowCandidate = Me.Text1.ValueReDim thefactors(100)Set db = CurrentDbSet rs = db.OpenRecordset("select * from tblSmallPrimes;", dbOpenDynaset, dbSeeChanges)x = 0Do Until rs.EOF Test = Candidate Do Until CLng(Test) <> Test Or Test < rs!ThePrime Test = Test / rs!ThePrime If CLng(Test) = Test Then Candidate = Test thefactors(x) = rs!ThePrime x = x + 1 End If If Candidate = 1 Then Exit Do Loop If Candidate = 1 Then Exit Do rs.MoveNextLoop'ok, to get here what's left has to be a prime number > 10007'Q. why does it have to be prime?'A. If it wasn't, it would have a factor that was already triedIf Candidate <> 1 Then thefactors(x) = CandidateEnd IfReDim Preserve thefactors(x)thefactors() = BuildTheFactors(thefactors())'ok now I have the CompoundFactor'now what's the first one equal to or greater than x?rs.Closedb.Execute ("delete * from tblCompoundFactors;")Set rs = db.OpenRecordset("select * from tblCompoundFactors;", dbOpenDynaset, dbSeeChanges)For x = 0 To UBound(thefactors) If thefactors(x) <> 0 Then With rs .AddNew !ThePrime = thefactors(x) .Update End With 'MsgBox thefactors(x) End IfNext xrs.CloseSet rs = db.OpenRecordset("select top 1 * from tblCompoundFactors where theprime >= " & Me.Text3 & ";", dbOpenDynaset, dbSeeChanges)Me.Text5.Value = rs!ThePrimeTheEnd = NowMsgBox TheEnd & " " & TheStartEnd SubPrivate Function BuildTheFactors(thefactors() As Long) As Long()Dim CompoundFactors() As LongDim InThere As BooleanDim w As IntegerDim x As IntegerDim y As IntegerDim z As IntegerDim LastCompound As Integer'ok, we take every element of the array passed in'we multiply it by every value already in the output array'when we are done we have the compound factorsReDim CompoundFactors(500)CompoundFactors(0) = thefactors(0)y = 1z = 0LastCompound = 0For x = 1 To UBound(thefactors) For z = 0 To LastCompound 'check if thefactors(x) * CompoundFactors(z)is already in CompoundFactors() InThere = False For w = 0 To 500 If CompoundFactors(w) = thefactors(x) * CompoundFactors(z) Then InThere = True Exit For ElseIf CompoundFactors(w) = 0 Then Exit For End If Next w If InThere = False Then If Me.Text1.Value Mod (thefactors(x) * CompoundFactors(z)) = 0 Then CompoundFactors(y) = thefactors(x) * CompoundFactors(z) y = y + 1 End If End If Next z 'check if thefactors(x)is already in CompoundFactors() InThere = False For w = 0 To 500 If CompoundFactors(w) = thefactors(x) Then InThere = True Exit For ElseIf CompoundFactors(w) = 0 Then Exit For End If Next w If InThere = False Then CompoundFactors(y) = thefactors(x) y = y + 1 End If LastCompound = yNext x'For x = 0 To UBound(CompoundFactors)' If CompoundFactors(x) = 0 Then ReDim Preserve CompoundFactors(UBound(CompoundFactors))' End If'Next xBuildTheFactors = CompoundFactorsEnd Function

It requires a table, tblSmallPrimes, of the first 1230 primes, and another table, tblCompoundFactors, to throw the results of the factorization in.

I could knock the VBA into T-SQL if you'd like. What's being done with tables, arrays and recordsets can get done with @variables, sprocs, and table variables.

This sort of leaves a bad taste in my mouth but it works.

A simple loop d:= d-1 or n:=n+1 is very fast, I can walk up or down very quickly.

With a walk down limit of 100000 (100K) I can quickly do many (but not all) factors up to 10 billion. If I can't do that, but Y is a million or less, again, I simply walk up. Yes, in a really bad case I might walk hundreds of thousands of steps, but it's a very tight loop and fast.

If I can't do either of those, then I bite the bullet and read from a table of primes and do the big ugly powermultiset I had already ruled out earlier.
It's still bad, but it will only be used for monstrous values and only in the worst cases where a walkdown isn't possible.

It still has the limitation that I only have primes up to 10 million.; but that should be more than sufficient for any real world cases I might run into. I do have a fast prime generator and could use it instead of a fixed table and generate them on the fly if I really want to extend it, the limit then simply being memory, or, of course, just make a bigger table.

The 100K and 1M boundaries are just educated guesses. I don't think there is a "best" number for all cases but those seem to work for me on one of my beefier machines. On less powerful the brute force threshold is lower.

CREATE OR REPLACE FUNCTION next_factor(x IN INTEGER, y IN INTEGER) RETURN INTEGERIS n INTEGER; d INTEGER;BEGIN IF y / x < 100000 THEN --- It's a small divisor to N, simply walk down to find it d := FLOOR(y / x); WHILE MOD(y, d) != 0 LOOP d := d - 1; END LOOP; n := y / d; ELSIF y <= 1000000 THEN -- As long as Y less than a million we can walk up pretty quickly n := x; WHILE MOD(y, n) != 0 LOOP n := n + 1; END LOOP; ELSIF y <= 9999991 * 9999991 THEN -- Y is too big and the factor is too far away to walk -- So lookup from primes SELECT MIN(factor) INTO n FROM (SELECT rn, ROUND(EXP(SUM(LN(f.COLUMN_VALUE)))) factor FROM (SELECT ROWNUM rn, COLUMN_VALUE factors FROM TABLE( POWERMULTISET( (SELECT CAST(COLLECT(n) AS ora_mining_number_nt) FROM primes_lt_10million WHERE n * n <= y AND MOD(y, n) = 0) ) )), TABLE(factors) f GROUP BY rn) WHERE factor >= x; ELSE -- Too big, can't do it RAISE VALUE_ERROR; END IF; RETURN n;END;

where y random integer between 1000 and 10000000000
x random integer between 1 and y

00:00:12.111905000

So, for very bad cases with very large numbers it slows down to about a tenth of second.
On average though it runs about one one-hundred-thousandth of a second.

by the way, except for very large numbers, it definitely IS possible to generate primes faster than you can read them. I created a table of primes less than 10 million simply because that was the rough tipping point where I couldn't outpace the io.

I've been building a modified sieve of Eratosthenes to create the compound factors without reference to a table of primes. It is markedly slower than referencing the table

Private Sub Command9_Click()'implement a modified sieve of Eratosthenes to find the first factor of a number that is larger than a second numberPrivate sub Eratosthenses()Dim CompoundFactors() As LongDim theY As LongDim theSQR As LongDim x As LongDim y As LongDim N As LongtheY = Me.Text1.ValueReDim CompoundFactors(theY, 1)For x = 2 To theY If CompoundFactors(x, 1) = 1 Then GoTo alreadychecked End If If theY Mod x <> 0 Then For y = 1 To theY If x * y < theY Then CompoundFactors(x * y, 1) = 1 Else GoTo bigenough End If Next y Else N = x If (N >= Me.Text3) And (theY Mod N = 0) Then Me.Text5 = N GoTo done End Ifbigenough: End Ifalreadychecked:Next xdone:MsgBox "done!"End Sub

With a table of 10M primes, you can find the prime factors of numbers less than 100 trillion using the algorithm I posted! That REALLY ought to be enough for real world situations. Building the list of compound factors once you have the prime factors is computationally trivial. For numbers less than 2^31 you will have a walk of 2^15.5 (~46000) in the very worst case scenario to find all the prime factors. Worse case scenarios for finding compound factors for numbers in the same range are less that 525 factors.

Beats the tar out of walking a 100K loop, no matter how tight!

What do you have for prime generator code? Is it SQL? What I've seen so far is loading up an array of Y elements and then zeroing out the multiples of each prime in order, until you reach the upper bound of the array. Doesn't strike me as efficient -- and it doesn't perform nearly as well as looking up a table of the first square root Y primes.

I'll have to do a run of Y = 1K to 2 M and x = 3 to 5K with what I've got and see how long that takes.
And kick it over to T-SQL and see how that does.

My generator is a modified Eratosthenes with the Atkin Modulus 60 check. I wrote it in pl/sql. If I was really serious about the prime generation speed I'd do it in c as an external procedure.

I seed the first few primes into a collection.
Then loop through odd numbers.
For each I do the Atkin Modulus 60 check.
If one of the possible remainders is found then I iterate through my collection of primes up until I either find a factor of the current number or I exceed the square root.

If I find a factor that number is not prime, otherwise it is and I add it to my collection.
I repeat until I fill a collection of all primes up to some limit.

Building the list of compound factors once you have the prime factors is computationally trivial.

Syntax wise, yes, that's what powermultiset does. Cpu, no. There's quite a bit of work involved generating the combinations. But I agree it's fast. That's why I cap my walk up loops at 1 million. Noting that is still only the 2nd choice.

The walkdown method is still the fastest and least resource intensive for many values.
For any X>Y/2, the walkdown is 1 step. So, half of all numbers, regardless of how large will always be trivial. For X>Y/3 walkdown is only 2 steps. So two-thirds of all numbers are trivial. X>Y/4 = 3steps, three-fourths of all numbers are still trivial, etc.

It's just a matter of finding that threshold where it stops being trivial and then do some real work.

It's just a matter of finding that threshold where it stops being trivial and then do some real work.

I alluded to this earlier but it's critical to the evaluation. The "threshold" is entirely dependent on the hardware. The machine I was testing on last night, I'm willing to loop up to a million rather than do some io. One of the machines I could test on today, I'd probably cap my walk down limit at 20K and walk up limit on Y at 100K then switch to the more intensive prime and factor search.

Beats the tar out of walking a 100K loop, no matter how tight!

yes once you're done, but you have to absorb the work of finding those primes first (which can be helped by precomputing if very large) then generating the factors.

Even on the smallest box I've been using, the speed of any of these is still solidly sub-second. I really am quibbling out at the 4 and 5th decimal points and that's when I start looking at the cpu, io and latches involved. I quit including them in my results posts because they are measures that only apply within Oracle. If you're testing within other frameworks there would be similar metrics but it would be impossible to correlate them meaningfully.

So, some combinations of X ,Y ,version and hardware with a particular method will knock off an extra few microseconds, maybe even a whole millisecond and I do care about those gains; but if the cost of that speed is excessive resource consumption then I care about that too.

What I've got above with the "flip-flop" of switching algorithms seems, to me, to be the best. I still find it distasteful though.

All three methods are still basically a brute force exhaustive search. I was hoping for something a little smarter. Something that could put some computational caps.

Kind of like the walk down factors. Simply reversing the direction of the search has a huge impact; but then it eventually fails and walking up will be better.

Both of those will still fail though for large primes or numbers so large that there is a prohibitively large gap between factors.

Searching for primes lets us scan only a few numbers but requires a lot of work to find those numbers and it too is an exhaustive search. All possible factors will be generated.

Maybe a more "clever" approach isn't possible because at some point we are forced to do prime factorization and that forces me to impose an arbitrary practical limit rather than a platform limit and that's really what's bugging me. I don't particularly care that trillions will almost certainly be big enough. The platform can support numbers into the duodecillions and, with the walk downs, it is possible to successfully iterate over most values of X and Y. But "most" still leaves 9 decillion values that the methods above can't handle. 9 decillion is a pretty big hole, even if it is only 0.00001 of the entire population.

Another way to look at it - the walk downs with a 100K limit are already solving 99.999% of all values the platform can support. So, it seems like we're almost there. The walk up method is pretty dumb but still captures the bulk of the remaining "real world" values.

The primes are the most complicated and really, only extend the range of supported values by about 0.000000000000000000000001% of the entire population.

So, yes, the primes do make the "real world" realm larger; but since we were already 99.999% done, what they add in terms of total platform support is inconsequential.

We're essentially running into the Pareto principle taken to a few more decimal places.

hmmm, just thought of a problem with my factor generation.

I'm not taking into account repeats.
Simply having all of the primes and multiplying them in various combinations isn't sufficient to get all of the factors.

Like in the 1024 example above, there is only one prime, but there are 10 factors because we reuse that same prime.

Since the prime search and factor generation doesn't really add much value, I'm tempted to simply remove them altogether and just let them fail.

It's not so much that I want to support a particular large value, like duodecillions,
as simply not wanting to impose a smaller value.

I'd like the method to be viable up to the platforms limitations.
That is, I want Oracle's code to fail before mine does.

If that means supporting umpteen gajillion, then so be it.

I'm coming to the conclusion though, that it's not likely to be feasible. So then my quandary is: since I can't solve everything, how much complexity do I want to include simply to make a microscopically marginal improvement.

This puts prime and factor generation on the cutting block for low ROI.

It's a little trickier than that.
You walk through the result set of the primes

1. You add the first element to the result set of the compound factors
2. Move to the next element of the prime factors
3. You multiply the elements in the compound factors by the present prime factor
4. If the result is not in compound factors, add it
5. When you've multiplied the present prime by all the factors in compound factors, see if the present prime is in compound factors. If it is not, add it
6. Move to the next prime
7. do 2-6 until you come to the end of the prime result set
8. Order the results

Then find the first one >= X

Here it is as VBA

Private Function BuildTheFactors(thefactors() As Long) As Long()Dim CompoundFactors() As LongDim InThere As BooleanDim w As IntegerDim x As IntegerDim y As IntegerDim z As IntegerDim LastCompound As Integer'ok, we take every element of the array passed in'we multiply it by every value already in the output array'when we are done we have the compound factorsReDim CompoundFactors(1500)CompoundFactors(0) = thefactors(0)y = 1z = 0LastCompound = 0For x = 1 To UBound(thefactors) For z = 0 To LastCompound 'check if thefactors(x) * CompoundFactors(z)is already in CompoundFactors() InThere = False For w = 0 To 500 If CompoundFactors(w) = thefactors(x) * CompoundFactors(z) Then InThere = True Exit For ElseIf CompoundFactors(w) = 0 Then Exit For End If Next w If InThere = False Then If Me.Text1.Value Mod (thefactors(x) * CompoundFactors(z)) = 0 Then CompoundFactors(y) = thefactors(x) * CompoundFactors(z) y = y + 1 End If End If Next z 'check if thefactors(x)is already in CompoundFactors() InThere = False For w = 0 To 500 If CompoundFactors(w) = thefactors(x) Then InThere = True Exit For ElseIf CompoundFactors(w) = 0 Then Exit For End If Next w If InThere = False Then CompoundFactors(y) = thefactors(x) y = y + 1 End If LastCompound = yNext xReDim Preserve CompoundFactors(UBound(CompoundFactors))BuildTheFactors = CompoundFactorsEnd Function

Here it is as T-SQL, using the 1230 row prime table
You may be able to adapt and optimize it for your purposes.
And replace the prime table with code to generate those on demand

ALTER PROCEDURE [dbo].[qryFindFactor] -- Sproc to find the first factor of Y larger or equal to X --Present upper limit Y < 100M @y int = 0, @x int = 0ASBEGIN -- SET NOCOUNT ON added to prevent extra result sets from -- interfering with SELECT statements. SET NOCOUNT ON; -- Insert statements for procedure here Declare @test int Declare @initialY int Declare @ThePrime int Declare @ThePrimeID int Declare @theMax int Declare @TheCompound int Declare @theProduct int Declare @theCompoundMax int Declare @TheCompoundID int Declare @theCompoundMaxID int Declare @PrimeFactors table (RowNum int IDENTITY(1, 1) UNIQUE CLUSTERED , TheFactor int) Declare @CompoundFactors table (RowNum int IDENTITY(1, 1) UNIQUE CLUSTERED , TheFactor int) Select @theMax = max(thePrime) from tblSmallPrimes select @ThePrime = min(ThePrime) from tblSmallPrimes select @initialY = @y while @ThePrime <> @theMax begin if @y % @ThePrime = 0 And @y <> 2 begin Insert into @PrimeFactors(Thefactor) select @ThePrime select @y =@y/@theprime end else begin select @ThePrime = min(ThePrime) from tblSmallPrimes where ThePrime > @ThePrime end end if @y > @theMax begin Insert into @PrimeFactors(Thefactor) select @y end --alright, that's the prime factors --now the compound factors --Load the first factor in Select @theMax = max(rownum) from @PrimeFactors select @ThePrimeID = min(rownum) from @PrimeFactors select @TheCompound = min(TheFactor) from @PrimeFactors where rownum = @ThePrimeID Insert into @CompoundFactors(Thefactor) select @TheCompound select @TheCompoundID = min(RowNum) from @CompoundFactors --and move to the next select @ThePrimeID = min(rownum) from @PrimeFactors where rownum > @ThePrimeID select @ThePrime = min(TheFactor) from @PrimeFactors where rownum = @ThePrimeID --now loop through until you're done while @ThePrimeID <= @theMax --And @ThePrimeID <> 0 begin --the prime loop Select @TheCompoundMaxID = max(RowNum) from @CompoundFactors --do all this when there are elements to test in CompoundFactors while @TheCompoundID <= @TheCompoundMaxID --AND @TheCompoundID <> 0 begin --the compound loop select @TheCompoundID = min(RowNum) from @CompoundFactors where rownum = @TheCompoundID --load the value select @TheCompound = min(TheFactor) from @CompoundFactors where rownum = @TheCompoundID select @TheProduct = @TheCompound * @ThePrime select @test = thefactor from @CompoundFactors where thefactor = @TheProduct if @@rowcount = 0 and @TheProduct <= @initialY begin --pound it in Insert into @CompoundFactors(Thefactor) select @TheProduct end --do we have another compound factor? select @TheCompoundID = min(RowNum) from @CompoundFactors where rownum > @TheCompoundID end--end the compound loop --reset the compound loop select @TheCompoundID = min(RowNum) from @CompoundFactors --see if our prime was already in the compound table variable select @test = min(theFactor) from @CompoundFactors where thefactor = @ThePrime if @@rowcount = 0 and @ThePrime <= @initialY begin --pound it in insert into @CompoundFactors(Thefactor) select @ThePrime end --move on to the next prime or bail select @ThePrimeID = min(rownum) from @PrimeFactors where rownum > @ThePrimeID select @ThePrime = min(TheFactor) from @PrimeFactors where rownum = @ThePrimeID Select @TheCompoundMaxID = max(RowNum) from @CompoundFactors end --the prime loop return (select top 1 Thefactor from @CompoundFactors where thefactor >= @x order by thefactor)END

Here's what it might look with all factors calculated from the primes...
A side effect of generating all factors is you'll get duplicate factors if you have duplicate primes.

For example 4 has 2 factors A and B. Both of which happen to be 2.
So A, B, AxB and BxA are the complete list factors of A.
Which means 2, 2, 4 and 4 will be in the list.
Even if you control the generation so you don't do BxA if you have AxB, you'll still get A,B,AxB or 2,2,4 - and a duplicate happens anyway.
It requires more work to try to find the distinct combinations than to simply sort them into the list when searching for the smallest greater than X.

But, as I've noted a few times before. The entire prime and factor generation effort is largely pointless, since it only applies in less than 0.001% of all inputs while not really extending the functional range in a significant way.

CREATE OR REPLACE FUNCTION next_factor(x IN INTEGER, y IN INTEGER) RETURN INTEGERIS c_factoring_limit constant integer := 99999820000081; n INTEGER; d INTEGER; v_temp INTEGER := y; prime_factors ora_mining_number_nt;BEGIN IF y / x < 100000 THEN --- It's a small divisor to N, simply walk down to find it d := FLOOR(y / x); WHILE MOD(y, d) != 0 LOOP d := d - 1; END LOOP; n := y / d; ELSIF y <= 1000000 THEN -- As long as Y less than a million we can walk up pretty quickly n := x; WHILE MOD(y, n) != 0 LOOP n := n + 1; END LOOP; ELSIF y <= c_factoring_limit THEN -- Y is too big and the factor is too far away to walk -- So lookup from primes v_temp := y; prime_factors := ora_mining_number_nt(); FOR i IN (SELECT n f FROM primes_lt_10million WHERE n * n <= y) LOOP WHILE MOD(v_temp, i.f) = 0 LOOP prime_factors.EXTEND; prime_factors(prime_factors.COUNT) := i.f; v_temp := v_temp / i.f; END LOOP; IF v_temp = 1 THEN EXIT; END IF; END LOOP; IF v_temp > 1 THEN prime_factors.EXTEND; prime_factors(prime_factors.COUNT) := v_temp; END IF; SELECT MIN(factor) INTO n FROM ( SELECT rn, ROUND(EXP(SUM(LN(f.COLUMN_VALUE)))) factor FROM (SELECT ROWNUM rn, COLUMN_VALUE factors FROM TABLE(POWERMULTISET(prime_factors))), TABLE(factors) f GROUP BY rn) WHERE factor >= x; ELSE -- Too big, can't do it RAISE VALUE_ERROR; END IF; RETURN n;END;

CREATE OR REPLACE FUNCTION next_factor(x IN INTEGER, y IN INTEGER, force_it IN VARCHAR2 default 'N') RETURN INTEGERIS n INTEGER; d INTEGER;BEGIN IF y / x < 100000 THEN --- It's a small divisor to N, simply walk down to find it d := FLOOR(y / x); WHILE MOD(y, d) != 0 LOOP d := d - 1; END LOOP; n := y / d; ELSIF y <= 1000000 OR UPPER(SUBSTR(force_it, 1, 1)) = 'Y' THEN -- As long as Y less than a million we can walk up pretty quickly -- If not, we can still force the algorithm to try n := x; WHILE MOD(y, n) != 0 LOOP n := n + 1; END LOOP; ELSE -- Too big, can't do it RAISE VALUE_ERROR; END IF; RETURN n;END;

It's the same basic flip-flip on super-fast walk-down for 99.999% of all combinations
with the remaining 0.001% being walk-ups.

By default I'll error out if I might need to walk up a value over 1 million.
But I added an optional 3rd parameter to allowing forcing it to exceed the limit and walk as far as necessary.

I conceded that for very large values this is impractical. However, walking up to a million is definitely a feasible option and covers all anticipated real-world values.

If I have to go larger then I'm stuck.
I can wait for small increases in Y; but no where near the platform limits.
The prime stuff, while very fun to play with, isn't a viable extension.

I'll leave the question open a while longer. I've tried everything suggested and I'm willing to keep doing so.

Why do you stop at just 1 iteration when you hit a factor?

WHILE MOD(y, d) != 0
LOOP
d := d - 1;
END LOOP;

n := y / d;

Why not

WHILE MOD(y, d) != 0
LOOP
d := d - 1;
END LOOP;

WHILE MOD(y, d) = 0 And n>= x
LOOP
y := y / d;
END LOOP;

n := y / d;

You'd need to test that the division didn't lower n below x, but if you've found a factor, why stop until you've removed them all?

And really, as a preliminary, you should factor out 2, 3, and 5. Doing so eliminates at least 60% of the possible walking distance, maybe more.

The syntax is quite alien to me, but I hope you get the gist.
I wonder about computational expense

Is y mod x more expensive than y/x?
Is 1000 mod x more expensive than 1000000 mod x?
One wonders.

I was thinking about my T-SQL.
It is possible to generate the compound factors on the fly.
At present, it generates the prime factors first, and then the composite factors from them.
Doing them simultaneously is actually simpler.
I'll have to hack that together and try it

mod(y/d) = mod(35/3 ) = 2
2 != 0
so d = d -1 = 3-1 = 2

mod(y/d) = mod(35/2) = 1
1!= 0
so d= d -1 = 2 -1 = 1

mod(y/d) = mod(35/1) = 0
0= 0
exit loop

n = y/d = 35/1 = 35

another example

Y=35
X = 3
expected result N = 5

d = floor(y/x) = floor(35/3) = floor(11.666666) = 11

mod(y/d) = mod(35/11 ) = 2
2 != 0
so d = d -1 = 11-1 = 10

mod(y/d) = mod(35/10) = 5
5!= 0
so d= d -1 = 10 -1 = 9

mod(y/d) = mod(35/9) = 8
8!= 0
so d= d -1 = 9 -1 = 8

mod(y/d) = mod(35/8) = 3
3!= 0
so d= d -1 = 8 -1 = 7

mod(y/d) = mod(35/7) = 0
0 = 0
exit loop

n = y/d = 35/7 = 5

I don't need to additional looping on d doing a walk down
Just like I don't need to additional looping on a walk up.

It's essentially the same thing, except walk down has the benefit of rapid reduction for the majority of values.

I can do arbitrary prime generation on the fly as well. That's not the problem.
The problem is it simply doesn't help much.

Let's imagine we have a really slow computer and looping will be too expensive.
So, instead of doing walk downs for y/x < 100000 I reduce it to just y/x < 10.

That still means 90% of all (x,y) combinations will be resolved by the walk downs in 9 steps or less.
So even with that exaggerated simplification it already becomes hard to justify using prime calculations.

Here's a version that will, if you force it to, generate primes and compound factors.
It will still only attempt this for the tiny 0.001% of the time that walk down won't work and it's not less than a million.

CREATE OR REPLACE FUNCTION next_factor( x IN INTEGER, y IN INTEGER, force_it IN VARCHAR2 DEFAULT 'N') RETURN INTEGERIS n INTEGER; d INTEGER; v_temp INTEGER := y; prime_factors ora_mining_number_nt;BEGIN IF y / x < 100000 THEN --- It's a small divisor to N, simply walk down to find it d := FLOOR(y / x); WHILE MOD(y, d) != 0 LOOP d := d - 1; END LOOP; n := y / d; ELSIF y <= 1000000 THEN -- As long as Y less than a million we can walk up pretty quickly -- If not, we can still force the algorithm to try n := x; WHILE MOD(y, n) != 0 LOOP n := n + 1; END LOOP; ELSIF UPPER(SUBSTR(force_it, 1, 1)) = 'Y' THEN -- Y is too big and the factor is too far away to walk -- Searching for prime factors is expensive but we can force it. v_temp := y; prime_factors := ora_mining_number_nt(); FOR i IN (SELECT COLUMN_VALUE f FROM TABLE(sdsmath.sieve_primes(SQRT(y)))) LOOP WHILE MOD(v_temp, i.f) = 0 LOOP prime_factors.EXTEND; prime_factors(prime_factors.COUNT) := i.f; v_temp := v_temp / i.f; END LOOP; IF v_temp = 1 THEN EXIT; END IF; END LOOP; IF v_temp > 1 THEN prime_factors.EXTEND; prime_factors(prime_factors.COUNT) := v_temp; END IF; -- From individual primes, generate all factors of Y SELECT MIN(factor) INTO n FROM (SELECT rn, ROUND(EXP(SUM(LN(f.COLUMN_VALUE)))) factor FROM (SELECT ROWNUM rn, COLUMN_VALUE factors FROM TABLE(POWERMULTISET(prime_factors))), TABLE(factors) f GROUP BY rn) WHERE factor >= x; ELSE -- Too big, can't do it RAISE VALUE_ERROR; END IF; RETURN n;END;

sdsmath.sieve_primes is the prime generator I described above. It will generate primes up to any number oracle will support, but you'll run out memory long before Y gets anywhere close to that limit.

Whatever steps you can take to minimize SQRT(y) cuts the number of primes you need to generate and that has to save execution time. Worst case scenario you are running 3 useless mod operations.

------------------------------------
On the walkdown

Thank you for the example. I see clearly now what the y,x,n,d in your sproc are doing.
Take 499977 and 47 for real world examples. 73 is the expected result. Walkdown gets there in ~3800 steps. It can be factored by 3, though, yielding the answer in ~1270 steps.

>>> I'd still look to remove factors of 2, 3 and 5 from y here

I'm not sure where you're going with that.

There is virtually zero effort in including in the prime generation. 2 is a seed in the list because it's special as the only even. 3 is another seed because it's essentially the starting point since I only iterate on odd numbers. So 5 is the first value to actually be evaluated and it's immediately identified because I never need to look at 2 and 3*3 > 5 so 5 is a prime in one step.

Walking by these can certainly be faster if the the N I'm looking for is itself one of these factors; but that's not particularly helpful.

A simple example, 194 = 2x97. I can't walk increments of 2,3 or 5 or any other prime to find my factors. Other numbers with other factors have the same issue. Just because Y is divisible by 2, 3 or 5 doesn't mean the "next" factor from X is necessarily a multiple of those.

Maybe I'm not understanding what you're trying to say.

-----------------------

Walking up from 2 to sqrt(y) is slower than primes, depending on the specific values but being "dumb" it consumes few resources. The primes takes more resources to get the speed. Which is "better"? I'm not sure. Probably the primes since it can extend to larger numbers before it too become infeasible.

I do cache my primes, so multiple runs are faster, especially if Y is smaller than previous runs so it can reuse the cache and doesn't have to generate anything new. But that benefit is also a penalty because it means I'm consuming memory that I'm not going to reuse.

The primes "can" be faster for some combinations but not all, and, if it's a first time run, the cost and time of initial generation puts them behind.

It would be probably be possible to extend the IF/ELSE chain into a bunch of categories that would each have their own optimal search pattern. This would be ideal.

However, what I keep coming back to is I've already got a solution that covers 99.999% of all possible combinations.

From a real world implementation perspective:

I have almost zero incentive to add complexity for that remaining 0.001% ESPECIALLY since the vast majority of the remaining values aren't solvable regardless of the implementation.

From a purely academic perspective:

I have slightly more than zero incentive to pursue these other options; but, in all cases we're simply "walking". Walking up, walking down, walking through primes in order to generate factors then walking through those.

So walking isn't particularly interesting anymore because, despite any marginal speed improvements, it's still too limited and basically rehashing what's already been mentioned.

For pure speed, including support for the largest real numbers I might hit, the best option would be to do the walk down for some threshold (100K seems ok, but might not be optimal) and then precalculate all of the prime factors to some large value as shown in one of the early suggestions. That would let me calculate to the square of that number.

It's not particularly sophisticated but that's ok. The direction of the several iterations has been to add a lot of complexity for only marginal gains. I don't like using a table of stored values; but I dislike even more the idea of caching in memory, potentially millions of values, or regenerating those values every time.

With a little analysis it would be possible to filter the table of pregenerated factors and remove all entries that would be viably solved by the simple walk down.

As long as I keep in mind that the "bad" solutions are only for a tiny minority of inputs I don't have to care much.

What I really, really want is a method to directly calculate some set of boundary conditions that I could apply to either the walking or the lookup table.

Without a boundary I'm always going to be doing some sort of "dumb" looping because there won't be a way to make a smart decision of what to skip or when to stop.

Consider Y = 1024002 and x = 31
Y is even, so you are checking with
WHILE MOD(y, d) != 0
LOOP
d := d - 1;
END LOOP;
because n may or may not be even. You start that walk at 33032 and walk down ~8650 steps to get 42 for a result

divide out 2 though and you get 502001, which is odd.
now you can check with
WHILE MOD(y, d) != 0
LOOP
d := d - 2;
END LOOP;
after ensuring that the initial d is odd -- because the quotient of an odd denominator and odd numerator is also odd. By ensuring that the starting value you walk down is not a product of two, you can double the speed of the walk AND cut the spread in at least half

Making the starting point the odd number 16517 (floor(502001/31) is even) and decrementing by 2, the solution is found in~2300 steps, almost a 4 fold improvement.

Removing factors of 3 (there are 5 of them) leaves 2107 -- which is not divisible by 5 -- so we could build a check that wouldn't bother with d where mod(d,3) = 0 or mod(d,5)=0 -- but since that's more work than actually checking every value, there isn't much point

Floor(2107/31) leaves 67 -- nice and odd -- so decrementing by two, in eleven steps we get to the expected 43

Now, 2107 is also divisible by 7, leaving 301. That leaves a d less than x--which means a walk up from 31 by two is in order. 6 steps later, 43 is achieved.

So, eliminating all factors of 2, 3, 5, and 7 -- which cost 8 mod operations and one addition operation to esnsure that the start number was odd, netted us a result in ~15 steps instead of ~8650 steps.

Ensuring that the start value is odd allows decrementing by two. That's clearly worth doing. I think the risk that Y isn't a factor of 2,3,5 and 7 is worth taking. At worst it costs 4 mod operations. At best, it saves thousands.

Two perhaps final points.
One
Distribution of primes follows a law 1/LN(n) where n is the number of digits in the number.
The odds of a given n digit being prime are 1/LN(n).

Walking all odd values (because I think I've conclusively shown that incrementing/decrementing should be by two) is SQRT(Y)/2 operations

Walking the primes is (1/LN(LOG10(Y))) * SQRT(Y)

You can plot these. They do cross between 10M and 100M, so depending on your machine and the cost of generating primes, the cutover is in that range

Two

I have slightly more than zero incentive to pursue these other options; but, in all cases we're simply "walking". Walking up, walking down, walking through primes in order to generate factors then walking through those.

That why they use it for cryptography, because there aren't better ways to do it--and folks have gone batty over factoring since Eratosthenes and before.

I wasn't quite on with ID: 39489324
If Y is odd, N must be odd
If Y is even, you take out factors of 2 until the result is odd.
You can then walk down by two's until you get the lowest ODD factor
What I missed is that you then have to walk down by one's from the lowest odd factor to X, because there may be a lower even factor.

Details. I can see why this stuff has driven people mad since ancient times.

For any Y where you are going to factorize, it isn't necessary generate all the compound factors. Everything else has been predicated on walking down from an arbitrary value larger than x. Why not do that here too? Walk the primes up. Have a variable that starts at 1 and gets multiplied by each factor found. The instant that variable > x, walk it down by one's to x.

That saves the expensive step of generating all the compound factors

I can't simply walk through primes multiplying as I go.

Let's take something simple like

Y = 42 = 2 x 3 x 7

X = 15

2 x 3 = 6
walk to next prime 7
2 x 3 x 7 = 42

I skipped N=21, the answer I was looking for.

So, yes, I could potentially skip some compound factors on the high end.

If X = 12, I need to look at 2x7; but I could skip 3 x7 and 2x3x7

So I do need to generate intermediate combinations.

as to previous posts

Definitely walking by 2 seems reasonable if Y is odd.

I haven't had time to think about walking by 2 when Y is even or 3,5,7, etc.
Probably won't until Sunday

I have no idea where you were going with the cryptography reference.
I know prime generation is used in many of the algorithms; but other than that I'm not sure there is anything to relate to this task. Sure, loops involved in both; but I don't think that's sufficient to make a correlation.

From way back at ID: 39478356
For something that you want to attack with the primes, you don't need to generate all the composite factors. All we really want is an arbitrary value for d such that d>=x and mod(y,d) = 0

So, for Y =1M X =23, let d =1 and multiply d by each factor removed until d>x
1M/2 = 500K, d= 2
500K/2 = 250K, d=4
250K/2 = 125K, d=8
125K/2 = 62.5K, d=16
62.5K/2=31250, d=32

d>x, so now, I don't know the syntax, but we'd like to save d say as di, in case we fluked into the correct value, and run the loop.

di integer; (is that the syntax?)
di = d;
WHILE And d >x
LOOP
IF MOD(y, d) != 0
THEN
d := d - 1;
ELSE
di = d;
END IF;
END LOOP;

n := y / di;

So, 32 was overshoot, but we walked it back to the correct answer of 25.

Gets the job done without POWERMULTISET, which is something you wanted.

For big numbers, say, Y = 99499799 and x = 6
Start d at FLOOR(SQRT(Y))+1 (that equals 9975)
Walk it down. When you hit a factor, set d = FLOOR(SQRT(d))+1 and keep walking down

di integer; (is that the syntax?)
di = d;
WHILE d >x
LOOP
IF MOD(y, d) != 0
THEN
d := d - 1;
ELSE
di: = d;
d:=FLOOR(SQRT(d))+1;
END IF;
END LOOP;

n := y / di;

This alternative doesn't need the primes.
North of 1T, the primes may be faster, depending on relative costs of generating and accessing the primes vs sqrt and mod

There's still bugs in this, but I am too fried to find them at the moment
Here is is though, to grab the logic of it
-------------------------------
----Here's the squares method as T-SQL
@y decimal(38,0) = 0,
@x decimal(38,0) = 0
AS
BEGIN
-- SET NOCOUNT ON added to prevent extra result sets from
-- interfering with SELECT statements.
SET NOCOUNT ON;

-- Insert statements for procedure here
Declare @d decimal(38,0)
Declare @n decimal(38,0)
if Floor(sqrt(@y)) >= @x --square root big enough?
begin --yes
select @d = Floor(sqrt(@y))+1
--select @d as 'dropRoot'
end
else
begin
select @d = @y-1
end
select @n = @d
while @d >= @x
begin
while (@y % @d) <> 0
begin
--select 'dropping' as huh
select @d = @d-1
--select @d as 'dropOne'
end
--hit a factor
select @n=@d
if Floor(sqrt(@d))+1 >= @x --square root big enough?
begin --yes
select @d = Floor(sqrt(@d))+1
--select @d as 'dropRoot'
end
else
begin --no
if (@y % @d) = 0
begin
select @d = @d-1
--select @d as 'Bump Down'
end
while (@y % @d) <> 0 and @d >= @x
begin
--select @d as 'PreRootTooSmall'
select @d = @d-1
--select @d as 'PostRootTooSmall'
end
end
end
return @n

BigInt is limited to an integer of 2GB. Declaring decimals lets me get a bit bigger. SQL Server Express won't return a SQRT larger than the 2GB BigInt limit though :(

All tests below 4GB for Y come in at ~0.026 seconds, so that seems to be the system baseline. It's subsecond for all Y > 4.5T. North of that performance goes south quick.

And things go south when Sqrt(y) > x
So you have to check that the square roots of things used remain greater than x
So this bit is for occasions where Y is very big, but Y/X > sqrt(y)

What I really, really want is a method to directly calculate some set of boundary conditions that I could apply to either the walking or the lookup table.

One is plotting (SQRT(Y) - x)/2 vs (1/LN(LOG10(Y))) * SQRT(Y)
The left term is the cost of walking all values, the right is the likely cost of walking the primes.
'Ok given a Y and X what are our choices for algorithms to pursue?

'One: We can have a largish Y and small X
'If it is odd, walking down from floor(y/x) will get the largest factor, and thereby the smallest

'Two: We can have a Y and a large X, larger than SQRT(Y)
'walking up from x by ones, if y is even, or two if odd
OR
'walking down from SQRT(y), depending on the spread between x and SQRT(y)
'The desired factor will not be optimally found by method one.

'Three: We have a very large Y, and a small x
'Find the nth even root of y that is still larger than x
'walk down to x, looking for a candidate
'if we find one great, if not, walk up from that root

'Four: We have a humungous Y, larger than 100B
'prime factorization of Y until the product of primes exceeds X
'Then walkdown from the product to x
'if no candidate is found, then walk up from x

'five: maybe worthwhile in every case
'prime factorization of 2,3,5,and 7 from Y
'apply the applicable case above to the result

Thanks for the fun discussion. I'm still going with my basic walkdown as the first pass, with checks for odd/even and some possibly some simple prime checking.

I'll use other prime checks for other limits though. I don't have code to post and probably won't for a long time; but I'm content with the status as it is now.

The remaining optimizations will be playing with thresholds vs hardware/versions to determine which methods to use when. So, even if I did have a version to post right now, it would only reflect a single platform and thus not be ideal for everyone.

For PAQ value, I think there is sufficient direction here for others to pursue their own methods, including port to non-oracle platform. On a pure speed note, I might go ahead and port to "c" and run it as an external procedure. I suppose if I did that I could link in libraries for math of arbitrary size.

That might be fun. If I do that, it may be warrant a new question.

It isn't often that one gets the chance to discuss the kind of math we were playing with outside an academic environment, throw in platform optimizations and hardware underpinnings. Glad you found what I had to offer useful.

Nick67

0

Question has a verified solution.

Are you are experiencing a similar issue? Get a personalized answer when you ask a related question.

At Springboard, we know how to get you a job in data science. With Springboard’s Data Science Career Track, you’ll master data science with a curriculum built by industry experts. You’ll work on real projects, and get 1-on-1 mentorship from a data scientist.

Open in new window

Or this

Open in new window