Sean Stuber

asked on

SQL Math Query

Given integers X <= Y:
What is the smallest N >= X such that Y is evenly divisible by N?
For example, with Y = 15000 and X = 67, the answer is N = 75, the smallest divisor of 15000 that is at least 67.

Here are a few SQL attempts.

Very bad...
WITH target AS (SELECT 15000 y FROM DUAL),
     factors
     AS (SELECT n
           FROM (    SELECT y, LEVEL n
                       FROM target
                 CONNECT BY LEVEL <= y)
          WHERE MOD(y, n) = 0),
     numbers
     AS (    SELECT y, LEVEL x
               FROM target
         CONNECT BY LEVEL <= y)
  SELECT x, MIN(n)
    FROM numbers, factors
   WHERE n >= x
GROUP BY x
ORDER BY x;



These are better; note that which one performs best varies between 11g and 12c.


WITH target AS (SELECT 15000 y FROM DUAL),
     factors
     AS (    SELECT y, COLLECT(CASE WHEN MOD(y, LEVEL) = 0 THEN LEVEL END) factors
               FROM target
         CONNECT BY LEVEL <= y),
     numbers
     AS (    SELECT LEVEL x, factors
               FROM factors
         CONNECT BY LEVEL <= y)
SELECT x,
       (SELECT MIN(COLUMN_VALUE)
          FROM TABLE(factors)
         WHERE COLUMN_VALUE >= x)
           n
  FROM numbers;





WITH target AS (SELECT 15000 y FROM DUAL),
     numbers
     AS (    SELECT LEVEL x, CASE WHEN MOD(y, LEVEL) = 0 THEN LEVEL END n
               FROM target
         CONNECT BY LEVEL <= y)
  SELECT x,
         FIRST_VALUE(CASE WHEN n >= x THEN n END IGNORE NULLS)
             OVER(ORDER BY x ROWS BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING)
    FROM numbers
ORDER BY x;



WITH target AS (SELECT 15000 y FROM DUAL),
     numbers
     AS (    SELECT LEVEL x, CASE WHEN MOD(y, LEVEL) = 0 THEN LEVEL END n
               FROM target
         CONNECT BY LEVEL <= y)
  SELECT x, MIN(n) OVER(ORDER BY x ROWS BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING)
    FROM numbers
ORDER BY x;



What I'm looking for is something more efficient than what I already have.

I won't award points for effort.
If your query is demonstrably worse than any of these or any other posts that precede it, please don't dilute the thread by posting it.

If you post a link, I will ignore it unless there is sufficient text posted along with it that the post can stand on its own.  I'll simply delete it if egregious enough.

My measuring stick for superior/inferior is v$mystat.  In particular, the db time, gets, cpu, sorts and memory statistics (i.e. autotrace, but a little more thorough).
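For anyone wanting to reproduce that kind of measurement, a minimal sketch of one way to read those session statistics (run it before and after a test and diff the values; statistic names come from v$statname and can vary slightly by version):

SELECT sn.name, ms.value
  FROM v$mystat ms
       JOIN v$statname sn ON sn.statistic# = ms.statistic#
 WHERE sn.name IN ('DB time',
                   'CPU used by this session',
                   'session logical reads',
                   'sorts (memory)',
                   'sorts (rows)');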

If you don't know SQL but want to post the description of an algorithm; that's fine.  I don't mind trying to implement it in SQL.  Once implemented, if it's not as good as what's already been offered I'll say so.
sameer2010

Try this
declare @x int=5
declare @y int=22
declare @n int=6

;with t as(
select @x a,@y%@x b
union all
select a+1,@y%(a+1) from t where a<@y
)
select min(a) minNumber from t where b=0 and (@y/a)%2=0



Or this
declare @x int=5
declare @y int=22
declare @n int=6

;with t as(
select @x a
union all
select a+1 from t where a<@y
)

select min(a) minNumber from t where @y%a=0 and (@y/a)%2=0


You can replace % with MOD, etc., to suit Oracle-specific syntax, but the logic should be similar.
Sean Stuber

ASKER

The queries in the original post generated the correct N for all X 1..15000.
I didn't specify that as a requirement though.

However, it's an easy modification of one of my original queries to produce the same kind of single N result.


I just tested both of the T-SQL snippets above when converted to Oracle SQL syntax.  They appear to be identical in that the minor differences in execution time and resource consumption can be accounted for by the standard variance of a live system.

However, in comparing 100 executions of this translated version of one of the above....
WITH t(a, b)
     AS (SELECT 67 a, MOD(15000, 67) b FROM DUAL
         UNION ALL
         SELECT a + 1, MOD(15000, (a + 1))
           FROM t
          WHERE a < 15000)
SELECT MIN(a) minnumber
  FROM t
 WHERE b = 0 AND MOD(15000 / a, 2) = 0;

Test duration: +000000000 00:00:06.312939000
STAT...CPU used by this session                                          630
STAT...sorts (memory)                                              1,493,500
STAT...sorts (rows)                                                1,493,400
Total Latches:        4,676,314



to 100 executions of this alteration of one of my original queries...


WITH target AS (SELECT 15000 y FROM DUAL),
     numbers
     AS (SELECT LEVEL x, CASE WHEN MOD(y, LEVEL) = 0 THEN LEVEL END n
           FROM target
         CONNECT BY LEVEL < y)
SELECT *
  FROM (SELECT x,
               FIRST_VALUE(CASE WHEN n >= x THEN n END IGNORE NULLS)
                   OVER(ORDER BY x ROWS BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING)
          FROM numbers)
 WHERE x = 67;

Test duration: +000000000 00:00:02.021022000
STAT...CPU used by this session                                          202
STAT...sorts (memory)                                                    200
STAT...sorts (rows)                                                1,500,000
Total Latches:              484




The differences are quite dramatic.

While this particular path is not nearly as efficient, I do appreciate the idea of a recursive CTE as opposed to the connect by level method of generating values.
I will continue to explore that and see if I can come up with something better.
The performance stats above were in 12c.

I'll compare multiple versions again in 11g tomorrow.
The recursive with clause is the big performance killer.  Simply generating the numbers 1-15000 via the recursive CTE vs CONNECT BY LEVEL adds significant weight.
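For reference, a minimal recursive WITH number generator equivalent to the CONNECT BY LEVEL row source (recursive subquery factoring needs 11.2 or later; shown here only to make the comparison concrete):

WITH nums(x) AS (SELECT 1 FROM DUAL
                 UNION ALL
                 SELECT x + 1 FROM nums WHERE x < 15000)
SELECT x FROM nums;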

So, continuing to adapt the queries, I reversed the T-SQL above to use the Oracle CONNECT BY syntax, with significant results.

WITH t
     AS (SELECT 67 + LEVEL - 1 a, MOD(15000, 67 + LEVEL - 1) b
           FROM DUAL
         CONNECT BY LEVEL <= 15000)
SELECT MIN(a) minnumber
  FROM t
 WHERE b = 0 AND MOD(15000 / a, 2) = 0


running 100 times...

Test duration: +000000000 00:00:01.006264000
STAT...CPU used by this session                                          101
STAT...sorts (memory)                                                    100
STAT...sorts (rows)                                                      100
Total Latches:              542



again, tests on 12c
Sounds good.  Unfortunately, I do not have Oracle.  There is another table in SQL Server that can give numbers for comparison, but I am sure it does not exist in Oracle, so I did not propose it.  Essentially, MIN should do the trick.  Let me see if I can think of anything better in SQL that you can adopt/adapt.
I'm taking a different approach now, much simpler.

Starting with my number (67 in these test cases) I simply walk forward until I reach my N, then stop and take the last number found.

It's dirt dumb but also the most efficient thus far, because I'm not trying to find all of the factors and then sifting through them to find the one I want.  I arrange the search so that the first one I find will be the N I'm looking for.

SELECT MAX(a)
  FROM (    SELECT 67 + LEVEL - 1 a
              FROM DUAL
        CONNECT BY LEVEL <= 15000 AND MOD(15000, 67 + LEVEL - 2) != 0)



100 executions on 11g with a much less powerful machine than the 12c tests above


Test duration: +000000000 00:00:00.007181000
STAT...CPU used by this session                                            2
STAT...sorts (memory)                                                    100
STAT...sorts (rows)                                                      100
Total Latches:              157


>>Starting with my number (67 in these test cases) I simply walk forward until I reach my N, then stop and take the last number found.

This would be faster only if N is closer to lower bound.
>>This would be faster only if N is closer to lower bound.

no,  that was my thought too and  why I didn't do that originally.

but:  if X = 1 or X = 15000 or X is anything in between, all of the posts above generate all of the factors between X and 15000 by "walking" from X to each new factor.

So, that effort is already being taken by the previous ideas; but the problem with them is that after walking from X to N,  they keep on walking looking for other factors that we already know we're going to throw away.
it is always best to test though...

using Y=15000,  X=7501  - worst case scenario

100 executions of the previous best
(note: I removed the extra mod; it wasn't needed and, in this case, produced wrong results anyway)
WITH t
     AS (    SELECT 7501 + LEVEL - 1 a, MOD(15000, 7501 + LEVEL - 1) b
               FROM DUAL
         CONNECT BY LEVEL <= 15000)
SELECT MIN(a) minnumber
  FROM t
 WHERE b = 0

Test duration: +000000000 00:00:03.609293000
Total Latches:              580




100 executions of


SELECT MAX(a)
  FROM (    SELECT 7501 + LEVEL - 1 a
              FROM DUAL
        CONNECT BY LEVEL <= 15000 AND MOD(15000, 7501 + LEVEL - 2) != 0)

Test duration: +000000000 00:00:01.839518000
Total Latches:              186





sorts are still 100 for both
another approach...

rather than walking up from X to N
walk down from Y/X to Y/N
WITH setup AS (SELECT 15000 y, 7501 x FROM DUAL),
     divisors
     AS (    SELECT y, FLOOR(y / x) - LEVEL + 1 d
               FROM setup
         CONNECT BY MOD(y, FLOOR(y / x) - LEVEL + 2) != 0)
  SELECT y / MIN(d)
    FROM divisors
GROUP BY y;

Test duration: +000000000 00:00:00.005595000
STAT...CPU used by this session                                            1
Total Latches:              158



WITH setup AS (SELECT 15000 y, 67 x FROM DUAL),
     divisors
     AS (    SELECT y, FLOOR(y / x) - LEVEL + 1 d
               FROM setup
         CONNECT BY MOD(y, FLOOR(y / x) - LEVEL + 2) != 0)
  SELECT y / MIN(d)
    FROM divisors
GROUP BY y;

Test duration: +000000000 00:00:00.010635000
STAT...CPU used by this session                                            2
Total Latches:              157



As expected the scaling is different here.  Depending on the actual Y and X values the number of steps UP or DOWN to walk will be more or less expensive.

I haven't determined if there is an efficient way to pick before starting to walk or if that effort will be greater than the savings of picking the wrong one.

Most of the time walking down will be the shorter path so I'll default to that for now.
I am not certain that I grasp completely what you are trying to do.
Let's say Y = 1024
And x = 32

You are looking for the SQL that would return N=64?
(the next factor of 1024 that is bigger than 32)

Is that right?

Because if that is what you are after, then a table of prime numbers less than Y, if statically entered, would probably aid in efficiency.  You'd start factoring from 2 up until you got to X.  From there, you'd multiply factors together until you got something greater than X.
no, N >= X  So in your example N=32  because 32 is a factor of 1024

Generating all factors by multiplying combinations of primes sounds good, the problem is either generating the list of primes or reading them from a source.  You'd either have to know already how many primes to use or you'd have to include a square root calculation to make sure you didn't get too many.  Furthermore, simply having a list of primes isn't enough, you also have to know how many times each prime can be included in factors.  For 1024: 2 is the only prime factor, but it's a factor 10 times.
I happen to know that, but determining all prime factors and the number of each in all combinations of subfactors is not trivial for a general solution.

Dynamically multiplying within sql is not a cheap operation either.

Here's my best attempt at implementing your algorithm.
Rather than introduce physical IO, I took advantage of the fact that I know 2 is the only prime factor of 1024.  Obviously this is impractical as a general solution but I wanted to give this the best chance possible.

Here though is where things start to go bad.  Generating all of the factor combinations is not trivial.  Fortunately Oracle has the POWERMULTISET function to do that for us; but it requires a collection object to operate on.  So I built that into the starting point.

Next though is the actual multiplication.  There is no direct means of multiplying all numbers in a set, so we use the math fact that EXP(SUM(LN(n))) over a set of numbers equals their product.  This is convenient syntax but is a lot of work for the cpu.  We could write a separate pl/sql function or a user defined aggregate, but those would be even more expensive than using the built in functions.
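As a standalone illustration of that identity, using the same ora_mining_number_nt collection type as below, the product of {2, 5, 6} can be recovered from the logs (ROUND guards against floating point noise):

SELECT ROUND(EXP(SUM(LN(COLUMN_VALUE)))) product   -- 60
  FROM TABLE(ora_mining_number_nt(2, 5, 6));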

Once we have all the factor product combinations then it is a simple matter to compare those to 32 and pick the smallest product that is a factor of 1024.

WITH prime_factors
     AS (SELECT ora_mining_number_nt(2,2,2,2,2,2,2,2,2,2) f from dual)
SELECT MIN(f)
  FROM (SELECT rn, FLOOR(EXP(SUM(LN(y.COLUMN_VALUE)))) f
          FROM (SELECT ROWNUM rn, COLUMN_VALUE s
                  FROM TABLE(POWERMULTISET((SELECT f FROM prime_factors)))) x,
               TABLE(x.s) y
        GROUP BY rn)
 WHERE f >= 32 AND MOD(1024, f) = 0




However, as anticipated, even with an artificial head start this is simply too expensive and takes nearly ten seconds to complete the test run of 100 executions.

Test duration: +000000000 00:00:09.772921000
STAT...CPU used by this session                                          975
Total Latches:            7,949



I welcome suggestions on how to implement this method more efficiently; but I have my doubts it will ever be able to compete because it has too many expensive preliminary steps to create the data before it even starts evaluating.
for anyone without Oracle there is sqlfiddle.com (choose Ora 11g in drop down)
e.g. http://sqlfiddle.com/#!4/9bbe9/293

I'd like to understand:
do the rules of this question exclude for loops?
what is this for? (a business problem or a theory exercise)
& what are "the expected results" (are you really looking for 15,000 rows of 2 columns?)

btw, the url above produces:
N      MIN(X) MAX(X) COUNT(*)
15000  7501   15000  7500
7500   5001   7500   2500
5000   3751   5000   1250
3750   3001   3750   750
3000   2501   3000   500
2500   1876   2500   625
1875   1501   1875   375
1500   1251   1500   250
1250   1001   1250   250
1000   751    1000   250
750    626    750    125
625    601    625    25
600    501    600    100
500    376    500    125
375    301    375    75
300    251    300    50
250    201    250    50
200    151    200    50
150    126    150    25
125    121    125    5
120    101    120    20
100    76     100    25
75     61     75     15
60     51     60     10
50     41     50     10
40     31     40     10
30     26     30     5
25     25     25     1
24     21     24     4
20     16     20     5
15     13     15     3
12     11     12     2
10     9      10     2
8      7      8      2
6      6      6      1
5      5      5      1
4      4      4      1
3      3      3      1
2      2      2      1
1      1      1      1


Based on the 3rd code block in the question, just nesting that final select and summarizing:
WITH
  target    AS (
                SELECT 15000 y
                FROM DUAL
                )
, numbers   AS (
                SELECT LEVEL x, CASE WHEN MOD(y, LEVEL) = 0 THEN LEVEL END AS n
                FROM target
                CONNECT BY LEVEL <= y
                )
SELECT
      n
    , min(x)
    , max(x)
    , count(*)
FROM (
      SELECT
            x
          , FIRST_VALUE(CASE WHEN n >= x THEN n END IGNORE NULLS)
              OVER (
                    ORDER BY x ROWS BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING
                   ) AS n
      FROM numbers
      )
GROUP BY n
ORDER BY n DESC;


If you can get pl/sql to generate an answer with loops faster and with less cpu/memory or other resource consumption then by all means go for it.

The final result must be sql accessible though, so if you generate a function
that I call by "select your_function(x,y) from dual"  - the cost of the sql & pl/sql context switch (albeit minor) would be part of the cost of the algorithm.

I don't understand the intent of your posted code.  I can, obviously, see the results and understand the logic; but given the examples already posted, I'm not sure why you're pursuing a code path with those extra calculations.  It doesn't produce anything at all like the results I've posted.  You don't need to explain it though, it doesn't produce what I'm looking for.

I currently have no business use.  However, a couple of places it could apply would be tiling rectangular areas or determining a fixed width for text given a minimum width requirement.

For example:  I have 120 one-foot square tiles to lay out for an area.  I have a 7-foot long object that could be anywhere in my tiled area.  What is the minimum width I can use and still generate a rectangular area?  Answer: 8.  That's a simple one, most could probably do it in their head; I'm looking for the most efficient means of generating that answer for a general case.
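Plugging those numbers into the walk-down query from the earlier post gives the answer directly (Y = 120 tiles, X = 7 for the object; the smallest divisor of 120 that is at least 7 is 8, i.e. an 8 x 15 rectangle):

WITH setup AS (SELECT 120 y, 7 x FROM DUAL),
     divisors
     AS (    SELECT y, FLOOR(y / x) - LEVEL + 1 d
               FROM setup
         CONNECT BY MOD(y, FLOOR(y / x) - LEVEL + 2) != 0)
  SELECT y / MIN(d) n
    FROM divisors
GROUP BY y;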

Expected results: The question pretty much says it.  Given X and Y, find N.

My original queries in my question listed the answer N for each X of 1 to Y.
And that was what I was looking for.

However I didn't actually specify generating everything as a requirement and sameer2010, the first responder, proposed a solution that would solve for a specific X.

This is a more practical use case and is the target I've been pursuing as well since then.

If you'd like to attempt an answer, my post http:#a39471023 is currently the best general solution for finding the correct N for a given X and Y.

Do note, as I mentioned above, it is possible that for a minority of X for a given Y, it is possible that searching UP will be more efficient than searching DOWN for N.  So, if you find something that happens to work well for some values but is worse than the above for most others then it's not really a candidate.

I am testing all queries with the literal values embedded.  This is, of course, somewhat of a cheat because the final solution, if implemented in real code would use bind variables.  However, this is an exercise in curiosity so I'm willing to accept that and I'd like to give each proposed solution its best possible chance and giving the optimizer all required information is as good as we can do.

I have been using Y=15000, X=67, with expected result of 75 in my test cases to provide a common baseline. I do test other values as well.  If I see a significant performance variance by using other values I'll note it in my responses.
Thanks, got it (in spades:) - wasn't expecting such a comprehensive reply, but appreciated.
Only purpose of the result was to indicate 'waste' and I wasn't just repeating your own code as my suggestion either in case that wasn't clear.

working 'up' to a minimum makes sense too. Cheers, Paul.
Note for anyone using sqlfiddle, be sure to post your code in the thread as PortletPaul did in his example.  An answer that only exists via link isn't much use if the link goes stale.

Also, all of the answers above finish sub-second.  Even the worst of them is only about a tenth of a second.  So, the network latency and lag, as well as HTML rendering time, will completely swamp any measurement of efficiency.

Using that or similar sites to test your method works is great;  but since I already know how to get the correct answer, simply working isn't sufficient.  I will only accept the answer that is demonstrably and repeatably superior to all others.

I don't want to discourage participation; but I don't want to delude anyone into thinking that just because your answer is correct and "fast" that it will be accepted.

I've already demonstrated that I can write fast but inferior methods in multiple ways.  :)
I don't need to know additional ways to NOT do it.
So now I'm only looking for the best.

Also note I will be testing the code on various machines of wildly different computing capacity with versions spanning 11.2.0.1 to 12.1.0.1.   If you happen to find an answer that works well on one version but not on another that can still be a candidate answer.  The current best version is the best on everything I've tried; but my original "best" answers differed based on version 11 vs 12.
Alright.
I thought about it and played with it.
Lists of prime numbers were interesting, but not useful.
On the other hand, I generated a table of factors for the first 1,000,000 integers.
Took ~6 minutes in MS Access VBA
Here it is.
It's a big file, though.
41 MB

Haul it down and mount it up.
Index both columns, if you'd like.

At that point
SELECT TOP 1 TheFactor from tblOrderedfactors where theNumber = 15000 and TheFactor >= 67;
runs in an execution time that my desktop cannot effectively measure.

Theoretically, I'd think this would have to be the most efficient manner in terms of CPU time and execution time -- but who knows, maybe the disk time, and CPU time to get it from disk would be more expensive than computing it.

I doubt it, but until you conduct the experiment, you won't know for sure.

The factors for the first 1M integers generate an 11M row table.  If that's overkill (and it is, but it is fun to generate tables that big!) let me know what upper bound you'd prefer and I'll generate and upload it.
OrderFactors1M.zip
The factors for the first 20K integers generate in under a minute and make a 200K row table.
Here it is

It gives you a baseline to compare against other solutions that use SQL to generate the solution dynamically.

Let me know how it works out
OrderedFactors20K.zip
Here's the MS Access file used, if you are interested in the code that generated the tables, or wish to do it on your end.
LCM.zip
Running it over in my head, and grinding a few numbers, I now doubt very much that an algorithm to strictly calculate N by walking is the best solution.  Worst case scenario for 1M factors is Y = 999998 and X = 3.  

499999 is a prime number, so a straight walk would run through almost a half-million operations before coming to a solution.

And that isn't really an edge case.  Of the numbers between 1 and 1M, 553531 of them have 5 factors or less, so more often than not, walking is going to be very expensive.

(Generating the table once for use in the general case is likely the most efficient solution.  Think how long generating ALL the solutions for Y = 1 to 15000 and X = 1 to 7500 would take!  Straight select from a generated table would be the winner.)
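On the Oracle side, a one-time build of such a lookup table might look like the sketch below (table and index names are invented; the brute-force pair generation is expensive but only paid once):

CREATE TABLE ordered_factors AS
SELECT y.n the_number, f.n the_factor
  FROM (SELECT LEVEL n FROM DUAL CONNECT BY LEVEL <= 20000) y,
       (SELECT LEVEL n FROM DUAL CONNECT BY LEVEL <= 20000) f
 WHERE f.n <= y.n AND MOD(y.n, f.n) = 0;

CREATE INDEX ordered_factors_ix ON ordered_factors (the_number, the_factor);

-- The lookup itself: smallest factor of 15000 that is >= 67.
SELECT MIN(the_factor)
  FROM ordered_factors
 WHERE the_number = 15000
   AND the_factor >= 67;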

I am thinking that, if a person were to pursue an algorithmic solution, a progression using a table of primes might be more efficient in the general case than walking.  A person is looking for the smallest upper bound (that can be cheaply discovered) that N can be, which would either be the next prime number larger than X or the product of all the prime factors of Y incrementally factored out at each step.  From there, you'd walk down to X.  Such an algorithm would minimize the number of steps taken in the general case.  You'd start by loading Y into a variable.  Check if Y is prime.  There's no point in doing anything else if it is!  You'd divide by the first prime (2) until there is a remainder, or the result < X, in which case multiply the smallest factored prime back in, and the result is the upper bound for N.  If you succeed in factoring out primes, multiply each of them together.  If that product > X, that is the upper bound for N.  If the result or the product is X, then the solution is found.  If a remainder is found when attempting to factor, check if the original number is prime.  Stop if it is.  Further factoring will be useless.

Now the question becomes what to do on the iterations.  To work from the top end would require finding the largest prime < TheResult/TheNextSmallPrime.  Which is just factoring from the low end anyway, so that doesn't make any sense.  Working the low end IS simultaneously working the high end.  So what about working the middle?  Find the square root of the result, and test the first prime smaller than it and the first prime larger than it.

Primes thin out moving away from zero.  The efficacy of working the middle as a means to cut the walk down is going to be a function of the factoring result.  The larger the result, the more likely a walk up from zero will take more steps.  So an algorithm would need to work that in.  Probably a result > 2000 is when a shot at the middle (because square root is expensive) might become worthwhile.

I'll play with it some more.
I had a lookup table already, and while I have plenty of space to store it, I was looking for something a little more generic.  I failed to specify this in my question though, so I'll give credit for it unless something else runs better.

The table-based lookup IS faster, and while it requires more latches across the board than the best queries, it needs far fewer than the worst cases.

I'll work up some kind of summary to display.  I don't have a good way to show the pattern across results.

I already have all primes up to 1000000 in a table as well.  That's not really helpful though since I still need to assemble them into factors for a given Y.  I tried the prime assembly before and it's awful.

I do have another problem though.  I tried solving for larger Y values and the table simply ran out of values and the connect by ran out of memory.

Of course, I could make the table bigger and I could add more memory or, bite the bullet and go back to the recursive with clause.  It's more expensive in processing but tends to not hit the memory limits as soon as the connect by.

The other option would be to go with pl/sql.  I'll try that too; but I'm thinking the question may be winding to a conclusion because it's looking like these implementations are forcing me to compromise.
>>I'm thinking the question may be winding to a conclusion
This is a suggestion...
have you thought about using MODEL?

It's not something I have used and was "thinking of investigating" but I've seen claims that suggest it may be fast, and the problem type does seem to suit a "formula".

& is java out of the question? (not volunteering, just hinting)
java is acceptable but a problem would be measuring it accurately against the sql operations.

v$mystat and v$latch give me solid metrics to compare in an apples-to-apples way.

I can compare overall execution time; but I am interested in how expensive it is to run.
I'm open to suggestions on how to compare reliably, especially since we're already in the realm of micro-to-milliseconds.

I'm leaning toward a pl/sql solution because it just occurred to me that whatever the final best method is, I'll probably wrap it in a function anyway.  It's possible the pl/sql will simply  be

begin select .... into THE_ANSWER from....;  return THE_ANSWER; end;

but, knowing there will be at least one layer of pl/sql involved, if anyone was discouraged from trying that before, please do give it a shot.

I'm willing to give MODEL a try, if nothing else to try to resolve the memory errors when solving large values with connect by.

As it stands now, a precalculated lookup table is still the fastest so far but sometimes heavier, sometimes lighter on resource consumption with the obvious limitation that it's only useful up to its largest value.
:)
With an algorithm in place to calculate a table of whatever size you might want on a non-production machine, it becomes a question of disk space.
 
I don't do LISP, but the underpinnings of this are very LISP-y.  The factors of 1024 are a function of 512, 512's of 256, 256's of 128...and so on.  LISP is very good with recursive stuff like that.  You may want to look up a place to post LISP questions and pose this problem.

It also is heavily involved with number theory, prime factorization and all being the root of what needs doing.  Posing the question in a mathematics forum may also yield some interesting results.

I am going to keep playing with it.  It's pretty clear that a straight walk will have occasions where it will be a hideously poor solution.  An algorithm that makes use of a table of prime numbers, and perhaps a hint table of numbers whose first prime factor is larger than some arbitrary value, may perform reasonably well.  I have to build that first, though, before you can test it.
It's already cross-posted to the math & science topic area.

I haven't touched lisp in 20+ years; but even if I were proficient in it I don't think it'd be a good fit here because I'm looking for something that can be invoked from the database.

I suppose it might be possible to write an external procedure call via a compiled library; but if I go that route, I'd just write it in C, and I can definitely recurse in C efficiently; but I'd probably implement it with iteration instead simply to avoid stack overhead.
Ok, I am not sure that I really buy this (from http://primes.utm.edu/prove/prove2_1.html):
>>This method is so fast that there is no reason to store a large list of primes on a computer--an efficient implementation can find them faster than a computer can read from a disk.

Now, I poked into these webpages AFTER I built the upcoming sample, and earlier you had mentioned a square root calculation -- and that turns out to be a critical bit.  Take 100M.  The square root of that is 10K.  There are 1230 prime numbers between 2 and 10007 -- which makes for a fairly small table.  From there, you need a routine to systematically factor out and record prime numbers starting at 2, until either there's nothing left to factor or you run out of primes.  Any number that doesn't have a factor less than its own square root is going to be prime.  Then you need a routine to create the compound factors and it's off to the races.

So, with a few lines of VBA and a table of 1230 rows, you've got something that can answer your question in general for Y < 100M.  Not bad.  I fired a random number at it, 23145691 -- figuring that ~23M ought to be a good test.  Turns out that number is prime.

My system isn't able to measure precisely enough to tell me how long it takes to execute.  Zero is what it says. VBA code below.  Command8_Click fires it off.  Text1 is Y.  Text3 is X
Text5 is N.
Private Sub Command8_Click()
Dim db As Database
Dim rs As Recordset
Dim thefactors() As Long
Dim x As Integer
Dim y As Integer
Dim Candidate As Double
Dim Test As Double
Dim BiggerThanN As Long
Dim TheStart As Double
Dim TheEnd As Double
TheStart = Now

Candidate = Me.Text1.Value
ReDim thefactors(100)

Set db = CurrentDb
Set rs = db.OpenRecordset("select * from tblSmallPrimes;", dbOpenDynaset, dbSeeChanges)

x = 0

Do Until rs.EOF
    Test = Candidate
    Do Until CLng(Test) <> Test Or Test < rs!ThePrime
        Test = Test / rs!ThePrime
        If CLng(Test) = Test Then
            Candidate = Test
            thefactors(x) = rs!ThePrime
            x = x + 1
        End If
        If Candidate = 1 Then Exit Do
    Loop
    If Candidate = 1 Then Exit Do
    rs.MoveNext
Loop

'ok, to get here what's left has to be a prime number > 10007
'Q. why does it have to be prime?
'A. If it wasn't, it would have a factor that was already tried

If Candidate <> 1 Then
    thefactors(x) = Candidate
End If

ReDim Preserve thefactors(x)

thefactors() = BuildTheFactors(thefactors())

'ok now I have the CompoundFactor
'now what's the first one equal to or greater than x?
rs.Close
db.Execute ("delete * from tblCompoundFactors;")
Set rs = db.OpenRecordset("select * from tblCompoundFactors;", dbOpenDynaset, dbSeeChanges)
For x = 0 To UBound(thefactors)
    If thefactors(x) <> 0 Then
        With rs
            .AddNew
            !ThePrime = thefactors(x)
            .Update
        End With
        'MsgBox thefactors(x)
    End If
Next x

rs.Close
Set rs = db.OpenRecordset("select top 1 * from tblCompoundFactors where theprime >= " & Me.Text3 & ";", dbOpenDynaset, dbSeeChanges)
Me.Text5.Value = rs!ThePrime

TheEnd = Now
MsgBox TheEnd & " " & TheStart
End Sub


Private Function BuildTheFactors(thefactors() As Long) As Long()
Dim CompoundFactors() As Long
Dim InThere As Boolean
Dim w As Integer
Dim x As Integer
Dim y As Integer
Dim z As Integer
Dim LastCompound As Integer
'ok, we take every element of the array passed in
'we multiply it by every value already in the output array
'when we are done we have the compound factors
ReDim CompoundFactors(500)
CompoundFactors(0) = thefactors(0)
y = 1
z = 0
LastCompound = 0
For x = 1 To UBound(thefactors)
    For z = 0 To LastCompound
        'check if thefactors(x) * CompoundFactors(z)is already in CompoundFactors()
        InThere = False
        For w = 0 To 500
            If CompoundFactors(w) = thefactors(x) * CompoundFactors(z) Then
                InThere = True
                Exit For
            ElseIf CompoundFactors(w) = 0 Then
                Exit For
            End If
        Next w
        If InThere = False Then
            If Me.Text1.Value Mod (thefactors(x) * CompoundFactors(z)) = 0 Then
                CompoundFactors(y) = thefactors(x) * CompoundFactors(z)
                y = y + 1
            End If
        End If
    Next z
    'check if thefactors(x)is already in CompoundFactors()
    InThere = False
    For w = 0 To 500
        If CompoundFactors(w) = thefactors(x) Then
            InThere = True
            Exit For
        ElseIf CompoundFactors(w) = 0 Then
            Exit For
        End If
    Next w
    If InThere = False Then
        CompoundFactors(y) = thefactors(x)
        y = y + 1
    End If
    LastCompound = y
Next x

ReDim Preserve CompoundFactors(UBound(CompoundFactors))

BuildTheFactors = CompoundFactors

End Function



It requires a table, tblSmallPrimes, of the first 1230 primes, and another table, tblCompoundFactors, to throw the results of the factorization in.

I could knock the VBA into T-SQL if you'd like.  What's being done with tables, arrays and recordsets can get done with @variables, sprocs, and table variables.

Find attached the MS Access sample
LCM---Copy.mdb
This sort of leaves a bad taste in my mouth but it works.

A simple loop d := d - 1 or n := n + 1 is very fast; I can walk up or down very quickly.

With a walk down limit of 100000 (100K) I can quickly do many (but not all) factors up to 10 billion.  If I can't do that, but Y is a million or less, again, I simply walk up.  Yes, in a really bad case I might walk hundreds of thousands of steps, but it's a very tight loop and fast.

If I can't do either of those, then I bite the bullet and read from a table of primes and do the big ugly powermultiset I had already ruled out earlier.
It's still bad, but it will only be used for monstrous values and only in the worst cases where a walkdown isn't possible.

It still has the limitation that I only have primes up to 10 million; but that should be more than sufficient for any real world cases I might run into.  I do have a fast prime generator and could use it instead of a fixed table and generate them on the fly if I really want to extend it, the limit then simply being memory, or, of course, just make a bigger table.


The 100K and 1M boundaries are just educated guesses.  I don't think there is a "best" number for all cases but those seem to work for me on one of my beefier machines.  On less powerful the brute force threshold is lower.


CREATE OR REPLACE FUNCTION next_factor(x IN INTEGER, y IN INTEGER)
    RETURN INTEGER
IS
    n INTEGER;
    d INTEGER;
BEGIN
    IF y / x < 100000
    THEN
        --- It's a small divisor to N, simply walk down to find it
        d := FLOOR(y / x);

        WHILE MOD(y, d) != 0
        LOOP
            d := d - 1;
        END LOOP;

        n := y / d;
    ELSIF y <= 1000000
    THEN
        -- As long as Y is less than a million we can walk up pretty quickly
        n := x;

        WHILE MOD(y, n) != 0
        LOOP
            n := n + 1;
        END LOOP;
    ELSIF y <= 9999991 * 9999991
    THEN
        -- Y is too big and the factor is too far away to walk
        -- So lookup from primes
        SELECT MIN(factor)
          INTO n
          FROM (SELECT rn, ROUND(EXP(SUM(LN(f.COLUMN_VALUE)))) factor
                  FROM (SELECT ROWNUM rn, COLUMN_VALUE factors
                          FROM TABLE(
                                   POWERMULTISET(
                                       (SELECT CAST(COLLECT(n) AS ora_mining_number_nt)
                                          FROM primes_lt_10million
                                         WHERE n * n <= y AND MOD(y, n) = 0)
                                   )
                               )),
                       TABLE(factors) f
                GROUP BY rn)
         WHERE factor >= x;
    ELSE
        -- Too big, can't do it
        RAISE VALUE_ERROR;
    END IF;

    RETURN n;
END;

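A quick smoke test against the thread's baseline values:

SELECT next_factor(67, 15000) FROM DUAL;   -- 75, via the walk-down branch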



100 executions each...
next_factor(271111420, 99559468190);     00:00:00.001678000
next_factor(3, 99559468190);    00:00:09.473022000

1 million executions of
next_factor(x,y)  

where y random integer between 1000 and 10000000000
x random integer between 1 and y

00:00:12.111905000

So, for very bad cases with very large numbers it slows down to about a tenth of a second.
On average though it runs about one one-hundred-thousandth of a second.


by the way, except for very large numbers, it definitely IS possible to generate primes faster than you can read them.  I created a table of primes less than 10 million simply because that was the  rough tipping point where I couldn't outpace the io.
I've been building a modified sieve of Eratosthenes to create the compound factors without reference to a table of primes.  It is markedly slower than referencing the table.
Private Sub Command9_Click()
'implement a modified sieve of Eratosthenes to find the first factor of a number that is larger than a second number
Dim CompoundFactors() As Long
Dim theY As Long
Dim theSQR As Long
Dim x As Long
Dim y As Long
Dim N As Long

theY = Me.Text1.Value
ReDim CompoundFactors(theY, 1)

For x = 2 To theY
    If CompoundFactors(x, 1) = 1 Then
        GoTo alreadychecked
    End If
    If theY Mod x <> 0 Then
        For y = 1 To theY
            If x * y < theY Then
                CompoundFactors(x * y, 1) = 1
            Else
                GoTo bigenough
            End If
        Next y
    Else
        N = x
        If (N >= Me.Text3) And (theY Mod N = 0) Then
            Me.Text5 = N
            GoTo done
        End If
bigenough:
    End If
alreadychecked:
Next x

done:
MsgBox "done!"

End Sub



With a table of the primes below 10M, you can find the prime factors of numbers less than 100 trillion using the algorithm I posted!  That REALLY ought to be enough for real world situations.  Building the list of compound factors once you have the prime factors is computationally trivial.  For numbers less than 2^31 you will have a walk of 2^15.5 (~46000) in the very worst case scenario to find all the prime factors.  Worst case scenarios for finding compound factors for numbers in the same range are less than 525 factors.

Beats the tar out of walking a 100K loop, no matter how tight!

What do you have for prime generator code?  Is it SQL?  What I've seen so far is loading up an array of Y elements and then zeroing out the multiples of each prime in order, until you reach the upper bound of the array.  Doesn't strike me as efficient -- and it doesn't perform nearly as well as looking up a table of the primes below the square root of Y.

I'll have to do a run of Y = 1K to 2 M and x = 3 to 5K with what I've got and see how long that takes.
And kick it over to T-SQL and see how that does.
My generator is a modified Eratosthenes with the Atkin Modulus 60 check.  I wrote it in pl/sql.  If I was really serious about the prime generation speed I'd do it in c as an external procedure.

I seed the first few primes into a collection.
Then loop through odd numbers.
For each I do the Atkin Modulus 60 check.
If one of the possible remainders is found then I iterate through my collection of primes up until I either find a factor of the current number or I exceed the square root.

If I find a factor that number is not prime, otherwise it is and I add it to my collection.
I repeat until I fill a collection of all primes up to some limit.
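For concreteness, a stripped-down sketch of that generator, minus the Atkin modulus-60 pre-check, reusing the ora_mining_number_nt collection type from earlier (the real version keeps the check and more tuning):

CREATE OR REPLACE FUNCTION primes_up_to(p_limit IN INTEGER)
    RETURN ora_mining_number_nt
IS
    v_primes   ora_mining_number_nt := ora_mining_number_nt(2, 3, 5, 7);
    v_is_prime BOOLEAN;
BEGIN
    -- assumes p_limit >= 7; for smaller limits the seeds are returned as-is
    FOR n IN 9 .. p_limit
    LOOP
        IF MOD(n, 2) != 0    -- only test odd candidates
        THEN
            v_is_prime := TRUE;

            FOR i IN 1 .. v_primes.COUNT
            LOOP
                EXIT WHEN v_primes(i) * v_primes(i) > n;  -- passed sqrt(n): prime

                IF MOD(n, v_primes(i)) = 0
                THEN
                    v_is_prime := FALSE;
                    EXIT;
                END IF;
            END LOOP;

            IF v_is_prime
            THEN
                v_primes.EXTEND;
                v_primes(v_primes.COUNT) := n;
            END IF;
        END IF;
    END LOOP;

    RETURN v_primes;
END;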


>>Building the list of compound factors once you have the prime factors is computationally trivial.

Syntax-wise, yes, that's what POWERMULTISET does.  CPU-wise, no.  There's quite a bit of work involved in generating the combinations.  But I agree it's fast.  That's why I cap my walk up loops at 1 million.  Note that it is still only the 2nd choice.

The walkdown method is still the fastest and least resource intensive for many values.
For any X > Y/2, the walkdown is 1 step.  So, half of all numbers, regardless of how large, will always be trivial.  For X > Y/3 the walkdown is only 2 steps, so two-thirds of all numbers are trivial.  For X > Y/4 it's 3 steps, and three-fourths of all numbers are still trivial, etc.
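A quick sanity check of the one-step case: for any X > Y/2, FLOOR(Y/X) = 1, and 1 divides everything, so the walk-down stops on its first probe with N = Y.

-- X = 7501 > 15000/2, so FLOOR(15000/7501) = 1 and N = 15000/1 = 15000
SELECT 15000 / FLOOR(15000 / 7501) n FROM DUAL;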

It's just a matter of finding that threshold where it stops being trivial and then do some real work.
>>It's just a matter of finding that threshold where it stops being trivial and then do some real work.

I alluded to this earlier but it's critical to the evaluation.  The "threshold" is entirely dependent on the hardware.  On the machine I was testing on last night, I'm willing to loop up to a million rather than do some io.  On one of the machines I could test on today, I'd probably cap my walk down limit at 20K and the walk up limit on Y at 100K, then switch to the more intensive prime and factor search.

>>Beats the tar out of walking a 100K loop, no matter how tight!

yes once you're done, but you have to absorb the work of finding those primes first (which can be helped by precomputing if very large) then generating the factors.

Even on the smallest box I've been using, the speed of any of these is still solidly sub-second.  I really am quibbling out at the 4th and 5th decimal places, and that's when I start looking at the cpu, io and latches involved.  I quit including them in my results posts because they are measures that only apply within Oracle.  If you're testing within other frameworks there would be similar metrics but it would be impossible to correlate them meaningfully.

So, some combinations of X, Y, version and hardware with a particular method will knock off an extra few microseconds, maybe even a whole millisecond, and I do care about those gains; but if the cost of that speed is excessive resource consumption then I care about that too.

What I've got above with the "flip-flop" of switching algorithms seems, to me, to be the best.  I still find it distasteful though.

All three methods are still basically a brute force exhaustive search.  I was hoping for something a little smarter.  Something that could put some computational caps.

Kind of like the walk down factors.  Simply reversing the direction of the search has a huge impact; but then it eventually fails and walking up will be better.

Both of those will still fail though for large primes or numbers so large that there is a prohibitively large gap between factors.

Searching for primes lets us scan only a few numbers but requires a lot of work to find those numbers and it too is an exhaustive search.  All possible factors will be generated.

Maybe a more "clever" approach isn't possible because at some point we are forced to do prime factorization and that forces me to impose an arbitrary practical limit rather than a platform limit and that's really what's bugging me.  I don't particularly care that trillions will almost certainly be big enough.  The platform can support numbers into the duodecillions and, with the walk downs, it is possible to successfully iterate over most values of X and Y.  But "most" still leaves 9 decillion values that the methods above can't handle.   9 decillion is a pretty big hole, even if it is only 0.00001 of the entire population.

Another way to look at it - the walk downs with a 100K limit are already solving 99.999% of all values the platform can support.  So, it seems like we're almost there.  The walk up method is pretty dumb but still captures the bulk of the remaining "real world" values.  

The primes are the most complicated and really, only extend the range of supported values by about 0.000000000000000000000001%  of the entire population.

So, yes, the primes do make the "real world" realm larger; but since we were already 99.999% done, what they add in terms of total platform support is inconsequential.

We're essentially running into the Pareto principle taken to a few more decimal places.
99.999%, darn it.  I'm a pragmatist; I have not had to prepare a report involving duodecillions and hope I never will.

Grab a beer, switch off the brain, relax.

I'm enjoying the read however. Thank you.
hmmm, just thought of a problem with my factor generation.

I'm not taking into account repeats.
Simply having all of the primes and multiplying them in various combinations isn't sufficient to get all of the factors.

Like in the 1024 example above, there is only one prime, but there are 10 factors because we reuse that same prime.

Since the prime search and factor generation don't really add much value, I'm tempted to simply remove them altogether and just let those cases fail.
It's not so much that I want to support a particular large value, like duodecillions,
as simply not wanting to impose a smaller value.

I'd like the method to be viable up to the platforms limitations.
That is,  I want Oracle's code to fail before mine does.

If that means supporting umpteen gajillion, then so be it.

I'm coming to the conclusion though, that it's not likely to be feasible.  So then my quandary is:  since I can't solve everything, how much complexity do I want to include simply to make a microscopically marginal improvement.

This puts prime and factor generation on the cutting block for low ROI.
It's a little trickier than that.
You walk through the result set of the primes

1. You add the first element to the result set of the compound factors
2. Move to the next element of the prime factors
3. You multiply the elements in the compound factors by the present prime factor
4. If the result is not in compound factors, add it
5. When you've multiplied the present prime by all the factors in compound factors, see if the present prime is in compound factors.  If it is not, add it
6. Move to the next prime
7. do 2-6 until you come to the end of the prime result set
8. Order the results

Then find the first one >= X

Here it is as VBA
Private Function BuildTheFactors(thefactors() As Long) As Long()
Dim CompoundFactors() As Long
Dim InThere As Boolean
Dim w As Integer
Dim x As Integer
Dim y As Integer
Dim z As Integer
Dim LastCompound As Integer
'ok, we take every element of the array passed in
'we multiply it by every value already in the output array
'when we are done we have the compound factors
ReDim CompoundFactors(1500)
CompoundFactors(0) = thefactors(0)
y = 1
z = 0
LastCompound = 0
For x = 1 To UBound(thefactors)
    For z = 0 To LastCompound
        'check if thefactors(x) * CompoundFactors(z)is already in CompoundFactors()
        InThere = False
        For w = 0 To 500
            If CompoundFactors(w) = thefactors(x) * CompoundFactors(z) Then
                InThere = True
                Exit For
            ElseIf CompoundFactors(w) = 0 Then
                Exit For
            End If
        Next w
        If InThere = False Then
            If Me.Text1.Value Mod (thefactors(x) * CompoundFactors(z)) = 0 Then
                CompoundFactors(y) = thefactors(x) * CompoundFactors(z)
                y = y + 1
            End If
        End If
    Next z
    'check if thefactors(x)is already in CompoundFactors()
    InThere = False
    For w = 0 To 500
        If CompoundFactors(w) = thefactors(x) Then
            InThere = True
            Exit For
        ElseIf CompoundFactors(w) = 0 Then
            Exit For
        End If
    Next w
    If InThere = False Then
        CompoundFactors(y) = thefactors(x)
        y = y + 1
    End If
    LastCompound = y
Next x

ReDim Preserve CompoundFactors(UBound(CompoundFactors))

BuildTheFactors = CompoundFactors

End Function


thanks, I know how to do it.  I just didn't.

but, as noted above, there's sort of no point in bothering now (I probably will anyway)
Here it is as T-SQL, using the 1230 row prime table
You may be able to adapt and optimize it for your purposes.
And replace the prime table with code to generate those on demand

ALTER PROCEDURE [dbo].[qryFindFactor] 
	-- Sproc to find the first factor of Y larger or equal to X
               --Present upper limit Y < 100M
               
	@y int = 0, 
	@x int = 0
AS
BEGIN
	-- SET NOCOUNT ON added to prevent extra result sets from
	-- interfering with SELECT statements.
	SET NOCOUNT ON;

    -- Insert statements for procedure here
    Declare @test int
    Declare @initialY int
    Declare @ThePrime int
    Declare @ThePrimeID int
    Declare @theMax int
    Declare @TheCompound int
    Declare @theProduct int
    Declare @theCompoundMax int
    Declare @TheCompoundID int
    Declare @theCompoundMaxID int
    Declare @PrimeFactors table (RowNum int IDENTITY(1, 1) UNIQUE CLUSTERED , TheFactor int)
    Declare @CompoundFactors table (RowNum int IDENTITY(1, 1) UNIQUE CLUSTERED , TheFactor int)
    Select @theMax = max(thePrime) from tblSmallPrimes
    select @ThePrime = min(ThePrime) from tblSmallPrimes
	select @initialY = @y
	while @ThePrime <> @theMax
		begin
			if @y % @ThePrime = 0 And @y <> 2
			begin
				Insert into @PrimeFactors(Thefactor) select @ThePrime
				select @y =@y/@theprime
			end
			else
			begin 
				select @ThePrime =	min(ThePrime) from tblSmallPrimes where ThePrime > @ThePrime
			end
		end
	if @y > @theMax
		begin
			Insert into @PrimeFactors(Thefactor) select @y
		end 
		
	--alright, that's the prime factors
	--now the compound factors
	--Load the first factor in 
    Select @theMax = max(rownum) from @PrimeFactors
    select @ThePrimeID = min(rownum) from @PrimeFactors
	select @TheCompound = min(TheFactor) from @PrimeFactors where rownum = @ThePrimeID			
	Insert into @CompoundFactors(Thefactor) select @TheCompound
	select @TheCompoundID = min(RowNum) from @CompoundFactors
	--and move to the next
	select @ThePrimeID = min(rownum) from @PrimeFactors where rownum > @ThePrimeID
	select @ThePrime = min(TheFactor) from @PrimeFactors where rownum = @ThePrimeID
	--now loop through until you're done	   
	while @ThePrimeID <= @theMax --And @ThePrimeID <> 0
		begin --the prime loop
			Select @TheCompoundMaxID = max(RowNum) from @CompoundFactors			
			--do all this when there are elements to test in CompoundFactors
			
			while @TheCompoundID <= @TheCompoundMaxID --AND  @TheCompoundID <> 0
				begin --the compound loop
				select @TheCompoundID = min(RowNum) from @CompoundFactors where rownum = @TheCompoundID
				--load the value
				select @TheCompound = min(TheFactor) from @CompoundFactors where rownum = @TheCompoundID		
				select @TheProduct = @TheCompound * @ThePrime				
				select @test = thefactor from @CompoundFactors where thefactor = @TheProduct
				if @@rowcount = 0 and @TheProduct <= @initialY				
					begin
					--pound it in
					Insert into @CompoundFactors(Thefactor) select @TheProduct					
					end				
				--do we have another compound factor?
				select @TheCompoundID = min(RowNum) from @CompoundFactors where rownum > @TheCompoundID
				end--end the compound loop
				--reset the compound loop
			select @TheCompoundID = min(RowNum) from @CompoundFactors						
		--see if our prime was already in the compound table variable	
		select @test = min(theFactor) from @CompoundFactors where thefactor = @ThePrime
		if @@rowcount = 0 and @ThePrime <= @initialY				
			begin
			--pound it in
			insert into @CompoundFactors(Thefactor) select @ThePrime				
			end	
							
		--move on to the next prime or bail
		select @ThePrimeID = min(rownum) from @PrimeFactors where rownum > @ThePrimeID
		select @ThePrime = min(TheFactor) from @PrimeFactors where rownum = @ThePrimeID
		Select @TheCompoundMaxID = max(RowNum) from @CompoundFactors
		end --the prime loop

	return (select top 1 Thefactor from @CompoundFactors where thefactor >= @x order by thefactor)
END


Here's what it might look like with all factors calculated from the primes...
A side effect of generating all factors is that you'll get duplicate factors if you have duplicate primes.

For example, 4 has 2 prime factors, A and B, both of which happen to be 2.
So A, B, AxB and BxA are the complete list of factors of 4 (other than 1).
Which means 2, 2, 4 and 4 will be in the list.
Even if you control the generation so you don't do BxA if you have AxB, you'll still get A, B, AxB or 2, 2, 4 - and a duplicate happens anyway.
It requires more work to try to find the distinct combinations than to simply sort them into the list when searching for the smallest factor greater than X.


But, as I've noted a few times before, the entire prime and factor generation effort is largely pointless, since it only applies to less than 0.001% of all inputs while not really extending the functional range in a significant way.

CREATE OR REPLACE FUNCTION next_factor(x IN INTEGER, y IN INTEGER)
    RETURN INTEGER
IS
    c_factoring_limit constant integer := 99999820000081;
    n             INTEGER;
    d             INTEGER;
    v_temp        INTEGER := y;
    prime_factors ora_mining_number_nt;
BEGIN
    IF y / x < 100000
    THEN
        --- It's a small divisor to N, simply walk down to find it
        d := FLOOR(y / x);

        WHILE MOD(y, d) != 0
        LOOP
            d := d - 1;
        END LOOP;

        n := y / d;
    ELSIF y <= 1000000
    THEN
        -- As long as Y is less than a million we can walk up pretty quickly
        n := x;

        WHILE MOD(y, n) != 0
        LOOP
            n := n + 1;
        END LOOP;
    ELSIF y <= c_factoring_limit
    THEN
        -- Y is too big and the factor is too far away to walk
        -- So lookup from primes
        v_temp := y;
        prime_factors := ora_mining_number_nt();

        FOR i IN (SELECT n f
                    FROM primes_lt_10million
                   WHERE n * n <= y)
        LOOP
            WHILE MOD(v_temp, i.f) = 0
            LOOP
                prime_factors.EXTEND;
                prime_factors(prime_factors.COUNT) := i.f;
                v_temp := v_temp / i.f;
            END LOOP;

            IF v_temp = 1
            THEN
                EXIT;
            END IF;
        END LOOP;

        IF v_temp > 1
        THEN
            prime_factors.EXTEND;
            prime_factors(prime_factors.COUNT) := v_temp;
        END IF;

        SELECT MIN(factor)
          INTO n
          FROM (  SELECT rn, ROUND(EXP(SUM(LN(f.COLUMN_VALUE)))) factor
                    FROM (SELECT ROWNUM rn, COLUMN_VALUE factors
                            FROM TABLE(POWERMULTISET(prime_factors))),
                         TABLE(factors) f
                GROUP BY rn)
         WHERE factor >= x;
    ELSE
        -- Too big, can't do it
        RAISE VALUE_ERROR;
    END IF;

    RETURN n;
END;


So, this is what I'm going with...


CREATE OR REPLACE FUNCTION next_factor(x IN INTEGER, y IN INTEGER, force_it IN VARCHAR2 default 'N')
    RETURN INTEGER
IS
    n                          INTEGER;
    d                          INTEGER;
BEGIN
    IF y / x < 100000
    THEN
        --- It's a small divisor to N, simply walk down to find it
        d := FLOOR(y / x);

        WHILE MOD(y, d) != 0
        LOOP
            d := d - 1;
        END LOOP;

        n := y / d;
    ELSIF y <= 1000000 OR UPPER(SUBSTR(force_it, 1, 1)) = 'Y'
    THEN
        -- As long as Y is less than a million we can walk up pretty quickly
        -- If not, we can still force the algorithm to try
        n := x;

        WHILE MOD(y, n) != 0
        LOOP
            n := n + 1;
        END LOOP;
    ELSE
        -- Too big, can't do it
        RAISE VALUE_ERROR;
    END IF;

    RETURN n;
END;

Open in new window
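And how the third parameter behaves, using the same illustrative 123456789 from above:

SELECT next_factor(3, 35) FROM DUAL;               -- walk down, returns 5
SELECT next_factor(11, 123456789) FROM DUAL;       -- raises VALUE_ERROR: too far to walk and Y > 1 million
SELECT next_factor(11, 123456789, 'Y') FROM DUAL;  -- forced walk up from 11, returns 3607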



It's the same basic flip-flop: super-fast walk-down for 99.999% of all combinations
with the remaining 0.001% being walk-ups.

By default I'll error out if I might need to walk up a value over 1 million.
But I added an optional 3rd parameter to allow forcing it to exceed the limit and walk as far as necessary.

I conceded that for very large values this is impractical.  However, walking up to a million is definitely a feasible option and covers all anticipated real-world values.

If I have to go larger then I'm stuck.
I can wait for small increases in Y; but nowhere near the platform limits.
The prime stuff, while very fun to play with, isn't a viable extension.

I'll leave the question open a while longer.  I've tried everything suggested and I'm willing to keep doing so.
Why do you stop at just 1 iteration when you hit a factor?

        WHILE MOD(y, d) != 0
        LOOP
            d := d - 1;
        END LOOP;

        n := y / d;

Why not

        WHILE MOD(y, d) != 0
        LOOP
            d := d - 1;
        END LOOP;

        WHILE MOD(y, d) = 0 AND y / d >= x
        LOOP
             y := y / d;
        END LOOP;

        n := y / d;

You'd need to test that the division didn't lower n below x, but if you've found a factor, why stop until you've removed them all?

And really, as a preliminary, you should factor out 2, 3, and 5.  Doing so eliminates at least  60% of the possible walking distance, maybe more.

The syntax is quite alien to me, but I hope you get the gist.
I wonder about computational expense:

Is y mod x more expensive than y/x?
Is 1000 mod x more expensive than 1000000 mod x?
One wonders.
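A throwaway timing block can answer that on any given box (a rough sketch; DBMS_UTILITY.GET_TIME returns hundredths of a second, and the iteration counts are arbitrary):

DECLARE
    t0    NUMBER;
    dummy NUMBER := 0;
BEGIN
    -- cost of MOD across a range of divisors
    t0 := DBMS_UTILITY.GET_TIME;

    FOR i IN 1 .. 1000000
    LOOP
        dummy := MOD(1000000, i);
    END LOOP;

    DBMS_OUTPUT.PUT_LINE('mod: ' || (DBMS_UTILITY.GET_TIME - t0) || ' cs');

    -- cost of integer division across the same range
    t0 := DBMS_UTILITY.GET_TIME;

    FOR i IN 1 .. 1000000
    LOOP
        dummy := FLOOR(1000000 / i);
    END LOOP;

    DBMS_OUTPUT.PUT_LINE('div: ' || (DBMS_UTILITY.GET_TIME - t0) || ' cs');
END;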

I was thinking about my T-SQL.
It is possible to generate the compound factors on the fly.
At present, it generates the prime factors first, and then the composite factors from them.
Doing them simultaneously is actually simpler.
I'll have to hack that together and try it.
WHILE MOD(y, d) != 0
LOOP
    d := d - 1;
END LOOP;

n := y / d;


is correct for the walk down.

Maybe an example would help.

Y = 35
X = 9
expected result: N = 35

d = floor(y/x) = floor(35/9) = floor(3.88888) = 3

mod(y, d) = mod(35, 3) = 2
2 != 0
  so d = d - 1 = 3 - 1 = 2

mod(y, d) = mod(35, 2) = 1
1 != 0
  so d = d - 1 = 2 - 1 = 1

mod(y, d) = mod(35, 1) = 0
0 = 0
exit loop

n = y/d = 35/1 = 35


another example

Y = 35
X = 3
expected result: N = 5

d = floor(y/x) = floor(35/3) = floor(11.666666) = 11

mod(y, d) = mod(35, 11) = 2
2 != 0
  so d = d - 1 = 11 - 1 = 10

mod(y, d) = mod(35, 10) = 5
5 != 0
  so d = d - 1 = 10 - 1 = 9

mod(y, d) = mod(35, 9) = 8
8 != 0
  so d = d - 1 = 9 - 1 = 8

mod(y, d) = mod(35, 8) = 3
3 != 0
  so d = d - 1 = 8 - 1 = 7

mod(y, d) = mod(35, 7) = 0
0 = 0
exit loop

n = y/d = 35/7 = 5


I don't need the additional looping on d during a walk down,
just like I don't need the additional looping on a walk up.

It's essentially the same thing, except walk down has the benefit of rapid reduction for the majority of values.




I can do arbitrary prime generation on the fly as well.  That's not the problem.
The problem is it simply doesn't help much.

Let's imagine we have a really slow computer and looping will be too expensive.
So, instead of doing walk downs for y/x < 100000  I reduce it to just y/x < 10.

That still means 90% of all (x,y) combinations will be resolved by the walk downs in 9 steps or less.
So even with that exaggerated simplification it already becomes hard to justify using  prime calculations.
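A quick sanity check of that 90% figure for the thread's original Y:

SELECT ROUND(100 * COUNT(CASE WHEN 15000 / LEVEL < 10 THEN 1 END) / COUNT(*), 1) pct
  FROM DUAL
CONNECT BY LEVEL <= 15000;

This returns 90: exactly the x values above y/10, each resolvable in 9 steps or less, as claimed.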
oops, I skipped one of your concerns...

"You'd need to test that the division didn't lower n below x"

That's not a problem because....

Since Y and D are both positive integers

Then Y/D < Y/(D-1) < Y/(D-2) < Y/(D-3)  etc

So the factors keep getting bigger and bigger  until we finally reach our N.
Also,

On the stupid-big front -- which is faster for Y = 32452843, X = 7?
Walk up from 2 to SQRT(y) --> if there's no hit, it must be prime

Or

generating prime factors and seeing that it is prime.

If 100K is your walk-up limit, then 100K * 100K is the top end of the walk-up
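For reference, the dumb walk-up test amounts to this (a minimal sketch; 32452843 is prime, so the loop runs all the way to its square root, about 5697 iterations):

CREATE OR REPLACE FUNCTION is_prime(y IN INTEGER)
    RETURN BOOLEAN
IS
    d INTEGER := 2;
BEGIN
    -- trial division up to sqrt(y); any hit means y is composite
    WHILE d * d <= y
    LOOP
        IF MOD(y, d) = 0
        THEN
            RETURN FALSE;
        END IF;

        d := d + 1;
    END LOOP;

    RETURN y > 1;
END;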
>>> I'd still look to remove factors of 2, 3 and 5 from y here

I'm not sure where you're going with that.

There is virtually zero effort in including them in the prime generation.  2 is a seed in the list because it's special as the only even prime.  3 is another seed because it's essentially the starting point, since I only iterate on odd numbers.  So 5 is the first value to actually be evaluated, and it's immediately identified because I never need to look at 2, and 3*3 > 5, so 5 is a prime in one step.

Walking by these can certainly be faster if the N I'm looking for is itself one of these factors; but that's not particularly helpful.

A simple example: 194 = 2x97.  I can't walk in increments of 2, 3 or 5 or any other prime to find my factors.  Other numbers with other factors have the same issue.  Just because Y is divisible by 2, 3 or 5 doesn't mean the "next" factor from X is necessarily a multiple of those.

Maybe I'm not understanding what you're trying to say.

-----------------------

Walking up from 2 to sqrt(y) is slower than primes, depending on the specific values, but being "dumb" it consumes few resources.  The primes take more resources to get the speed.  Which is "better"?  I'm not sure.  Probably the primes, since they can extend to larger numbers before they too become infeasible.

I do cache my primes, so multiple runs are faster, especially if Y is smaller than previous runs so it can reuse the cache and doesn't have to generate anything new.  But that benefit is also a penalty because it means I'm consuming memory that I'm not going to reuse.

The primes "can" be faster for some combinations but not all, and, if it's a first time run, the cost and time of initial generation puts them behind.

It would probably be possible to extend the IF/ELSE chain into a bunch of categories that would each have their own optimal search pattern.  This would be ideal.

However, what I keep coming back to is I've already got a solution that covers 99.999% of all possible combinations.

From a real world implementation perspective:

I have almost zero incentive to add complexity for that remaining 0.001%, ESPECIALLY since the vast majority of the remaining values aren't solvable regardless of the implementation.


From a purely academic perspective:

I have slightly more than zero incentive to pursue these other options; but, in all cases we're simply "walking".  Walking up, walking down, walking through primes in order to generate factors then walking through those.

So walking isn't particularly interesting anymore because, despite any marginal speed improvements, it's still too limited and basically rehashing what's already been mentioned.

For pure speed, including support for the largest real-world numbers I might hit, the best option would be to do the walk down for some threshold (100K seems ok, but might not be optimal) and then precalculate all of the primes up to some large value, as shown in one of the early suggestions.  That would let me calculate up to the square of that number.
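Populating that table could be as simple as a sieve (a sketch; it assumes the primes_lt_10million table from the function above, with a single NUMBER column n):

DECLARE
    c_limit CONSTANT PLS_INTEGER := 10000000;

    TYPE t_flags IS TABLE OF BOOLEAN
        INDEX BY PLS_INTEGER;

    composite t_flags;
BEGIN
    -- sieve of Eratosthenes: mark the multiples of every prime up to sqrt(limit)
    FOR i IN 2 .. TRUNC(SQRT(c_limit))
    LOOP
        IF NOT composite.EXISTS(i)
        THEN
            FOR j IN i .. TRUNC(c_limit / i)
            LOOP
                composite(i * j) := TRUE;
            END LOOP;
        END IF;
    END LOOP;

    -- anything never marked is prime
    FOR i IN 2 .. c_limit
    LOOP
        IF NOT composite.EXISTS(i)
        THEN
            INSERT INTO primes_lt_10million (n) VALUES (i);
        END IF;
    END LOOP;

    COMMIT;
END;

Row-by-row inserts are slow at this scale; FORALL or a single INSERT ... SELECT would be the production choice, but the sketch keeps the logic obvious.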

It's not particularly sophisticated but that's ok.  The direction of the several iterations has been to add a lot of complexity for only marginal gains.  I don't like using a table of stored values; but I dislike even more the idea of caching in memory, potentially millions of values, or regenerating those values every time.

With a little analysis it would be possible to filter the table of pregenerated factors and remove all entries that would be viably solved by the simple walk down.

As long as I keep in mind that the "bad" solutions are only for a tiny minority of inputs I don't have to care much.


What I really, really want is a method to directly calculate some set of boundary conditions that I could apply to either the walking or the lookup table.

Without a boundary I'm always going to be doing some sort of "dumb" looping because there won't be a way to make a smart decision of what to skip or when to stop.
I'm not sure where you're going with that.

Consider Y = 1024002 and x = 31
Y is even, so you are checking with
        WHILE MOD(y, d) != 0
        LOOP
            d := d - 1;
        END LOOP;
because n may or may not be even.  You start that walk at 33032 and walk down ~8650 steps to get 42 for a result

divide out the 2 though and you get 512001, which is odd.
now you can check with
        WHILE MOD(y, d) != 0
        LOOP
            d := d - 2;
        END LOOP;
after ensuring that the initial d is odd -- because the quotient of an odd numerator and an odd denominator is also odd.  By ensuring that the starting value you walk down from is not a multiple of two, you can double the speed of the walk AND cut the spread at least in half

Making the starting point the odd number 16517 (floor(512001/31) = 16516 is even) and decrementing by 2, the solution is found in ~2300 steps, almost a 4-fold improvement.

Removing factors of 3 (there are 5 of them) leaves 2107 -- which is not divisible by 5 -- so we could build a check that wouldn't bother with d where mod(d,3) = 0 or mod(d,5)=0 -- but since that's more work than actually checking every value, there isn't much point

Floor(2107/31) leaves 67 -- nice and odd -- so decrementing by two, nine steps later d = 49 and 2107/49 yields the expected 43

Now, 2107 is also divisible by 7, leaving 301.  That leaves a d less than x--which means a walk up from 31 by two is in order.  6 steps later, 43 is achieved.

So, eliminating all factors of 2, 3, 5, and 7 -- which cost 8 mod operations and one addition operation to ensure that the start number was odd -- netted us a result in ~15 steps instead of ~8650 steps.

Ensuring that the start value is odd allows decrementing by two.  That's clearly worth doing.  I think the risk that Y has no factors of 2, 3, 5 or 7 is worth taking.  At worst it costs 4 mod operations.  At best, it saves thousands.
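In code, that pre-stripping step would look something like this (a sketch using the Y = 1024002 from above):

DECLARE
    TYPE t_nums IS TABLE OF PLS_INTEGER;

    small_primes t_nums := t_nums(2, 3, 5, 7);
    v_temp       INTEGER := 1024002;
BEGIN
    -- divide out every factor of 2, 3, 5 and 7
    FOR i IN 1 .. small_primes.COUNT
    LOOP
        WHILE MOD(v_temp, small_primes(i)) = 0
        LOOP
            v_temp := v_temp / small_primes(i);
        END LOOP;
    END LOOP;

    DBMS_OUTPUT.PUT_LINE(v_temp);   -- 43: the subsequent walk operates on a far smaller number
END;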
Two perhaps final points.
One
Distribution of primes follows a law 1/LN(n) where n is the number of digits in the number.
The odds of a given n-digit number being prime are 1/LN(n).

Walking all odd values (because I think I've conclusively shown that incrementing/decrementing should be by two) is SQRT(Y)/2 operations

Walking the primes is  (1/LN(LOG10(Y))) * SQRT(Y)

You can plot these.  They do cross between 10M and 100M, so depending on your machine and the cost of generating primes, the cutover is in that range
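Taking the two cost formulas at face value, a quick query shows the crossover (this just tabulates the model above, its accuracy aside):

SELECT y,
       ROUND(SQRT(y) / 2) walk_all_odds,
       ROUND(SQRT(y) / LN(LOG(10, y))) walk_primes
  FROM (SELECT POWER(10, LEVEL + 4) y FROM DUAL CONNECT BY LEVEL <= 6);

The walk_primes column drops below walk_all_odds between 10,000,000 and 100,000,000, matching the claim.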

Two
I have slightly more than zero incentive to pursue these other options; but, in all cases we're simply "walking".  Walking up, walking down, walking through primes in order to generate factors then walking through those.

That's why they use it for cryptography: there aren't better ways to do it -- and folks have gone batty over factoring since Eratosthenes and before.

:)

Nick67
I wasn't quite on with ID: 39489324
If Y is odd, N must be odd
If Y is even, you take out factors of 2 until the result is odd.
You can then walk down by two's until you get the lowest ODD factor
What I missed is that you then have to walk down by one's from the lowest odd factor to X, because there may be a lower even factor (in the 1024002 example above, the odd walk finds 43, but the true answer is the even factor 42).

Details.  I can see why this stuff has driven people mad since ancient times.

For any Y where you are going to factorize, it isn't necessary to generate all the compound factors.  Everything else has been predicated on walking down from an arbitrary value larger than x.  Why not do that here too?  Walk the primes up.  Have a variable that starts at 1 and gets multiplied by each factor found.  The instant that variable > x, walk it down by one's to x.

That saves the expensive step of generating all the compound factors
I can't simply walk through primes multiplying as I go.


Let's take something simple like  

Y = 42 = 2 x 3 x 7

X  = 15

2 x 3 = 6
walk to next prime 7
2 x 3 x 7 = 42

I skipped N=21, the answer I was looking for.


So, yes, I could potentially skip some compound factors on the high end.

If X = 12, I need to look at 2x7; but I could skip 3x7 and 2x3x7

So I do need to generate intermediate combinations.

As to the previous posts:

Definitely walking by 2 seems reasonable if Y is odd.

I haven't had time to think about walking by 2 when Y is even, or by 3, 5, 7, etc.
Probably won't until Sunday

I have no idea where you were going with the cryptography reference.
I know prime generation is used in many of the algorithms;  but other than that I'm not sure there is anything to relate to this task.  Sure, loops involved in both; but I don't think that's sufficient to make a correlation.
From way back at ID: 39478356
For something that you want to attack with the primes, you don't need to generate all the composite factors.  All we really want is an arbitrary value for d such that d>=x and mod(y,d) = 0

So, for Y = 1M, X = 23, let d = 1 and multiply d by each factor removed until d > x
1M/2 = 500K, d= 2
500K/2 = 250K, d=4
250K/2 = 125K, d=8
125K/2 = 62.5K, d=16
62.5K/2=31250, d=32

d > x, so now -- I don't know the syntax -- we'd like to save d, say as di, in case we fluked into the correct value, and run the loop.

        di INTEGER;    -- declared with the other locals; PL/SQL assignment is :=
        di := d;

        WHILE d >= x
        LOOP
            IF MOD(y, d) = 0
            THEN
                di := d;    -- remember the lowest divisor of y seen so far
            END IF;

            d := d - 1;
        END LOOP;

        n := di;    -- di is itself the factor we were after

So, 32 was overshoot, but we walked it back to the correct answer of 25.

Gets the job done without POWERMULTISET, which is something you wanted.
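Assembled into a runnable block (my stitching together of the pieces above, with the post's Y = 1M and x = 23; only the factors of 2 are needed for the overshoot on this particular Y):

DECLARE
    x  INTEGER := 23;
    y  INTEGER := 1000000;
    d  INTEGER := 1;
    v  INTEGER;
    di INTEGER;
    n  INTEGER;
BEGIN
    v := y;

    -- overshoot: multiply d by each factor of 2 removed until d exceeds x
    WHILE MOD(v, 2) = 0 AND d <= x
    LOOP
        v := v / 2;
        d := d * 2;
    END LOOP;

    -- walk back: remember the lowest divisor of y found on the way down to x
    di := d;

    WHILE d >= x
    LOOP
        IF MOD(y, d) = 0
        THEN
            di := d;
        END IF;

        d := d - 1;
    END LOOP;

    n := di;
    DBMS_OUTPUT.PUT_LINE(n);   -- 25, reached via the overshoot to 32
END;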
For big numbers, say, Y = 99499799 and x = 6
Start d at FLOOR(SQRT(Y))+1  (that equals 9975)
Walk it down.  When you hit a factor, set d = FLOOR(SQRT(d))+1 and keep walking down

        di INTEGER;
        di := d;

        WHILE d > x
        LOOP
            IF MOD(y, d) != 0
            THEN
                d := d - 1;
            ELSE
                di := d;
                d := FLOOR(SQRT(d)) + 1;
            END IF;
        END LOOP;

        n := y / di;

This alternative doesn't need the primes.
North of 1T, the primes may be faster, depending on the relative costs of generating and accessing the primes vs sqrt and mod.
There are still bugs in this, but I am too fried to find them at the moment.
Here it is though, to grab the logic of it
-------------------------------
----Here's the squares method as T-SQL
CREATE PROCEDURE NextFactorSquares   -- procedure name assumed
      @y decimal(38,0) = 0,
      @x decimal(38,0) = 0
AS
BEGIN
      -- SET NOCOUNT ON added to prevent extra result sets from
      -- interfering with SELECT statements.
      SET NOCOUNT ON;

    -- Insert statements for procedure here
      Declare @d decimal(38,0)
      Declare @n decimal(38,0)
      if Floor(sqrt(@y)) >= @x --square root big enough?
            begin --yes
            select @d = Floor(sqrt(@y))+1
            --select @d as 'dropRoot'
            end
      else
            begin
            select @d = @y-1
            end
      select @n = @d
      while @d >= @x
            begin
            while (@y % @d) <> 0
                  begin
                  --select 'dropping' as huh
                  select @d = @d-1
                  --select @d as 'dropOne'
                  end
            --hit a factor
            select @n=@d            
            if Floor(sqrt(@d))+1 >= @x --square root big enough?
                  begin --yes
                  select @d = Floor(sqrt(@d))+1
                  --select @d as 'dropRoot'
                  end
            else
                  begin --no
                  if (@y % @d) = 0
                        begin                        
                        select @d = @d-1
                        --select @d as 'Bump Down'
                        end                              
                  while (@y % @d) <> 0 and @d >= @x
                        begin
                        --select @d as 'PreRootTooSmall'
                        select @d = @d-1
                        --select @d as 'PostRootTooSmall'
                        end
                  end            
            end      
return @n      

Int is limited to an integer of about 2 billion.  Declaring decimals lets me get a bit bigger.  SQL Server Express won't return a SQRT larger than that ~2 billion limit though :(

All tests below 4 billion for Y come in at ~0.026 seconds, so that seems to be the system baseline.  It's subsecond for all Y up to ~4.5T.  North of that, performance goes south quickly.

And things go south when Sqrt(y) < x
So you have to check that the square roots of the values used remain greater than x
So this bit is for occasions where Y is very big, but Y/X > sqrt(y)

Are you tired of your question and me yet?
What I really, really want is a method to directly calculate some set of boundary conditions that I could apply to either the walking or the lookup table.

One is plotting (SQRT(Y) - x)/2 vs   (1/LN(LOG10(Y))) * SQRT(Y)
The left term is the cost of walking all the odd values; the right is the likely cost of walking the primes.
'Ok, given a Y and X, what are our choices for algorithms to pursue?  (a runnable sketch of the first two cases follows this outline)

'One:  We can have a largish Y and small X
'If it is odd, walking down from floor(y/x) will get the largest factor, and thereby the smallest

'Two: We can have a Y and a large X, larger than SQRT(Y)
'walking up from x by ones, if y is even, or two if odd
OR
'walking down from SQRT(y), depending on the spread between x and SQRT(y)
'The desired factor will not be optimally found by method one.

'Three: We have a very large Y, and a small x
'Find the nth even root of y that is still larger than x
'walk down to x, looking for a candidate
'if we find one great, if not, walk up from that root

'Four: We have a humungous Y, larger than 100B
'prime factorization of Y until the product of primes exceeds X
'Then walkdown from the product to x
'if no candidate is found, then walk up from x

'Five: maybe worthwhile in every case
'prime factorization of 2,3,5,and 7 from Y
'apply the applicable case above to the result
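Here's what the first two cases might look like fleshed out (a runnable sketch only; thresholds and the remaining cases would still need filling in):

CREATE OR REPLACE FUNCTION next_factor_sketch(x IN INTEGER, y IN INTEGER)
    RETURN INTEGER
IS
    d    INTEGER;
    n    INTEGER;
    -- if y is odd, every factor of y is odd, so candidates can step by 2
    step INTEGER := CASE WHEN MOD(y, 2) = 1 THEN 2 ELSE 1 END;
BEGIN
    IF x * x > y
    THEN
        -- case two: x is beyond sqrt(y), walk up from x
        n := CASE WHEN step = 2 AND MOD(x, 2) = 0 THEN x + 1 ELSE x END;

        WHILE MOD(y, n) != 0
        LOOP
            n := n + step;
        END LOOP;
    ELSE
        -- case one: walk down from floor(y/x);
        -- the largest divisor <= y/x yields the smallest n >= x
        d := FLOOR(y / x);

        WHILE MOD(y, d) != 0
        LOOP
            d := d - 1;
        END LOOP;

        n := y / d;
    END IF;

    RETURN n;
END;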
Thanks for the fun discussion.  I'm still going with my basic walkdown as the first pass, with checks for odd/even and possibly some simple prime checking.

I'll use other prime checks for other limits though.  I don't have code to post and probably won't for a long time; but I'm content with the status as it is now.

The remaining optimizations will be playing with thresholds vs hardware/versions to determine which methods to use when.  So, even if I did have a version to post right now, it would only reflect a single platform and thus not be ideal for everyone.

For PAQ value, I think there is sufficient direction here for others to pursue their own methods, including porting to non-Oracle platforms.  On a pure speed note, I might go ahead and port to "c" and run it as an external procedure.  I suppose if I did that I could link in libraries for math of arbitrary size.

That might be fun.  If I do that, it may warrant a new question.

Thanks again.
Thank you,

It isn't often that one gets the chance to discuss the kind of math we were playing with outside an academic environment, let alone throw in platform optimizations and hardware underpinnings.  Glad you found what I had to offer useful.

Nick67