Want to protect your cyber security and still get fast solutions? Ask a secure question today.Go Premium

x
  • Status: Solved
  • Priority: Medium
  • Security: Public
  • Views: 451
  • Last Modified:

O(log n) queries for subsequence problem

Hello,

I have to solve the following problem in O(log n) for both updates and queries... the problem statement goes like this:

Given n numbers, answer m questions like this: what is the value of the highest sum subsequence in the interval [A, B], 1<= A <= B <= n?

My idea is to use an interval / segment tree, but I'm not sure how to build, query and update it...

I'd appreciate any pointers. I don't want any other code except maybe pseudo-code, I'm only after some ideas about how to solve this efficiently...
0
Dastas
Asked:
Dastas
  • 7
  • 7
  • 2
1 Solution
 
harfangCommented:
As stated, the answer is alway: Sum from i=A to i=B of xi where xi>0.

The way you incoporate A and B in the sequence 1...n, it indicates that these are indices, and not values. So [A,B] isn't an interval per se, but a subset of X.

Note that is doesn't really matter. If [A,B] does in fact represent an interval:

    A <= xi <= B, where i = 1...n

This still creates a subset of k numbers, from which the largest subset sum is obtained by taking only the positive values, or none if all are negative (as a subset can be empty).

I'm probably missing the point, though.

(°v°)
0
 
ozoCommented:
what was your idea for updating the interval / segment tree?
0
 
DastasAuthor Commented:
harfang:

A and B are indices. I need the maximum sum subsequence of the sequence [1, n], subsequence that is between the indices [A, B].

For example, considering the following sequence of numbers:

1 4 -2 -1 6 5 -20 4 5

The highest sum subsequence in the interval [1, 4] is 5 (the numbers 1 + 4)
The highest sum subsequence in the interval [1, 5] is 8 (sum of all numbers. note that, since this is a subsequence, all numbers composing it must have consecutive indices, so you can't leave out the negatives, unless you leave 6 out too, which would give a lower sum)

Consider that, if there are just negative numbers in the sequence, the answer is the highest one. For example, given the query [7, 7], the answer would be -20.

ozo:
Well, I thought of building it like this:
Let A be the array of the given n numbers.
It's trivial to calculate the value for the leaves of the tree, since those are intervals of the form [X, X]. That is, their value is always A[X]. So, once these are calculated, due to the recursive nature of the process, the value of their father in the tree is calculated like this:
The value is one of three things: the value of the subsequence in the interval represented by the left son, the value of the subsequence in the interval represented by the right son, or, a subsequence with elements from both of those intervals. It is obviously the maximum of these three cases. This is where I got stuck, however. If it's the one with elements in both intervals, how do I compute that?
0
Free Tool: Subnet Calculator

The subnet calculator helps you design networks by taking an IP address and network mask and returning information such as network, broadcast address, and host range.

One of a set of tools we're offering as a way of saying thank you for being a part of the community.

 
harfangCommented:
Ah, that makes sense. I misread "subset" for "subsequence".

If k is the number of elements between A and B, you have (k * (k+1)) / 2 possible subsequences. There is no elegant way to build a tree with them (in your example above, each node would have two fathers), so I can't think of a recursive algorithm.

So it's basically a two-variable nested loop problem. There might exist some borderline optimization in eliminating fringe negatives (provided there is at least one positive), but that probably irrelevant.

In pseudo code:

result = A[min]
for i = min to max {
    this = 0
    for j = i to max {
        this += A[j]
        if (this > result) then {
            // record this, i, j
            }
        }
    }

Perhaps I'm missing something else, now...
(°v°)
0
 
DastasAuthor Commented:
harfang:

You are. That is very slow when I can have M ( M <= 100000 ) such queries. You algorithm's complexity is O(N^2 * M) in the worse case, which, as you can see, is huge.

In fact, there's an easier O(N) dynamic programming algorithm, but that's too slow too when you have M = 100000 and N anything higher than 100. I need an algorithm that can solve the problem in O(M*logN) total complexity, that is, determine the value of the subsequence in logarithmic time. I'm not sure if it's possible to do this with interval trees, or if there's an easier way, it was just an idea. I'm open to any solution that can solve the problem in that complexity.
0
 
harfangCommented:
OK, I see your point. The nested loops do need (n*(n+1))/2 tests at the worst (k=n). Do you have a pointer to the O(n) algorithm? Anyway, here is another overall optimization, especially good for non-random numbers. Transform:

    1 4 -2 -1 6 5 -20 4 5

To alternating positive/negative sequences:

    [1 4] [-2 -1] [6 5] [-20] [4 5]
    +5 -3 +11 -20 +9

And consider only positive-to-positive sequences. This reduces the number of tests from 45 to 6, in this case.

There is again the borderline optimization: given that the max positive is 11, the "cost" -20 to include the trailing 9 is too high: it can be eliminated.

(°v°)
0
 
ozoCommented:
You can find the max subequence in [1,n] in O(n), and you can lookup a table of [A,B] answers in O(log n), but updating the table in O(log n) when you update one of the numbers seems trickier...
0
 
DastasAuthor Commented:
harfang:

The O(n) dynamic programming algorithm for finding the maximum sum subsequence is this:

Let A be the array of the initial numbers, let B be an array such that B[i] = maximum sum subsequence ending at position i.

B[1] = A[1]
B[i, i > 1] = max(B[i-1] + A[i], A[i]) (you can either append A[i] to the already-computed maximum sum subsequence, or take A[i] alone, depending on which gives a better sum. This also works if all the numbers are negative.)

This can be used to answer each query in O(n) time, resulting in total O(n*m) time, but this is still too slow for my needs... I need O(m*logn).


ozo:

Can you explain in a little more detail how I would go about creating that lookup table? I can't just store the answer to each query in an array, since that isn't helpful either in the worst case where all queries are unique, even if I don't have to do updates (by updates I mean changing one of the initial numbers with another)...

Here's what made me think about interval trees:
Consider the problem of finding the sum in an interval [A, B] in O(log n). Consider updates as well. This can be done with interval trees, like this:

For every leaf node, the sum is always A[interval_associated_with_that_node]. Therefore, for every father, the answer is Interval_tree[left_son] + Interval_tree[right_son]. This is how the tree is built.

Now, for the queries: Consider the query [X, Y]. Add to a global variable S the value associated to all nodes whose associated interval is incloded in [X, Y]. Do NOT go deeper down that path in the tree (so as not to add the same values twice). At the end of the querying process, all nodes whose reunited associated intervals form the interval [X, Y] are checked, and so the variable S contains the sum of the interval [X, Y]. The worst case complexity of this is O(log n), since an interval tree has log n nodes.

Updating is done the same way. Recursively go down the paths that include (or might end up including) the indice of the updated element. Complexity is O(log n), worst case, but can be optimised since not all paths need to be checked, only those that the changed element affects. Alternatively, just call the build function again, but that has log n complexity ALL the time.

Maybe there's a similar solution to this one? I just can't come up with something that works...
0
 
DastasAuthor Commented:
Hello again.

It's been a week now since there has been no reply. If I am not going to get an answer, could I at least get a refund?

Thank you.
0
 
harfangCommented:
I have been thinking about your question.

Since you will run several queries (m = 100000) on the same set, the optimization involves creating a good abstract of your data.

Continuing from {http:#19584985}, you can first summarize your data in alternating positive and negative blocks.

    [1 4] [-2 -1] [6 5] [-20] [4 5]
    +5 -3 +11 -20 +9

You can then examine each negative block and, if the absolute value is smaller than both bordering blocks, summarize three blocks as a new one. This reflects the fact that the cost of the negative block is covered from both sides (for subsequences being built from both sides)

    [+5 -3 +11] -20 +9
    +13 -20 +9

You can also examine each positive value and do the same when the value is smaller than the absolute value of the negative blocks on both sides. It's never worth it to take that positive block *alone* when building a sequence from either side.

By repeating this process, you end up with a sort of tree, or rather a forest, with the stems having alternating positive and negative values that can no longer be compressed.

When searching the highest subsequence for [A,B], blocks not affected do not need to be recalculated. In the example above, all sequences which contain the first block will have the same answer. Also note that each block's value is the highest possible subsequence within, so you don't need to open it at all when you can discard it using the total. Likewise, you can discard negative blocks on either side (provided you have one positive block) because you cannot create a positive subsequence from it starting at either side.

You query engine will thus:

Get "stem" blocks for A,B. If it's a single block, look inside until there are several blocks. (If you end up with a pure first-level positive block, return A,B, a pure first-level negative block, return the highest between A and B)

In the list of blocks, discard negative blocks from both sides entirely, "open" positive block containing A or B, until you end up with a sequence of alternating blocks.

Then apply your O(n) strategy, using the entire blocks.

Does that help?
(°v°)
0
 
DastasAuthor Commented:
What if all the numbers are positive or negative? Or if the difference between the two is big? That won't improve it much, still O(queries*n) total...

I know for sure there's a O(queries*log n) worst case solution... you solution however can easily approach liniar complexity if the numbers negatives and positives aren't relatively the same...

And what about updates? Doing those precomputations takes liniar time, I can't afford to do any operation in liniar time (except maybe once, but queries and updates need logarithmic time)...
0
 
harfangCommented:
If all numbers are positive, it's the "first-level positive block" rule: return A,B (the largest subsequence is to take all).

If all numbers are negative, it's the "first-level negative block" rule: return the largest value in the range (yes, that's linear time unless you index the block). [Formally, the solution for all negatives should be 0, but you said you didn't want that.]

The problem is interesting only with a mixture of positive and negative values.

Large differences do not affect the logic. Given three blocks (alternating signs), their absolute values can be in four forms (disregarding equals for now):

    10 9 8 (decreasing); 8 9 10 (increasing); 8 10 9 (local max); 10 8 9 (local min)

Only the last case will allow a new block to be created. But that means all small blocks not on one end will be included in higher-level blocks. You will end up with a regular increase of absolute block values to a maximum, and a decrease after that. It will eliminate all local minimas.

I don't know if the allows a faster algorithm in the end. But in any case, it's independent from the absolute differences in the values.

An update potentially requires the entire tree to be recomputed, that is true. But you wanted to answer 100000 queries on the same set, didn't you? Preprocessing is thus worthwhile. But I realize that my suggestion is a form of overall indexing of the data.

(°v°)
0
 
DastasAuthor Commented:
Yes, but I can also have quite a lot of updates - which would be considerably slow, recomputing everything each time.

Anyway, I still don't think your idea approaches logarithmic time, however, I'll try it, and if I don't get any better answers in the meantime I'll award you the points.
0
 
harfangCommented:
Thank you.

> I can also have quite a lot of updates

I'm not entirely certain that a single update needs a full recalculation. You might find that it can be updated recursively starting from the new leaf, but I think it could potentially affect 1/2 of the tree in the worst case. I'd need to test it myself to make sure, though.

> I still don't think your idea approaches logarithmic time

There is the overhead of the pre-calculations, and that of the final O(k) algorithm on the resulting blocks. But the number of blocks to consider (k) is in a log relation to n.

As a matter of fact, I'm not sure there *is* a final O(k) algorithm at all.

Remember that when A or B falls in a negative block, it's skipped entirely (with a 50% probability). When it falls in a positive block, that block needs to be processed ("opened") only as long as the resulting total is larger than the next negative block in the subsequence (which has a 50% initial probability). When the total goes below, the block is skipped along with the next negative block.

That rule: "if a positive block on the end of the sequence is smaller than the absolute value of the next negative block, skip both" is especially easy to apply as the block values are locally sorted.

For example, consider this overall "stem" sequence, which no longer has any local minimas (if there were, it would become a new block):

    +1 -2 +3 -4 +5 -6 +7 -6 +5 -4 +3 -2 +1

Given A and B, it's easy to find the largest block, as you know where the maximum is and where the sequence is ascending and descending. If A falls in the -2 block and B in the +7 block, you only have to consider the +5 block and explore the +7 block. That's all. In most cases, your answer will be a single block (which is intuitively satisfying). For example, whenever the range contains the +7 block, that's the answer.

> I know for sure there's a O(queries*log n) worst case solution...

If you ever find it, keep me informed; I've grown curious. If I had the time, I'd be programming already...

Good luck!
(°v°)
0
 
DastasAuthor Commented:
OK, points awarded to harfang. Your idea did speed things up a little. I still haven't found a log n solution.

Thanks!
0
 
harfangCommented:
Thank you, and success with your quest!
(°v°)
0

Featured Post

Free Tool: Site Down Detector

Helpful to verify reports of your own downtime, or to double check a downed website you are trying to access.

One of a set of tools we are providing to everyone as a way of saying thank you for being a part of the community.

  • 7
  • 7
  • 2
Tackle projects and never again get stuck behind a technical roadblock.
Join Now