Want to protect your cyber security and still get fast solutions? Ask a secure question today.Go Premium

  • Status: Solved
  • Priority: Medium
  • Security: Public
  • Views: 336
  • Last Modified:

Fast Search Algorithm in C++

Dear All,

I want to ask for your opinion of a search algorithm.

Suppose we have 3 records containing the following data:

Record 1:  100 200 300 400 500
Record 2:  100 300 500
Record 3:  100 400

Then what is the fastest approach to find how many data coexist in both 3
records? Like this example data 100 coexist in both 3 records. I use
exhaustive search so that the performance is very slow. I hope you can give
some opinions to me.

1 Solution
Mayank SAssociate Director - Product EngineeringCommented:
Since all your records are sorted, you can use binary search.

johnnypoonAuthor Commented:
In record 1, we have 5 records. For each of the the data in Record 1, we need to find Record 2 and Record 3 once in order to find their existence. Therefore in this example, we have to access Record 2 and 3 for 5 times, which is redundant.

I think binary search can only decrease the time of searching each record, but I would also like to decrease the no. of times re-searching each record.

I only need the no. of data that coexist in those 3 records only, no need to know about what the data is.


If memory is not an issue or there are relatively few different objects that can be in a record, why not maintain an array of the frequencies, updating when records are added, deleted or modified?

This would be O(n)

The easiest method is probably mayankeagles suggestion
int GetFreq(int i){
    n = 0;
    for all records{
        n += binarysearchrecord(i)
    return n;

Harder possibilites would be caching answers temporarily if similar requests are asked frequently
Concerto Cloud for Software Providers & ISVs

Can Concerto Cloud Services help you focus on evolving your application offerings, while delivering the best cloud experience to your customers? From DevOps to revenue models and customer support, the answer is yes!

Learn how Concerto can help you.

Assuming that all your records are sorted this is the approach you can do. I will assume that you only have three, if you have more than 3 it can be slightly modified later.

Also, I will assume that there are two values MIN and MAX such that for any value from any record x it is always the case that MIN < x < MAX. Whenever we read from a record and there are no more values (we have reached the end) we set the value read to MAX.

The algorithm uses 5 variables.

P1, P2 and P3 hold values from each of their records at any time. The value is V if the record has no more data.

M hold the maximum value of P1, P2 and P3 at any time. If any or all of the records are at the end M will have the value V.

S is initially empty and will hold the set of common values found so far and at the end of the algorithm is the end result.

The basic idea is as long as a record's next value is lower than M, read a new value from the record. This might possibly change M, if so the record's value must then equal M and you move on to the next record.

If all three records are equal to M you insert M into S and advance all three records.

This makes a fairly simple state machine:

State 0: /* Initialize.
    Set S = empty_set;
    Set P1,P2,P3,M to MIN (all values set to MIN).
    Go to state 1.

State 1: /* Read new value into P1 */
   Read P1
   If P1 < M go to state 1 (read a new value from P1).
   if P1 == M go to state 2.
   /* P1 > M */
   If P1 == MAX go to state 7 (we're done).
   Set M to P1.
   Go to state 2.

State 2: /* P1 == M */
    If P2 < M go to state 3.
    If P2 == M go to state 4.
    /* P2 > M */
    set M to P2
    /* Now we have that P1 < M again, read P1. */
    Go to state 1.

State 3: /* P1 == M, P2 < M */
    Read P2
    If P2 < M Go to state 3. (read a new value).
    If P2 == M Go to state 4.
    /* P2 > M */
    If P2 == MAX go to state 7 (we're done).
    set M to P2
    Go to state 1.

State 4: /* P1 == M, P2 == M */
    if P3 < M go to state 5
    if P3 == M go to state 6
    /* P3 > M */
    set M to P3.
    /* Now we have P1 < M again, read P1 */
    Go to state 1.

State 5: /* P1 == M, P2 == M, P3 < M */
    Read P3
    If P3 < M go to state 5. (read a new value).
    If P3 == M go to state 6.
    /* P3 > M */
    If P3 == MAX go to state 7 (we're done).
    set M to P3.
    Go to state 1.

State 6: /* P1 == M, P2 == M and P3 == M */
    /* MIN < M < MAX */
    Insert M into S.
    Set P1,P2,P3,M to MIN.
    Go to state 1.

State 7:
     /* S has the set in sorted order */

Note, this algorithm runs through each record only once and not even that much, if one record is empty it will ignore any later values in the other records since they necessarily must be larger than the largest value found in the record that was exhausted.

To change this for a N sets for input is rather simple, you will get more states in the state machine but other than that it is simple and the change is obvious.

Personally I wouldn't call it 'record'. It looks more to me like 'streams' or 'sequences' since they are sequences of values and they can hold a different number of values in each.

A 'record' is in my view a fixed length thing which has a fixed number of elements in the sequence.

It was therefore hard for me to write this description so if I above use the term 'stream' anywhere I most likely meant 'record' to keep it with your terminology but it is possible I have slipped and used the term 'stream' from time to time.

Well, it depends on where the data are residing,
how much RAM you have available,
whether you have sequential or random access to the records, whether they are in memory, on tape, on disk, and whether the data are sorted within the records, or sorted by record length, whether you're allowed to pre-scan the data and build indexes or hashes, or whatnot.  

Waay too many unknowns to give any kind of coherent answer.  maybe you could provide more details?

Some general observations:

If you don't have any info about the records, you're going to have to scan all fields of all records.

The algorithm I provided only require sequential scan of each of the records (or streams or sequences as I would prefer to call them) even if they are all in the same disk file you can easily manipulate seekg() to switch between them, just keep one file position for each of the records, so when you want to read P1 you do seekg(P1_pos), read the value and store it in P1 and read the new position after the value and store it in P1_pos.

johnnypoonAuthor Commented:
Thank you Alf, yours algorithm is pretty fast.

Featured Post

Technology Partners: We Want Your Opinion!

We value your feedback.

Take our survey and automatically be enter to win anyone of the following:
Yeti Cooler, Amazon eGift Card, and Movie eGift Card!

Tackle projects and never again get stuck behind a technical roadblock.
Join Now