Solved

# Fast Search Algorithm in C++

Posted on 2003-03-20
Medium Priority
330 Views
Dear All,

Suppose we have 3 records containing the following data:

Record 1:  100 200 300 400 500
Record 2:  100 300 500
Record 3:  100 400

Then what is the fastest approach to find how many data coexist in both 3
records? Like this example data 100 coexist in both 3 records. I use
exhaustive search so that the performance is very slow. I hope you can give
some opinions to me.

Johnny
0
Question by:johnnypoon
[X]
###### Welcome to Experts Exchange

Add your voice to the tech community where 5M+ people just like you are talking about what matters.

• Help others & share knowledge
• Earn cash & points

LVL 30

Expert Comment

ID: 8172525
Since all your records are sorted, you can use binary search.

Mayank.
0

Author Comment

ID: 8172653
In record 1, we have 5 records. For each of the the data in Record 1, we need to find Record 2 and Record 3 once in order to find their existence. Therefore in this example, we have to access Record 2 and 3 for 5 times, which is redundant.

I think binary search can only decrease the time of searching each record, but I would also like to decrease the no. of times re-searching each record.

I only need the no. of data that coexist in those 3 records only, no need to know about what the data is.

Thanks.

Johnny
0

Expert Comment

ID: 8172860
If memory is not an issue or there are relatively few different objects that can be in a record, why not maintain an array of the frequencies, updating when records are added, deleted or modified?

This would be O(n)

The easiest method is probably mayankeagles suggestion
int GetFreq(int i){
n = 0;
for all records{
n += binarysearchrecord(i)
}
return n;
}

Harder possibilites would be caching answers temporarily if similar requests are asked frequently
0

LVL 12

Accepted Solution

Salte earned 500 total points
ID: 8173481
Assuming that all your records are sorted this is the approach you can do. I will assume that you only have three, if you have more than 3 it can be slightly modified later.

Also, I will assume that there are two values MIN and MAX such that for any value from any record x it is always the case that MIN < x < MAX. Whenever we read from a record and there are no more values (we have reached the end) we set the value read to MAX.

The algorithm uses 5 variables.

P1, P2 and P3 hold values from each of their records at any time. The value is V if the record has no more data.

M hold the maximum value of P1, P2 and P3 at any time. If any or all of the records are at the end M will have the value V.

S is initially empty and will hold the set of common values found so far and at the end of the algorithm is the end result.

The basic idea is as long as a record's next value is lower than M, read a new value from the record. This might possibly change M, if so the record's value must then equal M and you move on to the next record.

If all three records are equal to M you insert M into S and advance all three records.

This makes a fairly simple state machine:

State 0: /* Initialize.
Set S = empty_set;
Set P1,P2,P3,M to MIN (all values set to MIN).
Go to state 1.

State 1: /* Read new value into P1 */
If P1 < M go to state 1 (read a new value from P1).
if P1 == M go to state 2.
/* P1 > M */
If P1 == MAX go to state 7 (we're done).
Set M to P1.
Go to state 2.

State 2: /* P1 == M */
If P2 < M go to state 3.
If P2 == M go to state 4.
/* P2 > M */
set M to P2
/* Now we have that P1 < M again, read P1. */
Go to state 1.

State 3: /* P1 == M, P2 < M */
If P2 < M Go to state 3. (read a new value).
If P2 == M Go to state 4.
/* P2 > M */
If P2 == MAX go to state 7 (we're done).
set M to P2
Go to state 1.

State 4: /* P1 == M, P2 == M */
if P3 < M go to state 5
if P3 == M go to state 6
/* P3 > M */
set M to P3.
/* Now we have P1 < M again, read P1 */
Go to state 1.

State 5: /* P1 == M, P2 == M, P3 < M */
If P3 < M go to state 5. (read a new value).
If P3 == M go to state 6.
/* P3 > M */
If P3 == MAX go to state 7 (we're done).
set M to P3.
Go to state 1.

State 6: /* P1 == M, P2 == M and P3 == M */
/* MIN < M < MAX */
Insert M into S.
Set P1,P2,P3,M to MIN.
Go to state 1.

State 7:
/* S has the set in sorted order */

Note, this algorithm runs through each record only once and not even that much, if one record is empty it will ignore any later values in the other records since they necessarily must be larger than the largest value found in the record that was exhausted.

To change this for a N sets for input is rather simple, you will get more states in the state machine but other than that it is simple and the change is obvious.

Personally I wouldn't call it 'record'. It looks more to me like 'streams' or 'sequences' since they are sequences of values and they can hold a different number of values in each.

A 'record' is in my view a fixed length thing which has a fixed number of elements in the sequence.

It was therefore hard for me to write this description so if I above use the term 'stream' anywhere I most likely meant 'record' to keep it with your terminology but it is possible I have slipped and used the term 'stream' from time to time.

Alf
0

LVL 22

Expert Comment

ID: 8174194
Well, it depends on where the data are residing,
how much RAM you have available,
whether you have sequential or random access to the records, whether they are in memory, on tape, on disk, and whether the data are sorted within the records, or sorted by record length, whether you're allowed to pre-scan the data and build indexes or hashes, or whatnot.

Waay too many unknowns to give any kind of coherent answer.  maybe you could provide more details?

----
Some general observations:

If you don't have any info about the records, you're going to have to scan all fields of all records.

0

LVL 12

Expert Comment

ID: 8174515
The algorithm I provided only require sequential scan of each of the records (or streams or sequences as I would prefer to call them) even if they are all in the same disk file you can easily manipulate seekg() to switch between them, just keep one file position for each of the records, so when you want to read P1 you do seekg(P1_pos), read the value and store it in P1 and read the new position after the value and store it in P1_pos.

Alf
0

Author Comment

ID: 8176206
Thank you Alf, yours algorithm is pretty fast.
0

## Featured Post

Question has a verified solution.

If you are experiencing a similar issue, please ask a related question

Errors will happen. It is a fact of life for the programmer. How and when errors are detected have a great impact on quality and cost of a product. It is better to detect errors at compile time, when possible and practical. Errors that make their waâ€¦
This article shows you how to optimize memory allocations in C++ using placement new. Applicable especially to usecases dealing with creation of large number of objects. A brief on problem: Lets take example problem for simplicity: - I have a Gâ€¦
The viewer will learn how to use the return statement in functions in C++. The video will also teach the user how to pass data to a function and have the function return data back for further processing.
The viewer will learn how to user default arguments when defining functions. This method of defining functions will be contrasted with the non-default-argument of defining functions.
###### Suggested Courses
Course of the Month12 days, 22 hours left to enroll