Link to home
Start Free TrialLog in
Avatar of johnnypoon
johnnypoon

asked on

Fast Search Algorithm in C++

Dear All,

I want to ask for your opinion of a search algorithm.

Suppose we have 3 records containing the following data:

Record 1:  100 200 300 400 500
Record 2:  100 300 500
Record 3:  100 400

Then what is the fastest approach to find how many data coexist in both 3
records? Like this example data 100 coexist in both 3 records. I use
exhaustive search so that the performance is very slow. I hope you can give
some opinions to me.

Johnny
Avatar of Mayank S
Mayank S
Flag of India image

Since all your records are sorted, you can use binary search.

Mayank.
Avatar of johnnypoon
johnnypoon

ASKER

In record 1, we have 5 records. For each of the the data in Record 1, we need to find Record 2 and Record 3 once in order to find their existence. Therefore in this example, we have to access Record 2 and 3 for 5 times, which is redundant.

I think binary search can only decrease the time of searching each record, but I would also like to decrease the no. of times re-searching each record.

I only need the no. of data that coexist in those 3 records only, no need to know about what the data is.

Thanks.

Johnny
If memory is not an issue or there are relatively few different objects that can be in a record, why not maintain an array of the frequencies, updating when records are added, deleted or modified?

This would be O(n)

The easiest method is probably mayankeagles suggestion
int GetFreq(int i){
    n = 0;
    for all records{
        n += binarysearchrecord(i)
    }
    return n;
}

Harder possibilites would be caching answers temporarily if similar requests are asked frequently
ASKER CERTIFIED SOLUTION
Avatar of Salte
Salte

Link to home
membership
This solution is only available to members.
To access this solution, you must be a member of Experts Exchange.
Start Free Trial
Well, it depends on where the data are residing,
how much RAM you have available,
whether you have sequential or random access to the records, whether they are in memory, on tape, on disk, and whether the data are sorted within the records, or sorted by record length, whether you're allowed to pre-scan the data and build indexes or hashes, or whatnot.  

Waay too many unknowns to give any kind of coherent answer.  maybe you could provide more details?

----
Some general observations:

If you don't have any info about the records, you're going to have to scan all fields of all records.

The algorithm I provided only require sequential scan of each of the records (or streams or sequences as I would prefer to call them) even if they are all in the same disk file you can easily manipulate seekg() to switch between them, just keep one file position for each of the records, so when you want to read P1 you do seekg(P1_pos), read the value and store it in P1 and read the new position after the value and store it in P1_pos.

Alf
Thank you Alf, yours algorithm is pretty fast.