Solved

find common data in two large files

Posted on 2008-10-07
3
363 Views
Last Modified: 2012-05-05
Find common data in two large files.
Suppose two files have billios of usernames ( each user name appended in the file)
How efficiently we can find common data.(username)
Is it possible by using B tree?
0
Comment
Question by:shwetasingh206
3 Comments
 
LVL 84

Accepted Solution

by:
ozo earned 500 total points
ID: 22657347
Yes, it is possible using a B tree.
If it makes a difference, and you have a choice, a B tree of the smaller of the files should be more efficient.
Or a pat trie or suffix tree may be more efficient foe some distributions of names.
A hash table could have linear time expected performance, though worst case may be quadratic.
But if you handle collisions with a B tree. worst case performance would also be n log n

0
 
LVL 5

Expert Comment

by:libin_v
ID: 22657361
If you are looking for a solution using existing tools, please find below few linux tools that could do this for you.

sort -u FILE1 > FILE1.sorted
sort -u FILE2 > FILE2.sorted
comm -12 FILE1.sorted FILE2.sorted > commonfile

The common lines are put into file commonfile
0

Featured Post

Free Tool: Path Explorer

An intuitive utility to help find the CSS path to UI elements on a webpage. These paths are used frequently in a variety of front-end development and QA automation tasks.

One of a set of tools we're offering as a way of saying thank you for being a part of the community.

Question has a verified solution.

If you are experiencing a similar issue, please ask a related question

Introduction On a scale of 1 to 10, how would you rate our Product? Many of us have answered that question time and time again. But only a few of us have had the pleasure of receiving a stack of the filled out surveys and being asked to do somethi…
Article by: Nadia
Suppose you use Uber application as a rider and you request a ride to go from one place to another. Your driver just arrived at the parking lot of your place. The only thing you know about the ride is the license plate number. How do you find your U…
Although Jacob Bernoulli (1654-1705) has been credited as the creator of "Binomial Distribution Table", Gottfried Leibniz (1646-1716) did his dissertation on the subject in 1666; Leibniz you may recall is the co-inventor of "Calculus" and beat Isaac…
I've attached the XLSM Excel spreadsheet I used in the video and also text files containing the macros used below. https://filedb.experts-exchange.com/incoming/2017/03_w12/1151775/Permutations.txt https://filedb.experts-exchange.com/incoming/201…

830 members asked questions and received personalized solutions in the past 7 days.

Join the community of 500,000 technology professionals and ask your questions.

Join & Ask a Question