Solved

find common data in two large files

Posted on 2008-10-07
Last Modified: 2012-05-05
Suppose two files each contain billions of usernames (each username is appended to the file as it arrives).
How can we efficiently find the usernames common to both files?
Is it possible using a B tree?
Question by:shwetasingh206
LVL 84

Accepted Solution

by:
ozo earned 500 total points
ID: 22657347
Yes, it is possible using a B tree.
If you have a choice, building the B tree from the smaller of the two files should be more efficient.
A Patricia trie (pat trie) or suffix tree may be more efficient for some distributions of names.
A hash table could give linear expected time, though the worst case may be quadratic.
But if you handle collisions with a B tree, worst-case performance would also be n log n.
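
As a rough sketch of the hash-table idea, assuming the distinct names in the smaller file fit in memory, awk's associative arrays can stand in for the hash table. The file names small.txt, big.txt and common_hash.txt and the sample names are made up for illustration; they are not from the question.

```shell
#!/bin/sh
# Hypothetical sample data; real inputs would be the two username files.
printf '%s\n' alice bob carol > small.txt
printf '%s\n' dave bob alice alice erin > big.txt

# First file (NR == FNR): load every name from the smaller file into an
# awk associative array, which behaves as a hash table.
# Second file: print each name found in the table, then delete it so a
# name repeated in the larger file is reported only once.
awk 'NR == FNR { seen[$0] = 1; next }
     ($0 in seen) { print; delete seen[$0] }' small.txt big.txt > common_hash.txt

cat common_hash.txt
```

Loading the smaller file keeps the table as small as possible, and the larger file is then streamed once. With billions of distinct names the table may not fit in memory, in which case a disk-based structure such as the B tree, or an external sort, is the more realistic choice.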

LVL 5

Expert Comment

by:libin_v
ID: 22657361
If you are looking for a solution using existing tools, here are a few Linux tools that can do this for you.

sort -u FILE1 > FILE1.sorted
sort -u FILE2 > FILE2.sorted
comm -12 FILE1.sorted FILE2.sorted > commonfile

The common lines are written to the file commonfile.
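
A small worked run of the same commands may make the behaviour clearer. The sample names are invented, and setting LC_ALL=C is an added assumption: comm expects both inputs to be sorted in the collating order of the current locale, so fixing one locale for sort and comm avoids ordering mismatches.

```shell
#!/bin/sh
# Hypothetical sample data standing in for FILE1 and FILE2.
printf '%s\n' carol alice bob alice > FILE1
printf '%s\n' bob dave alice > FILE2

# Use the same byte-wise collation for both sort and comm.
LC_ALL=C
export LC_ALL

# -u sorts and drops duplicate lines within each file.
sort -u FILE1 > FILE1.sorted
sort -u FILE2 > FILE2.sorted

# -1 suppresses lines only in the first file, -2 suppresses lines only in
# the second, so only lines present in both remain.
comm -12 FILE1.sorted FILE2.sorted > commonfile

cat commonfile
```

For very large inputs, GNU sort spills to temporary files when the data does not fit in memory; its -T option selects the temporary directory and -S sets the in-memory buffer size, which may be worth tuning if the default temporary location is short on space.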