find common data in two large files

Posted on 2008-10-07
Last Modified: 2012-05-05
Find common data in two large files.
Suppose two files each contain billions of usernames (each username appended to the file).
How can we efficiently find the common data (usernames)?
Is it possible using a B-tree?
Question by: shwetasingh206

Accepted Solution

by: ozo
ID: 22657347
Yes, it is possible using a B-tree.
If it makes a difference and you have a choice, building the B-tree from the smaller of the two files should be more efficient.
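A minimal sketch of the "index the smaller file, then scan the larger one" idea, in Python. Python's standard library has no B-tree, so this substitutes a sorted, de-duplicated list searched with `bisect`. That gives the same O(log n) ordered lookup but has to fit in memory; for billions of names an on-disk B-tree (for example, an index inside a database) is what would actually scale. The function names are illustrative assumptions, not something from the thread.

```python
import bisect
from typing import Iterable, Iterator, List


def build_index(names: Iterable[str]) -> List[str]:
    """Build an ordered index of the smaller file's usernames.

    Stand-in for a B-tree: a sorted, de-duplicated list supports the same
    O(log n) ordered lookup, but lives entirely in memory.
    """
    return sorted(set(names))


def contains(index: List[str], name: str) -> bool:
    """Binary-search the sorted index, much as a B-tree lookup descends its nodes."""
    i = bisect.bisect_left(index, name)
    return i < len(index) and index[i] == name


def common_names(smaller: Iterable[str], larger: Iterable[str]) -> Iterator[str]:
    """Index the smaller input, then stream the larger input past it."""
    index = build_index(smaller)
    seen = set()  # avoid reporting a username twice if the larger input repeats it
    for name in larger:
        if name not in seen and contains(index, name):
            seen.add(name)
            yield name
```

For example, `list(common_names(["bob", "alice", "carol"], ["dave", "carol", "alice", "carol"]))` returns `['carol', 'alice']`, in the order the names appear in the larger input.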
Alternatively, a Patricia trie or a suffix tree may be more efficient for some distributions of names.
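To illustrate the trie idea: a trie stores shared prefixes once, so a lookup costs time proportional to the length of the name rather than to the number of names stored. The sketch below is a plain character trie, not a Patricia trie (which would collapse chains of single-child nodes into one edge) and not a suffix tree (which indexes every suffix and is mainly aimed at substring queries). Python node objects are also far too heavy for billions of names, so treat this only as an illustration of the structure; the class and function names are hypothetical.

```python
from typing import Dict, Iterable, List


class TrieNode:
    __slots__ = ("children", "terminal")

    def __init__(self) -> None:
        self.children: Dict[str, "TrieNode"] = {}
        self.terminal = False  # True if a username ends exactly at this node


class Trie:
    """Plain character trie; a Patricia trie would merge single-child chains."""

    def __init__(self) -> None:
        self.root = TrieNode()

    def insert(self, word: str) -> None:
        node = self.root
        for ch in word:
            child = node.children.get(ch)
            if child is None:
                child = node.children[ch] = TrieNode()
            node = child
        node.terminal = True

    def __contains__(self, word: str) -> bool:
        node = self.root
        for ch in word:
            node = node.children.get(ch)
            if node is None:
                return False  # path breaks off: no stored name has this prefix
        return node.terminal  # prefix exists; is it also a complete name?


def common_names_trie(smaller: Iterable[str], larger: Iterable[str]) -> List[str]:
    """Load the smaller input into a trie, then check each name of the larger one."""
    trie = Trie()
    for name in smaller:
        trie.insert(name)
    result: List[str] = []
    seen = set()  # report each common name once
    for name in larger:
        if name not in seen and name in trie:
            seen.add(name)
            result.append(name)
    return result
```

Note that `"an"` is not reported for a trie holding `"ann"` and `"anna"`: the path exists, but no name ends there, which is what the `terminal` flag tracks.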
A hash table could give linear expected time, though the worst case may be quadratic.
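The plain hash-table approach, sketched under the assumption that the smaller file's usernames fit in memory: load them into Python's built-in `set` (which is a hash table) and stream the larger file past it. Each membership test is expected O(1), so the whole pass is expected linear time. The function name is a hypothetical choice for illustration.

```python
from typing import Iterable, Iterator


def common_names_hashed(smaller: Iterable[str], larger: Iterable[str]) -> Iterator[str]:
    """Load the smaller input into a hash set, then stream the larger one.

    Expected O(1) per lookup, so expected O(n) overall; heavy collisions can
    still slow individual lookups, which is the worst case mentioned above.
    """
    table = set(smaller)
    for name in larger:
        if name in table:
            table.discard(name)  # report each common username only once
            yield name
```

With real files, each line would be passed in with its trailing newline stripped, e.g. `(line.rstrip("\n") for line in f)` for each open file object.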
If you handle collisions with a B-tree, however, worst-case performance would also be O(n log n).
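A sketch of a hash table whose collision buckets are ordered. Each bucket is a sorted list searched with `bisect`, standing in for the per-bucket B-tree described above. Buckets are filled first and sorted once, so even if every name hashes to the same bucket, building costs O(n log n) and each lookup O(log n). The class and function names, and the default of 65,536 buckets, are assumptions for illustration.

```python
import bisect
from typing import Iterable, Iterator, List


class SortedBucketHashSet:
    """Hash table whose collision buckets are kept sorted.

    Each bucket is a sorted list searched with bisect, standing in for a
    per-bucket B-tree: a lookup in a bucket of k names costs O(log k).
    """

    def __init__(self, num_buckets: int = 1 << 16) -> None:
        self.buckets: List[List[str]] = [[] for _ in range(num_buckets)]

    @classmethod
    def from_names(cls, names: Iterable[str], num_buckets: int = 1 << 16) -> "SortedBucketHashSet":
        table = cls(num_buckets)
        for name in names:
            table._bucket(name).append(name)  # collect every name first
        for bucket in table.buckets:
            bucket.sort()  # then sort each bucket once: O(n log n) total, even in the worst case
        return table

    def _bucket(self, name: str) -> List[str]:
        return self.buckets[hash(name) % len(self.buckets)]

    def __contains__(self, name: str) -> bool:
        bucket = self._bucket(name)
        i = bisect.bisect_left(bucket, name)
        return i < len(bucket) and bucket[i] == name


def common_names_bucketed(
    smaller: Iterable[str], larger: Iterable[str], num_buckets: int = 1 << 16
) -> Iterator[str]:
    """Yield each line of `larger` whose name also occurs in `smaller`.

    Duplicate lines in `larger` are yielded again; no second hash set is used
    for de-duplication, so the worst-case bound is not undermined.
    """
    table = SortedBucketHashSet.from_names(smaller, num_buckets)
    for name in larger:
        if name in table:
            yield name
```

Passing `num_buckets=1` forces every name into a single bucket, which is a quick way to check that the degenerate, all-collisions case still answers correctly.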


Expert Comment

by: libin_v
ID: 22657361
If you are looking for a solution using existing tools, here are a few Linux tools that can do this for you:

sort -u FILE1 > FILE1.sorted
sort -u FILE2 > FILE2.sorted
comm -12 FILE1.sorted FILE2.sorted > commonfile

The common lines are written to the file commonfile.
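For reference, the step `comm -12` performs is a single merge-style pass over two sorted inputs. A rough Python equivalent, assuming both inputs are already sorted and de-duplicated in plain code-point order (which matches running `sort` with `LC_ALL=C`; in other locales `sort` may order lines differently from Python's string comparison). The function name is hypothetical.

```python
from typing import Iterable, Iterator


def sorted_intersection(a: Iterable[str], b: Iterable[str]) -> Iterator[str]:
    """Yield lines present in both sorted, de-duplicated inputs (like `comm -12`).

    Walks both inputs once in lockstep: O(1) extra memory and O(n + m)
    comparisons once the inputs are sorted.
    """
    it_a, it_b = iter(a), iter(b)
    sentinel = object()  # marks an exhausted input
    x = next(it_a, sentinel)
    y = next(it_b, sentinel)
    while x is not sentinel and y is not sentinel:
        if x == y:
            yield x  # present in both inputs
            x = next(it_a, sentinel)
            y = next(it_b, sentinel)
        elif x < y:
            x = next(it_a, sentinel)  # x cannot appear later in b
        else:
            y = next(it_b, sentinel)  # y cannot appear later in a
```

Because it holds only one line from each input at a time, this pass needs constant extra memory; for billions of names the heavy lifting is the sort itself, which GNU `sort` performs as an external merge sort using temporary files when the input does not fit in memory.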