Solved

find common data in two large files

Posted on 2008-10-07
Last Modified: 2012-05-05
How can I find the data common to two large files?
Suppose two files each hold billions of usernames (each username is appended to the file).
How can we efficiently find the common data (the usernames that appear in both)?
Is it possible using a B-tree?
Question by:shwetasingh206
3 Comments
 
LVL 84

Accepted Solution

by:
ozo earned 500 total points
Yes, it is possible using a B-tree.
If it makes a difference, and you have a choice, building the B-tree from the smaller of the two files should be more efficient.
A Patricia (PAT) trie or a suffix tree may be more efficient for some distributions of names.
A hash table could give linear expected time, though the worst case may be quadratic.
But if you handle collisions with a B-tree, the worst-case performance would also be O(n log n).
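
As a rough illustration of the hash table idea, here is a minimal sketch that uses an awk associative array as the hash table. The helper name common_lines is made up for this example, and it assumes one username per line and that the smaller file fits in memory, which may not hold for billions of names.

```shell
#!/bin/sh
# Hash-based intersection sketch (common_lines is a hypothetical helper).
# The first file is loaded into an awk associative array (a hash table);
# the second file is then streamed against it line by line.
# Assumption: one username per line, and the first file fits in memory.
common_lines() {
    small=$1
    large=$2
    awk '
        NR == FNR { seen[$0] = 1; next }   # first file: remember each name
        ($0 in seen) && !printed[$0]++     # second file: print each match once
    ' "$small" "$large"
}
```

Calling common_lines SMALLFILE LARGEFILE prints each common username once, in the order it first appears in the larger file. Passing the smaller file first keeps the in-memory table as small as possible, in line with the suggestion above to index the smaller file.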

LVL 5

Expert Comment

by:libin_v
If you are looking for a solution using existing tools, here are a few Linux tools that can do this for you.

sort -u FILE1 > FILE1.sorted
sort -u FILE2 > FILE2.sorted
comm -12 FILE1.sorted FILE2.sorted > commonfile

The common lines are written to the file commonfile.
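
One detail worth guarding against: comm expects both inputs to be sorted in the same collation order that it uses itself. A hedged variant of the commands above pins the locale to C (plain byte order) for all three steps and cleans up its temporary files. The helper name common_sorted is hypothetical, and mktemp is assumed to be available, as it is on typical Linux systems.

```shell
#!/bin/sh
# Sort-and-merge intersection sketch (common_sorted is a hypothetical helper).
# Assumption: one username per line; mktemp is available on the system.
common_sorted() {
    tmp1=$(mktemp) || return 1
    tmp2=$(mktemp) || { rm -f "$tmp1"; return 1; }
    # LC_ALL=C makes sort and comm agree on ordering (byte comparison).
    LC_ALL=C sort -u "$1" > "$tmp1"
    LC_ALL=C sort -u "$2" > "$tmp2"
    # -12 suppresses lines unique to either file, leaving only common lines.
    LC_ALL=C comm -12 "$tmp1" "$tmp2"
    rm -f "$tmp1" "$tmp2"
}
```

Running common_sorted FILE1 FILE2 > commonfile gives the same kind of result as the three commands above. Because sort falls back to temporary files on disk for inputs larger than memory, this approach should still work when neither file fits in RAM.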