Solved

Comparing 2 CSV (text) files with fuzzy hashing, possible?

Posted on 2009-06-28
Last Modified: 2012-05-07
Hi guys,
I was asked by my boss to take 2 files, which are actually user repository files.

One is the HR user repository, the other is the AD user repository.

I have 2 CSVs, which include the following fields:
FirstName, FirstName2, LastName, LastName2, EmployeeID

Objective: I need to find which users in the AD file do not have a matching user in the HR file (which means they have a user account but are not employees).

Problem one: I tried using Contains, comparing first name to first name and last name to last name. However, sometimes there is more than one first or last name, so I need to do more checks.

The first check, however, is the employee ID: it's not always there, but when it is, it's 100% correct.

The second problem is typos. One file has

Arik, John, Smith
and the other has
Aric, John, Smith

Is there a way to find the % of matching between them, using some kind of fuzzy hashing?
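
A match percentage like this is usually computed with a string-similarity measure rather than a hash. Here is a minimal sketch in Python using the standard library's `difflib.SequenceMatcher`. The helper name `similarity_percent` is made up for illustration, and lowercasing and trimming the names before comparing is an assumption about how they should be treated.

```python
from difflib import SequenceMatcher


def similarity_percent(a: str, b: str) -> float:
    """Return a 0-100 similarity score for two strings (hypothetical helper)."""
    # Assumption: case and surrounding whitespace should not count as differences.
    a_norm = a.strip().lower()
    b_norm = b.strip().lower()
    # ratio() is 2*M/T, where M is the number of matching characters
    # and T is the combined length of both strings.
    return 100.0 * SequenceMatcher(None, a_norm, b_norm).ratio()


print(similarity_percent("Arik", "Aric"))    # one differing letter out of four
print(similarity_percent("Smith", "Smith"))  # identical strings
```

A score close to 100 suggests a typo rather than a different person. Where to draw the cutoff is a judgment call that depends on the data.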

I'm open to all ideas; maybe I'm not seeing the full picture.

Help is much appreciated ;)
Question by:m0tek
LVL 84

Expert Comment

by:ozo
ID: 24733100
perldoc -q "How can I do approximate matching"
       See the module String::Approx available from CPAN.
 
LVL 40

Expert Comment

by:mrjoltcola
ID: 24733210
Check out Tie::Hash::Abbrev, it might be useful.

http://search.cpan.org/~fany/Tie-Hash-Array-0.1/lib/Tie/Hash/Abbrev.pm


 
LVL 40

Expert Comment

by:mrjoltcola
ID: 24733214
Another article with some sample code

http://www.perlmonks.org/?node_id=300129
LVL 84

Accepted Solution

by:ozo (earned 500 total points)
ID: 24733238
 

Author Comment

by:m0tek
ID: 24734174
Is there any ready-made tool that uses one of those?

This one seems very similar to what I need:

http://www.perlmonks.org/?node_id=300129

(I don't know Perl or anything, yet :( )
 
LVL 40

Expert Comment

by:mrjoltcola
ID: 24742105
I don't think you will have much luck if you don't know Perl. Did you ask this in the Perl zone by mistake?
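
For readers who do not know Perl, the two-stage check described in the question (trust the employee ID when present, otherwise compare names approximately) can be sketched in Python with only the standard library. This is an illustration under stated assumptions, not the Perl modules suggested above: the CSV files are assumed to have no header row and to use the column order given in the question, the function names (`load_users`, `name_key`, `find_unmatched`) and the sample rows are hypothetical, and the 0.85 threshold is an arbitrary starting point that would need tuning on real data.

```python
import csv
import io
from difflib import SequenceMatcher

# Column order taken from the question; a headerless CSV is assumed.
FIELDS = ["FirstName", "FirstName2", "LastName", "LastName2", "EmployeeID"]


def load_users(csv_text):
    """Parse CSV text into dicts keyed by FIELDS, stripping whitespace."""
    reader = csv.reader(io.StringIO(csv_text))
    return [dict(zip(FIELDS, (cell.strip() for cell in row))) for row in reader if row]


def name_key(user):
    """Combine all non-empty name parts, sorted so column placement does not matter."""
    parts = [user[f].lower() for f in FIELDS[:4] if user.get(f)]
    return " ".join(sorted(parts))


def find_unmatched(ad_users, hr_users, threshold=0.85):
    """Return AD users with no HR counterpart by ID or by approximate name."""
    hr_ids = {u["EmployeeID"] for u in hr_users if u.get("EmployeeID")}
    hr_names = [name_key(u) for u in hr_users]
    unmatched = []
    for user in ad_users:
        # Stage 1: the employee ID is trusted whenever it is present and known to HR.
        if user.get("EmployeeID") and user["EmployeeID"] in hr_ids:
            continue
        # Stage 2: fall back to approximate comparison of the combined name.
        key = name_key(user)
        best = max((SequenceMatcher(None, key, n).ratio() for n in hr_names), default=0.0)
        if best < threshold:
            unmatched.append(user)
    return unmatched


# Made-up sample data for illustration only.
HR_CSV = """Arik,John,Smith,,1001
Dana,,Levi,,
"""
AD_CSV = """Aric,John,Smith,,
Dana,,Levi,,1002
Bob,,Jones,,
"""

unmatched = find_unmatched(load_users(AD_CSV), load_users(HR_CSV))
for user in unmatched:
    print(user["FirstName"], user["LastName"])
```

With this sample data only the Bob Jones row is reported: the Aric/Arik typo clears the threshold, and Dana Levi matches by name even though the HR row has no ID. Note that this compares every AD name against every HR name, which is fine for a few thousand rows but slow for very large files, and that near-threshold matches are worth reviewing by hand.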
Question has a verified solution.
