Solved

Comparing 2 CSV (Text Files) with fuzzy hashing , possible?

Posted on 2009-06-28
6
402 Views
Last Modified: 2012-05-07
Hi Guys ,
i was asked by my boss to take 2 files , which are actually user repository files.

1 is hr user repository , the other is AD User repository.

i have 2 csv's which include the following
FirstName , FirstName2 , LastName , LastName2 , EmployeeID

Objective : i need to find which users in the the AD file , do not have a user in the HR File (which means they have a user but they are not workers)

Problem one - i tried using Contains and compared First Name to first name , and last name to last name , however - sometimes there are more then one first or last names , so i need to do more checks.

the first check however is the employee id - its not always there , but its 100% correct.

2nd Problem is typos - in one file there is

Arik , John , Smith
and the other has
Aric , John , Smith

is there a way to find % of matching? using some kind of fuzzy hashing?

im open to all ideas , maybe i aint seein the full picture

help is much appirciated ;)
0
Comment
Question by:m0tek
[X]
Welcome to Experts Exchange

Add your voice to the tech community where 5M+ people just like you are talking about what matters.

  • Help others & share knowledge
  • Earn cash & points
  • Learn & ask questions
  • 3
  • 2
6 Comments
 
LVL 84

Expert Comment

by:ozo
ID: 24733100
perldoc -q "How can I do approximate matching"
       See the module String::Approx available from CPAN.
0
 
LVL 40

Expert Comment

by:mrjoltcola
ID: 24733210
Check out Tie::Hash::Abbrev, it might be useful.

http://search.cpan.org/~fany/Tie-Hash-Array-0.1/lib/Tie/Hash/Abbrev.pm


0
 
LVL 40

Expert Comment

by:mrjoltcola
ID: 24733214
Another article with some sample code

http://www.perlmonks.org/?node_id=300129
0
Independent Software Vendors: We Want Your Opinion

We value your feedback.

Take our survey and automatically be enter to win anyone of the following:
Yeti Cooler, Amazon eGift Card, and Movie eGift Card!

 
LVL 84

Accepted Solution

by:
ozo earned 500 total points
ID: 24733238
0
 

Author Comment

by:m0tek
ID: 24734174
is there any premade thing using one of those?

this one seems very similar to what i need

http://www.perlmonks.org/?node_id=300129

(i dont know peral or anything , yet :( )
0
 
LVL 40

Expert Comment

by:mrjoltcola
ID: 24742105
I don't think you will have much luck if you don't know Perl. Did you ask this in the Perl zone by mistake?
0

Featured Post

Industry Leaders: We Want Your Opinion!

We value your feedback.

Take our survey and automatically be enter to win anyone of the following:
Yeti Cooler, Amazon eGift Card, and Movie eGift Card!

Question has a verified solution.

If you are experiencing a similar issue, please ask a related question

Suggested Solutions

Title # Comments Views Activity
Everything left of comma in excel 4 51
troubleshoot a python script 8 110
Add a second combobox VBA Excel 8 46
Merging text files strings with filename 18 54
Strings in Python are the set of characters that, once defined, cannot be changed by any other method like replace. Even if we use the replace method it still does not modify the original string that we use, but just copies the string and then modif…
With User Account Control (UAC) enabled in Windows 7, one needs to open an elevated Command Prompt in order to run scripts under administrative privileges. Although the elevated Command Prompt accomplishes the task, the question How to run as script…
Learn the basics of strings in Python: declaration, operations, indices, and slicing. Strings are declared with quotations; for example: s = "string": Strings are immutable.: Strings may be concatenated or multiplied using the addition and multiplic…
Learn the basics of lists in Python. Lists, as their name suggests, are a means for ordering and storing values. : Lists are declared using brackets; for example: t = [1, 2, 3]: Lists may contain a mix of data types; for example: t = ['string', 1, T…

739 members asked questions and received personalized solutions in the past 7 days.

Join the community of 500,000 technology professionals and ask your questions.

Join & Ask a Question