Comparing two CSV (text) files with fuzzy hashing: is it possible?
Posted on 2009-06-28
Hi guys,
My boss asked me to compare two files, which are actually user repository files.
One is the HR user repository, the other is the AD user repository.
I have two CSVs, each with the following columns:
FirstName , FirstName2 , LastName , LastName2 , EmployeeID
Objective: I need to find which users in the AD file do not have a matching user in the HR file (which means they have a user account but are not employees).
Problem one: I tried using Contains, comparing first name to first name and last name to last name. However, sometimes there is more than one first or last name, so I need to do more checks.
The first check, however, is the employee ID: it's not always there, but when it is, it's 100% correct.
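To make the two-step check concrete, here is a minimal Python sketch of what I have in mind, not a finished solution. It assumes each CSV row has already been read into a dict keyed by the column names above (for example with csv.DictReader). The helper names (clean, name_set, unmatched_ad_users) and the rule "names match if one row's set of names is contained in the other's" are my own assumptions for illustration.

```python
NAME_FIELDS = ("FirstName", "FirstName2", "LastName", "LastName2")

def clean(value):
    """Normalise a cell: treat None as empty, trim spaces, lowercase."""
    return (value or "").strip().lower()

def name_set(row):
    """Collect all non-empty name fields of a row as a set."""
    return {clean(row.get(f)) for f in NAME_FIELDS if clean(row.get(f))}

def unmatched_ad_users(ad_rows, hr_rows):
    """Return AD rows that match no HR row by employee ID or by names."""
    hr_ids = {clean(r.get("EmployeeID")) for r in hr_rows if clean(r.get("EmployeeID"))}
    hr_names = [s for s in (name_set(r) for r in hr_rows) if s]
    missing = []
    for row in ad_rows:
        emp_id = clean(row.get("EmployeeID"))
        # Step 1: the employee ID is trusted whenever it is present.
        if emp_id and emp_id in hr_ids:
            continue
        # Step 2 (assumed rule): fall back to names, accepting a match
        # when one set of names is a subset of the other.
        names = name_set(row)
        if names and any(names <= hr or hr <= names for hr in hr_names):
            continue
        missing.append(row)
    return missing
```

The subset rule copes with a second first or last name that appears in only one file, but it is loose: an HR row holding only a last name would match every AD user with that last name, and it does nothing about typos.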
The second problem is typos. One file has
Arik , John , Smith
and the other has
Aric , John , Smith
Is there a way to compute a matching percentage between two records, using some kind of fuzzy hashing?
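To show the kind of percentage I mean, here is a small Python sketch using difflib.SequenceMatcher from the standard library. Note that this is plain string similarity rather than fuzzy hashing; I am only using it as a stand-in for "how close are these two names". The function names (similarity, record_similarity) and the idea of averaging the per-field scores are assumptions for illustration.

```python
from difflib import SequenceMatcher

def similarity(a, b):
    """Similarity of two strings as a ratio between 0.0 and 1.0."""
    return SequenceMatcher(None, a.strip().lower(), b.strip().lower()).ratio()

def record_similarity(rec_a, rec_b):
    """Average the field-by-field similarity of two records (assumed scoring)."""
    scores = [similarity(x, y) for x, y in zip(rec_a, rec_b)]
    return sum(scores) / len(scores) if scores else 0.0

print(similarity("Arik", "Aric"))  # 0.75
print(record_similarity(("Arik", "John", "Smith"),
                        ("Aric", "John", "Smith")))  # roughly 0.92
```

A cutoff such as "treat anything above 0.85 as the same person" would still have to be picked by hand and checked against real data; I have no idea yet what a sensible value is.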
I'm open to all ideas; maybe I'm not seeing the full picture.
Any help is much appreciated ;)