  • Status: Solved
  • Priority: Medium
  • Security: Public

Comparing 2 CSV (text) files with fuzzy hashing: is it possible?

Hi guys,
My boss asked me to compare 2 files, which are actually user repository files.

One is the HR user repository, the other is the AD user repository.

I have 2 CSVs, which include the following columns:
FirstName , FirstName2 , LastName , LastName2 , EmployeeID

Objective: I need to find which users in the AD file do not have a matching user in the HR file (which means they have a user account but are not employees).

Problem one: I tried using Contains, comparing first name to first name and last name to last name. However, sometimes there is more than one first or last name, so I need to do more checks.

The first check, however, is the employee ID. It's not always there, but when it is, it's 100% correct.

The second problem is typos. One file has

Arik , John , Smith
and the other has
Aric , John , Smith

Is there a way to get a match percentage, using some kind of fuzzy hashing?

I'm open to all ideas; maybe I'm not seeing the full picture.

Help is much appreciated ;)
m0tek
Asked:
1 Solution
 
ozo commented:
perldoc -q "How can I do approximate matching"
       See the module String::Approx available from CPAN.
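
For readers not working in Perl, the idea of a "match percentage" can be illustrated with a minimal Python sketch. This is not String::Approx; it swaps in `difflib.SequenceMatcher` from the Python standard library, and the helper name `similarity` is made up for this example.

```python
from difflib import SequenceMatcher


def similarity(a: str, b: str) -> float:
    """Return a similarity ratio between 0.0 and 1.0.

    Case and surrounding whitespace are ignored, since the sample
    rows in the question have spaces around the commas.
    """
    return SequenceMatcher(None, a.strip().lower(), b.strip().lower()).ratio()


print(similarity("Arik", "Aric"))    # 0.75 (3 of 4 characters line up)
print(similarity("Smith", "Smith"))  # 1.0
```

The ratio is 2 * (matching characters) / (total characters in both strings), so a one-letter typo in a short name costs more than the same typo in a long one. Any cut-off for "close enough" would need tuning against the real data.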
 
mrjoltcola commented:
Check out Tie::Hash::Abbrev, it might be useful.

http://search.cpan.org/~fany/Tie-Hash-Array-0.1/lib/Tie/Hash/Abbrev.pm


 
mrjoltcola commented:
Another article with some sample code

http://www.perlmonks.org/?node_id=300129
 
m0tek (author) commented:
Is there anything ready-made that uses one of those?

This one seems very similar to what I need:

http://www.perlmonks.org/?node_id=300129

(I don't know Perl or anything, yet :( )
 
mrjoltcola commented:
I don't think you will have much luck if you don't know Perl. Did you ask this in the Perl zone by mistake?
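
For anyone in the asker's position, the workflow described in the question (trust the employee ID when it is present, otherwise fall back to fuzzy name matching) could be sketched outside Perl roughly as follows. This is a sketch under assumptions, not a tested solution for the real files: it assumes both CSVs have a header row with the column names from the question, the 0.85 threshold is an arbitrary starting point, the function names are made up for the example, and the sample rows are invented.

```python
import csv
from difflib import SequenceMatcher

NAME_FIELDS = ("FirstName", "FirstName2", "LastName", "LastName2")
THRESHOLD = 0.85  # assumed cut-off; tune it against real data


def load_rows(path):
    """Read a CSV with a header row, stripping spaces around every cell."""
    with open(path, newline="", encoding="utf-8") as fh:
        reader = csv.reader(fh)
        header = [h.strip() for h in next(reader)]
        return [dict(zip(header, (c.strip() for c in row))) for row in reader]


def name_key(row):
    """Join the non-empty name parts, lowercased and sorted, so that a
    name stored in FirstName2 instead of FirstName still lines up."""
    parts = [(row.get(f) or "").strip().lower() for f in NAME_FIELDS]
    return " ".join(sorted(p for p in parts if p))


def best_score(ad_row, hr_rows):
    """Highest name similarity between one AD row and any HR row."""
    key = name_key(ad_row)
    return max(
        (SequenceMatcher(None, key, name_key(hr)).ratio() for hr in hr_rows),
        default=0.0,
    )


def unmatched_ad_users(ad_rows, hr_rows, threshold=THRESHOLD):
    """Return (row, score) for AD rows with no ID match and no close name."""
    hr_ids = {(r.get("EmployeeID") or "").strip() for r in hr_rows} - {""}
    unmatched = []
    for ad in ad_rows:
        emp_id = (ad.get("EmployeeID") or "").strip()
        if emp_id and emp_id in hr_ids:
            continue  # exact ID match: treated as authoritative
        score = best_score(ad, hr_rows)
        if score < threshold:
            unmatched.append((ad, score))
    return unmatched


# Invented sample data, standing in for load_rows("hr.csv") / load_rows("ad.csv")
hr = [
    {"FirstName": "Arik", "FirstName2": "John", "LastName": "Smith",
     "LastName2": "", "EmployeeID": "1001"},
    {"FirstName": "Dana", "FirstName2": "", "LastName": "Levi",
     "LastName2": "", "EmployeeID": "1002"},
]
ad = [
    {"FirstName": "Aric", "FirstName2": "John", "LastName": "Smith",
     "LastName2": "", "EmployeeID": ""},      # typo, no ID
    {"FirstName": "Dana", "FirstName2": "", "LastName": "Levy",
     "LastName2": "", "EmployeeID": "1002"},  # matched by ID
    {"FirstName": "Test", "FirstName2": "", "LastName": "Account",
     "LastName2": "", "EmployeeID": ""},      # no counterpart in HR
]

for row, score in unmatched_ad_users(ad, hr):
    print(row["FirstName"], row["LastName"], round(score, 2))
```

With the sample rows, only the "Test Account" entry is reported. Comparing every AD row against every HR row is quadratic, which is fine for a few thousand users but slow for very large files. Rows that score just below the threshold are worth reviewing by hand instead of being treated as definite non-employees.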