Still celebrating National IT Professionals Day with 3 months of free Premium Membership. Use Code ITDAY17

x
?
Solved

Comparing 2 CSV (Text Files) with fuzzy hashing , possible?

Posted on 2009-06-28
6
Medium Priority
?
405 Views
Last Modified: 2012-05-07
Hi Guys ,
i was asked by my boss to take 2 files , which are actually user repository files.

1 is hr user repository , the other is AD User repository.

i have 2 csv's which include the following
FirstName , FirstName2 , LastName , LastName2 , EmployeeID

Objective : i need to find which users in the the AD file , do not have a user in the HR File (which means they have a user but they are not workers)

Problem one - i tried using Contains and compared First Name to first name , and last name to last name , however - sometimes there are more then one first or last names , so i need to do more checks.

the first check however is the employee id - its not always there , but its 100% correct.

2nd Problem is typos - in one file there is

Arik , John , Smith
and the other has
Aric , John , Smith

is there a way to find % of matching? using some kind of fuzzy hashing?

im open to all ideas , maybe i aint seein the full picture

help is much appirciated ;)
0
Comment
Question by:m0tek
[X]
Welcome to Experts Exchange

Add your voice to the tech community where 5M+ people just like you are talking about what matters.

  • Help others & share knowledge
  • Earn cash & points
  • Learn & ask questions
  • 3
  • 2
6 Comments
 
LVL 84

Expert Comment

by:ozo
ID: 24733100
perldoc -q "How can I do approximate matching"
       See the module String::Approx available from CPAN.
0
 
LVL 40

Expert Comment

by:mrjoltcola
ID: 24733210
Check out Tie::Hash::Abbrev, it might be useful.

http://search.cpan.org/~fany/Tie-Hash-Array-0.1/lib/Tie/Hash/Abbrev.pm


0
 
LVL 40

Expert Comment

by:mrjoltcola
ID: 24733214
Another article with some sample code

http://www.perlmonks.org/?node_id=300129
0
VIDEO: THE CONCERTO CLOUD FOR HEALTHCARE

Modern healthcare requires a modern cloud. View this brief video to understand how the Concerto Cloud for Healthcare can help your organization.

 
LVL 84

Accepted Solution

by:
ozo earned 1500 total points
ID: 24733238
0
 

Author Comment

by:m0tek
ID: 24734174
is there any premade thing using one of those?

this one seems very similar to what i need

http://www.perlmonks.org/?node_id=300129

(i dont know peral or anything , yet :( )
0
 
LVL 40

Expert Comment

by:mrjoltcola
ID: 24742105
I don't think you will have much luck if you don't know Perl. Did you ask this in the Perl zone by mistake?
0

Featured Post

Free Tool: Port Scanner

Check which ports are open to the outside world. Helps make sure that your firewall rules are working as intended.

One of a set of tools we are providing to everyone as a way of saying thank you for being a part of the community.

Question has a verified solution.

If you are experiencing a similar issue, please ask a related question

A set of related code is known to be a Module, it helps us to organize our code logically which is much easier for us to understand and use it. Module is an object with arbitrarily named attributes which can be used in binding and referencing. …
Flask is a microframework for Python based on Werkzeug and Jinja 2. This requires you to have a good understanding of Python 2.7. Lets install Flask! To install Flask you can use a python repository for libraries tool called pip. Download this f…
Learn the basics of modules and packages in Python. Every Python file is a module, ending in the suffix: .py: Modules are a collection of functions and variables.: Packages are a collection of modules.: Module functions and variables are accessed us…
Learn the basics of while and for loops in Python.  while loops are used for testing while, or until, a condition is met: The structure of a while loop is as follows:     while <condition>:         do something         repeate: The break statement m…

705 members asked questions and received personalized solutions in the past 7 days.

Join the community of 500,000 technology professionals and ask your questions.

Join & Ask a Question