Hi, I have two text files, each containing an indexed list (one text string per line, with a pipe-delimited index).
I would like to compare every string in the first file against every string in the second file and return their edit distances (ideally also the characters that differ).
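Here is a minimal sketch of the all-against-all comparison. It is written in Python as a language-neutral illustration, so the Perl CPAN modules mentioned in this question are not used here. It assumes each file has already been read into a dict mapping index to string. `levenshtein` and `compare_all` are hypothetical helper names. The distance is computed with the Wagner-Fischer dynamic-programming algorithm, using unit cost for each insertion, deletion, or substitution.

```python
def levenshtein(a, b):
    """Edit distance between a and b (Wagner-Fischer, unit costs)."""
    # prev[j] = distance between the previous prefix of a and b[:j]
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i] + [0] * len(b)
        for j, cb in enumerate(b, start=1):
            curr[j] = min(
                prev[j] + 1,               # delete ca
                curr[j - 1] + 1,           # insert cb
                prev[j - 1] + (ca != cb),  # substitute (free if equal)
            )
        prev = curr
    return prev[len(b)]


def compare_all(first, second):
    """Compare every string in `first` with every string in `second`.

    Both arguments are dicts of {index: text}; returns a list of
    (index_in_first, index_in_second, distance) tuples.
    """
    results = []
    for i, s in first.items():
        for j, t in second.items():
            results.append((i, j, levenshtein(s, t)))
    return results
```

Comparing m strings with n strings makes m×n distance calls. Each call costs time proportional to the product of the two string lengths, so for large files a compiled implementation, or a threshold that skips obviously distant pairs, is worth considering.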
For example:

First file:
1|asdasdas fgdfgd dfgdf
2|adasdasdasasdas asds asdas
3|asdasdasda asdasd

Second file:
1|asdadasda ajsjj snnnn
2|asdj 3423 dsfsdfsd
3||asndn sad333 sfsdfsdf
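This Python sketch reads each file into a dict that maps index to string, assuming the `index|text` layout shown in the example. It splits on the first pipe only, so the third line of the second file (`3||asndn ...`) keeps its extra `|` as part of the text. The example does not make clear whether that double pipe is intentional. `parse_indexed` is a hypothetical name.

```python
def parse_indexed(lines):
    """Turn 'index|text' lines into an {index: text} dict (assumed format)."""
    entries = {}
    for raw in lines:
        line = raw.rstrip("\r\n")
        if not line.strip():
            continue  # skip blank lines
        # Split on the FIRST pipe only; any further '|' stays in the text.
        index, sep, text = line.partition("|")
        if not sep:
            raise ValueError(f"no '|' delimiter in line: {line!r}")
        entries[index] = text
    return entries
```

With hypothetical file names, usage would be `with open("first.txt", encoding="utf-8") as fh: first = parse_indexed(fh)`, and likewise for the second file.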
Looking on CPAN, I've found these very promising modules:
a) http://search.cpan.org/~spurkis/Test-Approx-0.02/lib/Test/Approx.pm (which is based on Text::LevenshteinXS and also has a threshold feature!)
b) http://search.cpan.org/~kcivey/Text-Brew-0.02/lib/Text/Brew.pm (which also returns the "cost" of the distance)
c) http://search.cpan.org/~davidebe/Text-WagnerFischer-0.04/WagnerFischer.pm

Is there a way to use them to compare the differences between the original text strings?
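To get the differing characters and not just their count, one option is to keep the full dynamic-programming table and walk back through it from the bottom-right corner. This recovers one optimal sequence of edits. The Python sketch below illustrates that backtracking technique under unit costs. It does not reproduce the interface of Text::Brew, Text::WagnerFischer, or Test::Approx, and `edit_script` and `differences` are hypothetical names. When several alignments tie, it prefers the diagonal move (keep or substitute), so the reported operations are one valid answer among possibly several.

```python
def edit_script(a, b):
    """Return (distance, ops) describing how to turn a into b.

    Each op is ('keep', ca, cb), ('sub', ca, cb), ('del', ca, '') or ('ins', '', cb).
    """
    n, m = len(a), len(b)
    # d[i][j] = edit distance between a[:i] and b[:j]
    d = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(n + 1):
        d[i][0] = i
    for j in range(m + 1):
        d[0][j] = j
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1, d[i][j - 1] + 1, d[i - 1][j - 1] + cost)

    # Walk back from d[n][m], choosing a move that explains each cell's value.
    ops = []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and d[i][j] == d[i - 1][j - 1] + (a[i - 1] != b[j - 1]):
            op = "keep" if a[i - 1] == b[j - 1] else "sub"
            ops.append((op, a[i - 1], b[j - 1]))
            i, j = i - 1, j - 1
        elif i > 0 and d[i][j] == d[i - 1][j] + 1:
            ops.append(("del", a[i - 1], ""))
            i -= 1
        else:
            ops.append(("ins", "", b[j - 1]))
            j -= 1
    ops.reverse()  # ops were collected end-to-start
    return d[n][m], ops


def differences(a, b):
    """Distance plus only the non-'keep' operations, i.e. the characters that differ."""
    dist, ops = edit_script(a, b)
    return dist, [op for op in ops if op[0] != "keep"]
```

For example, `differences("kitten", "sitting")` returns `(3, [('sub', 'k', 's'), ('sub', 'e', 'i'), ('ins', '', 'g')])`. Building the full table uses memory proportional to the product of the string lengths. That is fine for one-line strings like those in the example.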
Thank you very much!