Posted on 2011-04-30
I was asked to write a program from scratch. This program has the following description:
Write a program that finds the “distance” between two different nucleotide (genetic) sequences. The dis- tance is defined as the edit distance, as described in class and the DPV textbook, with the gap penalty and mismatch penalty both equal to 1. You may want to test your code on some small sequences, but it should work on all pairs of the sequences (formatted as plain text) provided to you. Compute the edit distance between each pair of these sequences, leaving out e.coli. Also compute the distance between e. coli and salmonella. Report the normalized edit distance per nucleotide (the edit distance divided by the sum of the lengths of the two sequences) and time taken to run your code on each pair. Which pair is most similar by this measure? You may be interested in looking up the relationships among the pairs afterwards (try Wikipedia).
Note: Be careful about your code’s space and time usage. Inefficient solutions may work on smaller prob- lems, but not scale to cases like e. coli and salmonella. Your code has to eventually work on these cases!
Can anybody please give me some reference codes about edit distance so that I can work on it?