I am trying to find a procedure for normalizing the values of several amino acid scales so that the results of comparing a protein sequence against each of them can all be plotted on the same graph. For a single scale, I can normalize the values so that they fall between 0 and 1 using the following code:
```perl
#!/usr/bin/perl
use strict;
use warnings;
use Data::Dumper;
$Data::Dumper::Sortkeys = 1;    # print residues in alphabetical order

# Kyte, J. & R. F. Doolittle. 1982. J Mol Biol 157:105-132.
my %hydrophobicity = (
    'S' => -0.8, 'F' =>  2.8, 'T' => -0.7,
    'N' => -3.5, 'K' => -3.9, 'Y' => -1.3,
    'E' => -3.5, 'V' =>  4.2, 'Q' => -3.5,
    'M' =>  1.9, 'C' =>  2.5, 'L' =>  3.8,
    'A' =>  1.8, 'W' => -0.9, 'P' => -1.6,
    'H' => -3.2, 'D' => -3.5, 'R' => -4.5,
    'I' =>  4.5, 'G' => -0.4,
);

my ($min, $max) = min_max(values %hydrophobicity);
my $range = $max - $min;
my %normal = normalize(\%hydrophobicity, $min, $range);
print Dumper(\%normal);

# Map each value linearly onto [0, 1]: subtracting the scale minimum
# sends it to 0, and dividing by the range sends the maximum to 1.
sub normalize {
    my ($scale, $min, $range) = @_;
    my %normalized;
    foreach my $aa (keys %{$scale}) {
        $normalized{$aa} = ($scale->{$aa} - $min) / $range;
    }
    return %normalized;
}

# Return the smallest and largest values of a list of numbers.
sub min_max {
    my @values = @_;
    my ($min, $max) = ($values[0], $values[0]);
    for (@values) {
        $min = $_ if $_ < $min;
        $max = $_ if $_ > $max;
    }
    return ($min, $max);
}
```
This is good, as far as it goes. How can I normalize an arbitrary scale (I've included one well-known hydrophobicity scale here, but there are hundreds of similar scales measuring various attributes of amino acid residues) so that all values fall between, say, -5 and +5?
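The 0-to-1 case generalizes directly: subtract the scale's minimum, divide by its range, then stretch and shift the result onto the target interval, i.e. y = lo + (x - min) * (hi - lo) / (max - min). Below is a minimal sketch, assuming each scale is stored as a hash of one-letter code => value as in the script above; the sub name `rescale` is hypothetical.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use List::Util qw(min max);

# Hypothetical sub: linearly map a scale (hashref of residue => value) onto
# [$lo, $hi]. The scale's minimum becomes $lo and its maximum becomes $hi;
# every other value keeps its relative position in between.
sub rescale {
    my ($scale, $lo, $hi) = @_;
    my $min = min(values %$scale);
    my $max = max(values %$scale);
    die "cannot rescale: all values are equal\n" if $max == $min;
    my $factor = ($hi - $lo) / ($max - $min);
    my %rescaled;
    for my $aa (keys %$scale) {
        $rescaled{$aa} = $lo + ($scale->{$aa} - $min) * $factor;
    }
    return \%rescaled;
}

# Kyte & Doolittle (1982) values, as in the script above.
my %kd = (
    S => -0.8, F =>  2.8, T => -0.7, N => -3.5, K => -3.9,
    Y => -1.3, E => -3.5, V =>  4.2, Q => -3.5, M =>  1.9,
    C =>  2.5, L =>  3.8, A =>  1.8, W => -0.9, P => -1.6,
    H => -3.2, D => -3.5, R => -4.5, I =>  4.5, G => -0.4,
);

my $kd5 = rescale(\%kd, -5, 5);
printf "%s %6.3f\n", $_, $kd5->{$_} for sort keys %$kd5;
```

Calling `rescale(\%kd, 0, 1)` gives the same 0-to-1 normalization, so one sub covers both cases. Because the mapping is linear, it preserves the order and relative spacing of residues, but 0 stays at 0 only when the scale's extremes are symmetric about zero, as they happen to be for Kyte-Doolittle (-4.5 to +4.5).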
I am trying to do something similar to what is described in the "Normalization procedure" section of http://www.imtech.res.in/raghava/bcepred/bcepred_algorithm.html, but I can't translate that paragraph into Perl.
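For context on the plotting goal, once a scale is on a common range each residue of the query sequence can be looked up in it and the values averaged over a sliding window, in the style of a Kyte-Doolittle hydropathy plot; profiles from different scales then share one y-axis. This is a sketch under stated assumptions, not necessarily the procedure on the bcepred page: the sub `window_profile`, the window of 5 residues, and the example sequence are all hypothetical choices.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use List::Util qw(min max sum);

# Kyte & Doolittle (1982) hydropathy values.
my %kd = (
    S => -0.8, F =>  2.8, T => -0.7, N => -3.5, K => -3.9,
    Y => -1.3, E => -3.5, V =>  4.2, Q => -3.5, M =>  1.9,
    C =>  2.5, L =>  3.8, A =>  1.8, W => -0.9, P => -1.6,
    H => -3.2, D => -3.5, R => -4.5, I =>  4.5, G => -0.4,
);

# Put the scale on a common -5..+5 range (linear min-max rescaling).
my ($min, $max) = (min(values %kd), max(values %kd));
my %kd5 = map { ($_ => -5 + ($kd{$_} - $min) * 10 / ($max - $min)) } keys %kd;

# Hypothetical helper: mean scale value over every run of $w consecutive
# residues, giving length($seq) - $w + 1 values (none if the sequence is
# shorter than the window). Dies on residues the scale does not cover.
sub window_profile {
    my ($scale, $seq, $w) = @_;
    my @vals = map {
        exists $scale->{$_} or die "no scale value for residue '$_'\n";
        $scale->{$_};
    } split //, uc $seq;
    my @profile;
    for my $i (0 .. @vals - $w) {
        push @profile, sum(@vals[$i .. $i + $w - 1]) / $w;
    }
    return @profile;
}

# Made-up example sequence and an arbitrary window of 5 residues.
my @profile = window_profile(\%kd5, 'MKTLLILAVVAAALA', 5);
print join(' ', map { sprintf '%.2f', $_ } @profile), "\n";
```

Since both the rescaling and the window average are linear, averaging rescaled values gives the same result as rescaling the raw window average, so the two steps can be done in either order.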
Any help greatly appreciated...