sarahJo
asked on
simple code
Hi All,
I have an input file, pasted below. What I need is a simple Perl script that takes each entry in the "Betweenness" column (e.g. starting with 1792.618), subtracts the mean of this column from it, and divides the result by the standard deviation. The mean and standard deviation are actually listed in the bottom part of the file. If the result is greater than 3, I need to print the entry in the second column (e.g. 73A for 1792.618) and the calculated value. For example, my output file would be:
number Value (headings)
73A 5.047
etc. for all entries greater than 3.
I hope you guys can help!
Sarah
1 2
Betweenness nBetweenness
------------ ------------
71 73A 1792.618 9.575
146 38D 1658.426 8.859
147 39D 1512.253 8.078
81 83A 1242.943 6.639
152 44D 1089.955 5.822
125 17D 1040.325 5.557
145 37D 969.445 5.178
73 75A 939.639 5.019
149 41D 858.042 4.583
69 71A 857.806 4.582
25 27A 842.471 4.500
85 87A 830.485 4.436
137 29D 802.954 4.289
142 34D 757.492 4.046
180 74D 720.249 3.847
161 53D 702.104 3.750
86 88A 700.904 3.744
33 35A 691.887 3.696
150 42D 672.182 3.591
132 24D 666.140 3.558
18 20A 643.116 3.435
101 103A 640.816 3.423
143 35D 606.252 3.238
128 20D 593.524 3.170
100 102A 561.033 2.997
176 70D 530.817 2.835
182 76D 527.608 2.818
58 60A 510.778 2.728
5 7A 502.023 2.682
76 78A 500.788 2.675
129 21D 497.979 2.660
179 73D 489.449 2.614
57 59A 469.591 2.508
88 90A 462.697 2.472
55 57A 461.927 2.467
22 24A 452.382 2.416
8 10A 431.257 2.304
113 5D 418.470 2.235
49 51A 416.237 2.223
153 45D 411.074 2.196
95 97A 404.376 2.160
184 78D 401.024 2.142
138 30D 365.418 1.952
40 42A 358.900 1.917
124 16D 358.621 1.916
127 19D 353.762 1.890
177 71D 352.447 1.883
172 66D 349.797 1.868
82 84A 342.503 1.830
23 25A 323.989 1.731
183 77D 322.110 1.721
157 49D 316.225 1.689
134 26D 302.222 1.614
54 56A 293.711 1.569
174 68D 293.629 1.568
53 55A 293.566 1.568
94 96A 292.313 1.561
159 51D 291.913 1.559
155 47D 288.529 1.541
60 62A 287.777 1.537
148 40D 276.897 1.479
52 54A 259.231 1.385
56 58A 257.833 1.377
68 70A 256.265 1.369
87 89A 251.393 1.343
80 82A 250.563 1.338
141 33D 248.534 1.328
36 38A 243.736 1.302
104 106A 236.903 1.265
140 32D 232.939 1.244
3 5A 224.693 1.200
121 13D 209.199 1.117
97 99A 204.563 1.093
178 72D 192.234 1.027
12 14A 190.155 1.016
131 23D 177.284 0.947
120 12D 161.247 0.861
61 63A 159.187 0.850
123 15D 157.683 0.842
48 50A 156.580 0.836
62 64A 150.501 0.804
135 27D 149.088 0.796
192 86D 148.954 0.796
74 76A 147.141 0.786
151 43D 143.718 0.768
63 65A 142.893 0.763
109 1D 142.776 0.763
186 80D 142.392 0.761
39 41A 136.417 0.729
35 37A 136.186 0.727
84 86A 134.532 0.719
164 56D 132.733 0.709
118 10D 130.712 0.698
47 49A 127.623 0.682
37 39A 127.228 0.680
50 52A 126.830 0.677
72 74A 126.747 0.677
171 63D 122.573 0.655
102 104A 118.440 0.633
99 101A 116.693 0.623
92 94A 114.457 0.611
190 84D 109.063 0.583
167 59D 108.614 0.580
75 77A 106.598 0.569
93 95A 105.380 0.563
173 67D 101.812 0.544
38 40A 97.457 0.521
139 31D 97.211 0.519
29 31A 96.433 0.515
27 29A 96.341 0.515
154 46D 91.436 0.488
133 25D 90.979 0.486
11 13A 88.639 0.473
67 69A 87.661 0.468
110 2D 84.682 0.452
24 26A 84.398 0.451
168 60D 84.377 0.451
162 54D 77.303 0.413
126 18D 76.913 0.411
9 11A 74.440 0.398
96 98A 74.151 0.396
191 85D 73.421 0.392
111 3D 72.555 0.388
144 36D 71.170 0.380
108 110A 69.347 0.370
28 30A 67.119 0.359
20 22A 66.327 0.354
79 81A 66.140 0.353
4 6A 65.006 0.347
51 53A 62.737 0.335
10 12A 62.724 0.335
70 72A 62.081 0.332
41 43A 55.526 0.297
44 46A 53.336 0.285
160 52D 52.104 0.278
21 23A 50.736 0.271
98 100A 49.223 0.263
90 92A 48.887 0.261
114 6D 48.812 0.261
158 50D 47.403 0.253
17 19A 46.595 0.249
189 83D 46.285 0.247
103 105A 44.667 0.239
115 7D 43.630 0.233
31 33A 43.141 0.230
181 75D 42.638 0.228
59 61A 41.174 0.220
89 91A 41.043 0.219
7 9A 39.027 0.208
43 45A 37.870 0.202
112 4D 35.746 0.191
77 79A 35.467 0.189
13 15A 32.452 0.173
26 28A 31.728 0.169
170 62D 31.467 0.168
46 48A 31.048 0.166
19 21A 29.565 0.158
130 22D 29.205 0.156
16 18A 28.062 0.150
83 85A 27.813 0.149
34 36A 27.656 0.148
165 57D 27.639 0.148
194 88D 27.550 0.147
122 14D 26.729 0.143
175 69D 26.239 0.140
193 87D 24.561 0.131
91 93A 24.238 0.129
156 48D 23.072 0.123
188 82D 21.435 0.114
1 3A 19.912 0.106
107 109A 19.744 0.105
116 8D 18.848 0.101
2 4A 18.191 0.097
163 55D 17.511 0.094
32 34A 16.118 0.086
42 44A 15.884 0.085
64 66A 15.832 0.085
105 107A 14.650 0.078
106 108A 13.869 0.074
6 8A 12.837 0.069
78 80A 11.027 0.059
119 11D 10.370 0.055
185 79D 9.964 0.053
195 89D 8.005 0.043
117 9D 6.762 0.036
166 58D 6.392 0.034
45 47A 6.304 0.034
30 32A 6.278 0.034
66 68A 6.226 0.033
136 28D 5.076 0.027
187 81D 4.961 0.027
15 17A 4.042 0.022
65 67A 1.246 0.007
14 16A 1.049 0.006
169 61D 0.901 0.005
DESCRIPTIVE STATISTICS FOR EACH MEASURE
1 2
Betweenness nBetweenness
------------ ------------
1 Mean 243.349 1.300
2 Std Dev 306.918 1.639
3 Sum 47453.000 253.475
4 Variance 94198.758 2.688
5 SSQ 29916386.000 853.593
6 MCSSQ 18368758.000 524.109
7 Euc Norm 5469.587 29.216
8 Minimum 0.901 0.005
9 Maximum 1792.618 9.575
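Since the mean and standard deviation are already in the file's summary block, one option is to read them from there instead of recomputing them. A minimal Perl sketch, assuming the layout shown above (data rows like `71 73A 1792.618 9.575`, summary rows like `1 Mean 243.349 1.300`). `read_summary` and `outliers` are hypothetical helper names, and the short `@file` list is a hand-copied excerpt standing in for the real file:

```perl
use strict;
use warnings;

# Pull Mean and Std Dev for the Betweenness column out of the
# DESCRIPTIVE STATISTICS block (e.g. "1 Mean 243.349 1.300").
sub read_summary {
    my @lines = @_;
    my %stat;
    for my $line (@lines) {
        if ($line =~ /^\s*\d+\s+(Mean|Std Dev)\s+([0-9.]+)/) {
            $stat{$1} = $2;    # first numeric column = Betweenness
        }
    }
    return @stat{'Mean', 'Std Dev'};
}

# Return [label, z] pairs whose z-score exceeds the threshold.
sub outliers {
    my ($mean, $sd, $threshold, @lines) = @_;
    my @hits;
    for my $line (@lines) {
        # data rows look like "71 73A 1792.618 9.575"; summary rows do not match
        next unless $line =~ /^\s*\d+\s+(\d+[AD])\s+([0-9.]+)/;
        my $z = ($2 - $mean) / $sd;
        push @hits, [$1, $z] if $z > $threshold;
    }
    return @hits;
}

# Hypothetical excerpt of the input file (a few rows plus the summary).
my @file = (
    "71 73A 1792.618 9.575\n",
    "146 38D 1658.426 8.859\n",
    "100 102A 561.033 2.997\n",
    "1 Mean 243.349 1.300\n",
    "2 Std Dev 306.918 1.639\n",
);

my ($mean, $sd) = read_summary(@file);
print "number Value\n";
printf "%s %.3f\n", @$_ for outliers($mean, $sd, 3, @file);
```

With the real file, the same calls should work on all of its lines, e.g. `my @file = <$fh>;` after opening it.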
OK, script created. You have (or can get) Math::Stat, right?
####start#######
use strict;
use warnings;
use Math::Stat;

my $input  = shift || 'input.txt';    # get input file
my $output = shift || 'output.txt';   # get output file
my @col2 = ();                        # column 2 (labels such as 73A)
my @col3 = ();                        # column 3 (Betweenness)
open(my $in,  '<', $input)  or die "Cannot open $input: $!";    # open input file
open(my $out, '>', $output) or die "Cannot open $output: $!";   # open output file
<$in> for 1 .. 3;                     # toss out the first 3 header lines
while (my $line = <$in>)              # use the rest of the lines
{
    if ($line =~ / *([0-9]+) +([0-9]+[AD]) +([0-9.]+)/) {   # parse a data row
        push(@col2, $2);              # store column 2
        push(@col3, $3);              # store column 3
    }
}
close($in);
# now we have the data, let's calculate
my $stat = Math::Stat->new(\@col3, { Autoclean => 1 });    # use col3
my $std_dev = $stat->stddev();        # standard deviation
my $mean    = $stat->average();       # average
# finally, loop and check
print $out "number Value (headings)\n";   # print top of table
for my $i (0 .. $#col3)
{
    my $x = $col3[$i] - $mean;        # subtract mean
    $x /= $std_dev;                   # divide by standard deviation
    if ($x > 3) {                     # check if larger than 3
        printf $out "%7s %16f\n", $col2[$i], $x;   # print
    }
}
close($out);
# and done!
Enjoy your calculations!
By the way, the output was short enough to be done by hand:
number Value (headings)
73A 5.034865
38D 4.598763
39D 4.123726
83A 3.248513
If I use your preset values for the mean and standard deviation, the results are slightly different:
number Value (headings)
73A 5.047827
38D 4.610603
39D 4.134342
83A 3.256876
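The two sets of results differ by a constant factor, which is consistent with different standard-deviation conventions: the file's Variance (94198.758) equals MCSSQ / 195 to within rounding, i.e. it divides by n, and its Std Dev (306.918) is the square root of that, whereas the module used in the script appears to divide by n − 1. That last point is an inference from the numbers, not something stated in the thread. A quick Perl check on the 73A row, assuming n = 195 (the number of data rows, which is also Sum / Mean):

```perl
use strict;
use warnings;

# Values taken from the file: 73A's Betweenness, Mean, Std Dev, and n = 195 rows.
my ($x, $mean, $pop_sd, $n) = (1792.618, 243.349, 306.918, 195);

# Sample SD = population SD * sqrt(n / (n - 1))
my $sample_sd = $pop_sd * sqrt($n / ($n - 1));

my $z_pop    = ($x - $mean) / $pop_sd;      # divide-by-n convention
my $z_sample = ($x - $mean) / $sample_sd;   # divide-by-(n - 1) convention

printf "population SD %.3f -> z = %.6f\n", $pop_sd,    $z_pop;
printf "sample SD     %.3f -> z = %.6f\n", $sample_sd, $z_sample;
```

The population value reproduces 5.047827 from the preset-value run, and the sample value comes out at about 5.03487, within rounding of the script's 5.034865.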
read and parse the file
calculate the mean
calculate the standard deviation
for each value in column 3:
    subtract the mean from the value
    divide the value by the standard deviation
    if the value is larger than 3:
        print the corresponding value from column 2 and the value
I hope this is what you want. I will write it anyway, so tell me quickly if it isn't, and I can write the correct thing.
Hope I can help!
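The steps in the outline above can also be followed without Math::Stat. A minimal sketch, assuming the population standard deviation (dividing by n) so the numbers line up with the file's summary block; `mean_and_sd` and `zscore_outliers` are hypothetical helper names, and `sum` comes from the core List::Util module:

```perl
use strict;
use warnings;
use List::Util qw(sum);

# Mean and population standard deviation (divide by n, like the file's summary).
sub mean_and_sd {
    my @v    = @_;
    my $n    = @v;
    my $mean = sum(@v) / $n;
    my $var  = sum(map { ($_ - $mean) ** 2 } @v) / $n;
    return ($mean, sqrt $var);
}

# Follow the outline: parse, compute stats, standardize, keep z > threshold.
sub zscore_outliers {
    my ($threshold, @lines) = @_;
    my (@labels, @values);
    for my $line (@lines) {
        # data rows look like "71 73A 1792.618 9.575"
        next unless $line =~ /^\s*\d+\s+(\d+[AD])\s+([0-9.]+)/;
        push @labels, $1;    # column 2
        push @values, $2;    # column 3
    }
    my ($mean, $sd) = mean_and_sd(@values);
    my @hits;
    for my $i (0 .. $#values) {
        my $z = ($values[$i] - $mean) / $sd;    # subtract mean, divide by SD
        push @hits, [$labels[$i], $z] if $z > $threshold;
    }
    return @hits;
}

# Usage with the real file (input.txt is the default name from the script above):
# open(my $fh, '<', 'input.txt') or die "Cannot open input.txt: $!";
# printf "%7s %16f\n", @$_ for zscore_outliers(3, <$fh>);
```

Dividing by n rather than n − 1 is a deliberate choice here to match the file; switching `/ $n` to `/ ($n - 1)` in `mean_and_sd` gives the sample convention instead.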