Could you show me how to change this code so the output is different? I have attached the input and output configuration.

Posted on 2009-07-09
Medium Priority
Last Modified: 2012-05-07
this is the arrangement of the input data:


and if I could get it to this arrangement it would be great

502      503      511      513      515      540      545      avg 520            group 2300200000
2      1      8      2      2      25      5      avg 23

509      512      515      517      536      537      542      avg 520            group 1302100000
2      1      8      2      2      25      5      avg 23

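use List::Util qw(sum);
use POSIX;
# NOTE: the lines opening the IN and OUT filehandles, the loop header and the
# braces were lost when this snippet was posted; the ones shown here are assumed.
while(<IN>){
        chomp;
        @arr = split /,/;
        $average = sum(@arr)/@arr;
        print OUT "The average is ".$average."\n";
        @sorted_contents = sort{$a <=> $b}@arr;
        print OUT "Sorted contents:\n";
        foreach $i (@sorted_contents){
                print OUT $i."\n";
        }
        @diff_array = ();
        for($i = 0; $i<@sorted_contents;$i++){
                if($i == $#sorted_contents){
                        last;   # assumed: the last element has no next element to subtract
                }
                $difference = $sorted_contents[$i+1] - $sorted_contents[$i];
                push @diff_array, $difference;   # assumed: @diff_array is filled here
                 #print $difference."\n";
        }
        $diff_average = sum(@diff_array)/@diff_array;
        print OUT "The average of difference values is:".$diff_average."\n";
}
for(@num[1..$#diff_array]) {   # @num is never defined in the posted code
    print OUT $diff_array[floor($_/10)]++."\n";
}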


Question by:Europa MacDonald
LVL 39

Expert Comment

ID: 24818338
From your other question... your original code did not include the first item in some calculations, but the code you accepted did.  Was this intentional?

Original:  my $avg = ($n2 + $n3 + $n4 + $n5 + $n6 + $n7 + $n8)/ 7;
    this takes the average of the 2nd through 8th numbers
New: $average = sum(@arr)/@arr;
    this takes the average of all numbers

Original:  ($n2,$n3,$n4,$n5,$n6,$n7,$n8)=sort{$a<=>$b}$n2,$n3,$n4,$n5,$n6,$n7,$n8;
    this sorts the 2nd through 8th numbers
New: @sorted_contents = sort{$a <=> $b}@arr;
    this sorts all the numbers
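
The difference between the two versions comes down to which elements are handed to sum and sort. A minimal sketch, assuming a made-up row whose first field is not a position (the real layout of the input file is not shown in the question); the sample values and the name @rest are hypothetical:

```perl
use strict;
use warnings;
use List::Util qw(sum);

# Hypothetical row: something else in the first field, then seven positions.
my $line = "1,540,502,513,545,503,515,511";
my @arr  = split /,/, $line;

# Array slice: every element except the first (index 0 is the first field).
my @rest = @arr[1 .. $#arr];

# Average and sort of the 2nd through last items only.
my $avg_rest    = sum(@rest) / @rest;
my @sorted_rest = sort { $a <=> $b } @rest;

# Average and sort of all items, first field included.
my $avg_all    = sum(@arr) / @arr;
my @sorted_all = sort { $a <=> $b } @arr;

print "2nd..last: @sorted_rest (avg $avg_rest)\n";
print "all:       @sorted_all (avg $avg_all)\n";
```

If the first field really is part of the data, the slice can simply be dropped and @arr used throughout.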

Author Comment

by:Europa MacDonald
ID: 24818375
I'm not sure; I have been taking code and messing with it. Does that answer your question? I'm a complete beginner at programming, so it's complete trial and error for me, with your help of course :-)

I have several different sets of data, and different ranges, but I am trying to learn this code on one range at the moment, and hopefully if I tweak it I can carry on without bothering you good men too much.

I'm really grateful for your help.


Author Comment

by:Europa MacDonald
ID: 24818398
New: $average = sum(@arr)/@arr;
    this takes the average of all numbers

That shouldn't be.

New: @sorted_contents = sort{$a <=> $b}@arr;
    this sorts all the numbers

No, these are inaccurate.

I've been having trouble with the code :-)

I would like to stick to using only scalars so I can read it, but that last bit of code offered a bit of flexibility.

Author Comment

by:Europa MacDonald
ID: 24818412
Back to the drawing board, but I'm really getting lost in this. Sorry for that first dumb answer :-)
LVL 39

Expert Comment

ID: 24818523
In your other question, I suggested in post 24810785 that you use arrays instead of a bunch of scalars.  Although there is a learning curve with this, it will make things easier in the end.
In that post, I tried to give some detailed, beginner-level notes about how your code would work with arrays instead.
If, in that question, the post you accepted is not correct, you can ask to have it un-accepted, so you can then accept a more appropriate answer.

For your data, I think we will be able to help you better if you explain what it is you are trying to do at a high level.  We can write some code, and explain it.  If we only patch a piece of code that is logically incorrect, it will still be logically incorrect.

Author Comment

by:Europa MacDonald
ID: 24818578

I am looking at protein arrangement in the surface of nerve cell membranes. I don't know how much you know about biology, but imagine many cars (over 1000) stretched out along a road. I am interested in how they arrange themselves in patterns, and how close or far apart they are. The proteins are showing a pattern, and I am looking at the pattern.  They also group themselves in ways that we can see through the microscope, but can't quite model mathematically. So what I am doing is teasing out the patterns and trying to predict further patterns.
This is in the area of repairing spinal cords and brain injury. I am a student with extremely limited resources and even less time, so I am working quite hard at learning what this code is doing. I have been taking the different bits of code offered on here and working out how they work. That's why I prefer scalars, because then I can see what is going on. I tried the code which used arrays because the author said it gave a bit of flexibility, but I don't know how it works, so I would like to stay with scalars for now.

Does that help at all?

Author Comment

by:Europa MacDonald
ID: 24818602
I actually have quite a long way to go with this, but I was hoping to pick the code up a bit quicker so I could maybe progress a bit better without inconveniencing anyone else :-)

I have to look at the patterns, model them, look at how they combine, see if parts of them repeat regularly and then some more stuff about predicting patterns.

A friend of mine tried to help me with VB, but he couldn't quite get what I was wanting, so I decided to take the plunge and learn Perl and try to develop it myself.

Author Comment

by:Europa MacDonald
ID: 24818614
The numbers in the input files represent the distance of the protein from the start.

I need it in a format I can read more clearly by eye, as I am looking for patterns within that data too, matching and looking for sub-patterns and predictions.

Does that make sense?

Author Comment

by:Europa MacDonald
ID: 24818630
I am even considering having some sort of graphical output at the end, which would show visually the occurrence of proteins and their interrelationships, but I have to get this data sorted before I can even consider that.
LVL 39

Expert Comment

ID: 24818660
When I said "high-level", I meant what the code needs to do at a high level.  For example:

read data from a csv file.  For each row:
1) calculate the average of the 2nd through last item
2) sort the 2nd through the last item
3) calculate the difference from the 1st to each other item
    or from each item to the previous item
10) display results like so:

As before, I still recommend using arrays.  If you are set on scalars, we can make that work, but it'll be much less flexible, and more difficult to make work with different amounts of data.  Although there is a learning curve for arrays, they will save you time in the long run.  If there is a particular part about the arrays you don't understand, just post here.  
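
As a rough illustration of what an array gives you over separate scalars, here is a minimal sketch. The input line is made up (a real one would come from the data file), and the names @positions, $tabbed and $piped are only for this example:

```perl
use strict;
use warnings;

# Hypothetical input line; a real one would be read from the data file.
my $line = "540,502,513,545,503,515,511";

# split returns one list element per comma-separated value.
my @positions = split /,/, $line;

my $count = scalar @positions;   # number of elements, whatever the row length
my $first = $positions[0];       # indexes start at 0
my $last  = $positions[-1];      # negative indexes count from the end

# join controls how the elements are laid out when printed.
my $tabbed = join("\t", @positions);
my $piped  = join(" | ", @positions);

print "count=$count first=$first last=$last\n";
print "$tabbed\n";
print "$piped\n";
```

Changing the printed layout only means changing the separator passed to join; nothing else depends on how many values the row holds.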

In this post, I tried to explain how arrays work:
Is that not clear?  Partly clear, but you still have questions?

Author Comment

by:Europa MacDonald
ID: 24818748
Yes, that is starting to make sense to me now actually :-)

So an array would just read in the data delimited by commas and store each individual value separately? Then I can access each element individually.

Why would this be better than using scalars? I can see now that if the data extends, the array will cope with that change automatically, but what I couldn't see clearly was how the data could be printed out, and how I could easily alter that. I could just ask on here; you guys have been great, and this is a really good site.

OK, well, you are the expert, so if you think I'd be fine with arrays, I will go down that route.

Author Comment

by:Europa MacDonald
ID: 24818773
The high level of the code . . . I don't know how it will work to the end, but

1. It has to read a line of data
2. sort that data into ascending order
3. work out the difference (distance) between each number
4. tell me how many of these numbers fall within a specific group
5. print out the results in a format I can scan through visually

I have anticipated all possible associations of this data. I think I have every possible combination of protein positions (distance). In real life they don't all exist, so I have to revisit this huge list (1.5 GB of data, which I considered using SQLite for but used Vim instead).
So to revisit this list, I have to

1. Define a pattern
2. Search the list for it.
3. Mark all items which match the pattern, so I can ignore them in later assessments

I have code which does 1 - 5 but it uses scalars.

Is this a bit clearer?
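
The three pattern steps above (define a pattern, search for it, mark the matches) could be sketched roughly as follows. The thread never says what a pattern looks like, so the rule used here (a row matches if any two neighbouring positions are at least 20 apart) is purely an assumption for illustration, as are the sub name has_large_gap, the IGNORE/KEEP tags and the third sample row:

```perl
use strict;
use warnings;

# Hypothetical pattern: a row matches if any neighbouring gap is >= $min_gap.
sub has_large_gap {
    my ($min_gap, @positions) = @_;
    my @sorted = sort { $a <=> $b } @positions;
    for my $i (1 .. $#sorted) {
        return 1 if $sorted[$i] - $sorted[$i - 1] >= $min_gap;
    }
    return 0;
}

# Sample rows standing in for lines of the large list.
my @rows = (
    "502,503,511,513,515,540,545",
    "509,512,515,517,536,537,542",
    "501,504,507,510,513,516,519",
);

# Tag each row instead of deleting it, so later passes can skip marked rows.
my @marked;
for my $row (@rows) {
    my $flag = has_large_gap(20, split(/,/, $row)) ? "IGNORE" : "KEEP";
    push @marked, "$flag\t$row";
}

print "$_\n" for @marked;
```

For a list of 1.5 GB, the rows would be read and written one line at a time rather than held in an array as they are in this sketch.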
LVL 39

Accepted Solution

Adam314 earned 2000 total points
ID: 24823902
>>Why would this be better than using scalar?
The main reason is it is easier to make it flexible as to the number of items on each row.  The other reason is it makes some of the calculations easier.

In steps 1-5, none of them says to calculate the average... but because of your desired output, and existing code, I'm assuming that is a requirement.

In your output, I guessed the second line was the differences, but in your example, that doesn't seem to be the case.  In this code, the second line is the difference between neighboring pairs.

Also in the second line of output, I'm not sure where the avg amount comes from.

If you look at this code compared to your previous code, it should be easier to read.  The operations on an array are easier and simpler to write/read.  If anything isn't clear, let me know.

There is 1 command that might not be obvious: the map function in step 3.  For each item in the list specified, "(1..$#arr)" in this case (which is the list of numbers from 1 to the last index of @arr), it performs the specified action, "$arr[$_] - $arr[$_-1]" in this case, and returns that value.  So for the 2nd element (index 1, because the first has index 0) through the last, this calculates the difference between that element and the previous one.
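
The map call described above can be tried on its own. A minimal sketch, using the sorted numbers from the first row of the sample output in the question; @diff_loop is only there for comparison and is not part of the posted solution:

```perl
use strict;
use warnings;

# Sorted positions taken from the sample output in the question.
my @arr = (502, 503, 511, 513, 515, 540, 545);

# map version: one difference per index from 1 to the last index.
my @diff = map { $arr[$_] - $arr[$_ - 1] } (1 .. $#arr);

# The same calculation written as an explicit loop.
my @diff_loop;
for my $i (1 .. $#arr) {
    push @diff_loop, $arr[$i] - $arr[$i - 1];
}

print "map:  @diff\n";        # prints "map:  1 8 2 2 25 5"
print "loop: @diff_loop\n";   # prints "loop: 1 8 2 2 25 5"
```

Seven sorted values give six differences, because the first value has no previous neighbour.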

use strict;
use warnings;
use Data::Dumper;
use List::Util qw(sum);
##### Open files
open(my $IN,"invim2.vim") or die "Could not open input: $!\n";
open(my $OUT,">outvim2.vim") or die "Could not open output: $!\n";
##### Process the input file one line at a time
while(<$IN>) {
	chomp;
	##### Step 1: read a line, and split it on comma
	my @arr = split /,/;
	next unless @arr;
	##### Step 2: Sort data into ascending order
	@arr = sort {$a <=> $b} @arr;
	##### Step 3: Calculate difference between neighboring pairs
	my @diff = map {$arr[$_] - $arr[$_-1]} (1..$#arr);
	##### Step 4: Calculate amount in each group
	my @groups = (0)x10;
	$groups[$_/100]++ foreach (@arr);
	##### Step 6: Calculate average
	my $average = sum(@arr)/@arr;
	##### Step 5: Print results
	print $OUT join("    ", @arr) . "    avg $average     group @groups\n";
	print $OUT join("    ", @diff) . "\n";
	print $OUT "\n";
}
close($IN);
close($OUT);
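
One observation on the group column: the values requested in the question (2300200000 and 1302100000) look like counts of how many numbers fall in each block of ten within their hundred (500-509, 510-519, and so on), whereas "$groups[$_/100]++" puts every value in the 500s into index 5. A sketch under that assumption; the grouping rule is inferred from the two sample rows only, and tens_groups is a made-up helper name:

```perl
use strict;
use warnings;

# Count how many values fall in each block of ten within their hundred.
# ASSUMPTION: this grouping rule is inferred from the sample output only.
sub tens_groups {
    my @groups = (0) x 10;
    $groups[ int(($_ % 100) / 10) ]++ for @_;
    return join("", @groups);
}

my $first_row  = tens_groups(502, 503, 511, 513, 515, 540, 545);
my $second_row = tens_groups(509, 512, 515, 517, 536, 537, 542);

print "$first_row\n";    # prints 2300200000
print "$second_row\n";   # prints 1302100000
```

Joining the counts without a separator only stays readable while every count is below 10; with longer rows a separator such as a space would be safer.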



Author Comment

by:Europa MacDonald
ID: 24827904
Where you have written
does this mean you are putting the input file into the scalar $IN,
then reading the contents of the scalar into an array?

And is the while loop processing the contents of the file?
LVL 39

Expert Comment

ID: 24828221
The <...> is a shortcut for a few different things, depending on how it's used, so it can be complicated.
    If what is inside the brackets is a file handle, it's short for readline.
    If what is inside is a string, it's short for glob.
    If nothing is inside, it will read from the files named on the command line (@ARGV), or from standard input if none were given.

So in this case, it's short for readline($IN), which reads from the $IN filehandle (which is associated with the "invim2.vim" file in this case).
When it is the only thing in the while condition and no variable is specified (as there would be in  $line=<$IN>),  the line is automatically saved to $_.  This is just a shortcut way of writing:
    while(defined($_ = readline($IN))) {
The advantage of using $_ is that some functions will automatically use the $_ variable if a variable isn't specified, such as the chomp and split functions.  So this:
    chomp;
is short for:
    chomp($_);
and this:
    split /,/;
is short for:
    split(/,/, $_);
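
To see the shortcuts side by side, here is a minimal sketch that runs the same loop twice, once with the implicit $_ and once with everything written out. It reads from an in-memory string instead of invim2.vim so it can be run on its own; the sample data and the variable names are made up:

```perl
use strict;
use warnings;

# In-memory "file" so the sketch runs without the real invim2.vim.
my $data = "502,503,511\n509,512,515\n";

# Short form: <$fh> alone in a while condition assigns each line to $_.
open(my $in_short, "<", \$data) or die "Could not open in-memory data: $!\n";
my @short_rows;
while (<$in_short>) {
    chomp;
    my @arr = split /,/;
    push @short_rows, scalar @arr;   # number of fields on this line
}
close($in_short);

# Long form: the same loop with every implicit $_ replaced by $line.
open(my $in_long, "<", \$data) or die "Could not open in-memory data: $!\n";
my @long_rows;
while (defined(my $line = readline($in_long))) {
    chomp($line);
    my @arr = split(/,/, $line);
    push @long_rows, scalar @arr;
}
close($in_long);

print "short form field counts: @short_rows\n";
print "long form field counts:  @long_rows\n";
```

Both loops see the same lines and produce the same field counts; the long form is just more typing.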

It saves you some typing, and is frequently used in perl programs.  You can see the documentation for this here:

Author Closing Comment

by:Europa MacDonald
ID: 31601833
Brilliant and helpful as always
