Europa MacDonald asked:

Could you show me how to change this code so the output is different? I have attached the input and output configuration.

this is the arrangement of the input data:

511,513,515,502,503,540,545
512,515,536,537,542,509,51

and if I could get it to this arrangement it would be great

502      503      511      513      515      540      545      avg 520            group 2300200000
2      1      8      2      2      25      5      avg 23

509      512      515      517      536      537      542      avg 520            group 1302100000
2      1      8      2      2      25      5      avg 23

use List::Util qw(sum);
use POSIX;
open(IN,"invim2.vim");
open(OUT,">outvim2.vim");
while(<IN>){
        @arr = split /,/;
        $average = sum(@arr)/@arr;
        print OUT "The average is ".$average."\n";
        @sorted_contents = sort{$a <=> $b}@arr;
        print OUT "Sorted contents:\n";
        foreach $i (@sorted_contents)
        {
                print OUT $i."\n";
        }
        for($i = 0; $i<@sorted_contents;$i++){
        if($i == $#sorted_contents)
        {1;}
        else{
                $difference = $sorted_contents[$i+1] - $sorted_contents[$i];
                 push(@diff_array,$difference);
                 #print $difference."\n";
        }
        }
 
        $diff_average = sum(@diff_array)/@diff_array;
        print OUT "The average of difference values is:".$diff_average."\n";
}
 
 
for(@num[1..$#diff_array]) {
    print OUT $diff_array[floor($_/10)]++."\n";
}


Adam314

From your other question... your original code did not include the first item in some calculations, but the code you accepted did.  Was this intentional?

eg:
Original:  my $avg = ($n2 + $n3 + $n4 + $n5 + $n6 + $n7 + $n8)/ 7;
    this takes the average of the 2nd through 8th numbers
New: $average = sum(@arr)/@arr;
    this takes the average of all numbers

Original:  ($n2,$n3,$n4,$n5,$n6,$n7,$n8)=sort{$a<=>$b}$n2,$n3,$n4,$n5,$n6,$n7,$n8;
    this sorts the 2nd through 8th numbers
New: @sorted_contents = sort{$a <=> $b}@arr;
    this sorts all the numbers
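
The difference between the two versions can be seen with an array slice. This is only a minimal sketch: it assumes a hypothetical row whose first field is something other than a measurement (the value 7 is made up for illustration), followed by the numbers from the first input line of the question.

```perl
use strict;
use warnings;
use List::Util qw(sum);

# Hypothetical row: a leading field (7) followed by seven measurements.
my @arr = (7, 511, 513, 515, 502, 503, 540, 545);

# Average of ALL items, as in:  $average = sum(@arr)/@arr;
my $avg_all = sum(@arr) / scalar(@arr);

# Average of the 2nd through last items only, using an array slice.
my @rest     = @arr[1 .. $#arr];
my $avg_rest = sum(@rest) / scalar(@rest);

# Sort only the 2nd through last items, numerically.
my @sorted_rest = sort { $a <=> $b } @rest;

printf "all: %.2f  rest: %.2f\n", $avg_all, $avg_rest;
print "sorted rest: @sorted_rest\n";
```

The slice @arr[1 .. $#arr] keeps working whether a row has 8 items or 80, which is the flexibility the scalar version ($n2 through $n8) lacks.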
Europa MacDonald

ASKER

I'm not sure, I have been taking code and messing with it. Does that answer your question? I'm a complete beginner at programming, so it's complete trial and error for me, with your help of course :-)

I have several different sets of data, and different ranges, but I am trying to learn this code on one range at the moment, and hopefully if I tweak it I can carry on without bothering you good men too much.

I'm really grateful for your help.

New: $average = sum(@arr)/@arr;
    this takes the average of all numbers

that shouldn't be

New: @sorted_contents = sort{$a <=> $b}@arr;
    this sorts all the numbers

no, these are inaccurate

I've been having trouble with the code :-)

I would like to stick to using only scalars so I can read it, but that last bit of code offered a bit of flexibility.
Back to the drawing board, but I'm really getting lost in this. Sorry for that first dumbass answer :-)
In your other question, I suggested in post 24810785 that you use arrays instead of a bunch of scalars.  Although there is a learning curve with this, it will make things easier in the end.
In that post, I tried to give some detailed, but beginners, notes about how your code would work with arrays instead.
If, in that question, the post you accepted is not correct, you can ask to have it un-accepted, so you can then accept a more appropriate answer.

For your data, I think we will be able to help you better if you explain what it is you are trying to do at a high level.  We can write some code, and explain it.  If you ask us to fix a piece of code that is logically incorrect, the fixed code will still be logically incorrect.
ok

I am looking at protein arrangement in the surface of nerve cell membranes. Basically, I don't know how much you know about biology, but imagine many cars (over 1000) stretched out along a road. I am interested in how they arrange themselves in patterns, and how close or far apart they are. The proteins are showing a pattern, and I am looking at the pattern.  They also group themselves in ways that we can see through the microscope, but can't quite model mathematically. So what I am doing is teasing out the patterns and trying to predict further patterns.
This is in the area of repairing spinal cords and brain injury. I am a student with extremely limited resources and even less time, so I am working quite hard at learning what this code is doing. I have been taking the different bits of code offered on here and working out how they work. That's why I prefer scalars, because then I can see what is going on. I tried that code which used arrays because the author said it gave a bit of flexibility, but I don't know how it is working, so I would like now to stay with scalars.

Does that help at all?
I actually have quite a long way to go with this, but I was hoping to pick the code up a bit quicker so I could maybe progress a bit better without inconveniencing anyone else :-)

I have to look at the patterns, model them, look at how they combine, see if parts of them repeat regularly and then some more stuff about predicting patterns.

A friend of mine tried to help me with VB, but he couldn't quite get what I was wanting, so I decided to take the plunge and learn Perl and try to develop it myself.
The numbers in the input files represent the distance of the protein from the start.

I need it in a format I can see more clearly, visually, as I am looking for patterns within that data too, matching and looking for sub-patterns and predictions.

Does that make sense?
I am even considering having some sort of graphical output at the end, which would show visually the occurrence of proteins and their interrelationships, but I have to get this data sorted before I can even consider that.
When I said "high-level", I meant what the code needs to do at a high level.  For example:

read data from a csv file.  For each row:
1) calculate the average of the 2nd through last item
2) sort the 2nd through the last item
3) calculate the difference from the 1st to each other item
    or from each item to the previous item
....
10) display results like so:


As before, I still recommend using arrays.  If you are set on scalars, we can make that work, but it'll be much less flexible, and more difficult to make work with different amounts of data.  Although there is a learning curve for arrays, they will save you time in the long run.  If there is a particular part about the arrays you don't understand, just post here.  

In this post, I tried to explain how arrays work:
https://www.experts-exchange.com/questions/24554751/how-to-get-perl-to-write-to-a-VIM-file.html#24810785
Is that not clear?  Partly clear, but you still have questions?
Yes, that is starting to make sense to me now actually :-)

So an array would just read in the data delimited with a comma, and store each individual value separately? Then I can access each element individually.

Why would this be better than using scalars? I can see now that if the data extends, the array will cope with that change automatically, but what I couldn't see clearly was how the data could be printed out, and how that could be altered by me easily. I could just ask on here, you guys have been great, this is a really good site.

Ok, well you are the expert, so if you think I'd be fine with arrays, I will go down that route.
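
To make the array questions above concrete, here is a minimal sketch (not code from the thread) that splits one comma-separated line into an array, reads individual elements, and prints the whole row. The variable names are illustrative only.

```perl
use strict;
use warnings;

# One line of input, as it would arrive from the file.
my $line = "511,513,515,502,503,540,545\n";

chomp $line;                    # remove the trailing newline
my @arr = split /,/, $line;     # one array element per comma-separated value

print "first item: $arr[0]\n";  # elements are numbered from 0
print "last item:  $arr[-1]\n"; # a negative index counts from the end
print "item count: ", scalar(@arr), "\n";

# Printing: join copes with any number of items, so the output
# format is changed in one place rather than once per scalar.
my $row = join("\t", @arr);
print "$row\n";
```

Changing the separator in the join call (a tab here) is all it takes to alter the printed layout.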
The high level of the code . . . I don't know how it will work to the end, but

1. It has to read a line of data
2. sort that data into ascending order
3. work out the difference (distance) between each number
4. tell me how many of these numbers fall within a specific group
5. print out the results in a format I can scan through visually
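
Steps 1 to 5 can be sketched with arrays as follows. This is not the accepted solution, only an illustration under stated assumptions: the "group" string is taken to be a count of values per bin of width 10, starting at 500, with 10 bins. That layout is inferred from the sample output in the question (it reproduces both "2300200000" and "1302100000") but is never stated in the thread. The thread also does not say how the sample "avg" figures or the leading value in the difference row were derived, so here the averages are plain rounded means and the differences are the gaps between neighbouring values only. The function name format_row is hypothetical.

```perl
use strict;
use warnings;
use List::Util qw(sum);

# Assumed bin layout (inferred from the sample output, not stated in the thread).
my ($BIN_START, $BIN_WIDTH, $BIN_COUNT) = (500, 10, 10);

sub format_row {
    my ($line) = @_;
    chomp $line;
    my @nums = split /,/, $line;                  # step 1: read the values
    return '' unless @nums;                       # skip blank lines
    my @sorted = sort { $a <=> $b } @nums;        # step 2: ascending order

    # Step 3: gap between each number and the one before it.
    my @diff = map { $sorted[$_] - $sorted[$_ - 1] } 1 .. $#sorted;

    # Step 4: count how many numbers fall into each bin.
    my @bins = (0) x $BIN_COUNT;
    for my $n (@sorted) {
        next if $n < $BIN_START;                  # below the first bin
        my $i = int(($n - $BIN_START) / $BIN_WIDTH);
        $bins[$i]++ if $i < $BIN_COUNT;           # ignore values past the last bin
    }

    # Step 5: two tab-separated rows per input line.
    my $avg      = sprintf("%.0f", sum(@sorted) / scalar(@sorted));
    my $diff_avg = @diff ? sprintf("%.0f", sum(@diff) / scalar(@diff)) : 0;
    return join("\t", @sorted, "avg $avg", "group " . join('', @bins)) . "\n"
         . join("\t", @diff, "avg $diff_avg") . "\n";
}

print format_row("511,513,515,502,503,540,545\n");
```

Because @diff and @bins are declared inside the function, they start empty for every line, which avoids differences from earlier lines leaking into later averages.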

THEN
I have anticipated all possible associations of this data. I think I have every possible combination of protein positions (distances). In real life they don't all exist, so I have to revisit this huge list (1.5 GB of data, which I considered using SQLite for but used Vim instead).
So to revisit this list, I have to

1. Define a pattern
2. Search the list for it.
3. Mark all items which match the pattern, so I can ignore them in later assessments

I have code which does 1 - 5, but it uses scalars.

Is this a bit clearer ?
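
The define, search and mark steps could look roughly like this. The thread never says what a "pattern" is, so this sketch assumes, purely for illustration, that each line of the list is a group string and that the pattern is a regular expression over it. The IGNORE and KEEP labels are likewise made up.

```perl
use strict;
use warnings;

# 1. Define a pattern (hypothetical: group strings that start with "23").
my $pattern = qr/^23/;

# In-memory stand-in for the big list; a real run would open the file instead.
my $list = "2300200000\n1302100000\n2310100000\n";
open(my $IN, '<', \$list) or die "cannot open input: $!";

my $marked = 0;
while (my $line = <$IN>) {      # one line at a time, so the whole list is never in memory
    chomp $line;
    if ($line =~ $pattern) {    # 2. search
        $marked++;
        print "IGNORE\t$line\n";    # 3. mark for later passes
    }
    else {
        print "KEEP\t$line\n";
    }
}
close $IN;

print "marked $marked lines\n";
```

Reading line by line matters at this size: the loop uses the same amount of memory for a 1.5 GB file as for a three-line one.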
ASKER CERTIFIED SOLUTION
Adam314

This solution is only available to members.
Where you have written
while(<$IN>)
does this mean you are putting the input file into the scalar $IN,
then reading the contents of the scalar into an array?

And the while loop is processing the contents of the file?
The <...> is a shortcut for a few different things, depending on how it's used, so it can be complicated.
    If what is inside the brackets is a file handle, it's short for readline.
    If what is inside is a string, it's short for glob.
    If nothing is inside, it will read from the files named on the command line (@ARGV), or from standard input if no files are named.

So in this case, it's short for readline($IN), which reads from the $IN filehandle (which is associated with the "invim2.vim" file in this case).
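
A small sketch of the first two forms, each next to its spelled-out equivalent. To keep it runnable without the real file, it reads from an in-memory string standing in for "invim2.vim"; that stand-in and the variable names are assumptions for illustration.

```perl
use strict;
use warnings;

# In-memory stand-in for "invim2.vim" so the sketch runs without the file.
my $data = "511,513,515\n512,515,536\n";
open(my $IN, '<', \$data) or die "cannot open: $!";

my $first  = <$IN>;            # file handle inside the brackets: readline
my $second = readline($IN);    # the same operation, spelled out
close $IN;

print "first:  $first";
print "second: $second";

# A string (here a wildcard) inside the brackets: glob.
my @short = <*.vim>;
my @long  = glob("*.vim");     # the same operation, spelled out
print "vim files here: @long\n";
```

So $IN is not the file's contents. It is a handle to the open file, and each <$IN> asks that handle for the next line.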
When no variable is specified (that is, you write  while(<$IN>)  rather than something like  $line=<$IN>),  the line is automatically saved to $_.  This is just a shortcut way of writing:
    while(defined($_ = readline($IN))) {
The advantage of using $_ is that some functions will automatically use the $_ variable if a variable isn't specified, such as the chomp and split functions.  So this:
    chomp;
is short for:
    chomp($_);
And:
    split /,/;
is short for:
    split(/,/, $_);

It saves you some typing, and is frequently used in perl programs.  You can see the documentation for this here:
http://perldoc.perl.org/perlop.html#I%2fO-Operators
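
For comparison, here is the same loop written twice: once with the $_ shortcuts and once with everything spelled out. It is a minimal sketch that reads from an in-memory string instead of the real input file, and the variable names are illustrative.

```perl
use strict;
use warnings;

my $data = "511,513,515\n502,503,540\n";

# Short form: the line lands in $_, and chomp and split both default to $_.
open(my $IN, '<', \$data) or die "cannot open: $!";
my @short;
while (<$IN>) {
    chomp;
    my @fields = split /,/;
    push @short, scalar(@fields) . " fields, last is $fields[-1]";
}
close $IN;

# Long form: every default written out with a named variable.
open($IN, '<', \$data) or die "cannot open: $!";
my @long;
while (defined(my $line = readline($IN))) {
    chomp($line);
    my @fields = split(/,/, $line);
    push @long, scalar(@fields) . " fields, last is $fields[-1]";
}
close $IN;

print "$_\n" for @short;
print "both forms give the same result\n" if "@short" eq "@long";
```

The long form is more typing, but it can be easier to follow while learning, since every value has a visible name.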
Brilliant and helpful as always