Could you show me how to change this code so the output is different? I have attached the input and output configuration.

this is the arrangement of the input data:


and if I could get it to this arrangement it would be great

502      503      511      513      515      540      545      avg 520            group 2300200000
2      1      8      2      2      25      5      avg 23

509      512      515      517      536      537      542      avg 520            group 1302100000
2      1      8      2      2      25      5      avg 23

use List::Util qw(sum);
use POSIX;
# ... (the lines that open the files and start the read loop are not shown in the post)
        @arr = split /,/;
        $average = sum(@arr)/@arr;
        print OUT "The average is ".$average."\n";
        @sorted_contents = sort{$a <=> $b}@arr;
        print OUT "Sorted contents:\n";
        foreach $i (@sorted_contents) {
                print OUT $i."\n";
        }
        for($i = 0; $i<@sorted_contents;$i++){
                if($i == $#sorted_contents) {
                        last;    # the last element has no neighbour after it
                }
                $difference = $sorted_contents[$i+1] - $sorted_contents[$i];
                push @diff_array, $difference;
                 #print $difference."\n";
        }
        $diff_average = sum(@diff_array)/@diff_array;
        print OUT "The average of difference values is:".$diff_average."\n";
# note: @num is never filled in anywhere in the posted code
for(@num[1..$#diff_array]) {
    print OUT $diff_array[floor($_/10)]++."\n";
}


Europa MacDonald (Chief slayer of dragons) Asked:

From your other question... your original code did not include the first item in some calculations, but the code you accepted did.  Was this intentional?

Original:  my $avg = ($n2 + $n3 + $n4 + $n5 + $n6 + $n7 + $n8)/ 7;
    this takes the average of the 2nd through 8th numbers
New: $average = sum(@arr)/@arr;
    this takes the average of all numbers

Original:  ($n2,$n3,$n4,$n5,$n6,$n7,$n8)=sort{$a<=>$b}$n2,$n3,$n4,$n5,$n6,$n7,$n8;
    this sorts the 2nd through 8th numbers
New: @sorted_contents = sort{$a <=> $b}@arr;
    this sorts all the numbers
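
If the first item on each row should be left out (as in the original scalar version), an array slice can do that without naming every element. This is only a minimal sketch: the sample row is made up, it assumes the first field is a label rather than a measurement, and the names @rest and @sorted_rest are just for illustration.

```perl
use strict;
use warnings;
use List::Util qw(sum);

# Hypothetical row: the first field is assumed to be a label, the rest are values.
my @arr = split /,/, "7,545,502,513,503,540,511,515";

# Slice from index 1 (the 2nd item) through the last index.
my @rest = @arr[1 .. $#arr];

my $average     = sum(@rest) / @rest;          # average of the 2nd through last items
my @sorted_rest = sort { $a <=> $b } @rest;    # sort the 2nd through last items

print "average: $average\n";
print "sorted:  @sorted_rest\n";
```

If the first item should be included after all, use @arr directly in place of @rest.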
Europa MacDonald (Chief slayer of dragons), Author Commented:
I'm not sure, I have been taking code and messing with it. Does that answer your question? I'm a complete beginner at programming, so it's complete trial and error for me, with your help of course :-)

I have several different sets of data, and different ranges, but I am trying to learn this code on one range at the moment and hopefully if I tweak it I can carry on without bothering you good men too much.

I'm really grateful for your help

Europa MacDonald (Chief slayer of dragons), Author Commented:
New: $average = sum(@arr)/@arr;
    this takes the average of all numbers

that shouldn't be

New: @sorted_contents = sort{$a <=> $b}@arr;
    this sorts all the numbers

no, these are inaccurate

I've been having trouble with the code :-)

I would like to stick to using only scalars so I can read it, but that last bit of code offered a bit of flexibility
Europa MacDonald (Chief slayer of dragons), Author Commented:
back to the drawing board, but I'm really getting lost in this. Sorry for that first dumb answer :-)
In your other question, I suggested in post 24810785 that you use arrays instead of a bunch of scalars.  Although there is a learning curve with this, it will make things easier in the end.
In that post, I tried to give some detailed, but beginners, notes about how your code would work with arrays instead.
If, in that question, the post you accepted is not correct, you can ask to have it un-accepted, so you can then accept a more appropriate answer.

For your data, I think we will be able to help you better if you explain what it is you are trying to do at a high level.  We can write some code, and explain it.  If we just fix up a piece of code that is logically incorrect, the result will still be logically incorrect.
Europa MacDonald (Chief slayer of dragons), Author Commented:

I am looking at protein arrangement in the surface of nerve cell membranes. Basically, I don't know how much you know about biology, but imagine many cars (over 1000) stretched out along a road. I am interested in how they arrange themselves in patterns, and how close or far apart they are. The proteins are showing a pattern, and I am looking at the pattern.  They also group themselves in ways that we can see through the microscope, but can't quite model mathematically. So what I am doing is teasing out the patterns and trying to predict further patterns.
This is in the area of repairing spinal cords and brain injury. I am a student with extremely limited resources and even less time, so I am working quite hard at learning what this code is doing. I have been taking the different bits of code offered on here and working out how they work. That's why I prefer scalars, because then I can see what is going on. I tried that code which used arrays because the author said it gave a bit of flexibility, but I don't know how it is working, so I would like now to stay with scalars.

Does that help at all.
Europa MacDonald (Chief slayer of dragons), Author Commented:
I actually have quite a long way to go with this, but I was hoping to pick the code up a bit quicker so I could maybe progress a bit better without inconveniencing anyone else :-)

I have to look at the patterns, model them, look at how they combine, see if parts of them repeat regularly and then some more stuff about predicting patterns.

A friend of mine tried to help me with VB, but he couldn't quite get what I was wanting, so I decided to take the plunge and learn Perl and try to develop it myself.
Europa MacDonald (Chief slayer of dragons), Author Commented:
the numbers in the input files represent the distance of the protein from the start.

I need it in a format I can see clearer with, visually, as I am looking for patterns within that data too, matching and looking for sub patterns and predictions.

Does that make sense ?
Europa MacDonald (Chief slayer of dragons), Author Commented:
I am even considering having some sort of graphical output at the end, which would show visually the occurrence of proteins and their interrelationships, but I have to get this data sorted before I can even consider that
When I said "high-level", I meant what the code needs to do at a high level.  For example:

read data from a csv file.  For each row:
1) calculate the average of the 2nd through last item
2) sort the 2nd through the last item
3) calculate the difference from the 1st to each other item
    or from each item to the previous item
10) display results like so:

As before, I still recommend using arrays.  If you are set on scalars, we can make that work, but it'll be much less flexible, and more difficult to make work with different amounts of data.  Although there is a learning curve for arrays, they will save you time in the long run.  If there is a particular part about the arrays you don't understand, just post here.  

In this post, I tried to explain how arrays work:
Is that not clear?  Partly clear, but you still have questions?
Europa MacDonald (Chief slayer of dragons), Author Commented:
yes that is starting to make sense to me now actually :-)

so an array would just read in the data delimited with a comma, and store each individual item separately? Then I can access each element individually.

Why would this be better than using scalars? I can see now that if the data extends, the array will cope with that change automatically, but what I couldn't see clearly was how the data could be printed out, and how that could be altered by me easily. I could just ask on here, you guys have been great, this is a really good site.

Ok, well you are the expert, so if you think I'd be fine with arrays, I will go down that route.
Europa MacDonald (Chief slayer of dragons), Author Commented:
The high level of the code . . . I don't know how it will work to the end, but

1. It has to read a line of data
2. sort that data into ascending order
3. work out the difference (distance) between each number
4. tell me how many of these numbers fall within a specific group
5. print out the results in a format I can scan through visually

I have anticipated all possible associations of this data. I think I have every possible combination of protein positions (distance) possible. In real life they don't all exist so I have to revisit this huge list (1.5Gb data which I considered using sqlite for but used VIM instead).
so to revisit this list, I have to

1. Define a pattern
2. Search the list for it.
3. mark all items which match the pattern I want to ignore in later assessments

I have code which does 1 - 5 but it uses scalars.

Is this a bit clearer ?
>>Why would this be better than using scalar?
The main reason is it is easier to make it flexible as to the number of items on each row.  The other reason is it makes some of the calculations easier.
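
As a rough illustration of that flexibility, here is a small sketch where the same few lines handle rows of different lengths, and where the printing is done in one place that is easy to change. The format_row name and the sample rows are made up for this example, not taken from your data.

```perl
use strict;
use warnings;
use List::Util qw(sum);

# Hypothetical helper: sort one comma-separated row and format it with its average.
# It never needs to know how many items the row holds.
sub format_row {
    my ($line) = @_;
    my @arr = sort { $a <=> $b } split /,/, $line;
    return '' unless @arr;                 # nothing to do for an empty row
    my $average = sum(@arr) / @arr;        # @arr in numeric context is its length
    return join("    ", @arr) . "    avg $average";
}

print format_row("3,1,2"), "\n";            # a row with three items
print format_row("40,10,30,20,50"), "\n";   # a row with five items, same code
```

To change how a row is printed, only the join(...) line needs editing, for example by swapping the four spaces for a tab.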

In steps 1-5, none of them specify calculate average... but because of your desired output, and existing code, I'm assuming that is a requirement.

In your output, I guessed the second line was the differences, but in your example, that doesn't seem to be the case.  In the code below, the second line is the difference between neighboring pairs.

Also in the second line of output, I'm not sure where the avg amount comes from.

If you look at this code compared to your previous code, it should be easier to read.  The operations on an array are easier and simpler to write/read.  If anything isn't clear, let me know.

There is 1 command that might not be obvious: the step 3 using the map function.  What this function does is for the list specified, "(1..$#arr)" in this case (which is the list of numbers from 1 to the last index of @arr), it does the specified action, "$arr[$_] - $arr[$_-1]" in this case, and returns that value.  So this calculates the difference from the 2nd (index 1, because the first has index 0) through the last element to the previous element.

use strict;
use warnings;
use Data::Dumper;
use List::Util qw(sum);
##### Open files
open(my $IN, "<", "invim2.vim") or die "Could not open input: $!\n";
open(my $OUT, ">", "outvim2.vim") or die "Could not open output: $!\n";
##### Step 1: read a line, and split it on comma
while(<$IN>) {
	chomp;
	my @arr = split /,/;
	next unless @arr;
	##### Step 2: Sort data into ascending order
	@arr = sort {$a <=> $b} @arr;
	##### Step 3: Calculate difference between neighboring pairs
	my @diff = map {$arr[$_] - $arr[$_-1]} (1..$#arr);
	##### Step 4: Calculate amount in each group
	my @groups = (0)x10;
	$groups[$_/100]++ foreach (@arr);
	##### Step 6: Calculate average
	my $average = sum(@arr)/@arr;
	##### Step 5: Print results
	print $OUT join("    ", @arr) . "    avg $average     group @groups\n";
	print $OUT join("    ", @diff) . "\n";
	print $OUT "\n";
}
close($IN);
close($OUT);
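
If the map in step 3 is hard to follow, it may help to see it next to an ordinary loop that does the same thing. This is just a sketch using the first row of your desired output as sample data; the names @diff_map and @diff_loop are only for the comparison.

```perl
use strict;
use warnings;

my @arr = (502, 503, 511, 513, 515, 540, 545);   # already sorted

# map version, as used in step 3
my @diff_map = map { $arr[$_] - $arr[$_ - 1] } (1 .. $#arr);

# the same calculation written as an explicit loop
my @diff_loop;
for my $i (1 .. $#arr) {
    push @diff_loop, $arr[$i] - $arr[$i - 1];
}

print "map:  @diff_map\n";    # prints "map:  1 8 2 2 25 5"
print "loop: @diff_loop\n";   # prints "loop: 1 8 2 2 25 5"
```

Both give six differences for seven numbers, because the first number has no previous neighbor to subtract.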

Europa MacDonald (Chief slayer of dragons), Author Commented:
where you have written
does this mean you are putting the input file into the scalar $IN
then reading the contents of the scalar to an array ?

and the while loop is processing the contents of the file ?
The <...> is a shortcut for a few different things, depending on how it's used, so it can be complicated.
    If what is inside the brackets is a file handle, it's short for readline.
    If what is inside is a string, it's short for glob.
    If nothing is inside, it will read from the file named on the command line (@ARGV).

So in this case, it's short for readline($IN), which reads from the $IN filehandle (which is associated with the "invim2.vim" file in this case).
When no variable is specified (that is, you write  <$IN>  on its own in the while condition rather than something like  $line=<$IN>),  the line is automatically saved to $_.  This is just a shortcut way of writing:
    while(defined($_ = readline($IN))) {
The advantage of using $_ is that some functions will automatically use the $_ variable if a variable isn't specified, such as the chomp and split functions.  So this:
    chomp;
is short for:
    chomp($_);
and this:
    split /,/;
is short for:
    split(/,/, $_);
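
To see the shortcut and the long form side by side, here is a small sketch that reads the same two lines both ways. It assumes an in-memory filehandle (opening a reference to the string $data) purely so the example runs without a real input file; the names $data, $IN2, @implicit and @explicit are made up for the demo.

```perl
use strict;
use warnings;

# Sample data held in a string; opening \$data gives an in-memory filehandle.
my $data = "502,503,511\n509,512,515\n";

# Shortcut form: the line lands in $_, and chomp/split use $_ by default.
open(my $IN, '<', \$data) or die "Could not open in-memory data: $!\n";
my @implicit;
while (<$IN>) {
    chomp;
    my @arr = split /,/;
    push @implicit, scalar @arr;     # remember how many items were on the line
}
close($IN);

# Long form: every variable is written out.
open(my $IN2, '<', \$data) or die "Could not open in-memory data: $!\n";
my @explicit;
while (defined(my $line = readline($IN2))) {
    chomp($line);
    my @arr = split(/,/, $line);
    push @explicit, scalar @arr;
}
close($IN2);

print "implicit: @implicit\n";
print "explicit: @explicit\n";
```

Both loops see the same lines and the same number of items per line; the first one is just less typing.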

It saves you some typing, and is frequently used in perl programs.  You can see the documentation for this here:
Europa MacDonald (Chief slayer of dragons), Author Commented:
Brilliant and helpful as always