Solved

Could you show me how to change this code so the output is different? I have attached the input and output configuration.

Posted on 2009-07-09
Last Modified: 2012-05-07
this is the arrangement of the input data:

511,513,515,502,503,540,545
512,515,536,537,542,509,51

and if I could get it to this arrangement it would be great

502      503      511      513      515      540      545      avg 520            group 2300200000
2      1      8      2      2      25      5      avg 23

509      512      515      517      536      537      542      avg 520            group 1302100000
2      1      8      2      2      25      5      avg 23

use List::Util qw(sum);
use POSIX;

open(IN,"invim2.vim");
open(OUT,">outvim2.vim");

while(<IN>){
        @arr = split /,/;
        $average = sum(@arr)/@arr;
        print OUT "The average is ".$average."\n";
        @sorted_contents = sort{$a <=> $b}@arr;
        print OUT "Sorted contents:\n";
        foreach $i (@sorted_contents)
        {
                print OUT $i."\n";
        }
        for($i = 0; $i<@sorted_contents;$i++){
                if($i == $#sorted_contents)
                {1;}
                else{
                        $difference = $sorted_contents[$i+1] - $sorted_contents[$i];
                        push(@diff_array,$difference);
                        #print $difference."\n";
                }
        }

        $diff_average = sum(@diff_array)/@diff_array;
        print OUT "The average of difference values is:".$diff_average."\n";
}

for(@num[1..$#diff_array]) {
    print OUT $diff_array[floor($_/10)]++."\n";
}

Question by:MichaelGlancy
 
LVL 39

Expert Comment

by:Adam314
ID: 24818338
From your other question... your original code did not include the first item in some calculations, but the code you accepted did.  Was this intentional?

eg:
Original:  my $avg = ($n2 + $n3 + $n4 + $n5 + $n6 + $n7 + $n8)/ 7;
    this takes the average of the 2nd through 8th numbers
New: $average = sum(@arr)/@arr;
    this takes the average of all numbers

Original:  ($n2,$n3,$n4,$n5,$n6,$n7,$n8)=sort{$a<=>$b}$n2,$n3,$n4,$n5,$n6,$n7,$n8;
    this sorts the 2nd through 8th numbers
New: @sorted_contents = sort{$a <=> $b}@arr;
    this sorts all the numbers
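
The difference between the two versions can be seen on a single row. This is only a sketch: the leading `1` is a made-up ID field added for illustration, and the remaining values come from the sample input in this question. The array slice `@arr[1 .. $#arr]` selects the 2nd through last items.

```perl
use strict;
use warnings;
use List::Util qw(sum);

# Hypothetical row: a leading ID (1) followed by seven values
my @arr = (1, 511, 513, 515, 502, 503, 540, 545);

# Average and sort over ALL items (what the accepted code did)
my $avg_all    = sum(@arr) / @arr;
my @sorted_all = sort { $a <=> $b } @arr;

# Average and sort over the 2nd through last items only
my @rest        = @arr[1 .. $#arr];
my $avg_rest    = sum(@rest) / @rest;
my @sorted_rest = sort { $a <=> $b } @rest;

print "all:  avg $avg_all, sorted @sorted_all\n";
print "rest: avg $avg_rest, sorted @sorted_rest\n";
```

If every field on the row is real data, as it appears to be in this question, the "all items" form is the one to use.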
 

Author Comment

by:MichaelGlancy
ID: 24818375
I'm not sure, I have been taking code and messing with it. Does that answer your question? I'm a complete beginner at this programming so it's complete trial and error for me, with your help of course :-)

I have several different sets of data, and different ranges, but I am trying to learn this code on one range at the moment and hopefully if I tweak it I can carry on without bothering you good men too much.

I'm really grateful for your help

 

Author Comment

by:MichaelGlancy
ID: 24818398
New: $average = sum(@arr)/@arr;
    this takes the average of all numbers

that shouldn't be

New: @sorted_contents = sort{$a <=> $b}@arr;
    this sorts all the numbers

no, these are inaccurate

I've been having trouble with the code :-)

I would like to stick to using only scalars so I can read it, but that last bit of code offered a bit of flexibility
 

Author Comment

by:MichaelGlancy
ID: 24818412
Back to the drawing board, but I'm really getting lost in this. Sorry for that first dumb answer :-)
 
LVL 39

Expert Comment

by:Adam314
ID: 24818523
In your other question, I suggested in post 24810785 that you use arrays instead of a bunch of scalars.  Although there is a learning curve with this, it will make things easier in the end.
In that post, I tried to give some detailed, but beginner-level, notes about how your code would work with arrays instead.
If, in that question, the post you accepted is not correct, you can ask to have it un-accepted, so you can then accept a more appropriate answer.

For your data, I think we will be able to help you better if you explain what it is you are trying to do on a high-level.  We can write some code, and explain it.  If we only fix up a piece of code that is logically incorrect, the result will still be logically incorrect.
 

Author Comment

by:MichaelGlancy
ID: 24818578
ok

I am looking at protein arrangement in the surface of nerve cell membranes. Basically, I don't know how much you know about biology, but if you imagine many cars (over 1000 ) stretched out along a road. I am interested in how they arrange themselves in patterns, and how close or far apart they are. The proteins are showing a pattern, and I am looking at the pattern.  They also group themselves in ways that we can see through the microscope, but can't quite model mathematically. So what I am doing is teasing out the patterns and trying to predict further patterns.
This is in the area of repairing spinal cords and brain injury. I am a student with extremely limited resources and even less time, so I am working quite hard at learning what this code is doing. I have been taking the different bits of code offered on here and working out how they work. That's why I prefer scalars, because then I can see what is going on. I tried that code which used arrays because the author said it gave a bit of flexibility, but I don't know how it is working, so I would like now to stay with scalars.

Does that help at all?
 

Author Comment

by:MichaelGlancy
ID: 24818602
I actually have quite a long way to go with this, but I was hoping to pick the code up a bit quicker so I could maybe progress a bit better without inconveniencing anyone else :-)

I have to look at the patterns, model them, look at how they combine, see if parts of them repeat regularly and then some more stuff about predicting patterns.

A friend of mine tried to help me with VB, but he couldn't quite get what I was wanting so I decided to take the plunge and learn Perl and try and develop it myself.
 

Author Comment

by:MichaelGlancy
ID: 24818614
the numbers in the input files represent the distance of the protein from the start.

I need it in a format I can see clearer with, visually, as I am looking for patterns within that data too, matching and looking for sub patterns and predictions.

Does that make sense ?

 

Author Comment

by:MichaelGlancy
ID: 24818630
I am even considering having some sort of graphical output at the end, which would show visually the occurrence of proteins and their interrelationships, but I have to get this data sorted before I can even consider that
 
LVL 39

Expert Comment

by:Adam314
ID: 24818660
When I said "high-level", I meant what the code needs to do at a high level.  For example:

read data from a csv file.  For each row:
1) calculate the average of the 2nd through last item
2) sort the 2nd through the last item
3) calculate the difference from the 1st to each other item
    or from each item to the previous item
....
10) display results like so:


As before, I still recommend using arrays.  If you are set on scalars, we can make that work, but it'll be much less flexible, and more difficult to make work with different amounts of data.  Although there is a learning curve for arrays, they will save you time in the long run.  If there is a particular part about the arrays you don't understand, just post here.  

In this post, I tried to explain how arrays work:
http://www.experts-exchange.com/Programming/Languages/Scripting/Perl/Q_24554751.html#24810785
Is that not clear?  Partly clear, but you still have questions?
 

Author Comment

by:MichaelGlancy
ID: 24818748
yes that is starting to make sense to me now actually :-)

So an array would just read in the data delimited with a comma, and store each individual item separately? Then I can access each element individually.

Why would this be better than using scalars? I can see now that if the data extends, the array will cope with that change automatically, but what I couldn't see clearly was how the data could be printed out, and how that could be altered by me easily. I could just ask on here, you guys have been great, this is a really good site.

OK, well you are the expert, so if you think I'd be fine with arrays, I will go down that route.
 

Author Comment

by:MichaelGlancy
ID: 24818773
The high level of the code . . . I don't know how it will work to the end, but

1. It has to read a line of data
2. sort that data into ascending order
3. work out the difference (distance) between each number
4. tell me how many of these numbers fall within a specific group
5. print out the results in a format I can scan through visually

THEN
I have anticipated all possible associations of this data. I think I have every possible combination of protein positions (distance) possible. In real life they don't all exist so I have to revisit this huge list (1.5Gb data which I considered using sqlite for but used VIM instead).
so to revisit this list, I have to

1. Define a pattern#
2. Search the list for it.
3. mark all items which match the pattern I want to ignore in later assessments

I have code which does 1 - 5 but it uses scalars.

Is this a bit clearer ?
 
LVL 39

Accepted Solution

by:
Adam314 earned 500 total points
ID: 24823902
>>Why would this be better than using scalar?
The main reason is it is easier to make it flexible as to the number of items on each row.  The other reason is it makes some of the calculations easier.

In steps 1-5, none of them says to calculate the average... but because of your desired output, and existing code, I'm assuming that is a requirement.

In your output, I guessed the second line was the differences, but in your example, that doesn't seem to be the case.  In this code, the second line is the difference between neighboring pairs.

Also in the second line of output, I'm not sure where the avg amount comes from.

If you look at this code compared to your previous code, it should be easier to read.  The operations on an array are easier and simpler to write/read.  If anything isn't clear, let me know.

There is 1 command that might not be obvious: the step 3 using the map function.  What this function does is for the list specified, "(1..$#arr)" in this case (which is the list of numbers from 1 to the last index of @arr), it does the specified action, "$arr[$_] - $arr[$_-1]" in this case, and returns that value.  So this calculates the difference from the 2nd (index 1, because the first has index 0) through the last element to the previous element.
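
The map step can be tried on its own before reading the full script. A minimal sketch, using the first row of the sample data already sorted, with an explicit loop (`@diff_loop` is a name introduced here for illustration) that should produce the same list:

```perl
use strict;
use warnings;

# One sorted row from the sample data in the question
my @arr = (502, 503, 511, 513, 515, 540, 545);

# For each index from 1 to the last, subtract the previous element
my @diff = map { $arr[$_] - $arr[$_ - 1] } (1 .. $#arr);

# The same calculation written as an explicit loop
my @diff_loop;
for my $i (1 .. $#arr) {
    push @diff_loop, $arr[$i] - $arr[$i - 1];
}

print "@diff\n";        # prints: 1 8 2 2 25 5
print "@diff_loop\n";   # prints: 1 8 2 2 25 5
```

Seven values give six differences, because the first element has no previous element to subtract.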

#!/usr/bin/perl
use strict;
use warnings;
use Data::Dumper;
use List::Util qw(sum);

##### Open files
open(my $IN, '<', "invim2.vim") or die "Could not open input: $!\n";
open(my $OUT, '>', "outvim2.vim") or die "Could not open output: $!\n";

##### Step 1: read a line, and split it on comma
while(<$IN>){
    chomp;
    my @arr = split /,/;
    next unless @arr;

    ##### Step 2: Sort data into ascending order
    @arr = sort {$a <=> $b} @arr;

    ##### Step 3: Calculate difference between neighboring pairs
    my @diff = map {$arr[$_] - $arr[$_-1]} (1..$#arr);

    ##### Step 4: Calculate amount in each group (groups of 100: 0-99, 100-199, ...)
    my @groups = (0)x10;
    $groups[$_/100]++ foreach (@arr);

    ##### Step 6: Calculate average (not in the list of steps, but needed for the output)
    my $average = sum(@arr)/@arr;

    ##### Step 5: Print results
    print $OUT join("    ", @arr) . "    avg $average     group @groups\n";
    print $OUT join("    ", @diff) . "\n";
    print $OUT "\n";
}

close($IN);
close($OUT);
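
The `group` strings in the desired output (`2300200000` and `1302100000`) look like counts of values in bands of ten starting at 500, rather than the bands of one hundred used in step 4. That is an inference from the sample output, not something stated in the question. A sketch under that assumption, where `$base` and `$width` are hypothetical names introduced here:

```perl
use strict;
use warnings;

my $base  = 500;   # assumed start of the first band
my $width = 10;    # assumed band width

# First row of the sample data
my @arr = (502, 503, 511, 513, 515, 540, 545);

# Ten bands: 500-509, 510-519, ... 590-599
my @groups = (0) x 10;
for my $value (@arr) {
    my $band = int(($value - $base) / $width);
    next if $band < 0 || $band > $#groups;   # skip values outside 500-599
    $groups[$band]++;
}

# Join without separators to match the look of the desired output
my $group_string = join('', @groups);
print "group $group_string\n";   # prints: group 2300200000
```

Joining the counts without a separator only stays readable while every count is below 10; with larger counts a separator would be needed.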

 

Author Comment

by:MichaelGlancy
ID: 24827904
where you have written
while(<$IN>)
does this mean you are putting the input file into the scalar $IN
and then reading the contents of the scalar into an array?

and the while loop is processing the contents of the file ?
 
LVL 39

Expert Comment

by:Adam314
ID: 24828221
The <...> is a shortcut for a few different things, depending on how it's used, so it can be complicated.
    If what is inside the brackets is a file handle, it's short for readline.
    If what is inside is a string, it's short for glob.
    If nothing is inside, it will read from the files named on the command line (@ARGV), or from standard input if none are given.

So in this case, it's short for readline($IN), which reads from the $IN filehandle (which is associated with the "invim2.vim" file in this case).
When the result is not assigned to a variable (as it would be in  $line=<$IN>), the line is automatically saved to $_.  This is just a shortcut way of writing:
    while($_ = readline($IN)) {
The advantage of using $_ is that some functions will automatically use the $_ variable if a variable isn't specified, such as the chomp and split functions.  So this:
    chomp;
is short for:
    chomp($_);
And:
    split /,/;
is short for:
    split(/,/, $_);

It saves you some typing, and is frequently used in perl programs.  You can see the documentation for this here:
http://perldoc.perl.org/perlop.html#I%2fO-Operators
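
To see that the short and long forms behave the same, both can be run over the same data. A minimal sketch: an in-memory filehandle (opening a reference to the string `$data`) stands in for "invim2.vim" so it runs without any files, and `@short` and `@long` are names introduced here to hold the number of fields found on each line.

```perl
use strict;
use warnings;

# In-memory stand-in for the input file (assumed two short rows)
my $data = "511,513,515\n502,503,540\n";

# Short form: the line lands in $_, and chomp/split use $_ by default
open(my $IN, '<', \$data) or die "Could not open input: $!\n";
my @short;
while (<$IN>) {
    chomp;
    my @arr = split /,/;
    push @short, scalar @arr;
}
close($IN);

# Long form: the same loop with a named variable and explicit arguments
open($IN, '<', \$data) or die "Could not open input: $!\n";
my @long;
while (defined(my $line = readline($IN))) {
    chomp($line);
    my @arr = split(/,/, $line);
    push @long, scalar @arr;
}
close($IN);

print "short: @short\n";   # prints: short: 3 3
print "long:  @long\n";    # prints: long:  3 3
```

The long form is more typing, but it can be easier to follow while learning because every variable is named.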
 

Author Closing Comment

by:MichaelGlancy
ID: 31601833
Brilliant and helpful as always