Advertisement

07.27.2006 at 03:23PM PDT, ID: 21934743
[x]
Attachment Details
[x]
The Solution Rating System

With so many solutions, how can you tell which solutions are most likely to help you and which ones are not? To provide you with a tool to use, we rate our solutions based on various elements that most accurately determine if a solution is a quality solution. To explain what factors affect the solution rating, here are the elements we take into consideration when formulating our solution rating.

  • The Grade of the Solution
  • The Zone Rank of the Expert Providing the Solution
  • The Number of Author and Expert Comments
  • The Number of Experts Contributing
  • The Feedback of the Community

Your Input Matters
Because of the way the system is set up, the most important variable in this equation is you. As a member of Experts Exchange, you are able to cast your vote on the quality of the solutions in regard to how complete, accurate, helpful and easy to understand each solution is. When you provide your feedback, each rating is adjusted accordingly. So, if you see a solution that has a poor rating that you think is a good solution, let us know by rating it. As you do, the rating will be adjusted and will become more accurate for other members of our site.

If you have any suggestions that you would like to make for our rating system, please ask a question in the Suggestions Zone of Community Support.

Thank you!

8.6

Seeking to detect smaller things wholly contained within larger things: if loops?

Asked by mjcoyne in Perl Programming Language

Okay, this one is driving me nuts -- I just can't get it exactly right.

I have a multi-level hash of things, each with an associated numerical start and stop (among other attributes).  I have created an array from this hash such that the array contains the (second level) keys of the hash in ascending numerical order:

foreach my $orf (sort {$contigs{$contig}{$a}{start} <=> $contigs{$contig}{$b}{start}} (keys %{$contigs{$contig}})) {
  push (@ordered, $orf);
}

Some of these things (called "orfs", for open reading frames, they are contained within "contigs", which are the primary keys of the hash) are small, and contained entirely within larger orfs (of the same contig).  I'd like to ignore these small open reading frames.  So, I thought I'd cycle through the array, and compare the orfs to each other:

my ($i, $j, %rejected, %accepted);
   
for ($i = 1; $i <= $#ordered; $i++) {
  next if ((exists $rejected{$ordered[$i]}) || (exists $accepted{$ordered[$i]}));
  for ($j = $i+1; $j <= $#ordered; $j++) {
    last if ($contigs{$contig}{$ordered[$j]}{start} > $contigs{$contig}{$ordered[$i]}{end});
    if (($contigs{$contig}{$ordered[$j]}{start} > $contigs{$contig}{$ordered[$i]}{start}) && ($contigs{$contig}{$ordered[$j]}{end} < $contigs{$contig}{$ordered[$i]}{end})) {
      $rejected{$ordered[$j]} = '1';
    } else {
      push (@non_overlap, $j) unless exists $accepted{$ordered[$j]};
      $accepted{$ordered[$j]} = '1';
    }
  }
}

foreach (@non_overlap) {
  # do stuff
}

But it doesn't work.  My output contains, for example:

>Contig2635_30 [1870 - 3501] (neg) 1632 bp (46.7% GC) 543 aa (62.9 kDa)
...data...
>Contig2635_1 [1891 - 2238] (pos) 348 bp (48.3% GC) 115 aa (13.6 kDa)
...data...
>Contig2635_2 [2429 - 2734] (pos) 306 bp (50.3% GC) 101 aa (11.7 kDa)
...data...
>Contig2635_3 [2596 - 3072] (pos) 477 bp (48.0% GC) 158 aa (17.6 kDa)
...data...
>Contig2635_4 [3431 - 3748] (pos) 318 bp (39.0% GC) 105 aa (12.5 kDa)
...data...

One can see that orf 1 (_1) of contig 2635 (occupying 1891 - 2238, start and end, respectively) is contained completely within orf 30 of the same contig (1870 - 3501), as are orfs 2 & 3 -- they should all (1, 2 & 3) be removed, because they're all completely contained within the larger orf 30.  Note that orf 4 should be retained at this stage because, while it does overlap orf 30 (and is probably not real), it's not completely contained within it.  These data also serve to demonstrate that there may be more than one small orf within a single larger one (there also, of course, may be none, or one).

The structure of the hash containing all these data is, for example:

$VAR1 = 'Contig2635';
$VAR2 = {
          '1' => {
                   'aa' => '115',
                   'bp' => 348,
                   'end' => 2238,
                   'hdr' => '>Contig2635_1 [1891 - 2238] 348 bp, 115 aa',
                   'num' => '1',
                   'ori' => 'pos',
                   'start' => 1891
                 },
          '30' => {
                    'aa' => '543',
                    'bp' => 1632,
                    'end' => 3501,
                    'hdr' => '>Contig2635_30 [1870 - 3501] 1632 bp, 543 aa',
                    'num' => '30',
                    'ori' => 'neg',
                    'start' => 1870
                  },
# etc.

I have tried many variations on the loop code shown above, but none has succeeded.  Any help would be appreciated.

Note that the @ordered array contains at least hundreds (more likely thousands) of entries, thus I've attempted to limit the work done by the nested loops by making $j dependent on $i, so that you're not needlessly checking $j = entry 12 against $i = entry 500, because they're so far apart from each other there's no chance one is contained within the other.  In like manner, I've tried to limit comparing $j = 500 against $i = 12, with the line "last if ($contigs{$contig}{$ordered[$j]}{start} > $contigs{$contig}{$ordered[$i]}{end});", because if the start of $j is already greater than the end of $i, there's also no chance one contains the other.

I've also included hashes (%rejected, %accepted) to allow past history to be checked so an orf (hash key) doesn't get put in the @non_overlap array more than once, and to allow bypassing $i if the orf was already rejected as a prior $j (at some point in my trying to get this loop to work I thought this was needed, it may not be...).
 
But, as I said, it's not working, so anything is subject to change.  Can someone see the logic error?  Is there a better way of doing this?Start Free Trial
[+][-]07.27.2006 at 08:38PM PDT, ID: 17198240

Often, when Experts are collaborating with members who have asked questions, they will request additional information about the problem. Askers respond with an author comment like this one.

Start your 7-day free trial to view this Author Comment or ask the Experts your question.

 
[+][-]07.28.2006 at 08:43AM PDT, ID: 17202087

View this solution now by starting your 7-day free trial. Setting up your free trial is quick, easy, and secure. We will return you to this solution, unlocked, when you're done.

 

About this solution

Zone: Perl Programming Language
Sign Up Now!
Solution Provided By: Adam314
Participating Experts: 1
Solution Grade: A
 
 
[+][-]07.28.2006 at 10:32AM PDT, ID: 17202931

Often, when Experts are collaborating with members who have asked questions, they will request additional information about the problem. Askers respond with an author comment like this one.

Start your 7-day free trial to view this Author Comment or ask the Experts your question.

 
[+][-]07.28.2006 at 12:57PM PDT, ID: 17204061

At Experts Exchange, members can ask their questions to thousands of technology professionals, also known as Experts. Experts compete and collaborate to answer those questions by leaving comments like this one.

Start your 7-day free trial to view this Expert Comment or ask the Experts your question.

 
[+][-]07.28.2006 at 01:01PM PDT, ID: 17204096

At Experts Exchange, members can ask their questions to thousands of technology professionals, also known as Experts. Experts compete and collaborate to answer those questions by leaving comments like this one.

Start your 7-day free trial to view this Expert Comment or ask the Experts your question.

 
[+][-]07.28.2006 at 01:54PM PDT, ID: 17204477

Often, when Experts are collaborating with members who have asked questions, they will request additional information about the problem. Askers respond with an author comment like this one.

Start your 7-day free trial to view this Author Comment or ask the Experts your question.

 
[+][-]07.30.2006 at 02:49PM PDT, ID: 17212094

Often, when Experts are collaborating with members who have asked questions, they will request additional information about the problem. Askers respond with an author comment like this one.

Start your 7-day free trial to view this Author Comment or ask the Experts your question.

 
[+][-]07.31.2006 at 08:43AM PDT, ID: 17216471

At Experts Exchange, members can ask their questions to thousands of technology professionals, also known as Experts. Experts compete and collaborate to answer those questions by leaving comments like this one.

Start your 7-day free trial to view this Expert Comment or ask the Experts your question.

 
[+][-]07.31.2006 at 06:52PM PDT, ID: 17220723

Often, when Experts are collaborating with members who have asked questions, they will request additional information about the problem. Askers respond with an author comment like this one.

Start your 7-day free trial to view this Author Comment or ask the Experts your question.

 
 
Loading Advertisement...
20080716-EE-VQP-32