01.03.2006 at 08:04AM PST, ID: 21682206

Extracting values from a complex and variable data structure to produce a report

Asked by mjcoyne in Perl Programming Language

Hello all --

I have an array of hashes that stores data about approximately 4,000 genes.  A typical entry looks like this (some values truncated with '...' for posting purposes):

BF2784 = {
  'name' => 'BF2784',
  'descr' => 'putative EPS related membrane protein',
  'start' => '3242515',
  'end' => '3244920',
  'ori' => 'pos',
  'bp' => '2409',
  'GC' => '56.3',
  'GeneID' => '3287061',
  'aa' => '802',
  'kDa' => '88.3',
  'GI' => '60682255',
  'groups' => {
    'CDD' => [
      'COG0455',
      'COG3206',
      'cd00550'
    ],
    'COG' => {
      'D' => 'Cell cycle control, mitosis and meiosis genes',
      'M' => 'Cell wall/membrane biogenesis genes'
    }
  },
  'links' => {
    'pep' => 'http://www.ncbi.nlm.nih.gov/entrez/...',
    'seq' => 'http://www.ncbi.nlm.nih.gov/entrez/...',
    'summary' => 'http://www.ncbi.nlm.nih.gov/entrez/...',
    'upstr' => 'http://www.ncbi.nlm.nih.gov/entrez/...'
  },
  'aka' => {
    'lec' => [
      'orf3_tsr19'
    ],
    'gb' => [
      'sigE'
    ],
  },
  'up_gap' => {
    'end' => '3242514',
    'size' => '13',
    'start' => '3242502'
  },
  'microarray' => {
    'SeqID' => 'BFRAG050600002693',
    'descr' => '2693|Bacteroides fragilis|0|506|CDS...',
    '102805' => {
      '0265_ML' => {
        'avg' => '959.5032',
        'block1' => '820.4571',
        'block2' => '1142.2980',
        'block3' => '915.7545'
      },
      '1394_EL' => {
        'avg' => '422.2764',
        'block1' => '448.5869',
        'block2' => '454.1586',
        'block3' => '364.0837'
      },
      '9343_EL' => {
        'avg' => '797.4852',
        'block1' => '753.0446',
        'block2' => '885.5215',
        'block3' => '753.8896'
      },
      '9343_ML' => {
        'avg' => '858.0540',
        'block1' => '933.8485',
        'block2' => '822.4420',
        'block3' => '817.8716'
      },
      'CrrD_ML' => {
        'avg' => '952.3332',
        'block1' => '1000.5565',
        'block2' => '949.4948',
        'block3' => '906.9484'
      }
    },
    '121905' => {
      '9343_ML' => {
        'avg' => '976.8530',
        'block1' => '1053.0826',
        'block2' => '930.7049',
        'block3' => '946.7716'
      },
      'ddUngD' => {
        'avg' => '852.5260',
        'block1' => '851.9713',
        'block2' => '823.1842',
        'block3' => '882.4226'
      },
      'mpi_mut44' => {
        'avg' => '1295.1745',
        'block1' => '1367.4020',
        'block2' => '1229.0144',
        'block3' => '1289.1070'
      },
      'mpi_mut8' => {
        'avg' => '1126.2450',
        'block1' => '1115.6544',
        'block2' => '1093.3422',
        'block3' => '1169.7385'
      },
      'tsr19_M1' => {
        'avg' => '1895.5840',
        'block1' => '1916.8111',
        'block2' => '1798.7082',
        'block3' => '1971.2327'
      },
      'tsr19_M3' => {
        'avg' => '1249.8808',
        'block1' => '1215.6576',
        'block2' => '1281.3577',
        'block3' => '1252.6272'
      }
    },
  },
};

These data are stored on disk in Storable format, and retrieved by:

  use Storable qw(store retrieve);
  my $data = retrieve("master_9343.db");

The entry above comes from the hash referenced by $data->[0], the first element of the retrieved array.  It was dumped with:

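use Data::Dumper;

# Lexical filehandle and three-argument open; report why the open failed.
open (my $dd, '>', 'BF2784_dump.txt') or die "Cannot open BF2784_dump.txt: $!";
print {$dd} Dumper($data->[0]{BF2784});
close ($dd);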

This arrangement works great if I'm looking up a particular attribute of the gene, or a small set of attributes:

print "$gene begins at $data->[0]{$gene}{start} and ends at $data->[0]{$gene}{end}\n";

but what I want to do now is provide a program to display *all* stored data about the gene.  Some genes do not have all the entries shown above (for example, there may be only one name for a particular gene, thus neither $data->[0]{$gene}{aka}{lec} nor $data->[0]{$gene}{aka}{gb} will exist for that value of $gene, but either or both might exist for another value of $gene).  This is true for several of the structures ($data->[0]{$gene}{groups}, $data->[0]{$gene}{up_gap}, etc.).
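Optional entries like these can be looked up with a small helper that returns undef when any level is missing. The following is only a minimal sketch: the helper name dig, the %genes sample, and the gene ID BF0001 are hypothetical and stand in for the real $data->[0]. One reason to prefer a helper is that testing a deep key directly with exists makes Perl create the intermediate hashes (autovivification), so the stored data can pick up empty 'aka' or 'groups' entries as a side effect.

```perl
use strict;
use warnings;

# Hypothetical helper (not from the original post): follow a list of hash
# keys into a nested structure and return the value found there, or undef
# if any level is missing. It never creates intermediate hashes.
sub dig {
    my ($node, @keys) = @_;
    for my $key (@keys) {
        return undef unless ref $node eq 'HASH' && exists $node->{$key};
        $node = $node->{$key};
    }
    return $node;
}

# Made-up sample standing in for $data->[0]; BF0001 has no 'aka' entry.
my %genes = (
    BF2784 => { name => 'BF2784', aka => { lec => ['orf3_tsr19'], gb => ['sigE'] } },
    BF0001 => { name => 'BF0001' },
);

for my $gene (sort keys %genes) {
    my $lec = dig(\%genes, $gene, 'aka', 'lec');
    my $aka = $lec ? join(', ', @$lec) : '(none)';
    print "$gene aka: $aka\n";
}
```

With the real data, the call would look like dig($data->[0], $gene, 'aka', 'lec'), and a missing level simply yields undef.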

So, my goal is to write a command line program that, when provided the name of a gene, prints out a report containing all the data accumulated for that gene.  For now, I'm going to output it to a text file using Perl's Report formats, but I might eventually try for a Perl/Tk version.

What is the best way to iterate over such a variable structure to dump all the data, ignoring values that don't exist?  I could of course set variables for each possible entry, testing first if it exists:

my ($aka, $cdd);   # declared outside the blocks so the values survive them

if (exists $data->[0]{$gene}{aka}{lec}) {
   $aka = join (", ", @{$data->[0]{$gene}{aka}{lec}});
}

if (exists $data->[0]{$gene}{groups}{CDD}) {
   $cdd = join (", ", @{$data->[0]{$gene}{groups}{CDD}});
}

etc., but this brute force approach seems tedious and wasteful.  Does anyone have any suggestions for an efficient way to do this, such that I wind up with a collection of variables suitable to pass to a report format subfunction?

Thanks --

Mike
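
One way to approach the question above, offered as a sketch and not as the accepted answer (which is not shown on this page), is to walk the structure recursively and collect label/value pairs for whatever is actually present. The helper name flatten and the dotted label scheme are assumptions made for illustration, and the sketch assumes arrays hold only plain scalars, as they do in the sample entry.

```perl
use strict;
use warnings;

# Hypothetical helper: walk nested hashes and arrays and return a flat list
# of [label, value] pairs. Nested hash keys are joined with '.' to form the
# label; arrays of plain scalars are joined with ', '.
sub flatten {
    my ($node, $prefix) = @_;
    my @rows;
    if (ref $node eq 'HASH') {
        for my $key (sort keys %$node) {
            my $label = defined $prefix ? "$prefix.$key" : $key;
            push @rows, flatten($node->{$key}, $label);
        }
    }
    elsif (ref $node eq 'ARRAY') {
        push @rows, [ $prefix, join(', ', @$node) ];
    }
    elsif (defined $node) {
        push @rows, [ $prefix, $node ];
    }
    return @rows;
}

# Trimmed-down copy of the BF2784 entry from the question.
my $entry = {
    name   => 'BF2784',
    start  => '3242515',
    end    => '3244920',
    groups => { CDD => [ 'COG0455', 'COG3206', 'cd00550' ] },
    aka    => { gb => ['sigE'], lec => ['orf3_tsr19'] },
    up_gap => { start => '3242502', end => '3242514', size => '13' },
};

my @rows = flatten($entry);
printf "%-14s %s\n", @$_ for @rows;
```

Because only existing keys are visited, a gene without 'aka' or 'groups' produces no rows for them and no exists tests are needed. The resulting pairs could be loaded into a hash keyed by label, or passed one at a time to a report format.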
01.03.2006 at 10:19AM PST, ID: 15600997

About this solution

Zone: Perl Programming Language
Tags: perl, extracting
Solution Provided By: Tim_Utschig
Participating Experts: 2
Solution Grade: A
 
 