
How to get rid of spaces in the output from this Perl script

bjuneja_2000 asked (Last Modified: 2010-03-05):
Folks,
I have a Perl script that parses the following sample file:
gene            1995..3119
                     /gene="dnaN"
                     /locus_tag="AAur_0002"
     CDS             1995..3119
                     /gene="dnaN"
                     /locus_tag="AAur_0002"
                     /EC_number="2.7.7.7"
                     /note="identified by match to protein family HMM PF00712;
                     match to protein family HMM PF02767; match to protein
                     family HMM PF02768; match to protein family HMM TIGR00663"
                     /codon_start=1
                     /transl_table=11
                     /product="DNA polymerase III, beta subunit"
                     /protein_id="tigr:AAur_0002"
                     /translation="MKFRVDRDVLAEAVTWTARSLSPRPPVPVLSGLLLKAEAGTVSL
                     SSFDYETSARLEIPADIAVEGTILVSGRLLADICRSLPSAPVEVETDGSKVTLTCRRS
                     SFHLATMPESEYPALPALPAISGTLPGDAFAQAVSQVIIAASKDDTLPILTGVRMEIE
                     DDLITLLATDRYRLAMREVPWKPVTPGISTSALVKSKTLNEVAKTLGGSGDINLALAD
                     DDSRLIGFESGGRTTTSLLVDGDYPKIRSLFPDSTPIHATVQTQELVEAVRRVSLVAE
                     RNTPVRLAFTQGLLNLDAGTGEDAQASEELEAQLSGEDITVAFNPHYLVEGLSVIETK
                     YVRFSFTTAPKPAMITAQAEADGEDQDDYRYLVMPVRLPN"
gene            5318..5872
                     /locus_tag="AAur_0005"
     CDS             5318..5872
                     /locus_tag="AAur_0005"
                     /note="identified by match to protein family HMM PF05258"
                     /codon_start=1
                     /transl_table=11
                     /product="putative protein of unknown function (DUF721)"
                     /protein_id="tigr:AAur_0005"
                     /translation="MAKDSRDGLQPGREPDEIDAAQAALNRMREAAAARGEVRQRAPR
                     PGSAPKRQGLRDTRGFAQFHGSGRDPLGLGKVVGRLVAERGWTSPVAVGSVMAEWETL
                     VGPDISSHCTPESFTDTTLHVRCDSTAWATQLRLLSTSLLEMFRNELGEGVVTSIHVL
                     GPSAPSWRKGGRSVNGRGPRDTYG"


Here is the script:
use strict;
use vars qw{ $table_line };

$table_line = '';
while (<>)
{
        # /product= qualifier; the inner loop also advances to the next input line
        if (/^\s+\/product=(.*)/)
        {
                my $product = $1;
                while (<>)
                {
                        last unless /^\s+\/product=(.*)/;
                        $product = $product.$1;
                }
                $table_line = $table_line.$product."\t";
        }
        # /protein_id= qualifier
        if (/^\s+\/protein_id=(.*)/)
        {
                $table_line = $table_line.$1."\t";
        }
        # /translation= qualifier, joined with its continuation lines
        if (/^\s+\/translation=(.*)/)
        {
                my $translation = $1;
                while (<>)
                {
                        last unless /^\s+\        (.*)/;
                        $translation = $translation.$1;
                }
                $table_line = $table_line.$translation."\t";
        }
        print "$table_line\n";
        $table_line = "";
}


Here is the output from the script. The script parses the input file and writes the required entries in tab-separated format to an output file:












"DNA polymerase III, beta subunit"      "tigr:AAur_0002"
"MKFRVDRDVLAEAVTWTARSLSPRPPVPVLSGLLLKAEAGTVSLSSFDYETSARLEIPADIAVEGTILVSGRLLADICRSLPSAPVEVETDGSKVTLTCRRSSFHLATMPESEYPALPALPAISGTLPGDAFAQAVSQVIIAASKDDTLPILTGVRMEIEDDLITLLATDRYRLAMREVPWKPVTPGISTSALVKSKTLNEVAKTLGGSGDINLALADDDSRLIGFESGGRTTTSLLVDGDYPKIRSLFPDSTPIHATVQTQELVEAVRRVSLVAERNTPVRLAFTQGLLNLDAGTGEDAQASEELEAQLSGEDITVAFNPHYLVEGLSVIETKYVRFSFTTAPKPAMITAQAEADGEDQDDYRYLVMPVRLPN"






"putative protein of unknown function (DUF721)" "tigr:AAur_0005"
"MAKDSRDGLQPGREPDEIDAAQAALNRMREAAAARGEVRQRAPRPGSAPKRQGLRDTRGFAQFHGSGRDPLGLGKVVGRLVAERGWTSPVAVGSVMAEWETLVGPDISSHCTPESFTDTTLHVRCDSTAWATQLRLLSTSLLEMFRNELGEGVVTSIHVLGPSAPSWRKGGRSVNGRGPRDTYG"


I want the script not to print the blank lines; it should print just the required output in tabbed format. Any clues on how I can get rid of these blank lines?

Commented:
Before printing, try:

$table_line =~ s/^\s*$//g;

Author

Commented:
Hmm,
Actually I tried that before and it didn't work. Not sure why.
Any other clue?
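
A likely reason the substitution has no visible effect: it only changes the contents of $table_line, while the print statement in the script still appends "\n" every time it runs. The sketch below illustrates this; emitted_after_substitution is a hypothetical helper name introduced for the example, not part of the original script.

```perl
use strict;
use warnings;

# Hypothetical helper: returns what the script's print statement would emit
# for a given $table_line once the suggested substitution has been applied.
sub emitted_after_substitution {
    my ($table_line) = @_;
    $table_line =~ s/^\s*$//g;    # empties a string that holds only whitespace
    return "$table_line\n";       # the newline is appended regardless
}

# An input line that matched none of the patterns leaves $table_line empty.
my $out = emitted_after_substitution('');
printf "emitted %d character(s)\n", length $out;   # prints: emitted 1 character(s)
```

So an empty $table_line still yields a one-character line (the newline), which is the blank line seen in the output.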
Commented:
       print "$table_line\n" if $table_line;
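
To see what the guarded print changes, here is a minimal sketch that feeds it the kind of values $table_line takes at the bottom of the loop. The helper name guarded_output and the list of values are assumptions made for illustration (the values are modelled on the sample output in the question); in the real script the guard simply replaces the existing print line.

```perl
use strict;
use warnings;

# Hypothetical helper: given the successive values of $table_line at the
# bottom of the loop, return only what the guarded print would emit.
sub guarded_output {
    my @table_lines = @_;
    my $out = '';
    for my $table_line (@table_lines) {
        # Same test as the suggested fix: skip the line when nothing was collected.
        $out .= "$table_line\n" if $table_line;
    }
    return $out;
}

# Empty strings stand for input lines that matched none of the patterns.
my @seen = (
    '',
    '',
    qq{"DNA polymerase III, beta subunit"\t"tigr:AAur_0002"\t},
    '',
    qq{"putative protein of unknown function (DUF721)"\t"tigr:AAur_0005"\t},
);
print guarded_output(@seen);
```

Note that `if $table_line` also treats the string "0" as false. Every non-empty line built by the script ends with a tab, so that case cannot occur here; `if length $table_line` would be the stricter test if it could.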
