Link to home
Start Free TrialLog in
Avatar of syarra
syarra

asked on

removing blank lines and making the data as comma separated in text file

Hi,
  I have a text file,that i got from running the sql query.There are 5 columns.I need to remove column

names and the "-------" under the column names.I did removed column names and "-------".That leaves 2

blank lines at starting of the file.I have 400 rows of data.After the data there is some text.I removed

that text also.That leaves 1 blank line at the bottom of the file.Can some body please tell me how to

remove those blank lines?


And i need to make that data as comma separated.


Example file: empData.txt

column1   column2  column3    coloum4             column5
---------   ---------    ---------    --------               --------
 abcd         N           Y            Bank of America    580 LA
 xyz           Y            N           Bank of Africa        765 DC


I started writing getEmpData.pl as follows.

#!usr/bin/perl

`isql -Uusername -Ppwd -Sserver -isqlfile -oempData.txt`;    #this line is to run sql query.

`perl -pi -e s/\.*empno.*//g empData.txt`;      # to remove column names
`perl -pi -e s/\-*//g empData.txt`;                  # to remove ---------
`perl -pi -e s/\.*return.*//g empData.txt`;      # to remove some text


I am new to perl.Just learning from two days.Can somebody help me out with this?

Thanks,
syarra.
Avatar of ozo
ozo
Flag of United States of America image

perl -ni -e 'print if /./'  empData.txt
{local @ARGV=('empData.txt'); local $^I='';
  while( <> ){
     next if /empno/; # to remove column names
     next if /-+/;       # to remove ---------
     next if /return/;  # to remove some text
     s/  +/,/g;          # to make that data as comma separated.
                             # I'm not sure how you want to distingush the spaces between
                             # "Y" and "Bank" from the spaces beween "Bank" and "of"
                             # you might want to read the first line of column names to determine the spacing    
     print;
  }
}
Avatar of syarra
syarra

ASKER

Hi ozo,
        I have to test your comments tommorow since i am out of office now.The spaces between "Y" and "Bank" are multiple spaces or multiple tabs. The space between "bank" and "of" is only one.
I want data like:

abcd,N,Y,Bank of America,580 LA
xyz,Y,N,Bank of Africa,765 DC

      Thanks,
      syarra.
s/[ \t][ \t]+/,/g;
Avatar of syarra

ASKER

Hi ozo,
         I ran the following script.      

{local @ARGV=('empData.txt'); local $^I='';
  while( <> ){
     next if /empno/;   # to remove column names
     next if /-+/;         # to remove ---------
     next if /return/;   # to remove some text
     s/[ \t][ \t]+/,/g;  # to make that data as comma separated.
                             # I'm not sure how you want to distingush the spaces between
                             # "Y" and "Bank" from the spaces beween "Bank" and "of"
                             # you might want to read the first line of column names to
                             # determine the spacing    
     print;
  }
}

I got data like:

abcd,N,Y,Bank of America 580 LA,
xyz,Y,N,Bank of Africa 765 DC,

how can I get comma between "Bank of America" and "580 LA"? (i think there is only one space between "Bank of America" and "580 LA") and
how can i remove the comma after "580 LA"?

     Thank you very much
          syarra.



Avatar of syarra

ASKER

Hi ozo,
        I removed the comma after "580 LA" with the following.
             s/,$//;
so, Now my data looks like:

abcd,N,Y,Bank of America 580 LA
xyz,Y,N,Bank of Africa 765 DC

I need to get comma between "Bank of America" and "580 LA"

Thanks,
syarra.
Avatar of syarra

ASKER

Hi,
  Increased points from 250 to 275.
                Thanks,
                syarra.
Avatar of syarra

ASKER

sorry increased to 300
Avatar of syarra

ASKER

Hi,
   I have 3 questions.
(1) How can I get comma between "Bank of America" and "580 LA"? (There is only one space between "Bank of America" and "580 LA")
(2)The data may contain comma i.e, like 580,LA .So i need to escape the commas,if any.
(3)I want to switch the columns i.e, like column5 to position 1 and column3 to position 5 etc.

Can anybody please give me solutions?Its Urgent.
I am increasing points from 300 to 500.
ASKER CERTIFIED SOLUTION
Avatar of ozo
ozo
Flag of United States of America image

Link to home
membership
This solution is only available to members.
To access this solution, you must be a member of Experts Exchange.
Start Free Trial
Avatar of syarra

ASKER

ozo,
    I got compilation error at line 36:
             my $p=join'',map{'a'.length},<>=~/(\S+\s+)/g; # to determine spacing from the column names

syntax error at /home/syarra/getEmpData.pl line 36, near "},"
Execution of /home/syarra/getEmpData.pl aborted due to compilation errors.

Can you look into it why i got this error? You are helping me a lot. I never going to do the coding you did.I am just learning perl from 3 days.Thanks for your help.
      I am giving you the points.
         syarra.
Sorry, that should have been
my $p=join'',map{'a'.length}<>=~/(\S+\s+)/g;
Avatar of syarra

ASKER

Thank you very much ozo. It worked.
         syarra.
Avatar of syarra

ASKER

Hi ozo,
        I got some error "Can't open $basePath/empData.txt: No such file or directory at /home/syarra/getEmpData.pl line 9."
line 9 is:             my $p=join'',map{'a'.length},<>=~/(\S+\s+)/g;
getEmpData.pl is as following.

#!/usr/bin/perl

$basePath="/home/syarra";

`isql -Uusername -Ppassword -Sservername -i$basePath/sqlquery -o$basePath/empData.txt`;

#assuming the first line of column names lines up with the spacing of the data columns:
{local @ARGV=('$basePath/empData.txt'); local $^I='';
  my $p=join'',map{'a'.length},<>=~/(\S+\s+)/g; # to determine spacing from the column names
  <>; # to remove ---------
  while( <> ){
     next if /return/;   # to remove some text
     chomp;
     print join(',',map{s/^\s+//;s/\s+$//;s/,/\\,/g;$_}(unpack $p,$_)[4,1,0,3,2]),"\n";
  }
}


I am calling getEmpData.pl from getData.ksh which is as follow.

#! /usr/bin/ksh
basePath="/home/syarra"
filename=$basePath"/getEmpData.pl"
/usr/bin/perl $filename


/home/syarra directory contains

-rw-r--r--   1 syarra  dsdr          235 Aug 30 10:55  getData.ksh
-rw-r--r--   1 syarra  dsdr        3001 Aug 30 10:54 getEmpData.pl

can you give me some idea about why i am getting this error?
            Thanks,
            syarra.
local @ARGV=("$basePath/empData.txt");
Avatar of syarra

ASKER

Thanks ozo.
Avatar of syarra

ASKER

Hi ozo,
       Can you please explain the following lines? or please send any documentation related to following code.

{local @ARGV=('$basePath/empData.txt'); local $^I='';
  my $p=join'',map{'a'.length},<>=~/(\S+\s+)/g; # to determine spacing from the column names
  <>; # to remove ---------
  while( <> ){
     next if /return/;   # to remove some text
     chomp;
     print join(',',map{s/^\s+//;s/\s+$//;s/,/\\,/g;$_}(unpack $p,$_)[4,1,0,3,2]),"\n";
  }
}
 
 Thanks,
 syarra.
perldoc perlvar
       @ARGV   The array @ARGV contains the command-line arguments intended
               for the script.  $#ARGV is generally the number of arguments
               minus one, because $ARGV[0] is the first argument, not the pro-
               gram's command name itself.  See $0 for the command name.

       $INPLACE_EDIT
       $^I     The current value of the inplace-edit extension.  Use "undef"
               to disable inplace editing.  (Mnemonic: value of -i switch.)

perldoc perlrun
       -i[extension]
            specifies that files processed by the "<>" construct are to be
            edited in-place.  It does this by renaming the input file, opening
            the output file by the original name, and selecting that output
            file as the default for print() statements.  The extension, if
            supplied, is used to modify the name of the old file to make a
            backup copy, following these rules:

            If no extension is supplied, no backup is made and the current
            file is overwritten.

            If the extension doesn't contain a "*", then it is appended to the
            end of the current filename as a suffix.  If the extension does
            contain one or more "*" characters, then each "*" is replaced with
            the current filename.  In Perl terms, you could think of this as:

                ($backup = $extension) =~ s/\*/$file_name/g;


perldoc perlop
       I/O Operators
...
       The null filehandle <> is special: it can be used to emulate the behav-
       ior of sed and awk.  Input from <> comes either from standard input, or
       from each file listed on the command line.  Here's how it works: the
       first time <> is evaluated, the @ARGV array is checked, and if it is
       empty, $ARGV[0] is set to "-", which when opened gives you standard
       input.  The @ARGV array is then processed as a list of filenames.  The
       loop

           while (<>) {
               ...                     # code for each line
           }

       is equivalent to the following Perl-like pseudo code:

           unshift(@ARGV, '-') unless @ARGV;
           while ($ARGV = shift) {
               open(ARGV, $ARGV);
               while (<ARGV>) {
                   ...         # code for each line
               }
           }

       except that it isn't so cumbersome to say, and will actually work.  It
       really does shift the @ARGV array and put the current filename into the
       $ARGV variable.  It also uses filehandle ARGV internally--<> is just a
       synonym for <ARGV>, which is magical.  (The pseudo code above doesn't
       work because it treats <ARGV> as non-magical.)

       You can modify @ARGV before the first <> as long as the array ends up
       containing the list of filenames you really want.  Line numbers ($.)
       continue as though the input were one big happy file.  See the example
       in "eof" in perlfunc for how to reset line numbers on each file.

       If you want to set @ARGV to your own list of files, go right ahead.
       This sets @ARGV to all plain text files if no @ARGV was given:

           @ARGV = grep { -f && -T } glob('*') unless @ARGV;

       You can even set them to pipe commands.  For example, this automati-
       cally filters compressed arguments through gzip:

           @ARGV = map { /\.(gz|Z)$/ ? "gzip -dc < $_ |" : $_ } @ARGV;


when <> returns "column1   column2  column3    coloum4             column5
"
/(\S+\s+)/g returns a list of
'column1   ',
'column2  ',
'column3    ',
'coloum4             ',
'column5
'
map{'a'.length} takes the lengths of those and puts 'a' in front of each
join'ed together, this makes $p='a10a9a11a20a8'

perldoc -f pack
       pack TEMPLATE,LIST
               Takes a LIST of values and converts it into a string using the
               rules given by the TEMPLATE.  The resulting string is the con-
               catenation of the converted values.  Typically, each converted
               value looks like its machine-level representation.  For exam-
               ple, on 32-bit machines a converted integer may be represented
               by a sequence of 4 bytes.

               The TEMPLATE is a sequence of characters that give the order
               and type of values, as follows:

                   a   A string with arbitrary binary data, will be null padded.

so (unpack $p,$_)[4,1,0,3,2] splits the line according to the lengths established by the first line
and returns them in the order fifth, second, first, fourth, third
s/^\s+//; #removes leading space
s/\s+$//; #removes trailing spaces
s/,/\\,/g; #escpaes , with \,
print join','
prints them with commas in between

Avatar of syarra

ASKER

Thanks ozo.