asked on

removing blank lines and making the data as comma separated in text file

Hi,
I have a text file,that i got from running the sql query.There are 5 columns.I need to remove column

names and the "-------" under the column names.I did removed column names and "-------".That leaves 2

blank lines at starting of the file.I have 400 rows of data.After the data there is some text.I removed

that text also.That leaves 1 blank line at the bottom of the file.Can some body please tell me how to

remove those blank lines?

And i need to make that data as comma separated.

Example file: empData.txt

column1 column2 column3 coloum4 column5
--------- --------- --------- -------- --------
abcd N Y Bank of America 580 LA
xyz Y N Bank of Africa 765 DC

I started writing getEmpData.pl as follows.

#!usr/bin/perl

`isql -Uusername -Ppwd -Sserver -isqlfile -oempData.txt`; #this line is to run sql query.

`perl -pi -e s/\.*empno.*//g empData.txt`; # to remove column names
`perl -pi -e s/\-*//g empData.txt`; # to remove ---------
`perl -pi -e s/\.*return.*//g empData.txt`; # to remove some text

I am new to perl.Just learning from two days.Can somebody help me out with this?

Thanks,
syarra.

ozo

perl -ni -e 'print if /./' empData.txt

ozo

{local @ARGV=('empData.txt'); local $^I='';
while( <> ){
next if /empno/; # to remove column names
next if /-+/; # to remove ---------
next if /return/; # to remove some text
s/ +/,/g; # to make that data as comma separated.
# I'm not sure how you want to distingush the spaces between
# "Y" and "Bank" from the spaces beween "Bank" and "of"
# you might want to read the first line of column names to determine the spacing
print;
}
}

syarra

ASKER

Hi ozo,
I have to test your comments tommorow since i am out of office now.The spaces between "Y" and "Bank" are multiple spaces or multiple tabs. The space between "bank" and "of" is only one.
I want data like:

abcd,N,Y,Bank of America,580 LA
xyz,Y,N,Bank of Africa,765 DC

Thanks,
syarra.

ozo

s/[ \t][ \t]+/,/g;

syarra

ASKER

Hi ozo,
I ran the following script.

{local @ARGV=('empData.txt'); local $^I='';
while( <> ){
next if /empno/; # to remove column names
next if /-+/; # to remove ---------
next if /return/; # to remove some text
s/[ \t][ \t]+/,/g; # to make that data as comma separated.
# I'm not sure how you want to distingush the spaces between
# "Y" and "Bank" from the spaces beween "Bank" and "of"
# you might want to read the first line of column names to
# determine the spacing
print;
}
}

I got data like:

abcd,N,Y,Bank of America 580 LA,
xyz,Y,N,Bank of Africa 765 DC,

how can I get comma between "Bank of America" and "580 LA"? (i think there is only one space between "Bank of America" and "580 LA") and
how can i remove the comma after "580 LA"?

Thank you very much
syarra.

syarra

ASKER

Hi ozo,
I removed the comma after "580 LA" with the following.
s/,$//;
so, Now my data looks like:

abcd,N,Y,Bank of America 580 LA
xyz,Y,N,Bank of Africa 765 DC

I need to get comma between "Bank of America" and "580 LA"

Thanks,
syarra.

syarra

ASKER

Hi,
Increased points from 250 to 275.
Thanks,
syarra.

syarra

ASKER

sorry increased to 300

syarra

ASKER

Hi,
I have 3 questions.
(1) How can I get comma between "Bank of America" and "580 LA"? (There is only one space between "Bank of America" and "580 LA")
(2)The data may contain comma i.e, like 580,LA .So i need to escape the commas,if any.
(3)I want to switch the columns i.e, like column5 to position 1 and column3 to position 5 etc.

Can anybody please give me solutions?Its Urgent.
I am increasing points from 300 to 500.

ASKER CERTIFIED SOLUTION

ozo

membership

This solution is only available to members.

To access this solution, you must be a member of Experts Exchange.

Start Free Trial

syarra

ASKER

ozo,
I got compilation error at line 36:
my $p=join'',map{'a'.length},<>=~/(\S+\s+)/g; # to determine spacing from the column names

syntax error at /home/syarra/getEmpData.pl line 36, near "},"
Execution of /home/syarra/getEmpData.pl aborted due to compilation errors.

Can you look into it why i got this error? You are helping me a lot. I never going to do the coding you did.I am just learning perl from 3 days.Thanks for your help.
I am giving you the points.
syarra.

ozo

Sorry, that should have been
my $p=join'',map{'a'.length}<>=~/(\S+\s+)/g;

syarra

ASKER

Thank you very much ozo. It worked.
syarra.

syarra

ASKER

Hi ozo,
I got some error "Can't open $basePath/empData.txt: No such file or directory at /home/syarra/getEmpData.pl line 9."
line 9 is: my $p=join'',map{'a'.length},<>=~/(\S+\s+)/g;
getEmpData.pl is as following.

#!/usr/bin/perl

$basePath="/home/syarra";

`isql -Uusername -Ppassword -Sservername -i$basePath/sqlquery -o$basePath/empData.txt`;

#assuming the first line of column names lines up with the spacing of the data columns:
{local @ARGV=('$basePath/empData.txt'); local $^I='';
my $p=join'',map{'a'.length},<>=~/(\S+\s+)/g; # to determine spacing from the column names
<>; # to remove ---------
while( <> ){
next if /return/; # to remove some text
chomp;
print join(',',map{s/^\s+//;s/\s+$//;s/,/\\,/g;$_}(unpack $p,$_)[4,1,0,3,2]),"\n";
}
}

I am calling getEmpData.pl from getData.ksh which is as follow.

#! /usr/bin/ksh
basePath="/home/syarra"
filename=$basePath"/getEmpData.pl"
/usr/bin/perl $filename

/home/syarra directory contains

-rw-r--r-- 1 syarra dsdr 235 Aug 30 10:55 getData.ksh
-rw-r--r-- 1 syarra dsdr 3001 Aug 30 10:54 getEmpData.pl

can you give me some idea about why i am getting this error?
Thanks,
syarra.

ozo

local @ARGV=("$basePath/empData.txt");

syarra

ASKER

Thanks ozo.

syarra

ASKER

Hi ozo,
Can you please explain the following lines? or please send any documentation related to following code.

{local @ARGV=('$basePath/empData.txt'); local $^I='';
my $p=join'',map{'a'.length},<>=~/(\S+\s+)/g; # to determine spacing from the column names
<>; # to remove ---------
while( <> ){
next if /return/; # to remove some text
chomp;
print join(',',map{s/^\s+//;s/\s+$//;s/,/\\,/g;$_}(unpack $p,$_)[4,1,0,3,2]),"\n";
}
}

Thanks,
syarra.

ozo

perldoc perlvar
@ARGV The array @ARGV contains the command-line arguments intended
for the script. $#ARGV is generally the number of arguments
minus one, because $ARGV[0] is the first argument, not the pro-
gram's command name itself. See $0 for the command name.

$INPLACE_EDIT
$^I The current value of the inplace-edit extension. Use "undef"
to disable inplace editing. (Mnemonic: value of -i switch.)

perldoc perlrun
-i[extension]
specifies that files processed by the "<>" construct are to be
edited in-place. It does this by renaming the input file, opening
the output file by the original name, and selecting that output
file as the default for print() statements. The extension, if
supplied, is used to modify the name of the old file to make a
backup copy, following these rules:

If no extension is supplied, no backup is made and the current
file is overwritten.

If the extension doesn't contain a "*", then it is appended to the
end of the current filename as a suffix. If the extension does
contain one or more "*" characters, then each "*" is replaced with
the current filename. In Perl terms, you could think of this as:

($backup = $extension) =~ s/\*/$file_name/g;

perldoc perlop
I/O Operators
...
The null filehandle <> is special: it can be used to emulate the behav-
ior of sed and awk. Input from <> comes either from standard input, or
from each file listed on the command line. Here's how it works: the
first time <> is evaluated, the @ARGV array is checked, and if it is
empty, $ARGV[0] is set to "-", which when opened gives you standard
input. The @ARGV array is then processed as a list of filenames. The
loop

while (<>) {
... # code for each line
}

is equivalent to the following Perl-like pseudo code:

unshift(@ARGV, '-') unless @ARGV;
while ($ARGV = shift) {
open(ARGV, $ARGV);
while (<ARGV>) {
... # code for each line
}
}

except that it isn't so cumbersome to say, and will actually work. It
really does shift the @ARGV array and put the current filename into the
$ARGV variable. It also uses filehandle ARGV internally--<> is just a
synonym for <ARGV>, which is magical. (The pseudo code above doesn't
work because it treats <ARGV> as non-magical.)

You can modify @ARGV before the first <> as long as the array ends up
containing the list of filenames you really want. Line numbers ($.)
continue as though the input were one big happy file. See the example
in "eof" in perlfunc for how to reset line numbers on each file.

If you want to set @ARGV to your own list of files, go right ahead.
This sets @ARGV to all plain text files if no @ARGV was given:

@ARGV = grep { -f && -T } glob('*') unless @ARGV;

You can even set them to pipe commands. For example, this automati-
cally filters compressed arguments through gzip:

@ARGV = map { /\.(gz|Z)$/ ? "gzip -dc < $_ |" : $_ } @ARGV;

when <> returns "column1 column2 column3 coloum4 column5
"
/(\S+\s+)/g returns a list of
'column1 ',
'column2 ',
'column3 ',
'coloum4 ',
'column5
'
map{'a'.length} takes the lengths of those and puts 'a' in front of each
join'ed together, this makes $p='a10a9a11a20a8'

perldoc -f pack
pack TEMPLATE,LIST
Takes a LIST of values and converts it into a string using the
rules given by the TEMPLATE. The resulting string is the con-
catenation of the converted values. Typically, each converted
value looks like its machine-level representation. For exam-
ple, on 32-bit machines a converted integer may be represented
by a sequence of 4 bytes.

The TEMPLATE is a sequence of characters that give the order
and type of values, as follows:

a A string with arbitrary binary data, will be null padded.

so (unpack $p,$_)[4,1,0,3,2] splits the line according to the lengths established by the first line
and returns them in the order fifth, second, first, fourth, third
s/^\s+//; #removes leading space
s/\s+$//; #removes trailing spaces
s/,/\\,/g; #escpaes , with \,
print join','
prints them with commas in between

syarra

ASKER

Thanks ozo.