cybrthug

Bash shell script - AWK Remove blank lines from text file while stripping certain characters

I have a text file I'm trying to parse: everything below a line containing a certain keyword is important to me.
For some reason my command strips some of the line characters I need in order to process the information with
another script. That other script takes this information and makes it delimited so I can import it into MySQL.
The listings below show the files as seen with cat -vet. This is the command I'm running:

awk '/../&&d{print}/QUANTITY/{d=1}' test1.txt > test2.txt


FILE: test1.txt ORIGINAL FILE
---------------------------------------------------------------------
TABULATION^M$
QUANTITY^M$
^M$
^M$
    11111111  GIANTS                                   222.000     TON^M$
 7777      Acme Inc.                                           6.5000        3,425.50        3,425.50^M$
 8888      Pipe Ind, Inc.                                      1.0000          527.00          527.00^M$
^M$
    22222222  BEARS                                    324.000     TON^M$
7777      Acme, Inc.                                         148.3800        2,522.46        2,522.46^M$
8888      Pipe Ind, Inc..                               120.0000        2,040.00        2,040.00^M$
-----------------------------------------------------------------------

With awk it leaves in ^M$ after every line, like the following, and my other script won't read it right.

test2.txt comes out like this:

FILE: test2.txt
---------------------------------------------------------------------
    11111111  GIANTS                                   222.000     TON^M$
 7777      Acme Inc.                                           6.5000        3,425.50        3,425.50^M$
 8888      Pipe Ind, Inc.                                      1.0000          527.00          527.00^M$
    22222222  BEARS                                    324.000     TON^M$
7777      Acme, Inc.                                         148.3800        2,522.46        2,522.46^M$
8888      Pipe Ind, Inc..                               120.0000        2,040.00        2,040.00^M$
-----------------------------------------------------------------------


What I need test2.txt to look like is the following. I need to remove the ^M but leave the $ at the end of each line.
Also, very important for my processing: I need to remove the 2 blank lines that follow the word QUANTITY in the
original file, before the information starts. It must look like this for my other script to work properly.

FILE: test2.txt
---------------------------------------------------------------------
    11111111  GIANTS                                   222.000     TON$
 7777      Acme Inc.                                           6.5000        3,425.50        3,425.50$
 8888      Pipe Ind, Inc.                                      1.0000          527.00          527.00$
$
    22222222  BEARS                                    324.000     TON$
7777      Acme, Inc.                                         148.3800        2,522.46        2,522.46$
8888      Pipe Ind, Inc..                               120.0000        2,040.00        2,040.00$
-----------------------------------------------------------------------

Thanks in advance for any help you can offer.

ahoffmann

Are ^M the two literal characters ^ and M, or is this a copy&paste from vi, where ^M represents the carriage-return character?

awk '/QUANTITY/{p=1;next}{p++}($1~/^\^M\$$/&&p<4){next}(p>3){print}' test1.txt|sed 's/\^M\$$/$/'
cybrthug

ASKER

I believe ^M$ is the carriage-return character, but I need to remove the ^M and have only $ ending each line.
If I use pico to edit the file with the ^M$ line endings and resave it, I get only the $ at the end of each line, but I need to do this at the command line rather than edit every single file.
> I believe ^M$ ..
That is not sufficient; you have to be 101% sure, no doubt at all.
Please check with od -c
With od -c I get  \r  \n at the end of each line.
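A small sketch of what the two tools report for a CRLF-terminated line, in case it helps tell the displays apart. The file name crlf_sample.txt is made up for illustration.

```shell
# Hypothetical one-line sample with a Windows-style CRLF line ending.
printf 'TON\r\n' > crlf_sample.txt

# od -c names each byte, so the carriage return shows up as \r before \n.
od -c crlf_sample.txt

# cat -vet renders that same carriage return as ^M and marks the end of
# each line with $; the $ is display only, not a byte in the file.
cat -vet crlf_sample.txt
```

So a line shown as TON^M$ by cat -vet holds only T, O, N, \r, \n. Once the \r is gone, cat -vet shows TON$.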
ASKER CERTIFIED SOLUTION
ozo
And is the $ a real character, or was it just the "end of line" marker of your editor?

to get rid of the \r (aka ^M aka Ctrl-M) use:
  tr -d '\015' <test1.txt
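
A minimal sketch of how that tr call could be combined with an awk filter to get the layout asked for in the question. It assumes the keyword line contains QUANTITY and that the $ is only the cat -vet end-of-line marker. The file names sample1.txt and sample2.txt and the shortened rows are made up for illustration, and this is one possible approach, not the accepted answer.

```shell
# Hypothetical sample mimicking test1.txt: CRLF endings and two blank
# lines right after the QUANTITY line.
printf 'TABULATION\r\nQUANTITY\r\n\r\n\r\n    11111111  GIANTS  222.000  TON\r\n 7777  Acme Inc.  6.5000\r\n\r\n    22222222  BEARS  324.000  TON\r\n' > sample1.txt

# tr deletes every carriage return (octal 015); awk then drops everything
# up to and including the keyword line plus the blank lines that follow it.
tr -d '\015' < sample1.txt |
awk '
    !seen { if (/QUANTITY/) seen = 1; next }   # header, up to the keyword line
    !started && /^[ \t]*$/ { next }            # blank lines right after the keyword
    { started = 1; print }                     # data, including later blank lines
' > sample2.txt
```

The blank line between the GIANTS and BEARS groups survives because blank lines are skipped only until the first data line has been printed.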

ozo, you need gawk, nawk for that ;-)

cybrthug, do you have awk, or any of gawk, nawk? Check with awk --version (mawk uses awk -W version)
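
For reference, a sketch of a single-awk variant that needs no separate tr step. It assumes an awk that provides sub() (nawk, gawk, mawk, or any POSIX awk; the old original awk does not have it). The input rows and the output name single_pass.txt are hypothetical.

```shell
# Hypothetical CRLF input piped straight into awk.
printf 'HEAD\r\nQUANTITY\r\n\r\n\r\nrow one\r\n\r\nrow two\r\n' |
awk '
    { sub(/\r$/, "") }                         # strip a trailing carriage return
    !seen { if (/QUANTITY/) seen = 1; next }   # header, up to the keyword line
    !started && /^[ \t]*$/ { next }            # blank lines right after the keyword
    { started = 1; print }                     # data, including later blank lines
' > single_pass.txt
```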
Ahoffmann, appreciate the responses, but ozo hit it on the head, again :) You are the bomb ozo!