cybrthug
asked on
Bash shell script - AWK Remove blank lines from text file while stripping certain characters
I have a text file I'm trying to parse: I want the information that comes after a certain line containing a keyword. Everything
below this keyword is important to me, but for some reason it's stripping all the escape characters or line characters that I need
to process the information with another script. The other script I run takes this information and makes it delimited so I can
import it into MySQL. I'm checking the files with cat -vet (output shown below). This is the command I run:
awk '/../&&d{print}/QUANTITY/{ d=1}' test1.txt > test2.txt
FILE: test1.txt ORIGINAL FILE
-------------------------- ---------- ---------- ---------- ---------- ---
TABULATION^M$
QUANTITY^M$
^M$
^M$
11111111 GIANTS 222.000 TON^M$
7777 Acme Inc. 6.5000 3,425.50 3,425.50^M$
8888 Pipe Ind, Inc. 1.0000 527.00 527.00^M$
^M$
22222222 BEARS 324.000 TON^M$
7777 Acme, Inc. 148.3800 2,522.46 2,522.46^M$
8888 Pipe Ind, Inc.. 120.0000 2,040.00 2,040.00^M$
-------------------------- ---------- ---------- ---------- ---------- -----
With awk, it leaves in ^M$ after every line, like the following, and my other script won't read it right.
test2.txt comes out like this:
FILE: test2.txt
-------------------------- ---------- ---------- ---------- ---------- ---
11111111 GIANTS 222.000 TON^M$
7777 Acme Inc. 6.5000 3,425.50 3,425.50^M$
8888 Pipe Ind, Inc. 1.0000 527.00 527.00^M$
22222222 BEARS 324.000 TON^M$
7777 Acme, Inc. 148.3800 2,522.46 2,522.46^M$
8888 Pipe Ind, Inc.. 120.0000 2,040.00 2,040.00^M$
-------------------------- ---------- ---------- ---------- ---------- -----
What I need test2.txt to look like is the following. I need to remove the ^M but leave the $ at the end of each line. Also,
and this is very important for my processing, I need to remove the 2 blank lines that come after the word QUANTITY in the
original file, before the information starts. It must look like this for my other script to work properly.
FILE: test2.txt
-------------------------- ---------- ---------- ---------- ---------- ---
11111111 GIANTS 222.000 TON$
7777 Acme Inc. 6.5000 3,425.50 3,425.50$
8888 Pipe Ind, Inc. 1.0000 527.00 527.00$
$
22222222 BEARS 324.000 TON$
7777 Acme, Inc. 148.3800 2,522.46 2,522.46$
8888 Pipe Ind, Inc.. 120.0000 2,040.00 2,040.00$
-------------------------- ---------- ---------- ---------- ---------- -----
Thanks in advance for any help you can offer.
ASKER
I believe ^M$ is the carriage-return character, but I need to remove the ^M and only have $ ending each line.
ASKER
If I use pico to edit the file with the ^M$ return character and resave it, I get the $ only at the end of the line, but I need to process this at the command line and not edit every single file.
> I believe ^M$ ..
That is not sufficient; you have to be 101% sure, no doubt at all.
Please check with od -c
ASKER
With od -c I get \r \n at the end of each line.
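For anyone following along, here is a minimal sketch of that check on a throwaway file. The file name sample_crlf.txt and its two lines are made up for the demonstration; they are not the asker's data.

```shell
# sample_crlf.txt is a made-up file name used only for this demonstration
printf 'QUANTITY\r\n11111111 GIANTS\r\n' > sample_crlf.txt

# od -c shows every byte; a DOS line ending appears as \r followed by \n
od -c sample_crlf.txt

# Count the lines that end in a carriage return
cr=$(printf '\r')
crlf_lines=$(grep -c "${cr}\$" sample_crlf.txt)
echo "lines ending in CR: $crlf_lines"
```

If the count equals the number of lines in the file, every line ends in \r \n, which is what cat -vet displays as ^M$.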
And is the $ a real character, or was it the "end of line" marker of your editor?
to get rid of the \r (aka ^M aka Ctrl-M) use:
tr -d '\015' <test1.txt
ozo, you need gawk, nawk for that ;-)
cybrthug, do you have awk, or any of gawk, nawk? check with awk -v
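As a rough sketch only (not the accepted answer, which is not visible in this thread), the tr step can be chained with a plain awk filter to produce the layout the asker describes: nothing up to and including the QUANTITY line, no blank lines before the first data row, later blank lines kept, and no carriage returns. The function name strip_after_keyword and the sample file names are hypothetical, and the sample rows are shortened stand-ins for the real data.

```shell
# Hypothetical helper: print what follows the QUANTITY line of file $1,
# without carriage returns and without the leading blank lines.
strip_after_keyword() {
  tr -d '\015' < "$1" | awk '
    # Once the keyword has been seen, skip blank lines until the first
    # non-blank line, then print everything (later blank lines included).
    found && (started || NF) { started = 1; print }
    # This rule comes second so the keyword line itself is never printed.
    /QUANTITY/ { found = 1 }
  '
}

# Made-up sample input with DOS line endings
printf 'TABULATION\r\nQUANTITY\r\n\r\n\r\n1111 GIANTS\r\n7777 Acme\r\n\r\n2222 BEARS\r\n' > sample_in.txt

strip_after_keyword sample_in.txt > sample_out.txt
```

The same function could be called in a loop over many files, which avoids opening each one in an editor. Only standard awk features are used here, so it should not depend on gawk or nawk.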
ASKER
Ahoffmann, appreciate the responses, but ozo hit it on the head, again :) You are the bomb ozo!
awk '/QUANTITY/{p=1;next}{p++}