Bash shell script - AWK Remove blank lines from text file while stripping certain characters

Asked by cybrthug
Last Modified: 2011-08-18
I have a text file that I'm trying to parse: everything below a certain keyword line is important to me.
For some reason, though, it's stripping the escape or line-ending characters that I need in order
to process the information with another script. That other script takes this information and makes it delimited so I can
import it into MySQL. The listings below show the files as they appear with cat -vet. This is the awk command I'm currently using:

awk '/../&&d{print}/QUANTITY/{d=1}' test1.txt > test2.txt


FILE: test1.txt ORIGINAL FILE
---------------------------------------------------------------------
TABULATION^M$
QUANTITY^M$
^M$
^M$
    11111111  GIANTS                                   222.000     TON^M$
 7777      Acme Inc.                                           6.5000        3,425.50        3,425.50^M$
 8888      Pipe Ind, Inc.                                      1.0000          527.00          527.00^M$
^M$
    22222222  BEARS                                    324.000     TON^M$
7777      Acme, Inc.                                         148.3800        2,522.46        2,522.46^M$
8888      Pipe Ind, Inc..                               120.0000        2,040.00        2,040.00^M$
-----------------------------------------------------------------------

With awk, ^M$ is left at the end of every line, as shown below, and my other script won't read it correctly.

test2.txt comes out like this:

FILE: test2.txt
---------------------------------------------------------------------
    11111111  GIANTS                                   222.000     TON^M$
 7777      Acme Inc.                                           6.5000        3,425.50        3,425.50^M$
 8888      Pipe Ind, Inc.                                      1.0000          527.00          527.00^M$
    22222222  BEARS                                    324.000     TON^M$
7777      Acme, Inc.                                         148.3800        2,522.46        2,522.46^M$
8888      Pipe Ind, Inc..                               120.0000        2,040.00        2,040.00^M$
-----------------------------------------------------------------------


What I need test2.txt to look like is the following. I need to remove the ^M but leave the $ at the end of each line.
Also, and this is very important for my processing, I need to remove the 2 blank lines that follow the word QUANTITY
in the original file. It must look like this for my other script to work properly.

FILE: test2.txt
---------------------------------------------------------------------
    11111111  GIANTS                                   222.000     TON$
 7777      Acme Inc.                                           6.5000        3,425.50        3,425.50$
 8888      Pipe Ind, Inc.                                      1.0000          527.00          527.00$
$
    22222222  BEARS                                    324.000     TON$
7777      Acme, Inc.                                         148.3800        2,522.46        2,522.46$
8888      Pipe Ind, Inc..                               120.0000        2,040.00        2,040.00$
-----------------------------------------------------------------------

Thanks in advance for any help you can offer.

Are the ^M the two literal characters ^ and M, or is this a copy & paste from vi, where ^M represents the carriage-return character?

awk '/QUANTITY/{p=1;next}{p++}($1~/^\^M\$$/&&p<4){next}(p>3){print}' test1.txt|sed 's/\^M\$$/$/'

Author

Commented:
I believe ^M$ is the carriage-return character, but I need to remove the ^M and have only $ ending each line.

Author

Commented:
If I use pico to edit the file with the ^M$ line endings and resave it, I get only the $ at the end of each line, but I need to process this at the command line rather than edit every single file.

> I believe ^M$ ..
That is not sufficient; you have to be 101% sure, with no doubt at all.
Please check with od -c
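
For anyone who wants to see what that check distinguishes, here is a minimal sketch. The two file names (real_cr.txt, literal_caret.txt) and their contents are made up for illustration; they are not the poster's data. A real carriage return shows up in od -c as \r, while literal text shows up as the two separate characters ^ and M.

```shell
# Two hypothetical one-line files: one ends in a real CR+LF,
# the other contains the literal characters ^ and M before the newline.
printf 'TON\r\n' > real_cr.txt
printf 'TON^M\n' > literal_caret.txt

od -c real_cr.txt        # a real carriage return is displayed as \r
od -c literal_caret.txt  # literal text is displayed as ^ followed by M

# Count the lines that contain a real carriage return in each file.
# "|| true" keeps the script going when grep finds no match (exit status 1).
cr=$(printf '\r')
real_count=$(grep -c "$cr" real_cr.txt || true)
literal_count=$(grep -c "$cr" literal_caret.txt || true)
echo "real_cr.txt: $real_count, literal_caret.txt: $literal_count"
```

Which of the two cases applies decides the fix: a real \r can be deleted as a single character, whereas literal ^M text needs a pattern that matches two characters.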

Author

Commented:
With od -c I get \r \n at the end of each line.
Commented:
awk '{gsub(/\r/,"");print}'
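
The one-liner above only strips the carriage returns. As a rough sketch of how it might be combined with the rest of the request (start after the QUANTITY line, drop the blank lines directly below it, keep later blank lines), something like the following could work. The file names sample_crlf.txt and sample_clean.txt, the shortened sample rows, and the variable names found and started are assumptions for illustration, not part of the original answer.

```shell
# Build a small CRLF sample modelled on test1.txt (shortened, hypothetical data).
printf 'TABULATION\r\nQUANTITY\r\n\r\n\r\n    11111111  GIANTS   222.000  TON\r\n 7777  Acme Inc.  6.5000\r\n\r\n    22222222  BEARS  324.000  TON\r\n' > sample_crlf.txt

awk '
  { gsub(/\r/, "") }                          # strip carriage returns from every line
  /QUANTITY/ && !found { found = 1; next }    # start after the first keyword line
  !found { next }                             # ignore everything before the keyword
  !started && $0 == "" { next }               # drop blank lines right after the keyword
  { started = 1; print }                      # keep the rest, including later blank lines
' sample_crlf.txt > sample_clean.txt

# Inspect the result the same way the question does.
cat -vet sample_clean.txt
```

With this sample, cat -vet should show each line ending in $ with no ^M, and the single blank line between the two groups preserved. The original /../ test is not used here because it would also discard that separating blank line.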

And is the $ a real character, or is it the "end of line" marker shown by your editor?

To get rid of the \r (aka ^M aka Ctrl-M), use:
  tr -d '\015' <test1.txt
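
A possible way to use this in a pipeline, sketched under assumptions: tr removes the carriage returns first, so the awk part needs no gsub and should also run on older awk implementations. The file names tr_sample.txt and tr_clean.txt, the sample rows, and the variables found and started are hypothetical.

```shell
# Small CRLF sample (hypothetical data, shortened from the question).
printf 'QUANTITY\r\n\r\n    11111111  GIANTS  222.000  TON\r\n\r\n 7777  Acme Inc.  6.5000\r\n' > tr_sample.txt

# Delete every carriage return (octal 015), then filter lines with awk:
# print only after QUANTITY has been seen, skipping the blank lines that
# come before the first non-empty line.
tr -d '\015' < tr_sample.txt |
  awk 'found && (started || length($0) > 0) { started = 1; print }
       /QUANTITY/ { found = 1 }' > tr_clean.txt

cat -vet tr_clean.txt
```

Because the print rule comes before the rule that sets found, the QUANTITY line itself is not printed.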

ozo, you need gawk or nawk for that ;-)

cybrthug, do you have plain awk, or gawk or nawk? Check with awk --version (gawk) or awk -W version (mawk).

Author

Commented:
Ahoffmann, appreciate the responses, but ozo hit it on the head, again :) You are the bomb ozo!