rdavis777

How can I use awk to remove the blank lines after a one-line paragraph?

I have a text file with thousands of lines in it.  These lines contain names and addresses separated by some number of blank lines.  

I'm looking for an awk script that will do the following.

I want to find every single non-blank line that is both preceded and followed by one or more blank lines, and delete the blank lines between it and the next non-blank line.  In other words, I want to remove the blank lines between the company name and its address lines: take any one-line paragraph, remove the blank lines that follow it, and join it to the next set of non-blank lines.  Any single non-blank line in the file that is preceded and followed by one or more blank lines is to be considered a company name.

A sample data file looks like this:  (the dashed lines are NOT part of the text file).
------------------------------------------

2034B Company1Name
company1address1
company1address2
company1address3



3928A Company2Name


company2address1
company2address2


8234B Company3Name
company3address1
company3address2
company3address3

92348B Company4Name

company4address1
company4address2
company4address3
company4address4

5055A Company5Name
company5address1
------------------------------------------

So, using the above example, this is my expected output:

------------------------------------------

2034B Company1Name
company1address1
company1address2
company1address3



3928A Company2Name
company2address1
company2address2


8234B Company3Name
company3address1
company3address2
company3address3

92348B Company4Name
company4address1
company4address2
company4address3
company4address4

5055A Company5Name
company5address1
------------------------------------------

Notice that the line containing "Company2Name" had the 2 blank lines after it removed, and the line containing "Company4Name" had the 1 blank line after it removed. The other "paragraphs" are left untouched, including the multiple consecutive blank lines that separate them.
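
For reference, the rule described above can be sketched in awk without looking at the content of the lines at all, only at paragraph lengths. This is a minimal sketch, not a tested production script. The function name `join_single_line_paragraphs` is made up for illustration, and it makes a few assumptions the question does not spell out: a line holding only spaces or tabs counts as blank, the start of the file counts as "preceded by a blank line", and blank lines at the very end of the file are kept because there is no following text to join.

```sh
join_single_line_paragraphs() {
    awk '
        # Blank (or whitespace-only) line: hold it back until we know what follows.
        /^[ \t]*$/ { held[blanks++] = $0; next }
        {
            if (blanks > 0) {
                # n counts the non-blank lines printed in the current paragraph.
                # If that paragraph is not exactly one line long, keep the gap
                # and start a new paragraph.
                if (n != 1) {
                    for (i = 0; i < blanks; i++) print held[i]
                    n = 0
                }
                # Otherwise (n == 1) the held blank lines are dropped and this
                # line joins the one-line paragraph.
                blanks = 0
            }
            print
            n++
        }
        # Trailing blank lines have nothing after them to join, so keep them.
        END { for (i = 0; i < blanks; i++) print held[i] }
    ' "$@"
}
```

Something like `join_single_line_paragraphs companies.txt > cleaned.txt` would run it, where both file names are placeholders. Because a joined line makes the paragraph longer than one line, the blank lines after the address block are left alone.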
Tachion

Are you sure you want to do this with awk?
This would be a lot easier in Perl or Python, where you could use the signature of the company name line to remove all the blank lines that come after it.
ASKER CERTIFIED SOLUTION
Tachion
rdavis777

ASKER

Works great!  I only extended [AB] to [A-Z] to accommodate some exceptions in the data.
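
The [AB] to [A-Z] remark suggests a script that recognizes the company line by its ID prefix (digits followed by one capital letter, as in "3928A"). The accepted script itself is not reproduced here, so the following is only a guess at that style of approach. The pattern and the function name `strip_blanks_after_company` are assumptions based on the sample data in the question.

```sh
strip_blanks_after_company() {
    awk '
        # Company line: digits, one capital letter, then a space (e.g. "3928A ...").
        /^[0-9]+[A-Z] / { print; skip = 1; next }
        # While skipping, drop blank lines that directly follow a company line.
        /^[ \t]*$/ && skip { next }
        # Any other line ends the skipping and is printed unchanged.
        { skip = 0; print }
    ' "$@"
}
```

Unlike a purely paragraph-based rule, this depends on every company line having that prefix, and it would also match an address line that happens to start the same way.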