rdavis777
asked on
How can I use awk to remove blank lines after a one-sentence paragraph?
I have a text file with thousands of lines in it. These lines contain names and addresses separated by some number of blank lines.
I'm looking for an awk script that will do the following.
I want to find every non-blank line that is preceded by one or more blank lines and also followed by one or more blank lines, and delete all of the blank lines between it and the next non-blank line. In other words, I want to remove the blank lines between the company name and the address lines. Essentially, the script should take any one-line paragraph, remove the blank lines that follow it, and join it to the next block of non-blank lines. Any single non-blank line in the file that is preceded and followed by one or more blank lines is to be treated as a company name.
A sample data file looks like this (the dashed lines are NOT part of the text file):
--------------------------
2034B Company1Name
company1address1
company1address2
company1address3
3928A Company2Name
company2address1
company2address2
8234B Company3Name
company3address1
company3address2
company3address3
92348B Company4Name
company4address1
company4address2
company4address3
company4address4
5055A Company5Name
company5address1
--------------------------
So, using the above example, this is my expected output:
--------------------------
2034B Company1Name
company1address1
company1address2
company1address3
3928A Company2Name
company2address1
company2address2
8234B Company3Name
company3address1
company3address2
company3address3
92348B Company4Name
company4address1
company4address2
company4address3
company4address4
5055A Company5Name
company5address1
--------------------------
Notice that the line containing "Company2Name" had the 2 blank lines after it removed, and the line containing "Company4Name" had the 1 blank line after it removed. The other "paragraphs" are left untouched, including the multiple consecutive blank lines that separate them.
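The rule in the question can be sketched in Python (not the awk script that was asked for) as a single pass over the lines. This is a minimal illustration under assumptions the question does not state: whitespace-only lines count as blank, the start of the file counts as a paragraph boundary, blank lines at the very end of the file are kept, and the function name `join_one_line_paragraphs` is hypothetical.

```python
def is_blank(line: str) -> bool:
    # Whitespace-only lines count as blank (an assumption; the question
    # doesn't say whether "blank" lines may contain spaces or tabs).
    return line.strip() == ""


def join_one_line_paragraphs(lines: list[str]) -> list[str]:
    """Remove the blank lines that follow every one-line paragraph.

    A one-line paragraph is a non-blank line whose previous *output* line is
    blank (or which starts the file) and which is followed by one or more
    blank lines. Those blank lines are dropped only when another non-blank
    line follows them, so trailing blank lines at end of file are kept.
    """
    out: list[str] = []
    i, n = 0, len(lines)
    while i < n:
        line = lines[i]
        if is_blank(line):
            out.append(line)
            i += 1
            continue
        # Look at the previous *output* line, so an address line that has
        # just been joined to its company name is not itself treated as a
        # one-line paragraph.
        preceded_by_blank = not out or is_blank(out[-1])
        out.append(line)
        # Scan past any blank lines that immediately follow this line.
        j = i + 1
        while j < n and is_blank(lines[j]):
            j += 1
        followed_by_blank = j > i + 1
        if preceded_by_blank and followed_by_blank and j < n:
            i = j  # skip the blank run and resume at the next non-blank line
        else:
            i += 1
    return out
```

To use it as a filter, something like `sys.stdout.writelines(join_one_line_paragraphs(sys.stdin.readlines()))` (after `import sys`) would read the file from standard input and write the result to standard output. Checking the previous output line rather than the previous input line is a deliberate choice: otherwise a company with a single address line that originally sat between blank lines could itself be treated as a one-line paragraph and merged into the next record.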
ASKER CERTIFIED SOLUTION
ASKER
Works great! I only extended [AB] to [A-Z] to accommodate some exceptions in the data.
This would be a lot easier in Perl or Python, where you could use the company name's signature (the pattern that identifies a company-name line) to find it and then remove all of the blank lines that follow it.
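A minimal Python sketch of that signature-based idea follows. The pattern `^\d+[A-Z] ` is an assumption inferred from the sample lines such as "2034B Company1Name" and from the asker's change of `[AB]` to `[A-Z]`; the real data may need a different pattern, and the names `COMPANY_NAME` and `drop_blanks_after_names` are hypothetical.

```python
import re

# Hypothetical "signature" for a company-name line, inferred from the sample
# data ("2034B Company1Name", "92348B Company4Name"): one or more digits,
# one uppercase letter, then a space. Adjust it to match the real data.
COMPANY_NAME = re.compile(r"^\d+[A-Z] ")


def drop_blanks_after_names(lines: list[str]) -> list[str]:
    """Remove the blank lines that directly follow a company-name line."""
    out: list[str] = []
    skipping = False  # True while inside the blank run after a name line
    for line in lines:
        blank = line.strip() == ""
        if skipping and blank:
            continue  # drop this blank line
        # Start skipping only right after a line that matches the signature.
        skipping = COMPANY_NAME.match(line) is not None
        out.append(line)
    return out
```

The trade-off is that this relies on every company line following the pattern, whereas the blank-line rule in the question does not; the asker's own widening of `[AB]` to `[A-Z]` shows how such a pattern may need adjusting as exceptions turn up.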