
Remove first and last non-empty line from a huge file on Linux

enthuguy asked (last modified 2020-07-24):
Hi,
I would like to remove the first line and the last non-empty line from a file.

Since I'm expecting very large files, could you suggest an efficient way of doing this, please? It would be good if we could achieve it with a one-liner.
I saw a few examples, but they redirect the output to a new file after removing the lines.

Please help.

# cat input.txt
header_row
one
two
three
four
five
footer_row
<blankline>
<blankline>

# after removing
# cat input.txt
one
two
three
four
five

noci (Software Engineer, CERTIFIED EXPERT, Distinguished Expert 2019)

Commented:
Remove the first line:    tail -n +2 filename
Remove the last line:     head -n -1 filename
Remove ALL blank lines:   grep -v '^$' filename
Removing only the trailing blank lines will need some more scripting.

Both in one run:
tail -n +2 filename | head -n -1
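
The "more scripting" for trailing blank lines can still be a single streaming pass. A minimal sketch with awk, assuming that a blank line means a completely empty line (as in the sample input) and that empty lines between data lines should be kept; strip_header_footer is a hypothetical helper name. Each non-empty line, and any run of empty lines after it, is held back until the next non-empty line shows it was not the footer, so the footer and the blank lines after it are simply never printed.

```shell
# strip_header_footer FILE (hypothetical helper)
# Prints FILE without its first line, its last non-empty line (the footer)
# and the empty lines that follow the footer. Empty lines between data
# lines are kept. The file is streamed, so memory use stays small.
strip_header_footer() {
  awk '
    NR == 1 { next }                  # drop the header line
    /^$/    { blanks++; next }        # hold empty lines until we know what follows
    {
      if (have_prev) print prev       # the previous non-empty line was not the footer
      while (blanks > 0) { print ""; blanks-- }   # re-emit the held empty lines
      prev = $0; have_prev = 1        # hold this line: it might be the footer
    }
    # At end of input the held line (the footer) and any held empty lines are discarded.
  ' "$1"
}
```

Usage: strip_header_footer input.txt > output.txt && mv output.txt input.txt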

Author

Commented:
Thanks @noci, testing on my side

Author

Commented:
@noci, thanks... maybe very close :)
It didn't remove the footer part. Please check below:

[ec2-user@ip-10-157-60-218 tmp]$ cat input.txt
header_row
one
two
three
four
five
footer_row



[ec2-user@ip-10-157-60-218 tmp]$ tail -n +2 input.txt | head -n -1
one
two
three
four
five
footer_row


[ec2-user@ip-10-157-60-218 tmp]$

Author

Commented:
At the moment I'm using this, but it doesn't handle the trailing blank lines either.

If there are no blank lines, this works fine. I'm taking a risk here :(

sed -i '1d' "${NEWFILETMP}" && sed -i '$d' "${NEWFILETMP}"
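
Both deletions can be done in a single sed invocation, which reads the file once instead of twice. This sketch assumes GNU sed (BSD sed's -i needs an explicit suffix argument) and, like the two-command version, it still does not handle trailing blank lines. Note that GNU sed -i writes its output to a temporary file and renames it over the original, so it is not a true in-place edit either. trim_first_last is only an illustrative name.

```shell
# trim_first_last FILE (hypothetical helper, GNU sed assumed)
# Deletes the first and the last line of FILE in one pass.
# Trailing blank lines are NOT treated specially: if the file ends with
# blank lines, only the final blank line is removed, not the footer.
trim_first_last() {
  sed -i '1d;$d' "$1"
}
```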

noci

Commented:
You can also use an ed script:

ed filename  <<CMDEND
1d
$
?^.?,$d
1,$p
q
q
CMDEND
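
With an unquoted <<CMDEND delimiter the shell performs parameter expansion inside the here-document, so $d and $p are replaced (by nothing, if those variables are unset) before ed ever sees them; quoting the delimiter (<<'CMDEND') passes the commands through literally. The sketch below is an in-place variant of the same idea, not noci's exact script, and trim_with_ed is a hypothetical name. It first appends an empty line in ed's buffer so that the backward search ?^.? always lands on the footer, even when the file has no trailing blank lines; ed -s suppresses the byte counts ed otherwise prints. Keep in mind that ed reads the whole file into its buffer before writing it back.

```shell
# trim_with_ed FILE (hypothetical helper)
# Removes line 1, the last non-empty line (footer) and every line after it,
# then writes the result back to FILE. The ed commands are:
#   $a / (empty line) / .   append one empty line after the last line
#   ?^.?,$d                 delete from the last non-empty line to the end
#   1d                      delete the header
#   w, q                    write the buffer back to FILE and quit
# The quoted 'CMDEND' delimiter stops the shell from expanding the $ signs.
trim_with_ed() {
  ed -s "$1" <<'CMDEND'
$a

.
?^.?,$d
1d
w
q
CMDEND
}
```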

CERTIFIED EXPERT

Commented:
Hi enthuguy,

"Would like to remove first line and last non-empty line from a file."
Do you mean "all trailing blank lines + 1 line before them (i.e. the footer)"?

Your example probably makes this clear, but your description should match it.
CERTIFIED EXPERT

Commented:
Have you tested that ed script, noci?  Did it work for you?

Author

Commented:
Sorry tel2, noci,
you are right: "all trailing blank lines + 1 line before them (i.e. the footer)".

noci

Commented:
Yes, the $ signs got expanded by the shell inside the <<CMDEND ... block (I forgot to type the \ before them).

This works better, but it does change the source file (it makes a backup of the source first):
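#!/bin/bash
if [ -f "$1" ]
then
  cp "$1" "$1"~        # keep a backup copy as filename~
  echo >>"$1"          # append an empty line so ?^.? below always finds the footer
  # ed commands: \$ goes to the last line; 2,?^.?-1w writes lines 2 up to
  # the line before the last non-empty line (the footer) back to the file
  ed "$1" <<CMDEND
\$
2,?^.?-1w
q
q
CMDEND
else
  echo "usage: $0 filename"
fi
exit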

Save the above as f.e. trimfile, make it executable with chmod +x trimfile, and run it with ./trimfile filename

CERTIFIED EXPERT

Commented:
Thanks noci, I might try that later.  What do you mean by "f.e."?

enthuguy,
1. Could there be any blank lines anywhere in the input file, which you want to retain in the output file?
2. Approximately how large could the file be, maximum, in MB?

Author

Commented:
Thanks noci for the snippet.

@tel2,
1. I think we can expect blank lines only after "footer_row", nothing in between. If there are any, we don't have to retain them in the output file.
2. Approx. 400MB :)

Thanks again for your time and help, noci and tel2.

CERTIFIED EXPERT

Commented:
In that case, noci's grep should be fine for you.  This is basically what noci was suggesting in his original post:
grep -v '^$' input.txt | head -n -1 | tail -n +2 >output.txt
mv output.txt input.txt
Nice and reasonably simple.  Tested and seems to work.
Any problems with that?
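
A slightly more defensive wrapper around that pipeline, as a sketch: quoting '^$' keeps the shell from touching the pattern, creating the temporary file with mktemp in the same directory as the input means the final mv is a rename on the same filesystem rather than a 400 MB copy, and the original is only replaced if the pipeline succeeded. safe_trim is a hypothetical helper name; head -n -1 and chmod --reference assume GNU coreutils.

```shell
# safe_trim FILE (hypothetical helper, GNU coreutils assumed)
# Drops empty lines, then the header (first remaining line) and the footer
# (last remaining line), and replaces FILE with the result.
safe_trim() {
  dir=$(dirname -- "$1")
  # Temporary file next to the input, so the mv below is a rename, not a copy.
  tmp=$(mktemp "$dir/.trim.XXXXXX") || return 1
  # The pipeline's exit status is that of its last command (head).
  if grep -v '^$' "$1" | tail -n +2 | head -n -1 > "$tmp"; then
    chmod --reference="$1" -- "$tmp"   # mktemp creates 0600; keep the original mode
    mv -- "$tmp" "$1"
  else
    rm -f -- "$tmp"
    return 1
  fi
}
```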

noci, I tried your bash+ed script from your last post, and it seems to work. Any idea why it sends this kind of thing to STDERR?
52

24

noci

Commented:
@tel2, enthuguy
Indeed:
grep -v '^$' input.txt | head -n -1 | tail -n +2 >output.t
was intended.

@tel2:
The numbers are the number of bytes read and written... (alas, to stdout).

@enthuguy:
f.e. means "for example": you can use trimfile, but you can also use another name.
CERTIFIED EXPERT

Commented:
Easy one-liner:

sed -i.tmp -e '
  1 { /^[[:space:]]*$/ d }
  $ { /^[[:space:]]*$/ d }
' "$FILE"
... but this is not efficient.

To do it efficiently, you could use fallocate to cut bytes off the beginning of the file in place, if the filesystem supports it; something along these lines:

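# grab first line
first_line="$(head -n 1 "$FILE")"
# remove first line (plus its newline) unless it is empty
# NOTE: -p (--punch-hole) only turns the range into a hole that reads back as
# zero bytes, and the file size stays the same; -c (--collapse-range) really
# removes the range, but its offset and length must be multiples of the
# filesystem block size
expr "$first_line" : '^[[:space:]]*$' >/dev/null \
|| fallocate -p -o 0 -l $(( $(expr "$first_line" : '.*') + 1 )) "$FILE"

# grab last line
last_line="$(tac "$FILE" | head -n 1)"
# grab the file size
filesize=$(stat --printf="%s" "$FILE")
# length of the last line plus its newline
last_len=$(( $(expr "$last_line" : '.*') + 1 ))
# remove last line if non empty
expr "$last_line" : '^[[:space:]]*$' >/dev/null \
|| fallocate -p -o $(( filesize - last_len )) -l "$last_len" "$FILE"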


If you need to skip through the empty lines and remove the first non-empty one, the logic is pretty much the same.
In that case, you need to make clear whether you want to preserve the empty lines or not.

It is also feasible with sed, but handling the last line is more complex if the file is huge.
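
For the end of the file, the footer and the blank lines after it are the last bytes of the file, so they can be cut off with truncate, which really shortens the file (unlike a punched hole) without rewriting anything. A minimal sketch, assuming GNU coreutils (tac, and truncate's "-N" size meaning "reduce by N bytes") and a file that ends with a newline; drop_footer is a hypothetical name. GNU tac reads a regular file backwards from the end, so only the last few lines are scanned, and LC_ALL=C makes awk count bytes rather than characters. The header still has to be removed some other way (for example tail -n +2, which rewrites the file), since removing bytes from the front in place needs fallocate --collapse-range with block-aligned offsets.

```shell
# drop_footer FILE (hypothetical helper, GNU coreutils assumed)
# Removes the last non-empty line of FILE and all empty lines after it,
# in place, by truncating the file. FILE must end with a newline.
drop_footer() {
  # Walk the file backwards and add up the size of each line plus its newline,
  # stopping after the first non-empty line (the footer).
  n=$(tac -- "$1" | LC_ALL=C awk '{ n += length($0) + 1 } /./ { exit } END { print n + 0 }')
  # "-N" tells truncate to reduce the file size by N bytes.
  truncate -s "-$n" -- "$1"
}
```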

CERTIFIED EXPERT

Commented:
Hi skullnobrains,

Your sed solution seems to delete the 1st and last line only if those lines consist of 0 or more whitespace characters.  If you look at enthuguy's description and sample input & output, I don't think that's what he's after.  For starters, the 1st line has to be deleted regardless of what it contains.
Assuming enthuguy's definition of a blank line is one which doesn't even contain spaces (and I assume that based on his reference to "non-empty" lines), then I expect this would do the job if he wanted a sed solution:
    sed '1d;/^$/d' input.txt | sed '$d'
But I don't know how to do that in a single sed command, do you?

I've never heard of fallocate, but I've now had a look at the man page.  Looks like an interesting option for those with a filesystem that supports it, thanks.
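
It can be done in a single sed command when, as enthuguy confirmed, empty lines do not need to be kept: delete line 1 and the empty lines, then use the hold space as a one-line delay so that each non-empty line is printed only once the next one has arrived. The last one, the footer, is left in the hold space and never printed. This is a sketch; trim_with_sed is a hypothetical name, and the script uses only standard sed commands (1d, /re/d, x, /re/!p with -n).

```shell
# trim_with_sed [FILE...] (hypothetical helper); reads stdin if no FILE is given.
#   1d       drop the header line
#   /^$/d    drop empty lines
#   x        swap: pattern space = previous non-empty line, hold space = current one
#   /^$/!p   print that previous line (it is empty only on the very first swap)
# The footer ends up in the hold space at end of input and is never printed.
trim_with_sed() {
  sed -n '1d;/^$/d;x;/^$/!p' "$@"
}
```

With GNU sed, sed -n -i '1d;/^$/d;x;/^$/!p' input.txt writes only the printed lines back to input.txt (still via a temporary file behind the scenes).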

Author

Commented:
Thanks so much, everyone, I really appreciate it. I have more than one solution :)