enthuguy asked:

Remove first and last non-empty line from a huge file on linux

Hi,
I would like to remove the first line and the last non-empty line from a file.

Since I'm expecting very large files, could you suggest an efficient way to do this, please? A one-liner would be ideal.
I have seen a few examples, but they all redirect the output to a new file after removing the lines.

Please help.

# cat input.txt
header_row
one
two
three
four
five
footer_row
<blankline>
<blankline>



# after removing
# cat input.txt
one
two
three
four
five


noci

Remove the first line:   tail -n +2 filename
Remove the last line:   head -n -1 filename
Remove ALL blank lines:   grep -v '^$' filename
Removing only the trailing blank lines will need some more scripting.

Both in one run:
tail -n +2 filename | head -n -1
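
As a rough sketch of the "more scripting" mentioned above, the header, the footer and the trailing blank lines could all be dropped in a single streaming pass with awk. The function name `trim_header_footer` is made up for this illustration, and the sketch assumes that "blank" means a completely empty line (no spaces), as in the sample input.

```sh
# Hypothetical helper: print the file without its first line, without its
# last non-empty line, and without the blank lines that follow that line.
trim_header_footer() {
  awk '
    NR == 1 { next }            # skip the header
    /^$/    { blanks++; next }  # hold back blank lines until we know they are not trailing
    {
      if (have) print held      # the previous non-empty line was not the footer
      while (blanks > 0) {      # blank lines seen since then were interior ones
        print ""
        blanks--
      }
      held = $0                 # this line might be the footer, so hold it
      have = 1
    }
  ' "$1"
}
```

It only keeps one line in memory at a time, but it still writes to stdout, so replacing the original would look something like `trim_header_footer input.txt >output.txt && mv output.txt input.txt`.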
enthuguy (ASKER)

Thanks @noci, testing on my side.
@noci, thanks... it may be very close :)
It didn't remove the footer, though. Please check below.

[ec2-user@ip-10-157-60-218 tmp]$ cat input.txt
header_row
one
two
three
four
five
footer_row



[ec2-user@ip-10-157-60-218 tmp]$ tail -n +2 input.txt | head -n -1
one
two
three
four
five
footer_row


[ec2-user@ip-10-157-60-218 tmp]$


At the moment I'm using the command below, but it doesn't handle trailing blank lines either.

If there are no blank lines, it works fine. I'm taking a risk here :(

sed -i '1d' "${NEWFILETMP}" && sed -i '$d' "${NEWFILETMP}"


you can also use an ed script...

ed filename  <<CMDEND
1d
$
?^.?,$d
1,$p
q
q
CMDEND


Hi enthuguy,

"Would like to remove first line and last non-empty line from a file."
Do you mean "all trailing blank lines + 1 line before them (i.e. the footer)"?

Your example probably makes this clear, but your description should match it.
Have you tested that ed script, noci?  Did it work for you?
Sorry tel2, noci.
You are right: "all trailing blank lines + 1 line before them (i.e. the footer)".
Yes, the $ got squashed once the commands went inside the <<CMDEND block (I forgot to type the \ before it).

This works better, but it does change the source file (after making a backup of it):
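#!/bin/bash
# Strip the header, the footer and any trailing blank lines from the file, in place.
if [ -f "$1" ]
then
  cp "$1" "$1~"    # keep a backup of the original
  echo >>"$1"      # append a blank line so the backward search starts after the footer
  ed "$1" <<CMDEND
\$
2,?^.?-1w
q
q
CMDEND
else
  echo "usage: $0 filename" >&2
fi
exit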



Save the above as f.e. trimfile, chmod +x trimfile and it can be run with ./trimfile filename

Thanks noci, I might try that later.  What do you mean by "f.e."?

enthuguy,
1. Could there be any blank lines anywhere in the input file, which you want to retain in the output file?
2. Approximately how large could the file be, maximum, in MB?
thanks Noci for the snippet.

@tel2,
1. I think we can expect blank lines only after "footer_row", nothing in between. If there are any, we don't have to retain them in the output file.
2. approx 400MB :)

thanks again for your time and help noci and tel2
In that case, noci's grep should be fine for you.  This is basically what noci was suggesting in his original post:
grep -v '^$' input.txt | head -n -1 | tail -n +2 >output.txt
mv output.txt input.txt


Nice and reasonably simple.  Tested and seems to work.
Any problems with that?

noci, I tried your bash+ed script from your last post, and it seems to work.  Any ideas why it sends this kind of thing to STDERR?:
52

24


@tel2, enthuguy
Indeed:
grep -v ^$ input.txt | head -n -1 | tail -n +2 >output.t


was intended

@tel2:
The numbers are the byte counts read and written (alas, they go to stdout).

@enthuguy:
"f.e." means "for example": you can use trimfile, but any other name works too.
easy one liner

sed -i.tmp -e '
  1 { /^[[:space:]]*$/ d }  
  $ { /^[[:space:]]*$/ d }
' input.txt


... but this is not efficient.

Doing it efficiently would mean using fallocate to drop the beginning of the file in place, if the filesystem supports it. Something along these lines:

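# grab first line
first_line="`head -n1 "$FILE"`"
# blank out the first line unless it is empty
# (-p punches a hole: the range reads back as zeros and the file size is unchanged;
# dropping the bytes outright needs -c, which only accepts block-aligned ranges)
expr "$first_line" : '^[[:space:]]*$' >/dev/null \
|| fallocate -p -o 0 -l `expr "$first_line" : '.*'` "$FILE"

# grab last line and its length
last_line="`tac "$FILE" | head -n 1`"
last_len=`expr "$last_line" : '.*'`
# grab the file size
filesize=`stat --printf="%s" "$FILE"`
# blank out the last line if non empty (the extra 1 skips its trailing newline)
expr "$last_line" : '^[[:space:]]*$' >/dev/null \
|| fallocate -p -o `expr $filesize - $last_len - 1` -l $last_len "$FILE"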



If you need to skip over empty lines and remove the first non-empty one, the logic is pretty much the same.
In that case, you need to make clear whether you want to preserve the empty lines or not.

It is also feasible with sed, but handling the last line is more complex if the file is huge.
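
For the tail end specifically, here is a minimal sketch of the "skip through empty lines" logic that shortens the file in place instead of rewriting it. It swaps fallocate for GNU truncate, which is a different technique from the snippet above. The function name `trim_tail_in_place` is hypothetical, and the sketch assumes GNU tac and truncate are available and that the file ends with a newline.

```sh
# Hypothetical helper: cut the footer and any trailing blank lines off the
# end of a file without rewriting the rest of it.
trim_tail_in_place() {
  file=$1
  size=$(wc -c <"$file")
  # Read the file backwards, adding up the bytes (line plus newline) of each
  # trailing blank line and of the first non-empty line, then stop reading.
  drop=$(tac "$file" | LC_ALL=C awk '{ n += length($0) + 1 } /./ { print n; exit }')
  # No non-empty line found: leave the file untouched.
  [ -n "$drop" ] || return 0
  truncate -s "$((size - drop))" "$file"
}
```

Only the last few lines are read, so the cost should not depend much on the file size. The header is a different matter: removing bytes from the front of a file generally still means rewriting it.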

Hi skullnobrains,

Your sed solution seems to delete the 1st and last line only if those lines consist of 0 or more whitespace characters.  If you look at enthuguy's description and sample input & output, I don't think that's what he's after.  For starters, the 1st line has to be deleted regardless of what it contains.
Assuming enthuguy's definition of a blank line is one which doesn't even contain spaces (and I assume that based on his reference to "non-empty" lines), then I expect this would do the job if he wanted a sed solution:
    sed '1d;/^$/d' input.txt | sed '$d'
But I don't know how to do that in a single sed command, do you?

I've never heard of fallocate, but I've now had a look at the man page.  Looks like an interesting option for those with a filesystem that supports it, thanks.
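
On the single-sed question, one possible approach is to use the hold space as a one-line delay, so the last non-empty line is never printed. This is only a sketch under the same assumption as above (blank lines are completely empty), and the wrapper name `trim_single_sed` is made up for the example.

```sh
# Hypothetical wrapper around a single sed invocation.
trim_single_sed() {
  # 1d     drop the header
  # /^$/d  drop empty lines
  # x      swap: hold the current line, bring back the previously held one
  # /./p   print it, unless the hold space was still empty (first kept line)
  sed -n '1d;/^$/d;x;/./p' "$1"
}
```

Because every line that reaches `x` is non-empty, the `/./` test only filters out the initially empty hold space, and the footer is left sitting in the hold space when input ends. Under those assumptions it should give the same output as `sed '1d;/^$/d' input.txt | sed '$d'`.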
ASKER CERTIFIED SOLUTION
skullnobrains

Thanks so much, everyone, I really appreciate it. I now have more than one solution :)