enthuguy asked:

Remove first and last non-empty line from a huge file on linux

Hi,
I would like to remove the first line and the last non-empty line from a file.

Since I'm expecting very large files, could you please suggest an efficient way of doing this? A one-liner would be ideal.
I have seen a few examples, but they redirect the output to a new file after removing the lines.

Please help.

# cat input.txt
header_row
one
two
three
four
five
footer_row
<blankline>
<blankline>

# after removing
# cat input.txt
one
two
three
four
five

Linux, Bash, Shell Scripting, AWK, sed

8/22/2022 - Mon
noci

remove the first line:   tail -n +2 filename
remove the last line:   head -n -1 filename
remove ALL blank lines:     egrep -v '^$'  filename
removing trailing blank lines will need some more scripting.

Both in one run:
tail -n +2 filename | head -n -1
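
The extra scripting needed for trailing blank lines can be done in a single pass with awk. This is only a sketch, assuming that "blank" means a completely empty line and that blank lines inside the data should be kept; the function name strip_header_footer is made up for illustration. It holds back the most recent non-empty line and any blank lines after it, so the footer and the trailing blanks are never printed:

```shell
# Hypothetical helper: print the input without its first line, without its
# last non-empty line, and without the blank lines that follow that line.
strip_header_footer() {
    awk '
        NR == 1 { next }                 # always drop the header line
        /^$/    { blanks++; next }       # hold back blank lines for now
        {
            if (have) print pending      # the previous non-empty line was not the footer
            while (blanks > 0) { print ""; blanks-- }   # these blanks were interior
            pending = $0                 # this line might be the footer
            have = 1
        }
    ' "$@"
}
```

Called as strip_header_footer input.txt it writes the result to stdout, so it still needs a redirect to a new file.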
ASKER
enthuguy

Thanks @noci, testing on my side
ASKER
enthuguy

@noci, thanks... this may be very close :)
It didn't remove the footer, though. Please see below.

[ec2-user@ip-10-157-60-218 tmp]$ cat input.txt
header_row
one
two
three
four
five
footer_row



[ec2-user@ip-10-157-60-218 tmp]$ tail -n +2 input.txt | head -n -1
one
two
three
four
five
footer_row


[ec2-user@ip-10-157-60-218 tmp]$

ASKER
enthuguy

At the moment I'm using this, but it does not handle the trailing blank lines either.

If there are no blank lines, it works fine. I'm taking a risk here :(

sed -i '1d' ${NEWFILETMP} && sed -i '$d' ${NEWFILETMP}

noci

you can also use an ed script...

ed filename  <<CMDEND
1d
$
?^.?,$d
1,$p
q
q
CMDEND

tel2

Hi enthuguy,

"Would like to remove first line and last non-empty line from a file."
Do you mean "all trailing blank lines + 1 line before them (i.e. the footer)"?

Your example probably makes this clear, but your description should match it.
tel2

Have you tested that ed script, noci?  Did it work for you?
ASKER
enthuguy

sorry tel2, noci
you are right "all trailing blank lines + 1 line before them (i.e. the footer)"
noci

Yes, the $ got squashed once I put the commands inside the <<CMDEND block (I forgot to type the \ before it).

This works better, but it does change the source file (it makes a backup of the source first):
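#!/bin/bash
# Remove the first line, the last non-empty line (the footer) and any
# trailing blank lines from the file named in $1, editing it in place.
if [ -f "$1" ]
then
  cp "$1" "$1"~     # keep a backup as filename~
  echo >>"$1"       # append a blank line so the backward search starts below the footer
  # ed commands: go to the last line, then write lines 2 .. (last non-empty line - 1)
  # back to the file. \$ is escaped so the shell leaves it alone in the here-document.
  ed "$1" <<CMDEND
\$
2,?^.?-1w
q
q
CMDEND
else
  echo "usage: $0 filename"
fi
exit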

Save the above as f.e. trimfile, chmod +x trimfile and it can be run with ./trimfile filename

tel2

Thanks noci, I might try that later.  What do you mean by "f.e."?

enthuguy,
1. Could there be any blank lines anywhere in the input file, which you want to retain in the output file?
2. Approximately how large could the file be, maximum, in MB?
ASKER
enthuguy

thanks Noci for the snippet.

@tel2,
1. I think we can expect blank lines only after "footer_row", nothing in between. If there are any, we don't have to retain them in the output file.
2. approx 400MB :)

thanks again for your time and help noci and tel2
tel2

In that case, noci's grep should be fine for you.  This is basically what noci was suggesting in his original post:
grep -v '^$' input.txt | head -n -1 | tail -n +2 >output.txt
mv output.txt input.txt

Nice and reasonably simple.  Tested and seems to work.
Any problems with that?
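
One thing that can go wrong with the two-step version is that the mv still runs if the pipeline fails, or that output.txt already exists. A slightly more defensive sketch of the same pipeline, assuming GNU head (for -n -1) and mktemp; the function name trim_file is made up for illustration:

```shell
# Hypothetical wrapper around the grep | head | tail pipeline.
# The original is only replaced if the pipeline succeeded.
trim_file() {
    file=$1
    # Create the temporary file next to the original so mv is a plain rename.
    tmp=$(mktemp "${file}.XXXXXX") || return 1
    if grep -v '^$' "$file" | head -n -1 | tail -n +2 >"$tmp"; then
        mv "$tmp" "$file"
    else
        rm -f "$tmp"
        return 1
    fi
}
```

Usage would be trim_file input.txt. Note that mktemp creates the temporary file with restrictive permissions, so the replaced file does not keep the mode of the original unless it is set again afterwards.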

noci, I tried your bash+ed script from your last post, and it seems to work.  Any ideas why it sends this kind of thing to STDERR?:
52

24

noci

@tel2, enthuguy
Indeed:
grep -v ^$ input.txt | head -n -1 | tail -n +2 >output.t

was intended

@tel2:
The numbers are the byte counts that ed reports when it reads and writes the file (alas, on stdout).

@enthuguy:
f.e. means "for example": you can use trimfile, but you can also use another name.
skullnobrains

easy one liner

sed -i.tmp -e '
  1 { /^[[:space:]]*$/ d }
  $ { /^[[:space:]]*$/ d }
' "$FILE"

... but this is not efficient.

performing it efficiently would be done using fallocate by truncating the beginning of the file if the filesystem supports it : something along these lines

# NOTE: fallocate -p (--punch-hole) deallocates the byte range so that it
# reads back as zeros; it does not shorten the file.

# grab first line
first_line="`head -n1 "$FILE"`"
# remove first line unless it is empty
expr "$first_line" : '^[[:space:]]*$' >/dev/null \
|| fallocate -p -o 0 -l `expr "$first_line" : '.*'` "$FILE"

# grab last line
last_line="`tac "$FILE" | head -n 1`"
# grab the file size
filesize=`stat --printf="%s" "$FILE"`
# remove last line if non empty
# (this offset assumes there is no newline after the last line)
expr "$last_line" : '^[[:space:]]*$' >/dev/null \
|| fallocate -p -o $(expr $filesize - `expr "$last_line" : '.*'`) -l `expr "$last_line" : '.*'` "$FILE"

if you need to skip through empty lines and remove the first non empty, logic is pretty much the same.
in that case, you need to make clear whether you want to preserve the empty lines or not.

it is also feasible with sed but more complex to handle the last line if the file is huge
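
For the end of the file, another in-place option is to cut the file short with truncate, which does not rewrite the data that stays. The following is a minimal sketch rather than a tested production script. It assumes GNU coreutils (tac, stat, truncate), a file that ends with a newline, and that "empty" means a line with no characters at all; the function name drop_footer_in_place is made up for illustration. The header line still has to be removed by some other means.

```shell
# Hypothetical helper: remove the last non-empty line and every blank line
# after it by shortening the file in place.
drop_footer_in_place() {
    file=$1
    size=$(stat --printf='%s' "$file") || return 1
    # Read the file backwards and add up the bytes of each line (plus its
    # newline) until the first non-empty line, the footer, has been counted.
    tail_bytes=$(tac "$file" | LC_ALL=C awk '{ n += length($0) + 1 } /./ { print n; exit }')
    # No non-empty line found: leave the file alone.
    [ -n "$tail_bytes" ] || return 0
    truncate -s "$((size - tail_bytes))" "$file"
}
```

Because tac starts at the end of the file and awk exits as soon as it has seen the footer, only the last few lines are read, whatever the size of the file.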

tel2

Hi skullnobrains,

Your sed solution seems to delete the 1st and last line only if those lines consist of 0 or more whitespace characters.  If you look at enthuguy's description and sample input & output, I don't think that's what he's after.  For starters, the 1st line has to be deleted regardless of what it contains.
Assuming enthuguy's definition of a blank line is one which doesn't even contain spaces (and I assume that based on his reference to "non-empty" lines), then I expect this would do the job if he wanted a sed solution:
    sed '1d;/^$/d' input.txt | sed '$d'
But I don't know how to do that in a single sed command, do you?

I've never heard of fallocate, but I've now had a look at the man page.  Looks like an interesting option for those with a filesystem that supports it, thanks.
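
On the question of doing '1d;/^$/d' and '$d' in a single sed command: one possibility is to use the hold space to delay every line by one, so that the last surviving line is never printed. This is a sketch under the same assumption as above (every completely empty line is dropped); the function name drop_header_footer_sed is made up for illustration:

```shell
# Hypothetical helper: one sed process instead of two.
#   1d        drop the header
#   /^$/d     drop every empty line
#   x         swap the current line with the one held back from the previous cycle
#   /^$/!p    print the held-back line (the hold space is empty the first time)
# The footer is the last line to enter the hold space and is never printed.
drop_header_footer_sed() {
    sed -n '1d;/^$/d;x;/^$/!p' "$@"
}
```

Running drop_header_footer_sed input.txt on the sample file should print the lines "one" to "five"; like the two-command version, it writes to stdout.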
ASKER CERTIFIED SOLUTION
skullnobrains

Log in or sign up to see answer
ASKER
enthuguy

Thanks so much everyone, I really appreciate it. I now have more than one solution :)