Solved

How to remove header and footer from a file in Unix?

Posted on 2006-11-07
19 Comments | 3,382 Views | Last Modified: 2013-12-26
How to remove header and footer from a file in Unix?

e.g. file f1.txt has:

header|Test|Test
Record1
Record2
Record3
Footer|Test|Date


output of f1.txt should be:
Record1
Record2
Record3


Using a single statement at the Unix prompt,
what is the fastest method to get the resulting output?
Question by:vihar123
19 Comments
 
LVL 58

Accepted Solution

by:
amit_g earned 100 total points
ID: 17893352
sed -e '1d;$d' f1.txt > /tmp/f1_$$.txt && mv /tmp/f1_$$.txt f1.txt

If your sed supports in-place editing, you can do

sed -i -e '1d;$d' f1.txt

If you also want to keep a backup:

sed -i.bak -e '1d;$d' f1.txt
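
A quick way to sanity-check the result afterwards (just plain head and tail, nothing specific to this solution) is to look at the new first and last lines:

head -1 f1.txt
tail -1 f1.txt

which, for the example above, should now show Record1 and Record3 rather than the header and footer.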
 

Author Comment

by:vihar123
ID: 17894032
Amit,

can you please give me more info about this statement:

sed -e '1d;$d' f1.txt > /tmp/f1_$$.txt && mv /tmp/f1_$$.txt f1.txt


I understand that '1d' and '$d' mean deleting the first and last lines.
Can you explain the rest of it?

Thanks
 
LVL 58

Expert Comment

by:amit_g
ID: 17894052
sed is a stream editor. The -e option passes the command/script that sed executes: 1d deletes the first line and $d deletes the last line, which is exactly what you want. By default, sed reads from the given file and writes its output to the screen (stdout), so you save that output to a temp file, /tmp/f1_$$.txt, and then move that file back over the original. Alternatively, if your sed supports in-place editing (i.e. the commands run on the same file and that file is edited directly instead of the result being written to the screen), you can use that and skip the > /tmp/f1_$$.txt && mv /tmp/f1_$$.txt f1.txt part. I would try the last one first

sed -i.bak -e '1d;$d' f1.txt

and if that works, you have edited file f1.txt and a backup of original file as f1.txt.bak
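
As an aside, the $$ in the temp file name is not magic: in a POSIX shell it expands to the current shell's process ID, which is what makes the name reasonably unique. For example (the numbers will of course differ on your machine):

echo $$               # e.g. 12345
echo /tmp/f1_$$.txt   # e.g. /tmp/f1_12345.txt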
 

Author Comment

by:vihar123
ID: 17918956
Due to some constraints, I can't use the "sed" command.

I have a file with 10 million rows and need to delete the header and footer.
I guess, performance-wise, sed is the one to use.

What do you say?

Am I right?

Thanks


 
LVL 58

Expert Comment

by:amit_g
ID: 17918976
Well, I have never used these tools on a file that huge myself, so I can't say how they would behave. I would suggest trying it on one such file and seeing how it goes. Whatever tool you use, it is going to read the whole file and write it out to another one, unless you want to write your own. Do you have to do this on a regular basis or just once?
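
If you don't want to experiment on the real data, one rough way to build a test file of about the right shape (a sketch only; adjust the record count and width to match your actual data, and /tmp/bigtest.txt is just an example name) is:

awk 'BEGIN { print "header|Test|Test"; for (i = 1; i <= 10000000; i++) print "Record" i; print "Footer|Test|Date" }' > /tmp/bigtest.txt

and then run the candidate commands against that.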
 
LVL 84

Expert Comment

by:ozo
ID: 17918982
perl -MTie::File -e 'tie @a, "Tie::File", "f1.txt" or die $!; pop @a; shift @a'

 
LVL 51

Assisted Solution

by:ahoffmann
ahoffmann earned 100 total points
ID: 17922285
> .. i have a file of 10 million rows
you need to use perl or write your own filter in a language which compiles to a native binary

if you have gawk you can try:
gawk '(NR==1){next;}{line=$0; while ((getline) > 0) {print line; line=$0}}' yourfile
(the first line is skipped, and the last line is only ever read ahead, never printed)
 
LVL 48

Assisted Solution

by:Tintin
Tintin earned 100 total points
ID: 17926701
To test performance between the various solutions, use the time command, e.g.:

time sed -e '1d;$d' f1.txt > /tmp/f1_$$.txt && mv /tmp/f1_$$.txt f1.txt

Another solution (although you'd need to test the performance) is to use grep. Assuming your records don't contain pipe symbols, then do:

time grep -v '|' f1.txt >/tmp/$$ && mv /tmp/$$ f1.txt
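
The output of time looks roughly like this (the exact format varies between shells; these numbers are placeholders, not measurements):

real    0m6.2s
user    0m4.9s
sys     0m1.1s

The "real" line is the wall-clock figure worth comparing between the candidate commands.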

 
LVL 51

Expert Comment

by:ahoffmann
ID: 17928450
.. I guess that sed will crash on most systems for such huge files ..
 
LVL 65

Assisted Solution

by:rockiroads
rockiroads earned 100 total points
ID: 17930697
An alternative to sed:

grep -v "`head -1 f1.txt`" f1.txt | grep -v "`tail -1 f1.txt`"
 

Author Comment

by:vihar123
ID: 17932438
Thank you very much for your inputs.

 Tintin,
      the file does contain pipe symbols.
  ozo, ahoffmann,
       the constraint is that I need to use only Unix commands.
  Amit,
      I need to do this on a regular basis (every day).

rockiroads:
    I will try your command and let you know.

Thanks
 
LVL 48

Expert Comment

by:Tintin
ID: 17933699
What OS are you using?
Have you tried using sed?  Does it handle the amount of data?
 
LVL 58

Expert Comment

by:amit_g
ID: 17934365
We can keep on speculating, but what works for you is going to depend on what kind of machine you have and what kind of load it carries. All of the solutions given above work, but since your file is huge, you need to try them and see which one works best for you.

I ran some tests of my own. Again, the real test is your own, since I don't know how wide your file is: at 10 bytes per record you have a 100 MB file, while at 100 bytes per record you have a 1 GB file. I took a file with 10M records of 100 bytes each, and sed took about 6 minutes on my poor machine. I am not sure whether that kind of performance is acceptable to you, but at least it did not crash, and the machine was sluggish during those 6 minutes but not unusably so. The Perl solution, on the other hand, took over 12 minutes and caused the system to slow down dramatically because of huge memory usage. This is understandable, as that Perl solution seems to read the whole file into memory. Other tools like sed and awk work line by line, and whatever time they take is mainly disk IO. Since the file is huge, whatever solution you use will need to do that much disk IO, so you should get similar performance.
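
If you want to know which of those two ballpark figures you are closer to, a quick check (nothing fancy, just ls and wc) is:

ls -l f1.txt    # total size in bytes
wc -l f1.txt    # number of lines

Dividing the first by the second gives your average record width.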
 
LVL 84

Assisted Solution

by:ozo
ozo earned 100 total points
ID: 17934662
Tie::File does not read the whole file into memory, although it does keep a configurable-size memory cache if it decides it is more efficient to defer some writes and do several updates at once.
The memory you saw it using was probably its internal list of byte offsets for the records.
Since you are updating only two records, most of this is unnecessary, so
perl -i -ne 'print if (2..0) and !eof' f1.txt
should be faster, and just as bound by disk IO to the file as the sed version.
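
For anyone less used to the flip-flop operator, an equivalent way (as far as I can tell) to write the same filter is:

perl -i -ne 'print unless $. == 1 || eof' f1.txt

i.e. skip the line whose line number ($.) is 1, and skip the line being processed when eof is already true, which is the last one.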
 
LVL 58

Expert Comment

by:amit_g
ID: 17934748
Pardon my ignorance about Perl. My comment was based on the memory usage that I saw on my system, and you are right, the last solution does take about the same time (~6 minutes). So whatever tool we use, we are going to be limited by disk IO.

Vihar, the times you see in my posts are only indicative: first, I don't have the real data, and second, your machine could be a lot better (or a lot worse) than mine. So you have to do your own tests and use whatever works best for you. My guess is that all these tools will give similar performance because disk IO is going to be the bottleneck.
 
LVL 51

Expert Comment

by:ahoffmann
ID: 17941211
> .. i need to use only Unix commands.
I don't see anything here that isn't a Unix command!

another try, just plain old awk (which should be installed on any Unix):
  awk '(NR==1){next;}(NR==2){x=$0;next;}{print x;x=$0;}' your-file
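
On the sample file from the question, that should give:

$ awk '(NR==1){next;}(NR==2){x=$0;next;}{print x;x=$0;}' f1.txt
Record1
Record2
Record3

(the footer only ever lands in the x buffer and so is never printed).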