

sed command to search for an expression and delete it

Posted on 2006-11-19
Medium Priority
Last Modified: 2012-05-05
I currently edit my text files by hand. They contain data in the following format:

1085961616.474 172 TCP_MISS/200 2146 GET http://cfg.mywebsearch.com/mysaconfg.jsp?[07fiG10lKmU4M3R:IQ.TCH] - DIRECT/ text/html

1085961622.602 60 TCP_HIT/200 9476 GET http://image.linkexchange.com/01/73/69/31/banner468x60.gif - NONE/- image/gif

1085961627.502 159 TCP_REFRESH_HIT/304 339 GET http://scripts.lycos.com/catman/login.mail.lycos.com.cm/logout.js - DIRECT/ application/x-javascript

1085961648.792 6 TCP_MISS/503 1585 GET http://erp.water.com:9930/ - NONE/- text/html

I edit these files so that they contain only the URLs.


Right now I filter the URLs as follows:
1) I find every line other than the _hit/200 and _miss/200 ones and remove it from the file.
2) Then, from the remaining _hit/200 and _miss/200 lines, I copy only the URL.

Currently I only have 10 records in the text file, but down the line I'll be receiving files several megabytes in size. So I am trying to make my life easier beforehand by writing a shell script.

Here is my algorithm: find the lines that do not contain _hit/200 or _miss/200 and delete them. Then, for the lines that are left, strip everything before http: and everything after .js, .jsp, .html or .gif. That leaves only the URLs.
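The algorithm above maps fairly directly onto one sed invocation. This is a minimal sketch, not a definitive implementation, and it assumes: the URL is the only whitespace-separated token starting with `http`; status codes are upper case as in the sample (`TCP_HIT/200`); and "strip everything after .js/.jsp/.html/.gif" means dropping any query string that starts with `?`. Because it matches patterns rather than field numbers, it does not depend on the exact column layout. The function names `extract_urls` and `sample_log` are hypothetical. `-E` (extended regular expressions) is accepted by current GNU and BSD sed.

```shell
#!/bin/sh
# Hypothetical helper: reads a Squid-style access log on stdin and prints
# only the URLs of lines whose status is _HIT/200 or _MISS/200.
extract_urls() {
    sed -n -E '/_(HIT|MISS)\/200 /{
        s/.* (http[^ ]*).*/\1/
        s/[?].*$//
        p
    }'
}

# Sample lines copied from the question.
sample_log() {
    cat <<'EOF'
1085961616.474 172 TCP_MISS/200 2146 GET http://cfg.mywebsearch.com/mysaconfg.jsp?[07fiG10lKmU4M3R:IQ.TCH] - DIRECT/ text/html
1085961622.602 60 TCP_HIT/200 9476 GET http://image.linkexchange.com/01/73/69/31/banner468x60.gif - NONE/- image/gif
1085961627.502 159 TCP_REFRESH_HIT/304 339 GET http://scripts.lycos.com/catman/login.mail.lycos.com.cm/logout.js - DIRECT/ application/x-javascript
1085961648.792 6 TCP_MISS/503 1585 GET http://erp.water.com:9930/ - NONE/- text/html
EOF
}

sample_log | extract_urls
```

Against a real file this would be `extract_urls < log.txt > urls.txt`. The first `s` command keeps only the token that starts with `http`, and the second cuts it at the first `?`.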

I am not very good at script syntax, but here is what I have come up with so far:

sed -n -d '/!_hit/200 /='log.txt -- searches log.txt for lines with the given expression and deletes them. It's not correct: it doesn't give me the expected result, i.e. it doesn't delete the other lines.

sed -n -d '/!_miss/200 /='log.txt -- same here.
sed -n -d '/^* /200 /='log.txt    -- here I checked the man pages, and I believe I am supposed to use the source and destination text in the file, but I don't know which expression would fit.

Would someone tell me what I am doing wrong?

Thanks in advance

Question by:askhan1

Expert Comment

ID: 17975653
> .. I'll be receiving these files in megs.
You may run into sed crashing on files that large if you don't use GNU sed, which is why I use awk below.

> Would someone tell me what am I doing wrong?
Your sed pattern contains a /, which is also your pattern delimiter, so you need to escape it as \/ inside the pattern.
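The attempts in the question can be corrected along these lines. This is a sketch that assumes upper-case status codes as in the sample. In sed, the `!` negation goes after the address (`/regex/!d` means "delete every line that does not match"), not inside the regex, and there is no `-d` option. POSIX sed also lets an address use another delimiter when the first one is preceded by a backslash (`\%regex%`), which avoids escaping the `/` entirely. The function names are hypothetical.

```shell
#!/bin/sh
# Keep only TCP_HIT/200 lines: "/" inside the pattern is escaped as "\/",
# and "!d" deletes every line that does NOT match the address.
keep_hit_escaped() {
    sed -e '/_HIT\/200 /!d'
}

# Same filter using "%" as the address delimiter, so "/" needs no escaping.
keep_hit_alt_delim() {
    sed -e '\%_HIT/200 %!d'
}

# Keep both _HIT/200 and _MISS/200 lines with two portable addresses:
# "b" (branch to end of script) lets a matching line be printed and skips "d".
keep_hit_or_miss() {
    sed -e '/_HIT\/200 /b' -e '/_MISS\/200 /b' -e d
}

# Sample lines copied from the question.
sample_log() {
    cat <<'EOF'
1085961616.474 172 TCP_MISS/200 2146 GET http://cfg.mywebsearch.com/mysaconfg.jsp?[07fiG10lKmU4M3R:IQ.TCH] - DIRECT/ text/html
1085961622.602 60 TCP_HIT/200 9476 GET http://image.linkexchange.com/01/73/69/31/banner468x60.gif - NONE/- image/gif
1085961627.502 159 TCP_REFRESH_HIT/304 339 GET http://scripts.lycos.com/catman/login.mail.lycos.com.cm/logout.js - DIRECT/ application/x-javascript
1085961648.792 6 TCP_MISS/503 1585 GET http://erp.water.com:9930/ - NONE/- text/html
EOF
}

sample_log | keep_hit_or_miss
```

The trailing space in each pattern keeps `_HIT/200` from matching a longer status such as `_HIT/2000`.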

I'd use:

awk '($4~/_(miss|hit)\/200/i){print $7}' log.txt
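A caution about the command above: awk regular expressions take no trailing `i` (case-insensitive) flag. Most awks quietly parse `/…/i` as the match `$0 ~ /…/` concatenated with an empty variable named `i`. The condition then ends up checking whether `$4` contains a `0` or a `1`, which would explain why lines such as `TCP_REFRESH_HIT/304` slip through. Below is a minimal sketch of a case-insensitive filter. It assumes the standard Squid native layout that the answer's `$4`/`$7` imply, with the client address in field 3. The sample lines posted in the question show no client address, so there the status and URL are fields 3 and 6 instead. The function names and the placeholder address are hypothetical. In gawk, setting `IGNORECASE = 1` is an alternative to `toupper()`.

```shell
#!/bin/sh
# Print the URL (field 7) of lines whose ACTION/STATUS (field 4) ends in
# _HIT/200 or _MISS/200, whatever the letter case. toupper() is POSIX awk,
# so this does not rely on gawk's IGNORECASE.
hit_miss_urls() {
    awk 'toupper($4) ~ /_(HIT|MISS)\/200$/ { print $7 }'
}

# Hypothetical sample: the question's lines with a placeholder client
# address (192.0.2.10, a documentation range) inserted as field 3.
sample_log() {
    cat <<'EOF'
1085961616.474 172 192.0.2.10 TCP_MISS/200 2146 GET http://cfg.mywebsearch.com/mysaconfg.jsp?[07fiG10lKmU4M3R:IQ.TCH] - DIRECT/ text/html
1085961622.602 60 192.0.2.10 TCP_HIT/200 9476 GET http://image.linkexchange.com/01/73/69/31/banner468x60.gif - NONE/- image/gif
1085961627.502 159 192.0.2.10 TCP_REFRESH_HIT/304 339 GET http://scripts.lycos.com/catman/login.mail.lycos.com.cm/logout.js - DIRECT/ application/x-javascript
1085961648.792 6 192.0.2.10 TCP_MISS/503 1585 GET http://erp.water.com:9930/ - NONE/- text/html
EOF
}

sample_log | hit_miss_urls
```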

Author Comment

ID: 17976028
Thank you for your reply.

I tried it out and have a few questions. When I run it against 10 entries, it also returns the URLs of lines whose status is something other than miss or hit/200. For example, it returned the URL from this line:

1085961627.502 159 TCP_REFRESH_HIT/304 339 GET http://scripts.lycos.com/catman/login.mail.lycos.com.cm/logout.js - DIRECT/ application/x-javascript

http://scripts.lycos.com/catman/login.mail.lycos.com.cm/logout.js -- I don't need this URL.

Moreover, say I have http://scripts.lycos.com/catman/login.mail.lycos.com.cm/logout.js?anything: I want to remove the "?anything" from column 7. Is there a substring function I could apply, something like substring($7,until(?))?

Thank you


Author Comment

ID: 17976067
I tried this, but it gives me an error:

awk '($4~/_(miss|hit)\/200/i){print substr($7,'?',1)}' Log.txt


Author Comment

ID: 17976074
Actually, I just realised I can't use substr: when I ran man substr, it found no manual entry. That means substr is not supported in my version of Linux.
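`substr` is not a separate command, which is why `man substr` finds nothing. It is a built-in function of awk itself and is documented in the awk manual page (`man awk`) together with `index`. The error in the earlier attempt most likely comes from `'?'`: those single quotes end the shell-quoted awk program early, so awk receives a bare `?`, which is a syntax error. Inside awk, string literals use double quotes. `substr(s, m, n)` takes a numeric start position and length, so cutting at the first `?` means finding it with `index()` first. This is a sketch that reuses the `$4`/`$7` field positions from the answers in this thread. The function name and the sample lines are hypothetical.

```shell
#!/bin/sh
# Print the URL (field 7) of _HIT/200 and _MISS/200 lines, cut just before
# the first "?" if there is one. index() returns 0 when "?" is absent.
urls_without_query() {
    awk '$4 ~ /_(HIT|MISS)\/200$/ {
        url = $7
        q = index(url, "?")
        if (q > 0)
            url = substr(url, 1, q - 1)
        print url
    }'
}

# substr() and index() are awk built-ins; no separate command is involved.
awk 'BEGIN { s = "logout.js?x=1"; print substr(s, 1, index(s, "?") - 1) }'

# Hypothetical lines with the status in field 4 and the URL in field 7.
printf '%s\n' \
    '1 2 3 TCP_HIT/200 5 GET http://a.example/logout.js?x=1' \
    '1 2 3 TCP_MISS/200 5 GET http://b.example/page.html' \
    '1 2 3 TCP_MISS/503 5 GET http://c.example/' |
    urls_without_query
```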

Accepted Solution

ahoffmann earned 200 total points
ID: 17976203
awk '($4~/_(MISS|HIT)\/200/){print $7}' log.txt

>  I want to remove the "?anything"
awk '($4~/_(MISS|HIT)\/200/){print $7}' log.txt|sed -e 's/?.*$//'

Author Comment

ID: 17984673
Thanks, Hoffmann.

Actually, I worded it improperly. I want to delete all the lines containing "?".

I tried this instead, but my syntax is wrong:

awk '($4~/_(miss|HIT)\/200/*?*){print $7}' log.txt|

Can you tell me what I am doing wrong? And thanks for your help; enjoy your points.
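This is one way to express that, sketched under two assumptions: the goal is to print the URL only for _HIT/200 and _MISS/200 lines whose URL contains no `?`, and the field positions are those of the accepted answer. In awk the extra condition is a separate test joined with `&&`. The `*?*` reads like a shell wildcard and is not valid awk at that point. The function name and sample lines are hypothetical.

```shell
#!/bin/sh
# Print field 7 (URL) only when field 4 is a _HIT/200 or _MISS/200 status
# AND the URL contains no "?". index() returns 0 when "?" is absent.
urls_without_question_mark() {
    awk '$4 ~ /_(HIT|MISS)\/200$/ && index($7, "?") == 0 { print $7 }'
}

# Hypothetical lines with the status in field 4 and the URL in field 7.
printf '%s\n' \
    '1 2 3 TCP_HIT/200 5 GET http://a.example/logout.js?x=1' \
    '1 2 3 TCP_MISS/200 5 GET http://b.example/page.html' \
    '1 2 3 TCP_MISS/503 5 GET http://c.example/' |
    urls_without_question_mark
```

A regex test such as `$7 !~ /[?]/` would work equally well. The bracket expression makes the `?` literal without relying on backslash escaping.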
