Solved

sed command to search expression and delete it

Posted on 2006-11-19
Last Modified: 2012-05-05
Hi,
I currently edit my text files by hand. They contain data in the following format:

1085961616.474 172 190.104.253.84 TCP_MISS/200 2146 GET http://cfg.mywebsearch.com/mysaconfg.jsp?[07fiG10lKmU4M3R:IQ.TCH] - DIRECT/63.236.66.14 text/html

1085961622.602 60 68.22.217.209 TCP_HIT/200 9476 GET http://image.linkexchange.com/01/73/69/31/banner468x60.gif - NONE/- image/gif

1085961627.502 159 190.104.253.84 TCP_REFRESH_HIT/304 339 GET http://scripts.lycos.com/catman/login.mail.lycos.com.cm/logout.js - DIRECT/192.43.217.199 application/x-javascript


1085961648.792 6 12.168.85.240 TCP_MISS/503 1585 GET http://erp.water.com:9930/ - NONE/- text/html

I edit these files so that only the URLs remain, as below:

http://cfg.mywebsearch.com/mysaconfg.jsp
http://image.linkexchange.com/01/73/69/31/banner468x60.gif

At the moment I filter the URLs as follows:
1) I find every line that contains neither _hit/200 nor _miss/200 and remove it from the file.
2) From the remaining lines, which contain _hit/200 or _miss/200, I copy only the URL.

Currently I have only 10 records in the text file, but later on I'll be receiving files that are megabytes in size. I am trying to make my life easier beforehand by writing a shell script.

Here is my algorithm: find the lines that do not contain the expression _hit/200 or _miss/200 and delete them. Then, for the lines that are left, truncate everything before http: and everything after .js, .jsp, .html or .gif, which leaves only the URLs.

I am not so good at script syntax but here's what I've come up with so far.

sed -n -d '/!_hit/200 /='log.txt -- search the log.txt file for lines with the given expression and delete them. This is not correct, as it does not give me the expected result, i.e. deleting the other lines.

sed -n -d '/!_miss/200 /='log.txt -- same here
sed -n -d '/^* /200 /='log.txt    -- here I checked the man pages and I believe I am supposed to use the source and destination text in the file, but I don't know what expression would fit.

Would someone tell me what I am doing wrong?

Thanks in advance

askhan1
Question by:askhan1
6 Comments
 
LVL 51

Expert Comment

by:ahoffmann
ID: 17975653
> .. I'll be receiving these files in megs.
You may encounter problems with sed crashing if you don't use GNU sed, hence I use awk below.

> Would someone tell me what am I doing wrong?
Your sed pattern contains a /, which is also the pattern delimiter, so you need to escape it as \/ inside the pattern.
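
To illustrate the delimiter point, here is a minimal sketch. The scratch file and the variable names are made up for illustration; the log lines are copied from the question. It also assumes the intent of the "!" in the original attempts was negation, which in sed goes after the address rather than inside the pattern.

```shell
# Sample data: three lines from the question, written to a scratch file.
log=$(mktemp)
cat > "$log" <<'EOF'
1085961616.474 172 190.104.253.84 TCP_MISS/200 2146 GET http://cfg.mywebsearch.com/mysaconfg.jsp?[07fiG10lKmU4M3R:IQ.TCH] - DIRECT/63.236.66.14 text/html
1085961622.602 60 68.22.217.209 TCP_HIT/200 9476 GET http://image.linkexchange.com/01/73/69/31/banner468x60.gif - NONE/- image/gif
1085961648.792 6 12.168.85.240 TCP_MISS/503 1585 GET http://erp.water.com:9930/ - NONE/- text/html
EOF

# Option 1: escape the slash inside the /.../ address.
# -n suppresses automatic printing; p prints only the matching lines.
escaped=$(sed -n '/_HIT\/200/p' "$log")

# Option 2: choose another delimiter for the address. The opening
# delimiter is introduced with a backslash, and / then needs no escaping.
alternate=$(sed -n '\,_HIT/200,p' "$log")

# Option 3: negate the address with ! and delete the non-matching lines.
deleted=$(sed '/_HIT\/200/!d' "$log")

printf '%s\n' "$escaped"
rm -f "$log"
```

All three commands should select the same single TCP_HIT/200 line. Note that the pattern is upper case, because sed matches case-sensitively by default and the log uses upper-case result codes.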

I'd use:

awk '($4~/_(miss|hit)\/200/i){print $7}' log.txt
 

Author Comment

by:askhan1
ID: 17976028
Thank you for your reply.

I tried it out and have a few questions. When I run it against 10 entries, it also returns URLs from lines whose result code is something other than _miss/200 or _hit/200. For example, it returned the URL from this line:

1085961627.502 159 190.104.253.84 TCP_REFRESH_HIT/304 339 GET http://scripts.lycos.com/catman/login.mail.lycos.com.cm/logout.js - DIRECT/192.43.217.199 application/x-javascript

http://scripts.lycos.com/catman/login.mail.lycos.com.cm/logout.js -- url not needed.

Moreover, if I have http://scripts.lycos.com/catman/login.mail.lycos.com.cm/logout.js?anything, I want to remove the "?anything" from column 7. Is there a substring function I could apply, something like print substring($7,until(?))?

Thank you

 

Author Comment

by:askhan1
ID: 17976067
I tried this, but it gives me an error:

awk '($4~/_(miss|hit)\/200/i){print substr($7,'?',1)}' Log.txt
 

Author Comment

by:askhan1
ID: 17976074
Actually, I just realised I can't use substr: when I ran man substr, it found no manual entry for the command. I take that to mean substr is not supported in my version of Linux.
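
For readers following along: substr is a built-in awk function rather than a standalone command, which is why man substr finds nothing; it is documented in the awk manual page. Its signature is substr(string, start, length), so the earlier call with '?' as the second argument does not fit. A minimal sketch of cutting a URL at the first "?" with index and substr follows; the variable names are hypothetical, and the URL is the example from the question.

```shell
url='http://scripts.lycos.com/catman/login.mail.lycos.com.cm/logout.js?anything'

# index(s, t) returns the 1-based position of t in s, or 0 if t is absent.
# substr(s, start, length) returns length characters of s beginning at start.
trimmed=$(printf '%s\n' "$url" | awk '{
    q = index($1, "?")
    if (q > 0)
        print substr($1, 1, q - 1)   # keep everything before the "?"
    else
        print $1                     # no query string: print unchanged
}')

printf '%s\n' "$trimmed"
```

This prints the URL without the "?anything" suffix and leaves URLs that have no "?" untouched.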
 
LVL 51

Accepted Solution

by:ahoffmann (earned 50 total points)
ID: 17976203
awk '($4~/_(MISS|HIT)\/200/){print $7}' log.txt

>  I want to remove the "?anything"
awk '($4~/_(MISS|HIT)\/200/){print $7}' log.txt|sed -e 's/?.*$//'
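
A note on case, offered as a sketch rather than as part of the accepted answer: awk regular expressions are case-sensitive and have no trailing /i flag. In the first suggestion the i after the pattern is most likely read as an empty variable, which would explain the unwanted lines. One portable way to match regardless of case is to fold the field with tolower() first. The example below also strips the query string inside awk with sub(), so no second sed process is needed. The scratch file and variable names are made up for illustration; the log lines come from the question.

```shell
log=$(mktemp)
cat > "$log" <<'EOF'
1085961616.474 172 190.104.253.84 TCP_MISS/200 2146 GET http://cfg.mywebsearch.com/mysaconfg.jsp?[07fiG10lKmU4M3R:IQ.TCH] - DIRECT/63.236.66.14 text/html
1085961622.602 60 68.22.217.209 TCP_HIT/200 9476 GET http://image.linkexchange.com/01/73/69/31/banner468x60.gif - NONE/- image/gif
1085961627.502 159 190.104.253.84 TCP_REFRESH_HIT/304 339 GET http://scripts.lycos.com/catman/login.mail.lycos.com.cm/logout.js - DIRECT/192.43.217.199 application/x-javascript
1085961648.792 6 12.168.85.240 TCP_MISS/503 1585 GET http://erp.water.com:9930/ - NONE/- text/html
EOF

# tolower() folds field 4 to lower case, so the lower-case pattern matches
# TCP_MISS/200 and TCP_HIT/200 however the log spells them.
# sub() removes the first "?" and everything after it from field 7.
urls=$(awk 'tolower($4) ~ /_(miss|hit)\/200/ {
    sub(/\?.*/, "", $7)
    print $7
}' "$log")

printf '%s\n' "$urls"
rm -f "$log"
```

On the four sample lines this should print the two URLs listed in the question, with the 304 and 503 entries dropped.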
 

Author Comment

by:askhan1
ID: 17984673
Thanks Hoffman.

Actually, I worded it improperly. I want to delete all the lines containing "?".

I tried this instead, but my syntax is wrong.

awk '($4~/_(miss|HIT)\/200/*?*){print $7}' log.txt|

Can you tell me what I am doing wrong? Moreover, thanks for your help and enjoy your points.
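
The thread ends without a reply to this last question, so the following is only a sketch. The *?* in the attempt is shell wildcard syntax, which awk does not understand inside a condition. In awk, a second test can be joined to the first with &&, and !~ means "does not match". The sketch assumes the "?" to look for is in the URL field ($7); the scratch file and variable names are made up for illustration.

```shell
log=$(mktemp)
cat > "$log" <<'EOF'
1085961616.474 172 190.104.253.84 TCP_MISS/200 2146 GET http://cfg.mywebsearch.com/mysaconfg.jsp?[07fiG10lKmU4M3R:IQ.TCH] - DIRECT/63.236.66.14 text/html
1085961622.602 60 68.22.217.209 TCP_HIT/200 9476 GET http://image.linkexchange.com/01/73/69/31/banner468x60.gif - NONE/- image/gif
1085961648.792 6 12.168.85.240 TCP_MISS/503 1585 GET http://erp.water.com:9930/ - NONE/- text/html
EOF

# Keep a line only if field 4 is a MISS/200 or HIT/200 result AND
# field 7 (the URL) contains no "?". [?] matches a literal question mark.
urls=$(awk '$4 ~ /_(MISS|HIT)\/200/ && $7 !~ /[?]/ { print $7 }' "$log")

printf '%s\n' "$urls"
rm -f "$log"
```

With the three sample lines, only the banner468x60.gif URL should remain: the first line is dropped because its URL contains "?", and the last because its result code is 503.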