Solved

sed command to search expression and delete it

Posted on 2006-11-19
Last Modified: 2012-05-05
Hi,
I manually edit text files that contain data in the following format.

1085961616.474 172 190.104.253.84 TCP_MISS/200 2146 GET http://cfg.mywebsearch.com/mysaconfg.jsp?[07fiG10lKmU4M3R:IQ.TCH] - DIRECT/63.236.66.14 text/html

1085961622.602 60 68.22.217.209 TCP_HIT/200 9476 GET http://image.linkexchange.com/01/73/69/31/banner468x60.gif - NONE/- image/gif

1085961627.502 159 190.104.253.84 TCP_REFRESH_HIT/304 339 GET http://scripts.lycos.com/catman/login.mail.lycos.com.cm/logout.js - DIRECT/192.43.217.199 application/x-javascript


1085961648.792 6 12.168.85.240 TCP_MISS/503 1585 GET http://erp.water.com:9930/ - NONE/- text/html

I edit these files so that they contain only the URLs, as below:

http://cfg.mywebsearch.com/mysaconfg.jsp
http://image.linkexchange.com/01/73/69/31/banner468x60.gif

I filter the URLs in two steps:
1) I look for every line that does not contain _hit/200 or _miss/200 and remove it from the file.
2) Then I take the lines with _hit/200 and _miss/200 and copy only the URL.

Currently I only have 10 records in the text file; however, down the line I'll be receiving these files in megs. As a result, I am trying to make my life easier beforehand by creating a shell script.

Here is my algorithm: find the lines that do not contain the expression _hit/200 or _miss/200 and delete them. Then, for the lines that are left, truncate everything before http: and everything after .js, .jsp, .html or .gif. That will leave me with only the URLs.

I am not very good at script syntax, but here is what I've come up with so far.

sed -n -d '/!_hit/200 /='log.txt -- search the log.txt file for lines with the given expression and delete them. It's not correct, as it doesn't give me the expected result, i.e. it doesn't delete the other lines.

sed -n -d '/!_miss/200 /='log.txt -- same here
sed -n -d '/^* /200 /='log.txt    -- here I checked the man pages, and I believe I am supposed to give the source and destination text, but I don't know what expression would fit.

Would someone tell me what am I doing wrong?

Thanks in advance

askhan1
Question by:askhan1
LVL 51

Expert Comment

by:ahoffmann
ID: 17975653
> .. I'll be receiving these files in megs.
You may find that sed crashes on large files if you are not using GNU sed, hence I use awk below.

> Would someone tell me what am I doing wrong?
Your sed pattern contains a /, which is also your pattern delimiter, so you need to escape / as \/ inside the pattern.
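To illustrate the escaping point, here is a minimal sketch of a sed filter that keeps only the _MISS/200 and _HIT/200 lines. The sample file is built from the log lines in the question; the temporary file and the variable names are only for illustration, and the trailing space in each pattern assumes the status field is always followed by a space.

```shell
# Sample data: the four log lines from the question, in a temporary file.
log=$(mktemp)
cat > "$log" <<'EOF'
1085961616.474 172 190.104.253.84 TCP_MISS/200 2146 GET http://cfg.mywebsearch.com/mysaconfg.jsp?[07fiG10lKmU4M3R:IQ.TCH] - DIRECT/63.236.66.14 text/html
1085961622.602 60 68.22.217.209 TCP_HIT/200 9476 GET http://image.linkexchange.com/01/73/69/31/banner468x60.gif - NONE/- image/gif
1085961627.502 159 190.104.253.84 TCP_REFRESH_HIT/304 339 GET http://scripts.lycos.com/catman/login.mail.lycos.com.cm/logout.js - DIRECT/192.43.217.199 application/x-javascript
1085961648.792 6 12.168.85.240 TCP_MISS/503 1585 GET http://erp.water.com:9930/ - NONE/- text/html
EOF

# Option 1: escape the slash inside the /.../ address.
# -n suppresses default output; p prints only the matching lines.
kept_escaped=$(sed -n -e '/_MISS\/200 /p' -e '/_HIT\/200 /p' "$log")

# Option 2: use a different delimiter for the address.
# The first delimiter must be introduced with a backslash.
kept_altdelim=$(sed -n -e '\#_MISS/200 #p' -e '\#_HIT/200 #p' "$log")

printf '%s\n' "$kept_escaped"
rm -f "$log"
```

Both forms select the same lines; the second is often easier to read when the pattern itself contains slashes, as URLs and status codes do here.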

I'd use:

awk '($4~/_(miss|hit)\/200/i){print $7}' log.txt
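A note on the trailing i: as far as I know, awk has no Perl-style regex flags. In most awk implementations, /.../i is likely parsed as the result of matching the whole line (0 or 1) concatenated with an unset variable named i, so the condition ends up testing whether field 4 contains a "0" or a "1", and nearly every line passes. A portable way to get a case-insensitive match is to normalise the field with toupper() first. The sketch below assumes the sample lines from the question; the temporary file and the variable names are only for illustration. (GNU awk also offers an IGNORECASE variable, but that is specific to gawk.)

```shell
# Sample data: the four log lines from the question, in a temporary file.
log=$(mktemp)
cat > "$log" <<'EOF'
1085961616.474 172 190.104.253.84 TCP_MISS/200 2146 GET http://cfg.mywebsearch.com/mysaconfg.jsp?[07fiG10lKmU4M3R:IQ.TCH] - DIRECT/63.236.66.14 text/html
1085961622.602 60 68.22.217.209 TCP_HIT/200 9476 GET http://image.linkexchange.com/01/73/69/31/banner468x60.gif - NONE/- image/gif
1085961627.502 159 190.104.253.84 TCP_REFRESH_HIT/304 339 GET http://scripts.lycos.com/catman/login.mail.lycos.com.cm/logout.js - DIRECT/192.43.217.199 application/x-javascript
1085961648.792 6 12.168.85.240 TCP_MISS/503 1585 GET http://erp.water.com:9930/ - NONE/- text/html
EOF

# toupper() upper-cases field 4, so the upper-case pattern matches
# regardless of how the status code is written in the log.
urls=$(awk 'toupper($4) ~ /_(MISS|HIT)\/200/ { print $7 }' "$log")

printf '%s\n' "$urls"
rm -f "$log"
```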
 

Author Comment

by:askhan1
ID: 17976028
Thank you for your reply.

I tried it out and have a few questions. When I run it against 10 entries, it also returns the URLs from lines that have
something other than "miss or hit/anything". For example, it returned the URL from this line:

1085961627.502 159 190.104.253.84 TCP_REFRESH_HIT/304 339 GET http://scripts.lycos.com/catman/login.mail.lycos.com.cm/logout.js - DIRECT/192.43.217.199 application/x-javascript

http://scripts.lycos.com/catman/login.mail.lycos.com.cm/logout.js -- url not needed.

Moreover, if I have http://scripts.lycos.com/catman/login.mail.lycos.com.cm/logout.js?anything, I want to remove the "?anything" from column 7. Is there a substring function I could apply, something like print substring($7,until(?))?

Thank you

 

Author Comment

by:askhan1
ID: 17976067
I tried this, but it gives me an error:

awk '($4~/_(miss|hit)\/200/i){print substr($7,'?',1)}' Log.txt

Author Comment

by:askhan1
ID: 17976074
Actually, I just realised I can't use substr: when I ran man substr, it did not find any manual entry for the command. That means that substr is not supported in my version of Linux.
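For what it's worth, substr is a built-in awk function rather than a standalone command, so it is described in the awk manual page and man substr finds nothing. Its arguments are substr(string, start, length). The earlier attempt also puts '?' in single quotes inside an awk program that is itself single-quoted, which ends the shell string early; string literals inside awk use double quotes. A minimal sketch, assuming the URL is the first field of each input line (in the log it would be $7 instead); the variable names are only for illustration:

```shell
# Two sample URLs: one with a query string, one without.
urls=$(printf '%s\n' \
  'http://scripts.lycos.com/catman/login.mail.lycos.com.cm/logout.js?anything' \
  'http://image.linkexchange.com/01/73/69/31/banner468x60.gif' |
awk '{
  q = index($1, "?")                      # position of the first "?", or 0 if there is none
  print (q ? substr($1, 1, q - 1) : $1)   # keep only the part before the "?"
}')

printf '%s\n' "$urls"
```

index() and substr() are both part of standard awk, so this should not depend on a particular implementation.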
 
LVL 51

Accepted Solution

by:
ahoffmann earned 50 total points
ID: 17976203
awk '($4~/_(MISS|HIT)\/200/){print $7}' log.txt

>  I want to remove the "?anything"
awk '($4~/_(MISS|HIT)\/200/){print $7}' log.txt|sed -e 's/?.*$//'
 

Author Comment

by:askhan1
ID: 17984673
Thanks Hoffman.

Actually, I worded it improperly. I want to delete all the lines containing "?".

I tried this instead, but my syntax is wrong.

awk '($4~/_(miss|HIT)\/200/*?*){print $7}' log.txt|

Can you tell me what I am doing wrong? Thanks for your help, and enjoy your points.
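This last question has no reply in the thread. One possible approach, offered as a sketch rather than a confirmed answer, is to add a second condition to the awk pattern so that lines whose URL contains a "?" are skipped. It assumes upper-case status codes as in the sample log and that the URL is always field 7; the temporary file and the variable names are only for illustration.

```shell
# Sample data: the four log lines from the question, in a temporary file.
log=$(mktemp)
cat > "$log" <<'EOF'
1085961616.474 172 190.104.253.84 TCP_MISS/200 2146 GET http://cfg.mywebsearch.com/mysaconfg.jsp?[07fiG10lKmU4M3R:IQ.TCH] - DIRECT/63.236.66.14 text/html
1085961622.602 60 68.22.217.209 TCP_HIT/200 9476 GET http://image.linkexchange.com/01/73/69/31/banner468x60.gif - NONE/- image/gif
1085961627.502 159 190.104.253.84 TCP_REFRESH_HIT/304 339 GET http://scripts.lycos.com/catman/login.mail.lycos.com.cm/logout.js - DIRECT/192.43.217.199 application/x-javascript
1085961648.792 6 12.168.85.240 TCP_MISS/503 1585 GET http://erp.water.com:9930/ - NONE/- text/html
EOF

# Keep a line only if field 4 is a MISS/200 or HIT/200 status
# AND field 7 (the URL) contains no literal "?".
# [?] is a bracket expression, so the question mark is not a regex operator.
urls=$(awk '$4 ~ /_(MISS|HIT)\/200/ && $7 !~ /[?]/ { print $7 }' "$log")

printf '%s\n' "$urls"
rm -f "$log"
```

The two conditions are joined with && inside the pattern part of the rule; a shell-style wildcard such as *?* has no meaning in an awk condition.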