Solved

sed command to search expression and delete it

Posted on 2006-11-19
Last Modified: 2012-05-05
Hi,
I currently edit my text files by hand. The data is in the following format:

1085961616.474 172 190.104.253.84 TCP_MISS/200 2146 GET http://cfg.mywebsearch.com/mysaconfg.jsp?[07fiG10lKmU4M3R:IQ.TCH] - DIRECT/63.236.66.14 text/html

1085961622.602 60 68.22.217.209 TCP_HIT/200 9476 GET http://image.linkexchange.com/01/73/69/31/banner468x60.gif - NONE/- image/gif

1085961627.502 159 190.104.253.84 TCP_REFRESH_HIT/304 339 GET http://scripts.lycos.com/catman/login.mail.lycos.com.cm/logout.js - DIRECT/192.43.217.199 application/x-javascript


1085961648.792 6 12.168.85.240 TCP_MISS/503 1585 GET http://erp.water.com:9930/ - NONE/- text/html

I reduce these files to the URLs only, i.e. as below:

http://cfg.mywebsearch.com/mysaconfg.jsp
http://image.linkexchange.com/01/73/69/31/banner468x60.gif

Now I filter my URLs as follows:
1) I look for every line that does not contain _hit/200 or _miss/200 and remove it from the file.
2) Then I take the lines with _hit/200 and _miss/200 and copy only the URL.

Currently I only have 10 records in the text file, but down the line I'll be receiving these files in megs. So I am trying to make my life easier beforehand by creating a shell script.

Here is my algorithm: get the line numbers of the lines that do not contain the expression _hit/200 or _miss/200 and delete those lines. Then, for the lines that are left, truncate everything before http: and everything after .js, .jsp, .html or .gif. That leaves me with only the URLs.

I am not so good at script syntax but here's what I've come up with so far.

sed -n -d '/!_hit/200 /='log.txt -- search log.txt for the lines with the given expression and delete them. It's not correct, as it doesn't give me the expected result, i.e. deleting the other lines.

sed -n -d '/!_miss/200 /='log.txt -- same here.
sed -n -d '/^* /200 /='log.txt    -- here I checked the man pages and I believe I am supposed to use the source and destination text, but I don't know what expression would fit.

Would someone tell me what I am doing wrong?

Thanks in advance

askhan1
Question by:askhan1
LVL 51

Expert Comment

by:ahoffmann
ID: 17975653
> .. I'll be receiving these files in megs.
You may run into problems with sed crashing on files that size if you don't use GNU sed, hence I use awk below.

> Would someone tell me what I am doing wrong?
Your sed pattern contains a /, which is also your pattern delimiter, so you need to escape it as \/ inside the pattern.
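As a rough sketch of that escaping rule (the temporary file and the two sample lines are set up only for illustration, and the status is written in uppercase as it appears in the log): the slash can be escaped, or the address can use a different delimiter. Note also that the `!` negation goes after the closing delimiter, not inside the pattern.

```shell
# Hypothetical sample data in the same format as the log lines in the question.
log=$(mktemp)
cat > "$log" <<'EOF'
1085961622.602 60 68.22.217.209 TCP_HIT/200 9476 GET http://image.linkexchange.com/01/73/69/31/banner468x60.gif - NONE/- image/gif
1085961648.792 6 12.168.85.240 TCP_MISS/503 1585 GET http://erp.water.com:9930/ - NONE/- text/html
EOF

# Option 1: escape the slash inside a /.../ address and print matching lines.
escaped=$(sed -n '/_HIT\/200 /p' "$log")

# Option 2: use another delimiter (here a comma); the first one is introduced with a backslash.
alt=$(sed -n '\,_HIT/200 ,p' "$log")

# Deleting the lines that do NOT match: the ! follows the address, then the d command.
kept=$(sed '/_HIT\/200 /!d' "$log")

printf '%s\n' "$kept"
rm -f "$log"
```

All three commands should leave only the TCP_HIT/200 line from the sample.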

I'd use:

awk '($4~/_(miss|hit)\/200/i){print $7}' log.txt
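A caveat on the command above, offered as a hedged note rather than something stated in the thread: awk has no `/i` flag on regex literals. It appears that the trailing `i` is read as an empty variable concatenated onto the match result, which would explain unrelated lines being printed. awk matching is also case-sensitive, and the log writes TCP_MISS and TCP_HIT in uppercase. A minimal sketch under those assumptions (the temporary file is set up only for illustration; the sample lines are the ones from the question):

```shell
# Sample lines copied from the question into a throwaway file.
log=$(mktemp)
cat > "$log" <<'EOF'
1085961616.474 172 190.104.253.84 TCP_MISS/200 2146 GET http://cfg.mywebsearch.com/mysaconfg.jsp?[07fiG10lKmU4M3R:IQ.TCH] - DIRECT/63.236.66.14 text/html
1085961622.602 60 68.22.217.209 TCP_HIT/200 9476 GET http://image.linkexchange.com/01/73/69/31/banner468x60.gif - NONE/- image/gif
1085961627.502 159 190.104.253.84 TCP_REFRESH_HIT/304 339 GET http://scripts.lycos.com/catman/login.mail.lycos.com.cm/logout.js - DIRECT/192.43.217.199 application/x-javascript
1085961648.792 6 12.168.85.240 TCP_MISS/503 1585 GET http://erp.water.com:9930/ - NONE/- text/html
EOF

# Match the status field exactly as the log writes it (uppercase), ending in /200.
urls=$(awk '$4 ~ /_(MISS|HIT)\/200$/ {print $7}' "$log")

# Case-insensitive variant: normalise the field with toupper() before matching.
ci=$(awk 'toupper($4) ~ /_(MISS|HIT)\/200$/ {print $7}' "$log")

printf '%s\n' "$urls"
rm -f "$log"
```

With the four sample lines this should print only the two URLs whose status is TCP_MISS/200 or TCP_HIT/200.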
 

Author Comment

by:askhan1
ID: 17976028
Thank you for your reply.

I tried it out and have a few questions. When I run it against 10 entries, it also returns the URLs of lines that have something other than "_miss/200" or "_hit/200". For example, it returned the URL from this line:

1085961627.502 159 190.104.253.84 TCP_REFRESH_HIT/304 339 GET http://scripts.lycos.com/catman/login.mail.lycos.com.cm/logout.js - DIRECT/192.43.217.199 application/x-javascript

http://scripts.lycos.com/catman/login.mail.lycos.com.cm/logout.js -- url not needed.

Moreover, if I have http://scripts.lycos.com/catman/login.mail.lycos.com.cm/logout.js?anything, I want to remove the "?anything" from column 7. Is there a substring function I could apply, something like print substring($7,until(?))?

Thank you

 

Author Comment

by:askhan1
ID: 17976067
I tried this, but it gives me an error:

awk '($4~/_(miss|hit)\/200/i){print substr($7,'?',1)}' Log.txt

Author Comment

by:askhan1
ID: 17976074
Actually, I just realised I can't use substr: when I ran man substr, it did not find any manual entry for the command. That means that substr is not supported in my version of Linux.
 
LVL 51

Accepted Solution

by:ahoffmann (earned 50 total points)
ID: 17976203
awk '($4~/_(MISS|HIT)\/200/){print $7}' log.txt

>  I want to remove the "?anything"
awk '($4~/_(MISS|HIT)\/200/){print $7}' log.txt|sed -e 's/?.*$//'
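The trimming can also be done inside awk, without the extra sed process. substr and index are awk built-in functions rather than separate commands, so they have no manual page of their own, and string literals inside the single-quoted awk program need double quotes. A sketch, assuming the status field is uppercase as in the log (the temporary file is set up only for illustration):

```shell
# Two sample lines from the question in a throwaway file.
log=$(mktemp)
cat > "$log" <<'EOF'
1085961616.474 172 190.104.253.84 TCP_MISS/200 2146 GET http://cfg.mywebsearch.com/mysaconfg.jsp?[07fiG10lKmU4M3R:IQ.TCH] - DIRECT/63.236.66.14 text/html
1085961622.602 60 68.22.217.209 TCP_HIT/200 9476 GET http://image.linkexchange.com/01/73/69/31/banner468x60.gif - NONE/- image/gif
EOF

urls=$(awk '$4 ~ /_(MISS|HIT)\/200$/ {
    url = $7
    q = index(url, "?")                      # position of the first "?", 0 if there is none
    if (q > 0) url = substr(url, 1, q - 1)   # keep everything before the "?"
    print url
}' "$log")

printf '%s\n' "$urls"
rm -f "$log"
```

For the two sample lines this should give the same two URLs listed as the desired result in the question.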
 

Author Comment

by:askhan1
ID: 17984673
Thanks Hoffman.

Actually, I worded it improperly. I want to delete all the lines containing "?".

I tried this instead, but my syntax is wrong.

awk '($4~/_(miss|HIT)\/200/*?*){print $7}' log.txt|

Can you tell me what I am doing wrong? Also, thanks for your help, and enjoy your points.
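This follow-up was not answered in the thread. One possible approach, sketched under the assumption that the status field is uppercase as in the log: in awk a second condition is joined with &&, and a literal "?" in a regex has to be escaped. The temporary file is set up only for illustration.

```shell
# Two sample lines from the question in a throwaway file.
log=$(mktemp)
cat > "$log" <<'EOF'
1085961616.474 172 190.104.253.84 TCP_MISS/200 2146 GET http://cfg.mywebsearch.com/mysaconfg.jsp?[07fiG10lKmU4M3R:IQ.TCH] - DIRECT/63.236.66.14 text/html
1085961622.602 60 68.22.217.209 TCP_HIT/200 9476 GET http://image.linkexchange.com/01/73/69/31/banner468x60.gif - NONE/- image/gif
EOF

# Keep MISS/200 and HIT/200 lines, but skip any whose URL (field 7) contains a "?".
urls=$(awk '$4 ~ /_(MISS|HIT)\/200$/ && $7 !~ /\?/ {print $7}' "$log")

printf '%s\n' "$urls"
rm -f "$log"
```

With the sample lines only the banner468x60.gif URL should remain, because the mysaconfg.jsp URL carries a query string.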