Solved

sed command to search expression and delete it

Posted on 2006-11-19
6
324 Views
Last Modified: 2012-05-05
Hi,
I change my text files here manually which have the data in the following format.

1085961616.474 172 190.104.253.84 TCP_MISS/200 2146 GET http://cfg.mywebsearch.com/mysaconfg.jsp?[07fiG10lKmU4M3R:IQ.TCH] - DIRECT/63.236.66.14 text/html

1085961622.602 60 68.22.217.209 TCP_HIT/200 9476 GET http://image.linkexchange.com/01/73/69/31/banner468x60.gif - NONE/- image/gif

1085961627.502 159 190.104.253.84 TCP_REFRESH_HIT/304 339 GET http://scripts.lycos.com/catman/login.mail.lycos.com.cm/logout.js - DIRECT/192.43.217.199 application/x-javascript


1085961648.792 6 12.168.85.240 TCP_MISS/503 1585 GET http://erp.water.com:9930/ - NONE/- text/html

I change these files to have the url's only i.e. as below

http://cfg.mywebsearch.com/mysaconfg.jsp
http://image.linkexchange.com/01/73/69/31/banner468x60.gif

Now I filter out my urls according these ways
1) I look for everything else other than _hit/200 and _miss/200 and remove it from the file
2) Then I take the line with _hit/200 and _miss/200 and copy the url only.

Currently I only have 10 records in the text file, however down the line I'll be receiving these files in megs. As a result I am trying to make my life easier before hand by creating a shell script.

Now here is my algorithm. get the line number that do not have the expression _hit/200 andn _miss/200, pick it and delete it. then for the ones left truncate everything before http: and after .js or .jsp or .html or .gif it will leave me with only the urls.

I am not so good at script syntax but here's what I've come up with so far.

sed -n -d '/!_hit/200 /='log.txt -- search in log.txt file the line with the given expression and delete it. it's not correct as it doesn't give me the expected result i.e. delete the other lines.

sed -n -d '/!_miss/200 /='log.txt -- same here
sed -n -d '/^* /200 /='log.txt    -- now here I checked man pages and I believe I am supposed to use the source and destination text in the file but don't know what expression would fit in.

Would someone tell me what am I doing wrong?

Thanks in advance

askhan1
0
Comment
Question by:askhan1
  • 4
  • 2
6 Comments
 
LVL 51

Expert Comment

by:ahoffmann
ID: 17975653
> .. I'll be receiving these files in megs.
you may encounter problems with crashed sed if you don't use Gnu's sed, hence I use awk below

> Would someone tell me what am I doing wrong?
your sed pattern contains a / which is also your pattern delimiter, hence you need to escape / as \/ inside your pattern

I'd use:

awk '($4~/_(miss|hit)\/200/i){print $7}' log.txt
0
 

Author Comment

by:askhan1
ID: 17976028
Thank you for your reply.

I tried it out and have a few qeustions. When I run it against 10 entries it also brings me the urls that have
anything other than "miss or hit/anything". Say it brought me the result of this line

1085961627.502 159 190.104.253.84 TCP_REFRESH_HIT/304 339 GET http://scripts.lycos.com/catman/login.mail.lycos.com.cm/logout.js - DIRECT/192.43.217.199 application/x-javascript

http://scripts.lycos.com/catman/login.mail.lycos.com.cm/logout.js -- url not needed.

Moreover, say if I have http://scripts.lycos.com/catman/login.mail.lycos.com.cm/logout.js?anything I want to remove the "?anything" from column 7. Is there a substring function I could apply to print substring($7,until(?))

Thank you

0
 

Author Comment

by:askhan1
ID: 17976067
I tried this but gives me an error

awk '($4~/_(miss|hit)\/200/i){print substr($7,'?',1)}' Log.txt
0
Master Your Team's Linux and Cloud Stack

Come see why top tech companies like Mailchimp and Media Temple use Linux Academy to build their employee training programs.

 

Author Comment

by:askhan1
ID: 17976074
Actually I just realised I can't use substr as when I did man substr it did not find any manual entries for the command. That means that substr is not supported in my version of linux.
0
 
LVL 51

Accepted Solution

by:
ahoffmann earned 50 total points
ID: 17976203
awk '($4~/_(miss|HIT)\/200/){print $7}' log.txt

>  I want to remove the "?anything"
awk '($4~/_(miss|HIT)\/200/){print $7}' log.txt|sed -e 's/?.*$//'
0
 

Author Comment

by:askhan1
ID: 17984673
Thanks Hoffman.

Actually, I worded it improperly. I want to delete all the lines containing "?".

I triedthis instead but my syntax is wrong.

awk '($4~/_(miss|HIT)\/200/*?*){print $7}' log.txt|

Can you tell me what am I doing wrong? Moreover, thanks for your help and enjoy your points.
0

Featured Post

Master Your Team's Linux and Cloud Stack

Come see why top tech companies like Mailchimp and Media Temple use Linux Academy to build their employee training programs.

Question has a verified solution.

If you are experiencing a similar issue, please ask a related question

Suggested Solutions

Title # Comments Views Activity
what does getent do 3 50
How to use docker to run your java program on a linux box 9 105
Linux : adding the new user with password option, 11 107
Problem to start Neon 20 106
The purpose of this article is to fix the unknown display problem in Linux Mint operating system. After installing the OS if you see Display monitor is not recognized then we can install "MESA" utilities to fix this problem or we can install additio…
The purpose of this article is to demonstrate how we can upgrade Python from version 2.7.6 to Python 2.7.10 on the Linux Mint operating system. I am using an Oracle Virtual Box where I have installed Linux Mint operating system version 17.2. Once yo…
This Micro Tutorial will give you a basic overview how to record your screen with Microsoft Expression Encoder. This program is still free and open for the public to download. This will be demonstrated using Microsoft Expression Encoder 4.

831 members asked questions and received personalized solutions in the past 7 days.

Join the community of 500,000 technology professionals and ask your questions.

Join & Ask a Question