popy


Remove content from a string

Hi all,

I would like to remove some content from a string that I read from a file.

In general, the line looks like one of these:

text here <a href="http://somelink.com/somedir/">some text </a> text here
or
text here <a href="http://somelink.com/">some text </a> text here
or
text here <a href="http://somelink.com/somedir/page.html">some text </a> text here

and I would like to keep only the URL, like
http://somelink.com/
or
http://somelink.com/somedir/
or
http://somelink.com/somedir/page.html

Can you help me?
ASKER CERTIFIED SOLUTION
clockwatcher

This solution is only available to members.
jhurst

I suspect that:

$line =~ s/^.*?<a href="([^"]+)">[^<]*<\/a>.*$/$1/;

may be closer.
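A minimal sketch of that substitution wrapped in a subroutine and run against the three sample lines from the question. The name `first_link_url` is hypothetical; it assumes one link per line with a double-quoted href, and returns the line unchanged when no link matches.

```perl
use strict;
use warnings;

# Hypothetical helper: reduce a line to the href of its first <a> tag.
# Assumes the href value is double-quoted; non-matching lines come back unchanged.
sub first_link_url {
    my ($line) = @_;
    # ^.*?  strips the leading text, [^<]*<\/a> consumes the link text,
    # and .*$ drops everything after the closing tag.
    $line =~ s/^.*?<a href="([^"]+)">[^<]*<\/a>.*$/$1/;
    return $line;
}

my @samples = (
    'text here <a href="http://somelink.com/somedir/">some text </a> text here',
    'text here <a href="http://somelink.com/">some text </a> text here',
    'text here <a href="http://somelink.com/somedir/page.html">some text </a> text here',
);

print first_link_url($_), "\n" for @samples;
```

If the link text itself contains tags (for example `<b>`), `[^<]*` will not match and the line is returned as-is.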
Not to beat a dead horse here, but if there are multiple links on a line, like the following:

text here <a href="http://somelink.com/somedir/">some test </a> text here <a href="http://www.boston.com">asdf</a>
or
text here <a href="http://somelink.com/">some text </a> text here
or
text here <a href="http://somelink.com/somedir/page.html">some text </a> text here

you should try this modification of clockwatcher's solution:

#!/usr/local/bin/perl
use strict;
use warnings;

# Print every double-quoted http:// URL found on each input line.
while (my $line = <>) {
    my @matches = $line =~ m#"(http://[^"]*)"#gi;
    print "$_\n" for @matches;
}
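The same loop can be packaged as a subroutine so it is easier to test. This sketch (the name `extract_urls` is hypothetical) differs from the loop in one deliberate way: it only captures `href` attribute values, so quoted `http://` strings elsewhere, such as an `<img src="...">`, are ignored. It still assumes double-quoted values and is regex-based, so it is not a full HTML parser.

```perl
use strict;
use warnings;

# Hypothetical helper: return every href value that starts with http://
# from one line of HTML. Assumes href values are double-quoted.
sub extract_urls {
    my ($line) = @_;
    # In list context, m//g returns the captured group from every match.
    return $line =~ m#\bhref\s*=\s*"(http://[^"]*)"#gi;
}

my $line = 'text here <a href="http://somelink.com/somedir/">some test </a> '
         . 'text here <a href="http://www.boston.com">asdf</a>';

print "$_\n" for extract_urls($line);
```

Call it in list context (`my @urls = extract_urls($line);`); in scalar context the match only returns true or false.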