• Status: Solved

Remove links and text from list.

I am looking to remove anything (including the 'marker' text) between:

<a href="

and the first instance of:

</li>

So the following:

<li> blah blah <a href="blah.com"> blah </a> </li>
<li> blah blah <a href="blah2.com"> blah blah blah </a> </li>

Looks like:

<li> blah blah
<li> blah blah
Asked by: stakor
 
Terry Woods (IT Guru) commented:
$data =~ s/^(.*?)<a href=".*?<\/li>/$1/mg;

 
Terry Woods (IT Guru) commented:
Actually this is probably better:
$data =~ s/(<li>.*?)<a href=".*?<\/li>/$1/sg;
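
A short sketch of why the second pattern may be the safer choice: with /m and without /s, "." cannot cross a newline, so the first pattern only matches when <a href=" and </li> sit on the same line. The sample input below is made up for illustration, and both commands slurp it with -0777 so the two patterns see the same string.

```shell
# Hypothetical input: the link text wraps onto a second line.
input='<li> blah blah <a href="blah.com">
blah </a> </li>'

# First pattern (/mg): "." stops at the newline, so nothing matches.
out_m=$(printf '%s\n' "$input" | perl -0777 -pe 's/^(.*?)<a href=".*?<\/li>/$1/mg')

# Second pattern (/sg): "." also matches newlines, so the link is removed.
out_s=$(printf '%s\n' "$input" | perl -0777 -pe 's/(<li>.*?)<a href=".*?<\/li>/$1/sg')

printf 'with /mg:\n%s\n\nwith /sg:\n%s\n' "$out_m" "$out_s"
```

If every list item is guaranteed to sit on one line, both patterns should behave the same on it.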

 
stakor (Author) commented:
Is there a way to tweak this, so that the file can be piped into it?
 
Terry Woods (IT Guru) commented:
Maybe you want a one-liner like this?

perl -0777  -i.bak -pe 's/(<li>.*?)<a href=".*?<\/li>/$1/sg' file.txt



This alters the file "file.txt" but creates a backup of it as "file.txt.bak".
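
For the piped case that was asked about, a minimal sketch: when perl -p is given no filename it reads standard input and writes standard output, so leaving out -i.bak and file.txt should be enough. The sample data is the question's own example; the variable names are only for illustration.

```shell
# Sample input standing in for whatever is piped in.
input='<li> blah blah <a href="blah.com"> blah </a> </li>
<li> blah blah <a href="blah2.com"> blah blah blah </a> </li>'

# No -i and no filename: perl reads stdin and prints the result to stdout.
# -0777 still slurps the whole input so that /s can span lines.
result=$(printf '%s\n' "$input" | perl -0777 -pe 's/(<li>.*?)<a href=".*?<\/li>/$1/sg')
printf '%s\n' "$result"
```

In real use the printf would be replaced by whatever command produces the list, and the output can be redirected to a new file with ">".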
 
ozo commented:
If the intended purpose of the (<li>.*?) was to prevent removal of things like
 <a href="blah.com"> blah </a>
 <li> blah blah  </li>
you may want to note that it does not prevent removal of something like
 <li> blah blah </li>
 <a href="blah2.com"> blah blah blah </a>
 <li> blah blah blah blah blah </li>
 
stakor (Author) commented:
The items to be removed would exist inside of the <li> </li> tags. There will probably be more cleaning up that will be required, but at this stage, that is fine. I appreciate the feedback.
 
Terry Woods (IT Guru) commented:
Good point, ozo. This might be better:

perl -0777  -i.bak -pe 's/(<li>(?:(?!<\/li>).)*?)<a href=".*?<\/li>/$1/sg' file.txt

 
ozo commented:
But the substitution in http:#a39233593 is capable of removing items not inside of <li> </li> tags.
If your input file has no items outside of <li> </li> tags then it may not matter for your application.
But if your input file has no items outside of <li> </li> tags then the (<li>.*?) would seem to serve no purpose.
 
stakor (Author) commented:
The application that I am going to use this for only has data inside of <li> </li> tags. There is no other data. As long as the 'delete' only goes to the next </li> tag, it should be good. I just don't want to have it get greedy and delete an entire <li> ... </li> set.
 
ozo commented:
s/(<li>(?:(?!<\/li>).)*?)<a href=".*?<\/li>/$1/sg
takes care of the case of
<li> blah blah </li>
<li> blah blah <a href="blah2.com"> blah blah blah </a> </li>
but not the case of
<li> blah blah
<li> blah blah <a href="blah.com"> blah </a>
<li> blah blah <a href="blah2.com"> blah blah blah </a>
</li>
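
The unclosed-tag case can be checked with a small sketch that pipes those four lines through the same substitution. The lookahead only guards against crossing </li>, so an <li> with no closing tag lets the match run on to the next </li> it can find.

```shell
# Three <li> openings but only one closing </li>, on the last line.
input='<li> blah blah
<li> blah blah <a href="blah.com"> blah </a>
<li> blah blah <a href="blah2.com"> blah blah blah </a>
</li>'

# The capture starts at the first <li>, reaches the first link on line 2,
# and the deletion then extends to the only </li>, on line 4.
result=$(printf '%s\n' "$input" | perl -0777 -pe 's/(<li>(?:(?!<\/li>).)*?)<a href=".*?<\/li>/$1/sg')
printf '%s\n' "$result"
```

Both links are gone, but so is the third list item: only two <li> lines remain in the output.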
 
stakor (Author) commented:
There will not be nested <li></li> sets. That I am comfortable with. There is a chance that there might be the occasional set that does not have a link. But I will have to see if that happens.
 
Terry Woods (IT Guru) commented:
You'll need the latest version I posted to successfully handle data where there exists a <li> tag without a link in it.