Solved

Remove links and text from list.

Posted on 2013-06-09
Last Modified: 2013-06-09
I am looking to remove anything (Including the 'marker' text) between:

<a href="

and the first instance of:

</li>

So the following:

<li> blah blah <a href="blah.com"> blah </a> </li>
<li> blah blah <a href="blah2.com"> blah blah blah </a> </li>

Looks like:

<li> blah blah
<li> blah blah
Question by:stakor
12 Comments
 
LVL 35

Expert Comment

by:Terry Woods
ID: 39233532
$data =~ s/^(.*?)<a href=".*?<\/li>/$1/mg;

 
LVL 35

Expert Comment

by:Terry Woods
ID: 39233537
Actually this is probably better:
$data =~ s/(<li>.*?)<a href=".*?<\/li>/$1/sg;
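As a rough check, the substitution above can be run against the two sample lines from the question. This is only a sketch: the `strip_links` helper name and the `$sample` variable are made up for illustration, and the pattern is copied unchanged from the comment.

```perl
use strict;
use warnings;

# Hypothetical helper wrapping the substitution from the comment above.
sub strip_links {
    my ($data) = @_;
    # /s lets "." match newlines; /g repeats the substitution for every match.
    $data =~ s/(<li>.*?)<a href=".*?<\/li>/$1/sg;
    return $data;
}

# The sample input from the question.
my $sample = <<'END';
<li> blah blah <a href="blah.com"> blah </a> </li>
<li> blah blah <a href="blah2.com"> blah blah blah </a> </li>
END

print strip_links($sample);
```

Each line should come out as `<li> blah blah ` with a trailing space, because the text in front of the link is captured in `$1` and put back as-is.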

 

Author Comment

by:stakor
ID: 39233583
Is there a way to tweak this, so that the file can be piped into it?

 
LVL 35

Accepted Solution

by:Terry Woods (earned 2000 total points)
ID: 39233593
Maybe you want a one-liner like this?

perl -0777  -i.bak -pe 's/(<li>.*?)<a href=".*?<\/li>/$1/sg' file.txt



This alters the file "file.txt" but creates a backup of it as "file.txt.bak"
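If the text should arrive through a pipe instead of being edited in place, one possible approach is to slurp a filehandle and print the result. The sketch below assumes a hypothetical helper named `strip_links_from_fh`, and it uses an in-memory filehandle to stand in for piped input so that it runs on its own.

```perl
use strict;
use warnings;

# Hypothetical helper: read everything from a filehandle, strip the links,
# and return the cleaned text.
sub strip_links_from_fh {
    my ($fh) = @_;
    local $/;                          # undef separator: slurp all input, like -0777
    my $data = <$fh>;
    $data = '' unless defined $data;   # empty input yields an empty string
    $data =~ s/(<li>.*?)<a href=".*?<\/li>/$1/sg;
    return $data;
}

# In-memory filehandle standing in for piped input.
my $input = qq{<li> blah blah <a href="blah.com"> blah </a> </li>\n};
open my $fh, '<', \$input or die "cannot open in-memory handle: $!";
print strip_links_from_fh($fh);
close $fh;
```

In a real filter the helper would be called with `\*STDIN`. The one-liner can also be used as a filter by dropping `-i.bak`: `perl -0777 -pe '...' < file.txt` reads standard input and writes the result to standard output.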
 
LVL 84

Expert Comment

by:ozo
ID: 39233641
If the intended purpose of the (<li>.*?) was to prevent removal of things like
 <a href="blah.com"> blah </a>
 <li> blah blah  </li>
you may want to note that it does not prevent removal of something like
 <li> blah blah </li>
 <a href="blah2.com"> blah blah blah </a>
 <li> blah blah blah blah blah </li>
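The second case can be reproduced with a short script. This is a sketch only: `strip_links_simple` is a hypothetical name for a helper that wraps the pattern from the accepted solution, and the sample mirrors the three lines above without the leading spaces.

```perl
use strict;
use warnings;

# Hypothetical helper using the simpler pattern from the accepted solution.
sub strip_links_simple {
    my ($data) = @_;
    $data =~ s/(<li>.*?)<a href=".*?<\/li>/$1/sg;
    return $data;
}

my $sample = <<'END';
<li> blah blah </li>
<a href="blah2.com"> blah blah blah </a>
<li> blah blah blah blah blah </li>
END

# The match starts at the first <li>, runs past its </li> to reach the stray
# link, and then continues to the next </li>, so the last item is lost.
print strip_links_simple($sample);
```

Only the first list item survives, followed by an empty line: the stray link and the whole third item are deleted together.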
 

Author Comment

by:stakor
ID: 39233655
The items to be removed would exist inside of the <li> </li> tags. There will probably be more cleaning up required, but at this stage, that is fine. I appreciate the feedback.
 
LVL 35

Expert Comment

by:Terry Woods
ID: 39233659
Good point, ozo. This might be better:

perl -0777  -i.bak -pe 's/(<li>(?:(?!<\/li>).)*?)<a href=".*?<\/li>/$1/sg' file.txt
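To see what the extra `(?:(?!<\/li>).)*?` buys, here is a small sketch that applies the same pattern to input containing a link-free item and a stray link outside any list item. The helper name `strip_links_strict` and the sample text are assumptions made for the demonstration.

```perl
use strict;
use warnings;

# Hypothetical helper wrapping the pattern from the one-liner above.
sub strip_links_strict {
    my ($data) = @_;
    # (?:(?!<\/li>).)*? consumes one character at a time, but only while the
    # text at that position is not the start of "</li>", so the captured
    # prefix cannot run past the end of its own list item.
    $data =~ s/(<li>(?:(?!<\/li>).)*?)<a href=".*?<\/li>/$1/sg;
    return $data;
}

my $sample = <<'END';
<li> blah blah </li>
<a href="blah2.com"> blah blah blah </a>
<li> blah blah <a href="blah.com"> blah </a> </li>
END

print strip_links_strict($sample);
```

The first two lines should pass through unchanged, and only the link inside the last item is removed, leaving `<li> blah blah ` there.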

 
LVL 84

Expert Comment

by:ozo
ID: 39233666
But the substitution in the accepted solution (ID: 39233593) is capable of removing items not inside of <li> </li> tags.
If your input file has no items outside of <li> </li> tags then it may not matter for your application.
But if your input file has no items outside of <li> </li> tags then the (<li>.*?) would seem to serve no purpose.
 

Author Comment

by:stakor
ID: 39233674
The application that I am going to use this for only has data inside of <li> </li> tags. There is no other data. As long as the 'delete' only goes to the next </li> tag, it should be good. I just don't want to have it get greedy and delete an entire <li> ... </li> set.
 
LVL 84

Expert Comment

by:ozo
ID: 39233676
s/(<li>(?:(?!<\/li>).)*?)<a href=".*?<\/li>/$1/sg
takes care of the case of
<li> blah blah </li>
<li> blah blah <a href="blah2.com"> blah blah blah </a> </li>
but not the case of
<li> blah blah
<li> blah blah <a href="blah.com"> blah </a>
<li> blah blah <a href="blah2.com"> blah blah blah </a>
</li>
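The unclosed case can be checked the same way. In this sketch, `strip_links_strict` is a hypothetical helper name wrapping the pattern quoted above, and the sample is the four-line input from this comment.

```perl
use strict;
use warnings;

# Hypothetical helper wrapping the pattern quoted above.
sub strip_links_strict {
    my ($data) = @_;
    $data =~ s/(<li>(?:(?!<\/li>).)*?)<a href=".*?<\/li>/$1/sg;
    return $data;
}

my $sample = <<'END';
<li> blah blah
<li> blah blah <a href="blah.com"> blah </a>
<li> blah blah <a href="blah2.com"> blah blah blah </a>
</li>
END

# With no </li> between items, the first link's match runs on to the only
# </li> in the input and swallows the third item along the way.
print strip_links_strict($sample);
```

Only the first two `<li>` lines remain (the second with its link removed), so an entire item is deleted whenever closing tags are missing.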
 

Author Comment

by:stakor
ID: 39233691
There will not be nested <li></li> sets. That I am comfortable with. There is a chance that there might be the occasional set that does not have a link. But I will have to see if that happens.
 
LVL 35

Expert Comment

by:Terry Woods
ID: 39233695
You'll need the latest version I posted to successfully handle data that contains a <li> tag without a link in it.

Question has a verified solution.