• Status: Solved

Sort HTML File

I want to take an HTML file and edit it so that the data in it is easier to scrape. The edit should do two things:

1.) Replace every <img ... tag with a line break followed by <img ...

2.) Replace every </img> tag with </img> followed by a line break.

So, a file that has:

blah<img alt="" src="http://test.com/test.jpg"></img><img alt="" src="http://test.com/test2.jpg"></img>

becomes:

blah
<img alt="" src="http://test.com/test.jpg"></img>

<img alt="" src="http://test.com/test2.jpg"></img>
Asked by: stakor

1 Solution

ozo commented:
perl -pe 's/(?=<img)/\n/g;s{(?<=</img>)}{\n}g' <<END
blah<img alt="" src="http://test.com/test.jpg"></img><img alt="" src="http://test.com/test2.jpg"></img>
END