• Status: Solved

Sort HTML File

I want to edit an HTML file so that the data in it is easier to scrape. The edit should do two things:

1.) Every <img ... tag is replaced with a newline followed by <img ...

2.) Every </img> tag is replaced with </img> followed by a newline.

So, a file that has:

blah<img alt="" src="http://test.com/test.jpg"></img><img alt="" src="http://test.com/test2.jpg"></img>

becomes:

blah
<img alt="" src="http://test.com/test.jpg"></img>

<img alt="" src="http://test.com/test2.jpg"></img>
Asked by: stakor
1 Solution
 
ozo commented:
perl -pe 's/(?=<img)/\n/g;s{(?<=</img>)}{\n}g' <<END
blah<img alt="" src="http://test.com/test.jpg"></img><img alt="" src="http://test.com/test2.jpg"></img>
END