• Status: Solved

regular expression

Dear Experts,

I have a variable that contains an entire HTML page.
Within that string there are some badly formed XHTML tags (e.g. tags in capitals, unquoted attributes, etc.).
I have found a solution for most of these issues, but not for this one:
<img src="arrow.gif" width="45" height="46" alt="arrow" border="0"> has to become <img src="arrow.gif" width="45" height="46" alt="arrow" border="0" />

I think this can be achieved with a regular expression (but that's not my cup of tea).

Many thanks in advance.

Gijs
 
soapergem commented:
$html_text = preg_replace('@<img([^/>]+)>@s', '<img$1/>', $html_text);
 
soapergem commented:
(That's assuming that you have the whole HTML page stored in variable $html_text.)
 
soapergem commented:
On second thought, use this:

    $html_text = preg_replace('@<img([^/>]+)>@si', '<img$1/>', $html_text);

The only difference is the added "i" modifier, which makes the match case-insensitive.
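
To see what this pattern does and does not touch, here is a minimal sketch. The function name close_img_simple and the sample strings are made up for illustration. Because the character class [^/>] excludes "/", a tag that contains a slash anywhere (an already closed tag, but also a path such as src="images/arrow.gif") is left unchanged:

```php
<?php
// Hypothetical wrapper around the pattern above (the name is not from the thread).
function close_img_simple(string $html): string
{
    // [^/>]+ matches attribute text that contains neither "/" nor ">".
    $result = preg_replace('@<img([^/>]+)>@si', '<img$1/>', $html);

    // preg_replace() returns null on error; fall back to the input in that case.
    return $result ?? $html;
}

// Unclosed tag without any slash: gets closed.
echo close_img_simple('<img src="arrow.gif" alt="arrow">'), "\n";

// Already closed tag: the "/" prevents a match, so it is left alone.
echo close_img_simple('<img src="arrow.gif" />'), "\n";

// Unclosed tag with a path in src: also not matched, so it stays unclosed.
echo close_img_simple('<img src="images/arrow.gif">'), "\n";
```

If the page uses image paths with directories, the last case is worth testing before relying on this pattern.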
 
Roonaan commented:
Hi Gijsbert,

You can try this expression. It is a little different from the one soapergem posted, but might be more accurate. You would have to test with both expressions:

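  function correct_img($html_text) {
    // $1: everything after "<img" up to the last non-space character; \2: that last character.
    // The "e" modifier evaluates the replacement string as PHP code.
    $preg = '#<img([^>]*(\S))\s*>#sie';
    // Append "/" only if the tag does not already end with one.
    $repl = '"<img$1".("\2"=="/" ? "" : "/").">"';
    return preg_replace($preg,$repl, $html_text);
  }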

I ran some tests with both expressions. The left column is the input, the middle column is the output of soapergem's expression, and the right column is the output of the function above:
Input                             soapergem's expression            correct_img()
<img src="apple">                 <img src="apple"/>                <img src="apple"/>
<img src=""/>                     <img src=""/>                     <img src=""/>
<img src=""/ >                    <img src=""/ >                    <img src=""/>
<img src="apple"  >               <img src="apple"  />              <img src="apple"/>
<img src="apple"        >         <img src="apple"        />        <img src="apple"/>
<img src="apple"  /   >           <img src="apple"  /   >           <img src="apple"  />
<IMG src="apple"  />              <IMG src="apple"  />              <img src="apple"  />

-r-
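
The "e" modifier that correct_img() relies on was deprecated in PHP 5.5.0 and removed in PHP 7.0.0, so the function as posted will not run on current PHP versions. A minimal sketch of the same idea using preg_replace_callback() might look like this. The name correct_img_callback is made up for illustration, and the sample inputs are taken from the left column of the table:

```php
<?php
// Hypothetical name; same pattern as correct_img(), minus the "e" modifier.
function correct_img_callback(string $html_text): string
{
    // Group 1: everything after "<img" up to the last non-space character.
    // Group 2: that last non-space character, used to detect an existing "/".
    $pattern = '#<img([^>]*(\S))\s*>#si';

    $result = preg_replace_callback(
        $pattern,
        function (array $m): string {
            // Append "/" only when the tag is not already self-closed.
            return '<img' . $m[1] . ($m[2] === '/' ? '' : '/') . '>';
        },
        $html_text
    );

    // preg_replace_callback() returns null on error; fall back to the input.
    return $result ?? $html_text;
}

// A few inputs from the table, printed next to their rewritten form.
$samples = [
    '<img src="apple">',
    '<img src=""/ >',
    '<img src="apple"  /   >',
    '<IMG src="apple"  />',
];
foreach ($samples as $sample) {
    echo str_pad($sample, 28), correct_img_callback($sample), "\n";
}
```

On these inputs the sketch should give the same results as the right column of the table. One difference from the "e" version is that the matched text is passed to the callback as-is, with no escaping step in between.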
 
gijsbertjr (author) commented:
Dear soapergem and Roonaan,

First of all, my apologies for the late reply.
Thank you both for your help.
I ran some tests with both expressions and came to the same conclusion as Roonaan.

I'll go with his solution.

Best regards,

Gijs