janhoedt

Regex: select values between html tags?

Hi,

How do I get (with regex) everything between the tags <table class='whatever' > ... </table> (including the tags themselves) in an HTML page?

J.
Francisco Igor

You can use

/<table[^>]*>(.|\s)+?<\/table>/ig

The [^>]* expression matches everything inside the opening tag (its attributes) up to the first '>' character, which closes the tag.
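As a rough illustration, here is how the pattern could be tried with the .NET regex engine from PowerShell (the language this thread turns to further down). The sample string is made up for the example. The "i" flag of the JavaScript-style literal becomes the IgnoreCase option, and [regex]::Matches already returns every match, so no "g" flag is needed.

```powershell
# Hypothetical sample input; not taken from the thread.
$html = @'
<p>intro</p>
<table class='whatever'>
  <tr><td>a</td></tr>
</table>
<TABLE border="1"><tr><td>b</td></tr></TABLE>
'@

# Same pattern as above, without the /.../ delimiters and flags.
$pattern = '<table[^>]*>(.|\s)+?</table>'

# IgnoreCase stands in for the "i" flag.
$found = [regex]::Matches($html, $pattern, 'IgnoreCase')

$found.Count       # number of tables found
$found[0].Value    # first table, including its tags
```

Note that the pattern does not look at the class attribute at all, so every table in the input is returned.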
janhoedt

ASKER

That gets all the tables, right? Not only the one with class='whatever'?
Yes, it gets all tables (TABLE tags). What programming language are you planning to use for this?
There could be easier ways to accomplish the same task using language-specific utilities or libraries.
PowerShell. The goal is to get the content of an HTML page (which I created myself), search for a specific HTML tag, and replace it with a new HTML tag (which contains a new value).

I know out-html and other ways to create HTML, but I'd like to keep editing the original HTML page in Dreamweaver so I have advanced editing possibilities.
To clarify:

$webpage = get-content MyWebPage Location

$tables = regex on $webpage

$toreplacehtmlblock = $tables | where-object contains class=...

$newhtmlblock = replace string in $toreplacehtmlblock

Set-content to new page
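The steps above could be sketched in runnable form roughly as follows. This is only a sketch under several assumptions: the file paths, the class name "whatever", the stand-in page, and the replacement block are all made up for illustration, and the regex only copes with tables that are not nested.

```powershell
# Hypothetical paths; adjust to your own page.
$sourcePath = Join-Path ([System.IO.Path]::GetTempPath()) 'MyWebPage.html'
$targetPath = Join-Path ([System.IO.Path]::GetTempPath()) 'MyWebPage.new.html'

# Stand-in page so the sketch runs on its own.
Set-Content -Path $sourcePath -Value @'
<html><body>
<table class="whatever"><tr><td>old value</td></tr></table>
<table class="other"><tr><td>keep me</td></tr></table>
</body></html>
'@

# 1. Read the whole page as one string (-Raw needs PowerShell 3.0 or later).
$webpage = Get-Content -Path $sourcePath -Raw

# 2. Collect every table (non-nested tables only).
$tables = [regex]::Matches($webpage, '(?is)<table\b[^>]*>.*?</table>')

# 3. Pick the table whose opening tag carries the class we want.
$toReplace = $tables |
    Where-Object { $_.Value -match '^<table\b[^>]*class\s*=\s*["'']whatever["'']' } |
    Select-Object -First 1

# 4. Build the new block and swap it in as a literal string (no regex here).
$newHtmlBlock = '<table class="whatever"><tr><td>new value</td></tr></table>'
$newPage = $webpage.Replace($toReplace.Value, $newHtmlBlock)

# 5. Write the result to a new page.
Set-Content -Path $targetPath -Value $newPage
```

String.Replace is used for the final swap so that nothing in the old or new HTML is interpreted as a regex.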
Maybe something like this...

# Works only if the page is well-formed XML (e.g. XHTML)
[xml]$Global:HtmlContent = Get-Content file.html

# Find specific nodes by tag name:
$tables = $Global:HtmlContent.GetElementsByTagName("table")

Or you can even use the SelectNodes method or the Select-Xml cmdlet.
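A minimal sketch of the XPath route, assuming the page parses as well-formed XML. The inline document and the class names "outer" and "inner" are invented for the example. Because XHTML elements live in a default namespace, the XPath query needs a prefix bound to it; the prefix "x" is an arbitrary choice.

```powershell
# Hypothetical well-formed XHTML fragment (no DOCTYPE, to keep the sketch offline).
[xml]$doc = @'
<html xmlns="http://www.w3.org/1999/xhtml">
  <body>
    <table class="outer"><tr><td>
      <table class="inner"><tr><td>42</td></tr></table>
    </td></tr></table>
  </body>
</html>
'@

# Bind a prefix to the XHTML namespace so XPath can address the elements.
$ns = New-Object System.Xml.XmlNamespaceManager($doc.NameTable)
$ns.AddNamespace('x', 'http://www.w3.org/1999/xhtml')

# SelectNodes is a method on the XML document.
$inner = $doc.SelectNodes('//x:table[@class="inner"]', $ns)
$inner.Count
$inner[0].OuterXml

# Select-Xml is the cmdlet counterpart; it takes the namespace as a hashtable.
$hit = Select-Xml -Xml $doc -XPath '//x:table[@class="inner"]' -Namespace @{ x = 'http://www.w3.org/1999/xhtml' }
$hit.Node.GetAttribute('class')
```

Since the parser understands nesting, selecting the inner table this way does not disturb the outer one.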
Do you have nested tables?
I'll look into the code you posted, thanks! Not sure what it is doing, though.
Yes, I do have tables in tables in the html.
I've found that nesting throws off regex pattern matching.  You will have to look for the tags and match them programmatically.

What do you want to see for outer/encapsulating tables?
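To illustrate the nesting problem and one way of matching the tags programmatically, here is a rough sketch on a made-up nested sample. It is not the accepted solution, just one possible approach: find every opening and closing table tag, then pair them up with a stack.

```powershell
# Hypothetical nested sample.
$html = '<table id="outer"><tr><td><table id="inner"><tr><td>x</td></tr></table></td></tr></table>'

# The lazy pattern stops at the first </table>, so the outer table is cut short.
$lazy = [regex]::Match($html, '(?is)<table\b[^>]*>.*?</table>')
$lazy.Value

# Programmatic matching: walk all opening/closing table tags and track nesting.
$stack = New-Object System.Collections.Stack
$tables = foreach ($tag in [regex]::Matches($html, '(?i)<(/?)table\b[^>]*>')) {
    if ($tag.Groups[1].Value -eq '') {
        $stack.Push($tag.Index)      # opening tag: remember where it starts
    } elseif ($stack.Count -gt 0) {
        $start = $stack.Pop()        # closing tag: pair with the latest opening tag
        $html.Substring($start, $tag.Index + $tag.Length - $start)
    }
}
$tables    # inner table first, then the complete outer table
```

Each closing tag emits the full text of the table it closes, so inner tables come out before the tables that contain them.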
I'll post an example.
So this would be a simple HTML page in which I would like to replace everything contained in
<table class= "WindowsUpdateSuccesRates" ..> </table>
with my own code block (which is the output of a PowerShell query).


<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">
<head>
<meta http-equiv="Content-Type" content="text/html; charset=utf-8" />
<title>Untitled Document</title>
</head>

<body>
<table width="1642" border="1">
  <tr>
    <td><div align="center"><strong>Windows Updates</strong> Success rate</div></td>
  </tr>
  <tr>
    <td><div align="center">
      <table class= "WindowsUpdateSuccesRates" width="212" border="1">
        <tr>
          <td width="56">Month</td>
          <td width="56"><div align="center">Windows 7</div></td>
          <td width="36"><div align="center">Windows 10 </div></td>
          <td width="36"><div align="center">Windows 2012 </div></td>
          </tr>
        <tr>
          <td>11/2017</td>
          <td ><div align="center">97%</div></td>
          <td><div align="center">98%</div></td>
          <td><div align="center">99%</div></td>
        </tr>
        <tr>
          <td>12/2017</td>
          <td><div align="center">77%</div></td>
          <td> <div align="center">80%</div></td>
          <td><div align="center">79%</div></td>
          </tr>
      </table>
    </div></td>
  </tr>
</table>
</body>
</html>

ASKER CERTIFIED SOLUTION
aikimark

This solution is only available to members.
Great!
Works for this case, thanks!
However, now I'm trying to get a value from another site in which I included "AntivirusKPI" in the HTML tag.
This is the tag:
<Table class="AntivirusKPI" border=2 cellpadding=4 cellspacing=3>
When I try to get it, it doesn't return any results. Is there anything I'm overlooking here?


$pattern='<table .*class=\s*"AntivirusKPI"[^>]*>(?:.|\n)+?</table>'
$webclient = New-Object System.Net.WebClient
$html = $webclient.DownloadString($url)
$Regex = [Regex]::Matches($html, $pattern)
$Regex.value
Without sample HTML, it is difficult to determine what is wrong or to debug the code.
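One thing that can be checked from the quoted tag alone: the tag is written as <Table ...> with a capital T, while the pattern starts with a lowercase <table, and [Regex]::Matches is case-sensitive unless told otherwise (unlike the -match operator). Whether this is the actual cause on the real page is only a guess. The sketch below uses a made-up stand-in page built around the quoted tag.

```powershell
# Hypothetical stand-in for the downloaded page, built around the tag quoted above.
$html = @'
<html><body>
<Table class="AntivirusKPI" border=2 cellpadding=4 cellspacing=3>
<tr><td>KPI</td><td>99%</td></tr>
</Table>
</body></html>
'@

# Pattern from the post: case-sensitive by default, so "<table" does not match "<Table".
$pattern = '<table .*class=\s*"AntivirusKPI"[^>]*>(?:.|\n)+?</table>'
([regex]::Matches($html, $pattern)).Count

# Same pattern with the inline (?i) option to ignore case.
$found = [regex]::Matches($html, "(?i)$pattern")
$found.Count
$found[0].Value
```

If the real page still returns nothing with (?i), the next things to compare would be the quoting and spacing around class= in the downloaded HTML.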