• Status: Solved

Regex: select values between html tags?

Hi,

How can I get (with regex) everything between the tags <table class='whatever' > ... </table> (including the tags themselves) in an HTML page?

J.
Asked by: janhoedt
 
F Igor (Developer) commented:
You can use

/<table[^>]*>(.|\s)+?<\/table>/ig

The [^>]* expression matches everything inside the opening tag (attributes such as class) up to the first '>' character, which closes the tag.
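For the PowerShell case that comes up later in this thread, a minimal sketch of applying this pattern could look like the following. The sample HTML is invented for illustration, and the JavaScript-style /.../ig flags are written as inline options, with (?s) standing in for the (.|\s) alternation.

```powershell
# Sample HTML, made up for illustration only
$html = @'
<p>intro</p>
<table class='whatever'>
  <tr><td>1</td></tr>
</table>
<TABLE border="1"><tr><td>2</td></tr></TABLE>
'@

# (?i) = ignore case, (?s) = let '.' match newlines as well
$pattern = '(?is)<table[^>]*>.+?</table>'

# [regex]::Matches already returns every match, so no "g" flag is needed
$tables = @([regex]::Matches($html, $pattern) | ForEach-Object { $_.Value })
$tables.Count
```

Each element of $tables is the full text of one table, including its opening and closing tags.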
 
janhoedt (Author) commented:
That gets all the tables, right? Not only the one with class='whatever'?
 
F Igor (Developer) commented:
Yes, it gets all tables (TABLE tags). What programming language are you planning to use for this?
There could be easier ways to accomplish the same task using language-specific utilities or libraries.
 
janhoedt (Author) commented:
PowerShell. The goal is to get the content of an HTML page (which I created myself), search for a specific HTML tag, and replace it with a new HTML tag (which contains a new value).

I know out-html and other ways to create HTML, but I'd like to keep editing the original HTML page in Dreamweaver so that I have advanced editing possibilities.
 
janhoedt (Author) commented:
To clarify:

$webpage = Get-Content MyWebPage Location
$tables = regex on $webpage
$toreplacehtmlblock = $tables | Where-Object contains class=...
$newhtmlblock = replace string in $toreplacehtmlblock
Set-Content to new page
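The outline above could be sketched in PowerShell roughly as follows. The file names, the class value "kpi", and the replacement markup are all placeholders, not values from this thread, and the regex assumes the target table contains no nested table.

```powershell
# Placeholder paths in the temp folder; a sample input page is created
# here only so that the sketch is self-contained
$tmp     = [System.IO.Path]::GetTempPath()
$inFile  = Join-Path $tmp 'page-in.html'
$outFile = Join-Path $tmp 'page-out.html'
Set-Content -Path $inFile -Value '<body><table class="kpi"><tr><td>old</td></tr></table></body>'

# -Raw reads the whole file as one string so the regex can span lines
$webpage = Get-Content -Path $inFile -Raw

# Match the table whose opening tag carries the (placeholder) class
$pattern  = '(?is)<table\b[^>]*\bclass\s*=\s*"kpi"[^>]*>.*?</table>'
$newBlock = '<table class="kpi"><tr><td>new</td></tr></table>'

# In a regex replacement string '$' is special, so escape it as '$$'
$result = [regex]::Replace($webpage, $pattern, $newBlock.Replace('$', '$$'))

Set-Content -Path $outFile -Value $result
```

Doing the replacement directly on the page text avoids the intermediate step of collecting all tables and filtering them afterwards.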
 
Jose Gabriel Ortega C (CEO, Faru Bonon IT) commented:
Maybe something like this...

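# Works only if the page is well-formed XML (for example, XHTML)
[xml]$Global:HtmlContent = Get-Content -Raw file.html

# Find a specific node; local-name() ignores any default XML namespace
$table = $Global:HtmlContent.SelectSingleNode("//*[local-name()='table']")

Or you can use the SelectNodes() method (or the Select-Xml cmdlet) to return every matching node.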
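A self-contained sketch of the XML route, assuming the page is well-formed XML; ordinary HTML with unclosed tags or undeclared entities such as &nbsp; will fail the [xml] cast. The sample markup and the replacement table are made up for illustration.

```powershell
# Invented, well-formed sample with one table nested inside another
[xml]$doc = @'
<html><body>
<table><tr><td>
  <table class="whatever"><tr><td>97%</td></tr></table>
</td></tr></table>
</body></html>
'@

# Select the table by its class attribute; nesting is not a problem
# because the XML parser has already paired up the tags
$xpath = "//*[local-name()='table' and @class='whatever']"
$inner = $doc.SelectSingleNode($xpath)

# Build the replacement from a string and swap it in for the old node
$fragment = $doc.CreateDocumentFragment()
$fragment.InnerXml = '<table class="whatever"><tr><td>99%</td></tr></table>'
[void]$inner.ParentNode.ReplaceChild($fragment, $inner)

$doc.OuterXml
```

If the page declares the XHTML default namespace, the replacement markup would need to be created in that namespace as well; the sketch leaves the namespace out to stay short.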
 
aikimark commented:
Do you have nested tables?
 
janhoedt (Author) commented:
I'll look into the code you posted, thanks! I'm not sure what it is doing, though.
Yes, I do have tables within tables in the HTML.
 
aikimark commented:
I've found that nesting throws off regex pattern matching.  You will have to look for the tags and match them programmatically.

What do you want to see for outer/encapsulating tables?
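One way to match the tags programmatically is to walk every <table> and </table> tag starting from the chosen opening tag and keep a depth counter. This is only a sketch: Get-TableBlock is a hypothetical helper name, the sample HTML is invented, and it assumes table tags do not appear inside comments or scripts.

```powershell
function Get-TableBlock {
    param([string]$Html, [string]$StartPattern)

    # Locate the opening tag of the table we are interested in
    $start = [regex]::Match($Html, $StartPattern, 'IgnoreCase')
    if (-not $start.Success) { return $null }

    # Visit every opening or closing table tag from that point on
    $tagRegex = [regex]'(?i)<(/?)table\b[^>]*>'
    $depth = 0
    $m = $tagRegex.Match($Html, $start.Index)
    while ($m.Success) {
        if ($m.Groups[1].Value -eq '/') { $depth-- } else { $depth++ }
        if ($depth -eq 0) {
            # Depth is back to zero: this is the matching closing tag
            $length = $m.Index + $m.Length - $start.Index
            return $Html.Substring($start.Index, $length)
        }
        $m = $m.NextMatch()
    }
    return $null   # tags were unbalanced
}

$html  = '<table id="a"><tr><td><table id="b"><tr><td>x</td></tr></table></td></tr></table><p>after</p>'
$block = Get-TableBlock -Html $html -StartPattern '<table\b[^>]*id="a"[^>]*>'
$block
```

A non-greedy regex applied to the same input would stop at the first </table>, which belongs to the inner table, whereas the counter keeps going until the outer table is closed.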
 
janhoedt (Author) commented:
I'll post an example.
 
janhoedt (Author) commented:
So this would be a simple HTML page in which I would like to replace everything contained in
<table class= "WindowsUpdateSuccesRates" ..> </table>
with my own code block (which is the output of a PowerShell query).


<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">
<head>
<meta http-equiv="Content-Type" content="text/html; charset=utf-8" />
<title>Untitled Document</title>
</head>

<body>
<table width="1642" border="1">
  <tr>
    <td><div align="center"><strong>Windows Updates</strong> Success rate</div></td>
  </tr>
  <tr>
    <td><div align="center">
      <table class= "WindowsUpdateSuccesRates" width="212" border="1">
        <tr>
          <td width="56">Month</td>
          <td width="56"><div align="center">Windows 7</div></td>
          <td width="36"><div align="center">Windows 10 </div></td>
          <td width="36"><div align="center">Windows 2012 </div></td>
          </tr>
        <tr>
          <td>11/2017</td>
          <td ><div align="center">97%</div></td>
          <td><div align="center">98%</div></td>
          <td><div align="center">99%</div></td>
        </tr>
        <tr>
          <td>12/2017</td>
          <td><div align="center">77%</div></td>
          <td> <div align="center">80%</div></td>
          <td><div align="center">79%</div></td>
          </tr>
      </table>
    </div></td>
  </tr>
</table>
</body>
</html>

 
aikimark commented:
This pattern will match that exact table.
<table .*class=\s*"WindowsUpdateSuccesRates"[^>]*>(?:.|\n)+?</table>
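A short sketch of using this pattern from PowerShell on a trimmed-down copy of the sample page. The replacement block and its 99% value are invented. Because '.' does not match a newline, the ".*class=" part can only succeed on a line that holds both "<table " and "class=", which is why the outer table is skipped here.

```powershell
# Trimmed-down version of the sample page posted above
$html = @'
<table width="1642" border="1">
  <tr>
    <td><div align="center">
      <table class= "WindowsUpdateSuccesRates" width="212" border="1">
        <tr><td>11/2017</td><td>97%</td></tr>
      </table>
    </div></td>
  </tr>
</table>
'@

$pattern = '<table .*class=\s*"WindowsUpdateSuccesRates"[^>]*>(?:.|\n)+?</table>'
$match   = [regex]::Match($html, $pattern)

# Invented replacement block; '$' is escaped because it is special
# in a regex replacement string
$newBlock = '<table class= "WindowsUpdateSuccesRates"><tr><td>99%</td></tr></table>'
$updated  = [regex]::Replace($html, $pattern, $newBlock.Replace('$', '$$'))
$updated
```

The non-greedy (?:.|\n)+? stops at the first </table>, so this works only as long as the targeted table does not itself contain another table.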

 
janhoedt (Author) commented:
Great!
 
janhoedt (Author) commented:
Works for this case, thanks!
However, now I'm trying to get a value from another site, in which I included "AntivirusKPI" in the HTML tag.
This is the tag:
<Table class="AntivirusKPI" border=2 cellpadding=4 cellspacing=3>
When I try to get it, it doesn't return any results. Is there anything I'm overlooking here?


$pattern='<table .*class=\s*"AntivirusKPI"[^>]*>(?:.|\n)+?</table>'
$webclient = New-Object System.Net.WebClient
$html = $webclient.DownloadString($url)
$Regex = [Regex]::Matches($html, $pattern)
$Regex.value
 
aikimark commented:
Without sample HTML, it is difficult to determine what is wrong or to debug the code.
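One possible cause, judging only from the tag as posted: [regex]::Matches is case-sensitive by default, and the tag is written <Table with a capital T while the pattern starts with <table. The sketch below tests that assumption; the table body and the closing </Table> tag are invented, since the real page was not shared.

```powershell
# Opening tag as posted; the row and closing tag are hypothetical
$html    = '<Table class="AntivirusKPI" border=2 cellpadding=4 cellspacing=3><tr><td>42</td></tr></Table>'
$pattern = '<table .*class=\s*"AntivirusKPI"[^>]*>(?:.|\n)+?</table>'

# Default options: case-sensitive, so '<table' does not match '<Table'
$caseSensitive = [regex]::Matches($html, $pattern)

# Same pattern with the IgnoreCase option
$ignoreCase      = [System.Text.RegularExpressions.RegexOptions]::IgnoreCase
$caseInsensitive = [regex]::Matches($html, $pattern, $ignoreCase)

$caseSensitive.Count     # 0
$caseInsensitive.Count   # 1
```

Prefixing the pattern with (?i) has the same effect as passing the IgnoreCase option.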