Regex: select values between html tags?

Hi,

How do I get (with regex) everything between the tags <table class='whatever' > ... </table> (including the tags themselves) in an HTML page?

J.
janhoedt asked:

F Igor (Developer) commented:
You can use

/<table[^>]*>(.|\s)+?<\/table>/ig

The [^>]* part matches the attributes inside the opening tag: it consumes everything up to the first '>' character, so the opening tag is closed at the right place.
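Since the thread later settles on PowerShell, here is a minimal sketch of the same pattern with the .NET regex engine. The sample HTML and the variable names are made up for illustration. Instead of the `i` and `g` flags it uses the inline options `(?i)` (ignore case) and `(?s)` (let `.` match newlines, which replaces the `(.|\s)` group), and `[regex]::Matches` returns every match.

```powershell
# Hypothetical sample input: two sibling tables (not nested).
$html = @'
<p>intro</p>
<table class='whatever' >
  <tr><td>first</td></tr>
</table>
<TABLE><tr><td>second</td></tr></TABLE>
'@

# (?i) = ignore case, (?s) = let "." match newlines as well.
# [^>]* consumes the attributes up to the ">" that closes the opening tag.
$pattern = '(?is)<table[^>]*>.+?</table>'

$found = [regex]::Matches($html, $pattern)

# Print each matched table, tags included.
$found | ForEach-Object { $_.Value }
```

As written, this returns every table on the page, not only the one with a given class, and it assumes the tables are not nested.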
janhoedt (Author) commented:
That gets all the tables, right? Not only the one with class='whatever'?
F Igor (Developer) commented:
Yes, it's for getting all tables (TABLE tags). What programming language are you planning to do this in?
There could be easier ways to accomplish the same task using language-specific utilities or libraries.
janhoedt (Author) commented:
PowerShell. The goal is to get the content of an HTML page (which I created myself), search for a specific HTML tag, and replace it with a new HTML tag (which contains a new value).

I know out-html and other ways to create HTML, but I'd like to keep editing the original HTML page in Dreamweaver so I have advanced editing possibilities.
janhoedt (Author) commented:
To clarify:

$webpage = get-content MyWebPage Location

$tables = regex on $webpage

$toreplacehtmlblock = $tables | where-object contains class=...

$newhtmlblock = replace string in $toreplacehtmlblock

Set-content to new page
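The steps above could look roughly like this in real PowerShell. This is only a sketch: the function name `Update-TableByClass`, its parameters, and the demo strings are hypothetical, and it assumes the tables are not nested inside each other.

```powershell
function Update-TableByClass {
    param(
        [string]$Html,      # whole page as one string
        [string]$Class,     # class attribute value to look for
        [string]$NewBlock   # HTML that should replace the matching table
    )
    # All <table>...</table> blocks; (?i) ignores case, (?s) lets "." span lines.
    $tables = [regex]::Matches($Html, '(?is)<table[^>]*>.+?</table>')

    # Keep the first block whose opening tag carries the class.
    $classPattern = '^<table[^>]*\bclass\s*=\s*["'']' + [regex]::Escape($Class) + '["'']'
    $target = $tables | Where-Object { $_.Value -match $classPattern } | Select-Object -First 1

    # String.Replace is literal, so "$" in the new HTML needs no escaping.
    if ($target) { return $Html.Replace($target.Value, $NewBlock) }
    return $Html
}

# Demo on an inline string (made-up content).
$page = "<p>x</p><table class='whatever' ><tr><td>old</td></tr></table><table><tr><td>keep</td></tr></table>"
$result = Update-TableByClass -Html $page -Class 'whatever' -NewBlock '<table class="whatever"><tr><td>new</td></tr></table>'
$result
```

To work on files, the page can be read as a single string with `Get-Content -Raw` (PowerShell 3.0 or later) and the result written back with `Set-Content`.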
Jose Gabriel Ortega Castro (CEO) commented:
Maybe something like this...

[xml]$Global:HtmlContent = Get-Content file.html
$NR = (New-Object System.Xml.XmlNodeReader $Global:HtmlContent)

# Find a specific node (the page must be well-formed XML, e.g. XHTML):
if ($NR.ReadToFollowing("table")) { $table = $NR.ReadOuterXml() }

Or you can use the SelectNodes method (with an XPath expression) on the XML document.
aikimark commented:
Do you have nested tables?
janhoedt (Author) commented:
Will look into the code you posted, thanks!  Not sure what it is doing though....
Yes, I do have tables in tables in the html.
aikimark commented:
I've found that nesting throws off regex pattern matching.  You will have to look for the tags and match them programmatically.
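Matching the tags programmatically can be sketched as a depth counter: find the opening tag you want, then walk every later `<table>` and `</table>` tag, adding one for each opening tag and subtracting one for each closing tag until the count returns to zero. The function name `Get-BalancedTable` and the sample HTML are hypothetical, and the sketch assumes no table tags appear inside comments or scripts.

```powershell
function Get-BalancedTable {
    param(
        [string]$Html,
        [string]$Class
    )
    # Opening tag of the table we want.
    $startPattern = '(?i)<table\b[^>]*\bclass\s*=\s*["'']' + [regex]::Escape($Class) + '["''][^>]*>'
    $start = [regex]::Match($Html, $startPattern)
    if (-not $start.Success) { return $null }

    # Walk every opening/closing table tag after it and track nesting depth.
    $depth = 1
    $tagRegex = [regex]'(?i)</?table\b[^>]*>'
    $tag = $tagRegex.Match($Html, $start.Index + $start.Length)
    while ($tag.Success) {
        if ($tag.Value.StartsWith('</')) { $depth-- } else { $depth++ }
        if ($depth -eq 0) {
            # Found the closing tag that balances the opening tag.
            $end = $tag.Index + $tag.Length
            return $Html.Substring($start.Index, $end - $start.Index)
        }
        $tag = $tag.NextMatch()
    }
    return $null   # unbalanced markup
}

# Made-up sample: the wanted table sits inside one table and contains another.
$html = '<table><tr><td><table class="outer"><tr><td><table><tr><td>inner</td></tr></table></td></tr></table></td></tr></table>'
$block = Get-BalancedTable -Html $html -Class 'outer'
$block
```

Unlike a lazy regex, this returns the whole block up to the matching closing tag even when further tables are nested inside it.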

What do you want to see for outer/encapsulating tables?
janhoedt (Author) commented:
I'll post an example.
janhoedt (Author) commented:
So this would be a simple HTML page in which I would like to replace everything contained in
<table class= "WindowsUpdateSuccesRates" ..> </table>
with my own code block (which is the output of a PowerShell query).


<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">
<head>
<meta http-equiv="Content-Type" content="text/html; charset=utf-8" />
<title>Untitled Document</title>
</head>

<body>
<table width="1642" border="1">
  <tr>
    <td><div align="center"><strong>Windows Updates</strong> Success rate</div></td>
  </tr>
  <tr>
    <td><div align="center">
      <table class= "WindowsUpdateSuccesRates" width="212" border="1">
        <tr>
          <td width="56">Month</td>
          <td width="56"><div align="center">Windows 7</div></td>
          <td width="36"><div align="center">Windows 10 </div></td>
          <td width="36"><div align="center">Windows 2012 </div></td>
          </tr>
        <tr>
          <td>11/2017</td>
          <td ><div align="center">97%</div></td>
          <td><div align="center">98%</div></td>
          <td><div align="center">99%</div></td>
        </tr>
        <tr>
          <td>12/2017</td>
          <td><div align="center">77%</div></td>
          <td> <div align="center">80%</div></td>
          <td><div align="center">79%</div></td>
          </tr>
      </table>
    </div></td>
  </tr>
</table>
</body>
</html>

aikimark commented:
This pattern will match that exact table.
<table .*class=\s*"WindowsUpdateSuccesRates"[^>]*>(?:.|\n)+?</table>
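For the replacement step the question asks about, the pattern can be passed to `[regex]::Replace`. The following is a sketch only: the shortened page and the replacement block are made-up stand-ins, and it assumes the target table contains no nested tables and that its opening tag sits on its own line, as in the posted page.

```powershell
$pattern = '<table .*class=\s*"WindowsUpdateSuccesRates"[^>]*>(?:.|\n)+?</table>'

# Hypothetical, shortened stand-in for the page in the question.
$html = @'
<table width="1642" border="1">
  <tr>
    <td><table class= "WindowsUpdateSuccesRates" width="212" border="1">
        <tr><td>11/2017</td><td>97%</td></tr>
      </table></td>
  </tr>
</table>
'@

# Hypothetical replacement block; in practice this would come from the PowerShell query.
$newBlock = '<table class="WindowsUpdateSuccesRates"><tr><td>01/2018</td><td>99%</td></tr></table>'

# In a .NET replacement string "$" starts a substitution, so double it to keep it literal.
$safeBlock = $newBlock.Replace('$', '$$')

$updated = [regex]::Replace($html, $pattern, $safeBlock)
$updated
```

The outer table is left untouched because `.` does not match a newline, so `.*class=` can only succeed on the line that holds the opening tag of the inner table. The result can then be written to the new page with `Set-Content`.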

janhoedt (Author) commented:
Great!
janhoedt (Author) commented:
Works for this case, thanks!
However, now I'm trying to get a value from another site in which I included "AntivirusKPI" in the HTML tag.
This is the tag:
<Table class="AntivirusKPI" border=2 cellpadding=4 cellspacing=3>
When I try to get it, it doesn't return any results ... is there anything I'm overlooking here?


$pattern='<table .*class=\s*"AntivirusKPI"[^>]*>(?:.|\n)+?</table>'
$webclient = New-Object System.Net.WebClient
$html = $webclient.DownloadString($url)
$Regex = [Regex]::Matches($html, $pattern)
$Regex.value
aikimark commented:
Without sample HTML, it is difficult to determine what is wrong or to debug the code.
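Judging only from the tag as posted, one plausible cause is letter case: the tag is written `<Table` with a capital T, and `[Regex]::Matches` is case-sensitive by default (unlike PowerShell's `-match` operator). The sketch below uses a made-up page fragment built around the posted tag, so it cannot confirm what the real page returns.

```powershell
# Hypothetical page fragment built around the tag quoted in the question.
$html = @'
<Table class="AntivirusKPI" border=2 cellpadding=4 cellspacing=3>
<tr><td>KPI</td><td>95%</td></tr>
</Table>
'@

$pattern = '<table .*class=\s*"AntivirusKPI"[^>]*>(?:.|\n)+?</table>'

# Case-sensitive by default: "<Table" does not match "<table".
$strict = [regex]::Matches($html, $pattern)

# Same pattern with the IgnoreCase option.
$ignoreCase = [System.Text.RegularExpressions.RegexOptions]::IgnoreCase
$relaxed = [regex]::Matches($html, $pattern, $ignoreCase)

$strict.Count
$relaxed.Count
$relaxed | ForEach-Object { $_.Value }
```

Prefixing the pattern with `(?i)` has the same effect as passing the option. If the match still comes back empty, the downloaded HTML may simply differ from what the browser shows, so it is worth inspecting `$html` directly.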