janhoedt
asked on
Regex: select values between html tags?
Hi,
How can I use a regex to get everything between the tags <table class='whatever'> ... </table> (including the tags themselves) in an HTML page?
J.
ASKER
That gets all the tables, right? Not only the one with class='whatever'?
Yes, it gets all tables (TABLE tags). What programming language are you planning to do this in?
There could be easier ways to accomplish the same task using language-specific utilities or libraries.
ASKER
PowerShell. The goal is to get the content of an HTML page (which I created myself), search for a specific HTML tag and replace it with a new HTML tag (which contains a new value).
I know Out-Html and other ways to create HTML, but I'd like to keep editing the original HTML page in Dreamweaver so I have advanced editing possibilities.
ASKER
To clarify:
$webpage = get-content MyWebPage Location
$tables = regex on $webpage
$toreplacehtmlblock= $tables | where-object contains class=...
$newhtmlblock = replace string in $toreplacehtmlblock
Set-content to new page
Maybe something like this...
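# Note: the [xml] cast only works when the page is well-formed XHTML.
[xml]$Global:HtmlContent = Get-Content file.html
# Find specific nodes. XmlNodeReader has no FindObject method, so query the parsed document with XPath instead.
# local-name() is used because an XHTML page puts its elements in a default namespace.
$table = $Global:HtmlContent.SelectNodes("//*[local-name()='table']")
Or you can use the Select-Xml cmdlet.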
Do you have nested tables?
ASKER
Will look into the code you posted, thanks! Not sure what it is doing though....
Yes, I do have tables in tables in the html.
I've found that nesting throws off regex pattern matching. You will have to look for the tags and match them programmatically.
What do you want to see for outer/encapsulating tables?
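As a rough sketch of what "match them programmatically" could look like in PowerShell: find the opening tag of the wanted table, then walk over every later table tag and keep a depth counter until it drops back to zero. The function name Get-TableBlock and its parameters are made up for illustration, and the sketch assumes the class attribute is quoted and that the markup has no table tags hidden inside comments or scripts.

```powershell
# Hypothetical helper: returns the full <table ...>...</table> block whose
# opening tag carries the given class, counting nested tables on the way.
function Get-TableBlock {
    param(
        [string]$Html,
        [string]$ClassName
    )

    # Opening tag of the wanted table (case-insensitive, class value in quotes).
    $startPattern = '(?i)<table\b[^>]*\bclass\s*=\s*["'']' +
                    [regex]::Escape($ClassName) + '["''][^>]*>'
    $start = [regex]::Match($Html, $startPattern)
    if (-not $start.Success) { return $null }

    # Every opening or closing table tag after the start tag changes the depth.
    $tagRegex = [regex]'(?i)</?table\b[^>]*>'
    $depth = 1
    $m = $tagRegex.Match($Html, $start.Index + $start.Length)
    while ($m.Success) {
        if ($m.Value.StartsWith('</')) { $depth-- } else { $depth++ }
        if ($depth -eq 0) {
            # Depth is back to zero: this is the closing tag of the wanted table.
            $end = $m.Index + $m.Length
            return $Html.Substring($start.Index, $end - $start.Index)
        }
        $m = $m.NextMatch()
    }
    return $null   # no matching closing tag found
}

# Small made-up example: the wanted table sits inside one table and contains another.
$sample = '<table><tr><td><table class="x"><tr><td><table><tr><td>in</td></tr></table></td></tr></table></td></tr></table>'
$block  = Get-TableBlock -Html $sample -ClassName 'x'
$block
```

Unlike a lazy regex, this returns the whole block even when the wanted table itself contains further tables.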
ASKER
I'll post an example.
ASKER
So this would be a simple HTML page in which I would like to replace the whole block
<table class= "WindowsUpdateSuccesRates" ..> </table>
with my own code block (which is the output of a PowerShell query).
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">
<head>
<meta http-equiv="Content-Type" content="text/html; charset=utf-8" />
<title>Untitled Document</title>
</head>
<body>
<table width="1642" border="1">
<tr>
<td><div align="center"><strong>Windows Updates</strong> Success rate</div></td>
</tr>
<tr>
<td><div align="center">
<table class= "WindowsUpdateSuccesRates" width="212" border="1">
<tr>
<td width="56">Month</td>
<td width="56"><div align="center">Windows 7</div></td>
<td width="36"><div align="center">Windows 10 </div></td>
<td width="36"><div align="center">Windows 2012 </div></td>
</tr>
<tr>
<td>11/2017</td>
<td ><div align="center">97%</div></td>
<td><div align="center">98%</div></td>
<td><div align="center">99%</div></td>
</tr>
<tr>
<td>12/2017</td>
<td><div align="center">77%</div></td>
<td> <div align="center">80%</div></td>
<td><div align="center">79%</div></td>
</tr>
</table>
</div></td>
</tr>
</table>
</body>
</html>
ASKER CERTIFIED SOLUTION
(The accepted solution is only available to Experts Exchange members and is not shown here.)
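For readers without access, here is a minimal sketch of one way the replacement described in the question could be done with a regex. It is not necessarily what the accepted solution does. The function name Update-TableByClass, its parameters and the sample markup are assumptions for illustration, and the lazy match only works when the target table contains no nested tables of its own, as in the page posted above.

```powershell
# Hypothetical helper: swaps the table with the given class for new markup.
function Update-TableByClass {
    param(
        [string]$Html,
        [string]$ClassName,
        [string]$NewBlock
    )

    # (?is): i = ignore case, s = let "." match newlines as well.
    # .*? is lazy, so the match ends at the first </table> after the opening tag.
    $pattern = '(?is)<table\b[^>]*\bclass\s*=\s*["'']' +
               [regex]::Escape($ClassName) + '["''][^>]*>.*?</table>'

    # Double any "$" so the new HTML is inserted literally, not read as a group reference.
    $replacement = $NewBlock.Replace('$', '$$')
    return [regex]::Replace($Html, $pattern, $replacement)
}

# Cut-down, made-up version of the posted page: the target table sits inside an outer table.
$page = @'
<table border="1"><tr><td>
<table class= "WindowsUpdateSuccesRates" width="212">
<tr><td>11/2017</td><td>97%</td></tr>
</table>
</td></tr></table>
'@

$new    = '<table class="WindowsUpdateSuccesRates"><tr><td>12/2017</td><td>99%</td></tr></table>'
$result = Update-TableByClass -Html $page -ClassName 'WindowsUpdateSuccesRates' -NewBlock $new
$result
```

For a real page, the input string could come from Get-Content with the -Raw switch (so the file arrives as one string) and the result could be written out with Set-Content.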
ASKER
Great!
ASKER
Works for this case, thanks!
However, now I'm trying to get a value from another site, where I included "AntivirusKPI" in the HTML tag.
This is the tag
<Table class="AntivirusKPI" border=2 cellpadding=4 cellspacing=3>
When I try to get it, it doesn't return results ... anything I'm overlooking here?
$pattern='<table .*class=\s*"AntivirusKPI"[^>]*>(?:.|\n)+?</table>'
$webclient = New-Object System.Net.WebClient
$html = $webclient.DownloadString($url)
$Regex = [Regex]::Matches($html, $pattern)
$Regex.value
Without sample HTML, it is difficult to determine what is wrong or to debug the code.
The [^>]* expression matches the rest of the opening tag's attributes: it consumes every character up to, but not including, the tag's closing '>'.
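One possible cause, offered as a guess since the page's HTML was not posted: [Regex]::Matches is case-sensitive by default (unlike PowerShell's -match operator), so a pattern starting with <table will not match a tag written as <Table. The sketch below compares the pattern from the question with and without the (?i) option. The sample rows are made up; only the opening tag is taken from the question.

```powershell
# Sample input built around the tag quoted in the question; the rows are invented.
$html = @'
<Table class="AntivirusKPI" border=2 cellpadding=4 cellspacing=3>
<tr><td>Up to date</td><td>98%</td></tr>
</Table>
'@

# Pattern from the question (case-sensitive when used with [regex]::Matches).
$caseSensitive = '<table .*class=\s*"AntivirusKPI"[^>]*>(?:.|\n)+?</table>'

# Same pattern with the inline ignore-case option switched on.
$ignoreCase = '(?i)' + $caseSensitive

$strict  = [regex]::Matches($html, $caseSensitive)
$relaxed = [regex]::Matches($html, $ignoreCase)

$strict.Count      # 0: "<table" does not match "<Table"
$relaxed.Count     # 1
$relaxed[0].Value  # the whole <Table ...>...</Table> block
```

Passing [System.Text.RegularExpressions.RegexOptions]::IgnoreCase as a third argument to [regex]::Matches has the same effect as the inline (?i).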