(powershell) some serious string manipulation

Stevolee
Stevolee used Ask the Experts™
on
Hi,

In the code attached below, I need to return just "esx". I need a function to which I pass the whole string and a separator (I guess the separator will be <td>); the function will then return just "esx".

I have to repeat this several times. I know the position, but not the value itself (i.e. "esx"), so I need something like:

#string
$htmlcode = "<tr><td>VMware, Inc.</td><td>3.5.0</td><td>82663</td><td>esx</td><td>VMware ESX Server 3.5.0 build-82663</td></tr>"

#call the function
returncharacter $htmlcode 4

#result
esx is returned

#actual function
Function returncharacter {
 # some serious string manipulation... ;-)
}




<tr><td>VMware, Inc.</td><td>3.5.0</td><td>82663</td><td>esx</td><td>VMware ESX Server 3.5.0 build-82663</td></tr>

Commented:
Substring?
For example: "abcdefgh".Substring(2,3)
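For reference, Substring takes a zero-based start index and a character count, which is why it returns a fixed-length slice rather than a field between delimiters. A quick illustration (the $sample variable exists only for this example):

```powershell
# Substring(startIndex, length): startIndex is zero-based, length counts characters
$sample = "abcdefgh"
$sample.Substring(2, 3)   # skips "ab" and takes 3 characters: cde
$sample.Substring(2)      # from index 2 to the end of the string: cdefgh
```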

Author

Commented:
Thanks for the response, mate; however, as you know, that will only return three characters after skipping the first two, which is not even close to what I am looking for.
The solution might involve a regular expression and a designated separator... I urgently need the function. Anyone else out there?
Thanks in advance
 
 
 

Commented:
So you can detect the first occurrence of your separator, take the substring from there to the end, and repeat this several times in a loop. The easiest way to get the proper string is to ignore the empty pieces that appear between adjacent HTML tags.
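A minimal sketch of that IndexOf/Substring loop, assuming every cell is written as a bare <td>...</td> with no attributes. Rather than splitting and discarding empty pieces, it walks forward from one <td> to the next; the function name Get-CellByIndexOf and its 1-based $position parameter are hypothetical, chosen for illustration:

```powershell
# Sketch only: Get-CellByIndexOf is a hypothetical helper; $position is 1-based (4 -> "esx").
function Get-CellByIndexOf([string]$html, [int]$position) {
    $open  = "<td>"
    $close = "</td>"
    if ($position -lt 1) { return $null }
    $searchFrom = 0
    # Jump from one opening tag to the next until we reach the requested cell
    for ($n = 1; $n -le $position; $n++) {
        $start = $html.IndexOf($open, $searchFrom, [System.StringComparison]::Ordinal)
        if ($start -lt 0) { return $null }   # fewer cells than requested
        $searchFrom = $start + $open.Length
    }
    # The cell text runs from just after <td> up to the next </td>
    $end = $html.IndexOf($close, $searchFrom, [System.StringComparison]::Ordinal)
    if ($end -lt 0) { return $null }
    return $html.Substring($searchFrom, $end - $searchFrom)
}

$htmlcode = "<tr><td>VMware, Inc.</td><td>3.5.0</td><td>82663</td><td>esx</td><td>VMware ESX Server 3.5.0 build-82663</td></tr>"
Get-CellByIndexOf $htmlcode 4
```

Because it searches for the opening tag directly, an empty cell (<td></td>) still counts as a position.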

Author

Commented:
Please provide an example in the form of a function that returns the result. I could do the same thing in VBScript with my eyes closed using InStr; however, PowerShell has more functionality, such as Substring and regex...
I want a solution that is not messy.
Thanks
 

Author

Commented:
Here is a sample solution I have put together:
Function Get-HTMLData($string, $int) {
 # Turn every <td> into "#", split on "#", then strip </td> from the requested piece
 $aryHTML = $string.replace("<td>", "#").split("#")
 return $aryHTML[$int].replace("</td>", "")
}
I am not exactly a PowerShell novice myself; however, I was hoping for something more elegant ;-) Can anyone top that?
Thanks in advance...

Author

Commented:
Full code

$htmlcode = "<tr><td>VMware, Inc.</td><td>3.5.0</td><td>82663</td><td>esx</td><td>VMware ESX Server 3.5.0 build-82663</td></tr>"

Function Get-HTMLData($html, $int) {
 # Turn every <td> into "#", split on "#", then strip </td> from the requested piece
 $aryHTML = $html.replace("<td>", "#").split("#")
 return $aryHTML[$int].replace("</td>", "")
}

Get-HTMLData $htmlcode 4
This returns esx; try it yourself.

Author

Commented:
This is much better...

$htmlcode = "<tr><td>VMware, Inc.</td><td>3.5.0</td><td>82663</td><td>esx</td><td>VMware ESX Server 3.5.0 build-82663</td></tr>"

Function Get-HTMLData([string]$html, [int]$int) {

 $aryHTML = $html.replace("</tr>", "")
 $aryHTML = $aryHTML.replace("<td>", "#").split("#")
 return $aryHTML[$int].replace("</td>", "")
 
}

Get-HTMLData $htmlcode 4
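If a regular expression is acceptable, the -split operator can do the replace-and-split in one step. This is a sketch, not a drop-in replacement: Get-HTMLCell is a hypothetical name, and it assumes the row contains only <tr> and plain <td> tags, as in the sample string:

```powershell
# Sketch only: Get-HTMLCell is a hypothetical helper; $position is 1-based like Get-HTMLData.
function Get-HTMLCell([string]$html, [int]$position) {
    # Drop the row tags, split on opening or closing cell tags,
    # and discard the empty strings left between adjacent tags
    $cells = @(($html -replace '</?tr>', '') -split '</?td>' | Where-Object { $_ -ne '' })
    if ($position -lt 1 -or $position -gt $cells.Count) { return $null }
    return $cells[$position - 1]
}

$htmlcode = "<tr><td>VMware, Inc.</td><td>3.5.0</td><td>82663</td><td>esx</td><td>VMware ESX Server 3.5.0 build-82663</td></tr>"
Get-HTMLCell $htmlcode 4
```

Note the trade-off: filtering out empty strings also drops genuinely empty cells (<td></td>), which would shift later positions.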

 
 
Try this.

$string = "esx"
$htmlcode = "<tr><td>VMware, Inc.</td><td>3.5.0</td><td>82663</td><td>esx</td><td>VMware ESX Server 3.5.0 build-82663</td></tr>"
$htmlcode.Substring($htmlcode.IndexOf($string),3)


Author

Commented:
Hi Learnctx,
Very good attempt, but not exactly what I am looking for. Your solution works the other way round: it assumes I already have the string and its length. In my case I only know the position; I don't know the string or its length.
Nice try though; your script will come in handy for another part of my monster script.
Once again, guys, I need an elegant solution to my dilemma. The solution I posted (based on the VBScript approach) works, but there has to be a more elegant way in PowerShell.
 
PowerShell Developer
Top Expert 2010
Commented:
It isn't easy. The best I can come up with that uses regular expression matching to parse the HTML is below. I don't consider this to be more efficient than the replace version you have already.

"-match" only captures a single match (the first match of the RegEx in the string it's passed). It can capture more if you apply the + operator to a group in the RegEx (e.g. "(<\w*>[^<>]*</\w*>)+"), but then you get one long match, and the group in $Matches only holds the last repetition (the alternatives to this tend to give much the same result).
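A quick way to see the single-match behaviour described above, using a shortened copy of the sample row (the $row variable and the capture group are for illustration only):

```powershell
# -match stops at the first match and stores it (plus any capture groups) in $Matches
$row = "<tr><td>VMware, Inc.</td><td>3.5.0</td><td>82663</td><td>esx</td></tr>"
if ($row -match '<td>([^<>]*)</td>') {
    $Matches[0]   # the whole first match: <td>VMware, Inc.</td>
    $Matches[1]   # capture group 1 only: VMware, Inc.
}
```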

The workaround is to take the HTML string and chop matched pieces off until you reach the piece at the correct position. That's what the code below does, using the automatic "$Matches" variable, which holds the result of the last successful -match operation.

There are alternatives to this. PowerShell can use .NET, so it should be able to use the HTML Agility Pack library:

http://htmlagilitypack.codeplex.com/Release/ProjectReleases.aspx?ReleaseId=272

If you're doing a lot of HTML parsing, that may be beneficial. If not, I'd stick with the Replace version you already have.

Chris
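Function Get-HTMLData([string]$html, [int]$int) {
  $i = 0   # start at zero (otherwise an $i from the caller's scope could be picked up)
  Do {
    # -Match finds the first remaining <tag>text</tag> pair and stores it in $Matches
    [Void]($html -Match "<\w*>[^<>]*</\w*>")
    # Remove that pair (escaped, so characters like "." match literally) and count it
    $html = $html -Replace ([regex]::Escape($Matches[0])); $i++
  } Until ($i -eq $int)
  # The last pair removed is the one at the requested position; strip its cell tags
  Return $Matches[0] -Replace "</?td>"
}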
 
Get-HTMLData $htmlcode 4
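Another option, assuming the .NET regex classes are acceptable: [regex]::Matches returns every match in one call, so there is no need to chop the string down. Get-HTMLCellRegex is a hypothetical name, and the pattern assumes plain <td> tags without attributes and cell text without line breaks (. does not match a newline unless RegexOptions.Singleline is used):

```powershell
# Sketch only: Get-HTMLCellRegex is a hypothetical helper; $position is 1-based (4 -> "esx").
function Get-HTMLCellRegex([string]$html, [int]$position) {
    # Non-greedy (.*?) stops at the nearest </td>; group 1 holds the cell text
    $cellMatches = [regex]::Matches($html, '<td>(.*?)</td>')
    if ($position -lt 1 -or $position -gt $cellMatches.Count) { return $null }
    return $cellMatches[$position - 1].Groups[1].Value
}

$htmlcode = "<tr><td>VMware, Inc.</td><td>3.5.0</td><td>82663</td><td>esx</td><td>VMware ESX Server 3.5.0 build-82663</td></tr>"
Get-HTMLCellRegex $htmlcode 4
```

Unlike a split-and-filter approach, empty cells are still matched here, so positions stay stable.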


Author

Commented:
Chris,
You are the man; I like your function. My main script does a lot of HTML replacement, as I am highlighting failed items in red, good ones in green, and so on... I know the position but NOT the names of the items.
Good work. A bit upset you did not come online sooner... ;-) Just kidding...
Thanks, mate!
 
 
