Manipulate Web Page Via PowerShell

Last Modified: 2019-02-15
I'm looking to manipulate a web page via PowerShell when there is no id field on the HTML element.
Let's take Google as an example. Some elements have an id:
[Image: element with an id]
... and some don't; they only have one or more class names:
[Image: element with only a class name]
I can use the following code to change the CSS of an element with an ID:
$SiteURL = "https://www.google.com/"
# Drive Internet Explorer through its COM automation interface
$google = New-Object -ComObject "InternetExplorer.Application"
$google.Visible = $true
$google.Navigate2($SiteURL)
# Wait until the document is loaded and ready (ReadyState 4 = complete)
Write-Host "Waiting for document to load "
while ($google.ReadyState -ne 4) {
    Write-Host "." -NoNewline
    Start-Sleep -Seconds 1
}
$doc = $google.Document
# Call getElementById on the COM document object via reflection
$id = [System.__ComObject].InvokeMember("getElementById", [System.Reflection.BindingFlags]::InvokeMethod, $null, $doc, 'lga')
"Current CSS: $($id.style.cssText)"
# Hide the element by replacing its inline style
$id.style.cssText = "display:none;"

... but how can I change the CSS of an element with only a class?
David Favor, Fractional CTO
CERTIFIED EXPERT
Distinguished Expert 2019

Commented:
Note: Code you're writing is considered Bot code (non-human interactions)...

First problem is Google blocks Bot interactions with their sites.

Likely code as simple as yours will be caught + blocked in a way which is very difficult to catch + debug.

Here's how to get your code working.

1) Set up your own page somewhere for testing, which allows Bot access to manipulate the page.

2) If you'll only be manipulating pure HTML sites, then your code will work.

3) If you're interacting with normal sites, which fire JavaScript, your code will fail... again... in ways difficult to catch + debug.

Note: If you're writing a general purpose tool like this, then you'll use http://phantomjs.org/ as this is a headless WebKit-based browser, which runs JavaScript.

Tip: Almost every major site these days checks for Bot interactions + blocks them in various ways... so... a likely better approach will be to check the site's docs for API access. For example, Google provides API access for many of its services.
Sam Jacobs, Citrix Technology Professional / Director of TechDev Services, IPM
CERTIFIED EXPERT

Author

Commented:
David,

Thanks for your response.  Maybe I wasn't being clear. The code provided above does work (please feel free to try it).
I am quite aware that interacting with Google is best accomplished via their API.
I provided it solely as an example of what I am trying to accomplish with another website (without an API).

I respectfully disagree with your definition of Bot code. The code provided is quite similar to what Google would see coming from an actual human interaction. If I repeated the process many times in a short time span, that would be a different story.

-Sam
Qlemo, "Batchelor", Developer and EE Topic Advisor
CERTIFIED EXPERT
Top Expert 2015

Commented:
Indeed, the only way forward might be to go through the collections at some hierarchy level: for example, going through all children with a certain name, and counting them or checking for a particular text, attribute, or whatever else distinguishes items of the same class.
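
The filtering idea above could be sketched roughly as follows. This is only an illustration, not code from the thread: Select-ElementByClassAndText is a hypothetical helper name, and the sample uses stand-in objects with className and innerText properties so that it runs without a browser. The class names and texts are made-up sample values.

```powershell
# Stand-in objects so the sketch runs without a browser; in practice the
# input would be a DOM collection such as $doc.getElementsByTagName('div').
$elements = @(
    [pscustomobject]@{ className = 'gb_Q gb_R'; innerText = 'Gmail'  }
    [pscustomobject]@{ className = 'gb_Q';      innerText = 'Images' }
    [pscustomobject]@{ className = 'other';     innerText = 'Gmail'  }
)

function Select-ElementByClassAndText {
    # Hypothetical helper: passes through elements whose class list contains
    # $ClassName and whose innerText equals $Text.
    param(
        [Parameter(Mandatory, ValueFromPipeline)] $Element,
        [Parameter(Mandatory)] [string] $ClassName,
        [Parameter(Mandatory)] [string] $Text
    )
    process {
        # className is a space-separated list, so split it before comparing
        $classList = "$($Element.className)" -split '\s+'
        if ($classList -contains $ClassName -and $Element.innerText -eq $Text) {
            $Element
        }
    }
}

$found = @($elements | Select-ElementByClassAndText -ClassName 'gb_Q' -Text 'Gmail')
"Matches: $($found.Count)"
```

Against a live page, the same helper would presumably be fed from the pipeline in the same way the thread pipes getElementsByTagName('div') into Where.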
Scott Fell, Developer
CERTIFIED EXPERT
Fellow
Most Valuable Expert 2013

Commented:
Hi Sam,

I am not well versed in PowerShell. Qlemo is my go-to for PS, so I can only offer an idea and not full code. Perhaps if you want to scrape using VBS I can come up with something. Can you try using
ParsedHtml.body.getElementsByClassName('gb_Q') 

or
ParsedHtml.body.getElementsByTagName('div') |  Where {$_.getAttributeNode('class').Value -eq 'gb_Q'}

Sam Jacobs, Citrix Technology Professional / Director of TechDev Services, IPM
CERTIFIED EXPERT

Author

Commented:
Hi Scott,

Thanks for your reply.

Sorry, I should have mentioned that I already have the commands to retrieve the needed DOM objects.
I'm using the following:
$classes = $doc.getElementsByClassName('gb_Q')

I find retrieval by class name to be much faster than by tag name, which could be done with:
$divs = $doc.getElementsByTagName('div') |  Where className -like 'gb_Q*'

What I am seeking assistance with is how to modify the style of the elements (e.g. set to display:none;) once found.
I can modify a get/set attribute like innerText:
$classes[0].innerText = "My text"

However, style seems to be a read-only property.

Thanks!
Sam
Scott Fell, Developer
CERTIFIED EXPERT
Fellow
Most Valuable Expert 2013
Commented:
Sam Jacobs, Citrix Technology Professional / Director of TechDev Services, IPM
CERTIFIED EXPERT

Author

Commented:
Scott ... thanks again for your detailed response.
I am quite familiar with how to do it in JavaScript.
I am also quite familiar with manipulating the DOM in PowerShell, including how to use and iterate through getElementById, GetElementsByClassName, and getElementsByTagName in PowerShell.
What I am not familiar with is how to modify the style of a class or a <div> in PowerShell.
Sam Jacobs, Citrix Technology Professional / Director of TechDev Services, IPM
CERTIFIED EXPERT

Author

Commented:
Scott ... great minds think alike ... I had also thought about replacing .outerHTML to include the modified style (which would of course override any style sheets). I was just about to try it, when I reread your post, and saw that you had suggested it as well, so the points go to you!
(I still think there must be some way to modify the attributes of a style directly). Thanks!
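
One way the .outerHTML idea might look is sketched below. The locked reply is not shown here, so this is not Scott's code: Set-InlineStyle is a hypothetical helper that rewrites the style attribute of the outermost opening tag with a regular expression. It assumes quoted attribute values, no > character inside attribute values, and no double quotes in the new style string.

```powershell
function Set-InlineStyle {
    # Hypothetical helper: returns $Html with the style attribute of its first
    # (outermost) opening tag set to $Style, replacing any existing one.
    param(
        [Parameter(Mandatory)] [string] $Html,
        [Parameter(Mandatory)] [string] $Style
    )
    $match = [regex]::Match($Html, '^\s*<[^>]+>')
    if (-not $match.Success) { return $Html }   # nothing that looks like a tag

    $tag = $match.Value
    # Drop an existing double- or single-quoted style attribute, if any
    $tag = $tag -replace '\sstyle\s*=\s*("[^"]*"|''[^'']*'')', ''
    # Escape $ so the regex engine does not treat it as a group reference
    $safe = $Style.Replace('$', '$$')
    # Insert the new style attribute just before the closing > (or />)
    $tag = $tag -replace '\s*(/?)>$', " style=`"$safe`"`$1>"

    return $tag + $Html.Substring($match.Length)
}

$result = Set-InlineStyle -Html '<div class="gb_Q">x</div>' -Style 'display:none;'
$result   # <div class="gb_Q" style="display:none;">x</div>

# With a live element (assumed to come from $doc.getElementsByClassName):
# $el.outerHTML = Set-InlineStyle -Html $el.outerHTML -Style 'display:none;'
```

Assigning .outerHTML replaces the element in the document, so any reference held to the old element should be fetched again afterwards.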
Qlemo, "Batchelor", Developer and EE Topic Advisor
CERTIFIED EXPERT
Top Expert 2015
Commented:
Sam Jacobs, Citrix Technology Professional / Director of TechDev Services, IPM
CERTIFIED EXPERT

Author

Commented:
OMG ... you are correct ... I was assuming that because .style.csstext was blank, it wasn't working.
I could've sworn that I had tried it earlier and it failed (maybe I had forgotten to include the index for getElementsByClassName(...) when I tried it last).
But I tried it just now, and it DOES work!
Thanks!
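
A minimal sketch of what the resolution amounts to, as far as the visible posts suggest: index into the collection returned by getElementsByClassName and assign .style.cssText on the element. A blank .style.cssText only means the element has no inline style yet (rules from style sheets are not reflected there), not that the property is read-only. The objects below are stand-ins so the sketch runs without a browser; the second element's starting style is a made-up sample value.

```powershell
# Stand-in elements so the sketch runs without a browser. With a live page,
# $classes would instead be $doc.getElementsByClassName('gb_Q').
$classes = @(
    [pscustomobject]@{ className = 'gb_Q'; style = [pscustomobject]@{ cssText = '' } }
    [pscustomobject]@{ className = 'gb_Q'; style = [pscustomobject]@{ cssText = 'color: red;' } }
)

# A single element needs an index into the collection
$classes[0].style.cssText = 'display:none;'

# Or apply the same inline style to every match
foreach ($el in $classes) {
    $el.style.cssText = 'display:none;'
}

$classes | ForEach-Object { "Inline CSS: $($_.style.cssText)" }
```

Note that assigning cssText replaces the whole inline style, as the second stand-in element shows: its original color declaration is gone afterwards.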