Sam Jacobs asked:

Manipulate Web Page Via PowerShell

I'm looking to manipulate a web page via PowerShell when there is no id field on the HTML element.
Let's take Google as an example ... Some elements have an id:
[User generated image]
... and some don't; they only have one or more class names:
[User generated image]
I can use the following code to change the CSS of an element with an ID:
$SiteURL = "https://www.google.com/"   
$google = New-Object -ComObject "InternetExplorer.Application"
$google.visible = $true
$google.Navigate2($SiteURL)
# Wait until the document is loaded and ready (ReadyState 4 = complete)
Write-Host "Waiting for document to load "
while ($google.ReadyState -ne 4) {
    Write-Host "." -NoNewLine
    Start-Sleep 1
}
$doc = $google.Document
$id = [System.__ComObject].InvokeMember("getElementById",[System.Reflection.BindingFlags]::InvokeMethod, $null, $doc, 'lga')
"Current CSS: $($id.style.csstext)"
$id.style.csstext = "display:none;"


... but how can I change the CSS of an element with only a class?
David Favor

Note: Code you're writing is considered Bot code (non-human interactions)...

First problem is Google blocks Bot interactions with their sites.

Likely code as simple as yours will be caught + blocked in a way which is very difficult to catch + debug.

Here's how to get your code working.

1) Set up your own page somewhere for testing, which allows Bot access to manipulate the page.

2) If you'll only be manipulating pure HTML sites, then your code will work.

3) If you're interacting with normal sites, which fire JavaScript, your code will fail... again... in ways difficult to catch + debug.

Note: If you're writing a general purpose tool like this, then you'll use http://phantomjs.org/ as this is a headless WebKit-based browser, which runs JavaScript.

Tip: Almost every major site these days checks for Bot interactions + blocks them in various ways... so... a likely better approach will be to check the site's docs for API access. For example, Google provides API access for many of its services.
Sam Jacobs (ASKER)

David,

Thanks for your response.  Maybe I wasn't being clear. The code provided above does work (please feel free to try it).
I am quite aware that interacting with Google is best accomplished via their API.
I provided it solely as an example of what I am trying to accomplish with another website (without an API).

I respectfully disagree with your definition of Bot code. The code provided is quite similar to what Google would see coming from an actual human interaction. If I repeated the process many times in a short time span, that would be a different story.

-Sam
Indeed, the only way forward might be to go through the collections at some hierarchy level: for example, going through all children with a certain name, and counting or checking for a particular text, attribute, or whatever else distinguishes items of the same class.
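A minimal sketch of that idea, assuming the elements expose `className` and `innerText` as IE DOM elements do. The helper name `Select-ElementByClassAndText` and its parameters are hypothetical and not from the thread; it simply walks a collection and keeps the items whose class and text match.

```powershell
# Hypothetical helper (not from the thread): walk a collection of elements and
# keep those whose class name and text match what distinguishes the target.
function Select-ElementByClassAndText {
    param(
        [Parameter(Mandatory)] $Elements,               # e.g. $doc.getElementsByTagName('div')
        [Parameter(Mandatory)] [string] $ClassPattern,  # wildcard pattern, e.g. 'gb_Q*'
        [string] $TextPattern = '*'                     # wildcard pattern for innerText
    )
    foreach ($el in $Elements) {
        # -like is a case-insensitive wildcard comparison
        if ($el.className -like $ClassPattern -and $el.innerText -like $TextPattern) {
            $el   # emit the matching element to the pipeline
        }
    }
}
```

With the `$doc` object from the question, this might be called as `Select-ElementByClassAndText -Elements $doc.getElementsByTagName('div') -ClassPattern 'gb_Q*'`, adding `-TextPattern` when several elements share the class.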
Hi Sam,

I am not well versed in PowerShell. Qlemo is my go-to for PS, so I can only offer an idea and not full code. Perhaps if you want to scrape using VBScript I can come up with something. Can you try using
ParsedHtml.body.getElementsByClassName('gb_Q') 


or
ParsedHtml.body.getElementsByTagName('div') |  Where {$_.getAttributeNode('class').Value -eq 'gb_Q'}


Hi Scott,

Thanks for your reply.

Sorry, I should have mentioned that I already have the commands to retrieve the needed DOM objects.
I'm using the following:
$classes = $doc.getElementsByClassName('gb_Q')


I find retrieval by class name to be much faster than by tag name, which could be done with:
$divs = $doc.getElementsByTagName('div') |  Where className -like 'gb_Q*'


What I am seeking assistance with is how to modify the style of the elements (e.g. set to display:none;) once found.
I can modify a get/set attribute like innerText:
$classes[0].innerText = "My text"


However, style seems to be a read-only property.

Thanks!
Sam
ASKER CERTIFIED SOLUTION
Scott Fell
This solution is only available to members.
Scott ... thanks again for your detailed response.
I am quite familiar with how to do it in JavaScript.
I am also quite familiar with manipulating the DOM in PowerShell, including how to use and iterate through getElementById, getElementsByClassName, and getElementsByTagName in PowerShell.
What I am not familiar with is how to modify the style of a class or a <div> in PowerShell.
Scott ... great minds think alike ... I had also thought about replacing .outerHTML to include the modified style (which would of course over-ride any style sheets). I was just about to try it, when I reread your post, and saw that you had suggested it as well, so the points go to you!
(I still think there must be some way to modify the attributes of a style directly). Thanks!
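The accepted solution text is not available here, so the following is only a rough sketch of the outerHTML idea described above, not Scott's actual code. The helper `Add-InlineStyle` is hypothetical; it assumes the element's markup starts with its opening tag and does not already carry a `style` attribute.

```powershell
# Hypothetical helper (not from the thread): inject an inline style attribute
# into the opening tag of an element's markup.
function Add-InlineStyle {
    param(
        [Parameter(Mandatory)] [string] $Html,   # e.g. the element's .outerHTML
        [Parameter(Mandatory)] [string] $Style   # e.g. 'display:none;'
    )
    # Insert style="..." right after the first tag name; the pattern is
    # anchored at the start, so only the opening tag is touched.
    $Html -replace '^\s*<(\w+)', ('<$1 style="{0}"' -f $Style)
}
```

Usage might look like `$el = $classes[0]; $el.outerHTML = Add-InlineStyle -Html $el.outerHTML -Style 'display:none;'`. Assigning outerHTML replaces the node, so `$el` should be looked up again afterwards rather than reused.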
SOLUTION
This solution is only available to members.
OMG ... you are correct ... I was assuming that because .style.csstext was blank, that it wasn't working.
I could've sworn that I had tried it earlier and it failed (maybe I had forgotten to include the index for getElementsByClassName(...) when I tried it last).
But I just tried it now, and it DOES work!
Thanks!
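For reference, a minimal sketch of what the thread converges on: the style is set on individual elements (by index or by iterating), not on the collection returned by getElementsByClassName. The helper name `Set-ElementCssText` is hypothetical and only wraps the assignment in a loop.

```powershell
# Hypothetical helper (not from the thread): apply the same inline CSS to
# every element in a collection and report how many were updated.
function Set-ElementCssText {
    param(
        [Parameter(Mandatory)] $Elements,          # e.g. $doc.getElementsByClassName('gb_Q')
        [Parameter(Mandatory)] [string] $CssText   # e.g. 'display:none;'
    )
    $count = 0
    foreach ($el in $Elements) {
        # Set the style on each element, never on the collection itself
        $el.style.cssText = $CssText
        $count++
    }
    return $count
}
```

For a single element, the direct form is `$classes[0].style.cssText = 'display:none;'`, using `$classes` from the earlier post. Note that `style.cssText` only reflects inline styles, so it can read back as blank before anything is assigned even when a stylesheet applies to the element.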