Solved

Perl Web Automation

Posted on 2011-09-20
Last Modified: 2013-11-10
Hi Experts,

I'm wanting to do some web automation using Perl, but I'm struggling, especially since this site will not allow me to login or perform some of the vital steps without JavaScript.  Does that mean I'll need WWW::Scripter::Plugin::JavaScript?  It also has to handle cookies.


Here are some of the key steps:

1. Login at: http://portal.ccli.com/
    Yes, that's the real URL, so you can visit that login page if you like, but sorry I can't give you a username/password.  If you view source, you'll see the User ID and Password input tags are:
       <input name="ctl00$cph1$txtUserId" type="text" maxlength="20" id="ctl00_cph1_txtUserId" style="width:215px;" />
       <input type="password" name="password" style="width:215px;" MaxLength="20" value="" />

2. On the next page, I have to click the "Launch Copy Report" link.  This is what it looks like:
        <a id="ctl00_cph1_lnkOLCR" class="applink" href="javascript:WebForm_DoPostBackWithOptions(new WebForm_PostBackOptions(&quot;ctl00$cph1$lnkOLCR&quot;, &quot;&quot;, false, &quot;&quot;, &quot;http://www.ccli.com/CopyReport/Login.cfm&quot;, false, true))">Launch Copy Report</a>

3. On the next page, I need to go to this page:
      http://www.ccli.com/CopyReport/SongCopyright.cfm?song_id=12345
    where I will supply the "12345" which could be any number.

4. Click the "Enter into Copy Report" button (image):
      <input TYPE="image" SRC="Images/Buttons/EnterIntoCopyReport.gif" onClick="enterSong();">

5. On the next page, type "1" into this field:
      <input type="text" size="10" name="project" value="0" tabindex="1" class="Highlighted">

6. Select the "CCLI Number" option from a dropdown:
      <select name="search_type" size="1" onchange="setDefaultMethod(this);">
          <option value="TitleOnly">Title</option>
          <option value="Title">Title &amp; AKA</option>
          <option value="Text">Lyrics</option>
          <option>Author</option>
          <option>Catalog</option>
          <option>Theme</option>
          <option selected>CCLI Number</option>
      </select>

7. Click the "Save" button (image):
      <input type="image" src="Images/Buttons/Save.gif" onclick="document.forms[0].nextPage.value='SongListReportSession.cfm?finished=true';" tabindex="4">

If I can get the above working, I can hopefully handle the rest.


Here's my first attempt at the code, but in addition to other things, it doesn't handle the site's requirement for JavaScript:
use WWW::Mechanize;

$m = WWW::Mechanize->new();

# Step 1. Login at: http://portal.ccli.com/
$m->get('http://portal.ccli.com/');
#print $m->content;
$m->set_visible('myUsername','myPassword');
$m->submit();
#print $m->content;

# Step 2.  On the next page, I have to click the "Launch Copy Report" link.
# I don't know how to do this, as it's got JavaScript.

# Step 3.  On the next page, I need to go to this page:
$songno = 12345;
$m->get("http://www.ccli.com/CopyReport/SongCopyright.cfm?song_id=$songno");

# Step 4. Click the "Enter into Copy Report" button (image):
# I don't know how to do this, as it's got JavaScript.

# Step 5. On the next page, type "1" into this field:
$m->field('project','1');

# Step 6. Select the "CCLI Number" option from a dropdown:
# I don't know how to do this.

# Step 7. Click the "Save" button (image):
# I don't know how to do this, as it's got JavaScript.



Can someone help me get these main steps working, please?

Thanks.
tel2
Question by:tel2
8 Comments
 
LVL 23

Expert Comment

by:nemws1
ID: 36567836
Well, first off, to handle cookies you just need to add a cookie jar, which is only 2 additional lines:
 
use WWW::Mechanize;
use HTTP::Cookies;

$m = WWW::Mechanize->new();
$m->cookie_jar(HTTP::Cookies->new());

# Step 1. Login at: http://portal.ccli.com/
......



As for the rest, JavaScript doesn't really matter if you know what the JavaScript is calling/sending back on the server side.  I'll often use the Firefox "Live HTTP Headers" plugin to help with this (or a packet sniffer like Wireshark, but a lot of people don't know how to use sniffer software).  Live HTTP Headers really should let you get past steps #2, #4, #6, & #7.  You'll need the URL and any arguments being sent to it (which should include any form fields).  You don't need to worry about any of the Cookie headers if you're using the cookie jar. ;-)

Without seeing the code or having a login, it's hard to help further.
 
LVL 12

Author Comment

by:tel2
ID: 36570259
Thanks for that, nemws1.

When you say:
    "JavaScript doesn't really matter, if you know what the JavaScript is calling/sending back on the server side."
Did you see my original comment:
    "...this site will not allow me to login or perform some of the vital steps without JavaScript"?
To test this, if you turn JavaScript off in Firefox, then browse to http://portal.ccli.com/, you will not be prompted for User ID / Password.  Instead, you'll get the error message:
    "Javascript & cookies are required for this site to function."
Try it if you like.  How do I get past that?

Thanks.
tel2
 
LVL 23

Accepted Solution

by:
nemws1 earned 500 total points
ID: 36570573
Javascript is all client-side.  The server has no idea what you're doing w/ Javascript on your client.  Yes, if you disable it, it won't work, because the server is expecting a certain response, but if you know what that response is going to be, you can script it with WWW::Mechanize without using Javascript.

For example, perhaps that form dynamically generates via Javascript the FORM element named "password".  When you submit the form (with Javascript turned on with Firefox) with "Live HTTP Headers" running, you'll see the client is submitting a form and passing it the "password" variable with an argument (along with all the other INPUT elements).  You just need to figure out all the variables that the browser is sending to the server when something is submitted.

When I tried just now with a fake username/password, "Live HTTP Headers" gave me the following POST data (right after the Content-Length: header):

__LASTFOCUS=&__EVENTTARGET=&__EVENTARGUMENT=&__VIEWSTATE=%2FwEPDwUKMTMxNjk4NDk1NA9kFgJmD2QWAgIDD2QWAmYPFgIeBFRleHQFHlNvbmdTZWxlY3QgLyBDb3B5IFJlcG9ydCBMb2dpbmQYAQUeX19Db250cm9sc1JlcXVpcmVQb3N0QmFja0tleV9fFgEFGGN0bDAwJGNwaDEkY2hrUmVtZW1iZXJNZQ%3D%3D&ctl00%24cph1%24fldVerificationCode=&ctl00%24cph1%24txtUserId=spam&password=blah&ctl00%24cph1%24btnLogin=Login



Which I can then deconstruct into the following form elements that I would need for WWW::Mechanize:
 
__LASTFOCUS=
__EVENTTARGET=
__EVENTARGUMENT=
__VIEWSTATE=%2FwEPDwUKMTMxNjk4NDk1NA9kFgJmD2QWAgIDD2QWAmYPFgIeBFRleHQFHlNvbmdTZWxlY3QgLyBDb3B5IFJlcG9ydCBMb2dpbmQYAQUeX19Db250cm9sc1JlcXVpcmVQb3N0QmFja0tleV9fFgEFGGN0bDAwJGNwaDEkY2hrUmVtZW1iZXJNZQ%3D%3D
ctl00%24cph1%24fldVerificationCode=
ctl00%24cph1%24txtUserId=spam
password=blah
ctl00%24cph1%24btnLogin=Login



Granted, that __VIEWSTATE looks a little hairy, but it's probably not even needed.  I would try it with just the two fields holding the username and password (spam & blah).
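
The deconstruction above can also be done mechanically. What follows is only a minimal sketch using core Perl: decode_form_body is a hypothetical helper name (not part of WWW::Mechanize), and the sample string is a shortened copy of the captured data with the long __VIEWSTATE value left out. It splits the captured data on "&" and "=" and undoes the percent-encoding, which gives the literal field names (with "$" rather than "%24") as they appear in the page source.

```perl
use strict;
use warnings;

# Split an application/x-www-form-urlencoded body (as shown by
# Live HTTP Headers) into decoded [name, value] pairs, keeping their order.
sub decode_form_body {
    my ($body) = @_;
    my @pairs;
    for my $chunk (split /&/, $body) {
        next unless length $chunk;                 # skip empty segments
        my ($name, $value) = split /=/, $chunk, 2;
        $value = '' unless defined $value;         # "name" with no "=" at all
        for ($name, $value) {
            tr/+/ /;                               # '+' stands for a space
            s/%([0-9A-Fa-f]{2})/chr(hex($1))/ge;   # %24 becomes '$', and so on
        }
        push @pairs, [ $name, $value ];
    }
    return @pairs;
}

# Shortened sample of the captured login data (__VIEWSTATE omitted here).
my $captured = '__EVENTTARGET=&ctl00%24cph1%24txtUserId=spam'
             . '&password=blah&ctl00%24cph1%24btnLogin=Login';

printf "%-24s => '%s'\n", @$_ for decode_form_body($captured);
```

The decoded names are the ones to look for in the page's HTML; the encoded form is only how they travel over the wire.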

 
LVL 12

Author Comment

by:tel2
ID: 36570826
That's great, nemws1

Thanks to your suggestions, the code is now logging in (step 1)!

Here's the code now:
use WWW::Mechanize;
use HTTP::Cookies;

$m = WWW::Mechanize->new();
$m->cookie_jar(HTTP::Cookies->new());

# Step 1. Login at: http://portal.ccli.com/
$m->get('http://portal.ccli.com/');
$m->set_visible('myUserID', 'myPassword');
#$m->submit();
$m->click('ctl00$cph1$btnLogin');
#print $m->content;

# Step 2.  On the next page, I have to click the "Launch Copy Report" link.
$m->click_button(number => 4);
print $m->content;


My next problem is how to click the "Launch Copy Report" button on the next page.
As mentioned in my original post, the code for that button looks like this:
    <a id="ctl00_cph1_lnkOLCR" class="applink" href="javascript:WebForm_DoPostBackWithOptions(new WebForm_PostBackOptions(&quot;ctl00$cph1$lnkOLCR&quot;, &quot;&quot;, false, &quot;&quot;, &quot;http://www.ccli.com/CopyReport/Login.cfm&quot;, false, true))">Launch Copy Report</a>
So, how do I click it?
I don't see a "name" for it, so maybe I can click it by button number.  This is the 4th button on the page, as confirmed by: lynx -dump step2.htm
So, I tried:
    $m->click_button(number => 4);
but that gives the error:
    "Can't call method "click" on an undefined value at .../WWW/Mechanize.pm line 1770."
So, I tried:
    $m->get('http://www.ccli.com/CopyReport/Login.cfm');
but that seemed to fail and take me back to the initial login page.

Any ideas how I should click the "Launch Copy Report" button?

I've attached the entire web page, in case it's of use.
step2.txt
 
LVL 23

Expert Comment

by:nemws1
ID: 36570929
I guess I wouldn't even try to use a click_button(), but just call $m->submit_form(); with whatever fields you get from Live HTTP Headers when you click on the button.  If you've set up the cookie jar, it usually is pretty straightforward.

I can't tell from the code you attached what the 'name' value is either, but it has to have a name & value (that's the way web forms work!).  Again, Live HTTP Headers will be able to give you the name in a heartbeat.
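
One possible reading of the "Launch Copy Report" markup: it is an <a> link with a javascript: href rather than a form button, which would explain why click_button() finds nothing to click. In ASP.NET pages, a link like this normally has the browser put the control name into the hidden __EVENTTARGET field, optionally point the form at another URL, and then submit the form. The sketch below mimics those two steps on the form object that WWW::Mechanize exposes. It is an untested assumption about this particular site, and prepare_postback is a hypothetical helper name, not a Mechanize method.

```perl
use strict;
use warnings;

# Mimic what the page's JavaScript postback helper appears to do before it
# submits the form: record which control was "clicked" in the hidden
# __EVENTTARGET field and, for a cross-page postback, retarget the form.
# $form is expected to be an HTML::Form object (what WWW::Mechanize uses).
sub prepare_postback {
    my ($form, $event_target, $action_url) = @_;

    my $input = $form->find_input('__EVENTTARGET')
        or die "No __EVENTTARGET field found; is this the right form?\n";

    $input->readonly(0);            # hidden inputs start out read-only in HTML::Form
    $input->value($event_target);   # raw control name, with literal '$'
    $form->action($action_url) if defined $action_url;

    return $form;
}
```

With the logged-in $m from the script above, and assuming the page's main form is form number 1, usage might look like:
    my $form = $m->form_number(1);
    prepare_postback($form, 'ctl00$cph1$lnkOLCR', 'http://www.ccli.com/CopyReport/Login.cfm');
    $m->submit();
Because the existing form is submitted, its other hidden fields (such as __VIEWSTATE) go along automatically.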
 
LVL 12

Author Comment

by:tel2
ID: 36571469
Hi again nemws1,

Thanks for that.  I have installed Live HTTP Headers into Firefox, and I:
- Logged in to the first page (step 1) via Firefox.
- Started Live HTTP Headers.
- Clicked the "Launch Copy Report" link.
And attached is the Live HTTP Headers log (I have replaced my user ID with "myUserID" and my password with "myPassword", as is my custom).

I then tried adding various combinations of get commands to my script, including this one:
    $m->get('http://www.ccli.com/CopyReport/Login.cfm?ctl00%24cph1%24hdnUrchinHiddenField=&FromLogin=Yes&javascript=yes&cookies=yes&browser=yes&LoginID=myUserID&Password=myPassword&aliasLoginID=myUserID&isCRA=0&ctl00%24cph1%24chkCopyReporter=on');
and checked the resulting page with this:
    print $m->content;
but alas, I'm back to the login page.
I also tried replacing the "%24"s with "$"s (should I do that?), but still no joy.
I'm not sure if I've got the stuff before the 1st parameter right (i.e. "http://www.ccli.com/CopyReport/Login.cfm?").

What do you recommend?

Can't I just click that 4th button (the "Launch Copy Report" image) somehow?  If not, why not?

If you want to test things yourself, I can email you a temporary password if you go to my profile page, and send me a message by clicking the "Hire Me" link on the left.

Thanks again for your time.
step2b.txt
 
LVL 23

Expert Comment

by:nemws1
ID: 36574449
First, yes, leave the %24s - don't use '$' instead.

Secondly, my e-mail is expexch@emptec.com - send me temp. password and I'll see what I can come up with. ;-)
 
LVL 12

Author Comment

by:tel2
ID: 36908705
Hi nemws1,

Thanks for the above help, and for trying to help further.  The points are yours.  Although we didn't get it all working, at least I learned a few things (e.g. about Firefox's "Live HTTP Headers" plugin).

In the end I ran out of time and did the automation with Firefox's iMacros extension, though I would still like to know how to do this with Perl, because of the flexibility that offers.

> First, yes, leave the %24s - don't use '$' instead.
The reason I asked about this is, before trying:
    $m->set_visible('myUserID', 'myPassword');
I tried:
    $m->field('ctl00%24cph1%24txtUserId','myUserID');
    $m->field('password','myPassword');
because "Live HTTP Headers" showed the "%24"s, but that failed, so I changed the "%24"s to "$"s, like this:
    $m->field('ctl00$cph1$txtUserId','myUserID');
    $m->field('password','myPassword');
and that worked, so I figured I might need to do that elsewhere, too.  I then changed it to:
    $m->set_visible('myUserID', 'myPassword');
for simplicity.
So when should I change them and when shouldn't I?

And why is it that on that first login page, I had to do:
    $m->click('ctl00$cph1$btnLogin');
and couldn't just do:
    $m->submit();
?

Thanks again.
tel2
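
On the "%24" versus "$" question, a rule of thumb that fits the results described above: "%24" is simply the URL-encoded form of "$". Live HTTP Headers shows the data as it goes over the wire, so the names appear encoded there. Names passed to field(), click() and similar calls are the literal names from the HTML (with "$"), and the encoding is applied when the form is submitted. Only a URL or request body written out by hand needs the "%24" form. The core-Perl sketch below illustrates the encoding step; form_encode and build_form_body are hypothetical helper names, and the field list is cut down to the three login fields from the capture.

```perl
use strict;
use warnings;

# Percent-encode one form name or value roughly the way a browser does for
# application/x-www-form-urlencoded data (a space becomes '+').
sub form_encode {
    my ($text) = @_;
    utf8::encode($text);                           # work on UTF-8 bytes
    $text =~ s/([^A-Za-z0-9\-_.* ])/sprintf('%%%02X', ord $1)/ge;
    $text =~ tr/ /+/;
    return $text;
}

# Join ordered [name, value] pairs into a request body.
sub build_form_body {
    my @pairs = @_;
    return join '&',
        map { form_encode($_->[0]) . '=' . form_encode($_->[1]) } @pairs;
}

my @login = (
    [ 'ctl00$cph1$txtUserId', 'myUserID'   ],
    [ 'password',             'myPassword' ],
);
my @button = ( [ 'ctl00$cph1$btnLogin', 'Login' ] );

print "without button: ", build_form_body(@login), "\n";
print "with button:    ", build_form_body(@login, @button), "\n";
```

The two printed lines also suggest an answer to the click() versus submit() question. submit() sends the form without naming any button, while click('ctl00$cph1$btnLogin') adds the button's own name=value pair, as in the captured data. The server side most likely uses that pair to decide that the Login button was pressed, which would explain why submit() alone was not enough.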