Having a custom web bot can save you lots of time and effort. For instance, you might need to log in to a site, click a certain link, scroll down to locate some particular information, copy it to the clipboard for pasting into a spreadsheet, ... then do the whole thing over again the next day. "If only I had a robot to do that for me!"
The programming for a browser bot is not trivial, but it's also not terribly difficult. The common question is "Where do I start?"
This series of articles describes one approach for C++ programmers. We'll set up a simple "test harness" that will let you experiment with the basic steps of automatically filling in text boxes, selecting options, clicking buttons, and submitting forms. We'll also look at a way to automatically extract information such as stock quotes, movie schedules, etc. from the sites where you would normally need to browse and collect that information manually.
In this three-part series, we will use Visual C++ to create a dialog-based application that will:
1) Pull up and display a log-in page of a site. (PART ONE)
2) Enter some values (a username and password) and click a button on the web page. (PART TWO)
3) Surf to a particular page on the site.
4) Obtain the entire HTML text of the resulting page. (PART THREE)
5) Search that HTML to locate and display some particular pieces of information.
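Once the HTML text is in hand, step 5 comes down to ordinary string searching. Here is a minimal sketch in portable C++ of one way to do it. It is not necessarily the approach taken later in this series, and the helper name ExtractBetween and its parameters are hypothetical, invented here for illustration.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper (not part of the article's project).
// Finds the first occurrence of `before` in `html`, then the next occurrence
// of `after`, and copies the text between them into `out`.
// Returns false (leaving `out` untouched) if either marker is missing.
bool ExtractBetween(const std::string& html,
                    const std::string& before,
                    const std::string& after,
                    std::string& out)
{
    assert(!before.empty() && !after.empty());  // markers must be non-empty

    const std::string::size_type start = html.find(before);
    if (start == std::string::npos)
        return false;

    const std::string::size_type valueBegin = start + before.size();
    const std::string::size_type valueEnd = html.find(after, valueBegin);
    if (valueEnd == std::string::npos)
        return false;

    out = html.substr(valueBegin, valueEnd - valueBegin);
    return true;
}
```

For example, given the made-up markup <span id="price">42.17</span>, calling ExtractBetween with the markers <span id="price"> and </span> stores 42.17 in the output string. Real pages are messier, so a marker-based search like this is best treated as a starting point for experiments.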
In this article, PART ONE, we'll build the test-harness application and use it to read and display a login page from a well-known web site. Let's begin!
1. Create a Dialog-based Application with MFC Support
Select the menu command File / New Project... and choose MFC Application
Click OK, then click Next.
In Application Type, select Dialog based
We'll be using default options, so just click [Finish]
Note: We use an MFC dialog-based application because such applications are easy to create and useful for experimentation. For instance, to try out something new, just draw a button on the dialog and code up a button-click handler.
2. Add a Browser Control
In the dialog editor,
Right-click in the dialog and select Insert ActiveX Control...
Scroll down to locate Microsoft Web Browser
Resize the control as desired.
3. Create a C++ Wrapper for the Browser Control
Use the ClassWizard to create a wrapper and a control-type variable (m_ctlBrowser) for the browser control:
In VS 6, we'd press Ctrl+W to display the Class Wizard.
In VS 2008, we click the Class View tab in the leftmost panel.
Select Add > Add variable...
In the "Add Member Variable Wizard" ...
Put a checkmark in this checkbox
Select IDC_EXPLORER1 from the list.
In the Solution Explorer, you now have a C++ file named explorer1.cpp with a matching header file. This is your wrapper for the ActiveX control.
4. Add a Button and an OnClick Handler
Add a "Do Login" button, like so:
In the Dialog Editor...
Select the "Button" control from the Toolbox
Draw a button in the dialog (initially named "Button 1")
Right-click the button and set its Caption attribute to "Do Login" and...
Set its ID to IDC_DoLogin (the identifier that the code later in this article refers to).
Double-click the new [Do Login] button.
This displays the source code stub for a new handler for clicks of that button.
Make it so:
MessageBox( L"Login -- see PART TWO!" );
5. Open a Web Page at Program Startup
Set OnInitDialog to pull up the login page. In the Class View panel, double-click OnInitDialog (or just scroll up and find it in the WebRobotDlg.cpp file).
Make the bottom of the CWebRobotDlg::OnInitDialog() function look like this:
GetDlgItem( IDC_DoLogin )->EnableWindow( false );
m_ctlBrowser.Navigate(L"https://login.yahoo.com/config/login", 0,0,0,0 );
For demonstration purposes, the program hard-codes the URL of a well-known website's login page. That page may change in the future, but the principles described here will remain valid for other pages.
6. Add an OnDocumentComplete() Handler
In the Dialog Editor...
Right-click the web browser control and select Add Event Handler...
Select "Document Complete"
Leave "CWebRobotDlg" selected.
Click [Add and Edit]
Make the code for the DocumentCompleteExplorer1() function look like this:
void CWebRobotDlg::DocumentCompleteExplorer1(LPDISPATCH pDisp, VARIANT* URL)
{
    // The page has finished loading, so it is now safe to enable the button.
    GetDlgItem( IDC_DoLogin )->EnableWindow( true );
}
Just a moment ago, we added code in the OnInitDialog function to disable that button... Why do all of this?
This is a crucial element of all webbot programming.
You cannot access the browser DOM -- to get access to page HTML elements -- until the page has been completely loaded and the browser control is ready to proceed.
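The disable-then-enable pattern can be modeled without any MFC at all. The sketch below is a portable C++ stand-in, not the dialog code itself: the class PageGate and its member names are hypothetical, and it tracks only the "is the page ready?" flag that the [Do Login] button's enabled state represents in the real program.

```cpp
#include <cassert>
#include <string>

// Hypothetical stand-in for the dialog; it models only the readiness flag.
class PageGate
{
public:
    PageGate() : m_ready(false) {}

    // Mirrors OnInitDialog: starting a navigation blocks page actions.
    void Navigate(const std::string& url)
    {
        assert(!url.empty());  // a navigation needs a target
        m_url = url;
        m_ready = false;
    }

    // Mirrors the DocumentComplete handler: the page may now be touched.
    void OnDocumentComplete() { m_ready = true; }

    // Mirrors the [Do Login] button: reports whether acting on the page
    // is allowed yet. Returns false while the page is still loading.
    bool TryLogin() const { return m_ready; }

private:
    std::string m_url;  // last requested address (kept for reference only)
    bool m_ready;       // true only after the document-complete event
};
```

In this model, TryLogin() returns false after Navigate() and true only after OnDocumentComplete() has been called; a second Navigate() closes the gate again. The real application gets the same effect by calling EnableWindow on the button.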
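Run the program and you should see the login page displayed in the browser control, with the [Do Login] button becoming enabled once the page has finished loading.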
Summary of PART ONE:
With just a few steps, we have created an application program that will access a particular web page and display it. All the pieces are in place to begin manipulating that page.
We saw how to use the Visual Studio AppWizard to create a simple dialog-based application with a Web Browser ActiveX control and one button.
We used the ClassWizard to create a C++ wrapper for the ActiveX control.
We used the control's Navigate() function to have it display a login page from Yahoo.com.
We talked about the requirement that no action should take place until the browser control indicates that the document has been completely downloaded and the DOM is ready. We added an OnDocumentComplete() function so that we can know exactly when it is OK to continue.
In PART TWO, we will get to the core of our webbot program -- accessing the WebBrowser DOM (Document Object Model) so that we can automatically fill in text boxes, click buttons, and so forth. I'll show you how to fill in a couple of text boxes and click that [Sign In] button.