Browser Bot -- Automate Browsing Sequences With C++ (PART ONE)

DanRollins
Having a custom web bot can save you lots of time and effort.  For instance, you might need to log in to a site, click a certain link, scroll down to locate some particular information, copy it to a clipboard for pasting into a spreadsheet, ... then do the whole thing over again the next day. "If only I had a robot to do that for me!"

The programming for a browser bot is not trivial, but it's also not terribly difficult.  The common question is "Where do I start?"  

This series of articles describes one approach for C++ programmers.  We'll set up a simple "test harness" that will let you experiment with the basic steps of automatically filling in text boxes, selecting options, clicking buttons, and submitting forms.  We'll also look at a way to automatically extract information such as stock quotes, movie schedules, etc. from the sites where you would normally need to browse and collect that information manually.

Overview
In this three-part series, we will use Visual C++ to create a dialog-based application that will:
1) Pull up and display a log-in page of a site. (PART ONE)
2) Enter some values (a username and password) and click a button on the web page. (PART TWO)
3) Surf to a particular page on the site.
4) Obtain the entire HTML text of the resulting page. (PART THREE)
5) Search that HTML to locate and display some particular pieces of information.
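
As a rough preview of step 5, the search can be as simple as pulling out the text that sits between two known markers. The sketch below is plain standard C++ (C++17), not the code used later in this series; the helper name ExtractBetween and the sample markup in the note that follows are hypothetical, invented only for illustration.

```cpp
#include <cassert>
#include <optional>
#include <string>

// Hypothetical helper: returns the text between the first occurrence of
// `before` and the next occurrence of `after`, or nothing if either
// marker cannot be found.
std::optional<std::string> ExtractBetween(const std::string& html,
                                          const std::string& before,
                                          const std::string& after)
{
    // Empty markers would match everywhere, so treat them as a caller bug.
    assert(!before.empty() && !after.empty());

    const std::size_t start = html.find(before);
    if (start == std::string::npos)
        return std::nullopt;

    // The value begins right after the opening marker.
    const std::size_t valueStart = start + before.size();
    const std::size_t end = html.find(after, valueStart);
    if (end == std::string::npos)
        return std::nullopt;

    return html.substr(valueStart, end - valueStart);
}
```

Given a made-up fragment such as `<span id="price">42.17</span>`, calling ExtractBetween with the markers `<span id="price">` and `</span>` returns the string 42.17. Real pages use different markup, and plain marker matching breaks as soon as the page layout changes, so treat this as a starting point only.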
Final Result of 3-part Article

In this article, PART ONE, we'll build the test-harness application and use it to read and display a login page from a well-known web site.  Let's begin!

1. Create a Dialog-based Application with MFC Support

Select menu command File / New Project...
Select MFC and MFC Application
Click OK, then click Next.
In Application Type, select Dialog based
We'll be using default options, so just click [Finish]

Note: We use an MFC dialog-based application because such applications are easy to create and useful for experimentation.  For instance, to try out something new, just draw a button on the dialog and code up a button-click handler.

2. Add a Browser Control

In the dialog editor,
Right-click in the dialog and select Insert ActiveX Control...
Scroll down to locate Microsoft Web Browser  
Click OK.
Resize the control as desired.
Adding the web browser control to the dialog

3. Create a C++ Wrapper for the Browser Control

Use the ClassWizard to create a wrapper and a control-type variable (m_ctlBrowser) for the browser control:

In VS 6, we'd press Ctrl+W to display the Class Wizard.  
In VS 2008, we click the Class View tab in the leftmost panel.

Right-click CWebRobotDlg
Select Add > Add variable...
In the "Add Member Variable Wizard" ...
Control variable  Put a checkmark in this checkbox
Control ID           Select IDC_EXPLORER1 from the list.
Variable Name    m_ctlBrowser
Click [Finish]

In the Solution Explorer, you now have a C++ file named explorer1.cpp with a matching header file.  This is your wrapper for the ActiveX control.

4. Add a Button and an OnClick Handler

Add a "Do Login" button, like so:  
In the Dialog Editor...
Select the "Button" control from the Toolbox
Draw a button in the dialog (initially named "Button 1")
Right-click the button and set its Caption attribute to "Do Login" and...
Set its ID to IDC_DoLogin.
Double-click the new [Do Login] button.
This displays the source code stub for a new handler for clicks of that button.

Make it so:
void CWebRobotDlg::OnBnClickedDologin()
{
	MessageBox( L"Login -- see PART TWO!" );
}

5. Open a Web Page at Program Startup

Set OnInitDialog to pull up the login page.  In the Class View panel, double-click OnInitDialog (or just scroll up and find it in the WebRobotDlg.cpp file).

Make the bottom of the CWebRobotDlg::OnInitDialog() function look like this:
	GetDlgItem( IDC_DoLogin )->EnableWindow( false );

	m_ctlBrowser.Navigate( L"https://login.yahoo.com/config/login", 0, 0, 0, 0 );

	return TRUE;
}

For demonstration purposes, the program hard-codes the URL of a well-known website's login page.  That page may change in the future, but the principles described here will remain valid for other pages.

6. Add an OnDocumentComplete() Handler

In the Dialog Editor...
Right-click the web browser control and select Add Event Handler...
Message Type: Select "Document Complete"
Class list:            Leave "CWebRobotDlg" selected.
Click [Add and Edit]

Make the code for the DocumentCompleteExplorer1() function look like this:
void CWebRobotDlg::DocumentCompleteExplorer1(LPDISPATCH pDisp, VARIANT* URL)
{
	GetDlgItem( IDC_DoLogin )->EnableWindow( true );
}

Just a moment ago, we added code in the OnInitDialog function to disable that button... Why do all of this?

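This is a crucial element of all webbot programming.  You cannot access the browser DOM (and, through it, the page's HTML elements) until the page has been completely loaded and the browser control is ready to proceed.  Keeping the button disabled until the DocumentComplete event fires ensures that we never try to act on a half-loaded page.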
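
The same rule can be modelled in a few lines of portable C++. This is only a sketch of the idea, not MFC code and not necessarily how later parts of this series handle it: the class PageReadyGate and all of its member names are hypothetical. In the dialog, OnDocumentComplete() would be called from DocumentCompleteExplorer1(), and any code that touches the page would go through RunWhenReady().

```cpp
#include <cassert>
#include <functional>
#include <utility>
#include <vector>

// Hypothetical stand-in for "is the page ready?" bookkeeping.
// It is not part of MFC or of the WebBrowser control.
class PageReadyGate
{
public:
    // Call whenever a navigation starts (for example, right before Navigate()).
    void OnNavigateStarted() { m_ready = false; }

    // Call from the DocumentComplete handler: mark the page as ready, then
    // run, in order, any actions that were requested too early.
    void OnDocumentComplete()
    {
        m_ready = true;
        std::vector<std::function<void()>> pending;
        pending.swap(m_pending);
        for (auto& action : pending)
            action();
    }

    // Run the action now if the page is ready; otherwise hold it until the
    // next OnDocumentComplete().
    void RunWhenReady(std::function<void()> action)
    {
        assert(action);  // an empty std::function cannot be invoked
        if (m_ready)
            action();
        else
            m_pending.push_back(std::move(action));
    }

    bool IsReady() const { return m_ready; }

private:
    bool m_ready = false;
    std::vector<std::function<void()>> m_pending;
};
```

Disabling the button, as the article does, is the simpler option when a user triggers each step by hand. A deferred-action gate like this one becomes more attractive once the bot has to chain several page loads without user input.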

Run the program and you should see something like this:
The program displays the web page

Summary of PART ONE:
With just a few steps, we have created an application program that will access a particular web page and display it.  All the pieces are in place to begin manipulating that page.

Review:
We saw how to use the Visual Studio AppWizard to create a simple dialog-based application with a Web Browser ActiveX control and one button.
We used the ClassWizard to create a C++ wrapper for the ActiveX control.
We used the control's Navigate() function to have it display a login page from Yahoo.com.
We talked about the requirement that no action should take place until the browser control indicates that the document has been completely downloaded and the DOM is ready.  We added an OnDocumentComplete() function so that we can know exactly when it is OK to continue.
In PART TWO, we will get to the core of our webbot program -- accessing the WebBrowser DOM (Document Object Model) so that we can automatically fill in text boxes, click buttons, and so forth.  I'll show you how to fill in a couple of text boxes and click that [Sign In] button.

Comments (3)

Subrat (C++ windows/Linux)Software Engineer
CERTIFIED EXPERT

Commented:
Hi Dan Rollins,
I'm facing a problem in step 2. When I add the control variable, the explorer1.h and .cpp files are not created, so compilation fails: m_ctlBrowser is an undeclared identifier because CExplorer1 is unknown.

Please let me know what the fix might be!
I'm using VS2010.
Thanks
subrat
Subrat (C++ windows/Linux)Software Engineer
CERTIFIED EXPERT

Commented:
Please have a look at this.

Commented:
How would you set a checkbox to checked?
