Our community of experts have been thoroughly vetted for their expertise and industry experience. Experts with Gold status have received one of our highest-level Expert Awards, which recognize experts for their valuable contributions.
Browser Bot -- Automate Browsing Sequences With C++ (PART ONE)
Having a custom web bot can save you lots of time and effort. For instance, you might need to log in to a site, click a certain link, scroll down to locate some particular information, copy it to the clipboard for pasting into a spreadsheet, ... then do the whole thing over again the next day. "If only I had a robot to do that for me!"
The programming for a browser bot is not trivial, but it's also not terribly difficult. The common question is "Where do I start?"
This series of articles describes one approach for C++ programmers. We'll set up a simple "test harness" that will let you experiment with the basic steps of automatically filling in text boxes, selecting options, clicking buttons, and submitting forms. We'll also look at a way to automatically extract information such as stock quotes, movie schedules, etc. from the sites where you would normally need to browse and collect that information manually.
Overview
In this three-part series, we will use Visual C++ to create a dialog-based application that will:
1) Pull up and display a log-in page of a site. (PART ONE)
2) Enter some values (a username and password) and click a button on the web page. (PART TWO)
3) Surf to a particular page on the site.
4) Obtain the entire HTML text of the resulting page. (PART THREE)
5) Search that HTML to locate and display some particular pieces of information.
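To give a feel for step 5, here is a minimal sketch of one way to pull a value out of page HTML once you have it in a string. It is plain standard C++, not the code from PART THREE: the ExtractBetween name, its parameters, and the sample HTML mentioned below are all made up for illustration, and real pages usually need sturdier matching than two fixed markers.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper (not part of the article's project).
// Copies the text found between the first occurrence of `before` and the
// next occurrence of `after` into `out`. Returns false, leaving `out`
// untouched, if either marker is missing.
bool ExtractBetween(const std::string& html,
                    const std::string& before,
                    const std::string& after,
                    std::string& out)
{
    assert(!before.empty() && !after.empty());  // markers must be non-empty

    const std::string::size_type start = html.find(before);
    if (start == std::string::npos) {
        return false;
    }

    // The value begins right after the leading marker.
    const std::string::size_type valueBegin = start + before.size();
    const std::string::size_type valueEnd = html.find(after, valueBegin);
    if (valueEnd == std::string::npos) {
        return false;
    }

    out = html.substr(valueBegin, valueEnd - valueBegin);
    return true;
}
```

For example, calling it on the text <td id="last">42.17</td> with the markers <td id="last"> and </td> stores 42.17 in out and returns true.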
In this article, PART ONE, we'll build the test-harness application and use it to read and display a login page from a well-known web site. Let's begin!
1. Create a Dialog-based Application with MFC Support
Select menu command File / New Project...
Select MFC and MFC Application
Click OK, then click Next.
In Application Type, select Dialog based
We'll be using default options, so just click [Finish]
Note: We use an MFC dialog-based application because such applications are easy to create and very useful for experimentation. For instance, to try out something new, just draw a button on the dialog and code up a button-click handler.
2. Add a Browser Control
In the dialog editor,
Right-click in the dialog and select Insert ActiveX Control...
Scroll down to locate Microsoft Web Browser
Click OK.
Resize the control as desired.
3. Create a C++ Wrapper for the Browser Control
Use the ClassWizard to create a wrapper and a control-type variable (m_ctlBrowser) for the browser control:
In VS 6, we'd press Ctrl+W to display the Class Wizard.
In VS 2008, we click the Class View tab in the leftmost panel.
Right-click CWebRobotDlg
Select Add > Add variable...
In the "Add Member Variable Wizard" ...
Control variable: Put a checkmark in this checkbox
Control ID: Select IDC_EXPLORER1 from the list.
Variable Name: m_ctlBrowser
Click [Finish]
In the Solution Explorer, you now have a C++ file named explorer1.cpp with a matching header file. This is your wrapper for the ActiveX control.
4. Add a Button and an OnClick Handler
Add a "Do Login" button, like so:
In the Dialog Editor...
Select the "Button" control from the Toolbox
Draw a button in the dialog (initially named "Button 1")
Right-click the button and set its Caption attribute to "Do Login" and...
Set its ID to IDC_DoLogin.
Double-click the new [Do Login] button.
This displays the source code stub for a new handler for clicks of that button.
Make it so:
void CWebRobotDlg::OnBnClickedDologin()
{
    MessageBox( L"Login -- see PART TWO!" );
}
5. Set OnInitDialog to Pull Up the Login Page
In the Class View panel, double-click OnInitDialog (or just scroll up and find it in the WebRobotDlg.cpp file).
Make the bottom of the CWebRobotDlg::OnInitDialog() function look like this:
For demonstration purposes, the program hard-codes the URL of a well-known website Login page. That page may change in the future, but the principles described here will remain valid for other pages.
6. Add an OnDocumentComplete() Handler
In the Dialog Editor...
Right-click the web browser control and select Add Event Handler...
Message Type: Select "Document Complete"
Class list: Leave "CWebRobotDlg" selected.
Click [Add and Edit]
Make the code for the DocumentCompleteExplorer1() function look like this:
Just a moment ago, we added code in the OnInitDialog function to disable the [Do Login] button... Why do all of this?
This is a crucial element of all webbot programming. You cannot access the browser DOM -- to get access to page HTML elements -- until the page has been completely loaded and the browser control is ready to proceed.
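The rule can be modeled without any MFC at all. The sketch below is a hypothetical, framework-free stand-in for the dialog, not the article's actual handler code: the LoginPageGate class and its member names are invented for illustration. StartNavigation plays the role of the OnInitDialog code (block actions, then start loading the page), and OnDocumentComplete plays the role of the DocumentComplete handler (the page is ready, so actions are allowed).

```cpp
#include <cassert>
#include <functional>
#include <string>

// Hypothetical model of the "wait for DocumentComplete" rule.
// None of these names come from MFC or from the WebBrowser control.
class LoginPageGate {
public:
    // Stand-in for OnInitDialog: remember the URL and block all page
    // actions until the document has finished loading.
    void StartNavigation(const std::string& url) {
        currentUrl_ = url;
        documentReady_ = false;
    }

    // Stand-in for the DocumentComplete event handler.
    void OnDocumentComplete() {
        documentReady_ = true;
    }

    // Stand-in for the enabled/disabled state of the [Do Login] button.
    bool IsLoginButtonEnabled() const {
        return documentReady_;
    }

    // Runs `action` only when the document is ready.
    // Returns true if the action ran, false if it was refused.
    bool TryRun(const std::function<void()>& action) const {
        assert(action);  // caller must supply something callable
        if (!documentReady_) {
            return false;
        }
        action();
        return true;
    }

    const std::string& CurrentUrl() const { return currentUrl_; }

private:
    std::string currentUrl_;
    bool documentReady_ = false;
};
```

With this model, a TryRun call made right after StartNavigation is refused, and the same call succeeds once OnDocumentComplete has been called. In the real dialog, the same idea would presumably be expressed by disabling and re-enabling the [Do Login] button (for example with CWnd::EnableWindow), so the user simply cannot click it too early.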
Run the program and you should see something like this:
Summary of PART ONE:
With just a few steps, we have created an application program that will access a particular web page and display it. All the pieces are in place to begin manipulating that page.
Review:
We saw how to use the Visual Studio AppWizard to create a simple dialog-based application with a Web Browser ActiveX control and one button.
We used the ClassWizard to create a C++ wrapper for the ActiveX control.
We used the control's Navigate() function to have it display a login page from Yahoo.com.
We talked about the requirement that no action should take place until the browser control indicates that the document has been completely downloaded and the DOM is ready. We added an OnDocumentComplete() function so that we can know exactly when it is OK to continue.
In PART TWO, we will get to the core of our webbot program -- accessing the WebBrowser DOM (Document Object Model) so that we can automatically fill in text boxes, click buttons, and so forth. I'll show you how to fill in a couple of text boxes and click that [Sign In] button.
Hi Dan Rollins,
I'm facing a problem in step 2. While adding the control variable, it's not creating the explorer1.h and .cpp files, so at compile time I get an error: m_ctlBrowser is an undeclared identifier, due to unknown CExplorer1.
Please let me know what the fix might be!
I'm using VS2010.
Thanks
subrat