Solved

Reading the text displayed on a web page via C#, not the HTML source code

Posted on 2011-10-21
Last Modified: 2013-11-19
At present I can read the source code by giving my C# project the URL and retrieving the HTML in the background. What I want to do is retrieve the text displayed on the page, the same as the browser's Select All menu function, copy that text to a string, and then check the string for what I am looking for.

Any ideas would be appreciated.
Question by:sydneyguy
 
LVL 17

Expert Comment

by:Carlos Villegas
ID: 37011037
Hello, I have made this example for you:
string myText = null;
string myHtml = null;
// Download the page html.
using (System.Net.WebClient myClient = new System.Net.WebClient())
{
    myHtml = myClient.DownloadString("http://news.google.com");
}

// Use the WebBrowser control to parse the html data
using (System.Windows.Forms.WebBrowser myWebBrowser = new WebBrowser())
{
    // Load the html data.
    myWebBrowser.DocumentText = myHtml;

    DateTime start = DateTime.UtcNow;
    while (myWebBrowser.ReadyState != WebBrowserReadyState.Complete)
    {
        System.Windows.Forms.Application.DoEvents();
        System.Threading.Thread.Sleep(1);
        // Time out if loading has not completed within 30 seconds
        if (DateTime.UtcNow.Subtract(start).TotalSeconds > 30)
            throw new TimeoutException();
    }

    // Get only the text.
    myText = myWebBrowser.Document.Body.InnerText;
}


If you are working in a console application, you will need to add a reference to the System.Windows.Forms assembly.
 
LVL 17

Expert Comment

by:Carlos Villegas
ID: 37011041
Also add a using directive for the System.Windows.Forms namespace to your code file, because the WebBrowserReadyState enum is defined there (System.Windows.Forms.WebBrowserReadyState).
 

Author Comment

by:sydneyguy
ID: 37012029
Thanks for getting back to me with the code. I have set up the code below as you suggested and added a text box to hold the result. Can you check it for me, please? It just throws an exception and quits at throw new TimeoutException();


using System;
using System.Collections.Generic;
using System.ComponentModel;
using System.Data;
using System.Drawing;
using System.Text;
using System.Windows.Forms;
using System.Net;


namespace grabpagedata
{
    public partial class Form1 : Form
    {
        public Form1()
        {
            InitializeComponent();
        }

        private void button1_Click(object sender, EventArgs e)
        {
            string myText = null;
            string myHtml = null;
            // Download the page html.
            using (System.Net.WebClient myClient = new System.Net.WebClient())
            {
                myHtml = myClient.DownloadString("http://www.daniweb.com/software-development/csharp/threads/145709");
            }

            // Use the WebBrowser control to parse the html data
            using (System.Windows.Forms.WebBrowser myWebBrowser = new WebBrowser())
            {
                // Load the html data.
                myWebBrowser.DocumentText = myHtml;

                DateTime start = DateTime.UtcNow;
                while (myWebBrowser.ReadyState != WebBrowserReadyState.Complete)
                {
                 //   System.Windows.Forms.Application.DoEvents();
                    System.Threading.Thread.Sleep(1);
                    // Timeout if has not completed in 30 seconds
                    if (DateTime.UtcNow.Subtract(start).TotalSeconds > 30)
                                           
                        throw new TimeoutException();
                }

                // Get only the text.
                myText = myWebBrowser.Document.Body.InnerText;
                richTextBox1.Text = myText;
            }
     
        }
       
    }
}

 
LVL 17

Accepted Solution

by:Carlos Villegas (earned 2000 total points)
ID: 37012086
Hello, I think the problem was caused by a script error on that page. I am now using the ScriptErrorsSuppressed property to avoid that. Please don't comment out the DoEvents call; it is required to let the WebBrowser finish loading. I have also added an improvement that temporarily disables image loading in the WebBrowser control, which lets it reach the Complete state faster.
Try this code:
// Sets the per-user Internet Explorer "Display Inline Images" option ("yes" or "no").
static void SetIeImagesValue(string value)
{
    using (Microsoft.Win32.RegistryKey ieMainKey = Microsoft.Win32.Registry.CurrentUser.OpenSubKey("Software\\Microsoft\\Internet Explorer\\Main", true))
    {
        // OpenSubKey returns null when the key does not exist, so guard before writing.
        if (ieMainKey != null)
            ieMainKey.SetValue("Display Inline Images", value);
    }
}

private void button1_Click(object sender, EventArgs e)
{
    try
    {
        SetIeImagesValue("no");

        string myText = null;
        string myHtml = null;
        // Download the page html.
        using (System.Net.WebClient myClient = new System.Net.WebClient())
        {
            myHtml = myClient.DownloadString("http://www.daniweb.com/software-development/csharp/threads/145709");
        }

        // Use the WebBrowser control to parse the html data
        using (System.Windows.Forms.WebBrowser myWebBrowser = new WebBrowser())
        {
            // Suppress script error dialogs.
            myWebBrowser.ScriptErrorsSuppressed = true;

            // Load the html data.
            myWebBrowser.DocumentText = myHtml;

            DateTime start = DateTime.UtcNow;
            while (myWebBrowser.ReadyState != System.Windows.Forms.WebBrowserReadyState.Complete)
            {
                System.Windows.Forms.Application.DoEvents();
                System.Threading.Thread.Sleep(1);

                // Time out if loading has not completed within 30 seconds
                if (DateTime.UtcNow.Subtract(start).TotalSeconds > 30)
                    throw new TimeoutException();
            }

            // Get only the text.
            myText = myWebBrowser.Document.Body.InnerText;
            richTextBox1.Text = myText;
        }
    }
    finally
    {
        SetIeImagesValue("yes");
    }
}
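
The last step from the question, checking the extracted text for a value, is not shown in the thread. Here is a minimal sketch of one way to do it, assuming a case-insensitive substring match is what is wanted. The class name PageTextSearch and the method name ContainsText are hypothetical and do not come from the code above.

```csharp
using System;

static class PageTextSearch
{
    // Hypothetical helper: returns true when pageText contains term, ignoring case.
    // Returns false for null or empty input (InnerText can be null for an empty body).
    public static bool ContainsText(string pageText, string term)
    {
        if (string.IsNullOrEmpty(pageText) || string.IsNullOrEmpty(term))
            return false;

        // IndexOf with OrdinalIgnoreCase avoids allocating upper/lower-cased copies.
        return pageText.IndexOf(term, StringComparison.OrdinalIgnoreCase) >= 0;
    }
}
```

In button1_Click this could be called right after myText is assigned, for example as PageTextSearch.ContainsText(myText, "WebBrowser"), with the search term replaced by whatever you are looking for.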

 

Author Closing Comment

by:sydneyguy
ID: 37012123
OK, you're a god. It works perfectly and I'm off and running. I owe you a drink some time when I'm in your neck of the woods. Take care, and thanks for the help.
garry
 
LVL 17

Expert Comment

by:Carlos Villegas
ID: 37012155
Glad to have been of help! :) And I'm not a god ;) Take care.