Asked by pothireddysunil

How can I read an HTML page using HtmlAgilityPack in C#?

I am trying to read an HTML page (an .htm file) on my local drive using HtmlAgilityPack in C#.

Here is what I did:

1. Using Visual Studio 2012, I first installed HtmlAgilityPack from the Package Manager Console (NuGet).
2. NuGet added the HtmlAgilityPack DLL to my project.
3. I ran my code (shown below) in debug mode. When execution reached this line:
HtmlAgilityPack.HtmlDocument doc = new HtmlAgilityPack.HtmlDocument();
4. I got an error saying:
No Source Available
There is no source code available for the current location.

I am confused. What source code is it looking for, and why does it need source code at all when the DLL is already referenced by the project?

So, here are my questions on this issue:

1. What does this error mean? If it is looking for the source code, how can I get it?
2. How can I get the source code that matches the HtmlAgilityPack version that was installed?
3. How can I make it available to my application?
4. How can I read the HTML tables?

Or is there a different approach I can use to read the tables on the HTML page?
            // Requires: using System; using System.IO; using HtmlAgilityPack;
            try
            {
                DirectoryInfo theFolder = new DirectoryInfo("\\\\MYPC\\Users\\Desktop");
                FileInfo[] files = theFolder.GetFiles();
                if (files.Length > 0)
                {
                    // FullName already includes the folder and the "\" separator,
                    // which "\\\\MYPC\\Users\\Desktop" + fileName did not add.
                    string fullPath = files[0].FullName;

                    // Load the html document
                    HtmlAgilityPack.HtmlDocument doc = new HtmlAgilityPack.HtmlDocument();
                    doc.Load(fullPath);

                    // HtmlAgilityPack lowercases element names, so the XPath must be "//table", not "//TABLE".
                    // SelectNodes returns null (not an empty collection) when nothing matches.
                    HtmlNodeCollection tables = doc.DocumentNode.SelectNodes("//table");
                    HtmlNodeCollection rows = tables == null ? null : tables[0].SelectNodes(".//tr");

                    // C# indexes collections with [], not () as in VB.
                    for (int i = 0; rows != null && i < rows.Count; ++i)
                    {
                        HtmlNodeCollection cols = rows[i].SelectNodes(".//td");
                        if (cols == null)
                            continue; // e.g. a header row that only contains <th> cells

                        for (int j = 0; j < cols.Count; ++j)
                        {
                            // Get the value of the column and print it
                            string value = cols[j].InnerText;
                            Console.WriteLine(value);
                        }
                    }
                }
            }
            catch (Exception)
            {
                // "throw;" rethrows with the original stack trace; "throw objError;" resets it
                throw;
            }
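For question 4, here is a minimal sketch of collecting the text of every table into lists, assuming the HtmlAgilityPack NuGet package is referenced (the 1.4.x line the asker installed). It relies only on `HtmlDocument`, `SelectNodes`, `InnerText` and `HtmlEntity.DeEntitize`; the class `HtmlTableReader` and its method `ReadTables` are hypothetical names chosen for illustration, not part of the library. Two details matter: element names are lowercase in HtmlAgilityPack's DOM, so XPath like `//TABLE` matches nothing, and `SelectNodes` returns `null` rather than an empty collection when there is no match.

```csharp
using System.Collections.Generic;
using HtmlAgilityPack; // NuGet package: HtmlAgilityPack

// Hypothetical helper (not part of HtmlAgilityPack): collects the text of every table cell.
public static class HtmlTableReader
{
    // Result shape: tables -> rows -> cell texts.
    public static List<List<List<string>>> ReadTables(HtmlDocument doc)
    {
        var result = new List<List<List<string>>>();

        // Element names are stored in lowercase, so "//TABLE" would match nothing.
        HtmlNodeCollection tables = doc.DocumentNode.SelectNodes("//table");
        if (tables == null) // SelectNodes returns null, not an empty collection, on no match
            return result;

        foreach (HtmlNode table in tables)
        {
            var rows = new List<List<string>>();

            // ".//tr" finds rows directly under <table> and under <thead>/<tbody>;
            // note that it also picks up rows of any nested table.
            HtmlNodeCollection trs = table.SelectNodes(".//tr");
            if (trs != null)
            {
                foreach (HtmlNode tr in trs)
                {
                    var cells = new List<string>();

                    // Header (<th>) and data (<td>) cells that are direct children of the row.
                    HtmlNodeCollection tds = tr.SelectNodes("td|th");
                    if (tds != null)
                    {
                        foreach (HtmlNode td in tds)
                        {
                            // InnerText can leave entities such as &amp; encoded, so decode them.
                            cells.Add(HtmlEntity.DeEntitize(td.InnerText).Trim());
                        }
                    }
                    rows.Add(cells);
                }
            }
            result.Add(rows);
        }
        return result;
    }
}
```

A caller would load the file with `doc.Load(path)` as in the question and then iterate over `HtmlTableReader.ReadTables(doc)`. Returning plain lists keeps the parsing separate from what is done with the values (printing, writing to a database, and so on).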
ASKER CERTIFIED SOLUTION
kaufmed
pothireddysunil

ASKER

Thanks, kaufmed. I tried that, but it doesn't give me an option to cancel and continue debugging.
It asks me to select the source file, and once I select it, it shows this alert:

Source file:
 C:......\HtmlDocument.cs
 Module: c:\users........\Debug\HtmlAgilityPack.dll
 Process:,,.exe

 The source file is different from when the module was built. Would you like to use it anyway?

It shows this alert for a couple more classes and finally throws the error below.

Source file information:

 Locating source for 'd:\Source\htmlagilitypack.new\Trunk\HtmlAgilityPack\HtmlDocument.PathMethods.cs'. Checksum: MD5 {21 f3 9f 31 c1 6a 76 67 a7 c1 d8 6f 9b b2 66 7d}
 The file 'd:\Source\htmlagilitypack.new\Trunk\HtmlAgilityPack\HtmlDocument.PathMethods.cs' does not exist.
 Looking in script documents for 'd:\Source\htmlagilitypack.new\Trunk\HtmlAgilityPack\HtmlDocument.PathMethods.cs'...
 Looking in the projects for 'd:\Source\htmlagilitypack.new\Trunk\HtmlAgilityPack\HtmlDocument.PathMethods.cs'.
 The file was not found in a project.
 Looking in directory 'C:\Program Files (x86)\Microsoft Visual Studio 11.0\VC\crt\src\'...
 Looking in directory 'C:\Program Files (x86)\Microsoft Visual Studio 11.0\VC\crt\src\vccorlib\'...
 Looking in directory 'C:\Program Files (x86)\Microsoft Visual Studio 11.0\VC\atlmfc\src\mfc\'...
 Looking in directory 'C:\Program Files (x86)\Microsoft Visual Studio 11.0\VC\atlmfc\src\atl\'...
 Looking in directory 'C:\Program Files (x86)\Microsoft Visual Studio 11.0\VC\atlmfc\include'...
 Looking in directory 'C:\Users\sunil\Desktop\HtmlAgilityPack\Release\1_4_0\'...
 The debug source files settings for the active solution indicate that the debugger will not ask the user to find the file: d:\Source\htmlagilitypack.new\Trunk\HtmlAgilityPack\HtmlDocument.PathMethods.cs.
 The debugger could not locate the source file 'd:\Source\htmlagilitypack.new\Trunk\HtmlAgilityPack\HtmlDocument.PathMethods.cs'.

 If I select No, it asks me to select the source code again.

 I installed version 1.4.6.0.

Hi all, it's resolved. It was an access issue with the input file path. Thanks.
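The thread does not say exactly what was wrong with the path. Common causes are building the path by string concatenation, which can drop the `\` separator, or pointing at a share the process cannot read. Below is a minimal sketch, using only the .NET base library, of resolving the input file defensively so that such problems surface as a clear exception before HtmlAgilityPack is involved. `InputFileLocator` and `FindFirstHtmFile` are hypothetical names, and picking the alphabetically first `*.htm` file is an assumption standing in for the question's `file[0]`.

```csharp
using System;
using System.IO;
using System.Linq;

// Hypothetical helper: resolves the input .htm file and fails early, with a clear
// exception, if the folder or file cannot be reached or read.
public static class InputFileLocator
{
    // Returns the full path of the alphabetically first *.htm file in `folder`, or null if none.
    public static string FindFirstHtmFile(string folder)
    {
        var dir = new DirectoryInfo(folder);
        if (!dir.Exists) // can also be false when a share is unreachable or access is denied
            throw new DirectoryNotFoundException("Folder not found or not accessible: " + folder);

        // GetFiles does not guarantee any order, so sort to make "first" deterministic.
        // (On Windows, a three-letter pattern like "*.htm" can also match ".html" files.)
        FileInfo first = dir.GetFiles("*.htm")
                            .OrderBy(f => f.Name, StringComparer.OrdinalIgnoreCase)
                            .FirstOrDefault();
        if (first == null)
            return null;

        // Open and close the file once: this throws UnauthorizedAccessException or
        // IOException here if the current account cannot read it.
        using (FileStream probe = first.OpenRead()) { }

        // FullName already contains the folder and the separator, so no string concatenation.
        return first.FullName;
    }
}
```

The returned path can be passed straight to `doc.Load(...)`, for example with `InputFileLocator.FindFirstHtmFile(@"\\MYPC\Users\Desktop")`; a verbatim string like this also avoids doubling every backslash.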