Solved

Crawler in C#

Posted on 2003-10-30
7
1,528 Views
Last Modified: 2008-02-01
I am writing an application in C#. There is table that contains lot of urls, what I want is, to take one url at a time and download the content until all of them downloaded.I think I should use thread when it send the request and download it.What to do when there is a url which is to be sent by post method.
   Please give me some rough idea how can I achieve in C#. I have little idea in VC++. Like I can use ISAPI session,connection...etc.
0
Comment
Question by:NavinKaushik
  • 3
  • 3
7 Comments
 
LVL 7

Expert Comment

by:psdavis
ID: 9650417
This is the code I use to run a cgi script and download an image.
Sorry I don't have time to give you more, running to airport in a few mins.

            using( WebClient pWebClient = new WebClient( ))
            {
               pWebClient.BaseAddress = @"http://" + this.Server;
               byte[] byImage = pWebClient.DownloadData( @"cgi-bin/getimage.cgi" );
            }

0
 

Author Comment

by:NavinKaushik
ID: 9656103
Thanks for your crucial time :) but I want code or API'S in C# ......
0
 
LVL 6

Accepted Solution

by:
purpleblob earned 250 total points
ID: 9659433
The code demonstrated by psdavis is C# and should fulfill your requirements - here's the same thing but showing the download of a page which might make it more in line with what you want to do with it.

byte[] buffer = null;

using(WebClient client = new WebClient())
{
   buffer = client.DownloadData("http://www.google.com");
}

Now we could convert the buffer to a string using System.Text.ASCIIEncoding.ASCII.GetString(buffer); and then find any URL's within it to also download. Also as psdavis has shown you could assign the base URL to the BaseAddress property and thus DownloadData will expect relative paths - this is obviously useful if you do want to download images etc. from the page that are shown using relative paths.

Hope this helps
0
3 Use Cases for Connected Systems

Our Dev teams are like yours. They’re continually cranking out code for new features/bugs fixes, testing, deploying, testing some more, responding to production monitoring events and more. It’s complex. So, we thought you’d like to see what’s working for us.

 

Author Comment

by:NavinKaushik
ID: 9662070
Thanks purpleblob. I will be much more thankful to you if you help me more. Actually I dont have any practical exp in .NET. Previously there was a desktop application in VC++ , multi-tired using MFC in UI , ATL com in middle layer, application was to crawl the urls present  in table(database), application download all the pages for those urls into a folder.
  First I create the urls Ex.
    If you login to your mail account through a webpage then it will display your inbox. What I do is
    www.abc.com/inbox.asp?login_name=navin+kaushik&pwd=xyz
   Now when I put this url
 byte[] buffer = null;

using(WebClient client = new WebClient())
{
   buffer = client.DownloadData("www.abc.com/inbox.asp?login_name=navin+kaushik&pwd=xyz");
}
 This is one of modules of my application.
 I think my objective is more clear to you now. My question is Should I go for C# Since later I may integrate it with web also.
 How can I make multi-tired application in C# and tell me the corresponding API's in .NET  of ISAPI in VC++ 6.0
  Thanks.
0
 
LVL 6

Expert Comment

by:purpleblob
ID: 9662393
You can use your existing COM controls within C# by selecting Add References from a Visual Studio project and then select the COM tab and either browse for your COM objects or select from the list of registered object. VS will then create a wrapper DLL (like #import in VC++).

I'm afraid I don't think you can write ISAPI filters in C# as C# DLLs doen't expose methods in the same way to the "standard" C DLL's. However this said, if you wanted to use managed C++ you could create ISAPI filters and then call C# objects thus you could write all the main logic in C# and just have the C++ DLL acts as a proxy to your C# code.

As for should you go for C#, I would say at the moment it sounds like C# will not offer you too much that your current system doesn't already have - however if you migrated your existing system to managed C++ you would then be able to work with the .NET libraries and also write C# objects which you could then call from managed C++.

Hope this helps
0
 

Author Comment

by:NavinKaushik
ID: 9670102
hmmm lot more confusing!!!!.
   Is there anything which I can't do in C# ( considering application from scratch) and I can do that in VC++.NET.
   I think this will solve my problem.
0
 
LVL 6

Expert Comment

by:purpleblob
ID: 9670217
Basicaly things like ISAPI filters which require standard style DLL's are not possible within C# at this time. Also C# does not create COM objects. However by using VC++.NET in conjunction you could write most/all of the functionality in C# DLL's then create VC++.NET wrappers around them. So this is very much achievable.

With regards your application, I'm not sure where ISAPI fits in with your current system (if at all) so this might not be an issue.

From what I understand of your application, i.e. you have a database of URL's, you get these URL's navigate to them and download whatever is contained at the URL - this can easily be done with C#. If you are also creating the web based application, i.e. at the URL then you could use ASP.NET with C# as your chosen language - so again probably no problems there.

Now, not wishing to confuse the issue fuether, I suppose what I was saying earlier about C# not offering you much that you don't currently have is this - if I were writing such an application from scratch I would definitely write it in C# and .NET using ADO.NET and ASP.NET. If all I was doing was adding a little bit of code to download data from URL's and the rest of the system was already written in C++ I'd probably use VC++.NET just for consistancy.

I hope I haven't confused matters further - it's difficult to say what's best for YOU to do as I do not know the ins and outs of your whole application.

Hope this helps
0

Featured Post

What is SQL Server and how does it work?

The purpose of this paper is to provide you background on SQL Server. It’s your self-study guide for learning fundamentals. It includes both the history of SQL and its technical basics. Concepts and definitions will form the solid foundation of your future DBA expertise.

Question has a verified solution.

If you are experiencing a similar issue, please ask a related question

Introduction This article series is supposed to shed some light on the use of IDisposable and objects that inherit from it. In essence, a more apt title for this article would be: using (IDisposable) {}. I’m just not sure how many people would ge…
This article aims to explain the working of CircularLogArchiver. This tool was designed to solve the buildup of log file in cases where systems do not support circular logging or where circular logging is not enabled
Two types of users will appreciate AOMEI Backupper Pro: 1 - Those with PCIe drives (and haven't found cloning software that works on them). 2 - Those who want a fast clone of their boot drive (no re-boots needed) and it can clone your drive wh…
With Secure Portal Encryption, the recipient is sent a link to their email address directing them to the email laundry delivery page. From there, the recipient will be required to enter a user name and password to enter the page. Once the recipient …

810 members asked questions and received personalized solutions in the past 7 days.

Join the community of 500,000 technology professionals and ask your questions.

Join & Ask a Question