Solved

Crawler in C#

Posted on 2003-10-30
Last Modified: 2008-02-01
I am writing an application in C#. There is a table that contains a lot of URLs. I want to take one URL at a time and download its content until all of them have been downloaded. I think I should use a thread to send each request and download the response. What should I do when a URL has to be requested with the POST method?
   Please give me a rough idea of how I can achieve this in C#. I have some experience with VC++, where I can use the ISAPI session, connection, etc.
Question by:NavinKaushik
7 Comments
 
LVL 7

Expert Comment

by:psdavis
ID: 9650417
This is the code I use to run a cgi script and download an image.
Sorry I don't have time to give you more, running to airport in a few mins.

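            // Requires "using System.Net;". this.Server is the host name of the web server.
            using( WebClient pWebClient = new WebClient( ))
            {
               // With BaseAddress set, DownloadData accepts a path relative to the server.
               pWebClient.BaseAddress = @"http://" + this.Server;
               byte[] byImage = pWebClient.DownloadData( @"cgi-bin/getimage.cgi" );
            }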

 

Author Comment

by:NavinKaushik
ID: 9656103
Thanks for your valuable time :) but I want code or APIs in C#.
 
LVL 6

Accepted Solution

by:
purpleblob earned 250 total points
ID: 9659433
The code demonstrated by psdavis is C# and should fulfill your requirements. Here's the same thing, but downloading a page, which might be more in line with what you want to do.

byte[] buffer = null;

using(WebClient client = new WebClient())
{
   buffer = client.DownloadData("http://www.google.com");
}

Now we could convert the buffer to a string using System.Text.Encoding.ASCII.GetString(buffer) and then find any URLs within it to download as well. ASCII is enough for finding links, although any non-ASCII characters in the page will be lost. Also, as psdavis has shown, you could assign the base URL to the BaseAddress property, so that DownloadData accepts relative paths. This is useful if you want to download images etc. from the page that are referenced with relative paths.
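
A minimal sketch of that idea, assuming links appear as quoted href attributes. LinkExtractor and ExtractLinks are hypothetical names, not part of the .NET libraries, and a regular expression is only a rough way to find links: it will miss unquoted or script-generated ones.

```csharp
using System;
using System.Collections.Generic;
using System.Text;
using System.Text.RegularExpressions;

// Hypothetical helper, for illustration only.
public static class LinkExtractor
{
    // Matches href="..." or href='...' and captures the attribute value.
    private static readonly Regex HrefPattern = new Regex(
        "href\\s*=\\s*[\"'](?<url>[^\"']+)[\"']",
        RegexOptions.IgnoreCase);

    // Decodes a downloaded page and returns the distinct absolute
    // http/https links found in it, in order of first appearance.
    public static List<string> ExtractLinks(byte[] buffer, string pageUrl)
    {
        string html = Encoding.ASCII.GetString(buffer);
        Uri baseUri = new Uri(pageUrl);
        List<string> links = new List<string>();

        foreach (Match match in HrefPattern.Matches(html))
        {
            Uri absolute;
            // Resolves relative paths against the URL of the page itself.
            if (!Uri.TryCreate(baseUri, match.Groups["url"].Value, out absolute))
                continue;

            // Skip mailto:, javascript: and other non-web schemes.
            if (absolute.Scheme != Uri.UriSchemeHttp &&
                absolute.Scheme != Uri.UriSchemeHttps)
                continue;

            // Drop any #fragment so the same page is not queued twice.
            string link = absolute.GetLeftPart(UriPartial.Query);
            if (!links.Contains(link))
                links.Add(link);
        }
        return links;
    }
}
```

Each returned link could then be passed to DownloadData in turn, for example ExtractLinks(buffer, "http://www.google.com").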

Hope this helps

Author Comment

by:NavinKaushik
ID: 9662070
Thanks purpleblob. I would be even more thankful if you could help me further. Actually, I don't have any practical experience in .NET. Previously there was a desktop application in VC++, multi-tiered, using MFC for the UI and ATL COM in the middle layer. The application crawled the URLs stored in a database table and downloaded all the pages for those URLs into a folder.
  First I create the URLs. For example:
    If you log in to your mail account through a web page, it will display your inbox. What I do is
    www.abc.com/inbox.asp?login_name=navin+kaushik&pwd=xyz
   Now I put this URL into the code:
 byte[] buffer = null;

using(WebClient client = new WebClient())
{
   // Include the http:// scheme; without it the address is not treated as a web URL.
   buffer = client.DownloadData("http://www.abc.com/inbox.asp?login_name=navin+kaushik&pwd=xyz");
}
 This is one of the modules of my application.
 I think my objective is clearer to you now. My question is: should I go for C#, since later I may integrate it with the web as well?
 How can I make a multi-tiered application in C#, and what are the .NET APIs corresponding to ISAPI in VC++ 6.0?
  Thanks.
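
For the POST case raised in the original question, WebClient also has an UploadValues method that sends name/value pairs as a form post. A minimal sketch, assuming the server accepts the same login_name and pwd values as form fields (the thread does not say whether it does). FormPoster, BuildLoginForm and PostForm are hypothetical names.

```csharp
using System.Collections.Specialized;
using System.Net;
using System.Text;

// Hypothetical helper, for illustration only.
public static class FormPoster
{
    // Builds the form fields. The field names mirror the query string
    // used in the example URL and are assumed, not confirmed.
    public static NameValueCollection BuildLoginForm(string loginName, string password)
    {
        NameValueCollection form = new NameValueCollection();
        form.Add("login_name", loginName);
        form.Add("pwd", password);
        return form;
    }

    // Sends the fields as a form-encoded POST and returns the response body as text.
    public static string PostForm(string url, NameValueCollection form)
    {
        using (WebClient client = new WebClient())
        {
            // UploadValues URL-encodes the values, so pass "navin kaushik"
            // with a plain space rather than "navin+kaushik".
            byte[] response = client.UploadValues(url, "POST", form);
            return Encoding.ASCII.GetString(response);
        }
    }
}
```

A call might look like PostForm("http://www.abc.com/inbox.asp", BuildLoginForm("navin kaushik", "xyz")), which keeps the password out of the URL itself.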
 
LVL 6

Expert Comment

by:purpleblob
ID: 9662393
You can use your existing COM controls within C# by selecting Add Reference in a Visual Studio project, then selecting the COM tab and either browsing for your COM objects or selecting from the list of registered objects. VS will then create a wrapper DLL (like #import in VC++).

I'm afraid I don't think you can write ISAPI filters in C#, as C# DLLs don't expose methods in the same way as "standard" C DLLs. That said, if you wanted to use managed C++ you could create ISAPI filters and then call C# objects, so you could write all the main logic in C# and just have the C++ DLL act as a proxy to your C# code.

As for whether you should go for C#, I would say at the moment it sounds like C# will not offer you much that your current system doesn't already have. However, if you migrated your existing system to managed C++, you would then be able to work with the .NET libraries and also write C# objects that you could call from managed C++.

Hope this helps
 

Author Comment

by:NavinKaushik
ID: 9670102
Hmm, that is a lot more confusing!
   Is there anything that I can't do in C# (considering an application written from scratch) but can do in VC++.NET?
   I think knowing this will solve my problem.
 
LVL 6

Expert Comment

by:purpleblob
ID: 9670217
Basically, things like ISAPI filters, which require standard-style DLLs, are not possible within C# at this time. Also, C# does not create native COM objects the way ATL does, although .NET classes can be exposed to COM clients through COM interop. However, by using VC++.NET in conjunction you could write most or all of the functionality in C# DLLs and then create VC++.NET wrappers around them. So this is very much achievable.

With regard to your application, I'm not sure where ISAPI fits in with your current system (if at all), so this might not be an issue.

From what I understand of your application, you have a database of URLs, you navigate to each of them and you download whatever is found there. This can easily be done with C#. If you are also creating the web-based application that sits at those URLs, then you could use ASP.NET with C# as your chosen language, so again there are probably no problems there.
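
A rough sketch of that loop with worker threads, as the question suggested. It assumes the URLs have already been read from the database into a list, and the Crawler class, its method names and the pageN.dat file naming are all hypothetical. The download step is passed in as a delegate so the WebClient call shown earlier in the thread can be swapped for something else.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Net;
using System.Threading;

// Hypothetical helper, for illustration only.
public static class Crawler
{
    // Downloads every URL using workerCount threads and saves each page
    // into folder. Returns the failed URLs mapped to their error messages.
    public static Dictionary<string, string> DownloadAll(
        IEnumerable<string> urls, string folder, int workerCount,
        Func<string, byte[]> download)
    {
        Queue<string> pending = new Queue<string>(urls);
        Dictionary<string, string> failures = new Dictionary<string, string>();
        object gate = new object();   // guards pending, failures and the counter
        int nextFileNumber = 0;
        Directory.CreateDirectory(folder);

        ThreadStart work = delegate
        {
            while (true)
            {
                string url;
                int fileNumber;
                lock (gate)
                {
                    if (pending.Count == 0) return;   // nothing left to do
                    url = pending.Dequeue();
                    fileNumber = ++nextFileNumber;
                }
                try
                {
                    byte[] content = download(url);
                    string path = Path.Combine(folder, "page" + fileNumber + ".dat");
                    File.WriteAllBytes(path, content);
                }
                catch (Exception ex)
                {
                    // One bad URL should not stop the rest of the crawl.
                    lock (gate) { failures[url] = ex.Message; }
                }
            }
        };

        List<Thread> workers = new List<Thread>();
        for (int i = 0; i < workerCount; i++)
        {
            Thread worker = new Thread(work);
            workers.Add(worker);
            worker.Start();
        }
        foreach (Thread worker in workers) worker.Join();   // wait for all downloads
        return failures;
    }

    // Downloader built on the same WebClient.DownloadData call used above.
    public static byte[] DownloadWithWebClient(string url)
    {
        using (WebClient client = new WebClient())
        {
            return client.DownloadData(url);
        }
    }
}
```

A call such as Crawler.DownloadAll(urls, @"C:\pages", 4, Crawler.DownloadWithWebClient) would run four downloads at a time; the folder path and thread count here are only examples.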

Now, not wishing to confuse the issue further, I suppose what I was saying earlier about C# not offering you much that you don't currently have is this: if I were writing such an application from scratch, I would definitely write it in C# and .NET using ADO.NET and ASP.NET. If all I was doing was adding a little bit of code to download data from URLs and the rest of the system was already written in C++, I'd probably use VC++.NET just for consistency.

I hope I haven't confused matters further - it's difficult to say what's best for YOU to do as I do not know the ins and outs of your whole application.

Hope this helps