Solved

C#: spidering the web

Posted on 2010-11-26
Last Modified: 2012-05-10
Many programs can extract email addresses from a URL, and many programs can copy the content of a URL to the local machine. How do those programs know the directory structure behind the URL? Do they follow the links, or do they have a class that explores the web directory?
Question by:teera
5 Comments
 
LVL 10

Expert Comment

by:aboo_s
ID: 34220696
If I understand correctly, your question is how to explore all URLs on the web. You can create a program that tries
all combinations under .com, then .net, and so on, and discovers which are real URLs by connecting to them on port 80. You start like this: a.com, b.com, c.com, ... aa.com, ab.com, ... and so on. That way you can, in principle, explore most of the web.
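
The enumeration order described above (a.com, b.com, ..., z.com, aa.com, ab.com, ...) can be sketched roughly as follows. This is a minimal illustration in Python rather than C#, and the names candidate_hosts and accepts_port_80 are hypothetical helpers, not part of any library. It assumes lowercase letters only and treats a successful TCP connection on port 80 as "the host exists".

```python
import itertools
import socket
import string


def candidate_hosts(max_len, tlds=("com", "net")):
    """Yield a.com, b.com, ..., z.com, aa.com, ab.com, ... for each TLD in turn."""
    for tld in tlds:
        for length in range(1, max_len + 1):
            # product() walks every letter combination of the given length in order
            for letters in itertools.product(string.ascii_lowercase, repeat=length):
                yield "".join(letters) + "." + tld


def accepts_port_80(host, timeout=2.0):
    """Return True if a TCP connection to host:80 succeeds (hypothetical helper)."""
    try:
        with socket.create_connection((host, 80), timeout=timeout):
            return True
    except OSError:
        # Covers DNS failures, refused connections, and timeouts
        return False
```

A caller would loop over candidate_hosts(max_len) and pass each name to accepts_port_80. Note that the number of names grows as 26^n per TLD (26 one-letter names, 676 two-letter names, and so on), so a full sweep becomes very large very quickly.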


Does this answer your question?
 

Author Comment

by:teera
ID: 34221409
Hi aboo_s,

I want to explore a.com, a.com/s.htm, a.com/sys.php, a.com/pic/in.jpg. That is, I want to explore what lies under the URL.
 
LVL 10

Accepted Solution

by:
aboo_s earned 500 total points
ID: 34222518
Oh, I see.

When you have the correct URL (let's say a.com exists), you can list the whole directory underneath it
using a PHP function such as readdir (example code attached).

This way, when a URL is found, you can get the list of files and directories under it and work with them accordingly.
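<?php
/* List all files in a directory (on the machine where this script runs) */

if ($handle = opendir('/path/to/files')) {
    echo "Directory handle: $handle\n";
    echo "Files:\n";

    // readdir() returns false once there are no more entries;
    // the strict !== comparison keeps a file named "0" from ending the loop
    while (false !== ($file = readdir($handle))) {
        echo "$file\n";
    }

    closedir($handle);
}
?>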

 
LVL 10

Expert Comment

by:aboo_s
ID: 34222523
Oh, but there is a catch: PHP only runs on the server, and you cannot run code on someone else's server.
So I'll think of something else.
 
LVL 10

Expert Comment

by:aboo_s
ID: 34222559
OK, I guess you will most probably have to guess the file names too, just as with the URLs,
because some (most) sites hide their files from being listed.

Anyway, you do not have to reinvent the wheel.
Take a look at this page:
http://allseeing-i.com/ASIHTTPRequest/
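
The question also raises the other option, following links instead of guessing names. A minimal sketch of that idea, in Python rather than C#, is shown here: it parses a page that has already been downloaded, collects the href and src attributes, and resolves them against the page's URL. LinkCollector, same_site_links, and the sample PAGE string are hypothetical names made up for this illustration, and fetching the page is left out.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse


class LinkCollector(HTMLParser):
    """Collect absolute URLs from href and src attributes (hypothetical helper)."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        for name, value in attrs:
            if name in ("href", "src") and value:
                # Resolve relative paths such as "pic/in.jpg" against the page URL
                url = urljoin(self.base_url, value)
                if url not in self.links:
                    self.links.append(url)


def same_site_links(base_url, html):
    """Return the links found in html that stay on the same host as base_url."""
    collector = LinkCollector(base_url)
    collector.feed(html)
    host = urlparse(base_url).netloc
    return [u for u in collector.links if urlparse(u).netloc == host]


# Sample page content, assumed for illustration only
PAGE = (
    '<a href="s.htm">s</a> <a href="/sys.php">sys</a> '
    '<img src="pic/in.jpg"> <a href="http://b.com/">b</a>'
)
```

A crawler built on this would download each returned URL, run same_site_links on it again, and keep a set of visited URLs so that no page is fetched twice. It only finds files that some page links to; unlinked files would still have to be guessed.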