Solved

Methods of Crawling JavaScript Links

Posted on 2004-09-23
Last Modified: 2013-12-16
I am writing a PHP program that fetches the contents of web pages, crawls all the links in them, and stores those links in a MySQL database.

Crawling normal HTML links (i.e. <a href="">) works well. JavaScript links are much more problematic, because they come in many variations. For example, the link values might be embedded in the options of a <select> tag and then passed as arguments to a JavaScript function that eventually builds the link.

Is there any way to crawl JavaScript links using PHP or any other program or software?

Thanks!
Question by:lcyandy
 
LVL 9

Expert Comment

by:riyasjef
Hi,
try this:

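<html>
<head>
<script>
// Collect the href of every link on the page into the hidden form field.
function getLinks()
{
      // document.links holds every <a href>; document.anchors only holds named anchors.
      var links=document.links;
      var strLinks="";
      for(var i=0;i<links.length;i++)
      {
            if(strLinks=="")
                  strLinks=links[i].href;
            else
                  strLinks+=","+links[i].href;
      }

      document.forms[0].hdnLinks.value=strLinks;
      return true; // allow the form to submit
}
</script>
</head>
<body>
<form method="post" onsubmit="return getLinks()">
<a href="link1">link1</a>
<a href="link2">link2</a>
<a href="link3">link3</a>
<input type="hidden" name="hdnLinks">
<input type="submit" value="submit">

</form>
</body>
</html>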

 

Author Comment

by:lcyandy
I have your code, but I don't quite understand what it does.
Can you explain it to me?
 
LVL 9

Accepted Solution

by:
riyasjef earned 63 total points
Sorry, there is a change in the code:

<html>
<head>
<script>
// Collect the href of every link on the page into the hidden form field.
function getLinks()
{
     // document.links holds every <a href>; document.anchors only holds named anchors.
     var links=document.links;
     alert(links.length); // debug: number of links found
     var strLinks="";
     for(var i=0;i<links.length;i++)
     {
          if(strLinks=="")
               strLinks=links[i].href;
          else
               strLinks+=","+links[i].href;
     }

     document.forms[0].hdnLinks.value=strLinks;
     alert(document.forms[0].hdnLinks.value); // debug: the comma-separated list
     return true; // allow the form to submit
}
</script>
</head>
<body>
<form method="post" onsubmit="return getLinks()">
<a id="id1" href="link1">link1</a>
<a id="id2" href="link2">link2</a>
<a id="id3" href="link3">link3</a>
<input type="hidden" name="hdnLinks">
<input type="submit" value="submit">

</form>
</body>
</html>

The getLinks() function collects all the links in the document and puts them in a hidden field. You can read that hidden field from PHP (for example as $_POST['hdnLinks']) to get the links, separated by commas.
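
One caveat with a comma-separated list is that a URL may itself contain a comma, which makes the list ambiguous to split. As a rough sketch only, the variant below serializes the links as JSON instead. The helper name serializeLinks is hypothetical, and it assumes the PHP side would decode the field with json_decode rather than split on commas.

```javascript
// Hypothetical helper: turn a collection of link-like objects
// (anything with an href property, e.g. document.links) into a JSON array.
function serializeLinks(links) {
  const hrefs = Array.from(links, (link) => link.href).filter(Boolean);
  return JSON.stringify(hrefs);
}

// In the browser this would replace the string concatenation in getLinks():
//   document.forms[0].hdnLinks.value = serializeLinks(document.links);

// Stand-alone demonstration with plain objects instead of real <a> elements.
const demoLinks = [
  { href: "http://example.com/list?ids=1,2,3" },
  { href: "http://example.com/about.html" },
];
console.log(serializeLinks(demoLinks));
```

The comma inside the first URL survives intact because each link is a separate JSON string.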

Riyasjef



 
LVL 36

Expert Comment

by:Zyloch
Riyasjef's approach is great for finding HTML links, but if you want JavaScript links, there's no clear way to do it. Perhaps you could look for anything that begins with http:// and assume it's a link? You can find them all with PHP's preg_match_all.
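
To illustrate the idea, here is a minimal sketch in JavaScript (runnable under Node.js) that scans raw page source for anything starting with http:// or https://. The function name extractAbsoluteUrls and the regular expression are assumptions chosen for illustration; a similar pattern could be passed to preg_match_all in PHP. The pattern is deliberately loose and will miss relative URLs and URLs assembled at run time.

```javascript
// Find every absolute http:// or https:// URL in a chunk of page source.
// The character class stops at whitespace, quotes, angle brackets,
// parentheses and backslashes, which usually delimit a URL in HTML or script.
function extractAbsoluteUrls(source) {
  const pattern = /https?:\/\/[^\s"'<>()\\]+/g;
  const found = new Set(); // a Set removes duplicates, keeping first-seen order
  for (const match of source.matchAll(pattern)) {
    // Drop trailing punctuation that normally belongs to the surrounding text.
    found.add(match[0].replace(/[.,;]+$/, ""));
  }
  return [...found];
}

const sample = `
  <select onchange="go(this.value)">
    <option value="http://example.com/a.html">A</option>
  </select>
  <script>var next = 'https://example.com/b.html';</script>
`;
console.log(extractAbsoluteUrls(sample));
// [ 'http://example.com/a.html', 'https://example.com/b.html' ]
```

Note that this catches the <select> option value from the question only because it is an absolute URL; a relative value such as "a.html" would not match.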
 

Author Comment

by:lcyandy
First, thanks, Riyasjef, for the dedicated help.

Really?! There's no reliable method to crawl JavaScript links?
Does anyone know how Google does it?
 
LVL 36

Assisted Solution

by:Zyloch
Zyloch earned 62 total points
Google has said it can follow simplified JavaScript links. I assume that means following something like:

window.location.href="somewherenew.html" and window.open("somewherenew"), among other common ways of navigating.

However, it can only handle simplified JavaScript links, as there are just too many variations. You could of course also find each http:// in the document, assume it's a link (since most of the time it is), use PHP's @fopen to test whether it exists, and if it does, add it to the link list.
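
As a rough illustration of what following "simplified" links could look like, the JavaScript sketch below (runnable under Node.js) pulls string-literal targets out of window.location assignments and window.open calls, then resolves them against the page's own URL. The function name extractScriptTargets, the two patterns, and the example base URL are assumptions for illustration only; nothing here reflects how Google actually does it, and any target built from variables or function calls is simply missed.

```javascript
// Extract string-literal navigation targets from script source and resolve
// them against the URL of the page they were found on.
function extractScriptTargets(source, baseUrl) {
  const patterns = [
    // location = "...", location.href = '...', window.location.href = "..."
    /\b(?:window\.)?location(?:\.href)?\s*=\s*(["'])(.*?)\1/g,
    // window.open("...") with or without further arguments
    /\bwindow\.open\(\s*(["'])(.*?)\1/g,
  ];
  const targets = [];
  for (const pattern of patterns) {
    for (const match of source.matchAll(pattern)) {
      try {
        // match[2] is the text between the quotes; URL() makes it absolute.
        targets.push(new URL(match[2], baseUrl).href);
      } catch (err) {
        // Not a usable URL; skip it rather than abort the crawl.
      }
    }
  }
  return targets;
}

const script = `
  function go() { window.location.href="somewherenew.html"; }
  function pop() { window.open("somewherenew"); }
`;
console.log(extractScriptTargets(script, "http://example.com/dir/page.html"));
// [ 'http://example.com/dir/somewherenew.html', 'http://example.com/dir/somewherenew' ]
```

Checking whether each resolved URL actually exists, as suggested above, would be a separate step performed after extraction.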