Solved

Methods of Crawling JavaScript Links

Posted on 2004-09-23
Last Modified: 2013-12-16
I am writing a PHP program that fetches the contents of web pages, crawls all the links in them, and stores those links in a MySQL database.

Crawling normal HTML links (i.e. <a href="">) works well. JavaScript links are much harder because they come in many variations. For example, the link values might be embedded in a <select> tag and then passed as arguments to a JavaScript function that eventually builds the link.

Is there any way to crawl JavaScript links using PHP or any other program or software?

Thanks!
Question by:lcyandy
8 Comments
 
LVL 9

Expert Comment

by:riyasjef
ID: 12140837
Hi
try this

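<html>
<head>
<script>
// Collect the href of every link on the page into the hidden form field.
function getLinks()
{
      // document.links holds every <a href> (and <area href>);
      // document.anchors only holds <a name="...">, so it would miss these links.
      var links=document.links;
      var strLinks="";
      for(var i=0;i<links.length;i++)
      {
            if(strLinks=="")
                  strLinks=links[i].href;
            else
                  strLinks+=","+links[i].href;
      }

      // Store the comma-separated list so it is posted with the form.
      document.forms[0].hdnLinks.value=strLinks;
}
</script>
</head>
<body>
<form method="post" onsubmit="return getLinks()">
<a href="link1">link1</a>
<a href="link2">link2</a>
<a href="link3">link3</a>
<input type="hidden" name="hdnLinks">
<input type="submit" value="submit">

</form>
</body>
</html>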

 

Author Comment

by:lcyandy
ID: 12144580
I have your code, but I don't quite understand what it does.
Can you explain it to me?
 
LVL 9

Accepted Solution

by:
riyasjef earned 252 total points
ID: 12146790
Sorry, there is a change in the code:

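<html>
<head>
<script>
// Collect the href of every link on the page into the hidden form field.
function getLinks()
{
     // document.links holds every <a href> (and <area href>);
     // document.anchors only holds <a name="...">, so it would miss these links.
     var links=document.links;
     alert(links.length);  // debug: number of links found
     var strLinks="";
     for(var i=0;i<links.length;i++)
     {
          if(strLinks=="")
               strLinks=links[i].href;
          else
               strLinks+=","+links[i].href;
     }

     // Store the comma-separated list so it is posted with the form.
     document.forms[0].hdnLinks.value=strLinks;
     alert(document.forms[0].hdnLinks.value);  // debug: value that will be posted
}
</script>
</head>
<body>
<form method="post" onsubmit="return getLinks()">
<a id="id1" href="link1">link1</a>
<a id="id2" href="link2">link2</a>
<a id="id3" href="link3">link3</a>
<input type="hidden" name="hdnLinks">
<input type="submit" value="submit">

</form>
</body>
</html>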

The getLinks() function collects all the links in the document and puts them in a hidden field. When the form is submitted, you can read that hidden field from PHP to get the links, separated by commas.

Riyasjef


 
LVL 36

Expert Comment

by:Zyloch
ID: 12149194
Riyasjef's code is great for finding HTML links, but for JavaScript links there's no clear way to do it. Perhaps you can look for anything that begins with http:// and assume it's a link? You can find them all with PHP's preg_match_all.
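
A minimal sketch of that idea, written in JavaScript to match the other snippets in this thread; a similar pattern passed to PHP's preg_match_all should behave much the same. The name findHttpUrls, the sample markup, and the pattern itself are illustrative assumptions. The pattern also accepts https://, which goes slightly beyond the suggestion above, and it will pick up some strings that are not links while missing relative URLs.

```javascript
// Hypothetical helper: scan raw page source for anything that looks like an absolute URL.
// Assumption: a URL ends at whitespace, a quote, an angle bracket, or a closing parenthesis.
function findHttpUrls(source) {
  const pattern = /https?:\/\/[^\s"'<>)]+/g;
  const matches = source.match(pattern) || [];
  // Remove duplicates while keeping first-seen order.
  return [...new Set(matches)];
}

// Made-up page fragment: the URLs live in <option> values and a script call.
const sample = `
<select onchange="go(this.value)">
  <option value="http://example.com/a.html">A</option>
  <option value="http://example.com/b.html">B</option>
</select>
<script>window.open('http://example.com/a.html');</script>
`;

console.log(findHttpUrls(sample));
// → [ 'http://example.com/a.html', 'http://example.com/b.html' ]
```

Because this works on the raw source, it finds URLs sitting in <option> values or script strings without running any JavaScript, which is also why it cannot find links that are assembled at run time.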
 

Author Comment

by:lcyandy
ID: 12149687
First, thanks Riyasjef for the dedicated help.

Really? Is there no definitive method for crawling JavaScript links?
Does anyone know how Google does it?
 
LVL 36

Assisted Solution

by:Zyloch
Zyloch earned 248 total points
ID: 12151034
Google has said they can follow simplified Javascript links. I'll assume they mean following something like:

window.location.href="somewherenew.html" and window.open("somewherenew"), amongst other common ways of navigating.
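
As a rough illustration of following only those simple forms, the sketch below pulls string-literal targets out of location assignments and window.open calls. The function name findSimpleJsLinks and both patterns are assumptions made for this example; nothing here reflects how Google actually does it. URLs built from variables or string concatenation are deliberately not handled.

```javascript
// Hypothetical helper: pull string-literal targets out of simple navigation statements.
// Only a literal URL written directly in the assignment or call is recognised.
function findSimpleJsLinks(source) {
  const patterns = [
    // window.location.href = "x", location.href = 'x', window.location = "x"
    /\b(?:window\.)?location(?:\.href)?\s*=\s*(["'])(.*?)\1/g,
    // window.open("x", ...)
    /\bwindow\.open\(\s*(["'])(.*?)\1/g,
  ];
  const found = [];
  for (const pattern of patterns) {
    for (const match of source.matchAll(pattern)) {
      found.push(match[2]); // group 2 is the text between the quotes
    }
  }
  return found;
}

// The two forms quoted above, wrapped in made-up functions.
const script = `
  function go() { window.location.href="somewherenew.html"; }
  function pop() { window.open("somewherenew"); }
`;

console.log(findSimpleJsLinks(script));
// → [ 'somewherenew.html', 'somewherenew' ]
```

The results are usually relative, so a crawler would still need to resolve them against the URL of the page they came from.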

However, it can only follow simplified JavaScript links, as there are just too many variations. You could of course also find each http:// in the document, assume it's a link since most of the time it is, use PHP's @fopen to test whether it exists, and if it does, add it to the link list.
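
The "test it, then add it to the list" step could look roughly like the sketch below, again in JavaScript rather than PHP. The names filterReachable and urlExists are hypothetical, and the default checker assumes a runtime with a global fetch (Node 18+ or a browser); it stands in for the @fopen test and is not equivalent to it. Some servers reject HEAD requests, so a real crawler may need to fall back to GET.

```javascript
// Hypothetical helper: keep only the candidate URLs that a checker reports as reachable.
// `exists` is injectable so the filtering logic can be exercised without network access.
async function filterReachable(candidates, exists = urlExists) {
  const links = [];
  for (const url of candidates) {
    if (await exists(url)) {
      links.push(url);
    }
  }
  return links;
}

// Assumed default checker: a HEAD request via the global fetch.
// Any network error or non-2xx status is treated as "does not exist".
async function urlExists(url) {
  try {
    const response = await fetch(url, { method: 'HEAD' });
    return response.ok;
  } catch (err) {
    return false;
  }
}

// Example with a stub checker standing in for real requests.
const known = new Set(['http://example.com/a.html']);
filterReachable(
  ['http://example.com/a.html', 'http://example.com/missing.html'],
  async (url) => known.has(url)
).then((links) => console.log(links));
// → [ 'http://example.com/a.html' ]
```

Checking the candidates one at a time keeps the sketch simple; a crawler with many candidates would probably want to run several checks in parallel.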

Question has a verified solution.
