Solved

Methods of Crawling JavaScript Links

Posted on 2004-09-23
Last Modified: 2013-12-16
I am writing a PHP program that fetches the contents of web pages, crawls all the links in them, and stores those links in a MySQL database.

Crawling normal HTML links (i.e. <a href = "">) has worked well. JavaScript links are more problematic, though, since they come in many variations. For example, the link values might be embedded in a <select> tag and then passed as arguments to a JavaScript function that eventually generates the link.

Is there any method to crawl JavaScript links using PHP or any other program or software?

Thanks!
Question by:lcyandy
8 Comments
 
LVL 9

Expert Comment

by:riyasjef
ID: 12140837
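Hi,
try this:

<html>
<head>
<script>
// Collect the href of every link on the page into the hidden form field.
function getLinks()
{
      // document.links holds every <a href>; document.anchors only holds named anchors
      var links=document.links;
      var strLinks="";
      for(var i=0;i<links.length;i++)
      {
            if(strLinks=="")
                  strLinks=links[i].href;
            else
                  strLinks+=","+links[i].href;
      }

      document.forms[0].hdnLinks.value=strLinks;
      return true; // let the form submit with the collected links
}
</script>
</head>
<body>
<form method="post" onsubmit="return getLinks()">
<a href="link1">link1</a>
<a href="link2">link2</a>
<a href="link3">link3</a>
<input type="hidden" name="hdnLinks">
<input type="submit" value="submit">

</form>
</body>
</html>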

 

Author Comment

by:lcyandy
ID: 12144580
I've tried your code, but I don't quite understand what it does.
Can you explain it to me?
 
LVL 9

Accepted Solution

by:
riyasjef earned 63 total points
ID: 12146790
Sorry, there is a change in the code:

<html>
<head>
<script>
// Collect the href of every link on the page into the hidden form field.
function getLinks()
{
     // document.links holds every <a href>; document.anchors only holds named anchors
     var links=document.links;
     alert(links.length); // debug: number of links found
     var strLinks="";
     for(var i=0;i<links.length;i++)
     {
          if(strLinks=="")
               strLinks=links[i].href;
          else
               strLinks+=","+links[i].href;
     }

     document.forms[0].hdnLinks.value=strLinks;
     alert(document.forms[0].hdnLinks.value); // debug: the comma-separated list
     return true; // let the form submit with the collected links
}
</script>
</head>
<body>
<form method="post" onsubmit="return getLinks()">
<a id="id1" href="link1">link1</a>
<a id="id2" href="link2">link2</a>
<a id="id3" href="link3">link3</a>
<input type="hidden" name="hdnLinks">
<input type="submit" value="submit">

</form>
</body>
</html>

The getLinks() function collects all the links in the document and puts them in a hidden field. You can read that hidden field from PHP to get the links, separated by commas.

Riyasjef


 
LVL 36

Expert Comment

by:Zyloch
ID: 12149194
Riyasjef's code is great for finding HTML links, but if you want JavaScript links, there's no clear way to do it. Perhaps you could check for anything that begins with http:// and assume it's a link? You can find them all with PHP's preg_match_all.
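
As a rough illustration of the "anything that starts with http://" idea, here is a minimal sketch. It is written in JavaScript (Node.js) rather than PHP, so it uses String.prototype.match where PHP would use preg_match_all. The helper name findAbsoluteUrls, the sample markup, and the example.com addresses are made up for illustration, and the pattern is only a heuristic: it will miss relative links and links assembled from several strings.

```javascript
// Hypothetical helper: collect every absolute http:// or https:// URL that
// appears anywhere in the raw page source, including inside <script> blocks.
function findAbsoluteUrls(source) {
  // Stop at whitespace, quotes, angle brackets, parentheses and backslashes,
  // the characters that usually end a URL embedded in HTML or JavaScript.
  const pattern = /https?:\/\/[^\s"'<>()\\]+/g;
  const matches = source.match(pattern) || [];
  // Drop duplicates while keeping first-seen order.
  return [...new Set(matches)];
}

// Made-up page fragment with links hidden in a <select> and a script variable.
const sample = `
<select onchange="go(this.value)">
  <option value="http://example.com/a.html">A</option>
  <option value="http://example.com/b.html">B</option>
</select>
<script>var home = 'http://example.com/';</script>
`;

console.log(findAbsoluteUrls(sample));
```

For the sample above this should report the two option values and the URL assigned to the home variable, none of which a plain <a href> scan would see.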
 

Author Comment

by:lcyandy
ID: 12149687
First, thanks Riyasjef for the dedicated help.

Really?! There's no sure method to crawl JavaScript links?
Does anyone know how Google does it?
 
LVL 36

Assisted Solution

by:Zyloch
Zyloch earned 62 total points
ID: 12151034
Google has said they can follow simple JavaScript links. I assume they mean following something like:

window.location.href="somewherenew.html" and window.open("somewherenew"), among other common ways of navigating.

However, it can only follow simple JavaScript links, as there are just too many variations. You could of course also find each http:// in the document, assume it's a link since most of the time it is, use PHP @fopen to test whether it exists, and if it does, add it to the link list.
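
The two simple forms mentioned above (an assignment to window.location.href and a call to window.open) could be picked out of a script with something like the sketch below. This is only my guess at what "simple" covers, not a description of what Google actually does. The helper name findScriptNavigations, the sample script, and the example.com base address are invented for illustration, and the code is JavaScript (Node.js) rather than PHP. It only handles quoted string literals; targets built from variables or concatenation are skipped.

```javascript
// Hypothetical helper: pull targets out of simple script navigations and
// resolve them against the URL of the page they were found on.
function findScriptNavigations(source, baseUrl) {
  const patterns = [
    // window.location.href = "x", window.location = "x", location.href = "x"
    /(?:window\.)?location(?:\.href)?\s*=\s*(["'])(.+?)\1/g,
    // window.open("x", ...)
    /window\.open\(\s*(["'])(.+?)\1/g,
  ];
  const found = new Set();
  for (const pattern of patterns) {
    for (const match of source.matchAll(pattern)) {
      try {
        // match[2] is the quoted target; new URL() turns it into an absolute URL.
        found.add(new URL(match[2], baseUrl).href);
      } catch (err) {
        // Ignore strings that cannot be parsed as a URL.
      }
    }
  }
  return [...found];
}

// Made-up script fragment using both navigation forms.
const script = `
function go()  { window.location.href = "somewherenew.html"; }
function pop() { window.open('/popup/page2.html', 'w'); }
`;

console.log(findScriptNavigations(script, "http://example.com/dir/index.html"));
```

Each URL returned this way could then be fetched to check that it exists before it is added to the link list.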