
Methods of Crawling JavaScript Links

Posted on 2004-09-23
Last Modified: 2013-12-16
I am writing a PHP program that fetches the contents of web pages, extracts all the links from them, and stores them in a MySQL database.

Crawling normal HTML links (i.e. <a href="...">) works well. JavaScript links are much harder, because they come in many variations. For example, the link values might be embedded in the options of a <select> element and then passed as arguments to a JavaScript function that eventually builds the link.

Is there any way to crawl JavaScript links using PHP or any other program or software?

Thanks!
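As a rough illustration of the <select> case described above, the following Node.js sketch collects the value attributes of <option> elements and resolves them against the page URL, so they can be stored as candidate links. optionValueCandidates is a hypothetical helper, not part of any library, and the whole approach is a regex heuristic that assumes option values are page names or paths; values that a script later transforms will not come out right.

```javascript
// Heuristic sketch: collect the value="..." attributes of <option> elements
// and treat them as candidate link fragments. A regex is used only for
// illustration; it does not handle every valid way of writing HTML.
function optionValueCandidates(html, baseUrl) {
  const optionValue = /<option\b[^>]*?\bvalue\s*=\s*(["'])(.*?)\1/gi;
  const candidates = [];
  for (const match of html.matchAll(optionValue)) {
    const value = match[2].trim();
    if (value === "") continue; // skip placeholder options such as "Choose..."
    try {
      // Resolve relative values against the URL of the page being crawled.
      candidates.push(new URL(value, baseUrl).href);
    } catch (err) {
      // Ignore values that cannot be turned into a URL.
    }
  }
  return candidates;
}
```

For example, with a base of http://example.com/dir/index.html, value="page2.html" becomes http://example.com/dir/page2.html. Candidates found this way still need to be checked before being stored.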
Question by:lcyandy
LVL 9

Expert Comment

by:riyasjef
ID: 12140837
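Hi,
try this:

<html>
<head>
<script>
// Collects the href of every link on the page into a comma-separated
// string and stores it in the hidden field "hdnLinks".
function getLinks()
{
      // document.links holds every <a>/<area> element that has an href;
      // document.anchors only holds <a> elements with a name attribute.
      var links=document.links;
      var strLinks="";
      for(var i=0;i<links.length;i++)
      {
            if(strLinks=="")
                  strLinks=links[i].href;
            else
                  strLinks+=","+links[i].href;
      }

      document.forms[0].hdnLinks.value=strLinks;
      return true; // allow the form to submit
}
</script>
</head>
<body>
<form method=post onsubmit="return getLinks()">
<a href="link1">link1</a>
<a href="link2">link2</a>
<a href="link3">link3</a>
<input type="hidden" name=hdnLinks>
<input type="submit" value="submit">

</form>
</body>
</html>
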

Author Comment

by:lcyandy
ID: 12144580
I've tried your code, but I don't quite understand what it does.
Can you explain it to me?
LVL 9

Accepted Solution

by:
riyasjef earned 63 total points
ID: 12146790
Sorry, there was a mistake in the code. Here is the corrected version:

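<html>
<head>
<script>
// Collects the href of every link on the page into a comma-separated
// string and stores it in the hidden field "hdnLinks".
function getLinks()
{
     // document.links holds every <a>/<area> element that has an href;
     // document.anchors only holds <a> elements with a name attribute.
     var links=document.links;
     alert(links.length);                        // debug: number of links found
     var strLinks="";
     for(var i=0;i<links.length;i++)
     {
          if(strLinks=="")
               strLinks=links[i].href;
          else
               strLinks+=","+links[i].href;
     }

     document.forms[0].hdnLinks.value=strLinks;
     alert(document.forms[0].hdnLinks.value);    // debug: the collected list
     return true;                                // allow the form to submit
}
</script>
</head>
<body>
<form method=post onsubmit="return getLinks()">
<a id="id1" href="link1">link1</a>
<a id="id2" href="link2">link2</a>
<a id="id3" href="link3">link3</a>
<input type="hidden" name=hdnLinks>
<input type="submit" value="submit">

</form>
</body>
</html>

The getLinks() function collects all the links in the document and puts them in a hidden field. When the form is submitted, PHP can read that field ($_POST['hdnLinks'], since the form uses method=post) to get the links as a comma-separated list.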

Riyasjef
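On the receiving side, the hidden field arrives as part of the URL-encoded form body. The thread uses PHP, where explode(',', $_POST['hdnLinks']) would do the job; the Node.js sketch below shows the same decoding step, assuming the field name hdnLinks from the code above (linksFromFormBody is a hypothetical helper).

```javascript
// Sketch of the receiving side: the browser submits the form as an
// application/x-www-form-urlencoded body such as "hdnLinks=...".
// Decode that body and split the comma-separated list of links.
function linksFromFormBody(body) {
  const value = new URLSearchParams(body).get("hdnLinks") || "";
  // Drop empty entries (e.g. when the page had no links at all).
  return value.split(",").filter((link) => link !== "");
}
```

Because the list is joined with commas, a link that itself contains a comma will be split in two; using one hidden field per link would avoid that.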


LVL 36

Expert Comment

by:Zyloch
ID: 12149194
Riyasjef's code works well for finding HTML links, but there is no reliable way to find JavaScript links. You could search for any string that begins with http:// and assume it is a link; PHP's preg_match_all can find all such matches.
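A rough JavaScript equivalent of that preg_match_all search is sketched below. findAbsoluteUrls is a hypothetical name, and the pattern (which also accepts https://) is an assumption that will occasionally include stray punctuation at the end of a URL.

```javascript
// Rough analogue of a preg_match_all search for "anything that starts with
// http://": find every absolute http(s) URL anywhere in the page source,
// including inside <script> blocks. The character class is a guess at where
// a URL ends (whitespace, quotes, angle brackets, parentheses).
function findAbsoluteUrls(source) {
  const urlPattern = /https?:\/\/[^\s"'<>()]+/gi;
  const found = source.match(urlPattern) || [];
  return [...new Set(found)]; // remove duplicates, keep first-seen order
}
```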

Author Comment

by:lcyandy
ID: 12149687
First, thanks Riyasjef for the dedicated help.

Really? Is there no reliable way to crawl JavaScript links?
Does anyone know how Google does it?
LVL 36

Assisted Solution

by:Zyloch
Zyloch earned 62 total points
ID: 12151034
Google has said it can follow simple JavaScript links. I assume that means links such as:

window.location.href = "somewherenew.html" and window.open("somewherenew"), among other common ways of navigating to a new page.
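For the two simple forms above, a regex pass over the script text can recover the targets when they are written as string literals. This is a sketch under that assumption only (simpleScriptLinks is a hypothetical helper) and is not meant to describe how Google does it; links assembled at run time, like the <select> example in the question, would need the script to actually be executed.

```javascript
// Sketch: pull string-literal targets out of the two simple patterns.
// Anything built at run time (concatenation, variables, values read from
// a <select>) is not found.
function simpleScriptLinks(script) {
  const patterns = [
    // window.location.href = "..."  (also matches location.href = "..."
    // and window.location = "...")
    /(?:window\.)?location(?:\.href)?\s*=\s*(["'])(.*?)\1/g,
    // window.open("...", ...)
    /window\.open\s*\(\s*(["'])(.*?)\1/g,
  ];
  const links = [];
  for (const pattern of patterns) {
    for (const match of script.matchAll(pattern)) {
      links.push(match[2]); // the text between the quotes
    }
  }
  return links;
}
```

In a script that also contains window.location.href = target; (a variable rather than a string), that assignment is simply skipped.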

However, it can only handle simple JavaScript links, because there are far too many variations. You could also find each http:// string in the document, assume it is a link (it usually is), use PHP's @fopen to test whether it exists, and, if it does, add it to the link list.
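The "test whether it exists" step could look like the following Node.js sketch. Two assumptions to flag: it sends an HTTP HEAD request instead of downloading the page as @fopen would, and it relies on the global fetch available in Node.js 18 and later. keepExistingLinks and its fetchImpl parameter are hypothetical names; the parameter exists so that a fake can be passed in for testing.

```javascript
// Sketch of the existence check. Unlike @fopen, this sends a HEAD request
// (assumption: the server answers HEAD the same way it answers GET).
// fetchImpl defaults to the global fetch of Node.js 18+.
async function keepExistingLinks(candidates, fetchImpl = globalThis.fetch) {
  const existing = [];
  for (const url of candidates) {
    try {
      const response = await fetchImpl(url, { method: "HEAD" });
      if (response.ok) existing.push(url); // 2xx status: treat as existing
    } catch (err) {
      // Network error, DNS failure or invalid URL: treat as missing.
    }
  }
  return existing;
}
```

Requests are sent one at a time, which keeps the load on the target site low; a real crawler would also want a timeout per request.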