lcyandy asked:
Methods of Crawling JavaScript Links
I am writing a PHP program that fetches the contents of web pages, extracts all the links from them, and stores them in a MySQL database.
I have had good success crawling normal HTML links (i.e. <a href="">). However, JavaScript links are problematic because they come in many variations. For example, the link values might be embedded in a <select> tag and then passed as variables into a JavaScript function that eventually generates the link.
Is there any method to crawl JavaScript links using PHP or any other program or software?
Thanks!
ASKER
I have your code, but I don't quite understand what it does.
Can you explain it to me?
ASKER CERTIFIED SOLUTION
Riyasjef's code is great for finding HTML links, but if you want JavaScript links, there's no clear way to do it. Perhaps you can check for anything that begins with http:// and assume it's a link? You can find them all with PHP's preg_match_all.
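A rough sketch of that idea, in case it helps. The function name extractAbsoluteUrls, the sample page in $page, and the example.com URLs are all made up for illustration, and the pattern is only one guess at where a URL ends inside markup or a JavaScript string. It will miss relative URLs and any link that a script assembles by concatenating strings at run time.

```php
<?php
// Hypothetical helper: scan raw page source (HTML and inline JavaScript)
// for anything that looks like an absolute http:// or https:// URL.
function extractAbsoluteUrls(string $source): array
{
    // Stop at whitespace, quotes, angle brackets, parentheses and backslashes,
    // which usually end a URL inside markup or a JavaScript string literal.
    $pattern = '~https?://[^\s"\'<>()\\\\]+~i';
    preg_match_all($pattern, $source, $matches);

    // $matches[0] holds the full-pattern matches; drop duplicates and reindex.
    return array_values(array_unique($matches[0]));
}

// Made-up sample page: one plain link, one <select> option, two script strings.
$page = <<<'HTML'
<a href="http://example.com/a">A</a>
<select onchange="go(this.value)">
  <option value="http://example.com/b">B</option>
</select>
<script>
function go(u) { window.location = u; }
var extra = 'http://example.com/c?x=1';
var again = "http://example.com/a";
</script>
HTML;

$links = extractAbsoluteUrls($page);
print_r($links);
```

On this sample it should list http://example.com/a, http://example.com/b and http://example.com/c?x=1, each once, which you could then insert into your MySQL table.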
ASKER
First, thanks to Riyasjef for the dedicated help.
Really?! There's no definitive method to crawl JavaScript links?
Does anyone know how Google does it?
SOLUTION
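Try this:
<html>
<head>
<script>
// Collect the href of every link on the page into the hidden form field.
function getLinks()
{
  // document.links holds every <a href>; document.anchors only holds named anchors.
  var links = document.links;
  var strLinks = "";
  for (var i = 0; i < links.length; i++)
  {
    if (strLinks == "")
      strLinks = links[i].href;
    else
      strLinks += "," + links[i].href;
  }
  // Store the comma-separated list so it is posted with the form.
  document.forms[0].hdnLinks.value = strLinks;
  return true; // allow the form to submit
}
</script>
</head>
<body>
<form method="post" onsubmit="return getLinks()">
<a href="link1">link1</a>
<a href="link2">link2</a>
<a href="link3">link3</a>
<input type="hidden" name="hdnLinks">
<input type="submit" value="submit">
</form>
</body>
</html>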