Solved

Javascript get HTML source code

Posted on 2006-06-28
30
21,086 Views
Last Modified: 2013-11-19
Hey

How can I retrieve the source code of a site in JavaScript? For example, what if I wanted to retrieve the source code from google.co.uk?
Question by:garreH
30 Comments
 
LVL 11

Expert Comment

by:walkerke
ID: 17004313
You can view the HTML source code of any page in most browsers. In IE, select "Source" from the View menu. In Mozilla/Firefox, select "Page Source" from the View menu. In Safari, select "View Source" from the View menu.
0
 
LVL 4

Author Comment

by:garreH
ID: 17004420
Thanks walkerke... but I need to get the HTML source in JavaScript, as stated above.
0
 
LVL 8

Expert Comment

by:radnor
ID: 17004468
Clear your cache.
Then go to the site you want.
Then copy file (JS) from your cache to the desktop.

Then, enjoy!
0
 
LVL 8

Expert Comment

by:radnor
ID: 17004494
View source
Put URL of JS file in address bar....
0
 
LVL 4

Author Comment

by:garreH
ID: 17004505
I want to get the HTML source code from a site IN JavaScript, i.e. I need JavaScript code to get the HTML source from a site.
0
 
LVL 8

Expert Comment

by:radnor
ID: 17004519
site URL???
0
 

Expert Comment

by:n1875621
ID: 17004831
You can't. JavaScript allows client-side file handling (to some degree) but not remote file I/O.

What you could do is write PHP that generates JavaScript containing the content of the HTML file you want...

e.g.
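<?php
// Fetch the remote page on the server (requires allow_url_fopen to be enabled).
$html = file_get_contents('http://whereever.com/whatever.html');
if ($html === false) {
    $html = ''; // fall back to an empty string if the fetch failed
}
?>
<script type="text/javascript">
   function get_the_file ()
   {
         // json_encode() escapes quotes, backslashes and newlines, so the page
         // becomes a valid JavaScript string literal; JSON_HEX_TAG also escapes
         // < and > so a "</script>" in the page cannot end this block early.
         var filecontents = <?php echo json_encode($html, JSON_HEX_TAG); ?>;
         alert (filecontents);
   }
</script>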


That may help (depending on what you're actually doing).

Code not tested or anything. Let me know if you need more help, but you certainly can't do it in JS... sorry :(
0
 
LVL 4

Author Comment

by:garreH
ID: 17004887
Thanks n1875621, but that's the whole point... I want the _CLIENT_ to get the source code from the site, not do it via the server (which is what I'm doing at the moment).

=/
0
 

Expert Comment

by:n1875621
ID: 17004923
You can't in JavaScript... sorry.
0
 
LVL 4

Author Comment

by:garreH
ID: 17004932
Well, I thought about using an <iframe> to load the page and then reading its .innerHTML, but this doesn't seem to work because of 'cross domain scripting' security problems.
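
The "cross domain scripting" error comes from the browser's same-origin policy: a script may only read an iframe's document when both pages share the same scheme, host and port. A minimal sketch of that comparison, assuming a runtime that provides the standard URL class (the helper name isSameOrigin is made up for illustration):

```javascript
// Hypothetical helper: returns true when two absolute URLs share an origin.
// The origin is the scheme, host and port taken together.
function isSameOrigin(urlA, urlB) {
  var a = new URL(urlA);
  var b = new URL(urlB);
  return a.origin === b.origin;
}

console.log(isSameOrigin("http://example.com/a.html", "http://example.com/b.html")); // true
console.log(isSameOrigin("http://example.com/", "http://www.google.co.uk/"));        // false
```

A page on your own domain and google.co.uk fail this check, which is why the iframe's contents cannot be read from script.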
0
 

Expert Comment

by:n1875621
ID: 17004939
If you give more info about exactly what app you're writing, I can maybe help further. But JavaScript cannot open a file on a remote server and return its content...
0
 
LVL 4

Author Comment

by:garreH
ID: 17004969
Well, I just need to get the HTML source code of a site that isn't on the same domain so I can process it with a regex... JavaScript doesn't have sockets (as far as I know?)... I thought about VBScript, but this probably has more security restrictions than JavaScript... Flash/ActionScript is possibly an option? I'm stuck for ideas on how to do this...

there must be a simple way? :S
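
For the regex step, once the HTML is available as a string it can be processed with an ordinary regular expression. A rough sketch, assuming the goal is to pull out link targets (the thread does not say what the regex should match, and extractLinks is a made-up name):

```javascript
// Hypothetical helper: collects the href value of every <a> tag in an HTML string.
function extractLinks(html) {
  // Matches <a ... href="..."> or <a ... href='...'>; group 2 is the URL.
  var pattern = /<a\b[^>]*\bhref\s*=\s*(["'])(.*?)\1/gi;
  var links = [];
  var match;
  while ((match = pattern.exec(html)) !== null) {
    links.push(match[2]);
  }
  return links;
}

var sample = '<p><a href="http://www.google.co.uk/">Google</a> <a class="x" href=\'/about\'>About</a></p>';
console.log(extractLinks(sample)); // [ 'http://www.google.co.uk/', '/about' ]
```

A regex like this is fine for simple, predictable markup, but it is not a full HTML parser and will miss unquoted attribute values.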
0
 

Expert Comment

by:n1875621
ID: 17004988
Do you have control of the remote domain, or is it any domain?

If you control the remote domain as well, you could add a wrapper script that outputs any file as a JavaScript string variable, then simply include that wrapper as a JavaScript source file and you're away... that is, of course, if you have control of the other domain.

0
 
LVL 4

Author Comment

by:garreH
ID: 17004991
i have no control over the other domain... it would make light work if i did
0
 

Expert Comment

by:n1875621
ID: 17005004
i.e. on your remote host, write a PHP script that...

takes a filename as a variable in $_GET, so...

www.yourremotedomain.com/wrapper.php?filetowrap=myfile.html

wrapper.php opens "myfile.html", encases it in a JS variable (like I did in my first example) and spits out raw JS.

Then you can call the script like so...

<script src="http://www.yourremotedomain.com/wrapper.php?filetowrap=myfile.html"></script>

which will return JavaScript containing the file contents, which you can then run your regex on.
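
The wrapping step itself is language-independent. Here is a rough sketch of it in JavaScript rather than PHP, only to show the escaping involved; the function name wrapAsScript and the variable name wrappedPage are invented for illustration, and the variable name is assumed to be a valid identifier:

```javascript
// Hypothetical helper: turns an HTML string into one JavaScript statement
// that assigns it to a variable.
function wrapAsScript(variableName, html) {
  // JSON.stringify escapes quotes, backslashes and line breaks.
  var literal = JSON.stringify(html)
    // Also escape "<" so the output stays safe if it is ever inlined in a page.
    .replace(/</g, "\\u003c");
  return "var " + variableName + " = " + literal + ";";
}

var output = wrapAsScript("wrappedPage", '<p class="a">Hi</p>\n');
console.log(output);
```

A page that includes this output through a script tag would then find the HTML in the wrappedPage variable.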



0

 

Expert Comment

by:n1875621
ID: 17005008
Ah sorry.. just saw your response.

You're going to have a tough time finding an answer. Let me know if you do!
0
 

Assisted Solution

by:n1875621
n1875621 earned 100 total points
ID: 17005098
Just found this... it uses some AJAX something or other... never heard of it myself, but it might help.

http://www.devarticles.com/c/a/JavaScript/JavaScript-Remote-Scripting-Processing-XML-Files/
0
 
LVL 4

Author Comment

by:garreH
ID: 17005300
Hmmm. I thought about using an AJAX-type thing, but is it possible to get non-XML source from a site? Just the basic HTML source... I'm not sure how to make it work for getting HTML source.

Also, I'm not sure if this works across domains?
0
 

Expert Comment

by:n1875621
ID: 17005324
no idea. no experience with it
0
 
LVL 7

Expert Comment

by:mmarksbury
ID: 17005631
n1875621 is right.  You can do this using AJAX.

Basically, AJAX will do a remote call to a server that you specify (from Javascript, in the background).  Typically this is used to query servers and get information from databases, but if you query a page the call will return the HTML of that page.

This will get you started:
http://rajshekhar.net/blog/archives/85-Rasmus-30-second-AJAX-Tutorial.html
0
 

Expert Comment

by:n1875621
ID: 17005667
amazing stuff tho - javascript isn't just for rollovers anymore ;)

wonder what the compatibility is like ;)
0
 
LVL 4

Author Comment

by:garreH
ID: 17005680
Thanks guys for your help.

At the moment I've found a site out of pure luck, http://www.aspfaq.com/show.asp?id=2173. It looks promising... though now I'm trying to figure out how to actually get the data from 'Response.Write(xmlhttp.responseText);'

I changed it to:

 Response.Write(xmlhttp.responseText);
    alert(xmlhttp.responseText);

hmm? :S
0
 
LVL 7

Expert Comment

by:mmarksbury
ID: 17005686
Most newer versions of the major browsers support AJAX.  It WILL be the web of the future, which is why they call it Web 2.0.  It's good stuff and makes web development more powerful.  Just don't overdo it!

See this link:
http://developer.ebusiness-apps.com/ajax/ajax.htm
0
 
LVL 7

Expert Comment

by:mmarksbury
ID: 17005731
Here is a quick program that does what you want.  Save this as an HTM file and run on your computer.  Then you can take the parts out that you need.  Good luck!

<html>
<head>
<script type="text/javascript">
var xmlHttpRequestHandler = new Object();
var requestObject;

xmlHttpRequestHandler.createXmlHttpRequest = function(){
    var XmlHttpRequestObject;
    if(typeof XMLHttpRequest != "undefined")
    {
        XmlHttpRequestObject = new XMLHttpRequest();
    }
    else if(window.ActiveXObject)
    {
        var tryPossibleVersions = ["MSXML2.XMLHttp.5.0", "MSXML2.XMLHttp.4.0", "MSXML2.XMLHttp.3.0", "MSXML2.XMLHttp",

"Microsoft.XMLHttp"];
        for(var i=0;i<tryPossibleVersions.length;i++)
        {
            try
            {
                XmlHttpRequestObject = new ActiveXObject(tryPossibleVersions[i]);
                break;
            }
            catch(xmlHttpRequestObjectError)
            {
                // Ignore Exception
            }
        }
    }
    return XmlHttpRequestObject;
}

function getHtml()
{
      var url = document.getElementById('url').value;
      if(url.length > 0)
      {
            requestObject = xmlHttpRequestHandler.createXmlHttpRequest();
            requestObject.onreadystatechange=onReadyStateChangeResponse;
            requestObject.open("GET",url, true);
            requestObject.send(null);
      }
}

function onReadyStateChangeResponse()
{
      var ready, status;
      try
      {
            ready = requestObject.readyState;
            status = requestObject.status;
      }
      catch(e) {}
      if(ready == 4 && status == 200)
      {
            alert(requestObject.responseText);
      }
}

</script>
</head>
<body>

<b>Enter Website URL:</b><br />
<input type="text" id="url" size="50" />
<br /><br />

<input type="button" onclick="getHtml();" value="Get HTML" />
</body>
</html>
0
 
LVL 7

Expert Comment

by:mmarksbury
ID: 17005750
And remember, the function call is asynchronous, so when you press the button, it could take some time before the HTML is retrieved.  You can change that to force the browser to wait by using this line instead...

requestObject.open("GET",url, false);  <--- Changed last argument to false
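
An alternative to blocking the browser is to keep the call asynchronous and pass in a function to run when the response arrives. A minimal sketch, assuming a callback style; the names requestHtml and createRequest are made up here, and createRequest stands in for whatever builds the request object (for example the createXmlHttpRequest function above):

```javascript
// Hypothetical helper: requests `url` and hands the result to `callback`
// once the request has finished. `createRequest` is any function that returns
// an XMLHttpRequest-like object.
function requestHtml(url, callback, createRequest) {
  var request = createRequest();
  request.onreadystatechange = function () {
    if (request.readyState !== 4) {
      return; // 4 means the request is complete; ignore earlier states
    }
    if (request.status === 200) {
      callback(null, request.responseText);
    } else {
      callback(new Error("Request failed with status " + request.status), null);
    }
  };
  request.open("GET", url, true); // true keeps the call asynchronous
  request.send(null);
}

// Browser usage (assumes XMLHttpRequest is available):
// requestHtml("http://www.site.com", function (error, html) {
//   if (error) { alert(error.message); } else { alert(html); }
// }, function () { return new XMLHttpRequest(); });
```

Unlike the sample page, this version also reports a failed request instead of staying silent when the status is not 200.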
0
 
LVL 4

Author Comment

by:garreH
ID: 17005763
thanks for the code but it says error on line 42 ("the system cannot locate the resource specified.")
0
 
LVL 7

Accepted Solution

by:
mmarksbury earned 250 total points
ID: 17005787
How did you type in the address?

It has to be http://www.site.com

This is using HTTP to do the call, so if you do not specify the URL correctly, it will not work.  Also, I've noticed that some sites block these types of calls.  I'm not sure how, but they do.
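
To guard against a mistyped address, the page could check the input before making the request. A rough sketch, assuming the standard URL class is available and that plain HTTP is an acceptable default when no scheme is typed (normalizeUrl is a made-up name):

```javascript
// Hypothetical helper: returns an absolute http(s) URL, or null when the input
// cannot be turned into one.
function normalizeUrl(input) {
  var text = String(input).trim();
  if (text === "") {
    return null;
  }
  // Assume plain HTTP when the user typed something like "www.site.com".
  if (!/^[a-z][a-z0-9+.-]*:\/\//i.test(text)) {
    text = "http://" + text;
  }
  try {
    var parsed = new URL(text);
    if (parsed.protocol !== "http:" && parsed.protocol !== "https:") {
      return null;
    }
    return parsed.href;
  } catch (e) {
    return null; // not a parseable URL
  }
}

console.log(normalizeUrl("www.site.com"));        // http://www.site.com/
console.log(normalizeUrl("http://www.site.com")); // http://www.site.com/
console.log(normalizeUrl("ftp://www.site.com"));  // null
```

In the sample page, getHtml() could run the text box value through a check like this and skip the request when it returns null.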
0
 
LVL 4

Author Comment

by:garreH
ID: 17005800
:O it works!!!!! omg i love you!! thanks so much for the help man :D
0
 
LVL 4

Author Comment

by:garreH
ID: 17005859
The only problem is that it doesn't seem to do anything in Firefox.. strange.
0
 
LVL 7

Expert Comment

by:mmarksbury
ID: 17005917
Launch the Javascript console in Firefox and it will give you more error details.  This was a quick sample to get you started.  I'm sure you can figure out the needed tweaks.  Thanks for the PTS and best of luck!
0
