AdrianSmithUK
asked on
JQuery: How to count words on a page.
Dear Experts,
1. The HTML code below should count the number of words in the body of a page.
2. The screenshot shows that if I remove all of the whitespace from the HTML body, some of the words are combined (e.g. "Herearesomedivtagsrow1col1row1col2row2col1row2col2").
3. How could I correct this?
<!DOCTYPE html>
<html>
<head>
<script src="http://ajax.googleapis.com/ajax/libs/jquery/1.8.3/jquery.min.js">
</script>
<script>
$(document).ready(function(){
// Match runs of word characters in the body text; match() returns null when nothing matches.
var bodyText = $("body").text().match(/\w+/g) || [];
console.log(bodyText.length);
console.log(bodyText);
});
</script>
</head>
<body>
<h1>Here is a heading.</h1><p>This is a paragraph.</p><p>This is another paragraph.</p><ul><li>Here is a bullet.</li><li>Here is another bullet.</li><li>Here is the last bullet.</li></ul><div>Here</div><div>are</div><div>some</div><div>div</div><div>tags</div><table width="200" border="0" cellspacing="0" cellpadding="0"><tr><td>row1col1</td><td>row1col2</td></tr><tr><td>row2col1</td><td>row2col2</td></tr></table>
</body>
</html>
ASKER
PS: Out of interest, in the end I solved the issue by appending a space to selected elements (div, td and a).
<!DOCTYPE html>
<html>
<head>
<script src="http://ajax.googleapis.com/ajax/libs/jquery/1.8.3/jquery.min.js"></script>
<script>
$(document).ready(function(){
// Append a space to selected elements (div, td, a) so adjacent words are not merged.
appendSpaces();
var numberOfMatches = $("body").text().match(/\w+/ig).length;
console.log(numberOfMatches);
var bodyText = $("body").text().match(/\w+/ig);
console.log(bodyText);
console.log( $("p").text() );
});
function appendSpaces(){
$("div").append(" ");
$("td").append(" ");
$("a").append(" ");
}
</script>
</head>
<body>
<h1>Here is a heading.</h1><a href="#">Link1</a><a href="#">Link2</a><a href="#">Link3</a><p>This is a paragraph.</p><p>This is another paragraph.</p><ul><li>Here is a bullet.</li><li>Here is another bullet.</li><li>Here is the last bullet.</li></ul><div>Here</div><div>are</div><div>some</div><div>div</div><div>tags</div><table width="200" border="0" cellspacing="0" cellpadding="0"><tr><td>row1col1</td><td>row1col2</td></tr><tr><td>row2col1</td><td>row2col2</td></tr></table>
</body>
</html>
This line of my solution does the same thing but for ALL HTML elements:
var bodyHTML = $("body").html().replace(/<(.|\n)+?>/ig, " ");
ASKER
Does it not destroy all the HTML elements and replace them with a space?
Well, it creates a variable (using the .html() method), strips out any HTML tags, and then counts what is left, making sure that there is whitespace between the contents of the former HTML elements. It doesn't "destroy" the HTML for the viewer.
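To illustrate the explanation above, here is a minimal sketch that runs the same replace-then-match steps on a plain string instead of a live page. The function name countWords and the sample string are made up for illustration; the regular expressions are the ones used in this thread.

```javascript
// Hypothetical helper: counts words in an HTML string without touching the DOM.
function countWords(html) {
  // Replace every tag with a space so text from adjacent elements stays separated.
  var text = html.replace(/<(.|\n)+?>/g, " ");
  // match() returns null when there are no word characters, so fall back to [].
  var words = text.match(/\w+/g) || [];
  return words.length;
}

var sample = "<div>Here</div><div>are</div><div>some</div><div>div</div><div>tags</div>";
console.log(countWords(sample)); // 5
console.log(sample === "<div>Here</div><div>are</div><div>some</div><div>div</div><div>tags</div>"); // true: the original string is unchanged
```

Because replace() returns a new string, the page markup itself is never modified. One caveat of this sketch: anything that survives tag stripping is counted, so an entity such as &nbsp; would be counted as the word "nbsp".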
ASKER
I see. Definitely a good snippet. Many thanks. Adrian
ASKER
Skrile
I refactored the solution to use pure JavaScript and your solution is beautiful. Here is the code.
<!DOCTYPE html>
<html>
<head>
<script>
window.onload = function(){
var bodyHtml = document.getElementsByTagName('body')[0].innerHTML.replace(/<(.|\n)+?>/ig, " ");
var bodyText = bodyHtml.match(/\w+/ig);
console.log(bodyText);
console.log(bodyText.length);
}
</script>
</head>
<body>
<h1>Here is a heading.</h1><a href="#">Link1</a><a href="#">Link2</a><a href="#">Link3</a><p>This is a paragraph.</p><p>This is another paragraph.</p><ul><li>Here is a bullet.</li><li>Here is another bullet.</li><li>Here is the last bullet.</li></ul><div>Here</div><div>are</div><div>some</div><div>div</div><div>tags</div><table width="200" border="0" cellspacing="0" cellpadding="0"><tr><td>row1col1</td><td>row1col2</td></tr><tr><td>row2col1</td><td>row2col2</td></tr></table>
</body>
</html>
Nice.
Also, a good discussion on the troubles with window.onload here:
http://stackoverflow.com/questions/6352789/cross-browser-compatible-way-to-bind-events-on-page-load
ASKER
Very interesting and many thanks.
I'm developing a plugin for Firefox and the DOMContentLoaded event will be much more suitable than the window load event. Some websites take ages to load their Flash movies and banners.
https://developer.mozilla.org/en-US/docs/Mozilla_event_reference/DOMContentLoaded_(event)
Thanks Again :)
ASKER
Much faster!
<!DOCTYPE html>
<html>
<head>
<script>
var listener = function(e)
{
window.removeEventListener("DOMContentLoaded", listener, false);
var bodyHtml = document.getElementsByTagName('body')[0].innerHTML.replace(/<(.|\n)+?>/ig, " ");
var bodyText = bodyHtml.match(/\w+/ig);
console.log(bodyText);
console.log(bodyText.length);
}
window.addEventListener("DOMContentLoaded", listener, false);
</script>
</head>
<body>
<h1>Here is a heading.</h1><a href="#">Link1</a><a href="#">Link2</a><a href="#">Link3</a><p>This is a paragraph.</p><p>This is another paragraph.</p><ul><li>Here is a bullet.</li><li>Here is another bullet.</li><li>Here is the last bullet.</li></ul><div>Here</div><div>are</div><div>some</div><div>div</div><div>tags</div><table width="200" border="0" cellspacing="0" cellpadding="0"><tr><td>row1col1</td><td>row1col2</td></tr><tr><td>row2col1</td><td>row2col2</td></tr></table>
</body>
</html>
ASKER
Kind Regards,
Adrian