AdrianSmithUK


jQuery: How to count words on a page

Dear Experts,

1. The HTML code below should count the number of words in the body of a page.

2. The screenshot shows that if I remove all of the white space from the HTML body, some of the words are combined (e.g. "Herearesomedivtagsrow1col1row1col2row2col1row2col2").

3. How could I correct this?


<!DOCTYPE html>
<html>
<head>
<script src="http://ajax.googleapis.com/ajax/libs/jquery/1.8.3/jquery.min.js">
</script>
<script>
$(document).ready(function(){
        
        var numberOfMatches = $("body").text().match(/\w+/ig).length;
        console.log(numberOfMatches);
        
        var bodyText = $("body").text().match(/\w+/ig);
        console.log(bodyText);
        
});
</script>
</head>
<body>
<h1>Here is a heading.</h1><p>This is a paragraph.</p><p>This is another paragraph.</p><ul><li>Here is a bullet.</li><li>Here is another bullet.</li><li>Here is the last bullet.</li></ul><div>Here</div><div>are</div><div>some</div><div>div</div><div>tags</div><table width="200" border="0" cellspacing="0" cellpadding="0"><tr><td>row1col1</td><td>row1col2</td></tr><tr><td>row2col1</td><td>row2col2</td></tr></table>
</body>
</html>



User generated image
SOLUTION
Big Monty
AdrianSmithUK

ASKER

Many thanks chaps.
Kind Regards,
Adrian
PS: Out of interest, in the end I solved the issue by appending spaces after selected div tags.

<!DOCTYPE html>
<html>
<head>
<script src="http://ajax.googleapis.com/ajax/libs/jquery/1.8.3/jquery.min.js"></script>

<script>
$(document).ready(function(){

    // Add a space after each div, td and a element so adjacent words stay separated.
    appendSpaces();

    var numberOfMatches = $("body").text().match(/\w+/ig).length;
    console.log(numberOfMatches);

    var bodyText = $("body").text().match(/\w+/ig);
    console.log(bodyText);

    console.log( $("p").text() );

});

// Note: this modifies the live page by appending a space to each matched element.
function appendSpaces(){
    $("div").append(" ");
    $("td").append(" ");
    $("a").append(" ");
}

</script>
</head>
<body>

<h1>Here is a heading.</h1><a href="#">Link1</a><a href="#">Link2</a><a href="#">Link3</a><p>This is a paragraph.</p><p>This is another paragraph.</p><ul><li>Here is a bullet.</li><li>Here is another bullet.</li><li>Here is the last bullet.</li></ul><div>Here</div><div>are</div><div>some</div><div>div</div><div>tags</div><table width="200" border="0" cellspacing="0" cellpadding="0"><tr><td>row1col1</td><td>row1col2</td></tr><tr><td>row2col1</td><td>row2col2</td></tr></table>

</body>
</html>


This line of my solution does the same thing, but for ALL HTML elements:

var bodyHTML = $("body").html().replace(/<(.|\n)+?>/ig, " ");
Does it not destroy all the HTML elements and replace them with a space?
Well, it copies the body's markup into a string (using the .html() method), replaces every HTML tag in that string with a space, and then counts the words in what is left, so there is always white space between the contents of the former HTML elements. Because it works on a copy, it doesn't "destroy" the HTML for the viewer.
I see. Definitely a good snippet. Many thanks. Adrian
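
For anyone who wants to try the tag-replacement idea outside a browser, here is a minimal sketch. It assumes the markup is already available as a string; countWords and sample are names made up for this illustration and are not part of the solution above. It uses the same pattern as the line above (without the i flag, which has no effect here), plus a guard because match() returns null when nothing matches.

```javascript
// Hypothetical helper for illustration: counts the words in an HTML string.
function countWords(html) {
    // Replace every tag with a space so text from adjacent elements stays separated.
    var text = html.replace(/<(.|\n)+?>/g, " ");
    // match() returns null when there are no words, so fall back to an empty array.
    var words = text.match(/\w+/g) || [];
    return words.length;
}

// Made-up sample markup with no white space between elements.
var sample = "<div>Here</div><div>are</div><td>row1col1</td>";

console.log(countWords(sample)); // 3

// Removing the tags without inserting a space reproduces the original problem.
console.log(sample.replace(/<(.|\n)+?>/g, "").match(/\w+/g)); // [ 'Herearerow1col1' ]
```

One caveat: \w+ also matches the names inside entities such as &nbsp;, so they get counted as words unless they are decoded or stripped first.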
Skrile

I refactored the solution to use pure JavaScript, and your solution is beautiful. Here is the code.

<!DOCTYPE html>
<html>
<head>

<script>

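window.onload = function(){

    // Replace every tag with a space so text from adjacent elements stays separated.
    var bodyHtml = document.getElementsByTagName('body')[0].innerHTML.replace(/<(.|\n)+?>/ig, " ");
    // match() returns null when there are no words, so fall back to an empty array.
    var bodyText = bodyHtml.match(/\w+/ig) || [];

    console.log(bodyText);
    console.log(bodyText.length);
};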

</script>
</head>
<body>

<h1>Here is a heading.</h1><a href="#">Link1</a><a href="#">Link2</a><a href="#">Link3</a><p>This is a paragraph.</p><p>This is another paragraph.</p><ul><li>Here is a bullet.</li><li>Here is another bullet.</li><li>Here is the last bullet.</li></ul><div>Here</div><div>are</div><div>some</div><div>div</div><div>tags</div><table width="200" border="0" cellspacing="0" cellpadding="0"><tr><td>row1col1</td><td>row1col2</td></tr><tr><td>row2col1</td><td>row2col2</td></tr></table>

</body>
</html>


Very interesting and many thanks.

I'm developing a plugin for Firefox, and the DOMContentLoaded event will be much more suitable than the window load event. Some websites take ages to load their Flash movies and banners.

https://developer.mozilla.org/en-US/docs/Mozilla_event_reference/DOMContentLoaded_(event)

Thanks Again :)
Much faster!

<!DOCTYPE html>
<html>
<head>

<script>

var listener = function(e)
{
    // Run once: detach the listener as soon as the DOM is ready.
    window.removeEventListener("DOMContentLoaded", listener, false);

    var bodyHtml = document.getElementsByTagName('body')[0].innerHTML.replace(/<(.|\n)+?>/ig, " ");
    // match() returns null when there are no words, so fall back to an empty array.
    var bodyText = bodyHtml.match(/\w+/ig) || [];

    console.log(bodyText);
    console.log(bodyText.length);
};

window.addEventListener("DOMContentLoaded", listener, false);

</script>
</head>
<body>

<h1>Here is a heading.</h1><a href="#">Link1</a><a href="#">Link2</a><a href="#">Link3</a><p>This is a paragraph.</p><p>This is another paragraph.</p><ul><li>Here is a bullet.</li><li>Here is another bullet.</li><li>Here is the last bullet.</li></ul><div>Here</div><div>are</div><div>some</div><div>div</div><div>tags</div><table width="200" border="0" cellspacing="0" cellpadding="0"><tr><td>row1col1</td><td>row1col2</td></tr><tr><td>row2col1</td><td>row2col2</td></tr></table>

</body>
</html>
