Solved

JQuery: How to count words on a page.

Posted on 2013-02-05
12
352 Views
Last Modified: 2013-02-06
Dear Experts,

1. The html code below should count the number of words in the body of a page.

2. The screenshot shows that if I remove all of the white space from the html body some of the words are combined. (eg: "Herearesomedivtagsrow1col1row1col2row2col1row2col2")

3. How could I correct this?


<!DOCTYPE html>
<html>
<head>
<script src="http://ajax.googleapis.com/ajax/libs/jquery/1.8.3/jquery.min.js">
</script>
<script>
$(document).ready(function(){
        
        var numberOfMatches = $("body").text().match(/\w+/ig).length;
        console.log(numberOfMatches);
        
        var bodyText = $("body").text().match(/\w+/ig);
        console.log(bodyText);
        
});
</script>
</head>
<body>
<h1>Here is a heading.</h1><p>This is a paragraph.</p><p>This is another paragraph.</p><ul><li>Here is a bullet.</li><li>Here is another bullet.</li><li>Here is the last bullet.</li></ul><div>Here</div><div>are</div><div>some</div><div>div</div><div>tags</div><table width="200" border="0" cellspacing="0" cellpadding="0"><tr><td>row1col1</td><td>row1col2</td></tr><tr><td>row2col1</td><td>row2col2</td></tr></table>
</body>
</html>

Open in new window


Notice - Herearesomedivtagsrow1col1row1col2row2col1row2col2
0
Comment
Question by:AdrianSmithUK
  • 7
  • 4
12 Comments
 
LVL 32

Assisted Solution

by:Big Monty
Big Monty earned 50 total points
ID: 38856445
0
 
LVL 16

Accepted Solution

by:
Steve Krile earned 450 total points
ID: 38856605
This seemed to do the trick for me:

        //replace all HTML elements with a blank space - this makes sure there are spaces between every word and will be ignored by your MATCH statement
        var bodyHTML = $("body").html().replace(/<(.|\n)+?>/ig, " ");

        
        var bodyText = bodyHTML.match(/\w+/ig);

        console.log(bodyText);
        console.log(bodyText.length);

Open in new window


The key is to remember that the text() jquery function ignores all HTML tags and compresses all the contents of the BODY tag into one result.  Instead, use the .html() function and then a regex function to chop out all the html elements.
0
 

Author Closing Comment

by:AdrianSmithUK
ID: 38857307
Many thanks chaps.
Kind Regards,
Adrian
0
 

Author Comment

by:AdrianSmithUK
ID: 38858608
PS: Out of interest, in the end I solved the issue by appending spaces after selected div tags.

<!DOCTYPE html>
<html>
<head>
<script src="http://ajax.googleapis.com/ajax/libs/jquery/1.8.3/jquery.min.js"></script>

<script>
$(document).ready(function(){
	
		//Add spaces after selected div tags.
        appendSpaces();
		
        var numberOfMatches = $("body").text().match(/\w+/ig).length;
        console.log(numberOfMatches);
        
        var bodyText = $("body").text().match(/\w+/ig);
        console.log(bodyText);
		
		console.log( $("p").text() );
		
});

function appendSpaces(){
		$("div").append(" ");
		$("td").append(" ");
		$("a").append(" ");
}

</script>
</head>
<body>

<h1>Here is a heading.</h1><a href="#">Link1</a><a href="#">Link2</a><a href="#">Link3</a><p>This is a paragraph.</p><p>This is another paragraph.</p><ul><li>Here is a bullet.</li><li>Here is another bullet.</li><li>Here is the last bullet.</li></ul><div>Here</div><div>are</div><div>some</div><div>div</div><div>tags</div><table width="200" border="0" cellspacing="0" cellpadding="0"><tr><td>row1col1</td><td>row1col2</td></tr><tr><td>row2col1</td><td>row2col2</td></tr></table>

</body>
</html>

Open in new window

0
 
LVL 16

Expert Comment

by:Steve Krile
ID: 38859125
This line of my solution does the same thing but for ALL html elements:

var bodyHTML = $("body").html().replace(/<(.|\n)+?>/ig, " ");
0
 

Author Comment

by:AdrianSmithUK
ID: 38859242
Does it not destroy all the HTML elements and replace them with a space?
0
3 Use Cases for Connected Systems

Our Dev teams are like yours. They’re continually cranking out code for new features/bugs fixes, testing, deploying, testing some more, responding to production monitoring events and more. It’s complex. So, we thought you’d like to see what’s working for us.

 
LVL 16

Expert Comment

by:Steve Krile
ID: 38859250
Well, it creates a variable (using the .html() command) strips out any HTML, and then counts what is left making sure that there are white spaces between all the contents of the former HTML elements.  It doesn't "destroy" the HTML for the viewer.
0
 

Author Comment

by:AdrianSmithUK
ID: 38859261
I see. Definitely a good snippet. Many thanks. Adrian
0
 

Author Comment

by:AdrianSmithUK
ID: 38860102
Skrile

I re-factored the solution to use pure Javascript and your solution is beautiful. Here is the code.

<!DOCTYPE html>
<html>
<head>

<script>

window.onload = function(){
	
   var bodyHtml = document.getElementsByTagName('body')[0].innerHTML.replace(/<(.|\n)+?>/ig, " ");	
   var bodyText = bodyHtml.match(/\w+/ig);

   console.log(bodyText);
   console.log(bodyText.length);
}

</script>
</head>
<body>

<h1>Here is a heading.</h1><a href="#">Link1</a><a href="#">Link2</a><a href="#">Link3</a><p>This is a paragraph.</p><p>This is another paragraph.</p><ul><li>Here is a bullet.</li><li>Here is another bullet.</li><li>Here is the last bullet.</li></ul><div>Here</div><div>are</div><div>some</div><div>div</div><div>tags</div><table width="200" border="0" cellspacing="0" cellpadding="0"><tr><td>row1col1</td><td>row1col2</td></tr><tr><td>row2col1</td><td>row2col2</td></tr></table>

</body>
</html>

Open in new window

0
 
LVL 16

Expert Comment

by:Steve Krile
ID: 38860117
Nice.

Also, a good discussion on the troubles with window.load() here:

http://stackoverflow.com/questions/6352789/cross-browser-compatible-way-to-bind-events-on-page-load
0
 

Author Comment

by:AdrianSmithUK
ID: 38860177
Very interesting and many thanks.

I'm developing a plugin for firefox and the DOMContentLoaded event will be much more suitable than the window.load event. Some websites take for ages to load their flash movies and banners.

https://developer.mozilla.org/en-US/docs/Mozilla_event_reference/DOMContentLoaded_(event)

Thanks Again :)
0
 

Author Comment

by:AdrianSmithUK
ID: 38860209
Much faster!

<!DOCTYPE html>
<html>
<head>

<script>

var listener = function(e)
{
    window.removeEventListener("DOMContentLoaded", listener, false);
    
	var bodyHtml = document.getElementsByTagName('body')[0].innerHTML.replace(/<(.|\n)+?>/ig, " ");	
	var bodyText = bodyHtml.match(/\w+/ig);

    console.log(bodyText);
    console.log(bodyText.length);
}

window.addEventListener("DOMContentLoaded", listener, false);

</script>
</head>
<body>

<h1>Here is a heading.</h1><a href="#">Link1</a><a href="#">Link2</a><a href="#">Link3</a><p>This is a paragraph.</p><p>This is another paragraph.</p><ul><li>Here is a bullet.</li><li>Here is another bullet.</li><li>Here is the last bullet.</li></ul><div>Here</div><div>are</div><div>some</div><div>div</div><div>tags</div><table width="200" border="0" cellspacing="0" cellpadding="0"><tr><td>row1col1</td><td>row1col2</td></tr><tr><td>row2col1</td><td>row2col2</td></tr></table>

</body>
</html>

Open in new window

0

Featured Post

Is Your Active Directory as Secure as You Think?

More than 75% of all records are compromised because of the loss or theft of a privileged credential. Experts have been exploring Active Directory infrastructure to identify key threats and establish best practices for keeping data safe. Attend this month’s webinar to learn more.

Question has a verified solution.

If you are experiencing a similar issue, please ask a related question

Introduction If you're like most people, you have occasionally made a typographical error when you're entering information into an online form.  And to your consternation, the browser remembers the error, and offers to autocomplete your future entr…
JavaScript can be used in a browser to change parts of a webpage dynamically. It begins with the following pattern: If condition W is true, do thing X to target Y after event Z. Below are some tips and tricks to help you get started with JavaScript …
The viewer will learn how to dynamically set the form action using jQuery.
The viewer will learn the basics of jQuery including how to code hide show and toggles. Reference your jQuery libraries: (CODE) Include your new external js/jQuery file: (CODE) Write your first lines of code to setup your site for jQuery…

919 members asked questions and received personalized solutions in the past 7 days.

Join the community of 500,000 technology professionals and ask your questions.

Join & Ask a Question

Need Help in Real-Time?

Connect with top rated Experts

13 Experts available now in Live!

Get 1:1 Help Now