Solved

jQuery: How to count words on a page

Posted on 2013-02-05
Last Modified: 2013-02-06
Dear Experts,

1. The HTML code below should count the number of words in the body of a page.

2. As the screenshot shows, when I remove all the whitespace between the elements in the HTML body, some of the words run together (e.g. "Herearesomedivtagsrow1col1row1col2row2col1row2col2").

3. How can I correct this?


<!DOCTYPE html>
<html>
<head>
<script src="http://ajax.googleapis.com/ajax/libs/jquery/1.8.3/jquery.min.js">
</script>
<script>
$(document).ready(function(){
        
        var numberOfMatches = $("body").text().match(/\w+/ig).length;
        console.log(numberOfMatches);
        
        var bodyText = $("body").text().match(/\w+/ig);
        console.log(bodyText);
        
});
</script>
</head>
<body>
<h1>Here is a heading.</h1><p>This is a paragraph.</p><p>This is another paragraph.</p><ul><li>Here is a bullet.</li><li>Here is another bullet.</li><li>Here is the last bullet.</li></ul><div>Here</div><div>are</div><div>some</div><div>div</div><div>tags</div><table width="200" border="0" cellspacing="0" cellpadding="0"><tr><td>row1col1</td><td>row1col2</td></tr><tr><td>row2col1</td><td>row2col2</td></tr></table>
</body>
</html>



Notice the merged words in the console output: Herearesomedivtagsrow1col1row1col2row2col1row2col2
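The cause can be reproduced outside the browser. jQuery's .text() joins the text of all descendant elements without adding any separator, so when there is no whitespace between two elements, their words merge. The minimal sketch below uses plain string manipulation as a stand-in for the real DOM; the stripTags() helper is hypothetical. It contrasts deleting the tags with replacing each tag by a space:

```javascript
// Hypothetical helper: replace every HTML tag in a markup string with
// `separator`, using the tag pattern from the accepted solution.
function stripTags(html, separator) {
  return html.replace(/<(.|\n)+?>/g, separator);
}

// A fragment of the question's body markup, with no whitespace between elements.
const fragment = "<div>Here</div><div>are</div><div>some</div>";

// Deleting the tags concatenates the elements' text, much as .text() does.
const glued = stripTags(fragment, "");
// Replacing each tag with a space keeps adjacent words apart.
const spaced = stripTags(fragment, " ");

console.log(glued);                  // Herearesome
console.log(glued.match(/\w+/g));    // [ 'Herearesome' ]
console.log(spaced.match(/\w+/g));   // [ 'Here', 'are', 'some' ]
```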
Question by:AdrianSmithUK
 
Assisted Solution

by: Big Monty
Big Monty earned 200 total points
ID: 38856445

Accepted Solution

by: Steve Krile
Steve Krile earned 1800 total points
ID: 38856605
This seemed to do the trick for me:

        // Replace every HTML tag with a blank space. This guarantees whitespace between
        // the contents of adjacent elements; the spaces are then ignored by the match() below.
        var bodyHTML = $("body").html().replace(/<(.|\n)+?>/ig, " ");

        // match() returns null if there are no words, so fall back to an empty array.
        var bodyText = bodyHTML.match(/\w+/ig) || [];

        console.log(bodyText);
        console.log(bodyText.length);



The key point is that jQuery's .text() method drops all HTML tags and concatenates the text of every element inside the BODY into a single string, so words in adjacent elements with no whitespace between them run together. Instead, use the .html() method and then a regular expression that replaces every HTML tag with a space.
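The same technique can be wrapped in a reusable function. This is a minimal sketch, assuming the markup is already available as a string (for example from $("body").html()); the name countWordsInHtml() is hypothetical:

```javascript
// Hypothetical helper: count the words in an HTML string using the accepted
// solution's approach: replace every tag with a space, then match runs of
// word characters ([A-Za-z0-9_]).
function countWordsInHtml(html) {
  // "<", then one or more characters (newlines included), lazily, up to the first ">".
  const text = html.replace(/<(.|\n)+?>/g, " ");
  // match() returns null when nothing matches, so fall back to an empty array.
  const words = text.match(/\w+/g) || [];
  return words.length;
}

const body =
  "<h1>Here is a heading.</h1><p>This is a paragraph.</p>" +
  "<div>Here</div><div>are</div><div>some</div><div>div</div><div>tags</div>";

console.log(countWordsInHtml(body)); // 13
console.log(countWordsInHtml(""));   // 0
```

Because \w only matches ASCII letters, digits and the underscore, a word such as "don't" counts as two, and accented letters can split or truncate a word; the pattern would need adjusting if that matters for your pages.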

Author Closing Comment

by:AdrianSmithUK
ID: 38857307
Many thanks chaps.
Kind Regards,
Adrian

Author Comment

by:AdrianSmithUK
ID: 38858608
PS: Out of interest, in the end I solved the issue by appending a space to the end of selected elements (div, td and a tags).

<!DOCTYPE html>
<html>
<head>
<script src="http://ajax.googleapis.com/ajax/libs/jquery/1.8.3/jquery.min.js"></script>

<script>
$(document).ready(function(){

        // Append a space to the end of each div, td and a element so that
        // their text does not run together in .text().
        appendSpaces();

        // match() returns null if there are no words, so fall back to an empty array.
        var bodyText = $("body").text().match(/\w+/ig) || [];
        console.log(bodyText.length);
        console.log(bodyText);

        console.log( $("p").text() );

});

function appendSpaces(){
        $("div").append(" ");
        $("td").append(" ");
        $("a").append(" ");
}

</script>
</head>
<body>

<h1>Here is a heading.</h1><a href="#">Link1</a><a href="#">Link2</a><a href="#">Link3</a><p>This is a paragraph.</p><p>This is another paragraph.</p><ul><li>Here is a bullet.</li><li>Here is another bullet.</li><li>Here is the last bullet.</li></ul><div>Here</div><div>are</div><div>some</div><div>div</div><div>tags</div><table width="200" border="0" cellspacing="0" cellpadding="0"><tr><td>row1col1</td><td>row1col2</td></tr><tr><td>row2col1</td><td>row2col2</td></tr></table>

</body>
</html>
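The appendSpaces() approach only separates the elements it names. The minimal sketch below shows the effect on a hypothetical toy tree (plain objects rather than a real DOM, so it can run outside a browser): text in elements that are not on the list, here two p elements, still runs together.

```javascript
// Hypothetical toy DOM: a text node is a string, an element is { tag, children }.
// collectText() mimics .text() after appendSpaces(): it concatenates all text,
// adding a space at the end of each div, td and a element only.
const SEPARATED_TAGS = new Set(["div", "td", "a"]);

function collectText(node) {
  if (typeof node === "string") return node;
  const inner = node.children.map(collectText).join("");
  // Equivalent of $(tag).append(" ") for the selected tags.
  return SEPARATED_TAGS.has(node.tag) ? inner + " " : inner;
}

function countWords(node) {
  return (collectText(node).match(/\w+/g) || []).length;
}

const tree = {
  tag: "body",
  children: [
    { tag: "h1", children: ["Here is a heading."] },
    { tag: "div", children: ["Here"] },
    { tag: "div", children: ["are"] },
    { tag: "p", children: ["One"] },
    { tag: "p", children: ["Two"] },
  ],
};

console.log(collectText(tree)); // Here is a heading.Here are OneTwo
console.log(countWords(tree));  // 7 ("OneTwo" counts as one word)
```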


Expert Comment

by:Steve Krile
ID: 38859125
This line of my solution does the same thing, but for ALL HTML elements:

var bodyHTML = $("body").html().replace(/<(.|\n)+?>/ig, " ");

Author Comment

by:AdrianSmithUK
ID: 38859242
Doesn't it destroy all the HTML elements by replacing them with a space?

Expert Comment

by:Steve Krile
ID: 38859250
Well, it copies the body's markup into a variable (using the .html() method), strips out all the HTML tags, and then counts what is left, making sure there is whitespace between the contents of the former HTML elements. It doesn't "destroy" the HTML for the viewer: only the string copy is changed, never the page itself.
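To see why the page is untouched: .html() returns the markup as a string, and replace() on that string produces a new string, because JavaScript strings are immutable. A minimal illustration, with a literal string standing in for the .html() result:

```javascript
// Stand-in for the string returned by $("body").html(); no real page is involved.
const originalHtml = "<p>This is a paragraph.</p><div>Here</div>";

// String.prototype.replace returns a new string; originalHtml is left as it was.
const stripped = originalHtml.replace(/<(.|\n)+?>/ig, " ");

console.log(JSON.stringify(stripped)); // " This is a paragraph.  Here "
console.log(originalHtml);             // <p>This is a paragraph.</p><div>Here</div>
```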

Author Comment

by:AdrianSmithUK
ID: 38859261
I see. Definitely a good snippet. Many thanks. Adrian

Author Comment

by:AdrianSmithUK
ID: 38860102
Skrile

I refactored the solution to use plain JavaScript, and your solution is beautiful. Here is the code:

<!DOCTYPE html>
<html>
<head>

<script>

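window.onload = function(){

   // Replace every tag with a space, then match runs of word characters.
   var bodyHtml = document.body.innerHTML.replace(/<(.|\n)+?>/ig, " ");
   // match() returns null if there are no words, so fall back to an empty array.
   var bodyText = bodyHtml.match(/\w+/ig) || [];

   console.log(bodyText);
   console.log(bodyText.length);
}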

</script>
</head>
<body>

<h1>Here is a heading.</h1><a href="#">Link1</a><a href="#">Link2</a><a href="#">Link3</a><p>This is a paragraph.</p><p>This is another paragraph.</p><ul><li>Here is a bullet.</li><li>Here is another bullet.</li><li>Here is the last bullet.</li></ul><div>Here</div><div>are</div><div>some</div><div>div</div><div>tags</div><table width="200" border="0" cellspacing="0" cellpadding="0"><tr><td>row1col1</td><td>row1col2</td></tr><tr><td>row2col1</td><td>row2col2</td></tr></table>

</body>
</html>
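One edge case in this code: on a page whose body contains no word characters, match() returns null instead of an empty array, so bodyText.length would throw a TypeError without a fallback. A minimal sketch of a guarded version, written as a hypothetical listWords() helper that takes the markup string (for example document.body.innerHTML) so it can be tested outside a browser:

```javascript
// Hypothetical helper: list the words in an HTML string, returning an empty
// array (not null) when there are none.
function listWords(html) {
  const text = html.replace(/<(.|\n)+?>/ig, " ");
  return text.match(/\w+/ig) || [];
}

// Without the fallback, match() yields null for markup with no words.
console.log("<p></p>".replace(/<(.|\n)+?>/ig, " ").match(/\w+/ig)); // null
console.log(listWords("<p></p>").length);                           // 0
console.log(listWords('<a href="#">Link1</a><a href="#">Link2</a>')); // [ 'Link1', 'Link2' ]
```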


Expert Comment

by:Steve Krile
ID: 38860117
Nice.

Also, here is a good discussion of the pitfalls of binding to the window load event:

http://stackoverflow.com/questions/6352789/cross-browser-compatible-way-to-bind-events-on-page-load

Author Comment

by:AdrianSmithUK
ID: 38860177
Very interesting and many thanks.

I'm developing a plugin for Firefox, and the DOMContentLoaded event will be much more suitable than the window load event: DOMContentLoaded fires once the HTML has been parsed, without waiting for images, Flash movies and banners, which some websites take ages to load.

https://developer.mozilla.org/en-US/docs/Mozilla_event_reference/DOMContentLoaded_(event)
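One caveat: if a script may run after the page has already been parsed (which can happen with scripts injected by an add-on), DOMContentLoaded has already fired and a new listener would never run. Checking document.readyState first covers both cases: it is "loading" while the HTML is being parsed and changes to "interactive" and then "complete" afterwards. The minimal sketch below uses a hypothetical onDomReady() helper that takes the document as a parameter, so it can be exercised with a stand-in object built on Node's EventTarget (Node 15 or later). The { once: true } option of addEventListener removes the listener after its first call, replacing a manual removeEventListener; it is supported in current browsers.

```javascript
// Hypothetical helper: run `callback` once the document has been parsed.
// If parsing is still in progress, wait for DOMContentLoaded; otherwise the
// event has already fired, so call the callback right away.
function onDomReady(doc, callback) {
  if (doc.readyState === "loading") {
    doc.addEventListener("DOMContentLoaded", callback, { once: true });
  } else {
    callback();
  }
}

// Stand-in document for demonstration only; in a page, pass `document`.
const fakeDoc = new EventTarget();
fakeDoc.readyState = "loading";

let runs = 0;
onDomReady(fakeDoc, () => { runs += 1; });
fakeDoc.dispatchEvent(new Event("DOMContentLoaded"));
fakeDoc.dispatchEvent(new Event("DOMContentLoaded")); // ignored: the listener was removed
const runsAfterEvents = runs;

fakeDoc.readyState = "complete";
onDomReady(fakeDoc, () => { runs += 1; }); // already parsed: runs immediately

console.log(runsAfterEvents, runs); // 1 2
```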

Thanks Again :)

Author Comment

by:AdrianSmithUK
ID: 38860209
Much faster!

<!DOCTYPE html>
<html>
<head>

<script>

var listener = function(e)
{
    // Remove the listener so that it runs only once.
    window.removeEventListener("DOMContentLoaded", listener, false);

    var bodyHtml = document.body.innerHTML.replace(/<(.|\n)+?>/ig, " ");
    // match() returns null if there are no words, so fall back to an empty array.
    var bodyText = bodyHtml.match(/\w+/ig) || [];

    console.log(bodyText);
    console.log(bodyText.length);
}

window.addEventListener("DOMContentLoaded", listener, false);

</script>
</head>
<body>

<h1>Here is a heading.</h1><a href="#">Link1</a><a href="#">Link2</a><a href="#">Link3</a><p>This is a paragraph.</p><p>This is another paragraph.</p><ul><li>Here is a bullet.</li><li>Here is another bullet.</li><li>Here is the last bullet.</li></ul><div>Here</div><div>are</div><div>some</div><div>div</div><div>tags</div><table width="200" border="0" cellspacing="0" cellpadding="0"><tr><td>row1col1</td><td>row1col2</td></tr><tr><td>row2col1</td><td>row2col2</td></tr></table>

</body>
</html>
