Solved

Count or sort the number of times a string is found in text

Posted on 2013-01-25
Last Modified: 2013-01-26
This code works fine, but I need one of two things. Either I need the total number of times each word is duplicated (the word-length threshold is currently set to 4), with only the duplicates and their counts displayed on a separate line above or below the list of words, or I need the duplicates listed in descending order of how many times they are repeated. I do not need to display any words that are not duplicated, just those that are duplicated (and the count).

Currently, it is far too difficult to make sense of a full page of text with the duplicate words simply highlighted.

<!DOCTYPE html>
<html>
<head>
  <script type='text/javascript' src='http://code.jquery.com/jquery-1.5.js'></script>
  
  <style type='text/css'>
    span.duplicate { background: #ffdddd; }
  </style>

<script type='text/javascript'>//<![CDATA[ 
$(function(){
var text = $('p').text(),
    words = text.split(' '),
    sortedWords = words.slice(0).sort(),
    duplicateWords = [],
    sentences = text.split('.'),
    sortedSentences = sentences.slice(0).sort(),
    duplicateSentences = [];


for (var a=0; a<sortedWords.length-1;a++) {
if ( sortedWords[a].length > 4 
    && sortedWords[a+1].length > 4 
    && sortedWords[a+1] == sortedWords[a])
{
    duplicateWords.push(sortedWords[a]);
}
} 

$('a.words').click(function(){
    var highlighted = $.map(words, function(word){
        if ($.inArray(word, duplicateWords) > -1)
            return '<span class="duplicate">' + word + '</span>';
        else return word;
    });
    $('p').html(highlighted.join(' '));
    return false;
});



});//]]>  

</script>
</head>
<body>

  <p>Bob is attempting to find the reason Janet is attempting to understand Bob and to reason with him.</p>

<hr />
<a class="words" href="#">Find duplicate words</a>
  
</body>
</html>

Question by:Qsorb
 
LVL 23

Expert Comment

by:Roopesh Reddy
Hi,

Check this -

<!DOCTYPE html>
<html>
<head>
  <script type='text/javascript' src='http://code.jquery.com/jquery-1.5.js'></script>
   
  <style type='text/css'>
   span.duplicate { background: yellow; }
  </style>
</head>
<body>

<p>Bob is attempting to find the reason Janet is attempting to understand Bob and to reason with him.</p>
<hr />
<ul id="dupList">
    
</ul>

<script type='text/javascript'>//<![CDATA[
    var text = $('p').text(),
    words = text.split(' '),
    sortedWords = words.slice(0).sort(),
    duplicateWords = [],
    highlighted = [];

    var duplicateList = [];

    for (var i = 0; i < sortedWords.length - 1; i++) {
        var r1 = new RegExp('^' + sortedWords[i + 1] + '(\\.?)$'),
        r2 = new RegExp('^' + sortedWords[i] + '(\\.?)$')
        if (r1.test(sortedWords[i]) || r2.test(sortedWords[i + 1])) {
            duplicateWords.push(sortedWords[i].replace('.', ''));
        }
    }

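    // $.unique() is only documented for arrays of DOM elements, so remove repeated strings by hand
    duplicateWords = $.grep(duplicateWords, function (word, index) {
        return $.inArray(word, duplicateWords) === index;
    });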


    for (var j = 0, m = []; j < words.length; j++) {
        var isDuplicate = false;
        for (var k = 0; k < duplicateWords.length; k++) {
            var re = new RegExp('^' + duplicateWords[k] + '(\\.?)$');
            if (re.test(words[j])) {
                isDuplicate = true;
                duplicateList.push(words[j]);
                break;
            }
        }

        m.push(isDuplicate);
        if (!m[j] && m[j - 1])
            highlighted.push('</span>');
        else if (m[j] && !m[j - 1])
            highlighted.push('<span class="duplicate">');
        highlighted.push(words[j]);

    }

    $('p').html(highlighted.join(' '));
    var result = [];
    for (var i = 0; i < duplicateList.length; i++) {
        for (var j = 0; j < duplicateList.length; j++) {
            if (duplicateList[i] == duplicateList[j]) {
                var isInList = $.inArray(duplicateList[i], result);
                if(isInList == -1)
                    result.push(duplicateList[i]);
            }
        }
    }

    for (var i = 0; i < result.length; i++) {
        $('<li>' + result[i] + '</li>').appendTo('#dupList');
    }

</script>
  
</body>
</html>


It displays the list of duplicated words.

To display the count, check this link: http://newcodeandroll.blogspot.in/2012/01/how-to-find-duplicates-in-array-in.html

Hope it helps you.
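
As an alternative for the counting step, here is a minimal sketch that tallies every word in one pass, using a plain object as a map. The function name countDuplicates and the choice to strip a single trailing period are assumptions made for illustration; they are not part of the code posted in this thread.

```javascript
// Hypothetical helper (not from the thread): returns an object that maps
// each word occurring more than once in `text` to its number of occurrences.
function countDuplicates(text) {
    // Object.create(null) gives a map with no inherited keys, so a word such
    // as "constructor" is counted like any other word (requires ES5).
    var counts = Object.create(null);
    var words = text.split(' ');
    for (var i = 0; i < words.length; i++) {
        // Drop one trailing period so "him." and "him" are the same word.
        var word = words[i].replace(/\.$/, '');
        if (word === '') continue; // skip empty strings left by double spaces
        counts[word] = (counts[word] || 0) + 1;
    }

    // Keep only the words that were seen more than once.
    var duplicates = Object.create(null);
    for (var key in counts) {
        if (counts[key] > 1) duplicates[key] = counts[key];
    }
    return duplicates;
}

var sample = 'Bob is attempting to find the reason Janet is attempting ' +
             'to understand Bob and to reason with him.';
var dupCounts = countDuplicates(sample);
// dupCounts.to is 3; Bob, is, attempting and reason are each 2
```

The comparison is case-sensitive, like the original code, so "Bob" and "bob" would be counted separately.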
 
LVL 23

Expert Comment

by:Roopesh Reddy
Hi,

Here is the complete sample:

<!DOCTYPE html>
<html>
<head>
  <script type='text/javascript' src='http://code.jquery.com/jquery-1.5.js'></script>
   
  <style type='text/css'>
   span.duplicate { background: yellow; }
  </style>
</head>
<body>

<p>Bob is attempting to find the reason Janet is attempting to understand Bob and to reason with him.</p>
<hr />
<ul id="dupList">
    
</ul>

<script type='text/javascript'>//<![CDATA[
    var text = $('p').text(),
    words = text.split(' '),
    sortedWords = words.slice(0).sort(),
    duplicateWords = [],
    highlighted = [];

    var duplicateList = [];

    for (var i = 0; i < sortedWords.length - 1; i++) {
        var r1 = new RegExp('^' + sortedWords[i + 1] + '(\\.?)$'),
        r2 = new RegExp('^' + sortedWords[i] + '(\\.?)$')
        if (r1.test(sortedWords[i]) || r2.test(sortedWords[i + 1])) {
            duplicateWords.push(sortedWords[i].replace('.', ''));
        }
    }

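    // $.unique() is only documented for arrays of DOM elements, so remove repeated strings by hand
    duplicateWords = $.grep(duplicateWords, function (word, index) {
        return $.inArray(word, duplicateWords) === index;
    });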


    for (var j = 0, m = []; j < words.length; j++) {
        var isDuplicate = false;
        for (var k = 0; k < duplicateWords.length; k++) {
            var re = new RegExp('^' + duplicateWords[k] + '(\\.?)$');
            if (re.test(words[j])) {
                isDuplicate = true;
                duplicateList.push(words[j]);
                break;
            }
        }

        m.push(isDuplicate);
        if (!m[j] && m[j - 1])
            highlighted.push('</span>');
        else if (m[j] && !m[j - 1])
            highlighted.push('<span class="duplicate">');
        highlighted.push(words[j]);

    }

    $('p').html(highlighted.join(' '));
    var result = [];
    for (var i = 0; i < duplicateList.length; i++) {
        for (var j = 0; j < duplicateList.length; j++) {
            if (duplicateList[i] == duplicateList[j]) {
                var isInList = $.inArray(duplicateList[i], result);
                if(isInList == -1)
                    result.push(duplicateList[i]);
            }
        }
    }


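    for (var i = 0; i < result.length; i++) {
        var count = 0;
        for (var j = 0; j < words.length; j++) {
            // Ignore a trailing period so "reason." also counts toward "reason"
            if (result[i].replace('.', '') == words[j].replace('.', '')) {
                count++;
            }
        }
        $('<li>' + result[i] + ', ' + count + '</li>').appendTo('#dupList');
    }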

</script>
  
</body>
</html>


Hope it helps you.
 
LVL 33

Expert Comment

by:Slick812
I'm not sure what HTML output you want, but I had a JavaScript function called getDups(txt) that finds the duplicate words in some input text and outputs two lines of HTML for each word that occurs more than once. I changed it to only consider words more than 4 characters long. Here is the JS I used to test it:

<script>
var tx1 = "Bobob is attempting-to find him the reason, Janet is attempting. to understand Bobob and to! reason with him a!s Bobob in reason or reason.";

function getDups(txt) { // Only has duplicates
// I had to remove most common punctuation
// BUT not every one, so BeWare if the text has other punctuation
txt=txt.replace(/,/g,"");
txt=txt.replace(/\./g,"");
txt=txt.replace(/\?/g,"");
txt=txt.replace(/!/g,"");
txt=txt.replace(/\"/g,"");
txt=txt.replace(/\'/g,"");
txt=txt.replace(/\-/g," ");
txt=txt.replace(/;/g,"");
txt=txt.replace(/:/g," ");

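// Split into words and sort so identical words end up next to each other
var wordary = txt.split(' ').sort();

var dNum = 1;   // occurrences of the current word seen so far
var dups = {};  // word -> total number of occurrences
// Stop at length-1 so the final pair of words is compared too
for (var i=0; i < wordary.length-1; i++) {
if ( wordary[i].length > 4 
    && wordary[i+1].length > 4 
    && wordary[i+1] == wordary[i]) {
	++dNum;
	dups[wordary[i]] = dNum;
	} else dNum = 1;
}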

txt = "";
var key = "";
for (key in dups) {
    txt+= key +"<br /> has "+dups[key]+" duplicates,<br />";
    }
return txt;
}


function showUm(text) {
text = getDups(text);
document.getElementById("de").innerHTML = text;
}

</script>


with this html -

<button onclick="showUm(tx1)"> Get Duplicates </button>
<p id="de">dups here</p>
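
The chain of replace() calls in getDups only covers the punctuation it lists. As a rough sketch of one way to handle the rest, the cleanup can be collapsed into a few regular expressions. The helper name splitWords is hypothetical, and the sketch assumes plain ASCII text, because the second expression would also strip accented letters.

```javascript
// Hypothetical helper (not from the thread): turns raw text into a list of
// plain words. Assumes ASCII text; accented letters would be stripped.
function splitWords(txt) {
    var cleaned = txt
        .replace(/[-:]/g, ' ')           // hyphens and colons separate words
        .replace(/[^A-Za-z0-9\s]/g, '')  // drop every other non-alphanumeric character
        .replace(/^\s+|\s+$/g, '');      // trim leading and trailing whitespace
    // Splitting on runs of whitespace avoids empty entries from double spaces
    return cleaned === '' ? [] : cleaned.split(/\s+/);
}

var wordList = splitWords('attempting-to find, him! a!s reason.');
// wordList is ["attempting", "to", "find", "him", "as", "reason"]
```

Like the original, this treats hyphens and colons as word separators and simply deletes other punctuation, so "a!s" becomes "as".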
 

Author Comment

by:Qsorb
roopeshreddy:

Thanks for the attempts. Unfortunately, both suggestions gave me the same result. I used only your code but could not get it to do anything other than what is seen in the image I attached.

But the good news is that Slick812 has a suggestion that is just about usable for me. He will get all or the majority of the points, but I have one additional need to get this to work. See below.
roopeshreddy-1.gif
 

Author Comment

by:Qsorb
Slick812:

Hey, that works! But because the text can be quite large, I need the results sorted by highest number of hits descending, on top. I really only need the top ten results but if that's a pain, then just sort to highest result first. Is this doable?
 
LVL 33

Accepted Solution

by:Slick812 (earned 500 total points)
Yes, it's doable, I would think. But I build the output string in this loop:
for (key in dups) {
    txt+= key +"<br /> has "+dups[key]+" duplicates,<br />";
    }

Here dups[key] holds the number of duplicates for each word.
For me to sort an object, I would have to do the comparisons myself and restack the entries through an array to get a new object with reordered properties.

Maybe tomorrow I'll have time. I have some code in Pascal that does that, but none in JavaScript that I remember. It's not as easy as it sounds.
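
One possible way to get the sorted top-ten output is to copy the entries of dups into an array and sort that array by count. This is only a sketch under the assumption that dups has the word-to-count shape built in getDups. The names topDuplicates and formatDuplicates are hypothetical and not code from this thread.

```javascript
// Hypothetical helper: converts a word -> count object into an array of
// { word, count } entries sorted by count, highest first, and keeps at
// most `limit` entries (all of them if `limit` is omitted).
function topDuplicates(dups, limit) {
    var entries = [];
    for (var key in dups) {
        if (Object.prototype.hasOwnProperty.call(dups, key)) {
            entries.push({ word: key, count: dups[key] });
        }
    }
    entries.sort(function (a, b) {
        // Higher counts first; ties are broken alphabetically
        if (b.count !== a.count) return b.count - a.count;
        return a.word < b.word ? -1 : (a.word > b.word ? 1 : 0);
    });
    return entries.slice(0, limit);
}

// Hypothetical helper: builds the same two-line HTML per word as getDups
function formatDuplicates(entries) {
    var html = '';
    for (var i = 0; i < entries.length; i++) {
        html += entries[i].word + '<br /> has ' + entries[i].count + ' duplicates,<br />';
    }
    return html;
}

var topTen = topDuplicates({ attempting: 2, reason: 4, Bobob: 3 }, 10);
// topTen[0] is { word: "reason", count: 4 }, followed by Bobob, then attempting
var output = formatDuplicates(topTen);
```

To use this with getDups, the for (key in dups) loop would be replaced by a call such as formatDuplicates(topDuplicates(dups, 10)).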
 

Author Closing Comment

by:Qsorb
No problem. I'll make it another question because you were certainly able to do what I asked. I'll post the new question in a few minutes. Thanks so much.