
Count or sort the number of times a string is found in text

Posted on 2013-01-25
Last Modified: 2013-01-26
This code works fine, but I need one of two things. Either I need the total number of times each duplicated word occurs (currently only words longer than 4 characters are checked), with just the duplicates and their counts listed on separate lines above or below the text, or I need the duplicates listed in descending order of how many times they are repeated. I do not need to display words that are not duplicated, only the duplicated ones (and their counts).

Currently, with the duplicate words merely highlighted, it is far too difficult to make sense of a full page of text.

<!DOCTYPE html>
<html>
<head>
  <script type='text/javascript' src='http://code.jquery.com/jquery-1.5.js'></script>
  
  <style type='text/css'>
    span.duplicate { background: #ffdddd;
		}
  </style>

<script type='text/javascript'>//<![CDATA[ 
$(function(){
var text = $('p').text(),
    words = text.split(' '),
    sortedWords = words.slice(0).sort(),
    duplicateWords = [],
    sentences = text.split('.'),
    sortedSentences = sentences.slice(0).sort(),
    duplicateSentences = [];


for (var a=0; a<sortedWords.length-1;a++) {
if ( sortedWords[a].length > 4 
    && sortedWords[a+1].length > 4 
    && sortedWords[a+1] == sortedWords[a])
{
    duplicateWords.push(sortedWords[a]);
}
} 

$('a.words').click(function(){
    var highlighted = $.map(words, function(word){
        if ($.inArray(word, duplicateWords) > -1)
            return '<span class="duplicate">' + word + '</span>';
        else return word;
    });
    $('p').html(highlighted.join(' '));
    return false;
});



});//]]>  

</script>
</head>
<body>

  <p>Bob is attempting to find the reason Janet is attempting to understand Bob and to reason with him.</p>

<hr />
<a class="words" href="#">Find duplicate words</a>
  
</body>
</html>
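The loop above compares each sorted word only with its neighbour, so a word that occurs three times is pushed twice and no count is kept. Because sort() places equal words next to each other, one way to get counts is to measure the length of each run of equal words. A minimal sketch, assuming the same split-on-spaces tokenization and the same "longer than 4 characters" rule as the question; countSortedRuns is a hypothetical helper name, and the sample sentence is made up to show a word occurring three times:

```javascript
// Sketch: count runs of equal words in an already-sorted array.
// Assumptions: tokenization as in the question (split on single spaces),
// and only words longer than minLength characters are reported.
function countSortedRuns(sortedWords, minLength) {
  const result = [];
  let i = 0;
  while (i < sortedWords.length) {
    let j = i + 1;
    // Advance j past every copy of sortedWords[i].
    while (j < sortedWords.length && sortedWords[j] === sortedWords[i]) {
      j++;
    }
    const runLength = j - i;
    if (sortedWords[i].length > minLength && runLength > 1) {
      result.push({ word: sortedWords[i], count: runLength });
    }
    i = j; // jump to the first word of the next run
  }
  return result;
}

const words = 'reason Bob attempting reason Bob attempting reason'.split(' ');
console.log(countSortedRuns(words.slice(0).sort(), 4));
// → attempting: 2, reason: 3 ("Bob" is too short to be reported)
```

Sorting the returned array with a comparator such as (x, y) => y.count - x.count would then put the most repeated words first.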


Question by:Qsorb
Expert Comment

by:Roopesh Reddy
ID: 38822294
Hi,

Check this -

<!DOCTYPE html>
<html>
<head>
  <script type='text/javascript' src='http://code.jquery.com/jquery-1.5.js'></script>
   
  <style type='text/css'>
   span.duplicate { background: yellow; }
  </style>
</head>
<body>

<p>Bob is attempting to find the reason Janet is attempting to understand Bob and to reason with him.</p>
<hr />
<ul id="dupList">
    
</ul>

<script type='text/javascript'>//<![CDATA[
    var text = $('p').text(),
    words = text.split(' '),
    sortedWords = words.slice(0).sort(),
    duplicateWords = [],
    highlighted = [];

    var duplicateList = [];

    for (var i = 0; i < sortedWords.length - 1; i++) {
        var r1 = new RegExp('^' + sortedWords[i + 1] + '(\\.?)$'),
        r2 = new RegExp('^' + sortedWords[i] + '(\\.?)$')
        if (r1.test(sortedWords[i]) || r2.test(sortedWords[i + 1])) {
            duplicateWords.push(sortedWords[i].replace('.', ''));
        }
    }

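    // $.unique() is documented for arrays of DOM elements only; drop repeated strings manually
    duplicateWords = $.grep(duplicateWords, function (word, index) {
        return $.inArray(word, duplicateWords) === index;
    });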


    for (var j = 0, m = []; j < words.length; j++) {
        var isDuplicate = false;
        for (var k = 0; k < duplicateWords.length; k++) {
            var re = new RegExp('^' + duplicateWords[k] + '(\\.?)$');
            if (re.test(words[j])) {
                isDuplicate = true;
                duplicateList.push(words[j]);
                break;
            }
        }

        m.push(isDuplicate);
        if (!m[j] && m[j - 1])
            highlighted.push('</span>');
        else if (m[j] && !m[j - 1])
            highlighted.push('<span class="duplicate">');
        highlighted.push(words[j]);

    }

    $('p').html(highlighted.join(' '));
    var result = [];
    for (var i = 0; i < duplicateList.length; i++) {
        for (var j = 0; j < duplicateList.length; j++) {
            if (duplicateList[i] == duplicateList[j]) {
                var isInList = $.inArray(duplicateList[i], result);
                if(isInList == -1)
                    result.push(duplicateList[i]);
            }
        }
    }

    for (var i = 0; i < result.length; i++) {
        $('<li>' + result[i] + '</li>').appendTo('#dupList');
    }

</script>
  
</body>
</html>



It displays the list of duplicated words.

To display the count as well, check this link: http://newcodeandroll.blogspot.in/2012/01/how-to-find-duplicates-in-array-in.html

Hope it helps.
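The counting step can also be done in a single pass over the words instead of nested loops. A minimal sketch, assuming words are split on whitespace and compared case-sensitively, and that punctuation is left in place (so, unlike the RegExp approach above, "him." and "him" count as different words); countDuplicateWords is a hypothetical name:

```javascript
// Sketch: count how often each word longer than minLength characters occurs,
// then keep only the words that occur more than once.
// Assumptions: whitespace-separated words, case-sensitive, punctuation not stripped.
function countDuplicateWords(text, minLength) {
  const counts = new Map();
  for (const word of text.split(/\s+/)) {
    if (word.length > minLength) {
      counts.set(word, (counts.get(word) || 0) + 1);
    }
  }
  // [word, count] pairs for duplicated words only, in first-seen order.
  return [...counts].filter(([, count]) => count > 1);
}

const sample = 'Bob is attempting to find the reason Janet is attempting to understand Bob and to reason with him.';
console.log(countDuplicateWords(sample, 4));
// logs the pairs ['attempting', 2] and ['reason', 2]
```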
Expert Comment

by:Roopesh Reddy
ID: 38822345
Hi,

Here is the complete sample:

<!DOCTYPE html>
<html>
<head>
  <script type='text/javascript' src='http://code.jquery.com/jquery-1.5.js'></script>
   
  <style type='text/css'>
   span.duplicate { background: yellow; }
  </style>
</head>
<body>

<p>Bob is attempting to find the reason Janet is attempting to understand Bob and to reason with him.</p>
<hr />
<ul id="dupList">
    
</ul>

<script type='text/javascript'>//<![CDATA[
    var text = $('p').text(),
    words = text.split(' '),
    sortedWords = words.slice(0).sort(),
    duplicateWords = [],
    highlighted = [];

    var duplicateList = [];

    for (var i = 0; i < sortedWords.length - 1; i++) {
        var r1 = new RegExp('^' + sortedWords[i + 1] + '(\\.?)$'),
        r2 = new RegExp('^' + sortedWords[i] + '(\\.?)$')
        if (r1.test(sortedWords[i]) || r2.test(sortedWords[i + 1])) {
            duplicateWords.push(sortedWords[i].replace('.', ''));
        }
    }

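    // $.unique() is documented for arrays of DOM elements only; drop repeated strings manually
    duplicateWords = $.grep(duplicateWords, function (word, index) {
        return $.inArray(word, duplicateWords) === index;
    });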


    for (var j = 0, m = []; j < words.length; j++) {
        var isDuplicate = false;
        for (var k = 0; k < duplicateWords.length; k++) {
            var re = new RegExp('^' + duplicateWords[k] + '(\\.?)$');
            if (re.test(words[j])) {
                isDuplicate = true;
                duplicateList.push(words[j]);
                break;
            }
        }

        m.push(isDuplicate);
        if (!m[j] && m[j - 1])
            highlighted.push('</span>');
        else if (m[j] && !m[j - 1])
            highlighted.push('<span class="duplicate">');
        highlighted.push(words[j]);

    }

    $('p').html(highlighted.join(' '));
    var result = [];
    for (var i = 0; i < duplicateList.length; i++) {
        for (var j = 0; j < duplicateList.length; j++) {
            if (duplicateList[i] == duplicateList[j]) {
                var isInList = $.inArray(duplicateList[i], result);
                if(isInList == -1)
                    result.push(duplicateList[i]);
            }
        }
    }


    // for each distinct duplicate, count its occurrences and add a list item
    for (var i = 0; i < result.length; i++) {
        var count = 0;
        for (j = 0; j < words.length; j++) {
            if (result[i] == words[j]) {
                count++;
            }
        }
        $('<li>' + result[i] + ', ' + count + '</li>').appendTo('#dupList');
    }

</script>
  
</body>
</html>



Hope it helps.
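Both samples build each pattern with new RegExp('^' + word + '(\\.?)$'), so a word containing a character such as ( or ? is treated as regular-expression syntax: a lone "(" makes the constructor throw a SyntaxError, and the pattern for "what?" also matches "wha". A minimal sketch of escaping the word first; escapeRegExp and matchesWord are hypothetical helper names, and the character class is the commonly used set of RegExp metacharacters:

```javascript
// Sketch: escape RegExp metacharacters before building a pattern from a word.
// Assumption: words come from arbitrary text and may contain ( ) ? * + etc.
function escapeRegExp(s) {
  return s.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');
}

// Same shape of pattern as in the answer above: the word, optionally
// followed by a single trailing period.
function matchesWord(candidate, word) {
  const re = new RegExp('^' + escapeRegExp(word) + '(\\.?)$');
  return re.test(candidate);
}

console.log(matchesWord('reason.', 'reason')); // true
console.log(matchesWord('(note)', '(note)'));  // true
console.log(matchesWord('wha', 'what?'));      // false
```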
Expert Comment

by:Slick812
ID: 38822715
Not sure what HTML output you want, but I had this JavaScript function, getDups(txt), which gets the duplicate words from some text input and produces two lines (in HTML) for each word that occurs more than once in the text. I changed it to only count words longer than 4 characters. Here is the JS I used to test it:

<script>
var tx1 = "Bobob is attempting-to find him the reason, Janet is attempting. to understand Bobob and to! reason with him a!s Bobob in reason or reason.";

function getDups(txt) { // Only has duplicates
// I had to remove most common punctuation
// BUT not every one, so BeWare if the text has other punctuation
txt=txt.replace(/,/g,"");
txt=txt.replace(/\./g,"");
txt=txt.replace(/\?/g,"");
txt=txt.replace(/!/g,"");
txt=txt.replace(/\"/g,"");
txt=txt.replace(/\'/g,"");
txt=txt.replace(/\-/g," ");
txt=txt.replace(/;/g,"");
txt=txt.replace(/:/g," ");

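var wordary = txt.split(' ');
wordary = wordary.sort();

var dNum = 1;
var dups = {};
// compare each word with the next one; stop at length-1 so the last pair is checked too
// dNum counts the total occurrences of a word (not just the extra copies)
for (var i=0; i < wordary.length-1; i++) {
if ( wordary[i].length > 4 
    && wordary[i+1].length > 4 
    && wordary[i+1] == wordary[i]) {
	++dNum;
	dups[wordary[i]] = dNum;
	} else dNum = 1;
}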

txt = "";
var key = "";
for (key in dups) {
    txt+= key +"<br /> has "+dups[key]+" duplicates,<br />";
    }
return txt;
}


function showUm(text) {
text = getDups(text);
document.getElementById("de").innerHTML = text;
}

</script>



with this HTML:

<button onclick="showUm(tx1)"> Get Duplicates </button>
<p id="de">dups here</p>
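getDups() strips punctuation one character at a time, and, as its comments warn, any punctuation not in that list stays attached to the words. An alternative is to match the words directly. A minimal sketch, assuming a word is a run of Unicode letters or digits, optionally joined by apostrophes (so "isn't" stays one word, whereas getDups() deletes the apostrophe), and that every other character, including hyphens, separates words; punctuation inside a word, such as the "!" in "a!s", therefore splits it rather than being deleted. It needs an engine that supports Unicode property escapes (ES2018 or later); tokenize is a hypothetical name:

```javascript
// Sketch: pull words out of free text with one regular expression.
// Assumption: a word is a run of Unicode letters/digits, optionally joined
// by apostrophes ("isn't"); every other character acts as a separator.
// Needs Unicode property escapes (\p{...}), available since ES2018.
function tokenize(txt) {
  // match() returns null when nothing matches, so fall back to an empty array
  return txt.match(/[\p{L}\p{N}]+(?:'[\p{L}\p{N}]+)*/gu) || [];
}

console.log(tokenize("Bobob is attempting-to find him the reason, Janet isn't."));
// → Bobob, is, attempting, to, find, him, the, reason, Janet, isn't
```

The returned array could stand in for txt.split(' ') inside getDups(), since the function sorts the words afterwards anyway.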

Author Comment

by:Qsorb
ID: 38822931
roopeshreddy:

Thanks for the attempts. Unfortunately, both suggestions gave me the same result. I used only your code, but could not get it to do anything other than what is shown in the image I attached.

The good news is that Slick812 has a suggestion that is just about usable for me. He will get all or most of the points, but I have one additional requirement to get this to work. See below.
Attachment: roopeshreddy-1.gif

Author Comment

by:Qsorb
ID: 38822937
Slick812:

Hey, that works! But because the text can be quite large, I need the results sorted by number of hits, highest first. I really only need the top ten results, but if that's a pain, just sort them so the highest result comes first. Is this doable?

Accepted Solution

by:Slick812 (earned 500 total points)
ID: 38823250
Yes, it's doable, I would think, but the output string is created in this loop:
for (key in dups) {
    txt+= key +"<br /> has "+dups[key]+" duplicates,<br />";
    }

dups[key] holds the count for each word. To sort an object by those counts, I would have to restack its properties through an array to get a new object with the properties reordered.

Maybe tomorrow, when I have time. I have some code in Pascal that does that, but none in JavaScript that I remember; it's not as easy as it sounds.
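One way to do the restacking described above is to copy the object's properties into an array with Object.entries(), sort that array by count, and build the output from it. A minimal sketch, assuming dups has the { word: count } shape built by getDups(); ties are broken alphabetically (an arbitrary choice), topDuplicates and formatDups are hypothetical names, and the counts in the usage line are made-up values:

```javascript
// Sketch: sort a { word: count } object by count, highest first, and keep
// only the first `limit` entries (e.g. the top ten).
function topDuplicates(dups, limit) {
  return Object.entries(dups)
    // descending by count; alphabetical on ties so the order is predictable
    .sort((a, b) => b[1] - a[1] || a[0].localeCompare(b[0]))
    .slice(0, limit);
}

// Rebuild the same two-lines-per-word HTML that getDups() returns,
// but in sorted order.
function formatDups(entries) {
  return entries
    .map(([word, count]) => word + '<br /> has ' + count + ' duplicates,<br />')
    .join('');
}

// Illustrative counts only.
const exampleDups = { Bobob: 3, attempting: 2, reason: 4 };
console.log(formatDups(topDuplicates(exampleDups, 10)));
```

Inside getDups(), returning formatDups(topDuplicates(dups, 10)) could then replace the final for...in loop.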

Author Closing Comment

by:Qsorb
ID: 38823312
No problem. I'll make it another question because you were certainly able to do what I asked. I'll post the new question in a few minutes. Thanks so much.