Solved

Count or sort by the number of times a string is found in text

Posted on 2013-01-25
Last Modified: 2013-01-26
This code works fine, but I need one of two things. Either show the total number of times each word is duplicated (the word-length threshold in the code is currently set to 4) and display only the duplicates, with the duplicate count on a separate line above or below the list of words; or list the duplicated words in descending order of how many times they are repeated. I do not need to display any words that are not duplicated, only those that are duplicated (and the count).

Currently, it is far too difficult to make sense of a full page of text with the duplicate words simply highlighted.

<!DOCTYPE html>
<html>
<head>
  <script type='text/javascript' src='http://code.jquery.com/jquery-1.5.js'></script>
  
  <style type='text/css'>
    span.duplicate { background: #ffdddd;
		}
  </style>

<script type='text/javascript'>//<![CDATA[ 
$(function(){
var text = $('p').text(),
    words = text.split(' '),
    sortedWords = words.slice(0).sort(),
    duplicateWords = [],
    sentences = text.split('.'),
    sortedSentences = sentences.slice(0).sort(),
    duplicateSentences = [];


for (var a=0; a<sortedWords.length-1;a++) {
if ( sortedWords[a].length > 4 
    && sortedWords[a+1].length > 4 
    && sortedWords[a+1] == sortedWords[a])
{
    duplicateWords.push(sortedWords[a]);
}
} 

$('a.words').click(function(){
    var highlighted = $.map(words, function(word){
        if ($.inArray(word, duplicateWords) > -1)
            return '<span class="duplicate">' + word + '</span>';
        else return word;
    });
    $('p').html(highlighted.join(' '));
    return false;
});



});//]]>  

</script>
</head>
<body>

  <p>Bob is attempting to find the reason Janet is attempting to understand Bob and to reason with him.</p>

<hr />
<a class="words" href="#">Find duplicate words</a>
  
</body>
</html>

Question by:Qsorb
7 Comments
 

Expert Comment

by:Roopesh Reddy
ID: 38822294
Hi,

Check this -

<!DOCTYPE html>
<html>
<head>
  <script type='text/javascript' src='http://code.jquery.com/jquery-1.5.js'></script>
   
  <style type='text/css'>
   span.duplicate { background: yellow; }
  </style>
</head>
<body>

<p>Bob is attempting to find the reason Janet is attempting to understand Bob and to reason with him.</p>
<hr />
<ul id="dupList">
    
</ul>

<script type='text/javascript'>//<![CDATA[
    var text = $('p').text(),
    words = text.split(' '),
    sortedWords = words.slice(0).sort(),
    duplicateWords = [],
    highlighted = [];

    var duplicateList = [];

    for (var i = 0; i < sortedWords.length - 1; i++) {
        var r1 = new RegExp('^' + sortedWords[i + 1] + '(\\.?)$'),
        r2 = new RegExp('^' + sortedWords[i] + '(\\.?)$')
        if (r1.test(sortedWords[i]) || r2.test(sortedWords[i + 1])) {
            duplicateWords.push(sortedWords[i].replace('.', ''));
        }
    }

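    // $.unique is documented for arrays of DOM elements only, so de-duplicate
    // the strings by keeping the first occurrence of each word instead.
    duplicateWords = $.grep(duplicateWords, function (word, index) {
        return $.inArray(word, duplicateWords) === index;
    });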


    for (var j = 0, m = []; j < words.length; j++) {
        var isDuplicate = false;
        for (var k = 0; k < duplicateWords.length; k++) {
            var re = new RegExp('^' + duplicateWords[k] + '(\\.?)$');
            if (re.test(words[j])) {
                isDuplicate = true;
                duplicateList.push(words[j]);
                break;
            }
        }

        m.push(isDuplicate);
        if (!m[j] && m[j - 1])
            highlighted.push('</span>');
        else if (m[j] && !m[j - 1])
            highlighted.push('<span class="duplicate">');
        highlighted.push(words[j]);

    }

    $('p').html(highlighted.join(' '));
    var result = [];
    for (var i = 0; i < duplicateList.length; i++) {
        for (var j = 0; j < duplicateList.length; j++) {
            if (duplicateList[i] == duplicateList[j]) {
                var isInList = $.inArray(duplicateList[i], result);
                if(isInList == -1)
                    result.push(duplicateList[i]);
            }
        }
    }

    for (var i = 0; i < result.length; i++) {
        $('<li>' + result[i] + '</li>').appendTo('#dupList');
    }

</script>
  
</body>
</html>



It displays the list of duplicated words.

See this link for displaying the count: http://newcodeandroll.blogspot.in/2012/01/how-to-find-duplicates-in-array-in.html

Hope it helps.
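The linked post is not reproduced in this thread. As a rough illustration only, counting how often each word occurs can be sketched with a plain object used as a tally. The function names countOccurrences and onlyDuplicates are made up for this example, and splitting on single spaces mirrors the code above, so a trailing period stays attached to the last word.

```javascript
// Hypothetical helper (not taken from the linked post): tally how often
// each word occurs in an array of words.
function countOccurrences(words) {
    // Object.create(null) gives an object with no inherited keys, so words
    // such as "constructor" cannot collide with built-in properties.
    var counts = Object.create(null);
    for (var i = 0; i < words.length; i++) {
        var w = words[i];
        counts[w] = (counts[w] || 0) + 1;
    }
    return counts;
}

// Hypothetical helper: keep only the words that occur more than once.
function onlyDuplicates(counts) {
    var dups = Object.create(null);
    for (var key in counts) {
        if (counts[key] > 1) {
            dups[key] = counts[key];
        }
    }
    return dups;
}

var sample = 'Bob is attempting to find the reason Janet is attempting to understand Bob and to reason with him.';
var dupCounts = onlyDuplicates(countOccurrences(sample.split(' ')));
// dupCounts now maps each repeated word to its count, for example "to" to 3.
```

A word-length filter like the one in the question could be applied before counting if short words such as "to" and "is" should be ignored.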

Expert Comment

by:Roopesh Reddy
ID: 38822345
Hi,

Here is the complete sample:

<!DOCTYPE html>
<html>
<head>
  <script type='text/javascript' src='http://code.jquery.com/jquery-1.5.js'></script>
   
  <style type='text/css'>
   span.duplicate { background: yellow; }
  </style>
</head>
<body>

<p>Bob is attempting to find the reason Janet is attempting to understand Bob and to reason with him.</p>
<hr />
<ul id="dupList">
    
</ul>

<script type='text/javascript'>//<![CDATA[
    var text = $('p').text(),
    words = text.split(' '),
    sortedWords = words.slice(0).sort(),
    duplicateWords = [],
    highlighted = [];

    var duplicateList = [];

    for (var i = 0; i < sortedWords.length - 1; i++) {
        var r1 = new RegExp('^' + sortedWords[i + 1] + '(\\.?)$'),
        r2 = new RegExp('^' + sortedWords[i] + '(\\.?)$')
        if (r1.test(sortedWords[i]) || r2.test(sortedWords[i + 1])) {
            duplicateWords.push(sortedWords[i].replace('.', ''));
        }
    }

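    // $.unique is documented for arrays of DOM elements only, so de-duplicate
    // the strings by keeping the first occurrence of each word instead.
    duplicateWords = $.grep(duplicateWords, function (word, index) {
        return $.inArray(word, duplicateWords) === index;
    });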


    for (var j = 0, m = []; j < words.length; j++) {
        var isDuplicate = false;
        for (var k = 0; k < duplicateWords.length; k++) {
            var re = new RegExp('^' + duplicateWords[k] + '(\\.?)$');
            if (re.test(words[j])) {
                isDuplicate = true;
                duplicateList.push(words[j]);
                break;
            }
        }

        m.push(isDuplicate);
        if (!m[j] && m[j - 1])
            highlighted.push('</span>');
        else if (m[j] && !m[j - 1])
            highlighted.push('<span class="duplicate">');
        highlighted.push(words[j]);

    }

    $('p').html(highlighted.join(' '));
    var result = [];
    for (var i = 0; i < duplicateList.length; i++) {
        for (var j = 0; j < duplicateList.length; j++) {
            if (duplicateList[i] == duplicateList[j]) {
                var isInList = $.inArray(duplicateList[i], result);
                if(isInList == -1)
                    result.push(duplicateList[i]);
            }
        }
    }


    for (var i = 0; i < result.length; i++) {
        var count = 0;
        for (j = 0; j < words.length; j++) {
            if (result[i] == words[j]) {
                count++;
            }
        }
            $('<li>' + result[i]  +', '+ count+'</li>').appendTo('#dupList');
    }

</script>
  
</body>
</html>



Hope it helps.

Expert Comment

by:Slick812
ID: 38822715
I'm not sure what HTML output you want, but I had a JavaScript function called getDups(txt) that gets the duplicate words from some text input and outputs two lines of HTML for each word that occurs more than once in the text. I changed it to only handle words over 4 characters in length. Here is the JS I used to test it:

<script>
var tx1 = "Bobob is attempting-to find him the reason, Janet is attempting. to understand Bobob and to! reason with him a!s Bobob in reason or reason.";

function getDups(txt) { // Only has duplicates
// I had to remove most common punctuation
// BUT not every one, so BeWare if the text has other punctuation
txt=txt.replace(/,/g,"");
txt=txt.replace(/\./g,"");
txt=txt.replace(/\?/g,"");
txt=txt.replace(/!/g,"");
txt=txt.replace(/\"/g,"");
txt=txt.replace(/\'/g,"");
txt=txt.replace(/\-/g," ");
txt=txt.replace(/;/g,"");
txt=txt.replace(/:/g," ");

var wordary = txt.split(' ');
wordary.sort();

var dNum = 1;
var dups = {};
for (var i=0; i < wordary.length-1; i++) { // length-1 so the last pair is compared too
if ( wordary[i].length > 4 
    && wordary[i+1].length > 4 
    && wordary[i+1] == wordary[i]) {
	++dNum;
	dups[wordary[i]] = dNum;
	} else dNum = 1;
}

txt = "";
var key = "";
for (key in dups) {
    txt+= key +"<br /> has "+dups[key]+" duplicates,<br />";
    }
return txt;
}


function showUm(text) {
text = getDups(text);
document.getElementById("de").innerHTML = text;
}

</script>



with this html -

<button onclick="showUm(tx1)"> Get Duplicates </button>
<p id="de">dups here</p>
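The comment in getDups warns that not every punctuation mark is removed. One possible way around that, sketched here as an assumption rather than as part of the posted answer, is to extract runs of letters instead of deleting punctuation one mark at a time. The name extractWords and the lowercasing step are hypothetical additions, and the pattern only covers ASCII letters and apostrophes. Unlike the replace() chain, it splits "a!s" into "a" and "s" instead of joining them.

```javascript
// Hypothetical alternative to the chain of replace() calls: collect runs of
// ASCII letters and apostrophes, keeping only words longer than minLength.
function extractWords(txt, minLength) {
    // match() returns null when nothing matches, so fall back to an empty array
    var found = txt.match(/[A-Za-z']+/g) || [];
    var words = [];
    for (var i = 0; i < found.length; i++) {
        if (found[i].length > minLength) {
            // Lowercasing is an assumption so that "Reason" and "reason" count together
            words.push(found[i].toLowerCase());
        }
    }
    return words;
}

var tx1 = "Bobob is attempting-to find him the reason, Janet is attempting. to understand Bobob and to! reason with him a!s Bobob in reason or reason.";
var longWords = extractWords(tx1, 4);
```

The resulting array can then be sorted and scanned for adjacent repeats in the same way as wordary in getDups.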

 

Author Comment

by:Qsorb
ID: 38822931
Roopesh Reddy:

Thanks for the attempts. Unfortunately, both suggestions gave me the same result. I used only your code but could not get it to do anything other than what is shown in the attached image.

The good news is that Slick812 has a suggestion that is just about usable for me. He will get all or the majority of the points, but I have one additional need to get this to work. See below.
roopeshreddy-1.gif
 

Author Comment

by:Qsorb
ID: 38822937
Slick812:

Hey, that works! But because the text can be quite large, I need the results sorted by number of hits, highest first. I really only need the top ten results, but if that's a pain, just sort with the highest result first. Is this doable?

Accepted Solution

by:
Slick812 earned 500 total points
ID: 38823250
Yes, it's doable, I would think, but I have the output string created in this loop:
for (key in dups) {
    txt+= key +"<br /> has "+dups[key]+" duplicates,<br />";
    }

Here dups[key] holds the number of duplicates for each word. To sort an object, I would have to do it by math and restack the entries through an array[] to get a new object with reordered properties.

Maybe tomorrow I'll have time. I have some code in Pascal that does that, but none in JavaScript that I remember. It's not as easy as it sounds.
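The follow-up answer is not part of this thread. As a minimal sketch of the "array restack" idea, assuming an object shaped like the dups object built in getDups, the entries can be copied into an array of [word, count] pairs, sorted by count, and trimmed to the top ten. The name topDuplicates and the alphabetical tie-break are assumptions made for this example.

```javascript
// Hypothetical helper: turn a { word: count } object into an array of
// [word, count] pairs sorted by count, highest first, limited to `limit` rows.
function topDuplicates(dups, limit) {
    var pairs = [];
    for (var key in dups) {
        if (Object.prototype.hasOwnProperty.call(dups, key)) {
            pairs.push([key, dups[key]]);
        }
    }
    pairs.sort(function (a, b) {
        // Higher counts first; equal counts fall back to alphabetical order
        if (b[1] !== a[1]) return b[1] - a[1];
        return a[0] < b[0] ? -1 : (a[0] > b[0] ? 1 : 0);
    });
    return pairs.slice(0, limit);
}

// Sample input for illustration only
var top = topDuplicates({ Bobob: 3, attempting: 2, reason: 4 }, 10);

// Build the same two-line HTML output style used in getDups
var html = '';
for (var i = 0; i < top.length; i++) {
    html += top[i][0] + '<br /> has ' + top[i][1] + ' duplicates,<br />';
}
```

Sorting an array of pairs avoids relying on the property order of an object, which for (key in dups) does not promise to follow the counts.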
 

Author Closing Comment

by:Qsorb
ID: 38823312
No problem. I'll make it another question because you were certainly able to do what I asked. I'll post the new question in a few minutes. Thanks so much.