Solved

plagiarism detection system (comparing all words within two textareas and outputting their similarities)

Posted on 2006-10-26
Last Modified: 2012-05-05
Hi there. I have a form that takes two textarea fields, sorts the words in each textarea into alphabetical order using array.sort(), and outputs the data to an ASP page.

Is it possible to write some code that shows an alphabetical listing of all the words the two texts have in common? Each word would be followed by two numbers indicating how often it occurs in each of the texts. For example:

Shared words    Freq in textarea1    Freq in textarea2
a               20                   14
the             15                   12


I know I need to use arrays to do this, but unfortunately I can't get my head around how I would compare every word in one textarea against every word in the other. I am programming this page using ASP and JavaScript.

Any help would be much appreciated.

cheers

Graeme
Question by:graeme_douglas
 
LVL 1

Expert Comment

by:ukwebguy
ID: 17850563
The simplest way to compare the two arrays of words is with nested loops: for every word in the first array, check every word in the second. For example:

shared_count = 0
ReDim array_3(UBound(array_1)) ' room for at most every word in array_1

For Each word In array_1 ' for each word in array_1...
    For Each sec_word In array_2 ' ...look through array_2 until we find it
        If word = sec_word Then
            ' the word is in both arrays, so add it to a third array
            array_3(shared_count) = word
            shared_count = shared_count + 1
            Exit For ' found it, so stop scanning array_2
        End If
    Next
Next

' then all you need to do is write out the shared words
For k = 0 To shared_count - 1
    Response.Write array_3(k) & "<br>"
Next


This is not a very efficient script: it compares every word in one array against every word in the other, so if you are doing this for a lot of words you may run into time issues. However, because both arrays are sorted alphabetically, you can cut the work down. If it becomes a problem, compare the words as you loop: if the current word is "banana" and the inner loop reaches "candle", we know that because the array is sorted we will never find a word beginning with 'b' after "candle", so we can exit the inner loop and move on to the next word. Remember that numbers and non-standard characters will affect the results.
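The early-exit idea can be taken one step further by walking both sorted arrays in a single pass. The following is only a sketch of that idea, not code from this thread: `sharedWordsSorted` is a hypothetical helper name, and it assumes both arrays were sorted with the default `sort()` so that the `<` and `>` comparisons agree with the sort order.

```javascript
// Sketch: find the words common to two alphabetically sorted arrays in one pass.
// sharedWordsSorted is a hypothetical name used for illustration only.
function sharedWordsSorted(sortedA, sortedB) {
  var shared = [];
  var i = 0;
  var j = 0;
  while (i < sortedA.length && j < sortedB.length) {
    if (sortedA[i] < sortedB[j]) {
      i++; // this word sorts before everything left in sortedB, so it cannot match
    } else if (sortedA[i] > sortedB[j]) {
      j++; // same reasoning for the current word in sortedB
    } else {
      // Match: record the word once, even if it is repeated in the inputs
      if (shared.length === 0 || shared[shared.length - 1] !== sortedA[i]) {
        shared.push(sortedA[i]);
      }
      i++;
      j++;
    }
  }
  return shared;
}

var a = "is a idea".split(" ").sort();
var b = "is a good idea".split(" ").sort();
var common = sharedWordsSorted(a, b); // words present in both texts
```

Each array is read only once, so the work grows with the combined length of the two arrays rather than with their product.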


I hope this helps you

Regards,
 

Author Comment

by:graeme_douglas
ID: 17851431
Will give this a go and let you know, ukwebguy. I did assume it would use a method similar to the one above; I just wasn't sure exactly how to write it.
 

Author Comment

by:graeme_douglas
ID: 17956931
OK, I have tried to implement this, to no avail. I have posted my current JavaScript code below:

function SortText() {
var inText = ("is a idea");
var inText2 = ("is a good idea");
   var inTextArray = inText.split(" ");
   var inTextArray2 = inText2.split(" ");
   inTextArray.sort();
   inTextArray2.sort();

   for(var i = 0; i < inTextArray.length; i++){
   
   if (inTextArray[i]==(inTextArray2[i]))
   {
         document.write(inTextArray[i] + "<BR>");
   }
   else
   {

   document.write("they are not equal" + "<BR>");
   }
   }
}

Problem: I have it looping through the two arrays looking for matches. Unfortunately it doesn't pick up a match unless both words are at the same array index. How can I get it to look through all of inTextArray2 for inTextArray[0], then, if there is no match, go on to inTextArray[1], and, if there is a match, add that [i] element to a third array? Confused :S
 
LVL 1

Expert Comment

by:ukwebguy
ID: 17971629
You almost had it.

function SortText() {
   var inText = "is a idea";
   var inText2 = "is a good idea";
   var inTextArray = inText.split(" ");
   var inTextArray2 = inText2.split(" ");
   inTextArray.sort();
   inTextArray2.sort();

   // For each word in the first array, check every word in the second array
   for (var i = 0; i < inTextArray.length; i++) {
      for (var j = 0; j < inTextArray2.length; j++) {
         if (inTextArray[i] == inTextArray2[j]) {
            document.write(inTextArray[i] + "<BR>");
         }
         else {
            document.write("they are not equal" + "<BR>");
         }
      }
   }
}

There was one loop missing: for every element in inTextArray we need to loop through every element in inTextArray2.
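If the shared words need to go into a third array instead of being written straight to the page, the same nested loop can collect them. This is a rough sketch under that assumption: `collectSharedWords` is a made-up name, and it returns the array so the caller can decide how to print it.

```javascript
// Sketch: gather the words found in both arrays into a third array.
// collectSharedWords is a hypothetical helper, not part of the code above.
function collectSharedWords(words1, words2) {
  var shared = [];
  for (var i = 0; i < words1.length; i++) {
    // Skip words that were already recorded (repeats within words1)
    if (shared.indexOf(words1[i]) !== -1) {
      continue;
    }
    for (var j = 0; j < words2.length; j++) {
      if (words1[i] === words2[j]) {
        shared.push(words1[i]);
        break; // found it, no need to keep scanning words2
      }
    }
  }
  return shared;
}

var sharedWords = collectSharedWords("is a idea".split(" "), "is a good idea".split(" "));
```

The `break` plays the same role as `Exit For` in the earlier pseudocode, and the `indexOf` check keeps a repeated word from being listed twice.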
 

Author Comment

by:graeme_douglas
ID: 17981573
LOL, thanks, it's basically working now, ukwebguy. I have one last problem: it now prints the shared words in a table, but I also need it to print the total number of times each word appears in document 1 and in document 2. Any ideas? I was thinking of setting up a counterArray1 and a counterArray2. Once a comparison matches between newArray3 (the array holding all the shared words) and the document1 array, start a total and move on to the next value in the document1 array. If they don't match, end the total, put the total into counterArray1 and try the next field. Do you think I am on the right lines?

Graeme
 
LVL 1

Accepted Solution

by:
ukwebguy earned 500 total points
ID: 18113786
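I don't quite follow what you mean, but it is not difficult.

In this section:

 if (inTextArray[i]==inTextArray2[j]){
     document.write(inTextArray[i] + "<BR>");
     // start from 0 the first time, otherwise undefined + 1 gives NaN
     totalCount[i] = (totalCount[i] || 0) + 1;
 }

add a new array called totalCount, declared before the loops (var totalCount = [];).

That will count, for each word in inTextArray, how many times it is matched in inTextArray2.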
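For the full table from the original question (each shared word with its frequency in both texts), one possible approach is to count the words in each text separately and then keep only the words that appear in both. The sketch below is an illustration rather than the accepted solution: `countWords` and `sharedWordFrequencies` are hypothetical names, and lower-casing the text and splitting on whitespace are assumptions about how words should be compared.

```javascript
// Sketch: count how often each word occurs in a text.
// Assumption: words are separated by whitespace and compared case-insensitively.
function countWords(text) {
  var counts = Object.create(null); // plain lookup table with no inherited keys
  var words = text.toLowerCase().split(/\s+/);
  for (var i = 0; i < words.length; i++) {
    if (words[i] === "") {
      continue; // ignore empty strings from leading or trailing whitespace
    }
    counts[words[i]] = (counts[words[i]] || 0) + 1;
  }
  return counts;
}

// Sketch: build one row per shared word, sorted alphabetically.
function sharedWordFrequencies(text1, text2) {
  var counts1 = countWords(text1);
  var counts2 = countWords(text2);
  var rows = [];
  for (var word in counts1) {
    if (word in counts2) {
      rows.push({ word: word, freq1: counts1[word], freq2: counts2[word] });
    }
  }
  rows.sort(function (x, y) {
    return x.word < y.word ? -1 : (x.word > y.word ? 1 : 0);
  });
  return rows;
}

var rows = sharedWordFrequencies("the cat and the hat", "the hat is a hat");
```

Each entry in `rows` holds a shared word and its two counts, so the table can be written out with a single loop over `rows`. Punctuation is left attached to words here; stripping it would be a further assumption.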

