plagiarism detection system (comparing all words within two textareas and outputting their similarities)

graeme_douglas
Last Modified: 2012-05-05
Hi there, I have a form that takes in two textarea fields, sorts the words in each textarea into alphabetical order using array.sort(), and outputs the data to an ASP page.

Is it possible to write some code that shows an alphabetical listing of all the words the two texts have in common? Each word would be followed by two numbers indicating how often it occurs in each of the texts. For example:

Shared words    Freq in textarea1    Freq in textarea2
a               20                   14
the             15                   12


I know I need to use arrays to do this, but unfortunately I can't get my head around how to compare every word against each other in the textareas. I am programming this page using ASP and JavaScript.

Any help would be much appreciated.

cheers

Graeme

Commented:
The simplest way to compare the two arrays of words is to use nested loops: for every word in the first array, check every word in the second. For example:

shared_count = 0
' array_3 must already be dimensioned large enough to hold the shared words

For Each word In array_1 ' take each word in array_1 in turn
    For Each sec_word In array_2 ' look through array_2 until we find the word
        If word = sec_word Then
            ' the word is in both arrays, so add it to a third array
            array_3(shared_count) = word
            shared_count = shared_count + 1
            Exit For ' we have found the word, so there is no need to keep looping
        End If
    Next
Next

' then all you need to do is write out the first shared_count entries
For k = 0 To shared_count - 1
    Response.Write array_3(k)
Next


This is not a very efficient script, and if you are doing this for a lot of words you may run into time issues, because as written it compares every word against every other word. However, the fact that both arrays are sorted alphabetically lets us cut that work down. If this becomes a bigger problem, you could add a check as you loop through: if the current word is "banana" and the inner loop reaches "candle", we know that because the array is sorted we are never going to find "banana" (or any other word beginning with 'b') after "candle", so we can exit the inner loop and move on to the next word.

You will also need to remember that numbers will affect the results, as will non-standard characters.
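The early-exit idea above can be taken one step further in JavaScript: because both arrays are sorted, a single pass with one index per array is enough. This is only a sketch of that variant, not the nested-loop script itself. The function name sharedWordsSorted is made up for illustration, and it assumes both arrays were sorted with the default array.sort() so that < and > agree with the sort order.

```javascript
// Hypothetical helper: returns the distinct words present in both arrays.
// Assumes both inputs were already sorted with the default sort().
function sharedWordsSorted(sortedA, sortedB) {
  var shared = [];
  var i = 0;
  var j = 0;
  while (i < sortedA.length && j < sortedB.length) {
    if (sortedA[i] < sortedB[j]) {
      i++; // sortedA[i] cannot appear later in sortedB, so skip it
    } else if (sortedA[i] > sortedB[j]) {
      j++; // sortedB[j] cannot appear later in sortedA, so skip it
    } else {
      // Same word in both arrays; record it once even if it repeats.
      if (shared.length === 0 || shared[shared.length - 1] !== sortedA[i]) {
        shared.push(sortedA[i]);
      }
      i++;
      j++;
    }
  }
  return shared;
}

var first = "is a idea".split(" ").sort();
var second = "is a good idea".split(" ").sort();
console.log(sharedWordsSorted(first, second));
```

Each index only ever moves forward, so the work grows with the combined length of the two arrays rather than with their product.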


I hope this helps you

Regards,

Author

Commented:
Will give this a go and let you know, ukwebguy. I did assume it would use a method similar to the one above; I just wasn't sure exactly how to write it.

Author

Commented:
OK, I have tried to implement this, to no avail. I have posted my current JavaScript code below:

function SortText() {
   var inText = ("is a idea");
   var inText2 = ("is a good idea");
   var inTextArray = inText.split(" ");
   var inTextArray2 = inText2.split(" ");
   inTextArray.sort();
   inTextArray2.sort();

   for(var i = 0; i < inTextArray.length; i++){
      if (inTextArray[i]==(inTextArray2[i]))
      {
         document.write(inTextArray[i] + "<BR>");
      }
      else
      {
         document.write("they are not equal" + "<BR>");
      }
   }
}

Problems: I have it looping through the two arrays looking for matches. Unfortunately, it doesn't pick up a match unless both words are at the same index in their arrays. How can I get it to look through all of inTextArray2 for inTextArray[0], then, if there is no match, go on to inTextArray[1], and if there is a match, add that [i] element to a third array? Confused :S

Commented:
You almost had it.

function SortText() {
   var inText = ("is a idea");
   var inText2 = ("is a good idea");
   var inTextArray = inText.split(" ");
   var inTextArray2 = inText2.split(" ");
   inTextArray.sort();
   inTextArray2.sort();

   // compare each word of the first array with every word of the second
   for(var i = 0; i < inTextArray.length; i++){
      for(var j = 0; j < inTextArray2.length; j++){
         if (inTextArray[i]==inTextArray2[j]){
               document.write(inTextArray[i] + "<BR>");
         }
         else {
               document.write("they are not equal" + "<BR>");
         }
      }
   }
}

There was one loop missing: for every element in inTextArray we need to loop through every element in inTextArray2.
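If the matches should be collected rather than printed, the same nested loop could fill a third array, as the question describes. The following is a minimal sketch under a few assumptions: the function name collectSharedWords is hypothetical, the inputs are plain arrays of words, and a word that repeats in the first text should be listed only once.

```javascript
// Hypothetical helper: nested-loop comparison that stores each shared word
// once in a third array instead of writing it to the page.
function collectSharedWords(words1, words2) {
  var shared = [];
  for (var i = 0; i < words1.length; i++) {
    for (var j = 0; j < words2.length; j++) {
      if (words1[i] === words2[j]) {
        // Skip words that were already collected from an earlier position.
        if (shared.indexOf(words1[i]) === -1) {
          shared.push(words1[i]);
        }
        break; // found the word, no need to scan the rest of words2
      }
    }
  }
  return shared;
}

var sharedWords = collectSharedWords(
  "is a idea".split(" "),
  "is a good idea".split(" ")
);
console.log(sharedWords.sort());
```

The break plays the same role as the "exit for" in the earlier pseudocode: once a word has been found there is nothing to gain from checking the remaining words.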

Author

Commented:
Lol, thanks, it's basically working now, ukwebguy. I have one last problem: it now prints the shared words in a table, but I also need it to print the total number of times each word appears in document 1 and in document 2. Any ideas? I was thinking of setting up a counterArray1 and a counterArray2. Once a comparison matches between newArray3 (the array holding all the shared words) and the document 1 array, start a total and move on to the next value in the document 1 array. If they don't match, end the total, store it in counterArray1, and try the next field. Do you think I am on the right lines?

Graeme
Commented:
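I don't quite follow what you mean, but it is not difficult.

In this section:

 if (inTextArray[i]==inTextArray2[j]){
     document.write(inTextArray[i] + "<BR>");
     // start from 0 the first time this index is seen, otherwise the sum is NaN
     totalCount[i] = (totalCount[i] || 0) + 1;
 }

declare a new array called totalCount before the loops (var totalCount = [];).

After the loops, totalCount[i] will hold how many times the word in inTextArray[i] appears in the second document.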
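For the full table asked for in the question (each shared word with its frequency in both texts), one possible approach is to count the words of each text separately and then keep only the words that appear in both counts. This is a sketch, not the method used above. The names countWords and sharedWordFrequencies are hypothetical, and the tokenizing rule (lower-case everything, treat runs of a-z, digits, and apostrophes as words) is an assumption that may need adjusting for real input.

```javascript
// Hypothetical helper: builds a word -> count lookup for one text.
// Assumption: words are runs of a-z, 0-9 or apostrophes, case-insensitive.
function countWords(text) {
  var counts = Object.create(null); // no inherited keys such as "constructor"
  var words = text.toLowerCase().match(/[a-z0-9']+/g) || [];
  for (var i = 0; i < words.length; i++) {
    counts[words[i]] = (counts[words[i]] || 0) + 1;
  }
  return counts;
}

// Hypothetical helper: one row per shared word, in alphabetical order.
function sharedWordFrequencies(text1, text2) {
  var counts1 = countWords(text1);
  var counts2 = countWords(text2);
  var rows = [];
  var candidates = Object.keys(counts1).sort();
  for (var i = 0; i < candidates.length; i++) {
    var word = candidates[i];
    if (counts2[word] !== undefined) {
      rows.push({ word: word, freq1: counts1[word], freq2: counts2[word] });
    }
  }
  return rows;
}

var rows = sharedWordFrequencies("The cat and the dog", "the dog saw a cat");
for (var r = 0; r < rows.length; r++) {
  console.log(rows[r].word + "\t" + rows[r].freq1 + "\t" + rows[r].freq2);
}
```

Each row could be written into the existing table with document.write in place of console.log. Counting each text once avoids comparing every word against every other word.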

