Solved

plagiarism detection system (comparing all words within two textareas and outputting their similarities)

Posted on 2006-10-26
Last Modified: 2012-05-05
Hi there, I have a form that takes two textarea fields, sorts the words of each textarea into alphabetical order using array.sort(), and outputs the data to an ASP page.

Is it possible to write some code that shows an alphabetical listing of all the words the two texts have in common? Each word should be followed by two numbers indicating how often it occurs in each of the texts. For example:

Shared words    Freq in textarea1    Freq in textarea2

a               20                   14
the             15                   12


I know I need to use arrays to do this, but unfortunately I can't get my head around how to compare every word in one textarea against every word in the other. I am programming this page using ASP and JavaScript.

Any help would be much appreciated.

cheers

Graeme
Question by:graeme_douglas
LVL 1

Expert Comment

by:ukwebguy
ID: 17850563
The simplest way to compare the two arrays of words is with nested loops: for every word in the first array, check every word in the second. For example:

shared_count = 0
ReDim array_3(UBound(array_1)) ' room for every word in array_1

For Each word In array_1            ' so for each word in array_1
    For Each sec_word In array_2    ' we look through array_2 until we find the word
        If word = sec_word Then
            ' they are in both arrays, so add this word to a third array
            array_3(shared_count) = word
            shared_count = shared_count + 1
            Exit For ' no need to keep looping because we have found the word
        End If
    Next
Next

' then all you need to do is write out the shared words
For i = 0 To shared_count - 1
    Response.Write array_3(i) & "<br>"
Next


This is not a very efficient script, and if you are doing this for a lot of words you may run into time issues. However, because both arrays are sorted alphabetically, the work can be reduced. If it becomes a bigger problem, you could check the first letter of each word as you loop through: if the word from the first array is "banana" and the inner loop has reached "candle", we know that, because the array is sorted, we are never going to find any words beginning with 'b' after "candle", so we can exit the inner loop and move on to the next word. Remember that numbers and non-standard characters will affect the results.
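As a rough sketch of the sorted-array shortcut, here is one way it might look in JavaScript. The function name sharedWordsSorted is made up for illustration, and the sketch assumes both arrays were sorted with the default sort() so that the < comparison agrees with the sort order.

```javascript
// Hypothetical helper (not from the thread): returns the words that
// appear in both arrays. Assumes both arrays are already sorted with
// the default Array.prototype.sort().
function sharedWordsSorted(a, b) {
  var shared = [];
  var i = 0, j = 0;
  while (i < a.length && j < b.length) {
    if (a[i] === b[j]) {
      // Record each shared word once, even if it repeats.
      if (shared.length === 0 || shared[shared.length - 1] !== a[i]) {
        shared.push(a[i]);
      }
      i++;
      j++;
    } else if (a[i] < b[j]) {
      i++; // a[i] sorts before b[j], so it cannot appear later in b
    } else {
      j++; // b[j] sorts before a[i], so it cannot appear later in a
    }
  }
  return shared;
}

var first = "is a idea".split(" ").sort();
var second = "is a good idea".split(" ").sort();
var result = sharedWordsSorted(first, second);
```

Each array is walked only once, so the cost grows with the combined length of the two arrays rather than with their product.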


I hope this helps you

Regards,

Author Comment

by:graeme_douglas
ID: 17851431
Will give this a go and let you know, ukwebguy. I did assume it would use a method similar to the one above; I just wasn't sure exactly how to write it.

Author Comment

by:graeme_douglas
ID: 17956931
OK, I have tried to implement this, to no avail. I have posted the JavaScript code I currently have below:

function SortText() {
   var inText = ("is a idea");
   var inText2 = ("is a good idea");
   var inTextArray = inText.split(" ");
   var inTextArray2 = inText2.split(" ");
   inTextArray.sort();
   inTextArray2.sort();

   for(var i = 0; i < inTextArray.length; i++){
      if (inTextArray[i]==(inTextArray2[i]))
      {
         document.write(inTextArray[i] + "<BR>");
      }
      else
      {
         document.write("they are not equal" + "<BR>");
      }
   }
}

Problems: I have it looking through the two arrays for matches. Unfortunately it doesn't pick up a match unless both words are at the same index. How can I get it to look through all of inTextArray2 for inTextArray[0], then, if there is no match, move on to inTextArray[1], and if there is a match, add that [i] element to a third array? Confused :S
LVL 1

Expert Comment

by:ukwebguy
ID: 17971629
You almost had it.

function SortText() {
   var inText = "is a idea";
   var inText2 = "is a good idea";
   var inTextArray = inText.split(" ");
   var inTextArray2 = inText2.split(" ");
   inTextArray.sort();
   inTextArray2.sort();

   // for every word in the first array...
   for (var i = 0; i < inTextArray.length; i++) {
      // ...check every word in the second array
      for (var j = 0; j < inTextArray2.length; j++) {
         if (inTextArray[i] == inTextArray2[j]) {
            document.write(inTextArray[i] + "<BR>");
         }
         else {
            document.write("they are not equal" + "<BR>");
         }
      }
   }
}

There was one loop missing: for every element in inTextArray we need to loop through every element in inTextArray2.
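If the goal is the third array of shared words rather than output on the page, the same nested loop can collect matches instead of writing them. This is only a sketch: collectShared is a made-up name, and skipping words that are already in the result is an added assumption so that repeated words are listed once.

```javascript
// Hypothetical helper (not from the thread): nested-loop comparison
// that gathers the shared words into a third array.
function collectShared(words1, words2) {
  var shared = [];
  for (var i = 0; i < words1.length; i++) {
    // Assumption: a word already recorded should not be added again.
    if (shared.indexOf(words1[i]) !== -1) {
      continue;
    }
    for (var j = 0; j < words2.length; j++) {
      if (words1[i] === words2[j]) {
        shared.push(words1[i]);
        break; // found it, no need to scan the rest of words2
      }
    }
  }
  return shared;
}

var sharedWords = collectShared("is a idea".split(" "), "is a good idea".split(" "));
```

The break plays the same role as the "exit for" in the earlier pseudocode.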

Author Comment

by:graeme_douglas
ID: 17981573
lol, thanks, it's basically working now, ukwebguy. I have one last problem: it now prints the shared words in a table, but I also need it to print the total number of times each word appears in document 1 and in document 2. Any ideas? I was thinking of setting up a counterArray1 and a counterArray2. Once a comparison matches between newArray3 (the array holding all the shared words) and the document1 array, start a total and then take the next value in the document1 array. If they don't match, end the total, put the total into counterArray1 and try the next field. Do you think I am on the right lines?

Graeme
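The "start a total while the words keep matching" idea works because a sorted array keeps repeated words next to each other. A minimal sketch of that counting step, with countRuns as a made-up name and a sample sentence chosen purely for illustration:

```javascript
// Hypothetical helper (not from the thread): counts runs of identical
// neighbouring words. Assumes the array is already sorted, so every
// occurrence of a word sits in one contiguous run.
function countRuns(sortedWords) {
  var runs = [];
  for (var i = 0; i < sortedWords.length; i++) {
    var last = runs[runs.length - 1];
    if (last && last.word === sortedWords[i]) {
      last.count++; // same word as before: extend the current total
    } else {
      runs.push({ word: sortedWords[i], count: 1 }); // new word: new total
    }
  }
  return runs;
}

var runs = countRuns("the cat and the dog and the bird".split(" ").sort());
```

Running this once per document gives the per-document totals, which can then be looked up for each shared word.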
LVL 1

Accepted Solution

by:
ukwebguy earned 2000 total points
ID: 18113786
I don't quite follow what you mean, but it is not difficult.

Add a new array called totalCount, declared before the loops, and update it in this section:

 if (inTextArray[i]==inTextArray2[j]){
     document.write(inTextArray[i] + "<BR>");
     totalCount[i] = (totalCount[i] || 0) + 1;
 }

That will count how many times the word at inTextArray[i] appears in the second document.
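For the table asked for in the original question (each shared word with its frequency in both texts), one possible approach is to count the words of each text separately and then keep the words found in both. This is a sketch under assumptions: countWords and sharedWordTable are made-up names, words are lowercased and split on whitespace, and punctuation is not stripped.

```javascript
// Hypothetical helper: maps each word in the text to its frequency.
// Assumption: words are lowercased and separated by whitespace.
function countWords(text) {
  var counts = Object.create(null); // no inherited keys such as "constructor"
  var words = text.toLowerCase().split(/\s+/);
  for (var i = 0; i < words.length; i++) {
    if (words[i] === "") {
      continue; // ignore empty strings from leading/trailing spaces
    }
    counts[words[i]] = (counts[words[i]] || 0) + 1;
  }
  return counts;
}

// Hypothetical helper: alphabetical rows of shared words with the
// frequency in each text.
function sharedWordTable(text1, text2) {
  var counts1 = countWords(text1);
  var counts2 = countWords(text2);
  var rows = [];
  var words = Object.keys(counts1).sort();
  for (var i = 0; i < words.length; i++) {
    if (words[i] in counts2) {
      rows.push({ word: words[i], freq1: counts1[words[i]], freq2: counts2[words[i]] });
    }
  }
  return rows;
}

var rows = sharedWordTable("the cat and the hat", "the hat and the other hat");
```

Each row can then be written out as one line of the table, for example with document.write or Response.Write.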

