Filrering multple duplicates from an array

Posted on 2007-11-30
Medium Priority
Last Modified: 2008-02-01
I am implementing a minimal spanning tree algorithm, and the only thing that I am stuck on is filtering out duplicate characters from an array (in the form of a queue) what I am trying to do, is dequeue the first character, compare it to all the other characters in the array then enqueue all others(by dequeueing) in a new queue. I set new queue equal to the old one in the for loop. here is the method,
public void filter()
              Queue q = new Queue();
              Queue q2 = new Queue();
              Queue q3 = new Queue();
              for(int i = 0; i < data.length; i++) //populate the queue with all vertices
                    q.enqueue(data[i].getB());  //this is how I populate the original
              }                                                                //queue from other parts of the
                                                                                                             //  progam that is working
                                                                //filter out duplicates
              int x=0,y = 0;
                    Object k = q.dequeue();
                    q3.enqueue(k);     //this is suposed to be the queue    
                                                                                              //without duplicates       
                    for(int j = 0; j < q.currentSize; j++)
                          if(!k.equals(q.getFront()))                         }

Question by:intrigue63
  • 2
LVL 12

Accepted Solution

PCableGuy earned 2000 total points
ID: 20386783
Pseudocode below, please write in the language of your choice and test it for accuracy.

While(Q is not empty)
   Object temp = Q.remove();
   while(Q is not empty)
    //This part removes the duplicates 
    If (temp equals Q.peek()) // peek, but don't remove yet
       Q.remove() // If it's equal to temp, just remove it
    }// end of inner while loop
newQ.add(temp); // enqueue temp
}// end of outer while loop

Open in new window

LVL 12

Expert Comment

ID: 20386794
Hi intrigue63,

Please disregard my pseudocode above, nothing is advancing the Q (queue) to the next value in either of the while loops, so as is it won't work. But it might give you a general idea that you need to nest two loops to remove duplicates. Sorry.

Expert Comment

ID: 20388658
If your set of possible characters is small enough, as in ASCII but not
Unicode, you can use an array that stores the queue rank of the
character rather than the character itself.
 1. Create an array of int. The size of the array should be equal to the
      number of possible characters supported by the character set,
      127 in the case of ASCII.
 2. Initialize the array to 0 to indicate character not yet found.
 3. Run through your queue using a counter to keep track of the rank of
     the character as it is popped out.
 4. Using the character value as index into the array, check if the array
     contents are still 0, and if so, store its rank. This is to ensure we retain
     the first occurence of the character and filter out the subsequent
When the loop exits, the array contains the rank of the first occurence
of each character in the queue and 0 for those characters that do not
occur in the queue.
 5. Create a new array of char. The size of the array should be equal
    to the queue size.
 6. Initialize the array to some invalid value not part of the character set,
     say 127 for ASCII. Or use an array of int and initialize to -1. This invalid
     value indicates the entry should be skipped.
 7. Run through the first array using the array index as the character value
     and pick up its contents that represent the rank in the original queue.
 8. Store this information in transposed way in the second array, using the
     rank as index into the array and the character as value to be stored.
When the loop exits, the second array will contain the desired characters
in the desired order with duplicated removed, but will contain some
interspersed invalid values at the locations where the duplicates existed.
 9. Now create a new queue which we will eventually return as the result.
10. Initialize the queue to EMPTY.
11. Run through the second array and check each character so retrieved
      to verify that it is a valid value.
12. Drop invalid values and push valid characters into the queue.
When the loop exits, you will have the desired result in the second queue.
We have avoided nested loops and obtained a linear algorithm. But this
works only for small character spaces like ASCII.

For large character spaces like Unicode, the best we can get is n*log(n).
 1. Create a single two-dimensinal array of (int,char) that will work as a
     table with two columns.
 2. Populate both the character and its queue-rank in successive locations
     of the array.
 3. Sort the array on the char column as primary and queue-rank as
     secondary. The array now has duplicate characters in
     consecutive locations and in the same order as they occured in the
     original queue.
 4. Run through the array and replace the duplicate occurences with an
     invalid value.
 5. Sort the array again, this time on queue-rank, which basically brings
     it back into its original order, but with duplicates replaced by the
     invalid value.
 6. Run through the array and push only the valid character values into
     a result queue that was initialized as EMPTY.
LVL 85

Expert Comment

ID: 20388794
you can do O(n) expected time with a hash table of seen values

Featured Post

Free Tool: Port Scanner

Check which ports are open to the outside world. Helps make sure that your firewall rules are working as intended.

One of a set of tools we are providing to everyone as a way of saying thank you for being a part of the community.

Question has a verified solution.

Are you are experiencing a similar issue? Get a personalized answer when you ask a related question.

Have a better answer? Share it in a comment.

Join & Write a Comment

Make the most of your online learning experience.
Today, unlike web development, the mobile landscape is complex enough for a software engineer and Android is posing more challenging environment thanks to its fragmentation issues on hardware and software fronts.
In this seventh video of the Xpdf series, we discuss and demonstrate the PDFfonts utility, which lists all the fonts used in a PDF file. It does this via a command line interface, making it suitable for use in programs, scripts, batch files — any pl…

607 members asked questions and received personalized solutions in the past 7 days.

Join the community of 500,000 technology professionals and ask your questions.

Join & Ask a Question