Solved

remove duplicate array objects

Posted on 2004-10-20
Last Modified: 2012-08-14
how do i remove duplicate array objects?

thanks
Question by:jmc430
9 Comments
 
LVL 16

Expert Comment

by:imladris
The simple brute-force tactic is to have (or create) a second array, then copy elements across from one array to the other while checking whether each one is a duplicate. In pseudocode, something like:

Array withdup
Array nodup
int len  // length of array
int counter = 0 // count of items in nodup array
for all elements of withdup
    get the current value
    for the first counter elements of nodup
         compare value to current element
    if current value not found in nodup array
        nodup[counter] = current value   // add value to nodup array
        counter = counter + 1
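
The pseudocode above can be sketched in plain Java roughly as follows. This is only an illustration under assumptions: it uses String elements and a hypothetical class name (BruteForceDedup), neither of which comes from the question.

```java
import java.util.Arrays;
import java.util.Objects;

class BruteForceDedup {
    // Copies each value of withDup into a second array unless an equal value is already there.
    static String[] removeDuplicates(String[] withDup) {
        String[] noDup = new String[withDup.length];
        int counter = 0; // number of items stored in noDup so far
        for (String current : withDup) {
            boolean found = false;
            for (int k = 0; k < counter; k++) {
                if (Objects.equals(noDup[k], current)) {
                    found = true;
                    break;
                }
            }
            if (!found) {
                noDup[counter] = current; // add value to noDup
                counter++;
            }
        }
        return Arrays.copyOf(noDup, counter); // trim the unused slots
    }
}
```

The nested loop makes this quadratic in the array length, which is usually fine for short lists.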

LVL 55

Expert Comment

by:Jaime Olivares
Which language?
If your data is organized in an unordered contiguous array, then there is no "best method" to eliminate duplicates other than the brute-force approach imladris described.
Some programming languages, like C++, allow you to store your information in more efficient structures such as maps and binary trees, which let you easily detect and discard duplicates, or even avoid inserting a duplicate item in the first place.
Also, if your information comes from a database, there is always an option to retrieve data without duplicates.
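
As one possible illustration of such a structure, here is a minimal Java sketch using java.util.LinkedHashSet, which silently ignores values that are already present. The class and method names (SetDedup, distinct) are made up for this example.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

class SetDedup {
    // Returns the distinct values of the input, keeping first-seen order.
    static List<String> distinct(List<String> withDup) {
        Set<String> seen = new LinkedHashSet<>();
        for (String value : withDup) {
            seen.add(value); // add() returns false and changes nothing if an equal value is present
        }
        return new ArrayList<>(seen);
    }
}
```

Membership here is decided by equals() and hashCode(), so the approach only removes duplicates for element types that define those consistently.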

Author Comment

by:jmc430
Java ..

i've got two nsarrays .. and i added them together into one nsmutable set.   i thought using addObjectsFromArray would prevent identical objects from being added to the set.  

NSMutableSet ms = new NSMutableSet();
ms.addObjectsFromArray(a);
ms.addObjectsFromArray(b);

how do i remove the identical objects?  i tried minusSet but it didn't work..

LVL 8

Expert Comment

by:sigmacon
The first problem is that you have to define for us what you consider to be 'identical objects'. Do you mean strings that are equal? Objects that have the same hash code? Once that is clear, a filter may be devised to get rid of those 'duplicates'. Second, in a mathematical sense a set is just a bunch of stuff, it can be a bunch of the same stuff. So the NSMutableSet implementation is not going to define some form of identity (comparability, countability) derived from the fact that it belongs to this set. Even the order in which you would read out the objects is meaningless. If you DO care about some identity, e.g. from the index of an object in an array, use NSMutableArray.addObjectsFromArray(...) instead. This would still not remove 'identical objects', but if you let us know what you mean by 'identical', I am sure the fix for that is easy. BTW, are you developing on a Mac? What are you interfacing with? Cocoa, WebObjects, Objective-C? Since you are referring to minusSet, I am guessing Objective-C?

Second thought: by identical, do you mean 'the same' object? In Java that would mean the simple == operator. But then you really should not use NSMutableSet BEFORE you have removed duplicates, because trying to do removeObject() on the set would remove ALL references in the set, if that is in fact how the set is implemented (as a bucket of references to objects, which sounds most reasonable). The simplest and most computation-intensive solution is then to loop through array a and see whether each one of its objects is in array b, and if so, remove it from array b. Then add both arrays to the set.
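
To make the distinction between 'the same object' (==) and 'equal objects' (equals) concrete, here is a small sketch in plain Java. The Dept class is a hypothetical stand-in, and java.util.HashSet is used instead of NSMutableSet; whether NSMutableSet decides membership the same way is an assumption worth checking in its documentation.

```java
import java.util.HashSet;
import java.util.Objects;
import java.util.Set;

class IdentityVsEquality {
    // Hypothetical stand-in for a department; equality is defined by name only.
    static final class Dept {
        final String name;

        Dept(String name) {
            this.name = name;
        }

        @Override
        public boolean equals(Object o) {
            return o instanceof Dept && Objects.equals(name, ((Dept) o).name);
        }

        @Override
        public int hashCode() {
            return Objects.hashCode(name);
        }
    }

    // Counts how many distinct departments a HashSet keeps.
    static int countDistinct(Dept... depts) {
        Set<Dept> set = new HashSet<>();
        for (Dept d : depts) {
            set.add(d);
        }
        return set.size();
    }
}
```

Two separately constructed Dept("Math") instances are different under == but equal under equals(), so the HashSet keeps only one of them. Without the equals() and hashCode() overrides it would keep both.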

Author Comment

by:jmc430
I'm using Webobjects with Java.  I referred to minusSet because I was reading about NSSet terminology..

My "identical objects" are Department objects.  I'm trying to access all the Departments a Person is affiliated with, as a Department Member and also as a Department Administrator.  There are two arrays connecting a Person to a Department: "deptAdmins" and "affiliations".  My identical objects are in the NSArrays that compose these Person relationships to Department, "Affiliations_Department" and "DeptAdmins_Department".   These are Department objects, connected through two different relationships, but Department objects nonetheless.

My getAllDepartments function is as follows:

public NSArray getAllDepartments() {
     NSArray a = (NSArray)storedValueForKey("affiliations_department");
     NSArray b = (NSArray)storedValueForKey("deptAdmins_department");
     
       if (a == null && b ==null)
        return NSArray.EmptyArray;
       
        NSMutableSet ms = new NSMutableSet();
        ms.addObjectsFromArray(a);
        ms.addObjectsFromArray(b);
        NSArray originalListWithDuplicates = ms.allObjects();
        NSSet uniqueingSet=new NSSet(originalListWithDuplicates);
        NSArray newListWithoutDuplicates=uniqueingSet.allObjects();
     
        EOSortOrdering ordering1 = new EOSortOrdering("departmentName", EOSortOrdering.CompareAscending);
        NSMutableArray sortOrdering = new NSMutableArray();
        sortOrdering.addObject(ordering1);
        return EOSortOrdering.sortedArrayUsingKeyOrderArray(newListWithoutDuplicates, sortOrdering);
     
    }

How would I implement the loop through array and remove it from array b?  

Am I completely off the mark on my implementation?

Thanks for all your help!
jamie
LVL 8

Expert Comment

by:sigmacon
Not completely off the mark. I don't think the following lines from your code will actually remove 'duplicates':

NSArray originalListWithDuplicates = ms.allObjects();
NSSet uniqueingSet=new NSSet(originalListWithDuplicates);
NSArray newListWithoutDuplicates=uniqueingSet.allObjects();

You still have not answered my question to what makes two objects in your set identical, but from your description I am assuming that you want unique Department objects. So one department object would then be 'identical' to another one if both have the same departmentName. It would be better to use some form of key or id, if you have it available, e.g. maybe  departId.
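
Deduplicating by a key such as the department name can be sketched in plain Java like this. It is only an illustration: the Dept class, its field, and the uniqueByName method are hypothetical, and java.util collections stand in for the WebObjects NSArray types.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class KeyedDedup {
    // Hypothetical minimal department record, for illustration only.
    static final class Dept {
        final String departmentName;

        Dept(String departmentName) {
            this.departmentName = departmentName;
        }
    }

    // Keeps the first department seen for each departmentName across both lists.
    static List<Dept> uniqueByName(List<Dept> a, List<Dept> b) {
        Map<String, Dept> byName = new LinkedHashMap<>();
        for (List<Dept> source : Arrays.asList(a, b)) {
            for (Dept d : source) {
                byName.putIfAbsent(d.departmentName, d); // later duplicates are ignored
            }
        }
        return new ArrayList<>(byName.values());
    }
}
```

If an id is available it would replace departmentName as the map key. Unlike a pairwise comparison between the two arrays, this also drops duplicates that occur within a single list.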

After

if (a == null && b ==null)
        return NSArray.EmptyArray;

Try something like this (this semi-pseudo-code is not checked for syntax errors; I am assuming the class name for your departments is Department)

Department[] depsA = a.objects();
Department[] depsB = b.objects();
NSMutableSet ms = new NSMutableSet();

int i;
int j;

for (i = 0; i < depsA.length; i++) {
  for (j = 0; j < depsB.length; j++) {
     ms.addObject(depsA[i]);
     if (depsB[j] != null) {
       if (!depsA[i].departmentName.equals(depsB[j].departmentName)) {
          ms.addObject(depsB[j]);
          depsB[j] = null; // setting null to not to look at this department in B again.
       }
     }
  }
}

You may have to modify this to make it compile and actually work for you.

Author Comment

by:jmc430
i slightly modified your code .. but it seems to still be retrieving two identical objects (yes, it is when the department name and the division name are identical).  it compiles, but does not work to remove the duplicate department names and duplicate division names.  what am i doing incorrectly?

        NSArray a = (NSArray)storedValueForKey("affiliations_department");
        NSArray b = (NSArray)storedValueForKey("deptAdmins_department");
        java.util.Enumeration ae = a.objectEnumerator();
        java.util.Enumeration be = b.objectEnumerator();
        String deps = "";
        int i;
        int j;
       
        NSMutableSet ms = new NSMutableSet();
       
        if (a == null && b ==null)
            return NSArray.EmptyArray;
       
        for (i=0;i<a.objects().length;i++){
            Department dep   = (Department)a.objectAtIndex(i);  
            for(j=0;j<b.objects().length;j++){
                Department depb = (Department)b.objectAtIndex(j);          
                ms.addObject(a.objectAtIndex(i));
                if(b.objects()!=null){
                    if(!dep.departmentName().equals(depb.departmentName())){
                        /ms.addObject(b.objectAtIndex(j));  
                    }
                }
            }
        }    
thanks so much for your help!
LVL 8

Accepted Solution

by:
sigmacon earned 250 total points
One mistake I made was that ms.addObject(depsA[i]); should be outside the inner loop:

for (i = 0; i < depsA.length; i++) {
  ms.addObject(depsA[i]);
  for (j = 0; j < depsB.length; j++) {
     if (depsB[j] != null) {
       if (!depsA[i].departmentName.equals(depsB[j].departmentName)) {
          ms.addObject(depsB[j]);
          depsB[j] = null; // setting null to not to look at this department in B again.
       }
     }
  }
}

But that's not the only mistake in my algorithm. In addition to algorithm problems, I also missed the cast that you added properly. You have to remember that I do not have a Mac and can't compile this code! See my fix after my comments regarding what you posted.

Otherwise, I think your (quite extensive) rewrite may have broken the already broken algorithm even more ... maybe. Calling a.objects() or b.objects() gets you a fresh COPY of ALL THE CONTENTS in the array EVERY TIME YOU CALL IT. b.objects() thus will never be null and you will always add its referenced object - thus you are running the same comparison more than once and adding duplicates if they match. Another problem I see is this line:

 /ms.addObject(b.objectAtIndex(j));

Is it commented out? Does that even compile with a single / in front of it?

The enumerations you get from the NSArrays are never used, neither is the string deps, so you should remove all those. Let me try to explain my thinking in pseudo-code and see whether this may help. Before I start, this warning: My approach does NOT REMOVE DUPLICATES WITHIN THE SAME ARRAY!

NSArray is not mutable, so you cannot modify it directly. My approach relies on a modification of the source array, and array copies are in general fast, so I
-> get a copy of the list of Departments in form of an array from both NSArrays
then
-> create an empty Mutable Set for the new elements

The comparison (search for duplicates) is then performed with two nested loops, which is often called the 'brute-force' solution, but I want to keep this manageable:

For each Department in Array A
    Add it to the new set
    Compare (the name and division of) each
          Department in Array B that is not null
          with the CURRENT Department from Array A
               and, IF THEY MATCH mark it as duplicate by setting this Department to null

For each Department in Array B
    That has not been set to null (was a duplicate), add it to the set

-- in Java -----------------------------------

for (i = 0; i < depsA.length; i++) {
  ms.addObject(depsA[i]);
  for (j = 0; j < depsB.length; j++) {
     if (depsB[j] != null) {
       if (((Department)depsA[i]).departmentName().equals(((Department)depsB[j]).departmentName())) { // names match; you'll need to add the check for division
          depsB[j] = null; // setting null because this is a duplicate
       }
     }
  }
}

// now add remaining objects from the second list
for (j = 0; j < depsB.length; j++) {
   if (depsB[j] != null) { // if this is still a valid department
      ms.addObject(depsB[j]);
   }
}

Let me know where the compilation of this code breaks before you change significant parts of it. I hope the time we both spend on this is worth it!
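
For reference, the two-pass approach described in the pseudocode above can be tried out without WebObjects using a self-contained sketch like the following. Everything in it is an assumption made for illustration: the Dept class, its sameAs helper, and the merge method are hypothetical, and plain arrays and java.util.List replace NSArray and NSMutableSet.

```java
import java.util.ArrayList;
import java.util.List;

class MergeWithoutDuplicates {
    // Hypothetical stand-in for the Department class discussed above.
    static final class Dept {
        final String departmentName;
        final String divisionName;

        Dept(String departmentName, String divisionName) {
            this.departmentName = departmentName;
            this.divisionName = divisionName;
        }

        // Two departments count as duplicates when both name and division match.
        boolean sameAs(Dept other) {
            return departmentName.equals(other.departmentName)
                && divisionName.equals(other.divisionName);
        }
    }

    static List<Dept> merge(Dept[] a, Dept[] b) {
        Dept[] depsB = b.clone(); // work on a copy so the caller's array is untouched
        List<Dept> result = new ArrayList<>();
        for (Dept fromA : a) {
            result.add(fromA);
            for (int j = 0; j < depsB.length; j++) {
                if (depsB[j] != null && fromA.sameAs(depsB[j])) {
                    depsB[j] = null; // duplicate of a department already taken from A
                }
            }
        }
        // now add the remaining departments from the second array
        for (Dept fromB : depsB) {
            if (fromB != null) {
                result.add(fromB);
            }
        }
        return result;
    }
}
```

As with the original, this does not remove duplicates that occur within the same array.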

Author Comment

by:jmc430
hi sigmacon,

thanks for all of your help!  i tinkered around with your code and this is what i came up with ...

public NSArray getAllDepartments() {
        NSArray a = (NSArray)storedValueForKey("affiliations_department");
        NSArray b = (NSArray)storedValueForKey("deptAdmins_department");
        java.util.Enumeration ae = a.objectEnumerator();
        java.util.Enumeration be = b.objectEnumerator();
        String deps = "";
        int i;
        int j;
       
        if (a == null && b ==null)
            return NSArray.EmptyArray;
       
        NSMutableSet ms = new NSMutableSet();

        for (i=0;i<a.objects().length;i++){
            ms.addObjectsFromArray(a);
            Department dep   = (Department)a.objectAtIndex(i);
            NSArray results = a;

            if (results.count() > 2){
                if(dep.divisionName().equals(dep.divisionName()));
                ms.removeObject(a.objectAtIndex(i));
            }
            else if (results.count() == 2){
                Department depF   = (Department)a.objectAtIndex(0);
                Department depL   = (Department)a.objectAtIndex(1);
           
                if(depF.departmentName().equals(depL.departmentName())){
                    if(depF.divisionName().equals(depL.divisionName())){//&& (depF.divisionName().equals(depL.divisionName())));
                    ms.removeObject(a.objectAtIndex(i));
}
                    else if (!(depF.divisionName().equals(depL.divisionName()))){        
                    }
               }
            }
             
            for(j=0;j<b.objects().length;j++){
               
                Department depb = (Department)b.objectAtIndex(j);        
                if(b.objects()!=null){
                    if((dep.departmentName().equals(dep.departmentName())) && (dep.divisionName().equals(dep.divisionName()))){
                        NSArray bresults = b;
                        if (bresults.count() == 2){
                            ms.removeObject(b.objectAtIndex(j));
                        }
                    }
                    if(!(dep.departmentName().equals(depb.departmentName()))){
                        ms.addObject(b.objectAtIndex(j));
                        isAdminDEPT =  true;    
                    }
                }
            }
        }    
 
        NSArray originalListWithDuplicates = ms.allObjects();
        EOSortOrdering ordering1 = new EOSortOrdering("departmentName", EOSortOrdering.CompareAscending);
        NSMutableArray sortOrdering = new NSMutableArray();
        sortOrdering.addObject(ordering1);
        return EOSortOrdering.sortedArrayUsingKeyOrderArray(originalListWithDuplicates, sortOrdering);
    }

  i followed your sample code .. i'm not sure if it's correct .. what do you think?

  for some reason i couldn't set depsA[i] and depsB[j] like you had shown me .. this was my workaround.

  thanks again for your help!  i really appreciate it!!

  best regards,
  jamie