Solved

What is the best data structure for my needs?

Posted on 2004-08-04
20
321 Views
Last Modified: 2010-03-31
Hi,

I have to map integers like this:

1 -> 1
2 -> 9
4 ->15
etc..

they are formed under the following rules:

1) the source number (2,4, etc.) is not necessarily contiguous. it is not necessarily 2,3,4 but can be 2,3,6,7 or something. However they are in ascending order. I mean when I call to add them to data structure the calls will be in ascending order (e.g. add(2,9)... add(4,15) etc.)

2) the result numbers are also in ascending order. e.g. the result of source 4 will be something > 9 which is the result of the source 2

requirements:

- I need efficient lookups of both source -> result and result -> source (e.g. result(2) returns 9 and source(9) returns 2)

- now this is the tricky part... I need to be able to look up "in-between" numbers of the result to get the source. (e.g. anything less than 9 should return 1 and anything >= 9 and < 15 should return 2 and anything >= 15 should return 4.

notes:
- also this data structure can be somewhat large. I'm not sure if there would be performance benefits keeping it as simple primitives instead of objects. Probably there wouldn't be usually more than, say, a few thousand of these mapped pairs.
0
Comment
Question by:ycomp
  • 5
  • 5
  • 4
  • +4
20 Comments
 
LVL 14

Expert Comment

by:sudhakar_koundinya
Comment Utility
listening
0
 
LVL 35

Expert Comment

by:TimYates
Comment Utility
I'd just use two HashTables one for result, and one for source.  Store the numbers as integers...

if the source numbers are going to be iterated through, and the ordering matters, I'd use two TreeMaps
0
 
LVL 35

Expert Comment

by:TimYates
Comment Utility
public void add( int a, int b )
{
    result.add( new Integer( a ), new Integer( b ) ) ;
    source.add( new Integer( b ), new Integer( a ) ) ;
}

Then, do

System.out.println( result.get( new Integer( 2 ) ) ) ;
0
 
LVL 14

Expert Comment

by:sudhakar_koundinya
Comment Utility
>>if the source numbers are going to be iterated through, and the ordering matters, I'd use two TreeMaps

I thought the same. But will it be the best??
0
 
LVL 16

Expert Comment

by:imladris
Comment Utility
At a few thousand pairs, efficiency of lookup is a very minor consideration. Even with a brute force linear search on most "array" kinds of structures, a user will not be able to notice a delay.

So my advice would be to keep it simple (and thus easy to debug and maintain). Isolate the data structure and its search method in a separate object. Implement it using the simple approach. If it turns out, now or in the future, that it is too slow, you can easily upgrade it to something more complicated and faster.

The simple approach would be arrays (or maybe Vectors if the size needs to change a lot), and a linear search. Next step up, for speed, would be to implement a binary search.

The "in-between" requirement should be relatively simple. Search for the number equal to or greater than the target. If the resulting number is greater than the target, get the "previous" result.
0
 

Author Comment

by:ycomp
Comment Utility
but how would I get, say, source(11)? which should give back 2 using the in-between logic I described above?
0
 
LVL 7

Expert Comment

by:mark-b
Comment Utility
I see we all had the same thought about TreeMap.  Anyway, here is my solution:

      class CustomTreeMap extends TreeMap {
            private TreeMap reverseTree = new TreeMap();
            public CustomTreeMap() {
                  super();
            }
            
            public int source( int result ) {
                  Integer res = new Integer( result );
                  Integer source = (Integer) reverseTree.get( res );
                  
                  if ( source != null ) {
                        return source.intValue();
                  }

                  SortedMap sm = reverseTree.subMap( new Integer(0), res );
                  Integer last = (Integer) sm.lastKey();
                  
                  return last == null ? -1 : last.intValue();
            }

            public Object put( int source, int result ) {
                  Integer src = new Integer( source );
                  Integer res = new Integer( result );
                  reverseTree.put( res, src );
                  return put( src, res );
            }
            
            public int result( int source ) {
                  Integer src = new Integer( source );
                  Integer result = (Integer) get( src );
                  
                  if ( result != null ) {
                        return result.intValue();
                  }

                  SortedMap sm = subMap( new Integer(0), src );
                  Integer last = (Integer) sm.lastKey();
                  
                  return last == null ? -1 : last.intValue();
            }
      }


And the test:


CustomTreeMap map = new CustomTreeMap();
map.put( 1, 1 );
map.put( 2, 9 );
map.put( 4, 15 );
map.put( 11, 21 );
            
System.out.println( "source of 9: " + map.source( 9 ) );
System.out.println( "result of 2: " + map.result( 2 ) );
            
System.out.println( "closest source to 8: " + map.source( 8 ) );
System.out.println( "closest result to 17: " + map.result( 17 ) );

-Mark
0
 

Author Comment

by:ycomp
Comment Utility
imladris, yes the array style is how I thought to implement it but I thought that it might be too slow to have to iterate so much each time a lookup is done. But if you are saying that it doesn't really affect the speed then that is the way I'll go. Sometimes could have 10,000 pairs or so but not usually more.
0
 
LVL 35

Expert Comment

by:TimYates
Comment Utility
> But if you are saying that it doesn't really affect the speed then that is the way I'll go.

Your problem with arrays will be if it is a dynamic list...as soon as you want to insert a number pair at position #3, you will  take an increasing performance hit
0
 

Author Comment

by:ycomp
Comment Utility
mark-b, your answer looks really cool... I'm gonna test it out.

Tim, I guess I forgot to mention that I will know the number of pairs when I initially construct the array.

But... so if we are talking max 10,000 pairs and I will know the number of pairs at the beginning then choosing between the array approach and the Tree-Map approach is simply a matter of style? That the performance difference using both approaches will be negligible?
0
How your wiki can always stay up-to-date

Quip doubles as a “living” wiki and a project management tool that evolves with your organization. As you finish projects in Quip, the work remains, easily accessible to all team members, new and old.
- Increase transparency
- Onboard new hires faster
- Access from mobile/offline

 
LVL 11

Expert Comment

by:bcladd
Comment Utility
Just a thought:
An array implementation with binary search should be easy to implement and understand.

Binary search on either array is possible because you know, going into it, that the arrays are sorted.

HTH, -bcl
0
 
LVL 7

Accepted Solution

by:
JugglerW earned 250 total points
Comment Utility
Try following code. I think simple int[] are best and fast:

========= Save as Mapper.java =============
public class Mapper
{
    int[] source;
    int[] result;
    int   size = 0;    

    public Mapper( )
    {
        this( 100 );
    }

    public Mapper( int initialSize )
    {
        if ( initialSize < 1 )
            initialSize = 100;
           
        source = new int[ initialSize ];
        result = new int[ initialSize ];
    }
   
    public void append( int src, int res )
    {
        if ( size > 0 && ( ( source[ size - 1 ] >= src ) || ( result[ size -1 ] >= res ) ) )
            throw new IllegalArgumentException("numbers added are not ascending!");
             
        if ( size == source.length )
            grow();
           
        source[ size ] = src;
        result[ size ] = res;
        size++;                                    
    }
   
    private void grow()
    {
        int[] osource = source;
        int[] oresult = result;
       
        source = new int[ osource.length << 1 ];
        result = new int[ oresult.length << 1 ];
       
        System.arraycopy(osource, 0, source, 0, osource.length);
        System.arraycopy(oresult, 0, result, 0, oresult.length);
    }
   
    private int binarySearch(int[] a, int key, int end)
    {
    int low = 0;
    int high = end-1;

    while (low <= high) {
        int mid = (low + high) >> 1;
        int midVal = a[mid];

        if (midVal < key)
        low = mid + 1;
        else if (midVal > key)
        high = mid - 1;
        else
        return mid; // key found
    }
    return -(low + 1);  // key not found.
    }    
   
    public int getResult( int src )
    {
        int index = binarySearch(source, src, size);
       
        return index >= 0 ? result[ index ] : result[ -index - 2 ];
    }

    public int getSource( int res )
    {
        int index = binarySearch(result, res, size);
       
        return index >= 0 ? source[ index ] : source[ -index - 2 ];
    }
   
    public void dump()
    {
        for ( int idx = 0; idx < size; idx++ )
            System.out.println( "(" + source[ idx ] + "," + result[ idx ] + ")" );
    }
    public static void main(String[] args)
    {
        Mapper mapper = new Mapper();
       
        mapper.append(1, 1);
        mapper.append(2, 9);
        mapper.append(4, 15);
       
        mapper.dump();
       
        System.out.println( "result(1) = " + mapper.getResult(1) );
        System.out.println( "result(2) = " + mapper.getResult(2) );
        System.out.println( "result(3) = " + mapper.getResult(3) );
        System.out.println( "result(4) = " + mapper.getResult(4) );
        System.out.println( "result(5) = " + mapper.getResult(5) );

        System.out.println( "source(1) = " + mapper.getSource(1) );
        System.out.println( "source(2) = " + mapper.getSource(2) );
        System.out.println( "source(3) = " + mapper.getSource(3) );
        System.out.println( "source(4) = " + mapper.getSource(4) );
        System.out.println( "source(5) = " + mapper.getSource(5) );
        System.out.println( "source(6) = " + mapper.getSource(6) );
        System.out.println( "source(7) = " + mapper.getSource(7) );
        System.out.println( "source(8) = " + mapper.getSource(8) );
        System.out.println( "source(9) = " + mapper.getSource(9) );
        System.out.println( "source(10) = " + mapper.getSource(10) );
        System.out.println( "source(11) = " + mapper.getSource(11) );
        System.out.println( "source(12) = " + mapper.getSource(12) );
        System.out.println( "source(13) = " + mapper.getSource(13) );
        System.out.println( "source(14) = " + mapper.getSource(14) );
        System.out.println( "source(15) = " + mapper.getSource(15) );
        System.out.println( "source(16) = " + mapper.getSource(16) );
    }
}
============= End Mapper.java ===========

Prints following results:

(1,1)
(2,9)
(4,15)
result(1) = 1
result(2) = 9
result(3) = 9
result(4) = 15
result(5) = 15
source(1) = 1
source(2) = 1
source(3) = 1
source(4) = 1
source(5) = 1
source(6) = 1
source(7) = 1
source(8) = 1
source(9) = 2
source(10) = 2
source(11) = 2
source(12) = 2
source(13) = 2
source(14) = 2
source(15) = 4
source(16) = 4

Hope this helps
0
 
LVL 35

Expert Comment

by:TimYates
Comment Utility
> That the performance difference using both approaches will be negligible?

TreeMaps will perform better than brute force searching your array...

They will use more memory, but nothing to worry about for only 10K pairs...
0
 
LVL 7

Expert Comment

by:mark-b
Comment Utility
Hmm.  I made a mistake in my CustomTreeMap by returning the keys, not the values when doing 'closest to'.  My bad.  Here is a new CustomTreeMap class:

public class CustomTreeMap extends TreeMap {
        private TreeMap reverseTree = new TreeMap();

        public CustomTreeMap() {
             super();
        }
       
        public int source( int result ) {
               return find( reverseTree, result );
        }
       
        public int result( int source ) {
              return find( this, source );
        }
       
        public int find( TreeMap map, int key ) {
             Integer res = new Integer( key );
             Integer source = (Integer) map.get( res );
             
             if ( source != null ) {
                  return source.intValue();
             }

             SortedMap sm = map.subMap( new Integer(0), res );
             Integer last = (Integer) sm.lastKey();
             
             return last == null ? -1 : ( (Integer) get( last ) ).intValue();
        }

        public Object put( int source, int result ) {
             Integer src = new Integer( source );
             Integer res = new Integer( result );
             reverseTree.put( res, src );
             return put( src, res );
        }
   }

-Mark
0
 
LVL 7

Expert Comment

by:mark-b
Comment Utility
Son of a #$@!!  Use this find method instead of the last.  :)

        public int find( TreeMap map, int key ) {
             Integer res = new Integer( key );
             Integer keyVal = (Integer) map.get( res );
             
             if ( keyVal != null ) {
                  return keyVal.intValue();
             }

             SortedMap sm = map.subMap( new Integer(0), res );
             Integer last = (Integer) sm.lastKey();
             
             return last == null ? -1 : ( (Integer) map.get( last ) ).intValue();
        }

I think this one is right.

-Mark
0
 
LVL 7

Expert Comment

by:JugglerW
Comment Utility
Use this main() in my code above to generate 10000 pairs.

    public static void main(String[] args)
    {
        Mapper mapper = new Mapper();

        int lastSrc = 0, lastRes = 0;
       
        for ( int idx = 0; idx < 10000; idx++ )
        {
            int src = lastSrc + (int)(20 * Math.random()) + 1;    
            int res = lastRes + (int)(20 * Math.random()) + 1;
            lastSrc = src;
            lastRes = res;
            mapper.append(src, res);    
        }
               
        mapper.dump(); // this last long :-)
     
        System.out.println( "result(9999) = " + mapper.getResult(9999) );  
        System.out.println( "source(9999) = " + mapper.getSource(9999) );  
    }
0
 
LVL 7

Assisted Solution

by:mark-b
mark-b earned 250 total points
Comment Utility
I don't think speed is an issue.  My code with JugglerW's test (modified to use my class and put in timestamps), returns this:

 Speed test: start at Wed Aug 04 21:55:31 GMT 2004
 populate took 0.094 seconds
 result(9999) = 9806
 source(9999) = 10148
 fetch took 0.0 seconds
 speed test:  end at Wed Aug 04 21:55:31 GMT 2004

I modified his test like this:

-----------
CustomTreeMap map = new CustomTreeMap();
            
int lastSrc = 0, lastRes = 0;
              
for ( int idx = 0; idx < 10000; idx++ ) {
    int src = lastSrc + (int)(20 * Math.random()) + 1;    
    int res = lastRes + (int)(20 * Math.random()) + 1;
    lastSrc = src;
    lastRes = res;
    map.put(src, res);    
}
              
long populate = System.currentTimeMillis();
System.out.println( "populate took " + ( (double) ( populate - mark.getTime() ) / 1000 ) + " seconds" );
                      
System.out.println( "result(9999) = " + map.result(9999) );  
System.out.println( "source(9999) = " + map.source(9999) );
              
System.out.println( "fetch took " + ( (double) ( System.currentTimeMillis() - populate ) / 1000 ) + " seconds" );
System.out.println( "speed test:  end at " + new Date() );
----------

I ran it about five times and had very little (+ or - 12ms) variance to the above results.  

I did not run any tests on on JugglerW code...

-Mark
0
 
LVL 7

Expert Comment

by:mark-b
Comment Utility
Oh.. I missed two lines at the beginning of the test (cut and paste):

Date mark = new Date();
System.out.println( "Speed test: start at " + mark );
            
-Mark
0
 

Author Comment

by:ycomp
Comment Utility
mark & Juggler your code is both great. I think I'll go with Juggler's for now since I'd rather stick with primitives than objects for now I guess. But I definitely learned something from the TreeMap approach.
0
 
LVL 35

Expert Comment

by:TimYates
Comment Utility
:-(
0

Featured Post

How to run any project with ease

Manage projects of all sizes how you want. Great for personal to-do lists, project milestones, team priorities and launch plans.
- Combine task lists, docs, spreadsheets, and chat in one
- View and edit from mobile/offline
- Cut down on emails

Join & Write a Comment

Suggested Solutions

By the end of 1980s, object oriented programming using languages like C++, Simula69 and ObjectPascal gained momentum. It looked like programmers finally found the perfect language. C++ successfully combined the object oriented principles of Simula w…
Introduction This article is the first of three articles that explain why and how the Experts Exchange QA Team does test automation for our web site. This article explains our test automation goals. Then rationale is given for the tools we use to a…
The viewer will learn how to implement Singleton Design Pattern in Java.
This video teaches viewers about errors in exception handling.

771 members asked questions and received personalized solutions in the past 7 days.

Join the community of 500,000 technology professionals and ask your questions.

Join & Ask a Question

Need Help in Real-Time?

Connect with top rated Experts

10 Experts available now in Live!

Get 1:1 Help Now