# What is the best data structure for my needs?

Hi,

I have to map integers like this:

1 -> 1
2 -> 9
4 ->15
etc..

they are formed under the following rules:

1) the source number (2,4, etc.) is not necessarily contiguous. it is not necessarily 2,3,4 but can be 2,3,6,7 or something. However they are in ascending order. I mean when I call to add them to data structure the calls will be in ascending order (e.g. add(2,9)... add(4,15) etc.)

2) the result numbers are also in ascending order. e.g. the result of source 4 will be something > 9 which is the result of the source 2

requirements:

- I need efficient lookups of both source -> result and result -> source (e.g. result(2) returns 9 and source(9) returns 2)

- now this is the tricky part... I need to be able to look up "in-between" numbers of the result to get the source. (e.g. anything less than 9 should return 1 and anything >= 9 and < 15 should return 2 and anything >= 15 should return 4.

notes:
- also this data structure can be somewhat large. I'm not sure if there would be performance benefits keeping it as simple primitives instead of objects. Probably there wouldn't be usually more than, say, a few thousand of these mapped pairs.
###### Who is Participating?

Commented:
Try following code. I think simple int[] are best and fast:

========= Save as Mapper.java =============
public class Mapper
{
int[] source;
int[] result;
int   size = 0;

public Mapper( )
{
this( 100 );
}

public Mapper( int initialSize )
{
if ( initialSize < 1 )
initialSize = 100;

source = new int[ initialSize ];
result = new int[ initialSize ];
}

public void append( int src, int res )
{
if ( size > 0 && ( ( source[ size - 1 ] >= src ) || ( result[ size -1 ] >= res ) ) )
throw new IllegalArgumentException("numbers added are not ascending!");

if ( size == source.length )
grow();

source[ size ] = src;
result[ size ] = res;
size++;
}

private void grow()
{
int[] osource = source;
int[] oresult = result;

source = new int[ osource.length << 1 ];
result = new int[ oresult.length << 1 ];

System.arraycopy(osource, 0, source, 0, osource.length);
System.arraycopy(oresult, 0, result, 0, oresult.length);
}

private int binarySearch(int[] a, int key, int end)
{
int low = 0;
int high = end-1;

while (low <= high) {
int mid = (low + high) >> 1;
int midVal = a[mid];

if (midVal < key)
low = mid + 1;
else if (midVal > key)
high = mid - 1;
else
return mid; // key found
}
}

public int getResult( int src )
{
int index = binarySearch(source, src, size);

return index >= 0 ? result[ index ] : result[ -index - 2 ];
}

public int getSource( int res )
{
int index = binarySearch(result, res, size);

return index >= 0 ? source[ index ] : source[ -index - 2 ];
}

public void dump()
{
for ( int idx = 0; idx < size; idx++ )
System.out.println( "(" + source[ idx ] + "," + result[ idx ] + ")" );
}
public static void main(String[] args)
{
Mapper mapper = new Mapper();

mapper.append(1, 1);
mapper.append(2, 9);
mapper.append(4, 15);

mapper.dump();

System.out.println( "result(1) = " + mapper.getResult(1) );
System.out.println( "result(2) = " + mapper.getResult(2) );
System.out.println( "result(3) = " + mapper.getResult(3) );
System.out.println( "result(4) = " + mapper.getResult(4) );
System.out.println( "result(5) = " + mapper.getResult(5) );

System.out.println( "source(1) = " + mapper.getSource(1) );
System.out.println( "source(2) = " + mapper.getSource(2) );
System.out.println( "source(3) = " + mapper.getSource(3) );
System.out.println( "source(4) = " + mapper.getSource(4) );
System.out.println( "source(5) = " + mapper.getSource(5) );
System.out.println( "source(6) = " + mapper.getSource(6) );
System.out.println( "source(7) = " + mapper.getSource(7) );
System.out.println( "source(8) = " + mapper.getSource(8) );
System.out.println( "source(9) = " + mapper.getSource(9) );
System.out.println( "source(10) = " + mapper.getSource(10) );
System.out.println( "source(11) = " + mapper.getSource(11) );
System.out.println( "source(12) = " + mapper.getSource(12) );
System.out.println( "source(13) = " + mapper.getSource(13) );
System.out.println( "source(14) = " + mapper.getSource(14) );
System.out.println( "source(15) = " + mapper.getSource(15) );
System.out.println( "source(16) = " + mapper.getSource(16) );
}
}
============= End Mapper.java ===========

Prints following results:

(1,1)
(2,9)
(4,15)
result(1) = 1
result(2) = 9
result(3) = 9
result(4) = 15
result(5) = 15
source(1) = 1
source(2) = 1
source(3) = 1
source(4) = 1
source(5) = 1
source(6) = 1
source(7) = 1
source(8) = 1
source(9) = 2
source(10) = 2
source(11) = 2
source(12) = 2
source(13) = 2
source(14) = 2
source(15) = 4
source(16) = 4

Hope this helps
0

Commented:
listening
0

Commented:
I'd just use two HashTables one for result, and one for source.  Store the numbers as integers...

if the source numbers are going to be iterated through, and the ordering matters, I'd use two TreeMaps
0

Commented:
public void add( int a, int b )
{
result.add( new Integer( a ), new Integer( b ) ) ;
source.add( new Integer( b ), new Integer( a ) ) ;
}

Then, do

System.out.println( result.get( new Integer( 2 ) ) ) ;
0

Commented:
>>if the source numbers are going to be iterated through, and the ordering matters, I'd use two TreeMaps

I thought the same. But will it be the best??
0

Commented:
At a few thousand pairs, efficiency of lookup is a very minor consideration. Even with a brute force linear search on most "array" kinds of structures, a user will not be able to notice a delay.

So my advice would be to keep it simple (and thus easy to debug and maintain). Isolate the data structure and its search method in a separate object. Implement it using the simple approach. If it turns out, now or in the future, that it is too slow, you can easily upgrade it to something more complicated and faster.

The simple approach would be arrays (or maybe Vectors if the size needs to change a lot), and a linear search. Next step up, for speed, would be to implement a binary search.

The "in-between" requirement should be relatively simple. Search for the number equal to or greater than the target. If the resulting number is greater than the target, get the "previous" result.
0

Author Commented:
but how would I get, say, source(11)? which should give back 2 using the in-between logic I described above?
0

Commented:
I see we all had the same thought about TreeMap.  Anyway, here is my solution:

class CustomTreeMap extends TreeMap {
private TreeMap reverseTree = new TreeMap();
public CustomTreeMap() {
super();
}

public int source( int result ) {
Integer res = new Integer( result );
Integer source = (Integer) reverseTree.get( res );

if ( source != null ) {
return source.intValue();
}

SortedMap sm = reverseTree.subMap( new Integer(0), res );
Integer last = (Integer) sm.lastKey();

return last == null ? -1 : last.intValue();
}

public Object put( int source, int result ) {
Integer src = new Integer( source );
Integer res = new Integer( result );
reverseTree.put( res, src );
return put( src, res );
}

public int result( int source ) {
Integer src = new Integer( source );
Integer result = (Integer) get( src );

if ( result != null ) {
return result.intValue();
}

SortedMap sm = subMap( new Integer(0), src );
Integer last = (Integer) sm.lastKey();

return last == null ? -1 : last.intValue();
}
}

And the test:

CustomTreeMap map = new CustomTreeMap();
map.put( 1, 1 );
map.put( 2, 9 );
map.put( 4, 15 );
map.put( 11, 21 );

System.out.println( "source of 9: " + map.source( 9 ) );
System.out.println( "result of 2: " + map.result( 2 ) );

System.out.println( "closest source to 8: " + map.source( 8 ) );
System.out.println( "closest result to 17: " + map.result( 17 ) );

-Mark
0

Author Commented:
imladris, yes the array style is how I thought to implement it but I thought that it might be too slow to have to iterate so much each time a lookup is done. But if you are saying that it doesn't really affect the speed then that is the way I'll go. Sometimes could have 10,000 pairs or so but not usually more.
0

Commented:
> But if you are saying that it doesn't really affect the speed then that is the way I'll go.

Your problem with arrays will be if it is a dynamic list...as soon as you want to insert a number pair at position #3, you will  take an increasing performance hit
0

Author Commented:

Tim, I guess I forgot to mention that I will know the number of pairs when I initially construct the array.

But... so if we are talking max 10,000 pairs and I will know the number of pairs at the beginning then choosing between the array approach and the Tree-Map approach is simply a matter of style? That the performance difference using both approaches will be negligible?
0

Commented:
Just a thought:
An array implementation with binary search should be easy to implement and understand.

Binary search on either array is possible because you know, going into it, that the arrays are sorted.

HTH, -bcl
0

Commented:
> That the performance difference using both approaches will be negligible?

TreeMaps will perform better than brute force searching your array...

They will use more memory, but nothing to worry about for only 10K pairs...
0

Commented:
Hmm.  I made a mistake in my CustomTreeMap by returning the keys, not the values when doing 'closest to'.  My bad.  Here is a new CustomTreeMap class:

public class CustomTreeMap extends TreeMap {
private TreeMap reverseTree = new TreeMap();

public CustomTreeMap() {
super();
}

public int source( int result ) {
return find( reverseTree, result );
}

public int result( int source ) {
return find( this, source );
}

public int find( TreeMap map, int key ) {
Integer res = new Integer( key );
Integer source = (Integer) map.get( res );

if ( source != null ) {
return source.intValue();
}

SortedMap sm = map.subMap( new Integer(0), res );
Integer last = (Integer) sm.lastKey();

return last == null ? -1 : ( (Integer) get( last ) ).intValue();
}

public Object put( int source, int result ) {
Integer src = new Integer( source );
Integer res = new Integer( result );
reverseTree.put( res, src );
return put( src, res );
}
}

-Mark
0

Commented:
Son of a #\$@!!  Use this find method instead of the last.  :)

public int find( TreeMap map, int key ) {
Integer res = new Integer( key );
Integer keyVal = (Integer) map.get( res );

if ( keyVal != null ) {
return keyVal.intValue();
}

SortedMap sm = map.subMap( new Integer(0), res );
Integer last = (Integer) sm.lastKey();

return last == null ? -1 : ( (Integer) map.get( last ) ).intValue();
}

I think this one is right.

-Mark
0

Commented:
Use this main() in my code above to generate 10000 pairs.

public static void main(String[] args)
{
Mapper mapper = new Mapper();

int lastSrc = 0, lastRes = 0;

for ( int idx = 0; idx < 10000; idx++ )
{
int src = lastSrc + (int)(20 * Math.random()) + 1;
int res = lastRes + (int)(20 * Math.random()) + 1;
lastSrc = src;
lastRes = res;
mapper.append(src, res);
}

mapper.dump(); // this last long :-)

System.out.println( "result(9999) = " + mapper.getResult(9999) );
System.out.println( "source(9999) = " + mapper.getSource(9999) );
}
0

Commented:
I don't think speed is an issue.  My code with JugglerW's test (modified to use my class and put in timestamps), returns this:

Speed test: start at Wed Aug 04 21:55:31 GMT 2004
populate took 0.094 seconds
result(9999) = 9806
source(9999) = 10148
fetch took 0.0 seconds
speed test:  end at Wed Aug 04 21:55:31 GMT 2004

I modified his test like this:

-----------
CustomTreeMap map = new CustomTreeMap();

int lastSrc = 0, lastRes = 0;

for ( int idx = 0; idx < 10000; idx++ ) {
int src = lastSrc + (int)(20 * Math.random()) + 1;
int res = lastRes + (int)(20 * Math.random()) + 1;
lastSrc = src;
lastRes = res;
map.put(src, res);
}

long populate = System.currentTimeMillis();
System.out.println( "populate took " + ( (double) ( populate - mark.getTime() ) / 1000 ) + " seconds" );

System.out.println( "result(9999) = " + map.result(9999) );
System.out.println( "source(9999) = " + map.source(9999) );

System.out.println( "fetch took " + ( (double) ( System.currentTimeMillis() - populate ) / 1000 ) + " seconds" );
System.out.println( "speed test:  end at " + new Date() );
----------

I ran it about five times and had very little (+ or - 12ms) variance to the above results.

I did not run any tests on on JugglerW code...

-Mark
0

Commented:
Oh.. I missed two lines at the beginning of the test (cut and paste):

Date mark = new Date();
System.out.println( "Speed test: start at " + mark );

-Mark
0

Author Commented:
mark & Juggler your code is both great. I think I'll go with Juggler's for now since I'd rather stick with primitives than objects for now I guess. But I definitely learned something from the TreeMap approach.
0

Commented:
:-(
0
Question has a verified solution.

Are you are experiencing a similar issue? Get a personalized answer when you ask a related question.

Have a better answer? Share it in a comment.