[Okta Webinar] Learn how to a build a cloud-first strategyRegister Now

x
  • Status: Solved
  • Priority: Medium
  • Security: Public
  • Views: 285
  • Last Modified:

Table data structure with good performance for sorting by any column, getting N-th value and getting a value by index

I need a hint how to design a data structure for a table of items with good performance.

The structure must fullfil folowing requirements:

1. table is sorted by specified item attribute and sort order, which may change, but not so often
2. get item by index (used in method for getting the N-th page of items) - very often
3. add new item - very often
4. remove item by ID - very often

Item has about 15 attributes, so the table will have 15 columns.
The attributes have different data type (long, int, string, timestamp, ...).
The table will hold about 1000-10000 items.

The data structure will be implemented in java. I need only theoretical design, not the implementation, but the implementation is appreciated.

Thanks !
0
MirecXP
Asked:
MirecXP
1 Solution
 
CEHJCommented:
The best way would be to use a database

>>Item has about 15 attributes, so the table will have 15 columns.

In db terms that would be better split up into more tables
0
 
MirecXPAuthor Commented:
Currently I cannot use a DB.
0
 
CEHJCommented:
If you're implementing it in Java then i would implement it as a LinkedList of class X, where class X has the required attributes
0
Technology Partners: We Want Your Opinion!

We value your feedback.

Take our survey and automatically be enter to win anyone of the following:
Yeti Cooler, Amazon eGift Card, and Movie eGift Card!

 
KantiCommented:
Referring  Java Collections will help
http://java.sun.com/docs/books/tutorial/collections/index.html

and this has some explaination on datastructures

http://www.cs.sunysb.edu/~algorith/lectures-good/node7.html
0
 
MirecXPAuthor Commented:
Thank you for the answers.

I tried to implement it as self-sorted structure which uses ArrayList internally with optimized capacity ensuring and trimming.

For adding new item I use binary search alg to find new index, so insertion is O(1+log2N).
Getting N-th item is O(1) as it is internally an array.
To delete N-th item in arraylist is quite quick.

The main problem is finding the index of item with specified ID.
The ID is String and to find it in an array (or even in LinkedList) is tooooooo slow.

Some numbers:
To create sorted table with 5000 items takes 60-80 ms.
To traverse the table takes 0-15 ms.
To empty the table by deleting 1th item in the loop takes ~50 ms.
! To find every item by its ID takes 12000 ms ! Thats 2500 ms average search time :-(((

Do you have an idea how to make it quicker ? Possibly by storing some additional data.

My idea was to have an index as next attribute of item and to have all items also in HashSet when hashCode would be based on the ID. To get an index I could find the item in HashSet first and then get the index from item directly. But then it would be necessary to update all indexes in all items after inserted new item (or deleted one).
That's also more complicated, as I cannot have an index attribute directly in the item, because the item is shared in more tables with different settings (filter, sort column, sort order). To solve this I would have to encapsulate it in another "Position" object with index and with reference to the item.
0
 
CEHJCommented:
>>I could find the item in HashSet

That would be OK

>>But then it would be necessary to update all indexes in all items after inserted new item (or deleted one).

I don't see why. The hash code *is* the (an) index effectively
0
 
MirecXPAuthor Commented:
Hashcode based on the ID is "an index" in hashset.
But item's index is in the array sorted by selected item attribute and stores the position in the array to quickly generate the page of sorted table.
0
 
CEHJCommented:
I'll come back to this tomorrow - it's late here, but if the ID is unique it can serve as the return value of  hashCode
0
 
MirecXPAuthor Commented:
Anyway, do you think that DB would perform well for the scenario described ?
In more details:
There will be about 20-100 tables (views) showing the same 1000 - 10000 items at the same time.
New items will be added or deleted at rates about 1 - 100 per minute.
Every table showing some page (10-100 items) is refreshed at rate 30s - 120s.
Sorting will be changed only sometimes (let's say that in every table once per 5 minutes).
0
 
CEHJCommented:
It sounds OK. Of course, since items are sorted, you could use a TreeMap, which:

"... provides guaranteed log(n) time cost for the containsKey, get, put and remove  operations"

http://java.sun.com/j2se/1.5.0/docs/api/java/util/TreeMap.html
0
 
MirecXPAuthor Commented:
but TreeMap may sort only by the key (in my case ID), am I right ?
I need also quick operation for
-remove by ID (in case of TreeMap it's ok)
-get item at N-th position (problem when using treemap ?)
-keep the values sorted by optional item attribute when adding/removing
-changing the sort order and sortBy-attribute
0
 
CEHJCommented:
>>but TreeMap may sort only by the key (in my case ID), am I right ?

It certainly has a default sort order, but that can be changedby using its ctor that takes a Comparator as a parameter

>>-get item at N-th position (problem when using treemap ?)

Yes, unless it's backed by a List. You'd think that LinkedHashMap should allow access to the collection by index too - unfortunately not
0
 
MirecXPAuthor Commented:
I have a feeling that I need some "egg-laying wool milk sow" :-)
The best structure for everything, perpetual motion or something like that :-)
0
 
CEHJCommented:
Maybe ;-) Perhaps you could find something here..?

http://javolution.org/api/index.html
0
 
MirecXPAuthor Commented:
OK, I implemented and tested the solution with wrapping the item into new "Position" object described above...

Result is for 5000 items:

To create sorted table with 5000 items takes 1500 ms if adding item one by one (this will be used) and 150ms if adding at once.
To traverse the table in sorted order takes 0-15 ms.
To empty the table by deleting 1th item in the loop takes 2500 ms.
To find every item by its ID takes 0-15 ms
To re-sort the table takes 150ms.

What do you think about it ?
0
 
RomModCommented:
Question closed - 250 points refunded.

Best regards,
RomMod
Experts Exchange
Community Support Moderator
0

Featured Post

Hire Technology Freelancers with Gigs

Work with freelancers specializing in everything from database administration to programming, who have proven themselves as experts in their field. Hire the best, collaborate easily, pay securely, and get projects done right.

Tackle projects and never again get stuck behind a technical roadblock.
Join Now