Looking for suggestions on in-memory searching...

Posted on 2003-12-11
Last Modified: 2010-04-15

I currently am using binary searching on an index which is completely maintained in memory.   This is for our intranet daemon which keeps an active list of logged on users.  This handles around 500 requests a second, and has a average size of 750,000 entries (or users logged in).

Basically we have a queue every time a user logs in, and every 10 seconds we "reindex" the main index with the queue.  We did it this way because 500 mutex locks a second on the index + time to search completely killed the CPU's and memory.  We have "markers" throughout the index to help find out where the users information is stored.

This isn't cutting it for us anymore, and I am looking for a solution that will be able to handle REAL-TIME inserts/deletes/searches.

Every 10 minutes we'll delete users who haven't been active in 24 hours.  Every second there will be 500 searches, 250 users will be already found in the index, and 250 will be added.

Basically our userinfo structure looks something like this:

struct _userinfo {
       unsigned uid;
       int position;
       char country[3];
       char *data;

Yes, small indeed.  But having quick lookups and real-time additions, deletion, and searching is a must, like I said before.

Can anyone suggest a library that can handle this?  We would rather use something tested and being used in the real world right now.  Developing an in-house library for this would take months to get something perfect. :)

Thank you
Question by:rulerr
  • 7
  • 5
LVL 45

Expert Comment

ID: 9926053
I think a small change in data structures will go a long way to help you

>struct _userinfo {
>      unsigned uid;
>      int position;
>      char country[3];
>      char *data;
Your struct will be 15 bytes on most platforms ... If you are malloc'ing structs, then on most common platforms, minimum memory malloc'ed is 64 bytes ... So you lose 49 bytes per struct you malloc ....

Since you are going to use around >750,000 entries <, why not have an array of structs with around 750,000 elements ... let the index of the struct in the array be the uid (so you save another 4 bytes per struct)
Moreover, this will make your search direct access instead of binay

We had a discussion about managing similar situation a bit differently, ... go though this link

Author Comment

ID: 9926064

Thanks for the quick response, but like I said, I would much rather have a library that implements this all safely.  And also, this is a completely dynamic user list.  Sometimes it's 250,000, sometimes its 1,500,000.  It's unpredictable.

We need to fix this pretty fast, and that's why we really don't have time to develop an implementation ourselves.

Again, thanks for the response.
LVL 45

Expert Comment

ID: 9926083
see if this helps
LVL 45

Expert Comment

ID: 9926090
If the ideas there appeal to you, then here is a library

Author Comment

ID: 9926118

Seems that library is for Pascal ;)

Do you really think the best approach is to use an array of structs?
LVL 45

Expert Comment

ID: 9926146
>Do you really think the best approach is to use an array of structs?
Depends on how much memory and processing power you have to spare ...
with binary search, in 750,000 elements, you need 14 comparisons in worst case ... sme for BSTs and other such data organizations ...

array of structs will reduce it to constant (direct access) ... but price paid is large amounts of memory required ... as you need a struct even for data that you do not have to store (unused ids)

B+ trees are a tradeoff that will reduce number of comparisons while not taking up so much memory

So it all boils down to
1. how much resources you have to spare
2. how much more efficient you want it to be

>Seems that library is for Pascal
My bad ... I am sorry, but I am sure you can find similar one easily using google or at or or ..... list is endless
ScreenConnect 6.0 Free Trial

Discover new time-saving features in one game-changing release, ScreenConnect 6.0, based on partner feedback. New features include a redesigned UI, app configurations and chat acknowledgement to improve customer engagement!


Author Comment

ID: 9926161

So what "other" way besides array of structs would you suggest for real-time inserting, deleting, etc, unlike the queue system we are using now?

Basically this queue thing is really messing things up when someone comes back within 10 seconds (search returns nothing because it didn't get indexed yet)

I believe we tried using mutex locks and it completely crapped out.  Can't remember though, it was a long time ago.  If you can persuade me otherwise I'll go ahead and write the code again for that and see how it works.   Wouldn't be too hard to convert what we currently have to use that.

Thanks again, Sunny.
LVL 45

Accepted Solution

sunnycoder earned 125 total points
ID: 9926176
>So what "other" way besides array of structs would you suggest for real-time inserting, deleting, etc, unlike the queue system
>we are using now?
If you have memory to spare and want to make it real fast, use array of structs...

If you want to save some memory and can compromise a bit with the speed, something like B+ trees is in order ... if you do not have range searches and sequential access requirements, B trees or a variant will be better than B+ tree
Take a look at the organization of data in these trees and the algorithms (insertion/deletion) to get an idea of how things work with it...

btw ... were you malloc'ing the structs ?

Author Comment

ID: 9926207
Yes we are currently mallocing the structs.
LVL 45

Expert Comment

ID: 9926215
Then I guess you were using up a lot of memory due to internal fragmentation anyway ... An array of structs should be affordable

Author Comment

ID: 9926226
Heh, okay.

Let's see what I can do now.

Thank you for the quick responses ;)
LVL 45

Expert Comment

ID: 9926228
glad to help :o)

Featured Post

3 Use Cases for Connected Systems

Our Dev teams are like yours. They’re continually cranking out code for new features/bugs fixes, testing, deploying, testing some more, responding to production monitoring events and more. It’s complex. So, we thought you’d like to see what’s working for us.

Question has a verified solution.

If you are experiencing a similar issue, please ask a related question

Preface I don't like visual development tools that are supposed to write a program for me. Even if it is Xcode and I can use Interface Builder. Yes, it is a perfect tool and has helped me a lot, mainly, in the beginning, when my programs were small…
This is a short and sweet, but (hopefully) to the point article. There seems to be some fundamental misunderstanding about the function prototype for the "main" function in C and C++, more specifically what type this function should return. I see so…
The goal of this video is to provide viewers with basic examples to understand opening and reading files in the C programming language.
The goal of this video is to provide viewers with basic examples to understand and use switch statements in the C programming language.

863 members asked questions and received personalized solutions in the past 7 days.

Join the community of 500,000 technology professionals and ask your questions.

Join & Ask a Question

Need Help in Real-Time?

Connect with top rated Experts

25 Experts available now in Live!

Get 1:1 Help Now