Welcome to Experts Exchange

Add your voice to the tech community where 5M+ people, just like you, are talking about what matters.

  • Help others & share knowledge
  • Earn cash & points
  • Learn & ask questions

Looking for suggestions on in-memory searching...

Posted on 2003-12-11
Last Modified: 2010-04-15

I currently am using binary searching on an index which is completely maintained in memory.   This is for our intranet daemon which keeps an active list of logged on users.  This handles around 500 requests a second, and has a average size of 750,000 entries (or users logged in).

Basically we have a queue every time a user logs in, and every 10 seconds we "reindex" the main index with the queue.  We did it this way because 500 mutex locks a second on the index + time to search completely killed the CPU's and memory.  We have "markers" throughout the index to help find out where the users information is stored.

This isn't cutting it for us anymore, and I am looking for a solution that will be able to handle REAL-TIME inserts/deletes/searches.

Every 10 minutes we'll delete users who haven't been active in 24 hours.  Every second there will be 500 searches, 250 users will be already found in the index, and 250 will be added.

Basically our userinfo structure looks something like this:

struct _userinfo {
       unsigned uid;
       int position;
       char country[3];
       char *data;

Yes, small indeed.  But having quick lookups and real-time additions, deletion, and searching is a must, like I said before.

Can anyone suggest a library that can handle this?  We would rather use something tested and being used in the real world right now.  Developing an in-house library for this would take months to get something perfect. :)

Thank you
Question by:rulerr
  • 7
  • 5
LVL 45

Expert Comment

ID: 9926053
I think a small change in data structures will go a long way to help you

>struct _userinfo {
>      unsigned uid;
>      int position;
>      char country[3];
>      char *data;
Your struct will be 15 bytes on most platforms ... If you are malloc'ing structs, then on most common platforms, minimum memory malloc'ed is 64 bytes ... So you lose 49 bytes per struct you malloc ....

Since you are going to use around >750,000 entries <, why not have an array of structs with around 750,000 elements ... let the index of the struct in the array be the uid (so you save another 4 bytes per struct)
Moreover, this will make your search direct access instead of binay

We had a discussion about managing similar situation a bit differently, ... go though this link

Author Comment

ID: 9926064

Thanks for the quick response, but like I said, I would much rather have a library that implements this all safely.  And also, this is a completely dynamic user list.  Sometimes it's 250,000, sometimes its 1,500,000.  It's unpredictable.

We need to fix this pretty fast, and that's why we really don't have time to develop an implementation ourselves.

Again, thanks for the response.
LVL 45

Expert Comment

ID: 9926083
see if this helps
Master Your Team's Linux and Cloud Stack!

The average business loses $13.5M per year to ineffective training (per 1,000 employees). Keep ahead of the competition and combine in-person quality with online cost and flexibility by training with Linux Academy.

LVL 45

Expert Comment

ID: 9926090
If the ideas there appeal to you, then here is a library

Author Comment

ID: 9926118

Seems that library is for Pascal ;)

Do you really think the best approach is to use an array of structs?
LVL 45

Expert Comment

ID: 9926146
>Do you really think the best approach is to use an array of structs?
Depends on how much memory and processing power you have to spare ...
with binary search, in 750,000 elements, you need 14 comparisons in worst case ... sme for BSTs and other such data organizations ...

array of structs will reduce it to constant (direct access) ... but price paid is large amounts of memory required ... as you need a struct even for data that you do not have to store (unused ids)

B+ trees are a tradeoff that will reduce number of comparisons while not taking up so much memory

So it all boils down to
1. how much resources you have to spare
2. how much more efficient you want it to be

>Seems that library is for Pascal
My bad ... I am sorry, but I am sure you can find similar one easily using google or at codeproject.com or programmersheaven.com or sourceforge.net ..... list is endless

Author Comment

ID: 9926161

So what "other" way besides array of structs would you suggest for real-time inserting, deleting, etc, unlike the queue system we are using now?

Basically this queue thing is really messing things up when someone comes back within 10 seconds (search returns nothing because it didn't get indexed yet)

I believe we tried using mutex locks and it completely crapped out.  Can't remember though, it was a long time ago.  If you can persuade me otherwise I'll go ahead and write the code again for that and see how it works.   Wouldn't be too hard to convert what we currently have to use that.

Thanks again, Sunny.
LVL 45

Accepted Solution

sunnycoder earned 125 total points
ID: 9926176
>So what "other" way besides array of structs would you suggest for real-time inserting, deleting, etc, unlike the queue system
>we are using now?
If you have memory to spare and want to make it real fast, use array of structs...

If you want to save some memory and can compromise a bit with the speed, something like B+ trees is in order ... if you do not have range searches and sequential access requirements, B trees or a variant will be better than B+ tree
Take a look at the organization of data in these trees and the algorithms (insertion/deletion) to get an idea of how things work with it...

btw ... were you malloc'ing the structs ?

Author Comment

ID: 9926207
Yes we are currently mallocing the structs.
LVL 45

Expert Comment

ID: 9926215
Then I guess you were using up a lot of memory due to internal fragmentation anyway ... An array of structs should be affordable

Author Comment

ID: 9926226
Heh, okay.

Let's see what I can do now.

Thank you for the quick responses ;)
LVL 45

Expert Comment

ID: 9926228
glad to help :o)

Featured Post

Is Your AD Toolbox Looking More Like a Toybox?

Managing Active Directory can get complicated.  Often, the native tools for managing AD are just not up to the task.  The largest Active Directory installations in the world have relied on one tool to manage their day-to-day administration tasks: Hyena. Start your trial today.

Question has a verified solution.

If you are experiencing a similar issue, please ask a related question

Have you thought about creating an iPhone application (app), but didn't even know where to get started? Here's how: ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ Important pre-programming comments: I’ve never tri…
This tutorial is posted by Aaron Wojnowski, administrator at SDKExpert.net.  To view more iPhone tutorials, visit www.sdkexpert.net. This is a very simple tutorial on finding the user's current location easily. In this tutorial, you will learn ho…
The goal of this video is to provide viewers with basic examples to understand how to use strings and some functions related to them in the C programming language.
The goal of this video is to provide viewers with basic examples to understand opening and reading files in the C programming language.

809 members asked questions and received personalized solutions in the past 7 days.

Join the community of 500,000 technology professionals and ask your questions.

Join & Ask a Question