• C

Looking for suggestions on in-memory searching...


I currently am using binary searching on an index which is completely maintained in memory.   This is for our intranet daemon which keeps an active list of logged on users.  This handles around 500 requests a second, and has a average size of 750,000 entries (or users logged in).

Basically we have a queue every time a user logs in, and every 10 seconds we "reindex" the main index with the queue.  We did it this way because 500 mutex locks a second on the index + time to search completely killed the CPU's and memory.  We have "markers" throughout the index to help find out where the users information is stored.

This isn't cutting it for us anymore, and I am looking for a solution that will be able to handle REAL-TIME inserts/deletes/searches.

Every 10 minutes we'll delete users who haven't been active in 24 hours.  Every second there will be 500 searches, 250 users will be already found in the index, and 250 will be added.

Basically our userinfo structure looks something like this:

struct _userinfo {
       unsigned uid;
       int position;
       char country[3];
       char *data;

Yes, small indeed.  But having quick lookups and real-time additions, deletion, and searching is a must, like I said before.

Can anyone suggest a library that can handle this?  We would rather use something tested and being used in the real world right now.  Developing an in-house library for this would take months to get something perfect. :)

Thank you
Who is Participating?

[Product update] Infrastructure Analysis Tool is now available with Business Accounts.Learn More

I wear a lot of hats...

"The solutions and answers provided on Experts Exchange have been extremely helpful to me over the last few years. I wear a lot of hats - Developer, Database Administrator, Help Desk, etc., so I know a lot of things but not a lot about one thing. Experts Exchange gives me answers from people who do know a lot about one thing, in a easy to use platform." -Todd S.

I think a small change in data structures will go a long way to help you

>struct _userinfo {
>      unsigned uid;
>      int position;
>      char country[3];
>      char *data;
Your struct will be 15 bytes on most platforms ... If you are malloc'ing structs, then on most common platforms, minimum memory malloc'ed is 64 bytes ... So you lose 49 bytes per struct you malloc ....

Since you are going to use around >750,000 entries <, why not have an array of structs with around 750,000 elements ... let the index of the struct in the array be the uid (so you save another 4 bytes per struct)
Moreover, this will make your search direct access instead of binay

We had a discussion about managing similar situation a bit differently, ... go though this link
rulerrAuthor Commented:

Thanks for the quick response, but like I said, I would much rather have a library that implements this all safely.  And also, this is a completely dynamic user list.  Sometimes it's 250,000, sometimes its 1,500,000.  It's unpredictable.

We need to fix this pretty fast, and that's why we really don't have time to develop an implementation ourselves.

Again, thanks for the response.
see if this helps
Big Business Goals? Which KPIs Will Help You

The most successful MSPs rely on metrics – known as key performance indicators (KPIs) – for making informed decisions that help their businesses thrive, rather than just survive. This eBook provides an overview of the most important KPIs used by top MSPs.

If the ideas there appeal to you, then here is a library
rulerrAuthor Commented:

Seems that library is for Pascal ;)

Do you really think the best approach is to use an array of structs?
>Do you really think the best approach is to use an array of structs?
Depends on how much memory and processing power you have to spare ...
with binary search, in 750,000 elements, you need 14 comparisons in worst case ... sme for BSTs and other such data organizations ...

array of structs will reduce it to constant (direct access) ... but price paid is large amounts of memory required ... as you need a struct even for data that you do not have to store (unused ids)

B+ trees are a tradeoff that will reduce number of comparisons while not taking up so much memory

So it all boils down to
1. how much resources you have to spare
2. how much more efficient you want it to be

>Seems that library is for Pascal
My bad ... I am sorry, but I am sure you can find similar one easily using google or at codeproject.com or programmersheaven.com or sourceforge.net ..... list is endless
rulerrAuthor Commented:

So what "other" way besides array of structs would you suggest for real-time inserting, deleting, etc, unlike the queue system we are using now?

Basically this queue thing is really messing things up when someone comes back within 10 seconds (search returns nothing because it didn't get indexed yet)

I believe we tried using mutex locks and it completely crapped out.  Can't remember though, it was a long time ago.  If you can persuade me otherwise I'll go ahead and write the code again for that and see how it works.   Wouldn't be too hard to convert what we currently have to use that.

Thanks again, Sunny.
>So what "other" way besides array of structs would you suggest for real-time inserting, deleting, etc, unlike the queue system
>we are using now?
If you have memory to spare and want to make it real fast, use array of structs...

If you want to save some memory and can compromise a bit with the speed, something like B+ trees is in order ... if you do not have range searches and sequential access requirements, B trees or a variant will be better than B+ tree
Take a look at the organization of data in these trees and the algorithms (insertion/deletion) to get an idea of how things work with it...

btw ... were you malloc'ing the structs ?

Experts Exchange Solution brought to you by

Your issues matter to us.

Facing a tech roadblock? Get the help and guidance you need from experienced professionals who care. Ask your question anytime, anywhere, with no hassle.

Start your 7-day free trial
rulerrAuthor Commented:
Yes we are currently mallocing the structs.
Then I guess you were using up a lot of memory due to internal fragmentation anyway ... An array of structs should be affordable
rulerrAuthor Commented:
Heh, okay.

Let's see what I can do now.

Thank you for the quick responses ;)
glad to help :o)
It's more than this solution.Get answers and train to solve all your tech problems - anytime, anywhere.Try it for free Edge Out The Competitionfor your dream job with proven skills and certifications.Get started today Stand Outas the employee with proven skills.Start learning today for free Move Your Career Forwardwith certification training in the latest technologies.Start your trial today

From novice to tech pro — start learning today.