Solved

Looking for suggestions on in-memory searching...

Posted on 2003-12-11
12
251 Views
Last Modified: 2010-04-15
Hi,

I currently am using binary searching on an index which is completely maintained in memory.   This is for our intranet daemon which keeps an active list of logged on users.  This handles around 500 requests a second, and has a average size of 750,000 entries (or users logged in).

Basically we have a queue every time a user logs in, and every 10 seconds we "reindex" the main index with the queue.  We did it this way because 500 mutex locks a second on the index + time to search completely killed the CPU's and memory.  We have "markers" throughout the index to help find out where the users information is stored.

This isn't cutting it for us anymore, and I am looking for a solution that will be able to handle REAL-TIME inserts/deletes/searches.

Every 10 minutes we'll delete users who haven't been active in 24 hours.  Every second there will be 500 searches, 250 users will be already found in the index, and 250 will be added.

Basically our userinfo structure looks something like this:

struct _userinfo {
       unsigned uid;
       int position;
       char country[3];
       char *data;
};

Yes, small indeed.  But having quick lookups and real-time additions, deletion, and searching is a must, like I said before.

Can anyone suggest a library that can handle this?  We would rather use something tested and being used in the real world right now.  Developing an in-house library for this would take months to get something perfect. :)

Thank you
0
Comment
Question by:rulerr
  • 7
  • 5
12 Comments
 
LVL 45

Expert Comment

by:sunnycoder
Comment Utility
I think a small change in data structures will go a long way to help you

>struct _userinfo {
>      unsigned uid;
>      int position;
>      char country[3];
>      char *data;
>};
Your struct will be 15 bytes on most platforms ... If you are malloc'ing structs, then on most common platforms, minimum memory malloc'ed is 64 bytes ... So you lose 49 bytes per struct you malloc ....

Since you are going to use around >750,000 entries <, why not have an array of structs with around 750,000 elements ... let the index of the struct in the array be the uid (so you save another 4 bytes per struct)
Moreover, this will make your search direct access instead of binay

We had a discussion about managing similar situation a bit differently, ... go though this link
http:/Q_20821132.html
0
 

Author Comment

by:rulerr
Comment Utility
Hi,

Thanks for the quick response, but like I said, I would much rather have a library that implements this all safely.  And also, this is a completely dynamic user list.  Sometimes it's 250,000, sometimes its 1,500,000.  It's unpredictable.

We need to fix this pretty fast, and that's why we really don't have time to develop an implementation ourselves.

Again, thanks for the response.
0
 
LVL 45

Expert Comment

by:sunnycoder
Comment Utility
see if this helps
http:/Q_20798196.html
0
 
LVL 45

Expert Comment

by:sunnycoder
Comment Utility
If the ideas there appeal to you, then here is a library
http://www.bitsoft.com/programming/freestuff/containers.asp?lang=en
0
 

Author Comment

by:rulerr
Comment Utility
Hi,

Seems that library is for Pascal ;)

Do you really think the best approach is to use an array of structs?
0
 
LVL 45

Expert Comment

by:sunnycoder
Comment Utility
>Do you really think the best approach is to use an array of structs?
Depends on how much memory and processing power you have to spare ...
with binary search, in 750,000 elements, you need 14 comparisons in worst case ... sme for BSTs and other such data organizations ...

array of structs will reduce it to constant (direct access) ... but price paid is large amounts of memory required ... as you need a struct even for data that you do not have to store (unused ids)

B+ trees are a tradeoff that will reduce number of comparisons while not taking up so much memory

So it all boils down to
1. how much resources you have to spare
2. how much more efficient you want it to be

>Seems that library is for Pascal
My bad ... I am sorry, but I am sure you can find similar one easily using google or at codeproject.com or programmersheaven.com or sourceforge.net ..... list is endless
0
Free Trending Threat Insights Every Day

Enhance your security with threat intelligence from the web. Get trending threat insights on hackers, exploits, and suspicious IP addresses delivered to your inbox with our free Cyber Daily.

 

Author Comment

by:rulerr
Comment Utility
Ok,

So what "other" way besides array of structs would you suggest for real-time inserting, deleting, etc, unlike the queue system we are using now?

Basically this queue thing is really messing things up when someone comes back within 10 seconds (search returns nothing because it didn't get indexed yet)

I believe we tried using mutex locks and it completely crapped out.  Can't remember though, it was a long time ago.  If you can persuade me otherwise I'll go ahead and write the code again for that and see how it works.   Wouldn't be too hard to convert what we currently have to use that.

Thanks again, Sunny.
0
 
LVL 45

Accepted Solution

by:
sunnycoder earned 125 total points
Comment Utility
>So what "other" way besides array of structs would you suggest for real-time inserting, deleting, etc, unlike the queue system
>we are using now?
If you have memory to spare and want to make it real fast, use array of structs...

If you want to save some memory and can compromise a bit with the speed, something like B+ trees is in order ... if you do not have range searches and sequential access requirements, B trees or a variant will be better than B+ tree
Take a look at the organization of data in these trees and the algorithms (insertion/deletion) to get an idea of how things work with it...

btw ... were you malloc'ing the structs ?
0
 

Author Comment

by:rulerr
Comment Utility
Yes we are currently mallocing the structs.
0
 
LVL 45

Expert Comment

by:sunnycoder
Comment Utility
Then I guess you were using up a lot of memory due to internal fragmentation anyway ... An array of structs should be affordable
0
 

Author Comment

by:rulerr
Comment Utility
Heh, okay.

Let's see what I can do now.

Thank you for the quick responses ;)
0
 
LVL 45

Expert Comment

by:sunnycoder
Comment Utility
glad to help :o)
0

Featured Post

How your wiki can always stay up-to-date

Quip doubles as a “living” wiki and a project management tool that evolves with your organization. As you finish projects in Quip, the work remains, easily accessible to all team members, new and old.
- Increase transparency
- Onboard new hires faster
- Access from mobile/offline

Join & Write a Comment

Preface I don't like visual development tools that are supposed to write a program for me. Even if it is Xcode and I can use Interface Builder. Yes, it is a perfect tool and has helped me a lot, mainly, in the beginning, when my programs were small…
Windows programmers of the C/C++ variety, how many of you realise that since Window 9x Microsoft has been lying to you about what constitutes Unicode (http://en.wikipedia.org/wiki/Unicode)? They will have you believe that Unicode requires you to use…
Video by: Grant
The goal of this video is to provide viewers with basic examples to understand and use nested-loops in the C programming language.
The goal of this video is to provide viewers with basic examples to understand and use conditional statements in the C programming language.

772 members asked questions and received personalized solutions in the past 7 days.

Join the community of 500,000 technology professionals and ask your questions.

Join & Ask a Question

Need Help in Real-Time?

Connect with top rated Experts

12 Experts available now in Live!

Get 1:1 Help Now