Solved

Fastest search/sort algorithm for a link list data structure

Posted on 2007-03-21
14
1,014 Views
Last Modified: 2008-02-01
Hi expert,

If I have a link list data structure - the fastest search algorithm will be linear search still?  I thought about binary search, but can I use it with the link list data structure?  
Usually binary search will jump index and I don't think it would works for link list

Also, what is the appropriate sort method I should use?  Can I use quicksort?

Thanks
0
Comment
Question by:4eyesgirl
  • 5
  • 3
  • 3
  • +2
14 Comments
 
LVL 86

Expert Comment

by:jkr
ID: 18764749
How "dynamic" is that list structure? You might be fine by "indexing" the whole list if it is quite static, on th eother hand, this might get inconvenient if that list constantly changes. A good tradeoff might be to track pointers the elements in a  std::map, so you can look them up easily as well as taking care of changes in that list.
0
 
LVL 53

Expert Comment

by:Infinity08
ID: 18764930
Even a std::vector combined with the STL sort algorithms (or your own if you prefer) might be sufficient.

About binary search : as you know : that works pretty well on sorted data (O(log(N)) as opposed to O(N) for linear search).

Then there's interpolation search, which works even better for large (sorted) lists with even distribution (between O(log(log(N))) and O(N)).

If you don't want to sort the list, you can do better than linear search by using Grover's algorithm for example (O(N^(1/2))).

As jkr suggested, you could also use specialized data structures, like hash tables, associative maps, (self-balancing) binary search trees, etc.

Some links on the mentioned algorithms/data structures :

http://en.wikipedia.org/wiki/Linear_search
http://en.wikipedia.org/wiki/Binary_search
http://en.wikipedia.org/wiki/Interpolation_search
http://en.wikipedia.org/wiki/Grover%27s_algorithm

http://en.wikipedia.org/wiki/Associative_array
http://en.wikipedia.org/wiki/Self-balancing_binary_search_tree
0
 
LVL 53

Expert Comment

by:Infinity08
ID: 18764936
The best choice for you depends largely on the type of data, frequency of data modification (addition, deletion, etc.), frequency of access, etc.
0
 
LVL 9

Expert Comment

by:smidgie82
ID: 18765192
The short answer is, if you have a singly-linked (or even double-linked) list, with no extra data structures added for indexing purposes, the best you can do is to search it item-by-item in one direction or another, which requires linear time, as you observed.

The longer answer is that you can use STL algorithms and classes to speed things up if you just want it to work in faster than linear time.

The yet longer, but more complete answer is, if you're dealing with a process wherein items can be encountered and inserted into (and possibly deleted from) the linked list in a random order, then you have to deal with some overhead at insertion (and deletion) time in order to ensure your data is indexed (or linked) in a way that is conducive to efficient searching.  The answer to "what's the fastest search algorithm" is a misleading one.  What you really need is to profile the sort of data your program will be encountering, and then analyze the performance of your process to make the _process_ as fast as possible.  An example: if your program typically performs a million searches between each data insertion, then it's most effective for you to spend lots of time setting up and maintaining efficient indexing structures to make the searches as fast as possible.  On the other hand, if your program does a gazillion insertions, and only occasionally does a search, spend less time on maintaining indexing structures, knowing that while it'll make the searches slower, you'll win in the long run because your insertions will take less time.  The tricky part is that insertions (and deletions) typically also require searching the structure, so you have to balance the search time with insertion/deletion overhead.

IMO, chances are good that if you just go with the STL you'll be happy.  If not, and you want to use a custom data structure rather than going with the STL, implement a self-balancing binary tree (red/black trees are good) instead of a linked list.  Insertion, deletion, and searching are all O(log n) operations.  It's about the most efficient, easy-top-implement, general-purpose ADT you can get.  If that's still not good enough, you can try a B-tree or R-tree, which sacrifice some search time (depending on the circumstances) in favor of less maintenance overhead.
0
 
LVL 39

Expert Comment

by:itsmeandnobodyelse
ID: 18765261
>>>> what is the appropriate sort method I should use?

The best for linked lists is if they are already sorted. You should insert new items so that the list is always sorted. As a list is very quick for inserting between, that gives the best overall performance.

Note, making a sorted insert into a singly linked list gives very bad results if the values are already sorted. Then, you need to compare each new item with each existing item, hence it is similar (bad) as a bubble sort. In case of a doubly linked list you could check the first and last list entry to decide whether you run from front to back or back to front  to find the insertion point.

Often you may get the values from an other container or a database. If so, you might check whether that other container can provide a sorted list. If so you simply could build the sorted list by adding nodes at the back of the list and only for additional entries you have to lookup for  the insert position.

Regards, Alex
0
 

Author Comment

by:4eyesgirl
ID: 18765806
I guess I wasn't clear about my question -

I am just building a simple double linked-list and I cannot use the stl.  

To make it simple, let's assume I have a data structure that has 4 items (firstname, lastname, email and DOB).

I was thinking to do linear search since the data structure is a link-list, I don't think I will go for the indexing approach as this is a fairly small program.

Then my next concern is sorting - I do plan to keep the record in a sorting order (by firstname and last name).  but I also want to provide a sorting feature if user decide they want to print out the link list contents sorted by email address or DOB.  In this case, even if I have a sorted- link -list, I will have to go through the entire list and create a new link list and sorted by DOB or email.  Does it make sense to you?  I have a pretty good idea how to do it, but I would like your expert opinion.

Thanks.
0
 
LVL 9

Expert Comment

by:smidgie82
ID: 18766054
Unfortunately, if you're required to use your double-linked list, and don't want to add indexing overhead, then inserting elements in sorted order is an O(N) process, with average time taking N/2.  Searching for an element is also an O(N) process, with the same average time.  Printing in that sorted order is an O(N) process as well, and printing sorted by any other criteria is at best an O(N*log(N)) process.

So, building the list is O(N^2) process (because each insertion is O(Ni), where Ni is the number of elements in the list at that time, which means the process is at worst O(sum(Ni)), which would be O((N*(N+1))/2), or O(N^2)) if you build it in sorted order.  Average time for an insertion at each list size would be Ni/2, so the average time to build the list is N(N+1)/4.  Since printing in sorted order would be O(N*log(N)), the upshot is that unless you're planning on printing the list (N+1)/4 times or more, it's faster to keep the list in unsorted order (assuming inserting at the beginning or end is an O(1) operation), and copy the contents of the list to a dynamically-created array to quicksort it every time you need to print a sorted list.

My numbers may not be accurate, but the bottom line is that if you choose to keep it in some sort of sorted order, linear time is the best you can do for insertions.  And, I think the fastest way to sort it by any other criteria will be to build an array, copy over the contents of the linked list, and quicksort it.  You can then recreate the linked list from that array in O(N) time, if you want.
0
How your wiki can always stay up-to-date

Quip doubles as a “living” wiki and a project management tool that evolves with your organization. As you finish projects in Quip, the work remains, easily accessible to all team members, new and old.
- Increase transparency
- Onboard new hires faster
- Access from mobile/offline

 
LVL 53

Expert Comment

by:Infinity08
ID: 18766087
If you need to sort the same list on different keys, you can always add more than 1 pointer to the nodes.

For example :

typedef struct node {
  string firstname;
  string lastname;
  string email;
  string DOB;
  struct node *next_firstname; // ordering by firstname
  struct node *next_lastname; // ordering by lastname
}
0
 
LVL 53

Expert Comment

by:Infinity08
ID: 18766104
Make that of course :

typedef struct node {
  string firstname;
  string lastname;
  string email;
  string DOB;
  struct node *next_firstname; // ordering by firstname
  struct node *next_lastname; // ordering by lastname
} node;
0
 

Author Comment

by:4eyesgirl
ID: 18766325
smidgie82:-

Do you think there is any benefit if I keep the link-listed sorted anyway?  For example, if I want to check (perform a search) for insertion (make sure there is no duplicate?
Or as you would suggested, there will not be much benefit to keep the link list sorted?  Since I will also need to create or copy to another sorted link-list or data structure(array) and sorted by any field of the structure.

inifinity08 - I am not really understanding what you are saying.  Are you suggested me to have multiple node* for different order.  In this simple strcture, I would have 4 pointer to organize the link-list, and I think it is kind of messy.  Maybe I am not getting what you are suggesting?
0
 
LVL 39

Assisted Solution

by:itsmeandnobodyelse
itsmeandnobodyelse earned 100 total points
ID: 18766412
1. To get a list sorted for more than one criteria you may use
     - function pointers to different sort functions
     - a static class property which tells what member should be
       used for sorting

The second choice is not thread-safe cause you can't sort different lists in different threads asynchronously if you have only one property fro the class which tells how to sort:

struct data
{
      enum SortCriteria { SORT_FN, SORT_LN, SORT_EM, SORT_DOB };
      static  SortCriteria sc;   // initialized e. g. by SORT_FN
 
       ....

        bool operator< (const node& n2) const
        {
               switch (sc)
               {
               case SORT_FN: return firstname < n2.firstname;
               ...
               }
               return false;
        }
};

The linked list will use the built-in operator < for sorting.

Note, passing function pointers is the more usual way and gives more flexibilit. But the above approach maybe is easier to understand especially if you have less experiences with function pointers.

If your list isn't a template but a container for one struct only, you can store the sort criteria in the list rather than as a static in the data class. Then the list would pass the needed sort criteria to a compareless function of struct data which has an implementation similar to that of operator< above but gets the sort criteria as an argument.

2. I do plan to keep the record in a sorting order (by firstname and last name).  
    but I also want to provide a sorting feature

If the list is always sorted according to a given sort criteria stored as a member in the list, you would create a new list if a new sort order is required passing the new sort criteria. Then you iterate thru the current list and insert any data item to the new list according the new sort criteria. You can delete any (old) entry at iteration or delete the old list after the new list was fully created. Or you can have both lists if you don't mind the redundancy.

Regards, Alex

0
 
LVL 53

Assisted Solution

by:Infinity08
Infinity08 earned 100 total points
ID: 18766414
>> inifinity08 - I am not really understanding what you are saying
Let me clarify then.

Suppose these names are in the list :

    John Smith
    John Miles
    Miles Davis
    Richard Nixon

Then your linked list would look like this :

                                                                            |
                                                                           2
                                                                           \/
    --1-->John Smith --1--> John Miles --1--> Miles Davis --1--> Richard Nixon
                     /\                     |      /\                      |                             /\      |
                      |                     |       |-----2----------|                              |       |
                      |                     |------------------------------2--------------|       |
                      |-----------------------------2-------------------------------------|

I'm not sure if this is gonna show correctly ...


>> and I think it is kind of messy

If you think it's messy, then don't do it ... only use constructs like this if you feel comfortable with them.
0
 
LVL 9

Accepted Solution

by:
smidgie82 earned 300 total points
ID: 18766522
@4eyesgirl,

If you want to avoid duplicates, it'll be better to keep the list sorted.  Here's why:

Assume you don't keep the list sorted.  You can insert an item at the beginning (or end, doesn't matter) of the list in a single operation.  First, however, you have to search the _entire_ list to make sure there are no duplicates.  Thus, a successful insertion operation (i.e., no duplicate was found) _always_ takes a minimum of N steps.

On the other hand, if your list is in sorted order, even though it _might_ take N steps (the item you're inserting is greater than the last element in the list), on average it only takes N/2 steps, and in the same number of steps (which only need to be performed once), you either locate the place the item should be put, or you locate an identical item that already exists in the list.

The upshot is, if you don't want to allow duplicate items to be inserted into the list, it's twice as fast to keep the list in sorted order all the time.
0
 

Author Comment

by:4eyesgirl
ID: 18766960
Thanks for all your help!  It was all very helpful suggestion.

I slit the points to the answer that I understand the most!

Thanks so much!!!
0

Featured Post

Highfive Gives IT Their Time Back

Highfive is so simple that setting up every meeting room takes just minutes and every employee will be able to start or join a call from any room with ease. Never be called into a meeting just to get it started again. This is how video conferencing should work!

Join & Write a Comment

Introduction This article is a continuation of the C/C++ Visual Studio Express debugger series. Part 1 provided a quick start guide in using the debugger. Part 2 focused on additional topics in breakpoints. As your assignments become a little more …
Windows programmers of the C/C++ variety, how many of you realise that since Window 9x Microsoft has been lying to you about what constitutes Unicode (http://en.wikipedia.org/wiki/Unicode)? They will have you believe that Unicode requires you to use…
The goal of this video is to provide viewers with basic examples to understand recursion in the C programming language.
The viewer will learn how to pass data into a function in C++. This is one step further in using functions. Instead of only printing text onto the console, the function will be able to perform calculations with argumentents given by the user.

746 members asked questions and received personalized solutions in the past 7 days.

Join the community of 500,000 technology professionals and ask your questions.

Join & Ask a Question

Need Help in Real-Time?

Connect with top rated Experts

13 Experts available now in Live!

Get 1:1 Help Now