Solved

Sort/Merge Algorithm

Posted on 2001-07-15
48
751 Views
Last Modified: 2007-12-19
Hi.
I have an array, that contains sorted blocks.
What is the best way to sort it ?
I suppose that it is better to merge the blocks, but I don't know how to merge the sorted blocks without using additional space.
0
Comment
Question by:Amor_
  • 18
  • 15
  • 6
  • +7
48 Comments
 
LVL 42

Expert Comment

by:sedgwick
ID: 6284972
0
 
LVL 1

Author Comment

by:Amor_
ID: 6285003
Hi
I know those algorithms and the best is stl sort.
But I have more specific problem - I have a very big array (about 300,000 elements) that contains about 100 huge sorted blocks and I am looking for the best sort algorithm.
The simple mergesort is not good because it requires additional allocation and deallocation ~1mb space and many read/write actions.
0
 
LVL 30

Expert Comment

by:Axter
ID: 6285106
I recommend that you use an STL vector class.

Example:

#include <list>
#include <string>
#include <iostream>

class Students
{
public:
     Students():m_nGrade(0), m_szFirstName(""), m_szLastName(""){};
     Students(int Grade, std::string FirstName, std::string LastName):m_nGrade(Grade), m_szFirstName(FirstName), m_szLastName(LastName)
     {
     };
     int m_nGrade;
     std::string m_szFirstName;
     std::string m_szLastName;
     bool operator <(const Students& src) const
     {
          if (m_nGrade != src.m_nGrade) return (m_nGrade < src.m_nGrade);
          if (strcmp(m_szLastName.c_str(),src.m_szLastName.c_str()))
          {
               return (strcmp(m_szLastName.c_str(),src.m_szLastName.c_str()) < 0);
          }
          if (strcmp(m_szFirstName.c_str(),src.m_szFirstName.c_str()))
          {
               return (strcmp(m_szFirstName.c_str(),src.m_szFirstName.c_str()) < 0);
          }
          return false;
     }
};

int main(int argc, char* argv[])
{
     std::list<Students> students;
     students.push_back(Students(10,"David","Maisonave"));
     students.push_back(Students(11,"Bob","Black"));
     students.push_back(Students(12,"David","Hancock"));
     students.push_back(Students(11,"Bill","Gate"));
     students.push_back(Students(11,"John","Brown"));
     students.push_back(Students(12,"Ann","George"));
     students.push_back(Students(10,"Joe","Maisonave"));
     students.push_back(Students(11,"Jill","White"));
     students.push_back(Students(12,"Sam","Gate"));
     students.push_back(Students(10,"Paul","White"));
     students.push_back(Students(11,"Betty","Black"));

     students.sort();

     std::cout << "Sudent List:" << std::endl;
     for(std::list<Students>::iterator i = students.begin();i!=students.end();i++)
     {
          std::cout << "Grade " << i->m_nGrade << " Name: " << i->m_szFirstName
               << " " << i->m_szLastName << std::endl;
     }
     return 0;
}

0
 
LVL 1

Author Comment

by:Amor_
ID: 6285114
Axter, thanks.
I use stl sort, but it is a general algorithm and I try to optimize it, according to my array features.
0
 
LVL 30

Expert Comment

by:Axter
ID: 6285120
>>I use stl sort, but it is a general algorithm and I try
>>to optimize it, according to my array features

I don't understand.  Are you saying you ARE trying to optimize it?
0
 
LVL 30

Expert Comment

by:Axter
ID: 6285123
Is there a reason why the data is not sorted as it comes in?
Could you give more information about your 300,000 records?
0
 
LVL 2

Expert Comment

by:smitty1276
ID: 6285128
Just because the stl sort() is the best for one thing doesn't mean its the best sort for another thing.  Depending on the number of elements, etc., different sorts are more efficient for different things.

They wouldn't teach you all of those sorts if one was ALWAYS better than the others.

In any case... you never specified how you are wanting to sort it.  If your array already has sorted blocks then what are you wanting to sort?  If you want the whole array sorted, why don't you just sort it... what's up with the merge thing?  Is it an array of sorted arrays or is it actually one array?
0
 
LVL 1

Author Comment

by:Amor_
ID: 6285144
Hi.
I have an array with about 300,000 elements.
I get the elements in such order, that the array actually consists about 100-500 huge sorted blocks.

I need to sort whole array. I tried some kinds of sorts (stl::sort, mergesort, quicksort) and found that the best is stl sort.

The problem is that it is still slow. I want to write sort algorithm for this specific problem, which will be faster and ask your advise, which way can be better ?
0
 
LVL 30

Expert Comment

by:Axter
ID: 6285158
Please define slow.  How long is it taking, and how big is each element?
0
 
LVL 30

Expert Comment

by:Axter
ID: 6285163
What are you using to store the sorted blocks?

Are you using one of the STL containers? (list, vector, or deque)??
0
 
LVL 30

Expert Comment

by:Axter
ID: 6285165
Can you show some code?
0
 
LVL 30

Expert Comment

by:Axter
ID: 6285171
It's hard to give an optimize solution with out knowing the specifics.
Please give as much information as you can.
0
 
LVL 1

Author Comment

by:Amor_
ID: 6285178
Each element is a struct, 8 byte.
I have a class, which wraps the array (continuous block of memory) of those elements, so I can use stl functions.

The stl sort shows about 200 ms, but because it is a general algorithm, I think that I can improve performance and write my own.

0
 
LVL 30

Expert Comment

by:Axter
ID: 6285198
So you're saying it takes 200 ms for it to sort a 300,000 elements?

Can you show the data structure, and the current code you're using to sort it?
0
 
LVL 1

Author Comment

by:Amor_
ID: 6285202
Axter, the code contains many irellevant parts, but actually I have an array of elements and implementation of iterators.

If I look in the array, I see continuous blocks of sorted elements.
e.g.

1 3 4 5 6 7 ... 2 5 7 9 ... 3 6 7 9 ... etc.
0
 
LVL 1

Author Comment

by:Amor_
ID: 6285233
Data Structure:

class CGlyphContainer
{
public:    
Glyph*     m_data;

iterator begin();
iterator end();
}

struct Glyph {
 int  id;
 byte vals;
 byte flags;
 int  data;
}


sorting function:
stl::sort()
0
 
LVL 30

Expert Comment

by:Axter
ID: 6285241
Please show the exact sort code
0
 
LVL 30

Expert Comment

by:Axter
ID: 6285265
So let me see if I understand this.
You want to merge multiple blocks into one sorted block.

Is that right?

sedgwick has already posted some general sorting algorithms.

But your reply to his post seem to indicate that you're having a problem merging the multiple blocks.

It is not clear to me what you're looking for.

If you need the multiple blocks merge and sorted, and the merge sort is no good, then I suggest that you first combined the multiple blocks, and then sort the merge data.

If you need to optimize your sort code, then you need to post the code you're currently using to sort your blocks.

All the information is rellevant for any optimization.
0
 
LVL 30

Expert Comment

by:Axter
ID: 6285324
The following is an example of three blocks of data being merged into one container, and then sorted.

int main(int argc, char* argv[])
{
     std::vector<Students> Elementry;
     Elementry.push_back(Students(3,"David","Maisonave"));
     Elementry.push_back(Students(5,"Bob","Black"));
     Elementry.push_back(Students(4,"David","Hancock"));
     Elementry.push_back(Students(5,"Bill","Gate"));
     Elementry.push_back(Students(5,"John","Brown"));
     Elementry.push_back(Students(4,"Ann","George"));
     Elementry.push_back(Students(3,"Joe","Maisonave"));
     Elementry.push_back(Students(5,"Jill","White"));
     Elementry.push_back(Students(4,"Sam","Gate"));
     Elementry.push_back(Students(3,"Paul","White"));
     Elementry.push_back(Students(5,"Betty","Black"));
     
     std::vector<Students> Middle;
     Middle.push_back(Students(8,"David","Regan"));
     Middle.push_back(Students(7,"Bob","Jones"));
     Middle.push_back(Students(9,"David","Shoe"));
     Middle.push_back(Students(7,"Bill","Black"));
     Middle.push_back(Students(7,"John","Black"));
     Middle.push_back(Students(9,"Ann","Black"));
     Middle.push_back(Students(8,"Joe","Regan"));
     Middle.push_back(Students(7,"Jill","Maisonave"));
     Middle.push_back(Students(9,"Sam","Black"));
     Middle.push_back(Students(8,"Paul","Maisonave"));
     Middle.push_back(Students(7,"Betty","Jones"));
     
     std::vector<Students> HighSchool;
     HighSchool.push_back(Students(10,"David","Black"));
     HighSchool.push_back(Students(11,"Bob","Black"));
     HighSchool.push_back(Students(12,"David","Regan"));
     HighSchool.push_back(Students(11,"Bill","Love"));
     HighSchool.push_back(Students(11,"John","Brown"));
     HighSchool.push_back(Students(12,"Ann","Love"));
     HighSchool.push_back(Students(10,"Joe","Hancock"));
     HighSchool.push_back(Students(11,"Jill","Black"));
     HighSchool.push_back(Students(12,"Sam","Love"));
     HighSchool.push_back(Students(10,"Paul","Black"));
     HighSchool.push_back(Students(11,"Betty","Black"));
     
     std::vector<Students> Schools;
     Schools.insert(Schools.end(), Elementry.begin(),Elementry.end());
     Schools.insert(Schools.end(), Middle.begin(),Middle.end());
     Schools.insert(Schools.end(), HighSchool.begin(),HighSchool.end());
     
     std::sort(Schools.begin(),Schools.end(),std::less<Students>());

     std::cout << "Sudent List:" << std::endl;
     for(std::vector<Students>::iterator i = Schools.begin();i!=Schools.end();i++)
     {
          std::cout << "Grade " << i->m_nGrade << " Name: " << i->m_szFirstName
               << " " << i->m_szLastName << std::endl;
     }
     return 0;
}


This code uses the previously posted Students class.
0
 
LVL 1

Author Comment

by:Amor_
ID: 6285342
I will try to explain myself:
I should to sort a huge array, which is unsorted.
But big parts of this array are already sorted (sub-arrays):
e.g. A[0..10,000] - sorted, A[10,001-50,000] - sorted, A[50,001-300,000] - sorted.

I use now stl::sort(A[0],A[300,000]) but I think that if I will use the information that subparts of array are already sorted - I can write better sort function.

I simply tried to merge sorted parts, but it takes a lot of times, because allocation of huge memory (for temporary use) and copying elements.

If I had a merge algorithm, which don`t require additional space - I think the problem was solved.

Or maybe you can suggest another solution ?
0
 
LVL 1

Expert Comment

by:LDC
ID: 6285646
How did you merge?

Can you use something like:
merge(sortedBlock1,sortedBlock2)
where:
You keep a counter of the lowest sortedBlock2 value, you iterate on sortedBlock1, when you reach that value, you swap all the sortedBlock2 contents lower than sortedBlock1:
Say you have values:
 111222444555771223346
1 is smallest of block 2:
swap:
 111122444555772223346: You still have 2 sorted arrays
2 is smallest of block2:
swap:
 111122222555774443346: 2 is not sorted.
3 is smallest of block2 and you can find it fast (since block2 is actually 2 sorted blocks again and you know where the cut is):
 111122222335774445546: Repeat:
 111122222334445775546: Former Block1 contains all smallest elements.
Continue:
 111122222334444775556
 111122222334444555776 // note you should copy from the end here or just the minimum  size of arrays: 2 sevens, 3 fives: move two elements only.
111122222334444555677

This is linear in number of values (you can meet a given value only twice in right hand block), so O(n), needs a good swap for memory but you should allocate it once and for all and then memcopy the lengths you need, so you take only one memory allocation if you can have a good shot at buffer size, and otherwise merge in place. I may be mistaken for big arrays since the right hand side seems to lose its sorted character for a while, that may be pathologic.

I think it is one of the regular O(N2) sort algorithms, but in your case it would be O(n) because your array is already so much sorted.
Did you try the "bad" algorithms? Because coding the above just to test if you gain a few ms would be nightmarish...
0
 
LVL 1

Author Comment

by:Amor_
ID: 6285660
The merge that I use - doesn't help anyway.
If I allocate additional space for this and copying the elements there - the performance is in 100 times worse.

Maybe there are exists merge algorithm by swapping elements ?
0
 
LVL 30

Expert Comment

by:Axter
ID: 6285703
>>But big parts of this array are already sorted (sub-arrays):

So are you saying that all you have to do is sort the subsections?
If so, you might be able to optimize your code by sorting the subsections.

>>stl::sort(A[0],A[300,000])

That is not an STL sort function.  First of all, an STL sort function has an std namespace "std::sort()"
Second, an STL sort function takes three arguments.
So it should look something like the following:
std::sort(&glyph[0],&glyph[299999],std::less<Glyph>());

You would also need a LessThen operator "operator <()"
bool operator <(const Glyph& src) const
{
  //some code here
  return ???;
}

Since your sort has the wrong namespace, incorrect quantity of arguments, and it's not getting passed a pointer, I don't think you're really using the STL sort function.
0
 
LVL 30

Expert Comment

by:Axter
ID: 6285723
You should check to see if there's another sort function within your code.  You might be using a different sort function, and think you're really using the STL sort function.

If you were able to post more of your code, we would be able to tell what was really happening.
0
IT, Stop Being Called Into Every Meeting

Highfive is so simple that setting up every meeting room takes just minutes and every employee will be able to start or join a call from any room with ease. Never be called into a meeting just to get it started again. This is how video conferencing should work!

 
LVL 30

Expert Comment

by:Axter
ID: 6285747
The following is an example of a STL sort function:

struct Glyph {
int  id;
byte vals;
byte flags;
int  data;
    bool operator <(const Glyph& src) const
    {//default operator
         return (id < src.id);
     }
};

class CompGlyphData
{
public:
     operator()(const Glyph& s1, const Glyph& s2)
     {
          return (s1.data < s2.data);
     }
};

class CompGlyphVals
{
public:
     operator()(const Glyph& s1, const Glyph& s2)
     {
          return (s1.vals < s2.vals);
     }
};

void SomeFunction(void)
{
     const int QtyGlyph = 300000;
     Glyph glyph[QtyGlyph];
     memset(&glyph,0,sizeof(Glyph) * QtyGlyph);
     srand( (unsigned)time( NULL ) );
     int id = 1;
     for (int i = 0;i < QtyGlyph;i++)
     {
          int r,d;
          do
          {
               r = rand();
               d = rand();
          }while(glyph[r].id);
          glyph[r].id = id++;
          glyph[r].data = d;
     }
     std::sort(&glyph[0],&glyph[299999],std::less<Glyph>());
     std::sort(&glyph[0],&glyph[299999],CompGlyphData());
     std::sort(&glyph[0],&glyph[299999],CompGlyphVals());
}
0
 
LVL 1

Author Comment

by:Amor_
ID: 6285775
Axter, I use right stl::sort function, but I forget to add that I overloaded operator [] and *, that it returns the iterator.
My struct also has operator <.
The sort function gets two elements (Taken from MSDN):
template<class RanIt>
    void sort(RanIt first, RanIt last);

The function works well and I .


Parts of array already sorted, I should only to find the way to merge them into array itself, not using additional space.
0
 
LVL 30

Expert Comment

by:Axter
ID: 6285788
In order to use the STL sort function you either need to have an operator<() in the class/struct or you need to use a Pred class.

In order to use std::sort with out the third argument, you need to use the operator<() in your class/struct.

Your struct doesn't have one, but in the above code, I added this function.  With the above Glyph class you can do std::sort(&glyph[0],&glyph[299999]);
Notice that you need to use "&" with it.
You don't need "&" if your variable is a pointer varialbe.  Which is what you're using in the CGlyphContainer class.
0
 
LVL 1

Author Comment

by:Amor_
ID: 6285799
template <class Glyph, class Header=SGlyphArrayHeader>
class CGlyphContainer
{
public:    
typedef Glyph*               iterator;

void sort() {
     std::sort(begin(),end());
}

iterator begin() {
       return m_data;
}

iterator end() {
     return m_data+m_data->m_size;;
}

0
 
LVL 30

Expert Comment

by:Axter
ID: 6285820
>>find the way to merge them into array itself, not using
>>additional space.

Did you try using the STL merge function?
Example:
std::vector<Glyph> MergeGlyph;
std::merge(&glyph[0],&glyph[10000],&glyph[50000],&glyph[10000],MergeGlyph.begin());
0
 
LVL 1

Author Comment

by:Amor_
ID: 6285821
Axter, Im sure that I use stl::sort right.
I'm looking for advise about idea to write new sort for my specific problem.
Stl sort is good, but I want to improve the performance
0
 
LVL 1

Author Comment

by:Amor_
ID: 6285826
I tried use stl::merge.
The problem is that it needs additional space, and it means that I should allocate and deallocate about 3 mb space.
And the performance is very very bad in this case :(
0
 
LVL 30

Expert Comment

by:Axter
ID: 6285887
>>Axter, Im sure that I use stl::sort right.
>>I'm looking for advise about idea to write new sort for
>>my specific problem.
The way you're wording this, is really confusing, because it really is not a sort() problem.  It's a merge problem.

So if I'm interpreting you correctly, what you really want is a better way to merge your data which has already been sorted.

Are you programming this in windows?
If so, you can try using MapView functions for additional space.
What is the target OS?
0
 
LVL 30

Expert Comment

by:Axter
ID: 6285912
>>it means that I should allocate and deallocate about
>>3 mb space.
I really don't see any other way around this, unless you use MapView/shm_ functions or harddrive space.

If you have three 1000 elements and you want to merge all three, you have to allocate 3000 elements space to do it.
There's no way around that, unless the three blocks are in consecutive order to begin with.
0
 
LVL 1

Expert Comment

by:LDC
ID: 6286061
You can allocate only minimal memory, but you may be much slower if you swap elements one by one (hardly any memory needed).
Can't you use a 100-element buffer? That way you could allocate only say 400 bytes and use this buffer to merge:
Swapping 1Mo would mean moving 400 bytes around 2500 times, which would probably be better than moving elements one at a time.
Again, did you try a poor o(N2) in-place sort algorithm? That may be faster in your case than all the O(NLogN) sort algorithms.
0
 
LVL 2

Expert Comment

by:jonnin
ID: 6286334
Did you try shell sort? it -> O(n) as you get closer to sorted. Worst is O(n^4/3) and average is O(n^7/6). It does not need extra memory, recursion, or any of that.  I often use it on systems that cannot afford recursion and do not have the stl (c only).  On a 1 ghz machine, for 7 million integers, here are some results for it:

all entries the same : 2.5 sec
sorted order, unique numbers: 2.5 sec
reverse sorted order: 3.5 sec
from here, it -> 7.1 seconds as the numbers go to unique and fully random order.  

With the pockets, I would guess similar data would fall into the 4-5 second range on this machine, data,  and my code.

Email me (jonnin@vol.com) if you want a highly optimized version.

The stl sort will beat this on many implementations, for many data sets, but for an in place sort with no recursion and fast time, this should do very well (besides, its only about 6 lines of code)...

0
 
LVL 1

Expert Comment

by:LDC
ID: 6286451
I agree with jonnin. Didn't know shellsort, but that's the idea I tried to talk about. Try it, it should be fast for you.
0
 
LVL 2

Expert Comment

by:jonnin
ID: 6286622
Here is the code, for anyone who wants it, sorry about the mess; its C-- and as tweaked as I could make it.
I hope this shows up, sometimes copy and paste do not work...
And watch that size/2 if you play with the sequence, its a long story...
 
#ifndef jss
#define jss

typedef int stype ; //whatever

void ssort(stype *list , const int size);
void ssort(stype *list , const int size)
{
     stype temp;
     int i, j, dx=0;          
     int harr[] =
     {
         0, /*terminal condition for sort     */          
         
1,
10,
53,
105,
542,
1047,
6239,
16256,
56720,
134096,
579823,
1000000,
5430201,
999999999 /*terminal condition for find index          */
     };    

     while(size/2 > harr[++dx]); /*get index in sequence, +1   */
     
     for(dx--;dx;dx--)  /*while not 0 in sequence*/
     {
     const  int hdx = harr[dx]; /*saves some access time! */
          for( i = hdx, temp = list[i]; i< size; temp = list[++i])          
          {
               for(j = i; (j >= hdx) && (list[j-hdx] > temp);
               list[j] = list[j-hdx], j -= hdx);          
               list[j] = temp;
          }              
     }
}
#endif
0
 

Expert Comment

by:BadMoodGuy
ID: 6287312
I would recommend trying a heap sort.  It is in-place, non-recursive, and O(n log n) with a worst case equal to average case.  Most importantly, it works great when large "chunks" of the array are already sorted.

Here is an example in Java:

public class Sort
{
  public static final void heapSort (Comparable[] a, int n) {
    if (n < 2) return;
    for (int i = (n/2)-1; i >= 0; i--) {
      demote (a, n, i);
    }
    for (int i = n-1; i > 0; i--) {
      swap (a, i, 0);
      demote (a, i, 0);
    }
  }

  private static void demote (Comparable[] heap, int size, int parent) {
    Comparable item = heap[parent];
    int child;
    while ((child = (2*parent)+1) < size) {
      if (child+1 < size && heap[child].compareTo (heap[child+1]) < 0)
        ++child;
      if (item.compareTo (heap[child]) < 0) {
        heap[parent] = heap[child];
        parent = child;
      } else
        break;
    }
    heap[parent] = item;
  }

  private static void swap (Object[] a, int i, int j) {
    Object tmp = a[i];
    a[i] = a[j];
    a[j] = tmp;
  }
}


You'll want to optimize it to your particular implementation when you code it in C++.

0
 
LVL 1

Author Comment

by:Amor_
ID: 6288580
Hi.
I tried to use merge, quick, heap and shell sorts and still stl::sort is in 3 times faster.

Maybe someone has an idea of algorithm according to my needs ?
0
 
LVL 2

Expert Comment

by:jonnin
ID: 6289451
You are not going to beat the stl sort, unless you are willing to spend a LOT of time at this. It is (likely) an assembler optimized version of intro sort or quicksort.  It probably recurses so far and then switches to inserion to clean up.  If you want to beat it using your own code, you will have to become an expert on the subject AND implement the thing in optimal assembler code for your platform.  Data type may also make a difference.  If you can't use the stl, you are going to have to live with the best you can get from the posted methods...

What were the times on the data for the different sorts?
Did you turn on all optimizations and turn off all debugging?  This will affect the coded ones but not the stl one (or not as much).  

something to try:
Stop quicksort at 10% left to recurse and use insertion or shell on the whole thing from there. (my shell code, with just 0,1,999999999999 in the harr array is insertion sort).

Its much faster, by 10% or so, still not a 3x speedup...
 
some reasons:
function calls are EXPENSIVE
so your recursive sorts will never be as fast as an iterative version of these algorithms. And if you call an external swap or helper function, its over.

jumps are expensive. Each loop costs a jump, and each conditional does too.  

Condtions are expensive, because of the jump and data shuffle to get the things to compare.

Swaps are also expensive. Try the xor swap for ints, if that is the data type.  

These are integers? if so, look up bucket sort. it is O(n) but costs a LOT of memory.  See time-space tradeoff theory.
Basically, you place numbers in arrays based on their most sig digit, then again on next most sig digit, ...
then put back.  Humans sort this way, for example sort alpha by a-z piles, the sort the a pile, the b pile, ...
This one MAY beat the stl, but it will depend on how you code and what you know about the data.



But all this aside, go with the stl sort if its fastest and you can use it.  You will save a lot of time, and it will be very much the law of diminishing returns even if you beat its time consistently.  If you can't use it, live with one of these methods or get into some assembler coding.  You can cut the time on most code by 50% or more using assembler, but it will be hard work.  Do not use inline asm for speed; make an obj from your assembler and link it in using extern.  Inline asm can be slower as many compilers do a lot of safety pushing and popping before allowing the asm code.  With any luck, you may find the asm code on the net, I did not look (and it wouldnt help me, different chips same code issues)...




0
 
LVL 1

Author Comment

by:Amor_
ID: 6289497
I didn't need to use asm code, I use all optimizations of MS Studio and I hope it is good enough.
My point is that stl::sort is good, the best that I found but it is still a general algorithm, and I think that in my case I can write better algorithm for this specific purpose.
stl time is 150 ms
others about 400-500 ms
size of array is ~200,000-300,000 elements
each element is 8 byte

array has large sorted chunks about 1000-20,000 elements.
0
 
LVL 1

Author Comment

by:Amor_
ID: 6289499
I didn't need to use asm code, I use all optimizations of MS Studio and I hope it is good enough.
My point is that stl::sort is good, the best that I found but it is still a general algorithm, and I think that in my case I can write better algorithm for this specific purpose.
stl time is 150 ms
others about 400-500 ms
size of array is ~200,000-300,000 elements
each element is 8 byte

array has large sorted chunks about 1000-20,000 elements.

Where can I find merge algorithm without additional space, described in Cormack ?
0
 
LVL 2

Expert Comment

by:jonnin
ID: 6290229
I gave it bucket a try, here are some results:

For the 7 million random integer test, from -100 to 100 in value, this took only 0.3 seconds! Moving up to +- 10000, no time increase.  

+-100k = 1.2 seconds, the temp array is getting big...
+-1 million takes 2.5 seconds.

stl was @4 max seconds for fully random, decreases for any ordering / repeated values.

still, this will blow the stl away for ints if your values will fit, maybe you can do something with it...


void bsort(int * a, int size, int pmax);
// pmax is the abs val of the largest int in the data...
void bsort(int * a, int size, int pmax)
{
   /*bucket sort for data of  -pmax ... pmax valued ints*/
  int x, dx;
  int *record = (int*) malloc(sizeof(int) * pmax *2);
  memset(record,0, pmax*2*sizeof(int));
  for(x = 0; x < size; x++)
  {
      record[a[x] + pmax]+=1;
  }

  dx = 0;
x = 0;
 while(x < pmax*2)
  {
     if(record[x] != 0)
     {
    a[dx] = x - pmax;
     dx ++;
     record[x] --;
     }
     else
          x++;
  }
 free(record);
}

ps I think this is bug free, but its only 10 min work, so check me. It sorts (checked manually) small arrays, and a loop that prints an error if array[x] > array[x+1] was used to check the large arrays (it never printed, so I guess its ok ...)



0
 
LVL 2

Expert Comment

by:jonnin
ID: 6290232
I gave it bucket a try, here are some results:

For the 7 million random integer test, from -100 to 100 in value, this took only 0.3 seconds! Moving up to +- 10000, no time increase.  

+-100k = 1.2 seconds, the temp array is getting big...
+-1 million takes 2.5 seconds.

stl was @4 max seconds for fully random, decreases for any ordering / repeated values.

still, this will blow the stl away for ints if your values will fit, maybe you can do something with it...


void bsort(int * a, int size, int pmax);
// pmax is the abs val of the largest int in the data...
void bsort(int * a, int size, int pmax)
{
   /*bucket sort for data of  -pmax ... pmax valued ints*/
  int x, dx;
  int *record = (int*) malloc(sizeof(int) * pmax *2);
  memset(record,0, pmax*2*sizeof(int));
  for(x = 0; x < size; x++)
  {
      record[a[x] + pmax]+=1;
  }

  dx = 0;
x = 0;
 while(x < pmax*2)
  {
     if(record[x] != 0)
     {
    a[dx] = x - pmax;
     dx ++;
     record[x] --;
     }
     else
          x++;
  }
 free(record);
}

ps I think this is bug free, but its only 10 min work, so check me. It sorts (checked manually) small arrays, and a loop that prints an error if array[x] > array[x+1] was used to check the large arrays (it never printed, so I guess its ok ...)



0
 
LVL 2

Expert Comment

by:jonnin
ID: 6290238
I hate it when that happens, I feel like saying pizza pizza or something. Sorry, don't know why it double posts...
0
 
LVL 11

Expert Comment

by:griessh
ID: 6828283
I think you forgot this question. I will ask Community Support to close it unless you finalize it within 7 days. Unless there is objection or further activity,  I will suggest to refund the points and PAQ at zero points since nobody had a satisfying answer for you.

The link to the Community Support area is: http://www.experts-exchange.com/jsp/qList.jsp?ta=commspt

PLEASE DO NOT ACCEPT THIS COMMENT AS AN ANSWER!
======
Werner
0
 
LVL 1

Accepted Solution

by:
Computer101 earned 0 total points
ID: 6845961
Points reduced to 0 and placed in PAQ.

Thank you
E-E Moderator
0
 

Expert Comment

by:pad_anu
ID: 13892302
hey
                    for merging u can use polyphase merging using replacement selection algorithm
0

Featured Post

Threat Intelligence Starter Resources

Integrating threat intelligence can be challenging, and not all companies are ready. These resources can help you build awareness and prepare for defense.

Join & Write a Comment

When writing generic code, using template meta-programming techniques, it is sometimes useful to know if a type is convertible to another type. A good example of when this might be is if you are writing diagnostic instrumentation for code to generat…
Often, when implementing a feature, you won't know how certain events should be handled at the point where they occur and you'd rather defer to the user of your function or class. For example, a XML parser will extract a tag from the source code, wh…
The goal of the video will be to teach the user the difference and consequence of passing data by value vs passing data by reference in C++. An example of passing data by value as well as an example of passing data by reference will be be given. Bot…
The viewer will be introduced to the member functions push_back and pop_back of the vector class. The video will teach the difference between the two as well as how to use each one along with its functionality.

758 members asked questions and received personalized solutions in the past 7 days.

Join the community of 500,000 technology professionals and ask your questions.

Join & Ask a Question

Need Help in Real-Time?

Connect with top rated Experts

16 Experts available now in Live!

Get 1:1 Help Now