Sort/Merge Algorithm

ASKER

Hi
I know those algorithms and the best is stl sort.
But I have more specific problem - I have a very big array (about 300,000 elements) that contains about 100 huge sorted blocks and I am looking for the best sort algorithm.
The simple mergesort is not good because it requires additional allocation and deallocation ~1mb space and many read/write actions.

I recommend that you use an STL vector class.

Example:

#include <list>
#include <string>
#include <iostream>

class Students
{
public:
Students():m_nGrade(0), m_szFirstName(""), m_szLastName(""){};
Students(int Grade, std::string FirstName, std::string LastName):m_nGrade(Grade), m_szFirstName(FirstName), m_szLastName(LastName)
{
};
int m_nGrade;
std::string m_szFirstName;
std::string m_szLastName;
bool operator <(const Students& src) const
{
if (m_nGrade != src.m_nGrade) return (m_nGrade < src.m_nGrade);
if (strcmp(m_szLastName.c_str(),src.m_szLastName.c_str()))
{
return (strcmp(m_szLastName.c_str(),src.m_szLastName.c_str()) < 0);
}
if (strcmp(m_szFirstName.c_str(),src.m_szFirstName.c_str()))
{
return (strcmp(m_szFirstName.c_str(),src.m_szFirstName.c_str()) < 0);
}
return false;
}
};

int main(int argc, char* argv[])
{
std::list<Students> students;
students.push_back(Students(10,"David","Maisonave"));
students.push_back(Students(11,"Bob","Black"));
students.push_back(Students(12,"David","Hancock"));
students.push_back(Students(11,"Bill","Gate"));
students.push_back(Students(11,"John","Brown"));
students.push_back(Students(12,"Ann","George"));
students.push_back(Students(10,"Joe","Maisonave"));
students.push_back(Students(11,"Jill","White"));
students.push_back(Students(12,"Sam","Gate"));
students.push_back(Students(10,"Paul","White"));
students.push_back(Students(11,"Betty","Black"));

students.sort();

std::cout << "Sudent List:" << std::endl;
for(std::list<Students>::iterator i = students.begin();i!=students.end();i++)
{
std::cout << "Grade " << i->m_nGrade << " Name: " << i->m_szFirstName
<< " " << i->m_szLastName << std::endl;
}
return 0;
}

ASKER

Axter, thanks.
I use stl sort, but it is a general algorithm and I try to optimize it, according to my array features.

>>I use stl sort, but it is a general algorithm and I try
>>to optimize it, according to my array features

I don't understand. Are you saying you ARE trying to optimize it?

Is there a reason why the data is not sorted as it comes in?
Could you give more information about your 300,000 records?

smitty1276

Just because the stl sort() is the best for one thing doesn't mean its the best sort for another thing. Depending on the number of elements, etc., different sorts are more efficient for different things.

They wouldn't teach you all of those sorts if one was ALWAYS better than the others.

In any case... you never specified how you are wanting to sort it. If your array already has sorted blocks then what are you wanting to sort? If you want the whole array sorted, why don't you just sort it... what's up with the merge thing? Is it an array of sorted arrays or is it actually one array?

ASKER

Hi.
I have an array with about 300,000 elements.
I get the elements in such order, that the array actually consists about 100-500 huge sorted blocks.

I need to sort whole array. I tried some kinds of sorts (stl::sort, mergesort, quicksort) and found that the best is stl sort.

The problem is that it is still slow. I want to write sort algorithm for this specific problem, which will be faster and ask your advise, which way can be better ?

Please define slow. How long is it taking, and how big is each element?

What are you using to store the sorted blocks?

Are you using one of the STL containers? (list, vector, or deque)??

Can you show some code?

It's hard to give an optimize solution with out knowing the specifics.
Please give as much information as you can.

ASKER

Each element is a struct, 8 byte.
I have a class, which wraps the array (continuous block of memory) of those elements, so I can use stl functions.

The stl sort shows about 200 ms, but because it is a general algorithm, I think that I can improve performance and write my own.

So you're saying it takes 200 ms for it to sort a 300,000 elements?

Can you show the data structure, and the current code you're using to sort it?

ASKER

Axter, the code contains many irellevant parts, but actually I have an array of elements and implementation of iterators.

If I look in the array, I see continuous blocks of sorted elements.
e.g.

1 3 4 5 6 7 ... 2 5 7 9 ... 3 6 7 9 ... etc.

ASKER

Data Structure:

class CGlyphContainer
{
public:
Glyph* m_data;

iterator begin();
iterator end();
}

struct Glyph {
int id;
byte vals;
byte flags;
int data;
}

sorting function:
stl::sort()

Please show the exact sort code

So let me see if I understand this.
You want to merge multiple blocks into one sorted block.

Is that right?

sedgwick has already posted some general sorting algorithms.

But your reply to his post seem to indicate that you're having a problem merging the multiple blocks.

It is not clear to me what you're looking for.

If you need the multiple blocks merge and sorted, and the merge sort is no good, then I suggest that you first combined the multiple blocks, and then sort the merge data.

If you need to optimize your sort code, then you need to post the code you're currently using to sort your blocks.

All the information is rellevant for any optimization.

The following is an example of three blocks of data being merged into one container, and then sorted.

int main(int argc, char* argv[])
{
std::vector<Students> Elementry;
Elementry.push_back(Students(3,"David","Maisonave"));
Elementry.push_back(Students(5,"Bob","Black"));
Elementry.push_back(Students(4,"David","Hancock"));
Elementry.push_back(Students(5,"Bill","Gate"));
Elementry.push_back(Students(5,"John","Brown"));
Elementry.push_back(Students(4,"Ann","George"));
Elementry.push_back(Students(3,"Joe","Maisonave"));
Elementry.push_back(Students(5,"Jill","White"));
Elementry.push_back(Students(4,"Sam","Gate"));
Elementry.push_back(Students(3,"Paul","White"));
Elementry.push_back(Students(5,"Betty","Black"));

std::vector<Students> Middle;
Middle.push_back(Students(8,"David","Regan"));
Middle.push_back(Students(7,"Bob","Jones"));
Middle.push_back(Students(9,"David","Shoe"));
Middle.push_back(Students(7,"Bill","Black"));
Middle.push_back(Students(7,"John","Black"));
Middle.push_back(Students(9,"Ann","Black"));
Middle.push_back(Students(8,"Joe","Regan"));
Middle.push_back(Students(7,"Jill","Maisonave"));
Middle.push_back(Students(9,"Sam","Black"));
Middle.push_back(Students(8,"Paul","Maisonave"));
Middle.push_back(Students(7,"Betty","Jones"));

std::vector<Students> HighSchool;
HighSchool.push_back(Students(10,"David","Black"));
HighSchool.push_back(Students(11,"Bob","Black"));
HighSchool.push_back(Students(12,"David","Regan"));
HighSchool.push_back(Students(11,"Bill","Love"));
HighSchool.push_back(Students(11,"John","Brown"));
HighSchool.push_back(Students(12,"Ann","Love"));
HighSchool.push_back(Students(10,"Joe","Hancock"));
HighSchool.push_back(Students(11,"Jill","Black"));
HighSchool.push_back(Students(12,"Sam","Love"));
HighSchool.push_back(Students(10,"Paul","Black"));
HighSchool.push_back(Students(11,"Betty","Black"));

std::vector<Students> Schools;
Schools.insert(Schools.end(), Elementry.begin(),Elementry.end());
Schools.insert(Schools.end(), Middle.begin(),Middle.end());
Schools.insert(Schools.end(), HighSchool.begin(),HighSchool.end());

std::sort(Schools.begin(),Schools.end(),std::less<Students>());

std::cout << "Sudent List:" << std::endl;
for(std::vector<Students>::iterator i = Schools.begin();i!=Schools.end();i++)
{
std::cout << "Grade " << i->m_nGrade << " Name: " << i->m_szFirstName
<< " " << i->m_szLastName << std::endl;
}
return 0;
}

This code uses the previously posted Students class.

ASKER

I will try to explain myself:
I should to sort a huge array, which is unsorted.
But big parts of this array are already sorted (sub-arrays):
e.g. A[0..10,000] - sorted, A[10,001-50,000] - sorted, A[50,001-300,000] - sorted.

I use now stl::sort(A[0],A[300,000]) but I think that if I will use the information that subparts of array are already sorted - I can write better sort function.

I simply tried to merge sorted parts, but it takes a lot of times, because allocation of huge memory (for temporary use) and copying elements.

If I had a merge algorithm, which don`t require additional space - I think the problem was solved.

Or maybe you can suggest another solution ?

LDC

How did you merge?

Can you use something like:
merge(sortedBlock1,sortedBlock2)
where:
You keep a counter of the lowest sortedBlock2 value, you iterate on sortedBlock1, when you reach that value, you swap all the sortedBlock2 contents lower than sortedBlock1:
Say you have values:
111222444555771223346
1 is smallest of block 2:
swap:
111122444555772223346: You still have 2 sorted arrays
2 is smallest of block2:
swap:
111122222555774443346: 2 is not sorted.
3 is smallest of block2 and you can find it fast (since block2 is actually 2 sorted blocks again and you know where the cut is):
111122222335774445546: Repeat:
111122222334445775546: Former Block1 contains all smallest elements.
Continue:
111122222334444775556
111122222334444555776 // note you should copy from the end here or just the minimum size of arrays: 2 sevens, 3 fives: move two elements only.
111122222334444555677

This is linear in number of values (you can meet a given value only twice in right hand block), so O(n), needs a good swap for memory but you should allocate it once and for all and then memcopy the lengths you need, so you take only one memory allocation if you can have a good shot at buffer size, and otherwise merge in place. I may be mistaken for big arrays since the right hand side seems to lose its sorted character for a while, that may be pathologic.

I think it is one of the regular O(N2) sort algorithms, but in your case it would be O(n) because your array is already so much sorted.
Did you try the "bad" algorithms? Because coding the above just to test if you gain a few ms would be nightmarish...

ASKER

The merge that I use - doesn't help anyway.
If I allocate additional space for this and copying the elements there - the performance is in 100 times worse.

Maybe there are exists merge algorithm by swapping elements ?

>>But big parts of this array are already sorted (sub-arrays):

So are you saying that all you have to do is sort the subsections?
If so, you might be able to optimize your code by sorting the subsections.

>>stl::sort(A[0],A[300,000])

That is not an STL sort function. First of all, an STL sort function has an std namespace "std::sort()"
Second, an STL sort function takes three arguments.
So it should look something like the following:
std::sort(&glyph[0],&glyph[299999],std::less<Glyph>());

You would also need a LessThen operator "operator <()"
bool operator <(const Glyph& src) const
{
//some code here
return ???;
}

Since your sort has the wrong namespace, incorrect quantity of arguments, and it's not getting passed a pointer, I don't think you're really using the STL sort function.

You should check to see if there's another sort function within your code. You might be using a different sort function, and think you're really using the STL sort function.

If you were able to post more of your code, we would be able to tell what was really happening.

The following is an example of a STL sort function:

struct Glyph {
int id;
byte vals;
byte flags;
int data;
bool operator <(const Glyph& src) const
{//default operator
return (id < src.id);
}
};

class CompGlyphData
{
public:
operator()(const Glyph& s1, const Glyph& s2)
{
return (s1.data < s2.data);
}
};

class CompGlyphVals
{
public:
operator()(const Glyph& s1, const Glyph& s2)
{
return (s1.vals < s2.vals);
}
};

void SomeFunction(void)
{
const int QtyGlyph = 300000;
Glyph glyph[QtyGlyph];
memset(&glyph,0,sizeof(Glyph) * QtyGlyph);
srand( (unsigned)time( NULL ) );
int id = 1;
for (int i = 0;i < QtyGlyph;i++)
{
int r,d;
do
{
r = rand();
d = rand();
}while(glyph[r].id);
glyph[r].id = id++;
glyph[r].data = d;
}
std::sort(&glyph[0],&glyph[299999],std::less<Glyph>());
std::sort(&glyph[0],&glyph[299999],CompGlyphData());
std::sort(&glyph[0],&glyph[299999],CompGlyphVals());
}

ASKER

Axter, I use right stl::sort function, but I forget to add that I overloaded operator [] and *, that it returns the iterator.
My struct also has operator <.
The sort function gets two elements (Taken from MSDN):
template<class RanIt>
void sort(RanIt first, RanIt last);

The function works well and I .

Parts of array already sorted, I should only to find the way to merge them into array itself, not using additional space.

In order to use the STL sort function you either need to have an operator<() in the class/struct or you need to use a Pred class.

In order to use std::sort with out the third argument, you need to use the operator<() in your class/struct.

Your struct doesn't have one, but in the above code, I added this function. With the above Glyph class you can do std::sort(&glyph[0],&glyph[299999]);
Notice that you need to use "&" with it.
You don't need "&" if your variable is a pointer varialbe. Which is what you're using in the CGlyphContainer class.

ASKER

template <class Glyph, class Header=SGlyphArrayHeader>
class CGlyphContainer
{
public:
typedef Glyph* iterator;

void sort() {
std::sort(begin(),end());
}

iterator begin() {
return m_data;
}

iterator end() {
return m_data+m_data->m_size;;
}

>>find the way to merge them into array itself, not using
>>additional space.

Did you try using the STL merge function?
Example:
std::vector<Glyph> MergeGlyph;
std::merge(&glyph[0],&glyph[10000],&glyph[50000],&glyph[10000],MergeGlyph.begin());

ASKER

Axter, Im sure that I use stl::sort right.
I'm looking for advise about idea to write new sort for my specific problem.
Stl sort is good, but I want to improve the performance

ASKER

I tried use stl::merge.
The problem is that it needs additional space, and it means that I should allocate and deallocate about 3 mb space.
And the performance is very very bad in this case :(

>>Axter, Im sure that I use stl::sort right.
>>I'm looking for advise about idea to write new sort for
>>my specific problem.
The way you're wording this, is really confusing, because it really is not a sort() problem. It's a merge problem.

So if I'm interpreting you correctly, what you really want is a better way to merge your data which has already been sorted.

Are you programming this in windows?
If so, you can try using MapView functions for additional space.
What is the target OS?

>>it means that I should allocate and deallocate about
>>3 mb space.
I really don't see any other way around this, unless you use MapView/shm_ functions or harddrive space.

If you have three 1000 elements and you want to merge all three, you have to allocate 3000 elements space to do it.
There's no way around that, unless the three blocks are in consecutive order to begin with.

LDC

You can allocate only minimal memory, but you may be much slower if you swap elements one by one (hardly any memory needed).
Can't you use a 100-element buffer? That way you could allocate only say 400 bytes and use this buffer to merge:
Swapping 1Mo would mean moving 400 bytes around 2500 times, which would probably be better than moving elements one at a time.
Again, did you try a poor o(N2) in-place sort algorithm? That may be faster in your case than all the O(NLogN) sort algorithms.

Did you try shell sort? it -> O(n) as you get closer to sorted. Worst is O(n^4/3) and average is O(n^7/6). It does not need extra memory, recursion, or any of that. I often use it on systems that cannot afford recursion and do not have the stl (c only). On a 1 ghz machine, for 7 million integers, here are some results for it:

all entries the same : 2.5 sec
sorted order, unique numbers: 2.5 sec
reverse sorted order: 3.5 sec
from here, it -> 7.1 seconds as the numbers go to unique and fully random order.

With the pockets, I would guess similar data would fall into the 4-5 second range on this machine, data, and my code.

Email me (jonnin@vol.com) if you want a highly optimized version.

The stl sort will beat this on many implementations, for many data sets, but for an in place sort with no recursion and fast time, this should do very well (besides, its only about 6 lines of code)...

LDC

I agree with jonnin. Didn't know shellsort, but that's the idea I tried to talk about. Try it, it should be fast for you.

Here is the code, for anyone who wants it, sorry about the mess; its C-- and as tweaked as I could make it.
I hope this shows up, sometimes copy and paste do not work...
And watch that size/2 if you play with the sequence, its a long story...

#ifndef jss
#define jss

typedef int stype ; //whatever

void ssort(stype *list , const int size);
void ssort(stype *list , const int size)
{
stype temp;
int i, j, dx=0;
int harr[] =
{
0, /*terminal condition for sort */

1,
10,
53,
105,
542,
1047,
6239,
16256,
56720,
134096,
579823,
1000000,
5430201,
999999999 /*terminal condition for find index */
};

while(size/2 > harr[++dx]); /*get index in sequence, +1 */

for(dx--;dx;dx--) /*while not 0 in sequence*/
{
const int hdx = harr[dx]; /*saves some access time! */
for( i = hdx, temp = list[i]; i< size; temp = list[++i])
{
for(j = i; (j >= hdx) && (list[j-hdx] > temp);
list[j] = list[j-hdx], j -= hdx);
list[j] = temp;
}
}
}
#endif

BadMoodGuy

I would recommend trying a heap sort. It is in-place, non-recursive, and O(n log n) with a worst case equal to average case. Most importantly, it works great when large "chunks" of the array are already sorted.

Here is an example in Java:

public class Sort
{
public static final void heapSort (Comparable[] a, int n) {
if (n < 2) return;
for (int i = (n/2)-1; i >= 0; i--) {
demote (a, n, i);
}
for (int i = n-1; i > 0; i--) {
swap (a, i, 0);
demote (a, i, 0);
}
}

private static void demote (Comparable[] heap, int size, int parent) {
Comparable item = heap[parent];
int child;
while ((child = (2*parent)+1) < size) {
if (child+1 < size && heap[child].compareTo (heap[child+1]) < 0)
++child;
if (item.compareTo (heap[child]) < 0) {
heap[parent] = heap[child];
parent = child;
} else
break;
}
heap[parent] = item;
}

private static void swap (Object[] a, int i, int j) {
Object tmp = a[i];
a[i] = a[j];
a[j] = tmp;
}
}

You'll want to optimize it to your particular implementation when you code it in C++.

ASKER

Hi.
I tried to use merge, quick, heap and shell sorts and still stl::sort is in 3 times faster.

Maybe someone has an idea of algorithm according to my needs ?

You are not going to beat the stl sort, unless you are willing to spend a LOT of time at this. It is (likely) an assembler optimized version of intro sort or quicksort. It probably recurses so far and then switches to inserion to clean up. If you want to beat it using your own code, you will have to become an expert on the subject AND implement the thing in optimal assembler code for your platform. Data type may also make a difference. If you can't use the stl, you are going to have to live with the best you can get from the posted methods...

What were the times on the data for the different sorts?
Did you turn on all optimizations and turn off all debugging? This will affect the coded ones but not the stl one (or not as much).

something to try:
Stop quicksort at 10% left to recurse and use insertion or shell on the whole thing from there. (my shell code, with just 0,1,999999999999 in the harr array is insertion sort).

Its much faster, by 10% or so, still not a 3x speedup...

some reasons:
function calls are EXPENSIVE
so your recursive sorts will never be as fast as an iterative version of these algorithms. And if you call an external swap or helper function, its over.

jumps are expensive. Each loop costs a jump, and each conditional does too.

Condtions are expensive, because of the jump and data shuffle to get the things to compare.

Swaps are also expensive. Try the xor swap for ints, if that is the data type.

These are integers? if so, look up bucket sort. it is O(n) but costs a LOT of memory. See time-space tradeoff theory.
Basically, you place numbers in arrays based on their most sig digit, then again on next most sig digit, ...
then put back. Humans sort this way, for example sort alpha by a-z piles, the sort the a pile, the b pile, ...
This one MAY beat the stl, but it will depend on how you code and what you know about the data.

But all this aside, go with the stl sort if its fastest and you can use it. You will save a lot of time, and it will be very much the law of diminishing returns even if you beat its time consistently. If you can't use it, live with one of these methods or get into some assembler coding. You can cut the time on most code by 50% or more using assembler, but it will be hard work. Do not use inline asm for speed; make an obj from your assembler and link it in using extern. Inline asm can be slower as many compilers do a lot of safety pushing and popping before allowing the asm code. With any luck, you may find the asm code on the net, I did not look (and it wouldnt help me, different chips same code issues)...

ASKER

I didn't need to use asm code, I use all optimizations of MS Studio and I hope it is good enough.
My point is that stl::sort is good, the best that I found but it is still a general algorithm, and I think that in my case I can write better algorithm for this specific purpose.
stl time is 150 ms
others about 400-500 ms
size of array is ~200,000-300,000 elements
each element is 8 byte

array has large sorted chunks about 1000-20,000 elements.

ASKER

I didn't need to use asm code, I use all optimizations of MS Studio and I hope it is good enough.
My point is that stl::sort is good, the best that I found but it is still a general algorithm, and I think that in my case I can write better algorithm for this specific purpose.
stl time is 150 ms
others about 400-500 ms
size of array is ~200,000-300,000 elements
each element is 8 byte

array has large sorted chunks about 1000-20,000 elements.

Where can I find merge algorithm without additional space, described in Cormack ?

I gave it bucket a try, here are some results:

For the 7 million random integer test, from -100 to 100 in value, this took only 0.3 seconds! Moving up to +- 10000, no time increase.

+-100k = 1.2 seconds, the temp array is getting big...
+-1 million takes 2.5 seconds.

stl was @4 max seconds for fully random, decreases for any ordering / repeated values.

still, this will blow the stl away for ints if your values will fit, maybe you can do something with it...

void bsort(int * a, int size, int pmax);
// pmax is the abs val of the largest int in the data...
void bsort(int * a, int size, int pmax)
{
/*bucket sort for data of -pmax ... pmax valued ints*/
int x, dx;
int *record = (int*) malloc(sizeof(int) * pmax *2);
memset(record,0, pmax*2*sizeof(int));
for(x = 0; x < size; x++)
{
record[a[x] + pmax]+=1;
}

dx = 0;
x = 0;
while(x < pmax*2)
{
if(record[x] != 0)
{
a[dx] = x - pmax;
dx ++;
record[x] --;
}
else
x++;
}
free(record);
}

ps I think this is bug free, but its only 10 min work, so check me. It sorts (checked manually) small arrays, and a loop that prints an error if array[x] > array[x+1] was used to check the large arrays (it never printed, so I guess its ok ...)

I gave it bucket a try, here are some results:

For the 7 million random integer test, from -100 to 100 in value, this took only 0.3 seconds! Moving up to +- 10000, no time increase.

+-100k = 1.2 seconds, the temp array is getting big...
+-1 million takes 2.5 seconds.

stl was @4 max seconds for fully random, decreases for any ordering / repeated values.

still, this will blow the stl away for ints if your values will fit, maybe you can do something with it...

void bsort(int * a, int size, int pmax);
// pmax is the abs val of the largest int in the data...
void bsort(int * a, int size, int pmax)
{
/*bucket sort for data of -pmax ... pmax valued ints*/
int x, dx;
int *record = (int*) malloc(sizeof(int) * pmax *2);
memset(record,0, pmax*2*sizeof(int));
for(x = 0; x < size; x++)
{
record[a[x] + pmax]+=1;
}

dx = 0;
x = 0;
while(x < pmax*2)
{
if(record[x] != 0)
{
a[dx] = x - pmax;
dx ++;
record[x] --;
}
else
x++;
}
free(record);
}

ps I think this is bug free, but its only 10 min work, so check me. It sorts (checked manually) small arrays, and a loop that prints an error if array[x] > array[x+1] was used to check the large arrays (it never printed, so I guess its ok ...)