asked on

SPEED CHALLENGE - SORTING

This is, i hope, to be similar to WorkShop_Alex's challenge. The aim is to write the fastest sort possible with the following procedure:

sort(arraylength,arraymaxnum:integer); where arraylength is the length of the array and the arraymaxnum the largest integer possible in the array and the population of the array using:

Item := VarArrayCreate([0, arraylength], varInteger);
//POPULATE ARRAY
Randomize;
For X:=0 to arraylength do
begin
item[x]:=random(arraymaxnum);
end;

You must initiate the sort like so:

QueryPerformanceFrequency(freq); {Get counts/second}
QueryPerformanceCounter(start); {Get string count}
sort(10, 100) //values can be changed

Your sort must have showresult; at the end to get the details

The following unit has my testing procedure (we can configure thbis if it doesn't work properley) and my first sort algorithm for you guys to try and beat ;-). Currently my sort speed is approx 350 ticks.

unit Unit1;

interface

uses
Windows, Messages, SysUtils, Variants, Classes, Graphics, Controls, Forms,
Dialogs;

type
TForm1 = class(TForm)
procedure FormCreate(Sender: TObject);
private
procedure sort(arraylength,arraymaxnum:integer);
procedure showresult;
public
start,stop1,freq:int64;
TotalTicks,NumASec: int64;
Ticksfor1,Timefor1: real;
end;

var
Form1: TForm1;

implementation

{$R *.dfm}

procedure tform1.showresult;
begin
QueryPerformanceCounter(Stop1); {Get end of loop2 count}
TotalTicks:= (stop1-start);
Timefor1:= TotalTicks / freq;
NumASec:= Trunc(1{second} / Timefor1);
If freq>0 then
showmessage('Sort took := '+
inttostr(TotalTicks)+
#13+' Ticks for 1 Call:= '+FloatToStrF(Ticksfor1,ffNumber, 6,3)+
#13+' Ticks per second:= '+inttostr(freq)+
#13+' Time in micro seconds (millionths) for 1 call:= '+FloatToStrF(1e6*Timefor1,ffNumber, 4,8))
else showmessage('No hardware timer available');
end;

procedure tform1.sort(arraylength, arraymaxnum:integer);
var Item:variant ;
X,y,temp:Integer;
result:string;
begin
Item := VarArrayCreate([0, arraylength], varInteger);
//POPULATE ARRAY
Randomize;
For X:=0 to arraylength do
begin
item[x]:=random(arraymaxnum);
end;

//SORT
Y:=-1;
X:=arraylength+1 ;
repeat
dec(x) ;
y:=-1;
repeat
inc(y) ;
if item[y]>item[x] then
begin
temp:=item[y];
item[y]:=item[x];
item[x]:=temp;
end;
until y>=x ;
until X<=0;
//End SORT

showresult;

For X:=0 to arraylength do
begin
result:=result + ' ' + inttostr(item[x]);
end;
showmessage(result);
end;

procedure TForm1.FormCreate(Sender: TObject);
begin
QueryPerformanceFrequency(freq); {Get counts/second}
QueryPerformanceCounter(start); {Get string count}
sort(10, 100)

end;

end.

Regards,

Hypoviax

Hypoviax

ASKER

I'll just add we should test a few ranges:

1. 0-10
2. 0-100
3. 0-1000
4. 0-10000

My results with max integer of 100 (probably grossley slow ;-) ):

1. 345
2. 7642
3. 646902
4. 127752446

Hypoviax

Wim ten Brink

Hmmm. Sorting arrays the fastest way is easily done by just using a QuickSort routine, and a simple example of this routine is found in the Classes unit, for sorting stringlists. However, this Delphi source contains a recursive version of the QuickSort routine, while it would even be faster if you use a non-recursive version. And of course the QuickSort routine in the Classes unit does a callback to a compare routine, thus allowing anything to be compared. But the additional function call would slow it down again.

I also want to note that you could measure the time it it takes to finish the procedure but this speed depends heavily on the system used to run the test. The Pentium IV 2.4 GHz at work would be a lot faster at this than my Pentium III 866 MHz at home. But if you would compare processor ticks then the clockspeed would not matter much, since it will take about the same amount of ticks on every processor. The only difference is that one processor executes ticks faster than the other... :-)

It's fun though that you use a variant for the dynamic array. It would be slightly faster by using an array of integers instead. (Less pointer-handling.)

I hope my hints provided enough information to get a real fast solution here. :-) But I'm just saying that the fastest sorting solutions are already found. Just not always implemented. (And be honest, if you only have 20 records to sort, then it doesn't matter much to save one-tenth of a millisecond by optimizing the sorting routine. Actually, with a low amount of records (1-10) the quicksort might actually be a bit slower than other solutions...

Russell Libby

Hypoviax,

I would agree with most of the comments made by Workshop_Alex, as the timing above is a little too machine dependant. Regarding the sorting aspect, given the fact that we have the highest number in the array, and provided that the highest number is relatively small (eg, less than 32768), then a radix type sort would actually be faster than a quicksort. Anyways, here are some examples of sorting routines, and also of variant array locking which will also help decrease the time required to sort the array.

Regards,
Russell

// Used for data handling
type
PIntArray = ^TIntArray;
TIntArray = Array [0..Pred(MaxInt shr 2)] of Integer;

// QuickSort - An optimal "all around" sort routine. Performs well under most, if not all
// conditions.
procedure QuickSort(Data: PIntArray; L, R: Integer);
var i, j: Integer;
p, t: Integer;
begin

repeat
i:=l;
j:=r;
p:=Data^[(l + r) shr 1];
repeat
while ((Data^[i]-p) < 0) do Inc(i);
while ((Data^[j]-p) > 0) do Dec(j);
if (i <= j) then
begin
t:=Data^[i];
Data^[i]:=Data^[j];
Data^[j]:=t;
Inc(i);
Dec(j);
end;
until (i > j);
if (l < j) then QuickSort(Data, l, j);
l:=i;
until (i >= r);

end;

// Linear type sort - Not much faster than a bubble sort, but is fine for sorting small
// amounts of data (requires less memory and uses no recursion)
procedure LinearSort(Data: PIntArray; ArrayLength: Integer);
var dwCount: Integer;
dwIndex: Integer;
dwTemp: Integer;
dwPos: Integer;
begin

for dwCount:=0 to Pred(ArrayLength) do
begin
dwPos:=dwCount;
for dwIndex:=Succ(dwCount) to Pred(ArrayLength) do
begin
if (Data^[dwIndex] < Data^[dwPos]) then dwPos:=dwIndex;
end;
if (dwPos > dwCount) then
begin
dwTemp:=Data^[dwCount];
Data^[dwCount]:=Data^[dwPos];
Data^[dwPos]:=dwTemp;
end;
end;

end;

// Radix type sort - excellent when the highest number in the array is known and is
// relatively small
procedure RadixSort(Data: PIntArray; ArrayLength, ArrayMaxNum: Integer);
var lpTemp: Array [0..32767] of Integer;
dwIndex: Integer;
dwCount: Integer;
dwTemp: Integer;
begin

// Check ArrayMaxNum
if (ArrayMaxNum < 32768) then
begin
// Clear the temp array
FillChar(lpTemp, SizeOf(lpTemp), 0);
// Count frequency
for dwIndex:=0 to Pred(ArrayLength) do Inc(lpTemp[Data^[dwIndex]]);
// Sort the array
dwIndex:=0;
for dwTemp:=0 to ArrayMaxNum do
begin
// Get frequency count for slot
dwCount:=lpTemp[dwTemp];
while (dwCount > 0) do
begin
Data^[dwIndex]:=dwTemp;
Inc(dwIndex);
Dec(dwCount);
end;
if (dwIndex = ArrayLength) then break;
end;
end;

end;

// Generic sort procedure that attempts to pick the most optimal sorting routine
procedure Sort(ArrayLength, ArrayMaxNum: Integer);
var vArray: Variant;
lpArray: PIntArray;
dwIndex: Integer;
begin

// Create variant array
vArray:=VarArrayCreate([0, Pred(ArrayLength)], varInteger);

// Lock that bad boy ;-)
lpArray:=VarArrayLock(vArray);

// Resource protection
try
// Seed random number generator
Randomize;
// Fill the array
for dwIndex:=0 to Pred(ArrayLength) do lpArray^[dwIndex]:=Random(ArrayMaxNum);
// Determine sort routine to use
if (ArrayLength < 20) then
LinearSort(lpArray, ArrayLength)
else if (ArrayMaxNum < 32768) then
// Radix sort
RadixSort(lpArray, ArrayLength, ArrayMaxNum)
else
// Quick sort
QuickSort(lpArray, 0, Pred(ArrayLength));
finally
// Unlock the variant array
VarArrayUnlock(vArray);
end;

end;

Hypoviax

ASKER

Rlibby,

Your results as on my machine (AMD AthlonXP 1500)

0-10 : 211
0-100 : 856
0-1000 : 933
0-10000 : 1573
0-100000 : 8171

Very good, much much faster than my selection sort

Any improvements (what about the merge sort...) i'd like to see. I'll provide my next sort soon to see how it compares.

Regards,

Hypoviax

gwalkeriq

Try looking here if you want a generic high-speed qsort for Delphi arrays, I know it can sort a million integers in less than 1 second. It is not the fastest sort since it uses a function call for each comparison, but it is generic, just like the qsort you find in C libraries.

http://www.javacode.net/vb/scripts/ShowCode.asp?txtCodeId=1185&lngWId=7

Hypoviax

ASKER

I'm working on a new algorithm for a sort, it would be interesting to see, even if they arn't super fast, new sorting based algorithms based on the parameters i set. Can anyone beat rllibby? I'll test any submission and post the results

Best regards,

Hypoviax

Russell Libby

There is a a trade off for for being generic...

The routines mentioned above, while generic (which allows for sorting of all kinds of data), is about 3-4 times slower than the quicksort I already coded above which targets an integer array, and about 6-7 times slower than the radix/counting sort. note: radix/counting only applies if the the maxnumber falls within a certain range, so is a more specific implementation than quicksort.

What should be gathered from all of this is that there is no one "silver bullet" for sorting, period....

Knowledge of the data, and its size must be used to determine the best sorting approach for the current task at hand. For example, an array of 15 items can be sorted faster with an insertion sort/linear sort than it can with a quicksort, due to the overhead and recursion involved with quicksort. Now, if you start talking about 1 million items, then the story changes. This is due to the BIG "O" notation that is used to describe each of these algorithms.

eg:

Quicksort = O (n log2n) // Logarithmic
Bubble sort / selection sort = O (n2) // Quadratic

We can see from above that as the number of items increases, the selection sort will keep getting exponentially worse. The numbers show this as well:

Selection Sort Auto Fit (could not tell from comments what max
number was, so hard to tell if qsort or
radix was used)
------------------------------------------------------------------------------
1. 0-10 345 211 (Linear used)
2. 0-100 7642 856 (qsort or radix)
3. 0-1000 646902 933 (qsort or radix)
4. 0-10000 127752446 1573 (qsort or radix)
5. 0-100000 not reported 8171 (qsort or radix)

Also, if the data falls into a certain limited range, etc, then we can optimize the sorting even further to use radix type / counting sorts which improves performance even more, **IF*** the range is relatively small. Again, this has to do with the Big "O" notation associated with each. Radix/Counting types sorts are linear, whereas quicksort and mergesort are logarithmic, and linear will overtake logarithmic after a certain point.

Regards,
Russell

Hypoviax

ASKER

You are right Russel,

Merge and Quicksorts are N X log2 N algorithms whereas bubble, selection sorts are n2 algorithms. Still waiting for someone to beat you Russel ;-)

Regards,

Hypoviax

Wim ten Brink

Well, Russel... You could speed up that QuickSort routine by changing it from a recursive function to a non-recursive one. Too bad I'm running a bit out of time else I'd write it... :-P

There are two minor issues, of course. First of all we can and will have duplicate values in out list. Furthermore, the original question is about sorting a variant array full variants. However, since we have a known maximum value, we might even take a different approach of sorting the data... Here's the alternative version:

function RDTSC: Int64;
asm
dw $310F {BASM doesn't support RDTSC}
end;

procedure SortingStuff;
const
arraylength = 100000;
arraymaxnum = 1000;
var
Items: Variant;
List: array of Integer;
I: Integer;
Idx: Integer;
Ticks: Int64;
begin
Items := VarArrayCreate( [ 0, arraylength ], varInteger );
//POPULATE ARRAY
Randomize;
for I := 0 to arraylength do begin
Items[ I ] := random( arraymaxnum );
end;
// Create our list.
Ticks := RDTSC;
SetLength( List, arraymaxnum );
for I := Low( List ) to High( List ) do
List[ I ] := 0;
for I := 0 to arraylength do begin
Idx := Items[ I ];
List[ Idx ] := List[ Idx ] + 1;
end;
Ticks := RDTSC - Ticks;
WriteLn( 'This solution took ', Ticks, ' processor ticks.' );
// Done sorting... Display...
for I := Low( List ) to High( List ) do begin
for Idx := 1 to List[ I ] do
WriteLn( I );
end;
end;

Yep... This sorting version looks soooooooooooooooooooo simple and is probably quite fast too. Never bothered to time it, though... It took 23657308 processor ticks on my system. This for about one hundred thousand values. Sometimes, sorting can be defeated by other interesting techniques...

But what if the arraymaxnum would be a lot larger? Well, integers are 4 bytes. The dynamic array would be 4 bytes x arraymaxnum so even with a million possible values, it would not really run out of memory. I've changed the value 1.000 into 1.000.000 and saw my solution become about 5 times slower. That isn't a lot, really, but it might be enough to justify a sorting routine...

Wim ten Brink

Btw. If you time my solution using QueryPerformanceFrequency/QueryPerformanceCounter then you'll get 0.0 ticks! :-) This is because my solution is just too fast to be measured in ticks per second. I need around 23.7 million ticks on a system that runs at 2.4 billion ticks per second. It takes about 11.4 ticks per millisecond to sort a list of 100,000 records with a maximum value of 100,000... So, basically if you want to just sort a list of integers then my method defeats all other options... ;-)

Russell Libby

Workshop_Alex,

> Well, Russel(l)... You could speed up that QuickSort routine by changing it from a recursive function to a non-recursive one. Too bad I'm running a bit out of time else I'd write it... :-P

Iv'e already done it, and it does not speed the function up by any great amount. In truth, it shaved about 15 ms off a 1 million count sort. Not worth the hassle in my opinion, but I have the code that does it for those that may be interested. I also have code for mergesort, sedgesort (qsort+insort), as well as a few others. Anyways, what I have attempted to demonstrate via code are some common examples of linear, logarthimic, and quadratic sorting functions.

> So, basically if you want to just sort a list of integers then my method defeats all other options... ;-)

If you take a look at the code I already provided (RadixSort) then you will see that its essentially the same as what you have provided above. It uses a bucketing array to count the number of items in the range of 0..MaxArrayNum, then takes those numbers and places them (in order) back in to the original array. And this (linear) method WILL defeat other solutions... to a point. Due to its linear nature, as the maxarraynum increases, so does the processing time. At a MaxArrayNum of about 1 million or so, the quicksort function becomes slightly faster than the radix/counting sort.

Now....
If you compare your method against my previous implementation (in the same context), what you will ALSO find is that your code is about 4-5 times SLOWER than what I provided.

Regards,
Russell

procedure SortingStuff2;
const
arraylength = 1000000;
arraymaxnum = 1000;
var
lpTemp: Array [0..32767] of Integer;
vArray: Variant;
lpArray: PIntArray;
dwIndex: Integer;
dwCount: Integer;
dwTemp: Integer;
st, et: LongWord;
Ticks: Int64;
begin

// Create variant array
vArray:=VarArrayCreate([0, Pred(ArrayLength)], varInteger);

// Lock that bad boy ;-)
lpArray:=VarArrayLock(vArray);

// Resource protection
try
// Seed random number generator
Randomize;
// Fill the array
for dwIndex:=0 to Pred(ArrayLength) do lpArray^[dwIndex]:=Random(ArrayMaxNum);
// Mark time
st:=GetTickCount;
Ticks := RDTSC;
// Clear the temp array
for dwIndex:=0 to 32767 do lpTemp[dwIndex]:=0;
// Count frequency
for dwIndex:=0 to Pred(ArrayLength) do Inc(lpTemp[lpArray^[dwIndex]]);
// Sort the array
dwIndex:=0;
for dwTemp:=0 to ArrayMaxNum do
begin
// Get frequency count for slot
dwCount:=lpTemp[dwTemp];
while (dwCount > 0) do
begin
lpArray^[dwIndex]:=dwTemp;
Inc(dwIndex);
Dec(dwCount);
end;
if (dwIndex = ArrayLength) then break;
end;
finally
// Unlock the variant array
VarArrayUnlock(vArray);
end;

// Mark time
Ticks := RDTSC - Ticks;
et:=GetTickCount-st;

ShowMessage(Format('This solution took %d processor ticks (%d ms).', [Ticks, et]));

end;

------------------

procedure SortingStuff1;
const
arraylength = 1000000;
arraymaxnum = 1000;
var
Items: Variant;
List: array of Integer;
I: Integer;
Idx: Integer;
st, et: LongWord;
Ticks: Int64;
begin

Items := VarArrayCreate( [ 0, arraylength ], varInteger );
//POPULATE ARRAY
Randomize;
for I := 0 to arraylength do begin
Items[ I ] := random( arraymaxnum );
end;
// Create our list.

st:=GetTickCount;
Ticks := RDTSC;
SetLength( List, arraymaxnum );
for I := Low( List ) to High( List ) do
List[ I ] := 0;
for I := 0 to arraylength do begin
Idx := Items[ I ];
List[ Idx ] := List[ Idx ] + 1;
end;
Ticks := RDTSC - Ticks;
et:=GetTickCount-st;

ShowMessage(Format('This solution took %d processor ticks (%d ms).', [Ticks, et]));

end;

SOLUTION

Wim ten Brink

membership

This solution is only available to members.

To access this solution, you must be a member of Experts Exchange.

Start Free Trial

Wim ten Brink

MaxLength = 10 and MaxValue= 10.
8532 processor ticks required for Alex's solution.
189044 processor ticks required for Russell's solution.
2652 processor ticks required for Alex's second solution.
Wow...

MaxLength = 10,000 and MaxValue= 10,000.
677996 processor ticks required for Alex's solution.
651448 processor ticks required for Russell's solution.
574812 processor ticks required for Alex's second solution.
Okay...

MaxLength = 1,000,000 and MaxValue= 10,000.
29780776 processor ticks required for Alex's solution.
28292756 processor ticks required for Russell's solution.
28211212 processor ticks required for Alex's second solution.
Uhoh...

MaxLength = 10 and MaxValue= 25000.
603976 processor ticks required for Alex's solution.
363492 processor ticks required for Russell's solution.
407044 processor ticks required for Alex's second solution.
@#$%&...

Well, playing with the possible values is fun. This method is costly, though, when the value of the items can be very different. This is because the system needs to allocate more memory for the larger sort-table.

Russell Libby

To reiterate again what I have already stated:

What should be gathered from all of this is that there is no one "silver bullet" for sorting, period....

Knowledge of the data, and its size must be used to determine the best sorting approach for the current task at hand.

Regards,
Russell

aikimark

I'm sorry to have arrive late to this discussion. If not too late for the questioner, I would like to add/suggest the following:

1. copy the data to a tIntList and use the object's .Sort method that might offer some optimization

2a. reconfigure the bytes in the integer data from littleEndian (Windows/Intel format) to bigEndian (everyone else's format) and use a byte radix sort -- bottom up solution.

2b. use a variant of the American Flag sort on each byte (or two byte combinations) of the integer data, modifying this to optimize for maxvalue -- top down solution

3. for the maxvalues shown in the discussion thread, you should probably use a counting sort. You only need 4*maxvalue additional bytes for the counting array and may only need 2*maxvalue bytes, depending on the number of items you are sorting. This is an O(N) sort that is very fast.

4. dynamically create an optimized ASM object stream in memory and execute it instead of the busiest part of your Delphi loops. I read an interesting article about this method in a Delphi Informant article in the late 1990s. I think the author was a developer with N-sort.

Hypoviax

ASKER

This is what i like to see...

...raw competition. I'll test the solutions on my machine to get a controlled result (unless WorkShop_Alex you want to test). I don't mind variations in using the variant. If you need to speed up solutions you can use the array of integers instead.

Hypoviax

Russell Libby

The competition aspect of this is good, I agree...

But what I have been trying to emphasize is that the performace "testing" at this point is very subjective. For utlra small data sets a quadratic (but tightly written) sort will beat other sorts. For reasonable value ranges, a counting/radix type (linear) sort will beat others. For large amounts of data a logarithmic sort like qsort will outperform the others.

Without a clear and defined testbed, anyone can make ANY of the above sort routines look like the "best", but in fact:

1.) They are all good at certain types of sorting
2.) None of them could be considered perfect for ALL scenarios

Speaking only for myself; I believe it is more important to have the knowledge of "when" to use the different types of sorts, what works well, and when.

----

Regards,
Russell

Hypoviax

ASKER

I will use WorkShop_Alex's unit for testing as it seems this is the best way:

Results maxvalue 1000:

0-10 :

Alex Solution1 : 21938
Russel : 260745
Alex Solution2 : 10640

0-100 :

Alex Solution1 : 25513
Russel : 271152
Alex Solution2 : 12382

0-1000 :

Alex Solution1 : 43384
Russel : 291959
Alex Solution2 : 29299

0-100000 :

Alex Solution1 : 1603945
Russel : 1817554
Alex Solution2 :1547393

0-1000000 :

Alex Solution1 : 20439284
Russel : 20438833
Alex Solution2 :18273701

Hypoviax

aikimark

BTW...the article is by Jay Cole and appeared in the October 97 issue:
http://www.delphizine.com/features/1997/10/di199710jc_f/di199710jc_f.asp

aikimark

since a prior version of Nitrosort was open-sourced, you might benefit from some of their techniques

Hypoviax

ASKER

Russel,

I agree. But it would be interesting if there was a good allround sort - especially if the size and maxnum is not known. You can use the original method you posted (the combination) - if that is faster than your newer solution. Is there a better way to test?

Regards,

Hypoviax

Hypoviax

ASKER

Sorry, its a members only article aikimark,

Hypoviax

Hypoviax

ASKER

I'll just add as a comparison my simple selection sort results:

0-10 : 58274
0-100 : 3752968
0-1000 :694607400
0-10000 : still waiting....
0-100000 : not even game to try

Regards,

Hypoviax

aikimark

Even with a million as maxvalue, the counting sort only requires 4MB for its array.

Usually, the most highly optimized Quicksort algorithms revert to a less efficient sort, O(N^2), when the list size is trivial. I usually think of trivial lists of from 1 to 7 items, although I've seen this definition stretched up to 35 items.

========================
There are potential problems with measurements and comparisons. You've already identified differences in CPU speeds and processes that finish too quickly to measure. Also, Windows has a limited granularity in it's time measurements (ticks is a small unit, but still not granular enough to be considered 'atomic'. The timer units are woefully large at 55ms).

One potential solution is to use an external clock that is accurate in the nano-second range. We could use an external feed of highly accurate clock times from one of the worldwide public atomic clock feeds.

Another problem is the operating system itself. Getting accurate timings from a multi-tasking OS can be tricky at best. Windows mades this even more difficult with limited isolation and prioritization options. You might want to recompile the application as a DOS application and run it without the baggage and overhead of Windows. I'm not sure about the options under Linux, but you might be able to get decent isolation and timing data with a Kylix compilation.

aikimark

Hypoviax,

>Sorry, its a members only article aikimark,

I'm not sure I understand your statement. Aren't we all EE 'members'?!?

ASKER CERTIFIED SOLUTION

Russell Libby

membership

This solution is only available to members.

To access this solution, you must be a member of Experts Exchange.

Start Free Trial

aikimark

Russell,

Thanks for the clarification. And, as always, I defer to your superior Delphi coding skills and experience.

One last thing to suggest for a Quicksort optimization is the "middle of three" pivot selection. In this variation, three items in the list are randomly selected and the middle one is used as the pivot for that round. You can get a 14% (avg) improvement over the single-pivot versions of Quicksort.

Also, beware of bad, O(N^2), Quicksort performance when there are a large number of duplicate values. This might occur with a much larger Maxlength value compared to MaxValue. In this case, you would do well to check the current pivot value with the previous iteration's pivot point value. If there are lots of duplicate values, a special pass of the items might separate the items into three ranges (<, =, >) instead of two ranges.

One such (three range) algorithm is the terinary sort described by Jon Bentley. Jon used this algorithm to quickly and efficiently sort string items. I think it could be used to sort integer data.

========================
For future readers of this discussion thread, you might want to pick up a copy of Rod Stephens book "Ready to Run Delphi 3.0 Algorithms." There's a good section on sorting and source code on the companion CD. If you were a reader of Delphi Informant, you might have already read some of the content of this book when reading Rod's published articles.

Hypoviax

ASKER

This is my second attempt for a faster solution. It will often outperform when the maxnum is quite high:

procedure Hypoviaxsort( var Items: Variant; MaxLength, MaxValue: Integer );
var i, j : integer;
tempr : Integer;
flag : boolean;
Ticks: Int64;
begin
Ticks := RDTSC;
for i:=maxlength-1 downto 0 do
begin
tempr := items[i];
j := i+1;
flag := true;
while (j<=maxvalue) and flag do
if tempr > items[j] then
begin
items[j-1] := items[j];
j := j+1
end
else
flag := false;
items[j-1] := tempr
end ;
Ticks := RDTSC - Ticks;
WriteLn( Ticks: 18, ' processor ticks required for Hypoviax''s second solution.' );

end;

I will read through everybodys comments soon. I have run out of time today.

Best Regards,

Hypoviax

aikimark

Hypoviax,

In reading your most recent comment, I will recommend that you should replace the inner loop with a single 'block copy' (Delphi copy function or maybe the memcopy or RtlMoveMemory API calls) that will copy the contiguous block of integers one location, making room for the moved item. You will still need to do the comparison of items to find the correct position.

http://www.delphibasics.co.uk/RTL.asp?Name=Copy

===============================
For the O(N^2) algorithms, you might have to do many runs and project times for very large maxvalue or maxlength values using statistical extrapolation.

===============================
Missing from my earlier comment are several contributors to performance differences beyond clock speed:
* CPU cache size and speed
* memory speed
* bus speed
* architecture (Intel vs. AMD vs. transmeta)
* temperature -- Intel automatically slows down if it gets dangerously warm

Hypoviax

ASKER

Russel's results for the last post:

0-10: 3933
0-100:23888
0-1000:16650
0-100000:18579267
0-1000000:20363031

Alex's

0-10 :Alex Solution1 : 21938
Alex Solution2 : 10640

0-100 :Alex Solution1 : 25513
Alex Solution2 : 12382

0-1000 :Alex Solution1 : 43384
Alex Solution2 : 29299

0-100000 :Alex Solution1 : 1603945
Alex Solution2 :1547393

0-1000000 :Alex Solution1 : 20439284
Alex Solution2 :18273701

Looks like there are 2 winners!

Regards,

Hypoviax

Hypoviax

ASKER

My results for my last sort:

0-10 : 11291
0-100: 92825
0-1000: 585722
0-100000 :71662595
0-1000000: 781265785

A substantial increase in speed for me, but nothing comparable to you guys. Russel's sort seems to be the fastest for 0-10 and 0-1000 and Alex's sort (solution2) seems to be the fastest from 0-100, 0-100000 and 0-1000000.

Unless there are further posts regarding solutions (I'm happy to wait, i have plenty of time now), i will split the points up appropriately. Please tell me when you guys believe there is no more to be posted.

Best Regards,

Hypoviax

Hypoviax

ASKER

I might add, when the maxnum is 10000 it looks like my solution is the fastest from 0-10 and 0-100 ;-) (other than that it is not comparable)

regards,

Hypoviax

SOLUTION

aikimark

membership

This solution is only available to members.

To access this solution, you must be a member of Experts Exchange.

Start Free Trial

Wim ten Brink

@Hypoviax, it's your contest, you should do the testing. However, please try to measure the time used with the RDTSC function from my code since it's more exact than those performance counters. RDTSC are just the exact number of processor cycles. Not some value per second...
Also keep in mind that Russel has a flaw in his code. He's clearing too much records in his code, if the MaxValue is too low. If he would only clear the real amount of memory then his solution would be a lot faster.

And you know the size of the list. If you don't know the maximum number then you could always walk just once through the array to find that maximum value. It does slow down things a bit more but it would save on speed if you get a reasonably low MaxValue account since you don't have to allocate that much memory in that case. It might be a consideration to find the minimum and maximum values in the list prior to running the proper sorting routine. That way, you could select the most optimal sorting routine. The counting sort is great with a low maximum value while a QuickSort might be better for larger lists with a high maximum value.

@Russell, you might speed up some of your sorting routines that use recursion simply by removing the recursive calls and use a non-recursive version of those sort routines. A recursive routine is a bit expensive in performance simply because Delphi must do a lot of function calls in those cases.

@Hypoviax, you might make the test even more interesting by using a test list that does NOT contain any duplicate values. (Which means that MaxValue must be larger than MaxLength.) The counting sorts get it's biggest advantage just because there are quite a few duplicates in the file.

Now, something interesting. A counting sort that doesn't know anything about the length or maximum value. It's not the fastest solution but it's an interesting example showing how to do it.

procedure SortingStuffWA3(var Items: Variant);
var
List: array of Integer;
I, J, K: Integer;
Ticks: Int64;
lpArray: PIntArray;
MaxLength, MaxValue: Integer;
Value: Integer;
begin
Ticks := RDTSC;
MaxLength := VarArrayHighBound(Items, 1) - VarArrayLowBound(Items, 1);
MaxValue := -1;
lpArray := VarArrayLock(Items);
for I := 0 to MaxLength do begin
Value := lpArray^[I];
if (Value > MaxValue) then begin
SetLength(List, Value + 1);
while (MaxValue < Value) do begin
Inc(MaxValue);
List[MaxValue] := 0;
end;
end;
inc(List[Value]);
end;
K := MaxLength;
for I := High(List) downto Low(List) do begin
for J := 1 to List[I] do begin
lpArray^[K] := I;
Dec(K);
end;
end;
VarArrayUnlock(Items);
Ticks := RDTSC - Ticks;
WriteLn(Ticks: 18, ' processor ticks required for Alex''s third solution.');
end;

This method is slow when it's already sorted from low to hugh and fast if the original list is ordered from high to low. The reason for this is because this function just allocates as much memory as needed and adds more if it needs more. The biggest advantage is that this sort method just needs the variant array to work with and nothing more. But this flexibility comes with a price which is a slower speed. The speed of this routine is highly dependant of the data it contains and thus not reliable if you want a fast sorting routine. But it is a fast one if you don't know the maximum value...

Hypoviax

ASKER

Workshop_Alex,

I have been using your RDTSC function when measuring the performance of code for every solution. It seems however, we have lost our other major competitor (he said he has made his last post). So we can either go ahead and try and do one more round with the fact of not knowing max length or num or leave it here. It would be interesting to see performance differences, however.

regards,

Hypoviax

aikimark

Alex,

What impact does the existence of (Intel) hyperthreading have on the RDTSC measurements?

In fact, would this contest be any more interesting if we were to spawn another thread on the other processor(s), actual or virtual, and sort with parallel processes?

I wonder, with your most recent counting sort example, if the performance figures you see aren't really a result of the use of a variant array instead of a pure integer array. Please excuse my ignorance if I have misunderstood the code.

Wim ten Brink

Hmmm. Hyperthreading is one thing that troubles me so I actually don't know. The same is true about checking the performance on a multi-processor system. It could be even worse on a dual-hyperthreading-proceesor system. :-) But in general you can specify that your application will only run on one specific processor, thus a processor with hyperthreading could be told to use only one "emulated" processor for the application.
Then again, if all tests are done on a hyperthreading or multi-processor system then the comparisons will still be fair values.

For my latest counting sort I did have to do what the rules told me to do. I had to sort items in a variant array. If I had used a dynamic array of integers, the speed would be better. A static array would perform even faster. But data in a variant array can be locked so you can access it as a large block of data. And that is what is used in most sort examples given above. The performance decrease however comes from the dynamic array that I might have to resize once in a while. But if the first element in the array has the highest value then it will perform at it's top speed.

Furthermore, don't forget that a counting sort doesn't always be the fastest solution. It also has another flaw since it can only sort positive numbers. If you would also want it to sort negative numbers then you'd also have to determine the minimum value. And you'd have to create an array with values from the Minimum value to the Maximum value.

People have been experimenting with finding the best sorting routines even before the computer existed. The problem however is that the sorting speed is highly dependant of the data that is used to sort with.

@Hypoviax, my last version isn't meant to be part of a new speed test. It's only there to show how to solve the problem of an unknown maximum value. And of course the problem you could encounter if you're trying to use it. Say, the data you want to sort is this: 9, 9, 8, 6, 5, 3, 3, 2, 2, 0. Thus, when my latest version tries to process this, it will set up a count list for (9+1) elements, since the first element is used as MaxValue. When it continues to read the data, it will not find any new value greater than the MaxValue so the array is set at the optimum length from the beginning.
But if the data was actually sorted in opposite order: 0, 2, 2, 3, 3, 5, 6, 6, 8, 9 then it will be very slow. Because the first MaxValue is 0, thus it creates an array for (0+1) elements. The next element raises MaxValue to 2 so it needs to add two more elements to the dynamic array. Then another to raise it for MaxValue 3, two more for MaxValue 5, etc, until it finally reaches it's optimal size at the end. All this resizing consumes a lot of time so in such a case it is pretty inefficient. A QuickSort would be more efficient since it would not have to resize any arrays..

Russell is just correct when he says that the sorting method used is highly dependant of the data that you want to sort in the first place. The count sort is actually only useful if you're sorting just numbers and don't have a high maximum value and no negative values. It is therefore quite limited. The QuickSort routine is very popular since it can be used for all kinds of data and it will perform quite fast in general.

If you really want to have a fast sorting solution then you need to know quite a lot of information about the data you want to sort. Preferably the data type, the variance in different values, the amount of data elements and also the number of duplicate values. And to execute a good test of the sorting routines you will need lots and lots of testdata to test each and every solution with. You would, for example, need a list that is already sorted and contains no duplicates. And another list without duplicates that is sorted in reverse order. A list with only one single value. A list with two values that both occur a lot of times. You need small lists, large lists, lists with a low maximum value, lists with a high maximum value. All in all, you need to generate quite a few lists to do a good test with.
Furthermore, you should test every method with all those lists more than once per list. Preferably 5 times, dropping the highest and lowest result and calculating the average over the other three results. Maybe display it all in some graph too. Very complex to do it right. :-)

Hypoviax

ASKER

Alex,

I understood, your last comment as not being part of the speed test, it is most interesting however. I will award points after 2-3 days unless people want to continue. I will do a final test on all solutions (including mine and your last one as a comparison only) taking into account your comments; i will do the test 5 times and caculate an average. My machine is an AMD so the HT discussion will not apply to my results

Regards,

Hypoviax

aikimark

Alex,

I didn't see anywhere that the items had to be sorted "in place" and couldn't be copied to another data structure for sorting and then copied back to the variant array. It just seemed that some of the algorithms' performance figures might benefit from a different data structure.

Since we are trying to use the optimal algorithm to fit a variety of data, some of the performance profiles (similar to Russell's code) might be used to gauge the appropriateness/desireability of the counting sort. Since we want to best the Quicksort profile, O(N logN), we need to predict the likely counting sort performance, based on the following factors:
1. the number of additions (=N) during the counting
2. the number of additions (=MaxValue+1) during output
3. the number of comparisons during the output phase (=MaxValue+N)
4. relative cost of comparison and addition to similar 'atomic' operations in the other algorithms.

Hypoviax

ASKER

Final Results:

AVERAGES:

TEST: MaxLength of 10 with a max value of 1000

RUSSEL : 5610
ALEX : 19483
HYPOVIAX: 16477

WINNER : RUSSEL

TEST: MaxLength of 100 with a max value of : 1000

RUSSEL : 24450
ALEX : 23500
HYPOVIAX: 98640

WINNER : ALEX

TEST: MaxLength of 1000 with a max value of : 1000

RUSSEL : 46500
ALEX : 31515
HYPOVIAX: 959132

WINNER : ALEX

TEST: MaxLength of 100000 with a max value of : 1000

RUSSEL : 1522955
ALEX : 1300142
HYPOVIAX: 110902858

WINNER : ALEX

TEST: MaxLength of 1000000 with a max value of : 1000

RUSSEL : 38756156
ALEX : 17860298
HYPOVIAX: 554318985

WINNER : ALEX

From these averages (5 tests for each length change) the winner looks to be Alex. However,

Much the same ranks result when the max value is 100

Regards,

Hypoviax

Wim ten Brink

Personally, I consider Russell the winner because he came with several different and interesting sorting mechanisms. You should actually try them all in this competition and then you might notice that sometimes, other sorting methods will win. I found my copy of "Ready to run Delphi 3.0 Algorithms" again and the book shows some interesting comparisons. The book makes it clear too that the sorting speed is highly dependant of the data it contains.

@aikimark, the best sorting mechanism is the counting sort, since it's a simple O(M+N) system, compared to O(NxLog(N)). And M is the maximum value in the list and N the number of items. Thus, a large N and a small M will be fast. A small N and a large M might actually be very slow. Slower than a Quicksort for the same value of N. But in it's turn, a Quicksort is slow when you have many duplicate values, in which case it might be beaten by a bubblesort. Thus, the best sorting mechanism depends on the data you need to sort. But in general, a QuickSort will do the trick quite nicely.

However, I wished someone would come up with a non-recursive QuickSort routine, which should be even faster... :)

aikimark

Aren't all Delphi developers "winners"?!? :-)

@Alex, you deserve some credit for the RDTSC function. It's pretty slick.

Hypoviax

ASKER

I think i will hand the points out now. Thanks greatly for participating in such an interesting competition. I have learnt heaps and hope others have too.

Hypoviax

ASKER

If i could only set the points higher i would. Thanks to all participants

Regards,

Hypoviax

p.s will look for the non-recursive Quick sort. If found i'll post it here.

Wim ten Brink

@aikimark, Naaah... The RDTSC is a well-known function by now for timing purposes. It's just easy to forget again. :-) It was introduced with the first Pentium processors but never become part of Delphi. It would be the same as rewarding someone for inventing the lightswitch.

@Hypoviax, I have some good ideas about how to write a non-recursive QuickSort routine. Basically, it's similar to enumerating folders in a non-recursive way. You do this by using a dynamic list. Each list entry should contain the index of the first and last item of the list that needs to be sorted. You add Low() and High() as first values to that list and whenever you would call the recursive Quicksort routine, you would now just add a new entry to the list. All you have to do is walk through the list from the first to the last element and then the list you needed to sort is sorted.
When you would do a recursive sort, you're doing something similar. The only difference is that you don't use a list to keep track of what needs to be sorted but you call the function again with new parameters that end up on the stack. It's quite similar.

In https://www.experts-exchange.com/questions/21195960/Listing-all-the-files-in-a-directory.html I have added a non-recursive routine that enumerates files, folders and subfolders. If I had time, I would write something similar for the QuickSort. It would be faster than the recursive Quicksort since you don't have the overhead of the additional function calls but the final code would look quite complex. Using a list to sort another list. Interesting... :-)

Hypoviax

ASKER

Not sure i understand the code but here is a sample of the non-recursive quicksort which may interest you Alex:

http://www.bsdg.org/SWAG/SORTING/0066.PAS.html

Regards,

Hypoviax

aikimark

Hypoviax,

Thank you for the points...quite unexpected for not having written any Delphi code. Kibbutz points :-)

Hypoviax

ASKER

Well, you did make significant input into the maths of it all, and made worthy suggestions of improvement to Russel, Alex and myself so i thought you should benefit from some of your efforts.

Regards,

Hypoviax

Russell Libby

Been away for a few days, but glad to see that this question is now closed. ;-)
(thanks for the points)

I also see that there were a few comments directed towards me, and how I could make certain improvements to my code. To be truthful, I would be more inclined to research/implement/etc these "improvements" had there been code that went along with the suggestions. Talking is one thing, doing is another; No offense directed towards anyone...

I will also re-iterate what I previously stated on the recursive functions vs their iterative versions:

-- There is little to be gained by going this route --

for example,

MaxLength = 100,000
MaxValue = 1,000,000

100 Iterations using SedgeSort (recursive)

RDTSC / ms
6686967768 : 2656ms to complete 100 iterations
6694915724 : 2656ms to complete 100 iterations
6569867360 : 2610ms to complete 100 iterations

100 Iteartions using SedgeSortNR (iterative)

RDTSC / ms
6702981572 : 2656ms to complete 100 iterations
6678711928 : 2641ms to complete 100 iterations
6519984612 : 2594ms to complete 100 iterations

In most cases there is minimal (and sometimes no) performance gain, and the code becomes much more complex. The fact is, a recursive call in Delphi is not as expensive as one would think. A good write up on using recursion, and when to use iterative functions can be found here:

http://66.102.7.104/search?q=cache:w0mycdadNDYJ:www.cs.unc.edu/~weiss/COMP114/BOOK/06Recursion.doc+%2Brecursion%2Biterative%2Bquicksort%2Bspeed%2Bdifference&hl=en

At the end the author makes the following statement, with which I fully agree.

<quote>
So when should one use iteration, and when recursion? The first answer is, avoid using recursion if it doesn't make the code significantly easier to write and understand. In that light, many of our examples, including factorial, selection sort, and finding the largest entry in an array, must be considered convenient pedagogical exercises rather than good code. We classify the code for binary search and quicksort as examples where the gain in clarity and ease of programming justify the small degradation in execution speed much of the time.
<quote>

As will soon be seen, the iterative versions of quick sort /sedge sort are much more complex than their recursive counterparts.

Anyways, here's the code for both those algo's done in iterative fashion. Feel free to modify/improve on them, but please don't suggest that I be the one to do it <g>

Regards to all,
Russell

procedure SedgeSortNR(Data: PIntArray; ArrayLength: Integer);
var Stack: Array [0..1023] of Integer;
dwStack: Integer;
dwSplit: Integer;
dwSwap: Integer;
dwTemp: Integer;
dwFirst: Integer;
dwLast: Integer;
dwHi: Integer;
dwLo: Integer;
begin

Stack[0]:=0;
Stack[1]:=Pred(ArrayLength);
dwStack:=2;

while (dwStack > 0) do
begin
Dec(dwStack, 2);
dwFirst:=Stack[dwStack];
dwLast:=Stack[dwStack+1];
repeat
dwLo:=dwFirst;
dwHi:=dwLast;
dwSplit:=Data^[(dwFirst + dwLast) shr 1];
while True do
begin
while (Data^[dwLo] < dwSplit) do Inc(dwLo);
while (Data^[dwHi] > dwSplit) do Dec(dwHi);
if (dwLo > dwHi) then break;
dwSwap:=Data^[dwLo];
Data^[dwLo]:=Data^[dwHi];
Data^[dwHi]:=dwSwap;
Inc(dwLo);
Dec(dwHi);
end;
if ((dwHi-dwFirst) < (dwLast-dwLo)) then
begin
if (dwLo < dwLast) then
begin
if ((dwLast-dwLo) < 32) then
begin
for dwSplit:=Succ(dwLo) to dwLast do
begin
dwSwap:=dwSplit;
dwTemp:=Data^[dwSwap];
while (dwSwap > dwLo) and (Data^[dwSwap-1] > dwTemp) do
begin
Data^[dwSwap]:=Data^[dwSwap-1];
Dec(dwSwap);
end;
Data[dwSwap]:=dwTemp;
end;
end
else
begin
Stack[dwStack]:=dwLo;
Stack[dwStack+1]:=dwLast;
Inc(dwStack, 2);
end;
end;
dwLast:=dwHi;
end
else
begin
if (dwFirst < dwHi) then
begin
if ((dwHi-dwFirst) < 32) then
begin
for dwSplit:=Succ(dwFirst) to dwHi do
begin
dwSwap:=dwSplit;
dwTemp:=Data^[dwSwap];
while (dwSwap > dwFirst) and (Data^[dwSwap-1] > dwTemp) do
begin
Data^[dwSwap]:=Data^[dwSwap-1];
Dec(dwSwap);
end;
Data[dwSwap]:=dwTemp;
end;
end
else
begin
Stack[dwStack]:=dwFirst;
Stack[dwStack+1]:=dwHi;
Inc(dwStack, 2);
end;
end;
dwFirst:=dwLo;
end;
until (dwFirst >= dwLast);
end;

end;

procedure QuickSortNR(Data: PIntArray; ArrayLength: Integer);
var Stack: Array [0..1023] of Integer;
dwStack: Integer;
dwSplit: Integer;
dwSwap: Integer;
dwFirst: Integer;
dwLast: Integer;
dwHi: Integer;
dwLo: Integer;
begin

Stack[0]:=0;
Stack[1]:=Pred(ArrayLength);
dwStack:=2;

while (dwStack > 0) do
begin
Dec(dwStack, 2);
dwFirst:=Stack[dwStack];
dwLast:=Stack[dwStack+1];
repeat
dwLo:=dwFirst;
dwHi:=dwLast;
dwSplit:=Data^[(dwFirst + dwLast) shr 1];
repeat
while (Data^[dwLo] < dwSplit) do Inc(dwLo);
while (Data^[dwHi] > dwSplit) do Dec(dwHi);
if (dwLo <= dwHi) then
begin
dwSwap:=Data^[dwLo];
Data^[dwLo]:=Data^[dwHi];
Data^[dwHi]:=dwSwap;
Inc(dwLo);
Dec(dwHi);
end;
until (dwLo > dwHi);
if ((dwHi-dwFirst) < (dwLast-dwLo)) then
begin
if (dwLo < dwLast) then
begin
Stack[dwStack]:=dwLo;
Stack[dwStack+1]:=dwLast;
Inc(dwStack, 2);
end;
dwLast:=dwHi;
end
else
begin
if (dwFirst < dwHi) then
begin
Stack[dwStack]:=dwFirst;
Stack[dwStack+1]:=dwHi;
Inc(dwStack, 2);
end;
dwFirst:=dwLo;
end;
until (dwFirst >= dwLast);
end;

end;

aikimark

Russell,

No offense taken. I was a 'Johnny-come-lately' to this discussion thread and most of the code had alread been shown. I still consider myself unworthy of submitting original code to compete with you and Alex, so I relegated my time to the more pedagogical and esoteric topics. I wasn't trying to make more work for anyone. I did think your examples' performance numbers might have bettered Alex and Hypoviax code with a little tweaking.

However, it was more of a comment for future readers than an expectation of your time. I've included a little of my own code for everyones' amusement (below).
_______________________
Thank you for including the Stephen Weiss reference. Steve just stepped down as chair of the UNC Chapel Hill Comp Sci Dept after 15 years. He is a GREAT teacher, friend, and colleague.
_______________________

My Delphi code example (don't laugh too hard)

// pick a middle-of-three values for the pivot
var
Pvts: Array [1..3] of Integer;

Randomize;
Pvts[1] := Data^[RandomRange(L, R)];
Pvts[2] := Data^[RandomRange(L, R)];
Pvts[3] := Data^[RandomRange(L, R)];
InsertionSort(Pvts, 3); //for this small of a list, an insert sort is a good choice

// Pvts[2] contains the middle-of-three value which improves
// quicksort performance by about 14%

Russell Libby

aikimark ,

Glad no offense was taken, because none was meant...

And you are MORE than worthy of submitting code, as a way for us to all better our skills. The topics that you brought up, with which I am familiar with some (but not all), were pointed and valid, but my problem was in trying to cover such a broad topic as "sorting". Lets face it, given this "contest", and the ranges of data size/max values, there could be no one function that could win for every scenario.

So, what I ended up doing in the end was trying to cover a multitude of sorting functions; linear, log, and quadratic, discussing when to use each, and when not to use them, etc. But, this unfortunately did not leave me with the time required to fine tune each function. Perhaps at a later date I might post a library of two dozen or so sorting functions.... after i clean and tune them all that is. ;-)

Regarding Stephen Weiss; while I do not know him personally, I can say based on his writings that he IS a great teacher. He is well spoken, and is able to apply his topics of teaching to practical coding (vs those that speak totally about theory)

I envy you for having such colleagues...

Best regards and well wishes,
Russell

aikimark

Russell,

If you get to NC, please contact me through the RTP Delphi Interest Group and I'll be glad to introduce you to Steve and we'll take a tour through the Comp Sci labs (neat VR stuff).

Steve used Mr. Wizard (Don Herbert) as his teaching role model. I used both Steve and Mr. Wizard (and others) as my teaching role models.

I use Delphi experts, such as you, to assimilate and better my Delphi programming skills. I'm standing on the shoulders of giants, as it were.