magellanLS
asked on
Very Fast Function For Checking If Two Memory Blocks(~100bytes) Are Equal?
Hello Experts! :)
Is There A Very Fast Function For Checking If Two Memory Blocks(~100bytes) Are Equal?
Thanks Alot!
Is There A Very Fast Function For Checking If Two Memory Blocks(~100bytes) Are Equal?
Thanks Alot!
look at CompareMem
from the delphi help
Performs a binary comparison of two memory images.
Unit
SysUtils
Category
comparison routines
Delphi syntax:
function CompareMem(P1, P2: Pointer; Length: Integer): Boolean; assembler;
Description
CompareMem performs a binary compare of Length bytes of memory referenced by P1 to that of P2. CompareMem returns true if the memory referenced by P1 is identical to that of P2.
mewikl ;-)
from the delphi help
Performs a binary comparison of two memory images.
Unit
SysUtils
Category
comparison routines
Delphi syntax:
function CompareMem(P1, P2: Pointer; Length: Integer): Boolean; assembler;
Description
CompareMem performs a binary compare of Length bytes of memory referenced by P1 to that of P2. CompareMem returns true if the memory referenced by P1 is identical to that of P2.
mewikl ;-)
yes.
You may also use a type Record of Byte whose Addr is set to your memory block(s) via absolute, and just compare equality using '='
Or use an array (vector) of Byte for the same purpose
You may also use a type Record of Byte whose Addr is set to your memory block(s) via absolute, and just compare equality using '='
Or use an array (vector) of Byte for the same purpose
VGR, in the end something like CompareMem is called.
yes, I know 8-))
ASKER
Hello!
Robert ,Could you show some code, how you would do it with a downto loop.
Mewikl ,'CompareMem' is not fast enough.
Thanks!
P.S: Forgot to say that I need to compare two small blocks for equality 1 Million times per second or even more.
Robert ,Could you show some code, how you would do it with a downto loop.
Mewikl ,'CompareMem' is not fast enough.
Thanks!
P.S: Forgot to say that I need to compare two small blocks for equality 1 Million times per second or even more.
damn, where comes the w from,
not mewikl -> meikl
not mewikl -> meikl
I am not sure if it is at all possible. Comparing 100 bytes 1 million times a second means 100.000.000 million instructions at least. If it is always a new array to compare then the memory interface throughput is not fast enough.
This only works if the size of the data is divisible by four.
untested code:
type
PCompMem = ^TCompMem;
TCompMem = array [0..255] of DWORD;
var
I: Integer;
Cmp1, Cmp2: PCompMem;
begin
Cmp1 := PCompMem(SomeMem1);
Cmp2 := PCompMem(SomeMem2);
Result := False;
for I := SizeOfAreaInDWORDs downto 0 do
if Cmp1[I] <> Cmp2[I] then
begin
Result := True;
Break;
end;
end;
The main speed improvements are
1. the code is inlined so no overhead of calling a function
2. the compares are handled 4 bytes at once
This only works if the size of the data is divisible by four.
untested code:
type
PCompMem = ^TCompMem;
TCompMem = array [0..255] of DWORD;
var
I: Integer;
Cmp1, Cmp2: PCompMem;
begin
Cmp1 := PCompMem(SomeMem1);
Cmp2 := PCompMem(SomeMem2);
Result := False;
for I := SizeOfAreaInDWORDs downto 0 do
if Cmp1[I] <> Cmp2[I] then
begin
Result := True;
Break;
end;
end;
The main speed improvements are
1. the code is inlined so no overhead of calling a function
2. the compares are handled 4 bytes at once
there is another option: to use MMX registers and ops, you can do 64 byte compare in a single instruction
but I am not sure if you can reach a speed you want - there could be some overheads... where do you get these blocks from ? are they all loaded to memory ? give us a large scaled algorythm please
but I am not sure if you can reach a speed you want - there could be some overheads... where do you get these blocks from ? are they all loaded to memory ? give us a large scaled algorythm please
by the way, this is a source of CompareMem() ;-)
as you can see it is dwords compare op: cmpsd
function CompareMem(P1, P2: Pointer; Length: Integer): Boolean; assembler;
asm
PUSH ESI
PUSH EDI
MOV ESI,P1
MOV EDI,P2
MOV EDX,ECX
XOR EAX,EAX
AND EDX,3
SHR ECX,1
SHR ECX,1
REPE CMPSD
JNE @@2
MOV ECX,EDX
REPE CMPSB
JNE @@2
@@1: INC EAX
@@2: POP EDI
POP ESI
end;
as you can see it is dwords compare op: cmpsd
function CompareMem(P1, P2: Pointer; Length: Integer): Boolean; assembler;
asm
PUSH ESI
PUSH EDI
MOV ESI,P1
MOV EDI,P2
MOV EDX,ECX
XOR EAX,EAX
AND EDX,3
SHR ECX,1
SHR ECX,1
REPE CMPSD
JNE @@2
MOV ECX,EDX
REPE CMPSB
JNE @@2
@@1: INC EAX
@@2: POP EDI
POP ESI
end;
ASKER
Hello,
Alisher_N ,
>where do you get these blocks from ?
From a big bitmap image
>are they all loaded to memory ?
Yes
Alisher_N ,
>where do you get these blocks from ?
From a big bitmap image
>are they all loaded to memory ?
Yes
so it is just a byte sequence, why 1xx byte chunks then ?
you only get 70ms overhead for every 1,000,000 function calls so if CompareMem is not fast enough you can only gain less than 1/10 of a second by implementing it inline instead of calling it. You need a better algorithm I think.
ASKER
Alisher_N ,
>so it is just a byte sequence, why 1xx byte chunks then ?
Becuase, I have 2 Bitmap Images in memory ,instead of comparing 1 pixel(if pixel value dont match, append to a memory buffer) at a time it should be faster comparing by 60(or more) pixels
>so it is just a byte sequence, why 1xx byte chunks then ?
Becuase, I have 2 Bitmap Images in memory ,instead of comparing 1 pixel(if pixel value dont match, append to a memory buffer) at a time it should be faster comparing by 60(or more) pixels
I peak at about 2.5M transactions per second on an 800 MHz machine for a thread safe object lock and release (separate function calls). If you use a three processor box and your algorithm is thread safe you should _just_ be able to manage this task, as long as nothing else is going on.
Everything up to the REPE CMPSD is setup. REPE CMPSD performs all of its work internally (except the actual DWord fetches) in the CPU, so you are running as fast as the CPU will allow. Have you confirmed that your data falls on DWord boundaries? This can have significant impact.
Everything up to the REPE CMPSD is setup. REPE CMPSD performs all of its work internally (except the actual DWord fetches) in the CPU, so you are running as fast as the CPU will allow. Have you confirmed that your data falls on DWord boundaries? This can have significant impact.
ASKER CERTIFIED SOLUTION
membership
This solution is only available to members.
To access this solution, you must be a member of Experts Exchange.
Alisher raises a good point.
What is the "end user" version of this problem ... there may be a better approach than BIBF comparisons between massive arrays, or you may need to adjust your expectations. I don't think PC busses are capable of moving 100,000,000 bytes per second (or even 25,000,000 DWORDS) without special hardware, let alone doing work on them.
The CPU might handle it, and multiple CPU's will handle it, but the bus won't.
To get the performance you need, you might need to go to a higher performance platform like a PowerPC or a SPARC. On the intel platform it might be possible horizontally scale the problem across multiple parallel machines, or to queue the problem serially between machines to get adequate performance, depending on the end-user expectations.
What is the "end user" version of this problem ... there may be a better approach than BIBF comparisons between massive arrays, or you may need to adjust your expectations. I don't think PC busses are capable of moving 100,000,000 bytes per second (or even 25,000,000 DWORDS) without special hardware, let alone doing work on them.
The CPU might handle it, and multiple CPU's will handle it, but the bus won't.
To get the performance you need, you might need to go to a higher performance platform like a PowerPC or a SPARC. On the intel platform it might be possible horizontally scale the problem across multiple parallel machines, or to queue the problem serially between machines to get adequate performance, depending on the end-user expectations.
magellanLS:
This old question needs to be finalized -- accept an answer, split points, or get a refund. For information on your options, please click here-> http:/help/closing.jsp#1
EXPERTS:
Post your closing recommendations! No comment means you don't care.
This old question needs to be finalized -- accept an answer, split points, or get a refund. For information on your options, please click here-> http:/help/closing.jsp#1
EXPERTS:
Post your closing recommendations! No comment means you don't care.
Any assembler function will vary in speed depending on the CPU. It should not be that much faster than the loop anyway.
The loop should even be faster because the overhead of calling a function may hamper the assembler function.