08/27/2009 at 11:08PM PDT, ID: 24688938
Delphi, MMX/SSE, Opinions from those who have stepped here!

Asked by ThievingSix in Delphi Programming, Assembly Programming Language, C++ Programming Language

Well, OK. I'm doing my first large project for a company friend, so I decided to look into possible speed increases and try something I've never done before. After reading about the instruction sets and how they work, I came up with the code below.

As expected, the MMX version runs faster than the normal one, and the SSE version is faster than the MMX one. Since this is my first time going this deep into asm, I have probably committed some kind of no-no.

Any opinions from experts who do this sort of thing?
I'm especially interested in the wrappers I wrote for GetMem and FreeMem. I was really iffy about them when I made them, but they seem to work. I commented the code so that it follows my train of thought.
program MMX_SSE;
 
{$APPTYPE CONSOLE}
 
uses
  Windows,
  SysUtils;
 
const
  Iterations = 1000 + 1; //Number of iterations for each instruction test
  BytesNeeded = 16; //We need multiple of 16 bytes for SSE
  BytesPerGo = BytesNeeded * 50000; //How many bytes to XOR per iteration
 
function IsMMXAvailable: Boolean;
var
  Supported : Integer;
begin
  asm
    push ebx //cpuid overwrites ebx, which Delphi expects to be preserved
    mov eax, 1 //using function 0x1 of cpuid
    cpuid
    and edx, 800000H //check bit 23 (MMX)
    mov Supported, edx
    pop ebx //Restore ebx
  end;
  Result := (Supported <> 0); //If the bit is set we can use MMX
end;
 
function IsSSEAvailable: Boolean;
var
  Supported : Integer;
begin
  asm
    push ebx //cpuid overwrites ebx, which Delphi expects to be preserved
    mov eax, 1 //using function 0x1 of cpuid
    cpuid
    and edx, 2000000H //check bit 25 (SSE)
    mov Supported, edx
    pop ebx //Restore ebx
  end;
  Result := (Supported <> 0); //If the bit is set we can use SSE
end;
 
//GetMem wrapper that aligns the pointer on a 16-byte boundary
//(the DWORD casts assume 32-bit pointers)
procedure GetMemA(var P: Pointer; const Size: DWORD); inline;
var
  OriginalAddress : Pointer;
begin
  P := nil;
  GetMem(OriginalAddress,Size + 32); //Allocate user's size plus extra for alignment and storage
  If OriginalAddress = nil Then Exit; //If not enough memory then exit
  P := PByte(OriginalAddress) + 4; //We want at least enough room for storage
  DWORD(P) := (DWORD(P) + (15)) And (Not(15)); //Round up to the next 16-byte boundary
  If DWORD(P) < DWORD(OriginalAddress) Then Inc(PByte(P),16); //Safeguard: if we went into storage, go to the next boundary
  Dec(PDWORD(P)); //Move back 4 bytes so we can save the original pointer
  PDWORD(P)^ := DWORD(OriginalAddress); //Save original pointer
  Inc(PDWORD(P)); //Back to the boundary
end;
 
//Freemem wrapper to free aligned memory
procedure FreeMemA(P: Pointer); inline;
begin
  Dec(PDWORD(P)); //Move back to where we saved the original pointer
  DWORD(P) := PDWORD(P)^; //Set P back to the original
  FreeMem(P); //Free the memory
end;
 
var
  StartTick,
  EndTick,
  Frequency : Int64;
 
  Normal,
  MMX,
  SSE,
  XORData : Pointer;
 
  I : Integer;
  OutHex : String;
begin
  If Not(IsMMXAvailable) Or Not(IsSSEAvailable) Then //Check to see if supported
  begin
    Writeln('MMX or SSE is not supported by your OS/CPU. Press any key to exit. . .');
    Readln;
    Exit;
  end;
 
  //Allocate the pointers for each type and fill
  GetMemA(Normal,BytesPerGo);
  FillChar(Normal^,BytesPerGo,$7F); //Fill the whole buffer, same as the MMX and SSE buffers
 
  GetMemA(MMX,BytesPerGo);
  FillChar(MMX^,BytesPerGo,$7F);
 
  GetMemA(SSE,BytesPerGo);
  FillChar(SSE^,BytesPerGo,$7F);
 
  GetMemA(XORData,BytesNeeded);
  FillChar(XORData^,BytesNeeded,$FF);
 
  Writeln(Format('Number of iterations: %d',[Iterations])); //The asm loops run exactly Iterations times
  Writeln(Format('Number of bytes per iteration: %d'#13#10,[BytesPerGo]));
 
  QueryPerformanceFrequency(Frequency);
 
  Try
    //Normal 4 byte xor at a time
    QueryPerformanceCounter(StartTick);
    asm
      push eax //Save eax
      mov eax, XORData //Set eax to the XORData pointer
      mov eax, [eax] //Load eax with a 4 byte dword
      push ebx //Save ebx
      push esi //Save esi
      mov esi, Iterations //Keep track of our iterations
      push edx //Save edx
 
      @@Loop:
        mov ebx, Normal //Set ebx with the Normal pointer
        mov edx, BytesPerGo //Set edx with the number of bytes
        @@InnerLoop:
          xor [ebx], eax //Xor 4 bytes in ebx with eax
          add ebx, 4 //Goto the next 4 bytes in ebx
          sub edx, 4 //Subtract our index register
        JNZ @@InnerLoop //Once edx = 0 stop looping
        dec esi //Decrement our iteration register
      JNZ @@Loop //Once esi = 0 stop looping
 
      pop edx //Restore edx
      pop esi //Restore esi
      pop ebx //Restore ebx
      pop eax //Restore eax
    end;
    QueryPerformanceCounter(EndTick);
    //Write out performance data
    OutHex := '';
    Writeln(Format('Normal - %.2f ms',[((EndTick - StartTick) / Frequency) * 1000]));
    For I := 0 To 15 Do //Write the first 16 bytes that we xor'd
      begin
      OutHex := OutHex + IntToHex(PByteArray(Normal)[I],2) + #32;
    end;
    Writeln('Data: ' + OutHex + #13#10);
 
    //MMX 8 byte xor at a time
    QueryPerformanceCounter(StartTick);
    asm
      push eax //Save eax
      mov eax, XORData //Set eax to the XORData pointer
      movq mm0, [eax] //Move 8 bytes from eax to the mm0 register
      push ebx //Save ebx
      push esi //Save esi
      mov esi, Iterations //Keep track of our iterations
      push edx //Save edx
 
      @@Loop:
        mov ebx, MMX //Set ebx to the MMX pointer
        mov edx, BytesPerGo //Set edx with the number of bytes
        @@InnerLoop:
          movq mm1, [ebx] //Move 8 bytes from ebx to mm1 register
          pxor mm1, mm0 //Xor 8 bytes at a time
          movq [ebx], mm1 //Put the xor'd data back to ebx
          add ebx, 8 //Goto the next 8 bytes in ebx
          sub edx, 8 //Subtract our index register
        JNZ @@InnerLoop //Once edx = 0 stop looping
        dec esi //Decrement our iteration register
      JNZ @@Loop //Once esi = 0 stop looping
 
      pop edx //Restore edx
      pop esi //Restore esi
      pop ebx //Restore ebx
      pop eax //Restore eax
    end;
    QueryPerformanceCounter(EndTick);
    asm EMMS end; //Clear the MMX state so the x87 FPU can be used again
    //Write out performance data
    OutHex := '';
    Writeln(Format('MMX - %.2f ms',[((EndTick - StartTick) / Frequency) * 1000]));
    For I := 0 To 15 Do //Write the first 16 bytes that we xor'd
      begin
      OutHex := OutHex + IntToHex(PByteArray(MMX)[I],2) + #32;
    end;
    Writeln('Data: ' + OutHex + #13#10);
 
    //SSE 16 byte xor at a time
    QueryPerformanceCounter(StartTick);
    asm
      push eax //Save eax
      mov eax, XORData //Set eax to the XORData pointer
      movaps xmm0, [eax] //Move 16 bytes from eax to the xmm0 register (address must be 16-byte aligned)
      push ebx //Save ebx
      push esi //Save esi
      mov esi, Iterations //Keep track of our iterations
      push edx //Save edx
 
      @@Loop:
        mov ebx, SSE //Set ebx to the SSE pointer
        mov edx, BytesPerGo //Set edx with the number of bytes
        @@InnerLoop:
          movaps xmm1, [ebx] //Move 16 bytes from ebx to xmm1 register
          xorps xmm1, xmm0 //Xor 16 bytes at a time
          movaps [ebx], xmm1 //Put the xor'd data back to ebx
          add ebx, 16 //Goto the next 16 bytes in ebx
          sub edx, 16 //Subtract our index register
        JNZ @@InnerLoop //Once edx = 0 stop looping
        dec esi //Decrement our iteration register
      JNZ @@Loop //Once esi = 0 stop looping
 
      pop edx //Restore edx
      pop esi //Restore esi
      pop ebx //Restore ebx
      pop eax //Restore eax
    end;
    QueryPerformanceCounter(EndTick);
    //Write out performance data
    OutHex := '';
    Writeln(Format('SSE - %.2f ms',[((EndTick - StartTick) / Frequency) * 1000]));
    For I := 0 To 15 Do //Write the first 16 bytes that we xor'd
      begin
      OutHex := OutHex + IntToHex(PByteArray(SSE)[I],2) + #32; //Read back the SSE buffer, not the MMX one
    end;
    Writeln('Data: ' + OutHex + #13#10);
  Except
    On E:Exception Do Writeln(E.Message);
  end;
  //Free all the aligned memory
  FreeMemA(Normal);
  FreeMemA(MMX);
  FreeMemA(SSE);
  FreeMemA(XORData);
  Writeln(#13#10'--'#13#10'Press any key to exit. . .'#13#10'--'#13#10);
  Readln;
end.
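
Since the GetMem/FreeMem wrappers are the part in question, the alignment arithmetic can also be checked on its own, without any asm. The following is only a minimal sketch of the same idea (over-allocate, round up, keep the raw pointer just below the aligned block). The names AlignUp, AllocAligned and FreeAligned are hypothetical and not part of the code above, and the sketch assumes a compiler where NativeUInt has the size of a pointer (Delphi 2009 or later, or Free Pascal in Delphi mode).

```pascal
program AlignedAllocSketch;

{$IFDEF FPC}{$MODE DELPHI}{$ENDIF}
{$APPTYPE CONSOLE}

const
  Alignment = 16; //Must be a power of two
  Sizes: array[0..3] of Integer = (1, 15, 16, 800000);

//Rounds Address up to the next multiple of Alignment
function AlignUp(Address: NativeUInt): NativeUInt;
begin
  Result := (Address + (Alignment - 1)) and not NativeUInt(Alignment - 1);
end;

//Hypothetical stand-in for GetMemA: returns a 16-byte aligned block
function AllocAligned(Size: Integer): Pointer;
var
  Raw: Pointer;
  Aligned, Slot: NativeUInt;
begin
  Result := nil;
  //Room for the block, the saved pointer and the worst-case padding
  GetMem(Raw, Size + SizeOf(Pointer) + (Alignment - 1));
  if Raw = nil then Exit;
  //Skip the storage slot first, then round up to the boundary
  Aligned := AlignUp(NativeUInt(Raw) + NativeUInt(SizeOf(Pointer)));
  //The slot directly below the aligned block holds the raw pointer
  Slot := Aligned - NativeUInt(SizeOf(Pointer));
  PPointer(Slot)^ := Raw;
  Result := Pointer(Aligned);
end;

//Hypothetical stand-in for FreeMemA: frees a block from AllocAligned
procedure FreeAligned(P: Pointer);
var
  Slot: NativeUInt;
begin
  if P = nil then Exit;
  Slot := NativeUInt(P) - NativeUInt(SizeOf(Pointer));
  FreeMem(PPointer(Slot)^);
end;

var
  I: Integer;
  P: Pointer;
begin
  for I := Low(Sizes) to High(Sizes) do
  begin
    P := AllocAligned(Sizes[I]);
    FillChar(P^, Sizes[I], $7F); //Touch the whole block
    if (NativeUInt(P) and (Alignment - 1)) = 0 then
      Writeln(Sizes[I], ' bytes: aligned')
    else
      Writeln(Sizes[I], ' bytes: NOT aligned');
    FreeAligned(P);
  end;
end.
```

Because the raw pointer is skipped over before rounding up, the aligned address can never fall inside the storage slot, so no second boundary check is needed. Using NativeUInt instead of DWORD keeps the casts valid if pointers are wider than 32 bits.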
08/31/09 11:50 PM, ID: 25229053

About this solution

Zones: Delphi Programming, Assembly Programming Language, C++ Programming Language
Solution Provided By: ikework
Participating Experts: 1
Solution Grade: A
 
 