Avatar of Wim ten Brink
Wim ten Brink
Flag for Netherlands asked on

[Competition] Build the fastest StringReplace function!

To continue on what was asked in https://www.experts-exchange.com/Programming/Programming_Languages/Delphi/Q_21980883.html I now want to pose a completely different challenge here. What I want is everyone to post a function EXACTLY like the StringReplace function in Delphi. Thus, your code must be something like this:

type TReplaceFlags = set of (rfReplaceAll, rfIgnoreCase);
function [Yourname]_StringReplace(const S, OldPattern, NewPattern: string; Flags: TReplaceFlags): string;

where [Yourname] is replaced by your name, of course. :-)

Now, the challenge is to write a version of this function which is (at least) twice as fast as the Delphi version. (I will test it with Delphi 2006 and will be measuring both the time required to get the result as the number of clockcycles.
Measurements will be done by using a huge file with text. This file will be \Borland\Delphi 2006\source\Win32\rtl\win|Windows.pas for Delphi 2006 which happens to be 1.21 megabytes in size. Thus quite large. The file will be read into a string and then passed to the StringReplace functions. The text which will be replaced will be chosen from a list with words, varying in length from 1 character to 20 characters or maybe even a bit more. The strings that they will be replaced with might have the same length or are either bigger or smaller. I will also do one run replacing a word that's not part of this sourcefile. (Which should result in no changes...)

I will have to write a simple console application which will run all the tests, which I will share here. Once that is done, I too will try to write a fast solution... :-)

There will be points for every solution which is twice as fast as the Delphi version. The fastest one will receive 100 points and will be accepted as answer. The second place will be rewarded with 75 points. third-fastest solution will receive 50 points and all other solutions will receive 20 points, if they are fast enough. This Q will be closed when there are 20 fast solutions here or when there haven't been any new solutions for about three weeks. Points will be divided over the top-20.
Feel free to provide as many solutions as you like. Just be clear if something is a new solution or a fix on an existing one.
Delphi

Avatar of undefined
Last Comment
Wim ten Brink

8/22/2022 - Mon
alkisg

Alex, I hope you have a lot of spare time for the test!

DelphiStringReplace needed 307.72 sec (more than 5 minutes) to replace ' ' with ' ' on a 1Mb text file...

I guess we can do this in under a sec! :-)
Wim ten Brink

ASKER
Well, I wasn't planning to do anything this saturday so yeah, I have some time to set up the sourcecode for the testing tool. After that, I will just have to run it. And to get better measurements, I am planning to allow the test run to do quite a few tests. For me it's no problem if I run the test in the evening if it finishes the next morning... :-)

Just have to write a simple and fair testing template, though. :-)
ASKER CERTIFIED SOLUTION
alkisg

Log in or sign up to see answer
Become an EE member today7-DAY FREE TRIAL
Members can start a 7-Day Free trial then enjoy unlimited access to the platform
Sign up - Free for 7 days
or
Learn why we charge membership fees
We get it - no one likes a content blocker. Take one extra minute and find out why we block content.
Not exactly the question you had in mind?
Sign up for an EE membership and get your own personalized solution. With an EE membership, you can ask unlimited troubleshooting, research, or opinion questions.
ask a question
Wim ten Brink

ASKER
Nice one, alkiqs... :-) That link will count as one solution. :-) Any more solutions?

Actually, the problem with those FastCode solutions is that many of them are written in Assembler. I want one written in just Pascal code. I have nothing against Assembler code, especially fast code. But my preference is to improve Delphi applications using better programming techniques and not by going one level lower.
A former collegue of mine once had a problem with some string-handling function which turned out to be too slow in the Delphi code he wrote. So he "solved" his problem by rewriting it in Assembler, thus making it harder to maintain for others. Then I finally took a look at his code and easily noticed what he did wrong in his original Delphi code and was able to increase the efficiency of that piece of code to an even faster solution than his Assembler code. (Which was easy since his Assembler code wasn't really optimized either.)
If you use a disassembler to examine Delphi binaries then you will see that very optimal code is already used in many situations. The Delphi compiler already writes very good optimized code. :-)
Experts Exchange has (a) saved my job multiple times, (b) saved me hours, days, and even weeks of work, and often (c) makes me look like a superhero! This place is MAGIC!
Walt Forbes
robert_marquardt

In fact the problem is unsolvable in the way it is asked. CPUs from the original Pentium up to the current Core Duo (not to mention AMD CPUs) are so different that no single function can be the fastest on all of them.
BTW fastcode is using the Borland basm newsgroup for communication.
Wim ten Brink

ASKER
True, Robert. But basically the processor should not make that much difference if you look at the number of clock cycles. Basically, it's more about the question which Delphi code would generate the fastest solution after compilation. Technically, it would be the solution that requires the least number of function calls to other functions. Something you could easily do in Assembler but that's the cheap way out in my opinion. I am not looking for the fastest function either. I am looking for functions that are about twice as fast as the Delphi version while still just using Delphi code.
For the test, the source text will be identical for all and they will all have to perform the same actions with exactly the same values. Then average speeds will be compared.

I'm not saying that the work done by FastCode is worthless. It's actually quite good to have some very optimized functions in Delphi just in case when you really need something lightning-fast. But it is more interesting to know how far you can go in Delphi itself before you'd have to start with Assembler code. (Besides, processors are changing and what used to be fast BASM code in the past could be slow today.)
Of course, FastCode would be less valuable than a Delphi code when you're going to port your code to .NET because in .NET you just can't use any Assembler instructions. But that's more for the other contest question that I posted. :-)
alkisg

Delphi StringReplace basically works like this:
...locate a match and then do
Result := Result + NewStr

This instruction is a major bottleneck. Suppose you need convert MSHTML.PAS (~2Mb) to a syntax highlighted HTML page.
You'll need to replace ' ' with  

Delphi would need ~1 hour for this, because it doesn't use an optimized ALGORITHM.
An optimized algorithm IN PURE PASCAL would need about 1 sec (I've actually measured these times).
Even if you use the same algorithm in assembly code, it would still be very slow.

The reason is, that when Result grows to become 1 Mb and replacements continue (until 2Mb),
Result := Result + NewStr
will cause 1Mb memory reallocations for each ' ' matched.
If 10.000 spaces exist in the range 1Mb to 2Mb,
10.000 x 1.000.000 bytes would need to be moved.

To optimize, one could save the match offsets and do only ONE large memory allocation.

It's a shame Delphi still uses such slow algorithms on MOST string functions...

P.S. the above apply to the .NET version as well.
Get an unlimited membership to EE for less than $4 a week.
Unlimited question asking, solutions, articles and more.
alkisg

> Even if you use the same algorithm in assembly code, it would still be very slow.
I meant the Delphi unoptimized algorithm.
An O(N) pascal implementation would be much faster than a O(N^2) implementation in assembly.
alkisg

This is a quick patched version of a ReplaceAll function I've once done, to make it compatible with the contest.
Remarks:
*) If Length(NewPattern) = Length(OldPattern), it replaces in-place, with no memory allocations.
*) If Length(NewPattern) > Length(OldPattern), it saves the offsets and only does one large memory allocation. The offsets array grows 150% in size everytime it's needed. I don't know if it would be better to use a list (many small allocations vs a few somewhat larger ones).
*) If Length(NewPattern) < Length(OldPattern), in-place replacements can be done. I don't do this now, it can be optimized.
*) For one replacement, it just calls SysUtils.StringReplace. A single replacement is not that slow anyway, but as I said, that's a quick patch. :-)


function Alkisg_StringReplace(const S, OldPattern, NewPattern: string; Flags: TReplaceFlags): string;
{ (c) 2002 - 2006 Alkis - http://www.alkisg.com }
var
  offsets: array of integer;
  ofs, lastResultOfs, lastSOfs, numOffsets, i: integer;
  upS, upOldPattern: string;
begin
  if not (rfReplaceAll in Flags) then           //Don't bother if only one replacement was needed!!!
  begin
    Result := SysUtils.StringReplace(S, OldPattern, NewPattern, Flags);
    Exit;
  end;

  if rfIgnoreCase in Flags then
  begin
    upS := UpperCase(s);
    upOldPattern := UpperCase(OldPattern);
  end
  else begin                                        //I hope that string reference counting works...
    upS := S;
    upOldPattern := OldPattern;
  end;

  ofs := PosEx(OldPattern, s, 1);                        //returns 0 even for Length(OldPattern) = 0
  if ofs = 0 then
  begin
    Result := s;
    Exit;
  end;
  if Length(OldPattern) = Length(newPattern) then                                        //both <> 0
  begin
    Result := s;
    repeat                                                              //ofs <> 0 on the first loop
      StrMove(@Result[ofs], @newPattern[1], Length(newPattern));
      ofs := PosEx(upOldPattern, upS, ofs + Length(upOldPattern));
    until ofs = 0;
  end
{ This case will be implemented later.
  Still works OK now using the other 'else' below, but it could be optimized more.
  else if Length(OldPattern) < Length(newPattern) then               //no need for memory allocation
  begin

  end }
  else begin        //Length(OldPattern) > Length(newPattern), start with marking all the occurences
    numOffsets := 0;
    repeat                                                              //ofs <> 0 on the first loop
      if numOffsets > High(offsets) then
        SetLength(offsets, 100 + MulDiv(Length(offsets), 150, 100));
      offsets[numOffsets] := ofs;
      Inc(numOffsets);
      ofs := PosEx(upOldPattern, upS, ofs + Length(upOldPattern));
    until ofs = 0;
    SetLength(offsets, numOffsets);
  //Build Result length. Length(newPattern)-Length(OldPattern) could be < 0.
    SetLength(Result, Length(s) + numOffsets * (Length(newPattern) - Length(OldPattern)));
  //Do the copying.
    lastResultOfs := 1;
    lastSOfs := 1;
    for i := 0 to numOffsets - 1 do
    begin
      StrMove(@Result[lastResultOfs], @s[lastSOfs], offsets[i] - lastSOfs);
      Inc(lastResultOfs, offsets[i] - lastSOfs);
      StrMove(@Result[lastResultOfs], PAnsiChar(newPattern), Length(newPattern));
      Inc(lastResultOfs, Length(newPattern));
      lastSOfs := offsets[i] + Length(OldPattern);
    end;
    StrMove(@Result[lastResultOfs], @s[lastSOfs],
      Length(Result) - lastResultOfs + 1);
  end;
end;
alkisg

An error here:
ofs := PosEx(OldPattern, s, 1);                        //returns 0 even for Length(OldPattern) = 0
I should have used the uppercase'd ones:
ofs := PosEx(upOldPattern, upS, 1);                        //returns 0 even for Length(OldPattern) = 0
Experts Exchange is like having an extremely knowledgeable team sitting and waiting for your call. Couldn't do my job half as well as I do without it!
James Murphy
SOLUTION
Log in to continue reading
Log In
Sign up - Free for 7 days
Get an unlimited membership to EE for less than $4 a week.
Unlimited question asking, solutions, articles and more.
Wim ten Brink

ASKER
@mnasman, the thing I'm trying to avoid here is the use of Assembler. Of course a piece of code could be rewritten using ASM or BASM but then it's not a real Delphi challenge anymore. At least, not in my opinion. Personally, I think Delphi can be highly optimized even without the use of Assembler just by making smarter algorithms and more importantly, by changing the way we look at data.

@alkisq, this weekend I created the test tool which I will use for the tests and I'll share the source of it here. A quick test showed that on my Pentium III/866MHz I could replace the word 'begin' with 'Start' in the Windows.pas file within slightly more than a second. This was compiled with Delphi 2006, btw. (And the sourcefile is also from Delphi 2006.)
I still need to make the test more input-driven, though. I am planning to let it run lots of tests, calculating averages and thus getting better statistical results. Doesn't matter to me if it has to run half the night. :-)
Wim ten Brink

ASKER
@alkisq, I've tested your code and it's not much faster than the regular StringReplace. I'll still need to upload the code to my site, with some description on how the testing module works but so far this is it:
Borland version Delphi 2006: 162.2 cycles, 139173.5 ticks.
alkisq version Version 1.0: 156.1 cycles, 134344.4 ticks.

These are averages over 8 different tests, each performed 6 times. If you divide the number of ticks by the number of cycles, you'll get my average clock speed of my Pentium III. Yes, it's only 866 MHz... :-)

I'll try to upload the code of my test tool tomorrow, which will include some interesting views on how to write such tools. (Which is also why it took me so long.)

Well, funnier... I later added one of the FastString functions to the test, just to compare and WOW! 20.7 cycles, 18375.6 ticks.  So there is room for a lot of improvements, based on these values. However, what I don't like about the FastString solution I've used it that it uses a global variable. But it's a version that uses no assembler. :-) The unit is called AnsiStringReplaceJOHPASUnit4 in case you want to know which one. They do also have an Assembler version, though.
alkisg

Your results are quite strange... my solution uses the same algorithm as AnsiStringReplaceJOHPASUnit4 (didn't know about this until I heard about fastcode some days ago), we even grow the offsets array by the same amount, 150%... I've been using it for 4 years now, it's quite fast...

On my PC it had the following results on 1 Mb text:
Delphi StringReplace ~= 300 sec
Alkisg_StringReplace < 1 sec.

==> Did you include rfReplaceAll in Flags?
A single replacement is not that time consuming, I just call Delphi StringReplace in that case...

Try replacing ' ' with &nbsp; on Windows.pas, INCLUDING rfReplaceAll.

I'll be eagerly waiting for your source code!!!
Get an unlimited membership to EE for less than $4 a week.
Unlimited question asking, solutions, articles and more.
Wim ten Brink

ASKER
Well, either it's a bug in my test application or Borland improved it's StringReplace function drastically in D2006. :-) I'll know tonight. But I was amazed that it turned out to be this fast.
alkisg

A note:
FastCode replaces ALL the calls to Delphi StringReplace with their own implementations (they actually overwrite Delphi StringReplace code to JMP to their own procedure). So if you use the library as it is, you won't be measuring Delphi StringReplace but FastCode StringReplace. Perhaps this is why you thought Delphi StringReplace was fast???

So, disable the patchXXX calls on the initialization section of fastcode...

BTW, I have D2006 update 2 and StringReplace still sucks! :-)
SOLUTION
Log in to continue reading
Log In
Sign up - Free for 7 days
Get an unlimited membership to EE for less than $4 a week.
Unlimited question asking, solutions, articles and more.
alkisg

My tests so far: (of course Alex results will be the official ones)
All tests (except for the last one) replace ALL ' ' with '&nbsp;'. This is a bad test for Russell's function, because on Boyer-Moore method the longer the search string, the better the results. It's also a bad test for my function, because PosEx calling overhead is large since a lot of ' ' exist.
Compiled with D2006 update 2, ran on my Celeron 1.6.

1) The source string was SysConst.pas
Running with Length(text) = 8131
StringReplace completed in 70431344 CPU cycles (44 ms).
Alkisg_StringReplace completed in 393720 CPU cycles (0 ms).
rllibby_StringReplace completed in 360708 CPU cycles (0 ms).

2) The source string was Forms.pas
Running with Length(text) = 264104
StringReplace completed in 111195762448 CPU cycles (69764 ms).
Alkisg_StringReplace completed in 17048240 CPU cycles (10 ms).
rllibby_StringReplace completed in 10058372 CPU cycles (6 ms).

3) The source string was MSHTML.pas
Running with Length(text) = 1983824
StringReplace completed in 7646110671088 CPU cycles (4801183 ms).
Alkisg_StringReplace completed in 85554848 CPU cycles (53 ms).
rllibby_StringReplace completed in 61707444 CPU cycles (38 ms).

**********************
Comments:
**********************
Tests #2 and #3 prove that you can't use Delphi StringReplace for strings > some Kb.
More than one hour was needed to replace ' ' with '&nbsp;' on MSHTML.PAS.

Also, on some tests (not the ones above) I saw errors in either mine or Russell's function, I'll have to test again to tell which one...  

Finally, my function is faster than Russell's ONLY when the search string is small OR has a lot of different chars that occur frequently in the text but without many actual matches.

Boyer-Moore algorithm has a best case better than O(N) and a worst case of O(N*M) if I'm not mistaken...

And something more significant:
Russell, if I try to replace ' ' with '&nbsp;' on MSHTML.PAS, I think that your function will allocate an 8Mb offset array, which is quite large... Any memory optimizations? :-)    
Your help has saved me hundreds of hours of internet surfing.
fblack61
Wim ten Brink

ASKER
Yeah, still need to upload the source of my test tool first. :-) Have been a bit too busy lately and hope to have more time available tonight. And I'll have to add Russell's solution to it today.

One important thing, though. I will have to make sure that my test-tool compares the generated output with what Delphi's StringReplace is providing. Slows it down a bit...
Russell Libby

Alkisg,

There was a small bug in my shift logic, where if the tail char (in the old pattern) also existed in the start of the old pattern then the correct shift was not made, allowing the code to miss possible matches. Corrected code (plus additions) follows later today...

And yes, the boyer-moore doesn't buy anything (its worse because of table init) when the old pattern is short (eg 1 byte) or when the string to have replacements on is itself short, again due to table init time. But, have you tried using it to replace full words in a string? Obviously, that is where it will shines.

Regarding the memory usage, you are correct in the amount it allocates. Unfortunately, there is *usually* a tradeoff between raw speed & memory usage. Having to reallocate just once or twice (either for result or for index array) will impact the overall speed and degrade performance. I am currently looking at another viable alternative , which is to scan the string twice. Once to pre-calculate the actual result size, and the second to perform the replacements. This would be the most optimal memory wise, as only one allocation is done, and its for exactly the amount of memory required for the reault. The catch is to make it work as fast, or faster, than it currently does. I am very close btw...

Russell


Russell Libby

Version that:

- Corrects invalid shift when last old pattern char matches, and same char exists earlier in the old pattern
- Performs one mem allocation, making it optimal in memory usage, as only the exact amount of memory required for the result is allocated.
- Sub-optimizes single char (old pattern) scans
- Implements the boyer-moore algo which improves scan time when the old pattern is >= 4 bytes (roughly)
- Operates as fast as the first version I posted (or so my tests indicate ;-)

-----

function rllibby_StringReplace(const S, OldPattern, NewPattern: String; Flags: TReplaceFlags): String;
var  lpShift:       Array [#0..#255] of Integer;
     lpCompare:     Array [#0..#255] of Char;
     lpszSource:    PChar;
     lpszDest:      PChar;
     cChar:         Char;
     dwCount:       Integer;
     dwSource:      Integer;
     dwOld:         Integer;
     dwNew:         Integer;
     dwIndex:       Integer;

     function Compare(Pattern, Source: PChar; Length: Cardinal): Boolean;
     begin
        // Compare bytes
        while (Length > 0) do
        begin
           // Perform comparison using the case table
           if not(lpCompare[Source^] = lpCompare[Pattern^]) then break;
           // Increment the pointers
           Inc(Source);
           Inc(Pattern);
           // Decrement the length
           Dec(Length);
        end;
        // Return result of comparison
        result:=(Length = 0);
     end;

begin

  // Get lengths of source and old pattern string
  dwSource:=Length(S);
  dwOld:=Length(OldPattern);

  // Test source length
  if (dwSource < dwOld) or (dwSource = 0) then
     // Source string is smaller than old pattern, no possible match so return source
     result:=S
  // Test old pattern
  else if (dwOld = 0) then
     // No old pattern provided, just return source
     result:=S
  else
  begin
     // Get source pointer
     lpszSource:=Pointer(S);
     // Case handling determined now
     if (rfIgnoreCase in Flags) then
     begin
        // Fill comparison table
        for cChar:=#0 to #255 do
        begin
           // Update shift table
           lpShift[cChar]:=dwOld;
           // Check for lower case chars, store ordinal of uppercase char
           if (cChar in [#97..#122]) then
              lpCompare[cChar]:=Char(Byte(cChar) - 32)
           else
              lpCompare[cChar]:=cChar;
        end;
        // Set shift table
        for dwIndex:=1 to Pred(dwOld) do
        begin
           // Set shift for char
           lpShift[OldPattern[dwIndex]]:=dwOld - dwIndex;
           // Set shift for other case of char
           case OldPattern[dwIndex] of
              'A'..'Z' :  lpShift[Char(Byte(OldPattern[dwIndex]) + 32)]:=dwOld - dwIndex;
              'a'..'z' :  lpShift[Char(Byte(OldPattern[dwIndex]) - 32)]:=dwOld - dwIndex;
           end;
        end;
     end
     else
     begin
        // Fill comparison table
        for cChar:=#0 to #255 do
        begin
           // Update shift table
           lpShift[cChar]:=dwOld;
           // Set ordinal of char
           lpCompare[cChar]:=cChar;
        end;
        // Set shift table
        for dwIndex:=1 to Pred(dwOld) do lpShift[OldPattern[dwIndex]]:=dwOld - dwIndex;
     end;
     // Get the last char to test against
     cChar:=lpCompare[OldPattern[dwOld]];
     // Decrement the old pattern length
     Dec(dwOld);
     // Clear counter
     dwCount:=0;
     // Set starting index
     dwIndex:=dwOld;
     // Check for single char matching
     if (dwOld = 0) then
     begin
        // Match char
        while (dwIndex < dwSource) do
        begin
           // Single char match
           if (cChar = lpCompare[lpszSource[dwIndex]]) then
           begin
              // Update match count
              Inc(dwCount);
              // Break if only a single match required
              if not(rfReplaceAll in Flags) then break;
           end;
           // Increment the index
           Inc(dwIndex);
        end;
     end
     else
     begin
        // Match string
        while (dwIndex < dwSource) do
        begin
           // Run the table comparison
           if (cChar = lpCompare[lpszSource[dwIndex]]) then
           begin
              // Check pattern against the source
              if Compare(Pointer(OldPattern), @lpszSource[dwIndex - dwOld], Succ(dwOld)) then
              begin
                 // Update match count
                 Inc(dwCount);
                 // Check to see if we should continue
                 if not(rfReplaceAll in Flags) then break;
                 // Increment by length of old pattern
                 Inc(dwIndex, Succ(dwOld));
                 // Continue
                 Continue;
              end;
           end;
           // Update index by character shift
           Inc(dwIndex, lpShift[lpszSource[dwIndex]]);
        end;
     end;
     // Check count of matched items
     if (dwCount = 0) then
        // No matches found, just return the source string
        result:=S
     else
     begin
        // Get new pattern length
        dwNew:=Length(NewPattern);
        // Calculate the actual final destination size
        SetLength(result, dwSource + dwCount * (dwNew - Succ(dwOld)));
        // Get pointer to destination string
        lpszDest:=Pointer(result);
        // Reset count to use as location marker
        dwCount:=0;
        // Reset index for second scan / replace
        dwIndex:=dwOld;
        // Secondary pass to perform the actual replace
        while (dwIndex < dwSource) do
        begin
           // Run the table comparison
           if (cChar = lpCompare[lpszSource[dwIndex]]) then
           begin
              // Short circuit test for single byte old pattern
              if (dwOld = 0) or Compare(Pointer(OldPattern), @lpszSource[dwIndex - dwOld], Succ(dwOld)) then
              begin
                 // Check for data to move
                 while (dwCount < (dwIndex - dwOld)) do
                 begin
                    // Byte move
                    lpszDest^:=lpszSource[dwCount];
                    // Push destination pointer
                    Inc(lpszDest);
                    // Increment the counter
                    Inc(dwCount);
                 end;
                 // Reset move pointer
                 dwCount:=0;
                 // Move the new pattern
                 while (dwCount < dwNew) do
                 begin
                    // Push index
                    Inc(dwCount);
                    // Byte move
                    lpszDest^:=NewPattern[dwCount];
                    // Push destination pointer
                    Inc(lpszDest);
                 end;
                 // Update next counter index
                 dwCount:=Succ(dwIndex);
                 // Check to see if we should continue
                 if not(rfReplaceAll in Flags) then break;
                 // Increment by length of old pattern
                 Inc(dwIndex, Succ(dwOld));
                 // Continue
                 Continue;
              end;
           end;
           // Update index by character shift
           Inc(dwIndex, lpShift[lpszSource[dwIndex]]);
        end;
        // Calculate final move size
        Dec(dwSource, dwCount);
        // Move final block of data
        if (dwSource > 0) then Move(lpszSource[dwCount], lpszDest^, dwSource);
     end;
  end;

end;

Get an unlimited membership to EE for less than $4 a week.
Unlimited question asking, solutions, articles and more.
Wim ten Brink

ASKER
Sources for the test tool can be found here, including results: http://www.workshop-alex.org/Sources/StringReplace/ :-)

Sorry, AlkiSQ, but your solution is as fast as StringReplace on my system for some reason. Then again, you came with the FastCode link from where I got another solution. So, so far this Q will cost me only 40 points, 20 for Russell and 20 for Alki's link. :-)
alkisg

I'm downloading your sources to see why...
Meanwhile, I'd appreciate it if you wrote my nickname as AlkisG, not AlkisQ! :-)
Damn hyperlinks, it looks like a q...

Russell, did you happen to test my solution in your system? Is it as slow as Delphi?

Cheers,
Alkis
alkisg

Alex, I think you have some error in your code, Flags *never* contain rfReplaceAll.

Put a MessageBox in the beginning of Alkisg_StringReplace, it never shows:
  if (rfReplaceAll in Flags) then
    MessageBox(0, 'rfReplaceAll was in Flags, Alkis was wrong', 'Alkisg_StringReplace', MB_OK);

And of course my function would be as slow as Delphi's in this case, cause as I told before, it's a quick patch, for one replacement I call sysutils.stringreplace! :-)
All of life is about relationships, and EE has made a viirtual community a real community. It lifts everyone's boat
William Peck
Wim ten Brink

ASKER
@AlkisG, Oops. I'll change the spelling... :-) Btw, I have updated my code a few moments ago (and will have to do it again to correct some spelling error :-P ) and the test results still don't change much. Russell and FastCode have very fast results. Your's still is just slightly faster than Delphi's function for some reason. Feel free to upgrade your version. ;-)

Once this post is posted here, it's time to download the new version again. :-) So go there, now. :-)

http://www.workshop-alex.org/Sources/StringReplace/Source.zip
alkisg

I don't think you've read my last post about the rfReplaceAll bug, so I just post for you to receive a notification email...
alkisg

The bug is somewhere in procedure StartTests, rfIgnoreCase gets set instead of rfReplaceAll.
Get an unlimited membership to EE for less than $4 a week.
Unlimited question asking, solutions, articles and more.
Wim ten Brink

ASKER
Ah, okay. Another thing to fix then. Yeah, we're both posted about the same time so I missed your post. :-)
You're correct about that bug, though. I altered the way of calculating that Flags thing now and the results are very, very different now. Delphi ends up to be a lot slower and your code is suddenly a lot faster for some strange reason.

Also, I noticed that with Russell, there seemed to be a few results missing. So I now added more error checkings and as it turns out, when using GetTickCount() it sometimes turns out that Russell's solution have the same start and end time, resulting in exactly zero seconds. That's fast! :-)

Hmmm. Must upload the fixes again... This time I'll skip Delphi's StringReplace method. An additional /Sys commandline switch will be added in case you want to see Delphi's version in comparison too. And I adjusted the test configuration to do 32 different tests now. Or rather 8 tests repeated 4 times for every variance of the flags. These are then run 16 times for a nice average value. :-) And the first thing that I notice, Alkis, is that your code tends to actually slow down if you're not having to replace all occurrences.

Interesting...

Btw, the new version is uploaded now, including the new results.
Wim ten Brink

ASKER
@Alkis, you might want to find out why your code slows down so much when only replacing one single occurrance. Then again, a quick view already tells me why your solution was about as fast as the Delphi version. If you're not replacing ALL occurrences then you'd just use Delphi's StringReplace version instead. That's cheating! :-)

Of course, the problem I have for now is that I haven't built in a check to validate the results of all tests. I'll have to work on that.

And of course interesting to see how many of Russells tests were just too fast. :-) But you also had some results that were too fast to measure.
alkisg

The "slowing down" in my code is because ***I call SysUtils.StringReplace for just one replacement***.
Delphi may be some MS slower for one replacement, but I didn't care about that in my original StringReplaceAll.
But Delphi can be ***ONE HOUR*** slower if rfReplaceAll is set, and that's what I wanted to optimize... An application of mine included a text editor and I couldn't tell the users to wait one hour if they wanted to replace a string...

As I keep saying, AlkisG_StringReplace is but a quick patch to my original StringReplaceAll function to make it compatible with the contest:
  if not (rfReplaceAll in Flags) then           //Don't bother if only one replacement was needed!!!
  begin
    Result := SysUtils.StringReplace(S, OldPattern, NewPattern, Flags);
    Exit;
  end;

Hey, it's not cheating, I'm willing to accept last-place if rfReplaceAll is not set!!! :-) :-) :-)

OK, if I have some free time tomorrow, I'll make a version that works better for a single replacement, too... But really, saving only a few ms is not VERY significant in regular apps...
I started with Experts Exchange in 2004 and it's been a mainstay of my professional computing life since. It helped me launch a career as a programmer / Oracle data analyst
William Peck
Wim ten Brink

ASKER
Well, to be honest your solution has to be twice as fast as the regular StringReplace function. Since your solution is only faster than StringReplace in half the cases (when rfReplaceAll is set) your solution would technically never be at least twice as fast, if I'd look at the average results. :-)
But actually, it seems to me that if you also make it faster then you might have the fastest one until now. :-)

You are correct that just saving a few MS is not very significant but still, it makes life a bit more interesting. And don't worry. Your solution is good enough to deserve points. :-)

Am amazed how few people actually provided a faster solution though. Basically, I'm not looking for the fastest solution here, but for a solution which is at least twice as fast. Of course, the fastest one is rewarded as the answer and get the most points but still, I did expect a few more responses. :-)

And I think I need a bigger text file to test the functions with, since still too many are too fast. :-)
Russell Libby

>> Am amazed how few people actually provided a faster solution though. Basically, I'm not looking for the fastest solution here, but for a solution which is at least twice as fast. Of course, the fastest one is rewarded as the answer and get the most points but still, I did expect a few more responses. :-)

What were you expecting for 20 pts?.....

As to "fast", ie. "speed". This indicates to me that speed is more important to you than memory usage, and this is the crux of this sort of coding problem. If memory is not a concern, then one can allocate a result buffer based on a worst case scenario and then perform the scan / replace in one shot with no need for an index array or pre-scan. One of the fastest ways to go, but not very optimal memory wise. For this type of routine, speed is directly tied to the amount of memory you are willing to allocate to the problem.

Due to Alkisg's prompting, I had proposed a second version that is *the* most optimal memory wise, as only one allocation is done, and its right on the money regarding size. The approach required 2 scans though; so while very fast in most cases, there is no way it will win out in all tests.

All that being said, I can provide a final solution mixed on my original work along with other optimizations "borrowed" from the fast strings version that should prove to be an excellent "all around" performer. It uses a resizeable index array that is not allocated until the first match is made, optimizations for single char matching, boyer-moore for longer strings, and optimized pattern length (old = new) alloc/replacement. I had to manually test this against the others, as your benchmark will not compile on D5, but the results looked good. And due to the speed that some of these run at, it wouldn't hurt to run 10, 20, 50, etc iterations for each test and calc an average. (1 sampling is not enough, and is difficult to time accurately)

Russell

----

function rllibby_StringReplace(const S, OldPattern, NewPattern: String; Flags: TReplaceFlags): String;
var  lpCharMap:     Array [Char] of Char;
     lpShift:       Array [Char] of Integer;
     lpIndexes:     PIndexArray;
     lpszSource:    PChar;
     lpszDest:      PChar;
     cChar:         Char;
     dwStart:       Integer;
     dwAlloc:       Integer;
     dwCount:       Integer;
     dwSource:      Integer;
     dwOld:         Integer;
     dwNew:         Integer;
     dwIndex:       Integer;

     function Compare(Pattern, Source: PChar; Count: Cardinal): Boolean;
     begin
        // While string
        while (Count > 0) and (lpCharMap[Source^] = lpCharMap[Pattern^]) do
        begin
           // Increment the pointers
           Inc(Source);
           Inc(Pattern);
           // Decrement the count
           Dec(Count);
        end;
        // Return result of comparison
        result:=(Count = 0);
     end;

     procedure ResizeIndexArray;
     begin
        // Check for null start
        if (dwAlloc = 0) then
           // Set to base size
           dwAlloc:=ARRAY_BASE
        else
           // Set new allocation size (double up)
           Inc(dwAlloc, dwAlloc);
        // Realloc memory
        ReallocMem(lpIndexes, dwAlloc * SizeOf(Integer));
     end;

     function StrByteMove(Source, Dest: PChar; Count: Cardinal): PChar;
     begin
        // Set result
        result:=Dest;
        // While string
        while (Count > 0) do
        begin
           // Move char
           result^:=Source^;
           // Increment the pointers
           Inc(Source);
           Inc(result);
           // Decrement the count
           Dec(Count);
        end;
     end;

begin

  // Get lengths of parameter strings
  dwSource:=Length(S);
  dwOld:=Length(OldPattern);
  dwNew:=Length(NewPattern);

  // Test source length
  if (dwSource < dwOld) or (dwSource = 0) then
     // Source string is smaller than old pattern, no possible match so return source
     result:=S
  // Test old pattern
  else if (dwOld = 0) then
     // No old pattern provided, just return source
     result:=S
  else
  begin
     // Get source pointer
     lpszSource:=Pointer(S);
     // Set index array
     lpIndexes:=nil;
     // Set starting index array size
     dwAlloc:=0;
     // Resource protection
     try
        // Check for single char match
        if (dwOld = 1) then
        begin
           // Decrement the count
           Dec(dwOld);
           // Clear index
           dwIndex:=0;
           // Set match count
           dwCount:=0;
           // Check case sensitive
           if (rfIgnoreCase in Flags) then
           begin
              // Update char map
              for cChar:=#0 to #255 do
              begin
                 // Set char map
                 lpCharMap[cChar]:=cChar;
                 // Handle case insensitive
                 if (cChar in [#97..#122]) and (rfIgnoreCase in Flags) then Dec(Byte(lpCharMap[cChar]), 32);
              end;
              // Get char to match against
              cChar:=lpCharMap[OldPattern[1]];
              // Case insensitive
              while (dwIndex < dwSource) do
              begin
                 // Single char match
                 if (cChar = lpCharMap[lpszSource[dwIndex]]) then
                 begin
                    // Check for resize
                    if (dwCount = dwAlloc) then ResizeIndexArray;
                    // Set match index
                    lpIndexes^[dwCount]:=dwIndex;
                    // Update match count
                    Inc(dwCount);
                    // Break if only a single match required
                    if not(rfReplaceAll in Flags) then break;
                 end;
                 // Increment pointer
                 Inc(dwIndex);
              end;
           end
           else
           begin
              // Get char to match against
              cChar:=OldPattern[1];
              // Case insensitive
              while (dwIndex < dwSource) do
              begin
                 // Single char match
                 if (cChar = lpszSource[dwIndex]) then
                 begin
                    // Check for resize
                    if (dwCount = dwAlloc) then ResizeIndexArray;
                    // Set match index
                    lpIndexes^[dwCount]:=dwIndex;
                    // Update match count
                    Inc(dwCount);
                    // Break if only a single match required
                    if not(rfReplaceAll in Flags) then break;
                 end;
                 // Increment pointer
                 Inc(dwIndex);
              end;
           end;
        end
        else
        begin
           // Default shift and char map table
           for cChar:=#0 to #255 do
           begin
              // Set shift value
              lpShift[cChar]:=dwOld;
              // Set char map
              lpCharMap[cChar]:=cChar;
              // Handle case insensitive
              if (cChar in [#97..#122]) and (rfIgnoreCase in Flags) then Dec(Byte(lpCharMap[cChar]), 32);
           end;
           // Decrement the old pattern length
           Dec(dwOld);
           // Set shift
           dwCount:=dwOld;
           // Update the shift table
           for dwIndex:=1 to dwOld do
           begin
              // Get shift char
              cChar:=OldPattern[dwIndex];
              // Set shift for char
              lpShift[cChar]:=dwCount;
              // Set shift for reverse case of char
              if (rfIgnoreCase in Flags) then
              begin
                 // Check alpha chars
                 if (cChar in [#65..#90]) then
                    Inc(cChar, 32)
                 else if (cChar in [#97..#122]) then
                    Dec(cChar, 32);
                 // Update
                 lpShift[cChar]:=dwCount;
              end;
              // Update shift
              Dec(dwCount);
           end;
           // Get the last char to test against
           cChar:=lpCharMap[OldPattern[Succ(dwOld)]];
           // Clear match count
           dwCount:=0;
           // Set starting index
           dwIndex:=dwOld;
           // Match string
           while (dwIndex < dwSource) do
           begin
              // Run the table comparison
              if (cChar = lpCharMap[lpszSource[dwIndex]]) then
              begin
                 // Check pattern against the source (already matched last byte)
                 if Compare(Pointer(OldPattern), @lpszSource[dwIndex - dwOld], dwOld) then
                 begin
                    // Check for resize
                    if (dwCount = dwAlloc) then ResizeIndexArray;
                    // Set match index
                    lpIndexes^[dwCount]:=dwIndex - dwOld;
                    // Update match count
                    Inc(dwCount);
                    // Check to see if we should continue
                    if not(rfReplaceAll in Flags) then break;
                    // Increment by length of old pattern
                    Inc(dwIndex, Succ(dwOld));
                    // Continue
                    Continue;
                 end;
              end;
              // Increment by shift
              Inc(dwIndex, lpShift[lpszSource[dwIndex]])
           end;
        end;
        // Check count of matched items
        if (dwCount = 0) then
           // No matches found, just return the source string
           result:=S
        else
        begin
           // Reset old size
           Inc(dwOld);
           // Set starting marker
           dwStart:=0;
           // Check for equal match optimization
           if (dwNew = dwOld) then
           begin
              // Copy source
              result:=S;
              // Make unique
              UniqueString(result);
              // Get pointer to destination string
              lpszDest:=Pointer(result);
              // Perform the replace
              for dwIndex:=0 to Pred(dwCount) do
              begin
                 // Get match location
                 dwAlloc:=lpIndexes^[dwIndex];
                 // Push pointer
                 Inc(lpszDest, (dwAlloc - dwStart));
                 // Move new pattern
                 lpszDest:=StrByteMove(Pointer(NewPattern), lpszDest, dwNew);
                 // Update next start location
                 dwStart:=dwAlloc + dwOld;
              end;
           end
           else
           begin
              // Calculate the actual final destination size
              SetLength(result, dwSource + dwCount * (dwNew - dwOld));
              // Get pointer to destination string
              lpszDest:=Pointer(result);
              // Perform the replace
              for dwIndex:=0 to Pred(dwCount) do
              begin
                 // Get match location
                 dwAlloc:=lpIndexes^[dwIndex];
                 // Move data
                 lpszDest:=StrByteMove(@lpszSource[dwStart], lpszDest, (dwAlloc - dwStart));
                 // Move new pattern
                 lpszDest:=StrByteMove(Pointer(NewPattern), lpszDest, dwNew);
                 // Update next start location
                 dwStart:=dwAlloc + dwOld;
              end;
              // Calculate final move size
              Dec(dwSource, dwStart);
              // Move final block of data
              StrByteMove(@lpszSource[dwStart], lpszDest, dwSource);
           end;
        end;
     finally
        // Free index memory
        FreeMem(lpIndexes);
     end;
  end;

end;
Russell Libby

Required before the function:

type
  TIndexArray       =  Array [0..Pred(MaxInt div 4)] of Integer;
  PIndexArray       =  ^TIndexArray;

const
  ARRAY_BASE        =  1024;

---

Russell
Get an unlimited membership to EE for less than $4 a week.
Unlimited question asking, solutions, articles and more.
Wim ten Brink

ASKER
Am in-between computers right now. Am replacing my old 866MHz system for a dual-core system so there is a bit of a delay in me testing this all. :-)
Russell Libby


Any chance of closing this question? I would think 2 months is more than suffient time (1 month since last post)...

Russell

Wim ten Brink

ASKER
Yeah, you're right. Problem is, I wanted to test it on my new computer but unfortunately I only got it last week. Oh, well... I'll start comparing the solutions, determine the fastest ones and award points hopefully by tonight. (I finally have Delphi installed again!) :-)

Too bad that I didn't get many more responses though.
This is the best money I have ever spent. I cannot not tell you how many times these folks have saved my bacon. I learn so much from the contributors.
rwheeler23
Wim ten Brink

ASKER
Okay, first let's check the competitors: mnasman suggested FastStrings. Russell and AlkisG provided code. And AlkisG also provided a link to FastCode. But I don't count them for this competition since I wanted code exactly like the StringReplace function. So the competition is mainly between FastStrings, Russell and Alki.

Now, for the tests I've waited for my new computer. It was ordered about a month ago but I only had it last week and have been busy setting it up completely. (Including moving a lot of files and installing Borland Delphi 2006 on it.) I have tested the code on a dual-core Intel Pentium 6700 processor at 2.66 GHz and with 3 Gigabytes of RAM. Yeah, I know... Really powerful. Actually, this system is so fast that running the test on the Windows.pas unit was just too fast for the system to measure. So I added this to my test-tool:
    for I := 1 to 8 do
      Origin := Origin + Origin;
And when you realise that Origin contains the full code of the Windows unit, you will also realise that the result is 256 times bigger. Windows.pas is 1.21 MB in size so it's about 260 megabytes of text that my tool has to process.

And processing it did, and still bloody fast. Most times the functions needed less of a second to replace the texts. Well, except for AlkisG's code. Then again, in some other situations, AlkisG's code was a lot faster than the FastStrings function or Russells function. I also ran every test 512 times to get nice average values and although it took my system about half an hour or so to run all tests, it was quite clear that Russell and FastStrings were a lot faster on average. Then again, my code seems to have some minor problems with the amount of clock ticks that it counts. I often ended up with negative values. Just means that counting ticks isn't reliable on a system this fast with such a large amount of data.

Still, in clock cycles it was the FastStrings solution that ended with 608 cycles as the fastest one. That's milliseconds, btw. Russell had an average speed of 945 and AlkisG has a speed of 2057. But now on an individual test basis...

I created 32 different testcases. Or actually, 8 different tests with different options for case-sensitive and ReplaceAll. Simple tests too, like replacing a common word, a single character, a long text which was a bit uncommon and an even longer text that is to be deleted, a text that doesn't occur in the file, a text that was replaced by itself, a common word which would be replaced by a longer word and finally something that occurs only one time in the original sourcefile. (But 256 times in the test.)

And now the fun part... If you look at every separate test then it's not very clear which one is the fastest. In the first 8 tests, AlkisG is slightly faster and Russell is slightly slower than the FastStrings solution. The next 8 tests are about a tie between AlkisG and FastStrings but Russell is slower. But in the third and fourth set of 8 tests, Russell and FastStrings are both clearly faster than AlkisG solution. In some cases even 10 times faster! And it's about a tie between Russell and FastStrings in these situations.

Still, Russells solution is extremely fast in test 5, where the text to be found doesn't exist. About four times faster even. Russell discovers within 100 ms that the string isn't in the file.

So based upon these results, I would suggest that this would the the proper order:
1) mnasman with the FastStrings solution
2) rllibby
3) AlkisG

Final judgement will be this weekend once I've improved the statistics that are generated by my code. (Have to clear those negative values from the clockticks. :-) I do wonder if I should reward mnasman though since he didn't provide the code himself but just provided a link. Still, unless anyone objects, I will count it as his solution since he did provide a link to the code.
alkisg

Alex, which function did you use from FastStrings? This one?
function FastAnsiReplace(const S, OldPattern, NewPattern: string; Flags: TReplaceFlags): string;

Russell had to parse the string twice to keep the code from allocating too much RAM...
FastAnsiReplace does this:
    BufferSize := (SourceLength * ReplaceLength) div FindLength;
    if BufferSize > (SourceLength * 10) then
      BufferSize := SourceLength * 10;

Allocating a string 10 times bigger than you may actually need is not good practice... Try to run a test with a string length = 1/3 of your RAM, I bet you'll see poor performance... (OK, you have A LOT of RAM, but not everyone does... :-))


>> And AlkisG also provided a link to FastCode. But I don't count them for this competition since I wanted code exactly like the StringReplace function.

What do you mean? The parameters are the same, e.g.
function StringReplace_JOH_PAS_4(const S, OldPattern, NewPattern: AnsiString; Flags: TReplaceFlags): AnsiString;

The only difference I see is that FastCode functions DON'T work with multibyte chars, unlike Delphi's or FastStrings.
But neither mine nor Russell's functions work with multibyte chars as well...

Congrats on your new system!!! :-)
Wim ten Brink

ASKER
The function used is the one in the ZIP file that I've linked to. It's the exact test that I've ran, with one minor change: I made the original text bigger. :-) The function is called StringReplace_JOH_PAS_4.

And running a test with one-third of my RAM? Wow, means I have to expand the text to over a gigabyte. :-)

I should actually run that test again but this time replacing the word 'begin' with a very long text. (Something over 4 kilobytes or so.) And check how the string would explode to something huge. :-) Makes a better test for all the RAM I have.

I'll take a second look at the FastCode functions too this weekend. Maybe I just looked at the wrong function. :-) Then again, I have to adjust my test-tool anyways tonight and tomorrow to stop the tick count from overflowing. Am also wondering why my dual-core system is at 99% while running this test since it's not a multithreaded one. I think I need to tell the application to just use one of the two CPU's. And some other minor adjustments so this test will run nicely on my new system.

And thanks! I'm very happy with my new system. Still have to get used to it's speed, though. :-)

Get an unlimited membership to EE for less than $4 a week.
Unlimited question asking, solutions, articles and more.
Mohammed Nasman

That's interesting Alex :-)
Btw, I will not be sad if you don't give me points, rllibby and AlkisG, give you more times and were working to give you better soluation, I just gave a link ;-)
Russell Libby


Doesn't matter to me either way at this point...

As I have already stated, its a common problem of trading memory for speed. To that end I provided 3 versions: one that allocated a worst case index array, one that performed a double scan, and one that used a resizeable index array.

I did find it intersting that no one picked up on a simple optimization for tests 6, 14, 22, and 30 though. If both the old and new pattern are the same (case must match), then nothing needs to be done. Modifying the routine drops those test numbers down to almost zero.

Eg:

  // Get lengths of parameter strings
  dwSource:=Length(S);
  dwOld:=Length(OldPattern);
  dwNew:=Length(NewPattern);

  // Test source length
  if (dwSource < dwOld) or (dwSource = 0) then
     // Source string is smaller than old pattern, no possible match so return source
     result:=S
  // Test old pattern
  else if (dwOld = 0) then
     // No old pattern provided, just return source
     result:=S
  // Check for equal match
  else if (dwOld = dwNew) and (CompareStr(OldPattern, NewPattern) = 0) then
     // Old and new patterns are the same
     result:=S
  else
  begin
     ...
Wim ten Brink

ASKER
@mnasman, it's not about the points anyway but you did provide a bloody fast solution.

@rlibby, what saddens me a bit is the fact that just a few people participated, as if there's just no faster solution. Or perhaps no one is really that much interested in optimizing for speed. :-)

Btw, I didn't have much time this weekend to do a full test since I still had to set up some things on my new computer. It's amazing, the amount of software that I have collected in the past two decades and the number of things I'm still using. :-)

Anyway, since it's no use to keep this question open forever I think it's easier to just give every solution with code 100 points. With three persons providing solutions, this means a total of 300 points. Question is, which one to accept as answer...
So I looked at the links again and on the FastStrings site I saw this comment: "Make sure you check out the Delphi FastCode project, it has a faster StringReplace routine and also some fast Pos routines too."
Well, apparantly FastCode is faster than FastStrings. And FastStrings was fastest in general. Based on this, it's clear that the FastCode answer is to be considered fastest, even though I haven't tested it. So AlkisQ is hereby the "winner". (Even though everyone gets the same amount of points...)

Too bad there were just a few participants...
Experts Exchange has (a) saved my job multiple times, (b) saved me hours, days, and even weeks of work, and often (c) makes me look like a superhero! This place is MAGIC!
Walt Forbes