daveky

stringbuilder efficiency

This question has two parts. I have a number of "controls" that build HTML strings from data for display in a browser. I use a StringBuilder for efficiency. The default Capacity for a StringBuilder is only 16, and when you go over this the memory is re-allocated and doubled. This is a waste if you do it a lot, and the StringBuilder capacity can get large very quickly. But if you set it overly large and the string never fills it, you eat too much memory. Is it best to try to estimate the size of the resulting string or go way over the top? Some thoughts/guidelines would be appreciated.

Also, when appending to the StringBuilder, is it best to use lots and lots of small statements? i.e.:

Use this:

stringbuilder.append("<p>")
stringbuilder.append(aStringVariable)
stringbuilder.append("</p><p>")
stringbuilder.append(anotherStringVariable)
stringbuilder.append("</p>")

And NOT this

stringbuilder.append("<p>" & aStringVariable & "</p><p>" & anotherStringVariable & "</p>")

My thinking is that there may be a nasty string concatenation going on before a string is appended to the string builder. Something like:

dim s as string
s = s & "<p>"
s = s & aStringVariable
s = s & "</p><p>"
s = s & anotherStringVariable
s = s & "</p>"
stringbuilder.append(s)

Cheers!
bramsquad

"The String object is immutable. Every time you use one of the methods in the System.String class, you create a new string object in memory, which requires a new allocation of space for that new object."

I got this from MS Help. What it's saying is that every string operation inherently creates a new object in memory.

So yes, your first example is going to be the most efficient, because it is only using one object. If you have enough iterations of the call (inside a loop, for example), then it would be better to go with your first example.

However, I think the readability is compromised....

~b

daveky

ASKER

I thought it would be too, but I did not know. I had seen the documentation you quoted. But try throwing a quick project together. (I used a Windows Forms app in VB.)

Dim LoopCounter As Integer = 1000000
Private Sub btnBad_Click(ByVal sender As System.Object, ByVal e As System.EventArgs) Handles btnBad.Click
  Dim ts As New TimeSpan()
  Dim x As Integer
  lblBad.Text = String.Empty
  Dim s As New System.Text.StringBuilder()
  Dim starttime As DateTime = DateTime.Now
  For x = 0 To LoopCounter
    s.Append("a" & "a" & "a" & "a" & "a" & "a" & "a" & "a" & "a" & "a")
  Next
  Dim endtime As DateTime = DateTime.Now
  ts = endtime.Subtract(starttime)
  lblBad.Text = ts.TotalMilliseconds.ToString
End Sub

Private Sub btnGood_Click(ByVal sender As System.Object, ByVal e As System.EventArgs) Handles btnGood.Click
  Dim ts As New TimeSpan()
  Dim x As Integer
  lblGood.Text = String.Empty
  Dim s As New System.Text.StringBuilder()
  Dim starttime As DateTime = DateTime.Now
  For x = 0 To LoopCounter
    s.Append("a")
    s.Append("a")
    s.Append("a")
    s.Append("a")
    s.Append("a")
    s.Append("a")
    s.Append("a")
    s.Append("a")
    s.Append("a")
    s.Append("a")
  Next
  Dim endtime As DateTime = DateTime.Now
  ts = endtime.Subtract(starttime)
  lblGood.Text = ts.TotalMilliseconds.ToString
End Sub

And the second one is WAY slower. Does the compiler convert the "a" & "a" etc into stringbuilder code? Also changing the constructor to be:

Dim s As New System.Text.StringBuilder(10000000)

doesn't seem to make much difference at all. Without the preset capacity it should have had to do about 20 re-allocations of large strings. Could it be that the performance benefit is limited?
When I use StringBuilder to concatenate strings, I use the Append method based on what seems to be the most organized approach, not necessarily trying to use lots of appends or few appends. Moreover, I find that using the AppendFormat method is cleaner than the Append method.

AppendFormat makes the code easier to read...

stringbuilder.appendformat("<p>{0}</p><p>{1}</p>",  aStringVariable, anotherStringVariable)

or

stringbuilder.appendformat("<p>{0}</p>", aStringVariable).appendFormat("<p>{0}</p>", anotherStringVariable)

My response focuses on how to use the StringBuilder to make code easier to read / maintain. Using StringBuilder is also faster than building immutable strings, no matter how the StringBuilder is initialized. I've read that somewhere...

ASKER

I'd never noticed AppendFormat before. But using my little Windows app (see above) with AppendFormat, it's the worst of all; up to six times slower than the "a" & "a" etc version! That is surprising.

Dim ts As New TimeSpan()
Dim x As Integer
lblBad.Text = String.Empty
Dim s As New System.Text.StringBuilder(10000000)
Dim s1 As String = "a{0}b{1}c{2}d{3}e{4}"
Dim starttime As DateTime = DateTime.Now
For x = 0 To LoopCounter
  s.AppendFormat(s1, "a", "b", "c", "d", "e")
  ' OR
  ' s.AppendFormat("a{0}b{1}c{2}d{3}e{4}", "a", "b", "c", "d", "e")
Next
Dim endtime As DateTime = DateTime.Now
ts = endtime.Subtract(starttime)
lblBad.Text = ts.TotalMilliseconds.ToString

It's very readable though. I suppose it depends on how many you are doing and how maintainable your code needs to be.

Yes, StringBuilder *is* always more efficient than immutable strings, and so I would have expected the "a" & "a" etc version to be slower than the multiple append version. I am worried I am missing something here...
You could always run ildasm on the compiled executable and see what kind of optimizations the compiler is performing for you.

It would make sense that the time would increase, because you have the overhead of all those method calls, which you do not have in your original version. However, as has already been mentioned, the efficiency of the StringBuilder (memory-wise) would be much better.

Matt
ASKER CERTIFIED SOLUTION
Bob Learned

This solution is only available to members.
Your test code above (btnBad_Click and btnGood_Click) only tests the execution speed, and only for one-time through.

The real inefficiency of the String class is that it allocates more memory for each concatenation, and then eventually the garbage collector (GC) needs to clean up all those little temporary strings.

Your test only includes the time for the allocation, but ignores the cost of the garbage collection. The clean-up is far more time consuming than the allocation. It's difficult to set up a (memory-limited) configuration and to write and run a test program that properly tests this.

Also, it's possible that the compiler could optimize ("a" & "a" & "a" & "a" & "a" & "a" & "a" & "a" & "a" & "a") into a single constant string.    mpf1748's ildasm recommendation might reveal that.
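A minimal sketch of how the benchmark could sidestep that possibility: build the appended value from a variable instead of literals, so the concatenation cannot be folded at compile time. The module and variable names here are made up for illustration; they are not from the test project above.

```vbnet
Imports System
Imports System.Text

Module FoldingCheck
    Sub Main()
        ' Literal operands: the compiler is free to fold these into a single
        ' constant at compile time, so no run-time concatenation is measured.
        Dim folded As String = "a" & "a" & "a"

        ' Variable operands: the concatenation has to happen at run time,
        ' which is what the benchmark is actually trying to measure.
        Dim piece As String = "a"
        Dim runtimeBuilt As String = piece & piece & piece

        Dim sb As New StringBuilder()
        sb.Append(folded)
        sb.Append(runtimeBuilt)
        Console.WriteLine(sb.ToString())
    End Sub
End Module
```

Swapping the literals in btnBad_Click for a variable in the same way should make the two handlers a fairer comparison.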
Well, Bill, howdy stranger, haven't seen you around these parts in a while.  Some very valid points, I couldn't have said them better myself.

Bob
[ Yep, I've definitely had my attention elsewhere.  It's good to see some of the same old names here, as well as some of the creative new nicks. ]

I understand that Microsoft is including a lot more testing and performance tools in future Visual Studio products.  These tools will probably help to test a situation such as this.  I'm a little fuzzy as to exactly which releases and versions these tools will ship with.  I think with VSTS (Visual Studio Team System), but I'm unsure about other versions.

ASKER

farsight: I hadn't even thought about GC. Thanks for clouding the issue! :D
After finally managing to get to see some IL* I noticed that the compiler does indeed create a single string from "a" & "b" etc.

Well, I have created three functions: a single append, a multi append, and a format append. I have looked at the IL and am none the wiser. Perhaps someone could offer some wisdom...?

============================

Function SingleAppend(ByVal a As String, ByVal b As String) As String
      Dim s As New System.Text.StringBuilder()
      s.Append(":" & a & ":" & b & ":")
      Return s.ToString
End Function

.method public instance string SingleAppend(string a, string b) cil managed
{
      // Code Size: 65 byte(s)
      .maxstack 4
      .locals (
            [mscorlib]System.Text.StringBuilder builder1,
            string text1,
            string[] textArray1)
      L_0000: newobj instance void [mscorlib]System.Text.StringBuilder::.ctor()
      L_0005: stloc.0
      L_0006: ldloc.0
      L_0007: ldc.i4.5
      L_0008: newarr string
      L_000d: stloc.2
      L_000e: ldloc.2
      L_000f: ldc.i4.0
      L_0010: ldstr ":"
      L_0015: stelem.ref
      L_0016: ldloc.2
      L_0017: ldc.i4.1
      L_0018: ldarg.1
      L_0019: stelem.ref
      L_001a: ldloc.2
      L_001b: ldc.i4.2
      L_001c: ldstr ":"
      L_0021: stelem.ref
      L_0022: ldloc.2
      L_0023: ldc.i4.3
      L_0024: ldarg.2
      L_0025: stelem.ref
      L_0026: ldloc.2
      L_0027: ldc.i4.4
      L_0028: ldstr ":"
      L_002d: stelem.ref
      L_002e: ldloc.2
      L_002f: call string string::Concat(string[])
      L_0034: callvirt instance [mscorlib]System.Text.StringBuilder [mscorlib]System.Text.StringBuilder::Append(string)
      L_0039: pop
      L_003a: ldloc.0
      L_003b: callvirt instance string [mscorlib]System.Text.StringBuilder::ToString()
      L_0040: ret
}

============================

Function MultiAppend(ByVal a As String, ByVal b As String) As String
      Dim s As New System.Text.StringBuilder()
      s.Append(":")
      s.Append(a)
      s.Append(":")
      s.Append(b)
      s.Append(":")
      Return s.ToString
End Function

.method public instance string MultiAppend(string a, string b) cil managed
{
      // Code Size: 65 byte(s)
      .maxstack 2
      .locals (
            string text1,
            [mscorlib]System.Text.StringBuilder builder1)
      L_0000: newobj instance void [mscorlib]System.Text.StringBuilder::.ctor()
      L_0005: stloc.1
      L_0006: ldloc.1
      L_0007: ldstr ":"
      L_000c: callvirt instance [mscorlib]System.Text.StringBuilder [mscorlib]System.Text.StringBuilder::Append(string)
      L_0011: pop
      L_0012: ldloc.1
      L_0013: ldarg.1
      L_0014: callvirt instance [mscorlib]System.Text.StringBuilder [mscorlib]System.Text.StringBuilder::Append(string)
      L_0019: pop
      L_001a: ldloc.1
      L_001b: ldstr ":"
      L_0020: callvirt instance [mscorlib]System.Text.StringBuilder [mscorlib]System.Text.StringBuilder::Append(string)
      L_0025: pop
      L_0026: ldloc.1
      L_0027: ldarg.2
      L_0028: callvirt instance [mscorlib]System.Text.StringBuilder [mscorlib]System.Text.StringBuilder::Append(string)
      L_002d: pop
      L_002e: ldloc.1
      L_002f: ldstr ":"
      L_0034: callvirt instance [mscorlib]System.Text.StringBuilder [mscorlib]System.Text.StringBuilder::Append(string)
      L_0039: pop
      L_003a: ldloc.1
      L_003b: callvirt instance string [mscorlib]System.Text.StringBuilder::ToString()
      L_0040: ret
}

============================

Function FormatAppend(ByVal a As String, ByVal b As String) As String
      Dim s As New System.Text.StringBuilder()
      s.AppendFormat(":{0}:{1}:", a, b)
      Return s.ToString
End Function

.method public instance string FormatAppend(string a, string b) cil managed
{
      // Code Size: 27 byte(s)
      .maxstack 4
      .locals (
            string text1,
            [mscorlib]System.Text.StringBuilder builder1)
      L_0000: newobj instance void [mscorlib]System.Text.StringBuilder::.ctor()
      L_0005: stloc.1
      L_0006: ldloc.1
      L_0007: ldstr ":{0}:{1}:"
      L_000c: ldarg.1
      L_000d: ldarg.2
      L_000e: callvirt instance [mscorlib]System.Text.StringBuilder [mscorlib]System.Text.StringBuilder::AppendFormat(string, object, object)
      L_0013: pop
      L_0014: ldloc.1
      L_0015: callvirt instance string [mscorlib]System.Text.StringBuilder::ToString()
      L_001a: ret
}

============================

With respect to farsight's comment, which has the least detritus left over for the GC to clean up? It's all very curious how they can all produce such differing code isn't it?

* I used Lutz Roeder's Reflector - I recommend it if you understand this sort of thing. http://www.aisto.com/roeder/dotnet/
[ Wishing I knew more about IL ...
I do find it interesting that SingleAppend builds a String array and passes it to String.Concat().
It seems like   ":" & a & ":" & b & ":"   is equivalent to:
 Dim tmp As String = String.Concat(New String() {":", a, ":", b, ":"})
]
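A small sketch that compares the two forms side by side, assuming the IL above has been read correctly. The sample values are arbitrary; this only checks that both expressions produce the same text, not that the compiler always emits the same IL.

```vbnet
Imports System

Module ConcatEquivalence
    Sub Main()
        Dim a As String = "left"
        Dim b As String = "right"

        ' The form used in SingleAppend.
        Dim viaOperator As String = ":" & a & ":" & b & ":"

        ' The explicit call that the IL appears to make on our behalf.
        Dim viaConcat As String = String.Concat(New String() {":", a, ":", b, ":"})

        Console.WriteLine(viaOperator)
        Console.WriteLine(viaOperator = viaConcat)
    End Sub
End Module
```

Either way, one temporary array and one temporary string are created before Append is ever called.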

GC.GetTotalMemory(False) will give you at least an estimate of the number of bytes to be cleaned up.  Unfortunately, that's only part of the story, because it cleans up chunks at a time, not bytes at a time.   A megabyte of 16-byte chunks will take a lot longer than a megabyte of 16K-byte chunks.

You could also add:
  GC.Collect()
  GC.WaitForPendingFinalizers()
and time that to determine how long it takes to clean up each method.  That's something.
But that, too, can be misleading, as GC behavior in the real program will be affected by other objects that need to be deallocated, the total memory picture, and many other factors.

[ This is one of the few times I'll recommend using GC.Collect().  It's only in a sample program used to test performance.  With few exceptions, in any real application, I recommend letting .NET manage the GC.  It's designed to do so.  Only if a developer has an extremely good idea of how memory allocation works and the GC works in .NET, should he/she consider calling GC.Collect directly.  Even then, I'd test to see if it's actually improving performance, or just making it worse. ]
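The measurement described above might be sketched roughly like this. It assumes System.Diagnostics.Stopwatch is available (it is not in the oldest framework versions) in place of the DateTime.Now timing used earlier in the thread; the module name and iteration count are arbitrary.

```vbnet
Imports System
Imports System.Diagnostics
Imports System.Text

Module GcTimingSketch
    Const Iterations As Integer = 100000

    Sub Main()
        Dim piece As String = "a"

        ' Baseline heap size after a full collection.
        Dim before As Long = GC.GetTotalMemory(True)

        ' Time the work itself.
        Dim work As Stopwatch = Stopwatch.StartNew()
        Dim sb As New StringBuilder()
        For i As Integer = 1 To Iterations
            ' Variable operands, so a temporary string is created each pass.
            sb.Append(piece & piece & piece)
        Next
        work.Stop()

        ' Rough estimate of what is on the heap, garbage included.
        Dim afterWork As Long = GC.GetTotalMemory(False)

        ' Time the forced clean-up separately. Test code only.
        Dim cleanup As Stopwatch = Stopwatch.StartNew()
        GC.Collect()
        GC.WaitForPendingFinalizers()
        cleanup.Stop()

        Console.WriteLine("Append loop:   {0} ms", work.ElapsedMilliseconds)
        Console.WriteLine("Forced GC:     {0} ms", cleanup.ElapsedMilliseconds)
        Console.WriteLine("Heap growth:   {0} bytes (estimate)", afterWork - before)
        Console.WriteLine("Result length: {0}", sb.Length)
    End Sub
End Module
```

The numbers will vary from run to run and machine to machine, and collections that happen during the loop are already counted in the first timing, so treat them as a comparison between methods rather than absolute costs.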
Getting back to the question itself, my rule of thumb is:
(1) If I'm only appending one or two times, I just use a string.  [  s &= "a"  ; s &= "b" ]
(2) If I'm appending multiple times (3+)  I use StringBuilder.
(3) Sometimes I opt for readability rather than performance.  Especially with something like:
    > stringbuilder.append("<p>" & aStringVariable & "</p><p>" & anotherStringVariable & "</p>")
(4) I don't get overly concerned about performance unless it's in a bottleneck that's identified by thorough testing and measurement.  If it's a bottleneck, I'll focus all my performance-enhancing efforts on that area.
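Rules (1) and (2) above might look like this in practice. The function names and sample data are invented for the example.

```vbnet
Imports System
Imports System.Text

Module RuleOfThumb
    ' A couple of pieces: plain concatenation is simple and cheap enough.
    Function FullName(ByVal firstName As String, ByVal lastName As String) As String
        Return firstName & " " & lastName
    End Function

    ' Many pieces (a loop): accumulate in a StringBuilder instead.
    Function ToParagraphs(ByVal items As String()) As String
        Dim sb As New StringBuilder()
        For Each item As String In items
            sb.Append("<p>").Append(item).Append("</p>")
        Next
        Return sb.ToString()
    End Function

    Sub Main()
        Console.WriteLine(FullName("Ada", "Lovelace"))
        Console.WriteLine(ToParagraphs(New String() {"one", "two", "three"}))
    End Sub
End Module
```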

I would recommend creating the StringBuilder objects to hold a reasonable capacity for your actual use.  (Why not?)
For example:
   Dim sb As StringBuilder = New StringBuilder(4096)
You could use the 80% rule to come up with a number ... I'm guessing that 80% of your web pages are less than 4096 characters.  Then only 20% of your pages will require a second allocation, and probably only one.  This estimation should be done on a case by case basis.  For example, if you're just building up a single row of an HTML table, 128 characters may be fine.

As a complete judgement call, I would not have the program spend a lot of time computing an estimated capacity.  That computation may well take longer than an extra allocation or two.
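A quick way to see what a fixed estimate buys you is to print Capacity and Length after a typical build. This is only a sketch: the 4096 figure is the guess from above, the row markup is made up, and how Capacity grows once it is exceeded depends on the framework version.

```vbnet
Imports System
Imports System.Text

Module CapacitySketch
    ' A fixed guess, chosen once by looking at typical output, not computed per call.
    Const EstimatedPageSize As Integer = 4096

    Sub Main()
        Dim sb As New StringBuilder(EstimatedPageSize)
        Console.WriteLine("Initial capacity: {0}", sb.Capacity)

        For i As Integer = 1 To 50
            sb.Append("<tr><td>row ").Append(i).Append("</td></tr>")
        Next

        ' If Length regularly lands well above or below the estimate, adjust the constant.
        Console.WriteLine("Length:   {0}", sb.Length)
        Console.WriteLine("Capacity: {0}", sb.Capacity)
    End Sub
End Module
```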

> This is a waste if you do it a lot, and the StringBuilder capacity can get large very quickly. But if you set it overly large and the string never fills it, you eat too much memory. Is it best to try to estimate the size of the resulting string or go way over the top?

LOL  This reminds me of the recent cell-phone-service TV advertisements that force the kids to choose how much time they'll be playing, with potential overcharges.  They don't think it's very fair having to guess.  (I guess the ad's not working that well, as I have no clue what cell-phone-service company is advertising!)

ASKER

I'm still here thinking! (I thought I had better mention it just in case anybody thought I had taken the advice and abandoned the question. ;D)

farsight: I find it interesting that these methods do different things in the IL too. As for your second rule of thumb I found this from Dr. Dotnetsky on eggheadcafe.com :

-- begin quote --

Myth:
StringBuilder.Append() is always faster than String.Concat().

Rationale:
String operations always create new string objects, whereas StringBuilder uses internal magic to avoid expensive allocations.

Truth:
Using a StringBuilder is sometimes (even usually) faster than using simple string concatenation, but there is a performance cutover point. Various people have found this to be around the 5 - 10 concatenations mark. The bottom line being that replacing this:

string fullName = firstName + " " + lastName;

with this:

StringBuilder builder = new StringBuilder();
builder.Append(firstName);
builder.Append(" ");
builder.Append(lastName);
string fullName = builder.ToString();

is rather pointless. Remember that StringBuilder is itself reference type, and has the associated allocation / deallocation costs.

-- end quote --

Note: "5 - 10 concatenations ".

TheLearnedOne: Unfortunately I am not in a position to write directly to the stream with Response.Write (HtmlTextWriter.Write?) as I am just asked to provide a usable HTML string. I have no references to the request/response objects, neither can I create a control to be inserted into the control tree for "normal" rendering. (Also I guess your 'excerpt' pretty much agrees with Dr. Dotnetsky's view of things.)

---------------------------------

I thought this question would be quite straightforward. I do a lot of this string stuff and I was hoping to get an extra speed boost somewhere, but it looks like the effort involved may not be worth it. Still, I am going to look at Compuware's profiler and Microsoft's too in the hope of shedding some light. I'll keep you all posted before handing out any points....
SOLUTION
This solution is only available to members.

ASKER

Yes, it would be interesting to see how he got to the figures. I "trust" his judgment though as I have read many an article from that site. But you're right, facts are better.

As it happens I do pass around StringBuilder objects. Perhaps I should do it more. I just got bitten with a lost day when I realised I had passed them ByRef, so that I could reuse them, without any need to. It took me a while to redo all those parameters. (And of course there were one or two that *needed* to be ByRef - grrrr)
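A short sketch of why ByVal is enough in the common case: StringBuilder is a reference type, so a ByVal parameter still refers to the same object and appends made by the callee are visible to the caller. ByRef is only needed when the callee must replace the caller's variable with a different StringBuilder. The helper name is made up for the example.

```vbnet
Imports System
Imports System.Text

Module ByValSketch
    ' ByVal passes a copy of the reference; both copies refer to the same StringBuilder.
    Sub AppendParagraph(ByVal sb As StringBuilder, ByVal text As String)
        sb.Append("<p>").Append(text).Append("</p>")
    End Sub

    Sub Main()
        Dim sb As New StringBuilder(256)
        AppendParagraph(sb, "first")
        AppendParagraph(sb, "second")

        ' Both paragraphs are present even though sb was passed ByVal.
        Console.WriteLine(sb.ToString())
    End Sub
End Module
```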

Streams? hmmm... (Don't do it to me; I am already struggling to keep the worms in their can!)

I realise the act of measuring changes the measurement. The Compuware profiler does timing of functions and their children. I guess so you can see whether code changes have made speed improvements or not. The MS one does GC for all levels. (At least this is what they promise.) I guess between the two of them I should be OK. I just hope I get time to try them out and they don't kill my machine in the process!

ASKER

Well no, TheLearnedOne, not abandoned just forgotten. I am very busy at the moment! :D

I have decided to split the points; mostly to farsight for all his effort, because he specifically tried to answer my questions, for the tip "Create _ONE_ StringBuilder object" and his reasoning on what size to initialise it to; also some to TheLearnedOne for pointing out that "Response.Write is fastest". (I have been able to argue with other people for changes so I get a reference to HtmlTextWriter for performance reasons, and I have abandoned many a StringBuilder with good speed increases. Result!)

I think 'my' final conclusions are:
1) Use a single StringBuilder and pass a reference around.
2) Try to create it with a reasonable size but don't try to calculate this size on the fly.
3) Use as few appends as possible (rather than more) and don't use appendformat.
4) Pass the buck to a stream where possible to avoid creating them at all.
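Conclusion 4 might be sketched like this, assuming the rendering helpers can be written against System.IO.TextWriter (which HtmlTextWriter derives from). A StringWriter stands in for whatever writer the host page supplies, and the helper name is hypothetical.

```vbnet
Imports System
Imports System.IO

Module WriterSketch
    ' Writes straight to the supplied writer; no intermediate string is built here.
    Sub WriteParagraph(ByVal writer As TextWriter, ByVal text As String)
        writer.Write("<p>")
        writer.Write(text)
        writer.Write("</p>")
    End Sub

    Sub Main()
        ' StringWriter is only a stand-in so the sketch runs on its own.
        Using writer As New StringWriter()
            WriteParagraph(writer, "first")
            WriteParagraph(writer, "second")
            Console.WriteLine(writer.ToString())
        End Using
    End Sub
End Module
```

Because the helper depends only on TextWriter, the same code can feed a response stream when one is available and fall back to a StringWriter when a plain HTML string is required.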

Cheers!