Solved

StringBuilder efficiency

Posted on 2004-09-01
Last Modified: 2012-05-05
This question has two parts. I have a number of "controls" that build HTML strings from data for display in a browser. I use a StringBuilder for efficiency. The default Capacity for a StringBuilder is only 16, and when you go over this the memory is re-allocated and doubled. This is a waste if you do it a lot, and the StringBuilder capacity can get large very quickly. But if you set it overly large you eat too much memory, and if it's not big enough you are back to re-allocating. Is it best to try to estimate the size of the resulting string, or to go way over the top? Some thoughts/guidelines would be appreciated. Also, when appending to the StringBuilder, is it best to use lots and lots of small statements? i.e.:

Use this:

stringbuilder.append("<p>")
stringbuilder.append(aStringVariable)
stringbuilder.append("</p><p>")
stringbuilder.append(anotherStringVariable)
stringbuilder.append("</p>")

And NOT this

stringbuilder.append("<p>" & aStringVariable & "</p><p>" & anotherStringVariable & "</p>")

My thinking is that there may be a nasty string concatenation going on before a string is appended to the string builder. Something like:

dim s as string
s = s & "<p>"
s = s & aStringVariable
s = s & "</p><p>"
s = s & anotherStringVariable
s = s & "</p>"
stringbuilder.append(s)

Cheers!
Question by:daveky
LVL 8

Expert Comment

by:bramsquad
ID: 11953776
"The String object is immutable. Every time you use one of the methods in the System.String class, you create a new string object in memory, which requires a new allocation of space for that new object."

I got this from MS Help. What it's saying is that every string operation inherently creates a new object in memory.

So yes, your first example is going to be the most efficient, because it is only using one object. If you have enough iterations of the call (inside a loop, for example), then it would be better to go with your first example.

However, I think the readability is compromised....

~b

 
LVL 4

Author Comment

by:daveky
ID: 11954420
I thought it would be too, but I did not know. I had seen the documentation you quoted. But try throwing a quick project together. (I used a Windows Forms app in VB.)

Dim LoopCounter As Integer = 1000000
Private Sub btnBad_Click(ByVal sender As System.Object, ByVal e As System.EventArgs) Handles btnBad.Click
  Dim ts As New TimeSpan()
  Dim x As Integer
  lblBad.Text = String.Empty
  Dim s As New System.Text.StringBuilder()
  Dim starttime As DateTime = DateTime.Now
  For x = 0 To LoopCounter
    s.Append("a" & "a" & "a" & "a" & "a" & "a" & "a" & "a" & "a" & "a")
  Next
  Dim endtime As DateTime = DateTime.Now
  ts = endtime.Subtract(starttime)
  lblBad.Text = ts.TotalMilliseconds.ToString
End Sub

Private Sub btnGood_Click(ByVal sender As System.Object, ByVal e As System.EventArgs) Handles btnGood.Click
  Dim ts As New TimeSpan()
  Dim x As Integer
  lblGood.Text = String.Empty
  Dim s As New System.Text.StringBuilder()
  Dim starttime As DateTime = DateTime.Now
  For x = 0 To LoopCounter
    s.Append("a")
    s.Append("a")
    s.Append("a")
    s.Append("a")
    s.Append("a")
    s.Append("a")
    s.Append("a")
    s.Append("a")
    s.Append("a")
    s.Append("a")
  Next
  Dim endtime As DateTime = DateTime.Now
  ts = endtime.Subtract(starttime)
  lblGood.Text = ts.TotalMilliseconds.ToString
End Sub

And the second one is WAY slower. Does the compiler convert the "a" & "a" etc. into StringBuilder code? Also, changing the constructor to be:

Dim s As New System.Text.StringBuilder(10000000)

doesn't seem to make much difference at all. Without the preset capacity it should have had to do about 20 re-allocations of increasingly large buffers. Could it be that the performance benefit is limited?
 
LVL 8

Expert Comment

by:gregasm
ID: 11954550
When I use StringBuilder to concatenate strings, I use the Append method based on what seems to be the most organized approach, not necessarily trying to use lots of appends or few appends. Moreover, I find that using the AppendFormat method is cleaner than the Append method.

AppendFormat makes the code easier to read...

stringbuilder.appendformat("<p>{0}</p><p>{1}</p>",  aStringVariable, anotherStringVariable)

or

stringbuilder.appendformat("<p>{0}</p>", aStringVariable).appendFormat("<p>{0}</p>", anotherStringVariable)

My response focuses on how to use the StringBuilder to make code easier to read / maintain. Using StringBuilder is also faster than building immutable strings, no matter how the StringBuilder is initialized. I've read that somewhere...
 
LVL 4

Author Comment

by:daveky
ID: 11954860
I'd never noticed AppendFormat before. But using my little Windows app (see above) and doing AppendFormat, it's the worst of all; up to six times slower than the "a" & "a" etc. version! That is surprising.

Dim ts As New TimeSpan()
Dim x As Integer
lblBad.Text = String.Empty
Dim s As New System.Text.StringBuilder(10000000)
Dim s1 As String = "a{0}b{1}c{2}d{3}e{4}"
Dim starttime As DateTime = DateTime.Now
For x = 0 To LoopCounter
  s.AppendFormat(s1, "a", "b", "c", "d", "e")
  ' OR
  ' s.AppendFormat("a{0}b{1}c{2}d{3}e{4}", "a", "b", "c", "d", "e")
Next
Dim endtime As DateTime = DateTime.Now
ts = endtime.Subtract(starttime)
lblBad.Text = ts.TotalMilliseconds.ToString

It's very readable though. I suppose it depends on how many you are doing and how maintainable your code needs to be.

Yes, StringBuilder *is* always more efficient than immutable string, and so I would have expected the "a" & "a" etc. version to be slower than the multiple append version. I am worried I am missing something here...
 
LVL 3

Expert Comment

by:mpf1748
ID: 11955148
You could always run ildasm on the compiled executable and see what kind of optimizations the compiler is performing for you.

It would make sense that the time factor would increase, because you have the overhead of all those method calls, which you do not in your original version. However, like what has already been mentioned, the efficiency of the StringBuilder (memory-wise) would be much better.

Matt
 
LVL 96

Accepted Solution

by:
Bob Learned earned 50 total points
ID: 11956055
Some more information:

Optimize String Performance
http://getdotnetco.web101.discountasp.net/GdncStore/free/Articles/Optimize%20String%20Performance.htm

Excerpt:

  "However, if you do not expect to manipulate a string very often, String is a better choice. This is because StringBuilder has one-time overhead that String does not. At creation time, the StringBuilder constructor takes more time than the String constructor."

Also:

ASP Speed Tricks
http://www.somacon.com/aspdocs/ASPSpeedTricks.pdf

Excerpt:
Concatenation may be removed easily by using Response.write directly in the loop. (In ASP.Net, the StringBuilder class can be used for creating long strings, but Response.write is fastest.)

Bob

 
LVL 12

Expert Comment

by:farsight
ID: 11970815
Your test code above (btnBad_Click and btnGood_Click) only tests the execution speed, and only for one-time through.

The real inefficiency of the String class is because it allocates more memory for each catenation, and then eventually the garbage collector (GC) needs to clean up all those little temporary strings.

Your test only includes the time for the allocation, but ignores the cost of the garbage collection.  The clean-up's far more time consuming than the allocation.  It's difficult to set up a configuration (memory-limited), and write and run a test program that properly tests this.

Also, it's possible that the compiler could optimize ("a" & "a" & "a" & "a" & "a" & "a" & "a" & "a" & "a" & "a") into a single constant string.    mpf1748's ildasm recommendation might reveal that.
 
LVL 96

Expert Comment

by:Bob Learned
ID: 11973069
Well, Bill, howdy stranger, haven't seen you around these parts in a while.  Some very valid points, I couldn't have said them better myself.

Bob
 
LVL 12

Expert Comment

by:farsight
ID: 11974006
[ Yep, I've definitely had my attention elsewhere.  It's good to see some of the same old names here, as well as some of the creative new nicks. ]

I understand that Microsoft is including a lot more testing and performance tools in future Visual Studio products.  These tools will probably help to test a situation such as this.  I'm a little fuzzy as to exactly what release and versions these tools will ship with.  I think with VSTS - Visual Studio Team System, but I'm unsure about other versions.
 
LVL 4

Author Comment

by:daveky
ID: 11974769
farsight: I hadn't even thought about GC. Thanks for clouding the issue! :D
After finally managing to get to see some IL* I notice that the compiler does indeed create a single string from "a" & "b" etc.

Well, I have created three functions: a single append, a multi append, and a format append. I have looked at the IL and am none the wiser. Perhaps someone could offer some wisdom...?

============================

Function SingleAppend(ByVal a As String, ByVal b As String) As String
      Dim s As New System.Text.StringBuilder()
      s.Append(":" & a & ":" & b & ":")
      Return s.ToString
End Function

.method public instance string SingleAppend(string a, string b) cil managed
{
      // Code Size: 65 byte(s)
      .maxstack 4
      .locals (
            [mscorlib]System.Text.StringBuilder builder1,
            string text1,
            string[] textArray1)
      L_0000: newobj instance void [mscorlib]System.Text.StringBuilder::.ctor()
      L_0005: stloc.0
      L_0006: ldloc.0
      L_0007: ldc.i4.5
      L_0008: newarr string
      L_000d: stloc.2
      L_000e: ldloc.2
      L_000f: ldc.i4.0
      L_0010: ldstr ":"
      L_0015: stelem.ref
      L_0016: ldloc.2
      L_0017: ldc.i4.1
      L_0018: ldarg.1
      L_0019: stelem.ref
      L_001a: ldloc.2
      L_001b: ldc.i4.2
      L_001c: ldstr ":"
      L_0021: stelem.ref
      L_0022: ldloc.2
      L_0023: ldc.i4.3
      L_0024: ldarg.2
      L_0025: stelem.ref
      L_0026: ldloc.2
      L_0027: ldc.i4.4
      L_0028: ldstr ":"
      L_002d: stelem.ref
      L_002e: ldloc.2
      L_002f: call string string::Concat(string[])
      L_0034: callvirt instance [mscorlib]System.Text.StringBuilder [mscorlib]System.Text.StringBuilder::Append(string)
      L_0039: pop
      L_003a: ldloc.0
      L_003b: callvirt instance string [mscorlib]System.Text.StringBuilder::ToString()
      L_0040: ret
}

============================

Function MultiAppend(ByVal a As String, ByVal b As String) As String
      Dim s As New System.Text.StringBuilder()
      s.Append(":")
      s.Append(a)
      s.Append(":")
      s.Append(b)
      s.Append(":")
      Return s.ToString
End Function

.method public instance string MultiAppend(string a, string b) cil managed
{
      // Code Size: 65 byte(s)
      .maxstack 2
      .locals (
            string text1,
            [mscorlib]System.Text.StringBuilder builder1)
      L_0000: newobj instance void [mscorlib]System.Text.StringBuilder::.ctor()
      L_0005: stloc.1
      L_0006: ldloc.1
      L_0007: ldstr ":"
      L_000c: callvirt instance [mscorlib]System.Text.StringBuilder [mscorlib]System.Text.StringBuilder::Append(string)
      L_0011: pop
      L_0012: ldloc.1
      L_0013: ldarg.1
      L_0014: callvirt instance [mscorlib]System.Text.StringBuilder [mscorlib]System.Text.StringBuilder::Append(string)
      L_0019: pop
      L_001a: ldloc.1
      L_001b: ldstr ":"
      L_0020: callvirt instance [mscorlib]System.Text.StringBuilder [mscorlib]System.Text.StringBuilder::Append(string)
      L_0025: pop
      L_0026: ldloc.1
      L_0027: ldarg.2
      L_0028: callvirt instance [mscorlib]System.Text.StringBuilder [mscorlib]System.Text.StringBuilder::Append(string)
      L_002d: pop
      L_002e: ldloc.1
      L_002f: ldstr ":"
      L_0034: callvirt instance [mscorlib]System.Text.StringBuilder [mscorlib]System.Text.StringBuilder::Append(string)
      L_0039: pop
      L_003a: ldloc.1
      L_003b: callvirt instance string [mscorlib]System.Text.StringBuilder::ToString()
      L_0040: ret
}

============================

Function FormatAppend(ByVal a As String, ByVal b As String) As String
      Dim s As New System.Text.StringBuilder()
      s.AppendFormat(":{0}:{1}:", a, b)
      Return s.ToString
End Function

.method public instance string FormatAppend(string a, string b) cil managed
{
      // Code Size: 27 byte(s)
      .maxstack 4
      .locals (
            string text1,
            [mscorlib]System.Text.StringBuilder builder1)
      L_0000: newobj instance void [mscorlib]System.Text.StringBuilder::.ctor()
      L_0005: stloc.1
      L_0006: ldloc.1
      L_0007: ldstr ":{0}:{1}:"
      L_000c: ldarg.1
      L_000d: ldarg.2
      L_000e: callvirt instance [mscorlib]System.Text.StringBuilder [mscorlib]System.Text.StringBuilder::AppendFormat(string, object, object)
      L_0013: pop
      L_0014: ldloc.1
      L_0015: callvirt instance string [mscorlib]System.Text.StringBuilder::ToString()
      L_001a: ret
}

============================

With respect to farsight's comment, which has the least detritus left over for the GC to clean up? It's all very curious how they can all produce such differing code isn't it?

* I used Lutz Roeder's Reflector - I recommend it if you understand this sort of thing. http://www.aisto.com/roeder/dotnet/
 
LVL 12

Expert Comment

by:farsight
ID: 11975581
[ Wishing I knew more about IL ...
I do find it interesting that SingleAppend builds a String array and passes it to String.Concat().
It seems like   ":" & a & ":" & b & ":"   is equivalent to:
 Dim tmp As String = String.Concat(New String() {":", a, ":", b, ":"})
]
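
The equivalence guessed at above can be checked with a small sketch. The module and function names here are made up for illustration; only the & operator and String.Concat(String()) come from the thread and the IL listing.

```vb
Imports System

Module ConcatDemo
    ' Hypothetical helper: the & form used in SingleAppend.
    Function WithOperator(ByVal a As String, ByVal b As String) As String
        Return ":" & a & ":" & b & ":"
    End Function

    ' Hypothetical helper: the explicit call that the IL appears to make.
    Function WithConcat(ByVal a As String, ByVal b As String) As String
        Return String.Concat(New String() {":", a, ":", b, ":"})
    End Function

    Sub Main()
        ' Both forms should yield the same text for the same inputs.
        Console.WriteLine(WithOperator("x", "y") = WithConcat("x", "y"))
    End Sub
End Module
```

Either way, one new string is allocated for the whole expression (plus the temporary array), rather than one per & operator.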

GC.GetTotalMemory(False) will give you at least an estimate of the number of bytes to be cleaned up.  Unfortunately, that's only part of the story, because it cleans up chunks at a time, not bytes at a time.   A megabyte of 16-byte chunks will take a lot longer than a megabyte of 16K-byte chunks.

You could also add:
  GC.Collect()
  GC.WaitForPendingFinalizers()
and time that to determine how long it takes to clean up each method.  That's something.
But that, too, can be misleading, as GC behavior in the real program will be affected by other objects that need to be deallocated, the total memory picture, and many other factors.

[ This is one of the few times I'll recommend using GC.Collect().  It's only in a sample program used to test performance.  With few exceptions, in any real application, I recommend letting .NET manage the GC.  It's designed to do so.  Only if a developer has an extremely good idea of how memory allocation works and the GC works in .NET, should he/she consider calling GC.Collect directly.  Even then, I'd test to see if it's actually improving performance, or just making it worse. ]
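
As a rough sketch of the timing idea above, something like the following charges the clean-up to the code that caused it. It assumes a framework version that has System.Diagnostics.Stopwatch (the earlier test code used DateTime.Now instead), and BuildMany is a made-up workload, not anything from the original project.

```vb
Imports System
Imports System.Diagnostics
Imports System.Text

Module GcTimingDemo
    ' Hypothetical workload: builds and discards many short strings,
    ' returning the total number of characters produced.
    Function BuildMany(ByVal iterations As Integer) As Integer
        Dim total As Integer = 0
        For i As Integer = 1 To iterations
            Dim sb As New StringBuilder()
            sb.Append(":").Append(i).Append(":")
            total += sb.ToString().Length
        Next
        Return total
    End Function

    Sub Main()
        ' Start from a clean heap so earlier garbage is not charged to this run.
        GC.Collect()
        GC.WaitForPendingFinalizers()

        Dim watch As Stopwatch = Stopwatch.StartNew()
        Dim total As Integer = BuildMany(100000)
        Dim workMs As Long = watch.ElapsedMilliseconds

        ' Force the clean-up inside the timed region so its cost is visible.
        GC.Collect()
        GC.WaitForPendingFinalizers()
        watch.Stop()

        Console.WriteLine("characters built: " & total.ToString())
        Console.WriteLine("work only (ms):   " & workMs.ToString())
        Console.WriteLine("work + GC (ms):   " & watch.ElapsedMilliseconds.ToString())
    End Sub
End Module
```

The numbers are only indicative, for the reasons given above: in a real program the GC runs on its own schedule and is affected by everything else on the heap.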
 
LVL 12

Expert Comment

by:farsight
ID: 11976122
Getting back to the question itself, my rule of thumb is:
(1) If I'm only appending one or two times, I just use a string.  [  s &= "a"  ; s &= "b" ]
(2) If I'm appending multiple times (3+)  I use StringBuilder.
(3) Sometimes I opt for readability rather than performance.  Especially with something like:
    > stringbuilder.append("<p>" & aStringVariable & "</p><p>" & anotherStringVariable & "</p>")
(4) I don't get overly concerned about performance unless it's in a bottleneck that's identified by thorough testing and measurement.  If it's a bottleneck, I'll focus all my performance-enhancing efforts on that area.

I would recommend creating the StringBuilder objects to hold a reasonable capacity for your actual use.  (Why not?)
For example:
   Dim sb As StringBuilder = New StringBuilder(4096)
You could use the 80% rule to come up with a number ... I'm guessing that 80% of your web pages are less than 4096 characters.  Then only 20% of your pages will require a second allocation, and probably only one.  This estimation should be done on a case by case basis.  For example, if you're just building up a single row of an HTML table, 128 characters may be fine.

As a complete judgement call, I would not have the program spend a lot of time computing an estimated capacity.  That computation may well take longer than an extra allocation or two.
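
To see what a preset capacity buys, a minimal sketch like this one reports the Capacity after a run of appends. FinalCapacity is a hypothetical helper and the sizes are arbitrary; how the capacity grows from a small starting value depends on the framework version, so treat the first number as something to observe rather than predict.

```vb
Imports System
Imports System.Text

Module CapacityDemo
    ' Hypothetical helper: appends count copies of text to a builder created
    ' with the given initial capacity and reports the final Capacity.
    Function FinalCapacity(ByVal initialCapacity As Integer, ByVal text As String, ByVal count As Integer) As Integer
        Dim sb As New StringBuilder(initialCapacity)
        For i As Integer = 1 To count
            sb.Append(text)
        Next
        Return sb.Capacity
    End Function

    Sub Main()
        ' 100 appends of 10 characters = 1000 characters in total.
        ' Starting at 16, the builder has to grow several times along the way.
        Console.WriteLine(FinalCapacity(16, "0123456789", 100))
        ' Starting at 4096, the text fits and no growth is needed.
        Console.WriteLine(FinalCapacity(4096, "0123456789", 100))
    End Sub
End Module
```

With the 4096 starting size the capacity stays at 4096, which is the "one allocation for 80% of pages" case described above.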

> This is a waste if you do it a lot, and the StringBuilder capacity can get large very quickly. But if you set it overly large you eat too much memory, and if it's not big enough you are back to re-allocating. Is it best to try to estimate the size of the resulting string, or to go way over the top?

LOL  This reminds me of the recent cell-phone-service TV advertisements that force the kids to choose how much time they'll be playing, with potential overcharges.  They don't think it's very fair having to guess.  (I guess the ad's not working that well, as I have no clue what cell-phone-service company is advertising!)
 
LVL 4

Author Comment

by:daveky
ID: 11990760
I'm still here thinking! (I thought I had better mention it just in case anybody thought I had taken the advice and abandoned the question. ;D)

farsight: I find it interesting that these methods do different things in the IL too. As for your second rule of thumb I found this from Dr. Dotnetsky on eggheadcafe.com :

-- begin quote --

Myth:
StringBuilder.Append() is always faster than String.Concat().

Rationale:
String operations always create new string objects, whereas StringBuilder uses internal magic to avoid expensive allocations.

Truth:
Using a StringBuilder is sometimes (even usually) faster than using simple string concatenation, but there is a performance cutover point. Various people have found this to be around the 5 - 10 concatenations mark. The bottom line being that replacing this:

string fullName = firstName + " " + lastName;

with this:

StringBuilder builder = new StringBuilder();
builder.Append(firstName);
builder.Append(" ");
builder.Append(lastName);
string fullName = builder.ToString();

is rather pointless. Remember that StringBuilder is itself reference type, and has the associated allocation / deallocation costs.

-- end quote --

Note: "5 - 10 concatenations ".

TheLearnedOne: Unfortunately I am not in a position to write directly to the stream with Response.Write (HtmlTextWriter.Write ?) as I am just asked to provide a usable HTML string. I have no references to the request/response objects, neither can I create a control to be inserted into the control tree for "normal" rendering. (Also I guess your 'excerpt' pretty much agrees with Dr. Dotnetsky's view of things.)

---------------------------------

I thought this question would be quite straightforward. I do a lot of this string stuff and I was hoping to get an extra speed boost somewhere, but it looks like the effort involved may not be worth it. Still, I am going to look at Compuware's profiler and Microsoft's too in the hope of shedding some light. I'll keep you all posted before handing out any points....
 
LVL 12

Assisted Solution

by:farsight
farsight earned 75 total points
ID: 11996467
> Note: "5 - 10 concatenations ".

I looked up that site.  I wrote Dr. Dotnetsky to see if he's got any actual data measurements for that, or if he's just restating others' opinions.  (Not that I doubt it ... It would just be nice to have real measurements to support or deny the numbers.  I don't have any measurements supporting my rule-of-thumb.)

daveky:
  A technique I've sometimes used is this:
(1) Create _ONE_ StringBuilder object.
(2) Pass that object into each routine that will be generating some text (or HTML or code or ... ).
(3) Just .Append to that object wherever code needs to emit text.
This reduces the stringbuilder overhead to allocation/deallocation of a single object.  You might be doing 100's or 1000's of appends to this object, well over the threshold.

Alternatives:
(A) Instead of passing the object around, you could make a globally accessible singleton object containing the StringBuilder.
(B) Instead of using a StringBuilder, you could use an in-memory stream, then use stream.Write(...).  At the top level, convert it to a String to return the HTML text, just as you would with a StringBuilder.
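
The one-builder technique in steps (1) to (3) might look roughly like this. The routine names and markup are invented for illustration, and the sketch does no HTML encoding of the text it is given.

```vb
Imports System
Imports System.Text

Module SharedBuilderDemo
    ' Hypothetical rendering routines; each appends to the caller's builder.
    ' ByVal is enough here: the reference is copied, the builder itself is shared.
    Sub AppendParagraph(ByVal sb As StringBuilder, ByVal text As String)
        sb.Append("<p>").Append(text).Append("</p>")
    End Sub

    Sub AppendList(ByVal sb As StringBuilder, ByVal items As String())
        sb.Append("<ul>")
        For Each item As String In items
            sb.Append("<li>").Append(item).Append("</li>")
        Next
        sb.Append("</ul>")
    End Sub

    ' The top level creates the single builder and converts it once at the end.
    Function RenderPage() As String
        Dim sb As New StringBuilder(4096)
        AppendParagraph(sb, "Hello")
        AppendList(sb, New String() {"one", "two"})
        Return sb.ToString()
    End Function

    Sub Main()
        ' Prints: <p>Hello</p><ul><li>one</li><li>two</li></ul>
        Console.WriteLine(RenderPage())
    End Sub
End Module
```

Only one StringBuilder and one final String are allocated for the page, however many routines contribute to it.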

> profilers
Be careful with your interpretation of the numbers the profilers provide, and make sure you include the overhead of cleanup in any measurements you do.  (I haven't used them.  I wonder if they separately report GC time, or if they have some way of allocating the GC time to the various parts of the program.)
 
LVL 4

Author Comment

by:daveky
ID: 11996951
Yes, it would be interesting to see how he got to the figures. I "trust" his judgment though as I have read many an article from that site. But you're right, facts are better.

As it happens I do pass around StringBuilder objects. Perhaps I should do it more. I just got bitten with a lost day when I realised I had passed them ByRef, without any need to, so I could reuse them. It took me a while to redo all those parameters. (And of course there were one or two that *needed* to be ByRef - grrrr)

Streams? hmmm... (Don't do it to me; I am already struggling to keep the worms in their can!)

I realise the act of measuring changes the measurement. The Compuware profiler does timing of functions and their children. I guess so you can see whether code changes have made speed improvements or not. The MS one does GC for all levels. (At least this is what they promise.) I guess between the two of them I should be OK. I just hope I get time to try them out and they don't kill my machine in the process!
 
LVL 4

Author Comment

by:daveky
ID: 12246745
Well no, TheLearnedOne, not abandoned just forgotten. I am very busy at the moment! :D

I have decided to split the points; mostly to farsight for all his effort, because he specifically tried to answer my questions, for the tip "Create _ONE_ StringBuilder object" and his reasoning on what size to initialise it to; also some to TheLearnedOne for pointing out that "Response.Write is fastest" (I have been able to argue with other people for changes so I get a reference to HtmlTextWriter for performance reasons and I have abandoned many a StringBuilder with good speed increases, result!).

I think 'my' final conclusions are:
1) Use a single StringBuilder and pass a reference around.
2) Try to create it with a reasonable size but don't try to calculate this size on the fly.
3) Use as few appends as possible (rather than more) and don't use appendformat.
4) Pass the buck to a stream where possible to avoid creating them at all.
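
Conclusion 4 could be sketched along these lines: the rendering code writes to a TextWriter, so the caller decides whether that is a real output stream or an in-memory writer. WriteParagraph and RenderToString are hypothetical names; StringWriter stands in for whatever writer the page hands over (HtmlTextWriter also derives from TextWriter).

```vb
Imports System
Imports System.IO

Module WriterDemo
    ' Hypothetical routine: writes markup straight to the supplied writer,
    ' so no intermediate string is built by this code.
    Sub WriteParagraph(ByVal writer As TextWriter, ByVal text As String)
        writer.Write("<p>")
        writer.Write(text)
        writer.Write("</p>")
    End Sub

    ' When only a string is wanted, an in-memory StringWriter can be passed instead.
    Function RenderToString() As String
        Dim writer As New StringWriter()
        WriteParagraph(writer, "Hello")
        WriteParagraph(writer, "World")
        Return writer.ToString()
    End Function

    Sub Main()
        ' Prints: <p>Hello</p><p>World</p>
        Console.WriteLine(RenderToString())
    End Sub
End Module
```

The same WriteParagraph serves both cases, which makes it easy to switch to the direct writer wherever a reference to one becomes available.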

Cheers!