Solved

How much memory does a String take in Java?

Posted on 2004-04-13
Last Modified: 2012-05-04
Hi,

I am about to cache a lot of data from a database into a static HashMap and have to estimate how much memory I would chew up if I store it like this.

I know that Java stores characters as Unicode, but is there any documentation as to which encoding Java itself uses internally (e.g. UTF-8, UTF-16, UTF-32)?

So if I store the string "Hello", under UTF-8 I think I chew up 5 bytes (worst case for non-ASCII: 20 bytes) and under UTF-16 it's 10 bytes (worst case for non-ASCII: 20 bytes).

Darren
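
As a quick check of the byte counts in the question, here is a minimal sketch that encodes strings with the standard charsets. The class name `EncodedSizes` and the helper `encodedLength` are made up for illustration. These are sizes of encoded byte arrays, not what a `String` object occupies on the heap.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

final class EncodedSizes {
    // Number of bytes the text occupies when encoded with the given charset.
    static int encodedLength(String text, Charset charset) {
        return text.getBytes(charset).length;
    }

    public static void main(String[] args) {
        String ascii = "Hello";
        // One supplementary character (U+1F600), held as a surrogate pair in a Java String.
        String supplementary = new String(Character.toChars(0x1F600));

        System.out.println(encodedLength(ascii, StandardCharsets.UTF_8));            // 5
        System.out.println(encodedLength(ascii, StandardCharsets.UTF_16BE));         // 10
        System.out.println(encodedLength(supplementary, StandardCharsets.UTF_8));    // 4
        System.out.println(encodedLength(supplementary, StandardCharsets.UTF_16BE)); // 4
    }
}
```

`UTF_16BE` is used rather than `UTF_16` because the latter prepends a 2-byte byte-order mark when encoding, which would skew the comparison.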
Question by:darrenc
16 Comments
 
LVL 30

Expert Comment

by:Mayank S
ID: 10811756
I don't think it is possible to find out the exact in-memory size of any data type. You can find out the serialized size using a ByteArrayOutputStream. Calling gc() and similar methods and then checking free memory probably would not help either.
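
The serialized-size idea can be sketched as follows. The class `SerializedSize` and its method `of` are hypothetical names. Keep in mind that the result is the size of the serialization stream (headers included), which is only loosely related to the heap footprint of the object.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.io.UncheckedIOException;

final class SerializedSize {
    // Serializes the value into an in-memory buffer and returns the byte count.
    static int of(Serializable value) {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(buffer)) {
            out.writeObject(value);
        } catch (IOException e) {
            // Not expected for an in-memory stream; rethrow unchecked to keep callers simple.
            throw new UncheckedIOException(e);
        }
        return buffer.size();
    }

    public static void main(String[] args) {
        System.out.println("Serialized size of \"Hello\": " + of("Hello") + " bytes");
    }
}
```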
 
LVL 24

Expert Comment

by:sciuriware
ID: 10811896
System.out.println("Memory free before: " + Runtime.getRuntime().freeMemory());

// ... create the Strings here, for instance 100000 copies of the String you want ...

System.out.println("Memory free after: " + Runtime.getRuntime().freeMemory());

And do not create the Strings like                 String a = "Literal";
because Java is smart enough to reuse the same object for identical literals, so they would all share one String.


;JOOP!
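
A small illustration of the point about literals, assuming nothing beyond standard Java behaviour. The class `LiteralSharing` and the helper `sameObject` are illustrative names only.

```java
final class LiteralSharing {
    // True when both references point to the very same object.
    static boolean sameObject(String a, String b) {
        return a == b;
    }

    public static void main(String[] args) {
        String first = "Literal";
        String second = "Literal";
        String copy = new String("Literal"); // forces a separate object

        System.out.println(sameObject(first, second));        // true
        System.out.println(sameObject(first, copy));          // false
        System.out.println(sameObject(first, copy.intern())); // true
    }
}
```

So a memory test built from repeated identical literals measures one String plus many references to it, not many Strings.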
 
LVL 24

Expert Comment

by:sciuriware
ID: 10811904
You can create all those strings by filling an array with the output of a counter, formatted to a fixed length.

;JOOP!
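
One possible way to put the counter idea together with the free-memory readings is sketched below. The names `DistinctStringsProbe`, `makeStrings` and `usedMemory` are invented for this example, and the figure it prints is only a rough indication because `System.gc()` is a hint and other allocations can happen in between.

```java
final class DistinctStringsProbe {
    // Builds n distinct strings of equal length from a zero-padded counter.
    static String[] makeStrings(int n) {
        String[] strings = new String[n];
        for (int i = 0; i < n; i++) {
            strings[i] = String.format("%08d", i);
        }
        return strings;
    }

    // Heap currently in use, as reported by the runtime (an approximation).
    static long usedMemory() {
        Runtime runtime = Runtime.getRuntime();
        return runtime.totalMemory() - runtime.freeMemory();
    }

    public static void main(String[] args) {
        int count = 100_000;
        System.gc(); // only a hint to the JVM
        long before = usedMemory();
        String[] strings = makeStrings(count);
        System.gc();
        long after = usedMemory();
        System.out.println("Approx. bytes per string (array slot included): "
                + (after - before) / (double) count);
        // Using the array afterwards keeps it reachable during the measurement.
        System.out.println("Last string: " + strings[count - 1]);
    }
}
```

Running it several times and comparing the outputs gives a feel for how stable the estimate is on a given JVM.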
 
LVL 30

Expert Comment

by:Mayank S
ID: 10811917
I still think that free-memory is not the correct way to do it.

What if you have more threads, etc?

What if some garbage had not yet been collected when you took the first reading, and the collection ran while you were instantiating the new Strings? Then the free-memory figure you get is not the correct one. And remember: calls to gc(), runFinalization(), etc. don't guarantee that garbage collection will be performed when they are called. The JVM will do it when it sees fit.
 
LVL 13

Accepted Solution

by:
Webstorm earned 150 total points
ID: 10811951
Hi darrenc,

Java uses UTF-16 to store characters: it does not decode the 0xD800-0xDFFF code units (surrogates), which are used in pairs to encode the Unicode characters that do not fit in 16 bits (a surrogate pair carries 20 bits).

You can only estimate the minimum size occupied by a string variable: 2 * length + 4 bytes (the 4 being for the string length).

You can test how much memory is used when you have a String array :

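    static String[] array = {
        "fixed length string",
        "fixed length string",
    };

    // Must be static, otherwise the JVM cannot use it as the entry point.
    public static void main(String[] args)
    {
        try {
            Runtime r = Runtime.getRuntime();
            System.gc();        // a request only; the JVM may ignore it
            Thread.sleep(4000); // give the collector some time to run
            System.out.println("Memory used (max) : " + (r.totalMemory() - r.freeMemory()) + " bytes");
        } catch (InterruptedException ex) {
            Thread.currentThread().interrupt(); // restore the interrupt flag instead of swallowing it
        }
    }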

Run this application several times to evaluate the minimum memory used, then change the array size by duplicating the string:

    static String[] array = {
      "fixed length string",
      "fixed length string",
      "fixed length string",
      "fixed length string",
   };

Compile and run the modified application to see the difference.
 
LVL 24

Expert Comment

by:sciuriware
ID: 10811955
If you do it all at once, other threads will not interfere;
if you are far below the heap limit, garbage collection will not interfere.
Read the technical topics on Sun's site.

By the way: try it several times; the outcomes will convince you.

;JOOP!
 
LVL 24

Expert Comment

by:sciuriware
ID: 10812000
Webstorm, you are using identical literals ....

;JOOP!
 
LVL 13

Expert Comment

by:Webstorm
ID: 10813130
>> Webstorm, you are using identical literals ....

Yes, in order to evaluate the memory occupied by strings of the same length.
But I forgot the Java constant pool, so the literals have to be distinct:

    static String[] array = {
      "fixed length string 000",
      "fixed length string 001",
      "fixed length string 002",
      "fixed length string 003",
   };
 
LVL 30

Expert Comment

by:Mayank S
ID: 10813555
That is a very common way of doing it but again:

>> System.gc();

- does not guarantee that garbage-collection will be performed....
 
LVL 4

Expert Comment

by:john-at-7fff
ID: 10819700
Darren,

Just out of curiosity, is the textual data in your database plain ASCII? If the encoding in the database is ASCII, you might be able to get away with keeping your data in Java byte[] or char[] arrays, and thereby not have to deal with UTF issues at all. This could really be the way to go for caching -- if you don't need any of the methods on String you might not need it.
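
If the data really were plain ASCII, the byte[] idea could look roughly like this. `AsciiCache`, `pack` and `unpack` are hypothetical names, and the approach is only safe under the assumption that every character is ASCII: `getBytes` with `US_ASCII` replaces anything else with `?`.

```java
import java.nio.charset.StandardCharsets;

final class AsciiCache {
    // Encodes text as one byte per character; only safe if the text is pure ASCII.
    static byte[] pack(String text) {
        return text.getBytes(StandardCharsets.US_ASCII);
    }

    // Rebuilds a String on demand from the packed bytes.
    static String unpack(byte[] packed) {
        return new String(packed, StandardCharsets.US_ASCII);
    }

    public static void main(String[] args) {
        byte[] packed = pack("Hello");
        System.out.println(packed.length);  // 5
        System.out.println(unpack(packed)); // Hello
    }
}
```

The trade-off is that each lookup that needs a `String` pays for a decode and a fresh allocation.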
 

Author Comment

by:darrenc
ID: 10819832
Hi everyone,

Thanks for your comments.  My original question was just to determine the encoding that Java uses internally to store Strings, and Webstorm has indicated that it's UTF-16.  I didn't really understand the comment "it doesn't decode the 0xD800-0xDFFF characters which are used to encode 20 bits Unicode characters".  My understanding is that 0xD800-0xDFFF characters are ill-formed (not valid).  I still can't find any doco that supports Webstorm, but I suppose if I run the tests that everyone has suggested I could get a feeling for it.  (I agree that running a test in the JVM is not entirely accurate.)

john-at-7fff, the data in the database that I'm going to cache will be Unicode data with some non-ASCII characters.

So if UTF-16 is the encoding, the minimum space used by the cache would be 2 * length + 4 (for the string length)?
And the maximum size (the one I have to estimate) is
4 * length + 4 (for the string length)?
.. where maximum size indicates that all characters need surrogate pairs (4 bytes)
And the average might be
2.5 * length + 4 (for the string length) (assuming 25% need surrogate pairs)?

Does anyone have a link to doco that confirms that UTF-16 is the internal Java encoding?
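
A short sketch that may help with these estimates: `String.length()` counts UTF-16 code units, not Unicode characters, so a character that needs a surrogate pair already contributes 2 to the length. The 4 * length bound therefore applies only when "length" means the number of characters (code points). The class and method names below are invented for illustration, and the byte counts cover the character data only, not per-object overhead, which is JVM-dependent (newer JVMs may also store Latin-1-only strings more compactly).

```java
final class Utf16Units {
    // Bytes of UTF-16 character data: two per code unit, and length() counts code units.
    static long charDataBytes(String s) {
        return 2L * s.length();
    }

    // Number of Unicode characters (code points), counting a surrogate pair once.
    static int codePoints(String s) {
        return s.codePointCount(0, s.length());
    }

    public static void main(String[] args) {
        // "a" followed by U+1F600, which needs a surrogate pair.
        String mixed = "a" + new String(Character.toChars(0x1F600));
        System.out.println(mixed.length());       // 3 code units
        System.out.println(codePoints(mixed));    // 2 code points
        System.out.println(charDataBytes(mixed)); // 6 bytes of character data
    }
}
```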
 
LVL 4

Assisted Solution

by:john-at-7fff
john-at-7fff earned 50 total points
ID: 10819860
Internal encoding of a Java String: http://www.i18nfaq.com/java.html#4
 
LVL 4

Expert Comment

by:john-at-7fff
ID: 10819866
You might also find this interesting (though not an answer to your question):
http://www.joelonsoftware.com/articles/Unicode.html
 
LVL 13

Expert Comment

by:Webstorm
ID: 10821513
darrenc
>> And the maximum size (the one I have to estimate) is 4 * length + 4 (for the string length)?
No, I only meant that a String object can also store some other information (optimization, native data structure, ...) that may depend on the JVM used. This is why you need to test the memory space occupied by String objects on your JVM.


mayankeagle
>>>> System.gc();
>>- does not guarantee that garbage-collection will be performed....
Yes, but the Thread.sleep() following the call leaves time for garbage collection to run. It may sometimes work and sometimes not, but successive runs can give a pretty good evaluation.

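      Runtime r = Runtime.getRuntime();
      long size = -1L;
      int n = 10; // 10 iterations
      while (n-- > 0)
      {
           System.gc();        // a hint only
           Thread.sleep(4000); // 4 seconds; the enclosing method must handle InterruptedException
           long t = r.totalMemory() - r.freeMemory();
           if ((size < 0L) || (t < size)) size = t; // keep the smallest reading
      }
      System.out.println("Memory used (evaluation) : " + size + " bytes");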

 

Author Comment

by:darrenc
ID: 10821592
Hi again,

So the answer to my question is ... it depends on the JVM.  Strings are stored as UTF-16, but a String object can hold other information as well.  So I have to run a test to figure it out.  Oh well, I thought I could get away with a simple metric.
Thanks to everyone for their help.

Darren
 
LVL 13

Expert Comment

by:Webstorm
ID: 10821979
:-)