How much memory does a String take in Java?

darrenc asked (Last Modified: 2012-05-04):
Hi,

I am about to cache a lot of data from a database into a static HashMap and have to estimate how much memory I would chew up if I store it like this.

I know that Java stores characters as Unicode but is there any documentation as to which encoding is used by Java itself?  (eg. UTF-8, UTF-16, UTF-32)

So if I store the string "Hello", under UTF-8 I think I chew up 5 bytes (worst case for 5 non-ASCII characters: 20 bytes) and under UTF-16 it's 10 bytes (worst case for 5 non-ASCII characters: 20 bytes).

Darren

Mayank S, Principal Technologist
CERTIFIED EXPERT

Commented:
I don't think it is possible to find out the exact in-memory size of any data type. You can find the serialized size using a ByteArrayOutputStream. Calling gc() and similar methods and then checking the free memory probably would not help either.
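
For reference, the serialized-size idea mentioned above could look roughly like the sketch below. The class and method names are made up for illustration. Note that the serialized size includes stream headers and uses a different encoding from the in-memory one, so it is only a loose indicator and not the heap footprint.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.io.UncheckedIOException;

public class SerializedSizeSketch {
    // Hypothetical helper: number of bytes produced by Java serialization.
    // This is NOT the amount of heap the object occupies.
    static int serializedSize(Serializable value) {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(buffer)) {
            out.writeObject(value);
        } catch (IOException e) {
            // Not expected for an in-memory stream; rethrown unchecked for brevity.
            throw new UncheckedIOException(e);
        }
        return buffer.size();
    }

    public static void main(String[] args) {
        System.out.println("Serialized size of \"Hello\": " + serializedSize("Hello") + " bytes");
    }
}
```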
Runtime rt = Runtime.getRuntime();
System.out.println("Memory free before: " + rt.freeMemory());

// ... create the Strings here, e.g. 100000 copies of the kind of String you want to measure ...

System.out.println("Memory free after: " + rt.freeMemory());

Also, do not create the Strings like                 String a = "Literal";
because Java is smart enough to reuse the same String object for identical literals.
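
A small sketch of what that means in practice (the class and method names here are made up for illustration): identical literals refer to one shared String object, so a test built from them measures almost nothing.

```java
public class LiteralInterningSketch {
    // True only when both references point at the very same object.
    static boolean sameObject(Object a, Object b) {
        return a == b;
    }

    public static void main(String[] args) {
        String a = "Literal";
        String b = "Literal";             // same literal: reuses the pooled object
        String c = new String("Literal"); // explicitly allocates a new String object

        System.out.println(sameObject(a, b)); // true
        System.out.println(sameObject(a, c)); // false
        System.out.println(a.equals(c));      // true: same contents, different objects
    }
}
```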


;JOOP!
You can create all those strings by filling an array with output from a counter, converted to a fixed length.

;JOOP!
Mayank S, Principal Technologist
CERTIFIED EXPERT

Commented:
I still think that free-memory is not the correct way to do it.

What if you have more threads, etc?

What if a garbage collection that had not run before you created any Strings happens to run while you are instantiating the new ones? Then the amount of free memory you read is not the correct one. And remember: calls to gc(), runFinalization(), etc. don't guarantee that garbage collection will be performed when they are called. The JVM will do it when it feels like it.
Top Expert 2004
Commented:
If you do it all at once, other threads will not interfere, and
if you are far below the heap limit, garbage collection will not interfere either.
Read the technical articles on Sun's site.

By the way: try it several times: the outcome(s) will convince you.

;JOOP!
Webstorm, you are using identical literals ....

;JOOP!
Top Expert 2004

Commented:
>> Webstorm, you are using identical literals ....

Yes, in order to evaluate the memory occupied by strings of the same length.
But I forgot about the Java constant pool:

    static String[] array = {
        "fixed length string 000",
        "fixed length string 001",
        "fixed length string 002",
        "fixed length string 003",
    };
Mayank S, Principal Technologist
CERTIFIED EXPERT

Commented:
That is a very common way of doing it but again:

>> System.gc();

- does not guarantee that garbage-collection will be performed....
Darren,

Just out of curiosity, is the textual data in your database plain ASCII? If the encoding in the database is ASCII, you might be able to get away with keeping your data in Java byte[] or char[] arrays, and thereby not have to deal with UTF issues at all. This could really be the way to go for caching -- if you don't need any of the methods on String you might not need it.
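
A minimal sketch of that byte[] idea, assuming the cached text really is plain ASCII. The class and method names are made up for illustration. Each character then costs one byte of array data, plus whatever per-array overhead your JVM adds.

```java
import java.nio.charset.StandardCharsets;

public class AsciiBytesSketch {
    // True when every char is in the 7-bit ASCII range.
    static boolean isPlainAscii(String s) {
        for (int i = 0; i < s.length(); i++) {
            if (s.charAt(i) > 127) {
                return false;
            }
        }
        return true;
    }

    // One byte per character. Non-ASCII characters would be replaced by '?',
    // so check isPlainAscii first.
    static byte[] pack(String s) {
        return s.getBytes(StandardCharsets.US_ASCII);
    }

    // Rebuild a String only when one is actually needed.
    static String unpack(byte[] bytes) {
        return new String(bytes, StandardCharsets.US_ASCII);
    }

    public static void main(String[] args) {
        byte[] cached = pack("Hello");
        System.out.println(cached.length);  // 5
        System.out.println(unpack(cached)); // Hello
    }
}
```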

Author

Commented:
Hi everyone,

Thanks for your comments.  My original question was just to determine the encoding that Java uses internally to store Strings, and Webstorm has indicated that it's UTF-16.  I didn't really understand the comment "it doesn't decode the 0xD800-0xDFFF characters which are used to encode 20 bits Unicode characters".  My understanding is that 0xD800-0xDFFF characters are ill-formed (not valid).  I still can't find any doco that supports Webstorm, but I suppose if I run the tests that everyone has suggested I could get a feeling for it.  (I agree that running a test in the JVM is not entirely accurate.)

john-at-7fff, the data in the database that I'm going to cache will be Unicode data with some non-ASCII characters.

So if UTF-16 is the encoding, the minimum space used by the cache would be 2 * length + 4 (for the string length)?
And the maximum size (the one I have to estimate) is 4 * length + 4 (for the string length), where maximum size means that all characters need surrogate pairs (4 bytes)?
And the average might be 2.5 * length + 4 (for the string length), assuming 25% need surrogate pairs?

Does anyone have a link to doco that confirms that UTF-16 is the internal Java encoding?
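
For what it's worth, the Javadoc for java.lang.String (Java 5 and later) says that a String represents a string in the UTF-16 format, with supplementary characters represented by surrogate pairs. The sketch below illustrates the arithmetic above; the class and method names are made up for illustration. One point to watch: String.length() counts UTF-16 code units, not characters, so a character that needs a surrogate pair already counts as 2 in length(), and the character data is always 2 * length() bytes. The "4 * length" figure only applies if "length" means the number of Unicode characters (code points).

```java
import java.nio.charset.StandardCharsets;

public class Utf16SizeSketch {
    // Bytes of character data when the text is encoded as UTF-16 (no byte-order mark).
    static int utf16Bytes(String s) {
        return s.getBytes(StandardCharsets.UTF_16BE).length;
    }

    // Number of Unicode characters (code points), as opposed to UTF-16 code units.
    static int codePoints(String s) {
        return s.codePointCount(0, s.length());
    }

    public static void main(String[] args) {
        String ascii = "Hello";
        // U+1D11E lies outside the Basic Multilingual Plane, so it needs a surrogate pair.
        String supplementary = new String(Character.toChars(0x1D11E));

        System.out.println(ascii.length() + " units, " + utf16Bytes(ascii) + " bytes");
        // 5 units, 10 bytes
        System.out.println(codePoints(supplementary) + " character, "
                + supplementary.length() + " units, "
                + utf16Bytes(supplementary) + " bytes");
        // 1 character, 2 units, 4 bytes
    }
}
```

This only counts the character data. The per-object overhead (the "+ 4" above is the poster's guess) depends on the JVM, and on JDK 9 and later the default compact-strings feature can store Latin-1-only text in one byte per char, so for such text the UTF-16 figure is an upper bound.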
You might also find this interesting (though not an answer to your question):
http://www.joelonsoftware.com/articles/Unicode.html
Top Expert 2004

Commented:
darrenc
>> And the maximum size (the one I have to estimate) is 4 * length + 4 (for the string length)?
No, I only mean that a String object can also store some other information (optimization data, native data structures, ...) that may depend on the JVM used. This is why you need to test the memory occupied by String objects on your own JVM.


mayankeagle
>>>> System.gc();
>>- does not guarantee that garbage-collection will be performed....
Yes, but the Thread.sleep() following the call leaves time for garbage collection to run. It may work sometimes and not at other times, but successive runs may give a pretty good estimate.

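      Runtime r = Runtime.getRuntime(); // assumed: r is the JVM's Runtime
      long size = -1L;
      int n = 10; // 10 iterations
      while (n-- > 0)
      {
           System.gc();         // only a hint; the JVM may ignore it
           Thread.sleep(4000);  // 4 seconds; throws InterruptedException
           long t = (r.totalMemory() - r.freeMemory()); // heap currently in use
           if ((size < 0L) || (t < size)) size = t;     // keep the smallest reading
      }
      System.out.println("Memory used (evaluation) : " + size + " bytes");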
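
Putting the suggestions in this thread together, a complete test might look roughly like the sketch below. It is only an estimate under the caveats already discussed (System.gc() is a hint, other threads may allocate), and the class name, method names, string count and pause length are all made up for illustration. The strings are generated from a counter so that no two share a pooled literal.

```java
import java.util.Locale;

public class StringMemorySketch {
    // Heap currently in use, as reported by the JVM.
    static long usedMemory() {
        Runtime r = Runtime.getRuntime();
        return r.totalMemory() - r.freeMemory();
    }

    // Ask for a collection several times and keep the smallest reading.
    static long settledUsedMemory(int rounds, long pauseMillis) throws InterruptedException {
        long min = Long.MAX_VALUE;
        for (int i = 0; i < rounds; i++) {
            System.gc();               // a request only; the JVM may ignore it
            Thread.sleep(pauseMillis); // give the collector a chance to run
            min = Math.min(min, usedMemory());
        }
        return min;
    }

    // Distinct fixed-length strings built at run time, so none come from the constant pool.
    static String[] makeStrings(int count) {
        String[] out = new String[count];
        for (int i = 0; i < count; i++) {
            out[i] = String.format(Locale.ROOT, "fixed length string %06d", i);
        }
        return out;
    }

    public static void main(String[] args) throws InterruptedException {
        int count = 100_000;
        long before = settledUsedMemory(5, 200);
        String[] strings = makeStrings(count);
        long after = settledUsedMemory(5, 200);

        // Includes the array slot for each string as well as the String object itself.
        System.out.println("Approx. bytes per String: " + (after - before) / (double) count);
        System.out.println("Strings kept alive: " + strings.length); // keeps the array reachable
    }
}
```

Running it several times and comparing the results, as suggested earlier, gives a feel for how stable the figure is on a given JVM.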

Author

Commented:
Hi again,

So the answer to my question is ... it depends on the JVM.  Strings are stored as UTF-16, but a String object can store other information as well.  So I have to run a test to figure it out.  Oh well, I thought I could get away with a simple metric.
Thanks to everyone for their help.

Darren
Top Expert 2004

Commented:
:-)