How much memory does a String take in Java?

Hi,

I am about to cache a lot of data from a database into a static HashMap and have to estimate how much memory I would chew up if I store it like this.

I know that Java stores characters as Unicode, but is there any documentation as to which encoding Java itself uses?  (e.g. UTF-8, UTF-16, UTF-32)

So if I store the string "Hello", under UTF-8 I think I chew up 5 bytes (worst case for non-ASCII 20 bytes) and under UTF-16 it's 10 bytes (worst case for non-ASCII 20 bytes).

Darren
 
Mayank S, Associate Director - Product Engineering, Commented:
I don't think it is possible to find out the exact in-memory size of any data type. You can find the serialized size using a ByteArrayOutputStream. Calling gc() and related methods and then checking free memory probably would not help either.
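For illustration, the serialized-size idea could be sketched as follows. This is a minimal sketch, not a measurement of heap usage: the class name SerializedSize and the method of are made up for this example, and the number it prints includes the serialization stream header and an encoding that differs from the in-memory one, so treat it as a rough indicator at best.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectOutputStream;
import java.io.UncheckedIOException;

public class SerializedSize {
    // Hypothetical helper: returns the number of bytes that standard Java
    // serialization produces for the given string. This is the serialized
    // form, not the in-memory footprint of the String object.
    static int of(String s) {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(buffer)) {
            out.writeObject(s);
        } catch (IOException e) {
            // Not expected when writing to an in-memory buffer.
            throw new UncheckedIOException(e);
        }
        return buffer.size();
    }

    public static void main(String[] args) {
        System.out.println("Serialized size of \"Hello\": " + of("Hello") + " bytes");
    }
}
```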
 
sciuriware Commented:
System.out.println("Memory free before: " + Runtime.getRuntime().freeMemory());
// ... create the Strings here, e.g. 100000 copies of the String you want ...

System.out.println("Memory free after: " + Runtime.getRuntime().freeMemory());

And do not create the Strings as literals, like                 String a = "Literal";
because Java is smart enough to reuse the same literal object again and again.


;JOOP!
 
sciuriware Commented:
You can create all those strings by filling an array with output from a counter, converted to a fixed length.

;JOOP!
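A rough sketch of that suggestion, assuming 100000 strings and a 10-digit zero-padded counter (both arbitrary choices), with the made-up names DistinctStringMemory, buildStrings and usedMemory. The result is only an approximation: System.gc() is a hint, and the figure also includes the array slot that references each string.

```java
public class DistinctStringMemory {
    // Builds n distinct fixed-length strings from a counter, e.g. "0000000042".
    // Formatting at run time avoids sharing one literal from the constant pool.
    static String[] buildStrings(int n) {
        String[] strings = new String[n];
        for (int i = 0; i < n; i++) {
            strings[i] = String.format("%010d", i);
        }
        return strings;
    }

    // Heap currently in use according to the Runtime; an approximation only.
    static long usedMemory() {
        Runtime r = Runtime.getRuntime();
        return r.totalMemory() - r.freeMemory();
    }

    public static void main(String[] args) {
        int n = 100_000; // assumed sample size
        System.gc();     // a hint only; the JVM may ignore it
        long before = usedMemory();
        String[] strings = buildStrings(n);
        System.gc();
        long after = usedMemory();
        System.out.println("Approx. bytes per string: " + (after - before) / (double) n);
        // Use the array after measuring so it stays reachable during the measurement.
        System.out.println("Last string: " + strings[n - 1]);
    }
}
```

Running it several times and comparing the results gives a better feel for the spread than any single run.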
 
Mayank S, Associate Director - Product Engineering, Commented:
I still think that measuring free memory is not the correct way to do it.

What if you have more threads running, etc.?

What if some garbage was still uncollected before you created any Strings, and the collection then ran while you were instantiating the new Strings? Then the free-memory figure you read is not the correct one. And remember: calls to gc(), runFinalization(), etc. don't guarantee that garbage collection will be performed when they are called. The JVM will do it when it chooses to.
 
Webstorm Commented:
Hi darrenc,

Java uses UTF-16 to store characters: it doesn't decode the 0xD800-0xDFFF characters which are used to encode 20 bits Unicode characters (these are the surrogate code units, used in pairs for characters above U+FFFF).

You can only estimate the minimum size occupied by a String: 2 * length + 4 (for the string length).

You can test how much memory is used when you have a String array:

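    // These members go inside a class of your choice.
    static String[] array = {
      "fixed length string",
      "fixed length string",
    };

    // main must be static to serve as the JVM entry point
    public static void main(String[] args)
    {
        try {
            Runtime r = Runtime.getRuntime();
            System.gc();        // only a hint; the JVM may ignore it
            Thread.sleep(4000); // give the collector time to run
            System.out.println("Memory used (max) : " + (r.totalMemory() - r.freeMemory()) + " bytes");
        } catch (InterruptedException ex) {
            Thread.currentThread().interrupt(); // restore the interrupt flag instead of swallowing it
        }
    }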

Run this application many times to evaluate the minimum memory used, then increase the array size by duplicating the string:

    static String[] array = {
      "fixed length string",
      "fixed length string",
      "fixed length string",
      "fixed length string",
   };

Compile and run this modified application to see the difference.
 
sciuriware Commented:
If you do it all at once, other threads will not interfere;
if you are far below the heap limit, garbage collection will not interfere.
Read the technical topics on Sun's site.

By the way: try it several times; the outcome(s) will convince you.

;JOOP!
 
sciuriware Commented:
Webstorm, you are using identical literals ....

;JOOP!
 
Webstorm Commented:
>> Webstorm, you are using identical literals ....

Yes, in order to evaluate the memory occupied by strings of the same length.
But I forgot the Java constant pool, which stores identical literals only once. Use distinct strings instead:

    static String[] array = {
      "fixed length string 000",
      "fixed length string 001",
      "fixed length string 002",
      "fixed length string 003",
   };
 
Mayank S, Associate Director - Product Engineering, Commented:
That is a very common way of doing it but again:

>> System.gc();

- does not guarantee that garbage-collection will be performed....
 
john-at-7fff Commented:
Darren,

Just out of curiosity, is the textual data in your database plain ASCII? If the encoding in the database is ASCII, you might be able to get away with keeping your data in Java byte[] or char[] arrays, and thereby not have to deal with UTF issues at all. This could really be the way to go for caching -- if you don't need any of the methods on String you might not need it.
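A minimal sketch of the byte[] idea, assuming the cached text really is pure ASCII. The class AsciiCache, its integer keys and its method names are hypothetical. One caveat that matters here: getBytes with US_ASCII replaces any non-ASCII character with '?', so this only suits data known to be ASCII.

```java
import java.nio.charset.StandardCharsets;
import java.util.HashMap;
import java.util.Map;

public class AsciiCache {
    // Hypothetical cache: values are kept as one byte per character.
    private final Map<Integer, byte[]> cache = new HashMap<>();

    // Stores the value as ASCII bytes; non-ASCII characters would become '?'.
    void put(int key, String value) {
        cache.put(key, value.getBytes(StandardCharsets.US_ASCII));
    }

    // Rebuilds a String on demand, or returns null if the key is absent.
    String get(int key) {
        byte[] bytes = cache.get(key);
        return bytes == null ? null : new String(bytes, StandardCharsets.US_ASCII);
    }

    // Number of bytes of character data stored for the key, or -1 if absent.
    int storedBytes(int key) {
        byte[] bytes = cache.get(key);
        return bytes == null ? -1 : bytes.length;
    }

    public static void main(String[] args) {
        AsciiCache c = new AsciiCache();
        c.put(1, "Hello");
        System.out.println(c.get(1) + " stored in " + c.storedBytes(1) + " bytes");
    }
}
```

The trade-off is that every read allocates a new String, so this saves memory at the cost of some CPU time per lookup.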
 
darrenc (Author) Commented:
Hi everyone,

Thanks for your comments.  My original question was just to determine the encoding that Java uses internally to store Strings, and Webstorm has indicated that it's UTF-16.  I didn't really understand the comment "it doesn't decode the 0xD800-0xDFFF characters which are used to encode 20 bits Unicode characters".  My understanding is that 0xD800-0xDFFF characters are ill-formed (not valid).  I still can't find any doco that supports Webstorm, but I suppose if I run the tests that everyone has suggested I could get a feeling for it.  (I agree that running a test in the JVM is not entirely accurate.)

john-at-7fff, the data in the database that I'm going to cache will be Unicode data with some non-ASCII characters.

So if UTF-16 is the encoding, the minimum space used by the cache would be 2 * length + 4 (for the string length)?
And the maximum size (the one I have to estimate) is
4 * length + 4 (for the string length)?
.. where maximum size indicates that all characters need surrogate pairs (4 bytes)
And the average might be
2.5 * length + 4 (for the string length) (assuming 25% need surrogate pairs)?

Does anyone have a link to doco that confirms that UTF-16 is the internal Java encoding?
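One point that may help with the estimate: String.length() counts UTF-16 code units, not Unicode characters, so a character that needs a surrogate pair already counts as 2 in the length. Under that reading, the character data is 2 * length() bytes when stored as UTF-16, and there is no separate 4 * length case. The sketch below illustrates this with made-up names (Utf16Units, utf16Bytes); it says nothing about per-object overhead, and newer JVMs (Java 9 and later) may store Latin-1-only strings more compactly.

```java
import java.nio.charset.StandardCharsets;

public class Utf16Units {
    // Bytes of UTF-16 character data for s, without a byte-order mark.
    static int utf16Bytes(String s) {
        return s.getBytes(StandardCharsets.UTF_16BE).length;
    }

    public static void main(String[] args) {
        String ascii = "Hello";
        // U+1D11E (musical G clef) lies above U+FFFF, so it needs a surrogate pair.
        String clef = new String(Character.toChars(0x1D11E));

        System.out.println("Hello: length=" + ascii.length()
                + ", UTF-16 bytes=" + utf16Bytes(ascii));
        System.out.println("U+1D11E: length=" + clef.length()
                + ", code points=" + clef.codePointCount(0, clef.length())
                + ", UTF-16 bytes=" + utf16Bytes(clef));
    }
}
```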
 
john-at-7fff Commented:
Internal encoding of a Java String: http://www.i18nfaq.com/java.html#4
 
john-at-7fff Commented:
You might also find this interesting (though not an answer to your question):
http://www.joelonsoftware.com/articles/Unicode.html
 
Webstorm Commented:
darrenc
>> And the maximum size (the one I have to estimate) is 4 * length + 4 (for the string length)?
No, I only meant that a String object can also store other information (optimization data, native data structures, ...) that may depend on the JVM used. This is why you need to test the memory occupied by String objects on your own JVM.


mayankeagle
>>>> System.gc();
>>- does not guarantee that garbage-collection will be performed....
Yes, but the Thread.sleep() following the call leaves time for garbage collection to run. It may sometimes work and sometimes not, but successive runs should give a pretty good estimate.

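      // Assumes the enclosing method declares "throws InterruptedException"
      Runtime r = Runtime.getRuntime();
      long size = -1L;   // smallest "used memory" reading seen so far
      int n = 10;        // 10 iterations
      while (n-- > 0)
      {
           System.gc();          // only a hint to the JVM
           Thread.sleep(4000);   // 4 seconds
           long t = r.totalMemory() - r.freeMemory();
           if ( (size < 0L) || (t < size) ) size = t;
      }
      System.out.println("Memory used (evaluation) : " + size + " bytes");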

 
darrenc (Author) Commented:
Hi again,

So the answer to my question is ... it depends on the JVM.  Strings are stored as UTF-16, but a String object can store other information as well.  So I have to run a test to figure it out.  Oh well, I thought I could get away with a simple metric.
Thanks to everyone for their help.

Darren
 
Webstorm Commented:
:-)