Solved

How much memory does a String take in Java?

Posted on 2004-04-13
Last Modified: 2012-05-04
Hi,

I am about to cache a lot of data from a database into a static HashMap and have to estimate how much memory I would chew up if I store it like this.

I know that Java stores characters as Unicode but is there any documentation as to which encoding is used by Java itself?  (eg. UTF-8, UTF-16, UTF-32)

So if I store the string "Hello", I think I chew up 5 bytes under UTF-8 (worst case for non-ASCII: 20 bytes) and 10 bytes under UTF-16 (worst case for non-ASCII: 20 bytes).

Darren
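
The byte counts in the question can be checked by encoding the text explicitly. This is only a sketch of the encoded sizes, not of what a String occupies on the heap; the class and method names are made up for illustration, and StandardCharsets assumes Java 7 or later.

```java
import java.nio.charset.StandardCharsets;

class EncodedSizeDemo {
    // Number of bytes the text occupies once encoded as UTF-8.
    static int utf8Bytes(String s) {
        return s.getBytes(StandardCharsets.UTF_8).length;
    }

    // Number of bytes the text occupies as UTF-16 without a byte-order mark.
    static int utf16Bytes(String s) {
        return s.getBytes(StandardCharsets.UTF_16BE).length;
    }

    public static void main(String[] args) {
        String ascii = "Hello";
        // Five copies of U+1F600, a character outside the 16-bit range,
        // written here as five surrogate pairs.
        String supplementary = "\uD83D\uDE00\uD83D\uDE00\uD83D\uDE00\uD83D\uDE00\uD83D\uDE00";

        System.out.println("ASCII, UTF-8 / UTF-16 bytes: "
                + utf8Bytes(ascii) + " / " + utf16Bytes(ascii));
        System.out.println("Supplementary, UTF-8 / UTF-16 bytes: "
                + utf8Bytes(supplementary) + " / " + utf16Bytes(supplementary));
    }
}
```

For five characters that all lie outside the 16-bit range, both encodings need 4 bytes per character, which is where the 20-byte worst case comes from.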
Question by:darrenc
16 Comments
 
LVL 30

Expert Comment

by:mayankeagle
I don't think it is possible to find out the exact in-memory size of any data type. You can find the serialized size by writing the object to a ByteArrayOutputStream. Calling gc() and similar methods and then reading the free memory probably would not help either.
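
A minimal sketch of the serialized-size idea mentioned above, assuming standard Java serialization through an ObjectOutputStream wrapped around the ByteArrayOutputStream. The class and method names are hypothetical, and the result is the size of the serialized form, which is not the same as the heap footprint.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.io.UncheckedIOException;

class SerializedSizeDemo {
    // Serializes the value into an in-memory buffer and returns the byte count.
    static int serializedSize(Serializable value) {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(buffer)) {
            out.writeObject(value);
        } catch (IOException e) {
            // Not expected for an in-memory buffer; rethrown unchecked for brevity.
            throw new UncheckedIOException(e);
        }
        return buffer.size();
    }

    public static void main(String[] args) {
        System.out.println("Serialized size of \"Hello\": "
                + serializedSize("Hello") + " bytes");
    }
}
```

The figure includes the serialization stream header and uses a UTF-8-like encoding for the characters, so it is best read as a rough lower bound for storage outside the JVM rather than as a memory measurement.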
 
LVL 24

Expert Comment

by:sciuriware
    System.out.println("Memory free before: " + Runtime.getRuntime().freeMemory());
    // ... create the Strings here, e.g. 100000 times the String you want to measure ...
    System.out.println("Memory free after: " + Runtime.getRuntime().freeMemory());

And do not create the Strings as literals, like String a = "Literal";, because Java is smart enough to reuse the same literal object again and again.


;JOOP!
 
LVL 24

Expert Comment

by:sciuriware
You can create all those strings by filling an array with output from a counter, converted to a fixed length.

;JOOP!
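
The two suggestions above (free memory before and after, and distinct fixed-length strings built from a counter) could be combined roughly as follows. This is a sketch with made-up names, not a precise measurement: String.format creates temporary objects of its own, and a garbage collection in the middle of the run can distort the result.

```java
class FreeMemoryDemo {
    // Builds `count` distinct strings by zero-padding a counter to 8 digits,
    // so no two entries can share one literal from the constant pool.
    static String[] buildStrings(int count) {
        String[] result = new String[count];
        for (int i = 0; i < count; i++) {
            result[i] = String.format("%08d", i);
        }
        return result;
    }

    // Heap currently in use, as reported by the JVM.
    static long usedMemory() {
        Runtime runtime = Runtime.getRuntime();
        return runtime.totalMemory() - runtime.freeMemory();
    }

    public static void main(String[] args) {
        System.gc(); // only a hint; the JVM may ignore it
        long before = usedMemory();
        String[] strings = buildStrings(100_000);
        long after = usedMemory();

        System.out.println("Approximate bytes per string: "
                + (after - before) / (double) strings.length);
    }
}
```

Running it several times and comparing the outputs gives a feel for how stable the figure is on a particular JVM.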
 
LVL 30

Expert Comment

by:mayankeagle
I still think that free-memory is not the correct way to do it.

What if you have more threads, etc?

What if a garbage collection that had not run before you created any Strings runs while you are instantiating the new ones? Then the amount of free memory you read is not the correct one. And remember: calls to gc(), runFinalization(), etc. don't guarantee that garbage collection will be performed when they are called. The JVM does it when it sees fit.
 
LVL 13

Accepted Solution

by:
Webstorm earned 150 total points
Hi darrenc,

Java uses UTF-16 to store characters: each char is one 16-bit code unit, and a String does not decode the values 0xD800-0xDFFF (the surrogates), which are used in pairs to encode Unicode characters beyond 0xFFFF (20 bits of payload).

You can only estimate the minimum size occupied by a string variable: 2 * length + 4 (for the string length).

You can test how much memory is used when you have a String array:

    static String[] array = {
        "fixed length string",
        "fixed length string",
    };

    public static void main(String[] args)
    {
        try {
            Runtime r = Runtime.getRuntime();
            System.gc();         // only a request; the JVM may ignore it
            Thread.sleep(4000);  // give the collector some time to run
            System.out.println("Memory used (max) : " + (r.totalMemory() - r.freeMemory()) + " bytes");
        } catch (InterruptedException ex) {
            Thread.currentThread().interrupt();
        }
    }

Run this application several times to find the minimum memory used, then change the array size by duplicating the string:

    static String[] array = {
      "fixed length string",
      "fixed length string",
      "fixed length string",
      "fixed length string",
   };

Compile and run this modified application to see the difference.
 
LVL 24

Expert Comment

by:sciuriware
If you do it all at once, other threads will not interfere;
if you are far below the heap limit, garbage collection will not interfere.
Read the technical topics on Sun's site.

By the way: try it several times; the outcomes will convince you.

;JOOP!
 
LVL 24

Expert Comment

by:sciuriware
Webstorm, you are using identical literals ....

;JOOP!
 
LVL 13

Expert Comment

by:Webstorm
>> Webstorm, you are using identical literals ....

Yes, in order to evaluate the memory occupied by strings of the same length.
But I forgot the Java constant pool, which stores identical literals only once. Use distinct strings instead:

    static String[] array = {
      "fixed length string 000",
      "fixed length string 001",
      "fixed length string 002",
      "fixed length string 003",
   };
 
LVL 30

Expert Comment

by:mayankeagle
That is a very common way of doing it but again:

>> System.gc();

- does not guarantee that garbage-collection will be performed....
 
LVL 4

Expert Comment

by:john-at-7fff
Darren,

Just out of curiosity, is the textual data in your database plain ASCII? If the encoding in the database is ASCII, you might be able to get away with keeping your data in Java byte[] or char[] arrays, and thereby not have to deal with UTF issues at all. This could really be the way to go for caching -- if you don't need any of the methods on String you might not need it.
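
A minimal sketch of the byte[] idea above, assuming the cached text really is plain ASCII and is keyed by an integer id. The class, method names and key type are hypothetical. Note that getBytes with US_ASCII silently replaces any non-ASCII character with '?', so this only suits data known to be ASCII.

```java
import java.nio.charset.StandardCharsets;
import java.util.HashMap;
import java.util.Map;

class AsciiCacheDemo {
    // One byte per character instead of a full String object per entry.
    private static final Map<Integer, byte[]> CACHE = new HashMap<>();

    // Stores the value as ASCII bytes.
    static void put(int key, String value) {
        CACHE.put(key, value.getBytes(StandardCharsets.US_ASCII));
    }

    // Decodes the bytes back into a String only when a caller needs one.
    static String get(int key) {
        byte[] bytes = CACHE.get(key);
        return bytes == null ? null : new String(bytes, StandardCharsets.US_ASCII);
    }

    public static void main(String[] args) {
        put(1, "Hello");
        System.out.println(get(1));
    }
}
```

The trade-off is that every lookup that needs a String pays for a decode and a new object, so this fits caches that are read rarely or consumed as raw bytes.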
 

Author Comment

by:darrenc
Hi everyone,

Thanks for your comments.  My original question was just to determine the encoding that Java uses internally to store Strings, and Webstorm has indicated that it's UTF-16.  I didn't really understand the comment "it doesn't decode the 0xD800-0xDFFF characters which are used to encode 20 bits Unicode characters".  My understanding is that 0xD800-0xDFFF characters are ill-formed (not valid).  I still can't find any doco that supports Webstorm, but I suppose if I run the tests that everyone has suggested I could get a feeling for it.  (I agree that running a test in the JVM is not entirely accurate.)

john-at-7fff, the data in the database that I'm going to cache will be Unicode data with some non-ASCII characters.

So if UTF-16 is the encoding, the minimum space used by the cache would be 2 * length + 4 (for the string length)?
And the maximum size (the one I have to estimate) is
4 * length + 4 (for the string length)?
.. where maximum size indicates that all characters need surrogate pairs (4 bytes)
And the average might be
2.5 * length + 4 (for the string length) (assuming 25% need surrogate pairs)?

Does anyone have a link to doco that confirms that UTF-16 is the internal Java encoding?
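
On the surrogate-pair estimate above: String.length() counts UTF-16 code units, not characters, so a character stored as a surrogate pair already contributes 2 to the length. If "length" in the formulas means String.length(), the character data is 2 * length bytes in every case, and 4 * length only arises when length counts code points. A small sketch to check this (class and method names are made up for illustration; codePointCount assumes Java 5 or later):

```java
class CodeUnitDemo {
    // Bytes of UTF-16 character data: two per code unit, surrogates included.
    static long utf16PayloadBytes(String s) {
        return 2L * s.length();
    }

    public static void main(String[] args) {
        // U+1F600, one character outside the 16-bit range, stored as a surrogate pair.
        String supplementary = "\uD83D\uDE00";

        System.out.println("Code units:  " + supplementary.length());
        System.out.println("Code points: "
                + supplementary.codePointCount(0, supplementary.length()));
        System.out.println("UTF-16 payload bytes: " + utf16PayloadBytes(supplementary));
    }
}
```

This covers the character data only. Object headers, the backing array and other per-String fields come on top and vary by JVM, and JDK 9 and later may store Latin-1-only text at one byte per character, so measuring on the target JVM is still the safer basis for an estimate.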
 
LVL 4

Assisted Solution

by:john-at-7fff
john-at-7fff earned 50 total points
Internal encoding of a Java String: http://www.i18nfaq.com/java.html#4
 
LVL 4

Expert Comment

by:john-at-7fff
You might also find this interesting (though not an answer to your question):
http://www.joelonsoftware.com/articles/Unicode.html
 
LVL 13

Expert Comment

by:Webstorm
darrenc
>> And the maximum size (the one I have to estimate) is 4 * length + 4 (for the string length)?
No, I only meant that a String object can also store other information (optimizations, native data structures, ...) that may depend on the JVM used. This is why you need to test the memory occupied by String objects on your JVM.


mayankeagle
>>>> System.gc();
>>- does not guarantee that garbage-collection will be performed....
Yes, but the Thread.sleep() following the call gives the garbage collector time to run. Sometimes it works and sometimes it doesn't, but successive runs should give a pretty good estimate.

      Runtime r = Runtime.getRuntime();
      long size = -1L;
      int n = 10; // 10 iterations
      while (n-- > 0)
      {
           System.gc();
           Thread.sleep(4000); // 4 seconds; throws InterruptedException
           long t = r.totalMemory() - r.freeMemory();
           if ((size < 0L) || (t < size)) size = t; // keep the smallest reading
      }
      System.out.println("Memory used (evaluation) : " + size + " bytes");

 

Author Comment

by:darrenc
Hi again,

So the answer to my question is ... it depends on the JVM.  Strings are stored as UTF-16, but a String object can hold other information as well.  So I have to run a test to figure it out.  Oh well, I thought I could get away with a simple metric.
Thanks to everyone for their help.

Darren
 
LVL 13

Expert Comment

by:Webstorm
:-)