OutOfMemoryError when using apache POI API to write Excel file

I am using apache POI API in a Servlet to write data into an Excel (.xls) file then send to browsers. It work fine when the data is not much. But it give an java.lang.OutOfMemoryError when the data volume is high.

Then I have try JExcel API, it is more lightweight compare to POI where it is can support higher volume of data. But there is still a limit (26k of rows with 25 columns).

The main reason I found is that both API use the same mechanism to write Excel file: create an object to represent workbook --> add Sheets --> add Cells --> write to OutputStream.

so the workbook object will encapsule all data before it was write to OutputStream. This size of object will grow very fast when there is lots of data until OutOfMemoryError occurs.

Any suggstion to solve this problem besides configure JVM heap size(Since it will still have a limit) ? Thanks in advance.
LVL 1
b612_foreverAsked:
Who is Participating?
 
bloodredsunConnect With a Mentor Commented:
Are far as your options go:
1./ is a valid option as Excel can read CSV natively although you will lose any formatting and CSV doesn't support multiple sheets, but at least it's nowhere near as memory intensive as creating an xls file.
2./I know that excel can read XML files but I'm not sure about HTML tables. If that's a valid option then it sounds like a good idea.
But if you use XML rather than HTML you shoud be able to produce multiple sheets if you use the correct schema. Have a look here http://msdn.microsoft.com/library/default.asp?url=/library/en-us/dnexcl2k2/html/odc_xlsmlinss.asp, apparently the only things that the XML file cannot support are:
# OLE objects
# Drawing shapes
# Charts
# Microsoft® Visual Basic® for Applications (VBA) code
# Groups and outlines
# Custom envelope information

Which means that you shoud be able to support multiple sheets (fingers crossed!)
0
 
bloodredsunCommented:
0
 
bloodredsunCommented:
0
Cloud Class® Course: Python 3 Fundamentals

This course will teach participants about installing and configuring Python, syntax, importing, statements, types, strings, booleans, files, lists, tuples, comprehensions, functions, and classes.

 
bloodredsunCommented:
If you were doing this for a commercial project and had the budget, you could try ExtenXLs http://www.extentech.com/estore/product_detail.jsp?product_group_id=1 or Actuate http://www.actuate.com/services/eservices/offerings.asp#javaed. They may be a bit expensive but currently they appear to be better featured than POI/HSSF, but I have no idea whether they could deal with large datasets where POI doesn't.
0
 
b612_foreverAuthor Commented:
I am still new in Apache POI, it that possible to write the workbook part by part to OutputStream?

which means the flow will be something like this:

1. insert 5000 rows into workbook
2. write workbook to OutputStream
3. repeat step 1 and 2 until all rows written.

I have try this with JExcel but it end up with client get an corrupted file (because it rewrite the whole workboook to OutputStream everytime) and the OutOfMemoryException will still happen (because the workbook object still do not release any resources all cells that already written to OutputStream).
0
 
bloodredsunCommented:
If you use " hssfworkbook.write(new FileOutputStream(filename));" then I don't think so, although I'll admit it might be possible as I've never tried it and I've never seen any examples do it this way either.

 I seem to remember that there is some issue with the OLE format that prevents you from doing this, which is a great pity as it would at least free up the memory.

I found that changing the memory settings with as much RAM as I could get away with was the best answer. The worst increase in memory usage seemed to be just at the writing stage where the memory usage would just double as the app tried to create the file.

They do know about this memoery problem as in the docs you'll find:
-------------
What's Next?

Further effort on HSSF is going to focus on the following major areas:

    * Performance: POI currently uses a lot of memory for large sheets.
...
--------------


0
 
b612_foreverAuthor Commented:
thanks for ur suggestion.

I had do some reading and is now considering 2 alternative:
1) send data to user as CSV
2) send data to user in HTML table format but the content-type of "application/vnd.ms-excel"

both work with huge amount of data since It can send data into OutputStream while it loop through the data to be send. However, both have the same limitation: MS Excel can only support 65,536 row per sheet.

Is there any way to write break data into different Sheets (not file) in a CSV file or the HTML table way?

Thank You.
0
 
b612_foreverAuthor Commented:
hi bloodredsun, Thank You for your suggestion on XML. After reading on the MSDN link u post, I am now finally able to export my data into Excel file that is more than 1 sheet.

Just to shared with u, I am finding this link useful, (all I do basically is just change the ASP into Java):
http://www.experts-exchange.com/Web/Web_Languages/XML/Q_20919841.html

how ever, this solution create a file with larger file size compare to the normal Excel file. But it is good enough for me. Thank You.
0
 
bloodredsunCommented:
Thanks dude, and cheers for the link
0
 
ittsiCommented:
Not sure if others are still looking at this post 5 years later but here is the real solution to this problem using POI:

http://svn.apache.org/repos/asf/poi/trunk/src/examples/src/org/apache/poi/xssf/usermodel/examples/BigGridDemo.java
0
Question has a verified solution.

Are you are experiencing a similar issue? Get a personalized answer when you ask a related question.

Have a better answer? Share it in a comment.

All Courses

From novice to tech pro — start learning today.