troubleshooting Question

Can I speed up Apache POI when working with xlsx files?

Avatar of FLEA2007
FLEA2007 asked on
JavaMicrosoft ExcelProgramming Languages-OtherProgramming
5 Comments1 Solution4891 ViewsLast Modified:
I have been using POI for a few years to generate excel files for users.  I've recently had to move from xls files (HSSF) to xlsx files (XSSF) due to the number of rows in a particular export.  Unfortunately, this has resulted in a massive performance loss.  

My use case usually involves a template file that I open and create sheet(s) inside.  Generally there will be some pivot charts and/or other static data in these templates which is why my test uses an empty 'template' file, if I didn't say this someone would suggest using streaming, which would be nice, but I can't lose my template sheets (or the styles on them).

So, here's a fairly simple demonstration:

//import org.apache.poi.hssf.usermodel.HSSFWorkbook;
//import org.apache.poi.xssf.usermodel.XSSFWorkbook;
import org.apache.poi.ss.usermodel.*;
import org.apache.poi.ss.util.*;

import java.io.*;
import java.util.*;

public class ExcelTest {
	static public int NUM_ROWS = 10000;
	static public int NUM_COLS = 25;
	
	private Workbook workbook;
	private Sheet cSheet;
	private boolean useHSSF;
	
	public ExcelTest(boolean useHSSF) {
		this.useHSSF = useHSSF;
	}
	
	public boolean Test() {
		boolean tbr = true;
		try {
			//Open template xlsx file.  This is a single workbook with a single, empty sheet:
			this.workbook = WorkbookFactory.create(new FileInputStream((this.useHSSF ? "data/test.xls" : "data/test.xlsx")));
			
			//generate a new sheet:
			this.cSheet = this.workbook.createSheet();
			
			//write NUM_ROWS rows
			for (int i = 0; i < NUM_ROWS; i++) {
				Row r = this.cSheet.createRow(i);
				
				//write NUM_COLS columns
				for (int j = 0; j < NUM_COLS; j++) {
					Cell c = r.getCell(j);
					if (c == null) {
						c = r.createCell(j);
					}
					c.setCellValue("TEST " + i + ":" + j);
				}
			}
		} catch (Exception e) {
			tbr = false;
			e.printStackTrace(System.out);
		}
		
		return tbr;
	}
	
	public boolean Export(String filename) {
		boolean tbr = true;
		try {
			FileOutputStream out = new FileOutputStream(filename);
			this.workbook.write(out);
			out.close();
		} catch (Exception e) {
			tbr = false;
			e.printStackTrace(System.out);
		}
		
		return tbr;
	}
	
	static public void main(String[] args) {
		long start;
		boolean didTest;
		ExcelTest et;
		System.out.println("Starting test...");
		
		start = System.currentTimeMillis();
		et = new ExcelTest(true);
		didTest = et.Test();
		if (didTest) {
			et.Export("TEST.xls");
		}
		System.out.println("XLS (HSSF) version took " + (System.currentTimeMillis() - start) + " ms.");
		
		start = System.currentTimeMillis();
		et = new ExcelTest(false);
		didTest = et.Test();
		if (didTest) {
			et.Export("TEST.xlsx");
		}
		System.out.println("XLSX (XSSF) version took " + (System.currentTimeMillis() - start) + " ms.");
	}
}

You will notice that even for a relatively-small data set of 10k rows with 25 cols, the xls version takes ~2 seconds, while the xlsx version takes ~20 seconds.  Does anyone know how/if I can speed up POI's performance when working with large xlsx files?

Thanks!
ASKER CERTIFIED SOLUTION
FLEA2007

Our community of experts have been thoroughly vetted for their expertise and industry experience.

Join our community to see this answer!
Unlock 1 Answer and 5 Comments.
Start Free Trial
Learn from the best

Network and collaborate with thousands of CTOs, CISOs, and IT Pros rooting for you and your success.

Andrew Hancock - VMware vExpert
See if this solution works for you by signing up for a 7 day free trial.
Unlock 1 Answer and 5 Comments.
Try for 7 days

”The time we save is the biggest benefit of E-E to our team. What could take multiple guys 2 hours or more each to find is accessed in around 15 minutes on Experts Exchange.

-Mike Kapnisakis, Warner Bros