Can I speed up Apache POI when working with xlsx files?
I have been using POI for a few years to generate excel files for users. I've recently had to move from xls files (HSSF) to xlsx files (XSSF) due to the number of rows in a particular export. Unfortunately, this has resulted in a massive performance loss.
My use case usually involves a template file that I open and create sheet(s) inside. Generally there will be some pivot charts and/or other static data in these templates which is why my test uses an empty 'template' file, if I didn't say this someone would suggest using streaming, which would be nice, but I can't lose my template sheets (or the styles on them).
So, here's a fairly simple demonstration:
//import org.apache.poi.hssf.usermodel.HSSFWorkbook;//import org.apache.poi.xssf.usermodel.XSSFWorkbook;import org.apache.poi.ss.usermodel.*;import org.apache.poi.ss.util.*;import java.io.*;import java.util.*;public class ExcelTest { static public int NUM_ROWS = 10000; static public int NUM_COLS = 25; private Workbook workbook; private Sheet cSheet; private boolean useHSSF; public ExcelTest(boolean useHSSF) { this.useHSSF = useHSSF; } public boolean Test() { boolean tbr = true; try { //Open template xlsx file. This is a single workbook with a single, empty sheet: this.workbook = WorkbookFactory.create(new FileInputStream((this.useHSSF ? "data/test.xls" : "data/test.xlsx"))); //generate a new sheet: this.cSheet = this.workbook.createSheet(); //write NUM_ROWS rows for (int i = 0; i < NUM_ROWS; i++) { Row r = this.cSheet.createRow(i); //write NUM_COLS columns for (int j = 0; j < NUM_COLS; j++) { Cell c = r.getCell(j); if (c == null) { c = r.createCell(j); } c.setCellValue("TEST " + i + ":" + j); } } } catch (Exception e) { tbr = false; e.printStackTrace(System.out); } return tbr; } public boolean Export(String filename) { boolean tbr = true; try { FileOutputStream out = new FileOutputStream(filename); this.workbook.write(out); out.close(); } catch (Exception e) { tbr = false; e.printStackTrace(System.out); } return tbr; } static public void main(String[] args) { long start; boolean didTest; ExcelTest et; System.out.println("Starting test..."); start = System.currentTimeMillis(); et = new ExcelTest(true); didTest = et.Test(); if (didTest) { et.Export("TEST.xls"); } System.out.println("XLS (HSSF) version took " + (System.currentTimeMillis() - start) + " ms."); start = System.currentTimeMillis(); et = new ExcelTest(false); didTest = et.Test(); if (didTest) { et.Export("TEST.xlsx"); } System.out.println("XLSX (XSSF) version took " + (System.currentTimeMillis() - start) + " ms."); }}
You will notice that even for a relatively-small data set of 10k rows with 25 cols, the xls version takes ~2 seconds, while the xlsx version takes ~20 seconds. Does anyone know how/if I can speed up POI's performance when working with large xlsx files?
Does the same happen with antivirus off for directory where you export?
xlsx is a zip, it takes second extra to compress and another for antivirus to decompress..
FLEA2007
ASKER
I've requested that this question be deleted for the following reason:
xlsx is a zip, it takes second extra to compress and another for antivirus to decompress..