Link to home
Start Free TrialLog in
Avatar of allelopath
allelopath

asked on

Problem with POI HssfListener

The problem is that I am getting the EOF Record before the last sheet (ie  bofRecord.TYPE_WORKSHEET for the last sheet) and hence the last sheet is not being processed. The output of the attached code is:

MyHSSFListener.processRecord(): Encountered workbook
MyHSSFListener.processRecord(): New sheet named: MySheet1      12792
MyHSSFListener.processRecord(): New sheet named: MySheet2      66686
MyHSSFListener.processRecord(): New sheet named: MySheet3      78425
MyHSSFListener.processRecord(): Encountered sheet reference 2057       processing: MySheet1
MyHSSFListener.processRecord(): Encountered sheet reference 2057       processing: MySheet2
MyHSSFListener.processRecord(): EOFRecord: hssfWorkbook: 2
MyHSSFListener.processRecord(): Encountered sheet reference 2057       processing: MySheet3
done.

I would expect the last 3 lines to be in this order:
MyHSSFListener.processRecord(): Encountered sheet reference 2057       processing: MySheet3
MyHSSFListener.processRecord(): EOFRecord: hssfWorkbook: 2
done.

SSCCE and excel data file attached


import java.io.FileInputStream;
import java.io.FileNotFoundException;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.text.SimpleDateFormat;
import java.util.Calendar;
import java.util.HashMap;
import java.util.Map;

import org.apache.poi.hssf.eventusermodel.HSSFEventFactory;
import org.apache.poi.hssf.eventusermodel.HSSFListener;
import org.apache.poi.hssf.eventusermodel.HSSFRequest;
import org.apache.poi.hssf.record.BOFRecord;
import org.apache.poi.hssf.record.BoundSheetRecord;
import org.apache.poi.hssf.record.EOFRecord;
import org.apache.poi.hssf.record.FormulaRecord;
import org.apache.poi.hssf.record.LabelSSTRecord;
import org.apache.poi.hssf.record.NumberRecord;
import org.apache.poi.hssf.record.Record;
import org.apache.poi.hssf.record.RowRecord;
import org.apache.poi.hssf.record.SSTRecord;
import org.apache.poi.hssf.record.common.UnicodeString;
import org.apache.poi.hssf.usermodel.HSSFCell;
import org.apache.poi.hssf.usermodel.HSSFDateUtil;
import org.apache.poi.hssf.usermodel.HSSFRow;
import org.apache.poi.hssf.usermodel.HSSFSheet;
import org.apache.poi.hssf.usermodel.HSSFWorkbook;
import org.apache.poi.poifs.filesystem.POIFSFileSystem;

/**
 * Use the event API for reading an excel file
 */
public class MyHSSFListener implements HSSFListener {

	private SSTRecord sstRecord;

	private boolean processingSheet1;
	private boolean processingSheet2;
	private boolean processingSheet3;

	// no way to directly link row to sheet or to directly know which sheet we are actually processing.
	// it seems first to inventory the sheets via BoundSheetRecord events, so create a HashMap of sheets with processing this event
	private Map<Integer, String> sheetMap;
	private int totalNumberOfSheets; // for counting BoundSheetRecord
	private int sheetReferenceNumber; // for counting bofRecord.TYPE_WORKSHEET

	final private String MYSHEET1 = "MySheet1";
	final private String MYSHEET2 = "MySheet2";
	final private String MYSHEET3 = "MySheet3";

	private HSSFWorkbook hssfWorkbook;
	private	HSSFSheet currentHssfSheet;

	final protected int DATE_COLUMN = 1; // on 2nd sheet
    final String DATE_FORMAT = "dd/MM/yyyy"; // MM because mm is minutes
    final SimpleDateFormat simpleDateFormat = new SimpleDateFormat(DATE_FORMAT);
    
	// when this count is the same as the number of sheets,
	// then we have found the last sheet and hence really the EOF
	protected int eofCount;

	/**
	 * constructor
	 */
	public MyHSSFListener () {

		super();

		sheetMap = new HashMap<Integer, String>();
		totalNumberOfSheets = 0;
		sheetReferenceNumber = 0;

		this.processingSheet1 = false;
		this.processingSheet2 = false;
		this.processingSheet3 = false;

	}

	/**
	 * This method listens for incoming records and handles them as required.
	 * @param record The record that was found while reading.
	 */
	public void processRecord(Record record) {

		switch (record.getSid()) {

		// the BOFRecord can represent either the beginning of a sheet or the workbook
		case BOFRecord.sid: // Beginning Of File
			BOFRecord bofRecord = (BOFRecord) record;
			if (bofRecord.getType() == bofRecord.TYPE_WORKBOOK) {
				System.out.println("MyHSSFListener.processRecord(): Encountered workbook ");
				hssfWorkbook = new HSSFWorkbook();
			}
			else if (bofRecord.getType() == bofRecord.TYPE_WORKSHEET) {

				String sheetName =  sheetMap.get(sheetReferenceNumber);
				System.out.println("MyHSSFListener.processRecord(): Encountered sheet reference "+ bofRecord.getSid() + "\t processing: " + sheetName);

				currentHssfSheet = hssfWorkbook.createSheet(sheetName);
				//System.out.println ("MyHSSFListener.processRecord(): currentHssfSheet: " + currentHssfSheet.getSheetName());		

				if (sheetName.startsWith(MYSHEET1)) {
					this.processingSheet1 = true;
					this.processingSheet2 = false;
					this.processingSheet3 = false;
				}
				else if (sheetName.startsWith(MYSHEET2)) {
					this.processingSheet1 = false;
					this.processingSheet2 = true;
					this.processingSheet3 = false;			
				}
				else if (sheetName.startsWith(MYSHEET3)) {
					this.processingSheet1 = false;
					this.processingSheet2 = false;
					this.processingSheet3 = true;
				}
				else {
					this.processingSheet1 = false;
					this.processingSheet2 = false;
					this.processingSheet3 = false;
				}

				sheetReferenceNumber++;
			}
			break;

		case BoundSheetRecord.sid:
			BoundSheetRecord boundSheetRecord = (BoundSheetRecord) record;
			String sheetName =  boundSheetRecord.getSheetname();
			System.out.println("MyHSSFListener.processRecord(): New sheet named: " + sheetName + "\t" + boundSheetRecord.getPositionOfBof());
			sheetMap.put(totalNumberOfSheets, sheetName);
			totalNumberOfSheets++;

			break;

		case RowRecord.sid:
			final RowRecord rowRecord = (RowRecord) record;
			int firstColumn = rowRecord.getFirstCol();
			int lastColumn = rowRecord.getLastCol();
			//System.out.println("MyHSSFListener.processRecord(): Row number: " + rowRecord.getRowNumber() + "\t first/last column: " + firstColumn + " / " + lastColumn);

			HSSFRow hssfRow = currentHssfSheet.createRow(rowRecord.getRowNumber());

			// create cells for this row
			for (int column = firstColumn; column <= lastColumn; column++) {
				HSSFCell newHssfCell = hssfRow.createCell(column);
				//newHssfCell.setCellValue(column); // set value later
			}

			break;

		case NumberRecord.sid:
			
			NumberRecord numberRecord = (NumberRecord) record;
			//System.out.println("MyHSSFListener.processRecord(): Cell found with value " + numberRecord.getValue() + " at row " + numberRecord.getRow() + " and column " + numberRecord.getColumn());

			int row = numberRecord.getRow();
			int column = numberRecord.getColumn();

			// get corresponding cell, if any
			hssfRow = currentHssfSheet.getRow(row); 
			HSSFCell hssfCell = hssfRow.getCell(column);
			if (hssfCell != null) { // if cell exists
				double doubleValue = numberRecord.getValue();
				if (processingSheet2 && column == DATE_COLUMN) {
					//System.out.println("MyHSSFListener.processRecord(): double: "  + doubleValue);
					
					java.util.Date utilDate  = HSSFDateUtil.getJavaDate(numberRecord.getValue());
					String string = simpleDateFormat.format(utilDate);
					// swap mm/dd
					String swapString = string.substring(3, 5) + "/" + string.substring(0, 2) +  string.substring(5);
					hssfCell.setCellValue(swapString);
				}
				else { // not a date, just a number
					hssfCell.setCellValue(doubleValue); // then set the value
				}
			}

			break;

			// SSTRecords store a array of unique strings used in Excel.
		case SSTRecord.sid: // Static String Table
			//if (processingSheet2) {
			sstRecord = (SSTRecord) record;
			int numberOfUniqueStrings = sstRecord.getNumUniqueStrings();
			for (int k = 0; k < sstRecord.getNumUniqueStrings(); k++) {
				//System.out.println("MyHSSFListener.processRecord(): String table value " + k + " = " + sstRecord.getString(k));
				//System.out.println("MyHSSFListener.processRecord(): cell " + sstRecord.getString table value " + k + " = " + sstRecord.getString(k));
			}
			//}
			break;

		case LabelSSTRecord.sid:

			LabelSSTRecord labelSSTRecord = (LabelSSTRecord) record;
			row = labelSSTRecord.getRow();
			column = labelSSTRecord.getColumn();
			final UnicodeString unicodeString  = sstRecord.getString(labelSSTRecord.getSSTIndex());
			final String string = unicodeString.toString();

			// get corresponding cell, if any
			hssfRow = currentHssfSheet.getRow(row); 
			hssfCell = hssfRow.getCell(column);
			if (hssfCell != null) { // if cell exists
				if (column != DATE_COLUMN) {
					hssfCell.setCellValue(string); // then set the value
				}
			}
			break;

		case FormulaRecord.sid:
			
			FormulaRecord formulaRecord = (FormulaRecord) record;
			row = formulaRecord.getRow();
			column = formulaRecord.getColumn();
			double doubleValue = formulaRecord.getValue();
			
			// get corresponding cell, if any
			hssfRow = currentHssfSheet.getRow(row); 
			hssfCell = hssfRow.getCell(column);
			if (hssfCell != null) { // if cell exists
				//if (column != DATE_COLUMN) {
					hssfCell.setCellValue(doubleValue); // then set the value
				//}
			}
			
			break;
			
		case EOFRecord.sid: // End Of File
			//EOFRecord eofRecord = (EOFRecord) record;
			eofCount++;
			// if this is last sheet
			if (eofCount == totalNumberOfSheets) {
				//System.out.println("MyHSSFListener.processRecord(): EOF: sid:" + eofRecord.getSid() + "\t toString: " + eofRecord.toString());
				System.out.println("MyHSSFListener.processRecord(): EOFRecord: hssfWorkbook: " + hssfWorkbook.getNumberOfSheets());
				final String filename = "C:\\hssf\\myHssfWorkbook-" + getCurrentTime() + ".xls";
				FileOutputStream fileOutputStream = null;
				try {
					fileOutputStream = new FileOutputStream(filename);
					hssfWorkbook.write(fileOutputStream);
				}
				catch (FileNotFoundException e) {
					e.printStackTrace();
				}
				catch (IOException e) {
					e.printStackTrace();
				}
			}
			break;

		}
	}

	/**
	 * 
	 * @return
	 */
	private String getCurrentTime() {

		final String DATE_FORMAT_NOW = "yyyy-MM-dd_HH-mm-ss";

		Calendar calendar = Calendar.getInstance();
		SimpleDateFormat simpleDateFormat = new SimpleDateFormat(DATE_FORMAT_NOW);
		return simpleDateFormat.format(calendar.getTime());

	}

	/**
	 * Read an excel file and spit out what we find.
	 *
	 * @param args Expect one argument that is the file to read.
	 * @throws IOException  When there is an error processing the file.
	 */
	public static void main(String[] args) throws IOException {

		// create a new file input stream with the input file specified at the command line
		FileInputStream fileInputStream = new FileInputStream(args[0]);

		// create a new org.apache.poi.poifs.filesystem.Filesystem
		POIFSFileSystem poiFSFileSystem = new POIFSFileSystem(fileInputStream);

		// get the Workbook (excel part) stream in a InputStream
		InputStream inputStream = poiFSFileSystem.createDocumentInputStream("Workbook");

		// construct out HSSFRequest object
		HSSFRequest hssfRequest = new HSSFRequest();

		// lazy listen for ALL records with the listener shown above
		hssfRequest.addListenerForAllRecords(new MyHSSFListener());

		// create our event factory
		HSSFEventFactory hssfEventFactory = new HSSFEventFactory();

		// process our events based on the document input stream
		hssfEventFactory.processEvents(hssfRequest, inputStream);

		// once all the events are processed close our file input stream
		fileInputStream.close();

		// and our document input stream (don't want to leak these!)
		inputStream.close();

		System.out.println("done.");
	}
}

Open in new window

test.xls
Avatar of for_yan
for_yan
Flag of United States of America image



Looks like the order of events in HSSFListener puzzled also other people.

This is a rather old posting, and it suggests rather difficult to access sources
with explanation:
http://web.archiveorange.com/archive/v/V3EvmYRBjO0s3MXPwOMG

There should be something newer and more accessible, but I could not find it yet.
Avatar of allelopath
allelopath

ASKER

I've just posted this question to the user@poi.apached.org mailing list as well.

ASKER CERTIFIED SOLUTION
Avatar of for_yan
for_yan
Flag of United States of America image

Link to home
membership
This solution is only available to members.
To access this solution, you must be a member of Experts Exchange.
Start Free Trial
In the meantime, trying to find a workaround, I tried adding a dummy sheet, but that doesn't seem to help:

case BOFRecord.sid: // Beginning Of File
	BOFRecord bofRecord = (BOFRecord) record;
	if (bofRecord.getType() == bofRecord.TYPE_WORKBOOK) {
		System.out.println("MyHSSFListener.processRecord(): Encountered workbook ");
		hssfWorkbook = new HSSFWorkbook();
		hssfWorkbook.createSheet("Dummy Sheet"); // add this line
	}

Open in new window


This what I read in the link I mentioned above:

---------------------------------------
In my current setup, I put all bound sheets into a list, and have a
pointer starting at -1 which I increment by one each time I encouter a
sheet, just to get the sheet name for a given entry. This appears clumsy
to me, but I cannot set how to solve that problem otherwise when using
the event-based API.
------------------------------------------

and I remember when I was trying to work with HSSFListener I did the same thing.
So at least it looks like the sheets are encountered in the expected order.




yes, that was your previous question when I already once was trying to get this correlation:
https://www.experts-exchange.com/questions/27051819/POI-HSSFListener-determine-sheet.html?sfQueryTermInfo=1+10+30+poif+yan
On the mailing list, it was suggested that I look at the file using BiffViewer. In the output (attached) there are only 2 BOF RECORD - worksheets, when I know there are 3 sheets.

Offset=0x000143FE(82942) recno=4155 sid=0x0809 size=0x0010(16)
[BOF RECORD]
    .version  = 0x0600
    .type     = 0x0010 (worksheet)
    .build    = 0x1FAA
    .buildyear= 1997
    .history  = 0x000100C9
    .reqver   = 0x00000406
[/BOF RECORD]


BiffViewerOutput.txt
Oh, the output file is incomplete, disregard that .
I've found the problem. There is an EOF for each sheet and an EOF for the actual End Of File. So I needed to change one line in my example:

if (eofCount == totalNumberOfSheets + 1) {
Oh, I see, so they still follow the order - good to know for the future.
This HSSFListener is not the simple stuff, I should admit.
Maybe it even makes sense to try to get hold of taht old book they were mentioning with the description of the format.
Thanks.