allelopath
asked on
Problem with POI HssfListener
The problem is that I am getting the EOF Record before the last sheet (ie bofRecord.TYPE_WORKSHEET for the last sheet) and hence the last sheet is not being processed. The output of the attached code is:
MyHSSFListener.processReco rd(): Encountered workbook
MyHSSFListener.processReco rd(): New sheet named: MySheet1 12792
MyHSSFListener.processReco rd(): New sheet named: MySheet2 66686
MyHSSFListener.processReco rd(): New sheet named: MySheet3 78425
MyHSSFListener.processReco rd(): Encountered sheet reference 2057 processing: MySheet1
MyHSSFListener.processReco rd(): Encountered sheet reference 2057 processing: MySheet2
MyHSSFListener.processReco rd(): EOFRecord: hssfWorkbook: 2
MyHSSFListener.processReco rd(): Encountered sheet reference 2057 processing: MySheet3
done.
I would expect the last 3 lines to be in this order:
MyHSSFListener.processReco rd(): Encountered sheet reference 2057 processing: MySheet3
MyHSSFListener.processReco rd(): EOFRecord: hssfWorkbook: 2
done.
SSCCE and excel data file attached
MyHSSFListener.processReco
MyHSSFListener.processReco
MyHSSFListener.processReco
MyHSSFListener.processReco
MyHSSFListener.processReco
MyHSSFListener.processReco
MyHSSFListener.processReco
MyHSSFListener.processReco
done.
I would expect the last 3 lines to be in this order:
MyHSSFListener.processReco
MyHSSFListener.processReco
done.
SSCCE and excel data file attached
import java.io.FileInputStream;
import java.io.FileNotFoundException;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.text.SimpleDateFormat;
import java.util.Calendar;
import java.util.HashMap;
import java.util.Map;
import org.apache.poi.hssf.eventusermodel.HSSFEventFactory;
import org.apache.poi.hssf.eventusermodel.HSSFListener;
import org.apache.poi.hssf.eventusermodel.HSSFRequest;
import org.apache.poi.hssf.record.BOFRecord;
import org.apache.poi.hssf.record.BoundSheetRecord;
import org.apache.poi.hssf.record.EOFRecord;
import org.apache.poi.hssf.record.FormulaRecord;
import org.apache.poi.hssf.record.LabelSSTRecord;
import org.apache.poi.hssf.record.NumberRecord;
import org.apache.poi.hssf.record.Record;
import org.apache.poi.hssf.record.RowRecord;
import org.apache.poi.hssf.record.SSTRecord;
import org.apache.poi.hssf.record.common.UnicodeString;
import org.apache.poi.hssf.usermodel.HSSFCell;
import org.apache.poi.hssf.usermodel.HSSFDateUtil;
import org.apache.poi.hssf.usermodel.HSSFRow;
import org.apache.poi.hssf.usermodel.HSSFSheet;
import org.apache.poi.hssf.usermodel.HSSFWorkbook;
import org.apache.poi.poifs.filesystem.POIFSFileSystem;
/**
* Use the event API for reading an excel file
*/
public class MyHSSFListener implements HSSFListener {
private SSTRecord sstRecord;
private boolean processingSheet1;
private boolean processingSheet2;
private boolean processingSheet3;
// no way to directly link row to sheet or to directly know which sheet we are actually processing.
// it seems first to inventory the sheets via BoundSheetRecord events, so create a HashMap of sheets with processing this event
private Map<Integer, String> sheetMap;
private int totalNumberOfSheets; // for counting BoundSheetRecord
private int sheetReferenceNumber; // for counting bofRecord.TYPE_WORKSHEET
final private String MYSHEET1 = "MySheet1";
final private String MYSHEET2 = "MySheet2";
final private String MYSHEET3 = "MySheet3";
private HSSFWorkbook hssfWorkbook;
private HSSFSheet currentHssfSheet;
final protected int DATE_COLUMN = 1; // on 2nd sheet
final String DATE_FORMAT = "dd/MM/yyyy"; // MM because mm is minutes
final SimpleDateFormat simpleDateFormat = new SimpleDateFormat(DATE_FORMAT);
// when this count is the same as the number of sheets,
// then we have found the last sheet and hence really the EOF
protected int eofCount;
/**
* constructor
*/
public MyHSSFListener () {
super();
sheetMap = new HashMap<Integer, String>();
totalNumberOfSheets = 0;
sheetReferenceNumber = 0;
this.processingSheet1 = false;
this.processingSheet2 = false;
this.processingSheet3 = false;
}
/**
* This method listens for incoming records and handles them as required.
* @param record The record that was found while reading.
*/
public void processRecord(Record record) {
switch (record.getSid()) {
// the BOFRecord can represent either the beginning of a sheet or the workbook
case BOFRecord.sid: // Beginning Of File
BOFRecord bofRecord = (BOFRecord) record;
if (bofRecord.getType() == bofRecord.TYPE_WORKBOOK) {
System.out.println("MyHSSFListener.processRecord(): Encountered workbook ");
hssfWorkbook = new HSSFWorkbook();
}
else if (bofRecord.getType() == bofRecord.TYPE_WORKSHEET) {
String sheetName = sheetMap.get(sheetReferenceNumber);
System.out.println("MyHSSFListener.processRecord(): Encountered sheet reference "+ bofRecord.getSid() + "\t processing: " + sheetName);
currentHssfSheet = hssfWorkbook.createSheet(sheetName);
//System.out.println ("MyHSSFListener.processRecord(): currentHssfSheet: " + currentHssfSheet.getSheetName());
if (sheetName.startsWith(MYSHEET1)) {
this.processingSheet1 = true;
this.processingSheet2 = false;
this.processingSheet3 = false;
}
else if (sheetName.startsWith(MYSHEET2)) {
this.processingSheet1 = false;
this.processingSheet2 = true;
this.processingSheet3 = false;
}
else if (sheetName.startsWith(MYSHEET3)) {
this.processingSheet1 = false;
this.processingSheet2 = false;
this.processingSheet3 = true;
}
else {
this.processingSheet1 = false;
this.processingSheet2 = false;
this.processingSheet3 = false;
}
sheetReferenceNumber++;
}
break;
case BoundSheetRecord.sid:
BoundSheetRecord boundSheetRecord = (BoundSheetRecord) record;
String sheetName = boundSheetRecord.getSheetname();
System.out.println("MyHSSFListener.processRecord(): New sheet named: " + sheetName + "\t" + boundSheetRecord.getPositionOfBof());
sheetMap.put(totalNumberOfSheets, sheetName);
totalNumberOfSheets++;
break;
case RowRecord.sid:
final RowRecord rowRecord = (RowRecord) record;
int firstColumn = rowRecord.getFirstCol();
int lastColumn = rowRecord.getLastCol();
//System.out.println("MyHSSFListener.processRecord(): Row number: " + rowRecord.getRowNumber() + "\t first/last column: " + firstColumn + " / " + lastColumn);
HSSFRow hssfRow = currentHssfSheet.createRow(rowRecord.getRowNumber());
// create cells for this row
for (int column = firstColumn; column <= lastColumn; column++) {
HSSFCell newHssfCell = hssfRow.createCell(column);
//newHssfCell.setCellValue(column); // set value later
}
break;
case NumberRecord.sid:
NumberRecord numberRecord = (NumberRecord) record;
//System.out.println("MyHSSFListener.processRecord(): Cell found with value " + numberRecord.getValue() + " at row " + numberRecord.getRow() + " and column " + numberRecord.getColumn());
int row = numberRecord.getRow();
int column = numberRecord.getColumn();
// get corresponding cell, if any
hssfRow = currentHssfSheet.getRow(row);
HSSFCell hssfCell = hssfRow.getCell(column);
if (hssfCell != null) { // if cell exists
double doubleValue = numberRecord.getValue();
if (processingSheet2 && column == DATE_COLUMN) {
//System.out.println("MyHSSFListener.processRecord(): double: " + doubleValue);
java.util.Date utilDate = HSSFDateUtil.getJavaDate(numberRecord.getValue());
String string = simpleDateFormat.format(utilDate);
// swap mm/dd
String swapString = string.substring(3, 5) + "/" + string.substring(0, 2) + string.substring(5);
hssfCell.setCellValue(swapString);
}
else { // not a date, just a number
hssfCell.setCellValue(doubleValue); // then set the value
}
}
break;
// SSTRecords store a array of unique strings used in Excel.
case SSTRecord.sid: // Static String Table
//if (processingSheet2) {
sstRecord = (SSTRecord) record;
int numberOfUniqueStrings = sstRecord.getNumUniqueStrings();
for (int k = 0; k < sstRecord.getNumUniqueStrings(); k++) {
//System.out.println("MyHSSFListener.processRecord(): String table value " + k + " = " + sstRecord.getString(k));
//System.out.println("MyHSSFListener.processRecord(): cell " + sstRecord.getString table value " + k + " = " + sstRecord.getString(k));
}
//}
break;
case LabelSSTRecord.sid:
LabelSSTRecord labelSSTRecord = (LabelSSTRecord) record;
row = labelSSTRecord.getRow();
column = labelSSTRecord.getColumn();
final UnicodeString unicodeString = sstRecord.getString(labelSSTRecord.getSSTIndex());
final String string = unicodeString.toString();
// get corresponding cell, if any
hssfRow = currentHssfSheet.getRow(row);
hssfCell = hssfRow.getCell(column);
if (hssfCell != null) { // if cell exists
if (column != DATE_COLUMN) {
hssfCell.setCellValue(string); // then set the value
}
}
break;
case FormulaRecord.sid:
FormulaRecord formulaRecord = (FormulaRecord) record;
row = formulaRecord.getRow();
column = formulaRecord.getColumn();
double doubleValue = formulaRecord.getValue();
// get corresponding cell, if any
hssfRow = currentHssfSheet.getRow(row);
hssfCell = hssfRow.getCell(column);
if (hssfCell != null) { // if cell exists
//if (column != DATE_COLUMN) {
hssfCell.setCellValue(doubleValue); // then set the value
//}
}
break;
case EOFRecord.sid: // End Of File
//EOFRecord eofRecord = (EOFRecord) record;
eofCount++;
// if this is last sheet
if (eofCount == totalNumberOfSheets) {
//System.out.println("MyHSSFListener.processRecord(): EOF: sid:" + eofRecord.getSid() + "\t toString: " + eofRecord.toString());
System.out.println("MyHSSFListener.processRecord(): EOFRecord: hssfWorkbook: " + hssfWorkbook.getNumberOfSheets());
final String filename = "C:\\hssf\\myHssfWorkbook-" + getCurrentTime() + ".xls";
FileOutputStream fileOutputStream = null;
try {
fileOutputStream = new FileOutputStream(filename);
hssfWorkbook.write(fileOutputStream);
}
catch (FileNotFoundException e) {
e.printStackTrace();
}
catch (IOException e) {
e.printStackTrace();
}
}
break;
}
}
/**
*
* @return
*/
private String getCurrentTime() {
final String DATE_FORMAT_NOW = "yyyy-MM-dd_HH-mm-ss";
Calendar calendar = Calendar.getInstance();
SimpleDateFormat simpleDateFormat = new SimpleDateFormat(DATE_FORMAT_NOW);
return simpleDateFormat.format(calendar.getTime());
}
/**
* Read an excel file and spit out what we find.
*
* @param args Expect one argument that is the file to read.
* @throws IOException When there is an error processing the file.
*/
public static void main(String[] args) throws IOException {
// create a new file input stream with the input file specified at the command line
FileInputStream fileInputStream = new FileInputStream(args[0]);
// create a new org.apache.poi.poifs.filesystem.Filesystem
POIFSFileSystem poiFSFileSystem = new POIFSFileSystem(fileInputStream);
// get the Workbook (excel part) stream in a InputStream
InputStream inputStream = poiFSFileSystem.createDocumentInputStream("Workbook");
// construct out HSSFRequest object
HSSFRequest hssfRequest = new HSSFRequest();
// lazy listen for ALL records with the listener shown above
hssfRequest.addListenerForAllRecords(new MyHSSFListener());
// create our event factory
HSSFEventFactory hssfEventFactory = new HSSFEventFactory();
// process our events based on the document input stream
hssfEventFactory.processEvents(hssfRequest, inputStream);
// once all the events are processed close our file input stream
fileInputStream.close();
// and our document input stream (don't want to leak these!)
inputStream.close();
System.out.println("done.");
}
}
test.xls
ASKER
I've just posted this question to the user@poi.apached.org mailing list as well.
ASKER CERTIFIED SOLUTION
membership
This solution is only available to members.
To access this solution, you must be a member of Experts Exchange.
ASKER
In the meantime, trying to find a workaround, I tried adding a dummy sheet, but that doesn't seem to help:
case BOFRecord.sid: // Beginning Of File
BOFRecord bofRecord = (BOFRecord) record;
if (bofRecord.getType() == bofRecord.TYPE_WORKBOOK) {
System.out.println("MyHSSFListener.processRecord(): Encountered workbook ");
hssfWorkbook = new HSSFWorkbook();
hssfWorkbook.createSheet("Dummy Sheet"); // add this line
}
This what I read in the link I mentioned above:
--------------------------
In my current setup, I put all bound sheets into a list, and have a
pointer starting at -1 which I increment by one each time I encouter a
sheet, just to get the sheet name for a given entry. This appears clumsy
to me, but I cannot set how to solve that problem otherwise when using
the event-based API.
--------------------------
and I remember when I was trying to work with HSSFListener I did the same thing.
So at least it looks like the sheets are encountered in the expected order.
yes, that was your previous question when I already once was trying to get this correlation:
https://www.experts-exchange.com/questions/27051819/POI-HSSFListener-determine-sheet.html?sfQueryTermInfo=1+10+30+poif+yan
ASKER
On the mailing list, it was suggested that I look at the file using BiffViewer. In the output (attached) there are only 2 BOF RECORD - worksheets, when I know there are 3 sheets.
Offset=0x000143FE(82942) recno=4155 sid=0x0809 size=0x0010(16)
[BOF RECORD]
.version = 0x0600
.type = 0x0010 (worksheet)
.build = 0x1FAA
.buildyear= 1997
.history = 0x000100C9
.reqver = 0x00000406
[/BOF RECORD]
BiffViewerOutput.txt
Offset=0x000143FE(82942) recno=4155 sid=0x0809 size=0x0010(16)
[BOF RECORD]
.version = 0x0600
.type = 0x0010 (worksheet)
.build = 0x1FAA
.buildyear= 1997
.history = 0x000100C9
.reqver = 0x00000406
[/BOF RECORD]
BiffViewerOutput.txt
ASKER
Oh, the output file is incomplete, disregard that .
ASKER
I've found the problem. There is an EOF for each sheet and an EOF for the actual End Of File. So I needed to change one line in my example:
if (eofCount == totalNumberOfSheets + 1) {
if (eofCount == totalNumberOfSheets + 1) {
Oh, I see, so they still follow the order - good to know for the future.
This HSSFListener is not the simple stuff, I should admit.
Maybe it even makes sense to try to get hold of taht old book they were mentioning with the description of the format.
Thanks.
This HSSFListener is not the simple stuff, I should admit.
Maybe it even makes sense to try to get hold of taht old book they were mentioning with the description of the format.
Thanks.
Looks like the order of events in HSSFListener puzzled also other people.
This is a rather old posting, and it suggests rather difficult to access sources
with explanation:
http://web.archiveorange.com/archive/v/V3EvmYRBjO0s3MXPwOMG
There should be something newer and more accessible, but I could not find it yet.