2 notes on using Java & Apache POI to read & write Excel spreadsheets

I’m currently working on a project for a small manufacturing operation. They wanted to automate the way they schedule & produce work orders or jobs. Their biggest concern was what I like to call “forward compatibility”. Forward compatibility means that if they need to make changes to their system in the future, then they won’t need me to redesign their system, there should be a simple method to add, edit, update procedures, etc… When I took a look at their existing processes, I found a lot of Excel spreadsheets driving their operations. For instance, there is a spreadsheet that drives their “smart” part numbering system. There is a spreadsheet that drives their product testing operations. Plus, these spreadsheets are all Revision Controlled (that’s Version Control in the IT world).

My solution was to use their existing spreadsheets as inputs to my Custom Job Control System. I designed the system to always use the most current version of the relevant spreadsheets (which are always stored on the network in a static location) as data. That way, they can simply control those documents as they always have and they won’t have to reconfigure my system when they have changes.

The main thing that makes this system possible is the POI project from the Apache Software Foundation. The POI project is a pure Java API for reading & writing Microsoft Office Documents. It allows simple Java Object Oriented access to MS Excel, Word, PowerPoint, Visio, Outlook & Publisher. However, some parts are still in their infancy. In particular, MS Word access is simple and not very well developed. MS Excel access, however, is about as powerful as you could possibly need. Accessing spreadsheets is simple and powerful, IF you know what parts of the spreadsheet you need to access. The POI project makes it simple to access Excel documents, but Microsoft didn’t make Excel documents simple to use (at least from the perspective of the underlying implementation). So, without further ado, here are the 2 most important lessons that I’ve learned about reading Excel spreadsheets with Apache POI.

First, get access to any cell in a spreadsheet like so:
XSSFWorkbook workbook = new XSSFWorkbook("Path/to/spreadsheet/file.xlsx");

//Get access to the workbook itself
XSSFSheet sheet = workbook.getSheet("Name of Tab");

//Get an iterator through all of the rows in the sheet
Iterator rows = sheet.rowIterator();
while (rows.hasNext ()){
XSSFRow row = (XSSFRow) rows.next ();
// once we get a row its time to iterate through cells.
Iterator cells = row.cellIterator ();
while (cells.hasNext ()){
XSSFCell cell = (XSSFCell) cells.next ();
cell.doSomething();
}
}

1. Always test the cell type before attempting to parse its data. POI can read both text and numeric cells, but it has to know which is which. One way to do that is like this:
//get an iterator throught all of the cells in a row
Iterator cells = row.cellIterator ();
while (cells.hasNext ()){
XSSFCell cell = (XSSFCell) cells.next ();
//We need a numeric value from this cell, so check to make sure it's numeric
if(cell.getCellType()== XSSFCell.CELL_TYPE_NUMERIC){
variable = cell.getNumericCellValue()
}

2. This is an issue with Excel rather than with POI, but it can be a real bear to figure out. When you set a cell’s format in Excel, you are actually setting the “default” format of the cell. When you first add data to a cell in Excel, it’s format is “General” and Excel attempts to be helpful by guessing whether its textual or numeric. Sometimes it gets it wrong, but even if it doesn’t and you decide to change the format, it doesn’t change the format of data already entered. For example, you have a column of numbers that you want to get with POI, but you don’t know how they might be formatted, so you want to et the String representation of each number. Well, you can’t. Even if you go into the spreadsheet and set the cells format to “Text”, Excel still thinks they’re numbers, and you will get the “cannot get text from numeric cell” exception. What you have to do is ensure that the cells are set to the correct format before any data is entered, or you can set the format, then re-enter the data.

Once you have a firm grasp of how Excel represents its own data structures, then POI makes Excel access as easy as pie. With MS Excel firmly entrenched with so many small & medium businesses, Apache POI is a godsend. It’s a simple set of modules that allows developers easy access to all of those legacy MS Office documents.