Solved

Reading chinese characters using POI

Posted on 2010-11-24
31
1,745 Views
Last Modified: 2012-05-10
Hi,

I have a file(.xls) which contains some chinese characters. Can you tell me
whether POI can read the chinese characters from the cells.

I do not wanted solution using "sun.jdbc.odbc.JdbcOdbcDriver" because my first 2 row has some information and depending on header starts from row 3 or 4.

and using sun.jdbc.odbc.JdbcOdbcDriver select query will be difficult

Therefore POI solution will be highly required.

Please Guide

Thanks
0
Comment
Question by:tia_kamakshi
  • 13
  • 11
  • 7
31 Comments
 
LVL 86

Expert Comment

by:CEHJ
ID: 34207294
afaik Unicode is supported
0
 
LVL 92

Expert Comment

by:objects
ID: 34209368
POI has had some issues handling chinese characters correctly, but I think most of them have been fixed. So make sure you are usong the latest version.

You will also need to correctly set the encoding used when writing each cell

0
 

Author Comment

by:tia_kamakshi
ID: 34210820
Please give me some example code to read excel file to correct encoding.

What latest version I should use for POI? What version you recommend.

I am getting excel file from client. I can open excel file and I am able to read chinese charaters from excel file

Please suggest with example

Many Thanks

0
 
LVL 92

Expert Comment

by:objects
ID: 34210920
0
 

Author Comment

by:tia_kamakshi
ID: 34211407
Thanks for your reply.

In this example we are reading plain text in an excel file

Which I am alreadng.

I need to read chinese characater

Please help me in reading chinese from POI

All chinese character is displaying as ????

Please Guide
0
 
LVL 92

Expert Comment

by:objects
ID: 34211430
Did you set the cell encoding?

 cell.setEncoding(HSSFCell.ENCODING_UTF_16);

And how are you displaying the cell text? Make sure you are using a font that supports chinese
0
 
LVL 86

Expert Comment

by:CEHJ
ID: 34211880
If you post your file, we can take a look
0
 

Author Comment

by:tia_kamakshi
ID: 34212100
Please find sample excel file attached. Please see product sheet only

Many Thanks
VLookupSample.xls
0
 
LVL 86

Accepted Solution

by:
CEHJ earned 500 total points
ID: 34212690
Using this : http://technojeeves.com/joomla/index.php/free/113-read-excel-spreadsheet-with-java  i was able to read it. You can see the result below:
chin.png
0
 

Author Comment

by:tia_kamakshi
ID: 34212908
Hi,

I have also ran the code in Netbean6.5. My output was comming as ????

Then I again my jar file with the same code in command prompt, there also I get the output as ?????

I am using POI version 3.7. what POI version you have used?

Please find attached my output attached and librairies used

Please suggest
screenshot.PNG
librariesScreenshot.PNG
0
 

Author Comment

by:tia_kamakshi
ID: 34212989
With addition to above comments.

As suggested by objects to use

 cell.setEncoding

I don't know why my cell do not have method setEncoding

My HSSFCell do not have method setEncoding

Error says cannot find the symbol setEncoding

HSSFCell cell = (HSSFCell) list.get(j);
cell.setEncoding(HSSFCell.ENCODING_UTF_16);

Currently I am displaying text as
System.out.print(cell.getRichStringCellValue());

It is displaying output on Netbean6.5 output window.

Also I tried on command prompt.


I have the same issue

Please suggest
0
 
LVL 86

Expert Comment

by:CEHJ
ID: 34213659
>>Then I again my jar file with the same code in command prompt, there also I get the output as ?????

You're VERY unlikely ever to see Chinese characters in a Windows command prompt. There are several hoops to jump through before that can happen

>>I have also ran the code in Netbean6.5. My output was comming as ????

You need to ensure that Netbeans is using a font that supports Chinese
0
 
LVL 92

Expert Comment

by:objects
ID: 34214494
> System.out.print(cell.getRichStringCellValue());

you cannot do that to display chinese
instead save it to a file and display file with application that supports chinese
0
 

Author Comment

by:tia_kamakshi
ID: 34216710
Hi,

I have saved the text in file. Still no joy.

In output file I am still getting chinese character as ??????

Please see my log file attached with code I am using

Please guide

Many Thanks
package com.pbms.exceltest.businesslogic;



import org.apache.poi.hssf.usermodel.HSSFCell;

import org.apache.poi.hssf.usermodel.HSSFCellStyle;

import org.apache.poi.hssf.usermodel.HSSFDataFormat;

import org.apache.poi.hssf.usermodel.HSSFFont;

import org.apache.poi.hssf.usermodel.HSSFRichTextString;

import org.apache.poi.hssf.usermodel.HSSFRow;

import org.apache.poi.hssf.usermodel.HSSFSheet;

import org.apache.poi.hssf.usermodel.HSSFWorkbook;

import org.apache.poi.ss.usermodel.Cell;

import org.apache.poi.ss.usermodel.Row;

import org.apache.poi.ss.util.CellRangeAddress;



/* ====================================================================

   Licensed to the Apache Software Foundation (ASF) under one or more

   contributor license agreements.  See the NOTICE file distributed with

   this work for additional information regarding copyright ownership.

   The ASF licenses this file to You under the Apache License, Version 2.0

   (the "License"); you may not use this file except in compliance with

   the License.  You may obtain a copy of the License at



       http://www.apache.org/licenses/LICENSE-2.0



   Unless required by applicable law or agreed to in writing, software

   distributed under the License is distributed on an "AS IS" BASIS,

   WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.

   See the License for the specific language governing permissions and

   limitations under the License.

==================================================================== */

import java.io.FileInputStream;

import java.io.FileOutputStream;

import java.io.IOException;



import java.util.Iterator;

import org.apache.log4j.Logger;



/**

 * File for HSSF testing/examples

 *

 * THIS IS NOT THE MAIN HSSF FILE!! This is a utility for testing functionality.

 * It does contain sample API usage that may be educational to regular API

 * users.

 *

 * @see #main

 * @author Andrew Oliver (acoliver at apache dot org)

 * @author Charles Johnson

 */

public class HssfDumper {

    Logger mL = Logger.getLogger(HssfDumper.class);

    /**

     * creates an {@link HSSFWorkbook} the specified OS filename.

     */

    private static HSSFWorkbook readFile(String filename)

        throws IOException {

        return new HSSFWorkbook(new FileInputStream(filename));

    }



    public void processExcel(String file) {

//        if (file.length < 1) {

//            System.err.println("java HssfDumper <file to dump>");

//

//            return;

//        }



        ExcelTestBaseBL baseBL = new ExcelTestBaseBL();



        String fileName = file;



        try {

            HSSFWorkbook wb = HssfDumper.readFile(fileName);



            for (int k = 0; k < wb.getNumberOfSheets(); k++) {

                HSSFSheet sheet = wb.getSheetAt(k);

                int numRows = sheet.getPhysicalNumberOfRows();

                System.out.printf("Sheet %d '%s' has %d row(s)\n", k,

                    wb.getSheetName(k), numRows);



               // mL.info("Sheet %d '%s' has %d row(s)\n"+wb.getSheetName(k)+""+numRows );



                for (Iterator<Row> rows = sheet.rowIterator(); rows.hasNext();) {

                    HSSFRow row = (HSSFRow) rows.next();



                    if (row == null) {

                        continue;

                    }



                    int numCells = row.getPhysicalNumberOfCells();

                    System.out.printf("Row %d has %d cell(s)\n",

                        row.getRowNum(), numCells);



                    for (Iterator<Cell> cells = row.cellIterator();

                            cells.hasNext();) {

                        HSSFCell cell = (HSSFCell) cells.next();

                        String value = null;



                        switch (cell.getCellType()) {

                            case HSSFCell.CELL_TYPE_FORMULA:

                                value = "FORMULA value=" +

                                    cell.getCellFormula();



                                break;



                            case HSSFCell.CELL_TYPE_NUMERIC:

                                value = "NUMERIC value=" +

                                    cell.getNumericCellValue();



                                break;



                            case HSSFCell.CELL_TYPE_STRING:

                                value = "STRING value=" +

                                    cell.getStringCellValue();



                                break;



                            default:

                        }



                        System.out.printf("Cell (%d,%d) %s\n",

                            cell.getRowIndex(), cell.getColumnIndex(), value);



                        mL.info(value);

                        baseBL.saveLog(value);

                    }

                }

            }

        } catch (Exception e) {

            e.printStackTrace();

        }

    }

}

Open in new window

// Saving log to file in attached (Log.txt)



public void saveLog(String text)

            throws Exception {

        if (text != null) {

            try {

                String fileName = "E:\\SMEC\\Logs\\ExcelTest\\log.txt";

                FileWriter fstream = new FileWriter(fileName, true);

                BufferedWriter out = new BufferedWriter(fstream);

                out.write(text);

                String newLine = System.getProperty("line.separator");

                out.write(newLine);

                out.close();

                fstream.close();

            } catch (Exception ex) {

                mL.error("Unable to print text: " + text, ex);

                System.out.print("Unable to print " + text);

                System.exit(0);

            }

        }

    }

Open in new window

log.txt
0
 
LVL 86

Expert Comment

by:CEHJ
ID: 34216727
You don't specify any encoding for output, so it will use that returned by

System.out.println(System.getProperty("file.encoding"));

(which probably doesn't support Chinese)



0
How your wiki can always stay up-to-date

Quip doubles as a “living” wiki and a project management tool that evolves with your organization. As you finish projects in Quip, the work remains, easily accessible to all team members, new and old.
- Increase transparency
- Onboard new hires faster
- Access from mobile/offline

 
LVL 86

Expert Comment

by:CEHJ
ID: 34216937
I adapted the code to use an optional utf-8-encoded output file

http://technojeeves.com/joomla/index.php/free/113-read-excel-spreadsheet-with-java

Attached is the result:




xls-out.txt
0
 

Author Comment

by:tia_kamakshi
ID: 34217452
I have also used the same code I have posted above

What I have changed is I have modified main method to method processExcel() and then called it from my main method

And then I am prinitng it to my text file using method saveLog(value);

I have also pasted my savelog method above which is printing text to file

Pasting my this code below again

I can see in the link you have sent that PrintWriter is using UTF-8 as follows

out = new PrintWriter(new OutputStreamWriter(
                            new FileOutputStream(args[1]), "UTF-8"));


Do I need to modify to my method saveLog(String text) to reveive and write in UTF-8 encode only

Please see my code and advice

Many Thanks
// Saving log to file in attached (Log.txt)



public void saveLog(String text)

            throws Exception {

        if (text != null) {

            try {

                String fileName = "E:\\SMEC\\Logs\\ExcelTest\\log.txt";

                FileWriter fstream = new FileWriter(fileName, true);

                BufferedWriter out = new BufferedWriter(fstream);

                out.write(text);

                String newLine = System.getProperty("line.separator");

                out.write(newLine);

                out.close();

                fstream.close();

            } catch (Exception ex) {

                mL.error("Unable to print text: " + text, ex);

                System.out.print("Unable to print " + text);

                System.exit(0);

            }

        }

    }

Open in new window

0
 
LVL 86

Expert Comment

by:CEHJ
ID: 34217624
>>Do I need to modify to my method saveLog(String text) to reveive and write in UTF-8 encode only

Yes - or another encoding that allows Chinese, but since utf-8 is universal, i'd stick with that
0
 
LVL 86

Expert Comment

by:CEHJ
ID: 34217641
Incidentally, in case you don't realise, you can just use log4j (encoding set to utf-8) to write to a file instead of using that highly inefficient saveLog method
0
 

Author Comment

by:tia_kamakshi
ID: 34218013
Hi CEHJ,

These are my test methods. In my project I need to create output as html file after extracting these values.

Therefore I cannot use log4j

Can you please help me in modifying method saveLog()

Then I may need to change the code at line 123 (baseBL.saveLog(value);) in my main code

Please help me in fixing this problem

Many Thanks
0
 
LVL 86

Expert Comment

by:CEHJ
ID: 34218613
Why don't you simply use the code at the link i posted?
0
 
LVL 92

Expert Comment

by:objects
ID: 34219864
> Why don't you simply use the code at the link i posted?

Thats just a copy of what I had already posted earlier

for your saveLog() method try this:


public void saveLog(String text)
            throws Exception {
        if (text != null) {
            try {
                String fileName = "E:\\SMEC\\Logs\\ExcelTest\\log.txt";
                Writer fstream = new OutputStreamWriter(new FileOutputStream(fileName, true));
                BufferedWriter out = new BufferedWriter(fstream);
                out.write(text);
                String newLine = System.getProperty("line.separator");
                out.write(newLine);
                out.close();
                fstream.close();
            } catch (Exception ex) {
                mL.error("Unable to print text: " + text, ex);
                System.out.print("Unable to print " + text);
                System.exit(0);
            }
        }
    }
0
 

Author Comment

by:tia_kamakshi
ID: 34238978
Thanks for your replies

I am using the same code you have posted in the URL.

You can see my code is exactly same what you have in same URL

Modifying the method "saveLog(String text)" also has not helped

Please guide
0
 
LVL 86

Expert Comment

by:CEHJ
ID: 34239080
The last code posted in this question is wrong btw. As i mentioned above, you need to use UTF-8 in your Writer
0
 

Author Comment

by:tia_kamakshi
ID: 34239471
Can you please help me in fixing the code which saves text in my file

public void saveLog(String text)
            throws Exception {
        if (text != null) {
            try {
                String fileName = "E:\\SMEC\\Logs\\ExcelTest\\log.txt";
                Writer fstream = new OutputStreamWriter(new FileOutputStream(fileName, true));
                BufferedWriter out = new BufferedWriter(fstream);
                out.write(text);
                String newLine = System.getProperty("line.separator");
                out.write(newLine);
                out.close();
                fstream.close();
            } catch (Exception ex) {
                mL.error("Unable to print text: " + text, ex);
                System.out.print("Unable to print " + text);
                System.exit(0);
            }
        }
    }
0
 
LVL 86

Expert Comment

by:CEHJ
ID: 34239577
I'm wondering why you're not using ready-made code that works and does what you want? (The code i posted)

As i mentioned, you need to specify UTF-8 encoding


Writer fstream = new OutputStreamWriter(new FileOutputStream(fileName, true), "UTF-8");

Open in new window

0
 
LVL 92

Expert Comment

by:objects
ID: 34242990
> Can you please help me in fixing the code which saves text in my file

good to see your using what I suggested above, all you need is to now specify the correct encoding, for example if it is UTF-8 then you would use:

public void saveLog(String text)
            throws Exception {
        if (text != null) {
            try {
                String fileName = "E:\\SMEC\\Logs\\ExcelTest\\log.txt";
                Writer fstream = new OutputStreamWriter(new FileOutputStream(fileName, true), "UTF-8");
                BufferedWriter out = new BufferedWriter(fstream);
                out.write(text);
                String newLine = System.getProperty("line.separator");
                out.write(newLine);
                out.close();
                fstream.close();
            } catch (Exception ex) {
                mL.error("Unable to print text: " + text, ex);
                System.out.print("Unable to print " + text);
                System.exit(0);
            }
        }
    }


0
 
LVL 86

Expert Comment

by:CEHJ
ID: 34244060
>>... for example if it is UTF-8 then you would use: ...

There's

a. no need to repeat code already posted
b. no need to reinforce the use of bad practice
0
 
LVL 92

Expert Comment

by:objects
ID: 34244260
Grow up. I didn't repeat anything, I was just expanding on my earlier suggestion
0
 

Author Closing Comment

by:tia_kamakshi
ID: 34248832
Many Thanks it worked great now
0
 
LVL 86

Expert Comment

by:CEHJ
ID: 34248873
:)
0

Featured Post

Highfive + Dolby Voice = No More Audio Complaints!

Poor audio quality is one of the top reasons people don’t use video conferencing. Get the crispest, clearest audio powered by Dolby Voice in every meeting. Highfive and Dolby Voice deliver the best video conferencing and audio experience for every meeting and every room.

Join & Write a Comment

Introduction This article is the first of three articles that explain why and how the Experts Exchange QA Team does test automation for our web site. This article explains our test automation goals. Then rationale is given for the tools we use to a…
Introduction This article is the last of three articles that explain why and how the Experts Exchange QA Team does test automation for our web site. This article covers our test design approach and then goes through a simple test case example, how …
Viewers learn about the “while” loop and how to utilize it correctly in Java. Additionally, viewers begin exploring how to include conditional statements within a while loop and avoid an endless loop. Define While Loop: Basic Example: Explanatio…
Viewers will learn about the regular for loop in Java and how to use it. Definition: Break the for loop down into 3 parts: Syntax when using for loops: Example using a for loop:

707 members asked questions and received personalized solutions in the past 7 days.

Join the community of 500,000 technology professionals and ask your questions.

Join & Ask a Question

Need Help in Real-Time?

Connect with top rated Experts

20 Experts available now in Live!

Get 1:1 Help Now