Solved

UTF8.txt file to UTF8 xml

Posted on 2006-07-05
Last Modified: 2008-02-01
Hi.
I have a UTF-8 file that I need to read and convert to XML (also UTF-8).
The file is in Czech and I'm on a US system.
I have three questions.
I'm reading the file like this:
            input = new BufferedReader( new FileReader(aFile) );
            String line = null;
            int i = 0;
            while (( line = input.readLine()) != null && i < 100){
                i ++;
                line.trim();
                System.out.println("l:" + line);

1. When I print it out, I get lines like: Oznámení zadávacího řízenÃ. That does not look correct. (I don't know Czech, but I'd expect some Czech letters.) Any idea what I'm doing wrong?
2. I need to read the file line by line and remove "\t", "\n" and "  " from the start of each line. Is there any way I can print these "hidden" chars so I can see what the original file is using?
3. I have some text like this: "
  TI: UK-Cardiff: KOBO CESIE EEIG  
  PD: 20060620
  ND: 121873-2006"
And I need to convert it to "<ti><country>UK</country><city>Cardiff</city><name>KOBO CESIE EEIG</name></ti>
<pd>20060620</pd>
<nd>121873-2006</nd>"
Any ideas how I can split the text like this? Just to make things worse, sometimes the TI: text spans several lines separated by "\n" and several " " (blanks).

Question by:kristian_gr
9 Comments
 
LVL 86

Expert Comment

by:CEHJ
ID: 17041076
You need to use a font that supports Czech characters.

>>line.trim();

should be

line = line.trim();
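On question 2 (seeing the "hidden" characters), one option is to print each line with its whitespace made visible. This is only a sketch: the class name WhitespaceRevealer and the choice to show a blank as \s are made up for illustration.

```java
final class WhitespaceRevealer {
    // Returns a copy of the line in which tabs, line breaks, blanks and
    // other control characters are replaced by visible escape sequences.
    static String reveal(String line) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < line.length(); i++) {
            char c = line.charAt(i);
            switch (c) {
                case '\t': sb.append("\\t"); break;
                case '\n': sb.append("\\n"); break;
                case '\r': sb.append("\\r"); break;
                case ' ':  sb.append("\\s"); break;
                default:
                    if (Character.isISOControl(c)) {
                        // Any other control character is shown as its code point in hex.
                        sb.append(String.format("\\u%04x", (int) c));
                    } else {
                        sb.append(c);
                    }
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(reveal("\t  TI: UK-Cardiff"));
    }
}
```

Note that readLine() already strips the line terminator, so "\n" will only show up if the text was read some other way.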
0
 
LVL 92

Expert Comment

by:objects
ID: 17041080
try:

            input = new BufferedReader( new InputStreamReader(new FileInputStream(aFile), "UTF8") );

You'll also need to have a font installed that supports Czech to display it.
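For anyone on Java 7 or later, the same idea can be sketched with java.nio, which avoids the charset name string. The point is the same as above: FileReader decodes with the platform default charset, so the charset has to be given explicitly. The class and method names here are made up for illustration.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

final class Utf8LineReader {
    // Reads all lines of the file, decoding the bytes as UTF-8
    // regardless of the platform default charset.
    static List<String> readLines(Path file) throws IOException {
        List<String> lines = new ArrayList<>();
        try (BufferedReader in = Files.newBufferedReader(file, StandardCharsets.UTF_8)) {
            String line;
            while ((line = in.readLine()) != null) {
                lines.add(line);
            }
        }
        return lines;
    }
}
```

The try-with-resources block closes the reader even if reading fails part-way.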
0
 
LVL 14

Expert Comment

by:StillUnAware
ID: 17041086
System.out.println(...) writes to a console or command line, so don't expect to see the foreign characters there. You will see them only in an AWT or Swing component like JTextArea, and only in a font that has the glyphs defined for the language you are using. I would suggest you convert the text file to XML and only then check whether the characters are recognizable.
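If you still want to try the console, a possible sketch is to wrap System.out in a PrintStream that encodes as UTF-8. Whether the characters then display correctly depends on the terminal's own encoding and font, which this code cannot control. The names Utf8Console and utf8 are made up for illustration.

```java
import java.io.OutputStream;
import java.io.PrintStream;
import java.io.UnsupportedEncodingException;

final class Utf8Console {
    // Wraps any output stream in a PrintStream that encodes text as UTF-8
    // and flushes automatically on println.
    static PrintStream utf8(OutputStream target) throws UnsupportedEncodingException {
        return new PrintStream(target, true, "UTF-8");
    }

    public static void main(String[] args) throws Exception {
        PrintStream out = utf8(System.out);
        // A Czech word written with Unicode escapes, so the source file's
        // own encoding does not matter.
        out.println("l:" + "\u0159\u00edzen\u00ed");
    }
}
```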
0
LVL 86

Expert Comment

by:CEHJ
ID: 17041090
>>Just to make things worse,

If it's not too big, you'd be better off reading it all into one String
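A rough sketch of that approach, assuming Java 8 or later and assuming every record starts on a line whose first non-blank text is two capitals and a colon (as in the TI:/PD:/ND: sample). The class and method names are made up for illustration.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

final class WholeFileRecords {
    // Reads the whole file into one String, decoding as UTF-8.
    static String readAll(Path file) throws IOException {
        return new String(Files.readAllBytes(file), StandardCharsets.UTF_8);
    }

    // Splits the text in front of every line that starts with a two-letter
    // tag, then collapses each record's whitespace runs to single blanks.
    static List<String> records(String text) {
        List<String> result = new ArrayList<>();
        for (String chunk : text.split("(?m)^(?=[ \\t]*[A-Z]{2}:)")) {
            String flat = chunk.trim().replaceAll("\\s+", " ");
            if (!flat.isEmpty()) {
                result.add(flat);
            }
        }
        return result;
    }
}
```

One caveat: a continuation line that itself happens to begin with two capitals and a colon would be treated as a new record.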
0
 

Author Comment

by:kristian_gr
ID: 17041321
OK, this seems to work:
            BufferedWriter w = new BufferedWriter(new OutputStreamWriter(new FileOutputStream(new File("c:/test.txt")), "UTF8"));
            BufferedReader input = new BufferedReader( new InputStreamReader(new FileInputStream(aFile), "UTF8") );
            String line = null;
            int i = 0;
            while (( line = input.readLine()) != null && i < 400){
                i ++;
//                line = line.trim();
                w.write(line);
                w.newLine();
            }
            w.cl

If I open my test.txt in Word it seems to have the right characters.
Using trim is not a good idea. The file is formatted so that 4 blanks at the start of a line indicate that the line really belongs to the line above.
I therefore think I need to do if(! String line starts with 4 blanks){w.newLine();}
But how can I test if a line starts with 4 blanks?
0
 
LVL 86

Accepted Solution

by:
CEHJ earned 2000 total points
ID: 17041367
>>But how can I test if a line starts with 4 blanks?  

if (line.startsWith("    "))
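Building on that test, here is a minimal sketch of joining continuation lines onto the line above, assuming (as described in the thread) that exactly four leading blanks mark a continuation. ContinuationJoiner and join are made-up names.

```java
import java.util.ArrayList;
import java.util.List;

final class ContinuationJoiner {
    // Four blanks at the start of a line mark it as a continuation.
    private static final String INDENT = "    ";

    // Merges each continuation line into the record started by the
    // nearest preceding non-continuation line; blank lines are skipped.
    static List<String> join(List<String> rawLines) {
        List<String> records = new ArrayList<>();
        StringBuilder current = null;
        for (String line : rawLines) {
            if (line.trim().isEmpty()) {
                continue;
            }
            if (line.startsWith(INDENT) && current != null) {
                current.append(' ').append(line.trim());
            } else {
                if (current != null) {
                    records.add(current.toString());
                }
                current = new StringBuilder(line.trim());
            }
        }
        if (current != null) {
            records.add(current.toString());
        }
        return records;
    }
}
```

Trimming happens only after the indentation has been checked, so the four-blank marker is not lost.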
0
 

Author Comment

by:kristian_gr
ID: 17042044
blush!
Some days even the easiest things are hard. This is one of those.
Thanks, CEHJ.

And since I know you are good at regexes, you probably have an idea about this:
Some of my lines start with XX: (two uppercase chars and a ":"). If that occurs, I'd like to split the String at the first ":" into String[0] and String[1]. But the text in String[1] might also contain several ":".
E.g.: String test = "PT: testString : has some more text";
String[0] result: PT
String[1] result: "testString : has some more text";
0
 
LVL 86

Expert Comment

by:CEHJ
ID: 17042081
if (line.matches("^[A-Z]{2}:.+")) {
    String[]  tokens  = line.split(":", 2);
}
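To tie this back to question 3, here is a hedged sketch that splits at the first ":" and emits the XML shown in the question. It assumes the TI: value always has the shape country-city: name, which is inferred from the single sample in the thread. The class and method names are made up, and the escaping covers only &, < and >.

```java
import java.util.Locale;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class TaggedLineToXml {
    // Assumed shape of a TI value: "country-city: name".
    private static final Pattern TI_VALUE = Pattern.compile("([^-:]+)-([^:]+):\\s*(.*)");

    // Returns {tag, value} split at the first colon, or null if the line
    // does not start with two uppercase letters and a colon.
    static String[] splitTag(String line) {
        String trimmed = line.trim();
        if (!trimmed.matches("[A-Z]{2}:.*")) {
            return null;
        }
        String[] parts = trimmed.split(":", 2);
        return new String[] { parts[0], parts[1].trim() };
    }

    // Escapes the characters that are not allowed in XML element text.
    static String escape(String s) {
        return s.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;");
    }

    // Converts one tagged line to an XML element named after the tag.
    static String toXml(String line) {
        String[] parts = splitTag(line);
        if (parts == null) {
            throw new IllegalArgumentException("No two-letter tag: " + line);
        }
        String tag = parts[0].toLowerCase(Locale.ROOT);
        String value = parts[1];
        Matcher m = TI_VALUE.matcher(value);
        if (tag.equals("ti") && m.matches()) {
            return "<ti><country>" + escape(m.group(1).trim()) + "</country>"
                 + "<city>" + escape(m.group(2).trim()) + "</city>"
                 + "<name>" + escape(m.group(3).trim()) + "</name></ti>";
        }
        return "<" + tag + ">" + escape(value) + "</" + tag + ">";
    }
}
```

The value is trimmed after the split because split(":", 2) leaves the blank that follows the colon at the start of the second token.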
0
 
LVL 92

Expert Comment

by:objects
ID: 17049149
how does that answer your original question?
the important thing was to specify the appropriate encoding.
0

Question has a verified solution.
