Solved

Java XML Parser question - How do i ????

Posted on 2004-04-05
Last Modified: 2013-11-23
I'm not a Java programmer, but I've inherited a system I now have to maintain.

I have a servlet that triggers a program that parses an XML file; upon finding the tag "<WebContent>"
it creates a document in a Lotus Notes database ... works no problem.

I now have to check the "<Content>" of the XML for a "new line", which I believe is "\n", and replace it with the text "<p>" so the Lotus Notes document will see a paragraph tag.

Sounds simple, but it's got me stumped at the moment ...

See program & sample XML.



package com.vnunet.editkit;

import org.xml.sax.*;
import org.xml.sax.helpers.*;
import java.util.*;
import lotus.domino.*;
import java.io.*;

public class SimpleHandler extends HandlerBase {

  private Session session = null;
  private Database database = null;
  private Document document = null;
  private StringBuffer buffer = null;

  public void startElement(String name, AttributeList attr)
  throws SAXException {
    try {
      if ("Editkit".equals(name)) {
        NotesThread.sinitThread();

        this.session = NotesFactory.createSession();
        if (session == null) throw new SAXException("Cannot open Notes session");

        this.database = session.getDatabase(null, "dir\\kit.nsf");
        if (database == null) throw new SAXException("Cannot open Notes database");

    } else if ("Home".equals(name)) {
    } else if ("WebContent".equals(name)) {
        this.document = database.createDocument();

      if (document == null) throw new SAXException("Cannot create Notes document");



        // start code to replace line ending "\n" with "<p>" tag  05/04/2004




        // end code to replace line ending "\n" with "<p>" tag  05/04/2004



      } else {
        this.buffer = new StringBuffer();
      }
    } catch (NotesException e) {
      throw new SAXException("Exception populating: id="
          + e.id + ": " + e.text, e);
    }
  }

  public void endElement(String name)
  throws SAXException {
    try {
      if ("Editkit".equals(name)) {
        // Do nothing (notes doesn't like close)

    } else if ("Home".equals(name)) {
    } else if ("WebContent".equals(name)) {
      //added by mh to add formname to doc
      String Form = "Form";
      this.document.replaceItemValue("Form", "XMLDOc");

      String ModificationStatus = "ModificationStatus";
      this.document.replaceItemValue("ModificationStatus", "New");

      this.document.save(true, false);
        this.document = null;

      } else {
        document.replaceItemValue(name, this.buffer.toString());
        this.buffer = null;
      }
    } catch (NotesException e) {
      throw new SAXException("Exception populating: id="
          + e.id + ": " + e.text, e);
    }
  }

  public void characters(char[] ch, int start, int length)
  throws SAXException {
    this.buffer.append(ch, start, length);
  }
}





<?xml version="1.0" encoding="UTF-8"?>
<Editkit>
<Home>
<WebContent>
<AssetID>xyz123</AssetID>
<Author>Fred Fish</Author>
<Title>This is a test</Title>
<Country>UK</Country>
<Content>This is some test text for testing to send through to the kit</Content>
</WebContent>
</Home>
<Home>
<WebContent>
<AssetID>abc123</AssetID>
<Author>Mr fish</Author>
<Title>This is a another test</Title>
<Country>USA</Country>
<Content>This is some more test text for testing to send through to the kit
This is some more test text for testing to send through to the kit
This is some more test text for testing to send through to the kit
</Content>
</WebContent>
</Home>
</Editkit>
Question by:stirnpanzer
40 Comments
 
LVL 86

Expert Comment

by:CEHJ
ID: 10756467
First thing that should be said is that parsers have no obligation to preserve certain types of whitespace (such as newline) - are you sure it would appear in the output anyway?
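
Whether a particular parser hands newlines through can be checked directly rather than assumed. The following is a minimal sketch of such a check, not part of the poster's system: it assumes the JDK's bundled SAX parser and the SAX2 DefaultHandler (the thread's code uses the older HandlerBase), and the class name NewlineProbe and method textOf are invented for illustration.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical probe: collects every character the parser reports.
class NewlineProbe extends DefaultHandler {
    private final StringBuilder seen = new StringBuilder();

    @Override
    public void characters(char[] ch, int start, int length) {
        // Append only the slice that belongs to this callback.
        seen.append(ch, start, length);
    }

    // Parses the given XML text and returns all character data seen.
    static String textOf(String xml) {
        try {
            NewlineProbe probe = new NewlineProbe();
            SAXParserFactory.newInstance().newSAXParser()
                    .parse(new InputSource(new StringReader(xml)), probe);
            return probe.seen.toString();
        } catch (Exception e) {
            throw new IllegalStateException("parse failed", e);
        }
    }
}
```

Calling NewlineProbe.textOf on a small document with a line break inside an element shows whether the break survives to the handler with the parser in use.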
0
 
LVL 86

Expert Comment

by:CEHJ
ID: 10756482
And it wouldn't occur in startElement. The place to be looking is in characters. I'd recommend replacing with <br /> as it's going to be a lot easier
0
 
LVL 86

Expert Comment

by:CEHJ
ID: 10756511
public void characters(char[] ch, int start, int length) throws SAXException {
      // Only ch[start] to ch[start + length - 1] belong to this call
      for(int i = start;i < start + length;i++) {
            if (ch[i] == '\n') {
                  buffer.append("<br />");
            }
            else {
                  buffer.append(ch[i]);
            }
      }
   
}

probably would do it. Watch that loop counter: the array can be larger than the text for this call, so the loop has to stop at start + length rather than ch.length.
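
The loop bound matters because SAX passes characters() a slice of a possibly larger array: only the positions from start to start + length - 1 are valid for that call. A minimal sketch of the idea, assuming a stand-in array that plays the role of the parser's buffer (the class SliceBreaks and its methods are hypothetical names, not part of the poster's program):

```java
// Hypothetical helper showing why the loop must be bounded by start + length.
class SliceBreaks {
    // Appends ch[start .. start+length-1] to buffer, turning each '\n' into "<br />".
    static void appendWithBreaks(StringBuffer buffer, char[] ch, int start, int length) {
        int end = start + length; // first index that is NOT part of this call
        for (int i = start; i < end; i++) {
            if (ch[i] == '\n') {
                buffer.append("<br />");
            } else {
                buffer.append(ch[i]);
            }
        }
    }

    // Simulates a parser buffer that holds markup around the reported text.
    static String demo() {
        char[] parserBuffer = "<Content>one\ntwo</Content>".toCharArray();
        StringBuffer out = new StringBuffer();
        // The element text "one\ntwo" starts at index 9 and is 7 characters long.
        appendWithBreaks(out, parserBuffer, 9, 7);
        return out.toString();
    }
}
```

If the loop ran to ch.length instead, the closing tag that happens to sit in the same array would be copied into the field as well.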
0
LVL 9

Expert Comment

by:mmuruganandam
ID: 10756523
Element element = // current element;

String str = element.getNodeValue();

StringTokenizer tokenizer = new StringTokenizer(str, "\r\n");

while (tokenizer.hasMoreTokens())
{
     buffer = buffer.append("<P>").append(tokenizer.nextToken()).append("</P>");
}
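
The tokenizer idea can be tried on a plain String, which is what a SAX handler has available once an element's text has been collected (there is no DOM Element in the poster's handler). A minimal sketch under that assumption; the class ParagraphTokenizer and method wrapLines are hypothetical names:

```java
import java.util.StringTokenizer;

// Hypothetical helper: wraps each non-empty line of text in <P>...</P>.
class ParagraphTokenizer {
    static String wrapLines(String text) {
        StringBuffer buffer = new StringBuffer();
        // Both '\r' and '\n' act as delimiters; empty tokens are skipped.
        StringTokenizer tokenizer = new StringTokenizer(text, "\r\n");
        while (tokenizer.hasMoreTokens()) {
            buffer.append("<P>").append(tokenizer.nextToken()).append("</P>");
        }
        return buffer.toString();
    }
}
```

In the handler from the question, a call like this would fit in endElement, applied to buffer.toString() before the value is written to the Notes document.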

0
 

Author Comment

by:stirnpanzer
ID: 10756607
no problem using the "<br/>"

i need to know the code needs to go between the comments in the program :-  

// start code to replace line ending "\n" with "<p>" tag  05/04/2004
 
// end code to replace line ending "\n" with "<p>" tag  05/04/2004

0
 
LVL 86

Expert Comment

by:CEHJ
ID: 10756637
As i mentioned, according to my interpretation, the code does not go in between those comments. The code simply replaces the existing characters method
0
 

Author Comment

by:stirnpanzer
ID: 10759029
Here is the program now .... it now checks for "newline" and "return".

But it keeps blowing up with "</AssetID> expected"  ????
The xml hasn't changed ?? Any ideas ... this is the first tag in the xml after the <WebContent>

   

import org.xml.sax.*;
import org.xml.sax.helpers.*;
import java.util.*;
import lotus.domino.*;
import java.io.*;

public class SimpleHandler extends HandlerBase {

  private Session session = null;
  private Database database = null;
  private Document document = null;
  private StringBuffer buffer = null;

  public void startElement(String name, AttributeList attr)
  throws SAXException {
    try {
      if ("Editkit".equals(name)) {
//      if ("Home".equals(name)) {
        NotesThread.sinitThread();

        this.session = NotesFactory.createSession();
        if (session == null) throw new SAXException("Cannot open Notes session");

        this.database = session.getDatabase(null, "dir\\kit.nsf");
        if (database == null) throw new SAXException("Cannot open Notes database");

    } else if ("Home".equals(name)) {
    } else if ("WebContent".equals(name)) {
        this.document = database.createDocument();

      if (document == null) throw new SAXException("Cannot create Notes document");


      } else {
        this.buffer = new StringBuffer();
      }
    } catch (NotesException e) {
      throw new SAXException("Exception populating: id="
          + e.id + ": " + e.text, e);
    }
  }

  public void endElement(String name)
  throws SAXException {
    try {
      if ("Editkit".equals(name)) {
//      if ("Home".equals(name)) {
        // Do nothing (notes doesn't like close)

    } else if ("Home".equals(name)) {
    } else if ("WebContent".equals(name)) {
      //added by mh to add formname to doc
      String Form = "Form";
      this.document.replaceItemValue("Form", "DMS");

      String ModificationStatus = "ModificationStatus";
      this.document.replaceItemValue("ModificationStatus", "New");

      this.document.save(true, false);
        this.document = null;

      } else {
        document.replaceItemValue(name, this.buffer.toString());
        this.buffer = null;
      }
    } catch (NotesException e) {
      throw new SAXException("Exception populating: id="
          + e.id + ": " + e.text, e);
    }
  }

//  public void characters(char[] ch, int start, int length)
//  throws SAXException {
//    this.buffer.append(ch, start, length);
//  }
//}

     public void characters(char[] ch, int start, int length) throws SAXException {
     for(int i = start;i <= ch.length;i++) {
          if ((ch[i] == '\n') || (ch[i] == '\r')) {
               buffer.append("<br />");
          }
          else {
               buffer.append(ch[i]);
          }
     }
}
}
0
 
LVL 86

Expert Comment

by:CEHJ
ID: 10760377
Ah - of course if there's a DTD, it's going to have to specify that the element <br /> can appear, otherwise you're not going to be able to insert it arbitrarily, or you'll have to use namespaces. btw that newline code you've cooked is not quite right but we can discuss that later as it's relatively trivial
0
 

Author Comment

by:stirnpanzer
ID: 10766308
Didn't have a DTD (have just read up on them), so came up with the following and included
<!DOCTYPE Editkit SYSTEM "c:\dir\my.dtd">  in the xml.
But now the parser is saying the xml structure is invalid; works fine locally....

Have i specified the BR correctly ???

<?xml version="1.0" encoding="UTF-8"?>
<!ELEMENT AssetID (#PCDATA)>
<!ELEMENT Author (#PCDATA)>
<!ELEMENT BR (#PCDATA)>
<!ELEMENT Content (#PCDATA)>
<!ELEMENT Country (#PCDATA)>
<!ELEMENT Editkit (Home+)>
<!ELEMENT Home (WebContent)>
<!ELEMENT Title (#PCDATA)>
<!ELEMENT WebContent (AssetID, Author, Title, Country, Content)>
0
 
LVL 86

Expert Comment

by:CEHJ
ID: 10766591
You don't *have* to have a DTD. If you don't need one, i wouldn't include one at this stage
0
 

Author Comment

by:stirnpanzer
ID: 10766642
I'd rather not have one  (as xml is also a new area to me )..... how do you suggest i progress it,
ie:-  insert the "<br />" without the parser going wrong ??

Options were dtd or namespaces.  ???
0
 
LVL 86

Expert Comment

by:CEHJ
ID: 10766769
No the dtd thing was brought up as a possible reason for it going wrong actually. Let's try something else - based on your original <p> thing:




public void characters(char[] ch, int start, int length) throws SAXException {
      int paragraphCount = 0;
      // Only ch[start] to ch[start + length - 1] belong to this call
      for(int i = start;i < start + length;i++) {
            if (ch[i] == '\n') {
                  ++paragraphCount;
                  if (paragraphCount % 2 == 0) {
                        buffer.append("</p>");      // close para if previous open para exists
                  }
                  buffer.append("<p>");
            }
            else if(ch[i] == '\r') {
                  /* ignore it */
            }
            else {
                  buffer.append(ch[i]);
            }
      }
}      


0
 
LVL 86

Expert Comment

by:CEHJ
ID: 10766947
Sorry stirnpanzer - that tag closing code's not quite right - i'm not quite concentrating enough (meant to be doing other things ;-)) - but you get the idea - the <p> tags should always be closed with </p>
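
One way to keep every <p> paired with a </p> is to remember, between calls, whether a paragraph is currently open. The following is a minimal sketch of that idea, not the code from the thread: the class ParagraphState and its methods are hypothetical, and it assumes the open/closed flag lives in an instance field because characters() may be called more than once for the same element.

```java
// Hypothetical sketch: tracks an open paragraph across several characters() calls.
class ParagraphState {
    private final StringBuilder buffer = new StringBuilder();
    private boolean paragraphOpen = false;

    // Mirrors the SAX callback shape: only ch[start .. start+length-1] is read.
    void characters(char[] ch, int start, int length) {
        for (int i = start; i < start + length; i++) {
            char c = ch[i];
            if (c == '\n' || c == '\r') {
                closeParagraph();            // a line break ends the current paragraph
            } else {
                if (!paragraphOpen) {        // first visible character opens a new one
                    buffer.append("<p>");
                    paragraphOpen = true;
                }
                buffer.append(c);
            }
        }
    }

    // Call once the element ends; closes any paragraph still open.
    String finish() {
        closeParagraph();
        return buffer.toString();
    }

    private void closeParagraph() {
        if (paragraphOpen) {
            buffer.append("</p>");
            paragraphOpen = false;
        }
    }
}
```

Because the closing tag is only written when a paragraph is actually open, a \r\n pair or a blank line does not produce empty or unbalanced tags.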
0
 

Author Comment

by:stirnpanzer
ID: 10767378
Made a couple of minor amends (see code below), but the parser still blows up with
"</AssetID> expected"

Have put in a few System.out calls to see what's going on. It is parsing all the data, hitting the final tag  </Editkit>  , then looping through some blank data before the "</AssetID> expected" message ???

Any ideas?


public void characters(char[] ch, int start, int length) throws SAXException {
     int paragraphCount = 0;
     for(int i = start;i <= ch.length;i++) {
        System.out.println("loop 1 : "+ch[i]);
          if (ch[i] == '\r') {
               ++paragraphCount;
               if (paragraphCount % 2 == 0) {
 System.out.println("loop 2 : "+paragraphCount);
                    buffer.append("</p>");     // close para if previous open para exists
               }
               buffer.append("</p>");
          }
 
          else if(ch[i] == '\n') {
  System.out.println("loop 3: "+ch[i]);
               /* ignore it */
          }
          else {
   System.out.println("loop 4: "+ch[i]);
               buffer.append(ch[i]);
          }
     }
}
0
 
LVL 86

Expert Comment

by:CEHJ
ID: 10767456
You keep altering my linefeed code. Are you reading a Macintosh file?
0
 
LVL 86

Expert Comment

by:CEHJ
ID: 10767509
...but you're not pushing the startElement onto the buffer by the looks of things
0
 

Author Comment

by:stirnpanzer
ID: 10767547
windows ???
I have changed the "\n" to "\r" as i have  found in the xml the hidden linefeed is a "return", not a "new line".
0
 
LVL 86

Expert Comment

by:CEHJ
ID: 10767594
OK - it's probably come from a Mac then IF there's no \n. If there is, it's just a standard windows linefeed \r\n pair.
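
Whichever platform the file came from, the three line-ending styles can be collapsed to a single "\n" before any replacement is done, so that only one character has to be tested for. A minimal sketch, assuming the text has already been collected into a String (the class LineEndings and method normalize are hypothetical names). The XML 1.0 specification asks parsers to do this normalization themselves, so this step is only a safeguard for a parser that does not.

```java
// Hypothetical helper: maps "\r\n" (Windows) and lone "\r" (old Mac) to "\n".
class LineEndings {
    static String normalize(String text) {
        // Replace the two-character pair first so it does not become two newlines.
        return text.replace("\r\n", "\n").replace('\r', '\n');
    }
}
```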

0
 
LVL 86

Expert Comment

by:CEHJ
ID: 10767607
Perhaps you'd better say precisely what xml is meant to be going over to Lotus. Everything - or just bits?
0
 

Author Comment

by:stirnpanzer
ID: 10767624
Yes... you're right.
 Ran again with the System.out on, and the first tag being read in is <AssetID>.
Where's me root element going ???
0
 
LVL 86

Expert Comment

by:CEHJ
ID: 10767940
Looking at your code again, it doesn't actually make a lot of sense to me. Can you tell me briefly what you're intending to do?
0
 

Author Comment

by:stirnpanzer
ID: 10770746
This program is triggered by a servlet listening on a Domino web server.
The servlet is triggered by posting an xml file to it.

This program parses the posted xml file.

If the root element is "<Editkit>", it opens an existing Notes database.

When the element "<WebContent>" is reached, it creates a new document in the Notes database.
All element values after "<WebContent>" are written to the Notes document.

It saves the Notes document and calls it "XMLDoc".

It keeps creating documents until it finds "</Editkit>".

The program then terminates.

It was working fine until I started looking at adding the <p> ... to replace the \n or \r
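
The flow described above can be sketched without Lotus Notes to make it easier to test. This is a minimal sketch under stated assumptions, not the poster's program: each Notes document is stood in for by a Map of field name to value, the SAX2 DefaultHandler replaces HandlerBase, and the class name EditkitFlowSketch and method parse are hypothetical.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical stand-in: one Map per <WebContent>, in place of a Notes document.
class EditkitFlowSketch extends DefaultHandler {
    final List<Map<String, String>> documents = new ArrayList<>();
    private Map<String, String> current;   // the "document" being filled, if any
    private StringBuilder buffer;          // text of the field being read, if any

    @Override
    public void startElement(String uri, String localName, String qName, Attributes attrs) {
        if ("WebContent".equals(qName)) {
            current = new LinkedHashMap<>();       // create a new document
        } else if (current != null) {
            buffer = new StringBuilder();          // start collecting a field value
        }
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        // Whitespace between elements arrives here too; ignore it when no field is open.
        if (buffer != null) buffer.append(ch, start, length);
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        if ("WebContent".equals(qName)) {
            documents.add(current);                // "save" the document
            current = null;
        } else if (buffer != null) {
            current.put(qName, buffer.toString()); // write the field to the document
            buffer = null;
        }
    }

    static List<Map<String, String>> parse(String xml) {
        try {
            EditkitFlowSketch handler = new EditkitFlowSketch();
            SAXParserFactory.newInstance().newSAXParser()
                    .parse(new InputSource(new StringReader(xml)), handler);
            return handler.documents;
        } catch (Exception e) {
            throw new IllegalStateException("parse failed", e);
        }
    }
}
```

Keeping the Notes calls out of the way like this makes it possible to see exactly which text reaches each field before anything is written to the database.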

0
 

Author Comment

by:stirnpanzer
ID: 10773214
here is the servlet ....

import java.io.*;
import javax.servlet.*;
import javax.servlet.http.*;
import org.xml.sax.*;
import org.xml.sax.helpers.*;

public class SimpleServlet extends HttpServlet {

  public void service(HttpServletRequest req, HttpServletResponse res)
  throws ServletException, IOException {
    res.setContentType("text/plain");

    PrintWriter out=new PrintWriter(res.getOutputStream());
    InputStream in=req.getInputStream();

    Parser parser = new com.ibm.xml.parsers.SAXParser();
    InputSource source = new InputSource(in);

    try {
      SimpleHandler handler = new SimpleHandler();
      parser.setDocumentHandler(handler);
      parser.setDTDHandler(handler);
      parser.setEntityResolver(handler);
      parser.setErrorHandler(handler);
      parser.parse(source);
      res.setStatus(200);
      out.println("OK");
    } catch (SAXException e) {
      res.setStatus(500);
      out.println("Not OK");
      e.printStackTrace();
      e.printStackTrace(out);
      if (e instanceof SAXParseException) {
        SAXParseException spe = (SAXParseException) e;
        out.println("Line number: " + spe.getLineNumber());
        out.println("Column number: " + spe.getColumnNumber());
      }
      Exception x = e.getException();
      if (x != null) {
        out.print("Nested Exception: ");
        x.printStackTrace(out);
        System.err.print("Nested Exception: ");
        x.printStackTrace();
        if (x instanceof SAXParseException) {
          SAXParseException spe = (SAXParseException) x;
          out.println("Line number: " + spe.getLineNumber());
          out.println("Column number: " + spe.getColumnNumber());
        }
      }
    }
    out.flush();
  }
}
0
 
LVL 86

Expert Comment

by:CEHJ
ID: 10773527
OK, the first thing i think you should do is to make a test, using a hard-coded approach to inserting <p>xxx</p>

Please test this and report how you're doing this and the results - you need to determine this works. Once that's done, getting it working dynamically should be reasonably simple
0
 

Author Comment

by:stirnpanzer
ID: 10775352
Am having all sorts of probs doing this, as the StringBuffer wants a start integer .... am i doing it wrong ?
 
What's your suggested method / code ??
0
 
LVL 86

Expert Comment

by:CEHJ
ID: 10775407
>>as the StringBuffer wants a start integer

How do you mean? If you mean the function (characters) uses a start integer, just ignore it. Hard code something in there e.g

buffer.append("<p>test</p>");
0
 
LVL 86

Expert Comment

by:CEHJ
ID: 10775491
Oh and make sure you do enough debug statements so that you're certain of everything that's going to be output
0
 

Author Comment

by:stirnpanzer
ID: 10775968
OK, have used the  stringbuffer.append("<p>test</p>");

Using the code below, it successfully added <p>test</p> to the end of each parsed field.   Progress......

But if I use the code with the "if ...else ...", it blows up with "</AssetID> expected" ???



Below works kind of ....

 public void characters(char[] ch, int start, int length) throws SAXException {
 this.buffer.append(ch, start, length);
   buffer.append("<p>test</p>");
 }



Below doesnt work   ....blows up with "</AssetID> expected" ???

public void characters(char[] ch, int start, int length) throws SAXException {

     int paragraphCount = 0;
     for(int i = start;i <= ch.length;i++) {

         if (ch[i] == '\r') {
                    System.out.println("Found \r : "+ch[i]);
             //       buffer.append("<br>");
               buffer.append("<p>test</p>");
          }
          else if(ch[i] == '\n') {
                    System.out.println("Found \n : "+ch[i]);
              // buffer.append("<br>");
         buffer.append("<p>test</p>");
          }
        else {
          System.out.println("charachter : "+ch[i]);
              buffer.append(ch[i]);
}

0
 
LVL 86

Expert Comment

by:CEHJ
ID: 10776021
>>Below works kind of ....

Meaning?
0
 

Author Comment

by:stirnpanzer
ID: 10776155
using the following code :-

 public void characters(char[] ch, int start, int length) throws SAXException {
 this.buffer.append(ch, start, length);
   buffer.append("<p>test</p>");
 }

a   <p>test</p>  is appended to each field written to the Notes document....

but it should only be looking in the xml for the tag <Content> and looking for \n or \r until </Content> is found, to do what i really want it to do...

0
 
LVL 86

Expert Comment

by:CEHJ
ID: 10776199
>>a   <p>test</p>  is appended to each field written to the Notes document....

OK - but the point is - is the Notes doc working as a result? I'm assuming yes, but please say if that's not the case.

A further test - and then onto the real thing:

>>but it should only be looking in the xml for Tag <content> and looking for \n ...

In that case, just do the append when the current element (save the current one to an instance variable) is <Content>. So

public void characters(char[] ch, int start, int length) throws SAXException {
 if ("Content".equals(currentElement.getName()...

etc

0
 

Author Comment

by:stirnpanzer
ID: 10776682
yes notes document is ok.

is the format below correct as i cant compile .... (must learn java ..... need more time... :-(  

public void characters(char[] ch, int start, int length) throws SAXException {
 if ("Content".equals(currentElement.getName()))   {

     for(int i = start;i <= ch.length;i++) {
         System.out.println("loop : "+i);


          if ((ch[i] == '\n') || (ch[i] == '\r')) {
            System.out.println("loop 1: "+i);

               buffer.append("<br />");
          } else {

              System.out.println("loop 2: "+i);
               buffer.append(ch[i]);
          }
     }
}
}
0
 
LVL 86

Expert Comment

by:CEHJ
ID: 10777466
Easiest is to cache the current node name as a String in startElement

// in startElement
currentNodeName = name; // 'currentNodeName' is an instance variable

then you can do

if ("Content".equals(currentNodeName))   {

don't forget that names are case-sensitive

Why have you gone back to <br /> btw?

0
 

Author Comment

by:stirnpanzer
ID: 10782351
re:- <br/> typo....

What i'm confused with is i already hold "name" in a function earlier in the program.
how can i then pass it to the following function ??? as its been discarded ??
if i could would the function support it ???

public void characters(char[] ch, int start, int length) throws SAXException {
 if ("Content".equals(name))   {

     for(int i = start;i <= ch.length;i++) {
         System.out.println("loop : "+i);


          if ((ch[i] == '\n') || (ch[i] == '\r')) {
            System.out.println("loop 1: "+i);

               buffer.append("<p></p>");
          } else {

              System.out.println("loop 2: "+i);
               buffer.append(ch[i]);
          }
     }
}
}
0
 
LVL 86

Expert Comment

by:CEHJ
ID: 10782389
>>how can i then pass it to the following function ??? as its been discarded ??

It isn't discarded - it's an instance variable (or should be) and it holds the name of the current element being parsed. It's just overwritten each time startElement is called and so it's valid during characters. Incidentally 'name' isn't a very good thing to call that variable as it's too general and therefore likely to confuse you now or later. I would choose 'currentElementName'
0
 

Author Comment

by:stirnpanzer
ID: 10783416
ok 'name' is perhaps a bad one . But here its declared in an earlier method.

public void startElement(String name, AttributeList attr)
  throws SAXException {
    try {
      if ("Editkit".equals(name)) {
        NotesThread.sinitThread();

        this.session = NotesFactory.createSession();
        if (session == null) throw new SAXException("Cannot open Notes session");

        this.database = session.getDatabase(null, "dir\\kit.nsf");
        if (database == null) throw new SAXException("Cannot open Notes database");

    } else if ("Home".equals(name)) {
    } else if ("WebContent".equals(name)) {
        this.document = database.createDocument();

     if (document == null) throw new SAXException("Cannot create Notes document");
      } else {
        this.buffer = new StringBuffer();
      }
    } catch (NotesException e) {
      throw new SAXException("Exception populating: id="
          + e.id + ": " + e.text, e);
    }
  }

in the method :-

public void characters(char[] ch, int start, int length) throws SAXException {
 if ("content".equals(currentElement.getName()...

how can it identify "Content" ??
0
 
LVL 86

Accepted Solution

by:
CEHJ earned 2000 total points
ID: 10793950
>>But here its declared in an earlier method.

That's a different thing - there, 'name' is a parameter of the startElement function, so it only exists while that function runs. It is in there that you should assign it to an instance variable:

currentElementName = name;

*then* you can get it in characters (the name is case-sensitive, so it has to match <Content> exactly):

if ("Content".equals(currentElementName))
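
Putting the pieces of the accepted answer together, the approach might look like the sketch below. It is a minimal illustration under stated assumptions rather than the poster's final program: it uses the SAX2 DefaultHandler instead of HandlerBase, stores fields in a Map instead of a Notes document, replaces each "\n" with "<p>" as the question originally asked, and the names ContentParagraphHandler, fields and parse are hypothetical.

```java
import java.io.StringReader;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical handler: only the text of <Content> has its newlines replaced.
class ContentParagraphHandler extends DefaultHandler {
    private String currentElementName;                  // set in startElement, read in characters
    private final StringBuilder buffer = new StringBuilder();
    final Map<String, String> fields = new LinkedHashMap<>();

    @Override
    public void startElement(String uri, String localName, String qName, Attributes attrs) {
        currentElementName = qName;                     // remember which element we are inside
        buffer.setLength(0);                            // start a fresh value
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        if (!"Content".equals(currentElementName)) {
            buffer.append(ch, start, length);           // other fields are copied unchanged
            return;
        }
        for (int i = start; i < start + length; i++) { // stay inside this call's slice
            if (ch[i] == '\n') {
                buffer.append("<p>");
            } else if (ch[i] != '\r') {                 // drop stray carriage returns
                buffer.append(ch[i]);
            }
        }
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        // Only store text for the element that was opened most recently (a leaf field).
        if (qName.equals(currentElementName)) fields.put(qName, buffer.toString());
        currentElementName = null;                      // text after a close tag belongs to no field
    }

    static Map<String, String> parse(String xml) {
        try {
            ContentParagraphHandler handler = new ContentParagraphHandler();
            SAXParserFactory.newInstance().newSAXParser()
                    .parse(new InputSource(new StringReader(xml)), handler);
            return handler.fields;
        } catch (Exception e) {
            throw new IllegalStateException("parse failed", e);
        }
    }
}
```

Resetting currentElementName in endElement is a design choice of this sketch: it keeps whitespace between </Content> and the next tag from being treated as Content text.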
0
 
LVL 9

Expert Comment

by:mmuruganandam
ID: 10958039
no objections..

Regards,
Muruga
0
