Java XML Parser question - How do I replace newlines with <p> tags?

I'm not a Java programmer, but I've inherited a system I now have to maintain.

I have a servlet that triggers a program that parses an XML file; upon finding the tag "<WebContent>"
it creates a document in a Lotus Notes database ... works no problem.

I now have to check the "<Content>" of the XML for a "new line", which I believe is "\n", and replace it with the text "<p>" so the Lotus Notes document will see a paragraph tag.

Sounds simple, but it's got me stumped at the moment ...

See program & sample XML.



package com.vnunet.editkit;

import org.xml.sax.*;
import org.xml.sax.helpers.*;
import java.util.*;
import lotus.domino.*;
import java.io.*;

public class SimpleHandler extends HandlerBase {

  private Session session = null;
  private Database database = null;
  private Document document = null;
  private StringBuffer buffer = null;

  public void startElement(String name, AttributeList attr)
  throws SAXException {
    try {
      if ("Editkit".equals(name)) {
        NotesThread.sinitThread();

        this.session = NotesFactory.createSession();
        if (session == null) throw new SAXException("Cannot open Notes session");

        this.database = session.getDatabase(null, "dir\\kit.nsf");
        if (database == null) throw new SAXException("Cannot open Notes database");

    } else if ("Home".equals(name)) {
    } else if ("WebContent".equals(name)) {
        this.document = database.createDocument();

      if (document == null) throw new SAXException("Cannot create Notes document");



        // start code to replace line ending "\n" with "<p>" tag  05/04/2004




        // end code to replace line ending "\n" with "<p>" tag  05/04/2004



      } else {
        this.buffer = new StringBuffer();
      }
    } catch (NotesException e) {
      throw new SAXException("Exception populating: id="
          + e.id + ": " + e.text, e);
    }
  }

  public void endElement(String name)
  throws SAXException {
    try {
      if ("Editkit".equals(name)) {
        // Do nothing (notes doesn't like close)

    } else if ("Home".equals(name)) {
    } else if ("WebContent".equals(name)) {
      //added by mh to add formname to doc
      String Form = "Form";
      this.document.replaceItemValue("Form", "XMLDOc");

      String ModificationStatus = "ModificationStatus";
      this.document.replaceItemValue("ModificationStatus", "New");

      this.document.save(true, false);
        this.document = null;

      } else {
        document.replaceItemValue(name, this.buffer.toString());
        this.buffer = null;
      }
    } catch (NotesException e) {
      throw new SAXException("Exception populating: id="
          + e.id + ": " + e.text, e);
    }
  }

  public void characters(char[] ch, int start, int length)
  throws SAXException {
    this.buffer.append(ch, start, length);
  }
}





<?xml version="1.0" encoding="UTF-8"?>
<Editkit>
<Home>
<WebContent>
<AssetID>xyz123</AssetID>
<Author>Fred Fish</Author>
<Title>This is a test</Title>
<Country>UK</Country>
<Content>This is some test text for testing to send through to the kit</Content>
</WebContent>
</Home>
<Home>
<WebContent>
<AssetID>abc123</AssetID>
<Author>Mr fish</Author>
<Title>This is a another test</Title>
<Country>USA</Country>
<Content>This is some more test text for testing to send through to the kit
This is some more test text for testing to send through to the kit
This is some more test text for testing to send through to the kit
</Content>
</WebContent>
</Home>
</Editkit>
stirnpanzerAsked:
CEHJCommented:
>>But here its declared in an earlier method.

That's a different thing - 'name' is a parameter of the startElement method. It is in there that you should assign it to an instance variable:

currentElementName = name;

*then* you can check it in characters (remember the name is case-sensitive):

if ("Content".equals(currentElementName))
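A minimal sketch of how that could fit together, assuming the SAX2 DefaultHandler and the JDK's built-in parser in place of the thread's HandlerBase and IBM parser, and leaving out all Notes calls. The class name ContentAwareHandler, the fields list and the parse helper are made up for illustration; only currentElementName comes from the suggestion above.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical stand-in for SimpleHandler: no Notes calls, fields are collected in a list.
class ContentAwareHandler extends DefaultHandler {
    private String currentElementName;   // set in startElement, read in characters
    private StringBuilder buffer;        // text of the element currently being read
    final List<String> fields = new ArrayList<>();

    @Override
    public void startElement(String uri, String localName, String qName, Attributes attrs) {
        currentElementName = qName;      // remember which element we are inside
        buffer = new StringBuilder();
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        if (buffer == null) {
            return;                      // text outside any element we are buffering
        }
        for (int i = start; i < start + length; i++) {
            if ("Content".equals(currentElementName) && ch[i] == '\n') {
                buffer.append("<br />"); // only rewrite newlines inside <Content>
            } else {
                buffer.append(ch[i]);
            }
        }
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        if (buffer != null) {
            fields.add(qName + "=" + buffer);
            buffer = null;
        }
        currentElementName = null;
    }

    // Parses the XML text and returns the collected "name=value" pairs.
    static List<String> parse(String xml) {
        try {
            ContentAwareHandler handler = new ContentAwareHandler();
            SAXParserFactory.newInstance().newSAXParser()
                    .parse(new InputSource(new StringReader(xml)), handler);
            return handler.fields;
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(parse(
                "<WebContent><Title>a\nb</Title><Content>a\nb</Content></WebContent>"));
    }
}
```

Because the element name is stored when startElement fires, characters can tell whether it is inside Content even if the parser delivers the text in several chunks.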
 
CEHJCommented:
First thing that should be said is that parsers have no obligation to preserve certain types of whitespace (such as newline) anyway - are you sure it would appear in the output anyway?
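For what it is worth, the XML 1.0 specification (section 2.11) has a conforming parser translate \r\n and a lone \r into a single \n before the text reaches the application, and character data inside an element such as Content is reported with its line breaks. A small probe to check this, assuming the JDK's built-in SAX parser and the SAX2 DefaultHandler rather than the thread's HandlerBase; the class name LineEndingProbe and the helper textOf are made up for illustration.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

class LineEndingProbe {
    // Returns all character data the parser reports for the given document.
    static String textOf(String xml) {
        final StringBuilder seen = new StringBuilder();
        DefaultHandler handler = new DefaultHandler() {
            @Override
            public void characters(char[] ch, int start, int length) {
                seen.append(ch, start, length);
            }
        };
        try {
            SAXParserFactory.newInstance().newSAXParser()
                    .parse(new InputSource(new StringReader(xml)), handler);
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
        return seen.toString();
    }

    public static void main(String[] args) {
        String text = textOf("<Content>one\r\ntwo\rthree\nfour</Content>");
        // Print with visible escapes so the line endings can be inspected.
        System.out.println(text.replace("\r", "\\r").replace("\n", "\\n"));
    }
}
```

Whether the older IBM parser used in this thread behaves the same way is not something the thread confirms, so running a probe like this against the real parser would be the safer check.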
 
CEHJCommented:
And it wouldn't occur in startElement. The place to be looking is in characters. I'd recommend replacing with <br /> as it's going to be a lot easier
 
CEHJCommented:
public void characters(char[] ch, int start, int length) throws SAXException {
      // Only the range start .. start + length - 1 belongs to this call
      for(int i = start;i < start + length;i++) {
            if (ch[i] == '\n') {
                  buffer.append("<br />");
            }
            else {
                  buffer.append(ch[i]);
            }
      }
}

probably would do it. Note the loop bound: it has to be start + length rather than ch.length, since the array can hold more than the current chunk of text.
 
mmuruganandamCommented:
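// 'element' is the current DOM element, 'buffer' the handler's StringBuffer
// (an Element's own getNodeValue() is null, so read its first child text node)
String str = element.getFirstChild().getNodeValue();

StringTokenizer tokenizer = new StringTokenizer(str, "\r\n");

while (tokenizer.hasMoreTokens())
{
     buffer.append("<P>").append(tokenizer.nextToken()).append("</P>");
}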

 
stirnpanzerAuthor Commented:
No problem using the "<br/>".

I need to know what code needs to go between the comments in the program:

// start code to replace line ending "\n" with "<p>" tag  05/04/2004
 
// end code to replace line ending "\n" with "<p>" tag  05/04/2004

 
CEHJCommented:
As I mentioned, according to my interpretation, the code does not go in between those comments. The code simply replaces the existing characters method.
 
stirnpanzerAuthor Commented:
Here is the program now ... it now checks for "newline" and "return".

But it keeps blowing up with "</AssetID> expected".
The XML hasn't changed. Any ideas? AssetID is the first tag in the XML after <WebContent>.

   

import org.xml.sax.*;
import org.xml.sax.helpers.*;
import java.util.*;
import lotus.domino.*;
import java.io.*;

public class SimpleHandler extends HandlerBase {

  private Session session = null;
  private Database database = null;
  private Document document = null;
  private StringBuffer buffer = null;

  public void startElement(String name, AttributeList attr)
  throws SAXException {
    try {
      if ("Editkit".equals(name)) {
//      if ("Home".equals(name)) {
        NotesThread.sinitThread();

        this.session = NotesFactory.createSession();
        if (session == null) throw new SAXException("Cannot open Notes session");

        this.database = session.getDatabase(null, "dir\\kit.nsf");
        if (database == null) throw new SAXException("Cannot open Notes database");

    } else if ("Home".equals(name)) {
    } else if ("WebContent".equals(name)) {
        this.document = database.createDocument();

      if (document == null) throw new SAXException("Cannot create Notes document");


      } else {
        this.buffer = new StringBuffer();
      }
    } catch (NotesException e) {
      throw new SAXException("Exception populating: id="
          + e.id + ": " + e.text, e);
    }
  }

  public void endElement(String name)
  throws SAXException {
    try {
      if ("Editkit".equals(name)) {
//      if ("Home".equals(name)) {
        // Do nothing (notes doesn't like close)

    } else if ("Home".equals(name)) {
    } else if ("WebContent".equals(name)) {
      //added by mh to add formname to doc
      String Form = "Form";
      this.document.replaceItemValue("Form", "DMS");

      String ModificationStatus = "ModificationStatus";
      this.document.replaceItemValue("ModificationStatus", "New");

      this.document.save(true, false);
        this.document = null;

      } else {
        document.replaceItemValue(name, this.buffer.toString());
        this.buffer = null;
      }
    } catch (NotesException e) {
      throw new SAXException("Exception populating: id="
          + e.id + ": " + e.text, e);
    }
  }

//  public void characters(char[] ch, int start, int length)
//  throws SAXException {
//    this.buffer.append(ch, start, length);
//  }
//}

     public void characters(char[] ch, int start, int length) throws SAXException {
     for(int i = start;i <= ch.length;i++) {
          if ((ch[i] == '\n') || (ch[i] == '\r')) {
               buffer.append("<br />");
          }
          else {
               buffer.append(ch[i]);
          }
     }
}
}
 
CEHJCommented:
Ah - of course if there's a DTD, it's going to have to specify that the element <br /> can appear, otherwise you're not going to be able to insert it arbitrarily, or you'll have to use namespaces. btw that newline code you've cooked is not quite right but we can discuss that later as it's relatively trivial
 
stirnpanzerAuthor Commented:
Didn't have a DTD (have just read up on them), so I came up with the following and included
<!DOCTYPE Editkit SYSTEM "c:\dir\my.dtd"> in the XML.
But now the parser is saying the XML structure is invalid, though it works fine locally....

Have I specified the BR correctly ???

<?xml version="1.0" encoding="UTF-8"?>
<!ELEMENT AssetID (#PCDATA)>
<!ELEMENT Author (#PCDATA)>
<!ELEMENT BR (#PCDATA)>
<!ELEMENT Content (#PCDATA)>
<!ELEMENT Country (#PCDATA)>
<!ELEMENT Editkit (Home+)>
<!ELEMENT Home (WebContent)>
<!ELEMENT Title (#PCDATA)>
<!ELEMENT WebContent (AssetID, Author, Title, Country, Content)>
 
CEHJCommented:
You don't *have* to have a DTD. If you don't need one, I wouldn't include one at this stage.
 
stirnpanzerAuthor Commented:
I'd rather not have one (as XML is also a new area to me) ... how do you suggest I progress it,
i.e. insert the "<br />" without the parser going wrong?

The options were DTD or namespaces?
 
CEHJCommented:
No the dtd thing was brought up as a possible reason for it going wrong actually. Let's try something else - based on your original <p> thing:




public void characters(char[] ch, int start, int length) throws SAXException {
      int paragraphCount = 0;
      for(int i = start;i < start + length;i++) {
            if (ch[i] == '\n') {
                  ++paragraphCount;
                  if (paragraphCount % 2 == 0) {
                        buffer.append("</p>");      // close para if previous open para exists
                  }
                  buffer.append("<p>");
            }
            else if(ch[i] == '\r') {
                  /* ignore it */
            }
            else {
                  buffer.append(ch[i]);
            }
      }
}


 
CEHJCommented:
Sorry stirnpanzer - that tag closing code's not quite right - i'm not quite concentrating enough (meant to be doing other things ;-)) - but you get the idea - the <p> tags should always be closed with </p>
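One way to keep every <p> matched with a </p> is to stop inserting tags character by character and instead wrap whole lines once the element's text has been collected. A minimal sketch, assuming the conversion is applied to the complete buffered text (for example in endElement) rather than inside characters; the class name ParagraphMarkup and the method toParagraphs are made up for illustration.

```java
class ParagraphMarkup {
    // Wraps each non-blank line of 'text' in <p>...</p>.
    // Accepts \r\n, \r or \n as the line separator.
    static String toParagraphs(String text) {
        StringBuilder out = new StringBuilder();
        for (String line : text.split("\r\n|\r|\n")) {
            if (!line.trim().isEmpty()) {
                out.append("<p>").append(line).append("</p>");
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(toParagraphs("first line\nsecond line\n"));
        // prints <p>first line</p><p>second line</p>
    }
}
```

In the handler from this thread, that could look like document.replaceItemValue(name, ParagraphMarkup.toParagraphs(this.buffer.toString())) for the Content element only, which also sidesteps the question of how the parser splits the text between calls to characters.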
 
stirnpanzerAuthor Commented:
Made a couple of minor amends (see code now below), but the parser still blows up with
"</AssetID> expected"

Have put in a few System.out calls to see what's going on. It parses all the data, hits the final tag </Editkit>, then loops through some blank data before the "</AssetID> expected" message ???

Any ideas?


public void characters(char[] ch, int start, int length) throws SAXException {
     int paragraphCount = 0;
     for(int i = start;i <= ch.length;i++) {
        System.out.println("loop 1 : "+ch[i]);
          if (ch[i] == '\r') {
               ++paragraphCount;
               if (paragraphCount % 2 == 0) {
 System.out.println("loop 2 : "+paragraphCount);
                    buffer.append("</p>");     // close para if previous open para exists
               }
               buffer.append("</p>");
          }
 
          else if(ch[i] == '\n') {
  System.out.println("loop 3: "+ch[i]);
               /* ignore it */
          }
          else {
   System.out.println("loop 4: "+ch[i]);
               buffer.append(ch[i]);
          }
     }
}
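One thing worth checking in the loop above: SAX only hands over the slice from ch[start] to ch[start + length - 1]. The array itself is usually the parser's own, larger buffer, and ch[ch.length] is always out of bounds in Java. Running the loop to ch.length may explain the extra "blank data", although the thread does not confirm it as the cause of the "</AssetID> expected" message. A sketch with a simulated buffer; the class name CharactersRange and the method convert are made up for illustration.

```java
class CharactersRange {
    // Copies only the slice SAX hands over, replacing newlines with <br />.
    static String convert(char[] ch, int start, int length) {
        StringBuilder out = new StringBuilder();
        for (int i = start; i < start + length; i++) {
            if (ch[i] == '\n') {
                out.append("<br />");
            } else if (ch[i] != '\r') {   // drop carriage returns
                out.append(ch[i]);
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // Simulated parser buffer: the element text is only "a\nb" (offset 9, length 3).
        char[] parserBuffer = "<Content>a\nb</Content>\n</WebContent>".toCharArray();
        System.out.println(convert(parserBuffer, 9, 3));   // prints a<br />b
    }
}
```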
 
CEHJCommented:
You keep altering my linefeed code. Are you reading a Macintosh file?
 
CEHJCommented:
...but you're not pushing the startElement onto the buffer by the looks of things
 
stirnpanzerAuthor Commented:
Windows ???
I have changed the "\n" to "\r" as I have found that in the XML the hidden linefeed is a "return", not a "new line".
 
CEHJCommented:
OK - it's probably come from a Mac then IF there's no \n. If there is, it's just a standard windows linefeed \r\n pair.

 
CEHJCommented:
Perhaps you'd better say precisely what xml is meant to be going over to Lotus. Everything - or just bits?
 
stirnpanzerAuthor Commented:
Yes... you're right.
Ran again with the System.out calls on, and the first tag being read in is <AssetID>.
Where's my root element going ???
 
CEHJCommented:
Looking at your code again, it doesn't actually make a lot of sense to me. Can you tell me briefly what you're intending to do?
 
stirnpanzerAuthor Commented:
This program is triggered by a servlet listening on a Domino web server.
The servlet is triggered by posting an XML file to it.

This program parses the posted XML file.

If the root element is "<Editkit>", it opens an existing Notes database.

When the element "<WebContent>" is reached, it creates a new document in the Notes database.
The values of all elements inside "<WebContent>" are written to the Notes document.

It saves the Notes document and calls it "XMLDoc".

It keeps creating documents until it finds "</Editkit>".

The program then terminates.

It was working fine until I started looking at adding the <p> ... to replace the \n or \r.
 
stirnpanzerAuthor Commented:
here is the servlet ....

import java.io.*;
import javax.servlet.*;
import javax.servlet.http.*;
import org.xml.sax.*;
import org.xml.sax.helpers.*;

public class SimpleServlet extends HttpServlet {

  public void service(HttpServletRequest req, HttpServletResponse res)
  throws ServletException, IOException {
    res.setContentType("text/plain");

    PrintWriter out=new PrintWriter(res.getOutputStream());
    InputStream in=req.getInputStream();

    Parser parser = new com.ibm.xml.parsers.SAXParser();
    InputSource source = new InputSource(in);

    try {
      SimpleHandler handler = new SimpleHandler();
      parser.setDocumentHandler(handler);
      parser.setDTDHandler(handler);
      parser.setEntityResolver(handler);
      parser.setErrorHandler(handler);
      parser.parse(source);
      res.setStatus(200);
      out.println("OK");
    } catch (SAXException e) {
      res.setStatus(500);
      out.println("Not OK");
      e.printStackTrace();
      e.printStackTrace(out);
      if (e instanceof SAXParseException) {
        SAXParseException spe = (SAXParseException) e;
        out.println("Line number: " + spe.getLineNumber());
        out.println("Column number: " + spe.getColumnNumber());
      }
      Exception x = e.getException();
      if (x != null) {
        out.print("Nested Exception: ");
        x.printStackTrace(out);
        System.err.print("Nested Exception: ");
        x.printStackTrace();
        if (x instanceof SAXParseException) {
          SAXParseException spe = (SAXParseException) x;
          out.println("Line number: " + spe.getLineNumber());
          out.println("Column number: " + spe.getColumnNumber());
        }
      }
    }
    out.flush();
  }
}
 
CEHJCommented:
OK, the first thing i think you should do is to make a test, using a hard-coded approach to inserting <p>xxx</p>

Please test this and report how you're doing this and the results - you need to determine this works. Once that's done, getting it working dynamically should be reasonably simple
 
stirnpanzerAuthor Commented:
Am having all sorts of problems doing this, as the StringBuffer wants a start integer .... am I doing it wrong?

What's your suggested method / code ??
 
CEHJCommented:
>>as the StringBuffer wants a start integer

How do you mean? If you mean the function (characters) uses a start integer, just ignore it. Hard code something in there e.g

sb.append("<p>test</p>");
 
CEHJCommented:
Oh and make sure you do enough debug statements so that you're certain of everything that's going to be output
 
stirnpanzerAuthor Commented:
OK, have used the stringbuffer.append("<p>test</p>");

Using the code below, it successfully added <p>test</p> to the end of each parsed field. Progress......

But if I use the code with the "if ... else ...", it blows up with "</AssetID> expected" ???



Below works kind of ....

 public void characters(char[] ch, int start, int length) throws SAXException {
 this.buffer.append(ch, start, length);
   buffer.append("<p>test</p>");
 }



Below doesn't work .... blows up with "</AssetID> expected" ???

public void characters(char[] ch, int start, int length) throws SAXException {

     int paragraphCount = 0;
     for(int i = start;i <= ch.length;i++) {

         if (ch[i] == '\r') {
                    System.out.println("Found \r : "+ch[i]);
             //       buffer.append("<br>");
               buffer.append("<p>test</p>");
          }
          else if(ch[i] == '\n') {
                    System.out.println("Found \n : "+ch[i]);
              // buffer.append("<br>");
         buffer.append("<p>test</p>");
          }
        else {
          System.out.println("charachter : "+ch[i]);
              buffer.append(ch[i]);
}

 
CEHJCommented:
>>Below works kind of ....

Meaning?
 
stirnpanzerAuthor Commented:
using the following code :-

 public void characters(char[] ch, int start, int length) throws SAXException {
 this.buffer.append(ch, start, length);
   buffer.append("<p>test</p>");
 }

a   <p>test</p>  is appended to each field written to the Notes document....

But to do what I really want, it should only be looking in the XML for the tag <Content>, and looking for \p or \r until </Content> is found.
 
CEHJCommented:
>>a   <p>test</p>  is appended to each field written to the Notes document....

OK - but the point is - is the Notes doc working as a result? I'm assuming yes, but please say if that's not the case.

A further test - and then onto the real thing:

>>but it should only be looking in the xml for Tag <content> and looking for \p ...

In that case, just do the append when the current element (save the current one to an instance variable) is <content>. So

public void characters(char[] ch, int start, int length) throws SAXException {
 if ("content".equals(currentElement.getName()...

etc

 
stirnpanzerAuthor Commented:
Yes, the Notes document is OK.

Is the format below correct? I can't compile it .... (must learn Java ..... need more time... :-(

public void characters(char[] ch, int start, int length) throws SAXException {
 if ("Content".equals(currentElement.getName()))   {

     for(int i = start;i <= ch.length;i++) {
         System.out.println("loop : "+i);


          if ((ch[i] == '\n') || (ch[i] == '\r')) {
            System.out.println("loop 1: "+i);

               buffer.append("<br />");
          } else {

              System.out.println("loop 2: "+i);
               buffer.append(ch[i]);
          }
     }
}
}
 
CEHJCommented:
Easiest is to cache the current node name as a String in startElement

// in startElement
currentNodeName = name; // 'currentNodeName' is an instance variable

then you can do

if ("Content".equals(currentNodeName))   {

don't forget that names are case-sensitive

Why have you gone back to <br /> btw?
 
stirnpanzerAuthor Commented:
Re: <br/> - typo....

What I'm confused about is that I already hold "name" in a function earlier in the program.
How can I then pass it to the following function, as it's been discarded?
If I could, would the function support it?

public void characters(char[] ch, int start, int length) throws SAXException {
 if ("Content".equals(name))   {

     for(int i = start;i <= ch.length;i++) {
         System.out.println("loop : "+i);


          if ((ch[i] == '\n') || (ch[i] == '\r')) {
            System.out.println("loop 1: "+i);

               buffer.append("<p></p>");
          } else {

              System.out.println("loop 2: "+i);
               buffer.append(ch[i]);
          }
     }
}
}
 
CEHJCommented:
>>how can i then pass it to the following function ??? as its been discarded ??

It isn't discarded - it's an instance variable (or should be) and it holds the name of the current element being parsed. It's just overwritten each time startElement is called and so it's valid during characters. Incidentally 'name' isn't a very good thing to call that variable as it's too general and therefore likely to confuse you now or later. I would choose 'currentElementName'
 
stirnpanzerAuthor Commented:
ok 'name' is perhaps a bad one . But here its declared in an earlier method.

public void startElement(String name, AttributeList attr)
  throws SAXException {
    try {
      if ("Editkit".equals(name)) {
        NotesThread.sinitThread();

        this.session = NotesFactory.createSession();
        if (session == null) throw new SAXException("Cannot open Notes session");

        this.database = session.getDatabase(null, "dir\\kit.nsf");
        if (database == null) throw new SAXException("Cannot open Notes database");

    } else if ("Home".equals(name)) {
    } else if ("WebContent".equals(name)) {
        this.document = database.createDocument();

     if (document == null) throw new SAXException("Cannot create Notes document");
      } else {
        this.buffer = new StringBuffer();
      }
    } catch (NotesException e) {
      throw new SAXException("Exception populating: id="
          + e.id + ": " + e.text, e);
    }
  }

in the method :-

public void characters(char[] ch, int start, int length) throws SAXException {
 if ("content".equals(currentElement.getName()...

how can it identify "Content" ??
 
mmuruganandamCommented:
no objections..

Regards,
Muruga