Solved

Java XML Parser question - How do i ????

Posted on 2004-04-05
Last Modified: 2013-11-23
I'm not a Java programmer, but I've inherited a system I now have to maintain.

I have a servlet that triggers a program that parses an XML file. Upon finding the tag "<WebContent>"
it creates a document in a Lotus Notes database ... works no problem.

I now have to check the "<Content>" of the XML for a "new line", which I believe is "\n", and replace it with the text "<p>" so the Lotus Notes document will see a paragraph tag.

Sounds simple, but it's got me stumped at the moment ...

See program & sample XML.



package com.vnunet.editkit;

import org.xml.sax.*;
import org.xml.sax.helpers.*;
import java.util.*;
import lotus.domino.*;
import java.io.*;

public class SimpleHandler extends HandlerBase {

  private Session session = null;
  private Database database = null;
  private Document document = null;
  private StringBuffer buffer = null;

  public void startElement(String name, AttributeList attr)
  throws SAXException {
    try {
      if ("Editkit".equals(name)) {
        NotesThread.sinitThread();

        this.session = NotesFactory.createSession();
        if (session == null) throw new SAXException("Cannot open Notes session");

        this.database = session.getDatabase(null, "dir\\kit.nsf");
        if (database == null) throw new SAXException("Cannot open Notes database");

    } else if ("Home".equals(name)) {
    } else if ("WebContent".equals(name)) {
        this.document = database.createDocument();

      if (document == null) throw new SAXException("Cannot create Notes document");



        // start code to replace line ending "\n" with "<p>" tag  05/04/2004




        // end code to replace line ending "\n" with "<p>" tag  05/04/2004



      } else {
        this.buffer = new StringBuffer();
      }
    } catch (NotesException e) {
      throw new SAXException("Exception populating: id="
          + e.id + ": " + e.text, e);
    }
  }

  public void endElement(String name)
  throws SAXException {
    try {
      if ("Editkit".equals(name)) {
        // Do nothing (notes doesn't like close)

    } else if ("Home".equals(name)) {
    } else if ("WebContent".equals(name)) {
      //added by mh to add formname to doc
      String Form = "Form";
      this.document.replaceItemValue("Form", "XMLDOc");

      String ModificationStatus = "ModificationStatus";
      this.document.replaceItemValue("ModificationStatus", "New");

      this.document.save(true, false);
        this.document = null;

      } else {
        document.replaceItemValue(name, this.buffer.toString());
        this.buffer = null;
      }
    } catch (NotesException e) {
      throw new SAXException("Exception populating: id="
          + e.id + ": " + e.text, e);
    }
  }

  public void characters(char[] ch, int start, int length)
  throws SAXException {
    this.buffer.append(ch, start, length);
  }
}





<?xml version="1.0" encoding="UTF-8"?>
<Editkit>
<Home>
<WebContent>
<AssetID>xyz123</AssetID>
<Author>Fred Fish</Author>
<Title>This is a test</Title>
<Country>UK</Country>
<Content>This is some test text for testing to send through to the kit</Content>
</WebContent>
</Home>
<Home>
<WebContent>
<AssetID>abc123</AssetID>
<Author>Mr fish</Author>
<Title>This is a another test</Title>
<Country>USA</Country>
<Content>This is some more test text for testing to send through to the kit
This is some more test text for testing to send through to the kit
This is some more test text for testing to send through to the kit
</Content>
</WebContent>
</Home>
</Editkit>
Question by:stirnpanzer
40 Comments
 
LVL 86

Expert Comment

by:CEHJ
ID: 10756467
First thing to check: are you sure the newline actually reaches the output? A parser should pass whitespace inside an element like <Content> through to characters, but line endings may be normalized (typically to \n) on the way.
0
 
LVL 86

Expert Comment

by:CEHJ
ID: 10756482
And it wouldn't occur in startElement. The place to be looking is in characters. I'd recommend replacing with <br /> as it's going to be a lot easier
0
 
LVL 86

Expert Comment

by:CEHJ
ID: 10756511
public void characters(char[] ch, int start, int length) throws SAXException {
      // only the range start .. start + length - 1 belongs to this call
      for(int i = start;i < start + length;i++) {
            if (ch[i] == '\n') {
                  buffer.append("<br />");
            }
            else {
                  buffer.append(ch[i]);
            }
      }
   
}

probably would do it. Watch the loop counter: the array can be bigger than the chunk being reported, so the loop has to stop at start + length and not at ch.length.
0
 
LVL 9

Expert Comment

by:mmuruganandam
ID: 10756523
Element element = // current element;

String str = element.getNodeValue();

StringTokenizer tokenizer = new StringTokenizer(str, "\r\n");

while (tokenizer.hasMoreTokens())
{
     buffer = buffer.append("<P>").append(tokenizer.nextToken()).append("</P>");
}
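
A runnable take on the tokenizer idea above, offered as a sketch only. It assumes the text of the element has already been collected into a String (for example from buffer.toString() in endElement); the class and method names ParagraphWrapper and toParagraphs are made up for illustration.

```java
import java.util.StringTokenizer;

final class ParagraphWrapper {

    // Wraps each non-empty line of text in <p>...</p>.
    // StringTokenizer treats any run of '\r' and '\n' characters as one
    // delimiter, so "\r\n" pairs and blank lines produce no empty paragraphs.
    static String toParagraphs(String text) {
        StringBuffer out = new StringBuffer();
        StringTokenizer tokenizer = new StringTokenizer(text, "\r\n");
        while (tokenizer.hasMoreTokens()) {
            out.append("<p>").append(tokenizer.nextToken()).append("</p>");
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(toParagraphs("first line\r\nsecond line\n"));
    }
}
```

Doing the conversion once on the finished string means every paragraph is both opened and closed, whichever way the parser happens to split the text across characters calls.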

0
 

Author Comment

by:stirnpanzer
ID: 10756607
No problem using the "<br/>".

I need to know what code needs to go between the comments in the program:

// start code to replace line ending "\n" with "<p>" tag  05/04/2004
 
// end code to replace line ending "\n" with "<p>" tag  05/04/2004

0
 
LVL 86

Expert Comment

by:CEHJ
ID: 10756637
As I mentioned, according to my interpretation, the code does not go in between those comments. The code simply replaces the existing characters method.
0
 

Author Comment

by:stirnpanzer
ID: 10759029
Here is the program now ... it now checks for "newline" and "return".

But it keeps blowing up with "</AssetID> expected" ????
The XML hasn't changed ?? Any ideas ... this is the first tag in the XML after the <WebContent>

   

import org.xml.sax.*;
import org.xml.sax.helpers.*;
import java.util.*;
import lotus.domino.*;
import java.io.*;

public class SimpleHandler extends HandlerBase {

  private Session session = null;
  private Database database = null;
  private Document document = null;
  private StringBuffer buffer = null;

  public void startElement(String name, AttributeList attr)
  throws SAXException {
    try {
      if ("Editkit".equals(name)) {
//      if ("Home".equals(name)) {
        NotesThread.sinitThread();

        this.session = NotesFactory.createSession();
        if (session == null) throw new SAXException("Cannot open Notes session");

        this.database = session.getDatabase(null, "dir\\kit.nsf");
        if (database == null) throw new SAXException("Cannot open Notes database");

    } else if ("Home".equals(name)) {
    } else if ("WebContent".equals(name)) {
        this.document = database.createDocument();

      if (document == null) throw new SAXException("Cannot create Notes document");


      } else {
        this.buffer = new StringBuffer();
      }
    } catch (NotesException e) {
      throw new SAXException("Exception populating: id="
          + e.id + ": " + e.text, e);
    }
  }

  public void endElement(String name)
  throws SAXException {
    try {
      if ("Editkit".equals(name)) {
//      if ("Home".equals(name)) {
        // Do nothing (notes doesn't like close)

    } else if ("Home".equals(name)) {
    } else if ("WebContent".equals(name)) {
      //added by mh to add formname to doc
      String Form = "Form";
      this.document.replaceItemValue("Form", "DMS");

      String ModificationStatus = "ModificationStatus";
      this.document.replaceItemValue("ModificationStatus", "New");

      this.document.save(true, false);
        this.document = null;

      } else {
        document.replaceItemValue(name, this.buffer.toString());
        this.buffer = null;
      }
    } catch (NotesException e) {
      throw new SAXException("Exception populating: id="
          + e.id + ": " + e.text, e);
    }
  }

//  public void characters(char[] ch, int start, int length)
//  throws SAXException {
//    this.buffer.append(ch, start, length);
//  }
//}

     public void characters(char[] ch, int start, int length) throws SAXException {
     for(int i = start;i <= ch.length;i++) {
          if ((ch[i] == '\n') || (ch[i] == '\r')) {
               buffer.append("<br />");
          }
          else {
               buffer.append(ch[i]);
          }
     }
}
}
0
 
LVL 86

Expert Comment

by:CEHJ
ID: 10760377
Ah - of course if there's a DTD, it's going to have to specify that the element <br /> can appear, otherwise you're not going to be able to insert it arbitrarily, or you'll have to use namespaces. btw that newline code you've cooked is not quite right but we can discuss that later as it's relatively trivial
0
 

Author Comment

by:stirnpanzer
ID: 10766308
Didn't have a DTD (have just read up on them), so I came up with the following and included
<!DOCTYPE Editkit SYSTEM "c:\dir\my.dtd">  in the XML.
But now the parser is saying the XML structure is invalid, though it works fine locally....

Have I specified the BR correctly ???

<?xml version="1.0" encoding="UTF-8"?>
<!ELEMENT AssetID (#PCDATA)>
<!ELEMENT Author (#PCDATA)>
<!ELEMENT BR (#PCDATA)>
<!ELEMENT Content (#PCDATA)>
<!ELEMENT Country (#PCDATA)>
<!ELEMENT Editkit (Home+)>
<!ELEMENT Home (WebContent)>
<!ELEMENT Title (#PCDATA)>
<!ELEMENT WebContent (AssetID, Author, Title, Country, Content)>
0
 
LVL 86

Expert Comment

by:CEHJ
ID: 10766591
You don't *have* to have a DTD. If you don't need one, i wouldn't include one at this stage
0
 

Author Comment

by:stirnpanzer
ID: 10766642
I'd rather not have one  (as xml is also a new area to me )..... how do you suggest i progress it,
ie:-  insert the "<br />" without the parser going wrong ??

Options were dtd or namespaces.  ???
0
 
LVL 86

Expert Comment

by:CEHJ
ID: 10766769
No the dtd thing was brought up as a possible reason for it going wrong actually. Let's try something else - based on your original <p> thing:




public void characters(char[] ch, int start, int length) throws SAXException {
      int paragraphCount = 0;
      for(int i = start;i < start + length;i++) {
            if (ch[i] == '\n') {
                  ++paragraphCount;
                  if (paragraphCount % 2 == 0) {
                        buffer.append("</p>");      // close para if previous open para exists
                  }
                  buffer.append("<p>");
            }
            else if(ch[i] == '\r') {
                  /* ignore it */
            }
            else {
                  buffer.append(ch[i]);
            }
      }
}


0
 
LVL 86

Expert Comment

by:CEHJ
ID: 10766947
Sorry stirnpanzer - that tag closing code's not quite right - i'm not quite concentrating enough (meant to be doing other things ;-)) - but you get the idea - the <p> tags should always be closed with </p>
0
 

Author Comment

by:stirnpanzer
ID: 10767378
Made a couple of minor amends (see code now below), but the parser still blows up with
"</AssetID> expected"

Have put in a few System.out calls to see what's going on. It is parsing all the data, hitting the final tag  </Editkit>, then looping through some blank data before the "</AssetID> expected" message ???

Any ideas?


public void characters(char[] ch, int start, int length) throws SAXException {
     int paragraphCount = 0;
     for(int i = start;i <= ch.length;i++) {
        System.out.println("loop 1 : "+ch[i]);
          if (ch[i] == '\r') {
               ++paragraphCount;
               if (paragraphCount % 2 == 0) {
 System.out.println("loop 2 : "+paragraphCount);
                    buffer.append("</p>");     // close para if previous open para exists
               }
               buffer.append("</p>");
          }
 
          else if(ch[i] == '\n') {
  System.out.println("loop 3: "+ch[i]);
               /* ignore it */
          }
          else {
   System.out.println("loop 4: "+ch[i]);
               buffer.append(ch[i]);
          }
     }
}
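
The symptom described in this post (the loop carrying on past </Editkit> into blank data) is consistent with the loop bound i <= ch.length. SAX only promises that ch[start] to ch[start + length - 1] belong to the current call, and a parser is free to pass its whole internal buffer as ch. Whether that is exactly what this parser does is a guess. The sketch below imitates such a shared buffer with a plain array; ChunkRangeDemo and convertChunk are hypothetical names.

```java
final class ChunkRangeDemo {

    // Copies only the range handed over for this call, replacing '\n' with <br />.
    static String convertChunk(char[] ch, int start, int length) {
        StringBuffer out = new StringBuffer();
        int end = start + length;              // exclusive upper bound of the chunk
        for (int i = start; i < end; i++) {
            if (ch[i] == '\n') {
                out.append("<br />");
            } else {
                out.append(ch[i]);
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // Pretend parser buffer: the whole document sits in one array, and
        // this call is only meant to deliver the text of <AssetID>.
        char[] shared =
            "<AssetID>xyz123</AssetID>\n<Author>Fred Fish</Author>".toCharArray();
        int start = "<AssetID>".length();
        System.out.println(convertChunk(shared, start, 6));
    }
}
```

With the bound taken from start + length, the first call yields only the AssetID text instead of running on through the rest of the array.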
0
 
LVL 86

Expert Comment

by:CEHJ
ID: 10767456
You keep altering my linefeed code. Are you reading a Macintosh file?
0
 
LVL 86

Expert Comment

by:CEHJ
ID: 10767509
...but you're not pushing the startElement onto the buffer by the looks of things
0
 

Author Comment

by:stirnpanzer
ID: 10767547
windows ???
I have changed the "\n" to "\r" as i have  found in the xml the hidden linefeed is a "return", not a "new line".
0
 
LVL 86

Expert Comment

by:CEHJ
ID: 10767594
OK - it's probably come from a Mac then, IF there's no \n. If there is, it's just a standard Windows \r\n line-ending pair.
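
Since the file might use any of the three line-ending styles mentioned here, one option is to normalize them before inserting tags, so a Windows \r\n pair does not turn into two breaks. This is only a sketch working on an already collected String; LineEndings, normalize and toBreaks are illustrative names, not part of the program in the question.

```java
final class LineEndings {

    // Collapses Windows (\r\n) and old Mac (\r) line endings to a single '\n'.
    static String normalize(String text) {
        return text.replace("\r\n", "\n").replace('\r', '\n');
    }

    // Replaces every line break, whatever its original style, with <br />.
    static String toBreaks(String text) {
        return normalize(text).replace("\n", "<br />");
    }

    public static void main(String[] args) {
        System.out.println(toBreaks("windows\r\nmac\runix\nend"));
    }
}
```

The order matters: the \r\n pairs have to be collapsed first, otherwise each pair would be counted as two separate breaks.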

0
 
LVL 86

Expert Comment

by:CEHJ
ID: 10767607
Perhaps you'd better say precisely what xml is meant to be going over to Lotus. Everything - or just bits?
0

 

Author Comment

by:stirnpanzer
ID: 10767624
yes... you're right.
 ran again with the "system.out" on, and the first tag being read in is <AssetID>.
Where's my root element going ???
0
 
LVL 86

Expert Comment

by:CEHJ
ID: 10767940
Looking at your code again, it doesn't actually make a lot of sense to me. Can you tell me briefly what you're intending to do?
0
 

Author Comment

by:stirnpanzer
ID: 10770746
This program is triggered by a servlet listening on a Domino web server.
The servlet is triggered by posting an XML file to it.

This program parses the posted XML file.

If the root element is "<Editkit>" it opens an existing Notes database.

When the element "<WebContent>" is reached it creates a new document in the Notes database.
All element values after "<WebContent>" are written to the Notes document.

It saves the Notes document and calls it "XMLDoc".

It keeps creating documents until it finds "</Editkit>".

The program then terminates.

It was working fine until I started looking at adding the <p> ... to replace the \n or \r

0
 

Author Comment

by:stirnpanzer
ID: 10773214
here is the servlet ....

import java.io.*;
import javax.servlet.*;
import javax.servlet.http.*;
import org.xml.sax.*;
import org.xml.sax.helpers.*;

public class SimpleServlet extends HttpServlet {

  public void service(HttpServletRequest req, HttpServletResponse res)
  throws ServletException, IOException {
    res.setContentType("text/plain");

    PrintWriter out=new PrintWriter(res.getOutputStream());
    InputStream in=req.getInputStream();

    Parser parser = new com.ibm.xml.parsers.SAXParser();
    InputSource source = new InputSource(in);

    try {
      SimpleHandler handler = new SimpleHandler();
      parser.setDocumentHandler(handler);
      parser.setDTDHandler(handler);
      parser.setEntityResolver(handler);
      parser.setErrorHandler(handler);
      parser.parse(source);
      res.setStatus(200);
      out.println("OK");
    } catch (SAXException e) {
      res.setStatus(500);
      out.println("Not OK");
      e.printStackTrace();
      e.printStackTrace(out);
      if (e instanceof SAXParseException) {
        SAXParseException spe = (SAXParseException) e;
        out.println("Line number: " + spe.getLineNumber());
        out.println("Column number: " + spe.getColumnNumber());
      }
      Exception x = e.getException();
      if (x != null) {
        out.print("Nested Exception: ");
        x.printStackTrace(out);
        System.err.print("Nested Exception: ");
        x.printStackTrace();
        if (x instanceof SAXParseException) {
          SAXParseException spe = (SAXParseException) x;
          out.println("Line number: " + spe.getLineNumber());
          out.println("Column number: " + spe.getColumnNumber());
        }
      }
    }
    out.flush();
  }
}
0
 
LVL 86

Expert Comment

by:CEHJ
ID: 10773527
OK, the first thing i think you should do is to make a test, using a hard-coded approach to inserting <p>xxx</p>

Please test this and report how you're doing this and the results - you need to determine this works. Once that's done, getting it working dynamically should be reasonably simple
0
 

Author Comment

by:stirnpanzer
ID: 10775352
Am having all sorts of problems doing this, as the StringBuffer wants a start integer .... am I doing it wrong?
 
What's your suggested method / code ??
0
 
LVL 86

Expert Comment

by:CEHJ
ID: 10775407
>>as the StringBuffer wants a start integer

How do you mean? If you mean the function (characters) uses a start integer, just ignore it. Hard code something in there e.g

sb.append("<p>test</p>");
0
 
LVL 86

Expert Comment

by:CEHJ
ID: 10775491
Oh and make sure you do enough debug statements so that you're certain of everything that's going to be output
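
One way to make those debug statements tell you exactly what the parser hands over is to print each chunk with its control characters made visible, so a \r, a \n and a plain space can be told apart in the log. A small sketch under that assumption; ChunkDebug and describe are invented names, and the call would go at the top of characters as System.out.println(ChunkDebug.describe(ch, start, length)).

```java
final class ChunkDebug {

    // Renders one characters() chunk with control characters made visible,
    // reading only the range start .. start + length - 1.
    static String describe(char[] ch, int start, int length) {
        StringBuffer out = new StringBuffer("chunk[" + start + "," + length + "]=\"");
        for (int i = start; i < start + length; i++) {
            switch (ch[i]) {
                case '\n': out.append("\\n"); break;
                case '\r': out.append("\\r"); break;
                case '\t': out.append("\\t"); break;
                default:   out.append(ch[i]);
            }
        }
        return out.append('"').toString();
    }

    public static void main(String[] args) {
        System.out.println(describe("xxa\nbyy".toCharArray(), 2, 3));
    }
}
```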
0
 

Author Comment

by:stirnpanzer
ID: 10775968
ok, have used the  stringbuffer.append("<p>test</p>");

Using the code below it successfully added <p>test</p> to the end of each parsed field.   Progress......

But if I use the code with the "if ...else ..." it blows up with "</AssetID> expected" ???



Below works kind of ....

 public void characters(char[] ch, int start, int length) throws SAXException {
 this.buffer.append(ch, start, length);
   buffer.append("<p>test</p>");
 }



Below doesnt work   ....blows up with "</AssetID> expected" ???

public void characters(char[] ch, int start, int length) throws SAXException {

     int paragraphCount = 0;
     for(int i = start;i <= ch.length;i++) {

         if (ch[i] == '\r') {
                    System.out.println("Found \r : "+ch[i]);
             //       buffer.append("<br>");
               buffer.append("<p>test</p>");
          }
          else if(ch[i] == '\n') {
                    System.out.println("Found \n : "+ch[i]);
              // buffer.append("<br>");
         buffer.append("<p>test</p>");
          }
        else {
          System.out.println("charachter : "+ch[i]);
              buffer.append(ch[i]);
}

0
 
LVL 86

Expert Comment

by:CEHJ
ID: 10776021
>>Below works kind of ....

Meaning?
0
 

Author Comment

by:stirnpanzer
ID: 10776155
using the following code :-

 public void characters(char[] ch, int start, int length) throws SAXException {
 this.buffer.append(ch, start, length);
   buffer.append("<p>test</p>");
 }

a   <p>test</p>  is appended to each field written to the Notes document....

but to do what I really want, it should only be looking in the XML for the tag <Content> and looking for \n or \r until </Content> is found.

0
 
LVL 86

Expert Comment

by:CEHJ
ID: 10776199
>>a   <p>test</p>  is appended to each field written to the Notes document....

OK - but the point is - is the Notes doc working as a result? I'm assuming yes, but please say if that's not the case.

A further test - and then onto the real thing:

>>but it should only be looking in the xml for Tag <Content> and looking for \n ...

In that case, just do the append when the current element (save the current one to an instance variable) is <Content>. So

public void characters(char[] ch, int start, int length) throws SAXException {
 if ("Content".equals(currentElement.getName()...

etc

0
 

Author Comment

by:stirnpanzer
ID: 10776682
Yes, the Notes document is ok.

Is the format below correct, as I can't compile .... (must learn Java ..... need more time... :-(  

public void characters(char[] ch, int start, int length) throws SAXException {
 if ("Content".equals(currentElement.getName()))   {

     for(int i = start;i <= ch.length;i++) {
         System.out.println("loop : "+i);


          if ((ch[i] == '\n') || (ch[i] == '\r')) {
            System.out.println("loop 1: "+i);

               buffer.append("<br />");
          } else {

              System.out.println("loop 2: "+i);
               buffer.append(ch[i]);
          }
     }
}
}
0
 
LVL 86

Expert Comment

by:CEHJ
ID: 10777466
Easiest is to cache the current node name as a String in startElement

// in startElement
currentNodeName = name; // 'currentNodeName' is an instance variable

then you can do

if ("Content".equals(currentNodeName))   {

don't forget that names are case-sensitive

Why have you gone back to <br /> btw?

0
 

Author Comment

by:stirnpanzer
ID: 10782351
re:- <br/> typo....

What i'm confused with is i already hold "name" in a function earlier in the program.
how can i then pass it to the following function ??? as its been discarded ??
if i could would the function support it ???

public void characters(char[] ch, int start, int length) throws SAXException {
 if ("Content".equals(name))   {

     for(int i = start;i <= ch.length;i++) {
         System.out.println("loop : "+i);


          if ((ch[i] == '\n') || (ch[i] == '\r')) {
            System.out.println("loop 1: "+i);

               buffer.append("<p></p>");
          } else {

              System.out.println("loop 2: "+i);
               buffer.append(ch[i]);
          }
     }
}
}
0
 
LVL 86

Expert Comment

by:CEHJ
ID: 10782389
>>how can i then pass it to the following function ??? as its been discarded ??

It isn't discarded - it's an instance variable (or should be) and it holds the name of the current element being parsed. It's just overwritten each time startElement is called and so it's valid during characters. Incidentally 'name' isn't a very good thing to call that variable as it's too general and therefore likely to confuse you now or later. I would choose 'currentElementName'
0
 

Author Comment

by:stirnpanzer
ID: 10783416
ok 'name' is perhaps a bad one . But here its declared in an earlier method.

public void startElement(String name, AttributeList attr)
  throws SAXException {
    try {
      if ("Editkit".equals(name)) {
        NotesThread.sinitThread();

        this.session = NotesFactory.createSession();
        if (session == null) throw new SAXException("Cannot open Notes session");

        this.database = session.getDatabase(null, "dir\\kit.nsf");
        if (database == null) throw new SAXException("Cannot open Notes database");

    } else if ("Home".equals(name)) {
    } else if ("WebContent".equals(name)) {
        this.document = database.createDocument();

     if (document == null) throw new SAXException("Cannot create Notes document");
      } else {
        this.buffer = new StringBuffer();
      }
    } catch (NotesException e) {
      throw new SAXException("Exception populating: id="
          + e.id + ": " + e.text, e);
    }
  }

in the method :-

public void characters(char[] ch, int start, int length) throws SAXException {
 if ("content".equals(currentElement.getName()...

how can it identify "Content" ??
0
 
LVL 86

Accepted Solution

by:
CEHJ earned 500 total points
ID: 10793950
>>But here its declared in an earlier method.

That's a different thing - 'name' is a parameter name in a function. It is in there that you should assign it to an instance variable:

currentElementName = name;

*then* you can get it in characters:

if ("Content".equals(currentElementName))
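
Putting the pieces together, a self-contained sketch of the idea might look like the following. It is not the program from the question: so that it runs without Notes, it assumes the newer SAX2 DefaultHandler in place of HandlerBase and stores the fields in a Map where the real handler calls replaceItemValue. ContentBreakHandler, fields and parse are made-up names. The element name is compared as "Content", matching the case used in the sample XML.

```java
import java.io.StringReader;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

final class ContentBreakHandler extends DefaultHandler {

    // Stand-in for the Notes document: field name -> field value.
    final Map<String, String> fields = new LinkedHashMap<String, String>();

    private String currentElementName;   // set in startElement, read in characters
    private StringBuffer buffer;

    @Override
    public void startElement(String uri, String localName, String qName, Attributes attrs) {
        currentElementName = qName;
        buffer = new StringBuffer();
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        if (buffer == null) {
            return;                       // whitespace between elements
        }
        boolean inContent = "Content".equals(currentElementName);
        for (int i = start; i < start + length; i++) {
            if (inContent && ch[i] == '\n') {
                buffer.append("<br />");
            } else if (inContent && ch[i] == '\r') {
                // dropped, so a \r\n pair gives one break
            } else {
                buffer.append(ch[i]);
            }
        }
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        if (buffer != null) {
            fields.put(qName, buffer.toString());
        }
        buffer = null;
        currentElementName = null;
    }

    // Parses an XML string and returns the collected fields.
    static Map<String, String> parse(String xml) {
        try {
            ContentBreakHandler handler = new ContentBreakHandler();
            SAXParserFactory.newInstance().newSAXParser()
                    .parse(new InputSource(new StringReader(xml)), handler);
            return handler.fields;
        } catch (Exception e) {
            throw new IllegalStateException("Parse failed", e);
        }
    }
}
```

Only the text inside <Content> is converted; the other fields pass through unchanged, and the null check keeps the whitespace between tags from touching a buffer that has already been written out.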
0
 
LVL 9

Expert Comment

by:mmuruganandam
ID: 10958039
no objections..

Regards,
Muruga
0
