aman0711


Commons Digester for XML parsing.

Hi Experts,

I am back with a problem very similar to the one I posted earlier.
Back then I used a sample XML file to parse with Commons Digester and, with the help of CEHJ and Objects, I was able to store the XML data in a Vector.

Today, when I started working with the actual XML file, I realized that all the data is present as attributes. How do I extract the data from attributes?

I have attached the sample XML file and the two Java classes I am using. Please shed some light on this.


<TXN_DATA_FEED>

  <TXN_META_DATA agreement_id="11111">

     <AGENT_META_DATA agent_id="35162" instance_id="37203" weight="1" />
     <AGENT_META_DATA agent_id="35162" instance_id="37203" weight="1" />
     <SLOT_META_DATA slot_id="762" slot_alias="Dummy" pages="3">
      <PAGE_META_DATA page_alias="Dummy 1" page_seq="1" />
      <PAGE_META_DATA page_alias="Dummy 2" page_seq="2" />
      <PAGE_META_DATA page_alias="Dummy 3" page_seq="3" />
     </SLOT_META_DATA>
     <PROFILE_META_DATA access_sub_type="1kb" profile_name="khol" access_type="pul" />
     <PROFILE_META_DATA access_sub_type="lls" profile_name="lipo" access_type="pul" />
     
  </TXN_META_DATA>

</TXN_DATA_FEED>

     


   
   


 
package com.KeyNote;
 
public class Txn_Data_Feed {
	
	     private String TXN_META_DATA;
	     private String AGENT_META_DATA;
	     private String SLOT_META_DATA;
	     private String PAGE_META_DATA;
	     private String PROFILE_META_DATA;
	     
	     public Txn_Data_Feed() {
	     }
	     
	     public String getTXN_META_DATA() {
	    	  return TXN_META_DATA;
	     }
	     
	     
	     public void setTXN_META_DATA(String newTXN_META_DATA) {
	    	 
	    	 TXN_META_DATA = newTXN_META_DATA;
	     }
	     
	     public String getAGENT_META_DATA() {
	    	  return AGENT_META_DATA;
	     }
	     
	     public void setAGENT_META_DATA(String newAGENT_META_DATA){
	    	 AGENT_META_DATA = newAGENT_META_DATA;
	     }
	     
	     public String getSLOT_META_DATA() {
	    	   return SLOT_META_DATA;
	     }
	     
         public void setSLOT_META_DATA(String newSLOT_META_DATA){
        	 SLOT_META_DATA = newSLOT_META_DATA;
        	 
         }
         
         public String getPAGE_META_DATA() {
        	 return   PAGE_META_DATA;
         }
         
         public void setPAGE_META_DATA(String newPAGE_META_DATA) {
        	 PAGE_META_DATA = newPAGE_META_DATA;
         }
         
         public String getPROFILE_META_DATA() {
        	  return PROFILE_META_DATA;
         }
         
         public void setPROFILE_META_DATA(String newPROFILE_META_DATA){
        	 
        	 PROFILE_META_DATA = newPROFILE_META_DATA;
         }
         public String toString() {
        	  
        	 return "First: " + this.TXN_META_DATA + " Second: " + this.AGENT_META_DATA;
         }
         
}
 
 
*******************************************************************************************************************
 
package com.KeyNote;
 
import java.io.IOException;
import java.util.Vector;
import org.apache.commons.digester.Digester;
import org.xml.sax.SAXException;
 
public class DigestKeyNote {
	   
	   Vector transactions;
	   
	     public DigestKeyNote() {
	    	 
	    	 transactions = new Vector();
	     }
 
	     public static void main(String  args[]){
	    	 
	    	 DigestKeyNote digestKeyNote = new DigestKeyNote();
	    	 digestKeyNote.digest();
	    	 
	     }
	    
	     private void digest() {
	    	 
	    	  try {
	    	   Digester digester = new Digester();
	    	   //Push the current object on to the Stack
	    	   
	    	   digester.push(this);
	    	   
	    	   //Create new instance of Txn_Data_Feed class
	    	   digester.addObjectCreate("TXN_DATA_FEED/TXN_META_DATA", Txn_Data_Feed.class);
	    	   
	    	   digester.addBeanPropertySetter("TXN_DATA_FEED/TXN_META_DATA/AGENT_META_DATA");
	    	   
	    	   digester.addBeanPropertySetter("TXN_DATA_FEED/TXN_META_DATA/SLOT_META_DATA");
	    	   
	    	   digester.addBeanPropertySetter("TXN_DATA_FEED/TXN_META_DATA/SLOT_META_DATA/PAGE_META_DATA");
	    	   
	    	   digester.addBeanPropertySetter("TXN_DATA_FEED/TXN_META_DATA/PROFILE_META_DATA");
	    	   
	    	   digester.addSetNext("TXN_DATA_FEED/TXN_META_DATA/AGENT_META_DATA", "addAGENT_META_DATA");
	    	   
	    	   digester.addSetNext("TXN_DATA_FEED/TXN_META_DATA/SLOT_META_DATA", "addSLOT_META_DATA");
	    	   
	    	   digester.addSetNext("TXN_DATA_FEED/TXN_META_DATA/SLOT_META_DATA/PAGE_META_DATA", "addPAGE_META_DATA");
	    	   
	    	   digester.addSetNext("TXN_DATA_FEED/TXN_META_DATA/PROFILE_META_DATA", "addPROFILE_META_DATA");
	    	   
	    	 
				DigestKeyNote dkn = (DigestKeyNote) digester.parse(this.getClass()
				           .getClassLoader()
				           .getResourceAsStream("20090312_0000.xml"));
				
				System.out.println("Transactions Vector "+ dkn.transactions );
				
			} catch (IOException e) {
				
				e.printStackTrace();
			} catch (SAXException e) {
				
				e.printStackTrace();
			}
	    	    	   
	     }
	     
	     public void addAGENT_META_DATA(Txn_Data_Feed txn ) {
	    	 //Add a new Txn_Data_Feed instance to the Vector
	         transactions.add( txn );
	     }
	     
	     public void addSLOT_META_DATA ( Txn_Data_Feed txn ) {
	    	 
	    	 transactions.add( txn );
	     }
	     
	     public void addPAGE_META_DATA ( Txn_Data_Feed txn ) {
	    	 
	    	 transactions.add( txn );
	     }
	     
	     public void addPROFILE_META_DATA ( Txn_Data_Feed txn ) {
	    	 
	    	 transactions.add( txn );
	     }
   
}
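
A note on the code above: `addBeanPropertySetter` fills a bean property from the body text of the matched element, so it never sees values such as `agent_id="35162"`. Digester is layered on SAX, and in SAX the attribute values arrive in an `org.xml.sax.Attributes` object when the element starts. The following is a minimal sketch using plain JDK SAX rather than Digester, only to show where the attribute values live. The class name `AttributeDump` and its method are hypothetical, not part of any library.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical helper (plain SAX, not Digester): collects the attributes of
// every element that has the given name.
class AttributeDump {

    static List<Map<String, String>> attributesOf(String xml, String elementName) {
        List<Map<String, String>> result = new ArrayList<>();
        DefaultHandler handler = new DefaultHandler() {
            @Override
            public void startElement(String uri, String localName, String qName, Attributes attrs) {
                if (!qName.equals(elementName)) {
                    return;
                }
                // Copy name/value pairs out of the SAX Attributes object.
                Map<String, String> values = new LinkedHashMap<>();
                for (int i = 0; i < attrs.getLength(); i++) {
                    values.put(attrs.getQName(i), attrs.getValue(i));
                }
                result.add(values);
            }
        };
        try {
            SAXParserFactory.newInstance().newSAXParser()
                    .parse(new InputSource(new StringReader(xml)), handler);
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not parse XML", e);
        }
        return result;
    }

    public static void main(String[] args) {
        String xml = "<TXN_DATA_FEED><TXN_META_DATA agreement_id=\"11111\">"
                + "<AGENT_META_DATA agent_id=\"35162\" instance_id=\"37203\" weight=\"1\"/>"
                + "</TXN_META_DATA></TXN_DATA_FEED>";
        for (Map<String, String> attrs : attributesOf(xml, "AGENT_META_DATA")) {
            System.out.println(attrs.get("agent_id") + " / " + attrs.get("instance_id"));
        }
    }
}
```

In Digester itself, the rule family intended for attributes is `addSetProperties` (or `addCallMethod` with `addCallParam`), rather than `addBeanPropertySetter`.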


CEHJ

It would help to see the DTD or schema
aman0711

ASKER

Hmmm.. ok I will post the schema.
But can we fetch the values of these attributes CEHJ?
Yes
Personally I would observe Java naming conventions in your classes, but as they stand, use not

>>PAGE_META_DATA = newPAGE_META_DATA;

but

this.PAGE_META_DATA = PAGE_META_DATA;
ok will change this.. and show you the schema
Thanks CEHJ :-)
ASKER CERTIFIED SOLUTION
Mick Barry
Nice example Objects :)
 
  Few questions:

   d.addCallMethod("address-book/person/email", "addEmail", 2);
>> I got it addEmail is a call to the function for Email tag. but what is 2 here?
   
        d.addCallParam("address-book/person/email", 0, "type");
>> The same thing here, first go down till the email tag level, then the attribute name is type... but what is 0 here?


        d.addCallParam("address-book/person/email", 1);
>> and 1 in this case?

           Oh by the way I started following you on Twitter :-)
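
For readers who cannot see the hidden answer: in Commons Digester, `addCallMethod(pattern, methodName, paramCount)` registers a call to `methodName` that takes `paramCount` arguments, so the `2` is the number of arguments of `addEmail`. `addCallParam(pattern, paramIndex, attributeName)` fills the argument at that zero-based index from the named attribute (here argument `0` comes from `type`), and `addCallParam(pattern, paramIndex)` without an attribute name fills it from the element's body text (here argument `1`). The sketch below imitates that behaviour with plain JDK SAX, not with Digester, purely to make the effect visible. `EmailCallSketch` and its nested `Person` class are hypothetical stand-ins for the example's classes.

```java
import java.io.StringReader;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

class EmailCallSketch {

    // Hypothetical bean, modelled on the addEmail(type, address) method quoted in this thread.
    static class Person {
        private final Map<String, String> emails = new LinkedHashMap<>();

        public void addEmail(String type, String address) {
            emails.put(type, address);
        }

        public Map<String, String> getEmails() {
            return emails;
        }
    }

    static Person parse(String xml) {
        Person person = new Person();
        DefaultHandler handler = new DefaultHandler() {
            private String type;          // argument 0: taken from the "type" attribute
            private StringBuilder body;   // argument 1: taken from the element body

            @Override
            public void startElement(String uri, String localName, String qName, Attributes attrs) {
                if (qName.equals("email")) {
                    type = attrs.getValue("type");
                    body = new StringBuilder();
                }
            }

            @Override
            public void characters(char[] ch, int start, int length) {
                if (body != null) {
                    body.append(ch, start, length);
                }
            }

            @Override
            public void endElement(String uri, String localName, String qName) {
                if (qName.equals("email")) {
                    // Both arguments are complete only at the end tag, so the call is made here.
                    person.addEmail(type, body.toString().trim());
                    body = null;
                }
            }
        };
        try {
            SAXParserFactory.newInstance().newSAXParser()
                    .parse(new InputSource(new StringReader(xml)), handler);
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not parse XML", e);
        }
        return person;
    }

    public static void main(String[] args) {
        Person p = parse("<address-book><person>"
                + "<email type=\"business\">gonzo@muppets.com</email>"
                + "</person></address-book>");
        System.out.println(p.getEmails());
    }
}
```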

ohk got it :)
I will work on my code and post it here :-) though that will be night time for u
>>don't really need to see the schema

That's quite wrong. The schema fundamentally affects the way this is designed
Hiii CEHJ and Objects,
 
I tried the link Objects gave me. That example is very informative, but I am a little confused and stuck.

As asked by CEHJ, I have attached the XML file here.

In Objects' example, the email tag is like

               <email type="business">gonzo@muppets.com</email>

and hence they are storing it in a HashMap (please correct me if I am wrong), whereas my XML file doesn't have tags like this.

My XML file is something like:
       <sample tag name="customer" type="business">

and I guess I can't store this in a HashMap... how do I go about this? I have attached the code I have written so far.

Similar to the example, I made separate class files for separate tags.
Below is the code for the first tag, i.e. <AGENT_META_DATA>

package com.KeyNote;
 
import java.util.HashMap;
import java.util.Iterator;
 
public class AgentMetaData {
	 
	private String agentId;
	private String instanceId;
	private String country;
	private String description;
	private String region;
	private String ip;
	private String backBone;
	private String weight;
	private String city;
	
	
	public String toString () {
		StringBuffer sb = new StringBuffer();
		sb.append( " AGENT_META_DATA (agentId "+ agentId + ")\n");
		sb.append( "        " + instanceId  + "\n");
		sb.append( "        " + country  + "\n");
		sb.append( "        " + description  + "\n");
		sb.append( "        " + region  + "\n");
		sb.append( "        " + ip  + "\n");
		sb.append( "        " + backBone  + "\n");
		sb.append( "        " + weight  + "\n");
		sb.append( "        " + city  + "\n");
    	return sb.toString();
	}
 
     public void print( java.io.PrintStream out, int indentAmount ) {
    	 StringBuffer indentStr = new StringBuffer(indentAmount);
    	 for (; indentAmount > 0; --indentAmount ) {
    		 indentStr.append(' ');
    	 	 }
    	 
    	  out.print(indentStr);
    	  out.print("agentId=");
    	  out.println(agentId);
    	  
    	  out.print(indentStr);
    	  out.println(" " + instanceId);
    	  
    	  out.print(indentStr);
    	  out.println(" " + country + " " + description + " " + region + " " + ip );
    	  
    	  out.print(indentStr);
    	  out.println(" " + backBone + " " + weight + " " + city);
    	  
    	}
 
     /*
      * Getter and Setter Methods for all the Attributes
      * */
          
     
	public String getAgentId() {
		return agentId;
	}
 
	public void setAgentId(String agentId) {
		this.agentId = agentId;
	}
 
	public String getBackBone() {
		return backBone;
	}
 
	public void setBackBone(String backBone) {
		this.backBone = backBone;
	}
 
	public String getCity() {
		return city;
	}
 
	public void setCity(String city) {
		this.city = city;
	}
 
	public String getCountry() {
		return country;
	}
 
	public void setCountry(String country) {
		this.country = country;
	}
 
	public String getDescription() {
		return description;
	}
 
	public void setDescription(String description) {
		this.description = description;
	}
 
	public String getInstanceId() {
		return instanceId;
	}
 
	public void setInstanceId(String instanceId) {
		this.instanceId = instanceId;
	}
 
	public String getIp() {
		return ip;
	}
 
	public void setIp(String ip) {
		this.ip = ip;
	}
 
	public String getRegion() {
		return region;
	}
 
	public void setRegion(String region) {
		this.region = region;
	}
 
	public String getWeight() {
		return weight;
	}
 
	public void setWeight(String weight) {
		this.weight = weight;
	}
    
     
}


XML-File.txt
Objects, CEHJ... anyone???????????? :(
You need to organise your parsing based on the DTD or schema of your content
Hi CEHJ,
                    Thanks for your response :)


                    Could you tell me, some kind of implementation to store the attributes?

                    Like with this tag,
                     <email type="business">gonzo@muppets.com</email>

                    The example was storing it in a HashMap as key and value,

                  whereas my tags are like:

                    <sample tag name="customer" type="business">

                   
You need to organise the relationship between the root parent and all the nested children first. The attributes are just attributes of those nested children
Hmmm.. ok
sorry, been flat out with student assignments. Where are you at with this?

Oh thanks Objects, I have been waiting for your reply for the last 3 days :)

    Please see my last post, where I posted the code along with the XML.
   That's where I am.
>          and I guess I cant store this in hashMap... how do I go about this. I have attached the code I did till now.

You can store it however you want. Once you create your bean(s) to store your data, you just set up your mappings to call your bean methods as required to store the necessary data

Hmmm.. it actually went over my head :)

what I meant was
 
         with this type of email:
 
         <email type="business">gonzo@muppets.com</email>

        the example is storing it as key value,  business = gonzo@....

but my tags are like:

        <sample tag name="customer" type="business">


        How do I store this...?

       


 
That's up to you :) You create the Java beans to store your data; once you know how the data needs to be stored, you can map which calls need to be made.

I am so sorry if I am acting real dumb (which I know , I am :-) )

this is how the example puts the value in HashMap

public void addEmail(String type, String address) {
      emails.put(type, address);
  }

In my case,
             
               in the code below.. all these strings are attributes of one tag.
               How do I put that in an ArrayList?

               Wow I am confused here.. Sorry :)



 
   

public class AgentMetaData {
	 
	private String agentId;
	private String instanceId;
	private String country;
	private String description;
	private String region;
	private String ip;
	private String backBone;
	private String weight;
	private String city;


They look like simple properties, in which case you would just call the property's setter method
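
To make "call the property's setter" concrete, here is a minimal sketch that copies the attributes of each `<AGENT_META_DATA>` element onto a bean through its setters and collects the beans in an `ArrayList`. It uses plain JDK SAX instead of Digester so that it runs without extra libraries; in Digester the same job is normally done by `addObjectCreate` plus `addSetProperties`. `AgentSetterSketch` is a hypothetical name, and the bean is a trimmed-down version of the `AgentMetaData` class posted above.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

class AgentSetterSketch {

    // Trimmed-down version of the AgentMetaData bean from this thread.
    static class AgentMetaData {
        private String agentId;
        private String instanceId;
        private String weight;

        public String getAgentId() { return agentId; }
        public void setAgentId(String agentId) { this.agentId = agentId; }
        public String getInstanceId() { return instanceId; }
        public void setInstanceId(String instanceId) { this.instanceId = instanceId; }
        public String getWeight() { return weight; }
        public void setWeight(String weight) { this.weight = weight; }
    }

    static List<AgentMetaData> parse(String xml) {
        List<AgentMetaData> agents = new ArrayList<>();
        DefaultHandler handler = new DefaultHandler() {
            @Override
            public void startElement(String uri, String localName, String qName, Attributes attrs) {
                if (qName.equals("AGENT_META_DATA")) {
                    AgentMetaData agent = new AgentMetaData();
                    // One setter call per attribute; getValue returns null if the attribute is absent.
                    agent.setAgentId(attrs.getValue("agent_id"));
                    agent.setInstanceId(attrs.getValue("instance_id"));
                    agent.setWeight(attrs.getValue("weight"));
                    agents.add(agent);
                }
            }
        };
        try {
            SAXParserFactory.newInstance().newSAXParser()
                    .parse(new InputSource(new StringReader(xml)), handler);
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not parse XML", e);
        }
        return agents;
    }

    public static void main(String[] args) {
        String xml = "<TXN_DATA_FEED><TXN_META_DATA agreement_id=\"11111\">"
                + "<AGENT_META_DATA agent_id=\"35162\" instance_id=\"37203\" weight=\"1\"/>"
                + "</TXN_META_DATA></TXN_DATA_FEED>";
        for (AgentMetaData agent : parse(xml)) {
            System.out.println(agent.getAgentId() + " " + agent.getInstanceId() + " " + agent.getWeight());
        }
    }
}
```

Note that the attribute names (`agent_id`) differ from the bean property names (`agentId`), so any automatic name matching would need an explicit attribute-to-property mapping.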

ok, I will work on it from home..

Please keep an eye on this question Objects :-)
will do, though I'm dealing with a lot of questions at the moment, ping me if I don't get back to you straight away :)

Thanks :)
but how do I ping you. Dont have your email :(

can we do that through twitter?
yes that'll work, or see my EE profile page

oki doki, Thanks a ton Mick :)
>>Below is the code for first tag, i.e. <AGENT_META_DATA>

It's not the first tag actually. It has a parent of TXN_META_DATA, which in turn has a parent.

As I mentioned above, before you start coding anything, you need to work out not only the classes but also the containment hierarchy based on the DTD of your XML. If it doesn't have a DTD, you need to construct one, at least mentally. Once you've done that, you can start coding the entities and the mechanisms for their containment. Only *then* do you need to construct the Digester rules
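
One way to see the containment hierarchy before writing any bean classes is to build a generic tree of elements with a stack, which is also roughly how Digester tracks parents and children on its object stack. The sketch below uses plain JDK SAX; `ContainmentSketch` and `Node` are hypothetical names introduced only for illustration.

```java
import java.io.StringReader;
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

class ContainmentSketch {

    // Generic element: a name, its attributes and its nested children.
    static class Node {
        final String name;
        final Map<String, String> attributes = new LinkedHashMap<>();
        final List<Node> children = new ArrayList<>();

        Node(String name) {
            this.name = name;
        }
    }

    static Node parse(String xml) {
        Node document = new Node("#document");   // synthetic parent of the root element
        Deque<Node> stack = new ArrayDeque<>();
        stack.push(document);
        DefaultHandler handler = new DefaultHandler() {
            @Override
            public void startElement(String uri, String localName, String qName, Attributes attrs) {
                Node node = new Node(qName);
                for (int i = 0; i < attrs.getLength(); i++) {
                    node.attributes.put(attrs.getQName(i), attrs.getValue(i));
                }
                stack.peek().children.add(node);  // attach to the element currently open
                stack.push(node);                 // this element is now the current parent
            }

            @Override
            public void endElement(String uri, String localName, String qName) {
                stack.pop();                      // back to the enclosing element
            }
        };
        try {
            SAXParserFactory.newInstance().newSAXParser()
                    .parse(new InputSource(new StringReader(xml)), handler);
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not parse XML", e);
        }
        return document.children.get(0);
    }

    // Prints the hierarchy with two spaces of indentation per nesting level.
    static void print(Node node, String indent) {
        System.out.println(indent + node.name + " " + node.attributes);
        for (Node child : node.children) {
            print(child, indent + "  ");
        }
    }

    public static void main(String[] args) {
        print(parse("<TXN_DATA_FEED><TXN_META_DATA agreement_id=\"11111\">"
                + "<SLOT_META_DATA slot_id=\"762\"><PAGE_META_DATA page_seq=\"1\"/></SLOT_META_DATA>"
                + "</TXN_META_DATA></TXN_DATA_FEED>"), "");
    }
}
```

Each distinct element name in the printed tree is a candidate bean class, and each parent/child edge is a candidate `add...` method on the parent.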
Your xml is reasonably complex and it might be better to use a different xml-bean mapping technology or you could be in for quite a lot of work. Whichever method you use will require you to start with a schema/dtd (at least notional) so that's your first focus of effort. Once you've done that, you could try XMLBeans, which should construct all your mappings for you, given a schema. See

http://www.theserverside.com/discussions/thread.tss?thread_id=44668
Thanks for the info CEHJ,

                     now I am really confused and stuck. Should I use XML beans, Commons digester or the other EE link you gave me :(

   
Probably XMLBeans but keep your options open for now. Start with constructing your schema for your xml
Hmm.. ok, let me try to come up with schema and post it here.. Let me know if I am heading the right way
CEHJ,
              I have the vendor document for that XML file.

              In the document, I have got :

             - Database Schema
         
             - Aggregation Logic

             - Data feed XML output - Annotated DTD.

               Will any of these be helpful?

           
>>- Data feed XML output - Annotated DTD.

is the one
ohk... Let me type that in real quick and I will post it here, Please guide me how to go ahead after that
hi CEHJ,
                    This is the DTD for the first part of XML. How do I start from here?

<!ELEMENT TXN_DATA_FEED (TXN_META_DATA?, DP_TXN_MEASUREMENTS?)>
<!ELEMENT TXN_META_DATA (AGENT_META_DATA+, SLOT_META_DATA+, PROFILE_META_DATA*)>
<!ATTLIST TXN_META_DATA agreement_id CDATA #REQUIRED>          <!-- recommended, measurement meta table -->
<!ELEMENT AGENT_META_DATA EMPTY>
<!ATTLIST AGENT_META_DATA agent_id CDATA #REQUIRED>            <!-- required, agent meta table -->
<!ATTLIST AGENT_META_DATA description CDATA #REQUIRED>         <!-- recommended, agent meta table -->
<!ATTLIST AGENT_META_DATA weight CDATA #REQUIRED>              <!-- not recommended, deprecated field -->
<!ATTLIST AGENT_META_DATA ip CDATA #REQUIRED>                  <!-- recommended, agent meta table -->
<!ATTLIST AGENT_META_DATA backbone CDATA #REQUIRED>            <!-- recommended, agent meta table -->
<!ATTLIST AGENT_META_DATA instance_id CDATA #REQUIRED>         <!-- required, agent meta table -->
<!ATTLIST AGENT_META_DATA city CDATA #IMPLIED>                 <!-- recommended, agent meta table -->


<!ELEMENT PROFILE_META_DATA EMPTY>
<!ATTLIST PROFILE_META_DATA profile_id CDATA #REQUIRED>        <!-- recommended, profile meta data -->
<!ATTLIST PROFILE_META_DATA profile_name CDATA #REQUIRED>      <!-- recommended, profile meta data -->
<!ATTLIST PROFILE_META_DATA access_type CDATA #REQUIRED>       <!-- recommended, profile meta data -->
<!ATTLIST PROFILE_META_DATA access_sub_type CDATA #REQUIRED>   <!-- recommended, profile meta data -->


<!ELEMENT SLOT_META_DATA (PAGE_META_DATA+)>                    <!-- recommended, profile meta data -->
<!ATTLIST SLOT_META_DATA slot_id CDATA #REQUIRED>              <!-- required, measurement meta table -->
<!ATTLIST SLOT_META_DATA slot_alias CDATA #REQUIRED>           <!-- required, measurement meta table -->
<!ATTLIST SLOT_META_DATA pages CDATA #REQUIRED>                <!-- recommended, measurement meta table -->
<!ATTLIST SLOT_META_DATA subservices CDATA #REQUIRED>          <!-- recommended, measurement meta table -->

<!ELEMENT PAGE_META_DATA EMPTY>
<!ATTLIST PAGE_META_DATA page_seq CDATA #REQUIRED>             <!-- recommended, measurement page meta table -->
<!ATTLIST PAGE_META_DATA page_alias CDATA #REQUIRED>           <!-- recommended, measurement page meta table -->
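
A quick way to check that a hand-typed DTD and the real XML agree is to let the JDK parser validate a document against it. The sketch below is only an illustration under assumptions: it embeds a cut-down internal DTD subset covering just `SLOT_META_DATA` and `PAGE_META_DATA` with one required attribute each, not the vendor's full DTD, and `DtdCheckSketch` is a hypothetical class name.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.xml.sax.InputSource;
import org.xml.sax.SAXParseException;
import org.xml.sax.helpers.DefaultHandler;

class DtdCheckSketch {

    // Assumption: a reduced DTD for illustration, placed in the internal subset
    // so that no external file is needed.
    static final String DOCTYPE =
            "<!DOCTYPE SLOT_META_DATA [\n"
            + "<!ELEMENT SLOT_META_DATA (PAGE_META_DATA+)>\n"
            + "<!ATTLIST SLOT_META_DATA slot_id CDATA #REQUIRED>\n"
            + "<!ELEMENT PAGE_META_DATA EMPTY>\n"
            + "<!ATTLIST PAGE_META_DATA page_seq CDATA #REQUIRED>\n"
            + "]>\n";

    // Returns true if the fragment parses and is valid against the DTD above.
    // Any parse or validation problem is reported simply as false.
    static boolean isValid(String xmlBody) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setValidating(true);   // turn on DTD validation
            DocumentBuilder builder = factory.newDocumentBuilder();
            builder.setErrorHandler(new DefaultHandler() {
                @Override
                public void error(SAXParseException e) throws SAXParseException {
                    throw e;               // validity errors are only reported, so rethrow them
                }
            });
            builder.parse(new InputSource(new StringReader(DOCTYPE + xmlBody)));
            return true;
        } catch (Exception e) {
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println(isValid(
                "<SLOT_META_DATA slot_id=\"762\"><PAGE_META_DATA page_seq=\"1\"/></SLOT_META_DATA>"));
        System.out.println(isValid("<SLOT_META_DATA slot_id=\"762\"/>"));
    }
}
```

The second call in `main` omits the required `PAGE_META_DATA` child, so it is the kind of mismatch this check is meant to surface.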


> Should I use XML beans, Commons digester

Digester. I can't understand the reasoning for suggesting XMLBeans; it's not really suited to what you are doing, which probably explains why you are having such issues getting it to work.

hmmmm  :-(


have you defined the data structure that you want to copy the xml data into yet?

No, that's what I don't know.
Should I store everything in an ArrayList or a Vector?
depends on how it is going to be used, for example if you are going to need to look up something by key then use a Map

by the looks you will have a variety of different bean classes
eg. TxnMetaData, PageMetaData, AgentMetaData ....

Start by identifying the different types of data you will be receiving and the relationships between them
eg. TxnMetaData contains AgentMetaData and SlotMetaData ...
SlotMetaData contains a collection of PageMetaData
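
The relationships described above can be sketched as plain bean classes. This is one possible shape, not the only one: the class names follow the suggestion in this post, while the choice of fields, the `add...` method names and the wrapper class `MetaDataModel` are assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

class MetaDataModel {

    static class PageMetaData {
        private final String pageAlias;
        private final String pageSeq;

        PageMetaData(String pageAlias, String pageSeq) {
            this.pageAlias = pageAlias;
            this.pageSeq = pageSeq;
        }

        public String getPageAlias() { return pageAlias; }
        public String getPageSeq() { return pageSeq; }
    }

    // SlotMetaData contains a collection of PageMetaData.
    static class SlotMetaData {
        private final String slotId;
        private final List<PageMetaData> pages = new ArrayList<>();

        SlotMetaData(String slotId) {
            this.slotId = slotId;
        }

        public String getSlotId() { return slotId; }
        public void addPage(PageMetaData page) { pages.add(page); }
        public List<PageMetaData> getPages() { return Collections.unmodifiableList(pages); }
    }

    static class AgentMetaData {
        private final String agentId;
        private final String instanceId;

        AgentMetaData(String agentId, String instanceId) {
            this.agentId = agentId;
            this.instanceId = instanceId;
        }

        public String getAgentId() { return agentId; }
        public String getInstanceId() { return instanceId; }
    }

    // TxnMetaData contains AgentMetaData and SlotMetaData.
    static class TxnMetaData {
        private final String agreementId;
        private final List<AgentMetaData> agents = new ArrayList<>();
        private final List<SlotMetaData> slots = new ArrayList<>();

        TxnMetaData(String agreementId) {
            this.agreementId = agreementId;
        }

        public String getAgreementId() { return agreementId; }
        public void addAgent(AgentMetaData agent) { agents.add(agent); }
        public void addSlot(SlotMetaData slot) { slots.add(slot); }
        public List<AgentMetaData> getAgents() { return Collections.unmodifiableList(agents); }
        public List<SlotMetaData> getSlots() { return Collections.unmodifiableList(slots); }
    }

    // Builds the structure of the sample XML from the question by hand.
    static TxnMetaData sample() {
        TxnMetaData txn = new TxnMetaData("11111");
        txn.addAgent(new AgentMetaData("35162", "37203"));
        SlotMetaData slot = new SlotMetaData("762");
        slot.addPage(new PageMetaData("Dummy 1", "1"));
        slot.addPage(new PageMetaData("Dummy 2", "2"));
        slot.addPage(new PageMetaData("Dummy 3", "3"));
        txn.addSlot(slot);
        return txn;
    }

    public static void main(String[] args) {
        TxnMetaData txn = sample();
        System.out.println(txn.getAgents().size() + " agent(s), "
                + txn.getSlots().get(0).getPages().size() + " page(s) in slot "
                + txn.getSlots().get(0).getSlotId());
    }
}
```

Lists are used here because the sample repeats elements in document order; a `Map` only becomes necessary once something has to be looked up by key.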


Ohk.. thanks for guidance. something got cleared..
but Mick, lets just consider agent_meta_data for now... it has the following attributes:

 agentId;
  instanceId;
       country;
       description;
 region;
      ip;
       backBone;
       weight;
       city;

         and lets imagine, the agent_id attribute is one which is required by another tag say txn_meta_data.
           
             so in this case, how do I store it as a map? what would be the key here?
Looks like the Agent is a child of the Meta, so when you extract each Agent instance it would get added to the Meta parent.
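
On the "what would be the key" question, one option is to keep the agents in a `Map` on the parent and build the key from the attributes that identify an agent. The sketch below assumes that the pair `agent_id` plus `instance_id` is that identity, which is a guess rather than something the feed documentation in this thread states. `AgentIndexSketch` is a hypothetical class.

```java
import java.util.LinkedHashMap;
import java.util.Map;

class AgentIndexSketch {

    static class AgentMetaData {
        final String agentId;
        final String instanceId;
        final String weight;

        AgentMetaData(String agentId, String instanceId, String weight) {
            this.agentId = agentId;
            this.instanceId = instanceId;
            this.weight = weight;
        }
    }

    // Assumption: agent_id and instance_id together identify an agent.
    private static String key(String agentId, String instanceId) {
        return agentId + "/" + instanceId;
    }

    private final Map<String, AgentMetaData> agents = new LinkedHashMap<>();

    // Adding an agent with an existing key replaces the earlier entry.
    void add(AgentMetaData agent) {
        agents.put(key(agent.agentId, agent.instanceId), agent);
    }

    // Returns null if no agent with that id pair has been added.
    AgentMetaData find(String agentId, String instanceId) {
        return agents.get(key(agentId, instanceId));
    }

    int size() {
        return agents.size();
    }

    public static void main(String[] args) {
        AgentIndexSketch index = new AgentIndexSketch();
        // The sample XML in the question lists the same agent twice.
        index.add(new AgentMetaData("35162", "37203", "1"));
        index.add(new AgentMetaData("35162", "37203", "1"));
        System.out.println(index.size() + " distinct agent(s)");
        System.out.println("weight = " + index.find("35162", "37203").weight);
    }
}
```

A side effect of keying by id is that the duplicated `<AGENT_META_DATA>` entries in the sample collapse into one, which may or may not be what is wanted.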

Big confusion :)
Sorry for acting so dumb.

I will start working on this as well.. and keep bugging u :)
You should really get the people who generate this xml to supply the schema/dtd rather than trying to reverse engineer it
no need
Hi folks,
                At last I have trimmed my XML down to the bare minimum.
 
                I hope the requirement will now be clear. One of the tables required from this XML is:

 
SUMMARY data table: (field name, type, length, * denotes inclusion in primary key)  
*SLOT_ID, number, 15  
*AGENT_ID, number, 15  
*AGENT_INSTANCE_ID, number, 15  
*DATETIME, date/time  
DELTA_MSEC, number, 7  
DELTA_USER_MSEC, number, 7  
ERROR_CODE, number, 7  
CONTENT_ERRS, number, 7  
PROFILE_ID, number, 9  
PHONE, number, 20  
SETUP_MSEC, number, 7  
NUM_CON_ATTEMPTS, number, 3  
CONNECTION_SPEED, number, 7  

 

               

<TXN_DATA_FEED>
	<TXN_META_DATA agreement_id="">
		<AGENT_META_DATA region="" ip="" backbone="" city="" weight="" agent_id="" description="" instance_id="" country=""/>
		<SLOT_META_DATA subservice="" slot_alias="" pages="" slot_id="">
			<PAGE_META_DATA page_alias="" page_seq=""/>
		</SLOT_META_DATA>
		<PROFILE_META_DATA access_sub_type="" profile_id="" access_type="" provider="" profile_name=""/>
	</TXN_META_DATA>
<DP_TXN_MEASUREMENTS>
	<TXN_MEASUREMENT datetime="" agent_inst="" agent="" profile="" slot="" target="">
		<TXN_SUMMARY element_count="" resp_bytes="" estimated_cache_delta_msec="" trans_level_comp_msec="" content_errors="" delta_msec="" delta_user_msec=""/>
		<TXN_PAGE page_seq="">
			<TXN_PAGE_PERFORMANCE first_byte_msec="" remain_packets_delta="" request_delta="" estimated_cache_delta_msec="" system_delta="" start_msec="" first_packet_delta="" delta_msec="" delta_user_msec=""/>
			<TXN_PAGE_OBJECT element_count="" page_bytes=""/>
			<TXN_PAGE_STATUS content_errors=""/>
			<TXN_PAGE_DETAILS page="">
				<TXN_BASE_PAGE record_seq="">
					<TXN_DETAIL_PERFORMANCE remain_packets_delta="" request_delta="" system_delta="" first_packet_delta="" element_delta=""/>
					<TXN_DETAIL_OBJECT request_bytes="" content_bytes="" content_type="" conn_string_text="" ip_address="" msmt_conn_id="" header_bytes="" element_cached="" object_text=""/>
					<TXN_DETAIL_STATUS status_code=""/>
				</TXN_BASE_PAGE>
			</TXN_PAGE_DETAILS>
		</TXN_PAGE>
	</TXN_MEASUREMENT>
</DP_TXN_MEASUREMENTS>
</TXN_DATA_FEED>
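
To connect the XML above with the SUMMARY table, here is a hedged sketch that produces one row per `<TXN_MEASUREMENT>`. The attribute-to-column mapping is an assumption inferred from the names only (`slot` to SLOT_ID, `agent` to AGENT_ID, `agent_inst` to AGENT_INSTANCE_ID, `profile` to PROFILE_ID, `content_errors` to CONTENT_ERRS, and so on); the vendor document should be checked before relying on it. It uses plain JDK SAX, and `SummaryRowSketch` is a hypothetical class name.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

class SummaryRowSketch {

    // Returns one column-name -> value map per TXN_MEASUREMENT element.
    static List<Map<String, String>> parse(String xml) {
        List<Map<String, String>> rows = new ArrayList<>();
        DefaultHandler handler = new DefaultHandler() {
            private Map<String, String> row;   // row for the TXN_MEASUREMENT currently open

            @Override
            public void startElement(String uri, String localName, String qName, Attributes attrs) {
                if (qName.equals("TXN_MEASUREMENT")) {
                    row = new LinkedHashMap<>();
                    // Assumed mapping of attributes to key columns.
                    row.put("SLOT_ID", attrs.getValue("slot"));
                    row.put("AGENT_ID", attrs.getValue("agent"));
                    row.put("AGENT_INSTANCE_ID", attrs.getValue("agent_inst"));
                    row.put("DATETIME", attrs.getValue("datetime"));
                    row.put("PROFILE_ID", attrs.getValue("profile"));
                } else if (qName.equals("TXN_SUMMARY") && row != null) {
                    // Assumed mapping of the nested summary attributes.
                    row.put("DELTA_MSEC", attrs.getValue("delta_msec"));
                    row.put("DELTA_USER_MSEC", attrs.getValue("delta_user_msec"));
                    row.put("CONTENT_ERRS", attrs.getValue("content_errors"));
                }
            }

            @Override
            public void endElement(String uri, String localName, String qName) {
                if (qName.equals("TXN_MEASUREMENT")) {
                    rows.add(row);             // the row is complete once the element closes
                    row = null;
                }
            }
        };
        try {
            SAXParserFactory.newInstance().newSAXParser()
                    .parse(new InputSource(new StringReader(xml)), handler);
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not parse XML", e);
        }
        return rows;
    }

    public static void main(String[] args) {
        // Hypothetical values; the thread only shows empty attributes.
        String xml = "<TXN_DATA_FEED><DP_TXN_MEASUREMENTS>"
                + "<TXN_MEASUREMENT datetime=\"2009-03-12 00:00:00\" agent_inst=\"37203\""
                + " agent=\"35162\" profile=\"1\" slot=\"762\" target=\"\">"
                + "<TXN_SUMMARY content_errors=\"0\" delta_msec=\"1200\" delta_user_msec=\"900\"/>"
                + "</TXN_MEASUREMENT></DP_TXN_MEASUREMENTS></TXN_DATA_FEED>";
        for (Map<String, String> row : parse(xml)) {
            System.out.println(row);
        }
    }
}
```

Columns such as ERROR_CODE, PHONE or SETUP_MSEC are left out because the trimmed XML does not show an obvious source attribute for them.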


so no repeating elements?

Well There are multiple tags , but this is the most trimmed down version.
and that is the table that I want to extract out of it :-(
I meant multiple instances of the same tag. for example in the xml you originally posted some tags were repeated.

Yes... Tags are repeated :-)


have you made a start on the code?
I have a lot of demands on my time these days and only have the time to guide, especially at EE (sorry)
No I understand :-)
I have started the code... just guide me wherever I go wrong.. will post my version soon :-)

Thanks :)
just take it slow, start at the top level and gradually add new attributes one at a time, making sure its working before moving to the next
Cool... thanks Mick. Your help is always awesome :-)
Aman, regarding your comment at http:#24319192 : this throws a different light on things - are you expecting to write these details to a database?
Hi Charles,
                       Actually the requirement is to make three tables out of that XML file, and that table is one of them :-)
Oh really? if we use XMLspy to generate DB Schema, can we automate that process to run everyday?

Another thing, Can I specify the fields I want in my Table using XMLspy :)
I don't know the finer details unfortunately, but the XMLSpy docs should be able to tell you. You might also want to consider the issue of object-relational mapping too, e.g. Hibernate or some such
Oh OK... but it is doable, right?
I am working with Commons Digester and now, after this schema, I will work with your suggested XMLBeans.
Hmmm.. what exactly is object unmarshalling, Charles?

And thank you very much for that schema. You made it look so simple now :)
O ok.