Solved

awk - xml parsing

Posted on 2010-11-13
Last Modified: 2012-05-10
I want to print the name of the tag that follows <Transaction> in the attached file. That is, I want to print "AirAvailability_12". I did the following:

awk '
BEGIN {
FS="[<>]";
n=0;
}
{
   if ( $2 == "Transaction") n=1;
   if ( n == 2 ) {
   print $2;
   exit;
   }
   if ( n == 1) n++;
}'

This works on s.txt but not on s1.txt (both files attached).

Any thoughts? I would appreciate it if someone could point out what I am missing.
s.txt
s1.txt
Question by:vignesh_prabhu
7 Comments
 

Author Comment

by:vignesh_prabhu
ID: 34127943
s.txt and s1.txt have the same contents. Only formatting differs.
 
LVL 68

Expert Comment

by:woolmilkporc
ID: 34127989
That's because in s1.txt there is no linefeed between <Transaction> and <AirAvailability_12>.

You'll have to do something like

awk '{FS="[<>]"; if($6 == "Transaction") print $8}' s1.txt
 

Author Comment

by:vignesh_prabhu
ID: 34128034
I am sorry but I do not see any output when I execute

awk '{FS="[<>]"; if($6 == "Transaction") print $8}' s1.txt

 
LVL 68

Accepted Solution

by:woolmilkporc (earned 500 total points)
ID: 34128338
I am the one who has to be sorry: I overlooked that there are no linefeeds at all!

So please try

awk 'BEGIN {FS="[<>]"} {for (i=1;i<=NF;i++) if($i=="Transaction") print $(i+2)}'
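For anyone wondering why $(i+2) is the right offset: with FS="[<>]" every < and > is a separator, so two adjacent tags leave an empty field between them. FS is also set in BEGIN here, so it already applies to the first (and only) line of the file; an FS assigned inside the main block normally takes effect only from the next record. A small sketch that prints the fields awk sees, assuming a made-up one-line sample (the sample string and the show_fields name are not from the attached files):

```shell
# Hypothetical one-line sample; the real s1.txt is not shown in this thread.
sample='<Request><Transaction><AirAvailability_12></AirAvailability_12></Transaction></Request>'

# Print every field with its index, so the empty fields between tags are visible.
show_fields() {
    awk 'BEGIN { FS = "[<>]" } { for (i = 1; i <= NF; i++) printf "%d=[%s]\n", i, $i }'
}

printf '%s\n' "$sample" | show_fields

# The one-liner from above, applied to the same sample.
printf '%s\n' "$sample" |
    awk 'BEGIN {FS="[<>]"} {for (i=1;i<=NF;i++) if($i=="Transaction") print $(i+2)}'
```

In this sample "Transaction" is field 4 and "AirAvailability_12" is field 6, with an empty field 5 in between. On a file with one tag per line, the field after <Transaction> on the same line is empty, so the one-liner would print an empty line there.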


wmp
 
LVL 77

Expert Comment

by:arnold
ID: 34128727
To keep the awk script uniform, you could convert the single-line XML file s1.txt into the same layout as s.txt.
That is, write a Perl script that reformats every incoming XML file into a common format: you define which kinds of entries must be on a line by themselves, which entries keep their opening tag, value, and closing tag together, and so on.

Are there multiple processes that generate these XML files?

IMHO it is easier to give the input file a uniform layout than to come up with a script that matches every variation.
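
The reformatting step does not have to be Perl. A rough sketch in awk, assuming the tags in s1.txt are directly adjacent with no whitespace between > and < (the sample line and the function name one_tag_per_line are made up for illustration):

```shell
# Break the line between every adjacent "><" pair so each tag starts a new line.
one_tag_per_line() {
    awk '{ gsub(/></, ">\n<"); print }' "$@"
}

# Hypothetical one-line sample; the real s1.txt is not shown in this thread.
sample='<Request><Transaction><AirAvailability_12><Origin>XYZ</Origin></AirAvailability_12></Transaction></Request>'

# After reformatting, the line-based script from the question can be used unchanged.
printf '%s\n' "$sample" | one_tag_per_line |
awk '
BEGIN {
FS="[<>]";
n=0;
}
{
   if ( $2 == "Transaction") n=1;
   if ( n == 2 ) {
   print $2;
   exit;
   }
   if ( n == 1) n++;
}'
```

This only normalizes the line breaks. It is not a real XML formatter, and it would miss tags that are separated by spaces or tabs.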
 
LVL 7

Expert Comment

by:Hatrix76
ID: 34129152
May I propose a different approach than awk?

xpath will deliver the first element after Transaction (with all its children), so cut out the first element shown and you have the name of the element directly following Transaction. It does not matter how the XML is formatted, or how often Transaction occurs in the XML:

xpath file.xml "//Transaction/*[1]" 2>/dev/null | sed -e 's/^<\([^>]*\)>.*/\1/g'


explanation:
"//Transaction/*[1]"  <- XPath query that selects the first child element of every Transaction element

2>/dev/null <- xpath writes some additional output to stderr; discard it

sed -e  's/^<\([^>]*\)>.*/\1/g' <- cut out the name of the first element to display it
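
The sed step can be tried on its own without xpath installed. A minimal sketch, assuming xpath returns the matched element on a single line that starts with its opening tag (the sample line and the function name first_tag_name are made up):

```shell
# Hypothetical single line of xpath output; the real output is not shown in this thread.
node='<AirAvailability_12><Origin>XYZ</Origin></AirAvailability_12>'

# Keep only what stands between the first "<" and the first ">" of the line.
first_tag_name() {
    sed -e 's/^<\([^>]*\)>.*/\1/'
}

printf '%s\n' "$node" | first_tag_name
```

If the opening tag carries attributes they are kept as well, and if the output spans several lines, each line that starts with a tag is rewritten, so piping through head -n 1 may be needed.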


xpath should be available or easily installable on any unix
best
Ray
 

Author Comment

by:vignesh_prabhu
ID: 34130842
arnold - Yes, the XML needs to be formatted consistently for the script to work. Unfortunately, the XML files always arrive in the s1.txt format. To be on the safe side, I have slightly modified woolmilkporc's code as shown below. This works for both s.txt and s1.txt.

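awk '
BEGIN {
    FS = "[<>]"
}
{
    for (i = 1; i <= NF; i++) {
        if ($i != "") {
            # Remember that <Transaction> was seen, then move on to the next field.
            if ($i == "Transaction") {
                j = 1
                continue
            }
            # First non-empty field after <Transaction>: print it and stop.
            if (j == 1) {
                print $i
                exit
            }
        }
    }
}'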
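
One caveat with the version above: a field that contains only spaces or tabs is not equal to "", so if the tag after <Transaction> sits indented on its own line, the indentation itself would be printed. A possible variant, assuming whitespace-only fields should be skipped as well (the function name tag_after_transaction and the sample inputs are made up):

```shell
# Print the name of the first tag that follows <Transaction>,
# whether the tags share one line or sit on separate, possibly indented lines.
tag_after_transaction() {
    awk 'BEGIN { FS = "[<>]" }
    {
        for (i = 1; i <= NF; i++) {
            if ($i ~ /^[ \t]*$/) continue                     # skip empty and whitespace-only fields
            if ($i == "Transaction") { found = 1; continue }  # remember the match
            if (found) { print $i; exit }                     # next real field is the tag name
        }
    }' "$@"
}

# Hypothetical inputs in both layouts; the attached files are not shown in this thread.
printf '<Request>\n  <Transaction>\n    <AirAvailability_12>\n' | tag_after_transaction
printf '<Request><Transaction><AirAvailability_12>\n' | tag_after_transaction
```

Like the original, this treats any text between <Transaction> and the next tag as a field too, so it relies on the two tags being separated by nothing but whitespace.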

Thanks woolmilkporc.

hatrix76 - Thanks for the suggestion. Unfortunately, I do not have xpath installed on our servers. I have asked the admin to install it. Until then I have to go with the awk variant.


Question has a verified solution.
