Solved

Parse html tags using awk sed

Posted on 2009-04-13
Last Modified: 2013-12-26
Can someone please help with ideas on how to parse HTML tags using sed and awk on Linux?

For example,

test.html:
<TS>SOMETHING</TS><TD>EXAMPLE</TD>

My script should output just the text inside a given tag:

myscript TD
EXAMPLE  
myscript TS
SOMETHING

Any help is very much appreciated.

Many thanks,
Krish
Question by:Jkrish
 
LVL 48

Accepted Solution

by:
Tintin earned 250 total points
ID: 24132305
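sed and awk aren't suitable tools for parsing HTML in general: they work line by line with regular expressions and know nothing about nesting, attributes, or tags that span several lines.

*If* your HTML is consistently formatted as per the example above, then you can do
#!/bin/sh
# Print only the lines on which the tag was found (-n plus the p flag);
# without them, sed also echoes every non-matching line unchanged.
sed -n "s/.*<$1>\(.*\)<\/$1>.*/\1/p" test.html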
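One limit of the one-line sed approach is worth seeing concretely: because `.*` is greedy, a line that holds the same tag twice yields only the last occurrence. The sketch below is one possible way around that with awk, not part of the original answer. The function name `extract_all` and the file argument are hypothetical, and it still assumes no attributes, no nesting, and that each tag opens and closes on the same line.

```shell
#!/bin/sh
# extract_all TAG FILE (hypothetical helper, not from the thread):
# print the text of every <TAG>...</TAG> pair, one result per line.
extract_all() {
    TAG=$1 awk '
    BEGIN {
        otag = "<"  ENVIRON["TAG"] ">"    # opening tag, e.g. <TD>
        ctag = "</" ENVIRON["TAG"] ">"    # closing tag, e.g. </TD>
    }
    {
        rest = $0
        # Walk the line left to right, consuming one tag pair per iteration.
        while ((s = index(rest, otag)) > 0) {
            rest = substr(rest, s + length(otag))
            e = index(rest, ctag)
            if (e == 0) break             # unclosed tag: stop on this line
            print substr(rest, 1, e - 1)
            rest = substr(rest, e + length(ctag))
        }
    }' "$2"
}

# Demonstration on a throwaway file with two TD pairs on one line.
tmp=$(mktemp)
printf '%s\n' '<TD>A</TD><TD>B</TD>' > "$tmp"
sed -n "s/.*<TD>\(.*\)<\/TD>.*/\1/p" "$tmp"   # prints only: B
extract_all TD "$tmp"                          # prints A, then B
rm -f "$tmp"
```

Passing the tag name through the environment (`ENVIRON["TAG"]`) keeps it out of the awk program text, so no shell quoting tricks are needed inside the single-quoted script.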

 
LVL 7

Assisted Solution

by:Murugesan Nagarajan
Murugesan Nagarajan earned 250 total points
ID: 24792939

A sample shell script using awk and sed is attached.


test.txt
 
LVL 7

Expert Comment

by:Murugesan Nagarajan
ID: 24986388
#!/bin/sh
# Same as the previously attached file (test.txt)
echo "OUTPUT FROM awk:"
PARAM=$1
export PARAM
# awk reads the tag name from the environment variable PARAM and displays
# the string that appears between <$PARAM> and </$PARAM> on each line.
awk 'BEGIN {
    otag = "<"  ENVIRON["PARAM"] ">"     # opening tag, e.g. <TD>
    ctag = "</" ENVIRON["PARAM"] ">"     # closing tag, e.g. </TD>
}
{
    s = index($0, otag)
    if (s == 0) next                     # no opening tag on this line
    rest = substr($0, s + length(otag))  # text after the opening tag
    e = index(rest, ctag)
    if (e > 0) print substr(rest, 1, e - 1)
}' test.html
echo "


"

echo "OUTPUT FROM sed:"
sed -n "s/.*<$1>\(.*\)<\/$1>.*/\1/p" test.html
#      In sed, replace
#            .*<$1>\(.*\)<\/$1>.*
#            (any characters, then <$1>, then the captured text, then </$1>, then any characters)
#      with
#            \1
#            (the captured text that appears between <$1> and </$1>).
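Both commands above expect the opening tag to be written exactly as `<TAG>`. Real HTML often carries attributes, for example `<TD class="cell">`, and the pattern then fails to match. A minimal sketch of a more tolerant sed pattern follows. The function name `extract_tag` and the file argument are hypothetical, and it still assumes one pair of the given tag per line with no nested markup inside it.

```shell
#!/bin/sh
# extract_tag TAG FILE (hypothetical helper, not from the thread):
# print the text inside <TAG ...>text</TAG>, allowing optional attributes.
extract_tag() {
    # \( [^>]*\)\{0,1\}  optional attributes: a space, then anything up to ">"
    # \([^<]*\)          the tag body, stopping at the next "<" (group 2)
    sed -n "s/.*<$1\( [^>]*\)\{0,1\}>\([^<]*\)<\/$1>.*/\2/p" "$2"
}

# Demonstration on a throwaway file.
tmp=$(mktemp)
printf '%s\n' '<TS>SOMETHING</TS><TD class="cell">EXAMPLE</TD>' 'no tags here' > "$tmp"
extract_tag TD "$tmp"   # prints: EXAMPLE
extract_tag TS "$tmp"   # prints: SOMETHING
rm -f "$tmp"
```

Using `[^<]*` for the body instead of `.*` stops the capture at the first following tag, which avoids running past the intended closing tag.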