Parse html tags using awk sed

Posted on 2009-04-13
Medium Priority
Last Modified: 2013-12-26
Can someone please help with ideas on how to parse HTML tags using sed and awk on Linux?

for example,


My script should output just what is inside the tag named on the command line:

myscript TD
myscript TS

Any help is very much appreciated.

Many thanks,
Question by:Jkrish

Accepted Solution

Tintin earned 1000 total points
ID: 24132305
sed and awk aren't suitable tools for parsing HTML.  

*If* your HTML is consistently formatted as per above, then you can do:
sed "s/.*<$1>\(.*\)<\/$1>.*/\1/g" test.html
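As a rough sketch of how that one-liner could be wrapped into a script: a plain s///g prints every input line, matched or not, so the version below uses -n with the p flag to print only lines where the tag was found. The function name extract_tag and the sample file are made up for illustration (the example HTML from the question did not survive in this thread), and the sketch assumes the opening and closing tags sit on the same line and carry no attributes.

```shell
#!/bin/sh
# Hypothetical helper (name is an assumption): print the text between
# <TAG> and </TAG> for each line of FILE that contains both.
extract_tag() {
    tag=$1
    file=$2
    # -n suppresses sed's default output; the trailing p prints only
    # the lines on which the substitution actually happened.
    sed -n "s/.*<$tag>\(.*\)<\/$tag>.*/\1/p" "$file"
}

# Made-up sample input, not taken from the original question.
cat > sample.html <<'EOF'
<TR>
<TD>first cell</TD>
<TS>other</TS>
</TR>
EOF

extract_tag TD sample.html
extract_tag TS sample.html
```

With that sample file, the first call prints the TD contents and the second prints the TS contents; lines such as <TR> are skipped instead of being echoed back.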


Assisted Solution

by:Murugesan Nagarajan
Murugesan Nagarajan earned 1000 total points
ID: 24792939

Sample shell scripting for awk, sed commands.


Expert Comment

by:Murugesan Nagarajan
ID: 24986388
# Same as the previously attached file (test.txt)
echo "OUTPUT FROM awk:"
# Export PARAM so that awk can read the tag name through ENVIRON
export PARAM=$1
awk '{
    otag = "<" ENVIRON["PARAM"] ">"     # opening tag, e.g. <TD>
    ctag = "</" ENVIRON["PARAM"] ">"    # closing tag, e.g. </TD>
    s = index($0, otag)
    e = index($0, ctag)
    # Display the string that appears between <$PARAM> and </$PARAM>
    if (s > 0 && e > s)
        print substr($0, s + length(otag), e - s - length(otag))
}' test.html
echo ""


echo "OUTPUT FROM sed:"
sed "s/.*<$1>\(.*\)<\/$1>.*/\1/g" test.html
#      In sed, replace
#            .*<$1>\(.*\)<\/$1>.*
#            i.e. the whole line: any characters, then <$1>, then a captured
#            group, then </$1> (the slash is escaped as \/), then any characters.
#      With
#            \1
#            i.e. only the captured text that appears between <$1> and </$1>.
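One limitation of the sed pattern above is that the leading .* is greedy, so when a tag occurs several times on one line only the last occurrence is printed. A possible way around that, sketched here in awk, is to loop over the line and peel off one tag pair at a time. The function name extract_all and the sample file row.html are assumptions for illustration; the sketch still assumes tags without attributes that open and close on the same line.

```shell
#!/bin/sh
# Hypothetical helper (name is an assumption): print the body of every
# <TAG>...</TAG> pair in FILE, including several pairs on one line.
extract_all() {
    awk -v tag="$1" '{
        otag = "<" tag ">"
        ctag = "</" tag ">"
        line = $0
        # Find the next opening tag, then the first closing tag after it.
        while ((s = index(line, otag)) > 0) {
            rest = substr(line, s + length(otag))
            e = index(rest, ctag)
            if (e == 0) break              # no closing tag left on this line
            print substr(rest, 1, e - 1)   # text between the two tags
            line = substr(rest, e + length(ctag))
        }
    }' "$2"
}

# Made-up sample input with two cells on one line.
printf '%s\n' '<TR><TD>a</TD><TD>b</TD></TR>' > row.html
extract_all TD row.html
```

On that sample line the loop prints a and b on separate lines, whereas the sed substitution would print only b.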
