How can I retrieve the data from this text file without all the HTML tags?

Hi All

<td>CSG6206</td>
<td>151</td>
<td>Advanced Scripting Languages</td>
<td>Off Campus</td>
<td>OFF</td>
<td>14 of 999 </td>
<td>&nbsp;</td>
<td>&nbsp;</td>
<td><a href="http://www.ecu.edu.au/handbook/unit?id=CSG6206&year=2015" target="_blank">Handbook</a></td>
<td>CSG6206</td>
<td>151</td>
<td>Advanced Scripting Languages</td>
<td>Mount Lawley</td>
<td>ON</td>
<td>21 of 999 </td>
<td><a href="http://apps.wcms.ecu.edu.au/semester-timetable/lookup?sq_content_src=%2BdXJsPWh0dHAlM0ElMkYlMkYxMC42Ny4xMjQuMTMxJTNBNzc4MCUyRmFwcHM$
<td>&nbsp;</td>
<td><a href="http://www.ecu.edu.au/handbook/unit?id=CSG6206&year=2015" target="_blank">Handbook</a></td>


How can I retrieve the data from this text file without all the HTML tags?
vishnu kalakota Asked:

Qlemo ("Batchelor", Developer and EE Topic Advisor) Commented:
Which script language(s)?
vishnu kalakota (Author) Commented:
Bash scripting
jmcg (Owner) Commented:
For the general case, converting HTML to text is a task that should probably not be attempted with simple regular expression matching.

For your sample text, there are only two tag types: <td> and <a>

This means that you can use a simple sed command to strip away the HTML. Let me give you an example for the <td> tags, then you'll have to explain how you want the <a> tags handled.

 sed -e 's/<td>//' -e 's/<\/td>//' <sed-example-text.txt
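
As a possible extension of the command above, here is a minimal sketch that strips every tag rather than just <td>. It assumes that only the link text of each <a> tag should be kept (the URL is discarded) and that &nbsp; cells should come out empty; the thread does not say how links should be handled, so treat both as guesses. The function name strip_tags is made up for illustration. A line whose tag is cut off before the closing > is left unchanged.

```shell
# strip_tags: hypothetical helper, reads HTML-ish lines on stdin and
# writes plain text on stdout. Only suitable for simple input like the
# sample in the question, not for HTML in general.
strip_tags() {
    # 1. delete anything that looks like a complete tag: <...>
    # 2. turn non-breaking-space entities into ordinary spaces
    # 3. decode &amp; (a literal & must be escaped in the replacement)
    # 4. trim trailing whitespace left behind by the steps above
    sed -e 's/<[^>]*>//g' \
        -e 's/&nbsp;/ /g' \
        -e 's/&amp;/\&/g' \
        -e 's/[[:space:]]*$//'
}
```

It could be run as: strip_tags <sed-example-text.txt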

tel2 Commented:
If you have 'lynx' on your system, put your input in a file with a .htm or .html extension (e.g. csi3207.htm), then run this kind of thing:
    lynx -dump csi3207.htm >csi3207.txt

But the result may not be very useful for identifying field boundaries, because each table row of input data gets converted to a single line of output, like this:
    CSG6206 151 Advanced Scripting Languages Off Campus OFF 14 of 999
   [1]Handbook CSG6206 151 Advanced Scripting Languages Mount Lawley ON 21
   of 999 [2]Handbook

References

   1. http://www.ecu.edu.au/handbook/unit?id=CSG6206&year=2015
   2. http://apps.wcms.ecu.edu.au/semester-timetable/lookup?sq_content_src=%2BdXJsPWh0dHAlM0ElMkYlMkYxMC42Ny4xMjQuMTMxJTNBNzc4MCUyRmFwcHM$<td>%C2%A0</td><td><ahref=
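
If field boundaries matter, one alternative is to strip the tags with sed and then join the cells back into delimited records with paste. This is only a sketch: it assumes one <td> cell per input line and exactly nine cells per table row, which is what the sample in the question appears to contain, and it uses | as an arbitrary separator. The function name cells_to_rows is hypothetical.

```shell
# cells_to_rows: hypothetical helper, reads one <td> cell per line on
# stdin and writes one '|'-separated record per table row on stdout.
# Assumes exactly nine cells per row; adjust the number of '-' operands
# to paste if the real table differs.
cells_to_rows() {
    # strip tags, drop &nbsp; placeholders, trim trailing whitespace,
    # then let paste glue every nine consecutive lines into one record
    sed -e 's/<[^>]*>//g' -e 's/&nbsp;//g' -e 's/[[:space:]]*$//' |
        paste -d '|' - - - - - - - - -
}
```

It could be run as: cells_to_rows <sed-example-text.txt
Empty cells show up as adjacent separators, so every record keeps the same number of fields.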

vishnu kalakota (Author) Commented:
good