How can I retrieve the data from this text file without all the HTML tags?

Hi All

<td>CSG6206</td>
<td>151</td>
<td>Advanced Scripting Languages</td>
<td>Off Campus</td>
<td>OFF</td>
<td>14 of 999 </td>
<td>&nbsp;</td>
<td>&nbsp;</td>
<td><a href="http://www.ecu.edu.au/handbook/unit?id=CSG6206&year=2015" target="_blank">Handbook</a></td>
<td>CSG6206</td>
<td>151</td>
<td>Advanced Scripting Languages</td>
<td>Mount Lawley</td>
<td>ON</td>
<td>21 of 999 </td>
<td><a href="http://apps.wcms.ecu.edu.au/semester-timetable/lookup?sq_content_src=%2BdXJsPWh0dHAlM0ElMkYlMkYxMC42Ny4xMjQuMTMxJTNBNzc4MCUyRmFwcHM$
<td>&nbsp;</td>
<td><a href="http://www.ecu.edu.au/handbook/unit?id=CSG6206&year=2015" target="_blank">Handbook</a></td>


vishnu kalakota asked:
jmcg (Owner) commented:
For the general case, converting HTML to text is a task that should probably not be attempted with simple regular expression matching.

For your sample text, there are only two tag types: <td> and <a>.

This means that you can use a simple sed command to strip away the HTML. Here is an example for the <td> tags; you'll need to explain how you want the <a> tags handled.

 sed -e 's/<td>//' -e 's/<\/td>//' <sed-example-text.txt
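A minimal sketch that extends the sed approach above from <td> to every tag, assuming (as in the sample) that each tag opens and closes on the same line. It keeps the link text of the <a> tags (e.g. "Handbook") and drops their URLs, which is only one possible answer to how those tags should be handled. The &nbsp; cells become empty lines so field positions are preserved. The function name strip_tags is hypothetical, and this is not a general HTML-to-text converter.

```shell
#!/usr/bin/env bash
# Sketch only: removes every <...> tag on each line, drops &nbsp;, and trims
# surrounding whitespace. ASSUMPTION: tags never span lines (true of the
# sample). strip_tags is a hypothetical helper name, not an existing command.
strip_tags() {
  sed -e 's/<[^>]*>//g' \
      -e 's/&nbsp;//g' \
      -e 's/^[[:space:]]*//' \
      -e 's/[[:space:]]*$//'
}

# Usage: ./strip_tags.sh sed-example-text.txt
if [ "$#" -gt 0 ]; then
  strip_tags < "$1"
fi
```

For the first record in the sample, this prints CSG6206, 151, Advanced Scripting Languages, Off Campus, OFF and 14 of 999, then two empty lines for the &nbsp; cells, then Handbook, one value per line.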

Qlemo (Batchelor, Developer and EE Topic Advisor) commented:
Which script language(s)?
vishnu kalakota (Author) commented:
Bash scripting
tel2 commented:
If you have 'lynx' on your system, put your input in a file with a .htm or .html extension (e.g. csi3207.htm), then run this kind of thing:
    lynx -dump csi3207.htm >csi3207.txt

But the result may not be very useful for identifying field boundaries, because each table row of input data gets converted to a single line of output, like this:
    CSG6206 151 Advanced Scripting Languages Off Campus OFF 14 of 999
   [1]Handbook CSG6206 151 Advanced Scripting Languages Mount Lawley ON 21
   of 999 [2]Handbook

References

   1. http://www.ecu.edu.au/handbook/unit?id=CSG6206&year=2015
   2. http://apps.wcms.ecu.edu.au/semester-timetable/lookup?sq_content_src=%2BdXJsPWh0dHAlM0ElMkYlMkYxMC42Ny4xMjQuMTMxJTNBNzc4MCUyRmFwcHM$<td>%C2%A0</td><td><ahref=
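The lynx output above loses the field boundaries. Both records in the sample have exactly nine <td> cells, so one way to recover them is to clean each cell and join every nine cells into one delimited line. This is a sketch under that assumption: the cell count of 9, the "|" delimiter and the name td_to_records are illustrative choices, not part of the original file format. Lines without a <td> (such as "Hi All") are skipped, and each tag is assumed to close on the line where it opens.

```shell
#!/usr/bin/env bash
# Sketch only: rebuilds one "|"-separated record per course from <td> lines.
# ASSUMPTION: every record has the same number of <td> cells (9 in the sample)
# and each cell sits on its own line. td_to_records is a hypothetical name.
td_to_records() {
  local cells_per_record="${1:-9}"
  grep '<td>' |
    sed -e 's/<[^>]*>//g' \
        -e 's/&nbsp;//g' \
        -e 's/^[[:space:]]*//' \
        -e 's/[[:space:]]*$//' |
    awk -v n="$cells_per_record" '
      { row = (c == 0) ? $0 : (row "|" $0); c++ }  # append this cell
      c == n { print row; c = 0 }                  # record complete
      END { if (c > 0) print row }                 # print a short final record
    '
}

# Usage: ./td_to_records.sh csi3207.htm
if [ "$#" -gt 0 ]; then
  td_to_records 9 < "$1"
fi
```

For the first sample record this yields CSG6206|151|Advanced Scripting Languages|Off Campus|OFF|14 of 999|||Handbook, where the two empty fields are the &nbsp; cells. A fixed cell count is simpler than parsing <tr> boundaries, but it breaks if any record has a different number of cells.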

vishnu kalakota (Author) commented:
good