Need to convert data in an HTML table into a tab-separated file using ColdFusion

I have several blocks of text in an HTML table. Below is a short snippet showing one row of the table data.
<TR BGCOLOR=DCD0CF>
<TD ALIGN=LEFT VALIGN=TOP><B>
<A HREF="/cgi-bin/fg.cgi?page=cr&CRid=66527&CScnty=2037&CSsr=201&">Union Building</A>
</b><a href="http://www.findagrave.com/cgi-bin/fg.cgi?page=cr&CRid=66527#beginMap"><img src='/icons2/icons20/map.gif' border=0></a> <a href="http://www.mysite.com/cgi-bin/fg.cgi?page=pif&CRid=66527&&PIcrid=66527&PIMode=cemetery&ShowCemPhotos=Y&"><img src='/icons2/icons20/camera.gif' border=0></a><BR><FONT SIZE=-1>Bedford<BR>Westchester County<BR>New York<BR>USA</FONT></TD><TD ALIGN=CENTER VALIGN=TOP>- </TD>  <TD ALIGN=CENTER VALIGN=TOP>
<A HREF="/cgi-bin/fg.cgi?page=gsr&GScid=66527&CScnty=2037&CSsr=201&">172</A>
  </TD></TR><TR>  <TD>  </TD></TR>

From this, I want to return a tab-separated list of data that includes the CRid from the first link, the name contained in the first <a> tag, the town name that appears after the first <br>, the county that appears after the second <br>, and the country that appears after the third <br>.

I know I have to use regular expressions here, but I have no idea where to start. Regular expressions are surely something I should learn, but I need to get this parsed ASAP.

Thanks.

Phil

SiriusPhil Asked:

käµfm³d 👽 Commented:
I'm not sure whether you want to do this in code or in a find/replace dialog, but I believe the patterns you need are:

Find:
    <[tT][rR][^>]*>\n?<[tT][dD][^>]*><[bB]>\n?<[aA]\s*[hH][rR][eE][fF]="?[^">]*?CRid=(\d+)[^>]*>([^<]*)</[aA]>\n?</[bB]>.+?<[bB][rR]><[^>]*>([^<]*)<[bB][rR]>([^<]*)<[bB][rR]>([^<]*)

Replace:

   $1\t$2\t$3\t$4\t$5

-- OR (depending on your regex engine) --

    \1\t\2\t\3\t\4\t\5
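
For anyone who wants to try the find-and-capture idea outside a find/replace dialog, here is a minimal sketch in Python (not ColdFusion; it is used here only because it is easy to run). It relies on a case-insensitive flag instead of spelled-out character classes, and the sample row is abbreviated from the question. The names `row`, `pattern`, and `row_to_tsv` are illustrative assumptions, not anything from the original post.

```python
import re

# Sample row from the question, abbreviated (the camera-icon link is omitted).
row = (
    '<TR BGCOLOR=DCD0CF>\n'
    '<TD ALIGN=LEFT VALIGN=TOP><B>\n'
    '<A HREF="/cgi-bin/fg.cgi?page=cr&CRid=66527&CScnty=2037&CSsr=201&">Union Building</A>\n'
    '</b><a href="http://www.findagrave.com/cgi-bin/fg.cgi?page=cr&CRid=66527#beginMap">'
    "<img src='/icons2/icons20/map.gif' border=0></a>"
    '<BR><FONT SIZE=-1>Bedford<BR>Westchester County<BR>New York<BR>USA</FONT></TD>'
)

# Five capture groups: CRid, link text, then the text after each of the
# first three <BR> tags. IGNORECASE replaces the [tT][rR]-style classes;
# DOTALL lets ".+?" cross line breaks.
pattern = re.compile(
    r'<tr[^>]*>\s*<td[^>]*><b>\s*'
    r'<a\s+href="?[^">]*?CRid=(\d+)[^>]*>([^<]*)</a>\s*</b>'
    r'.+?<br><[^>]*>([^<]*)<br>([^<]*)<br>([^<]*)',
    re.IGNORECASE | re.DOTALL,
)

def row_to_tsv(html):
    """Return the captured fields joined by tabs, or None if nothing matches."""
    m = pattern.search(html)
    return "\t".join(m.groups()) if m else None

print(row_to_tsv(row))
```

Appending one more `<br>([^<]*)` to the pattern would also pick up the final field (USA in the sample). In ColdFusion, a comparable pattern could presumably be passed to REFindNoCase with subexpressions returned, but that port is untested here.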

Zxeses Commented:
I did this in Perl simply because it was easy to build the regex there. One special note:

I had to use the /s switch, which lets . match newlines so the input is treated as a single line, for this regex to work. PHP will handle this fine; with other engines, your mileage may vary.

Here is my regex and the code I used to verify it.

CRid=(\d+).+?>(.+?)<.+?<BR><?.+?>(.+?)<.*?BR>(.+?)<.*?BR>(.+?)<.*?BR>(.+?)<

I didn't bother with case because most engines allow a case-insensitive switch such as /i.
#!/usr/bin/perl

$string = qq[

<TR BGCOLOR=DCD0CF>
<TD ALIGN=LEFT VALIGN=TOP><B>
<A HREF="/cgi-bin/fg.cgi?page=cr&CRid=66527&CScnty=2037&CSsr=201&">Union Building</A>
</b><a href="http://www.findagrave.com/cgi-bin/fg.cgi?page=cr&CRid=66527#beginMap">
<img src='/icons2/icons20/map.gif' border=0></a> 
<a href="http://www.mysite.com/cgi-bin/fg.cgi?page=pif&CRid=66527&&PIcrid=66527&PIMode=cemetery&ShowCemPhotos=Y&">
<img src='/icons2/icons20/camera.gif' border=0></a><BR><FONT SIZE=-1>Bedford<BR>Westchester County<BR>New York<BR>USA</FONT></TD>
<TD ALIGN=CENTER VALIGN=TOP>- </TD>  <TD ALIGN=CENTER VALIGN=TOP>
<A HREF="/cgi-bin/fg.cgi?page=gsr&GScid=66527&CScnty=2037&CSsr=201&">172</A>
  </TD></TR><TR>  <TD>  </TD></TR>

];

$regex = qr/CRid=(\d+).+?>(.+?)<.+?<BR><?.+?>(.+?)<.*?BR>(.+?)<.*?BR>(.+?)<.*?BR>(.+?)</s;

if ($string =~ $regex)
   {
   print "$1\t$2\t$3\t$4\t$5\t$6\n";   # the regex defines six capture groups
   }
else
   {
   print "Not found\n";
   }
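
Since the question mentions several blocks, here is a hedged sketch of running the same loose pattern over more than one row and writing tab-separated output. It is in Python rather than ColdFusion or Perl, purely so it can be run as-is. The second row's values are invented for illustration, and `ROW`, `PATTERN`, and `rows_to_tsv` are assumed names. It also shows why the single-line switch matters: without it, the pattern finds nothing in this input.

```python
import csv
import io
import re

# Simplified row template modeled on the question's markup.
ROW = (
    '<TR BGCOLOR=DCD0CF>\n<TD ALIGN=LEFT VALIGN=TOP><B>\n'
    '<A HREF="/cgi-bin/fg.cgi?page=cr&CRid={crid}&CScnty=2037&CSsr=201&">{name}</A>\n'
    '</b><BR><FONT SIZE=-1>{town}<BR>{county}<BR>{state}<BR>{country}</FONT></TD></TR>\n'
)

html = (
    ROW.format(crid=66527, name="Union Building", town="Bedford",
               county="Westchester County", state="New York", country="USA")
    # Hypothetical second row, made up for this example.
    + ROW.format(crid=12345, name="Example Cemetery", town="Sampletown",
                 county="Sample County", state="New York", country="USA")
)

REGEX = r'CRid=(\d+).+?>(.+?)<.+?<BR><?.+?>(.+?)<.*?BR>(.+?)<.*?BR>(.+?)<.*?BR>(.+?)<'

# DOTALL plays the role of Perl's /s; IGNORECASE plays the role of /i.
PATTERN = re.compile(REGEX, re.DOTALL | re.IGNORECASE)

def rows_to_tsv(text):
    """Write one tab-separated line per matched row and return the result."""
    out = io.StringIO()
    writer = csv.writer(out, delimiter="\t", lineterminator="\n")
    for match in PATTERN.finditer(text):
        writer.writerow(match.groups())
    return out.getvalue()

print(rows_to_tsv(html), end="")

# Without DOTALL, "." cannot cross the line break after </A>, so no row matches.
print(re.search(REGEX, html, re.IGNORECASE) is None)
```

The `csv` writer is used instead of a plain join so that a field containing a tab or quote would be quoted rather than silently breaking the column layout.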

Zxeses Commented:
Since you chose not to accept my answer as helpful, may I get some feedback on why my answer was not better than the previous one? I did some extensive testing and alteration with the previous answer and didn't feel it helped in all cases, which is why I posted mine.