Solved

Need to convert data in an HTML table into a tab-separated file using ColdFusion

Posted on 2010-08-13
Last Modified: 2012-08-13
I have several blocks of text in an HTML table.  Below is a short snippet showing one row of table data.
<TR BGCOLOR=DCD0CF>
<TD ALIGN=LEFT VALIGN=TOP><B>
<A HREF="/cgi-bin/fg.cgi?page=cr&CRid=66527&CScnty=2037&CSsr=201&">Union Building</A>
</b><a href="http://www.findagrave.com/cgi-bin/fg.cgi?page=cr&CRid=66527#beginMap"><img src='/icons2/icons20/map.gif' border=0></a> <a href="http://www.mysite.com/cgi-bin/fg.cgi?page=pif&CRid=66527&&PIcrid=66527&PIMode=cemetery&ShowCemPhotos=Y&"><img src='/icons2/icons20/camera.gif' border=0></a><BR><FONT SIZE=-1>Bedford<BR>Westchester County<BR>New York<BR>USA</FONT></TD><TD ALIGN=CENTER VALIGN=TOP>- </TD>  <TD ALIGN=CENTER VALIGN=TOP>
<A HREF="/cgi-bin/fg.cgi?page=gsr&GScid=66527&CScnty=2037&CSsr=201&">172</A>
  </TD></TR><TR>  <TD>  </TD></TR>


From this, I want to return a tab-separated list of data that includes the CRid from the first link, the name contained in the first <a> tag, the town name that appears after the first <br>, the county that appears after the second <br>, and the country that appears after the third <br>.

I know I have to use regular expressions here, but I have no idea where to start.  Regular expressions are surely something I should learn, but I need to get this parsed ASAP.

Thanks.

Phil

Question by:SiriusPhil
LVL 75

Accepted Solution

by:
käµfm³d   👽 earned 2000 total points
ID: 33431006
I'm not sure whether you want to do this in code or in a find/replace dialog, but I believe the patterns you need are:

Find:
    <[tT][rR][^>]*>\n?<[tT][dD][^>]*><[bB]>\n?<[aA]\s*[hH][rR][eE][fF]="?[^">]*?CRid=(\d+)[^>]*>([^<]*)</[aA]>\n?</[bB]>.+?<[bB][rR]><[^>]*>([^<]*)<[bB][rR]>([^<]*)<[bB][rR]>([^<]*)

Replace:

   $1\t$2\t$3\t$4\t$5

-- OR (depending on your regex engine) --

    \1\t\2\t\3\t\4\t\5
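
As a rough illustration of how a find pattern like this could be applied in code, here is a minimal Python sketch. It is not ColdFusion and not the poster's own code. It assumes a variant of the pattern above in which the bracketed case classes are replaced by a case-insensitive flag and any tag is allowed between the first <BR> and the town name. The names `row`, `pattern`, and `line` are made up for the example.

```python
import re

# Sample row from the question, kept on the same line breaks as posted.
row = (
    '<TR BGCOLOR=DCD0CF>\n'
    '<TD ALIGN=LEFT VALIGN=TOP><B>\n'
    '<A HREF="/cgi-bin/fg.cgi?page=cr&CRid=66527&CScnty=2037&CSsr=201&">Union Building</A>\n'
    '</b><a href="http://www.findagrave.com/cgi-bin/fg.cgi?page=cr&CRid=66527#beginMap">'
    "<img src='/icons2/icons20/map.gif' border=0></a> "
    '<a href="http://www.mysite.com/cgi-bin/fg.cgi?page=pif&CRid=66527&&PIcrid=66527'
    '&PIMode=cemetery&ShowCemPhotos=Y&">'
    "<img src='/icons2/icons20/camera.gif' border=0></a>"
    '<BR><FONT SIZE=-1>Bedford<BR>Westchester County<BR>New York<BR>USA</FONT></TD>'
)

# Assumed variant of the find pattern: re.IGNORECASE stands in for the
# [tT][rR]-style classes, and <[^>]*> skips the FONT tag after the first <BR>.
pattern = re.compile(
    r'<tr[^>]*>\n?<td[^>]*><b>\n?<a\s*href="?[^">]*?CRid=(\d+)[^>]*>([^<]*)</a>\n?</b>'
    r'.+?<br><[^>]*>([^<]*)<br>([^<]*)<br>([^<]*)',
    re.IGNORECASE,
)

match = pattern.search(row)
# Join the five captured groups with tabs, mirroring the replace pattern.
line = '\t'.join(match.groups()) if match else 'Not found'
print(line)
```

Joining the captured groups directly, rather than running a substitution, avoids leaving the unmatched tail of the row (everything after the fifth group) in the output.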
 
LVL 4

Expert Comment

by:Zxeses
ID: 33438827
I did this in Perl, simply because it was easy to build the regex there. One special note:

I had to use the /s switch to treat the input data as a single line for this regex to work.  PHP will handle this fine; with other engines, your mileage may vary.

Here is my regex and the code I used to verify it.

CRid=(\d+).+?>(.+?)<.+?<BR><?.+?>(.+?)<.*?BR>(.+?)<.*?BR>(.+?)<.*?BR>(.+?)<

I didn't bother with case because most engines allow a case-insensitive switch such as /i.
#!/usr/bin/perl

$string = qq[

<TR BGCOLOR=DCD0CF>
<TD ALIGN=LEFT VALIGN=TOP><B>
<A HREF="/cgi-bin/fg.cgi?page=cr&CRid=66527&CScnty=2037&CSsr=201&">Union Building</A>
</b><a href="http://www.findagrave.com/cgi-bin/fg.cgi?page=cr&CRid=66527#beginMap">
<img src='/icons2/icons20/map.gif' border=0></a> 
<a href="http://www.mysite.com/cgi-bin/fg.cgi?page=pif&CRid=66527&&PIcrid=66527&PIMode=cemetery&ShowCemPhotos=Y&">
<img src='/icons2/icons20/camera.gif' border=0></a><BR><FONT SIZE=-1>Bedford<BR>Westchester County<BR>New York<BR>USA</FONT></TD>
<TD ALIGN=CENTER VALIGN=TOP>- </TD>  <TD ALIGN=CENTER VALIGN=TOP>
<A HREF="/cgi-bin/fg.cgi?page=gsr&GScid=66527&CScnty=2037&CSsr=201&">172</A>
  </TD></TR><TR>  <TD>  </TD></TR>

];

$regex = qr/CRid=(\d+).+?>(.+?)<.+?<BR><?.+?>(.+?)<.*?BR>(.+?)<.*?BR>(.+?)<.*?BR>(.+?)</s;

if ($string =~ $regex)
   {
   print "$1\t$2\t$3\t$4\t$5\t$6\n";
   }
else
   {
   print "Not found\n";
   }
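
The Perl script above checks a single row. As a hedged sketch of how the same regex might be run over a whole table to build the tab-separated output, here is a Python version. It is an illustration only, not ColdFusion, and it assumes every row follows the layout in the question. The helper names `make_row` and `table_to_tsv` and the second row's values are hypothetical.

```python
import csv
import io
import re

# The regex from the comment above; re.DOTALL plays the role of Perl's /s
# switch and re.IGNORECASE the role of /i.
ROW_PATTERN = re.compile(
    r'CRid=(\d+).+?>(.+?)<.+?<BR><?.+?>(.+?)<.*?BR>(.+?)<.*?BR>(.+?)<.*?BR>(.+?)<',
    re.DOTALL | re.IGNORECASE,
)


def make_row(crid, name, town, county, state):
    """Build a test row shaped like the one in the question (hypothetical helper)."""
    return (
        '<TR BGCOLOR=DCD0CF>\n<TD ALIGN=LEFT VALIGN=TOP><B>\n'
        f'<A HREF="/cgi-bin/fg.cgi?page=cr&CRid={crid}&CScnty=2037&CSsr=201&">{name}</A>\n'
        f'</b><a href="http://www.findagrave.com/cgi-bin/fg.cgi?page=cr&CRid={crid}#beginMap">'
        "<img src='/icons2/icons20/map.gif' border=0></a>"
        f'<BR><FONT SIZE=-1>{town}<BR>{county}<BR>{state}<BR>USA</FONT></TD></TR>\n'
    )


def table_to_tsv(html):
    """Return one tab-separated line per row matched in the HTML text."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, delimiter='\t', lineterminator='\n')
    for match in ROW_PATTERN.finditer(html):
        writer.writerow(field.strip() for field in match.groups())
    return buffer.getvalue()


# First row uses the question's data; the second row is invented for the demo.
html = (
    make_row(66527, 'Union Building', 'Bedford', 'Westchester County', 'New York')
    + make_row(12345, 'Example Cemetery', 'Sampletown', 'Sample County', 'New York')
)
tsv = table_to_tsv(html)
print(tsv, end='')
```

To produce a file rather than a string, the same writer could be pointed at a file opened with `open(path, 'w', newline='')` instead of the in-memory buffer.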

 
LVL 4

Expert Comment

by:Zxeses
ID: 33683802
Since you chose not to accept my answer as helpful, may I have some feedback on why it was not better than the previous answer?  I did extensive testing and alteration of the previous answer and didn't feel it worked in all cases, which is why I posted my own.