kingent85 asked:

Parse Html File

I have an HTML file that I need to parse. The file itself must not be changed; I simply need to pull information out of its tags. The same tags occur many times, and there are multiple tables. I have found a parser class that I think may work, but I'm not sure it will.

With just one table the information is pulled fine; however, when I add a second table it crashes.

My goal is to import certain information from this HTML file into an array so that I can pull it into my database.

I'm attaching the main code, which references a test file that I created. It is a very small, simple file, so I'll paste it here:

----------------------code------------------------
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">
<head>
<meta http-equiv="Content-Type" content="text/html; charset=utf-8" />
<title>Untitled Document</title>
</head>

<body>
Hello World!

<p>Hello Dustin!</p>



<table id="CDFTradeDetailFull1" cellspacing="0" cellpadding="0" width="650" summary="" border="0">
  <tbody>
    <tr>
      <td class="arial11Black" valign="bottom"><br />
          <strong>ALLTEL
            COMMUNICATIONS </strong></td>
      <td></td>
      <td></td>
    </tr>
  </tbody>
</table>

</body>
</html>





-------------------------end code-----------------------

Please help
<?php
 
/**
 * HTML/XML Parser Class
 *
 * This is a helper class that is used to parse HTML and XML. A unique feature of this parsing class
 * is the fact that it includes support for innerHTML (which isn't easy to do).
 *
 * @author Dennis Pallett
 * @copyright Dennis Pallett 2006
 * @package HTML_Parser
 * @version 1.0
 */
 
// Helper Class
// To parse HTML/XML
Class HTML_Parser {
    // Private properties
    var $_parser;
    var $_tags = array();
    var $_html;
    var $output = array();
    var $strXmlData;
    var $_level = 0;
    var $_outline;
    var $_tagcount = array();
    var $xml_error = false;
    var $xml_error_code;
    var $xml_error_string;
    var $xml_error_line_number;
 
    function get_html () {
        return $this->_html;
    }
 
    function parse($strInputXML) {
        $this->output = array();
 
        // Translate entities
        $strInputXML = $this->translate_entities($strInputXML);
 
        $this->_parser = xml_parser_create ();
        xml_parser_set_option($this->_parser, XML_OPTION_CASE_FOLDING, true);
        xml_set_object($this->_parser,$this);
        xml_set_element_handler($this->_parser, "tagOpen", "tagClosed");
 
        xml_set_character_data_handler($this->_parser, "tagData");
 
        $this->strXmlData = xml_parse($this->_parser,$strInputXML );
 
        if (!$this->strXmlData) {
            $this->xml_error = true;
            $this->xml_error_code = xml_get_error_code($this->_parser);
            $this->xml_error_string = xml_error_string(xml_get_error_code($this->_parser));
            $this->xml_error_line_number =  xml_get_current_line_number($this->_parser);
            return false;
        }
 
        return $this->output;
    }
 
 
    function tagOpen($parser, $name, $attr) {
        // Increase level
        $this->_level++;
 
        // Create tag:
        $newtag = $this->create_tag($name, $attr);
 
        // Build tag
        $tag = array("name"=>$name,"attr"=>$attr, "level"=>$this->_level);
 
        // Add tag
        array_push ($this->output, $tag);
 
        // Add tag to this level
        $this->_tags[$this->_level] = $tag;
 
        // Add to HTML
        $this->_html .= $newtag;
 
        // Add to outline
        $this->_outline .= $this->_level . $newtag;
    }
 
    function create_tag ($name, $attr) {
        // Create tag:
        # Begin with name
        $tag = '<' . strtolower($name) . ' ';
 
        # Create attribute list
        foreach ($attr as $key=>$val) {
            $tag .= strtolower($key) . '="' . htmlentities($val) . '" ';
        }
 
        # Finish tag
        $tag = trim($tag);
 
        switch(strtolower($name)) {
            case 'br':
            case 'input':
                $tag .= ' /';
            break;
        }
 
        $tag .= '>';
 
        return $tag;
    }
 
    function tagData($parser, $tagData) {
        if(trim($tagData)) {
            if(isset($this->output[count($this->output)-1]['tagData'])) {
                $this->output[count($this->output)-1]['tagData'] .= $tagData;
            } else {
                $this->output[count($this->output)-1]['tagData'] = $tagData;
            }
        }
 
        $this->_html .= htmlentities($tagData);
        $this->_outline .= htmlentities($tagData);
    }
 
    function tagClosed($parser, $name) {
        // Add to HTML and outline
        switch (strtolower($name)) {
            case 'br':
            case 'input':
                break;
            default:
            $this->_outline .= $this->_level . '</' . strtolower($name) . '>';
            $this->_html .= '</' . strtolower($name) . '>';
        }
 
        // Get tag that belongs to this end
        $tag = $this->_tags[$this->_level];
        $tag = $this->create_tag($tag['name'], $tag['attr']);
 
        // Try to get innerHTML
        $regex = '%' . preg_quote($this->_level . $tag, '%') . '(.*?)' . preg_quote($this->_level . '</' . strtolower($name) . '>', '%') . '%is';
        preg_match ($regex, $this->_outline, $matches);
 
        // Get innerHTML
        if (isset($matches['1'])) {
            $innerhtml = $matches['1'];
        }
 
        // Remove level identifiers
        $this->_outline = str_replace($this->_level . $tag, $tag, $this->_outline);
        $this->_outline = str_replace($this->_level . '</' . strtolower($name) . '>', '</' . strtolower($name) . '>', $this->_outline);
 
        // Add innerHTML
        if (isset($innerhtml)) {
            $this->output[count($this->output)-1]['innerhtml'] = $innerhtml;
        }
 
        // Fix tree
        $this->output[count($this->output)-2]['children'][] = $this->output[count($this->output)-1];
        array_pop($this->output);
 
        // Decrease level
        $this->_level--;
    }
 
    function translate_entities($xmlSource, $reverse = FALSE) {
        static $literal2NumericEntity;
 
        if (empty($literal2NumericEntity)) {
            // Ask for the single-byte table so that ord() below yields the real
            // code point (newer PHP versions default to UTF-8 here)
            $transTbl = get_html_translation_table(HTML_ENTITIES, ENT_COMPAT, 'ISO-8859-1');
 
            foreach ($transTbl as $char => $entity) {
                // Leave the characters that are special in XML untouched
                if (strpos('&"<>', $char) !== FALSE) continue;
                $literal2NumericEntity[$entity] = '&#'.ord($char).';';
            }
        }
 
        if ($reverse) {
            return strtr($xmlSource, array_flip($literal2NumericEntity));
        } else {
            return strtr($xmlSource, $literal2NumericEntity);
        }
    }
}
 
//#####################################
// get contents of a file into a string
$filename = "testfile.html";
$html = file_get_contents($filename);
if ($html === false) {
    die("Could not read $filename");
}
//#####################################
 
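// To be used like this
$parser = new HTML_Parser;
$output = $parser->parse($html);
 
// parse() returns false when the markup is not well-formed XML
if ($output === false) {
    die("XML error {$parser->xml_error_code}: {$parser->xml_error_string} on line {$parser->xml_error_line_number}");
}
 
$tag = $output['0'];
$text = $tag['children']['1']['tagData'];
$text3 = $tag['innerhtml'];
 
$text2 = $tag['children']['1']['children']['1']['children']['0']['children']['0']['children']['0']['children']['1']['tagData'];
 
echo "Text is $text<br>$text2<br>";
//echo "$text3<br>";
 
 
echo "<pre>";
print_r ($output);
echo "</pre>";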
?>


kingent85 (ASKER)

Here is the full file that crashes.
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">
<head>
<meta http-equiv="Content-Type" content="text/html; charset=utf-8" />
<title>Untitled Document</title>
</head>
 
<body>
<table id="CDFTradeDetailFull1" cellspacing="0" cellpadding="0" width="650"
      summary="" border="0">
  <tbody>
    <tr>
      <td class="arial11Black" valign="bottom"><br />
          <strong>ALLTEL
            COMMUNICATIONS </strong></td>
      <td></td>
      <td></td>
    </tr>
    <tr>
      <td width="49.23%" bgcolor="#556aab" height="1"><img height="1" alt=""
            src="Experian - Printable Full Report_files/bg_spacer.gif"
            width="100%" /></td>
      <td width="1.54%" height="1"><img height="1" alt=""
            src="Experian - Printable Full Report_files/bg_spacer.gif"
            width="100%" /></td>
      <td width="49.23%" bgcolor="#556aab" height="1"><img height="1" alt=""
            src="Experian - Printable Full Report_files/bg_spacer.gif"
            width="100%" /></td>
    </tr>
    <tr>
      <td width="49.23%"><table id="CDFTradeDetailFull2" cellspacing="0" cellpadding="0"
            width="100%" summary="" border="0">
        <tbody>
          <tr>
            <td valign="top" align="left" width="50%"><span
                  class="arial12Rusty">Address:&nbsp;</span><br />
                  <span
                  class="arial11Black">1 ALLIED DR BLDG 5<br />
                    LITTLE ROCK,
                    AR&nbsp;72202<br />
                    (800) 255-8351 <br />
                  </span></td>
            <td valign="top" align="left" width="50%"><span
                  class="arial12Rusty">Account Number:</span><br />
                  <span
                  class="arial11Black">916024.... </span></td>
          </tr>
        </tbody>
      </table></td>
      <td width="1.54%"></td>
      <td valign="top" align="left" width="49.23%"></SPAN></td>
    </tr>
    <tr>
      <td width="49.23%" bgcolor="#556aab" height="1"><img height="1" alt=""
            src="Experian - Printable Full Report_files/clear.gif"
          width="100%" /></td>
      <td width="1.54%" height="1"><img height="1" alt=""
            src="Experian - Printable Full Report_files/clear.gif"
          width="100%" /></td>
      <td width="49.23%" bgcolor="#556aab" height="1"><img height="1" alt=""
            src="Experian - Printable Full Report_files/clear.gif"
          width="100%" /></td>
    </tr>
    <tr>
      <td valign="top" align="left" width="49.23%"><span
            class="arial12Rusty">Address Identification Number:</span><br />
          <span
            class="arial11Black">0054389474 </span><br /></td>
      <td width="1.54%"></td>
      <td width="49.23%"></td>
    </tr>
    <tr>
      <td width="49.23%" bgcolor="#556aab" height="1"><img height="1" alt=""
            src="Experian - Printable Full Report_files/bg_spacer.gif"
            width="100%" /></td>
      <td width="1.54%" height="1"><img height="1" alt=""
            src="Experian - Printable Full Report_files/bg_spacer.gif"
            width="100%" /></td>
      <td width="49.23%" bgcolor="#556aab" height="1"><img height="1" alt=""
            src="Experian - Printable Full Report_files/bg_spacer.gif"
            width="100%" /></td>
    </tr>
    <tr>
      <td valign="top" align="left" width="49.23%"><span
            class="arial12Rusty">Status:&nbsp;</span> <span
            class="arial11Black">Account charged off. $129 written off. </span><br /></td>
      <td width="1.54%"></td>
      <td valign="top" align="left" width="49.23%"><span
            class="arial12Rusty">Status Details:&nbsp;</span> <span
            class="arial11Black">This account is scheduled to continue on record
        until Feb 2014. <br />
        This item was verified on May 2008 and remained
        unchanged. <br />
        <br />
      </span></td>
    </tr>
    <tr>
      <td width="49.23%" bgcolor="#556aab" height="1"><img height="1" alt=""
            src="Experian - Printable Full Report_files/bg_spacer.gif"
            width="100%" /></td>
      <td width="1.54%" height="1"><img height="1" alt=""
            src="Experian - Printable Full Report_files/bg_spacer.gif"
            width="100%" /></td>
      <td width="49.23%" bgcolor="#556aab" height="1"><img height="1" alt=""
            src="Experian - Printable Full Report_files/bg_spacer.gif"
            width="100%" /></td>
    </tr>
    <tr>
      <td valign="top" align="left" width="49.23%"><table id="CDFTradeDetailFull3" cellspacing="0" cellpadding="0"
            width="100%" summary="" border="0">
        <tbody>
          <tr>
            <td valign="top" align="left" width="50%"><span
                  class="arial12Rusty">Date Opened:</span><br />
                  <span
                  class="arial11Black">04/2007 </span></td>
            <td valign="top" align="left" width="50%"><span
                  class="arial12Rusty">Type:</span><br />
                  <span
                  class="arial11Black">Revolving </span></td>
          </tr>
          <tr>
            <td valign="top" align="left" width="50%"><span
                  class="arial12Rusty">Reported Since:</span><br />
                  <span
                  class="arial11Black">07/2008 </span></td>
            <td valign="top" align="left" width="50%"><span
                  class="arial12Rusty">Terms:</span> <br />
                  <span
                  class="arial11Black">NA </span></td>
          </tr>
          <tr>
            <td valign="top" align="left" width="50%"><span
                  class="arial12Rusty">Date of Status:</span><br />
                  <span
                  class="arial11Black">07/2008 </span></td>
            <td valign="top" align="left" width="50%"><span
                  class="arial12Rusty">Monthly Payment:</span><br />
                  <span
                  class="arial11Black">$0 </span></td>
          </tr>
          <tr>
            <td valign="top" align="left" width="50%"><span
                  class="arial12Rusty">Last Reported:</span><br />
                  <span
                  class="arial11Black">07/2008 </span></td>
            <td valign="top" align="left" width="50%"><span
                  class="arial12Rusty">Responsibility:</span><br />
                  <span
                  class="arial11Black">Individual </span><br /></td>
          </tr>
        </tbody>
      </table></td>
      <td width="1.54%"></td>
      <td valign="top" align="left" width="49.23%"><table id="CDFTradeDetailFull4" cellspacing="0" cellpadding="0"
            width="100%" summary="" border="0">
        <tbody>
          <tr>
            <td valign="top" align="left" width="100%"><span
                  class="arial12Rusty">Credit Limit/Original
              Amount:</span><br />
              <span class="arial11Black">NA </span></td>
          </tr>
          <tr>
            <td valign="top" align="left" width="100%"><span
                  class="arial12Rusty">High Balance:</span><br />
                  <span
                  class="arial11Black">$129 </span></td>
          </tr>
          <tr>
            <td valign="top" align="left" width="100%"><span
                  class="arial12Rusty">Recent Balance:</span><br />
                  <span
                  class="arial11Black">$129 as of 07/2008 </span></td>
          </tr>
          <tr>
            <td valign="top" align="left" width="100%"><span
                  class="arial12Rusty">Recent Payment:</span><br />
                  <span
                  class="arial11Black">$0 </span></td>
          </tr>
        </tbody>
      </table></td>
    </tr>
    <tr>
      <td width="49.23%" bgcolor="#556aab" height="1"><img height="1" alt=""
            src="Experian - Printable Full Report_files/bg_spacer.gif"
            width="100%" /></td>
      <td width="1.54%" height="1"><img height="1" alt=""
            src="Experian - Printable Full Report_files/bg_spacer.gif"
            width="100%" /></td>
      <td width="49.23%" bgcolor="#556aab" height="1"><img height="1" alt=""
            src="Experian - Printable Full Report_files/bg_spacer.gif"
            width="100%" /></td>
    </tr>
    <tr>
      <td valign="top" align="left" width="49.23%"><span
            class="arial12Rusty">Creditor's statement:&nbsp;</span> <span
            class="arial11Black">Transferred to recovery. <br />
            <br />
      </span></td>
    </tr>
    <tr>
      <td width="49.23%" bgcolor="#556aab" height="1"><img height="1" alt=""
            src="Experian - Printable Full Report_files/bg_spacer.gif"
            width="100%" /></td>
      <td width="1.54%" height="1"><img height="1" alt=""
            src="Experian - Printable Full Report_files/bg_spacer.gif"
            width="100%" /></td>
      <td width="49.23%" bgcolor="#556aab" height="1"><img height="1" alt=""
            src="Experian - Printable Full Report_files/bg_spacer.gif"
            width="100%" /></td>
    </tr>
    <tr>
      <td valign="top" align="left" width="49.23%"><span
            class="arial12Rusty">Account History:</span><br />
          <span
            class="arial11Black">Charge Off as of Jul 2008<br />
            <br />
          </span></td>
      <td width="1.54%">&nbsp;</td>
      <td width="49.23%">&nbsp;</td>
    </tr>
    <tr>
      <td width="49.23%" bgcolor="#556aab" height="1"><img height="1" alt=""
            src="Experian - Printable Full Report_files/bg_spacer.gif"
            width="100%" /></td>
      <td width="1.54%" height="1"><img height="1" alt=""
            src="Experian - Printable Full Report_files/bg_spacer.gif"
            width="100%" /></td>
      <td width="49.23%" bgcolor="#556aab" height="1"><img height="1" alt=""
            src="Experian - Printable Full Report_files/bg_spacer.gif"
            width="100%" /></td>
    </tr>
  </tbody>
</table>
 
<!-- the following post must point at the correct url and ini-->
<TABLE id=CDFTradeDetailFull1 cellSpacing=0 cellPadding=0 width=650
      summary="" border=0>
        <TBODY>
        <TR>
          <TD class=arial11Black vAlign=bottom><BR><STRONG>ARROW FINANCIAL
            SERVICE </STRONG></TD>
          <TD></TD>
          <TD></TD></TR>
        <TR>
          <TD width="49.23%" bgColor=#556aab height=1><IMG height=1 alt=""
            src="Experian - Printable Full Report_files/bg_spacer.gif"
            width="100%"></TD>
          <TD width="1.54%" height=1><IMG height=1 alt=""
            src="Experian - Printable Full Report_files/bg_spacer.gif"
            width="100%"></TD>
          <TD width="49.23%" bgColor=#556aab height=1><IMG height=1 alt=""
            src="Experian - Printable Full Report_files/bg_spacer.gif"
            width="100%"></TD></TR>
        <TR>
          <TD width="49.23%">
            <TABLE id=CDFTradeDetailFull2 cellSpacing=0 cellPadding=0
            width="100%" summary="" border=0>
              <TBODY>
              <TR>
                <TD vAlign=top align=left width="50%"><SPAN
                  class=arial12Rusty>Address:&nbsp;</SPAN><BR><SPAN
                  class=arial11Black>5996 W TOUHY AVE<BR>NILES,
                  IL&nbsp;60714<BR>(800) 279-0224 <BR></SPAN></TD>
                <TD vAlign=top align=left width="50%"><SPAN
                  class=arial12Rusty>Account Number:</SPAN><BR><SPAN
                  class=arial11Black>42069899 </SPAN></TD></TR></TBODY></TABLE></TD>
          <TD width="1.54%"></TD>
          <TD vAlign=top align=left width="49.23%"><SPAN
            class=arial12Rusty>Original Creditor:</SPAN><BR><SPAN
            class=arial11Black>PREMIER BANKCARD INC.<BR><BR></SPAN></TD></TR>
        <TR>
          <TD width="49.23%" bgColor=#556aab height=1><IMG height=1 alt=""
            src="Experian - Printable Full Report_files/clear.gif"
          width="100%"></TD>
          <TD width="1.54%" height=1><IMG height=1 alt=""
            src="Experian - Printable Full Report_files/clear.gif"
          width="100%"></TD>
          <TD width="49.23%" bgColor=#556aab height=1><IMG height=1 alt=""
            src="Experian - Printable Full Report_files/clear.gif"
          width="100%"></TD></TR>
        <TR>
          <TD vAlign=top align=left width="49.23%"><SPAN
            class=arial12Rusty>Address Identification Number:</SPAN><BR><SPAN
            class=arial11Black>0054581419 </SPAN><BR></TD>
          <TD width="1.54%"></TD>
          <TD width="49.23%"></TD></TR>
        <TR>
          <TD width="49.23%" bgColor=#556aab height=1><IMG height=1 alt=""
            src="Experian - Printable Full Report_files/bg_spacer.gif"
            width="100%"></TD>
          <TD width="1.54%" height=1><IMG height=1 alt=""
            src="Experian - Printable Full Report_files/bg_spacer.gif"
            width="100%"></TD>
          <TD width="49.23%" bgColor=#556aab height=1><IMG height=1 alt=""
            src="Experian - Printable Full Report_files/bg_spacer.gif"
            width="100%"></TD></TR>
        <TR>
          <TD vAlign=top align=left width="49.23%"><SPAN
            class=arial12Rusty>Status:&nbsp;</SPAN> <SPAN
            class=arial11Black>Collection account. $453 past due as of Jul 2008.
            </SPAN><BR></TD>
          <TD width="1.54%"></TD>
          <TD vAlign=top align=left width="49.23%"><SPAN
            class=arial12Rusty>Status Details:&nbsp;</SPAN> <SPAN
            class=arial11Black>This account is scheduled to continue on record
            until Sep 2012. <BR><BR><BR></SPAN></TD></TR>
        <TR>
          <TD width="49.23%" bgColor=#556aab height=1><IMG height=1 alt=""
            src="Experian - Printable Full Report_files/bg_spacer.gif"
            width="100%"></TD>
          <TD width="1.54%" height=1><IMG height=1 alt=""
            src="Experian - Printable Full Report_files/bg_spacer.gif"
            width="100%"></TD>
          <TD width="49.23%" bgColor=#556aab height=1><IMG height=1 alt=""
            src="Experian - Printable Full Report_files/bg_spacer.gif"
            width="100%"></TD></TR>
        <TR>
          <TD vAlign=top align=left width="49.23%">
            <TABLE id=CDFTradeDetailFull3 cellSpacing=0 cellPadding=0
            width="100%" summary="" border=0>
              <TBODY>
              <TR>
                <TD vAlign=top align=left width="50%"><SPAN
                  class=arial12Rusty>Date Opened:</SPAN><BR><SPAN
                  class=arial11Black>03/2008 </SPAN></TD>
                <TD vAlign=top align=left width="50%"><SPAN
                  class=arial12Rusty>Type:</SPAN><BR><SPAN
                  class=arial11Black>Collection </SPAN></TD></TR>
              <TR>
                <TD vAlign=top align=left width="50%"><SPAN
                  class=arial12Rusty>Reported Since:</SPAN><BR><SPAN
                  class=arial11Black>06/2008 </SPAN></TD>
                <TD vAlign=top align=left width="50%"><SPAN
                  class=arial12Rusty>Terms:</SPAN> <BR><SPAN
                  class=arial11Black>1 Months </SPAN></TD></TR>
              <TR>
                <TD vAlign=top align=left width="50%"><SPAN
                  class=arial12Rusty>Date of Status:</SPAN><BR><SPAN
                  class=arial11Black>06/2008 </SPAN></TD>
                <TD vAlign=top align=left width="50%"><SPAN
                  class=arial12Rusty>Monthly Payment:</SPAN><BR><SPAN
                  class=arial11Black>$0 </SPAN></TD></TR>
              <TR>
                <TD vAlign=top align=left width="50%"><SPAN
                  class=arial12Rusty>Last Reported:</SPAN><BR><SPAN
                  class=arial11Black>07/2008 </SPAN></TD>
                <TD vAlign=top align=left width="50%"><SPAN
                  class=arial12Rusty>Responsibility:</SPAN><BR><SPAN
                  class=arial11Black>Individual
          </SPAN><BR></TD></TR></TBODY></TABLE></TD>
          <TD width="1.54%"></TD>
          <TD vAlign=top align=left width="49.23%">
            <TABLE id=CDFTradeDetailFull4 cellSpacing=0 cellPadding=0
            width="100%" summary="" border=0>
              <TBODY>
              <TR>
                <TD vAlign=top align=left width="100%"><SPAN
                  class=arial12Rusty>Credit Limit/Original
                  Amount:</SPAN><BR><SPAN class=arial11Black>$336 </SPAN></TD></TR>
              <TR>
                <TD vAlign=top align=left width="100%"><SPAN
                  class=arial12Rusty>High Balance:</SPAN><BR><SPAN
                  class=arial11Black>NA </SPAN></TD></TR>
              <TR>
                <TD vAlign=top align=left width="100%"><SPAN
                  class=arial12Rusty>Recent Balance:</SPAN><BR><SPAN
                  class=arial11Black>$453 as of 07/2008 </SPAN></TD></TR>
              <TR>
                <TD vAlign=top align=left width="100%"><SPAN
                  class=arial12Rusty>Recent Payment:</SPAN><BR><SPAN
                  class=arial11Black>$0 </SPAN></TD></TR></TBODY></TABLE></TD></TR>
        <TR>
          <TD width="49.23%" bgColor=#556aab height=1><IMG height=1 alt=""
            src="Experian - Printable Full Report_files/bg_spacer.gif"
            width="100%"></TD>
          <TD width="1.54%" height=1><IMG height=1 alt=""
            src="Experian - Printable Full Report_files/bg_spacer.gif"
            width="100%"></TD>
          <TD width="49.23%" bgColor=#556aab height=1><IMG height=1 alt=""
            src="Experian - Printable Full Report_files/bg_spacer.gif"
            width="100%"></TD></TR>
        <TR></TR>
        <TR>
          <TD vAlign=top align=left width="49.23%"><SPAN
            class=arial12Rusty>Account History:</SPAN><BR><SPAN
            class=arial11Black>Collection as of Jul 2008, Jun
          2008<BR><BR></SPAN></TD>
          <TD width="1.54%">&nbsp;</TD>
          <TD width="49.23%">&nbsp;</TD></TR>
        <TR>
          <TD width="49.23%" bgColor=#556aab height=1><IMG height=1 alt=""
            src="Experian - Printable Full Report_files/bg_spacer.gif"
            width="100%"></TD>
          <TD width="1.54%" height=1><IMG height=1 alt=""
            src="Experian - Printable Full Report_files/bg_spacer.gif"
            width="100%"></TD>
          <TD width="49.23%" bgColor=#556aab height=1><IMG height=1 alt=""
            src="Experian - Printable Full Report_files/bg_spacer.gif"
            width="100%"></TD></TR></TBODY></TABLE>
 
<!-- the following post must point at the correct url and ini-->
</body>
</html>


ASKER CERTIFIED SOLUTION
ahalya

This solution is only available to members.

What does $this need to be set to? When I put that in, it just prints "$this" instead of the variable's value. I'd like to see where the errors are reported.

Also, the problem is that the HTML is generated by going to File and choosing Save As HTML document in the browser; we then want to parse that file. The page is on a secure server that we can't access directly, so we have to save it and then parse it. Editing the file by hand for every person is a bit hectic, but I will do it if I have to.

Thanks
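
A note on the $this question: $this only exists inside the class's own methods. From the calling script, the same properties are read through the object variable, for example $parser->xml_error_string and $parser->xml_error_line_number after parse() returns false. The sketch below is not part of the class above; it uses a hypothetical helper, describe_xml_error(), that calls the same PHP XML functions directly to show the kind of message a stray closing tag produces.

```php
<?php
// Hypothetical helper (not part of HTML_Parser): runs PHP's strict XML parser
// over a fragment and describes the first error, or returns null if none.
function describe_xml_error(string $markup): ?string
{
    $parser = xml_parser_create();

    // Third argument true: this is the final (and only) chunk of data.
    if (xml_parse($parser, $markup, true)) {
        return null;
    }

    $code = xml_get_error_code($parser);

    return sprintf(
        'XML error %d (%s) at line %d',
        $code,
        xml_error_string($code),
        xml_get_current_line_number($parser)
    );
}

// Well-formed markup gives null; a stray closing tag gives a message.
var_dump(describe_xml_error('<td><span>ok</span></td>'));
echo describe_xml_error("<td>\n</SPAN></td>"), "\n";
```

The same three values are what the class stores in xml_error_code, xml_error_string and xml_error_line_number, so printing those properties from the script should point at the offending line of the saved file.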
Alright, I've gone through the HTML and I see that there are tons of mistakes in the markup. Is there a better way to export to HTML from the web browser so that this works?
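
An alternative to changing how the page is exported, offered as a suggestion rather than as what the answers above proposed: PHP's DOM extension has an HTML mode (DOMDocument::loadHTML) that tolerates unquoted attributes, unclosed tags and stray end tags, so the saved file would not need to be well-formed XML. A minimal sketch, assuming the DOM extension is enabled; load_lenient_html() is a made-up name and the fragment is shortened from the report posted above.

```php
<?php
// Hypothetical helper: load "tag soup" with the lenient HTML parser.
function load_lenient_html(string $html): DOMDocument
{
    // Collect markup warnings internally instead of printing them.
    $previous = libxml_use_internal_errors(true);

    $doc = new DOMDocument();
    $doc->loadHTML($html);

    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    return $doc;
}

// Unquoted attributes, an unclosed <BR> and a stray </SPAN>, as in the saved report.
$html = '<TABLE id=CDFTradeDetailFull1><TR><TD class=arial11Black><BR>'
      . '<STRONG>ARROW FINANCIAL SERVICE </STRONG></SPAN></TD></TR></TABLE>';

$doc = load_lenient_html($html);

// Element names are lower-cased by the HTML parser.
foreach ($doc->getElementsByTagName('strong') as $node) {
    echo trim($node->textContent), "\n";
}
```

The trade-off is that the parser silently repairs the markup, so the resulting tree may not match the file tag for tag; for pulling text out of known elements that is usually acceptable.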
SOLUTION
b0lsc0tt

This solution is only available to members.

I'm trying to capture data that is displayed on a web page. I need to get Company name, Date Opened, etc. What is the best way to save the page source? Should I parse it as a .txt file, or something else? There may be five records of the same type.

------Example-----

Company name: Crednology
Date Opened: 08/24/2006
Date Closed: NA

-----End------

That's the kind of information I'm trying to pull. I've played with setting up variables named Company name and so on, and it worked, except that I couldn't figure out how to handle multiple Company Names and have it still work. Also, because the data sits in tables nested inside tables, it tended not to work.

I'd like to be able to tolerate the stray end tags and other mistakes, because we can't edit the HTML file every time.
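
One possible way to get one record per company despite the nesting and the broken markup is sketched below. It assumes the structure visible in the report posted earlier: each account starts with an outer table whose id is CDFTradeDetailFull1, the company name sits in a strong element, and every label is a span with class arial12Rusty followed by a sibling span with class arial11Black holding the value. The function names extract_accounts() and normalize_text() are invented for the illustration, and the sample markup is abbreviated.

```php
<?php
// Collapse runs of whitespace (including the non-breaking space from &nbsp;).
function normalize_text(string $text): string
{
    return trim((string) preg_replace('/[\s\x{00A0}]+/u', ' ', $text));
}

// Hypothetical helper: returns one associative array per account table.
function extract_accounts(string $html): array
{
    $previous = libxml_use_internal_errors(true); // keep markup warnings quiet
    $doc = new DOMDocument();
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($doc);
    $accounts = [];

    // Each account starts with an outer table that reuses this id.
    foreach ($xpath->query('//table[@id="CDFTradeDetailFull1"]') as $table) {
        $account = [];

        $name = $xpath->query('.//strong', $table)->item(0);
        if ($name !== null) {
            $account['Company name'] = normalize_text($name->textContent);
        }

        // A label span is followed by a sibling span holding its value.
        foreach ($xpath->query('.//span[@class="arial12Rusty"]', $table) as $label) {
            $value = $xpath->query('following-sibling::span[@class="arial11Black"][1]', $label)->item(0);
            if ($value === null) {
                continue;
            }
            $key = rtrim(normalize_text($label->textContent), ':');
            $account[$key] = normalize_text($value->textContent);
        }

        $accounts[] = $account;
    }

    return $accounts;
}

// Two abbreviated accounts: one XHTML-style, one with upper-case tags,
// unquoted attributes and an unclosed <BR>.
$html = '<table id="CDFTradeDetailFull1"><tr><td><strong>ALLTEL COMMUNICATIONS </strong></td></tr>'
      . '<tr><td><table id="CDFTradeDetailFull3"><tr><td><span class="arial12Rusty">Date Opened:</span><br />'
      . '<span class="arial11Black">04/2007 </span></td></tr></table></td></tr></table>'
      . '<TABLE id=CDFTradeDetailFull1><TR><TD><STRONG>ARROW FINANCIAL SERVICE </STRONG></TD></TR>'
      . '<TR><TD><SPAN class=arial12Rusty>Date Opened:</SPAN><BR><SPAN class=arial11Black>03/2008 </SPAN></TD></TR></TABLE>';

print_r(extract_accounts($html));
```

Because every account becomes its own array keyed by label, looping over the result gives a distinct Company name and Date Opened per record, which is the shape a database insert loop would need.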

SOLUTION

This solution is only available to members.

Well, I have it parsed now, and the output goes to an array where I can address an element via [children][0][tag] and so on, depending on where it sits in the array.
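
Addressing elements by fixed positions breaks as soon as the nesting changes, so a recursive search over the array may be easier to maintain. The sketch below assumes the array shape produced by the HTML_Parser class in the question (keys name, attr, level, tagData, children, with names upper-cased by the case-folding option); find_nodes() is a hypothetical helper and the sample tree is hand-built.

```php
<?php
// Hypothetical helper: depth-first search for every node with the given tag name.
function find_nodes(array $node, string $name, array &$found = []): array
{
    // Case folding in the parser stores tag names in upper case.
    if (isset($node['name']) && $node['name'] === strtoupper($name)) {
        $found[] = $node;
    }

    foreach ($node['children'] ?? [] as $child) {
        find_nodes($child, $name, $found);
    }

    return $found;
}

// Hand-built tree in the same shape as the parser output.
$tree = [
    'name' => 'TABLE', 'attr' => [], 'level' => 1,
    'children' => [
        ['name' => 'TR', 'attr' => [], 'level' => 2, 'children' => [
            ['name' => 'TD', 'attr' => ['CLASS' => 'arial11Black'], 'level' => 3, 'tagData' => 'ALLTEL'],
        ]],
        ['name' => 'TR', 'attr' => [], 'level' => 2, 'children' => [
            ['name' => 'TD', 'attr' => ['CLASS' => 'arial11Black'], 'level' => 3, 'tagData' => 'ARROW'],
        ]],
    ],
];

foreach (find_nodes($tree, 'td') as $td) {
    echo trim($td['tagData'] ?? ''), "\n";
}
```

With the real parser output, the starting node would be $output[0] rather than the hand-built $tree.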

Now that I have it filtered, though, what I'd like to be able to do is something like this:

-------------------example--------------------------------

<table>
<tr>
<td>
Creditor: American Express</td>
<br> Date Opened:  08/05/08</br>
</tr>

<tr>
<td>
Creditor: Discover</td>
<br> Date Opened:  08/06/08</br>
Date Closed: na
</tr>
</table>

-------------------end--------------------------------

I have cleaned up the HTML so that there won't be any errors. What's the best way to search through it and pull out Creditor and Date Opened so that the result looks like this?

----------------------output------------------------------
American Express
08/05/08

Discover
08/06/08
---------------------end-----------------------------------
It pulls out exactly what's needed. Notice that I put Date Closed in the example but left it out of the output. That is, I'd like to be able to specify what to look for and have the code search through the file and pull it out. I set up an array that kind of did this, but it didn't really pan out. The problem was that I didn't know how to loop so that it pulls several things, like the creditor and date opened, for every record that exists, as in the example above. I need to be able to assign each value a distinct key so that I can pull it into the database.

Any ideas?
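
One way to approach the example above, as a sketch only: treat each tr block as one record, turn every tag into a line break, and then look for lines of the form "Label: value" for just the labels that were asked for. It assumes each record is wrapped in its own tr and that label and value share a line, as in the example; extract_fields() is a made-up name.

```php
<?php
// Hypothetical helper: one array per <tr>, containing only the wanted labels.
function extract_fields(string $html, array $wanted): array
{
    $records = [];

    preg_match_all('%<tr\b[^>]*>(.*?)</tr>%is', $html, $rows);

    foreach ($rows[1] as $rowHtml) {
        // Replace every tag with a newline so each label starts its own line.
        $text = preg_replace('%<[^>]+>%', "\n", $rowHtml);

        $record = [];
        foreach ($wanted as $label) {
            $pattern = '/^\s*' . preg_quote($label, '/') . ':[ \t]*(.+?)\s*$/mi';
            if (preg_match($pattern, $text, $m)) {
                $record[$label] = $m[1];
            }
        }

        if ($record) {
            $records[] = $record;
        }
    }

    return $records;
}

$html = <<<HTML
<table>
<tr>
<td>
Creditor: American Express</td>
<br> Date Opened:  08/05/08</br>
</tr>

<tr>
<td>
Creditor: Discover</td>
<br> Date Opened:  08/06/08</br>
Date Closed: na
</tr>
</table>
HTML;

// Date Closed is present in the markup but not requested, so it is skipped.
foreach (extract_fields($html, ['Creditor', 'Date Opened']) as $record) {
    echo $record['Creditor'], "\n", $record['Date Opened'], "\n\n";
}
```

Each record keeps its label as the array key, so the values stay distinct per creditor and can be bound to database columns inside the same loop. Regular expressions are only reasonable here because the markup has already been cleaned up; on the raw report a DOM-based approach is likely to be more robust.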
I'll split the points