  • Status: Solved
  • Priority: Medium
  • Security: Public
  • Views: 4712

How to use PHP to read data from a table in a Word (.doc) file

I need to import data from a Word document using PHP.
The data is in a table in a Word (.doc) file.
Does anyone know how?

Thanks
Asked by: Tim
COBOLdinosaur commented:
If the Word doc is saved as HTML, a lot of parsing might be able to make sense of it, but in general a Word document is so loaded with proprietary codes, controls, and formatting that it is not suitable for use by anything outside of Office, and sometimes even other Office components have compatibility problems with it.

Cd&
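If the document is saved as HTML, as suggested above, PHP's bundled DOM extension can handle most of the parsing. A minimal sketch, assuming the data sits in the first `<table>` of the saved page and that the page contents are passed in as a string; `readHtmlTableRows` is a hypothetical name. Merged cells (`colspan`/`rowspan`) are not expanded, and rows of any nested table would be picked up as well.

```php
<?php
// Hypothetical helper: pull the cell text out of one <table> in a Word document
// that was saved as HTML. $tableIndex selects which table (0 = first).
function readHtmlTableRows(string $html, int $tableIndex = 0): array
{
    $dom = new DOMDocument();
    // Word's HTML is not strictly valid; collect parser warnings instead of printing them.
    $previous = libxml_use_internal_errors(true);
    $dom->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $table = $dom->getElementsByTagName('table')->item($tableIndex);
    if ($table === null) {
        return [];
    }

    $rows = [];
    foreach ($table->getElementsByTagName('tr') as $tr) {
        $cells = [];
        foreach ($tr->childNodes as $cell) {
            // Only direct <td>/<th> children are cells of this row.
            if ($cell instanceof DOMElement && in_array(strtolower($cell->nodeName), ['td', 'th'], true)) {
                // Turn non-breaking spaces into spaces and collapse the line breaks Word adds.
                $text = str_replace("\xC2\xA0", ' ', $cell->textContent);
                $cells[] = trim(preg_replace('/\s+/', ' ', $text));
            }
        }
        if ($cells !== []) {
            $rows[] = $cells;
        }
    }
    return $rows;
}
```

For example, `$rows = readHtmlTableRows(file_get_contents('customers.htm'));`, where `customers.htm` stands for whatever name the file was saved under.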
Ray Paseur commented:
Please post the test data and show us what you want to get for output, thanks. ~Ray
Tim (Senior PHP Developer, Author) commented:
Comment edited to put the code into the code snippet ~Ray

I use this code to read the Word document:

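// Heuristic text extractor for legacy binary Word 97-2003 (.doc) files.
// It assumes the document text starts at byte offset 1536 (0x600) and is stored
// as 8-bit characters; files that store 16-bit Unicode text will come out garbled.
function parseWord($userDoc)
{
    $word_text = file_get_contents($userDoc);
    if ($word_text === false)
    {
        return "";
    }
    $tam = strlen($word_text);
    $line = "";
    $nulos = 0;

    // Copy bytes from offset 1536 until a long run of NUL bytes marks the end of the text.
    for ($i = 1536; $i < $tam; $i++)
    {
        $line .= $word_text[$i];

        // Compare with the NUL byte itself: "== 0" matches every non-numeric
        // character in PHP 7 and no character at all in PHP 8.
        if ($word_text[$i] === "\0")
        {
            $nulos++;
        }
        else
        {
            $nulos = 0;
        }

        if ($nulos > 1996)
        {
            break;
        }
    }

    // Word ends each paragraph with a carriage return (0x0D).
    $lines = explode(chr(0x0D), $line);

    $outtext = "";
    foreach ($lines as $thisline)
    {
        $tam = strlen($thisline);
        if (!$tam)
        {
            continue;
        }

        $new_line = "";
        for ($i = 0; $i < $tam; $i++)
        {
            $onechar = $thisline[$i];

            // Skip bytes above 0xF0.
            if ($onechar > chr(240))
            {
                continue;
            }

            // Keep printable characters.
            if ($onechar >= chr(0x20))
            {
                $new_line .= $onechar;
            }

            // 0x14 is the field separator: close the link opened for a HYPERLINK field.
            if ($onechar == chr(0x14))
            {
                $new_line .= "</a>";
            }

            // 0x07 marks the end of a table cell: emit a tab. Two marks in a row
            // are treated as the end of a table row and also emit a newline.
            if ($onechar == chr(0x07))
            {
                $new_line .= "\t";
                if (isset($thisline[$i + 1]) && $thisline[$i + 1] == chr(0x07))
                {
                    $new_line .= "\n";
                }
            }
        }

        // Replace hyperlink field codes with <a> tags.
        $new_line = str_replace("HYPERLINK", "<a href=", $new_line);
        $new_line = str_replace("\\o", ">", $new_line);
        $new_line .= "\n";

        // Replace picture field codes with <img> tags.
        $new_line = str_replace("INCLUDEPICTURE", "<br><img src=", $new_line);
        $new_line = str_replace("\\*", "><br>", $new_line);
        $new_line = str_replace("MERGEFORMATINET", "", $new_line);

        $outtext .= nl2br($new_line);
    }

    return $outtext;
}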

$userDoc = "customers.doc";
$text = parseWord($userDoc);
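`parseWord()` only makes sense for a Word 97-2003 binary file; a `.docx` file, or an HTML or RTF file that was simply renamed to `.doc`, has a different layout and will come out as garbage. A cheap guard is to check the file signature before parsing. This is a sketch; `detectWordContainer` is a hypothetical name, and because it only looks at the first bytes, an OLE2 file that is not actually a Word document (an Excel `.xls`, for instance) would still be reported as `doc`.

```php
<?php
// Hypothetical helper: report which container format a "Word" file really uses,
// so the byte-offset heuristic in parseWord() only runs on legacy binary .doc files.
function detectWordContainer(string $path): string
{
    $handle = @fopen($path, 'rb');
    if ($handle === false) {
        return 'unreadable';
    }
    $magic = (string) fread($handle, 8);
    fclose($handle);

    // Word 97-2003 .doc files are OLE2 compound files with this 8-byte signature.
    if ($magic === "\xD0\xCF\x11\xE0\xA1\xB1\x1A\xE1") {
        return 'doc';
    }
    // .docx files are ZIP archives, which start with "PK\x03\x04".
    if (strncmp($magic, "PK\x03\x04", 4) === 0) {
        return 'docx';
    }
    // RTF files start with the literal text {\rtf (single quotes keep the backslash).
    if (strncmp($magic, '{\rtf', 5) === 0) {
        return 'rtf';
    }
    return 'unknown';
}
```

Usage would be along the lines of `if (detectWordContainer($userDoc) === 'doc') { $text = parseWord($userDoc); }`.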


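I also use this code to build the data array:

// nl2br() in parseWord() puts "<br />" before every newline, so split on it.
$olines = explode("<br />", $text);

$slines = array();
foreach ($olines as $orow) {
    // Cells are separated by tabs; keep only lines with more than 5 cells.
    $ovalue = explode("\t", $orow);
    if (count($ovalue) > 5) {
        $slines[] = $ovalue;
    }
}
print_r($slines);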
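The `count($ovalue) > 5` filter keeps any line with six or more cells, so split or garbled rows pass through silently. If the number of cells per row is known, a stricter check can set the odd rows aside for review instead of checking everything by hand. A minimal sketch; `partitionRowsByCellCount` is a hypothetical name, and the expected count has to match how the parser represents a row, including any empty leading or trailing cells it produces.

```php
<?php
// Hypothetical helper: split parsed rows into those with the expected number of
// cells and those that need a manual look (split, merged, or garbled rows).
function partitionRowsByCellCount(array $rows, int $expected): array
{
    $good = [];
    $suspect = [];
    foreach ($rows as $index => $row) {
        if (count($row) === $expected) {
            $good[] = $row;
        } else {
            // Keep the original index so the row can be found in the print_r() output.
            $suspect[$index] = $row;
        }
    }
    return ['good' => $good, 'suspect' => $suspect];
}
```

For example, `$parts = partitionRowsByCellCount($slines, $expectedCells); print_r($parts['suspect']);`, where `$expectedCells` is whatever count a correctly parsed row has.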


For now it shows the data, but the result is not good enough.
Can anyone help me?

Thanks
Ray Paseur commented:
Thanks for posting that code.  I don't know if it works or can be made to work because I do not have any test data and I do not know exactly what output you want to get from your test data.  Without the test data, we would just be wasting your time by guessing.  Please post the test data and show us what you want to get for output.  This article explains why we want test data.
http://www.experts-exchange.com/Web_Development/Web_Languages-Standards/PHP/A_7830-A-Quick-Tour-of-Test-Driven-Development.html

Thanks, ~Ray
Tim (Senior PHP Developer, Author) commented:
Please check the attachment; it is the data file.

I would like to get the data array, but right now the array does not come out correctly:

Array
(
    [0] => Array
        (
            [0] => x €é(¼ê ìÞÞx ÞÞÞÞÞ$$x ÞÞÞy$ÞÞÞÞé(ÞÞÞÞÞÞÞÞÞr ’</a>:Platform
            [1] => 21Y-68
            [2] => Ceiling
            [3] => Plaster
            [4] => 1500sf
            [5] => 0
            [6] => 0
            [7] => PC1.2% Chrysotile
            [8] => No
            [9] => Yes
            [10] => C
            [11] => YUS-S03-AS02
            [12] => Not identified
            [13] => See Photograph #1
            [14] => Condition to be reassessed annually.
            [15] =>
        )

    [1] => Array
        (
            [0] =>

            [1] => Platform
            [2] => 21Y-69
            [3] => Ceiling
            [4] => Plaster
            [5] => 1500sf
            [6] => 0
            [7] => 0
            [8] =>
        )

    [2] => Array
        (
            [0] =>

            [1] => No
            [2] => Yes
            [3] => C
            [4] => Visually similar to YUS-S03-S02
            [5] => Not identified
            [6] =>
        )

    [3] => Array
        (
            [0] =>

            [1] => Platform
            [2] => 21Y-73
            [3] => Ceiling
            [4] => Plaster
            [5] => Xx
            [6] => 0
            [7] => 0
            [8] =>
        )

    [4] => Array
        (
            [0] =>

            [1] => No
            [2] => Yes
            [3] => C
            [4] => Visually similar to YUS-S03-S02
            [5] => Not identified
            [6] =>
        )

    [5] => Array
        (
            [0] =>

            [1] => Platform
            [2] => 21Y-74
            [3] => Ceiling
            [4] => Plaster
            [5] => Xx
            [6] => 0
            [7] => 0
            [8] =>
        )

    [6] => Array
        (
            [0] =>

            [1] => No
            [2] => Yes
            [3] => C
            [4] => Visually similar to YUS-S03-S02
            [5] => Not identified
            [6] =>
        )

    [7] => Array
        (
            [0] =>

            [1] => Platform
            [2] => 21Y-75
            [3] => Ceiling
            [4] => Plaster
            [5] => Xx
            [6] => 0
            [7] => 0
            [8] =>
        )

    [8] => Array
        (
            [0] =>

            [1] => No
            [2] => Yes
            [3] => C
            [4] => Visually similar to YUS-S03-S02
            [5] => Not identified
            [6] =>
        )

    [9] => Array
        (
            [0] =>

            [1] => Concourse
            [2] => Chelsea NY
            [3] => Ceiling
            [4] => Acoustic Ceiling Tile
            [5] => NQ
            [6] => 0
            [7] => 0
            [8] => SACM
            [9] => Yes
            [10] => Yes
            [11] => C
            [12] => Not sampled due to entry restrictions
            [13] =>
        )

    [10] => Array
        (
            [0] =>

            [1] => Concourse
            [2] => Lotto Centre
            [3] => Ceiling
            [4] => Acoustic Ceiling Tile
            [5] => NQ
            [6] => 0
            [7] => 0
            [8] => SACM
            [9] => Yes
            [10] => Yes
            [11] => C
            [12] => Not sampled due to entry restrictions
            [13] =>
        )

    [11] => Array
        (
            [0] =>

            [1] => Concourse
            [2] => Cinnabon
            [3] => Ceiling
            [4] => Acoustic Ceililng Tile
            [5] => 100sf
            [6] => 0
            [7] => 0
            [8] => SACM
            [9] => Yes
            [10] => Yes
            [11] => C
            [12] => Not sampled due to entry restrictions
            [13] =>
        )

    [12] => Array
        (
            [0] =>

            [1] => Concourse
            [2] => Gateway News Stand
            [3] => Ceiling
            [4] => Acoustic Ceiling Tile
            [5] => NQ
            [6] => 0
            [7] => 0
            [8] => SACM
            [9] => Yes
            [10] => Yes
            [11] => C
            [12] => Not sampled due to entry restrictions
            [13] =>
        )

    [13] => Array
        (
            [0] =>

            [1] => Concourse
            [2] => Rainbow ‘n’ Things
            [3] => Ceiling
            [4] => Acoustic Ceiling Tile
            [5] => NQ
            [6] => 0
            [7] => 0
            [8] => SACM
            [9] => Yes
            [10] => Yes
            [11] => C
            [12] => Not sampled due to entry restrictions
            [13] =>
        )

    [14] => Array
        (
            [0] =>

            [1] => Concourse
            [2] => 51P-330 Elevator Machine Rm.
            [3] => Piping
            [4] => TransiteTM
            [5] => xx
            [6] => 0
            [7] => 0
            [8] => SACM
            [9] => No
            [10] => Yes
            [11] => B
            [12] => NS, SACM
            [13] => Not identified
            [14] =>
        )

    [15] => Array
        (
            [0] =>

            [1] => Platform
            [2] => Line Southbound Platform
            [3] => Ceiling above luxalon
            [4] => Sprayed fire proofing
            [5] => 1500sf
            [6] => 0
            [7] => 0
            [8] => PC 4.8% Chrysotile
            [9] => Yes
            [10] => Yes
            [11] => C
            [12] => Not sampled
            [13] => Sampled. Sample # 1759489-008. Asbestos content PC 4.8% Chrysotile
            [14] => Coffey did not resample, but presence was confirmed by visual inspection.
        )

    [16] => Array
        (
            [0] =>

            [1] => Platform
            [2] =>  Line Southbound Platform
            [3] => Ceiling above luxalon
            [4] => Sprayed fire proofing
            [5] => 1500sf
            [6] => 0
            [7] => 0
            [8] => PC 5.1% Chrysotile
            [9] => Yes
            [10] => Yes
            [11] => C
            [12] => Not sampled
            [13] => Sampled.  Sample # 1759495-014. Asbestos content PC 5.1% Chrysotile
            [14] => Coffey did not resample, but presence was confirmed by visual inspection.
        )

    [17] => Array
        (
            [0] =>

            [1] => Level
            [2] => Area/Room
            [3] => System Component
            [4] => Component Material
            [5] => Condition (Estimated Quantity)***
            [6] => Asbestos Content
            [7] => Friable?
            [8] => Visible?
            [9] => Access.
            [10] => Coffey's Sample Number**
            [11] => Pinchin Report Findings
            [12] => Comments/ Notes
            [13] => Recommendations
            [14] =>
        )

)
Attachment: customers.doc
Ray Paseur commented:
OK, I can read the document.
http://www.laprbass.com/RAY_temp_zcfyhome.php

Now can you please show me what you want to extract from the document?  Thanks, ~Ray
Tim (Senior PHP Developer, Author) commented:
The array has some issues.
In array[0][0] there is " x €é(¼ê ìÞÞx ÞÞÞÞÞ$$x ÞÞÞy$ÞÞÞÞé(ÞÞÞÞÞÞÞÞÞr ’</a>:", and I am not sure why.

array[1] and array[2] look like one row that was broken into two arrays.

Also, the array does not show the table data well: the table has 15 columns, but the arrays do not.

Because I have a lot of data, it is hard to check it manually.

Could you help?
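These symptoms likely come from the order in which the parser splits the text: `parseWord()` cuts at every paragraph mark (0x0D) before it looks at cell marks, so a cell holding more than one paragraph splits its row in two (array[1] and array[2]), and whatever bytes precede the table, including non-text data read from offset 1536, stay glued to the first cell (array[0][0]). One alternative, sketched here under explicit assumptions, is to split only on the 0x07 cell marks the original code already relies on and group them by a known column count. It assumes that `$raw` is the 8-bit text stream trimmed so it starts at the table's first cell, that every row has exactly `$columns` cells (no merged cells), and that each row is closed by one extra 0x07 row-end mark. The name `groupWordTableCells` is hypothetical.

```php
<?php
// Hypothetical helper: rebuild table rows from the raw 8-bit Word text stream by
// counting cell marks (0x07) instead of splitting on paragraph marks (0x0D).
// Assumes every row has exactly $columns cells followed by one row-end mark,
// and that $raw starts at the first cell of the table.
function groupWordTableCells(string $raw, int $columns): array
{
    $segments = explode("\x07", $raw);
    array_pop($segments); // text after the final mark does not belong to the table

    $rows = [];
    foreach (array_chunk($segments, $columns + 1) as $chunk) {
        if (count($chunk) < $columns + 1) {
            break; // incomplete trailing row: stop rather than guess
        }
        array_pop($chunk); // the empty segment closed by the row-end mark
        $rows[] = array_map(
            // Paragraph marks and other control bytes inside a cell become single spaces.
            fn(string $cell): string => trim(preg_replace('/[\x00-\x1F]+/', ' ', $cell)),
            $chunk
        );
    }
    return $rows;
}
```

For the 15-column table described above this would be called as `groupWordTableCells($raw, 15)`. Counting marks this way also keeps empty cells in place, which the double-0x07 check in `parseWord()` cannot tell apart from a row end; tables with merged cells still break the fixed count.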
skullnobrains commented:
Parsing Word docs from PHP is prone to fail when the file is saved with a different Word version or when the user copy-pastes formatted text.

Can you ask the users to use a different format?

Did you try running antiword on the source file to convert it to text and parsing that afterwards? It seems likely to produce more stable results.
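A sketch of the antiword route, assuming the `antiword` binary is installed and on the PATH of the user PHP runs as, that `shell_exec()` is not disabled, and a Unix-like shell (for the `2>/dev/null` redirect). The table half rests on a further assumption: that antiword draws each table row as a line starting and ending with `|`, with `|` between cells. Compare with antiword's real output for your files first; long cells may wrap onto extra lines, which this sketch does not rejoin. Both function names are hypothetical.

```php
<?php
// Hypothetical helper: convert a .doc to plain text with the antiword CLI.
function antiwordToText(string $path): ?string
{
    if (!is_file($path)) {
        return null;
    }
    // escapeshellarg() protects against spaces and shell metacharacters in the path.
    $output = shell_exec('antiword ' . escapeshellarg($path) . ' 2>/dev/null');
    return (is_string($output) && $output !== '') ? $output : null;
}

// Assumption: table rows appear as lines such as "|Platform |21Y-68 |Ceiling |".
function splitPipeTableLines(string $text): array
{
    $rows = [];
    foreach (preg_split('/\R/', $text) as $line) {
        $line = trim($line);
        if (strlen($line) < 2 || $line[0] !== '|' || substr($line, -1) !== '|') {
            continue; // not a table row
        }
        // Drop the outer bars, then split on the inner ones.
        $rows[] = array_map('trim', explode('|', substr($line, 1, -1)));
    }
    return $rows;
}
```

Usage would be `$text = antiwordToText('customers.doc'); $rows = $text === null ? [] : splitPipeTableLines($text);`.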
Ray Paseur commented:
Just a thought: are you running PHP on Windows? If so, you may be able to use COM to have Word itself read the document.
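On Windows, PHP's COM extension (`com_dotnet`) can drive an installed copy of Microsoft Word, so Word parses the file and the tables come out cell by cell. A sketch, assuming Word is installed on the machine and the extension is enabled; the function names are hypothetical. Microsoft advises against automating Office from unattended server-side processes, so this suits a one-off or batch conversion better than a web request.

```php
<?php
// Hypothetical helper: read every table in a .doc through Word's COM object model.
// Returns null when COM is unavailable (e.g., non-Windows PHP) or the file is missing.
function readWordTablesViaCom(string $path): ?array
{
    if (!class_exists('COM') || !is_file($path)) {
        return null;
    }
    $word = new COM('Word.Application');
    $word->Visible = false;
    $tables = [];
    try {
        $doc = $word->Documents->Open(realpath($path));
        // Word collections are 1-based.
        for ($t = 1; $t <= $doc->Tables->Count; $t++) {
            $rows = [];
            $cells = $doc->Tables->Item($t)->Range->Cells;
            // Walking Range->Cells (rather than Cell(row, col)) tolerates merged cells.
            for ($i = 1; $i <= $cells->Count; $i++) {
                $cell = $cells->Item($i);
                $rows[$cell->RowIndex][] = cleanWordCellText($cell->Range->Text);
            }
            $tables[] = array_values($rows);
        }
        $doc->Close(false); // close without saving
    } finally {
        $word->Quit();
    }
    return $tables;
}

// A cell's text ends with a paragraph mark plus the end-of-cell mark ("\r\x07");
// paragraph marks inside the cell are turned into spaces.
function cleanWordCellText(string $text): string
{
    return trim(str_replace("\r", ' ', rtrim($text, "\r\x07")));
}
```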
Tim (Senior PHP Developer, Author) commented:
Thanks, everyone, for the help.
Ray Paseur commented:
We need an explanation of the bad grade.  Please see the grading guidelines here:
http://support.experts-exchange.com/customer/portal/articles/481419
skullnobrains commented:
I don't care about the grade, but feel free to post information about what you ended up with.
Tim (Senior PHP Developer, Author) commented:
Sorry about the grade. I was not sure what the grade meant and just clicked. Anyway, thanks to both of you.
I gave up on this approach after hours of trying; it does not look like a good way to do this.
Tim (Senior PHP Developer, Author) commented:
If anyone can change the grade, please do.