How to use PHP to read data from a table in a Word (.doc) file

I need to import data from a Word document using PHP.
The data is in a table inside a Word (.doc) file.
Does anyone know how?

Thanks
Asked by Tim, Senior PHP Developer
 
skullnobrains commented:
Parsing Word documents from PHP is prone to fail when the file was saved with a different Word version or when the user copy-pastes formatted text.

Can you ask the users to use a different format?

Did you try running antiword on the source file to convert it to text and parsing that afterwards? It seems likely to produce more stable results.
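
A minimal sketch of the antiword route, assuming the `antiword` command-line tool is installed and on the PATH, and assuming its text output draws table rows as lines of cells separated by `|` (worth verifying against the real document, since long cells may wrap onto several lines). The function names `docToText` and `parsePipeTable` are made up for illustration.

```php
<?php
// Hypothetical helper: run antiword and return its plain-text output.
// Assumes the antiword binary is installed and on the PATH.
function docToText(string $path): string
{
    $output = shell_exec('antiword ' . escapeshellarg($path));
    return is_string($output) ? $output : '';
}

// Hypothetical helper: split pipe-delimited lines into rows of trimmed cells.
// Lines without a "|" are treated as ordinary text and skipped.
function parsePipeTable(string $text): array
{
    $rows = [];
    foreach (preg_split('/\R/', $text) as $line) {
        if (strpos($line, '|') === false) {
            continue;
        }
        $cells = array_map('trim', explode('|', trim($line)));
        // Drop the empty edge cells produced by a leading or trailing "|".
        if ($cells !== [] && $cells[0] === '') {
            array_shift($cells);
        }
        if ($cells !== [] && end($cells) === '') {
            array_pop($cells);
        }
        if ($cells !== []) {
            $rows[] = $cells;
        }
    }
    return $rows;
}
```

Usage would look like `print_r(parsePipeTable(docToText('customers.doc')));`. Cells that wrap across lines would still need to be merged by hand.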
 
COBOLdinosaur commented:
If the Word doc is saved as HTML, a lot of parsing might be able to make sense of it. In general, though, a Word document is so loaded with proprietary codes, controls, and formatting that it is not suitable for use by anything outside of Office, and sometimes even other Office components have compatibility problems with it.
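
A rough sketch of the HTML route, assuming the document has been saved from Word as a web page and its contents are already loaded into a string. It uses PHP's built-in `DOMDocument`; the helper name `htmlTableRows` is hypothetical.

```php
<?php
// Hypothetical helper: return every table row in an HTML string as an array of cell texts.
function htmlTableRows(string $html): array
{
    $dom = new DOMDocument();
    // Word's HTML export is rarely valid markup, so collect parser warnings quietly.
    $previous = libxml_use_internal_errors(true);
    $dom->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $rows = [];
    foreach ($dom->getElementsByTagName('tr') as $tr) {
        $cells = [];
        foreach ($tr->childNodes as $node) {
            if ($node instanceof DOMElement && in_array($node->nodeName, ['td', 'th'], true)) {
                // Collapse the whitespace and line breaks left inside cells.
                $cells[] = trim(preg_replace('/\s+/', ' ', $node->textContent));
            }
        }
        if ($cells !== []) {
            $rows[] = $cells;
        }
    }
    return $rows;
}
```

Each table row comes back as one array of cell strings, so no tab or line-break guessing is needed. Character encoding of the exported file may still need attention.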

Cd&
 
Ray Paseur commented:
Please post the test data and show us what you want to get for output, thanks. ~Ray
 
Tim (Senior PHP Developer, author) commented:
Comment edited to put the code into the code snippet ~Ray

I use this code to read the word document.

function parseWord($userDoc) 
{
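    // Read the whole file as raw bytes; "rb" avoids line-ending translation on Windows.
    $fileHandle = fopen($userDoc, "rb");
    $word_text = @fread($fileHandle, filesize($userDoc));
    fclose($fileHandle);
    $line = "";
    $tam = filesize($userDoc);
    $nulos = 0;
    $caracteres = 0;
    for($i=1536; $i<$tam; $i++)
    {
        $line .= $word_text[$i];

        // Compare against the NUL byte itself: before PHP 8, "== 0" also matches any non-numeric character.
        if( $word_text[$i] === "\0" )
        {
            $nulos++;
        }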
        else
        {
            $nulos=0;
            $caracteres++;
        }

        if( $nulos>1996)
        {   
            break;  
        }
    }

    //echo $caracteres;

    $lines = explode(chr(0x0D),$line);
    //$outtext = "<pre>";

    $outtext = "";
    foreach($lines as $thisline)
    {
        $tam = strlen($thisline);
        if( !$tam )
        {
            continue;
        }

        $new_line = ""; 
        for($i=0; $i<$tam; $i++)
        {
            $onechar = $thisline[$i];
            if( $onechar > chr(240) )
            {
                continue;
            }

            if( $onechar >= chr(0x20) )
            {
                $caracteres++;
                $new_line .= $onechar;
            }

            if( $onechar == chr(0x14) )
            {
                $new_line .= "</a>";
            }

            if( $onechar == chr(0x07) )
            {
                $new_line .= "\t";
                if( isset($thisline[$i+1]) )
                {
                    if( $thisline[$i+1] == chr(0x07) )
                    {
                        $new_line .= "\n";
                    }
                }
            }
        }
        // replace the field code with a hyperlink
        $new_line = str_replace("HYPERLINK" ,"<a href=",$new_line); 
        $new_line = str_replace("\o" ,">",$new_line); 
        $new_line .= "\n";

        // image links
        $new_line = str_replace("INCLUDEPICTURE" ,"<br><img src=",$new_line); 
        $new_line = str_replace("\*" ,"><br>",$new_line); 
        $new_line = str_replace("MERGEFORMATINET" ,"",$new_line); 


        $outtext .= nl2br($new_line);
    }

 return $outtext;
} 

$userDoc = "customers.doc";
$text = parseWord($userDoc);

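I also use some code to build the data array:
$olines=explode("<br />", $text);

$slines=array();
foreach( $olines as $orow ){
	// Cells are separated by tab characters.
	$ovalue=explode("\t", $orow);
	// Keep only lines with more than 5 cells, i.e. likely table rows.
	if(count($ovalue)>5)$slines[]=$ovalue;
}
print_r($slines);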

For now it can show the data, but the result is not good enough.
Can anyone help me?

Thanks
 
Ray Paseur commented:
Thanks for posting that code.  I don't know if it works or can be made to work because I do not have any test data and I do not know exactly what output you want to get from your test data.  Without the test data, we would just be wasting your time by guessing.  Please post the test data and show us what you want to get for output.  This article explains why we want test data.
http://www.experts-exchange.com/Web_Development/Web_Languages-Standards/PHP/A_7830-A-Quick-Tour-of-Test-Driven-Development.html

Thanks, ~Ray
 
Tim (Senior PHP Developer, author) commented:
Please check the attachment; it is the data file.

I would like to get the data as an array,
but right now the array does not come out right.

Array
(
    [0] => Array
        (
            [0] => x €é(¼ê ìÞÞx ÞÞÞÞÞ$$x ÞÞÞy$ÞÞÞÞé(ÞÞÞÞÞÞÞÞÞr ’</a>:Platform
            [1] => 21Y-68
            [2] => Ceiling
            [3] => Plaster
            [4] => 1500sf
            [5] => 0
            [6] => 0
            [7] => PC1.2% Chrysotile
            [8] => No
            [9] => Yes
            [10] => C
            [11] => YUS-S03-AS02
            [12] => Not identified
            [13] => See Photograph #1
            [14] => Condition to be reassessed annually.
            [15] =>
        )

    [1] => Array
        (
            [0] =>

            [1] => Platform
            [2] => 21Y-69
            [3] => Ceiling
            [4] => Plaster
            [5] => 1500sf
            [6] => 0
            [7] => 0
            [8] =>
        )

    [2] => Array
        (
            [0] =>

            [1] => No
            [2] => Yes
            [3] => C
            [4] => Visually similar to YUS-S03-S02
            [5] => Not identified
            [6] =>
        )

    [3] => Array
        (
            [0] =>

            [1] => Platform
            [2] => 21Y-73
            [3] => Ceiling
            [4] => Plaster
            [5] => Xx
            [6] => 0
            [7] => 0
            [8] =>
        )

    [4] => Array
        (
            [0] =>

            [1] => No
            [2] => Yes
            [3] => C
            [4] => Visually similar to YUS-S03-S02
            [5] => Not identified
            [6] =>
        )

    [5] => Array
        (
            [0] =>

            [1] => Platform
            [2] => 21Y-74
            [3] => Ceiling
            [4] => Plaster
            [5] => Xx
            [6] => 0
            [7] => 0
            [8] =>
        )

    [6] => Array
        (
            [0] =>

            [1] => No
            [2] => Yes
            [3] => C
            [4] => Visually similar to YUS-S03-S02
            [5] => Not identified
            [6] =>
        )

    [7] => Array
        (
            [0] =>

            [1] => Platform
            [2] => 21Y-75
            [3] => Ceiling
            [4] => Plaster
            [5] => Xx
            [6] => 0
            [7] => 0
            [8] =>
        )

    [8] => Array
        (
            [0] =>

            [1] => No
            [2] => Yes
            [3] => C
            [4] => Visually similar to YUS-S03-S02
            [5] => Not identified
            [6] =>
        )

    [9] => Array
        (
            [0] =>

            [1] => Concourse
            [2] => Chelsea NY
            [3] => Ceiling
            [4] => Acoustic Ceiling Tile
            [5] => NQ
            [6] => 0
            [7] => 0
            [8] => SACM
            [9] => Yes
            [10] => Yes
            [11] => C
            [12] => Not sampled due to entry restrictions
            [13] =>
        )

    [10] => Array
        (
            [0] =>

            [1] => Concourse
            [2] => Lotto Centre
            [3] => Ceiling
            [4] => Acoustic Ceiling Tile
            [5] => NQ
            [6] => 0
            [7] => 0
            [8] => SACM
            [9] => Yes
            [10] => Yes
            [11] => C
            [12] => Not sampled due to entry restrictions
            [13] =>
        )

    [11] => Array
        (
            [0] =>

            [1] => Concourse
            [2] => Cinnabon
            [3] => Ceiling
            [4] => Acoustic Ceililng Tile
            [5] => 100sf
            [6] => 0
            [7] => 0
            [8] => SACM
            [9] => Yes
            [10] => Yes
            [11] => C
            [12] => Not sampled due to entry restrictions
            [13] =>
        )

    [12] => Array
        (
            [0] =>

            [1] => Concourse
            [2] => Gateway News Stand
            [3] => Ceiling
            [4] => Acoustic Ceiling Tile
            [5] => NQ
            [6] => 0
            [7] => 0
            [8] => SACM
            [9] => Yes
            [10] => Yes
            [11] => C
            [12] => Not sampled due to entry restrictions
            [13] =>
        )

    [13] => Array
        (
            [0] =>

            [1] => Concourse
            [2] => Rainbow ‘n’ Things
            [3] => Ceiling
            [4] => Acoustic Ceiling Tile
            [5] => NQ
            [6] => 0
            [7] => 0
            [8] => SACM
            [9] => Yes
            [10] => Yes
            [11] => C
            [12] => Not sampled due to entry restrictions
            [13] =>
        )

    [14] => Array
        (
            [0] =>

            [1] => Concourse
            [2] => 51P-330 Elevator Machine Rm.
            [3] => Piping
            [4] => TransiteTM
            [5] => xx
            [6] => 0
            [7] => 0
            [8] => SACM
            [9] => No
            [10] => Yes
            [11] => B
            [12] => NS, SACM
            [13] => Not identified
            [14] =>
        )

    [15] => Array
        (
            [0] =>

            [1] => Platform
            [2] => Line Southbound Platform
            [3] => Ceiling above luxalon
            [4] => Sprayed fire proofing
            [5] => 1500sf
            [6] => 0
            [7] => 0
            [8] => PC 4.8% Chrysotile
            [9] => Yes
            [10] => Yes
            [11] => C
            [12] => Not sampled
            [13] => Sampled. Sample # 1759489-008. Asbestos content PC 4.8% Chrysotile
            [14] => Coffey did not resample, but presence was confirmed by visual inspection.
        )

    [16] => Array
        (
            [0] =>

            [1] => Platform
            [2] =>  Line Southbound Platform
            [3] => Ceiling above luxalon
            [4] => Sprayed fire proofing
            [5] => 1500sf
            [6] => 0
            [7] => 0
            [8] => PC 5.1% Chrysotile
            [9] => Yes
            [10] => Yes
            [11] => C
            [12] => Not sampled
            [13] => Sampled.  Sample # 1759495-014. Asbestos content PC 5.1% Chrysotile
            [14] => Coffey did not resample, but presence was confirmed by visual inspection.
        )

    [17] => Array
        (
            [0] =>

            [1] => Level
            [2] => Area/Room
            [3] => System Component
            [4] => Component Material
            [5] => Condition (Estimated Quantity)***
            [6] => Asbestos Content
            [7] => Friable?
            [8] => Visible?
            [9] => Access.
            [10] => Coffey's Sample Number**
            [11] => Pinchin Report Findings
            [12] => Comments/ Notes
            [13] => Recommendations
            [14] =>
        )

)
Attachment: customers.doc
 
Ray Paseur commented:
OK, I can read the document.
http://www.laprbass.com/RAY_temp_zcfyhome.php

Now can you please show me what you want to extract from the document?  Thanks, ~Ray
 
Tim (Senior PHP Developer, author) commented:
The array has some issues.
array[0][0] contains " x €é(¼ê ìÞÞx ÞÞÞÞÞ$$x ÞÞÞy$ÞÞÞÞé(ÞÞÞÞÞÞÞÞÞr ’</a>:"
and I am not sure why.

array[1] and array[2] look like one table row broken into two arrays.

Also, the array does not represent the table data well:
the table has 15 columns, but the arrays do not.

Because I have lots of data,
it is hard to check manually.

Could you help?
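
Since checking by hand is impractical, a small sketch like the following could at least flag the rows whose cell count differs from what the table should have. It assumes `$slines` is the array built by the code earlier in the thread. The function names `trimEmptyEdges` and `findIrregularRows` are illustrative, and so is the choice to ignore empty cells at either end of a row.

```php
<?php
// Hypothetical helper: remove empty (whitespace-only) cells from both ends of a row.
function trimEmptyEdges(array $row): array
{
    $row = array_values($row);
    while ($row !== [] && trim($row[0]) === '') {
        array_shift($row);
    }
    while ($row !== [] && trim(end($row)) === '') {
        array_pop($row);
    }
    return $row;
}

// Hypothetical helper: map row index => actual cell count for rows that differ from $expected.
function findIrregularRows(array $rows, int $expected): array
{
    $irregular = [];
    foreach ($rows as $index => $row) {
        $count = count(trimEmptyEdges($row));
        if ($count !== $expected) {
            $irregular[$index] = $count;
        }
    }
    return $irregular;
}
```

Calling `print_r(findIrregularRows($slines, 15));` would list only the suspicious rows, such as the split ones at array[1] and array[2], instead of the whole dump.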
 
Ray Paseur commented:
Just a thought... Are you running PHP on Windows?  If so, you may be able to use COM to let Word itself read the document.
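
A hedged sketch of what the COM route might look like. It assumes Windows, PHP's `com_dotnet` extension, and an installed copy of Microsoft Word, none of which the thread confirms. The helper names are hypothetical, and the table is assumed to be the first one in the document with no merged cells.

```php
<?php
// Word ends each cell's text with a paragraph mark and an end-of-cell mark ("\r\x07").
function cleanCellText(string $text): string
{
    return trim(rtrim($text, "\r\x07"));
}

// Hypothetical helper: read the first table of a document through Word's COM automation.
// Assumes Windows, the com_dotnet extension, and an installed copy of Word.
function readFirstTableViaCom(string $absolutePath): array
{
    $word = new COM('Word.Application');
    $word->Visible = false;
    $rows = [];
    try {
        $doc = $word->Documents->Open($absolutePath);
        $table = $doc->Tables->Item(1);
        $rowCount = $table->Rows->Count;
        $colCount = $table->Columns->Count;
        for ($r = 1; $r <= $rowCount; $r++) {
            $cells = [];
            for ($c = 1; $c <= $colCount; $c++) {
                // Cell() fails for cells swallowed by a merge; a real script would need to handle that.
                $cells[] = cleanCellText((string) $table->Cell($r, $c)->Range->Text);
            }
            $rows[] = $cells;
        }
        $doc->Close(false);
    } finally {
        // Always shut Word down so no hidden instance is left running.
        $word->Quit();
    }
    return $rows;
}
```

Because Word does the parsing, each row should come back with one entry per column, but this only works on a machine where Word can be automated.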
 
Tim (Senior PHP Developer, author) commented:
Thanks, all, for the help.
 
Ray Paseur commented:
We need an explanation of the bad grade.  Please see the grading guidelines here:
http://support.experts-exchange.com/customer/portal/articles/481419
 
skullnobrains commented:
I don't care about the grade, but feel free to post information about what you ended up with.
 
Tim (Senior PHP Developer, author) commented:
Sorry about the grade. I was not sure what the grade meant and just clicked. Anyway, thanks to both of you.
I gave up on this approach after hours of trying; it does not look like a good idea.
 
Tim (Senior PHP Developer, author) commented:
If anyone can change the grade, please do.
Question has a verified solution.