Solved

How to use PHP to read table data from a Word (.doc) file

Posted on 2013-12-13
Last Modified: 2014-04-06
I need to import data from a Word document using PHP.
The data is in a table inside the Word (.doc) file.
Does anyone know how?

Thanks
Question by:Tim
 
LVL 53

Expert Comment

by:COBOLdinosaur
ID: 39718892
If the Word document is saved as HTML, a fair amount of parsing might make sense of it. In general, though, a Word document is so loaded with proprietary codes, controls, and formatting that it is not suitable for use by anything outside of Office, and sometimes even other Office components have compatibility problems with it.

Cd&
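
For the "saved as HTML" route mentioned above, a minimal sketch might look like the following. It assumes the document was exported from Word as HTML and that the data sits in ordinary tr/td elements; extractTableRows() is a hypothetical helper name, not part of any library. Word's HTML export is often messy and may use a Windows code page, so treat this as a starting point rather than a finished parser.

```php
<?php
// Sketch: read table rows from a Word document that was saved as HTML.
// extractTableRows() is a hypothetical helper, not a library function.

function extractTableRows(string $html): array
{
    $dom = new DOMDocument();
    // Word's HTML is rarely valid markup, so collect parser warnings quietly.
    libxml_use_internal_errors(true);
    $dom->loadHTML($html);
    libxml_clear_errors();

    $rows = [];
    foreach ($dom->getElementsByTagName('tr') as $tr) {
        $cells = [];
        foreach ($tr->childNodes as $node) {
            $isCell = $node instanceof DOMElement
                && in_array(strtolower($node->nodeName), ['td', 'th'], true);
            if ($isCell) {
                // Collapse runs of whitespace inside the cell to one space.
                $cells[] = trim(preg_replace('/\s+/', ' ', $node->textContent));
            }
        }
        if ($cells) {
            $rows[] = $cells;
        }
    }
    return $rows;
}
```

Calling extractTableRows(file_get_contents('customers.html')) would then return one array per table row; the file name here is only an example.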
 
LVL 110

Expert Comment

by:Ray Paseur
ID: 39719146
Please post the test data and show us what you want to get for output, thanks. ~Ray
 

Author Comment

by:Tim
ID: 39722276
Comment edited to put the code into the code snippet ~Ray

I use this code to read the Word document.

function parseWord($userDoc) 
{
    // Open in binary mode so the bytes are read unchanged on every platform.
    $fileHandle = fopen($userDoc, "rb");
    $word_text = @fread($fileHandle, filesize($userDoc));
    fclose($fileHandle);
    $line = "";
    $tam = filesize($userDoc);
    $nulos = 0;
    $caracteres = 0;
    for($i=1536; $i<$tam; $i++)
    {
        $line .= $word_text[$i];

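        // Count consecutive NUL bytes. A loose comparison with the integer 0
        // does not test for NUL: older PHP versions treat any non-numeric
        // character as equal to 0, and PHP 8 never matches "\0" this way.
        if( $word_text[$i] === "\0" )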
        {
            $nulos++;
        }
        else
        {
            $nulos=0;
            $caracteres++;
        }

        if( $nulos>1996)
        {   
            break;  
        }
    }

    //echo $caracteres;

    $lines = explode(chr(0x0D),$line);
    //$outtext = "<pre>";

    $outtext = "";
    foreach($lines as $thisline)
    {
        $tam = strlen($thisline);
        if( !$tam )
        {
            continue;
        }

        $new_line = ""; 
        for($i=0; $i<$tam; $i++)
        {
            $onechar = $thisline[$i];
            if( $onechar > chr(240) )
            {
                continue;
            }

            if( $onechar >= chr(0x20) )
            {
                $caracteres++;
                $new_line .= $onechar;
            }

            if( $onechar == chr(0x14) )
            {
                $new_line .= "</a>";
            }

            if( $onechar == chr(0x07) )
            {
                $new_line .= "\t";
                if( isset($thisline[$i+1]) )
                {
                    if( $thisline[$i+1] == chr(0x07) )
                    {
                        $new_line .= "\n";
                    }
                }
            }
        }
        // replace field codes with a hyperlink
        $new_line = str_replace("HYPERLINK" ,"<a href=",$new_line); 
        $new_line = str_replace("\o" ,">",$new_line); 
        $new_line .= "\n";

        // image links
        $new_line = str_replace("INCLUDEPICTURE" ,"<br><img src=",$new_line); 
        $new_line = str_replace("\*" ,"><br>",$new_line); 
        $new_line = str_replace("MERGEFORMATINET" ,"",$new_line); 


        $outtext .= nl2br($new_line);
    }

 return $outtext;
} 

$userDoc = "customers.doc";
$text = parseWord($userDoc);

I also use this code to build the data array:
$olines = explode("<br />", $text);

$slines = array();
foreach ($olines as $orow) {
    // Cells are separated by tabs; keep only lines with more than 5 cells.
    $ovalue = explode("\t", $orow);
    if (count($ovalue) > 5) {
        $slines[] = $ovalue;
    }
}
print_r($slines);

For now, it can show the data, but the result is not good enough.
Can anyone help me?

Thanks
 
LVL 110

Expert Comment

by:Ray Paseur
ID: 39722914
Thanks for posting that code.  I don't know if it works or can be made to work because I do not have any test data and I do not know exactly what output you want to get from your test data.  Without the test data, we would just be wasting your time by guessing.  Please post the test data and show us what you want to get for output.  This article explains why we want test data.
http://www.experts-exchange.com/Web_Development/Web_Languages-Standards/PHP/A_7830-A-Quick-Tour-of-Test-Driven-Development.html

Thanks, ~Ray
 

Author Comment

by:Tim
ID: 39724086
Please check the attachment; it is the data file.

I would like to get the data as an array,
but right now the array does not come out correctly.

Array
(
    [0] => Array
        (
            [0] => x €é(¼ê ìÞÞx ÞÞÞÞÞ$$x ÞÞÞy$ÞÞÞÞé(ÞÞÞÞÞÞÞÞÞr ’</a>:Platform
            [1] => 21Y-68
            [2] => Ceiling
            [3] => Plaster
            [4] => 1500sf
            [5] => 0
            [6] => 0
            [7] => PC1.2% Chrysotile
            [8] => No
            [9] => Yes
            [10] => C
            [11] => YUS-S03-AS02
            [12] => Not identified
            [13] => See Photograph #1
            [14] => Condition to be reassessed annually.
            [15] =>
        )

    [1] => Array
        (
            [0] =>

            [1] => Platform
            [2] => 21Y-69
            [3] => Ceiling
            [4] => Plaster
            [5] => 1500sf
            [6] => 0
            [7] => 0
            [8] =>
        )

    [2] => Array
        (
            [0] =>

            [1] => No
            [2] => Yes
            [3] => C
            [4] => Visually similar to YUS-S03-S02
            [5] => Not identified
            [6] =>
        )

    [3] => Array
        (
            [0] =>

            [1] => Platform
            [2] => 21Y-73
            [3] => Ceiling
            [4] => Plaster
            [5] => Xx
            [6] => 0
            [7] => 0
            [8] =>
        )

    [4] => Array
        (
            [0] =>

            [1] => No
            [2] => Yes
            [3] => C
            [4] => Visually similar to YUS-S03-S02
            [5] => Not identified
            [6] =>
        )

    [5] => Array
        (
            [0] =>

            [1] => Platform
            [2] => 21Y-74
            [3] => Ceiling
            [4] => Plaster
            [5] => Xx
            [6] => 0
            [7] => 0
            [8] =>
        )

    [6] => Array
        (
            [0] =>

            [1] => No
            [2] => Yes
            [3] => C
            [4] => Visually similar to YUS-S03-S02
            [5] => Not identified
            [6] =>
        )

    [7] => Array
        (
            [0] =>

            [1] => Platform
            [2] => 21Y-75
            [3] => Ceiling
            [4] => Plaster
            [5] => Xx
            [6] => 0
            [7] => 0
            [8] =>
        )

    [8] => Array
        (
            [0] =>

            [1] => No
            [2] => Yes
            [3] => C
            [4] => Visually similar to YUS-S03-S02
            [5] => Not identified
            [6] =>
        )

    [9] => Array
        (
            [0] =>

            [1] => Concourse
            [2] => Chelsea NY
            [3] => Ceiling
            [4] => Acoustic Ceiling Tile
            [5] => NQ
            [6] => 0
            [7] => 0
            [8] => SACM
            [9] => Yes
            [10] => Yes
            [11] => C
            [12] => Not sampled due to entry restrictions
            [13] =>
        )

    [10] => Array
        (
            [0] =>

            [1] => Concourse
            [2] => Lotto Centre
            [3] => Ceiling
            [4] => Acoustic Ceiling Tile
            [5] => NQ
            [6] => 0
            [7] => 0
            [8] => SACM
            [9] => Yes
            [10] => Yes
            [11] => C
            [12] => Not sampled due to entry restrictions
            [13] =>
        )

    [11] => Array
        (
            [0] =>

            [1] => Concourse
            [2] => Cinnabon
            [3] => Ceiling
            [4] => Acoustic Ceililng Tile
            [5] => 100sf
            [6] => 0
            [7] => 0
            [8] => SACM
            [9] => Yes
            [10] => Yes
            [11] => C
            [12] => Not sampled due to entry restrictions
            [13] =>
        )

    [12] => Array
        (
            [0] =>

            [1] => Concourse
            [2] => Gateway News Stand
            [3] => Ceiling
            [4] => Acoustic Ceiling Tile
            [5] => NQ
            [6] => 0
            [7] => 0
            [8] => SACM
            [9] => Yes
            [10] => Yes
            [11] => C
            [12] => Not sampled due to entry restrictions
            [13] =>
        )

    [13] => Array
        (
            [0] =>

            [1] => Concourse
            [2] => Rainbow ‘n’ Things
            [3] => Ceiling
            [4] => Acoustic Ceiling Tile
            [5] => NQ
            [6] => 0
            [7] => 0
            [8] => SACM
            [9] => Yes
            [10] => Yes
            [11] => C
            [12] => Not sampled due to entry restrictions
            [13] =>
        )

    [14] => Array
        (
            [0] =>

            [1] => Concourse
            [2] => 51P-330 Elevator Machine Rm.
            [3] => Piping
            [4] => TransiteTM
            [5] => xx
            [6] => 0
            [7] => 0
            [8] => SACM
            [9] => No
            [10] => Yes
            [11] => B
            [12] => NS, SACM
            [13] => Not identified
            [14] =>
        )

    [15] => Array
        (
            [0] =>

            [1] => Platform
            [2] => Line Southbound Platform
            [3] => Ceiling above luxalon
            [4] => Sprayed fire proofing
            [5] => 1500sf
            [6] => 0
            [7] => 0
            [8] => PC 4.8% Chrysotile
            [9] => Yes
            [10] => Yes
            [11] => C
            [12] => Not sampled
            [13] => Sampled. Sample # 1759489-008. Asbestos content PC 4.8% Chrysotile
            [14] => Coffey did not resample, but presence was confirmed by visual inspection.
        )

    [16] => Array
        (
            [0] =>

            [1] => Platform
            [2] =>  Line Southbound Platform
            [3] => Ceiling above luxalon
            [4] => Sprayed fire proofing
            [5] => 1500sf
            [6] => 0
            [7] => 0
            [8] => PC 5.1% Chrysotile
            [9] => Yes
            [10] => Yes
            [11] => C
            [12] => Not sampled
            [13] => Sampled.  Sample # 1759495-014. Asbestos content PC 5.1% Chrysotile
            [14] => Coffey did not resample, but presence was confirmed by visual inspection.
        )

    [17] => Array
        (
            [0] =>

            [1] => Level
            [2] => Area/Room
            [3] => System Component
            [4] => Component Material
            [5] => Condition (Estimated Quantity)***
            [6] => Asbestos Content
            [7] => Friable?
            [8] => Visible?
            [9] => Access.
            [10] => Coffey's Sample Number**
            [11] => Pinchin Report Findings
            [12] => Comments/ Notes
            [13] => Recommendations
            [14] =>
        )

)
customers.doc
 
LVL 110

Expert Comment

by:Ray Paseur
ID: 39724217
OK, I can read the document.
http://www.laprbass.com/RAY_temp_zcfyhome.php

Now can you please show me what you want to extract from the document?  Thanks, ~Ray
 

Author Comment

by:Tim
ID: 39724261
The array has some issues.
array[0][0] contains " x €é(¼ê ìÞÞx ÞÞÞÞÞ$$x ÞÞÞy$ÞÞÞÞé(ÞÞÞÞÞÞÞÞÞr ’</a>:" before the actual value,
and I am not sure why.

array[1] and array[2] look like one row that was broken into two arrays.

Also, the array does not represent the table data well:
the table has 15 columns, but the array rows do not.

Because I have a lot of data,
it is hard to check manually.

Could you help?
 
LVL 27

Accepted Solution

by:
skullnobrains earned 250 total points
ID: 39726635
parsing word docs from php is prone to fail when the file is saved with a different word version or when the user copy-pastes formatted text

can you ask the users to use a different format?

did you try running antiword on the source file to convert it to text, and parsing that afterwards? it seems likely to produce more stable results
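
The antiword suggestion above could be sketched roughly as follows. This assumes the antiword binary is installed and on the PATH, and that it prints each table row as a line of cells delimited by "|"; that output shape should be checked against the real customers.doc before relying on it. docToText() and parsePipeRows() are hypothetical helper names.

```php
<?php
// Sketch: convert a .doc file to text with the antiword CLI, then split
// the table rows. Both function names are hypothetical helpers.

function docToText(string $path): ?string
{
    // Assumes `antiword` is installed and on the PATH.
    // Per the antiword manual, a width of 0 puts a whole paragraph on one line.
    $out = shell_exec('antiword -w 0 ' . escapeshellarg($path));
    return is_string($out) ? $out : null;
}

function parsePipeRows(string $text): array
{
    $rows = [];
    foreach (preg_split('/\R/', $text) as $line) {
        $line = trim($line);
        // Assumption: a table row starts and ends with "|".
        if (strlen($line) < 2 || $line[0] !== '|' || substr($line, -1) !== '|') {
            continue;
        }
        // Drop the outer delimiters, then split the remaining cells.
        $cells = explode('|', substr($line, 1, -1));
        $rows[] = array_map('trim', $cells);
    }
    return $rows;
}
```

A call such as parsePipeRows(docToText('customers.doc') ?? '') would return one array per row. Cells whose own text contains "|" would be split incorrectly, which is a known limitation of this sketch.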
 
LVL 110

Assisted Solution

by:Ray Paseur
Ray Paseur earned 250 total points
ID: 39734769
Just a thought... Are you running PHP on Windows?  If so, you may be able to use COM
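
A rough sketch of the COM idea, under several assumptions: PHP runs on Windows, Microsoft Word is installed on the same machine, and the com_dotnet extension is enabled. readFirstTable() and cleanCellText() are hypothetical helper names; the Word object model members used (Documents.Open, Tables.Item, Table.Cell, Range.Text) belong to Word's automation interface, not to PHP.

```php
<?php
// Sketch: read the first table of a .doc through Word's COM automation.
// Requires Windows, an installed Microsoft Word, and PHP's com_dotnet extension.

function cleanCellText(string $text): string
{
    // Word ends a cell's text with a paragraph mark (0x0D) and a cell mark (0x07).
    return trim(str_replace(["\r", "\x07"], ["\n", ''], $text));
}

function readFirstTable(string $absolutePath): array
{
    $word = new COM('Word.Application');
    $rows = [];
    try {
        $word->Visible = false;
        $doc = $word->Documents->Open($absolutePath);
        $table = $doc->Tables->Item(1);
        $rowCount = $table->Rows->Count;
        $colCount = $table->Columns->Count;
        for ($r = 1; $r <= $rowCount; $r++) {
            $row = [];
            for ($c = 1; $c <= $colCount; $c++) {
                try {
                    $row[] = cleanCellText((string) $table->Cell($r, $c)->Range->Text);
                } catch (com_exception $e) {
                    // Merged cells can make a (row, column) position invalid.
                    $row[] = null;
                }
            }
            $rows[] = $row;
        }
        $doc->Close(false); // close without saving changes
    } finally {
        $word->Quit();
    }
    return $rows;
}
```

Because Word itself parses the file, this avoids guessing at byte offsets in the binary format, at the cost of tying the script to a Windows machine with Office installed.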
 

Author Closing Comment

by:Tim
ID: 39976478
Thanks, all, for the help.
 
LVL 110

Expert Comment

by:Ray Paseur
ID: 39976503
We need an explanation of the bad grade.  Please see the grading guidelines here:
http://support.experts-exchange.com/customer/portal/articles/481419
 
LVL 27

Expert Comment

by:skullnobrains
ID: 39976957
i don't care about the grade, but feel free to post information regarding what you ended up with
 

Author Comment

by:Tim
ID: 39977960
Sorry about the grade. I was not sure what the grade meant and just clicked. Anyway, thanks to both of you.
I gave up on this approach after hours of trying; it does not look like a good idea for this task.
 

Author Comment

by:Tim
ID: 39977964
If anyone can change the grade, please do.