Solved

How to use PHP to read data from a table in a Word (.doc) file

Posted on 2013-12-13
Last Modified: 2014-04-06
I need to import data from a Word document using PHP.
The data is in a table inside a Word (.doc) file.
Does anyone know how?

Thanks
Question by:Tim
15 Comments
 
LVL 53

Expert Comment

by:COBOLdinosaur
ID: 39718892
If the Word document is saved as HTML, it may be possible to make sense of it with a lot of parsing. In general, though, a Word document is so loaded with proprietary codes, controls, and formatting that it is not suitable for use by anything outside of Office, and sometimes even other Office components have compatibility problems with it.

Cd&
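
If the document can be saved as HTML from Word, the parsing suggested above could look roughly like the following. This is only a minimal sketch: the function name `htmlTableToRows` and the sample markup are illustrative assumptions, and real Word HTML exports are much noisier and often use a non-UTF-8 charset that may need converting first.

```php
<?php
// Sketch: extract table rows from a Word document that was saved as HTML.
// htmlTableToRows() and the sample markup are hypothetical, for illustration only.
function htmlTableToRows(string $html): array
{
    $dom = new DOMDocument();
    // Word's HTML export is rarely valid; collect libxml warnings silently while loading.
    $previous = libxml_use_internal_errors(true);
    $dom->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $rows = [];
    foreach ($dom->getElementsByTagName('tr') as $tr) {
        $cells = [];
        foreach ($tr->childNodes as $node) {
            // Keep only header and data cells.
            if ($node instanceof DOMElement
                && in_array(strtolower($node->nodeName), ['td', 'th'], true)) {
                // Collapse the runs of whitespace that appear inside cells.
                $cells[] = trim(preg_replace('/\s+/u', ' ', $node->textContent));
            }
        }
        if ($cells) {
            $rows[] = $cells;
        }
    }
    return $rows;
}

$sample = '<table><tr><th>Level</th><th>Area/Room</th></tr>'
        . '<tr><td>Platform</td><td> 21Y-68 </td></tr></table>';
print_r(htmlTableToRows($sample));
```

Each table row becomes one array of trimmed cell strings, so a 15-column table should yield 15-element rows as long as the export has no merged cells.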
 
LVL 110

Expert Comment

by:Ray Paseur
ID: 39719146
Please post the test data and show us what you want to get for output, thanks. ~Ray
 

Author Comment

by:Tim
ID: 39722276
Comment edited to put the code into the code snippet ~Ray

I use this code to read the word document.

function parseWord($userDoc) 
{
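    // Open in binary mode (.doc is a binary format) and close the handle after reading.
    $fileHandle = fopen($userDoc, "rb");
    $word_text = @fread($fileHandle, filesize($userDoc));
    fclose($fileHandle);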
    $line = "";
    $tam = filesize($userDoc);
    $nulos = 0;
    $caracteres = 0;
    for($i=1536; $i<$tam; $i++)
    {
        $line .= $word_text[$i];

        // Compare against the NUL byte explicitly; "== 0" does not reliably test for NUL.
        if( $word_text[$i] === "\0" )
        {
            $nulos++;
        }
        else
        {
            $nulos=0;
            $caracteres++;
        }

        if( $nulos>1996)
        {   
            break;  
        }
    }

    //echo $caracteres;

    $lines = explode(chr(0x0D),$line);
    //$outtext = "<pre>";

    $outtext = "";
    foreach($lines as $thisline)
    {
        $tam = strlen($thisline);
        if( !$tam )
        {
            continue;
        }

        $new_line = ""; 
        for($i=0; $i<$tam; $i++)
        {
            $onechar = $thisline[$i];
            if( $onechar > chr(240) )
            {
                continue;
            }

            if( $onechar >= chr(0x20) )
            {
                $caracteres++;
                $new_line .= $onechar;
            }

            if( $onechar == chr(0x14) )
            {
                $new_line .= "</a>";
            }

            if( $onechar == chr(0x07) )
            {
                $new_line .= "\t";
                if( isset($thisline[$i+1]) )
                {
                    if( $thisline[$i+1] == chr(0x07) )
                    {
                        $new_line .= "\n";
                    }
                }
            }
        }
        // replace with hyperlink
        $new_line = str_replace("HYPERLINK" ,"<a href=",$new_line); 
        $new_line = str_replace("\o" ,">",$new_line); 
        $new_line .= "\n";

        // image links
        $new_line = str_replace("INCLUDEPICTURE" ,"<br><img src=",$new_line); 
        $new_line = str_replace("\*" ,"><br>",$new_line); 
        $new_line = str_replace("MERGEFORMATINET" ,"",$new_line); 


        $outtext .= nl2br($new_line);
    }

 return $outtext;
} 

$userDoc = "customers.doc";
$text = parseWord($userDoc);


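I also use this code to build the data array:
$olines = explode("<br />", $text);

$slines = array();
foreach ($olines as $orow) {
    // parseWord() separated the cells with tabs.
    $ovalue = explode("\t", $orow);
    // Keep only lines with more than five cells, i.e. likely table rows.
    if (count($ovalue) > 5) {
        $slines[] = $ovalue;
    }
}
print_r($slines);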


For now it shows the data, but the result is not good enough.
Can anyone help me?

Thanks
 
LVL 110

Expert Comment

by:Ray Paseur
ID: 39722914
Thanks for posting that code.  I don't know if it works or can be made to work because I do not have any test data and I do not know exactly what output you want to get from your test data.  Without the test data, we would just be wasting your time by guessing.  Please post the test data and show us what you want to get for output.  This article explains why we want test data.
http://www.experts-exchange.com/Web_Development/Web_Languages-Standards/PHP/A_7830-A-Quick-Tour-of-Test-Driven-Development.html

Thanks, ~Ray
 

Author Comment

by:Tim
ID: 39724086
Please check the attachment; it is the data file.

I would like to get the data as an array,
but right now the array does not come out correctly.

Array
(
    [0] => Array
        (
            [0] => x €é(¼ê ìÞÞx ÞÞÞÞÞ$$x ÞÞÞy$ÞÞÞÞé(ÞÞÞÞÞÞÞÞÞr ’</a>:Platform
            [1] => 21Y-68
            [2] => Ceiling
            [3] => Plaster
            [4] => 1500sf
            [5] => 0
            [6] => 0
            [7] => PC1.2% Chrysotile
            [8] => No
            [9] => Yes
            [10] => C
            [11] => YUS-S03-AS02
            [12] => Not identified
            [13] => See Photograph #1
            [14] => Condition to be reassessed annually.
            [15] =>
        )

    [1] => Array
        (
            [0] =>

            [1] => Platform
            [2] => 21Y-69
            [3] => Ceiling
            [4] => Plaster
            [5] => 1500sf
            [6] => 0
            [7] => 0
            [8] =>
        )

    [2] => Array
        (
            [0] =>

            [1] => No
            [2] => Yes
            [3] => C
            [4] => Visually similar to YUS-S03-S02
            [5] => Not identified
            [6] =>
        )

    [3] => Array
        (
            [0] =>

            [1] => Platform
            [2] => 21Y-73
            [3] => Ceiling
            [4] => Plaster
            [5] => Xx
            [6] => 0
            [7] => 0
            [8] =>
        )

    [4] => Array
        (
            [0] =>

            [1] => No
            [2] => Yes
            [3] => C
            [4] => Visually similar to YUS-S03-S02
            [5] => Not identified
            [6] =>
        )

    [5] => Array
        (
            [0] =>

            [1] => Platform
            [2] => 21Y-74
            [3] => Ceiling
            [4] => Plaster
            [5] => Xx
            [6] => 0
            [7] => 0
            [8] =>
        )

    [6] => Array
        (
            [0] =>

            [1] => No
            [2] => Yes
            [3] => C
            [4] => Visually similar to YUS-S03-S02
            [5] => Not identified
            [6] =>
        )

    [7] => Array
        (
            [0] =>

            [1] => Platform
            [2] => 21Y-75
            [3] => Ceiling
            [4] => Plaster
            [5] => Xx
            [6] => 0
            [7] => 0
            [8] =>
        )

    [8] => Array
        (
            [0] =>

            [1] => No
            [2] => Yes
            [3] => C
            [4] => Visually similar to YUS-S03-S02
            [5] => Not identified
            [6] =>
        )

    [9] => Array
        (
            [0] =>

            [1] => Concourse
            [2] => Chelsea NY
            [3] => Ceiling
            [4] => Acoustic Ceiling Tile
            [5] => NQ
            [6] => 0
            [7] => 0
            [8] => SACM
            [9] => Yes
            [10] => Yes
            [11] => C
            [12] => Not sampled due to entry restrictions
            [13] =>
        )

    [10] => Array
        (
            [0] =>

            [1] => Concourse
            [2] => Lotto Centre
            [3] => Ceiling
            [4] => Acoustic Ceiling Tile
            [5] => NQ
            [6] => 0
            [7] => 0
            [8] => SACM
            [9] => Yes
            [10] => Yes
            [11] => C
            [12] => Not sampled due to entry restrictions
            [13] =>
        )

    [11] => Array
        (
            [0] =>

            [1] => Concourse
            [2] => Cinnabon
            [3] => Ceiling
            [4] => Acoustic Ceililng Tile
            [5] => 100sf
            [6] => 0
            [7] => 0
            [8] => SACM
            [9] => Yes
            [10] => Yes
            [11] => C
            [12] => Not sampled due to entry restrictions
            [13] =>
        )

    [12] => Array
        (
            [0] =>

            [1] => Concourse
            [2] => Gateway News Stand
            [3] => Ceiling
            [4] => Acoustic Ceiling Tile
            [5] => NQ
            [6] => 0
            [7] => 0
            [8] => SACM
            [9] => Yes
            [10] => Yes
            [11] => C
            [12] => Not sampled due to entry restrictions
            [13] =>
        )

    [13] => Array
        (
            [0] =>

            [1] => Concourse
            [2] => Rainbow ‘n’ Things
            [3] => Ceiling
            [4] => Acoustic Ceiling Tile
            [5] => NQ
            [6] => 0
            [7] => 0
            [8] => SACM
            [9] => Yes
            [10] => Yes
            [11] => C
            [12] => Not sampled due to entry restrictions
            [13] =>
        )

    [14] => Array
        (
            [0] =>

            [1] => Concourse
            [2] => 51P-330 Elevator Machine Rm.
            [3] => Piping
            [4] => TransiteTM
            [5] => xx
            [6] => 0
            [7] => 0
            [8] => SACM
            [9] => No
            [10] => Yes
            [11] => B
            [12] => NS, SACM
            [13] => Not identified
            [14] =>
        )

    [15] => Array
        (
            [0] =>

            [1] => Platform
            [2] => Line Southbound Platform
            [3] => Ceiling above luxalon
            [4] => Sprayed fire proofing
            [5] => 1500sf
            [6] => 0
            [7] => 0
            [8] => PC 4.8% Chrysotile
            [9] => Yes
            [10] => Yes
            [11] => C
            [12] => Not sampled
            [13] => Sampled. Sample # 1759489-008. Asbestos content PC 4.8% Chrysotile
            [14] => Coffey did not resample, but presence was confirmed by visual inspection.
        )

    [16] => Array
        (
            [0] =>

            [1] => Platform
            [2] =>  Line Southbound Platform
            [3] => Ceiling above luxalon
            [4] => Sprayed fire proofing
            [5] => 1500sf
            [6] => 0
            [7] => 0
            [8] => PC 5.1% Chrysotile
            [9] => Yes
            [10] => Yes
            [11] => C
            [12] => Not sampled
            [13] => Sampled.  Sample # 1759495-014. Asbestos content PC 5.1% Chrysotile
            [14] => Coffey did not resample, but presence was confirmed by visual inspection.
        )

    [17] => Array
        (
            [0] =>

            [1] => Level
            [2] => Area/Room
            [3] => System Component
            [4] => Component Material
            [5] => Condition (Estimated Quantity)***
            [6] => Asbestos Content
            [7] => Friable?
            [8] => Visible?
            [9] => Access.
            [10] => Coffey's Sample Number**
            [11] => Pinchin Report Findings
            [12] => Comments/ Notes
            [13] => Recommendations
            [14] =>
        )

)
customers.doc
 
LVL 110

Expert Comment

by:Ray Paseur
ID: 39724217
OK, I can read the document.
http://www.laprbass.com/RAY_temp_zcfyhome.php

Now can you please show me what you want to extract from the document?  Thanks, ~Ray
 

Author Comment

by:Tim
ID: 39724261
The array has some issues.
array[0][0] contains " x €é(¼ê ìÞÞx ÞÞÞÞÞ$$x ÞÞÞy$ÞÞÞÞé(ÞÞÞÞÞÞÞÞÞr ’</a>:"
and I am not sure why.

array[1] and array[2] look like one table row that was split into two arrays.

The array also does not represent the table data well:
the table has 15 columns, but the arrays do not.

Because I have a lot of data,
it is hard to check manually.

Could you help?
 
LVL 27

Accepted Solution

by:
skullnobrains earned 250 total points
ID: 39726635
Parsing Word docs from PHP is prone to fail when the file is saved with a different Word version or when the user copy-pastes formatted text.

Can you ask the users to use a different format?

Did you try running antiword on the source file to convert it to text, and then parsing that? It seems likely to produce more stable results.
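
The antiword route could be sketched as below. The assumptions are significant: antiword must be installed and on the PATH, its text output is assumed to write each table row as a line of cells delimited by `|`, and the `-w 0` option is assumed to keep each paragraph on one line (check the man page for your version). The names `docToText` and `parsePipeRows` are hypothetical.

```php
<?php
// Sketch: convert a .doc to text with antiword, then split table lines into cells.
// Assumptions: antiword is installed, and table rows come out as "|"-delimited lines.
function docToText(string $path): string
{
    // -w 0 is intended to stop antiword from wrapping paragraphs (verify locally).
    $output = shell_exec('antiword -w 0 ' . escapeshellarg($path));
    if (!is_string($output)) {
        throw new RuntimeException("antiword produced no output for $path");
    }
    return $output;
}

function parsePipeRows(string $text): array
{
    $rows = [];
    foreach (preg_split('/\R/', $text) as $line) {
        $line = trim($line);
        // Only lines that start and end with "|" are treated as table rows.
        if (strlen($line) < 2 || $line[0] !== '|' || substr($line, -1) !== '|') {
            continue;
        }
        // Drop the outer delimiters, then split the remainder into trimmed cells.
        $cells = explode('|', (string) substr($line, 1, -1));
        $rows[] = array_map('trim', $cells);
    }
    return $rows;
}

// Usage with the file from the question (requires antiword):
// $rows = parsePipeRows(docToText('customers.doc'));
$demo = "|Platform |21Y-68 |Ceiling |\n|Platform |21Y-69 |Ceiling |";
print_r(parsePipeRows($demo));
```

Checking `count($row)` against the expected 15 columns would be a quick way to flag rows that antiword wrapped or split.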
 
LVL 110

Assisted Solution

by:Ray Paseur
Ray Paseur earned 250 total points
ID: 39734769
Just a thought... Are you running PHP on Windows?  If so, you may be able to use COM
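
A rough sketch of the COM idea follows. It assumes PHP on Windows with the com_dotnet extension enabled and Microsoft Word installed; the function names are hypothetical, and merged cells or permission issues would need extra handling that is not shown here.

```php
<?php
// Sketch: read a Word table through COM automation (Windows + Word only).
// readWordTable() and cleanCellText() are illustrative names, not an existing API.

// Word ends each cell's text with a paragraph mark (\r) and a cell mark (\x07).
function cleanCellText(string $text): string
{
    return trim(rtrim($text, "\r\x07"));
}

function readWordTable(string $absolutePath, int $tableIndex = 1): array
{
    $word = new COM('Word.Application');
    $word->Visible = false;
    $rows = [];
    try {
        $doc = $word->Documents->Open($absolutePath);
        $table = $doc->Tables->Item($tableIndex);   // Word collections are 1-based
        for ($r = 1; $r <= $table->Rows->Count; $r++) {
            $row = [];
            for ($c = 1; $c <= $table->Columns->Count; $c++) {
                // Cell() can fail on merged cells; real code should catch that.
                $row[] = cleanCellText((string) $table->Cell($r, $c)->Range->Text);
            }
            $rows[] = $row;
        }
        $doc->Close(false);   // close without saving changes
    } finally {
        $word->Quit();
    }
    return $rows;
}

// Usage on Windows (hypothetical path):
// print_r(readWordTable('C:\\docs\\customers.doc'));
var_dump(cleanCellText("Platform\r\x07"));
```

Because Word itself does the parsing, this approach avoids guessing at the binary layout, at the cost of needing a Windows host with Office.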
 

Author Closing Comment

by:Tim
ID: 39976478
Thanks all for help
 
LVL 110

Expert Comment

by:Ray Paseur
ID: 39976503
We need an explanation of the bad grade.  Please see the grading guidelines here:
http://support.experts-exchange.com/customer/portal/articles/481419
 
LVL 27

Expert Comment

by:skullnobrains
ID: 39976957
i don't care about the grade, but feel free to post information regarding what you ended up with
 

Author Comment

by:Tim
ID: 39977960
Sorry about the grade. I was not sure what the grade meant and just clicked. Anyway, thanks to both of you.
I gave up on this approach after hours of trying; it does not look like a good idea for this.
 

Author Comment

by:Tim
ID: 39977964
If anyone can change the grade, please do.

Question has a verified solution.
