Read a Word file using PHP

Hi,

How can we retrieve all the contents of a Word document, including its style formatting, exactly as they appear, in PHP?

Thanks in advance.
Asked by: Web_Sight

Neuropsykopat commented:
Not if you use MS Word versions before 2007 (the binary .doc format), but you can still read the text inside (see the code below).

It may be possible if you save in .docx format, which is an XML-based standard.

Another option is to export your document to HTML, for example, to retrieve the styles and format attributes before reading it with PHP (http://holloway.co.nz/docvert/index.html).

<?php

/*****************************************************************
Heuristic plain-text extraction from a binary Word 97-2003 (.doc)
file. It does not parse the file format; it relies on the text
usually being stored as 8-bit characters starting at offset 1536:
- collect bytes from offset 1536 until a long run of NUL (chr(0))
- split the collected bytes on carriage returns (chr(13))
- drop control characters, turning table cell marks into tabs
- clean up HYPERLINK / INCLUDEPICTURE field codes into rough HTML
*****************************************************************/

function parseWord($userDoc)
{
    $word_text = file_get_contents($userDoc);
    if ($word_text === false)
    {
        return "";
    }

    $tam = strlen($word_text);
    $line = "";
    $nulos = 0;    // number of consecutive NUL bytes seen

    for ($i = 1536; $i < $tam; $i++)
    {
        $line .= $word_text[$i];

        // Compare with the NUL byte itself: comparing a one-character
        // string with the integer 0 gives wrong results in PHP.
        if ($word_text[$i] === chr(0))
        {
            $nulos++;
        }
        else
        {
            $nulos = 0;
        }

        // A long run of NULs is taken to mean the text has ended.
        if ($nulos > 1996)
        {
            break;
        }
    }

    $lines = explode(chr(0x0D), $line);

    $outtext = "";
    foreach ($lines as $thisline)
    {
        $tam = strlen($thisline);
        if (!$tam)
        {
            continue;
        }

        $new_line = "";
        for ($i = 0; $i < $tam; $i++)
        {
            $onechar = $thisline[$i];

            // Skip very high bytes, which are usually binary noise.
            if ($onechar > chr(240))
            {
                continue;
            }

            // Keep printable characters.
            if ($onechar >= chr(0x20))
            {
                $new_line .= $onechar;
            }

            // 0x14 is the field separator: close the link opened below.
            if ($onechar === chr(0x14))
            {
                $new_line .= "</a>";
            }

            // 0x07 ends a table cell; two in a row also end the row.
            if ($onechar === chr(0x07))
            {
                $new_line .= "\t";
                if (isset($thisline[$i + 1]) && $thisline[$i + 1] === chr(0x07))
                {
                    $new_line .= "\n";
                }
            }
        }

        // Replace hyperlink field codes with an HTML link
        $new_line = str_replace("HYPERLINK", "<a href=", $new_line);
        $new_line = str_replace('\o', ">", $new_line);
        $new_line .= "\n";

        // Replace image field codes with an HTML image tag
        $new_line = str_replace("INCLUDEPICTURE", "<br><img src=", $new_line);
        $new_line = str_replace('\*', "><br>", $new_line);
        $new_line = str_replace("MERGEFORMATINET", "", $new_line);

        $outtext .= nl2br($new_line);
    }

    return $outtext;
}

$userDoc = "Cultura.doc";
$text = parseWord($userDoc);

echo $text;

?>
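For the .docx route mentioned in this answer, here is a minimal sketch (not from the original thread) that reads the text of a .docx file. A .docx is a ZIP archive whose main body is the XML part word/document.xml, so PHP's ZipArchive and DOM classes are enough. It assumes the zip and dom extensions are enabled, and it returns only plain text, one line per paragraph; formatting lives in the w:pPr/w:rPr elements and in word/styles.xml, which this sketch ignores. The function name docxToText is hypothetical.

```php
<?php
// Sketch: extract plain text from a .docx (Office Open XML) file.
// Assumes the zip and dom extensions; docxToText is a hypothetical name.

function docxToText(string $path): string
{
    $zip = new ZipArchive();
    if ($zip->open($path) !== true) {
        throw new RuntimeException("Cannot open $path as a ZIP archive");
    }
    // The document body is stored in this part of the package.
    $xml = $zip->getFromName('word/document.xml');
    $zip->close();
    if ($xml === false) {
        throw new RuntimeException("$path has no word/document.xml");
    }

    $dom = new DOMDocument();
    $dom->loadXML($xml);
    $xpath = new DOMXPath($dom);
    $xpath->registerNamespace('w', 'http://schemas.openxmlformats.org/wordprocessingml/2006/main');

    $paragraphs = [];
    // Each w:p is a paragraph; its visible text is split across w:t runs.
    foreach ($xpath->query('//w:body//w:p') as $p) {
        $text = '';
        foreach ($xpath->query('.//w:t', $p) as $t) {
            $text .= $t->textContent;
        }
        $paragraphs[] = $text;
    }
    return implode("\n", $paragraphs);
}
```

Usage is simply echo docxToText("Cultura.docx"); for a file saved from Word 2007 or later.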
Mohamed Abowarda (Software Engineer) commented:
Reading from a Word Document with COM in PHP:
http://drewd.com/2007/01/25/reading-from-a-word-document-with-com-in-php

Ray Paseur commented:
Please describe what you mean by, "retrieve all the contents."  An example or two would be helpful.  Do you want to preserve the Word file or change it?  Have you tried using .doc files or .docx files?  Is there any test case showing the inputs and outputs that you want?  With that we might be able to provide more concrete help.
Web_Sight (author) commented:
Hi,

Our requirement is this: we will have some standard template Word documents of about 30-40 pages, with blank spaces on some pages where names, designations, roles, etc. need to be filled in dynamically from details submitted through a form. After the data has been filled in, the user should be able to save the file explicitly.

We need to know if a solution to do this is available using PHP.

We are aware of similar solutions using PHP with PDF, but we are not sure whether one exists for Word.

Thanks
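One possible approach to this template requirement, sketched here under stated assumptions and not taken from the thread: save the templates as .docx, type placeholders such as {NAME} into them, and replace those placeholders inside word/document.xml with PHP's ZipArchive. Only the text inside the XML changes, so the template's styles and layout are kept. The function name fillDocxTemplate and the {PLACEHOLDER} convention are hypothetical. A known caveat: Word sometimes splits a typed placeholder across several runs (for example after spell-checking or a formatting change), and then a plain string replace will not find it.

```php
<?php
// Sketch: fill {PLACEHOLDER} markers in a .docx template.
// Hypothetical names; assumes each placeholder sits in a single w:t run.
// Only word/document.xml is edited, so placeholders in headers or
// footers (word/header1.xml, etc.) are not replaced.

function fillDocxTemplate(string $template, string $output, array $values): void
{
    // Work on a copy so the template itself stays untouched.
    if (!copy($template, $output)) {
        throw new RuntimeException("Cannot copy $template to $output");
    }

    $zip = new ZipArchive();
    if ($zip->open($output) !== true) {
        throw new RuntimeException("Cannot open $output as a ZIP archive");
    }
    $xml = $zip->getFromName('word/document.xml');
    if ($xml === false) {
        $zip->close();
        throw new RuntimeException("$output has no word/document.xml");
    }

    foreach ($values as $placeholder => $value) {
        // Escape form input so characters such as & or < cannot break the XML.
        $safe = htmlspecialchars((string) $value, ENT_XML1 | ENT_QUOTES, 'UTF-8');
        $xml = str_replace('{' . $placeholder . '}', $safe, $xml);
    }

    // Replace the body part; every other part of the package is left as-is.
    $zip->addFromString('word/document.xml', $xml);
    $zip->close();
}
```

To let the user save the result, the script can then send the file with a Content-Type of application/vnd.openxmlformats-officedocument.wordprocessingml.document, a Content-Disposition: attachment header, and readfile().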

Ray Paseur commented:
Can you create a small test case, please? It is impossible for us to guess what you might have in 30-40 pages of a hypothetical Word document.

Please show us the HTML form and the Word document so we can see the relationships you want to establish between the HTML form input controls and the contents of the Word document. All we really need to see is the HTML form and the two Word documents, showing the "before" and "after" conditions. From that we may be able to help you find the design patterns needed to implement a solution.

Thanks, ~Ray
Web_Sight (author) commented:
Partial solution
Ray Paseur commented:
@Web_Sight: I think the reason you got a "partial solution" is that you asked a "partial question." There is a reason why I wrote, "Can you create a small test case, please? It is impossible for us to guess what you might have in 30-40 pages of a hypothetical Word document."

The reason for that request (which you chose to ignore) is simply this:  You can get good solutions if you can show us what your input looks like and show us what you want for output.  We are experts, but not mind readers.  Inquiries that are broad, vague and hypothetical may not get answers that are as succinct and effective as inquiries that have actual URLs and clearly expressed questions.  

If you want us to be able to share working code, we need you to show us where you have put your test data. If you have no test data, please create some. We do not want you to post "live" passwords and such. Instead, please set up a test bed and show us the links to it rather than to the live data.

If you do those things you will have no problem at all getting very high quality solutions from the Experts here at EE.  Good luck with your project, ~Ray