Read Word file using PHP

Hi,

How can we retrieve all the contents of a Word document, including its exact style formatting, in PHP?

Thanks in advance.
Web_Sight Asked:
Neuropsykopat Commented:
Not if you use MS Word earlier than 2007, but you can still read the text inside (see below).

It may be possible if you save in .docx format, which is a ZIP package of XML files.

Another option is to export your .doc to HTML, for example, so that you can retrieve the styles and formatting attributes before reading it with PHP (http://holloway.co.nz/docvert/index.html).

<?php

/*****************************************************************
This approach reads the raw bytes of a binary .doc file and keeps
the printable text:
- skip the first 1536 bytes of the file
- stop after a long run of consecutive NUL bytes (chr(0))
- split what was read into lines on carriage returns (chr(13))
- drop control characters, turning chr(7) into tabs
- turn HYPERLINK and INCLUDEPICTURE fields into HTML tags
*****************************************************************/

function parseWord($userDoc)
{
    // Read the whole file as raw bytes; bail out if it cannot be read.
    $word_text = @file_get_contents($userDoc);
    if( $word_text === false )
    {
        return "";
    }
    $line = "";
    $tam = filesize($userDoc);
    $nulos = 0;
    $caracteres = 0;
    for($i=1536; $i<$tam; $i++)
    {
        $line .= $word_text[$i];

        // Compare against the NUL byte itself; "== 0" compares a string to an integer.
        if( $word_text[$i] === "\0" )
        {
            $nulos++;
        }
        else
        {
            $nulos=0;
            $caracteres++;
        }

        if( $nulos>1996)
        {  
            break;  
        }
    }

    //echo $caracteres;

    $lines = explode(chr(0x0D),$line);
    //$outtext = "<pre>";

    $outtext = "";
    foreach($lines as $thisline)
    {
        $tam = strlen($thisline);
        if( !$tam )
        {
            continue;
        }

        $new_line = "";
        for($i=0; $i<$tam; $i++)
        {
            $onechar = $thisline[$i];
            if( $onechar > chr(240) )
            {
                continue;
            }

            if( $onechar >= chr(0x20) )
            {
                $caracteres++;
                $new_line .= $onechar;
            }

            if( $onechar == chr(0x14) )
            {
                $new_line .= "</a>";
            }

            if( $onechar == chr(0x07) )
            {
                $new_line .= "\t";
                if( isset($thisline[$i+1]) )
                {
                    if( $thisline[$i+1] == chr(0x07) )
                    {
                        $new_line .= "\n";
                    }
                }
            }
        }
        // Replace HYPERLINK fields with anchor tags
        $new_line = str_replace('HYPERLINK', '<a href=', $new_line);
        $new_line = str_replace('\o', '>', $new_line);
        $new_line .= "\n";

        // Image links
        $new_line = str_replace('INCLUDEPICTURE', '<br><img src=', $new_line);
        $new_line = str_replace('\*', '><br>', $new_line);
        $new_line = str_replace('MERGEFORMATINET', '', $new_line);


        $outtext .= nl2br($new_line);
    }

    return $outtext;
}

// Path to the .doc file to read
$userDoc = "Cultura.doc";
$text = parseWord($userDoc);

echo $text;


?>
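
For the .docx route mentioned above, here is a minimal sketch of how the text might be read. It assumes PHP's ZipArchive and DOM extensions are enabled; `readDocxText` is a hypothetical helper name, not part of any library. It only collects paragraph text from `word/document.xml` and does not interpret style information.

```php
<?php
// Hypothetical helper: extract plain paragraph text from a .docx file.
// Assumes the ZipArchive and DOM extensions are available.
function readDocxText(string $path): string
{
    $zip = new ZipArchive();
    if ($zip->open($path) !== true) {
        return "";
    }
    // The main body of a .docx package is stored in word/document.xml.
    $xml = $zip->getFromName("word/document.xml");
    $zip->close();
    if ($xml === false) {
        return "";
    }

    $dom = new DOMDocument();
    if (!@$dom->loadXML($xml)) {
        return "";
    }

    // WordprocessingML namespace used by the w: prefix.
    $ns = "http://schemas.openxmlformats.org/wordprocessingml/2006/main";
    $paragraphs = [];
    // Each w:p element is a paragraph; its w:t descendants hold the text runs.
    foreach ($dom->getElementsByTagNameNS($ns, "p") as $p) {
        $text = "";
        foreach ($p->getElementsByTagNameNS($ns, "t") as $t) {
            $text .= $t->textContent;
        }
        $paragraphs[] = $text;
    }
    return implode("\n", $paragraphs);
}
```

Calling `readDocxText("Cultura.docx")` would return one line per paragraph, or an empty string if the file cannot be opened. Formatting is stored separately from the text (in the run and paragraph properties and in `word/styles.xml`), so preserving styles would need additional parsing.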
 
Mohamed Abowarda (Software Engineer) Commented:
Reading from a Word Document with COM in PHP:
http://drewd.com/2007/01/25/reading-from-a-word-document-with-com-in-php

 
Ray Paseur Commented:
Please describe what you mean by, "retrieve all the contents."  An example or two would be helpful.  Do you want to preserve the Word file or change it?  Have you tried using .doc files or .docx files?  Is there any test case showing the inputs and outputs that you want?  With that we might be able to provide more concrete help.
 
Web_Sight (Author) Commented:
Hi,

As per our requirement, we will have some standard Word template documents of about 30-40 pages, with blank spaces on some pages where names, designations, roles, etc. need to be filled in dynamically from details submitted through a form. After the data has been populated in the doc file, the user should be able to save the file explicitly.

We need to know if a solution to do this is available using PHP.

We are aware of similar solutions using PDF/PHP, not sure if a solution exists for Word.

Thanks
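
One way the template requirement described above could be approached is sketched below. It assumes the templates are saved as .docx rather than .doc, that each blank is marked with a placeholder token such as `${name}` (a hypothetical convention), and that the ZipArchive extension is enabled. `fillDocxTemplate` is a hypothetical helper name.

```php
<?php
// Hypothetical helper: copy a .docx template and replace placeholder tokens.
// $values maps placeholder text (for example '${name}') to the value to insert.
function fillDocxTemplate(string $template, string $output, array $values): bool
{
    // Work on a copy so the original template stays untouched.
    if (!copy($template, $output)) {
        return false;
    }
    $zip = new ZipArchive();
    if ($zip->open($output) !== true) {
        return false;
    }
    $xml = $zip->getFromName("word/document.xml");
    if ($xml === false) {
        $zip->close();
        return false;
    }
    foreach ($values as $placeholder => $value) {
        // Escape the value so characters such as & and < keep the XML valid.
        $safe = htmlspecialchars((string) $value, ENT_XML1 | ENT_QUOTES, "UTF-8");
        $xml = str_replace($placeholder, $safe, $xml);
    }
    // Writing an entry under an existing name replaces that entry.
    $zip->addFromString("word/document.xml", $xml);
    return $zip->close();
}
```

A call such as `fillDocxTemplate("template.docx", "filled.docx", ['${name}' => "Jane Doe"])` would write a filled copy that can then be offered for download. One caveat: Word sometimes splits typed text across several runs, so a placeholder may not appear as one contiguous string in the XML. Libraries such as PHPWord provide a template processor that handles this kind of substitution.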

 
 
Ray Paseur Commented:
Can you create a small test case, please?  It is impossible for us to guess what you might have in 30-40 pages of a hypothetical word document.  

Please show us the HTML form and the word document so we can see the relationships you want to establish between the HTML form input controls and the contents of the word document.  All we really need to see would be the HTML form, and the two word documents, showing "before" and "after" conditions.  From that we may be able to help you find the design patterns that would be needed to implement a solution.

Thanks, ~Ray
 
Web_Sight (Author) Commented:
Partial solution
 
Ray Paseur Commented:
@Web_Sight:  I think the reason you got a "partial solution" is because you asked a "partial question."  There is a reason why I wrote, Can you create a small test case, please?  It is impossible for us to guess what you might have in 30-40 pages of a hypothetical word document.

The reason for that request (which you chose to ignore) is simply this:  You can get good solutions if you can show us what your input looks like and show us what you want for output.  We are experts, but not mind readers.  Inquiries that are broad, vague and hypothetical may not get answers that are as succinct and effective as inquiries that have actual URLs and clearly expressed questions.  

If you want us to be able to share working code, we need you to show us where you have put your test data.  If you have no test data, please create some.  We do not want you to post "live" passwords and such.  Instead, please set up a testbed and show us the links to that test bed, instead of the live data.

If you do those things you will have no problem at all getting very high quality solutions from the Experts here at EE.  Good luck with your project, ~Ray
Question has a verified solution.
