Convert Word Document to XML using PHP script

Posted on 2003-03-18
Medium Priority
Last Modified: 2011-10-03
I want to convert a Word document to XML using an online PHP script, so that users can submit a Word document and my script will process it and store it in a database. An XML dataset would be created, and PHP would filter out the standard headers etc. and look for the information that the user filled out.

If anyone knows of an existing script that does this, or can help me program something like it, that would be awesome. I've been looking everywhere for help on this subject, but have not been successful in finding an application that does exactly what I need (no PHP script anyway, just Java apps etc.).
Question by:-Darkness-

Accepted Solution

bobsledbob earned 1000 total points
ID: 8164593

This is probably about as close as you're going to get:


This will (they say) convert a Word doc to HTML using the wv library.

From there, you could use 'HTML Tidy' to get your HTML into XHTML format:


Once it is in XHTML format, you should be able to run it through an XML parser and extract what you're looking for.
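
As a rough illustration of that last step, here is a minimal sketch of pulling paragraph text out of XHTML with a standard XML parser. It is written in Python rather than PHP purely for illustration, and the sample document, the field labels, and the function name extract_paragraphs are all hypothetical. It assumes Tidy has produced well-formed XHTML in the standard XHTML namespace; named entities such as &nbsp; may need extra handling with a strict XML parser.

```python
import xml.etree.ElementTree as ET

# Namespace that XHTML documents normally declare on the <html> element.
XHTML_NS = "{http://www.w3.org/1999/xhtml}"


def extract_paragraphs(xhtml_text):
    """Return the text of every non-empty <p> element in an XHTML document."""
    root = ET.fromstring(xhtml_text)
    paragraphs = []
    for p in root.iter(XHTML_NS + "p"):
        # itertext() also collects text nested inside <b>, <i>, and so on.
        text = "".join(p.itertext()).strip()
        if text:
            paragraphs.append(text)
    return paragraphs


# Hypothetical sample standing in for Tidy's XHTML output.
SAMPLE = """<html xmlns="http://www.w3.org/1999/xhtml">
<head><title>Form</title></head>
<body>
<p>Name: <b>Jane Doe</b></p>
<p>   </p>
<p>City: Utrecht</p>
</body>
</html>"""

if __name__ == "__main__":
    for line in extract_paragraphs(SAMPLE):
        print(line)
```

Filtering out the "standard headers" would then be a matter of skipping the paragraphs that belong to the template and keeping the ones the user filled in.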

OR, the other option is to wait for Microsoft's next release of Word which should store its files in XML format by default.

OR, and this is my preferred way, you can get your users to start using OpenOffice, which already stores its contents in XML format.
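
To give an idea of what the OpenOffice route could look like: an OpenOffice document is a ZIP archive whose text lives in a member named content.xml. The sketch below (Python, for illustration only) reads that member and collects paragraph text. The helper names and the in-memory sample archive are hypothetical, and the sample uses placeholder namespace URIs rather than the real ones, which is why the code matches on local element names only.

```python
import io
import zipfile
import xml.etree.ElementTree as ET


def read_paragraphs(document):
    """Return paragraph texts from content.xml inside an OpenOffice file.

    `document` may be a file path or a binary file-like object.
    """
    with zipfile.ZipFile(document) as archive:
        root = ET.fromstring(archive.read("content.xml"))
    result = []
    for element in root.iter():
        # Compare local names only, so the sketch does not depend on
        # the exact namespace URI used by a given file format version.
        if element.tag.rsplit("}", 1)[-1] == "p":
            text = "".join(element.itertext()).strip()
            if text:
                result.append(text)
    return result


def make_sample():
    """Build a tiny stand-in archive in memory (not a complete, valid document)."""
    content = (
        '<office:document-content xmlns:office="urn:example:office" '
        'xmlns:text="urn:example:text"><office:body>'
        "<text:p>Name: Jane Doe</text:p><text:p>City: Utrecht</text:p>"
        "</office:body></office:document-content>"
    )
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as archive:
        archive.writestr("content.xml", content)
    buffer.seek(0)
    return buffer


if __name__ == "__main__":
    print(read_paragraphs(make_sample()))
```

With a real upload, the path of the saved file would be passed to read_paragraphs instead of the in-memory sample.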

Hope this helps,


p.s.  Google is your friend.

Author Comment

ID: 8165030
Sounds good... I installed WvWare for Windows and got Tidy. I can give it an HTML file using the Windows CMD console, but I cannot get an XML/XHTML file out of it. Do you have any clue which command I use for that? I am scrolling through the help, but the documentation isn't that clear imho.

If you can help me with this that'd be awesome, thx in advance :)

Author Comment

ID: 8165174
Never mind my last comment... I already got Tidy to work using a config file. I just have a problem with WvWare: when I try to run WvWare.exe on Windows 2000 I get this error:

The dynamic link library libiconv.dll could not be found in the specified path C:\phpTools\WvWare\bin; etc... (lot more dirs).

Does anyone know where to get this file or how to fix this problem?

Author Comment

ID: 8165498
OK, fixed the DLL errors... now I don't know which command-line syntax to use to actually CONVERT a Word doc to HTML; I cannot seem to find a manual on it. I have WvWare.exe, but WvHtml is just a plain file with no extension. Does anyone happen to know how to actually use this?

Author Comment

ID: 8165527
OK, errors fixed. I just need to know how to output the WvWare result to a file, because right now I get all the HTML code in the command console, which does not really work well :)

Expert Comment

ID: 8168427

I assume that all you'll need to do is redirect standard output to a file.  This is typically done by using the > (greater than) character on the command line.  It would look like this:

C:\> some.exe > somefile.html

Real-life example:

C:\> dir > dir.txt

The file dir.txt will be created in the current directory (C:\ in this example).  Of course, you can specify the path to save it in...

C:\> dir > C:\some\path\here\dir.txt

Hope this is what you meant.


Author Comment

ID: 8193655
No, that is not the problem... I cannot execute any shell programs from PHP. I am using Win2k to host my Apache/PHP, and I read that this causes problems. Can you tell me how to fix this? When I run my command in the shell I get the wanted result, so that part is all good.

Expert Comment

ID: 10711672
Use the PHP system() function, or even exec().
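
The general pattern behind that suggestion is to launch the converter from the script itself, send its standard output to a file, and check the exit code. Below is a minimal sketch of that pattern, written in Python rather than PHP for illustration only. The function name run_to_file and the file names are hypothetical, and the demo runs the Python interpreter as a stand-in command, since the exact converter command line is not given in this thread.

```python
import subprocess
import sys


def run_to_file(command, output_path):
    """Run `command` (a list of arguments) and write its stdout to `output_path`.

    Returns the exit code of the command.
    """
    with open(output_path, "wb") as out:
        completed = subprocess.run(command, stdout=out, stderr=subprocess.PIPE)
    if completed.returncode != 0:
        # Surface the tool's error text instead of silently leaving an empty file.
        sys.stderr.write(completed.stderr.decode(errors="replace"))
    return completed.returncode


if __name__ == "__main__":
    # Stand-in command: the Python interpreter printing a line of HTML.
    # A real call would pass the converter executable and the uploaded file instead.
    demo = [sys.executable, "-c", "print('<html></html>')"]
    exit_code = run_to_file(demo, "out.html")
    print("exit code:", exit_code)
```

Passing the command as a list of arguments avoids shell quoting problems when an uploaded file name contains spaces.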

Expert Comment

ID: 11934743
No comment has been added to this question in more than 21 days, so it is now classified as abandoned.

I will leave the following recommendation for this question in the Cleanup topic area:
    Accept: bobsledbob {http:#8164593}

Any objections should be posted here in the next 4 days. After that time, the question will be closed.

EE Cleanup Volunteer
