• Status: Solved

Convert Word Document to XML using PHP script

I want to convert a Word document to XML using an online PHP script, so that users can submit their Word document and my script will process it and store it in a database. An XML dataset would be created, and PHP would filter out the standard headers etc. and look for the information that the user filled in.

If anyone knows of an existing script that does this, or can help me program something like it, that would be awesome. I've been looking everywhere for help on this subject, but have not been successful in finding an application that does exactly what I need (no PHP script anyway, just Java apps etc.).
Asked by: -Darkness-
 
bobsledbob commented:

This is probably about as close as you're going to get:

http://www.hotscripts.com/Detailed/13628.html

This will (they say) convert a Word doc to HTML using the wv library.

From there, you could use 'HTML Tidy' to get your HTML into XHTML format:

http://www.w3.org/People/Raggett/tidy/

Once it is in XHTML format, you should be able to run an XML parser over it and extract what you're looking for.
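
As a rough illustration of that last step, here is a minimal sketch using PHP's DOM extension. It assumes a modern PHP (7 or later), and the sample XHTML, the `extractParagraphs()` helper and the idea that the user's data sits in `<p>` elements are all hypothetical; real wv + Tidy output will look different and the XPath query would need adjusting.

```php
<?php
// Hypothetical XHTML, standing in for what wv + HTML Tidy might produce.
$xhtml = <<<XML
<?xml version="1.0" encoding="UTF-8"?>
<html xmlns="http://www.w3.org/1999/xhtml">
  <head><title>Form</title></head>
  <body>
    <p>Name: Jane Doe</p>
    <p>City: Utrecht</p>
  </body>
</html>
XML;

// Hypothetical helper: returns the trimmed text of every <p> inside <body>.
function extractParagraphs(string $xhtml): array
{
    $doc = new DOMDocument();
    // loadXML() requires well-formed XML, which is why Tidy runs first.
    if (!$doc->loadXML($xhtml)) {
        throw new RuntimeException('Input is not well-formed XML');
    }

    // XHTML elements live in a namespace, so XPath needs a prefix for it.
    $xpath = new DOMXPath($doc);
    $xpath->registerNamespace('x', 'http://www.w3.org/1999/xhtml');

    $paragraphs = [];
    foreach ($xpath->query('//x:body//x:p') as $node) {
        $paragraphs[] = trim($node->textContent);
    }
    return $paragraphs;
}

print_r(extractParagraphs($xhtml));
```

From there, each string could be split on the label and stored in the database.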

OR, the other option is to wait for Microsoft's next release of Word which should store its files in XML format by default.

OR, and this is my preferred way, you can get your users to start using OpenOffice, which already stores its contents in XML format.

Hope this helps,

Adam

p.s.  Google is your friend.
 
-Darkness- (author) commented:
Sounds good. I installed WvWare for Windows and got Tidy. I can feed Tidy an HTML file from the Windows command console, but I cannot get an XML/XHTML file as output. Do you have any clue which command I use for that? I am scrolling through the help, but the documentation isn't that clear imho.

If you can help me with this that'd be awesome, thx in advance :)
 
-Darkness- (author) commented:
Never mind my last comment, I already got Tidy to work using a config file. I just have a problem with WvWare: when I try to run WvWare.exe on Windows 2000 I get this error:

The dynamic link library libiconv.dll could not be found in the specified path C:\phpTools\WvWare\bin; etc... (lot more dirs).

Anyone know where to get this file or how to fix this problem?
 
-Darkness- (author) commented:
Ok, fixed the DLL errors. Now I don't know which command-line syntax to use to actually convert a Word doc to HTML; I cannot seem to find a manual for it. I have WvWare.exe, but WvHtml is just a plain file with no extension. Does anyone happen to know how to actually use this?
 
-Darkness- (author) commented:
Ok, errors fixed. I just need to know how to write the WvWare output to a file, because right now I get all the HTML code in the command console, which does not really work well :)
 
bobsledbob commented:

I assume that all you'll need to do is redirect standard output to a file.  This is typically done by using the > (greater than) character on the command line.  It would look like this:

C:\> some.exe > somefile.html

real life example:

C:\> dir > dir.txt

The file dir.txt will be created in your c:\ directory.  Of course, you can specify the path to save it in...

C:\> dir > C:\some\path\here\dir.txt


Hope this is what you meant.

Adam
 
-Darkness- (author) commented:
No, that is not the problem. I cannot execute any shell programs from PHP. I am using Win2k to host my Apache/PHP, and I read that this causes problems. Can you tell me how to fix this? When I run my command in the shell I get the wanted result, so that part is all good.
 
udendra commented:
Use the PHP system() function, or even exec().
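
A minimal sketch of the exec() route, assuming a modern PHP (7 or later). The `runToFile()` helper is hypothetical, and so are the wvWare invocation and the paths in the commented-out usage lines; they are based only on what this thread describes (the converter printing HTML to the console). exec() collects the command's output lines in an array, so the script can write them to a file itself instead of relying on shell redirection.

```php
<?php
// Hypothetical helper: runs a command, saves its standard output to $outFile,
// and returns the command's exit code (0 normally means success).
function runToFile(string $command, string $outFile): int
{
    $lines = [];
    $exitCode = 0;

    // exec() appends each line of output to $lines and sets $exitCode.
    exec($command, $lines, $exitCode);

    if ($exitCode === 0) {
        file_put_contents($outFile, implode(PHP_EOL, $lines) . PHP_EOL);
    }
    return $exitCode;
}

// Assumed usage (paths and exact command are placeholders, adjust to your setup):
// $doc  = 'C:\\uploads\\form.doc';
// $code = runToFile('C:\\phpTools\\WvWare\\bin\\WvWare.exe ' . escapeshellarg($doc),
//                   'C:\\uploads\\form.html');
// if ($code !== 0) { echo "Conversion failed with exit code $code"; }
```

Always pass user-supplied file names through escapeshellarg() before putting them into a command string. If exec() returns nothing at all, check that the web server account is allowed to run the program and that the function is not listed under disable_functions in php.ini.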
 
snoyes_jw commented:
No comment has been added to this question in more than 21 days, so it is now classified as abandoned.

I will leave the following recommendation for this question in the Cleanup topic area:
    Accept: bobsledbob {http:#8164593}

Any objections should be posted here in the next 4 days. After that time, the question will be closed.

snoyes_jw
EE Cleanup Volunteer