
Edit .doc, .xlsx in PHP/JavaScript

movieprodw asked

I have a new project, and the client is looking for the following. Any help would be awesome.

- Ability to open and view .doc/.xlsx files
- Edit the files in an Excel- or Word-style editor
- Save the files


So, basically, you want to reproduce Google Docs?
I think it's a bit ambitious.

My 2¢.

Overall, it can't really be done in a simple project.

1. You mentioned .doc (Office 2003 and earlier) and .xlsx (Office 2007+) formats. Besides belonging to two different applications (Word and Excel), those are two very different formats. The former is a proprietary binary Microsoft format, while the latter is actually a renamed ZIP file that contains a bunch of XML. So if you want to support both versions, you need libraries for both.

2. The editor aspect is probably the most difficult part of all. You DO have components out there that provide editable grids (Excel) and rich-text editing (Word), but reading the existing files into those components and formatting the contents properly is a huge project in itself. As soon as you do this, the content is going to be irreversibly converted to a format that works with those components. So if you were to then make changes and save, you'd probably end up with all sorts of unwanted changes, and the results wouldn't look correct in Word or Excel.
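To make the format difference in point 1 concrete: a legacy .doc/.xls file is an OLE2 Compound File Binary, which begins with the signature D0 CF 11 E0 A1 B1 1A E1, while a .docx/.xlsx file is a ZIP archive (normally starting with `PK\x03\x04`) that has a `[Content_Types].xml` entry at its root. The Python sketch below uses those signatures to decide which kind of reader a file would need. The function name `sniff_office_format` and its return labels are hypothetical, and any OLE2 file (.ppt, .msg, ...) will match the first check, so this only narrows things down.

```python
import zipfile

# Compound File Binary (OLE2) signature used by .doc/.xls (and .ppt, .msg, ...)
OLE2_MAGIC = bytes.fromhex("D0CF11E0A1B11AE1")
# A ZIP archive (and so a .docx/.xlsx) normally starts with a local file header
ZIP_MAGIC = b"PK\x03\x04"


def sniff_office_format(path: str) -> str:
    """Classify a file as legacy binary Office, Office Open XML, or neither."""
    with open(path, "rb") as f:
        head = f.read(8)
    if head.startswith(OLE2_MAGIC):
        return "legacy-binary"   # needs a .doc/.xls (OLE2) reader
    if head.startswith(ZIP_MAGIC):
        with zipfile.ZipFile(path) as zf:
            # OOXML packages carry [Content_Types].xml at the archive root
            if "[Content_Types].xml" in zf.namelist():
                return "ooxml"   # needs a .docx/.xlsx (ZIP + XML) reader
        return "zip"             # a ZIP, but not an Office Open XML package
    return "unknown"
```

In practice this means two separate code paths: one library that parses the binary format and another that unpacks the ZIP and reads the XML parts.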

Plus on top of all that, if it were that easy, nobody would buy Office, because they could just use a web app to do the editing and skip all the licensing costs.

Sorry, but there are a LOT of problems with what you're trying to achieve. Even Microsoft can't fully replicate the same experience of Word/Excel on the web. They offer web-based versions of Word and Excel in their Office 365 suite (which is your best bet, by the way), but those web-based versions do not offer all of the same features as the desktop versions. And that's Microsoft's own product!

Your best bet is either:
1. Try to integrate the process / flow into Office 365, so the client ends up using Microsoft's own web-based versions for editing and saving.
2. Try to create plugins for Word/Excel (this isn't too difficult in .NET - there is an Office SDK that makes it fairly easy), and have those plugins interact with your web site (e.g. post the saved versions automatically after save, etc).


The issue is that it is on an internal network, and because of the laws in our industry it has to be hosted in-house.

I have a hard time believing that there is not a solution.

And as Dan mentioned Google Docs, that's a good example of how a web app can import data, but it then gets forcibly converted to a different format / end result, which has the potential to lose unsupported features / content.

Sorry to say it, but there is no solution that is likely to meet your needs here. I think some people look at Office apps and get the idea that it should be so simple to build something like that, and they have literally no idea how complex it REALLY is.

If you don't want to listen to the "no" answers and just want to see for yourself, then I would suggest:

1. Go to phpclasses.org and find classes that interact with .doc and .xls files (legacy Office formats). Then find classes that interact with .docx and .xlsx files (newer Office formats).

2. Once you have those classes, make a list of the features they support (both reading AND writing). Save those results in a document, then open Word or Excel, work your way through its list of features, and note which ones won't be supported. I'm talking everything from bookmarks to merge fields, to simple strike-through formatting, embedded tables, formatting and spacing on those tables, hovering text boxes, and so on.

That will give you an idea of the limits you will hit with non-Microsoft technology.

Now go to Microsoft and download their Office SDK. This is THE way that Microsoft allows you to interact with your Office apps. Look at some of the sample projects that use the Office SDK; there should be a gallery that shows off what the SDK can do. If you want to be certain about it, there's a whole section in the MSDN on what the Office SDK supports, down to the type of value in a parameter for a specific function call in a specific application. Read through it and, again, compare what the SDK offers to the features in the main product.

Office is a complex, complex, complex beast. Building a web UI that supports editing of the files (even just for ONE of the four formats) is a huge project.
Top Expert 2015
On Windows, PHP can control Office applications via COM.
Another option is to run LibreOffice headless and convert documents back and forth (at the loss of macros and formatting).
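A sketch of the second option, driving LibreOffice's headless converter from Python's `subprocess` module. `soffice --headless --convert-to <ext> --outdir <dir> <file>` is LibreOffice's own command line; the helper names `build_convert_cmd` and `convert` are hypothetical, and the 120-second timeout is an arbitrary choice.

```python
import shutil
import subprocess
from pathlib import Path


def build_convert_cmd(src: str, target_ext: str, outdir: str,
                      soffice: str = "soffice") -> list:
    """Argument list for a headless LibreOffice conversion, e.g. .doc -> .docx."""
    return [soffice, "--headless", "--convert-to", target_ext,
            "--outdir", outdir, src]


def convert(src: str, target_ext: str, outdir: str) -> Path:
    """Run the conversion and return the path of the file LibreOffice wrote."""
    exe = shutil.which("soffice") or shutil.which("libreoffice")
    if exe is None:
        raise RuntimeError("LibreOffice (soffice) was not found on PATH")
    subprocess.run(build_convert_cmd(src, target_ext, outdir, exe),
                   check=True, timeout=120)   # 120 s is an arbitrary limit
    # The output keeps the source's base name with the new extension.
    out = Path(outdir) / (Path(src).stem + "." + target_ext)
    if not out.exists():
        raise RuntimeError(f"conversion produced no file at {out}")
    return out
```

For example, `convert("report.doc", "docx", "converted")` would be expected to write `converted/report.docx`. Each round trip through this converter is where macros and formatting can be lost.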

@gheist - the COM interface is more or less the same thing as the Office SDK that I mentioned, but it is less stable, in my opinion. Scripting with COM objects tends to open windowless instances of the Office apps and leave them open even after the script is finished, and it doesn't really offer any more functionality.

There is LibreOffice, Google Docs, OpenOffice, etc., but they all require a format conversion that is never 100% identical to the original. Even their native desktop apps can lose data from the original. I saw one instance where a particular Word template resulted in about 50% of the document being missing in OpenOffice, for example. The compatibility is decent, but you're still talking about a desktop app. You'd still have to build a web UI that would lose even more functionality, and then convert back to the original format, which would lose EVEN MORE functionality.

I'm not trying to be negative just for the sake of being negative. I'm usually the one that finds ways of doing things, even if they're not "officially" supported, but I've been down this same road several times. It's a LOT of wasted hours.

If they cannot stand to use the desktop versions with plug-ins (which should be -THE- most compatible way to go), then I'd still say your next best bet would be to integrate with Office 365. They have REST APIs.


The Office SDK is the same cloud c-@p as all the web docs...
If one wants to use one's own computer and not share docs with cloud providers, there are the two options I mentioned.


Thank you for the good info.

This looks close, but it was not working for me; it looks like they had a good start and then gave up.


I have a feeling that this either is not a project requirement or the OP did not probe the client enough to understand the functionality the client needs.

If the client has doc/docx/xls/xlsx files, then they probably have a way to generate them (MS Office/LibreOffice/etc.) and are looking for a way to collaborate on them. This can be achieved in any number of ways, a web editor being only one option.


Clearly they have a way to edit and create them, Dan.

They want to have them managed on their intranet and not downloaded to the users computer.

So first, you should probably accept the fact that it is not doable in this manner. Fully-compatible, web-based editing of office docs on an intranet simply cannot be done. Again, if it could be, Microsoft would lose billions of dollars in licensing because nobody would have to have the Office application loaded on their PCs. Just let that sink in.

Usually there's a greater purpose behind avoiding having users download the files. Typically it's either:

1. Security-driven (e.g. compliance regulations), where the data cannot reside on a physical workstation, so that if a workstation is compromised or lost, there's no danger of the data being leaked.

There's not much you can do here except use disk-based encryption so that the data is encrypted at rest. Lost or stolen laptops will be useless to thieves. If someone doesn't follow proper protocol, though, it doesn't matter if it's web-based or desktop-based editing - someone who leaves their laptop unattended and unlocked at a coffee shop is going to leave their system open to data theft.

2. Convenience/flow-driven, where you don't want multiple users stepping on each other, working on the same documents at the same time, etc...

In an intranet setting, this is a huge advantage of using SharePoint. SharePoint can basically take uploaded Office docs and let people "check out" the files, work on them, and "check in" the changes. This enforces a proper document-management workflow, and it ties in with the Office applications fairly well, so the transition between the SharePoint site and the application itself is almost seamless.

There are a lot of features in SharePoint, but that's one I always see in use at almost every company that has this kind of issue (e.g. multiple departments that still access and work on the same Excel doc).
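To illustrate the check-out/check-in idea itself, here is a hypothetical toy registry in Python. It is not SharePoint's API and not how SharePoint implements it. It records one holder per document and refuses a second check-out until the first user checks the document back in. A real system would also persist this state and expire abandoned check-outs.

```python
import threading
from typing import Dict, Optional


class CheckoutRegistry:
    """Toy check-out/check-in registry: at most one editor per document."""

    def __init__(self) -> None:
        self._owners: Dict[str, str] = {}   # document id -> user holding it
        self._lock = threading.Lock()       # guards _owners across request threads

    def check_out(self, doc_id: str, user: str) -> bool:
        """Return True if `user` now holds the document, False if someone else does."""
        with self._lock:
            holder = self._owners.get(doc_id)
            if holder is not None and holder != user:
                return False
            self._owners[doc_id] = user
            return True

    def check_in(self, doc_id: str, user: str) -> None:
        """Release the document; only the current holder may do this."""
        with self._lock:
            if self._owners.get(doc_id) != user:
                raise PermissionError(f"{user} has not checked out {doc_id}")
            del self._owners[doc_id]

    def holder(self, doc_id: str) -> Optional[str]:
        """Who currently has the document checked out, if anyone."""
        with self._lock:
            return self._owners.get(doc_id)
```

With this, `check_out("budget.xlsx", "alice")` succeeds, a following `check_out("budget.xlsx", "bob")` returns False, and Bob only gets the file after Alice calls `check_in`.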

Anyway, getting back to the issue at hand, there's not too much else you can do if you're stuck with the Office document format and cannot use anything outside of the intranet. Again, I'm not trying to be a downer - this is just a pretty common request within businesses (the past 3 small-to-medium clients of mine have all wanted this during some point in their growth - typically at around the 100-200 employee mark), and if it were as easy as it sounds, everyone would be using that approach instead of paying Microsoft through the nose for Office licenses.

I -will- say that if you are dealing with a few specific documents (instead of "any Office document"), you could probably convert all of the documents to the newer Office format (.docx and .xlsx), and then you could have a web-based app that acts as a "content manager" and simply updates specific, known parts of the documents.

For example, you could have a script that unpacks the file (again, the newer formats are just a ZIP file with a different file extension), finds the correct XML file, updates the desired value, then re-ZIPs the file structure back into the original filename.
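A minimal sketch of that unpack/update/re-zip step using Python's standard `zipfile` module. The function name `update_ooxml_part` is hypothetical. It copies every entry of the package unchanged except the one named XML part, which it passes through an edit function. Because `zipfile` cannot modify an archive in place, it writes a new archive next to the original and then swaps it in. It assumes the part is UTF-8 encoded, which is the usual case.

```python
import os
import tempfile
import zipfile
from typing import Callable


def update_ooxml_part(path: str, part: str, edit: Callable[[str], str]) -> None:
    """Rewrite one XML part inside a .docx/.xlsx; copy every other entry as-is."""
    folder = os.path.dirname(os.path.abspath(path))
    fd, tmp = tempfile.mkstemp(dir=folder, suffix=".tmp")
    os.close(fd)
    try:
        with zipfile.ZipFile(path) as src, zipfile.ZipFile(tmp, "w") as dst:
            if part not in src.namelist():
                raise KeyError(f"{part} not found in {path}")
            for item in src.infolist():          # original entry order is kept
                data = src.read(item.filename)
                if item.filename == part:
                    # assumes UTF-8, the usual encoding of OOXML parts
                    data = edit(data.decode("utf-8")).encode("utf-8")
                # Passing the ZipInfo keeps the entry's name, timestamp and compression.
                dst.writestr(item, data)
        os.replace(tmp, path)                     # swap the rebuilt archive in
    except BaseException:
        os.remove(tmp)                            # don't leave a half-written temp file
        raise
```

For example, with a hypothetical placeholder, `update_ooxml_part("contract.docx", "word/document.xml", lambda xml: xml.replace("{{CLIENT_NAME}}", escape("Smith & Co")))`, where `escape` comes from `xml.sax.saxutils`. Two caveats: Word often splits text that looks like one word across several `<w:r>` runs, so check that the placeholder really appears verbatim in the XML; and in an .xlsx, cell text usually lives in `xl/sharedStrings.xml` rather than in the worksheet part.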

It wouldn't be true Excel/Word editing - you would need to know the locations ahead of time, and you wouldn't have that same kind of rich editor, but you could definitely make specific content updates to the document without losing any functionality, and then build a user interface to reflect this.
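Reading a known location works the same way in reverse. Below is a Python sketch for an .xlsx, assuming the target is a cell reference such as B2 on the first worksheet stored at `xl/worksheets/sheet1.xml`. That is typical but not guaranteed; the authoritative sheet-to-file mapping is in `xl/workbook.xml` and its relationships. The function name `read_cell` is hypothetical. Text cells marked `t="s"` hold a 0-based index into `xl/sharedStrings.xml`; inline strings are not handled.

```python
import zipfile
import xml.etree.ElementTree as ET
from typing import Optional

MAIN_NS = "http://schemas.openxmlformats.org/spreadsheetml/2006/main"
NS = {"m": MAIN_NS}


def read_cell(path: str, ref: str,
              sheet_part: str = "xl/worksheets/sheet1.xml") -> Optional[str]:
    """Return the stored value of cell `ref` (e.g. "B2") as text, or None."""
    with zipfile.ZipFile(path) as zf:
        sheet = ET.fromstring(zf.read(sheet_part))
        cell = sheet.find(f".//m:sheetData/m:row/m:c[@r='{ref}']", NS)
        if cell is None:
            return None          # cell not present in the sheet
        value = cell.find("m:v", NS)
        if value is None:
            return None          # no stored value (empty styled cell, or an unhandled inline string)
        if cell.get("t") == "s":
            # Shared string: <v> is a 0-based index into xl/sharedStrings.xml
            sst = ET.fromstring(zf.read("xl/sharedStrings.xml"))
            item = sst.findall("m:si", NS)[int(value.text)]
            return "".join(t.text or "" for t in item.iter(f"{{{MAIN_NS}}}t"))
        return value.text        # numbers, booleans, cached formula results
```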


Thanks for the info