XSL to clean up Word's XHTML?

Posted on 2004-11-26
Last Modified: 2012-06-21
Does anyone know of an XSL stylesheet that will take a Word document (saved as XHTML) and clean it up?

By this I mean: return only the paragraphs, and maybe bullet points as well.

Drop everything else (indentation, images, line art, page numbering, etc.). Also tidy up "empty" paragraphs.

PS: I have used XML and XSL in the past (for web applications), but never with MS Office, so any help is appreciated :)
Question by:eamonroche
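The rules asked for here (keep paragraphs and bullet points, drop everything else, skip empty paragraphs) are what an XSL stylesheet would express as templates. As a minimal sketch of the same logic, here it is in Python rather than XSL, using only the standard library's `xml.etree.ElementTree`. It assumes the Word output has already been made well-formed XHTML (Word's "Save as Web Page" HTML may not be, e.g. unquoted attributes or `&nbsp;` entities, which ElementTree rejects without a DTD) and that bullet points appear as `li` elements; if Word emits them as styled `p` elements instead, they come through as plain paragraphs. The function names are hypothetical.

```python
import xml.etree.ElementTree as ET


def local_name(tag):
    """Strip the "{namespace}" prefix ElementTree adds to namespaced tags."""
    return tag.rsplit("}", 1)[-1]


def _collect_blocks(elem, blocks):
    name = local_name(elem.tag)
    if name in ("p", "li"):
        # Join the text of all descendants (<span>, <b>, <i>, ...) and collapse
        # whitespace; str.split() also treats non-breaking spaces as whitespace.
        text = " ".join("".join(elem.itertext()).split())
        if text:  # drop "empty" paragraphs such as <p>&#160;</p>
            blocks.append((name, text))
        return  # don't descend, so a <p> inside an <li> is not emitted twice
    # Everything else (head, style, img, tables, divs) is not emitted itself,
    # but we still look inside it for paragraphs and list items.
    for child in elem:
        _collect_blocks(child, blocks)


def clean_word_xhtml(xhtml_text):
    """Return a list of ("p" or "li", text) pairs from well-formed XHTML."""
    blocks = []
    _collect_blocks(ET.fromstring(xhtml_text), blocks)
    return blocks


if __name__ == "__main__":
    sample = """<html xmlns="http://www.w3.org/1999/xhtml"><body>
      <p class="MsoNormal">Hello <b>world</b></p>
      <p class="MsoNormal">&#160;</p>
      <ul><li>First point</li></ul>
    </body></html>"""
    for kind, text in clean_word_xhtml(sample):
        print(kind, text)
```

Stopping at each `p` or `li` and taking its full text corresponds roughly to an XSL template that matches those elements and outputs their string value, while an empty default template suppresses everything else.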
Accepted Solution

by: PeterCiuffetti (earned 375 total points)
ID: 12680951
Hi,

When I have had to convert Word documents to (usable) XML, I started with a utility called UpCast:

http://www.infinity-loop.de/products/upcast/

This converts directly from the .doc format to XML. You can choose from a number of DTDs for the output format, one of which is relatively simple, with all styling controlled from an external CSS file.

I found writing XSL for this simplified format much easier than writing XSL for Word's own XML output.

Pete


Author Comment

by:eamonroche
ID: 12681330
I understand your comment, but...

There will be about 200 users writing the Word documents, and I do not want to install UpCast on each of their PCs. I will need them to do "Save as Web Page", and then I need to find an XSL file to tidy up the resulting XHTML file. Thanks.


Expert Comment

by:PeterCiuffetti
ID: 12682142
Can you make this a web server function? For example, after users save their Word documents, they could upload them via a web form to a server you provide for converting documents to XML. If your server runs Java, you could get the JAR version of UpCast, and the server could use its API to do the conversion after the save.


Author Comment

by:eamonroche
ID: 12682309
I see what you are getting at. If there were a COM component version of UpCast that I could call from an ASP script, this would work.

Author Comment

by:eamonroche
ID: 12682319
...but really, I don't mind if the XHTML produced by Word is messy. Surely there must be a smart XSL file that will clean it up.

Expert Comment

by:PeterCiuffetti
ID: 12683895
Hi again,

I couldn't find any. I took a look at the XSD schemas Microsoft offers here (http://www.microsoft.com/downloads/details.aspx?FamilyID=fe118952-3547-420a-a412-00a2662442d9&DisplayLang=en)

Not surprisingly, the markup language is quite complex. Given all the styles, tables, footers, headers, footnotes, etc., "cleaning it up" would require quite a bit of XSL. If you just wanted the paragraphs, I suppose it wouldn't be too bad, but you would still have to work around the objects that can break up a paragraph in order to reassemble it. If you list a short set of elements you want to capture, I can try to put something small together, and then maybe you could build on that.

Pete
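To illustrate the point about objects breaking up a paragraph: in Word 2003's XML format (WordprocessingML, presumably what those schemas describe), a paragraph is a `w:p` whose text is split across runs (`w:r`), each holding `w:t` text, and runs can sit inside hyperlinks or fields. A minimal sketch, again in Python rather than XSL, that reassembles each paragraph's text and flags list paragraphs. The namespace URI is Word 2003's; treating `w:pPr/w:listPr` as the list-item marker is our reading of that schema and should be checked; the function names are hypothetical. Footnote and text-box text is not separated out but folded into the surrounding paragraph.

```python
import xml.etree.ElementTree as ET

# Namespace used by Word 2003 "XML Document" (WordprocessingML) files.
NS = {"w": "http://schemas.microsoft.com/office/word/2003/wordml"}
W_P = "{%s}p" % NS["w"]
W_T = "{%s}t" % NS["w"]


def _collect_paragraphs(elem, out):
    if elem.tag == W_P:
        # A paragraph's text is split across runs (w:r/w:t), and runs can sit
        # inside hyperlinks or fields, so gather every w:t at any depth.
        text = "".join(t.text or "" for t in elem.iter(W_T))
        # Assumption: list (bullet/numbered) paragraphs carry w:pPr/w:listPr.
        is_list_item = elem.find("w:pPr/w:listPr", NS) is not None
        if text.strip():  # skip empty paragraphs
            out.append((is_list_item, text))
        return  # nested paragraphs (footnotes, text boxes) are already in `text`
    for child in elem:
        _collect_paragraphs(child, out)


def wordml_paragraphs(wordml_text):
    """Return (is_list_item, text) for each non-empty outermost w:p."""
    out = []
    _collect_paragraphs(ET.fromstring(wordml_text), out)
    return out
```

Concatenating every `w:t` under a `w:p` is the step that would otherwise need care in XSL, since a single word can be split across several runs when its formatting changes mid-paragraph.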

Author Comment

by:eamonroche
ID: 12694571
Pete
Thanks for your help. The job won't be starting for a few months; I was just doing some preparation. I will have a go myself first and see how I get on.
Eamon