Formatting and layout in Word or PDF

Hi All,

We have a requirement to adjust the formatting and layout (e.g., fonts, line breaks in tables, columns, and margins) of Word or PDF files using a script [Perl, Python or Ruby].

Can you please suggest any references or sample code for this, or suggest which scripting language would be good for such requirements?

Shailesh Shinde, Localization Engineering & Automation, Asked:

Walter Ritzel, Senior Software Engineer, Commented:
The best starting point I know is this:
It uses Python and gives pointers to Python libraries that can handle PDF and Word documents.
Colleen Kayter, 4D Assets, Commented:
Why are you scripting vs. applying a theme that sets all that on the fly? Just curious. With security restrictions becoming more prevalent, I would think that scripting might not work well everywhere.
Shailesh Shinde, Localization Engineering & Automation, Author Commented:
Hi Colleen Kayter,

The reason for scripting is to include this script in an existing automated processing workflow.
This script will read the config file, which will contain the formatting settings, and manipulate the input source Word or PDF files accordingly.
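
A minimal sketch of how such a config-driven step might start, in Python. The thread does not describe the actual config format, so the JSON layout, the keys (`font`, `font_size_pt`, `margins_in`), the defaults, and the names `FormatConfig` and `load_format_config` are all hypothetical.

```python
import json
from dataclasses import dataclass


@dataclass
class FormatConfig:
    """Formatting settings read from the config file (hypothetical keys)."""
    font: str
    font_size_pt: float
    margins_in: float


def load_format_config(text: str) -> FormatConfig:
    """Parse a JSON config string, falling back to assumed defaults."""
    raw = json.loads(text)
    return FormatConfig(
        font=raw.get("font", "Arial"),
        font_size_pt=float(raw.get("font_size_pt", 11)),
        margins_in=float(raw.get("margins_in", 1.0)),
    )


# Example config content; in a real workflow this would be read from a file.
sample = '{"font": "Calibri", "font_size_pt": 10.5}'
config = load_format_config(sample)
print(config)
```

Keeping the settings in one small object like this would let the rest of the workflow apply them to each input file without re-reading the config.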


Shailesh Shinde, Localization Engineering & Automation, Author Commented:
Hi All,

Can formatting be applied at the text level and at the page level, to a specific paragraph, a set of paragraphs, or a range of pages? Is this possible using Perl or Python scripts?

Colleen Kayter, 4D Assets, Commented:
Shailesh, different types of formatting are applied to and stored on different elements, depending on the type of formatting.

Fonts/font attributes (everything you see in the Font group on the Home tab) are applied at text level, with a default being stored at paragraph level.

Bullets, justification, line spacing, keep together, keep with next, tabs, etc. (everything in the Paragraph group on the Home tab) are stored at paragraph level.
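
To illustrate the text-level versus paragraph-level split, here is a rough Python sketch using only the standard library. It works on a small hand-written WordprocessingML fragment rather than a real document, and the helper names (`w`, `set_run_font`, `set_paragraph_alignment`) are made up for this example. Run properties live in `w:rPr` inside each run, and paragraph properties live in `w:pPr` inside each paragraph.

```python
import xml.etree.ElementTree as ET

W = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"
ET.register_namespace("w", W)


def w(tag: str) -> str:
    """Return a namespace-qualified WordprocessingML name."""
    return f"{{{W}}}{tag}"


def set_run_font(run: ET.Element, name: str, size_pt: float) -> None:
    """Text level: set the font name and size in the run's w:rPr."""
    rpr = run.find(w("rPr"))
    if rpr is None:
        rpr = ET.Element(w("rPr"))
        run.insert(0, rpr)  # w:rPr is the first child of w:r
    fonts = rpr.find(w("rFonts"))
    if fonts is None:
        fonts = ET.Element(w("rFonts"))
        rpr.insert(0, fonts)
    fonts.set(w("ascii"), name)
    fonts.set(w("hAnsi"), name)
    size = rpr.find(w("sz"))
    if size is None:
        size = ET.SubElement(rpr, w("sz"))
    size.set(w("val"), str(int(size_pt * 2)))  # w:sz is in half-points


def set_paragraph_alignment(para: ET.Element, value: str) -> None:
    """Paragraph level: set justification (w:jc) in the paragraph's w:pPr."""
    ppr = para.find(w("pPr"))
    if ppr is None:
        ppr = ET.Element(w("pPr"))
        para.insert(0, ppr)  # w:pPr is the first child of w:p
    jc = ppr.find(w("jc"))
    if jc is None:
        jc = ET.SubElement(ppr, w("jc"))
    jc.set(w("val"), value)


# Simplified sample paragraph, standing in for one taken from document.xml.
xml = f'<w:p xmlns:w="{W}"><w:r><w:t>Hello</w:t></w:r></w:p>'
para = ET.fromstring(xml)
set_paragraph_alignment(para, "center")
for run in para.findall(w("r")):
    set_run_font(run, "Arial", 12)
print(ET.tostring(para, encoding="unicode"))
```

This is only a sketch: a production tool would also need to keep child elements in the order the schema expects and handle formatting inherited from styles.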

Nothing is stored at the page level, but SECTIONS... Sections can force page breaks or they can be continuous, allowing you to format different parts of the same page in different ways. Margins, orientation, columns (everything in the Page Setup group of the Layout tab) PLUS headers and footers, page numbering, and backgrounds are stored at SECTION level.
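
As a rough illustration of section-level settings, the Python sketch below changes margins and column count on every `w:sectPr` in a simplified, hand-written body fragment. The function names are hypothetical, and the fragment omits attributes a real document would carry. Page measurements in this markup are in twentieths of a point, so 1440 is one inch.

```python
import xml.etree.ElementTree as ET

W = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"
ET.register_namespace("w", W)
TWIPS_PER_INCH = 1440  # twentieths of a point per inch


def w(tag: str) -> str:
    """Return a namespace-qualified WordprocessingML name."""
    return f"{{{W}}}{tag}"


def set_margins(body: ET.Element, inches: float) -> int:
    """Set all four page margins on every section; return sections changed."""
    twips = str(round(inches * TWIPS_PER_INCH))
    sections = list(body.iter(w("sectPr")))
    for sect in sections:
        pg_mar = sect.find(w("pgMar"))
        if pg_mar is None:
            pg_mar = ET.SubElement(sect, w("pgMar"))
        for side in ("top", "right", "bottom", "left"):
            pg_mar.set(w(side), twips)
    return len(sections)


def set_columns(body: ET.Element, count: int) -> None:
    """Set the number of text columns on every section."""
    for sect in list(body.iter(w("sectPr"))):
        cols = sect.find(w("cols"))
        if cols is None:
            cols = ET.SubElement(sect, w("cols"))
        cols.set(w("num"), str(count))


# Simplified sample body with a single section (assumed for illustration).
xml = (
    f'<w:body xmlns:w="{W}">'
    "<w:p><w:r><w:t>Text</w:t></w:r></w:p>"
    '<w:sectPr><w:pgMar w:top="1440" w:right="1440" '
    'w:bottom="1440" w:left="1440"/></w:sectPr>'
    "</w:body>"
)
body = ET.fromstring(xml)
changed = set_margins(body, 0.5)
set_columns(body, 2)
print(changed, ET.tostring(body, encoding="unicode"))
```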

Off the top of my head, I think the only things that are stored at document level are the theme, available styles, tables of contents, and bookmarks/cross references.

If you want to see all the parts, make a copy of one of your documents, replace the .docx extension with .zip and view the contents of the zip file. The file named document.xml (in the word folder) contains your text.
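
The same inspection can be scripted. This is a minimal Python sketch using the standard `zipfile` module, which reads the archive by content, so no rename is needed. The function names are made up for this example, and the sample builds a tiny stand-in archive in memory so that it runs without a real document; with a real file you would pass its path instead.

```python
import io
import zipfile


def list_parts(docx) -> list:
    """Return the names of all parts inside a .docx (a zip archive)."""
    with zipfile.ZipFile(docx) as archive:
        return archive.namelist()


def read_document_xml(docx) -> str:
    """Return the main document part, which holds the text, as a string."""
    with zipfile.ZipFile(docx) as archive:
        return archive.read("word/document.xml").decode("utf-8")


# Stand-in archive (not a valid Word file) so the sketch is self-contained.
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as archive:
    archive.writestr("word/document.xml", "<w:document/>")
    archive.writestr("word/styles.xml", "<w:styles/>")

parts = list_parts(buffer)
print(parts)
print(read_document_xml(buffer))
```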

As for programming with Python or Perl, I'll leave that answer to one of the coder experts.

Shailesh Shinde, Localization Engineering & Automation, Author Commented:

Waiting for comments from the coder experts.

Ray Paseur Commented:
This looks at things from a PHP perspective, so it may or may not fit your environment, but since PHP is free and open-source it could be worth considering.

PHP has two well-supported libraries for building PDF documents: FPDF and TCPDF. Both are self-contained object-oriented libraries. The documentation is pretty good, and they have online examples. I have never used them to import and adjust pre-existing PDF files, but some others in the E-E forums claim this can be done. Most of my work has been to take external inputs (forms, databases, API data) and build PDF documents. For this kind of work, either of the libraries will work well, giving you access to a variety of fonts, colors, layouts, and image placements.