How to find the sections and subsections present in a Word document

Posted on 2014-10-14
Last Modified: 2014-10-31
Assume a document with the following sections and subsections:

1   Example
      1.1      Example
      1.2      Example
            1.2.1      Example
2      Example
      2.1      Example
            2.1.1      Example
•      Level 1
o      Level 2
      Level 3
      Level 4
            2.1.2       Example
               a      Example
                     a.1 Example

How can I count how many subsections each section has, using C# or ASP.NET?
Question by: mannevenu26
1 Comment
The Microsoft Word format is difficult to parse, because there have been many releases and versions of it. What works in one release may not work in the next.

However, if you open the Word document and save it in Rich Text Format, parsing becomes far easier. RTF is a human-readable text markup format. Search for the section separator tags (which one you need depends on how the sections were created, so I can't be specific here), count them, and you're done.

In RTF, the file looks like this:

{\rtf1\ansi\ansicpg1252\uc1 \deff0\deflang1033\deflangfe1033{\fonttbl{\f0\froman\fcharset0\fprq2{\*\panose 02020603050405020304}Times New Roman;}{\f16\froman\fcharset238\fprq2 Times New Roman CE;}{\f17\froman\fcharset204\fprq2 Times New Roman Cyr;}
{\f19\froman\fcharset161\fprq2 Times New Roman Greek;}{\f20\froman\fcharset162\fprq2 Times New Roman Tur;}{\f21\froman\fcharset186\fprq2 Times New Roman Baltic;}}{\colortbl;\red0\green0\blue0;\red0\green0\blue255;\red0\green255\blue255;
\red128\green128\blue128;\red192\green192\blue192;}{\stylesheet{\widctlpar\adjustright \fs20\cgrid \snext0 Normal;}{\*\cs10 \additive Default Paragraph Font;}}{\*\listtable{\list\listtemplateid67698703\listsimple{\listlevel\levelnfc0\leveljc0\levelfollow0
\levelstartat1\levelspace0\levelindent0{\leveltext\'02\'00.;}{\levelnumbers\'01;}\fi-360\li360\jclisttab\tx360 }{\listname ;}\listid888301111}}{\*\listoverridetable{\listoverride\listid888301111\listoverridecount0\ls1}}{\info
{\title Page \{ PAGE \} of Section \{ SECTION \}}{\author Windows User}{\operator Windows User}{\creatim\yr2014\mo10\dy14\hr21\min24}{\revtim\yr2014\mo10\dy14\hr21\min24}{\version2}{\edmins0}{\nofpages2}{\nofwords16}{\nofchars94}{\*\company  }
{\nofcharsws115}{\vern59}}\widowctrl\ftnbj\aenddoc\formshade\viewkind4\viewscale75\pgbrdrhead\pgbrdrfoot \fet0\sectd \linex0\endnhere\sectdefaultcl {\*\pnseclvl1\pnucrm\pnstart1\pnindent720\pnhang{\pntxta .}}{\*\pnseclvl2
\pnucltr\pnstart1\pnindent720\pnhang{\pntxta .}}{\*\pnseclvl3\pndec\pnstart1\pnindent720\pnhang{\pntxta .}}{\*\pnseclvl4\pnlcltr\pnstart1\pnindent720\pnhang{\pntxta )}}{\*\pnseclvl5\pndec\pnstart1\pnindent720\pnhang{\pntxtb (}{\pntxta )}}{\*\pnseclvl6
\pnlcltr\pnstart1\pnindent720\pnhang{\pntxtb (}{\pntxta )}}{\*\pnseclvl7\pnlcrm\pnstart1\pnindent720\pnhang{\pntxtb (}{\pntxta )}}{\*\pnseclvl8\pnlcltr\pnstart1\pnindent720\pnhang{\pntxtb (}{\pntxta )}}{\*\pnseclvl9\pnlcrm\pnstart1\pnindent720\pnhang
{\pntxtb (}{\pntxta )}}\pard\plain \widctlpar\adjustright \fs20\cgrid {Page \{ PAGE \} of Section \{ SECTION \}
\par \sect }\sectd \sbknone\linex0\endnhere\sectdefaultcl \pard\plain \widctlpar\adjustright \fs20\cgrid {
\par Page \{ PAGE \} of Section \{ SECTION \}
\par \sect }\sectd \linex0\endnhere\sectdefaultcl \pard\plain \widctlpar\adjustright \fs20\cgrid {
\par Page \{ PAGE \} of Section \{ SECTION \}
\par }}

In this example, search for occurrences of "\par \sect" and count them. Each match marks a section break, so the sample above, with two matches, contains three sections.
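
The search described above can be sketched roughly as follows. This is only an illustration, written in Python for brevity; the same pattern should port to C# with `Regex.Matches`. It assumes the document has already been saved as RTF and read into a string, and that sections were created with ordinary section breaks, which show up as the `\sect` control word. The names `SECT_BREAK` and `count_sections` are made up for this example, and the sample string is a trimmed-down version of the RTF body shown above.

```python
import re

# Match the RTF control word \sect exactly. The negative lookahead rejects
# longer control words such as \sectd or \sectdefaultcl that start with it.
SECT_BREAK = re.compile(r"\\sect(?![a-zA-Z])")


def count_sections(rtf_text: str) -> int:
    """Return the number of sections implied by the \\sect breaks in rtf_text."""
    breaks = len(SECT_BREAK.findall(rtf_text))
    # Each \sect ends one section; the final section has no trailing break,
    # so the section count is one more than the number of breaks.
    return breaks + 1


# Trimmed-down body of the sample above (font, color and list tables omitted).
sample = (
    r"{\rtf1\ansi \sectd \linex0\endnhere\sectdefaultcl \pard\plain "
    r"{Page \{ PAGE \} of Section \{ SECTION \}" "\n"
    r"\par \sect }\sectd \sbknone\linex0\endnhere\sectdefaultcl \pard\plain {" "\n"
    r"\par Page \{ PAGE \} of Section \{ SECTION \}" "\n"
    r"\par \sect }\sectd \linex0\endnhere\sectdefaultcl \pard\plain {" "\n"
    r"\par Page \{ PAGE \} of Section \{ SECTION \}" "\n"
    r"\par }}"
)

print(count_sections(sample))  # prints 3
```

Matching the bare control word rather than the literal text "\par \sect" avoids depending on the exact spacing Word happens to emit, but it is still a plain text search, not a real RTF parser. A literal backslash in the document text that happens to be followed by "sect" would be miscounted.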

