Solved

Regex in Word VBA

Posted on 2014-01-10
Last Modified: 2014-01-20
Hello,
     This is related to this question: http://www.experts-exchange.com/Software/Office_Productivity/Office_Suites/MS_Office/Word/Q_28334735.html 

What I am trying to do is find text that matches this regex:
(ARTICLE)(\s+)(\d+)(\s+)(–)(\s+)(.*)(\n)


(The string ARTICLE followed by a space, followed by a digit, followed by a space, followed by an em dash, followed by a space, followed by any number of words, then a newline. An example is "ARTICLE 5 – DESCRIPTION...")

And this regex:
(\d+)(\.)(\d+)(?:\.)(?:\d+)(\s+)(.*)(\n)


(An integer, followed by a period, followed by another integer with an optional period and integer, followed by any text, and a newline. An example is "2.1.3 Subsection 2...")

If text matches, the macro should apply a style that includes heading level so I can generate a Table of Contents and populate the navigation pane. If my regex is incorrect, please correct me.

Thank you!
Question by:indigo6
13 Comments
 
LVL 15

Expert Comment

by:DrTribos
ID: 39772860
To give you an idea of the syntax, this will find up to Description...
(ARTICLE )[0-9]( )^=( )<[0-9A-z ]*>

You need to turn on "Use wildcards" in the Find dialog for this to work.
 
LVL 15

Accepted Solution

by:
DrTribos earned 1500 total points
ID: 39772996
And this:

(ARTICLE )[0-9]( )^+( )<[0-9A-z ]*[^13]

will find the first paragraph (including marker) starting with "Article #"

note:
em dash is ^+
en dash is ^=
 
LVL 15

Assisted Solution

by:DrTribos
DrTribos earned 1500 total points
ID: 39773010
And this should do the second part...
[0-9]{1,}[.][0-9]{1,}[*][! ]( )<[0-9A-z ]*[^13]
 
LVL 93

Expert Comment

by:Patrick Matthews
ID: 39773064
You mention VBA, so I assume that you are implementing the VBScript regex objects.  In that case, your pattern string for the 2nd example should be:

(\d+)(\.)(\d+)(\.)?(\d+)?(\s+)(.*)(\n)

Note that DrTribos's recommendation is for the wildcard search built into Word, which is also accessible via VBA.
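
For the regex route, a minimal sketch of a macro that restyles matching paragraphs might look like the following. It assumes late binding to VBScript.RegExp and tests each paragraph's text with the trailing paragraph mark stripped: Word ends a paragraph with a carriage return, Chr(13), so a pattern that requires \n would likely not match that text. The macro name, the acceptance of either an en dash or an em dash, and the mapping of numbering depth to Heading 2 and Heading 3 are assumptions, not something specified in the question.

```vba
Sub ApplyHeadingStylesByRegex()
    ' Hypothetical macro name. Late binding avoids needing a project reference
    ' to "Microsoft VBScript Regular Expressions 5.5".
    Dim reArticle As Object, reSub As Object, reSubSub As Object
    Dim para As Paragraph
    Dim txt As String

    Set reArticle = CreateObject("VBScript.RegExp")
    ' ARTICLE, digits, an en dash (ChrW(8211)) or em dash (ChrW(8212)), then text.
    reArticle.Pattern = "^ARTICLE\s+\d+\s+[" & ChrW(8211) & ChrW(8212) & "]\s+\S"

    Set reSubSub = CreateObject("VBScript.RegExp")
    reSubSub.Pattern = "^\d+\.\d+\.\d+\s+\S"      ' e.g. "2.1.3 Subsection"

    Set reSub = CreateObject("VBScript.RegExp")
    reSub.Pattern = "^\d+\.\d+\s+\S"              ' e.g. "2.1 Section"

    For Each para In ActiveDocument.Paragraphs
        txt = para.Range.Text
        ' Drop the trailing paragraph mark so the patterns need not account for it.
        If Right$(txt, 1) = vbCr Then txt = Left$(txt, Len(txt) - 1)

        If reArticle.Test(txt) Then
            para.Style = wdStyleHeading1
        ElseIf reSubSub.Test(txt) Then
            para.Style = wdStyleHeading3           ' assumed level
        ElseIf reSub.Test(txt) Then
            para.Style = wdStyleHeading2           ' assumed level
        End If
    Next para
End Sub
```

Because the three-part pattern is tested before the two-part one, "2.1.3 ..." lands on the deeper heading level. Try it on a copy of the document first, since any paragraph that happens to start with a number like "2.1 " will be restyled.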
 
LVL 15

Expert Comment

by:DrTribos
ID: 39773067
Ahhhh... I was wondering.  I've never used RegEx.
 

Author Comment

by:indigo6
ID: 39777322
DrTribos, actually that's fine, I was wondering about what the wildcard syntax would be! I tweaked the second one to be (ARTICLE )[0-9]( )^=( )<[0-9A-z ]*[^13] since it was an En dash, and I had to run a second round with (ARTICLE )[0-9][0-9]( )^=( )<[0-9A-z ]*[^13] to detect article numbers 10 and larger. But that worked great!

     The second one has me perplexed though. It detected strings like the following:
2.2 Access
2.7 Employee Time Off
21.11.3 Class Size (Unit 3)

But not:

5.2 No Lockout
5.4 Savings Clause

From what I can tell, they are formatted the same way. I'm not sure why it does not find these.

matthewspatrick, thanks for the VBA regex! How would I write a macro that changes the style of strings that match that regex?

Thanks!
 
LVL 15

Expert Comment

by:DrTribos
ID: 39778446
Sorry I thought I posted this yesterday - apparently it failed.  Here is some code to make the changes using the wild card...
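Sub UpdateSections()

    ' Word wildcard pattern: "ARTICLE", one digit, an en dash (^=),
    ' the heading text, then the paragraph mark (^13).
    Dim findText As String
    findText = "(ARTICLE )[0-9]( )^=( )<[0-9A-z ]*[^13]"

    ActiveDocument.Range.Select
    With Selection.Find
        ' Clear formatting left over from any earlier search.
        .ClearFormatting
        .Replacement.ClearFormatting
        .Text = findText
        .Replacement.Text = ""           ' no replacement text: only the style changes
        .Forward = True
        .Font.Name = "Arial Narrow"      ' only text in this font and size is matched
        .Font.Size = 24
        .MatchWildcards = True
        .Replacement.Style = "Heading 1"
        .Execute Replace:=wdReplaceAll
    End With

End Sub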



You'll have to change the font names and sizes to suit... :-)
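
To avoid a second pass for article numbers 10 and larger, the single-digit [0-9] can be swapped for [0-9]{1,} (one or more digits), the same construct used in the section-number pattern earlier in the thread. The sketch below is a hypothetical variant of the macro above: the name UpdateArticleHeadings is made up, it works on a Range instead of the Selection, and it drops the font filter, so it assumes you want every matching paragraph restyled regardless of its current formatting.

```vba
Sub UpdateArticleHeadings()
    ' Hypothetical variant of UpdateSections: no font filter, no Selection.
    Dim rng As Range
    Set rng = ActiveDocument.Content

    With rng.Find
        .ClearFormatting
        .Replacement.ClearFormatting
        ' [0-9]{1,} = one or more digits; ^= = en dash; ^13 = paragraph mark.
        .Text = "(ARTICLE )[0-9]{1,}( )^=( )<[0-9A-z ]*[^13]"
        .Replacement.Text = ""
        .Replacement.Style = "Heading 1"
        .Forward = True
        .Wrap = wdFindStop
        .MatchWildcards = True
        .Execute Replace:=wdReplaceAll
    End With
End Sub
```

Keep in mind that in Word's wildcard syntax * means "any string of characters" rather than "repeat the previous item"; the repeat operators are {n,} and @.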
 
LVL 15

Expert Comment

by:DrTribos
ID: 39778450
Gah! My other comment is also missing - regarding the strange results... perhaps there is a double space or something confusing the issue.  Can you copy a few of the offending lines and paste them into a sample document?

Also, I'm not sure if you are in the habit of using Show/Hide (the backward-P button in the Paragraph group on the Home tab of the ribbon). It shows spaces and paragraph marks and might make diagnosis easier :-)
 
LVL 15

Expert Comment

by:DrTribos
ID: 39778456
Another possible reason the text is not detected is a soft return (manual line break, Shift+Enter) in place of a hard return. With Show/Hide on, a soft return looks like the bent arrow on the keyboard's Enter key; a hard return is the 'backward P' paragraph mark. The wildcard search explicitly looks for a hard return (^13); I'm not sure how other characters would affect it...
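
One way to check for this kind of hidden difference is to list the character codes of a paragraph that fails to match. The following is only a diagnostic sketch (the macro name is hypothetical): click inside an offending paragraph, run it, and read the output in the VBA editor's Immediate window (Ctrl+G). A paragraph mark is code 13, a manual line break is 11, an ordinary space is 32, and a non-breaking space is 160.

```vba
Sub DumpParagraphCharCodes()
    ' Hypothetical helper: prints position, character code, and a label
    ' for each character of the paragraph containing the cursor.
    Dim txt As String, i As Long, code As Long
    txt = Selection.Paragraphs(1).Range.Text

    For i = 1 To Len(txt)
        code = AscW(Mid$(txt, i, 1))
        Select Case code
            Case 9:    Debug.Print i, code, "tab"
            Case 11:   Debug.Print i, code, "manual line break (soft return)"
            Case 13:   Debug.Print i, code, "paragraph mark (hard return)"
            Case 32:   Debug.Print i, code, "space"
            Case 160:  Debug.Print i, code, "non-breaking space"
            Case Else: Debug.Print i, code, Mid$(txt, i, 1)
        End Select
    Next i
End Sub
```

Comparing the output for a paragraph that matches (such as "2.2 Access") with one that does not (such as "5.2 No Lockout") should show whether the two really are formatted the same way.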
 

Author Comment

by:indigo6
ID: 39783981
They seem to all be hard returns. Any other reasons why this could be?
 
LVL 15

Expert Comment

by:DrTribos
ID: 39784155
Please copy the offending paragraphs to a sample document and upload it.
Sorry for the brevity - on phone
 

Author Closing Comment

by:indigo6
ID: 39795552
I would have to strip too much of the info out, plus the requirements changed, so I'll close this question. It got me moving in the right direction though. Thanks!
 
LVL 15

Expert Comment

by:DrTribos
ID: 39796168
No worries - glad to help :-)