UDF -- Need help with tweaking VBA code

In a previous post (http://www.experts-exchange.com/Software/Office_Productivity/Office_Suites/MS_Office/Excel/Q_28254914.html), expert 'Rgonzo1971' helped me turn a formula into a UDF.

It was my fault not listing all possible "cases" that needed to be extracted from a string (in Excel).   Thus, I kindly request additional assistance from any VBA expert on how to tweak the existing function.

Please see attached XLS which provides specific details as to what needs to be scanned, what's working, and what's missing.

Thanks you in advance,
Who is Participating?
I wear a lot of hats...

"The solutions and answers provided on Experts Exchange have been extremely helpful to me over the last few years. I wear a lot of hats - Developer, Database Administrator, Help Desk, etc., so I know a lot of things but not a lot about one thing. Experts Exchange gives me answers from people who do know a lot about one thing, in a easy to use platform." -Todd S.

[ fanpages ]IT Services ConsultantCommented:

Background Information:
- column A contains citations.  I need to extract the publication title.
- Expert 'Rgonzo1971' helped me to take a formula and turn it into a UDF.  See column B.   Unfortunately, in my original question, I only mentioned to identify titles between double/single quotes.
- After 'Rgonzo1971' provided a wonderfully working UDF, I realized that I have other cases such as follows:
  a. A title may not be surronded by double/single quotes.   Instead, it may follow directly the ")" parenthesis [see cell A6].  
  b. Alternatively, it may follow a "." after the ")" parenthesis [see cell A2].
  c. Or, lastly, it may follow a comma after the ")" [see cell A9].
 d. Also, the title may not always end with a period.  Sometimes it is followed by a comma [see cell A8].

Interim Work:
- I have tried to expand on 'Rgonzo1971' suggested UDF.  Please see columns B and C.  
- While additional titles are found, the results are not entirely "clean".

- Expand the existing UDF so that all (or at least the bulk of) titels can be extracted via the function: =ExtractTitle()

Just to pre-empt a query that may arise when an(other) "Expert" works on this.

Re: Point d)

Will there never be any commas within the titles?  Will some titles contain more than one sentence & will the first end with a period (and the whole title will not be enclosed in quotes)?


ExpExchHelpAuthor Commented:

Thanks for the prompt response.   Here's my feedback.

1. Yes, some titles may contain commas.   It's not the bulk of cases though.    Anyhow, the answer is yes.

2. Although the exception, some titles contain 2 sentences.  Cell A8 (i.e., "Managing process flows. Principles of Ops Mgmt") provides an example.   Other scenarios include a colon (see example in cell A5 "Re-creating the corporation: a design of organizations for the 21st century").

3. In respect to ending w/ a "." and being enclosed with quotes.   I've seen a variety of cases.   I think what's being shown in those 14 examples provides covers the bulk (i.e., ~85-90%).

Pls let me know if you require additional information.

Thanks in advance,
[ fanpages ]IT Services ConsultantCommented:
Hi again,

Thank you for your clarification.

Due to the fact that there is no definitive starting point, &/or ending point, &/or way of determining with complete certainly what those two extremes of any given title will be, I can foresee that any routine provided will never be 100% accurate.

Some titles will be incorrectly derived; either incomplete (with aspects of the full title missing), or will include additional text (other details contained within the reference text) following the natural end of any given title.

When reading the references, a human can decipher the end of a sentence, or a clause within text denoted by punctuation, & then determine if what follows will be part of the title, or not.  Without conscious thought, a human will read the whole line, decide the starting point, & the possible finish, but then make corrections as the text is read & processed.

A programmatically approach whilst not impossible to achieve, will be difficult to replicate what a human does to achieve the same goal.

As I said, not impossible, but there will be many instances of trial & error where you may find one particular reference that is not "parsed" correctly & the routine will need adjusting to cater for another case.

Most of this back & forth between you & the "Expert" can be reduced (if not eliminated) if you can produce a full set of test data (between you) that includes every possible permutation/combination of the individual rules you have mentioned.

Without this, I would envisage, that there is never going to be one routine that is a "catch all" for absolutely every reference from which you need a title to be extracted.

Yes, this could be possible given enough time (monkeys, & typewriters), but I just thought I would share my concerns that the whole process is going to very time-consuming.

A majority of the work is likely to be analysis of the various combinations of references & how the text is structured within them, rather than actual development work.

Anything you can do to aid that process I am sure will be a benefit to any "Expert" willing to take this task on.

Stripping text from a string of characters with a clear set of rules is fairly trivial.  Performing the same task when there are so many variable combinations is going to be less than an attractive pastime for anybody with the programming skills to help you.

I, for example, could look at your requirements, but the actual time involved to reach a mutually satisfactory outcome is such an open-ended duration that it is difficult to estimate what the level of commitment will be.


Keep up with what's happening at Experts Exchange!

Sign up to receive Decoded, a new monthly digest with product updates, feature release info, continuing education opportunities, and more.

ExpExchHelpAuthor Commented:

Thanks... I appreciate the feedback.

Btw, I was never expecting the 100% solutions.   More than likely the 80 solution.

Either way, do you have any thoughts on integrating the formulas in either column B and/or C into the existing UDF.  

That would help out a lot.

Rory ArchibaldCommented:
This is by no means perfect, for all the reasons mentioned:

Function ExtractTitle3(strCheckWord) As String
   Static RegExp            As Object
   Dim matches
   Dim strPattern As String
   If RegExp Is Nothing Then Set RegExp = CreateObject("vbscript.regexp")
   strPattern = "(?:\)[., ]*['""]?)([^.,'""]+)(?:['""]?[.,])"
   With RegExp
      .Global = True
      .Pattern = strPattern
      If .test(strCheckWord) Then
         Set matches = .Execute(strCheckWord)
         ExtractTitle3 = matches(0).submatches(0)
      End If
   End With
End Function

Open in new window


Experts Exchange Solution brought to you by

Your issues matter to us.

Facing a tech roadblock? Get the help and guidance you need from experienced professionals who care. Ask your question anytime, anywhere, with no hassle.

Start your 7-day free trial
ExpExchHelpAuthor Commented:
Wow... it's nearly perfect!   It caught 12 out of 14 examples.   That's about 85%.  

Thanks so much, rorya!   :)

It's more than this solution.Get answers and train to solve all your tech problems - anytime, anywhere.Try it for free Edge Out The Competitionfor your dream job with proven skills and certifications.Get started today Stand Outas the employee with proven skills.Start learning today for free Move Your Career Forwardwith certification training in the latest technologies.Start your trial today
Microsoft Excel

From novice to tech pro — start learning today.