paddycobbett

Need help retrieving substrings using REGEX (Coldfusion 8)

I need a function that returns certain substrings of a slightly larger string. A sample string would be:

"[one][two][three]", which would return "one", "two", "three".

I cannot just split using "[]" as delimiters, however, since anything could appear in between the entries, e.g.:

"abc [one]/[two] good morning [three].pdf", and this should still return "one", "two", "three".

Left to my own devices I would be inclined to get the indexes of "[" and "]" alternately, grabbing subsequent substrings, but I think this is both inelegant and likely slow. Would I be right in thinking that the best solution would involve regex? I am not familiar with regex syntax, so if this is the correct tool to use, can anyone provide a simple code sample (in ColdFusion) for retrieving the values for the sample output above? The response can be an array, a list or anything manageable.

If regex is not the best way then perhaps you can suggest something better. Side note: the sample input is typically no more than 40-50 characters.

Thanks :)
kaufmed
Does the following give you what you need? (Forgive me if the syntax is bad--I'm not a CF programmer) ;)
<cfset arrTitles = REMatch(
    "\[(\w+\)]",
    string_to_search
) />

Line 2 of the previous post is the important part. I *believe* REMatch is what you need to use.
paddycobbett

ASKER

Thanks a lot for the response, yes you're quite right that REMatch is the correct ColdFusion function :)

However I get an error "Unmatched parenthesis", which I suspect is thrown by the regex engine. Are you sure the expression is syntactically correct? If you can double-check that, then I can be sure it's some other cause.
<cfset path="[one]  [two] [three]">
<cfset arrTitles = REMatch("\[(\w+\)]", path) />
 
<cfdump var="#arrTitles#">
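A likely cause of the "Unmatched parenthesis" error: in "\[(\w+\)]" the backslash sits in front of ")" rather than "]", so the ")" is treated as a literal character and the group opened by "(" is never closed. The sketch below assumes the backslash was meant to escape the closing bracket; it is a guess at the fix, not the accepted solution, which is not shown in this thread.

```cfm
<!--- Assumed fix: the backslash now escapes "]" instead of ")" --->
<cfset path = "[one]  [two] [three]">
<cfset arrTitles = REMatch("\[(\w+)\]", path)>

<!--- REMatch returns whole matches, so each element still includes the brackets --->
<cfdump var="#arrTitles#">
```

Note that the parentheses do not change what REMatch returns here; it gives back the full matched text, not the captured group.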

ASKER CERTIFIED SOLUTION
kaufmed

This solution is only available to members.
Spot on! Works perfect with all my samples, thanks! :)
Oh shoot, jumped the gun! Could you please amend it so that the result values don't include the "[" and "]"? So the values returned are "One" and not "[One]", as specified in the original question. Sorry and thanks.
That's the part I'm not sure about. Under normal regex, the parentheses I included would have captured just the part between them. I'll do some more research, but I fear you may have to take the array that is returned and remove the brackets when you process the values.

I'll post back if I find anything helpful.
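For reference, post-processing the array as described could look roughly like the sketch below. It assumes a pattern of "\[\w+\]" (the accepted solution is not visible here, so the pattern is a stand-in) and uses illustrative variable names.

```cfm
<cfset source = "abc [one]/[two] good morning [three].pdf">

<!--- Each match still carries its surrounding brackets, e.g. "[one]" --->
<cfset matches = REMatch("\[\w+\]", source)>

<cfset values = arrayNew(1)>
<cfloop from="1" to="#arrayLen(matches)#" index="i">
    <!--- Drop the first and last character of each match --->
    <cfset arrayAppend(values, mid(matches[i], 2, len(matches[i]) - 2))>
</cfloop>

<cfdump var="#values#">
```

Because every match is at least three characters long (two brackets plus one or more word characters), the length passed to mid() is always positive.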
It seems as though the CF regex engine isn't all that powerful on its own. To get the power you are searching for, it is suggested to use the Java regex engine. This link talks about it (toward the middle of page).

http://www.bennadel.com/blog/769-Learning-ColdFusion-8-REMatch-For-Regular-Expression-Matching.htm

The pattern should be as follows:
REMatch("(?<=\[)\w+(?=\])"
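If REMatch rejects the lookbehind, one option in the spirit of the linked article is to call the Java regex classes directly from CFML. This is only a sketch: it assumes createObject("java", ...) is permitted on the server, and the variable names are made up for illustration.

```cfm
<cfset input = "abc [one]/[two] good morning [three].pdf">

<!--- Compile the lookbehind/lookahead pattern with java.util.regex --->
<cfset pattern = createObject("java", "java.util.regex.Pattern").compile("(?<=\[)\w+(?=\])")>
<cfset matcher = pattern.matcher(input)>

<cfset values = arrayNew(1)>
<!--- find() advances to the next match; group() returns the matched text --->
<cfloop condition="matcher.find()">
    <cfset arrayAppend(values, matcher.group())>
</cfloop>

<cfdump var="#values#">
```

Since the brackets are only asserted by the lookarounds and not consumed, the collected values should come back without them.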

OK, thanks again for coming back to the question. I will remove the brackets from the result, which will work for now.
Hello again! :)

The solution breaks when there are spaces in the square brackets, for example:

"[good morning] [hello]" only returns "[hello]"

I'm guessing this is a straightforward fix (for those that know how!), however if not let me know and I will investigate this myself.
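The probable reason is that \w matches only letters, digits and underscores, so a space inside the brackets stops the match. A possible direction, offered as an untested sketch with an assumed pattern, is to match anything that is not a closing bracket instead:

```cfm
<cfset source = "[good morning] [hello]">

<!--- [^\]]+ means "one or more characters other than a closing bracket" --->
<cfset matches = REMatch("\[[^\]]+\]", source)>

<cfdump var="#matches#">
```

The matches would still include the brackets, so they would need stripping afterwards as before.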
I think it only fair that I present the space requirement in a new question:

https://www.experts-exchange.com/questions/24354552/I-need-a-REGEX-to-match-all-strings-in-square-brackets.html