Michael Robinson asked:
How to write a regex to do this...

How would I write a regex that searches a paragraph of content for any "." (period) characters except those that mark the end of a sentence?

Why?

I have a paragraph that I want to parse into individual sentences, using the period at the end of each sentence as the delimiter.

But there are some extra periods in the content. They are used after abbreviations.

As a side note, these extra periods always appear between brackets (parentheses).

For example, my original paragraph:

Cleans brushes and floor, using solvent or soap and water. May transfer items to and from work area, using hoist or handtruck. May be designated according to article painted as Last-Code Striper (wood prod., nec); Painter, Drum (any industry); Painter, Mannequin (fabrication, nec); Pipe Coater (steel & rel.); or according to coating applied as Japanner (any industry); Lacquerer (machine shop); Car Varnisher (railroad equip.).

I want to remove any periods that are not end-of-sentence markers, so that I get this:

Cleans brushes and floor, using solvent or soap and water. May transfer items to and from work area, using hoist or handtruck. May be designated according to article painted as Last-Code Striper (wood prod, nec); Painter, Drum (any industry); Painter, Mannequin (fabrication, nec); Pipe Coater (steel & rel); or according to coating applied as Japanner (any industry); Lacquerer (machine shop); Car Varnisher (railroad equip).

Any ideas?

I'm doing this in ColdFusion, if that makes any difference.
kaufmed

I don't think regex is going to be a good tool for this as this is really more of a parsing question, but you might try:


REReplace(input, "(\([^.)]*)\.([^)]*\))", "\1\2", "all")



...but I think it will require execution within a loop unless you are guaranteed never to encounter more than one period within any given set of brackets.
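To illustrate the "regex within a loop" idea, here is a minimal sketch in Python rather than ColdFusion. The function names are hypothetical, and it assumes brackets are never nested. It reapplies the substitution until the text stops changing, so any number of periods inside a single set of brackets is removed. The same pattern and the same repeat-until-stable loop should carry over to a ColdFusion REReplace call with the "all" scope.

```python
import re

# One bracketed group that contains a period:
#   group 1 = "(" plus everything up to the first period
#   group 2 = everything after that period through the closing ")"
# Excluding ")" from group 1 keeps a match from spilling out of its brackets.
BRACKET_PERIOD = re.compile(r"(\([^.)]*)\.([^)]*\))")


def strip_bracket_periods(text: str) -> str:
    """Remove every period inside a (...) group (assumes no nested brackets)."""
    while True:
        # Each pass removes at most one period per bracketed group.
        cleaned = BRACKET_PERIOD.sub(r"\1\2", text)
        if cleaned == text:
            # Nothing changed, so no bracketed periods remain.
            return cleaned
        text = cleaned


def split_sentences(text: str) -> list:
    """Split on the remaining periods, which now only end sentences."""
    cleaned = strip_bracket_periods(text)
    return [part.strip() + "." for part in cleaned.split(".") if part.strip()]


if __name__ == "__main__":
    sample = "May be designated as Striper (wood prod., nec); Coater (steel & rel.)."
    print(strip_bracket_periods(sample))
```

Looping until the output equals the input avoids having to know in advance how many periods a bracketed group might contain.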
ASKER CERTIFIED SOLUTION
Michael Robinson

It's super-awesome to actually get feedback from the author when something isn't quite doing the job. It makes it that much easier to tweak suggestions.

It's also quite amusing that you say "no complete solutions," yet you also say "found a regex within a loop." Maybe it's because I just woke up and I still have the eye crustees, but did I not say, "I think it will require execution within a loop"?
Michael Robinson (ASKER)

No complete solutions were offered by others.