Michael Robinson
How to write a regex to do this...
How would I write a regex that searches a paragraph of content for any "." (period) characters except those used to mark the end of a sentence?
Why?
I have a paragraph that I want to parse into individual sentences, by using the period at the end of each sentence as the delimiter.
But there are some extra periods in the content. They are used after abbreviations.
As a side note, these extra periods always show up in between brackets.
For example - my original paragraph:
Cleans brushes and floor, using solvent or soap and water. May transfer items to and from work area, using hoist or handtruck. May be designated according to article painted as Last-Code Striper (wood prod., nec); Painter, Drum (any industry); Painter, Mannequin (fabrication, nec); Pipe Coater (steel & rel.); or according to coating applied as Japanner (any industry); Lacquerer (machine shop); Car Varnisher (railroad equip.).
I want to remove any periods that are not end of sentence markers so I get this:
Cleans brushes and floor, using solvent or soap and water. May transfer items to and from work area, using hoist or handtruck. May be designated according to article painted as Last-Code Striper (wood prod, nec); Painter, Drum (any industry); Painter, Mannequin (fabrication, nec); Pipe Coater (steel & rel); or according to coating applied as Japanner (any industry); Lacquerer (machine shop); Car Varnisher (railroad equip).
Any ideas?
I'm doing this in ColdFusion if that makes any difference.
It's super-awesome to actually get feedback from the author when something isn't quite doing the job. It makes it that much easier to tweak suggestions.
It's also quite amusing that you say "no complete solutions," yet you also say "found a regex within a loop." Maybe it's because I just woke up and I still have the eye crustees, but did I not say, "I think it will require execution within a loop"?
ASKER
No complete solutions were offered by others
...but I think it will require execution within a loop unless you are guaranteed never to encounter more than one period within any given set of brackets.
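For anyone landing here without access to the accepted answer, here is a minimal sketch of the regex-in-a-loop idea. It is written in Python purely for illustration and is not the solution that was posted. The names `PERIOD_IN_PARENS`, `strip_bracketed_periods` and `split_sentences` are made up for this example, and it assumes the abbreviation periods only ever appear inside non-nested parentheses, as the asker described. Each pass removes at most one period per bracketed group, which is why the loop is needed when a group contains more than one.

```python
import re

# A period inside a parenthesised group: an opening "(", a run of
# non-parenthesis characters, the period itself, and then (checked by a
# lookahead) more non-parenthesis characters followed by ")".
PERIOD_IN_PARENS = re.compile(r"(\([^()]*?)\.(?=[^()]*\))")


def strip_bracketed_periods(text: str) -> str:
    """Remove every period that sits between "(" and ")"."""
    while True:
        # One pass drops at most one period per group, so keep going
        # until a pass makes no substitutions.
        new_text, count = PERIOD_IN_PARENS.subn(r"\1", text)
        if count == 0:
            return new_text
        text = new_text


def split_sentences(text: str) -> list[str]:
    """Split on whitespace that follows a period (assumes the
    abbreviation periods have already been stripped)."""
    return re.split(r"(?<=\.)\s+", strip_bracketed_periods(text).strip())


sample = (
    "Cleans brushes and floor. May be designated as Striper "
    "(wood prod., nec); Pipe Coater (steel & rel.); "
    "Car Varnisher (railroad equip.)."
)
cleaned = strip_bracketed_periods(sample)
sentences = split_sentences(sample)
```

With the sample above, `cleaned` keeps the two sentence-ending periods and `sentences` holds two items. If I recall correctly, ColdFusion's `REReplace` with a scope of "all" can play the role of a single pass here, but whether it accepts this exact pattern is an assumption you would need to check.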