Asked by David Glover (United Kingdom)

How can I alter this regex pattern to match whole words rather than parts of words?

In this example, the word MAN is matched in the text of some HTML markup:
(MAN)(?=[^>]*<)
It matches MAN correctly, but it also matches inside words in the HTML such as MANAGEMENT or GERMANY. I can't simply require a space on each side either, because punctuation and other non-alphanumeric characters should also be acceptable neighbours.
How can I enhance the pattern so that an alphanumeric character immediately before or after MAN prevents a match?
Thanks!
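The behaviour described above is easy to reproduce. The sketch below assumes Python's re module (the question does not say which regex engine is in use) and a made-up HTML string. The lookahead (?=[^>]*<) only accepts MAN when a < appears before any >, that is, when MAN sits in text content rather than inside a tag. Putting \b word boundaries on both sides is the usual way to require a whole word: \b matches only between a word character and a non-word character (or at the start or end of the string), so spaces and punctuation are acceptable neighbours while letters and digits are not.

```python
import re

# Made-up HTML snippet, for illustration only.
html = "<p>The MAN from GERMANY joined MANAGEMENT.</p>"

# The asker's original pattern: MAN followed by text that reaches a '<'
# without crossing a '>', i.e. MAN appears in text content, not in a tag.
original = re.compile(r"(MAN)(?=[^>]*<)")

# Same lookahead, with \b word boundaries on both sides of MAN, so a letter,
# digit or underscore directly before or after it prevents the match.
whole_word = re.compile(r"\bMAN\b(?=[^>]*<)")

print(len(original.findall(html)))  # 3  (MAN, gerMANy, MANagement)
print(whole_word.findall(html))     # ['MAN']
```

In Python, \b also treats the underscore as a word character, and for str patterns it is Unicode-aware. Add re.IGNORECASE if lowercase man should match too. Like the original, the lookahead is a heuristic, not an HTML parser.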
Duy Pham (Viet Nam)

Try this pattern:
([^a-z\b0-9]?)(MAN)([^a-z\b0-9]?)
David Glover (asker)

I tried it with lowercase man, but sadly it did not work:

([^a-z\b0-9]?)(man)([^a-z\b0-9]?)

It still matched man in 'germany' and other words.
My code wraps your expression in ( ) and appends (?=[^>]*<) to filter out HTML tags, so it becomes:
(([^a-z\b0-9]?)(man)([^a-z\b0-9]?))(?=[^>]*<)
Could that have broken it?
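A short sketch answers the "could that have broken it?" question, again assuming Python's re module and a made-up snippet. Wrapping the expression in parentheses does not change what it matches. The problem is the pattern itself: the ? after each character class makes it optional, so the engine is free to match nothing on either side of man, and inside a character class \b does not mean a word boundary (in Python and several other common flavours, [\b] is the backspace character).

```python
import re

# Made-up snippet: "man" only occurs inside the word "germany".
text = "<p>from germany</p>"

# Duy Pham's first pattern, wrapped the way the asker's code wraps it.
wrapped = re.compile(r"(([^a-z\b0-9]?)(man)([^a-z\b0-9]?))(?=[^>]*<)")

m = wrapped.search(text)
print(m.group(1))        # man  (found inside "germany")
print(repr(m.group(2)))  # ''   the optional class matched nothing before it
print(repr(m.group(4)))  # ''   and nothing after it
```

Simply removing the ? is not a full fix either: the neighbouring characters would then be required and consumed, so a match at the very start or end of the input would fail.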

Duy Pham (Viet Nam)

Sorry, I forgot about case sensitivity when I pasted the pattern here. This is the best pattern I can think of (and I have already checked it):
(([^a-zA-Z]+)|(\b))(MAN)(([^>a-zA-Z])|(\b))

Hope it helps.
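A quick check of this second pattern, sketched with Python's re module and made-up strings, confirms that it rejects MAN inside GERMANY and MANAGEMENT. Because the [^a-zA-Z]+ and [^>a-zA-Z] alternatives consume characters, though, the overall match includes the neighbouring space or punctuation, and MAN itself ends up in group 4. The lookbehind/lookahead variant at the end is an alternative not proposed in the thread: it tests the neighbours without consuming them and keeps the asker's HTML lookahead.

```python
import re

# Duy Pham's second pattern, unchanged.
duy = re.compile(r"(([^a-zA-Z]+)|(\b))(MAN)(([^>a-zA-Z])|(\b))")

for sample in ["GERMANY", "MANAGEMENT", "a MAN, here"]:
    m = duy.search(sample)
    print(sample, "->", repr(m.group(0)) if m else None)
# GERMANY -> None
# MANAGEMENT -> None
# a MAN, here -> ' MAN,'

# Zero-width alternative: no letter or digit may touch MAN on either side,
# but nothing around it is consumed; the HTML text lookahead is kept.
lookaround = re.compile(r"(?<![A-Za-z0-9])MAN(?![A-Za-z0-9])(?=[^>]*<)")
print(lookaround.findall("<p>a MAN, here</p>"))  # ['MAN']
```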
ASKER CERTIFIED SOLUTION
ozo (United States)

This solution is only available to Experts Exchange members.
Thanks, Ozo, that worked fine!