Rakesh Shukla

asked on

Create a regular expression in C using regex.h that excludes the last dot character

Can someone please help me create a regex pattern in C using regex.h that matches a substring without matching the terminating . character?

For example, from the string below, the pattern should match “hello world.” and “hello world” (there is no . character at the end of the second match):

|hello world.|hello world.

Thanks for your input!
Fabrice Lambert

In a regex, $ is an assertion that matches at the end of the string.
Just omit it.

hello world ==> matches any string containing "hello world" anywhere in the string.
^hello world ==> matches any string starting with "hello world".
hello world$ ==> matches any string ending with "hello world".
^hello world$ ==> matches "hello world" and nothing else.
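
For anyone who wants to try these four patterns with regex.h, here is a minimal sketch. The helper name regex_matches is hypothetical (it is not part of regex.h); it only wraps the standard regcomp(), regexec() and regfree() calls.

```c
#include <regex.h>
#include <stddef.h>

/* Hypothetical helper: returns 1 if `text` matches the POSIX extended
   regex `pattern`, 0 if it does not, and -1 if the pattern does not compile. */
int regex_matches(const char *pattern, const char *text)
{
    regex_t re;
    int rc;

    /* REG_NOSUB: we only need a yes/no answer, not match offsets. */
    if (regcomp(&re, pattern, REG_EXTENDED | REG_NOSUB) != 0)
        return -1;

    rc = regexec(&re, text, 0, NULL, 0);
    regfree(&re);
    return rc == 0 ? 1 : 0;
}
```

With this helper, regex_matches("hello world$", "say hello world") should report a match, while regex_matches("^hello world$", "hello world.") should not, because the trailing dot sits between the text and the end of the string.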
You won't need a regex for this:

if (strncmp(szTextLine, "hello world", strlen("hello world")) == 0)
{
     // szTextLine starts with "hello world": we have a match
}



^hello world\.?$


==> matches "hello world" or "hello world."

In general:

^                 start of string
any text          must match literally
\.                a literal dot (special characters must be escaped)
?                 the preceding character occurs zero or one time
$                 end of string
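
A possible way to use this pattern with regex.h and still get the text without the dot is to wrap the part you want in a group and read its offsets. This is only a sketch: the function name match_without_dot and the added parentheses are assumptions, not something from the posts above. Note that ? needs REG_EXTENDED here; in POSIX basic syntax a bare ? is an ordinary character.

```c
#include <regex.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: if `text` is "hello world" with an optional trailing
   dot, copy the part without the dot into `out` and return 1; else return 0. */
int match_without_dot(const char *text, char *out, size_t outsz)
{
    regex_t re;
    regmatch_t m[2];            /* m[0] = whole match, m[1] = the group */
    int ok = 0;

    /* In a C string literal the backslash itself must be doubled. */
    if (regcomp(&re, "^(hello world)\\.?$", REG_EXTENDED) != 0)
        return 0;

    if (regexec(&re, text, 2, m, 0) == 0) {
        size_t len = (size_t)(m[1].rm_eo - m[1].rm_so);
        if (len < outsz) {
            memcpy(out, text + m[1].rm_so, len);
            out[len] = '\0';
            ok = 1;
        }
    }
    regfree(&re);
    return ok;
}
```

The whole match (m[0]) still includes the dot when one is present; only the group (m[1]) leaves it out.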

Sara
Rakesh Shukla

ASKER

Lookahead is not supported in regex.h, so I tried the expression below, but it drops the dot from both matches:

([\|][^\|]*hello[^\|]*[^\.|])

First Match -  'hello world'
Second Match -  'hello world'

Expected is :
First Match -  'hello world.'
Second Match -  'hello world'
Alright, so you want anything that isn't a pipe? (Your question isn't clear about this.)
Try the following:
[^|]+

(Assuming "hello world" is just a data sample and not the actual data).
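
Since regexec() only reports the first match, collecting every field with [^|]+ means calling it in a loop and restarting after each match. A rough sketch of that loop follows; the function name print_fields is made up for illustration.

```c
#include <regex.h>
#include <stdio.h>

/* Hypothetical helper: prints every run of non-pipe characters in `s` and
   returns how many were found, or -1 if the pattern does not compile. */
int print_fields(const char *s)
{
    regex_t re;
    regmatch_t m;
    int count = 0;

    if (regcomp(&re, "[^|]+", REG_EXTENDED) != 0)
        return -1;

    /* Offsets in `m` are relative to the pointer passed to regexec(),
       so advancing `s` by rm_eo resumes right after the previous match. */
    while (regexec(&re, s, 1, &m, 0) == 0) {
        count++;
        printf("field %d: '%.*s'\n", count,
               (int)(m.rm_eo - m.rm_so), s + m.rm_so);
        s += m.rm_eo;
    }

    regfree(&re);
    return count;
}
```

The pattern needs at least one character per match, so every iteration moves forward and the loop ends when no non-pipe text is left.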
Hi Fabrice Lambert,

Thank you for your reply! Please see the requirement below. I hope it helps you understand my question:

I have a string containing multiple fields separated by pipes. I need to create a pattern that deletes any field, given any substring of that field. But if there is a dot character at the end of the last matching field, that dot should not be deleted.
 
For example, if my string is
 
foo|hello world..|hello world.
 
If I get the substring “hello”, the pattern should match “|hello world..” and “|hello world”, so that after deleting these matches the resulting string is “foo.” (the last dot character is not deleted).
"I have a string having multiple fields separated by pipes. I need to create a pattern to delete any sub field by getting any sub string of that field. But if there is a dot character at the end of the last matching field then it should not be deleted."

- Is 'multiple field' the same as 'any sub string'?
- By 'delete', do you mean you will delete everything between two pipe characters, with both the left and right pipe characters included?
- The 'last matching field' of your sample is 'hello world.', right?
- But you will not delete the final dot, right?

If all the answers are yes, what happens to the following entries?

(1)  foo | hello world. | not hello world but hello all | xxxx
(2)  foo | hello world. | foo | again
(3)  foo | hello xxxx | foo.  

Is it

(1) foo hello world. xxxx

because the 'last matching field with dot' was not deleted?

(2) foo | hello world. | foo | again

because there was only one matching field with a dot?

(3) foo foo

because hello xxxx matches and was deleted?

Sara
To make it simple: the string ‘foo|hello world.|foo again|hello world again.’ has four pipe-separated fields. I need to match a complete field including its leading pipe (the pipe at the end is not included).
 
If I get ‘foo’, I need to create a pattern that matches every field containing ‘foo’, so
‘foo’ should match ‘|foo again’
‘hello’ should match ‘|hello world.’ and ‘|hello world again’
‘again’ should match ‘|foo again’ and ‘|hello world again’

My only problem is that if there is a dot at the end of the whole string, it is also included in the match, but it should not be.
Apart from that, dots anywhere else in the string should match.
Mind you, I'm not used to working with regex in C, but you can proceed in two steps:

- First: extract whatever matches your token, excluding only the pipes (sample with the "world" token):
[^|]*world[^|]*
- Second: check that your last extracted substring doesn't have a dot as its last character (no regex needed for that).
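
The two steps could be combined roughly as follows. This is only a sketch under a few assumptions: the function name delete_fields and the helper append_bytes are made up, the leading pipe is added to the pattern because the asker wants it deleted with the field, and the token is assumed to contain no regex metacharacters (it is pasted into the pattern unescaped).

```c
#include <regex.h>
#include <stdio.h>
#include <string.h>

/* Append `len` bytes of `src` to `out`; returns 0 if they do not fit. */
static int append_bytes(char *out, size_t outsz, size_t *used,
                        const char *src, size_t len)
{
    if (*used + len + 1 > outsz)
        return 0;
    memcpy(out + *used, src, len);
    *used += len;
    out[*used] = '\0';
    return 1;
}

/* Hypothetical helper: copies `input` to `out` without the pipe-led fields
   that contain `token`. A dot that ends the whole string is kept.
   Returns 0 on success, -1 on error (bad pattern or buffer too small). */
int delete_fields(const char *input, const char *token, char *out, size_t outsz)
{
    char pattern[256];
    regex_t re;
    regmatch_t m;
    const char *p = input;
    size_t used = 0;
    int ok = 1, n;

    if (outsz == 0)
        return -1;
    out[0] = '\0';

    /* Step 1 pattern: a pipe, then a field containing the token. */
    n = snprintf(pattern, sizeof pattern, "[|][^|]*%s[^|]*", token);
    if (n < 0 || (size_t)n >= sizeof pattern)
        return -1;
    if (regcomp(&re, pattern, REG_EXTENDED) != 0)
        return -1;

    while (ok && regexec(&re, p, 1, &m, 0) == 0) {
        const char *end = p + m.rm_eo;

        /* Keep the text in front of the matched field. */
        ok = append_bytes(out, outsz, &used, p, (size_t)m.rm_so);

        /* Step 2, in plain C: if the match runs to the end of the string
           and ends with a dot, put that one dot back. */
        if (ok && *end == '\0' && end[-1] == '.')
            ok = append_bytes(out, outsz, &used, ".", 1);

        p = end;
    }
    if (ok)
        ok = append_bytes(out, outsz, &used, p, strlen(p));

    regfree(&re);
    return ok ? 0 : -1;
}
```

Called with the input "foo|hello world..|hello world." and the token "hello", this should leave "foo." in the output buffer, which is the result asked for earlier in the thread.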