RegEx N'th Occurrence

I have a file that has values separated by spaces.  I only want to grab the third space on each line. How would I do that?
lconnell Asked:

käµfm³d 👽 Commented:
What programming language or text editor are you using?

You might try:

^ [^ ]+ [^ ]+( )

Also, I took your question quite literally (as a regex would!), so I'm sure the above isn't exactly what you are looking for. Can you clarify what you are after?
lconnell (Author) Commented:
Sublime Text Editor, also would be nice to know for VIM.

That did not work when using the RegEx search in Sublime.
käµfm³d 👽 Commented:
I don't know if you saw the edit in my comment, but can you clarify what you are after? It seems weird that you would want the third space. I suspect what you meant was what follows the third space.
lconnell (Author) Commented:
So I want to edit a file using multi-selection. I have 100 lines of the following text.

data1 data2 data3 data4 data5
...
...
...

I want to use Sublime or any editor to find the 3rd space so I can edit every line at once at that space. So this way I can modify data4 on every line at one time to say "test_data4". Data4 can be any value that's why I want to match at the third space.
käµfm³d 👽 Commented:
OK, I see where I went wrong. This should be correct now:

^[^ ]+ [^ ]+ [^ ]+ 

This pattern assumes that a line never starts with a space.
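As a rough illustration outside any editor, the pattern can be tried with Python's `re` module. This is only a sketch: the function name `prefix_through_third_space` is made up for the example, and Python's engine is not the one Sublime Text uses, although this simple pattern should behave the same way in both.

```python
import re

# Pattern from the comment above: three non-space fields, each followed by one space.
THIRD_SPACE_PREFIX = re.compile(r"^[^ ]+ [^ ]+ [^ ]+ ")

def prefix_through_third_space(line):
    """Return the text up to and including the third space, or None if no match."""
    m = THIRD_SPACE_PREFIX.match(line)
    return m.group(0) if m else None

# A normal line matches through the third space.
print(repr(prefix_through_third_space("data1 data2 data3 data4 data5")))

# A line that starts with a space does not match, as noted above.
print(repr(prefix_through_third_space(" data1 data2 data3 data4")))
```

The match ends exactly after the third space, which is where the new text would be typed.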

aikimark Commented:
Here's an alternative pattern
\w+ \w+ \w+ (\w+)

You can then use the regex Replace method against the \1 capture group
käµfm³d 👽 Commented:
@aikimark

There's no perceived benefit to using the "word character" class over "not a space". In the worst case the pattern won't match if there are any characters other than alphabetic, numeric, or underscores.
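
To make the difference concrete, here is a small Python sketch comparing the two character classes. The sample line with punctuation is invented for the illustration; it is not from the original file.

```python
import re

# aikimark's pattern: fields made of word characters only.
WORD_FIELDS = re.compile(r"\w+ \w+ \w+ (\w+)")
# The "not a space" variant: fields may contain any non-space character.
NON_SPACE_FIELDS = re.compile(r"[^ ]+ [^ ]+ [^ ]+ ([^ ]+)")

# Hypothetical line whose fields contain punctuation.
line = "data-1 data.2 data/3 data4 data5"

# Anchored at the start of the line, \w+ stops at the first "-" and the match fails.
print(WORD_FIELDS.match(line))

# [^ ]+ accepts the punctuation, so the fourth field is captured.
print(NON_SPACE_FIELDS.match(line).group(1))
```

Both patterns behave identically on the sample data in the question, where every field is alphanumeric.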
aikimark Commented:
@kaufmed

I realize that.  Normally, I would use the not-a-space pattern.  But you'd already used it and I find that \w+ is simpler to type than [^ ]+
Three characters versus five characters.

What I hope I've added is the grouping of the fourth 'word' that will allow the Replace method to be used.
aikimark Commented:
It looks like my pattern needed tweaking.  It should be: (\w+ \w+ \w+ )(\w+)( .*?\r\n)
Example:
    Sub PrefixFourthField()
        Dim strData As String
        Dim oRE As Object

        ' Late-bound VBScript regular expression object
        Set oRE = CreateObject("vbscript.regexp")
        oRE.Global = True   ' replace every match, not just the first
        ' $1 = first three fields, $2 = fourth field, $3 = rest of the line including CRLF
        oRE.Pattern = "(\w+ \w+ \w+ )(\w+)( .*?\r\n)"
        strData = "data1 data2 data3 data4 data5" & vbCrLf
        strData = strData & "data21 data22 data23 data24 data25" & vbCrLf
        strData = strData & "data31 data32 data33 data34 data35" & vbCrLf
        If oRE.Test(strData) Then
            Debug.Print oRE.Replace(strData, "$1test_$2$3")
        End If
    End Sub

Contents of Immediate window after running the above code:
data1 data2 data3 test_data4 data5
data21 data22 data23 test_data24 data25
data31 data32 data33 test_data34 data35

aikimark Commented:
Yes.  It is possible to use the not-a-space pattern: ([^ ]+ [^ ]+ [^ ]+ )([^ ]+)( .*?\r\n)
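
One thing worth checking with this pattern is the line ending: the third group requires a literal `\r\n`, so text that uses bare `\n` line endings will not match at all. The Python sketch below shows that dependency. `LINE_PATTERN` is an assumed alternative that anchors on the start of each line instead; it was not proposed in this thread.

```python
import re

# Pattern from the comment above; the third group needs a literal CRLF.
CRLF_PATTERN = re.compile(r"([^ ]+ [^ ]+ [^ ]+ )([^ ]+)( .*?\r\n)")
# Assumed alternative: anchor on line starts, no dependency on the line ending.
LINE_PATTERN = re.compile(r"^([^ ]+ [^ ]+ [^ ]+ )([^ ]+)", re.MULTILINE)

crlf_text = "data1 data2 data3 data4 data5\r\ndata21 data22 data23 data24 data25\r\n"
lf_text = crlf_text.replace("\r\n", "\n")

# Works when the text really contains CRLF line endings.
crlf_result = CRLF_PATTERN.sub(r"\1test_\2\3", crlf_text)

# Leaves LF-only text unchanged, because "\r\n" never matches.
lf_result_old = CRLF_PATTERN.sub(r"\1test_\2\3", lf_text)

# The line-anchored variant handles the LF-only text.
lf_result_new = LINE_PATTERN.sub(r"\1test_\2", lf_text)

print(repr(crlf_result))
print(repr(lf_result_old))
print(repr(lf_result_new))
```

Whether this explains a failed match in a given editor depends on how that editor presents line endings to its regex engine, so treat it as something to test rather than a diagnosis.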
Surrano (System Engineer) Commented:
vim pattern:

:%s/^\(\([^ ]* \)\{3\}\)\([^ ]*\)/\1test_\3/
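For readers more used to Perl-style syntax, the same substitution can be sketched in Python. The escaped `\(`, `\)` and `\{3\}` of Vim's default syntax become plain `(`, `)` and `{3}`, and the inner group is written as non-capturing, so the fourth field is group 2 rather than group 3. The function name and the `prefix` parameter are made up for the example.

```python
import re

# Three repetitions of "zero or more non-spaces, then a space", then the fourth field.
VIM_STYLE = re.compile(r"^((?:[^ ]* ){3})([^ ]*)", re.MULTILINE)

def prefix_fourth_field(text, prefix="test_"):
    """Insert prefix in front of the fourth space-separated field of each line."""
    # A function replacement avoids escaping issues if prefix contains backslashes.
    return VIM_STYLE.sub(lambda m: m.group(1) + prefix + m.group(2), text)

print(prefix_fourth_field("data1 data2 data3 data4 data5"))
print(prefix_fourth_field("a b c d e\nf g h i j"))
```

Because the pattern uses `*` rather than `+`, it also accepts empty fields, for example a line that begins with a space.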

lconnell (Author) Commented:
Thanks for the assistance everyone. So there is still a problem here. I only want to select the actual white space in the third column, not the text up to the 3rd white space.
aikimark Commented:
@lconnell

Please test the code I posted
lconnell (Author) Commented:
aikimark, it does not work.  It actually doesn't match anything.
aikimark Commented:
"It actually doesn't match anything."
Does your actual data reflect the sample data you posted?

Have you changed my code to read your data, or are you expecting my sample code to change your file data?  The code shows how to use a regular expression to do a replace.  I used string literals that were meant to simulate the data you used in your example.
käµfm³d 👽 Commented:
The problem you face is that Sublime Text (ST) uses the Boost regex engine, which does not support arbitrary-length lookbehinds, which is what you would need in order to skip over everything before the third space without actually including it in the match. The only thing you can do at this point is a find/replace as aikimark described above, except that you would capture the whole string up to and including the third space, not just the last non-space field:

e.g.

Find
(^[^ ]+ [^ ]+ [^ ]+ )

Replace
$1test_
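
The same kind of restriction can be seen outside Sublime Text. Python's `re` module also rejects variable-width lookbehinds, so it makes a convenient stand-in for a sketch, although it says nothing directly about Boost's behavior. The second half shows one way to get at the third space itself in a script: capture it and read its position from the match. `third_space_index` is a hypothetical helper name.

```python
import re

# A variable-length lookbehind that would match only the third space.
# Python's re module refuses to compile it.
try:
    re.compile(r"(?<=^[^ ]+ [^ ]+ [^ ]+) ")
    lookbehind_supported = True
except re.error:
    lookbehind_supported = False

# Workaround: consume the first three fields and capture the third space.
THIRD_SPACE = re.compile(r"^[^ ]+ [^ ]+ [^ ]+( )")

def third_space_index(line):
    """Return the 0-based index of the third space, or None if the line has no match."""
    m = THIRD_SPACE.match(line)
    return m.start(1) if m else None

print(lookbehind_supported)
print(third_space_index("data1 data2 data3 data4 data5"))
```

In an editor there is no equivalent of `m.start(1)`, which is why the find/replace with `$1` is the practical route there.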

lconnell (Author) Commented:
Perfect, that works fine using the replace with what is already highlighted. Can you explain the actual regex?
aikimark Commented:
In Notepad++, the following find/replace operation gives the same results:
Find what: ([^ ]+ [^ ]+ [^ ]+ )([^ ]+)( .*?\r\n)
Replace with: $1Test_$2$3

Results:

data1 data2 data3 Test_data4 data 5
data1 data2 data3 Test_data4 data 5
data1 data2 data3 Test_data4 data 5
data1 data2 data3 Test_data4 data 5
data1 data2 data3 Test_data4 data 5

käµfm³d 👽 Commented:
Find
(       - Start of capture group (first, and only, group)
^       - Start of line
[^ ]+   - One or more ( + ) of any character not a space ( [^ ] ) -- The ^ means "not"
        - Literal space
[^ ]+   - One or more ( + ) of any character not a space ( [^ ] ) -- The ^ means "not"
        - Literal space
[^ ]+   - One or more ( + ) of any character not a space ( [^ ] ) -- The ^ means "not"
        - Literal space
)       - End of capture group

Replace
$1      - Whatever was captured in capture group 1
test_   - Literal text
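
The breakdown above can also be written as a commented pattern. The sketch below uses Python's `re.VERBOSE` flag, which is a Python feature and not something the Sublime Text search box is known to offer here. In verbose mode bare spaces are ignored, so each literal space is written as `[ ]`; spaces inside a character class such as `[^ ]` are kept.

```python
import re

FIND = re.compile(
    r"""
    (          # start of capture group 1 (the only group)
      ^        # start of line
      [^ ]+    # one or more characters that are not a space
      [ ]      # literal space
      [^ ]+    # second field
      [ ]      # literal space
      [^ ]+    # third field
      [ ]      # literal space (the third one)
    )          # end of capture group 1
    """,
    re.VERBOSE | re.MULTILINE,
)

def add_prefix(text):
    """Replace with group 1 followed by the literal text test_."""
    return FIND.sub(r"\1test_", text)

print(add_prefix("data1 data2 data3 data4 data5\ndata21 data22 data23 data24 data25"))
```

Python spells the replacement reference `\1` where Sublime Text and Notepad++ use `$1`.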

lconnell (Author) Commented:
Great explanation and examples

Question has a verified solution.