Solved

RegEx N'th Occurrence

Posted on 2014-07-29
Last Modified: 2014-07-30
I have a file that has values separated by spaces.  I only want to grab the third space on each line. How would I do that?
Question by:lconnell
20 Comments
 
LVL 75

Expert Comment

by:käµfm³d 👽
ID: 40227905
What programming language or text editor are you using?

You might try:

^ [^ ]+ [^ ]+( )


Also, I took your question quite literally (as a regex would!), so I'm sure the above isn't exactly what you are looking for. Can you clarify what you are after?
 

Author Comment

by:lconnell
ID: 40227906
Sublime Text Editor, also would be nice to know for VIM.

That did not work when using the RegEx search in Sublime.
 
LVL 75

Expert Comment

by:käµfm³d 👽
ID: 40227913
I don't know if you saw the edit in my comment, but can you clarify what you are after? It seems weird that you would want the third space. I suspect what you meant was what follows the third space.
 

Author Comment

by:lconnell
ID: 40227941
So I want to edit a file using multi-selection. I have 100 lines of the following text.

data1 data2 data3 data4 data5
...
...
...

I want to use Sublime or any editor to find the 3rd space so I can edit every line at once at that space. That way I can modify data4 on every line at one time to say "test_data4". Data4 can be any value, which is why I want to match at the third space.
 
LVL 75

Expert Comment

by:käµfm³d 👽
ID: 40227962
OK, I see where I went wrong. This should be correct now:

^[^ ]+ [^ ]+ [^ ]+ 


This pattern assumes that a line never starts with a space.
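
For readers who want to check the pattern outside an editor, here is a minimal sketch in Python. Python's `re` module is an assumption on my part (the thread itself is about Sublime Text), and the function and variable names are illustrative. It reports the index of the third space, which is the last character the pattern matches.

```python
import re

# Three runs of non-space characters, each followed by one literal space.
PATTERN = re.compile(r"^[^ ]+ [^ ]+ [^ ]+ ")

def third_space_index(line):
    """Return the index of the third space in `line`, or None if there is no match."""
    m = PATTERN.match(line)
    if m is None:
        return None
    # The match ends just after the third space, so the space sits at end - 1.
    return m.end() - 1

line = "data1 data2 data3 data4 data5"
idx = third_space_index(line)
print(idx, repr(line[idx]))
```

A line that starts with a space, or that has fewer than three spaces, yields `None`, which matches the assumption stated above.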

 
LVL 45

Expert Comment

by:aikimark
ID: 40228375
Here's an alternative pattern
\w+ \w+ \w+ (\w+)


You can then use the regex Replace method against the \1 capture group
 
LVL 75

Expert Comment

by:käµfm³d 👽
ID: 40228402
@aikimark

There's no real benefit to using the "word character" class over "not a space". In the worst case, the pattern won't match at all if a field contains any character other than a letter, digit, or underscore.
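
The difference between the two classes can be seen in a short Python sketch. The sample strings are made up for illustration and Python's `re` module is assumed; other engines may treat `\w` slightly differently for non-ASCII text.

```python
import re

word_pattern = re.compile(r"\w+ \w+ \w+ (\w+)")
nonspace_pattern = re.compile(r"[^ ]+ [^ ]+ [^ ]+ ([^ ]+)")

plain = "data1 data2 data3 data4 data5"
# Hypothetical data with punctuation inside each field.
punctuated = "data-1 data.2 data/3 data:4 data5"

for text in (plain, punctuated):
    w = word_pattern.search(text)
    n = nonspace_pattern.search(text)
    # Show the fourth field as seen by each pattern, or None when it fails.
    print(text, "->", w.group(1) if w else None, "|", n.group(1) if n else None)
```

On the plain line both patterns capture the fourth field; on the punctuated line only the not-a-space version finds a match.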
 
LVL 45

Expert Comment

by:aikimark
ID: 40228408
@kaufmed

I realize that.  Normally, I would use the not-a-space pattern.  But you'd already used it and I find that \w+ is simpler to type than [^ ]+
Three characters versus five characters.

What I hope I've added is the grouping of the fourth 'word' that will allow the Replace method to be used.
 
LVL 45

Expert Comment

by:aikimark
ID: 40228412
It looks like my pattern needed tweaking.  It should be: (\w+ \w+ \w+ )(\w+)( .*?\r\n)
Example:
    Sub PrefixFourthField()
        Dim strData As String
        Dim oRE As Object   ' late-bound VBScript.RegExp

        Set oRE = CreateObject("vbscript.regexp")
        oRE.Global = True   ' replace on every line, not just the first match
        ' $1 = first three fields, $2 = fourth field, $3 = rest of line incl. CRLF
        oRE.Pattern = "(\w+ \w+ \w+ )(\w+)( .*?\r\n)"

        ' Sample data; each line must end in CRLF for the pattern to match
        strData = "data1 data2 data3 data4 data5" & vbCrLf
        strData = strData & "data21 data22 data23 data24 data25" & vbCrLf
        strData = strData & "data31 data32 data33 data34 data35" & vbCrLf

        If oRE.Test(strData) Then
            Debug.Print oRE.Replace(strData, "$1test_$2$3")
        End If
    End Sub

Contents of Immediate window after running the above code:
data1 data2 data3 test_data4 data5
data21 data22 data23 test_data24 data25
data31 data32 data33 test_data34 data35

 
LVL 45

Expert Comment

by:aikimark
ID: 40228413
Yes.  It is possible to use the not-a-space pattern: ([^ ]+ [^ ]+ [^ ]+ )([^ ]+)( .*?\r\n)
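
As a rough cross-check, the same three-group replacement can be sketched in Python. This is a translation under the assumption that Python's `re` module is available, not the code used in the thread; note that Python writes backreferences as `\1` where the VBScript replacement string uses `$1`.

```python
import re

# Group 1: first three fields and their spaces; group 2: fourth field;
# group 3: the rest of the line, including the CRLF terminator.
pattern = re.compile(r"([^ ]+ [^ ]+ [^ ]+ )([^ ]+)( .*?\r\n)")

data = (
    "data1 data2 data3 data4 data5\r\n"
    "data21 data22 data23 data24 data25\r\n"
    "data31 data32 data33 data34 data35\r\n"
)

result = pattern.sub(r"\1test_\2\3", data)
print(result)
```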
 
LVL 8

Expert Comment

by:Surrano
ID: 40228613
vim pattern:

:%s/^\(\([^ ]* \)\{3\}\)\([^ ]*\)/\1test_\3/
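
The vim command repeats "field plus space" three times with `\{3\}` and then captures the fourth field. A Python sketch of the same idea, generalized to any field number, might look like the following. The function name and parameters are hypothetical. Vim's `[^ ]` does not cross line ends, so the Python class also excludes `\n` to get the same behavior.

```python
import re

def prefix_field(text, field_number, prefix):
    """Insert `prefix` before the given space-separated field on each line."""
    skip = field_number - 1
    # ((?:[^ \n]* ){skip}) : the fields before the target, each with its space
    # ([^ \n]*)            : the target field itself
    pattern = re.compile(r"^((?:[^ \n]* ){%d})([^ \n]*)" % skip, re.MULTILINE)
    # A function replacement avoids any escaping issues in `prefix`.
    return pattern.sub(lambda m: m.group(1) + prefix + m.group(2), text)

sample = "data1 data2 data3 data4 data5\na b\nc d e f g"
print(prefix_field(sample, 4, "test_"))
```

Lines with fewer fields than requested, such as `a b` here, are left unchanged.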

 

Author Comment

by:lconnell
ID: 40229189
Thanks for the assistance everyone. So there is still a problem here. I only want to select the actual white space in the third column, not the text up to the 3rd white space.
 
LVL 45

Expert Comment

by:aikimark
ID: 40229242
@lconnell

Please test the code I posted
 

Author Comment

by:lconnell
ID: 40229283
aikimark, it does not work.  It actually doesn't match anything.
 
LVL 45

Expert Comment

by:aikimark
ID: 40229397
lconnell wrote: "It actually doesn't match anything."
Does your actual data reflect the sample data you posted?

Have you changed my code to read your data, or are you expecting my sample code to change your file data?  The code shows how to use a regular expression to do a replace.  I used string literals that were meant to simulate the data in your example.
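
One possible cause of a "no match" result, offered as a guess rather than a diagnosis of this particular case: the pattern ends in `\r\n`, so it can only match text whose lines end in CRLF. The Python sketch below (Python's `re` is assumed, and the variant pattern is my own) shows the pattern failing on LF-only text and a variant that anchors at the start of each line instead.

```python
import re

crlf_pattern = re.compile(r"(\w+ \w+ \w+ )(\w+)( .*?\r\n)")
# Variant with no line-ending requirement: anchor at the line start instead.
anchored_pattern = re.compile(r"^(\w+ \w+ \w+ )(\w+)", re.MULTILINE)

lf_text = "data1 data2 data3 data4 data5\ndata21 data22 data23 data24 data25\n"
crlf_text = lf_text.replace("\n", "\r\n")

print(crlf_pattern.search(lf_text))                # LF-only text: no match
print(crlf_pattern.search(crlf_text) is not None)  # CRLF text: match found
fixed = anchored_pattern.sub(r"\1test_\2", lf_text)
print(fixed)
```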
 
LVL 75

Expert Comment

by:käµfm³d 👽
ID: 40229817
The problem you face is that Sublime Text (ST) uses the Boost regex engine, which does not support arbitrary-length lookbehinds. A lookbehind is what you would need in order to skip over the first two spaces without actually including them in the match. The only thing you can do at this point is a find/replace as aikimark described above, except that you would capture the whole matched string, not just the last non-space field:

e.g.

Find
(^[^ ]+ [^ ]+ [^ ]+ )


Replace
$1test_
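
The same limitation and workaround can be reproduced in Python, whose `re` module also rejects variable-width lookbehinds. This is a sketch under that assumption, not a test of Sublime Text itself. It also assumes every line has at least three spaces, because in Python `[^ ]` can run across a newline.

```python
import re

# A lookbehind that would match only the third space. Python's re module
# requires fixed-width lookbehinds, so this is rejected at compile time.
try:
    re.compile(r"(?<=^[^ ]+ [^ ]+ [^ ]+) ")
    lookbehind_ok = True
except re.error:
    lookbehind_ok = False

# Workaround from the thread: capture everything up to and including the
# third space, then write it back followed by the new text.
find = re.compile(r"(^[^ ]+ [^ ]+ [^ ]+ )", re.MULTILINE)
text = "data1 data2 data3 data4 data5\ndata21 data22 data23 data24 data25"
result = find.sub(r"\1test_", text)

print(lookbehind_ok)
print(result)
```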

 

Author Comment

by:lconnell
ID: 40229909
Perfect, that works fine using the replace with what is already highlighted. Can you explain the actual regex?
 
LVL 45

Assisted Solution

by:aikimark
aikimark earned 150 total points
ID: 40229952
In Notepad++, the following find/replace operation gives the same results:
Find what: ([^ ]+ [^ ]+ [^ ]+ )([^ ]+)( .*?\r\n)
Replace with: $1Test_$2$3

Results:

data1 data2 data3 Test_data4 data 5
data1 data2 data3 Test_data4 data 5
data1 data2 data3 Test_data4 data 5
data1 data2 data3 Test_data4 data 5
data1 data2 data3 Test_data4 data 5

 
LVL 75

Accepted Solution

by:käµfm³d 👽
käµfm³d 👽 earned 350 total points
ID: 40230017
Find
(       - Start of capture group (first, and only, group)
^       - Start of line
[^ ]+   - One or more ( + ) of any character not a space ( [^ ] ) -- Inside brackets, a leading ^ means "not"
        - Literal space
[^ ]+   - One or more ( + ) of any character not a space ( [^ ] ) -- Inside brackets, a leading ^ means "not"
        - Literal space
[^ ]+   - One or more ( + ) of any character not a space ( [^ ] ) -- Inside brackets, a leading ^ means "not"
        - Literal space
)       - End of capture group


Replace
$1      - Whatever was captured in capture group 1
test_   - Literal text
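
The breakdown above can also be written as a commented pattern. The sketch below uses Python's `re.VERBOSE` flag, which is an assumption on my part since the thread targets Sublime Text. In verbose mode bare whitespace is ignored, so each literal space is written as `[ ]`, and the replacement uses `\1` where the editor uses `$1`.

```python
import re

FIND = re.compile(
    r"""
    (           # start of capture group 1 (the only group)
      ^         # start of line
      [^ ]+     # one or more characters that are not a space (first field)
      [ ]       # literal space
      [^ ]+     # second field
      [ ]       # literal space
      [^ ]+     # third field
      [ ]       # literal space (the third space on the line)
    )           # end of capture group 1
    """,
    re.VERBOSE | re.MULTILINE,
)

# \1 puts back whatever group 1 captured; test_ is literal text.
explained = FIND.sub(r"\1test_", "data1 data2 data3 data4 data5")
print(explained)
```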

 

Author Closing Comment

by:lconnell
ID: 40230130
Great explanation and examples