RegEx N'th Occurrence

Posted on 2014-07-29
Last Modified: 2014-07-30
I have a file that has values separated by spaces.  I only want to grab the third space on each line. How would I do that?
Question by:lconnell

Expert Comment

by:käµfm³d 👽
ID: 40227905
What programming language or text editor are you using?

You might try:

^ [^ ]+ [^ ]+( )

Also, I took your question quite literally (as a regex would!), so I'm sure the above isn't exactly what you are looking for. Can you clarify what you are after?
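A minimal sketch of what this first pattern does when taken literally, using Python's `re` module as a stand-in (an assumption: for `^`, `[^ ]+` and a capture group, Python behaves like the editors' engines). The hypothetical helper `third_space_index` returns the position of the captured space.

```python
import re

# The pattern above, verbatim: a leading space, two space-separated runs,
# then the next space is captured in group 1.
LITERAL = re.compile(r"^ [^ ]+ [^ ]+( )")

def third_space_index(line: str):
    """Return the index of the captured space, or None if nothing matches."""
    m = LITERAL.match(line)
    return m.start(1) if m else None

# A line shaped like the sample data (no leading space) does not match.
print(third_space_index("data1 data2 data3 data4 data5"))
# With a leading space, the captured space is the third space on the line.
print(third_space_index(" data1 data2 data3 data4"))
```

The first call prints `None` and the second prints `12`, so the pattern only finds the third space on lines that begin with a space.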

Author Comment

by:lconnell
ID: 40227906
Sublime Text. It would also be nice to know how to do this in Vim.

That did not work when using the RegEx search in Sublime.

Expert Comment

by:käµfm³d 👽
ID: 40227913
I don't know if you saw the edit in my comment, but can you clarify what you are after? It seems weird that you would want the third space. I suspect what you meant was what follows the third space.

Author Comment

by:lconnell
ID: 40227941
So I want to edit a file using multi-selection. I have 100 lines of the following text.

data1 data2 data3 data4 data5
...
...
...

I want to use Sublime or any other editor to find the third space so I can edit every line at that position at once. That way I can change data4 on every line to "test_data4" in one step. Data4 can be any value, which is why I want to match at the third space.

Expert Comment

by:käµfm³d 👽
ID: 40227962
OK, I see where I went wrong. This should be correct now:

^[^ ]+ [^ ]+ [^ ]+ 

This pattern assumes that a line never starts with a space.
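A quick check of the corrected pattern, again using Python's `re` as a stand-in for the editor's engine (an assumption for these simple constructs). It shows that the match ends exactly where the fourth field begins, and that the leading-space caveat holds.

```python
import re

# Corrected pattern: three non-space runs, each followed by a space.
PREFIX = re.compile(r"^[^ ]+ [^ ]+ [^ ]+ ")

m = PREFIX.match("data1 data2 data3 data4 data5")
print(repr(m.group(0)))  # the text up to and including the third space
print(m.end())           # index where data4 starts

# A line that begins with a space is not matched at all.
print(PREFIX.match(" data1 data2 data3 data4"))
```

This prints `'data1 data2 data3 '`, `18` and `None`.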

Expert Comment

by:aikimark
ID: 40228375
Here's an alternative pattern:
\w+ \w+ \w+ (\w+)

You can then use the regex Replace method on the \1 capture group.
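A small sketch of how the capture group isolates the fourth field, written in Python for illustration (the post itself does not name a language). Splicing `test_` in at the start of group 1 is one way to rewrite only that field; the variable names are hypothetical.

```python
import re

# aikimark's pattern: three word-character runs, then the fourth is captured.
FOURTH = re.compile(r"\w+ \w+ \w+ (\w+)")

line = "data1 data2 data3 data4 data5"
m = FOURTH.search(line)
print(m.group(1))

# Insert the prefix at the position where capture group 1 starts.
start = m.start(1)
result = line[:start] + "test_" + line[start:]
print(result)
```

This prints `data4` and then `data1 data2 data3 test_data4 data5`.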

Expert Comment

by:käµfm³d 👽
ID: 40228402
@aikimark

There's no perceived benefit to using the "word character" class over "not a space". In the worst case, the pattern won't match if the data contains any characters other than letters, digits, or underscores.
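To make the difference concrete, here is a sketch in Python using hypothetical lines whose fields contain `-` and `.`. Besides failing to match, the unanchored `\w+` version can also start matching partway through a line and capture a later field.

```python
import re

WORD = re.compile(r"\w+ \w+ \w+ (\w+)")              # unanchored, as posted
NOT_SPACE = re.compile(r"^[^ ]+ [^ ]+ [^ ]+ ([^ ]+)")

line = "a-1 b.2 c3 d4 e5"   # hypothetical data with punctuation in fields
print(WORD.search(line).group(1))       # match starts at "2", so a later field is captured
print(NOT_SPACE.search(line).group(1))  # the actual fourth field

short = "x-1 y.2 z3 w4"     # hypothetical four-field line
print(WORD.search(short))               # no match at all
print(NOT_SPACE.search(short).group(1))
```

This prints `e5`, `d4`, `None` and `w4`: the `[^ ]+` version gets the fourth field both times.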

Expert Comment

by:aikimark
ID: 40228408
@kaufmed

I realize that. Normally, I would use the not-a-space pattern. But you'd already used it, and I find \w+ simpler to type than [^ ]+: three characters versus five.

What I hope I've added is the grouping of the fourth 'word' that will allow the Replace method to be used.

Expert Comment

by:aikimark
ID: 40228412
It looks like my pattern needed tweaking.  It should be: (\w+ \w+ \w+ )(\w+)( .*?\r\n)
Example:
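    Sub ReplaceFourthField()
        ' Late-bound VBScript RegExp object (no library reference needed)
        Dim strData As String
        Dim oRE As Object
        Set oRE = CreateObject("vbscript.regexp")
        oRE.Global = True    ' replace on every line, not just the first match
        ' $1 = first three fields and their spaces, $2 = fourth field,
        ' $3 = rest of the line including the CRLF
        oRE.Pattern = "(\w+ \w+ \w+ )(\w+)( .*?\r\n)"
        ' String literals that simulate three lines of the file
        strData = "data1 data2 data3 data4 data5" & vbCrLf
        strData = strData & "data21 data22 data23 data24 data25" & vbCrLf
        strData = strData & "data31 data32 data33 data34 data35" & vbCrLf
        If oRE.Test(strData) Then
            ' Prefix the fourth field with "test_", keeping everything else
            Debug.Print oRE.Replace(strData, "$1test_$2$3")
        End If
    End Sub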

Contents of Immediate window after running the above code:
data1 data2 data3 test_data4 data5
data21 data22 data23 test_data24 data25
data31 data32 data33 test_data34 data35

Expert Comment

by:aikimark
ID: 40228413
Yes.  It is possible to use the not-a-space pattern: ([^ ]+ [^ ]+ [^ ]+ )([^ ]+)( .*?\r\n)
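One detail worth checking with this pattern is the trailing `\r\n`: a line only matches if it ends in a CR LF pair. The sketch below uses Python's `re` as a stand-in (an assumption; whether a given editor exposes `\r` in its buffer is not covered here) and `subn`, which returns the new text and the number of replacements. Python's `\1 \2 \3` play the role of `$1 $2 $3`.

```python
import re

# aikimark's not-a-space variant, including the trailing \r\n requirement.
PATTERN = re.compile(r"([^ ]+ [^ ]+ [^ ]+ )([^ ]+)( .*?\r\n)")

rows = ["data1 data2 data3 data4 data5",
        "data21 data22 data23 data24 data25"]

crlf_text = "\r\n".join(rows) + "\r\n"   # Windows-style line endings
lf_text = "\n".join(rows) + "\n"         # LF-only line endings

print(PATTERN.subn(r"\1test_\2\3", crlf_text))  # both lines rewritten
print(PATTERN.subn(r"\1test_\2\3", lf_text))    # no \r anywhere, so nothing matches
```

With CRLF text both lines get the `test_` prefix (count 2); with LF-only text the count is 0 and the text comes back unchanged. For the same reason, in this sketch a final line with no trailing line break would be left alone.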

Expert Comment

by:Surrano
ID: 40228613
Vim substitute command:

:%s/^\(\([^ ]* \)\{3\}\)\([^ ]*\)/\1test_\3/
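For readers who want to try the same substitution outside Vim, this is a rough Python translation (an illustration, not part of the original answer). Vim's `\( \)` and `\{3\}` become `(` `)` and `{3}`, and the group numbering stays the same, so `\3` is still the fourth field. The `\n` added to the negated class keeps a match on one line, because Python sees the whole text at once while `:%s` works line by line.

```python
import re

# ^ with re.MULTILINE anchors at the start of every line, like :%s does.
VIM_LIKE = re.compile(r"^(([^ \n]* ){3})([^ \n]*)", re.MULTILINE)

text = "data1 data2 data3 data4 data5\ndata21 data22 data23 data24 data25"
print(VIM_LIKE.sub(r"\1test_\3", text))
```

Both lines come out with the fourth field prefixed: `data1 data2 data3 test_data4 data5` and `data21 data22 data23 test_data24 data25`.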

Author Comment

by:lconnell
ID: 40229189
Thanks for the assistance, everyone. There is still a problem, though: I only want to select the third space itself, not all the text up to it.

Expert Comment

by:aikimark
ID: 40229242
@lconnell

Please test the code I posted.

Author Comment

by:lconnell
ID: 40229283
aikimark, it does not work.  It actually doesn't match anything.

Expert Comment

by:aikimark
ID: 40229397
It actually doesn't match anything.
Does your actual data reflect the sample data you posted?

Have you changed my code to read your data, or are you expecting my sample code to change your file data? The code shows how to use a regular expression to do a replace. I used string literals that were meant to simulate the data in your example.

Expert Comment

by:käµfm³d 👽
ID: 40229817
The problem you face is that Sublime Text (ST) uses the Boost regex engine, which does not support arbitrary-length lookbehinds. A lookbehind is what you would need in order to skip over everything before the third space without including it in the match. The only thing you can do at this point is a find/replace as aikimark described above, except that you capture the whole string up to and including the third space:

e.g.

Find
(^[^ ]+ [^ ]+ [^ ]+ )

Replace
$1test_
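Python's built-in `re` module has a similar restriction (a lookbehind must have a fixed width), so it can serve as a stand-in to illustrate the point. This sketch shows the variable-width lookbehind being rejected, then two workarounds: capturing the space and reading its position, and the capture-and-replace approach given above (`\1` in Python corresponds to `$1`).

```python
import re

# A variable-width lookbehind for "the space after three fields" is rejected.
try:
    re.compile(r"(?<=^[^ ]+ [^ ]+ [^ ]+) ")
    lookbehind_ok = True
except re.error:
    lookbehind_ok = False
print(lookbehind_ok)

line = "data1 data2 data3 data4 data5"

# Workaround 1: capture just the third space and use the group's position.
m = re.match(r"^[^ ]+ [^ ]+ [^ ]+( )", line)
print(m.start(1))

# Workaround 2: capture everything up to and including the third space,
# then put it back in front of the new text.
replaced = re.sub(r"^([^ ]+ [^ ]+ [^ ]+ )", r"\1test_", line)
print(replaced)
```

This prints `False`, `17` (the index of the third space) and `data1 data2 data3 test_data4 data5`.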

Author Comment

by:lconnell
ID: 40229909
Perfect, that works fine using the replace with what is already highlighted. Can you explain the actual regex?

Assisted Solution

by:aikimark
aikimark earned 150 total points
ID: 40229952
In Notepad++, the following find/replace operation gives the same results:
Find what: ([^ ]+ [^ ]+ [^ ]+ )([^ ]+)( .*?\r\n)
Replace with: $1Test_$2$3

Results:

data1 data2 data3 Test_data4 data 5
data1 data2 data3 Test_data4 data 5
data1 data2 data3 Test_data4 data 5
data1 data2 data3 Test_data4 data 5
data1 data2 data3 Test_data4 data 5

Accepted Solution

by:käµfm³d 👽
käµfm³d 👽 earned 350 total points
ID: 40230017
Find
(       - Start of capture group (first, and only, group)
^       - Start of line
[^ ]+   - One or more ( + ) of any character not a space ( [^ ] ) -- Inside brackets, ^ means "not"
(space) - Literal space
[^ ]+   - One or more ( + ) of any character not a space ( [^ ] ) -- Inside brackets, ^ means "not"
(space) - Literal space
[^ ]+   - One or more ( + ) of any character not a space ( [^ ] ) -- Inside brackets, ^ means "not"
(space) - Literal space
)       - End of capture group

Replace
$1      - Whatever was captured in capture group 1
test_   - Literal text
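The breakdown above can also be written into the pattern itself. This sketch uses Python's `re.VERBOSE` flag (Python is an assumption here, not the editor's syntax): whitespace in verbose mode is ignored outside character classes, so each literal space is written as `[ ]`, and `\n` is added to the negated class so a match cannot run onto the next line. `add_prefix` is a hypothetical helper.

```python
import re

THIRD_SPACE = re.compile(
    r"""
    (           # start of capture group 1
      ^         # start of line (re.MULTILINE lets ^ match at every line)
      [^ \n]+   # one or more characters other than a space (or newline)
      [ ]       # literal space (a bare space is ignored in VERBOSE mode)
      [^ \n]+   # second field
      [ ]
      [^ \n]+   # third field
      [ ]
    )           # end of capture group 1
    """,
    re.MULTILINE | re.VERBOSE,
)

def add_prefix(text: str, prefix: str = "test_") -> str:
    # m.group(1) plays the role of $1 in Sublime's replace box.
    return THIRD_SPACE.sub(lambda m: m.group(1) + prefix, text)

print(add_prefix("data1 data2 data3 data4 data5\ndata21 data22 data23 data24 data25"))
```

Using a function as the replacement keeps any backslashes in `prefix` from being read as group references.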

Author Closing Comment

by:lconnell
ID: 40230130
Great explanation and examples