Solved

REGEXEXTRACT to extract a subsection of text

Posted on 2014-10-04
10
259 Views
Last Modified: 2015-03-02
Hiya, I recently came across the REGEXEXTRACT function in google docs but there is no documentation. I think it will do what i want though, that is to split a windows file path into parts based on searches eg:

C:\user\Dropbox\property\maintenance & repairs\open\oddj smp1 slight leak through new window (normal).jpg

extract "open" with the rule: the text between the last two backslashes
extract "oddj" with the rule: from the last backslash to the next space
extract "smp1" with the rule: from "oddj" until the second space
extract  "slight leak through new window" with the rule from "smp1" until the full stop (or space before if there is one)
extract  "normal" with the rule text between the last set of brackets

Can anyone help with some or all of these? The rules can be bent as long as they give the correct answer in any similar example.

Thanks a million!
0
Comment
Question by:xenium
  • 7
  • 3
10 Comments
 

Author Comment

by:xenium
ID: 40361951
0
 

Author Comment

by:xenium
ID: 40361953
Item 3 extract "smp1" , the following almost does this except the bounding conditions are included in the result, how do i remove these?

=REGEXEXTRACT(fullpath, trade & "\w*\s")

Gives result: oddj smp1
I need just: smp1

Thanks again
0
 
LVL 26

Accepted Solution

by:
wilcoxon earned 500 total points
ID: 40362137
Using capture group should work:
=REGEXTRACT(fullpath, trade & "(\w*)\s")

Open in new window


It may return it in array context (not sure).  If so, you'd need to do whatever Google Docs expects to grab the first array element in scalar context.
0
 

Author Comment

by:xenium
ID: 40362157
Great thanks, that works straight off (bar typo in function name which is REGEXEXTRACT)

I'll have a look at the rest later as I have to dash now, but looks like that would work for a few of the points above. Meanwhile if you have any tips on any that maybe trickier feel free to post. cheers
0
 
LVL 26

Assisted Solution

by:wilcoxon
wilcoxon earned 500 total points
ID: 40362229
The only one that looks like it would be possibly trickier than smp1 is the last one (capturing from the last parens).  Provided it follows the norm, you'll probably have to do this:
=REGEXEXTRACT(fullpath, ".*\(([^)]*)\)")

Open in new window


That should force it to skip as much as possible before grabbing what's inside the last set of parens.
0
Is Your Active Directory as Secure as You Think?

More than 75% of all records are compromised because of the loss or theft of a privileged credential. Experts have been exploring Active Directory infrastructure to identify key threats and establish best practices for keeping data safe. Attend this month’s webinar to learn more.

 

Author Comment

by:xenium
ID: 40362809
Thanks that works a treat! I'll have a closer look tomorrow and try to wrap up the rest too.
0
 

Author Comment

by:xenium
ID: 40363413
This is powerful stuff, but i'm still struggling, could you help with the first and fourth cases too?

Maybe also some tips how to "read" such expressions. eg i think the following is the case for the last example, can you confirm/correct, also how the logic ties together..

Look for any character except closed parentheses:
[^)]
Return all characters except closed  parentheses:
([^)]*)
Look for any number of characters:
.*
Look for open/closed parentheses:
\(
\)

Open in new window


Thanks again!
0
 
LVL 26

Assisted Solution

by:wilcoxon
wilcoxon earned 500 total points
ID: 40364289
You have it correct for the last one.

The first case is basically the same as the last case:
=REGEXEXTRACT(fullpath, ".*\\([^\\]*)\\")

Open in new window


The fourth case is tricky unless you can guarantee there will not be a paren or period in it.  If you can guarantee that, it becomes:
=REGEXEXTRACT(fullpath, trade & "\w*\s([^.\(]+)")

Open in new window


If you can't guarantee no paren or period, the simplest way is to capture the previous and next result and obtain it that way.  For example, if rule3 and rule5 contain those results then it would be:
=REGEXEXTRACT(fullpath, rule3 & "\w*(.*)\w*\(" & rule5)

Open in new window


If there may not be a rule5 match, it gets even trickier.
0
 

Author Comment

by:xenium
ID: 40364654
Thanks a lot for your help. That does the job for now. If i need to dig further I'll repost new questions.
0
 

Author Closing Comment

by:xenium
ID: 40364656
Many thanks for your help!
0

Featured Post

Is Your Active Directory as Secure as You Think?

More than 75% of all records are compromised because of the loss or theft of a privileged credential. Experts have been exploring Active Directory infrastructure to identify key threats and establish best practices for keeping data safe. Attend this month’s webinar to learn more.

Question has a verified solution.

If you are experiencing a similar issue, please ask a related question

Working with spreadsheets can be a daunting task, especially when having to deal with large amounts of data. All you see are rows and rows of numbers and soon your eyes begin to glaze over. Take advantage of the tools in Google Sheets to create prof…
Google is more than just a search engine. Over the years the company has developed a wide range of online services that are readily available to all users. This article highlights how one can use Google services for simple project management.
This Micro Tutorial  demonstrates whether your site uses one subdomain or multiple subdomains, how to create full URLs from Google Analytics content reports. This procedure is called concatenation and can also be done with the CONCATENTATE function.…
This Micro Tutorial will demonstrate Google Calendar to monitor updates with top sites, such as Facebook, Google, Twitter, etc. with Marketing News. Each update of Google Calendar can be monitored, correlate dips and spikes in your website traffic, …

911 members asked questions and received personalized solutions in the past 7 days.

Join the community of 500,000 technology professionals and ask your questions.

Join & Ask a Question

Need Help in Real-Time?

Connect with top rated Experts

22 Experts available now in Live!

Get 1:1 Help Now