Solved

How to find and replace text between two words?

Posted on 2014-02-25
Medium Priority
5,968 Views
Last Modified: 2014-03-11
Hi,

This is my current data (extracted from HTML).
I need to delete everything between the first word (e.g. AAAAAAAA) and the last word (e.g. BBBBBBBB) of each paragraph.
I would like the outcome to be:

AAAAAAAA, BBBBBBBB

CCCCCCCCC,DDDDDDDD

EEEEEEEE, FFFFFFFFF

I will need to export the data to Excel.
I have searched Google up and down, but can't find a way to select the data in between.
I hope you can give me a solution.
Thanks

----------------------------
Text that needs to be processed
----------------------------
AAAAAAAA</a></p><div class="box-listing_textCS6"><ol><li><div class="icon_mobileCS"></div><a id="csagentphone17556" style="color:#000; font-weight:bold;" class="csagentphonelead" href="findanagent.aspx?ty=as&ak=&rk=&pg=1&rmp=1000&st=PA&ct=&st1=&ct1=#17556" title="Click To Preview" data="BBBBBBBB

CCCCCCCCC</a></p><div class="box-listing_textCS6"><ol><li><div class="icon_mobileCS"></div><a id="csagentphone16127" style="color:#000; font-weight:bold;" class="csagentphonelead" href="findanagent.aspx?ty=as&ak=&rk=&pg=1&rmp=1000&st=PA&ct=&st1=&ct1=#16127" title="Click To Preview" data="DDDDDDDD

EEEEEEEE</a></p><div class="box-listing_textCS6"><ol><li><div class="icon_mobileCS"></div><a id="csagentphone15638" style="color:#000; font-weight:bold;" class="csagentphonelead" href="findanagent.aspx?ty=as&ak=&rk=&pg=1&rmp=1000&st=PA&ct=&st1=&ct1=#15638" title="Click To Preview" data="FFFFFFFFF
Question by:Shirley80
LVL 35

Accepted Solution

by:Dan Craciun
Dan Craciun earned 800 total points
ID: 39886381
Why use Word, which has only basic wildcard abilities, rather than a true regex-enabled editor?

For example, it's trivial to do this in Notepad++.
Just use
(\w*).*"(\w*)
in the find box and
$1,$2
in the replace box.

PS: make sure the cursor is at the beginning of the file.

HTH,
Dan
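
For readers who would rather script the same find/replace than run it in an editor, here is a minimal Python sketch of the pattern above. The sample text is an abbreviated stand-in for the poster's data, and the names `sample`, `pattern`, and `result` are illustrative only. Note that Python writes backreferences as \1 and \2 where Notepad++ uses $1 and $2.

```python
import re

# Abbreviated stand-in for the poster's data (attributes shortened for readability).
sample = (
    'AAAAAAAA</a></p><div class="box"><a href="find.aspx?st=PA#17556" data="BBBBBBBB\n'
    '\n'
    'CCCCCCCCC</a></p><div class="box"><a href="find.aspx?st=PA#16127" data="DDDDDDDD\n'
)

# Same pattern as in the find box: first word, greedy skip to the
# last double quote on the line, then the last word.
pattern = re.compile(r'(\w*).*"(\w*)')

# "." does not cross line breaks by default, so each line is matched separately.
result = pattern.sub(r'\1,\2', sample)
print(result)
```

Because the output is already comma-separated, saving it with a .csv extension is one way to get it into Excel.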
 
LVL 93

Assisted Solution

by:Patrick Matthews
Patrick Matthews earned 200 total points
ID: 39886423
Easiest would be to:

1) Put your data into Excel

2) Add this function to a regular VBA module:

Function RegExpReplace(LookIn As String, PatternStr As String, Optional ReplaceWith As String = "", _
    Optional ReplaceAll As Boolean = True, Optional MatchCase As Boolean = True, _
    Optional MultiLine As Boolean = False)
    
    ' Function written by Patrick G. Matthews.  You may use and distribute this code freely,
    ' as long as you properly credit and attribute authorship and the URL of where you
    ' found the code
    
    ' This function relies on the VBScript version of Regular Expressions, and thus some of
    ' the functionality available in Perl and/or .Net may not be available.  The full extent
    ' of what functionality will be available on any given computer is based on which version
    ' of the VBScript runtime is installed on that computer
    
    ' This function uses Regular Expressions to parse a string, and replace parts of the string
    ' matching the specified pattern with another string.  The optional argument ReplaceAll
    ' controls whether all instances of the matched string are replaced (True) or just the first
    ' instance (False)
    
    ' If you need to replace the Nth match, or a range of matches, then use RegExpReplaceRange
    ' instead
    
    ' By default, RegExp is case-sensitive in pattern-matching.  To keep this, omit MatchCase or
    ' set it to True
    
    ' If you use this function from Excel, you may substitute range references for all the arguments
    
    ' Normally as an object variable I would set the RegX variable to Nothing; however, in cases
    ' where a large number of calls to this function are made, making RegX a static variable that
    ' preserves its state in between calls significantly improves performance
    
    Static RegX As Object
    
    If RegX Is Nothing Then Set RegX = CreateObject("VBScript.RegExp")
    With RegX
        .Pattern = PatternStr
        .Global = ReplaceAll
        .IgnoreCase = Not MatchCase
        .MultiLine = MultiLine
    End With
    
    RegExpReplace = RegX.Replace(LookIn, ReplaceWith)
    
End Function


3) Assuming your data start in A1, use a formula like this:

=RegExpReplace(A1,"^([A-Z]+)(.*?([A-Z]+))$","$1, $3")

I explain this approach in my article here: http://www.experts-exchange.com/Programming/Languages/Visual_Basic/A_1336-Using-Regular-Expressions-in-Visual-Basic-for-Applications-and-Visual-Basic-6.html
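
For comparison outside Excel, the same idea can be sketched in Python. This is only a rough analog of the VBA function above, not a port of it: the name `regexp_replace` and its parameter names are hypothetical, and Python's `re` module differs from VBScript's RegExp in details (for instance, backreferences are written \1 rather than $1).

```python
import re

def regexp_replace(look_in, pattern, replace_with="", replace_all=True,
                   match_case=True, multi_line=False):
    """Rough Python analog of the RegExpReplace VBA function (illustrative only)."""
    flags = 0
    if not match_case:
        flags |= re.IGNORECASE
    if multi_line:
        flags |= re.MULTILINE
    # count=0 replaces every match; count=1 replaces only the first.
    count = 0 if replace_all else 1
    return re.sub(pattern, replace_with, look_in, count=count, flags=flags)

# Abbreviated stand-in for one row of the poster's data.
row = 'AAAAAAAA</a></p><a href="find.aspx?st=PA#17556" data="BBBBBBBB'

# The lazy .*? expands until a run of capitals reaches the end of the string.
cleaned = regexp_replace(row, r'^([A-Z]+)(.*?([A-Z]+))$', r'\1, \3')
print(cleaned)
```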
 
LVL 13

Expert Comment

by:Santosh Gupta
ID: 39886430
Hi,

Use "Find and Replace": enter /a*data=" in the find box, leave the replace box empty, and click "Replace All".

Then use "Find and Replace" again: enter < in the find box and , in the replace box, and click "Replace All".
 
LVL 35

Expert Comment

by:Dan Craciun
ID: 39886485
@patrick: your definition of "easy" is way different than mine :)

@sgupta1181: your solution works for the sample, but you have to check "Use wildcards". And if the html/xml code changes (so the end is no longer "data=") then it no longer works.
 
LVL 21

Assisted Solution

by:Eric Fletcher
Eric Fletcher earned 600 total points
ID: 39886518
Word's wildcards are not as elegant as a true Regex editor, but they are able to do what you want with Find and Replace.

Find what: (?[!<]{1,})(\<)(*" data=")(?[!^0013]{1,})
Replace with: \1,\4

This finds 4-part structures consisting of:
(1) any number of characters except <;
(2) the < character;
(3) any number of characters plus " data="; and
(4) any number of characters except the paragraph break (the ^0013).
Each element of the structure needs to be enclosed within parentheses within the Find definition so it can be referred to in the replace string.

It will then replace each found structure with the 1st and 4th elements separated by a comma.
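
The four-part structure described above maps fairly directly onto an ordinary regular expression. The Python sketch below is an approximate translation, not an exact equivalent of Word's wildcard semantics; the sample text is abbreviated and the variable names are illustrative.

```python
import re

# Rough regex counterpart of the four Word wildcard groups (an approximation).
pattern = re.compile(
    r'([^<]+)'         # (1) one or more characters except <
    r'(<)'             # (2) the < character
    r'(.*?" data=")'   # (3) any characters up to and including " data="
    r'([^\r\n]+)'      # (4) one or more characters except a line break
)

# Abbreviated stand-in for two paragraphs of the poster's data.
text = (
    'AAAAAAAA</a></p><a title="Click To Preview" data="BBBBBBBB\n'
    'CCCCCCCCC</a></p><a title="Click To Preview" data="DDDDDDDD'
)

# Keep groups 1 and 4, separated by a comma, as in the Replace with box.
result = pattern.sub(r'\1,\4', text)
print(result)
```

As in the Word version, group 1 excludes only <, so it also picks up the line break that precedes each paragraph; that is why the line breaks survive in the output.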
 
LVL 93

Expert Comment

by:Patrick Matthews
ID: 39887131
@Dan,

That's because I hadn't seen your Notepad++ suggestion before I posted mine :)

Patrick
 
LVL 23

Assisted Solution

by:Ejgil Hedegaard
Ejgil Hedegaard earned 200 total points
ID: 39887132
With a formula in Excel, data in A1
=LEFT(A1,SEARCH("<",A1)-1)&","&RIGHT(A1,LEN(A1)-SEARCH(CHAR(255),SUBSTITUTE(A1,CHAR(34),CHAR(255),LEN(A1)-LEN(SUBSTITUTE(A1,CHAR(34),"")))))


Finds the first < for the word(s) before that, and finds last " for word(s) after that.
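
The logic of that formula (no regex, just the position of the first < and the last ") can be written out in Python as a small sketch. The function name `first_and_last` is hypothetical, and the row is an abbreviated stand-in for the poster's data.

```python
def first_and_last(line):
    """Return the text before the first '<' and after the last '"', comma-separated."""
    first = line[:line.index('<')]         # like LEFT(A1, SEARCH("<", A1) - 1)
    last = line[line.rindex('"') + 1:]     # like RIGHT(...) counted from the last CHAR(34)
    return first + ',' + last

# Abbreviated stand-in for one row of the poster's data.
row = 'AAAAAAAA</a></p><a title="Click To Preview" data="BBBBBBBB'
print(first_and_last(row))
```

In this sketch, `index` and `rindex` raise ValueError when a row has no < or no ", so malformed rows fail loudly instead of producing a wrong result.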
 
LVL 46

Assisted Solution

by:aikimark
aikimark earned 200 total points
ID: 39919374
1. find/replace all ^p with ^l^p  (no wild cards)
2. find/replace all <(*)>*data="<(*)>^l with \1, \2  (with wild cards)

Note: you can manually add the last manual line break at the end in lieu of step 1.
 

Author Comment

by:Shirley80
ID: 39919522
Sorry, I am new here.
Since there are so many solutions, which should I accept?
The first solution with the correct answer?
Or should I accept multiple solutions and split the points among the experts? Thanks.
 
LVL 35

Expert Comment

by:Dan Craciun
ID: 39919779
If the solutions are the same, the first one.
If the solutions are different, you can split the points.

HTH,
Dan
 

Author Closing Comment

by:Shirley80
ID: 39920590
Thanks all for the wonderful answers!