Solved

Regex to find location of a recurring string

Posted on 2009-05-07
Last Modified: 2012-08-14
Greetings,
I have a long text file with some HTML formatting in it.  Within this file I need to find the start and end points of a particular string.  These strings were endnotes that were somehow treated as regular text when the document was saved as PDF, and they now need to be removed.

The string I'm searching for starts with
<P>US-United States industry only.

and ends with
United States industries are comparable.</P>

The kicker is that the string may or may not have leading or trailing spaces within the <P> tags.  So what I want to do is find the starting and ending locations of this string and remove the text between them.
Question by:andy_ee
LVL 15

Expert Comment

by:David L. Hansen
ID: 24325882

Author Comment

by:andy_ee
ID: 24325940
I am aware of the IndexOf function.  My problem is the leading and trailing spaces *within* the paragraph tags.
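
For readers following along: a plain IndexOf search struggles here because the characters right after <P> and right before </P> vary. Below is a minimal sketch, assuming C# 7 or later and the standard System.Text.RegularExpressions namespace, of how a regex match could report the start and end positions while tolerating that whitespace. The names EndnoteLocator and Locate and the sample text are hypothetical, chosen only for illustration; they do not come from the thread.

```csharp
using System;
using System.Text.RegularExpressions;

static class EndnoteLocator
{
    // \s* tolerates optional whitespace just inside the <P> and </P> tags.
    // .*? is lazy, so the match stops at the first closing sentence.
    const string Pattern =
        @"<P>\s*US-United States industry only\..*?United States industries are comparable\.\s*</P>";

    // Returns (start, end), where end is the index just past the match,
    // or (-1, -1) when no endnote is found.
    public static (int Start, int End) Locate(string input)
    {
        // Singleline lets '.' match newlines in case the endnote spans several lines.
        Match m = Regex.Match(input, Pattern, RegexOptions.Singleline);
        return m.Success ? (m.Index, m.Index + m.Length) : (-1, -1);
    }

    static void Main()
    {
        // Hypothetical sample text, not taken from the original file.
        string sample = "intro <P> US-United States industry only. note text " +
                        "United States industries are comparable. </P> outro";
        var (start, end) = Locate(sample);
        Console.WriteLine($"start={start}, end={end}");
    }
}
```

With both positions in hand, input.Remove(start, end - start) would cut the block out, provided the match succeeded.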
LVL 13

Accepted Solution

by:
iHadi earned 500 total points
ID: 24326449
Hi

From your question, I understand that you have text that looks like this:

<P>  US-United States industry only. text text United States industries are comparable.  </P>

and you want it like this:

<P>US-United States industry only. text text United States industries are comparable.</P>

To do so, you can use the Replace method of the Regex class:

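// Requires: using System; using System.Text.RegularExpressions;
string input = "...";
// \s* absorbs optional whitespace just inside the <P> and </P> tags.
string pattern = @"(?<start><P>)\s*(?<text>US-United States industry only\..*United States industries are comparable\.)\s*(?<end></P>)";
// Re-emit the tags and the text without the surrounding whitespace.
string replacement = @"${start}${text}${end}";
 
string result = Regex.Replace(input, pattern, replacement);
 
Console.WriteLine(result);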


Author Comment

by:andy_ee
ID: 24326554
You are *SO* close!

I want to find and remove a string that starts with:
"<P>US-United States industry only." or "<P> US-United States industry only."

and ends with:
"United States industries are comparable.</P>" or "United States industries are comparable. </P>"

Please note the spaces after the <P> tag and before the </P> tag.
LVL 13

Expert Comment

by:iHadi
ID: 24326743
The previous code does exactly that.
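
Note that the replacement string in the accepted solution keeps the matched text and only trims the whitespace inside the tags. If the goal is to delete the endnote outright, one possible variation is to replace the whole match with an empty string. The sketch below assumes the standard .NET Regex API; the names EndnoteRemover and Remove and the sample text are hypothetical. It also uses a lazy .*? with RegexOptions.Singleline, on the assumption that the file may hold several endnotes or endnotes that span multiple lines.

```csharp
using System;
using System.Text.RegularExpressions;

static class EndnoteRemover
{
    // Optional whitespace is allowed after <P> and before </P>.
    // The lazy .*? keeps each match from running on into a later endnote.
    const string Pattern =
        @"<P>\s*US-United States industry only\..*?United States industries are comparable\.\s*</P>";

    // Deletes every endnote paragraph, tags included.
    public static string Remove(string input)
    {
        // Singleline makes '.' match newlines as well.
        return Regex.Replace(input, Pattern, string.Empty, RegexOptions.Singleline);
    }

    static void Main()
    {
        // Hypothetical sample text, not taken from the original file.
        string sample = "<P>Keep me.</P><P> US-United States industry only. note " +
                        "United States industries are comparable. </P><P>Keep me too.</P>";
        Console.WriteLine(Remove(sample)); // prints "<P>Keep me.</P><P>Keep me too.</P>"
    }
}
```

If the tags might also appear in lowercase, adding RegexOptions.IgnoreCase to the options would cover that case.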

Author Closing Comment

by:andy_ee
ID: 31578998
Excellent!  Thanks!