regex that gives me the word before and after a specific word

Hi all,

I need a regex that gives me the word before and after a specific word, including the search word itself.

Like: "This is some dummy text to find a word" should give me a string of "dummy text to" when text is my search word.

Another question: the string provided may contain the search word more than once, so I must be able to retrieve all matches in that string with C#.

Like "This is some dummy text to find a word in a string full with text and words"

Should return:
"dummy text to"
"with text and"

I hope anyone can help me. Thx!

Asked by: Pit76
 
käµfm³d 👽Commented:
Try this:



using System;
using System.Text.RegularExpressions;

namespace ConsoleApplication24
{
    class Program
    {
        static void Main(string[] args)
        {
            string source = "This is some dummy text to find a word in a string full with text and words";
            MatchCollection matches = Regex.Matches(source, @"\b\w+\b text \b\w+\b");

            foreach (Match m in matches)
            {
                Console.WriteLine(m.Value);
            }

            Console.ReadKey();
        }
    }
}

 
Pit76Author Commented:
Thx, that works if my search word has words before and after. What if the search word is at the beginning or the end of the sentence? Then it doesn't give me a result, e.g.: "Text to read"
 
Pit76Author Commented:
I wasn't complete. Actually I should have all the matches returned that contain the search word. A few examples:
Text is too read. -> Text is
Read my text. -> my text
This is a text-field example -> a text-field example

I hope I'm more clear now. Sry that I didn't explain this from the start.

Thx for any help!
 
käµfm³d 👽Commented:
What if the search word is at the beginning or the end of the sentence?
I had my fingers crossed that you wouldn't say that. Alas...   = )

It's easy enough to fix:
string source = "Text is too read.";
MatchCollection matches = Regex.Matches(source, @"(?:\b\w+\b )?text(?: \b\w+\b)?", RegexOptions.IgnoreCase);

foreach (Match m in matches)
{
    Console.WriteLine(m.Value);
}
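
To see how the optional groups behave, here is a minimal sketch that runs the same pattern over the examples from this thread. The class name OptionalNeighbours and the helper Find are made up for illustration; only Regex.Matches and RegexOptions.IgnoreCase come from .NET.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

public static class OptionalNeighbours
{
    // Same pattern as above: each neighbouring word sits in an optional group,
    // so a hit at the start or end of the input still matches.
    private const string Pattern = @"(?:\b\w+\b )?text(?: \b\w+\b)?";

    // Hypothetical helper: returns every match as a plain string.
    public static string[] Find(string source) =>
        Regex.Matches(source, Pattern, RegexOptions.IgnoreCase)
             .Cast<Match>()
             .Select(m => m.Value)
             .ToArray();

    public static void Main()
    {
        string[] samples =
        {
            "Text is too read.",
            "Read my text.",
            "This is some dummy text to find a word in a string full with text and words"
        };

        foreach (string sample in samples)
        {
            Console.WriteLine(sample + " -> " + string.Join(" | ", Find(sample)));
        }
    }
}
```

With these inputs the helper should return "Text is", then "my text", then the pair "dummy text to" and "with text and".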

 
Pit76Author Commented:
That works great!
Now there is only one thing I need to fix and that is this one:
This is a text-field example to blabla -> a text-field example

So when the search word is connected to another word, the match should include that whole compound.

other examples:
This is a text.field example to blabla-> a text.field example
This is a d'text example to blabla -> a d'text example

Sry for being such a pain :) You've definitely earned your points.
Any good links where I can learn a bit about regex myself?

Actually it should grab everything from the space before the preceding word up to the space after the word that follows the search word, if you get what I mean.
 
käµfm³d 👽Commented:
Sry for being such a pain
Oh come on...  asking questions is the foundation of this site (not to mention learning)     = D


So when the search word is connected with an other word it should see this.
OK. Try:

string source = "This is a text-field example to blabla";
// (?:-\w+)* consumes any hyphenated continuation such as "-field"
MatchCollection matches = Regex.Matches(source, @"(?:\b\w+\b )?text(?:-\w+)*(?: \b\w+\b)?", RegexOptions.IgnoreCase);

foreach (Match m in matches)
{
    Console.WriteLine(m.Value);
}



If you need to go the other way as well (e.g. "this is some-text that I wrote"), then you can change it to this:

string source = "This is some-text that I wrote";
// (?:\w+-)* consumes any hyphenated prefix such as "some-"
MatchCollection matches = Regex.Matches(source, @"(?:\b\w+\b )?(?:\w+-)*text(?:-\w+)*(?: \b\w+\b)?", RegexOptions.IgnoreCase);

foreach (Match m in matches)
{
    Console.WriteLine(m.Value);
}



Any good links where I can learn a bit about regex myself?
I prefer www.regular-expressions.info
 
käµfm³d 👽Commented:
P.S.

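"\w" encompasses letters, digits, and the underscore. In .NET that means Unicode letters and decimal digits by default (so "ä" counts too), plus connector punctuation such as _ ; only with RegexOptions.ECMAScript is it limited to a-z, A-Z, 0-9, and _ . I don't know if it is an issue, but if you'd feel more comfortable you can change the "\w" to "[a-zA-Z]".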
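
A small sketch to check what "\w" accepts on your own inputs. The class and method names are invented for this example; RegexOptions.ECMAScript is the real .NET option that narrows "\w" to the ASCII set.

```csharp
using System;
using System.Text.RegularExpressions;

public static class WordClassDemo
{
    // Default .NET behaviour: \w is Unicode-aware.
    public static bool IsWordDefault(string s) => Regex.IsMatch(s, @"^\w+$");

    // ECMAScript mode: \w is limited to [a-zA-Z0-9_].
    public static bool IsWordAscii(string s) => Regex.IsMatch(s, @"^\w+$", RegexOptions.ECMAScript);

    // Explicit class: ASCII letters only, no digits or underscore.
    public static bool IsLettersOnly(string s) => Regex.IsMatch(s, @"^[a-zA-Z]+$");

    public static void Main()
    {
        foreach (string s in new[] { "text", "text_1", "kysyttävää" })
        {
            Console.WriteLine(s + ": default=" + IsWordDefault(s)
                + ", ecmascript=" + IsWordAscii(s)
                + ", letters=" + IsLettersOnly(s));
        }
    }
}
```

"text_1" passes the first two checks but not the letters-only one, while "kysyttävää" passes only the default check, which matters for content like the Finnish text later in this thread.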
 
Pit76Author Commented:
Ok, then I have a more difficult one I guess :)

Search word "Extrafilm":

Content:
<p>Kiitos ExtraFilm.fi -tilauksestasi. Tässä ovat tiedot, joita tarvitset tilauksesi seuraamiseksi. Ota meihin yhteyttä, jos sinulla on kysyttävää. Ja muista ottaa valokuvia tärkeistä pienistä hetkistä. Olemme mielellämme palveluksessasi uudelleen.</p>  <p>Tärkeää: Tärkeää, mikäli olet tehnyt tilauksesi Cd:lle poltettuna, muista lähettää se ExtraFilmiin.<br /><br />ExtraFilm<br />Pl 1440<br />00002 Helsinki</p>  <p>Ole hyvä ja kirjoita tilausnumero ja päivämäärä CD:n/DVD:n päälle.</p>  <p>Tässä on henkilökohtainen <b>tilausnumerosi</b><b>: {0}</b>.</p>  <p>Toimitamme tilauksesi postitse 5 - 8 arkipäivän kuluessa.</p>  <p>Tilaus toimitetaan osoitteeseen: <b>{1}</b>.</p>  <p>Olet valinnut maksaa summan {2} seuraavalla maksutavalla {3}.</p>  <p>Tilasitko useita tuotteita yhdellä kertaa? Saatat saada ne erikseen, koska toimitamme kunkin tuotteen heti sen valmistuttua.</p>  <p>Löydät vahvistuksen myös sähköpostistasi. Se sisältää nämä samat tiedot. Onko vahvistuksessa virhe? Onko sinulla kysyttävää koskien tilaustasi? Lähetä siinä tapauksessa viesti asiakaspalveluumme: <a href="mailto:aspalv@extrafilm.fi">aspalv@extrafilm.fi</span></a>. Selvitämme mahdollisen ongelman.</p>  <p>Toivomme, että käytät palvelujamme pian uudelleen</p>  <p>ExtraFilm-tiimi</p>

Should give:
<p>Kiitos ExtraFilm.fi
se ExtraFilmiin.<br
/>ExtraFilm<br
href="mailto:aspalv@extrafilm.fi">aspalv@extrafilm.fi</span></a>.
<p>ExtraFilm-tiimi</p>

So it should include the words one or two spaces before and after the search word. I don't know if this is even possible with regex.
 
käµfm³d 👽Commented:
That's my fault. I missed where you put the period and apostrophe examples above. Corrected below. You can add whichever characters you expect to receive inside the square brackets. Be sure to leave the hyphen as either the first or last character inside the brackets (otherwise the regex engine thinks you are trying to introduce a character range).

string source = "This is a d'text example to blabla";
// [.'-] lists the allowed connector characters; \w+ takes the whole attached part
MatchCollection matches = Regex.Matches(source, @"(?:\b\w+\b )?(?:\w+[.'-])*text(?:[.'-]\w+)*(?: \b\w+\b)?", RegexOptions.IgnoreCase);

foreach (Match m in matches)
{
    Console.WriteLine(m.Value);
}

 
käµfm³d 👽Commented:
Actually, in hindsight, those latest examples won't work with what I just posted. We're delving outside of the concept of "words" now, which is fine. But regex is very specific when it comes to matching. Can you provide a definition of what you expect a "word" to be? My posts above were going on the assumption of a word consisting of letters, numbers, or underscores. You have introduced HTML tags, which, of course, don't fit that definition.
 
Pit76Author Commented:
Well it can be just a simple sentence but it's also possible that it can be very different html content e.g. like the content above.
Isn't it possible with regex to take the content between 2 spaces, with the search word somewhere in the middle?
Or should I look more into plain string manipulations?
 
käµfm³d 👽Commented:
Isn't it possible with regex to take the content between 2 spaces, with the search word somewhere in the middle?
Ah. I think I understand now. I'm a little thick today  = )

Let's see if this is what you are after. The first is for getting everything up to, but not including, the surrounding spaces; the second is if you also want the words to the left and right of the preceding and trailing spaces, respectively.

MatchCollection matches = Regex.Matches(source, @"\S*Extrafilm\S*", RegexOptions.IgnoreCase);



MatchCollection matches = Regex.Matches(source, @"\S*\s\S*Extrafilm\S*\s\S*", RegexOptions.IgnoreCase);
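
A rough side-by-side of the two patterns, as a sketch. The class name SpacePatterns and the methods Tight and Wide are hypothetical labels for this example; the patterns themselves are the two shown above.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

public static class SpacePatterns
{
    // Only the whitespace-delimited chunk that contains the search word.
    public static string[] Tight(string source) =>
        Run(source, @"\S*Extrafilm\S*");

    // The chunk plus one neighbouring chunk on each side.
    // Note that \s is mandatory here, so whitespace must exist on both sides.
    public static string[] Wide(string source) =>
        Run(source, @"\S*\s\S*Extrafilm\S*\s\S*");

    private static string[] Run(string source, string pattern) =>
        Regex.Matches(source, pattern, RegexOptions.IgnoreCase)
             .Cast<Match>()
             .Select(m => m.Value)
             .ToArray();

    public static void Main()
    {
        foreach (string s in new[] { "Kiitos ExtraFilm.fi -tilauksestasi", "at Extrafilm" })
        {
            Console.WriteLine("tight: " + string.Join(" | ", Tight(s)));
            Console.WriteLine("wide:  " + string.Join(" | ", Wide(s)));
        }
    }
}
```

For the first sample Tight gives "ExtraFilm.fi" and Wide gives the whole string. For "at Extrafilm" Tight still finds "Extrafilm", but Wide finds nothing because there is no whitespace after the search word.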

 
Pit76Author Commented:
Thx kaufmed, the second one is almost perfect! That one gives me the results I expect. The only case I've found so far where it doesn't work is here:  le site Internet d'Extrafilm.
 
Pit76Author Commented:
Sry, found some more cases where this doesn't work:
ExtraFilm - Omat kuvani
Extrafilm
at Extrafilm
le site Internet d'Extrafilm.

It's very well possible that the search word is the first, last or even only word of the sentence.
 
Pit76Author Commented:
I have been trying myself to adjust the regex so that it takes the word also if it's at the beginning or end. This is what I have so far:
MatchCollection matches = Regex.Matches(source, @"(\S*\s\S*){3}|Extrafilm|(\S*\s\S*){2}", RegexOptions.IgnoreCase);
Console.Write("Matches: " + matches.Count);
Console.WriteLine();
foreach (Match m in matches)
{
    if (m.Value.ToLower().Contains("extrafilm"))
        Console.WriteLine(m.Value);
}



But now I have lost the following cases:
Extrafilm.fi
Extrafilm-action
mail@extrafilm

Any ideas?
Thx!
 
Pit76Author Commented:
Hi kaufmed,

It looks like it only fails when there are no spaces around the word: for the three examples above it only returns Extrafilm.
 
käµfm³d 👽Commented:
And this?
MatchCollection matches = Regex.Matches(source, @"(?:^|\S*\s*)\S*Extrafilm\S*(?:$|\s*\S*)", RegexOptions.IgnoreCase | RegexOptions.Multiline);
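
If the search word is not fixed, the same pattern can be assembled at run time. This is only a sketch: NeighbourSearch and Find are names invented here, and Regex.Escape is used on the assumption that the search word should always be treated as literal text.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

public static class NeighbourSearch
{
    // Hypothetical helper: builds the pattern above for any search word.
    public static string[] Find(string source, string word)
    {
        // Regex.Escape keeps characters such as '.' or '+' in the word literal.
        string pattern = @"(?:^|\S*\s*)\S*" + Regex.Escape(word) + @"\S*(?:$|\s*\S*)";

        return Regex.Matches(source, pattern, RegexOptions.IgnoreCase | RegexOptions.Multiline)
                    .Cast<Match>()
                    .Select(m => m.Value)
                    .ToArray();
    }

    public static void Main()
    {
        string[] samples =
        {
            "Extrafilm",
            "at Extrafilm",
            "mail@extrafilm",
            "Extrafilm.fi",
            "le site Internet d'Extrafilm."
        };

        foreach (string sample in samples)
        {
            Console.WriteLine(string.Join(" | ", Find(sample, "Extrafilm")));
        }
    }
}
```

The first four samples come back unchanged, and the last one yields "Internet d'Extrafilm.", i.e. the chunk containing the search word plus the chunk before it.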

 
Pit76Author Commented:
That does the trick! Now I get the results as I want them in all the different cases :)
Thank you very much for your time and effort!
Grts!