Solved

Regex Match Sentences?

Posted on 2004-09-16
Last Modified: 2010-08-05
  I'm attempting to parse a text file into an SQL table using Regex Match. I have successfully matched just the port. However, I've never matched a complete sentence before. Each sentence is preceded by a single whitespace character...

Q. How can I Regex Match just the individual sentences?  


2 Death
20 Senna Spy FTP server
21 Back Construction, Blade Runner, Cattivik FTP Server, CC Invader, Dark FTP, Doly Trojan, Fore, Invisible FTP, Juggernaut 42, Larva, MotIv FTP, Net Administrator
22 Shaft
23 Fire HacKer, Tiny Telnet Server - TTS, Truva Atl
25 Ajan, Antigen, Barok, Email Password Sender - EPS, EPS II, Gip, Gris, Happy99, Hpteam mail, Hybris, I love you, Kuang2, Magic Horse, MBT (Mail Bombing Trojan)
31 Agent 31, Hackers Paradise, Masters Paradise
41 Deep Throat, Foreplay
48 DRAT
50 DRAT
58 DMSetup
59 DMSetup
79 CDK, Firehotcker
Question by:kvnsdr
12 Comments
 
LVL 4

Expert Comment

by:vigrid
ID: 12076548
What language do you intend to use the regex in?
 
LVL 1

Author Comment

by:kvnsdr
ID: 12076813
I program in C# .NET... I'm looking for a regular expression, not code.
 
LVL 4

Expert Comment

by:vigrid
ID: 12076910
Code snippet in C#:

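// Needs: using System; using System.IO; using System.Text.RegularExpressions;
Regex rx = new Regex("(?<port>[0-9]+) (?<sentence>.+)");
string file;
using (StreamReader sr = new StreamReader("Input.txt"))   // dispose the reader when done
      file = sr.ReadToEnd();
// One match per line: "port" holds the digits, "sentence" the rest of the line.
foreach (Match m in rx.Matches(file))
      Console.WriteLine("Port: {0}\tSentence: {1}", m.Groups["port"].Value, m.Groups["sentence"].Value);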

Regular expression: "(?<port>[0-9]+) (?<sentence>.+)".

Comments:

The (?<name>...) syntax creates a named capturing group. Here there are two groups, "port" and "sentence", and you access them by name through the Groups property of the Match object.

HTH
 
LVL 1

Author Comment

by:kvnsdr
ID: 12077316
The regex for 'sentence' returns both the port and the sentence. I need to exclude the port.

Regex:  (?<sentence>.+)
Return: 25 AntiGen, Email Password Attacks

Need:   AntiGen, Email Password Attacks
 
LVL 4

Expert Comment

by:vigrid
ID: 12077420
Yes, you're absolutely right. You're just not using it right :). Keep the whole regular expression, and read only the "sentence" group from each match. The first part of the pattern (the "port" group and the space after it) matches the digits and the space, and the second group matches everything that is left up to the end of the line. If you delete the first group, only the second group remains, so it matches the whole line of text, port included. The dot character stands for "any character" (except a newline) in regex.

Regex rx = new Regex("(?<port>[0-9]+) (?<sentence>.+)");
StreamReader sr = new StreamReader("Input.txt");
string file = sr.ReadToEnd();
MatchCollection mc = rx.Matches(file);
foreach(Match m in mc)
     AddToCollection(m.Groups["sentence"]);

Now does it make any more sense?

HTH :)
 
LVL 4

Accepted Solution

by:
vigrid earned 375 total points
ID: 12077481
Oh, you can use the regex:

Regex rx = new Regex("(?<port>[0-9]+)\\s+(?<sentence>.*)");

Two things are changed here: the space character is replaced with "\s+", which tells the regex to match one or more whitespace characters, and the dot is followed by an asterisk rather than a plus. The plus sign stands for "1 or more occurrences of the expression to the left", and the asterisk stands for "0 or more occurrences of the expression to the left".

Please note the double backslash. In a regular C# string literal the backslash starts an escape sequence, so "\\" is needed to pass a single "\" through to the regex engine. A verbatim string (@"...") avoids the doubling.
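
A minimal sketch of this pattern applied to a few lines from the question, assuming the input is already in a string. The names PortLineParser and Parse are hypothetical. RegexOptions.Multiline, the lazy ".*?" and the "\r?$" tail are additions that are not part of the answer above: they tie each match to a single line and keep a Windows carriage return out of the sentence. The pattern is written as a verbatim string, so the backslashes are not doubled.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class PortLineParser
{
    // Verbatim string: \s and \r reach the regex engine without doubling the backslash.
    // Multiline makes ^ and $ match at the start and end of every line.
    private static readonly Regex LineRx = new Regex(
        @"^(?<port>[0-9]+)\s+(?<sentence>.*?)\r?$",
        RegexOptions.Multiline);

    // Returns (port, sentence) pairs, one per match.
    public static List<KeyValuePair<int, string>> Parse(string text)
    {
        var rows = new List<KeyValuePair<int, string>>();
        foreach (Match m in LineRx.Matches(text))
        {
            int port = int.Parse(m.Groups["port"].Value);
            rows.Add(new KeyValuePair<int, string>(port, m.Groups["sentence"].Value));
        }
        return rows;
    }

    public static void Main()
    {
        // Sample lines taken from the question, joined with Windows line endings.
        string sample = "2 Death\r\n20 Senna Spy FTP server\r\n31 Agent 31, Hackers Paradise\r\n";
        foreach (var row in Parse(sample))
            Console.WriteLine("Port: {0}\tSentence: {1}", row.Key, row.Value);
    }
}
```

One caveat: "\s" also matches line breaks, so a line that holds only a port number would pull the following line in as its sentence. Replacing "\s+" with "[ \t]+" is a stricter alternative if that can happen in the input.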
 
LVL 1

Author Comment

by:kvnsdr
ID: 12077837
My Regex manual briefly references a "Positive Lookahead Assertion" with the example below. As I understand it, the pattern preceding the parentheses is searched for, and the pattern within the parentheses must follow it but is not part of the returned result. That's what I'm attempting to do.

"Positive Lookahead-Assertion" example:
...(?=...)

Current Regex that I think should work:
(?<sentence>.+)(?=\d{1,6})

Still something is wrong with this Regex
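
A lookahead checks what comes after the match, but here the port comes before the sentence, so a lookbehind, (?<=...), is the closer fit. The following is a rough sketch rather than something proposed in this thread. It relies on .NET accepting variable-length patterns inside a lookbehind, and the names SentenceOnly and Extract are hypothetical.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class SentenceOnly
{
    // (?<=^\d+[ \t]+) asserts that a line start, digits and blanks come right
    // before the match, without making them part of the match.
    // \S[^\r\n]* then takes the rest of the line, starting at the first non-blank.
    private static readonly Regex SentenceRx = new Regex(
        @"(?<=^\d+[ \t]+)\S[^\r\n]*",
        RegexOptions.Multiline);

    // Returns the sentence part of every line that starts with a port number.
    public static List<string> Extract(string text)
    {
        var sentences = new List<string>();
        foreach (Match m in SentenceRx.Matches(text))
            sentences.Add(m.Value);
        return sentences;
    }

    public static void Main()
    {
        // Sample lines taken from the question.
        string sample = "23 Fire HacKer, Tiny Telnet Server - TTS, Truva Atl\r\n31 Agent 31, Hackers Paradise\r\n";
        foreach (string s in Extract(sample))
            Console.WriteLine(s);
    }
}
```

With this approach the whole match is the sentence, so no group lookup is needed. The named-group version is still the simpler choice when the port is wanted as well.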
 
LVL 7

Expert Comment

by:aib_42
ID: 12077858
This would be a Perl-style regex. Convert and escape it accordingly:

/^(\d{2})\s(.*)$/

Optionally, use
/^(\d{2})\s(.*)\s?$/
to get rid of extra whitespace at the end of "sentence".
 
LVL 7

Expert Comment

by:aib_42
ID: 12077893
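Sorry, replace the second regex with:
/^(\d{2})\s(.*?)\s*$/
The capture has to be lazy (.*?), otherwise the greedy .* swallows the trailing whitespace before \s* gets a chance to match it.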
 
LVL 7

Expert Comment

by:aib_42
ID: 12077945
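Also, (\d{2}) assumes exactly two digits. Use {min,} or {min,max} if the number of digits varies; for "at most max" write {0,max}, because {,max} is not treated as a quantifier by every engine (.NET reads it literally). For one or more digits, use (\d+).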
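
To illustrate the effect of the digit count, here is a small sketch that runs three variants of the pattern against two lines from the question. DigitCountDemo and Port are hypothetical names, and the upper bound of 5 in {1,5} is an assumption (port numbers have at most five digits), not something stated in this thread.

```csharp
using System;
using System.Text.RegularExpressions;

public static class DigitCountDemo
{
    // Returns the text of the first capture group, or an empty string
    // when the line does not match the pattern at all.
    public static string Port(string pattern, string line)
    {
        Match m = Regex.Match(line, pattern);
        return m.Success ? m.Groups[1].Value : string.Empty;
    }

    public static void Main()
    {
        // Both sample lines come from the question's data.
        string[] lines = { "2 Death", "22 Shaft" };
        // {1,5} is an assumed bound; see the note above.
        string[] patterns = { @"^(\d{2})\s(.*)$", @"^(\d{1,5})\s(.*)$", @"^(\d+)\s(.*)$" };

        foreach (string pattern in patterns)
        {
            foreach (string line in lines)
            {
                string port = Port(pattern, line);
                Console.WriteLine(pattern + " on \"" + line + "\": "
                    + (port.Length > 0 ? "port " + port : "no match"));
            }
        }
    }
}
```

The fixed {2} version skips the single-digit line "2 Death" entirely, which is why a range or \d+ is the safer choice for this file.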
 
LVL 1

Author Comment

by:kvnsdr
ID: 12078389
And the correct answer is:

//  (?<sentence>\D*)           = Sentences without numbers    -- Using * or + return same result --
//  (?<sentence>.+)            = Sentences with ANY characters

I will award the 125 points to vigrid because of a good partial answer leading to the correct answer.
 
LVL 4

Expert Comment

by:vigrid
ID: 12078698
Thank you! :)