Solved

Regex Match Sentences?

Posted on 2004-09-16
Last Modified: 2010-08-05
I'm attempting to parse a text file into an SQL table using Regex Match. I have successfully matched just the port. However, I've never matched a complete sentence before. Each sentence follows the port number and is preceded by white space...

Q. How can I Regex Match just the individual sentences?  


2 Death
20 Senna Spy FTP server
21 Back Construction, Blade Runner, Cattivik FTP Server, CC Invader, Dark FTP, Doly Trojan, Fore, Invisible FTP, Juggernaut 42, Larva, MotIv FTP, Net Administrator
22 Shaft
23 Fire HacKer, Tiny Telnet Server - TTS, Truva Atl
25 Ajan, Antigen, Barok, Email Password Sender - EPS, EPS II, Gip, Gris, Happy99, Hpteam mail, Hybris, I love you, Kuang2, Magic Horse, MBT (Mail Bombing Trojan)
31 Agent 31, Hackers Paradise, Masters Paradise
41 Deep Throat, Foreplay
48 DRAT
50 DRAT
58 DMSetup
59 DMSetup
79 CDK, Firehotcker
Question by:kvnsdr
12 Comments
 

Expert Comment

by:vigrid
ID: 12076548
What language do you intend to use the regex in?

Author Comment

by:kvnsdr
ID: 12076813
I program in C# .NET... I'm looking for a regular expression, not code.

Expert Comment

by:vigrid
ID: 12076910
Code snippet in C#:

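// Requires: using System; using System.IO; using System.Text.RegularExpressions;
Regex rx = new Regex("(?<port>[0-9]+) (?<sentence>.+)");
string file;
using (StreamReader sr = new StreamReader("Input.txt"))
      file = sr.ReadToEnd();   // the using block closes the file afterwards
MatchCollection mc = rx.Matches(file);
foreach (Match m in mc)
      Console.WriteLine("Port: {0}\tSentence: {1}", m.Groups["port"].Value, m.Groups["sentence"].Value);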

Regular expression: "(?<port>[0-9]+) (?<sentence>.+)".

Comments:

The (?<name>pattern) syntax creates a named capture group. You can name the groups and then read them however you like. Here there are two groups, "port" and "sentence", and you access them through the Groups property of the Match object.
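As a rough, self-contained sketch of reading both named groups, the following may help. The class name PortListParser, the Parse helper, and the in-memory sample text are made up for illustration; the ^ anchor with RegexOptions.Multiline and the TrimEnd('\r') call are additions that are not part of the snippet above.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class PortListParser
{
    // Same two named groups as in the snippet above. The ^ anchor and
    // RegexOptions.Multiline are assumptions added here so that every
    // match has to begin at the start of a line.
    private static readonly Regex LineRx =
        new Regex(@"^(?<port>[0-9]+) (?<sentence>.+)", RegexOptions.Multiline);

    // Hypothetical helper: returns one (port, sentence) pair per matching line.
    public static List<KeyValuePair<int, string>> Parse(string text)
    {
        var rows = new List<KeyValuePair<int, string>>();
        foreach (Match m in LineRx.Matches(text))
        {
            int port = int.Parse(m.Groups["port"].Value);
            // '.' also matches '\r', so strip a Windows line ending from the capture.
            string sentence = m.Groups["sentence"].Value.TrimEnd('\r');
            rows.Add(new KeyValuePair<int, string>(port, sentence));
        }
        return rows;
    }

    public static void Main()
    {
        string sample = "2 Death\r\n20 Senna Spy FTP server\r\n22 Shaft\r\n";
        foreach (var row in Parse(sample))
            Console.WriteLine("Port: {0}\tSentence: {1}", row.Key, row.Value);
    }
}
```

Each pair holds the port as a number and the sentence without the port, which is the shape an SQL insert would presumably need.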

HTH

Author Comment

by:kvnsdr
ID: 12077316
The Regex for 'sentence' returns the port and the sentence. I need to exclude the port.

Regex:  (?<sentence>.+)
Return: 25 AntiGen, Email Password Attacks

Need:   AntiGen, Email Password Attacks

Expert Comment

by:vigrid
ID: 12077420
Yes, you're absolutely right. You're just not using it right :). Keep the whole regular expression, but read only the "sentence" group from each match. The first group ("port") matches the digits, the literal space after it matches the separator, and the second group matches everything that is left until the end of the line. If you delete the first group, only the second group remains, so it matches the whole line of text. The dot character stands for "any character except a newline" in regex.

Regex rx = new Regex("(?<port>[0-9]+) (?<sentence>.+)");
StreamReader sr = new StreamReader("Input.txt");
string file = sr.ReadToEnd();
MatchCollection mc = rx.Matches(file);
foreach(Match m in mc)
     AddToCollection(m.Groups["sentence"]);

Now does it make any more sense?

HTH :)

Accepted Solution

by:
vigrid earned 125 total points
ID: 12077481
Oh, you can use the regex:

Regex rx = new Regex("(?<port>[0-9]+)\\s+(?<sentence>.*)");

Two things are changed here. The space character is replaced with "\s+", which tells regex to match one or more whitespace characters. And the dot is followed by an asterisk rather than a plus. The plus sign stands for "1 or more occurrences of the expression to the left", and the asterisk stands for "0 or more occurrences of the expression to the left".

Please note the double backslash. In a regular C# string literal the backslash starts an escape sequence, so "\\" is needed to put a single literal "\" into the pattern. Alternatively, use a verbatim string (prefixed with @) and write the backslash once.
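To make the escaping point concrete, here is a small sketch comparing the two ways of writing the same pattern in C#. The class name EscapeDemo and the Sentence helper are hypothetical names used only for this illustration.

```csharp
using System;
using System.Text.RegularExpressions;

public static class EscapeDemo
{
    // Regular literal: "\\s" in the source becomes the two characters \s in the pattern.
    public const string Escaped = "(?<port>[0-9]+)\\s+(?<sentence>.*)";

    // Verbatim literal: backslashes are taken exactly as written.
    public const string Verbatim = @"(?<port>[0-9]+)\s+(?<sentence>.*)";

    // Hypothetical helper: returns the "sentence" group, or null when the line does not match.
    public static string Sentence(string line)
    {
        Match m = Regex.Match(line, Verbatim);
        return m.Success ? m.Groups["sentence"].Value : null;
    }

    public static void Main()
    {
        Console.WriteLine(Escaped == Verbatim);             // True: both literals yield the same pattern
        Console.WriteLine(Sentence("25   Ajan, Antigen"));  // \s+ absorbs all three spaces
        Console.WriteLine(Sentence("48\tDRAT"));            // \s+ also accepts a tab
        Console.WriteLine(Sentence("79 ") == "");           // .* accepts an empty sentence, .+ would not
    }
}
```

The last line shows the practical difference between the asterisk and the plus: a line with a port and nothing after it still matches with .* but would be skipped with .+.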

Author Comment

by:kvnsdr
ID: 12077837
My Regex manual briefly references a "Positive Lookahead-Assertion" with the following example. Meaning: the pattern preceding the parentheses is searched for, and it only matches if the pattern within the parentheses follows it; the text matched inside the parentheses is not part of the returned result. That's what I'm attempting to do.

"Positive Lookahead-Assertion" example:
...(?=...)

Current Regex that I think should work:
(?<sentence>.+)(?=\d{1,6})

Still something is wrong with this Regex
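A lookahead checks the text that comes after the match, whereas here the port comes before the sentence, so the assertion that fits is a positive lookbehind, (?<=...). The .NET regex engine accepts variable-length lookbehind, so a pattern along the following lines returns the sentence as the whole match with no groups involved. This is only a sketch: the class name LookbehindDemo, the helper, and the choice of [ \t]+ as the separator are assumptions, not something taken from the thread.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class LookbehindDemo
{
    // (?<=^\d+[ \t]+) requires "line start, digits, spaces or tabs" immediately
    // before the match without including them in it.
    // \S[^\r\n]* then takes the rest of the line, excluding the line ending.
    private static readonly Regex SentenceRx =
        new Regex(@"(?<=^\d+[ \t]+)\S[^\r\n]*", RegexOptions.Multiline);

    // Hypothetical helper: collects the bare sentences, one per matching line.
    public static List<string> Sentences(string text)
    {
        var result = new List<string>();
        foreach (Match m in SentenceRx.Matches(text))
            result.Add(m.Value);
        return result;
    }

    public static void Main()
    {
        foreach (string s in Sentences("31 Agent 31, Hackers Paradise\r\n48 DRAT\r\n"))
            Console.WriteLine(s);
    }
}
```

Because the lookbehind is anchored with ^, digits inside a sentence (such as the second "31" in the sample) do not start a new match.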

Expert Comment

by:aib_42
ID: 12077858
This would be a Perl-style regex. Convert and escape it accordingly:

/^(\d{2})\s(.*)$/

Optionally, use
/^(\d{2})\s(.*)\s?$/
to get rid of extra whitespace at the end of "sentence".

Expert Comment

by:aib_42
ID: 12077893
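sorry, replace the second regex with the following (the sentence group has to be lazy, otherwise .* swallows the trailing whitespace itself):
/^(\d{2})\s(.*?)\s*$/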

Expert Comment

by:aib_42
ID: 12077945
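Also, (\d{2}) assumes exactly two digits. Use {min,} or {min,max} if you have a minimum or maximum number of digits (for an upper bound only, write {0,max}, since {,max} is not treated as a quantifier by the .NET engine). For min=1 and max=infinity, use (\d+)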
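A quick way to see the effect of the digit quantifier is to try a few of them against sample lines. The sketch below is illustrative only: QuantifierDemo and IsPortLine are made-up names, and the "123456 Example" line is invented test input rather than data from the question.

```csharp
using System;
using System.Text.RegularExpressions;

public static class QuantifierDemo
{
    // Hypothetical helper: builds ^<digits>\s.*$ around the supplied digit pattern
    // and reports whether the line matches it.
    public static bool IsPortLine(string line, string digitsPattern)
    {
        return Regex.IsMatch(line, "^" + digitsPattern + @"\s.*$");
    }

    public static void Main()
    {
        Console.WriteLine(IsPortLine("2 Death", @"\d{2}"));             // False: only one digit
        Console.WriteLine(IsPortLine("2 Death", @"\d+"));               // True: one or more digits
        Console.WriteLine(IsPortLine("79 CDK, Firehotcker", @"\d{2}")); // True: exactly two digits
        Console.WriteLine(IsPortLine("123456 Example", @"\d{1,5}"));    // False: six digits exceed the maximum
    }
}
```

Since the list in the question starts with single-digit ports, \d+ or \d{1,5} looks like the safer choice than \d{2}.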

Author Comment

by:kvnsdr
ID: 12078389
And the correct answer is:

//  (?<sentence>\D*)           = Sentences without numbers    -- Using * or + returns the same result --
//  (?<sentence>.+)            = Sentences with ANY characters
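One caveat worth checking before relying on \D: several names in the list contain digits ("Juggernaut 42", "Happy99"), and with a single Regex.Match call the * and + variants can behave differently. The sketch below only demonstrates that behaviour on one line from the question; NonDigitDemo and First are hypothetical names.

```csharp
using System;
using System.Text.RegularExpressions;

public static class NonDigitDemo
{
    // Hypothetical helper: returns the "sentence" group of the first match only.
    public static string First(string line, string pattern)
    {
        return Regex.Match(line, pattern).Groups["sentence"].Value;
    }

    public static void Main()
    {
        string line = "21 Back Construction, Juggernaut 42, Larva";

        // \D* may match zero characters, so the first match is empty (position 0 is a digit).
        Console.WriteLine("[" + First(line, @"(?<sentence>\D*)") + "]");

        // \D+ skips the port but stops at the next digit and keeps the leading space.
        Console.WriteLine("[" + First(line, @"(?<sentence>\D+)") + "]");

        // Matching the port first and capturing the rest keeps the whole sentence.
        Console.WriteLine("[" + First(line, @"[0-9]+\s+(?<sentence>.*)") + "]");
    }
}
```

If the results are collected with Matches rather than Match, the \D variants would yield several fragments per line for such names, which may or may not matter for the data at hand.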

I will award the 125 points to vigrid because of a good partial answer leading to the correct answer.

Expert Comment

by:vigrid
ID: 12078698
Thank you! :)