File parsing with Regular Expression

Hi,
I am using Visual Studio 2005 with C#.
I want to parse a file and extract some data using regular expressions.
I have added: using System.Text.RegularExpressions;
Please help me parse the attached file with a regular expression based on a pattern.
I want to extract exactly the cert number, name, and company.

Currently I am using
String pattern = "cert:(?<CertNo>.{9})\\s\\S* full name is\\s+(?<Name>.{12})\\s+company=(?<Company>.{11})";
as the pattern, but it parses only the first line, not the second, because the content of the second line differs and does not match the pattern. Please help me write a pattern that parses both lines.


Let me know if you have any questions.

Thanks in advance
Ganesh Dutt Upadhyay
SampleRegexFile2.txt
gdupadhyay Asked:
 
ddrudik Commented:
Try this revision:
using System;
using System.IO;
using System.Text.RegularExpressions;
namespace myapp
{
	class Class1
	{
		static void Main(string[] args)
		{
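			// Read the whole file into a string; the using block closes the reader afterwards.
			string strFileText;
			using (StreamReader sr = new StreamReader(@"C:\UploadData\SampleRegexFile2.txt"))
			{
				strFileText = sr.ReadToEnd();
			}
			// .*? skips any extra text before "Policy holder"; [^\r\n]* keeps the trailing \r out of Company.
			Regex re = new Regex(@"cert:(?<CertNo>\d+).*?Policy holder full name is (?<Name>.*?)\s*company= *(?<Company>[^\r\n]*)");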
			MatchCollection mc = re.Matches(strFileText);
			int mIdx=0;
			foreach (Match m in mc)
			{
				for (int gIdx = 0; gIdx < m.Groups.Count; gIdx++)
				{
					Console.WriteLine("[" + mIdx + "][" + re.GetGroupNames()[gIdx] + "] = " + m.Groups[gIdx].Value);
				}
				mIdx++;
			}
		}
	}
}
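
To try the revised pattern without the file, here is a minimal sketch that runs it against the sample text quoted in this thread and prints the records in the Cert/Name/Company layout the question asks for. The class name CertParserSketch and the helper FormatRecords are made up for illustration, and using [^\r\n]* for Company is an assumption on my part so that the carriage return at the end of each line is not captured.

```csharp
using System;
using System.Text;
using System.Text.RegularExpressions;

public static class CertParserSketch
{
    // Hypothetical helper: returns one "Cert/Name/Company" block per matched record.
    public static string FormatRecords(string text)
    {
        // .*? lazily skips whatever sits between the cert number and "Policy holder".
        // [^\r\n]* stops Company at the end of the line.
        Regex re = new Regex(
            @"cert:(?<CertNo>\d+).*?Policy holder full name is (?<Name>.*?)\s*company= *(?<Company>[^\r\n]*)");

        StringBuilder sb = new StringBuilder();
        foreach (Match m in re.Matches(text))
        {
            // Named groups can be read by name instead of by index.
            sb.Append("Cert:").Append(m.Groups["CertNo"].Value).Append('\n');
            sb.Append("Name:").Append(m.Groups["Name"].Value).Append('\n');
            sb.Append("Company:").Append(m.Groups["Company"].Value).Append('\n');
            sb.Append('\n');
        }
        return sb.ToString();
    }

    public static void Main()
    {
        // Sample text as quoted in the thread; real input would come from the file.
        string text =
            "cert:123456789  Policy holder full name is Smith, John    \r\ncompany=XYZ Company\r\n" +
            "cert:99999999 some extra info here Policy holder full name is Thomas P. Johnson \r\ncompany= ACME\r\n";

        Console.Write(FormatRecords(text));
    }
}
```

Because Name is lazy and is followed by \s*, the trailing spaces and the line break before "company=" are left out of the captured name.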

ddrudik Commented:

Raw Match Pattern:
(?<=cert:)(?<CertNo>\d+)  Policy holder full name is (?<Name>.*?)\s*
company=(?<Company>.*)
 
C#.NET Code Example:
using System;
using System.Text.RegularExpressions;
namespace myapp
{
  class Class1
    {
      static void Main(string[] args)
        {
          String sourcestring = "source string to match with pattern";
          // \s* already consumes the line break before "company=", so the pattern fits on one line.
          Regex re = new Regex(@"(?<=cert:)(?<CertNo>\d+)  Policy holder full name is (?<Name>.*?)\s*company=(?<Company>.*)");
          MatchCollection mc = re.Matches(sourcestring);
          int mIdx=0;
          foreach (Match m in mc)
           {
            for (int gIdx = 0; gIdx < m.Groups.Count; gIdx++)
              {
                Console.WriteLine("[" + mIdx + "][" + re.GetGroupNames()[gIdx] + "] = " + m.Groups[gIdx].Value);
              }
            mIdx++;
          }
        }
    }
}
 
$matches Array:
(
    [0] => Array
        (
            [0] => 123456789  Policy holder full name is Smith, John    
company=XYZ Company
        )
 
    [CertNo] => Array
        (
            [0] => 123456789
        )
 
    [1] => Array
        (
            [0] => 123456789
        )
 
    [Name] => Array
        (
            [0] => Smith, John
        )
 
    [2] => Array
        (
            [0] => Smith, John
        )
 
    [Company] => Array
        (
            [0] => XYZ Company
        )
 
    [3] => Array
        (
            [0] => XYZ Company
        )
 
)

gdupadhyay (Author) Commented:
Hi ddrudik,
Thanks for your response.
I ran your code on my local system, and it shows mc.Count = 0.

Please read this:
I am reading a file (already attached), and its contents look like this:

strFileText="cert:123456789  Policy holder full name is Smith, John    \r\ncompany=XYZ Company\r\ncert:99999999 some extra info here Policy holder full name is Thomas P. Johnson \r\ncompany= ACME\r\n"

When I run this code with:
MatchCollection mc = re.Matches(strFileText);

mc.Count is zero in this case.
I am putting my code here:
*************************************************************************************************************
// strFile is Name of .txt file
string strFilePath = "UploadData" + "\\" + strFile;
StreamReader sr = new StreamReader(strFilePath);
strFileText = sr.ReadToEnd();
Regex re = new Regex(@"?<=cert:(?<CertNo>\\d+)Policy holder full name is(?<Name>.*?)\\s*company=(?<Company>.*)");
MatchCollection mc = re.Matches(strFileText);
*************************************************************************************************************
My required output should be:
Cert:123456789
Name:Smith, John
Company:XYZ Company

Cert:99999999
Name:Thomas P. Johnson
Company:ACME

Thanks
Ganesh
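
A note on the escaping in the pattern posted above, offered as a hedged sketch rather than a diagnosis of the exact code that was run: inside a verbatim string (@"..."), a backslash is not an escape character, so @"\\d" hands the regex engine a literal backslash followed by the letter d instead of "any digit". The class name EscapingSketch and the helper MatchesCert are made up for illustration. Separately, a lookbehind needs its own parentheses, as in (?<=cert:); a pattern that starts with a bare ?<= is not a valid lookbehind.

```csharp
using System;
using System.Text.RegularExpressions;

public static class EscapingSketch
{
    // Hypothetical helper: tests a pattern against a fixed sample input.
    public static bool MatchesCert(string pattern)
    {
        return Regex.IsMatch("cert:123456789", pattern);
    }

    public static void Main()
    {
        // Regular string: "\\d" is the two characters \d, which the regex reads as "any digit".
        Console.WriteLine(MatchesCert("cert:\\d+"));   // True

        // Verbatim string: @"\d" is already \d, so a single backslash is enough.
        Console.WriteLine(MatchesCert(@"cert:\d+"));   // True

        // Verbatim string with a doubled backslash: the regex sees \\d, meaning a
        // literal backslash followed by the letter d, so the digits do not match.
        Console.WriteLine(MatchesCert(@"cert:\\d+"));  // False
    }
}
```

In short, double the backslashes in a regular string, or use a verbatim string with single backslashes, but not both at once.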

ddrudik Commented:
You may want to show your code. When I load your file from the provided URL, an online regex tester finds your matches:
http://www.myregextester.com/?r=213
 
gdupadhyay (Author) Commented:
But I am getting zero records locally. The online tool also shows only one matched record. I want both records. As I explained before, I have to parse this file.

My code is (only 4 lines):
string strFilePath = "";
StreamReader sr = new StreamReader("C:\\UploadData\\SampleRegexFile2.txt");
strFileText = sr.ReadToEnd();    // Reading the file and store the data in a string variable
Regex re = new Regex(@"(?<=cert:)(?<CertNo>\d+) Policy holder full name is (?<Name>.*?)\s*company=(?<Company>.*)");
MatchCollection mc = re.Matches(strFileText);

Here, on line 4, mc.Count shows zero records.

Please look at my file again. I want a single expression that parses both lines and returns both records.
Both records contain some extra words, and the extra words differ.
The first record is:
cert:123456789  Policy holder full name is Smith, John  company=XYZ Company

and the second one is:
cert:99999999 some extra info here Policy holder full name is Thomas P. Johnson company= ACME

In both records, some extra information comes before the words "Policy holder full name is ".

Now I want to parse both types of records with one expression.
The expression may look like this:
Cert:xxxxxxxxx + ignore all the words before "full name is" + full name is XXXX XXXX XX company=XXXXXXX

I don't know much about this kind of regular expression, one that ignores the words before a particular phrase.

Please help me.

Thanks
Ganesh
 
gdupadhyay (Author) Commented:
Hi ddrudik,
Thanks for understanding my problem. It's really helpful.

I need your help with one more thing. I want to learn more about regex. Do you have any good resources for me?

Thank you very much.
Ganesh
 
ddrudik Commented:
If you visit that regex tester, it has four link buttons in the upper right. Those are the resources I use for regex: three online resources and one book.
 
ddrudik Commented:
Thanks for the question and the points.
Question has a verified solution.
