File parsing with Regular Expression

Hi,
I am using Visual Studio 2005 with C#.
I want to parse a file and extract some data using regular expressions.
I am using System.Text.RegularExpressions.
Please help me parse the attached file with a regular expression based on a pattern.
I need exactly the cert number, name, and company from each record.

Currently I am using
String pattern = "cert:(?<CertNo>.{9})\\s\\S* full name is\\s+(?<Name>.{12})\\s+company=(?<Company>.{11})";
as the pattern, but it parses only the first line, not the second, because the contents of the second line differ and do not match the pattern. Please help me write a pattern that parses both lines.


Let me know if you have any questions.

Thanks in Advance
Ganesh Dutt Upadhyay
SampleRegexFile2.txt
 
ddrudik commented:

Raw Match Pattern:
(?<=cert:)(?<CertNo>\d+)  Policy holder full name is (?<Name>.*?)\s*
company=(?<Company>.*)
 
C#.NET Code Example:
using System;
using System.Text.RegularExpressions;
namespace myapp
{
  class Class1
    {
      static void Main(string[] args)
        {
          String sourcestring = "source string to match with pattern";
          // \s* already spans the line break before "company=", so the pattern fits on one line.
          Regex re = new Regex(@"(?<=cert:)(?<CertNo>\d+)  Policy holder full name is (?<Name>.*?)\s*company=(?<Company>.*)");
          MatchCollection mc = re.Matches(sourcestring);
          int mIdx=0;
          foreach (Match m in mc)
           {
            // Group 0 is the whole match; the named groups follow in order.
            for (int gIdx = 0; gIdx < m.Groups.Count; gIdx++)
              {
                Console.WriteLine("[" + mIdx + "][" + re.GetGroupNames()[gIdx] + "] = " + m.Groups[gIdx].Value);
              }
            mIdx++;
          }
        }
    }
}
 
$matches Array:
(
    [0] => Array
        (
            [0] => 123456789  Policy holder full name is Smith, John    
company=XYZ Company
        )
 
    [CertNo] => Array
        (
            [0] => 123456789
        )
 
    [1] => Array
        (
            [0] => 123456789
        )
 
    [Name] => Array
        (
            [0] => Smith, John
        )
 
    [2] => Array
        (
            [0] => Smith, John
        )
 
    [Company] => Array
        (
            [0] => XYZ Company
        )
 
    [3] => Array
        (
            [0] => XYZ Company
        )
 
)

gdupadhyay (author) commented:
Hi ddrudik,
Thanks for your response.
I ran your code on my local system, and it shows mc.Count = 0.

Please read this:
I am reading a file (already attached), and its contents look like this:

strFileText = "cert:123456789  Policy holder full name is Smith, John    \r\ncompany=XYZ Company\r\ncert:99999999 some extra info here Policy holder full name is Thomas P. Johnson \r\ncompany= ACME\r\n"

When I run this code with:
MatchCollection mc = re.Matches(strFileText);

mc.Count is zero.
I am putting my code here:
*************************************************************************************************************
// strFile is Name of .txt file
string strFilePath = "UploadData" + "\\" + strFile;
StreamReader sr = new StreamReader(strFilePath);
strFileText = sr.ReadToEnd();
Regex re = new Regex(@"?<=cert:(?<CertNo>\\d+)Policy holder full name is(?<Name>.*?)\\s*company=(?<Company>.*)");
MatchCollection mc = re.Matches(strFileText);
*************************************************************************************************************
My required output should be:
Cert:123456789
Name:Smith, John
Company:XYZ Company

Cert:99999999
Name:Thomas P. Johnson
Company:ACME

Thanks
Ganesh
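
One detail in the snippet above that may contribute to the empty result is the doubled backslash inside the verbatim string. In a C# verbatim string (`@"..."`), `\\d` reaches the regex engine as two backslashes, which it reads as a literal backslash followed by the letter `d`, not as a digit class. The following is a minimal sketch of that difference only; the class name `VerbatimEscapeDemo` and its helper are hypothetical, and the input is shortened from the sample record.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical demo class; not part of the original thread.
public static class VerbatimEscapeDemo
{
    // Returns true when the pattern finds at least one match in the input.
    public static bool Matches(string pattern, string input)
    {
        return Regex.IsMatch(input, pattern);
    }

    public static void Main()
    {
        string input = "cert:123456789";

        // Verbatim string, doubled backslash: the engine sees \\d+,
        // a literal backslash followed by one or more 'd' characters.
        Console.WriteLine(Matches(@"cert:\\d+", input));   // False

        // Verbatim string, single backslash: the engine sees \d+ (digits).
        Console.WriteLine(Matches(@"cert:\d+", input));    // True

        // Regular string: "\\d+" is also delivered to the engine as \d+.
        Console.WriteLine(Matches("cert:\\d+", input));    // True
    }
}
```

In short, use either a verbatim string with single backslashes or a regular string with doubled ones, but not both at once.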

 
ddrudik commented:
You may want to show your code. Loading your file from the provided URL, your matches are found with an online regex tester:
http://www.myregextester.com/?r=213
 
gdupadhyay (author) commented:
But I am getting zero records locally. The online tool also shows only one matched record, and I want both. As I explained before, I have to parse this file.

My code is (only a few lines):
string strFilePath = "";
StreamReader sr = new StreamReader("C:\\UploadData\\SampleRegexFile2.txt");
strFileText = sr.ReadToEnd();    // Read the file and store the data in a string variable
Regex re = new Regex(@"(?<=cert:)(?<CertNo>\d+) Policy holder full name is (?<Name>.*?)\s*company=(?<Company>.*)");
MatchCollection mc = re.Matches(strFileText);

After the last line, mc.Count shows zero records.

Please look at my file again. I want a single expression that parses both lines and returns both records.
Both records contain some extra, different words.
The first record is:
cert:123456789  Policy holder full name is Smith, John  company=XYZ Company

and the second one is:
cert:99999999 some extra info here Policy holder full name is Thomas P. Johnson company= ACME

In both records, some extra text comes before the words "Policy holder full name is ".

Now I want to parse both types of records with one expression.
The expression may be like this:
Cert:xxxxxxxxx + ignore all the words before "full name is" + full name is XXXX XXXX XX company=XXXXXXX

I don't know much about this kind of regular expression, one that ignores the words before particular words.

Please help me.

Thanks
Ganesh
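
A likely reason the snippet in this post finds nothing is spacing: its pattern has a single literal space after `\d+`, while the first sample record has two spaces after the cert number, and a literal space in a pattern matches exactly one space. The sketch below, with a hypothetical `SpacingDemo` class and the sample text taken from the earlier post, compares that pattern with a variant that uses `\s+` instead. It assumes the file really contains the text as quoted.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical demo class; not part of the original thread.
public static class SpacingDemo
{
    // Sample text as quoted earlier in the thread (Windows line endings).
    public const string SampleText =
        "cert:123456789  Policy holder full name is Smith, John    \r\ncompany=XYZ Company\r\n" +
        "cert:99999999 some extra info here Policy holder full name is Thomas P. Johnson \r\ncompany= ACME\r\n";

    // Pattern from the post: exactly one space after the cert number.
    public const string OneSpace =
        @"(?<=cert:)(?<CertNo>\d+) Policy holder full name is (?<Name>.*?)\s*company=(?<Company>.*)";

    // Same pattern, but \s+ accepts one or more whitespace characters.
    public const string AnySpace =
        @"(?<=cert:)(?<CertNo>\d+)\s+Policy holder full name is (?<Name>.*?)\s*company=(?<Company>.*)";

    public static int CountMatches(string pattern, string text)
    {
        return Regex.Matches(text, pattern).Count;
    }

    public static void Main()
    {
        Console.WriteLine(CountMatches(OneSpace, SampleText));  // 0
        Console.WriteLine(CountMatches(AnySpace, SampleText));  // 1
    }
}
```

The `\s+` variant still finds only the first record, because the second record has extra words between the cert number and "Policy holder", which is the remaining part of the problem described in this post.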








 
ddrudik commented:
Try this revision:
using System;
using System.IO;
using System.Text.RegularExpressions;
namespace myapp
{
	class Class1
	{
		static void Main(string[] args)
		{
			// Read the whole file into one string; the using block closes the reader.
			String strFileText;
			using (StreamReader sr = new StreamReader(@"C:\UploadData\SampleRegexFile2.txt"))
			{
				strFileText = sr.ReadToEnd();
			}
			// .*? skips any extra text between the cert number and "Policy holder".
			// [^\r\n]* ends the company at the line break, so no trailing \r is captured
			// (in .NET, "." excludes only \n).
			Regex re = new Regex(@"cert:(?<CertNo>\d+).*?Policy holder full name is (?<Name>.*?)\s*company= *(?<Company>[^\r\n]*)");
			MatchCollection mc = re.Matches(strFileText);
			int mIdx=0;
			foreach (Match m in mc)
			{
				// Group 0 is the whole match; the named groups follow in order.
				for (int gIdx = 0; gIdx < m.Groups.Count; gIdx++)
				{
					Console.WriteLine("[" + mIdx + "][" + re.GetGroupNames()[gIdx] + "] = " + m.Groups[gIdx].Value);
				}
				mIdx++;
			}
		}
	}
}
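
To get the "Cert: / Name: / Company:" layout requested earlier in the thread, the named groups can be read by name rather than by index. This is only a sketch under assumptions: the `RecordFormatter` class and its `Format` method are hypothetical, the sample text is the one quoted earlier, and the company group is written as `[^\r\n]*` so that a carriage return from a Windows line ending is not captured.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Hypothetical helper; not part of the original thread.
public static class RecordFormatter
{
    private static readonly Regex RecordPattern = new Regex(
        @"cert:(?<CertNo>\d+).*?Policy holder full name is (?<Name>.*?)\s*company= *(?<Company>[^\r\n]*)");

    // Returns three output lines per record found in the text.
    public static List<string> Format(string text)
    {
        List<string> lines = new List<string>();
        foreach (Match m in RecordPattern.Matches(text))
        {
            // Named groups are looked up by the names used in the pattern.
            lines.Add("Cert:" + m.Groups["CertNo"].Value);
            lines.Add("Name:" + m.Groups["Name"].Value);
            lines.Add("Company:" + m.Groups["Company"].Value.Trim());
        }
        return lines;
    }

    public static void Main()
    {
        string text =
            "cert:123456789  Policy holder full name is Smith, John    \r\ncompany=XYZ Company\r\n" +
            "cert:99999999 some extra info here Policy holder full name is Thomas P. Johnson \r\ncompany= ACME\r\n";

        foreach (string line in Format(text))
        {
            Console.WriteLine(line);
        }
    }
}
```

With the sample text this prints Cert:123456789, Name:Smith, John, Company:XYZ Company, then Cert:99999999, Name:Thomas P. Johnson, Company:ACME, one item per line.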

 
gdupadhyay (author) commented:
Hi ddrudik,
Thanks for understanding my problem. It's really helpful.

I need your help with one more thing. I want to learn more about regex. Do you have some good resources for me?

Thank you very much.
Ganesh
 
ddrudik commented:
If you visit that regex tester, it has four link buttons in the upper right. Those are the resources I use for regex: three online resources and one book.
 
ddrudik commented:
Thanks for the question and the points.