Solved

File parsing with Regular Expression

Posted on 2008-06-26
Last Modified: 2013-12-16
Hi,
I am using Visual Studio 2005 with C#.
I want to parse a file and extract some data using a regular expression.
I have added: using System.Text.RegularExpressions;
Please help me parse the attached file with a regular expression based on a pattern.
I need exactly the cert number, name, and company from each record.

Currently I am using
String pattern = "cert:(?<CertNo>.{9})\\s\\S* full name is\\s+(?<Name>.{12})\\s+company=(?<Company>.{11})";
as the pattern, but it parses only the first line and not the second, because the content of the second line differs and does not match the pattern. Please help me write a pattern that parses both lines.


Let me know if you have any questions.

Thanks in Advance
Ganesh Dutt Upadhyay
SampleRegexFile2.txt
Question by:gdupadhyay
8 Comments
 
LVL 27

Expert Comment

by:ddrudik
ID: 21877842

Raw Match Pattern:

(?<=cert:)(?<CertNo>\d+)  Policy holder full name is (?<Name>.*?)\s*company=(?<Company>.*)
 

C#.NET Code Example:

using System;
using System.Text.RegularExpressions;

namespace myapp
{
  class Class1
  {
    static void Main(string[] args)
    {
      String sourcestring = "source string to match with pattern";
      // Note the two spaces before "Policy": they must match the input exactly.
      Regex re = new Regex(@"(?<=cert:)(?<CertNo>\d+)  Policy holder full name is (?<Name>.*?)\s*company=(?<Company>.*)");
      MatchCollection mc = re.Matches(sourcestring);
      int mIdx = 0;
      foreach (Match m in mc)
      {
        // Print every group (group 0 is the whole match) together with its name.
        for (int gIdx = 0; gIdx < m.Groups.Count; gIdx++)
        {
          Console.WriteLine("[" + mIdx + "][" + re.GetGroupNames()[gIdx] + "] = " + m.Groups[gIdx].Value);
        }
        mIdx++;
      }
    }
  }
}
 

$matches Array:
(
    [0] => Array
        (
            [0] => 123456789  Policy holder full name is Smith, John    
company=XYZ Company
        )

    [CertNo] => Array
        (
            [0] => 123456789
        )

    [1] => Array
        (
            [0] => 123456789
        )

    [Name] => Array
        (
            [0] => Smith, John
        )

    [2] => Array
        (
            [0] => Smith, John
        )

    [Company] => Array
        (
            [0] => XYZ Company
        )

    [3] => Array
        (
            [0] => XYZ Company
        )
)

 
LVL 9

Author Comment

by:gdupadhyay
ID: 21878233
Hi ddrudik,
Thanks for your response.
I ran your code on my local system, and it shows mc.Count = 0.

Please read this:
I am reading a file (already attached), and its contents look like this:

strFileText = "cert:123456789  Policy holder full name is Smith, John    \r\ncompany=XYZ Company\r\ncert:99999999 some extra info here Policy holder full name is Thomas P. Johnson \r\ncompany= ACME\r\n"

When I run this code with:
MatchCollection mc = re.Matches(strFileText);

mc.Count is zero in this case.
Here is my code:
*************************************************************************************************************
// strFile is Name of .txt file
string strFilePath = "UploadData" + "\\" + strFile;
StreamReader sr = new StreamReader(strFilePath);
strFileText = sr.ReadToEnd();
Regex re = new Regex(@"?<=cert:(?<CertNo>\\d+)Policy holder full name is(?<Name>.*?)\\s*company=(?<Company>.*)");
MatchCollection mc = re.Matches(strFileText);
*************************************************************************************************************
My required output should be:
Cert:123456789
Name:Smith, John
Company:XYZ Company

Cert:99999999
Name:Thomas P. Johnson
Company:ACME

Thanks
Ganesh

 
LVL 27

Expert Comment

by:ddrudik
ID: 21878397
You may want to show your code. Loading your file from the provided URL, the online regex tester finds your matches:
http://www.myregextester.com/?r=213
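
One thing worth checking in the code posted above: the pattern there is written as @"...\\d+...\\s*..." inside a verbatim (@) string. In a verbatim string a backslash is not an escape character, so \\d reaches the regex engine as "a literal backslash followed by the letter d" rather than "a digit". That pattern also lost the parentheses around ?<=cert: and the spaces around the literal text. The sketch below only illustrates the escaping difference; the class and method names (EscapeDemo, MatchesDigits) are made up for the example and are not from the original code.

```csharp
using System;
using System.Text.RegularExpressions;

public static class EscapeDemo
{
    // Hypothetical helper: tests a pattern against a fixed sample string.
    public static bool MatchesDigits(string pattern)
    {
        return Regex.IsMatch("cert:123456789", pattern);
    }

    public static void Main()
    {
        // Verbatim string, single backslash: the regex sees \d+ (one or more digits).
        Console.WriteLine(MatchesDigits(@"cert:\d+"));   // True

        // Regular string, doubled backslash: the compiler turns \\ into \,
        // so the regex again sees \d+.
        Console.WriteLine(MatchesDigits("cert:\\d+"));   // True

        // Verbatim string, doubled backslash: the regex sees \\d+,
        // i.e. a literal backslash followed by one or more 'd' characters.
        Console.WriteLine(MatchesDigits(@"cert:\\d+"));  // False
    }
}
```

In short: use either @"\d" or "\\d", but not @"\\d".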
 
LVL 9

Author Comment

by:gdupadhyay
ID: 21878680
But I am getting zero records locally. The online tool also shows only one matched record. I want both records. As I explained before, I have to parse this file.

My code is (only 4 lines):
string strFilePath = "";
StreamReader sr = new StreamReader("C:\\UploadData\\SampleRegexFile2.txt");
strFileText = sr.ReadToEnd();    // Read the file and store the data in a string variable
Regex re = new Regex(@"(?<=cert:)(?<CertNo>\d+) Policy holder full name is (?<Name>.*?)\s*company=(?<Company>.*)");
MatchCollection mc = re.Matches(strFileText);

Here in line 4, mc.Count shows zero records.

Please look at my file again. I want a single expression that parses both lines and returns both records.
Both records contain some extra words that differ between them.
For example, the first record is:
cert:123456789  Policy holder full name is Smith, John  company=XYZ Company

and the second one is:
cert:99999999 some extra info here Policy holder full name is Thomas P. Johnson company= ACME

In both records, some extra info comes before the words "Policy holder full name is ".

Now I want to parse both types of records with one expression.
The expression might look like this:
Cert:xxxxxxxxx + ignore all the words before "full name is" + full name is XXXX XXXX XX company=XXXXXXX

I don't know much about this kind of regular expression, one that ignores the words before a particular phrase.

Please help me.

Thanks
Ganesh









 
LVL 27

Accepted Solution

by:
ddrudik earned 500 total points
ID: 21878851
Try this revision:
using System;
using System.IO;
using System.Text.RegularExpressions;

namespace myapp
{
	class Class1
	{
		static void Main(string[] args)
		{
			StreamReader sr = new StreamReader(@"C:\UploadData\SampleRegexFile2.txt");
			String strFileText = sr.ReadToEnd();    // Read the file and store the data in a string variable
			Regex re = new Regex(@"cert:(?<CertNo>\d+).*?Policy holder full name is (?<Name>.*?)\s*company= *(?<Company>.*)");
			MatchCollection mc = re.Matches(strFileText);
			int mIdx=0;
			foreach (Match m in mc)
			{
				for (int gIdx = 0; gIdx < m.Groups.Count; gIdx++)
				{
					Console.WriteLine("[" + mIdx + "][" + re.GetGroupNames()[gIdx] + "] = " + m.Groups[gIdx].Value);
				}
				mIdx++;
			}
		}
	}
}
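
The key change in this revision is the lazy .*? after the cert number: it skips whatever sits between the digits and "Policy holder full name is", whether that is two spaces or "some extra info here". The " *" after company= likewise absorbs the optional space before the company name. As a rough sketch of how the named groups could be turned into the Cert/Name/Company output requested earlier in the thread, something like the following might work. The class and method names (CertParser, Parse) are invented for this example, and the sample text is the string quoted earlier rather than the real file. One assumption to note: with no RegexOptions set, '.' matches '\r', so on a file with \r\n line endings the Company group ends with a stray '\r', which is why the values are trimmed.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class CertParser
{
    // Same pattern as in the revision above. No RegexOptions are set,
    // so '.' matches any character except '\n'.
    private static readonly Regex RecordPattern = new Regex(
        @"cert:(?<CertNo>\d+).*?Policy holder full name is (?<Name>.*?)\s*company= *(?<Company>.*)");

    // Hypothetical helper: returns one "Cert/Name/Company" block per match.
    public static List<string> Parse(string text)
    {
        List<string> records = new List<string>();
        foreach (Match m in RecordPattern.Matches(text))
        {
            // Trim() removes the '\r' that '.*' captures before '\n' in CRLF text.
            records.Add(
                "Cert:" + m.Groups["CertNo"].Value + "\n" +
                "Name:" + m.Groups["Name"].Value.Trim() + "\n" +
                "Company:" + m.Groups["Company"].Value.Trim());
        }
        return records;
    }

    public static void Main()
    {
        // Sample text copied from the question, standing in for the file contents.
        string sample =
            "cert:123456789  Policy holder full name is Smith, John    \r\n" +
            "company=XYZ Company\r\n" +
            "cert:99999999 some extra info here Policy holder full name is Thomas P. Johnson \r\n" +
            "company= ACME\r\n";

        foreach (string record in Parse(sample))
        {
            Console.WriteLine(record);
            Console.WriteLine();
        }
    }
}
```

Accessing groups by name (m.Groups["Name"]) avoids depending on the numeric order of the groups, which is convenient if the pattern is later extended.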

 
LVL 9

Author Comment

by:gdupadhyay
ID: 21879041
Hi ddrudik,
Thanks for understanding my problem. It's really helpful.

I need one more piece of help from you. I want to learn more about regex. Do you have some good material for me?

Thank you very much.
Ganesh
 
LVL 27

Expert Comment

by:ddrudik
ID: 21879055
If you visit that regex tester, it has four link buttons in the upper right. Those are the resources I use for regex: three online resources and one book.
 
LVL 27

Expert Comment

by:ddrudik
ID: 21879057
Thanks for the question and the points.