Solved

Parsing a text file using C#

Posted on 2009-07-13
Last Modified: 2013-12-17

Hello group,

I have a text file and need to parse it using C#. What is the best way to parse it?


 Name : ABC  DEF                      Applicant ID:

 Date: 6/7/2009                        Test Form: A23

 Applied: Yes                            Code: 000001

 Number:163                            Score: 230

 
Question by:akohan
 
LVL 3

Accepted Solution

by:
_Gerry_ earned 500 total points
ID: 24846016
There's more than one way to do this.
Line by line using something like

    using System.IO;
    ...
    using (var reader= new StreamReader("myfile.txt"))
    {
         var myline=string.Empty;
         while ((myline=reader.ReadLine()) != null)
         {
               // do something with the line....
         }
    }

or read the whole file into memory and work with it there:

    using System.IO;
    ...
    var thewholefile = File.ReadAllText("myfile.txt");

Then use string.Split() etc. to break up the text afterwards.

I think I read somewhere that the first method is actually faster to execute; it certainly avoids holding the whole file in memory at once.
What you do to the lines to parse them depends entirely on what you are expecting to find in the file :-)
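
As a rough sketch of the "split it afterwards" step: the helper below takes text that has already been read into memory and breaks it into lines. The class and method names (TextLines, SplitIntoLines) are made up for illustration and are not from the original post.

```csharp
using System;

static class TextLines
{
    // Splits text (for example the result of File.ReadAllText) into lines.
    // Completely empty lines are dropped; lines that contain only spaces are kept.
    public static string[] SplitIntoLines(string text)
    {
        // Accept both Windows (\r\n) and Unix (\n) line endings.
        return text.Split(new[] { "\r\n", "\n" }, StringSplitOptions.RemoveEmptyEntries);
    }
}
```

Each returned line can then be handed to whatever per-line parsing you settle on.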
 
 

Author Comment

by:akohan
ID: 24846908

Hi  Gerry,

Thanks, yes, that is what I have done too. However, I had to do some cleanup for lines like:

 Date: 6/7/2009                        Test Form: A23

I had to find the position of "Date:" and then extract the text after the ":", and do the same for "Test Form:" etc., using the IndexOf() and Substring() methods.

Any idea if I'm on the right track?

Thanks.


 
LVL 3

Expert Comment

by:_Gerry_
ID: 24847201
That will do nicely and should work fine.
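
For reference, a minimal sketch of that IndexOf()/Substring() approach might look like the following. The names LabelParser and ValueAfter are hypothetical, and the sketch assumes you know the label that follows each value.

```csharp
using System;

static class LabelParser
{
    // Returns the text that follows `label` in `line`, up to `nextLabel`
    // (or up to the end of the line when nextLabel is null or not found).
    // Returns null when `label` itself is not present.
    public static string ValueAfter(string line, string label, string nextLabel)
    {
        int start = line.IndexOf(label, StringComparison.Ordinal);
        if (start < 0) return null;
        start += label.Length;

        int end = nextLabel == null
            ? -1
            : line.IndexOf(nextLabel, start, StringComparison.Ordinal);
        if (end < 0) end = line.Length;

        // Trim removes the padding between the value and the next label.
        return line.Substring(start, end - start).Trim();
    }
}
```

With the sample line, ValueAfter(line, "Date:", "Test Form:") gives the date and ValueAfter(line, "Test Form:", null) gives the form. Note that a short label can also match inside a longer one, so search for the full label text.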
 
...or you could get clever with .Split() and LINQ   :)

        // Needs: using System; using System.Linq;
        static void Main(string[] args)
        {
            string text = " Date: 6/7/2009                        Test Form: A23";
            Console.WriteLine("Original text: '{0}'", text);

            // Split on spaces, colons and tabs, then drop the empty pieces.
            // Note: a two-word label such as "Test Form" comes out as two words.
            var words = from w in text.Split(' ', ':', '\t') where w != string.Empty select w;

            Console.WriteLine("{0} words in text:", words.Count());
            foreach (string s in words)
            {
                Console.WriteLine(s);
            }
            Console.ReadLine();
        }

 

Author Comment

by:akohan
ID: 24847310


Thanks for your comment. I don't know anything about Linq since I'm new to C# but will check it out for sure.

I will get back to you.

Thanks!
 
LVL 23

Expert Comment

by:Tiggerito
ID: 24849466
For LINQ you need .NET Framework 3.5 (C# 3.0).

Another way is to use regular expressions (Regex, System.Text.RegularExpressions).

What's important is that you define the syntax rules. Here's a sequential set of rules that may make sense:

Any character except ":" = first field name
":"
Any character except tab (\t) = first value
"\t"
Any character except ":" = second field name
":"
All other characters

This could be defined in regex like the following. It also allows for spaces around each field and names the four capture groups: 1=name1, 2=value1, 3=name2, 4=value2



 *(?<name1>[^:]*) *: *(?<value1>[^\t]*) *\t *(?<name2>[^:]*) *: *(?<value2>.*) *
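
To show how the pattern above could be used from C#, here is a small sketch. The wrapper names (TwoFieldLine, Parse) are invented for the example, and it assumes the two fields on a line really are separated by a tab; a line padded only with spaces will not match.

```csharp
using System;
using System.Text.RegularExpressions;

static class TwoFieldLine
{
    // The pattern from the post, unchanged.
    private static readonly Regex Pattern = new Regex(
        @" *(?<name1>[^:]*) *: *(?<value1>[^\t]*) *\t *(?<name2>[^:]*) *: *(?<value2>.*) *");

    // Returns { name1, value1, name2, value2 }, or null when the line does not match.
    public static string[] Parse(string line)
    {
        Match m = Pattern.Match(line);
        if (!m.Success) return null;

        // The greedy groups can keep trailing spaces, so trim each capture.
        return new[]
        {
            m.Groups["name1"].Value.Trim(),
            m.Groups["value1"].Value.Trim(),
            m.Groups["name2"].Value.Trim(),
            m.Groups["value2"].Value.Trim()
        };
    }
}
```

The named groups are read with m.Groups["name1"] and so on, which is easier to follow than numeric indexes.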

 

Author Comment

by:akohan
ID: 24854962

Hello,

Thanks for your comments. I have attached the format I am receiving (after converting a specific file to text). Is the above method still good for it, or should I change my approach?

In the following example I need to extract:
William Smith
5/2/2008
55 (which is the exam score)
1025409804

Once again thanks.

Regards,



Header file

Name of visitor: William Smith Signin Date: 5/2/2008 Position Applied For: Driver Number Correct (exam score): 55 Percentile Total (%total): 78 Median Score for Position: 52 Applicant ID: 1025409804 Test Form: ZipCode 1x20558 Job Code: 000001 Age Adjusted Score: 52 Equiv: 117 Suggested Hiring Range: 19 - 44

 

Author Comment

by:akohan
ID: 24854973

I just found out I'm using (based on the Help dialog) .NET 3.5 SP1,

so I guess I can use LINQ, right?

 
LVL 3

Expert Comment

by:_Gerry_
ID: 24855770
Yup, or regular expressions, or plain old IndexOf/Substring... that's why programming is so fun.
Your attached example is a bit tricky: there is nothing but a space separating the value "William Smith" from the next label, "Signin Date".
So the code needs to know the full label names in advance. That makes a generic regular expression (otherwise an excellent approach) very tricky, but it is easy with IndexOf/Substring, and perhaps with the rather terse LINQ example I posted earlier.

In the interest of the KISS principle (except I'm sure you're not stupid :-) maybe IndexOf/Substring is the best approach after all!
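
One way that could look, as a sketch only: keep the known labels in an array and cut each value out between its label and the next one. RecordParser and its members are hypothetical names, the labels are copied from the sample record, and the code assumes every label is present and in this order.

```csharp
using System;
using System.Collections.Generic;

static class RecordParser
{
    // Field labels in the order they appear in the sample record.
    public static readonly string[] Labels =
    {
        "Name of visitor:", "Signin Date:", "Position Applied For:",
        "Number Correct (exam score):", "Percentile Total (%total):",
        "Median Score for Position:", "Applicant ID:", "Test Form:",
        "Job Code:", "Age Adjusted Score:", "Equiv:", "Suggested Hiring Range:"
    };

    // Maps each label to the trimmed text between it and the next label.
    public static Dictionary<string, string> Parse(string record)
    {
        var fields = new Dictionary<string, string>();
        int searchFrom = 0;
        for (int i = 0; i < Labels.Length; i++)
        {
            int labelPos = record.IndexOf(Labels[i], searchFrom, StringComparison.Ordinal);
            if (labelPos < 0) continue; // label not found: skip this field
            int valueStart = labelPos + Labels[i].Length;

            // The value ends where the next label starts, or at the end of the
            // record for the last label (or if the next label is missing).
            int valueEnd = record.Length;
            if (i + 1 < Labels.Length)
            {
                int next = record.IndexOf(Labels[i + 1], valueStart, StringComparison.Ordinal);
                if (next >= 0) valueEnd = next;
            }

            fields[Labels[i]] = record.Substring(valueStart, valueEnd - valueStart).Trim();
            searchFrom = valueStart;
        }
        return fields;
    }
}
```

After parsing, fields["Name of visitor:"] holds the name and fields["Number Correct (exam score):"] holds the exam score.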
 
LVL 23

Expert Comment

by:Tiggerito
ID: 24857667
Will the field headings always be the same?

Name of visitor
Signin Date
Position Applied For
Number Correct (exam score)
Percentile Total (%total)
Median Score for Position
Applicant ID
Test Form
Job Code
Age Adjusted Score
Equiv
Suggested Hiring Range

This would make life easier.

I don't have time now. I'll try and put a Regex script together later.
 

Author Comment

by:akohan
ID: 24862727

Yes, it will always be like that.

Thanks.
 
LVL 23

Expert Comment

by:Tiggerito
ID: 24867876
I've just noticed your latest example is different from the original example.

Will all the fields ALWAYS be present and in the same order?
Will a single entry cover multiple lines?
Are you talking about one entry per file?
Is "Header file" part of the data to parse?

 

Author Comment

by:akohan
ID: 24897072

Hi Tiggerito,

Yes, please go by the last example, since that is the format I'm generating.

 
LVL 23

Expert Comment

by:Tiggerito
ID: 24902641
Here's a simple Regex to gather the data, provided it is entered exactly as you stated.

It is basically a copy of the example you provided, with some alterations:

Any regex-sensitive characters have been escaped; that is, ( and ) were changed to \( and \).

The values have been replaced by capture sequences of the following form:

(?<fieldvalue>.*)

In each case 'fieldvalue' is changed to the name of the field. This says: capture any number of characters into a group called 'fieldvalue'.

Name of visitor: (?<name>.*) Signin Date: (?<date>.*) Position Applied For: (?<position>.*) Number Correct \(exam score\): (?<score>.*) Percentile Total \(%total\): (?<total>.*) Median Score for Position: (?<median>.*) Applicant ID: (?<id>.*) Test Form: (?<form>.*) Job Code: (?<job>.*) Age Adjusted Score: (?<agescore>.*) Equiv: (?<equiv>.*) Suggested Hiring Range: (?<range>.*)
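
A short sketch of running that expression from C# could look like this. The pattern is the one above, only split across several string literals for readability; the wrapper names (RecordRegex, Parse) are made up for the example.

```csharp
using System;
using System.Text.RegularExpressions;

static class RecordRegex
{
    // Same pattern as in the post; the pieces are concatenated into one string.
    private static readonly Regex Pattern = new Regex(
        @"Name of visitor: (?<name>.*) Signin Date: (?<date>.*) " +
        @"Position Applied For: (?<position>.*) " +
        @"Number Correct \(exam score\): (?<score>.*) " +
        @"Percentile Total \(%total\): (?<total>.*) " +
        @"Median Score for Position: (?<median>.*) " +
        @"Applicant ID: (?<id>.*) Test Form: (?<form>.*) " +
        @"Job Code: (?<job>.*) Age Adjusted Score: (?<agescore>.*) " +
        @"Equiv: (?<equiv>.*) Suggested Hiring Range: (?<range>.*)");

    // Returns the Match; check Success before reading any group.
    public static Match Parse(string record)
    {
        return Pattern.Match(record);
    }
}
```

If the match succeeds, each value is available by group name, for example m.Groups["name"].Value or m.Groups["score"].Value. If any label is missing or spelled differently, Success is false and no values are captured.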
