Parsing a text file using C#


Hello group,

I have a text file and need to parse it using C#. What is the best way to parse it?


 Name : ABC  DEF                      Applicant ID:

 Date: 6/7/2009                        Test Form: A23

 Applied: Yes                            Code: 000001

 Number:163                            Score: 230

 
Asked by akohan.

_Gerry_ commented:
There's more than one way to do this.
Line by line, using something like:

    using System.IO;
    ...
    using (var reader = new StreamReader("myfile.txt"))
    {
        string line;
        // ReadLine() returns null once the end of the file is reached
        while ((line = reader.ReadLine()) != null)
        {
            // do something with the line...
        }
    }

or read the whole file into memory and work with it there:

    using System.IO;
    ...
    var theWholeFile = File.ReadAllText("myfile.txt");

and use string.Split() etc. to break up the file afterwards.
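A minimal sketch of the "read it all, then split" idea, assuming the file uses \r\n or \n line endings. The contents are passed in as a string (in real use they would come from File.ReadAllText), and the class and method names are only illustrative:

```csharp
using System;
using System.Linq;

public static class WholeFileSplitter
{
    // Splits the whole file contents into lines, trims each line and
    // drops the blank lines that separate the entries in the sample file.
    // In real use: NonBlankLines(File.ReadAllText("myfile.txt"))
    public static string[] NonBlankLines(string contents)
    {
        return contents
            .Split(new[] { "\r\n", "\n" }, StringSplitOptions.None)
            .Select(line => line.Trim())
            .Where(line => line.Length > 0)
            .ToArray();
    }
}
```

Each remaining line can then be parsed on its own, exactly as in the line-by-line version.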


I think I read somewhere that the first method is actually faster to execute; at the very least it uses less memory on large files, since only one line is held at a time.
What you do to the lines to parse them depends entirely on what you expect to find in the file :-)
 
akohan (Author) commented:

Hi Gerry,

Thanks, yes, that is what I have done too. However, I had to do some cleanup for lines like:

 Date: 6/7/2009                        Test Form: A23

I had to find the position of "Date:" and then extract the text after the ":", and do the same for "Test Form:" etc., using the IndexOf() and Substring() methods.

Am I on the right track?

Thanks.


_Gerry_ commented:
That will do nicely and should work fine.
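For reference, a minimal sketch of that IndexOf/Substring approach. The helper name ValueAfter is hypothetical, and it assumes each label appears at most once on a line:

```csharp
using System;

public static class LabelParser
{
    // Returns the trimmed text between `label` and `nextLabel` on the line,
    // or up to the end of the line when nextLabel is null (or not found).
    // Returns null if `label` itself does not occur in the line.
    public static string ValueAfter(string line, string label, string nextLabel)
    {
        int start = line.IndexOf(label, StringComparison.Ordinal);
        if (start < 0)
            return null;
        start += label.Length;

        int end = line.Length;
        if (nextLabel != null)
        {
            int next = line.IndexOf(nextLabel, start, StringComparison.Ordinal);
            if (next >= 0)
                end = next;
        }
        return line.Substring(start, end - start).Trim();
    }
}
```

On the "Date:" line, ValueAfter(line, "Date:", "Test Form:") gives "6/7/2009" and ValueAfter(line, "Test Form:", null) gives "A23".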
 
...or you could get clever with .Split() and LINQ :)

        using System;
        using System.Linq;
        ...
        static void Main(string[] args)
        {
            string text = " Date: 6/7/2009                        Test Form: A23";
            Console.WriteLine("Original text: '{0}'", text);

            // Split on spaces, colons and tabs, then drop the empty strings
            // produced wherever two separators sit next to each other.
            var words = from w in text.Split(' ', ':', '\t')
                        where w != string.Empty
                        select w;

            Console.WriteLine("{0} words in text:", words.Count());

            foreach (string s in words)
            {
                Console.WriteLine(s);
            }
            Console.ReadLine();
        }
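A possible variation (not the code above) that keeps each label and value together: split the line on runs of three or more whitespace characters with Regex.Split, split each field on its first ':', and collect the pairs with LINQ's ToDictionary. The three-space threshold is an assumption based on the sample file, where a value such as "ABC  DEF" contains only two consecutive spaces; the class name is made up:

```csharp
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

public static class LinqLineParser
{
    // Assumption: fields are separated by 3+ whitespace characters,
    // and no value contains such a run.
    private static readonly Regex FieldSeparator = new Regex(@"\s{3,}");

    // Returns label -> value for one line. ToDictionary throws if the
    // same label appears twice on the line.
    public static Dictionary<string, string> ParseLine(string line)
    {
        return FieldSeparator.Split(line.Trim())
            .Where(field => field.Contains(":"))
            .Select(field => field.Split(new[] { ':' }, 2)) // first ':' only
            .ToDictionary(parts => parts[0].Trim(), parts => parts[1].Trim());
    }
}
```

Splitting on the first colon only means a value that itself contains ':' is kept intact.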


akohan (Author) commented:

Thanks for your comment. I don't know anything about LINQ yet since I'm new to C#, but I will check it out for sure.

I will get back to you.

Thanks!
Tony McCreath (Technical SEO Consultant) commented:
For LINQ you need C# 3.0, which comes with .NET Framework 3.5.

Another way is to use regular expressions (the Regex class in System.Text.RegularExpressions).

What's important is that you define the syntax rules. Here's a sequential set of rules that may make sense:

Any character except ":" = first field name
":"
Any character except tab (\t) = first value
"\t"
Any character except ":" = second field name
":"
All other characters

This could be defined as a regex like the following. It also strips the spaces around each name and value, and names the capture groups name1, value1, name2 and value2:

    ^ *(?<name1>[^:]*?) *: *(?<value1>[^\t]*?) *\t *(?<name2>[^:]*?) *: *(?<value2>.*?) *$
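A sketch of how such a pattern might be used from C# via Regex.Match and named groups. It uses lazy quantifiers (*?) and ^...$ anchors so that leading and trailing spaces fall outside the capture groups, and, like the rules above, it assumes a tab separates the two fields. The class name TabSeparatedFields is made up for the example:

```csharp
using System.Text.RegularExpressions;

public static class TabSeparatedFields
{
    // name1: value1 <tab> name2: value2, with surrounding spaces left outside
    // the groups thanks to the lazy quantifiers and the ^...$ anchors.
    private static readonly Regex TwoFields = new Regex(
        @"^ *(?<name1>[^:]*?) *: *(?<value1>[^\t]*?) *\t *(?<name2>[^:]*?) *: *(?<value2>.*?) *$");

    // Returns { name1, value1, name2, value2 }, or null if the line
    // does not match (for example, when there is no tab).
    public static string[] Parse(string line)
    {
        Match m = TwoFields.Match(line);
        if (!m.Success)
            return null;
        return new[]
        {
            m.Groups["name1"].Value, m.Groups["value1"].Value,
            m.Groups["name2"].Value, m.Groups["value2"].Value
        };
    }
}
```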

akohan (Author) commented:

Hello,

Thanks for your comments. I have attached the format I am receiving (after converting a specific file to text). Is the above method still good for it or should I change my approach?

In the following example I need to extract:
William Smith
5/2/2008
55 (which is the exam score)
1025409804

Once again thanks.

Regards,


Header file
Name of visitor: William Smith Signin Date: 5/2/2008 Position Applied For: Driver Number Correct (exam score): 55 Percentile Total (%total): 78 Median Score for Position: 52 Applicant ID: 1025409804 Test Form: ZipCode 1x20558 Job Code: 000001 Age Adjusted Score: 52 Equiv: 117 Suggested Hiring Range: 19 - 44

akohan (Author) commented:

I just found out (from the Help dialog) that I'm using .NET 3.5 SP1,

so I guess I can use LINQ, right?

_Gerry_ commented:
Yup, or regular expressions, or plain old IndexOf/Substring... that's why programming is so fun.
Your attached example is a bit tricky: the hard part is separating "William Smith" from "Signin" etc.
The code needs to know the full label names, which makes it tricky for an otherwise excellent regular expression approach but easy for IndexOf/Substring, and perhaps for the rather terse LINQ example I posted earlier.

In the interest of KISS (though I'm sure you're not stupid :-)), maybe IndexOf/Substring is the best approach after all!
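A minimal sketch of the IndexOf/Substring approach for the single-line format, assuming every label is present exactly once and in the order shown in the attached example (the label list is copied from that line; the class name is hypothetical):

```csharp
using System;
using System.Collections.Generic;

public static class HeaderLineParser
{
    // Field labels in the order they appear on the line, copied from the
    // example posted in this thread.
    private static readonly string[] Labels =
    {
        "Name of visitor:", "Signin Date:", "Position Applied For:",
        "Number Correct (exam score):", "Percentile Total (%total):",
        "Median Score for Position:", "Applicant ID:", "Test Form:",
        "Job Code:", "Age Adjusted Score:", "Equiv:", "Suggested Hiring Range:"
    };

    // Maps each label (without its colon) to the trimmed text between it
    // and the next label. Throws FormatException if a label is missing.
    public static Dictionary<string, string> Parse(string line)
    {
        var fields = new Dictionary<string, string>();
        int pos = 0;
        for (int i = 0; i < Labels.Length; i++)
        {
            int start = line.IndexOf(Labels[i], pos, StringComparison.Ordinal);
            if (start < 0)
                throw new FormatException("Missing label: " + Labels[i]);
            start += Labels[i].Length;

            // The value runs up to the next label, or to the end of the line.
            int end = line.Length;
            if (i + 1 < Labels.Length)
            {
                end = line.IndexOf(Labels[i + 1], start, StringComparison.Ordinal);
                if (end < 0)
                    throw new FormatException("Missing label: " + Labels[i + 1]);
            }

            fields[Labels[i].TrimEnd(':')] = line.Substring(start, end - start).Trim();
            pos = end;
        }
        return fields;
    }
}
```

Searching for each label from where the previous value ended keeps labels that share words (such as "Position Applied For:" and "Median Score for Position:") from being confused.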
Tony McCreath (Technical SEO Consultant) commented:
Will the field headings always be the same?

Name of visitor
Signin Date
Position Applied For
Number Correct (exam score)
Percentile Total (%total)
Median Score for Position
Applicant ID
Test Form
Job Code
Age Adjusted Score
Equiv
Suggested Hiring Range

This would make life easier.

I don't have time now, but I'll try to put a regex together later.
akohan (Author) commented:

Yes, it will always be like that.

Thanks.
Tony McCreath (Technical SEO Consultant) commented:
I've just noticed your latest example is different to the original example.

Will all the fields ALWAYS be present and in the same order?
Will a single entry cover multiple lines?
Are you talking about one entry per file?
Is "Header file" part of the data to parse?

akohan (Author) commented:

Hi Tiggerito,

Yes, please go by the latest example, since that is the format I'm generating.

Tony McCreath (Technical SEO Consultant) commented:
Here's a simple regex to gather the data, provided it is laid out exactly as you stated.

It is basically a copy of the example you provided with some alterations:

Any regex-sensitive characters have been escaped; that is, ( and ) were changed to \( and \).

The values have been replaced by the following capture sequences:

(?<fieldvalue>.*)

In each case 'fieldvalue' is changed to the name of the field. The group says: capture any number of characters into a group called 'fieldvalue'.

Name of visitor: (?<name>.*) Signin Date: (?<date>.*) Position Applied For: (?<position>.*) Number Correct \(exam score\): (?<score>.*) Percentile Total \(%total\): (?<total>.*) Median Score for Position: (?<median>.*) Applicant ID: (?<id>.*) Test Form: (?<form>.*) Job Code: (?<job>.*) Age Adjusted Score: (?<agescore>.*) Equiv: (?<equiv>.*) Suggested Hiring Range: (?<range>.*)
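A sketch of applying that pattern in C#. The pattern is split across several verbatim strings only for readability, and Regex.GetGroupNames() is used to copy every named group into a dictionary; the class and method names are illustrative:

```csharp
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class HeaderRegexParser
{
    // The pattern from above, unchanged, concatenated from verbatim strings.
    private static readonly Regex Header = new Regex(
        @"Name of visitor: (?<name>.*) Signin Date: (?<date>.*) " +
        @"Position Applied For: (?<position>.*) " +
        @"Number Correct \(exam score\): (?<score>.*) " +
        @"Percentile Total \(%total\): (?<total>.*) " +
        @"Median Score for Position: (?<median>.*) " +
        @"Applicant ID: (?<id>.*) Test Form: (?<form>.*) " +
        @"Job Code: (?<job>.*) Age Adjusted Score: (?<agescore>.*) " +
        @"Equiv: (?<equiv>.*) Suggested Hiring Range: (?<range>.*)");

    // Returns group name -> captured value, or null if the line doesn't match.
    public static Dictionary<string, string> Parse(string line)
    {
        Match m = Header.Match(line);
        if (!m.Success)
            return null;

        var fields = new Dictionary<string, string>();
        foreach (string name in Header.GetGroupNames())
        {
            if (name != "0") // group "0" is the whole match
                fields[name] = m.Groups[name].Value;
        }
        return fields;
    }
}
```

For the sample line, fields["name"] should be "William Smith", fields["date"] "5/2/2008", fields["score"] "55" and fields["id"] "1025409804".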
