rutledgj


vb.net - need help parsing text file

I have a text file that I need to parse. It contains medical dictations, and each new dictation is separated by <New Record>. Using Split I can easily separate out the individual letters.

Within each letter (after the <New Record> tag) I have the following tags

<New Record>
<Jane Nobody>
<12555>
<Birth Date>
<04/13/2011>
<John R. Doe, M.D.>
<Office Visit>
<Correspondence To>
<Correspondence Subject>

After the last tag come the contents of the message.

I need an efficient way to loop through each tag and pull out the info into separate fields.
The tags are positional: the same information will be in the same position each time. If the data is missing, then just the tag name will be there (for example, <Birth Date> above is not supplied, so the tag is left in place).

The tag positions are

<Patient Name>
<Med. Record>
<DOB>
<Date of Service>
<Dictating MD>
<Type of dictation (chart note, letter)>
<Correspondence To>
<Correspondence Subject>

So ultimately I need to break the data into separate variables and the msg that follows into its own variable.

I've attached a sample text document.

I'm using vb.net 2010.

sampledoc.txt
kaufmed

Is there a guarantee of structure in this file? What I mean is that it looks like the patient's name occurs first, then some kind of ID, then a tag of "Birth Date", then the actual birth date's value. Is this always the case? Can any fields be missing? Also, why is it that "Birth Date" appears to have a preceding label, but fields like name and doctor name do not?
rutledgj

ASKER

The fields are always in the same location

<Patient Name>
<Med. Record>
<Birth Date>
<Date of Service>
<Dictating MD>
<Type of dictation (chart note, letter)>
<Correspondence To>
<Correspondence Subject>

So in the example I gave, the birth date is missing, so just the tag is there. The date shown is actually the Date of Service.

If data is missing in any position, then the tag is there. If there is data for that tag, then it replaces the tag. The structure is the same for each new record.
I see. I'd say something along these lines could suffice:

using System.Collections.Generic;
using System.IO;
using System.Text;

namespace _27379697
{
    class Program
    {
        static void Main(string[] args)
        {
            List<MedicalRecord> records = new List<MedicalRecord>();

            using (StreamReader reader = new StreamReader("sampledoc.txt"))
            {
                const string NEWREC = "<New Record>";
                string line;

                // Skip non-new-record lines at beginning of file.
                // ReadLine returns null at end of file, so also stop there;
                // otherwise a file with no marker would loop forever.
                while ((line = reader.ReadLine()) != null && line != NEWREC) { }

                while (!reader.EndOfStream)
                {
                    MedicalRecord curRec = new MedicalRecord();
                    StringBuilder message = new StringBuilder();

                    // The eight header lines are positional. A missing value
                    // is left as its placeholder tag (e.g. "<Birth Date>").
                    curRec.PatientName = reader.ReadLine();
                    curRec.MedRecordID = reader.ReadLine();
                    curRec.BirthDate = reader.ReadLine();
                    curRec.DateOfService = reader.ReadLine();
                    curRec.DictatingMD = reader.ReadLine();
                    curRec.TypeOfDictation = reader.ReadLine();
                    curRec.CorrespondenceTo = reader.ReadLine();
                    curRec.CorrespondenceSubject = reader.ReadLine();

                    // Everything up to the next marker (or end of file) is the message.
                    while (!reader.EndOfStream && (line = reader.ReadLine()) != NEWREC) message.AppendLine(line);

                    curRec.Message = message.ToString();

                    records.Add(curRec);
                }
            }
        }
    }

    // One dictation: eight positional header fields plus the message body.
    public class MedicalRecord
    {
        public string PatientName { get; set; }
        public string MedRecordID { get; set; }
        public string BirthDate { get; set; }
        public string DateOfService { get; set; }
        public string DictatingMD { get; set; }
        public string TypeOfDictation { get; set; }
        public string CorrespondenceTo { get; set; }
        public string CorrespondenceSubject { get; set; }
        public string Message { get; set; }
    }
}


It's wholly dependent on your file being structured correctly. There is no error handling and I haven't done any special formatting to the values. Let me know if any parts are confusing  = )
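On the "special formatting" point, here is a rough sketch of how the raw header lines could be cleaned up. It is only an illustration, not part of the code above: FieldCleaner, Clean and ParseDate are made-up names, the placeholder strings passed in are taken from the tag lists earlier in the thread, and the MM/dd/yyyy date format is an assumption based on the single sample value <04/13/2011>. The idea is to strip the surrounding angle brackets and to treat a line that is just the placeholder tag for its position as "no data" (null).

```csharp
using System;
using System.Globalization;

// Hypothetical helper for illustration; the names are not from the original answer.
public static class FieldCleaner
{
    // Strips one pair of surrounding angle brackets and returns null when the
    // line is only the placeholder tag for that position (no data supplied).
    public static string Clean(string raw, params string[] placeholders)
    {
        if (raw == null) return null;

        string value = raw.Trim();
        if (value.Length >= 2 && value[0] == '<' && value[value.Length - 1] == '>')
        {
            value = value.Substring(1, value.Length - 2).Trim();
        }

        foreach (string placeholder in placeholders)
        {
            if (string.Equals(value, placeholder, StringComparison.OrdinalIgnoreCase))
            {
                return null;
            }
        }

        return value.Length == 0 ? null : value;
    }

    // Parses an already-cleaned date field. Assumes MM/dd/yyyy, as in the
    // sample header; returns null if the value is missing or does not match.
    public static DateTime? ParseDate(string cleaned)
    {
        DateTime parsed;
        if (cleaned != null &&
            DateTime.TryParseExact(cleaned, "MM/dd/yyyy", CultureInfo.InvariantCulture,
                                   DateTimeStyles.None, out parsed))
        {
            return parsed;
        }
        return null;
    }
}
```

In the reading loop this might be used as curRec.BirthDate = FieldCleaner.Clean(reader.ReadLine(), "Birth Date", "DOB"); so that an unsupplied birth date ends up as null rather than the literal tag text. If the real file uses other placeholder spellings or date formats, the arguments would need adjusting.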
ASKER CERTIFIED SOLUTION
kaufmed