• Status: Solved

vb.net - need help parsing text file

I have a text file that I need to parse. It contains medical dictations, and each new dictation is separated by <New Record>. Using Split I can easily separate out the individual letters.

Within each letter (after the <New Record> tag) I have the following tags:

<New Record>
<Jane Nobody>
<12555>
<Birth Date>
<04/13/2011>
<John R. Doe, M.D.>
<Office Visit>
<Correspondence To>
<Correspondence Subject>

After the last tag come the contents of the msg.

I need an efficient way to loop through each tag and pull out the info into separate fields.
The tags are positional: the same information is in the same position each time. If the data is missing, then just the tag name is there (for example, <Birth Date> above is not supplied, so the tag is left in place).

The tag positions are

<Patient Name>
<Med. Record>
<DOB>
<Date of Service>
<Dictating MD>
<Type of dictation (chart note, letter)>
<Correspondence To>
<Correspondence Subject>

So ultimately I need to break the data into separate variables and the msg that follows into its own variable.

I've attached a sample text document.

I'm using vb.net 2010.

sampledoc.txt
Asked by: rutledgj

käµfm³d 👽Commented:
Is there a guarantee of structure in this file? What I mean is that it looks like the patient's name occurs first, then some kind of ID, then a tag of "Birth Date", then the actual birth date's value. Is this always the case? Can any fields be missing? Also, why is it that "Birth Date" appears to have a preceding label, but fields like name and doctor name do not?
 
rutledgj (Author) Commented:
The fields are always in the same location

<Patient Name>
<Med. Record>
<Birth Date>
<Date of Service>
<Dictating MD>
<Type of dictation (chart note, letter)>
<Correspondence To>
<Correspondence Subject>

So in the example I gave the Birth date is missing so just the tag is there. The date showing is actually the Date of Service.

If data is missing in any position then the tag is there. If there is data for that tag then it replaces the tag. The structure is the same for each new record.
 
käµfm³d 👽Commented:
I see. I'd say something along these lines could suffice:

namespace _27379697
{
    class Program
    {
        static void Main(string[] args)
        {
            System.Collections.Generic.List<MedicalRecord> records = new System.Collections.Generic.List<MedicalRecord>();

            using (System.IO.StreamReader reader = new System.IO.StreamReader("sampledoc.txt"))
            {
                const string NEWREC = "<New Record>";

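                // Skip non-new-record lines at beginning of file.
                // The null check stops the loop at end of file if no <New Record> line exists.
                string first;
                while ((first = reader.ReadLine()) != null && first != NEWREC) ;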

                while (!reader.EndOfStream)
                {
                    MedicalRecord curRec = new MedicalRecord();
                    System.Text.StringBuilder message = new System.Text.StringBuilder();
                    string line;

                    curRec.PatientName = reader.ReadLine();
                    curRec.MedRecordID = reader.ReadLine();
                    curRec.BirthDate = reader.ReadLine();
                    curRec.DateOfService = reader.ReadLine();
                    curRec.DictatingMD = reader.ReadLine();
                    curRec.TypeOfDictation = reader.ReadLine();
                    curRec.CorrespondenceTo = reader.ReadLine();
                    curRec.CorrespondenceSubject = reader.ReadLine();

                    while (!reader.EndOfStream && (line = reader.ReadLine()) != NEWREC) message.AppendLine(line);

                    curRec.Message = message.ToString();

                    records.Add(curRec);
                }
            }
        }
    }

    public class MedicalRecord
    {
        public string PatientName { get; set; }
        public string MedRecordID { get; set; }
        public string BirthDate { get; set; }
        public string DateOfService { get; set; }
        public string DictatingMD { get; set; }
        public string TypeOfDictation { get; set; }
        public string CorrespondenceTo { get; set; }
        public string CorrespondenceSubject { get; set; }
        public string Message { get; set; }
     }
}



It's wholly dependent on your file being structured correctly. There is no error handling and I haven't done any special formatting to the values. Let me know if any parts are confusing  = )
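Since the values are stored as read, each one still carries its angle brackets, and an empty slot still holds its placeholder tag. A minimal VB.NET sketch of one way to clean that up is shown here. The CleanField helper and the Placeholders array are hypothetical names, not part of the code above, and the placeholder strings are assumed to match the tag list given in the question exactly.

```vbnet
Imports System

Module FieldCleanup
    ' Placeholder tag names in positional order, copied from the question.
    ' Assumption: an empty slot contains exactly this text inside < >.
    Private ReadOnly Placeholders As String() = {
        "Patient Name", "Med. Record", "Birth Date", "Date of Service",
        "Dictating MD", "Type of dictation (chart note, letter)",
        "Correspondence To", "Correspondence Subject"
    }

    ' Hypothetical helper: strips the surrounding angle brackets and returns
    ' Nothing when the slot still holds its placeholder (no data supplied).
    Function CleanField(raw As String, position As Integer) As String
        If raw Is Nothing Then Return Nothing

        Dim value As String = raw.Trim()
        If value.StartsWith("<") AndAlso value.EndsWith(">") Then
            value = value.Substring(1, value.Length - 2).Trim()
        End If

        If String.Equals(value, Placeholders(position), StringComparison.OrdinalIgnoreCase) Then
            Return Nothing
        End If
        Return value
    End Function

    Sub Main()
        Console.WriteLine(CleanField("<Jane Nobody>", 0))           ' prints Jane Nobody
        Console.WriteLine(CleanField("<Birth Date>", 2) Is Nothing) ' prints True
    End Sub
End Module
```

Each field assignment could then wrap its ReadLine call, for example CleanField(reader.ReadLine(), 2) for the birth date slot. Parsing the dates into Date values is left out, because the thread does not confirm the date format beyond the single sample.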
 
käµfm³d 👽Commented:
Hmmm...   I guess C# code doesn't do you much good in VB, huh?  Sorry for that. Corrected below:

Namespace _27379697
    Class Program
        Private Shared Sub Main(args As String())
            Dim records As New System.Collections.Generic.List(Of MedicalRecord)()

            Using reader As New System.IO.StreamReader("sampledoc.txt")
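                Const NEWREC As String = "<New Record>"

                ' Skip non-new-record lines at beginning of file.
                ' The EndOfStream check prevents an endless loop when the file has no <New Record> line.
                While Not reader.EndOfStream AndAlso reader.ReadLine() <> NEWREC
                End While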

                While Not reader.EndOfStream
                    Dim curRec As New MedicalRecord()
                    Dim message As New System.Text.StringBuilder()
                    Dim line As String = Nothing ' initialized because it is passed ByRef below

                    curRec.PatientName = reader.ReadLine()
                    curRec.MedRecordID = reader.ReadLine()
                    curRec.BirthDate = reader.ReadLine()
                    curRec.DateOfService = reader.ReadLine()
                    curRec.DictatingMD = reader.ReadLine()
                    curRec.TypeOfDictation = reader.ReadLine()
                    curRec.CorrespondenceTo = reader.ReadLine()
                    curRec.CorrespondenceSubject = reader.ReadLine()

                    While Not reader.EndOfStream AndAlso (InlineAssignHelper(line, reader.ReadLine())) <> NEWREC
                        message.AppendLine(line)
                    End While

                    curRec.Message = message.ToString()

                    records.Add(curRec)
                End While
            End Using
        End Sub
        Private Shared Function InlineAssignHelper(Of T)(ByRef target As T, value As T) As T
            target = value
            Return value
        End Function
    End Class

    Public Class MedicalRecord
        Public Property PatientName() As String
            Get
                Return m_PatientName
            End Get
            Set
                m_PatientName = Value
            End Set
        End Property
        Private m_PatientName As String
        Public Property MedRecordID() As String
            Get
                Return m_MedRecordID
            End Get
            Set
                m_MedRecordID = Value
            End Set
        End Property
        Private m_MedRecordID As String
        Public Property BirthDate() As String
            Get
                Return m_BirthDate
            End Get
            Set
                m_BirthDate = Value
            End Set
        End Property
        Private m_BirthDate As String
        Public Property DateOfService() As String
            Get
                Return m_DateOfService
            End Get
            Set
                m_DateOfService = Value
            End Set
        End Property
        Private m_DateOfService As String
        Public Property DictatingMD() As String
            Get
                Return m_DictatingMD
            End Get
            Set
                m_DictatingMD = Value
            End Set
        End Property
        Private m_DictatingMD As String
        Public Property TypeOfDictation() As String
            Get
                Return m_TypeOfDictation
            End Get
            Set
                m_TypeOfDictation = Value
            End Set
        End Property
        Private m_TypeOfDictation As String
        Public Property CorrespondenceTo() As String
            Get
                Return m_CorrespondenceTo
            End Get
            Set
                m_CorrespondenceTo = Value
            End Set
        End Property
        Private m_CorrespondenceTo As String
        Public Property CorrespondenceSubject() As String
            Get
                Return m_CorrespondenceSubject
            End Get
            Set
                m_CorrespondenceSubject = Value
            End Set
        End Property
        Private m_CorrespondenceSubject As String
        Public Property Message() As String
            Get
                Return m_Message
            End Get
            Set
                m_Message = Value
            End Set
        End Property
        Private m_Message As String
    End Class
End Namespace



Feel free to use the automatic properties in VB.NET 2010. I used an online converter, and it fully expanded the property definitions  = )
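For reference, here is a sketch of how the MedicalRecord class above might look with VB.NET 2010 auto-implemented properties. The property names are the same as in the expanded version, and the compiler generates the hidden backing fields, so the rest of the code should not need to change.

```vbnet
' Same shape as the expanded class above, using auto-implemented
' properties (available from VB.NET 2010 onward).
Public Class MedicalRecord
    Public Property PatientName As String
    Public Property MedRecordID As String
    Public Property BirthDate As String
    Public Property DateOfService As String
    Public Property DictatingMD As String
    Public Property TypeOfDictation As String
    Public Property CorrespondenceTo As String
    Public Property CorrespondenceSubject As String
    Public Property Message As String
End Class
```

The expanded form is only needed if a getter or setter has to do extra work, such as validation.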