[Okta Webinar] Learn how to a build a cloud-first strategyRegister Now

x
?
Solved

Split a complex file based on multi-char delimiters

Posted on 2011-05-11
9
Medium Priority
?
274 Views
Last Modified: 2012-05-11
I have a large (potentially up to 1Mb) text file which I need to split before translating segments and writing results to Excel.

The first problem is that the file is one long line - ie there are no CrLf characters.

The second problem is that the data segments are delimited by multiple characters such as LIN, ATT, DTM.

The third problem is that the file also contains a header and trailer sections which are in similar formats.

In the attached sample file, I want to obtain the following result (pipes shown to delimit cells):

JONES:M:A:MR|AB112233B|1ZA700001003642|20110504|20110504|2012|BR|N
JAMES:B::MR|BC223344B||20110504|20110504|2012|D0'ATT+ZZZ++2:4:IR|N
JOHNSON:K:E:MS|CD334455B|ZA700001004591|20110504|20110504|2012|0T|Y
JACKSON:C:K:MR|DE445566B|1ZA709991003627|20110504|20110504|2012|BR|N

As you can see, there is an added complication that not all values are guaranteed to be present, but information needs to be created whether a piece of information is present or not.

I would prefer to use VB6, as I already have the ftp code required to retrieve the file in that, but it could be in VB.NET 2008.

(For those interested, the format is EDIFACT as specified by the UK Inland Revenue.)

 sample.txt
0
Comment
Question by:christinaemmm
  • 5
  • 3
9 Comments
 
LVL 19

Expert Comment

by:Bardobrave
ID: 35736831
Well, your main problem here is that your string code is potentially too large for a single string variable to store it.

So here you should loop through the file splitting it. Probably the best option could be to rewrite it to a text file where you insert line feeds where necessary and treat the rest of the data segments delimiters.

What you need to do is to create a variable to store the potentially largest delimiter and read as many chars from file as to per this delimiter (i.e. in your example you'll read 3 chars as the delimiters you put are all 3 chars long).

Then you see if the variable contains any value that fits to a delimiter, if so, you insert into your destiny file the equivalent value and empty the variable, if not, you pop the first character from your variable, insert it into the destiny file and push a new character into the variable readed from source file.

Loop through all source file and you'll get a clean translate of the file wich you'll be able to manage easily.
0
 
LVL 3

Expert Comment

by:Avinash Desai
ID: 35736888
Hi

Please try with the below code which splits the stings with more than one defined delimiter

    Private Sub MultipleSplitFunction(ByVal Value As String)
        Dim splitArray() As String
        Dim Saparator(3) As String
        Saparator(0) = "TMT"
        Saparator(1) = "VSM"
        Saparator(2) = "PPT"
        Saparator(3) = "XLS"

        splitArray = Value.Split(Saparator, StringSplitOptions.None)

    End Sub



I supplied the test string as "test for the value TMT and the VSM there are some more words like PPT and final XLS"

if i am wrong please let me know
0
 

Author Comment

by:christinaemmm
ID: 35736960
Thanks, bardobrave - that's the way I know I can do it.

Avinash - is this only in .net? Split() in vb6 only accepts single char delimiter.
0
Independent Software Vendors: We Want Your Opinion

We value your feedback.

Take our survey and automatically be enter to win anyone of the following:
Yeti Cooler, Amazon eGift Card, and Movie eGift Card!

 
LVL 3

Expert Comment

by:Avinash Desai
ID: 35737046
Hi

in .net it is .. in Vb6 it accepts single char. Then you need to use the loop and split the data in Vb6.

if i get how to do it will post it here soon
0
 
LVL 3

Expert Comment

by:Avinash Desai
ID: 35737053
Hi One Question these delimiters are in Sequence like
first LIN
second  ATT
Third DTM. ???

if yes then you can do it by looping
0
 

Author Comment

by:christinaemmm
ID: 35737459
Yes, always in the same order, if they are present.
0
 

Author Comment

by:christinaemmm
ID: 35745582
Well, I have written a custom solution for this, which depends on on the delimiters being in sequence.

Firstly, a function to open and return a string from the file up to and including the specified delimiter, then another function to parse the string into variables and write them to the s/s.

In case it is useful to someone, this is the file read function:

 
Private Function GetFileSection(ByVal strFilename As String, _
                                ByVal strFind As String) As String
' Opens strFilename and holds it open to continue returning strings
' terminated by strFind until strFilename changes. Returns a string
' from the start of file or the current location up to and including
' strFind or up to EOF.
' Limited to filesizes at or below 2,147,483,647 bytes

  Dim i As Integer, j As Integer
  Dim fileSize As Long
  Dim bufferSize As Integer
  Dim tempBuffer As String
  Dim leftOvers As String
  
  Static ff As Integer
  Static strCurrentFile As String
  Static buffer As String
  
  If ff < 1 Or strFilename <> strCurrentFile Then
    '--- (re)set ---'
    ff = FreeFile
    strCurrentFile = strFilename
    Open strFilename For Binary Access Read As #ff
    bufferSize = 4096   ' read in this many chars at a time - must be more than longest section
  End If
  If buffer = "" Then
    If LOF(ff) < bufferSize Then bufferSize = LOF(ff)
    buffer = Space(bufferSize)
    Get #ff, , buffer
    tempBuffer = buffer
  ElseIf Len(tempBuffer) < bufferSize And Loc(ff) < LOF(ff) Then
    leftOvers = buffer
    If LOF(ff) - Loc(ff) < bufferSize Then bufferSize = LOF(ff) - Loc(ff)
    buffer = Space(bufferSize)
    Get #ff, , buffer
    tempBuffer = leftOvers & buffer
  Else
    tempBuffer = buffer
  End If
  i = InStr(tempBuffer, strFind)
  If i > 0 Then
    j = Len(strFind)
    GetFileSection = Left(tempBuffer, i + j - 1)
    buffer = Mid(tempBuffer, i + j)
  Else
    GetFileSection = tempBuffer
    buffer = ""
  End If

End Function

Open in new window


The second function works in a similar way to locate parts of the string returned from GetFileSection and put the values in variables.
0
 

Accepted Solution

by:
christinaemmm earned 0 total points
ID: 35920869
For completeness, here is the final version of my code above - there was an error in reading multiple sections from larger files.

This works fine, though it is more effort than I had hoped to use, so I hope someone else can make use of it too.
Private Const SECTION_TERMINATOR = "'LIN+"
Private Const PAYE_PRECURSOR = "ATT+ZZZ++7:2:IR:"
Private Const PAYE_SUCCESSOR = "'NAD+TC'"
Private Const NAME_PRECURSOR = "'NAD+BV+++"
Private Const NAME_SUCCESSOR = "'"
Private Const NI_PRECURSOR = "ATT+ZZZ++11:2:IR:"
Private Const NI_SUCCESSOR = "'"
Private Const ISSUEDT_PRECURSOR = "DTM+182:"
Private Const ISSUEDT_SUCCESSOR = ":102'"
Private Const EFFDT_PRECURSOR = "DTM+7:"
Private Const EFFDT_SUCCESSOR = ":102'"
Private Const YEAR_PRECURSOR = "DTM+166:"
Private Const YEAR_SUCCESSOR = ":602'"
Private Const CODE_PRECURSOR = "ALC+ZZZ++++4:1:IR:"
Private Const CODE_SUCCESSOR = "'"

Private Const MAX_BUFFER = 4096 ' this must be big enough to guarantee getting at least one complete section


' Call the function like this:
' strLine = GetFileSection(sNextFilename, SECTION_TERMINATOR)

Private Function GetFileSection(ByVal strFilename As String, _
                                ByVal strFind As String) As String
' Opens strFilename and holds it open to continue returning strings
' terminated by strFind until strFilename changes. Returns a string
' from the start of file or the current location up to and including
' strFind or up to EOF.
' Limited to filesizes at or below 2,147,483,647 bytes

  Dim i As Integer, j As Integer
  Dim fileSize As Long
  Dim tempBuffer As String
  Dim leftOvers As String
  
  Static ff As Integer
  Static bufferSize As Integer
  Static strCurrentFile As String
  Static buffer As String
  
  If ff < 1 Or strFilename <> strCurrentFile Then
    '--- (re)set ---'
    If ff > 0 Then
      Close #ff
      ff = 0
    End If
    ff = FreeFile
    If strCurrentFile <> "" Then
      Kill strCurrentFile ' delete previous file
    End If
    If strFilename = "" Then  ' kill last file
      Exit Function
    End If
    strCurrentFile = strFilename
    Open strFilename For Binary Access Read As #ff
    bufferSize = MAX_BUFFER   ' read in up to this many chars at a time - must be more than longest section
    buffer = ""
  End If
  '--- get (more) chars ---'
  If buffer = "" Then
    If LOF(ff) < bufferSize Then bufferSize = LOF(ff)
    buffer = Space(bufferSize)
    Get #ff, , buffer
    tempBuffer = buffer
  ElseIf InStr(buffer, strFind) < 1 Then
    ' need to get more chars
    leftOvers = buffer
    bufferSize = MAX_BUFFER - Len(leftOvers)
    If (LOF(ff) - Loc(ff)) < bufferSize Then bufferSize = LOF(ff) - Loc(ff)
    buffer = Space(bufferSize)
    Get #ff, , buffer
    tempBuffer = leftOvers & buffer
  Else
    ' last part of file
    tempBuffer = buffer
    bufferSize = 0
  End If
  '--- return string required, keep rest ---'
  i = InStr(tempBuffer, strFind)
  If i > 0 Then
    j = Len(strFind)
    GetFileSection = Left(tempBuffer, i + j - 1)
    buffer = Mid(tempBuffer, i + j)
  Else
    GetFileSection = tempBuffer
    buffer = ""
  End If

End Function

Open in new window

0
 

Author Closing Comment

by:christinaemmm
ID: 35951064
This is my own solution - I don't think it is the best possible, but it does work as required.
0

Featured Post

Upgrade your Question Security!

Add Premium security features to your question to ensure its privacy or anonymity. Learn more about your ability to control Question Security today.

Question has a verified solution.

If you are experiencing a similar issue, please ask a related question

Most everyone who has done any programming in VB6 knows that you can do something in code like Debug.Print MyVar and that when the program runs from the IDE, the value of MyVar will be displayed in the Immediate Window. Less well known is Debug.Asse…
I was working on a PowerPoint add-in the other day and a client asked me "can you implement a feature which processes a chart when it's pasted into a slide from another deck?". It got me wondering how to hook into built-in ribbon events in Office.
Show developers how to use a criteria form to limit the data that appears on an Access report. It is a common requirement that users can specify the criteria for a report at runtime. The easiest way to accomplish this is using a criteria form that a…
This lesson covers basic error handling code in Microsoft Excel using VBA. This is the first lesson in a 3-part series that uses code to loop through an Excel spreadsheet in VBA and then fix errors, taking advantage of error handling code. This l…
Suggested Courses

834 members asked questions and received personalized solutions in the past 7 days.

Join the community of 500,000 technology professionals and ask your questions.

Join & Ask a Question