Split text file into multiple text files based on text filename and

Split 1 text file into multiple textfiles based on Row starts with CC and rename the files into different file name.

Rows AA, BB are headers need to be added to the file after the split. Split based on Row CC.

I don't know what is the best way to do this. I am lost.

Is this possible?

See attached:
Original File to be split: Test_05022018.txt

Results should be like this: See attachement.
Test000000001_05022018.txt
Test000000002_05022018.txt
Test000000003_05022018.txt

See attache excel file for more info.


Please help.

Thank you again.
Split_To_Multiple_Texts.xlsx
Test000000001_05022018.txt
Test000000002_05022018.txt
Test0000000003_05022018.txt
Queennie LAsked:
Who is Participating?

[Product update] Infrastructure Analysis Tool is now available with Business Accounts.Learn More

x
I wear a lot of hats...

"The solutions and answers provided on Experts Exchange have been extremely helpful to me over the last few years. I wear a lot of hats - Developer, Database Administrator, Help Desk, etc., so I know a lot of things but not a lot about one thing. Experts Exchange gives me answers from people who do know a lot about one thing, in a easy to use platform." -Todd S.

Bill PrewCommented:
Here's a small BAT script that could be a starting point for you, see what you think.  I ran a test here and it seemed to do just what you wanted.  Place it in the same folder as the input file and try it.

@echo off
setlocal EnableDelayedExpansion

rem define file to work with
set FileName=Test_05022018
set FileExt=txt

rem Initialize output file counter
set Count=0

rem Read each line of input file
for /f "usebackq tokens=*" %%A in ("%FileName%.%FileExt%") do (

    rem Extract left three characters as line identifier
    set ID=%%A
    set ID=!ID:~0,3!

    rem Save off first two lines as headers for output files
    if "!ID!" EQU "AA*" (
        set Header1=%%A
    ) else (
        if "!ID!" EQU "BB*" (
            set Header2=%%A
        ) else (
            rem Start new output file on first data line
            if "!ID!" EQU "CC*" (

                rem Build file number and create new output file name
                set /A Count+=1
                set Chunk=000000!Count!
                set Chunk=!Chunk:~-6!
                set OutFile=%FileName%_!Chunk!.%FileExt%

                rem Write header lines to output file
                echo.!Header1!>"!OutFile!"
                echo.!Header2!>>"!OutFile!"

                rem Write data line
                echo.%%A>>"!OutFile!"
            ) else (
                rem Write data line
                echo.%%A>>"!OutFile!"
            )
        )
    )
)

Open in new window


»bp
1
Queennie LAuthor Commented:
Good Morning Bill,

Can this bat file can be used for multiple original text files?

I tested it and it is working.

Thank you again for your help.
0
Mark WillsTopic AdvisorCommented:
G'Day,

Lots of technologies as TA's ... so let's clarify a bit...

What is the original source ?
What is the destination (apart from the obvious text files) ?
How much embedded (within the source doco) information is used in the actual (resulting) file names ?

Does it matter what tools are used ?
0
Big Business Goals? Which KPIs Will Help You

The most successful MSPs rely on metrics – known as key performance indicators (KPIs) – for making informed decisions that help their businesses thrive, rather than just survive. This eBook provides an overview of the most important KPIs used by top MSPs.

Mark WillsTopic AdvisorCommented:
Nice work Bill :)
0
Queennie LAuthor Commented:
Hello Mark:

It really does not matter what tools to be used. I will automatically run this split files using SQL Job Agent once the files are available.

Two separate folder for original folder and destination folder.

Thank you again.
0
Bill PrewCommented:
Yes, BAT file certainly could, we could pass it the file name a a parameter, or we could loop over a folder of files, etc.

Can you describe how you would like to process multiple files, where they will be located, where output should go, how they are named, etc?  Then I can offer some suggestions on multi-file usage.


»bp
0
Bill PrewCommented:
Thanks Mark.


»bp
0
Queennie LAuthor Commented:
Hello Bill:

2 separate folder for Original and destination.

Thank you again.
0
Bill PrewCommented:
Processing all TXT files in input folder?

And okay to hard code the input and output folder in the BAT?


»bp
0
Bill PrewCommented:
Here's an update processing all TXT files in a specified input folder, and placing the new files in a specified output folder.

@echo off
setlocal EnableDelayedExpansion

rem Define input and output folders to work with
set BaseDir=B:\EE\EE29097562\IN
set DestDir=B:\EE\EE29097562\OUT

rem Process each matching file in input folder
for %%F in ("%BaseDir%\*.txt") do (

    rem Initialize output file counter
    set Count=0

    rem Read each line of input file
    for /f "usebackq tokens=*" %%A in ("%%~F") do (

        rem Extract left three characters as line identifier
        set ID=%%A
        set ID=!ID:~0,3!

        rem Save off first two lines as headers for output files
        if "!ID!" EQU "AA*" (
            set Header1=%%A
        ) else (
            if "!ID!" EQU "BB*" (
                set Header2=%%A
            ) else (
                rem Start new output file on first data line
                if "!ID!" EQU "CC*" (

                    rem Build file number and create new output file name in output folder
                    set /A Count+=1
                    set Chunk=000000!Count!
                    set Chunk=!Chunk:~-6!
                    set OutFile=%DestDir%\%%~nF_!Chunk!%%~xF

                    rem Write header lines to output file
                    echo.!Header1!>"!OutFile!"
                    echo.!Header2!>>"!OutFile!"

                    rem Write data line
                    echo.%%A>>"!OutFile!"
                ) else (
                    rem Write data line
                    echo.%%A>>"!OutFile!"
                )
            )
        )
    )

)

Open in new window



»bp
0
Mark WillsTopic AdvisorCommented:
Bill, wouldnt it be tempting to use params %~dp1 and %~dp2  for input path (BaseDir) and export path (DestDir)   ?
0
Bill PrewCommented:
Mark, we certainly could use parms, really depends if they will ever change, waiting for author feedback...


»bp
0
Mark WillsTopic AdvisorCommented:
I will leave you to it - nice work Bill.
0
Queennie LAuthor Commented:
@Bill:

Can you explain to me what these lines do?

1)  for /f "usebackq tokens=*" %%A in ("%%~F") do (

        rem Extract left three characters as line identifier
        set ID=%%A
        set ID=!ID:~0,3!

2) set OutFile=%DestDir%\%%~nF_!Chunk!%%~xF

Thank you again.
0
Queennie LAuthor Commented:
@Bill:

Row Header AA is not showing.

Thank you.
0
Bill PrewCommented:
Row AA showing here with your sample file.  If you are having trouble with the real file then I would need to see a sample of that.


»bp
0
Queennie LAuthor Commented:
@Bill:

It is showing now Bill. It was a typo. Sorry. I will do more testing and I will let you know.

Thank you.
0
Bill PrewCommented:
for /f "usebackq tokens=*" %%A in ("%%~F") do (

This is a FOR loop reading each line of the current input file (from the outer directory listing loop, stored in loop variable %%F).  It places each lines content in loop variable %%A so that it can then be worked with inside the loop.

set ID=%%A

Sets the named variable ID to the content of the loop variable %%A (the currentl line of data from the file).  This is needed so that we can use some of the special featrures availabe with named variables that aren't available with loop variable, like pulling out subsets of characters from the value.

set ID=!ID:~0,3!

This extracts the left 3 characters from the ID variable and places them back in the ID variable.  This is the "key" of each line of data that we check for XX* patterns to determine logic based on current line type.

set OutFile=%DestDir%\%%~nF_!Chunk!%%~xF

This builds the name of the output file for the current "chunk" of the input file.  It conatenates the destination directory path (%DestDir%) with a backslash, then the name only of the current file being read (%%~nF), then an underscore, then the left zero padded number of this chunk (!Chunk!), followed by the extension of the original file (%%~xF).


»bp
0
Queennie LAuthor Commented:
@Bill:

How to delete in base directory folder if split completed in destination folder?

Thank you again.
0
Bill PrewCommented:
No real way to know success or failure of the split, but if you want to delete the input file after all lines have been processed and should have been split then this adds that.

@echo off
setlocal EnableDelayedExpansion

rem Define input and output folders to work with
set BaseDir=B:\EE\EE29097562\IN
set DestDir=B:\EE\EE29097562\OUT

rem Process each matching file in input folder
for %%F in ("%BaseDir%\*.txt") do (

    rem Initialize output file counter
    set Count=0

    rem Read each line of input file
    for /f "usebackq tokens=*" %%A in ("%%~F") do (

        rem Extract left three characters as line identifier
        set ID=%%A
        set ID=!ID:~0,3!

        rem Save off first two lines as headers for output files
        if "!ID!" EQU "AA*" (
            set Header1=%%A
        ) else (
            if "!ID!" EQU "BB*" (
                set Header2=%%A
            ) else (
                rem Start new output file on first data line
                if "!ID!" EQU "CC*" (

                    rem Build file number and create new output file name in output folder
                    set /A Count+=1
                    set Chunk=000000!Count!
                    set Chunk=!Chunk:~-6!
                    set OutFile=%DestDir%\%%~nF_!Chunk!%%~xF

                    rem Write header lines to output file
                    echo.!Header1!>"!OutFile!"
                    echo.!Header2!>>"!OutFile!"

                    rem Write data line
                    echo.%%A>>"!OutFile!"
                ) else (
                    rem Write data line
                    echo.%%A>>"!OutFile!"
                )
            )
        )
    )

    rem Remove input file
    del "%%~F"
)

Open in new window


»bp
0
Queennie LAuthor Commented:
@Bill:

I have 340 files to split yesterday, it took 39 minutes from start to finish. Sometimes we have more than 340 files a day to split.

I will test again today how long will it take to run and I will let you know.

Thank you again for your help.
0
Bill PrewCommented:
Here is a VBS approach, likely will run faster, if you want to give it a try as well.  Works the same basic way, adjust the folder paths near the top and save as a VBS.  Then run from a command line or BAT file as:

cscript EE29097562.vbs

' Require variables to be defined
Option Explicit

' Specify file and folder locations
Const strBaseDir = "B:\EE\EE29097562\IN"
Const strDestDir = "B:\EE\EE29097562\OUT"

' Define needed I/O constants
Const ForReading = 1
Const ForWriting = 2
Const TriStateUseDefault = -2

' Declare variables
Dim objFSO
Dim objFolder
Dim objFile
Dim objFileIn
Dim strFileOut
Dim strLine
Dim strHeader
Dim strData
Dim intChunk

' Create file system object
Set objFSO = CreateObject("Scripting.FileSystemObject")

' Quit if folders don't exist
If Not objFSO.FolderExists(strBaseDir) Then
   Wscript.Echo Now & " ERROR: Base folder """ & strBaseDir & """ does not exist, quiting."
   Wscript.Quit
End If
If Not objFSO.FolderExists(strDestDir) Then
   Wscript.Echo Now & " ERROR: Destination folder """ & strDestDir & """ does not exist, quiting."
   Wscript.Quit
End If

' Access base folder
Set objFolder = objFSO.GetFolder(strBaseDir)

' Check each file in this folder
For Each objFile In objFolder.Files

    ' Make sure it is a file to process
    If LCase(Right(objFile.Name, 4)) = ".txt" Then

        ' Open file for reading
        Set objFileIn = objFSO.OpenTextFile(objFile.Path, ForReading, False, TriStateUseDefault)
        intChunk = 0
        strHeader = ""
        strData = ""

        ' Read input file line by line
        Do While Not objFileIn.AtEndOfStream

            ' Read next line from text file
            strLine = objFileIn.ReadLine

            ' Take appropriate action based on line type (left 3 characters)
            Select Case UCase(Left(strLine, 3))
                Case "AA*", "BB*"
                    ' Header lines, add to header for each output file
                    strHeader = strHeader & strLine & vbCrLf
                Case "FF*"
                    ' Last line of data group, build new output file name, and write headers and data
                    strData = strData & strLine & vbCrLf
                    intChunk = intChunk + 1
                    strFileOut = strDestDir & "\" & objFSO.GetBaseName(objFile.Path) & "_" & Right("000000" & intChunk, 6) & "." & objFSO.GetExtensionName(objFile.Path)
                    With objFSO.OpenTextFile(strFileOut, ForWriting, True)
                        .Write strHeader & strData
                        .Close
                    End With
                    strData = ""
                Case Else
                    ' Intermediate data lines, save data line for writing at end of chunk
                    strData = strData & strLine & vbCrLf
            End Select

        Loop

    End If

Next

Open in new window


»bp
0
Bill PrewCommented:
And adding the logic to delete the input files after splitting them appart...

' Require variables to be defined
Option Explicit

' Specify file and folder locations
Const strBaseDir = "B:\EE\EE29097562\IN"
Const strDestDir = "B:\EE\EE29097562\OUT"

' Define needed I/O constants
Const ForReading = 1
Const ForWriting = 2
Const TriStateUseDefault = -2

' Declare variables
Dim objFSO
Dim objFolder
Dim objFile
Dim objFileIn
Dim strFileOut
Dim strLine
Dim strHeader
Dim strData
Dim intChunk

' Create file system object
Set objFSO = CreateObject("Scripting.FileSystemObject")

' Quit if folders don't exist
If Not objFSO.FolderExists(strBaseDir) Then
   Wscript.Echo Now & " ERROR: Base folder """ & strBaseDir & """ does not exist, quiting."
   Wscript.Quit
End If
If Not objFSO.FolderExists(strDestDir) Then
   Wscript.Echo Now & " ERROR: Destination folder """ & strDestDir & """ does not exist, quiting."
   Wscript.Quit
End If

' Access base folder
Set objFolder = objFSO.GetFolder(strBaseDir)

' Check each file in this folder
For Each objFile In objFolder.Files

    ' Make sure it is a file to process
    If LCase(Right(objFile.Name, 4)) = ".txt" Then

        ' Open file for reading
        Set objFileIn = objFSO.OpenTextFile(objFile.Path, ForReading, False, TriStateUseDefault)
        intChunk = 0
        strHeader = ""
        strData = ""

        ' Read input file line by line
        Do While Not objFileIn.AtEndOfStream

            ' Read next line from text file
            strLine = objFileIn.ReadLine

            ' Take appropriate action based on line type (left 3 characters)
            Select Case UCase(Left(strLine, 3))
                Case "AA*", "BB*"
                    ' Header lines, add to header for each output file
                    strHeader = strHeader & strLine & vbCrLf
                Case "FF*"
                    ' Last line of data group, build new output file name, and write headers and data
                    strData = strData & strLine & vbCrLf
                    intChunk = intChunk + 1
                    strFileOut = strDestDir & "\" & objFSO.GetBaseName(objFile.Path) & "_" & Right("000000" & intChunk, 6) & "." & objFSO.GetExtensionName(objFile.Path)
                    With objFSO.OpenTextFile(strFileOut, ForWriting, True)
                        .Write strHeader & strData
                        .Close
                    End With
                    strData = ""
                Case Else
                    ' Intermediate data lines, save data line for writing at end of chunk
                    strData = strData & strLine & vbCrLf
            End Select

        Loop

        ' Delete input file after fully processing
        objFileIn.Close
        objFile.Delete

    End If

Next

Open in new window


»bp
0
Queennie LAuthor Commented:
@Bill:

I tested it, it did not split.

Thank you.
0
Bill PrewCommented:
It split fine on the test file you originally provided.  Did you have to make adjustments to the BAT approach for your "real data"?  If so you will need to make similar adjustments to the VBS approach.


»bp
0
Queennie LAuthor Commented:
@Bill:

I did not make adjustment to your 2nd bat file. The only thing I did is changed the actual case statement with AA and BB and FF but I did not see any CC?

The first bat file, was fast in my machine but when i run on the server, it is very slow.

Thank you again.
0
Bill PrewCommented:
When you changed the CASE statement, are you looking for three character "keys" in the left of each line, or a different length?  If a different length then you also need to change the

Select Case UCase(Left(strLine, 3))

the 3 is the length of the text you are matching on each line.  And in the CASE statements you need to use upper case text.


»bp
0
Queennie LAuthor Commented:
@Bill:

Only 2 or 3 characters and I used upper case.

Thank you.
0
Bill PrewCommented:
Can you post the changed script you are using, and a sample of the actual file?


»bp
0
Queennie LAuthor Commented:
@Bill:

It was my fault. Sorry. The extension I used is .bat instead of .vbs. It ran fast now. 340 files for yesterday is in 2 minutes.

I tried to add the 2 footer at the end but it did show LIKE GG and HH like in Header LIKE AA and BB.

Thank you again.
0
Queennie LAuthor Commented:
Correction:

The 2 footer I added did not show. Thanks.
0
Bill PrewCommented:
This is the first mention of "footers", can you explain further?


»bp
0
Queennie LAuthor Commented:
@Bill:

Sorry. I forgot to include it in my first example.

There are 2 headers and 2 footers. I tried to incorporate the 2 footer like the header but it did not show.

2 Headers: AA and BB
2 Footers: GG and HH

See attached excel.

I put it in the last case statement but it did not work. I must do it wrong.

Thank you.
Split_To_Multiple_Texts_UPDATED.xlsx
0
Bill PrewCommented:
Okay, that will take some changes to the VBS code.  Currently it ready the input files once from top to bottom.  It can locate the headers since they come first, and save them to write to each.  But it doesn't see the footer lines until after it currently has written all the split files, so that doesn't work.

How large in bytes is your largest typical input file, what size are we talking about fore the files that need to be split?


»bp
0
Queennie LAuthor Commented:
@Bill:

Yesterday's date: the largest file is 216 kb.

Today's date: the largest file is 189 kb.

Thank you.
0
Bill PrewCommented:
Okay, that's manageable, let me work up some changes to handle footer lines also.

What is the maximum number of lines you would ever want to support for headers, and for footers?


»bp
0
Queennie LAuthor Commented:
@Bill:

What is the maximum number of lines you would ever want to support for headers, and for footers?


I don't quite understand your question. Please explain.

Thank you.
0
Bill PrewCommented:
Right now there are 2 lines of header, and two lines of footer.

But perhaps in the future there might be 3 lines of header and 4 lines of footer.

I wanted to make the code flexible enough to easily adapt to that.

For no I assumed no more than 10 headers, and 10 footers.

So the code will look at the first 10 lines to try and find the header lines (based on matching the characters at the left), and will look at the last 10 lines trying to find footer lines (based on matching the characters at the left).  I think this should be okay.


»bp
0
Bill PrewCommented:
Okay, here is adjusted code that handles headers as well as footers.  I tried to comment it enough so you can follow it, but questions welcome.  Please take the time to read it over carefully and understand how it works though, so that you can adjust it as needed in the future.

' Require variables to be defined before used
Option Explicit

' Specify file and folder locations
Const strBaseDir = "B:\EE\EE29097562\IN"
Const strDestDir = "B:\EE\EE29097562\OUT"

' Define needed I/O constants
Const ForReading = 1
Const ForWriting = 2
Const TriStateUseDefault = -2

' Constants for line types processed
Const cHeader = 1
Const cFooter = 2
Const cTrailer = 3
Const cData = 4

' Declare variables
Dim objFSO
Dim objFolder
Dim objFile
Dim strFileOut
Dim strLine
Dim strHeader
Dim strFooter
Dim strData
Dim arrData
Dim intChunk

' Create file system object
Set objFSO = CreateObject("Scripting.FileSystemObject")

' Quit if folders don't exist
If Not objFSO.FolderExists(strBaseDir) Then
   Wscript.Echo Now & " ERROR: Base folder """ & strBaseDir & """ does not exist, quiting."
   Wscript.Quit
End If
If Not objFSO.FolderExists(strDestDir) Then
   Wscript.Echo Now & " ERROR: Destination folder """ & strDestDir & """ does not exist, quiting."
   Wscript.Quit
End If

' Access base folder
Set objFolder = objFSO.GetFolder(strBaseDir)

' Check each file in this folder
For Each objFile In objFolder.Files

    ' Make sure it is a file to process
    If LCase(Right(objFile.Name, 4)) = ".txt" Then

        ' Initialize variables for each file
        intChunk = 0
        strHeader = ""
        strData = ""
        
        ' Open file for reading
        With objFSO.OpenTextFile(objFile.Path, ForReading, False, TriStateUseDefault)
            ' Read entire file into memory array, split by lines
            arrData = Split(.ReadAll, vbCrLf)
            .Close
        End With

        ' Try and find the header and footer lines for this file and save them
        strHeader = FindHeader(arrData)
        strFooter = FindFooter(arrData)

        ' Read input file line by line
        For Each strLine In arrData

            ' Take appropriate action based on line type
            Select Case LineType(strLine)
                Case cHeader, cFooter
                    ' Header or footer lines, skip now (already processed thenm above)
                Case cTrailer
                    ' Last line of data group, build new output file name, and write headers, data and footers
                    strData = strData & strLine & vbCrLf
                    intChunk = intChunk + 1
                    strFileOut = strDestDir & "\" & objFSO.GetBaseName(objFile.Path) & "_" & Right("000000" & intChunk, 6) & "." & objFSO.GetExtensionName(objFile.Path)
                    With objFSO.OpenTextFile(strFileOut, ForWriting, True)
                        .Write strHeader & strData & strFooter
                        .Close
                    End With
                    strData = ""
                Case cData
                    ' Intermediate data lines, save data line for writing at end of chunk
                    strData = strData & strLine & vbCrLf
            End Select

        Next

        ' Delete input file after fully processing
        objFile.Delete

    End If

Next

Function FindHeader(arrLines())
    ' Search the top group of lines for footer records and return them
    Dim i
    FindHeader = ""
    For i = 0 to 10
        If LineType(arrLines(i)) = cHeader Then
            FindHeader = FindHeader & arrLines(i) & vbCrLf
        End If
    Next
End Function

Function FindFooter(arrLines())
    ' Search the end group of lines for footer records and return them
    Dim i
    FindFooter = ""
    For i = UBound(arrLines) - 10 to UBound(arrLines)
        If LineType(arrLines(i)) = cFooter Then
            FindFooter = FindFooter & arrLines(i) & vbCrLf
        End If
    Next
End Function

Function LineType(strLine)
    ' Clasify this line type based on leftmost 3 characters
    Select Case UCase(Left(strLine, 3))
        Case "AA*", "BB*"
            LineType = cHeader
        Case "GG*", "HH*"
            LineType = cFooter
        Case "FF*"
            LineType = cTrailer
        Case Else
            LineType = cData
    End Select
End Function

Open in new window


»bp
0

Experts Exchange Solution brought to you by

Your issues matter to us.

Facing a tech roadblock? Get the help and guidance you need from experienced professionals who care. Ask your question anytime, anywhere, with no hassle.

Start your 7-day free trial
Queennie LAuthor Commented:
Bill,

It is working and split the files fast in seconds.

Thank you for your help. I really appreciate it.

Have a good day Sir!
0
Bill PrewCommented:
Welcome, glad that helped.


»bp
0
It's more than this solution.Get answers and train to solve all your tech problems - anytime, anywhere.Try it for free Edge Out The Competitionfor your dream job with proven skills and certifications.Get started today Stand Outas the employee with proven skills.Start learning today for free Move Your Career Forwardwith certification training in the latest technologies.Start your trial today
VB Script

From novice to tech pro — start learning today.