Solved

Splitting a txt file into multiple by file size

Posted on 2016-10-12
Last Modified: 2016-10-12
We have a program that generates a text file ranging from 3 MB to 30 MB. I am trying to figure out how to split the file by size: the maximum size of each text file is 4 MB, so a text file that was originally 14 MB would need to be split into 4 text files.

The file name would need "_1", "_2", "_3", etc. appended to it, so if we had a 14 MB file named "filename.txt", the resulting text files would be:

filename_1.txt
filename_2.txt
filename_3.txt
filename_4.txt

Is there a batch file that can accomplish this?
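For reference, the naming scheme and the number of parts can be worked out up front. This is a minimal PowerShell sketch; `Get-SplitPartNames` is a hypothetical helper name (not from the thread), and it assumes parts are numbered from 1 and each part holds at most `$MaxSize` bytes, so the count is a ceiling division. A split that only breaks at line ends may need one extra part.

```powershell
# Hypothetical helper (not from the thread): list the part names for a file of
# $FileSize bytes when each part may hold at most $MaxSize bytes.
function Get-SplitPartNames {
    param(
        [Parameter(Mandatory = $true)][string]$Name,
        [Parameter(Mandatory = $true)][long]$FileSize,
        [long]$MaxSize = 4MB
    )
    # Ceiling division: 14 MB with a 4 MB limit needs 4 parts
    $parts = [long][math]::Ceiling($FileSize / $MaxSize)
    if ($parts -lt 1) { $parts = 1 }   # an empty file still gets one part
    $base = [System.IO.Path]::GetFileNameWithoutExtension($Name)
    $ext  = [System.IO.Path]::GetExtension($Name)
    foreach ($i in 1..$parts) {
        '{0}_{1}{2}' -f $base, $i, $ext
    }
}

Get-SplitPartNames -Name 'filename.txt' -FileSize 14MB
```

For the example in the question this returns filename_1.txt through filename_4.txt.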
Question by: Brent Guttmann
 
LVL 83

Expert Comment

by:oBdA
ID: 41840277
That can of course be done in PowerShell; the question is how you want the file split.
Exactly at the given block size, or rather at the end of a line?
If line based:
- Will the lines always be shorter than your chosen block size?
- What is the program using as EOL - the usual <CR><LF> or something else?
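If the split happens at exactly the block size (the first option above), a line can end up cut across two parts. Here is a rough sketch of that byte-exact variant using .NET `FileStream`; the function name `Split-FileBySize` is made up for illustration, and it assumes the parts should be written next to the source and numbered _1, _2, ... as in the question.

```powershell
# Hypothetical helper: split a file into parts of exactly $BlockSize bytes
# (the last part may be smaller). Lines may be cut in the middle.
function Split-FileBySize {
    param(
        [Parameter(Mandatory = $true)][string]$Path,
        [long]$BlockSize = 4MB
    )
    $item   = Get-Item -LiteralPath $Path
    $buffer = New-Object byte[] 65536
    $reader = [System.IO.File]::OpenRead($item.FullName)
    try {
        $index = 0
        while ($reader.Position -lt $reader.Length) {
            $index++
            $partName = Join-Path $item.DirectoryName ('{0}_{1}{2}' -f $item.BaseName, $index, $item.Extension)
            $writer = [System.IO.File]::Create($partName)
            try {
                # Copy up to $BlockSize bytes into this part, one buffer at a time
                $remaining = $BlockSize
                while ($remaining -gt 0) {
                    $read = $reader.Read($buffer, 0, [int][math]::Min($buffer.Length, $remaining))
                    if ($read -le 0) { break }
                    $writer.Write($buffer, 0, $read)
                    $remaining -= $read
                }
            } finally {
                $writer.Dispose()
            }
            $partName   # emit the full path of each part written
        }
    } finally {
        $reader.Dispose()
    }
}
```

Reading in 64 KB chunks keeps memory use flat regardless of the input size.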
 

Author Comment

by:Brent Guttmann
ID: 41840300
Hi, I'd like to have the file split at the end of a line once the file size reaches, let's say, 3.9 MB. So I guess it would be split based on both file size and end of line.

The EOL is carriage return, line feed, so <CR><LF> is correct.
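For the line-based split described here, this streaming sketch reads one line at a time and starts a new part whenever the next line plus its <CR><LF> would push the current part over the limit. `Split-TextFileByLines` is a hypothetical name. It assumes the file is UTF-8 or plain ASCII (for ASCII, characters and bytes match one to one) and always writes <CR><LF>, including after the last line. A single line longer than the limit gets a part of its own.

```powershell
# Hypothetical helper: split a text file at line ends so that no part exceeds
# $MaxSize bytes (unless one line alone is longer). Assumes UTF-8 without BOM.
function Split-TextFileByLines {
    param(
        [Parameter(Mandatory = $true)][string]$Path,
        [long]$MaxSize = 4MB
    )
    $item     = Get-Item -LiteralPath $Path
    $encoding = New-Object System.Text.UTF8Encoding $false
    $newLine  = "`r`n"
    $eolBytes = $encoding.GetByteCount($newLine)
    $reader   = New-Object System.IO.StreamReader($item.FullName, $encoding)
    $writer   = $null
    try {
        $index   = 0
        $written = 0L
        while ($null -ne ($line = $reader.ReadLine())) {
            $lineBytes = $encoding.GetByteCount($line) + $eolBytes
            # Open the first part, or a new one if this line would overflow the current part
            if ($null -eq $writer -or ($written -gt 0 -and ($written + $lineBytes) -gt $MaxSize)) {
                if ($null -ne $writer) { $writer.Dispose() }
                $index++
                $partName = Join-Path $item.DirectoryName ('{0}_{1}{2}' -f $item.BaseName, $index, $item.Extension)
                $writer = New-Object System.IO.StreamWriter($partName, $false, $encoding)
                $writer.NewLine = $newLine
                $written = 0L
                $partName   # emit the full path of each part created
            }
            $writer.WriteLine($line)
            $written += $lineBytes
        }
    } finally {
        if ($null -ne $writer) { $writer.Dispose() }
        $reader.Dispose()
    }
}
```

Keeping one writer open per part avoids reopening the output file for every line.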
 
LVL 40

Expert Comment

by:Subsun
ID: 41840417
Here is a simple PowerShell function you can use to split the file. Check and see if it works for you.
Function Split-File ($DestPath,$Inputfile,$Size){
Begin{
	$count = 1
	$FileData = Get-Item $Inputfile
	$Outputname = "$DestPath\$($FileData.BaseName)"
	# First part: <DestPath>\<BaseName>_1<Extension>
	$NewFile = "$Outputname`_$count$($FileData.Extension)"
	New-Item $NewFile -ItemType File -Force | Out-Null
	Write-Host "Writing file $NewFile"
}
Process{
	Get-Content $Inputfile | ForEach-Object {
		# Move on to the next part once the current one has reached $Size
		If ((Get-Item $NewFile).Length -ge $Size) {
			$count++
			$NewFile = "$Outputname`_$count$($FileData.Extension)"
			Write-Host "Writing file $NewFile"
		}
		# Append the current line; Add-Content adds the line ending
		Add-Content -Path $NewFile -Value $_
	}
}
}
#Run Function..
Split-File -DestPath C:\temp\testing -Inputfile C:\Temp\Test.txt -Size 4MB


 

Author Comment

by:Brent Guttmann
ID: 41840465
Okay, first question: how can we have it run on any text file in the folder? I won't know the name of the file until it's there, and I'm trying to automate the entire process. Also, I tried running the bat file with the code below, but it just opened and closed. I tried adding a couple of different commands at the end to pause the script so I could see the errors, but none of them worked...

(Same Split-File function as posted above, followed by:)
Split-File -DestPath "\\server-win-sv05\data\divisions\COL\Col\MVP\MVP_Website_Uploads\bat_test" -Inputfile "\\server-win-sv05\data\divisions\COL\Col\MVP\MVPWebsite_Uploads\bat_test\NC13_20161012.txt" -Size 4MB
 
LVL 40

Expert Comment

by:Subsun
ID: 41840480
It's a PowerShell script, so you need to save it as a .ps1 file and run it from the PowerShell console. The following articles will help you:
How to Run a PowerShell script
http://ss64.com/ps/syntax-run.html
Run PowerShell Scripts from Task Scheduler
https://community.spiceworks.com/how_to/17736-run-powershell-scripts-from-task-scheduler

To split all .txt files in a directory, you can change the last line to:
GCI "\\server-win-sv05\data\divisions\COL\Col\MVP\MVPWebsite_Uploads\bat_test\*.Txt" | %{Split-File -DestPath "\\server-win-sv05\data\divisions\COL\Col\MVP\MVP_Website_Uploads\bat_test" -Inputfile $_.FullName -Size 4MB}


 

Author Comment

by:Brent Guttmann
ID: 41840487
Okay - so this cannot be run from a bat file?
 

Author Comment

by:Brent Guttmann
ID: 41840511
I tried running this and it froze at the line where it's writing the file:

PS Microsoft.PowerShell.Core\FileSystem::\\server-win-fs05\data\divisions\COL\Col\MVP\MVP_Website_Uploads\bat_test> . .\split.ps1
Writing file \\server-win-fs05\data\divisions\COL\Col\MVP\MVP_Website_Uploads\bat_test\NC13_20161012_1.txt
 
LVL 40

Expert Comment

by:Subsun
ID: 41840512
Yes, you can run the PowerShell script from a bat file.
For example, save the PowerShell code to a file named Splitfile.ps1 in the C:\Script folder,
and use the following code in the bat file to execute it:
@ECHO OFF
Powershell.exe -ExecutionPolicy Bypass -File C:\Script\Splitfile.ps1
PAUSE


 

Author Comment

by:Brent Guttmann
ID: 41840514
Never mind, it was a typo in my path, so that worked, although a lot slower than I thought it would be. I guess I could just make a bat file to execute the .ps1 file, right?
 

Author Comment

by:Brent Guttmann
ID: 41840518
okay, great - thanks
 
LVL 40

Expert Comment

by:Subsun
ID: 41840538
It's slow because it writes directly to the file share. If that's too much of a burden, we can save all the files to a local folder and copy them to the share once the splitting is complete. That might be a bit faster.
 

Author Comment

by:Brent Guttmann
ID: 41840546
Yeah, can we edit it to first copy to the local temp folder and then move the results back afterwards? It's been running for 5 minutes on a 5,000 KB text file and has only written 650 KB.
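A sketch of the copy-to-local-temp idea: copy the source from the share into a fresh folder under the local temp directory, run the split there, then move only the new part files to the destination. `Invoke-SplitInLocalTemp` and its `-SplitAction` parameter are hypothetical names; the split itself is passed in as a script block. Whether this is actually faster depends on the share and on the split code.

```powershell
# Hypothetical helper: copy a file to a local temp folder, run a split action
# there, then move the resulting parts to $DestPath. $SplitAction receives the
# local file path and is expected to write its parts next to that file.
function Invoke-SplitInLocalTemp {
    param(
        [Parameter(Mandatory = $true)][string]$SourceFile,
        [Parameter(Mandatory = $true)][string]$DestPath,
        [Parameter(Mandatory = $true)][scriptblock]$SplitAction
    )
    $work = Join-Path ([System.IO.Path]::GetTempPath()) ([guid]::NewGuid().ToString())
    New-Item -ItemType Directory -Path $work | Out-Null
    try {
        $localCopy = Copy-Item -LiteralPath $SourceFile -Destination $work -PassThru
        # Discard anything the action writes to the pipeline
        & $SplitAction $localCopy.FullName | Out-Null
        # Move every file except the local copy of the original to the destination
        Get-ChildItem -LiteralPath $work -File |
            Where-Object { $_.FullName -ne $localCopy.FullName } |
            Move-Item -Destination $DestPath -Force -PassThru
    } finally {
        # Always clean up the temporary working folder
        Remove-Item -LiteralPath $work -Recurse -Force
    }
}
```

With Subsun's Split-File from above, the action could be `{ param($p) Split-File -DestPath (Split-Path $p) -Inputfile $p -Size 4MB }`, so the parts are created next to the local copy and then moved to the share in one pass.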
 
LVL 83

Accepted Solution

by: oBdA (earned 500 total points)
ID: 41840551
This one is significantly faster (by about a factor of 25 on my test system).
Save it in the same folder as the batch script below, and set the PSFile variable in the batch file to match the script name (the "%~dp0" at the beginning expands to the batch script's drive and path, including a trailing backslash).
You can call it with -Verbose to see what's happening.
[CmdletBinding()]
Param(
	[Parameter(Mandatory=$True, ValueFromPipeline=$True, ValueFromPipelineByPropertyName=$True, Position=0)]
	[String[]]$Path,
	[Parameter(Mandatory=$False, Position=1)]
	[uint32]$Size = 4MB
)
Begin {
	$StringBuilder = New-Object -TypeName Text.StringBuilder
}
Process {
	$Path | ForEach-Object {
		If ($FileItem = Get-Item -Path $_) {
			If (($FileItem.BaseName -match '.*_\d\d\Z') -and ($FileItem.Length -lt $Size)) {
				"Skipped file '$($FileItem.FullName)', has probably been processed already." | Write-Warning
			} Else {
				$FileIndex = 0
				$OutputFile = Join-Path -Path $FileItem.DirectoryName -ChildPath "$($FileItem.BaseName)_{0:D2}$($FileItem.Extension)"
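				Get-Content -Path $FileItem.FullName | ForEach-Object {
					# Flush the current part if adding this line plus its <CR><LF> would exceed $Size
					If (($StringBuilder.Length -gt 0) -and (($StringBuilder.Length + $_.Length + 2) -gt $Size)) {
						$StringBuilder.Length -= 2	# drop the trailing <CR><LF>; Set-Content adds its own
						Set-Content -Value $StringBuilder.ToString() -Path ($OutputFile -f $FileIndex)
						"Wrote part $($FileIndex), $($StringBuilder.Length + 2) bytes." | Write-Verbose
						[void]$StringBuilder.Clear()
						$FileIndex += 1
					}
					[void]$StringBuilder.AppendLine($_)
				}
				# Write the last part, then reset the buffer for the next input file
				If ($StringBuilder.Length -gt 0) {
					$StringBuilder.Length -= 2
					Set-Content -Value $StringBuilder.ToString() -Path ($OutputFile -f $FileIndex)
					"Wrote part $($FileIndex), $($StringBuilder.Length + 2) bytes." | Write-Verbose
					[void]$StringBuilder.Clear()
				}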
			}
		}
	}
}
End {
}



Batch file to start the PowerShell script; define the file to process here:
@echo off
setlocal
set File=C:\Temp\test.txt
set PSFile=%~dp0Split-File.ps1
PowerShell.exe -ExecutionPolicy Bypass -File "%PSFile%" -Path "%File%" -Verbose



Edit: Fixed issue with pipeline input.
 

Author Comment

by:Brent Guttmann
ID: 41840560
So which variables do I need to edit here? Just PSFile, to point to the .ps1 file location?

I see the bat file sets the file, but I won't know the file name... should I change it to .\*.txt?
 

Author Comment

by:Brent Guttmann
ID: 41840562
Never mind, I get it... C:\Temp is the temporary folder for splitting.
 

Author Comment

by:Brent Guttmann
ID: 41840584
Appreciate your help!
 
LVL 83

Expert Comment

by:oBdA
ID: 41840587
You're rather sparse with the details required to help you.
Does this file have a completely random name, or are some parts static?
Do you just want to process all files of a specific extension, and/or is the only file of its type in the folder?
Do you want the processed file(s) to end up in the same location as the source, or do you need them in a different directory?
Do you want the file processed/moved (and so renamed with the index) even if it is smaller than the size limit, or can this never happen anyway?

Question has a verified solution.
