Martin Rees asked:

Sorting .txt files by content and filename

I need a batch file or PowerShell script to sort thousands of .txt files into specific folders. I have some production machines that create reports as .txt files, and these are all put into one folder. Ideally the script would create any required subfolders automatically so I don't need to manage them.
I need to sort these in a number of ways.

First I need to check the content of each file for specific text, and then sort the file into a number of subfolders based on its file name.

Example file names:
BR20190127T104207.txt  
PR20190127T104208.txt
BR20190127T102850.txt
PR20190127T102851.txt
BR20190127T102827.txt

The content will contain a specific string, in this case LINE 2.

The folder structure is as follows:

Line 1
      -> Year
           -> Month
                 -> Day
Line 2
      -> Year
           -> Month
                 -> Day
Line 3
      -> Year
           -> Month
                 -> Day

and so on.
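
As a rough illustration of the Year/Month/Day part of this layout, the following PowerShell sketch pulls the date out of a file name such as BR20190127T104207. The function name `Get-ReportDatePath` is hypothetical, and the pattern assumes every report name is a two-letter prefix followed by yyyyMMdd, a "T", and a six-digit time, as in the sample names above.

```powershell
# Hypothetical helper: build the Year\Month\Day sub-path from a report's base name.
function Get-ReportDatePath {
    param([Parameter(Mandatory)][string]$BaseName)

    # Assumed name format: two letters, yyyyMMdd, a literal 'T', then HHmmss.
    if ($BaseName -match '^[A-Z]{2}(?<yyyy>\d{4})(?<MM>\d{2})(?<dd>\d{2})T\d{6}$') {
        # Nest Join-Path so this also works where it accepts only one child path.
        return Join-Path (Join-Path $Matches['yyyy'] $Matches['MM']) $Matches['dd']
    }
    return $null   # the name does not follow the assumed pattern
}

Get-ReportDatePath -BaseName 'BR20190127T104207'
```

On Windows this should yield the relative path 2019\01\27, which can then be appended to the folder for the matching line.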

I have 8 lines in total, and each of them produces these reports anywhere from several times a day to several times an hour.

I plan to run a script overnight to sort any new files created. The problem is that I cannot alter the file structure, for a few reasons, and currently I am having to sort these manually, which is very time consuming.
ASKER CERTIFIED SOLUTION (by Martin Rees; available to Experts Exchange members only)

oBdA

Just in case you're still interested in a PowerShell solution: this is in test mode and will only list the directories and files it would create/move. To run it for real, remove the two "-WhatIf" switches in lines 18 and 20 (the New-Item and Move-Item commands).
$sourceFolder = 'C:\Temp'
$destinationFolder = 'C:\Temp\Destination'
$content = @(
	'Line 1'
	'Line 2'
	'Line 3'
)

$pattern = '^(' + (($content | ForEach-Object {[regex]::Escape($_)}) -join '|') + ')$'
Get-ChildItem -Path $sourceFolder -Filter *.txt -File |
	Where-Object {$_.BaseName -match '(?<yyyy>\d{4})(?<MM>\d{2})(?<dd>\d{2})T\d{6}$'} |
	ForEach-Object {
		Write-Host "Processing '$($_.Name)'"
		If ($found = Select-String -Pattern $pattern -Path $_.FullName -List) {
			$targetPath = "$($destinationFolder)\$($found.Line)\$($Matches['yyyy'])\$($Matches['MM'])\$($Matches['dd'])"
			Write-Host "  --> $($targetPath)"
			If (-not (Test-Path -Path $targetPath)) {
				New-Item -Path $targetPath -ItemType Directory -WhatIf | Out-Null
			}
			Move-Item -Path $_.FullName -Destination $targetPath -WhatIf
		} Else {
			Write-Warning "'$($_.Name)': expected content not found, unable to move!"
		}
	}


Martin Rees (ASKER)

I should start by saying thank you, oBdA.

I have tried this, but it says content not found on all files. I have tried a few variants of content that I have confirmed are in the files, but I get the same result for all files.

Not that I know much about this (obviously, as I was asking for your help), so please forgive me if I'm being stupid here, but all the time I was trying to do this myself I was using the Get-Content command to search against. I cannot see this in your script. I'm probably well off the mark here, so forgive me if I am.

oBdA

Just ask, no problem. We've all started somewhere.
The script is using "Select-String", which (in this case) does a match against the file's content using a regular expression.
It currently matches only a complete line, that is, from beginning to end.
In the sample script above, it would match lines consisting of either "Line 1", "Line 2", or "Line 3", but not something like "Foo Line 1", "Line 2 Bar", or "Foo Line 3 Bar".
That can obviously be adjusted; can you provide a bit more specific information about the file format and what you're trying to match where?
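
To make the whole-line behaviour concrete, here is a small sketch comparing the anchored pattern the script builds with an unanchored variant. Select-String reads the file itself, which is why no Get-Content call is needed in the script. The variable names are illustrative, and the last sample string is a made-up report line, not taken from a real file.

```powershell
$anchored   = '^(Line 1|Line 2|Line 3)$'   # the whole line must equal one of the names
$unanchored = '(Line 1|Line 2|Line 3)'     # the name may appear anywhere in the line

# The first three samples are the cases described above; the last one is hypothetical.
$samples = 'Line 2', 'Foo Line 1', 'Line 2 Bar', 'Machine Identification: Line 1'

$results = foreach ($line in $samples) {
    [pscustomobject]@{
        Text       = $line
        Anchored   = $line -match $anchored
        Unanchored = $line -match $unanchored
    }
}
$results
```

Only the first sample matches the anchored pattern, while all four match the unanchored one. That is consistent with "content not found" being reported when the text sits in the middle of a longer line.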

Martin Rees (ASKER)

An example would be:

---------------------------------------
|                 REPORT              |
---------------------------------------
 
          Manual Batch Reset
 
     Time: 07:36  Date: 01/08/2018
 
Machine Identification: Line2-1  
 
               Lane 2-1
Product: Auto BB
Batch Code:


The text I am trying to match in this example is "Line 2-1", which is the machine identification. It is unique to the specific device that created the report, so this is how I need to sort it. However, the wording changes on newer machines to "Machine ID:......", so I can only use the text after the colon for this.

Hope this makes sense.

oBdA

Just to verify I understood you correctly: the string you're looking for is
either Machine Identification: <Device>
or Machine ID: <Device>
and you probably want the main folder name to be <Device>?

Martin Rees (ASKER)

Yes, this is correct. Also, the device name length varies too.
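
For illustration only, one way the device name might be pulled out of a report is sketched below. It is not necessarily what the certified solution does. The function name `Get-ReportDevice` is hypothetical, and the pattern assumes the label is exactly "Machine Identification:" or "Machine ID:" with the device name making up the rest of that line.

```powershell
# Hypothetical helper: return the device name found in a report file, or $null.
function Get-ReportDevice {
    param([Parameter(Mandatory)][string]$Path)

    # Accept either label, then capture the text after the colon without surrounding whitespace.
    $pattern = '^\s*Machine (Identification|ID):\s*(?<device>\S.*?)\s*$'
    $hit = Select-String -Path $Path -Pattern $pattern -List   # -List stops at the first match
    if ($hit) { return $hit.Matches[0].Groups['device'].Value }
    return $null
}

# Try it on a throwaway file that mimics the sample report.
$sample = Join-Path ([System.IO.Path]::GetTempPath()) 'BR20190127T104207.txt'
Set-Content -Path $sample -Value 'Manual Batch Reset', 'Machine Identification: Line2-1  ', 'Product: Auto BB'
$device = Get-ReportDevice -Path $sample
Remove-Item -Path $sample
$device
```

The returned value could then replace `$($content.Line)` when building the target path in the earlier script. A device name containing characters that are not valid in folder names would need extra handling, which this sketch leaves out.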

EXPERT CERTIFIED SOLUTION (available to Experts Exchange members only)

Martin Rees (ASKER)

Works perfectly, thank you.