Shailesh Shinde
asked on
Merging text files strings with filename
Hi All,
We have one requirement to merge all the Tab delimited txt files from folders and sub-folders into Excel file with third column as filename.
Can you please provide any reference or sample code for this.
Thanks,
Shail
We have one requirement to merge all the Tab delimited txt files from folders and sub-folders into Excel file with third column as filename.
Can you please provide any reference or sample code for this.
Thanks,
Shail
ASKER
Hi,
Yes, these files consist of two columns tab delimited format with extension .properties.
Yes, filename should contain the full path in column third.
Thanks,
Shail
Yes, these files consist of two columns tab delimited format with extension .properties.
Yes, filename should contain the full path in column third.
Thanks,
Shail
There are many ways to do that. One is using a PowerShell script and simple text processing:
# following vars are to be replaced
$srcFolder = 'C:\temp\EE\test'
$srcMask = '*.properties'
$dstFile = 'C:\temp\EE\results.csv'
# nothing to change here
Get-ChildItem -recurse $srcFolder $srcMask |
% {
$file = $_
Get-Content $_ | % { $_ + "`t$($file.FullName)" }
} | Out-File $dstFile -Encoding OEM
You might have to change the encoding, but I suppose all files are simple text files, no Unicode type.
Here is a VBscript approach, adjust the base folder location near the top and save as a VBS file, then run as follows. I assumed you would want to enclose tha dded column of the full path in double quotes since it could contain spaces, but easy to remove the use of the Quote() function if that is not needed.
cscript EE29015138.vbs
cscript EE29015138.vbs
' Define needed constants
Const ForReading = 1
Const ForWriting = 2
Const TriStateUseDefault = -2
' Define the folder and extension to scan
strBaseDir = "B:\EE\EE29015138\Files"
strFindExt = ".properties"
' Set up filesystem object for usage
Set objFSO = CreateObject("Scripting.FileSystemObject")
' Look for files (recursively)
FindFiles objFSO.GetFolder(strBaseDir)
' Subroutine (recursive) to search for files to delete
Sub FindFiles(objFolder)
On Error Resume Next
' Look at all files in this folder
For Each objFile In objFolder.Files
' Make sure it's the extension we want
If LCase(Right(objFile.Name, Len(strFindExt))) = LCase(strFindExt) Then
' Process this file
UpdateFile objFile.Path
End If
Next
' See if we were able to access this folder, if not don't recurse into it
If Err.Number = 0 Then
' Recursively drill down into subfolder
For Each objSubFolder In objFolder.SubFolders
FindFiles objSubFolder
Next
End If
End Sub
' Subroutine to update a file
Sub UpdateFile(strFile)
' Read entire input file into a variable and close it
Set objFile = objFSO.OpenTextFile(strFile, ForReading, False, TriStateUseDefault)
strData = objFile.ReadAll
objFile.Close
Set objFile = Nothing
' Add full file path to end of each line (TAB delimited, enclose in quotes)
strData = Replace(strData, vbCrLf, vbTab & Quote(strFile) & vbCrLf)
' Write to new CSV file with changes made
Set objFile = objFSO.OpenTextFile(Replace(strFile, ".properties", ".csv", vbTextCompare), ForWriting, True)
objFile.Write(strData)
objFile.Close
Set objFile = Nothing
End Sub
' Add surrounding double quotes to a string
Function Quote(s)
Quote = Chr(34) & s & Chr(34)
End Function
~bp
I think I may have missed a piece, you want all data merged into one file, not individual files, I will add that now...
~bp
~bp
Okay, this version merges all data into one output file.
' Define needed constants
Const ForReading = 1
Const ForWriting = 2
Const ForAppending = 8
Const TriStateUseDefault = -2
' Define the folder and extension to scan
strBaseDir = "B:\EE\EE29015138\Files"
strFindExt = ".properties"
strDestFile = "B:\EE\EE29015138\Merge.csv"
' Set up filesystem object for usage
Set objFSO = CreateObject("Scripting.FileSystemObject")
' Delete output file if it exists
If objFSO.FileExists(strDestFile) Then
objFSO.DeleteFile(strDestFile)
End If
' Look for files (recursively)
FindFiles objFSO.GetFolder(strBaseDir)
' Subroutine (recursive) to search for files to delete
Sub FindFiles(objFolder)
On Error Resume Next
' Look at all files in this folder
For Each objFile In objFolder.Files
' Make sure it's the extension we want
If LCase(Right(objFile.Name, Len(strFindExt))) = LCase(strFindExt) Then
' Process this file
UpdateFile objFile.Path
End If
Next
' See if we were able to access this folder, if not don't recurse into it
If Err.Number = 0 Then
' Recursively drill down into subfolder
For Each objSubFolder In objFolder.SubFolders
FindFiles objSubFolder
Next
End If
End Sub
' Subroutine to update a file
Sub UpdateFile(strFile)
' Read entire input file into a variable and close it
Set objFile = objFSO.OpenTextFile(strFile, ForReading, False, TriStateUseDefault)
strData = objFile.ReadAll
objFile.Close
Set objFile = Nothing
' Add full file path to end of each line (TAB delimited, enclose in quotes)
strData = Replace(strData, vbCrLf, vbTab & Quote(strFile) & vbCrLf)
' Write to new CSV file with changes made
Set objFile = objFSO.OpenTextFile(strDestFile, ForAppending, True)
objFile.Write(strData)
objFile.Close
Set objFile = Nothing
End Sub
' Add surrounding double quotes to a string
Function Quote(s)
Quote = Chr(34) & s & Chr(34)
End Function
~bp
Clemo,
I tried to test your PS1 script here but got some errors. Haven't tried to debug, but wanted to pass along to you in case it was something obvious.
I had this test setup:
And got these errors:
I tried to test your PS1 script here but got some errors. Haven't tried to debug, but wanted to pass along to you in case it was something obvious.
I had this test setup:
[Mon 04/10/2017 7:23:40.87 B:\ee\EE29015138]tree /f /a
B:.
| EE29015138.vbs
| EE29015138.ps1
|
\---Files
| File2.properties
| Dummy.txt
| File1.properties
|
+---Sub2
| File2.properties
| Dummy.txt
| File1.properties
|
\---Sub1
File2.properties
Dummy.txt
File1.properties
And got these errors:
PS B:\EE\EE29015138> B:\ee\EE29015138\EE29015138.ps1
Get-Content : Cannot find path 'B:\EE\EE29015138\File2.properties' because it does not exist.
At B:\ee\EE29015138\EE29015138.ps1:10 char:3
+ Get-Content $_ | % { $_ + "`t$($file.FullName)" }
+ ~~~~~~~~~~~~~~
+ CategoryInfo : ObjectNotFound: (B:\EE\EE29015138\File2.properties:String) [Get-Content], ItemNotFoundException
+ FullyQualifiedErrorId : PathNotFound,Microsoft.PowerShell.Commands.GetContentCommand
Get-Content : Cannot find path 'B:\EE\EE29015138\File1.properties' because it does not exist.
At B:\ee\EE29015138\EE29015138.ps1:10 char:3
+ Get-Content $_ | % { $_ + "`t$($file.FullName)" }
+ ~~~~~~~~~~~~~~
+ CategoryInfo : ObjectNotFound: (B:\EE\EE29015138\File1.properties:String) [Get-Content], ItemNotFoundException
+ FullyQualifiedErrorId : PathNotFound,Microsoft.PowerShell.Commands.GetContentCommand
Get-Content : Cannot find path 'B:\EE\EE29015138\File2.properties' because it does not exist.
At B:\ee\EE29015138\EE29015138.ps1:10 char:3
+ Get-Content $_ | % { $_ + "`t$($file.FullName)" }
+ ~~~~~~~~~~~~~~
+ CategoryInfo : ObjectNotFound: (B:\EE\EE29015138\File2.properties:String) [Get-Content], ItemNotFoundException
+ FullyQualifiedErrorId : PathNotFound,Microsoft.PowerShell.Commands.GetContentCommand
Get-Content : Cannot find path 'B:\EE\EE29015138\File1.properties' because it does not exist.
At B:\ee\EE29015138\EE29015138.ps1:10 char:3
+ Get-Content $_ | % { $_ + "`t$($file.FullName)" }
+ ~~~~~~~~~~~~~~
+ CategoryInfo : ObjectNotFound: (B:\EE\EE29015138\File1.properties:String) [Get-Content], ItemNotFoundException
+ FullyQualifiedErrorId : PathNotFound,Microsoft.PowerShell.Commands.GetContentCommand
Get-Content : Cannot find path 'B:\EE\EE29015138\File2.properties' because it does not exist.
At B:\ee\EE29015138\EE29015138.ps1:10 char:3
+ Get-Content $_ | % { $_ + "`t$($file.FullName)" }
+ ~~~~~~~~~~~~~~
+ CategoryInfo : ObjectNotFound: (B:\EE\EE29015138\File2.properties:String) [Get-Content], ItemNotFoundException
+ FullyQualifiedErrorId : PathNotFound,Microsoft.PowerShell.Commands.GetContentCommand
Get-Content : Cannot find path 'B:\EE\EE29015138\File1.properties' because it does not exist.
At B:\ee\EE29015138\EE29015138.ps1:10 char:3
+ Get-Content $_ | % { $_ + "`t$($file.FullName)" }
+ ~~~~~~~~~~~~~~
+ CategoryInfo : ObjectNotFound: (B:\EE\EE29015138\File1.properties:String) [Get-Content], ItemNotFoundException
+ FullyQualifiedErrorId : PathNotFound,Microsoft.PowerShell.Commands.GetContentCommand
~bp
ASKER
Hi,
With the updated VBS file, I am still not getting the filenames in third columns.
Thanks,
Shail
With the updated VBS file, I am still not getting the filenames in third columns.
Thanks,
Shail
Please upload one of your .properties file here, it worked fine on files I created here so I suspect your files are different somehow, need to see one.
~bp
~bp
ASKER
ASKER CERTIFIED SOLUTION
membership
This solution is only available to members.
To access this solution, you must be a member of Experts Exchange.
Bill, thanks for testing. That is one of the oddities with PowerShell and using objects as parameters vs. via pipeline ... But this should work:
# following vars are to be replaced
$srcFolder = 'C:\temp\EE\test'
$srcMask = '*.properties'
$dstFile = 'C:\temp\EE\results.csv'
# nothing to change here
Get-ChildItem -recurse $srcFolder $srcMask |
% {
$fileName = $_.FullName
Get-Content $fileName | % { $_ + "`t$fileName" }
} | Out-File $dstFile -Encoding OEM
ASKER
Hi,
I have modified this by commenting
'Const EndOfLine = vbLf
and used
strData = Replace(strData, vbLf, vbTab & Quote(strFile) & vbLf)
Gives required output.
Thanks,
Shail
I have modified this by commenting
'Const EndOfLine = vbLf
and used
strData = Replace(strData, vbLf, vbTab & Quote(strFile) & vbLf)
Gives required output.
Thanks,
Shail
Interesting, I guess Const cant reference pre-defined constants...
Glad you worked around it.
~bp
Glad you worked around it.
~bp
ASKER
Thanks a lot!
How could this script be modified to merge two text files out of many pairs that match in file name with a difference of "-TOP" and "-BTM" for the extension?
Would a CSV file do?
Should the filename contain the subfolder or path?