Link to home
Start Free TrialLog in
Avatar of Shailesh Shinde
Shailesh ShindeFlag for India

asked on

Merging text files strings with filename

Hi All,

We have one requirement to merge all the Tab delimited txt files from folders and sub-folders into Excel file with third column as filename.
Can you please provide any reference or sample code for this.

Thanks,
Shail
Avatar of Qlemo
Qlemo
Flag of Germany image

Those files only consist of two columns?
Would a CSV file do?
Should the filename contain the subfolder or path?
Avatar of Shailesh Shinde

ASKER

Hi,

Yes, these files consist of two columns tab delimited format with extension .properties.
Yes, filename should contain the full path in column third.

Thanks,
Shail
There are many ways to do that. One is using a PowerShell script and simple text processing:
# following vars are to be replaced
$srcFolder = 'C:\temp\EE\test'
$srcMask = '*.properties'
$dstFile = 'C:\temp\EE\results.csv'

# nothing to change here
Get-ChildItem -recurse $srcFolder $srcMask |
% {
  $file = $_
  Get-Content $_ | % { $_ + "`t$($file.FullName)"  }
} | Out-File $dstFile -Encoding OEM

Open in new window

You might have to change the encoding, but I suppose all files are simple text files, no Unicode type.
Avatar of Bill Prew
Bill Prew

Here is a VBscript approach, adjust the base folder location near the top and save as a VBS file, then run as follows.  I assumed you would want to enclose tha dded column of the full path in double quotes since it could contain spaces, but easy to remove the use of the Quote() function if that is not needed.

cscript EE29015138.vbs

' Define needed constants
Const ForReading = 1
Const ForWriting = 2
Const TriStateUseDefault = -2

' Define the folder and extension to scan
strBaseDir = "B:\EE\EE29015138\Files"
strFindExt = ".properties"

' Set up filesystem object for usage
Set objFSO = CreateObject("Scripting.FileSystemObject")

' Look for files (recursively)
FindFiles objFSO.GetFolder(strBaseDir)

' Subroutine (recursive) to search for files to delete
Sub FindFiles(objFolder)
   On Error Resume Next

   ' Look at all files in this folder
   For Each objFile In objFolder.Files
      ' Make sure it's the extension we want
      If LCase(Right(objFile.Name, Len(strFindExt))) = LCase(strFindExt) Then
         ' Process this file
         UpdateFile objFile.Path
      End If
   Next

   ' See if we were able to access this folder, if not don't recurse into it
   If Err.Number = 0 Then
       ' Recursively drill down into subfolder
       For Each objSubFolder In objFolder.SubFolders
           FindFiles objSubFolder
       Next
   End If

End Sub

' Subroutine to update a file
Sub UpdateFile(strFile)
   ' Read entire input file into a variable and close it
   Set objFile = objFSO.OpenTextFile(strFile, ForReading, False, TriStateUseDefault)
   strData = objFile.ReadAll
   objFile.Close
   Set objFile = Nothing

   ' Add full file path to end of each line (TAB delimited, enclose in quotes)
   strData = Replace(strData, vbCrLf, vbTab & Quote(strFile) & vbCrLf)

   ' Write to new CSV file with changes made
   Set objFile = objFSO.OpenTextFile(Replace(strFile, ".properties", ".csv", vbTextCompare), ForWriting, True)
   objFile.Write(strData)
   objFile.Close
   Set objFile = Nothing
End Sub

' Add surrounding double quotes to a string
Function Quote(s)
   Quote = Chr(34) & s & Chr(34)
End Function

Open in new window

~bp
I think I may have missed a piece, you want all data merged into one file, not individual files, I will add that now...

~bp
Okay, this version merges all data into one output file.

' Define needed constants
Const ForReading = 1
Const ForWriting = 2
Const ForAppending = 8
Const TriStateUseDefault = -2

' Define the folder and extension to scan
strBaseDir = "B:\EE\EE29015138\Files"
strFindExt = ".properties"
strDestFile = "B:\EE\EE29015138\Merge.csv"

' Set up filesystem object for usage
Set objFSO = CreateObject("Scripting.FileSystemObject")

' Delete output file if it exists
If objFSO.FileExists(strDestFile) Then
   objFSO.DeleteFile(strDestFile)
End If

' Look for files (recursively)
FindFiles objFSO.GetFolder(strBaseDir)

' Subroutine (recursive) to search for files to delete
Sub FindFiles(objFolder)
   On Error Resume Next

   ' Look at all files in this folder
   For Each objFile In objFolder.Files
      ' Make sure it's the extension we want
      If LCase(Right(objFile.Name, Len(strFindExt))) = LCase(strFindExt) Then
         ' Process this file
         UpdateFile objFile.Path
      End If
   Next

   ' See if we were able to access this folder, if not don't recurse into it
   If Err.Number = 0 Then
      ' Recursively drill down into subfolder
      For Each objSubFolder In objFolder.SubFolders
         FindFiles objSubFolder
      Next
   End If

End Sub

' Subroutine to update a file
Sub UpdateFile(strFile)
   ' Read entire input file into a variable and close it
   Set objFile = objFSO.OpenTextFile(strFile, ForReading, False, TriStateUseDefault)
   strData = objFile.ReadAll
   objFile.Close
   Set objFile = Nothing

   ' Add full file path to end of each line (TAB delimited, enclose in quotes)
   strData = Replace(strData, vbCrLf, vbTab & Quote(strFile) & vbCrLf)

   ' Write to new CSV file with changes made
   Set objFile = objFSO.OpenTextFile(strDestFile, ForAppending, True)
   objFile.Write(strData)
   objFile.Close
   Set objFile = Nothing
End Sub

' Add surrounding double quotes to a string
Function Quote(s)
   Quote = Chr(34) & s & Chr(34)
End Function

Open in new window

~bp
Clemo,

I tried to test your PS1 script here but got some errors.  Haven't tried to debug, but wanted to pass along to you in case it was something obvious.

I had this test setup:
[Mon 04/10/2017  7:23:40.87 B:\ee\EE29015138]tree /f /a
B:.
|   EE29015138.vbs
|   EE29015138.ps1
|
\---Files
    |   File2.properties
    |   Dummy.txt
    |   File1.properties
    |
    +---Sub2
    |       File2.properties
    |       Dummy.txt
    |       File1.properties
    |
    \---Sub1
            File2.properties
            Dummy.txt
            File1.properties

Open in new window


And got these errors:
PS B:\EE\EE29015138> B:\ee\EE29015138\EE29015138.ps1
Get-Content : Cannot find path 'B:\EE\EE29015138\File2.properties' because it does not exist.
At B:\ee\EE29015138\EE29015138.ps1:10 char:3
+   Get-Content $_ | % { $_ + "`t$($file.FullName)"  }
+   ~~~~~~~~~~~~~~
    + CategoryInfo          : ObjectNotFound: (B:\EE\EE29015138\File2.properties:String) [Get-Content], ItemNotFoundException
    + FullyQualifiedErrorId : PathNotFound,Microsoft.PowerShell.Commands.GetContentCommand
 
Get-Content : Cannot find path 'B:\EE\EE29015138\File1.properties' because it does not exist.
At B:\ee\EE29015138\EE29015138.ps1:10 char:3
+   Get-Content $_ | % { $_ + "`t$($file.FullName)"  }
+   ~~~~~~~~~~~~~~
    + CategoryInfo          : ObjectNotFound: (B:\EE\EE29015138\File1.properties:String) [Get-Content], ItemNotFoundException
    + FullyQualifiedErrorId : PathNotFound,Microsoft.PowerShell.Commands.GetContentCommand
 
Get-Content : Cannot find path 'B:\EE\EE29015138\File2.properties' because it does not exist.
At B:\ee\EE29015138\EE29015138.ps1:10 char:3
+   Get-Content $_ | % { $_ + "`t$($file.FullName)"  }
+   ~~~~~~~~~~~~~~
    + CategoryInfo          : ObjectNotFound: (B:\EE\EE29015138\File2.properties:String) [Get-Content], ItemNotFoundException
    + FullyQualifiedErrorId : PathNotFound,Microsoft.PowerShell.Commands.GetContentCommand
 
Get-Content : Cannot find path 'B:\EE\EE29015138\File1.properties' because it does not exist.
At B:\ee\EE29015138\EE29015138.ps1:10 char:3
+   Get-Content $_ | % { $_ + "`t$($file.FullName)"  }
+   ~~~~~~~~~~~~~~
    + CategoryInfo          : ObjectNotFound: (B:\EE\EE29015138\File1.properties:String) [Get-Content], ItemNotFoundException
    + FullyQualifiedErrorId : PathNotFound,Microsoft.PowerShell.Commands.GetContentCommand
 
Get-Content : Cannot find path 'B:\EE\EE29015138\File2.properties' because it does not exist.
At B:\ee\EE29015138\EE29015138.ps1:10 char:3
+   Get-Content $_ | % { $_ + "`t$($file.FullName)"  }
+   ~~~~~~~~~~~~~~
    + CategoryInfo          : ObjectNotFound: (B:\EE\EE29015138\File2.properties:String) [Get-Content], ItemNotFoundException
    + FullyQualifiedErrorId : PathNotFound,Microsoft.PowerShell.Commands.GetContentCommand
 
Get-Content : Cannot find path 'B:\EE\EE29015138\File1.properties' because it does not exist.
At B:\ee\EE29015138\EE29015138.ps1:10 char:3
+   Get-Content $_ | % { $_ + "`t$($file.FullName)"  }
+   ~~~~~~~~~~~~~~
    + CategoryInfo          : ObjectNotFound: (B:\EE\EE29015138\File1.properties:String) [Get-Content], ItemNotFoundException
    + FullyQualifiedErrorId : PathNotFound,Microsoft.PowerShell.Commands.GetContentCommand

Open in new window

~bp
Hi,

With the updated VBS file, I am still not getting the filenames in third columns.

Thanks,
Shail
Please upload one of your .properties file here, it worked fine on files I created here so I suspect your files are different somehow, need to see one.

~bp
Hi,

Please find attached sample files.

Thanks,
Shail
Sample.zip
ASKER CERTIFIED SOLUTION
Avatar of Bill Prew
Bill Prew

Link to home
membership
This solution is only available to members.
To access this solution, you must be a member of Experts Exchange.
Start Free Trial
Hi,
I am getting this error message while running the script.
User generated image
Thanks,
Shail
Bill, thanks for testing. That is one of the oddities with PowerShell and using objects as parameters vs. via pipeline ... But this should work:
# following vars are to be replaced
$srcFolder = 'C:\temp\EE\test'
$srcMask = '*.properties'
$dstFile = 'C:\temp\EE\results.csv'

# nothing to change here
Get-ChildItem -recurse $srcFolder $srcMask |
% {
  $fileName = $_.FullName
  Get-Content $fileName | % { $_ + "`t$fileName"  }
} | Out-File $dstFile -Encoding OEM

Open in new window

Hi,
I have modified this by commenting
'Const EndOfLine = vbLf

and used
strData = Replace(strData, vbLf, vbTab & Quote(strFile) & vbLf)

Gives required output.

Thanks,
Shail
Interesting, I guess Const cant reference pre-defined constants...

Glad you worked around it.

~bp
Thanks a lot!
How could this script be modified to merge two text files out of many pairs that match in file name with a difference of "-TOP" and "-BTM" for the extension?