Avatar of Shailesh Shinde
Shailesh ShindeFlag for India

asked on 

Merging text files strings with filename

Hi All,

We have one requirement to merge all the Tab delimited txt files from folders and sub-folders into Excel file with third column as filename.
Can you please provide any reference or sample code for this.

Thanks,
Shail
VB ScriptScripting LanguagesMicrosoft OfficeVBAMicrosoft Excel

Avatar of undefined
Last Comment
mike g
Avatar of Qlemo
Qlemo
Flag of Germany image

Those files only consist of two columns?
Would a CSV file do?
Should the filename contain the subfolder or path?
Avatar of Shailesh Shinde

ASKER

Hi,

Yes, these files consist of two columns tab delimited format with extension .properties.
Yes, filename should contain the full path in column third.

Thanks,
Shail
Avatar of Qlemo
Qlemo
Flag of Germany image

There are many ways to do that. One is using a PowerShell script and simple text processing:
# following vars are to be replaced
$srcFolder = 'C:\temp\EE\test'
$srcMask = '*.properties'
$dstFile = 'C:\temp\EE\results.csv'

# nothing to change here
Get-ChildItem -recurse $srcFolder $srcMask |
% {
  $file = $_
  Get-Content $_ | % { $_ + "`t$($file.FullName)"  }
} | Out-File $dstFile -Encoding OEM

Open in new window

You might have to change the encoding, but I suppose all files are simple text files, no Unicode type.
Avatar of Bill Prew
Bill Prew

Here is a VBscript approach, adjust the base folder location near the top and save as a VBS file, then run as follows.  I assumed you would want to enclose tha dded column of the full path in double quotes since it could contain spaces, but easy to remove the use of the Quote() function if that is not needed.

cscript EE29015138.vbs

' Define needed constants
Const ForReading = 1
Const ForWriting = 2
Const TriStateUseDefault = -2

' Define the folder and extension to scan
strBaseDir = "B:\EE\EE29015138\Files"
strFindExt = ".properties"

' Set up filesystem object for usage
Set objFSO = CreateObject("Scripting.FileSystemObject")

' Look for files (recursively)
FindFiles objFSO.GetFolder(strBaseDir)

' Subroutine (recursive) to search for files to delete
Sub FindFiles(objFolder)
   On Error Resume Next

   ' Look at all files in this folder
   For Each objFile In objFolder.Files
      ' Make sure it's the extension we want
      If LCase(Right(objFile.Name, Len(strFindExt))) = LCase(strFindExt) Then
         ' Process this file
         UpdateFile objFile.Path
      End If
   Next

   ' See if we were able to access this folder, if not don't recurse into it
   If Err.Number = 0 Then
       ' Recursively drill down into subfolder
       For Each objSubFolder In objFolder.SubFolders
           FindFiles objSubFolder
       Next
   End If

End Sub

' Subroutine to update a file
Sub UpdateFile(strFile)
   ' Read entire input file into a variable and close it
   Set objFile = objFSO.OpenTextFile(strFile, ForReading, False, TriStateUseDefault)
   strData = objFile.ReadAll
   objFile.Close
   Set objFile = Nothing

   ' Add full file path to end of each line (TAB delimited, enclose in quotes)
   strData = Replace(strData, vbCrLf, vbTab & Quote(strFile) & vbCrLf)

   ' Write to new CSV file with changes made
   Set objFile = objFSO.OpenTextFile(Replace(strFile, ".properties", ".csv", vbTextCompare), ForWriting, True)
   objFile.Write(strData)
   objFile.Close
   Set objFile = Nothing
End Sub

' Add surrounding double quotes to a string
Function Quote(s)
   Quote = Chr(34) & s & Chr(34)
End Function

Open in new window

~bp
Avatar of Bill Prew
Bill Prew

I think I may have missed a piece, you want all data merged into one file, not individual files, I will add that now...

~bp
Avatar of Bill Prew
Bill Prew

Okay, this version merges all data into one output file.

' Define needed constants
Const ForReading = 1
Const ForWriting = 2
Const ForAppending = 8
Const TriStateUseDefault = -2

' Define the folder and extension to scan
strBaseDir = "B:\EE\EE29015138\Files"
strFindExt = ".properties"
strDestFile = "B:\EE\EE29015138\Merge.csv"

' Set up filesystem object for usage
Set objFSO = CreateObject("Scripting.FileSystemObject")

' Delete output file if it exists
If objFSO.FileExists(strDestFile) Then
   objFSO.DeleteFile(strDestFile)
End If

' Look for files (recursively)
FindFiles objFSO.GetFolder(strBaseDir)

' Subroutine (recursive) to search for files to delete
Sub FindFiles(objFolder)
   On Error Resume Next

   ' Look at all files in this folder
   For Each objFile In objFolder.Files
      ' Make sure it's the extension we want
      If LCase(Right(objFile.Name, Len(strFindExt))) = LCase(strFindExt) Then
         ' Process this file
         UpdateFile objFile.Path
      End If
   Next

   ' See if we were able to access this folder, if not don't recurse into it
   If Err.Number = 0 Then
      ' Recursively drill down into subfolder
      For Each objSubFolder In objFolder.SubFolders
         FindFiles objSubFolder
      Next
   End If

End Sub

' Subroutine to update a file
Sub UpdateFile(strFile)
   ' Read entire input file into a variable and close it
   Set objFile = objFSO.OpenTextFile(strFile, ForReading, False, TriStateUseDefault)
   strData = objFile.ReadAll
   objFile.Close
   Set objFile = Nothing

   ' Add full file path to end of each line (TAB delimited, enclose in quotes)
   strData = Replace(strData, vbCrLf, vbTab & Quote(strFile) & vbCrLf)

   ' Write to new CSV file with changes made
   Set objFile = objFSO.OpenTextFile(strDestFile, ForAppending, True)
   objFile.Write(strData)
   objFile.Close
   Set objFile = Nothing
End Sub

' Add surrounding double quotes to a string
Function Quote(s)
   Quote = Chr(34) & s & Chr(34)
End Function

Open in new window

~bp
Avatar of Bill Prew
Bill Prew

Clemo,

I tried to test your PS1 script here but got some errors.  Haven't tried to debug, but wanted to pass along to you in case it was something obvious.

I had this test setup:
[Mon 04/10/2017  7:23:40.87 B:\ee\EE29015138]tree /f /a
B:.
|   EE29015138.vbs
|   EE29015138.ps1
|
\---Files
    |   File2.properties
    |   Dummy.txt
    |   File1.properties
    |
    +---Sub2
    |       File2.properties
    |       Dummy.txt
    |       File1.properties
    |
    \---Sub1
            File2.properties
            Dummy.txt
            File1.properties

Open in new window


And got these errors:
PS B:\EE\EE29015138> B:\ee\EE29015138\EE29015138.ps1
Get-Content : Cannot find path 'B:\EE\EE29015138\File2.properties' because it does not exist.
At B:\ee\EE29015138\EE29015138.ps1:10 char:3
+   Get-Content $_ | % { $_ + "`t$($file.FullName)"  }
+   ~~~~~~~~~~~~~~
    + CategoryInfo          : ObjectNotFound: (B:\EE\EE29015138\File2.properties:String) [Get-Content], ItemNotFoundException
    + FullyQualifiedErrorId : PathNotFound,Microsoft.PowerShell.Commands.GetContentCommand
 
Get-Content : Cannot find path 'B:\EE\EE29015138\File1.properties' because it does not exist.
At B:\ee\EE29015138\EE29015138.ps1:10 char:3
+   Get-Content $_ | % { $_ + "`t$($file.FullName)"  }
+   ~~~~~~~~~~~~~~
    + CategoryInfo          : ObjectNotFound: (B:\EE\EE29015138\File1.properties:String) [Get-Content], ItemNotFoundException
    + FullyQualifiedErrorId : PathNotFound,Microsoft.PowerShell.Commands.GetContentCommand
 
Get-Content : Cannot find path 'B:\EE\EE29015138\File2.properties' because it does not exist.
At B:\ee\EE29015138\EE29015138.ps1:10 char:3
+   Get-Content $_ | % { $_ + "`t$($file.FullName)"  }
+   ~~~~~~~~~~~~~~
    + CategoryInfo          : ObjectNotFound: (B:\EE\EE29015138\File2.properties:String) [Get-Content], ItemNotFoundException
    + FullyQualifiedErrorId : PathNotFound,Microsoft.PowerShell.Commands.GetContentCommand
 
Get-Content : Cannot find path 'B:\EE\EE29015138\File1.properties' because it does not exist.
At B:\ee\EE29015138\EE29015138.ps1:10 char:3
+   Get-Content $_ | % { $_ + "`t$($file.FullName)"  }
+   ~~~~~~~~~~~~~~
    + CategoryInfo          : ObjectNotFound: (B:\EE\EE29015138\File1.properties:String) [Get-Content], ItemNotFoundException
    + FullyQualifiedErrorId : PathNotFound,Microsoft.PowerShell.Commands.GetContentCommand
 
Get-Content : Cannot find path 'B:\EE\EE29015138\File2.properties' because it does not exist.
At B:\ee\EE29015138\EE29015138.ps1:10 char:3
+   Get-Content $_ | % { $_ + "`t$($file.FullName)"  }
+   ~~~~~~~~~~~~~~
    + CategoryInfo          : ObjectNotFound: (B:\EE\EE29015138\File2.properties:String) [Get-Content], ItemNotFoundException
    + FullyQualifiedErrorId : PathNotFound,Microsoft.PowerShell.Commands.GetContentCommand
 
Get-Content : Cannot find path 'B:\EE\EE29015138\File1.properties' because it does not exist.
At B:\ee\EE29015138\EE29015138.ps1:10 char:3
+   Get-Content $_ | % { $_ + "`t$($file.FullName)"  }
+   ~~~~~~~~~~~~~~
    + CategoryInfo          : ObjectNotFound: (B:\EE\EE29015138\File1.properties:String) [Get-Content], ItemNotFoundException
    + FullyQualifiedErrorId : PathNotFound,Microsoft.PowerShell.Commands.GetContentCommand

Open in new window

~bp
Avatar of Shailesh Shinde

ASKER

Hi,

With the updated VBS file, I am still not getting the filenames in third columns.

Thanks,
Shail
Avatar of Bill Prew
Bill Prew

Please upload one of your .properties file here, it worked fine on files I created here so I suspect your files are different somehow, need to see one.

~bp
Avatar of Shailesh Shinde

ASKER

Hi,

Please find attached sample files.

Thanks,
Shail
Sample.zip
ASKER CERTIFIED SOLUTION
Avatar of Bill Prew
Bill Prew

Blurred text
THIS SOLUTION IS ONLY AVAILABLE TO MEMBERS.
View this solution by signing up for a free trial.
Members can start a 7-Day free trial and enjoy unlimited access to the platform.
See Pricing Options
Start Free Trial
Avatar of Shailesh Shinde

ASKER

Hi,
I am getting this error message while running the script.
User generated image
Thanks,
Shail
Avatar of Qlemo
Qlemo
Flag of Germany image

Bill, thanks for testing. That is one of the oddities with PowerShell and using objects as parameters vs. via pipeline ... But this should work:
# following vars are to be replaced
$srcFolder = 'C:\temp\EE\test'
$srcMask = '*.properties'
$dstFile = 'C:\temp\EE\results.csv'

# nothing to change here
Get-ChildItem -recurse $srcFolder $srcMask |
% {
  $fileName = $_.FullName
  Get-Content $fileName | % { $_ + "`t$fileName"  }
} | Out-File $dstFile -Encoding OEM

Open in new window

Avatar of Shailesh Shinde

ASKER

Hi,
I have modified this by commenting
'Const EndOfLine = vbLf

and used
strData = Replace(strData, vbLf, vbTab & Quote(strFile) & vbLf)

Gives required output.

Thanks,
Shail
Avatar of Bill Prew
Bill Prew

Interesting, I guess Const cant reference pre-defined constants...

Glad you worked around it.

~bp
Avatar of Shailesh Shinde

ASKER

Thanks a lot!
Avatar of mike g
mike g

How could this script be modified to merge two text files out of many pairs that match in file name with a difference of "-TOP" and "-BTM" for the extension?
Microsoft Excel
Microsoft Excel

Microsoft Excel topics include formulas, formatting, VBA macros and user-defined functions, and everything else related to the spreadsheet user interface, including error messages.

144K
Questions
--
Followers
--
Top Experts
Get a personalized solution from industry experts
Ask the experts
Read over 600 more reviews

TRUSTED BY

IBM logoIntel logoMicrosoft logoUbisoft logoSAP logo
Qualcomm logoCitrix Systems logoWorkday logoErnst & Young logo
High performer badgeUsers love us badge
LinkedIn logoFacebook logoX logoInstagram logoTikTok logoYouTube logo