PowerShell Script to List the Top 50 Largest Files

I have a very large file server environment with 17TB of data, and our current storage monitoring solution is not able to scan the volumes fast enough to produce a daily report...

I am not a PowerShell expert, but I am looking to see if a script would be able to list the top 50 largest files per volume, from largest to smallest, with the full path, date of last access, owner, and of course size. If it could be exported to a CSV, that would be great...
compdigit44 asked:

Qlemo"Batchelor", Developer and EE Topic AdvisorCommented:
I doubt the PS approach will help with that, as it still has to traverse all file system info to determine the top n files. But of course you can try.
Get-ChildItem C:\ -recurse | select FullName, Length | sort Length -Desc | Select -First 50


compdigit44 (Author) commented:
Thanks... Does this list the size of the file, though?
Qlemo"Batchelor", Developer and EE Topic AdvisorCommented:
Yes, it shows the full path plus the size.
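For illustration, the console output of that one-liner looks roughly like this (file names and sizes are made up):

FullName                    Length
--------                    ------
C:\VMs\Win10.vhdx      64424509440
C:\ISO\Server2016.iso   6006587392
C:\Backups\mail.pst     2147483648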

Martin Perotto commented:
Yes, and add this:
| Export-Csv C:\file_size\log.csv
Then you have the output as a file.
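Put together, the whole pipeline would look like this (a sketch; adjust the drive and output path to your environment, and note that the target folder C:\file_size must already exist):
Get-ChildItem C:\ -Recurse | Select-Object FullName, Length | Sort-Object Length -Descending | Select-Object -First 50 | Export-Csv -Path C:\file_size\log.csv -NoTypeInformation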
compdigit44 (Author) commented:
Thanks, I am trying it now...
compdigit44 (Author) commented:
The script is running, but I am getting messages that the file paths it is hitting are too long...
Asif commented:
By default, the Length property does not show the size in MB.
Here is an addition to the script posted by Qlemo; it will write the output to a .csv, and the size will be in MB:
Get-ChildItem C:\ -Recurse | select Name, Directory, @{N="Size(MB)";E={[Math]::Round($_.length/1MB,2)}} | sort "Size(MB)" -Descending | select -First 50 | Export-Csv -Path d:\Top50Files.csv -NoTypeInformation


compdigit44 (Author) commented:
Thanks... I will try this tomorrow and report back. Will this be able to handle files with very long path names?
Asif commented:
As the error says, it's because the file path is more than 256 characters.
You will face the same error when running the script I posted, because it uses the same cmdlet (Get-ChildItem).
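The message in question typically reads something like this (exact wording varies by PowerShell/.NET version):
Get-ChildItem : The specified path, file name, or both are too long. The fully qualified file name must be less than 260 characters, and the directory name must be less than 248 characters.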
oBdA commented:
This uses robocopy to produce the file list (it will not actually copy anything), as robocopy doesn't care about long paths. It should have a pretty low memory footprint, too, since it doesn't collect all items first and then sort them, but only keeps the top n biggest ones.
Can't promise anything concerning speed, though.
It returns an array of PSCustomObjects with two properties, FullName and Length. You can process that output any way you feel like, for example like this:
$Top = .\Whatever.ps1 -Path E:\Wherever
$Top | fl
$Top | Export-Csv -Path C:\Wherever\Top.csv -NoTypeInformation


[CmdletBinding()]
Param(
	[string]$Path = $(Get-Location -PSProvider Filesystem),
	[uint32]$Top = 50
)
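# Seed the list with $Top placeholder objects of Length 0; after each sort, index 0 holds the smallest entry, so a new file only has to beat that one.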
$List = 1..$Top | ForEach-Object {New-Object -TypeName PSObject -Property @{"Length" = 0}}
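# /L = list only (nothing is copied), /s = recurse into subdirectories, /nc /njh /njs /ndl = suppress file class, job header, job summary and directory lines, /fp = full path names, /bytes = sizes as raw bytes, /r:0 = no retries.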
& robocopy.exe $Path C:\Dummy_Must_Not_Exist *.* /L /s /nc /njh /njs /ndl /fp /bytes /r:0 | ForEach-Object {
	If ($_) {
		$Split = $_.Split("`t")
		$Length = [int]$Split[3].Trim()
		If ($Length -gt $List[0].Length) {
			$List[0] = $Split[4].Trim() | Select-Object -Property @{Name="FullName"; Expression={$_};}, @{Name="Length"; Expression={$Length}}
			$List = $List | Sort-Object -Property Length
		}
	}
}
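# Return only real entries; leftover Length-0 placeholders mean fewer than $Top files matched.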
$List | Where-Object {$_.Length -gt 0}


We'll worry about how (considering the long file paths) to get the rest of the information you want if that runs in a timely fashion.
Qlemo"Batchelor", Developer and EE Topic AdvisorCommented:
A very clever way to get the top n. Had to think about how it works for some time...
compdigit44 (Author) commented:
Thanks for the replies; I will look at this more tomorrow, but the thought of using robocopy, even just to list files, scares me a bit.
Qlemo"Batchelor", Developer and EE Topic AdvisorCommented:
In the past, I've used successive automaitc SUBST to cut down overly long paths in PowerShell, but that is in no way better (as any advantage) over using RoboCopy. RoboCopy is built-in, so why not use it?
aikimark commented:
A couple of suggestions:
Exclude small files -- in this parameter example, a 1MB minimum greatly reduced the number of output lines produced by Robocopy in my test. If you know your file system well, you might be able to raise that minimum:
/min:1000000



Sort once -- instead of sorting your list every time you add an item, sort all the items and take the top N.
oBdA commented:
compdigit44,
with the /L argument, robocopy will only list the files - really. All it will do is generate a file list.

aikimark,
the /min is a nice touch.
The "sort once [all the items]" is exactly what I wanted to avoid. Not knowing anything about the server resources and its file structure, I find it safer to invest a bit of CPU into sorting (not that sorting 50 elements is really that taxing) than to first collect potentially gigabytes of data in RAM.
oBdA commented:
Here's a version that includes the minimum size suggested by aikimark, and more importantly, fixes a sizing issue (sorry); [int] might not be big enough. In addition, it gives you LastWriteTime and size in MB.
[CmdletBinding()]
Param(
	[string]$Path = $(Get-Location -PSProvider Filesystem),
	[uint32]$Top = 50,
	[string]$MinimumSize = "1MB"
)
$List = 1..$Top | ForEach-Object {New-Object -TypeName PSObject -Property @{"Length" = 0}}
If (($Min = ([int64]1 * $MinimumSize.Replace(" ", ""))) -isnot [int64]) {
	"Unable to parse '$($MinimumSize)' to an integer." | Write-Error
	Exit 1
}
## Not collecting LastWriteTime yet; it'll be in the same field as the Length, which would require an additional Split().
& robocopy.exe $Path C:\Dummy_Must_Not_Exist *.* /L /s /nc /njh /njs /ndl /fp /bytes /min:$Min /r:0 | ForEach-Object {
	If ($_) {
		$Split = $_.Split("`t")
		$Length = [int64]$Split[3].Trim()
		If ($Length -gt $List[0].Length) {
			$List[0] = $Split[4].Trim() | Select-Object -Property @{Name="FullName"; Expression={$_};}, @{Name="Length"; Expression={$Length}}
			$List = $List | Sort-Object -Property Length
		}
	}
}
$List | Where-Object {$_.Length -gt 0} | ForEach-Object {
	$Folder = Split-Path -Path $_.FullName -Parent
	$File = Split-Path -Path $_.FullName -Leaf
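	# Second, per-file robocopy pass: /ns suppresses the size column and /ts adds the source timestamp, which is then split out below.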
	$Line = & robocopy.exe $Folder C:\Dummy_Must_Not_Exist $File /L /s /nc /njh /njs /ndl /ns /ts
	$_ | Select-Object -Property `
		*,
		@{Name="SizeMB"; Expression={"{0:N3}" -f ($_.Length / 1MB)}},
		@{Name="LastWriteTime"; Expression={[DateTime]($Line.Split("`t")[4].Trim())}}
}


aikimark commented:
Every time you find a file that is larger than the minimum in your list, you are performing the sort.

You could go through a second round of filtering after the Robocopy /min:### process.  
1. Read the results and look for the largest and smallest values.  Alternatively, you could do a frequency analysis of the file sizes (a more accurate approach).    
2. With the information/data from step 1, you can then filter the Robocopy output to make sorting a simple, non-system-stressing operation.

===================
For frequency analysis, use a 10x14 array of file counts, indexed by the leading digit of the mantissa and the exponent of the file size.

A simpler frequency analysis can be made on the lengths of the size strings, ignoring the mantissa data.

A simpler approach to the initial filtering would be to start with a large /min:### value and decrease it until you get more than 50 (?N?) files; see the sketch after the example below. Save that value for each volume for the next run.

Example:
In my test, I ran Robocopy against a directory tree with 31,000 files; the resulting file-list output was 4MB. Applying /min:1000000 reduced it to 30KB with 390 lines.
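A rough sketch of that decreasing-/min idea, reusing the robocopy switches from oBdA's script (assumptions: a predefined $Path, a starting value of 10GB, and a factor-of-10 decrease per round; note that every round rescans the tree, so saving the final value per volume matters):
$Min = [int64]10GB
do {
	$Files = & robocopy.exe $Path C:\Dummy_Must_Not_Exist *.* /L /s /nc /njh /njs /ndl /fp /bytes /min:$Min /r:0 | Where-Object { $_ }
	$Min = [int64]($Min / 10)
} while ($Files.Count -le 50 -and $Min -gt 0)
# $Files now holds more than 50 candidate lines (if the tree has that many); sort them by size and take the top 50.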
Qlemo"Batchelor", Developer and EE Topic AdvisorCommented:
Mark, there only two practical choices:
Either you ignore the amount of data, e.g. because it is of no significance anymore (filtered to something reasonable),
or you sort each time.

Everything else is sorting theory only. without any practical value for this case. It is making it complex, not simple, to apply statistical methods.
The "sort on change" approach will scale well, as the probability to have the top n values in the list is increasing with each file found, and then no sort is done.

A sort of 50 objects (with an average of maybe 250 bytes per entry) should be a very fast operation, without much CPU cost. The more objects we keep, the higher the cost, that is correct - with O(n^2), if I'm correct.
aikimark commented:
@Q

O(N * log2(N))

I did a back-of-the-napkin estimate, based on my earlier test. If I scale up my 8.5GB tree to 17TB, I might expect a concomitant 2000-fold increase in the number of files. So, my 30,900-file problem would scale up to a 61,800,000-file problem. Sure, sorting 50 items isn't too terribly expensive, but we don't know the distribution of the file sizes, so we can't rule out doing over 60M sorting operations on our little list.
compdigit44 (Author) commented:
I ran the script as posted earlier and got a message that states "Cannot call method on null value expression", yet it does show some files.

Also, I noticed the FullName path shows a whole bunch of dots at the end. If the script exported to a CSV, could we see the full file path? What does the Length column mean? Stupid question: how can I be sure that robocopy is not moving any files? Sorry, just paranoid...



aikimark commented:
"How can I be sure that robocopy is not moving any files?"
Because of the /L command line switch.
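Robocopy's built-in help (robocopy /?) describes the switch along these lines:
/L :: List only - don't copy, timestamp or delete any files.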
compdigit44 (Author) commented:
Thanks... Just wanted to make sure, since the Microsoft site uses a lower case l and not upper case... Do the other switches just control logging?

What about the error message I am getting, or the full file name not being listed? Would it display the full name if exported to CSV? What would need to be changed to export it to a CSV?
aikimark commented:
That shouldn't be an issue, since the long path names are coming from Robocopy and not from within PS.
oBdA commented:
Robocopy arguments are case insensitive (the /L is written in uppercase so it's obvious it's an "L" and not, say, an uppercase "i"); the other arguments just get rid of the additional output robocopy generates otherwise.
The column Length is the length of the file in bytes.
For testing, use it as I suggested above; start with
$Top = .\Whatever.ps1 -Path E:\Wherever


$Top now contains an array of the objects found, which you can inspect any way you want (Format-Table won't help you much because of the long file paths), for example:
$Top | Format-List


Or export it to csv:
$Top | Export-Csv -Path C:\Wherever\Top.csv -NoTypeInformation


To run the script and export to csv in one go, just pipe the output to Export-Csv instead of collecting it in a variable:
.\Whatever.ps1 -Path E:\Wherever | Export-Csv -Path C:\Wherever\Top.csv -NoTypeInformation


The error was probably an error message from robocopy. The following has better error handling:
[CmdletBinding()]
Param(
	[string]$Path = $(Get-Location -PSProvider Filesystem),
	[uint32]$Top = 50,
	[string]$MinimumSize = "1MB"
)
$List = 1..$Top | ForEach-Object {New-Object -TypeName PSObject -Property @{"Length" = 0}}
If (($Min = ([int64]1 * $MinimumSize.Replace(" ", ""))) -isnot [int64]) {
	"Unable to parse '$($MinimumSize)' to an integer." | Write-Error
	Exit 1
}
$FileCount = $ErrorCount = 0
## Not collecting LastWriteTime yet; it'll be in the same field as the Length, which would require an additional Split().
& robocopy.exe $Path C:\Dummy_Must_Not_Exist *.* /L /s /nc /njh /njs /ndl /fp /bytes /min:$Min /r:0 | ForEach-Object {
	If ($_) {
		$Line = $_
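		# Keep the raw line in a named variable so the Catch block below can report exactly what failed to parse.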
		Try {
			$Split = $Line.Split("`t")
			$Length = [int64]$Split[3].Trim()
			If ($Length -gt $List[0].Length) {
				$List[0] = $Split[4].Trim() | Select-Object -Property @{Name="FullName"; Expression={$_}}, @{Name="Length"; Expression={$Length}}
				$List = $List | Sort-Object -Property Length
			}
			$FileCount++
		} Catch {
			"Unable to parse the line: '$($Line)'" | Write-Warning
			$ErrorCount++
		}
	}
}
$List | Where-Object {$_.Length -gt 0} | ForEach-Object {
	$Folder = Split-Path -Path $_.FullName -Parent
	$File = Split-Path -Path $_.FullName -Leaf
	$Line = & robocopy.exe $Folder C:\Dummy_Must_Not_Exist $File /L /nc /njh /njs /ndl /ns /ts
	$_ | Select-Object -Property `
		*,
		@{Name="SizeMB"; Expression={[math]::Round(($_.Length / 1MB), 3)}},
		@{Name="LastWriteTime"; Expression={[DateTime]($Line.Split("`t")[4].Trim())}}
}
"Analyzed $($FileCount) files bigger than $($MinimumSize)." | Write-Host
If ($ErrorCount -gt 0) {
	"Encountered $($ErrorCount) errors." | Write-Warning
}


compdigit44 (Author) commented:
Stupid question...

I see you are saving the script to a .ps1 file for PowerShell and then using the -Path parameter. Isn't this redundant?

.\Whatever.ps1 -Path E:\Wherever | Export-Csv -Path C:\Wherever\Top.csv -NoTypeInformation

Also, where do I specify which volume to scan?
oBdA commented:
Sorry, I can't quite follow you.
The script accepts three arguments, all optional:
-Path: The path to the folder in which to start the report; default is the current folder (which is not necessarily the script's).
-Top: the number of files to collect in the report; default is 50.
-MinimumSize: the minimum file size in bytes to consider worth checking; as you can see from the default value, this accepts values like "1MB" or "100KB" or "10TB" as well.
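A hypothetical call using all three parameters (assuming the script has been saved as Get-TopFiles.ps1, the name suggested below):
.\Get-TopFiles.ps1 -Path E:\ -Top 100 -MinimumSize "10MB"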
compdigit44 (Author) commented:
Thanks, but how does the script know which drive to scan...
oBdA commented:
With the -Path argument, the same way Get-ChildItem expects the path as an argument.
Save the script somewhere, for example as C:\Temp\Get-TopFiles.ps1.
Then open a PS console (preferably as Administrator), and if you want to scan for example the folder "E:\BigData", enter
$Top = C:\Temp\Get-TopFiles.ps1 -Path E:\BigData


Do not forget the "$Top ="; it collects the objects returned so that you have more to work with than some console output that will be truncated.
For further processing of that variable and how to export it to csv, check my examples above http:#a41455680.
aikimark commented:
In a test I ran, the Robocopy output might contain error messages like this:
2016/02/09 10:18:50 ERROR 5 (0x00000005) Scanning Source Directory c:\users\aikimark\Templates\
Access is denied.



So, you will need to check that the file length column you parse is numeric before you add the item to your list.

Alternatively, you might check that the split line result has more than one item.
oBdA commented:
That's already been taken care of in the latest version. The error lines don't contain tabs, so access to the split array will fail and end up in the Catch.
compdigit44 (Author) commented:
First off, I wanted to thank everyone for their help. I have been testing the script and it is working great; you guys are grand masters at this...

Right now, my goal is to run a scheduled task to scan each of my volumes daily, overwriting the same CSV names. Then, after all files are done, run a final script to send all the CSVs as an email attachment. Is this hard to do?

Also, when I run the script, what does the Length column in the CSV represent?
Qlemo"Batchelor", Developer and EE Topic AdvisorCommented:
Length is the file size (it is the way PowerShell fills the file object). We would not have to keep it, as the script adds SizeMB, but it is no bad idea either to see the "raw" data.

The "send all CSV" part isn't difficult:
Send-MailMessage -Subject "Daily Report" -From me@domain.com -To you@domain.com -SmtpServer mail.domain.com -Attachments (Get-ChildItem C:\Temp\*.csv | Select-Object -ExpandProperty FullName)


or
Get-ChildItem C:\Temp\*.csv | Select-Object -ExpandProperty FullName | Send-MailMessage -Subject "Daily Report" -From me@domain.com -To you@domain.com -SmtpServer mail.domain.com

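For the scheduled-task part, a minimal per-volume wrapper might look like this (a sketch; the volume list, the script location C:\Temp\Get-TopFiles.ps1, and the output folder are assumptions to adjust):
$Volumes = 'D:\', 'E:\', 'F:\'
foreach ($Volume in $Volumes) {
	# One CSV per volume, overwritten on each run.
	$Csv = "C:\Temp\Top50_$($Volume.Substring(0, 1)).csv"
	& C:\Temp\Get-TopFiles.ps1 -Path $Volume | Export-Csv -Path $Csv -NoTypeInformation
}
Get-ChildItem C:\Temp\Top50_*.csv | Select-Object -ExpandProperty FullName |
	Send-MailMessage -Subject "Daily Report" -From me@domain.com -To you@domain.com -SmtpServer mail.domain.com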
