Solved

Powershell growing a file

Posted on 2012-04-04
11
921 Views
Last Modified: 2012-04-09
This is really far out there. But Powershell is doing something which I can't explain for sure. I mean, I think that when it is storing variables it is storing them with a type which is far larger in size than the original. So let's dig in.

I have a file. Even though it's a text file, it contains values which are comma separated inside the file.
Example line from file:
<True/False>,<Name>,<date>,<Instance Id>,<Value>,<Group ID>

Every line looks the same. But they are randomly entered, and not sorted. The goal here is to sort the lines by <Group ID>. To do so I pull each line one at a time, and store the unique Group ID's in an array, then come back and loop over that array, and inside loop again over the file again to pull and sort the lines in order of Group ID.

                    Foreach ($line in $xFile) {
                        $lineARR = $line.split(",")
                        $ArrOfOwners += $lineARR[5]
                    }
                    $ArrOfOwners = $ArrOfOwners | Sort-Object


                   foreach ($owner in $ArrOfOwners) {
                        foreach ($line in $xFile) {
                            if ($owner -eq $line[5]) {
                                $line >> $FixedPath
                            }
                        } 
                    }

Open in new window



PROBLEM IS THIS: It's not that this code doesn't work. Because it does. It's that the output file is FAR larger than the input file. Where is all this extra space coming from??? Because after this I have to import the file into out CRM software, and it will not seem to accept it, but it will accept the original, but ONLY the first <Group ID> of any group. After that if the <Group ID> repeats itself later, it will not accept the second input per Group ID unless it is right bellow the above one. This is why the file needs to be sorted by that, but the problem is that it will not accept this file which has not changed it's content (simply sorted it) but somehow nearly doubled in size.

 Pre-Powershell (bottom)  -- Post-Powershell (top)
0
Comment
Question by:Alex_MPM
11 Comments
 
LVL 12

Expert Comment

by:prashanthd
ID: 37806748
Rename file to .csv

The first line in the file is considered as header in csv file.

So if the header is missing, add the follwing line as header

Status,Name,date,InstanceId,Value,GroupID

run following command from powershell

import-csv c:\inputfile.csv | sort-object "GroupId" | export-csv c:\output.csv
0
 

Author Comment

by:Alex_MPM
ID: 37806939
This has been extremely helpful, but unfortunately it still requires a lot of user intervention. See this is a large process that does a lot with the data afterwards (it's thousands of lines of code), and what the user will end up doing is dropping a file into a folder, and this process will automatically notice the file is there, then does the processing. This is only one section of it that isn't working. So I'm wondering if you would know a way which I can get that first line/header inserted into the file before using the import-csv cmd, as well as exporting it to a text file (without the additional quotes <"> powershell has added onto each element) instead of export-csv. Otherwise this does seem to address the size issue.

with export-csv it's making these lines:
"<True/False>","<Name>","<date>","<Instance Id>","<Value>","<Group ID>"

But I need it without the quotes:
<True/False>,<Name>,<date>,<Instance Id>,<Value>,<Group ID>
0
 
LVL 12

Expert Comment

by:prashanthd
ID: 37807477
Try the following

only modify path to input file, no need to rename file or add header

$input_file=test.txt
"Status,Name,date,InstanceId,Value,GroupID" | Out-File temp.txt
Get-Content $input_file | Out-File temp.txt -Append
$sorted=Import-Csv temp.txt | Sort-Object "GroupID"
New-Item $input_file -type file -force
foreach($obj in $sorted){
"$($obj.status),$($obj.name),$($obj.date),$($obj.instanceid),$($obj.value),$($obj.groupid)" | Out-File $input_file -Append
}

Open in new window

0
 

Author Comment

by:Alex_MPM
ID: 37807949
Yeah, This method gets the job done, but again by somehow doubling the file size without adding ANY content. I have figured this out though, is that it has something to do with Get-Content. Using the Import-CSV it's able to keep the filesize to roughly the same size, but Get-Content seems to transform the data somehow, because we start out with a 17KB file, and end up with a 34KB file. Although we didn't change anything about the file, we only sorted the data. And I don't know what that extra data is that is packed in there, and I really don't feel like opening the hex editor to try to find out right now, but it may come to that. Because it's something in that process which is not allowing this file to then be imported into our Software. But if you manually re-arrange these entires you it will work. So it's definately something Powershell is doing behind the scenes to increase this data size. For example, perhaps when you run the 'Foreach Line in File' it may be forcing each line to be 64 bytes in size, and then if the data doesn't fill the 64 bytes, it pads it with zeros, thus truely changing the data that is in there?!? I mean that's the only thing I can think of which would make a file to grow so much when so little is done to it.? Thoughts?
0
 
LVL 12

Expert Comment

by:prashanthd
ID: 37809489
Can you check the encoding type on the original text file?
0
Better Security Awareness With Threat Intelligence

See how one of the leading financial services organizations uses Recorded Future as part of a holistic threat intelligence program to promote security awareness and proactively and efficiently identify threats.

 

Author Comment

by:Alex_MPM
ID: 37811622
No idea how to do that?
0
 
LVL 11

Expert Comment

by:marek1712
ID: 37815640
PowerShell probably encodes the file in UTF-16 (2B/character) and your original file is in the different encoding (like ASCII, which uses 1 byte to describe a character).
My way of checking the encoding:
- open the file in a hex editor (i.e. small HxD)
- open a website with encoding charts (like Wikipedia)
- check character hex codes and compare them with the charts

Description of the Out-File cmdlet. Contains -Encryption switch.
0
 
LVL 68

Accepted Solution

by:
Qlemo earned 500 total points
ID: 37821217
I'm pretty certain it is the Encoding, since the file size is just doubled. PowerShell defaults to Unicode (UTF-16), as the description of out-file tells us (and I know from own experience). Line 7 above would then sound:
"$($obj.status),$($obj.name),$($obj.date),$($obj.instanceid),$($obj.value),$($obj.groupid)" | Out-File $input_file -Append -Encoding "ASCII"

Open in new window

However, we can even skip the temp file generation here, and do anything in almost one go.
A slight improvement might be to use less file operation commands, and do everything in memory - if the files aren't that big.
$input_file="test.txt"
$txt = import-Csv $input_file -Header Status, Name, Date, InstanceID, Value, GroupID |
  sort-object GroupID |
  % { write-output "$($_.Status),$($_.Name),$($_.Date),$($_.InstanceID),$($_.Value),$($_.GroupID)" }
out-file -InputObject $txt -Encoding "ASCII" $input_file 

Open in new window

0
 
LVL 12

Expert Comment

by:prashanthd
ID: 37821296
Try the following..

$input_file=test.txt
"Status,Name,date,InstanceId,Value,GroupID" | Out-File temp.txt
Get-Content $input_file | Out-File temp.txt -Append
$sorted=Import-Csv temp.txt | Sort-Object "GroupID"
New-Item $input_file -type file -force
foreach($obj in $sorted){
"$($obj.status),$($obj.name),$($obj.date),$($obj.instanceid),$($obj.value),$($obj.groupid)" | Out-File $input_file -Append -Encoding "ASCII"
}
                                            

Open in new window

0
 

Author Closing Comment

by:Alex_MPM
ID: 37823335
That did it, and did it phenomenally. By using the '-Header' switch I don't need to add the headers to the file, and thus, don't have to remove them afterwards too. Changing it to ASCII is quite a useful feature, and will surely come in handy many times in my future. This is probably the best answer I have ever gotten on the exchange because it did precisely what I needed it to do without cluttering up with tones of code to do a 'work around'. I appreciate your efforts.
0
 
LVL 12

Expert Comment

by:prashanthd
ID: 37824095
@Qlemo Awesome as usual...
0

Featured Post

How your wiki can always stay up-to-date

Quip doubles as a “living” wiki and a project management tool that evolves with your organization. As you finish projects in Quip, the work remains, easily accessible to all team members, new and old.
- Increase transparency
- Onboard new hires faster
- Access from mobile/offline

Join & Write a Comment

Why would I want to create a function for tracking messages? I am glad you asked. As with most monotonous/routine tasks, human error tends to creep in after doing the same task over and over again. By creating a function, you load the function once…
Are you one of those front-line IT Service Desk staff fielding calls, replying to emails, all-the-while working to resolve end-user technological nightmares? I am! That's why I have put together this brief overview of tools and techniques I use in o…
Get a first impression of how PRTG looks and learn how it works.   This video is a short introduction to PRTG, as an initial overview or as a quick start for new PRTG users.
This demo shows you how to set up the containerized NetScaler CPX with NetScaler Management and Analytics System in a non-routable Mesos/Marathon environment for use with Micro-Services applications.

762 members asked questions and received personalized solutions in the past 7 days.

Join the community of 500,000 technology professionals and ask your questions.

Join & Ask a Question

Need Help in Real-Time?

Connect with top rated Experts

18 Experts available now in Live!

Get 1:1 Help Now