I'm processing XML files. Because of their size, I need to read them in chunks, which, from what I've determined, means using a StreamReader.
I soon noticed a couple of strange "transformations". For instance, a node like "<Person attribute=value>" would come through as "<Person attribute=value />". Note the added closing slash. Even stranger, actual closing slashes like "/>" were transformed to "/\>".
After quite a bit of struggling, I discovered that if I specify the encoding on the StreamReader as UTF8, it seems to work as desired.
Can anyone explain what might be going on here? I assumed these files were simple "ASCII" encoded. Why would they be transformed?
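$if = New-Object System.IO.StreamReader -ArgumentList ([string]$sourceFile, [System.Text.Encoding]::UTF8, [Boolean]$false, [int]$bufferSize)
# The buffer must be a char array; ReadBlock returns the number of characters read, not bytes
[char[]]$buffer = New-Object char[] $bufferSize
[int]$bytesRead = $if.ReadBlock($buffer, 0, $buffer.Length)
while ($bytesRead -gt 0) {
    [string]$chunk = New-Object string($buffer, 0, $bytesRead)
    # ... process $chunk here ...
    $bytesRead = $if.ReadBlock($buffer, 0, $buffer.Length)
}
$if.Close()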