Use PowerShell to modify text in cp866 codepage

I have a lot of batch scripts (inherited from the ex-admin). All of those scripts are in the cp866 (Cyrillic DOS) codepage.
Those scripts work fine and do what they are expected to do.

Recently we changed some infrastructure objects, so I need to modify all those scripts to replace the infrastructure object names.

For example, the source file (cp866) contains text that looks like this from PowerShell:
PS C:\tests> get-content .\source.txt
'к?им ?йс нвЁе ┐п?ЄЁе да -жг§бЄЁе Ўг<RЄ, ¤  ўлЇ?c з о.

But actually, the content is:
Съешь ещё этих мягких французских булок, да выпей чаю.

In this .\source.txt file I need to replace "булок" with "плюшек" and get as a result:
Съешь ещё этих мягких французских плюшек, да выпей чаю.

P.S. I understand that my example doesn't look like a batch script, but the main idea is to replace content within a Cyrillic-DOS-encoded file.
Asked by: Petr Poleshko (Systems administrator)
 
Petr Poleshko (Systems administrator, Author) commented:
I was stubborn this morning, so I tried again and again.

PS C:\tests> Get-Content .\source.txt | ConvertTo-Encoding cp866 windows-1251 | foreach-object {$_ -replace 'булок', 'плюшек'} | out-file -encoding oem .\result.txt
PS C:\tests> Get-Content .\result.txt | ConvertTo-Encoding cp866 windows-1251
Съешь ещё этих мягких французских плюшек, да выпей чаю.
PS C:\tests> Get-Content .\source.txt | ConvertTo-Encoding cp866 windows-1251
Съешь ещё этих мягких французских булок, да выпей чаю.

The source text is in cp866, and the converted and replaced text, written back with the OEM encoding (actually, I don't know exactly what the OEM encoding means), comes out as the right text :)
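
For comparison, here is a minimal sketch of a more direct route that skips the intermediate windows-1251 step: decode the file as cp866, replace, and encode back to cp866 by naming the code page explicitly, so the result does not depend on what "OEM" means on a given machine. The function name Update-Cp866File and its parameters are hypothetical, not something used in this thread. Note that String.Replace is a literal replacement, while the -replace operator used above treats its pattern as a regular expression.

```powershell
# Sketch only: Update-Cp866File and its parameter names are made up for illustration.
function Update-Cp866File {
    param(
        [string]$Source,       # full path of the cp866 file to read
        [string]$Destination,  # full path of the file to write
        [string]$Old,          # literal text to find
        [string]$New           # literal text to put in its place
    )
    $cp866 = [System.Text.Encoding]::GetEncoding(866)
    # Decode the bytes as cp866, so the string holds the real Cyrillic text
    $text = [System.IO.File]::ReadAllText($Source, $cp866)
    # Literal (non-regex) replacement on the decoded text
    $text = $text.Replace($Old, $New)
    # Encode back to cp866 on the way out
    [System.IO.File]::WriteAllText($Destination, $text, $cp866)
}
```

It could be called like this (full paths are safer, because .NET file methods resolve relative paths against the process working directory, which may differ from the current PowerShell location):
PS C:\tests> Update-Cp866File -Source C:\tests\source.txt -Destination C:\tests\result.txt -Old 'булок' -New 'плюшек'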
 
Qlemo (Batchelor, Developer and EE Topic Advisor) commented:
Does it display properly if you change the encoding when reading the file? Use
   get-content source.txt -Encoding X
with X one of: Unicode, UTF8, UTF7, UTF32, Ascii, Default, Oem.
 
Petr Poleshko (Systems administrator, Author) commented:
PS C:\tests> get-content source.txt -Encoding Unicode
???????????????????????????
PS C:\tests> get-content source.txt -Encoding utf8
???? ??? ??? ???? ?????? ???, ?? ??? ??.
PS C:\tests> get-content source.txt -Encoding utf7
?e?ei ?en ia?a ┐i???a aa -?a§a??a ?a<R?, ¤  ?e??c c i.
PS C:\tests> get-content source.txt -Encoding ascii
????? ??? ???? ?????? ??????????? ?????, ?? ????? ???.
PS C:\tests> get-content source.txt -Encoding String
'к?им ?йс нвЁе ┐п?ЄЁе да -жг§бЄЁе Ўг<RЄ, ¤  ўлЇ?c з о.
PS C:\tests> get-content source.txt -Encoding byte
145
234
... ...
238
46

The oem and default encodings are not supported here and cause an error:
+ get-content source.txt -Encoding <<<<  oem (default)
    + CategoryInfo          : InvalidArgument: (:) [Get-Content], ParameterBindingException
    + FullyQualifiedErrorId : CannotConvertArgumentNoMessage,Microsoft.PowerShell.Commands.GetContentCommand
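
The byte values in the dump above (145, 234 at the start, 238, 46 at the end) are consistent with the file really being cp866. A small check, assuming the sentence starts with "Съ" and ends with "ю." as in the question:

```powershell
# Encode the first and last two characters of the sample sentence as cp866
$cp866 = [System.Text.Encoding]::GetEncoding(866)
$cp866.GetBytes('Съ') -join ' '    # 145 234
$cp866.GetBytes('ю.') -join ' '    # 238 46
```

So the raw bytes are fine, and only the way they are decoded into text needs attention.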
 
Petr Poleshko (Systems administrator, Author) commented:
Here is what I've found on the Internet:

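function ConvertTo-Encoding ([string]$From, [string]$To){
    Begin{
        $encFrom = [System.Text.Encoding]::GetEncoding($From)
        $encTo = [System.Text.Encoding]::GetEncoding($To)
    }
    Process{
        # Assumes $_ was decoded with the $To code page when it was read;
        # encoding it with $To again gives back the raw bytes of the file
        $bytes = $encTo.GetBytes($_)
        # Treat those bytes as $From text and transcode them to $To
        $bytes = [System.Text.Encoding]::Convert($encFrom, $encTo, $bytes)
        # Decode the transcoded bytes into a readable string
        $encTo.GetString($bytes)
    }
}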

If I run this on my example:
PS C:\tests> Get-Content .\source.txt | ConvertTo-Encoding cp866 windows-1251
Съешь ещё этих мягких французских булок, да выпей чаю.
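
A short sketch of why this seems to work, assuming Get-Content decoded the file with windows-1251 (the system ANSI code page on this machine, which is my assumption): the garbled string still carries the original bytes, so encoding it with windows-1251 and decoding the result as cp866 brings the text back.

```powershell
$cp866   = [System.Text.Encoding]::GetEncoding(866)
$win1251 = [System.Text.Encoding]::GetEncoding(1251)

$original = 'булок'
# What is stored on disk: the cp866 bytes of the text
$bytesOnDisk = $cp866.GetBytes($original)
# What Get-Content produced: the same bytes decoded as windows-1251
$misread = $win1251.GetString($bytesOnDisk)
# What ConvertTo-Encoding does in effect: recover the bytes, decode them as cp866
$recovered = $cp866.GetString($win1251.GetBytes($misread))
$recovered -ceq $original    # True
```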

So now I can actually run the replacement I need on the output:
PS C:\tests> Get-Content .\source.txt | ConvertTo-Encoding cp866 windows-1251 | foreach-object {$_ -replace 'булок', 'плюшек'}
Съешь ещё этих мягких французских плюшек, да выпей чаю.

So the replacement goes well, and the last thing I need in this case is to write the output into a text file in cp866 encoding.
The problem is that if I convert cp866 -> windows-1251 -> cp866, I get
'ъ?шь ?щё этих мя?ких фра-цу§ских п<юш?к, да вып?c чаю.

As you can see, the original cp866 text and the text reconverted to cp866 are different.
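
One general point that may explain this kind of damage (my reading, not something confirmed in this thread): every extra pass through a single-byte code page can lose characters that the code page does not contain. The sketch below shows that the misread string does not survive a trip through cp866, while the correct Unicode string does.

```powershell
$cp866   = [System.Text.Encoding]::GetEncoding(866)
$win1251 = [System.Text.Encoding]::GetEncoding(1251)

# The cp866 bytes of the real text, misread as windows-1251
$misread = $win1251.GetString($cp866.GetBytes('Съешь'))

# Pushing the misread string through cp866 loses information:
# some of its characters do not exist in cp866 and get substituted
$secondPass = $cp866.GetString($cp866.GetBytes($misread))
$secondPass -ceq $misread    # False

# Encoding the correct string straight to cp866 is lossless for this text
$cp866.GetString($cp866.GetBytes('Съешь')) -ceq 'Съешь'    # True
```

This suggests that once the text has been recovered into a proper string, it only needs to be written out as cp866 once, not run through the conversion function a second time.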
 
Petr Poleshko (Systems administrator, Author) commented:
Extract from get-help out-file -full:

-Encoding <string>
    Specifies the type of character encoding used in the file. Valid values are "Unicode", "UTF7", "UTF8", "UTF32",
     "ASCII", "BigEndianUnicode", "Default", and "OEM". "Unicode" is the default.

    "Default" uses the encoding of the system's current ANSI code page.

    "OEM" uses the current original equipment manufacturer code page identifier for the operating system.

    Required?                    false
    Position?                    2
    Default value
    Accept pipeline input?       false
    Accept wildcard characters?  false

Can anyone explain how to find out what the "current original equipment manufacturer code page identifier for the operating system" is?

Maybe because:
PS P:\tests> chcp
Active code page: 866

and if I run
PS P:\tests> [System.Text.Encoding]::GetEncodings()

CodePage Name                                    DisplayName
-------- ----                                    -----------
866 cp866                                   Cyrillic (DOS)
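
One way to see these code pages from PowerShell itself, as a small sketch: the .NET culture information exposes both the OEM and the ANSI code page of the current culture, and the console reports the code page it is using for output. Treating these values as what Out-File means by "OEM" and "Default" is my assumption; the help text above does not spell it out.

```powershell
# Code pages associated with the current culture
$textInfo = [System.Globalization.CultureInfo]::CurrentCulture.TextInfo
$oemCodePage  = $textInfo.OEMCodePage    # e.g. 866 on a Russian system
$ansiCodePage = $textInfo.ANSICodePage   # e.g. 1251 on a Russian system

# Code page the console currently uses for output (what chcp reports)
$consoleCodePage = [Console]::OutputEncoding.CodePage

"OEM: $oemCodePage, ANSI: $ansiCodePage, console: $consoleCodePage"
```

The console code page can be changed per session with chcp, so it is not guaranteed to match the OEM code page of the system.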

 
Petr Poleshko (Systems administrator, Author) commented:
During the solution process, not being very familiar with encoding/decoding or with PowerShell itself, I used logic, search results from the Internet and something "else", and suddenly got what I needed by, actually, talking to myself in public :)
Question has a verified solution.
