I have a Perl script that produces UTF-8 output. I tried using Set-Content to write a UTF-8 file, as suggested in "Powershell overruling Perl binmode?".

perl -S testbinmode.pl | Set-Content "binmode.txt" -Encoding Byte

produces the error

"Set-Content : Cannot proceed with byte encoding. When using byte encoding the content must be of type byte."

perl -S testbinmode.pl | Set-Content "binmode.txt" -Encoding UTF8

doesn't produce an error message, but it doesn't write a correct utf8 file either.

The output of the perl script is displayed correctly in the Powershell window. What is the correct way to write that output to a utf8-encoded file?

Thanks.

Update: I have seen many responses to this and similar problems, here at the link referenced above, and at https://stackoverflow.com/questions/40098771/changing-powershells-default-output-encoding-to-utf-8. None of them appear to work, leading me to believe that not one has actually been tested. A tested method for redirecting UTF8 text output from a CLI program to a file is desired. Thanks.

Here is the perl test script:

use strict;
use warnings;
use utf8;
binmode(STDOUT, ":utf8");
print("The Crüxshadows");
  • Have the Perl script output to a text file, then run Set-Content on that. While similar to what you're doing, it isn't exactly the same. – Ramhound Sep 04 '17 at 06:06
  • perl -S testbinmode.pl >binmode.txt...Set-Content "binmode.txt" -Encoding UTF8...produces cmdlet Set-Content at command pipeline position 1 Supply values for the following parameters: Value[0]:...how do I proceed?...Why does this box close and post when I attempt to separate my response into multiple lines? – FreonPSandoz Sep 04 '17 at 06:36
  • Update your question; your unformatted comment can't be read. I don't answer questions asked in a comment or consider any information contained within a comment when submitting an answer. – Ramhound Sep 04 '17 at 07:44
  • try this: $utf8 = New-Object System.Text.utf8encoding and then use it as your encoding: perl -S testbinmode.pl | Set-Content "binmode.txt" -Encoding $utf8 – SimonS Sep 04 '17 at 10:41
  • also: I guess the error message says that there is no content to write, or it doesn't know where to write it to – SimonS Sep 04 '17 at 10:50
  • Nope. Error message: "Set-Content : Cannot bind parameter 'Encoding'. Cannot convert the "System.Text.UTF8Encoding" value of type "System.Text.UTF8Encoding" to type "Microsoft.PowerShell.Commands.FileSystemCmdletProviderEncoding". If you want a formatted response, you need to tell me how. This site doesn't permit me to use the return key to break my response into multiple lines. Can someone please provide a tested method for redirecting UTF8 text output from a CLI program to a file under PowerShell? – FreonPSandoz Sep 05 '17 at 01:10

1 Answer

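Make sure PowerShell uses UTF-8 when communicating with external programs. (In PowerShell 6 and later the built-in cmdlets already default to UTF-8; in Windows PowerShell 5.1 they do not, so pass -Encoding explicitly there.) PowerShell decodes the output of an external program using [console]::OutputEncoding, so both [console]::InputEncoding and [console]::OutputEncoding need to be set to UTF-8.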

On my Windows 10 system, PowerShell uses Code Page 437 by default:

PS C:\Users\Me> [Console]::OutputEncoding

IsSingleByte      : True
EncodingName      : OEM United States
WebName           : ibm437
HeaderName        : ibm437
BodyName          : ibm437
Preamble          :
WindowsCodePage   :
IsBrowserDisplay  :
IsBrowserSave     :
IsMailNewsDisplay :
IsMailNewsSave    :
EncoderFallback   : System.Text.InternalEncoderBestFitFallback
DecoderFallback   : System.Text.InternalDecoderBestFitFallback
IsReadOnly        : False
CodePage          : 437
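
To see why that default matters, here is a minimal sketch of the mismatch, assuming code page 437 can be obtained through [System.Text.Encoding]::GetEncoding in your session. It builds the sample string, encodes it to UTF-8 (the bytes the Perl script emits), and decodes those bytes both ways. The variable names are only illustrative.

```powershell
# Build the sample string without a literal non-ASCII character, so the
# result does not depend on how this script file itself is encoded.
$text = "The Cr" + [char]0x00FC + "xshadows"

# The bytes the Perl script emits with binmode(STDOUT, ":utf8").
$utf8Bytes = [System.Text.Encoding]::UTF8.GetBytes($text)

# What you get when those bytes are decoded as code page 437.
$cp437   = [System.Text.Encoding]::GetEncoding(437)
$misread = $cp437.GetString($utf8Bytes)

# What you get when the same bytes are decoded as UTF-8.
$correct = [System.Text.Encoding]::UTF8.GetString($utf8Bytes)

"UTF-8 byte count : $($utf8Bytes.Length)"
"Decoded as CP437 : $misread"
"Decoded as UTF-8 : $correct"
```

Under code page 437 the two-byte sequence for ü comes back as two unrelated characters. That already-wrong text is what Set-Content then receives, so no -Encoding value can repair it afterwards.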

We fix this for the current PowerShell session with this command:

$OutputEncoding = [console]::InputEncoding = [console]::OutputEncoding = New-Object System.Text.UTF8Encoding

(See the above-linked github.com issue for ways to persist this change.)
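
If you would rather not change the encoding for the whole session, one option is to switch it only around the external call and restore it afterwards. The following is a sketch under assumptions, not a tested recipe for every host: Invoke-WithUtf8Console is a hypothetical helper name, and it changes only [Console]::OutputEncoding (the side used to decode what an external program writes), leaving $OutputEncoding and [Console]::InputEncoding alone. It also assumes a console is attached, since setting the property can fail otherwise.

```powershell
function Invoke-WithUtf8Console {
    param(
        [Parameter(Mandatory = $true)]
        [scriptblock] $ScriptBlock
    )

    # Remember the current setting so it can be restored afterwards.
    $previous = [Console]::OutputEncoding
    try {
        # The parameterless constructor yields UTF-8 without a BOM preamble.
        [Console]::OutputEncoding = New-Object System.Text.UTF8Encoding
        & $ScriptBlock
    }
    finally {
        # Restore the earlier encoding even if the script block throws.
        [Console]::OutputEncoding = $previous
    }
}

# Example call (requires perl and the test script from the question):
# Invoke-WithUtf8Console { perl .\testbinmode.pl } |
#     Set-Content "binmode.txt" -Encoding UTF8
```

The try/finally is the point of the design: the console returns to its previous code page whether or not the external program succeeds.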

PS C:\Users\Me> [Console]::OutputEncoding

Preamble          :
BodyName          : utf-8
EncodingName      : Unicode (UTF-8)
HeaderName        : utf-8
WebName           : utf-8
WindowsCodePage   : 1200
IsBrowserDisplay  : True
IsBrowserSave     : True
IsMailNewsDisplay : True
IsMailNewsSave    : True
IsSingleByte      : False
EncoderFallback   : System.Text.EncoderReplacementFallback
DecoderFallback   : System.Text.DecoderReplacementFallback
IsReadOnly        : False
CodePage          : 65001

Windows 7 and later, i.e. all supported Windows versions, have codepage 65001, as a synonym for UTF-8
-- https://en.wikipedia.org/wiki/UTF-8

Now your script works as expected.

perl .\testbinmode.pl | Set-Content "binmode.txt" -Encoding UTF8

Successfully tested on PowerShell 5.1 and 7.1.

If you prefer BOM-less:

perl .\testbinmode.pl | Set-Content "binmode.txt" -Encoding UTF8NoBOM

Successfully tested on PowerShell 7.1. (The UTF8NoBOM encoding was introduced in PowerShell 6.)
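
To check the result yourself rather than trusting an editor's guess, you can look at the raw bytes of the file. A minimal sketch follows; Get-Utf8FileReport is a hypothetical helper name, and the report shape is my own choice. It reports whether the file starts with the UTF-8 BOM (EF BB BF), whether the bytes decode as strict UTF-8, and the first 16 bytes in hex.

```powershell
function Get-Utf8FileReport {
    param(
        [Parameter(Mandatory = $true)]
        [string] $Path
    )

    # Resolve against PowerShell's current location; .NET file APIs would
    # otherwise use the process working directory, which can differ.
    $fullPath = (Resolve-Path -LiteralPath $Path).ProviderPath
    $bytes = [System.IO.File]::ReadAllBytes($fullPath)

    # A UTF-8 BOM is the three bytes EF BB BF at the start of the file.
    $hasBom = $bytes.Length -ge 3 -and
              $bytes[0] -eq 0xEF -and $bytes[1] -eq 0xBB -and $bytes[2] -eq 0xBF

    # Strict decoder: second argument makes it throw on invalid sequences.
    $strictUtf8 = New-Object System.Text.UTF8Encoding($false, $true)
    $isValid = $true
    try {
        [void] $strictUtf8.GetString($bytes)
    }
    catch {
        $isValid = $false
    }

    # Hex dump of the first 16 bytes for manual inspection.
    $firstBytes = ($bytes | Select-Object -First 16 |
        ForEach-Object { $_.ToString('X2') }) -join ' '

    [pscustomobject]@{
        Path        = $fullPath
        HasBom      = $hasBom
        IsValidUtf8 = $isValid
        FirstBytes  = $firstBytes
    }
}

# Example call: Get-Utf8FileReport "binmode.txt"
```

Note that IsValidUtf8 only catches malformed byte sequences. Text that was mis-decoded before being written is still valid UTF-8, just with the wrong characters, so for the test string check FirstBytes for C3 BC, the UTF-8 encoding of ü.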