The issue is that GNU tr, which is what you have on Linux, has no real concept of multibyte characters and instead works one byte at a time.
The tr man page and online documentation speak of characters, but that's a bit of a simplification. The TODO file in the source code package mentions this item (picked from coreutils 8.30):
Adapt tools like wc, tr, fmt, etc. (most of the textutils) to be
multibyte aware. The problem is that I want to avoid duplicating
significant blocks of logic, yet I also want to incur only minimal
(preferably 'no') cost when operating in single-byte mode.
On a Linux system, even with a UTF-8 locale (en_US.UTF-8), GNU tr treats an ä as two "characters" and replaces each of them (the UTF-8 representation of ä is two bytes):
linux$ echo 'ä' | tr 'ä' 'x'
xx
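To see where the two "characters" come from, you can dump the bytes yourself. This is just a small sketch; the variable names are made up for illustration, and the ä is written as octal escapes so the result doesn't depend on how the script itself is encoded:

```shell
# ä (U+00E4) in UTF-8, spelled as octal escapes for printf.
a_uml='\303\244'

# Hex dump of the encoded character; spaces and newlines are stripped
# because od's column spacing differs between implementations.
a_uml_hex=$(printf "$a_uml" | od -An -tx1 | tr -d ' \n')
echo "$a_uml_hex"    # c3a4

# Number of bytes a byte-oriented tool sees for this one character.
a_uml_len=$(printf "$a_uml" | wc -c | tr -d ' ')
echo "$a_uml_len"    # 2
```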
In the same vein, mixing an ä and an ö produces funny results, since their UTF-8 representations share a common byte:
linux$ echo 'ö' | tr ä x
x�
Or the other way around, where the two bytes of ä end up as the replacements for a and b, so the x is never used:
linux$ echo ab | tr ab äx
ä
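Both results are easier to follow at the byte level. The sketch below assumes a tr that pads a short second set with its last character (GNU tr does this) and uses LC_ALL=C so the byte-wise behavior is reproducible anywhere; the helper name hex is only illustrative:

```shell
# Illustrative helper: dump stdin as one compact hex string.
hex() { od -An -tx1 | tr -d ' \n'; }

# ä is c3 a4 and ö is c3 b6, so they share the lead byte c3.
# SET1 is the two bytes of ä; SET2 is a single x, padded to "xx".
# Only the c3 of ö matches; b6 passes through as a lone invalid byte.
o_to_x=$(printf '\303\266' | LC_ALL=C tr '\303\244' 'x' | hex)
echo "$o_to_x"    # 78b6  (0x78 is "x")

# Here SET2 is three bytes (c3, a4, x) against a two-byte SET1,
# so a -> c3, b -> a4, and the trailing x is ignored.
ab_out=$(printf 'ab' | LC_ALL=C tr 'ab' '\303\244x' | hex)
echo "$ab_out"    # c3a4  (the bytes of ä)
```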
And in your case, GNU tr takes the \377 as a raw byte value.
The tr on the Mac is different. It understands multibyte characters and acts accordingly:
mac$ echo 'ä' | tr ä x
x
mac$ echo ab | tr ab äx
äx
The UTF-8 representation of the character with numerical value 0377 octal (U+00FF) is the two bytes c3 bf, so that's what you get on the Mac.
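If you want to check the c3 bf yourself, the two-byte UTF-8 form can be worked out with plain shell arithmetic. This is a sketch of the standard bit layout (110xxxxx 10xxxxxx), which only covers code points U+0080 through U+07FF; the variable names are made up:

```shell
# Code point to encode: octal 0377 is hex FF, i.e. U+00FF.
cp=$((0xFF))

# Lead byte: 110 prefix plus the top 5 bits of the code point.
b1=$(( 0xC0 | (cp >> 6) ))
# Continuation byte: 10 prefix plus the low 6 bits.
b2=$(( 0x80 | (cp & 0x3F) ))

utf8_hex=$(printf '%02x %02x' "$b1" "$b2")
echo "$utf8_hex"    # c3 bf
```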
The easy way to have tr work byte by byte is to run it in the C locale instead of a UTF-8 locale. That brings back the funny behavior shown earlier:
$ echo 'ä' | LC_ALL=C tr 'ä' 'x'
xx
And in your case, you can use:
... | LC_ALL=C tr "\000" "\377"
Or you could use something like Perl to generate those \xff bytes:
perl -e 'printf "\377" x 1000 for 1..100'
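As a quick sanity check of the LC_ALL=C variant, you can feed it a few zero bytes and look at the result in hex. This is only a sketch: /dev/zero and the four-byte length are arbitrary choices for the demonstration, not something from the question, and the same od pipeline should work for inspecting the start of the Perl output too:

```shell
# Take four NUL bytes, translate them byte-wise, and dump the result.
# od spacing varies between systems, so whitespace is stripped.
ff_hex=$(head -c 4 /dev/zero | LC_ALL=C tr '\000' '\377' | od -An -tx1 | tr -d ' \n')
echo "$ff_hex"    # ffffffff
```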
[...] C while macOS has it set to something like en_US.UTF-8" -- I'm not sure this is the whole story. In my Kubuntu or Debian env | grep -E 'LANG|LC' returns LANG=pl_PL.UTF-8 only, so it's Unicode. Still the OP's original command yields 0xff out of the box. Could it be because tr implementation itself differs between Linux and Mac? – Kamil Maciorowski Aug 16 '18 at 05:27
[...] tr, including the one in GNU coreutils, don't support multibyte encodings". Seems legit. In my Debian tr 'Ł' 'L' translates Ł to LL (Ł is a Polish letter, I use LANG=pl_PL.UTF-8), so it apparently treats its first argument as two characters. – Kamil Maciorowski Aug 16 '18 at 05:46
[...] tr. It would make negative sense for such conversion to happen when writing to file. – u1686_grawity Aug 16 '18 at 05:47
[...] LANG=en_US.UTF-8 (on a Linux system that has that locale generated), printf ' ' | tr ' ' '\377' | hexdump -C plainly shows ff. – ilkkachu Aug 16 '18 at 09:03
[...] LANG might not be enough. The relevant locale setting is LC_CTYPE, and the value it gets comes first from LC_ALL, then LC_CTYPE, then LANG, with the first one set taking effect (that's the same for all other locale settings). So, if LC_CTYPE is set, changing LANG doesn't do anything in this case. To reliably override it, you'd need to set LC_ALL. Also, it's enough to set it just for tr, i.e. ... | LC_ALL=C tr ' ' '\377' | ... – ilkkachu Aug 16 '18 at 09:08
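The lookup order described in the last comment (LC_ALL, then LC_CTYPE, then LANG) can be mimicked with nested parameter expansions. This is a rough sketch, not how any libc actually implements it: effective_ctype is a made-up helper name, and the final fallback to C is an assumption about the default when nothing is set.

```shell
# Hypothetical helper: print the locale name that would decide LC_CTYPE.
# ${VAR:-fallback} skips variables that are unset or empty, which gives
# the LC_ALL -> LC_CTYPE -> LANG order; "C" as the last resort is an
# assumption about the system default.
effective_ctype() {
    printf '%s\n' "${LC_ALL:-${LC_CTYPE:-${LANG:-C}}}"
}

# What the current environment resolves to.
effective_ctype

# With LC_ALL set, the other two variables no longer matter.
( LC_ALL=C; effective_ctype )    # C
```

This is why setting only LANG may appear to do nothing: if LC_CTYPE or LC_ALL is already set, the lookup never gets as far as LANG.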