Perl: Unicode Tutorial 🐪

By Xah Lee.


UTF-8 encoded Perl source code: use utf8

If your Perl script is encoded in UTF-8 (for example, it has non-ASCII characters in string literals or regexes), you need to declare it, like this:

use utf8;
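What the declaration changes: with use utf8, a non-ASCII literal such as '★' is read as one character, and without it the same literal is read as its raw UTF-8 bytes. A minimal sketch (the variable names are illustrative) that compares both inside one file, since the pragma is lexically scoped and no utf8 switches it off again for a block:

```perl
use v5.14;
use warnings;
use utf8;    # string literals in this file are decoded from UTF-8

my $chars = length '★';    # one character, U+2605
my $bytes = do {
    no utf8;               # lexically turn the pragma off for this block
    length '★';            # the same literal seen as raw UTF-8 bytes
};
say "with use utf8: $chars, without: $bytes";    # with use utf8: 1, without: 3
```

The file itself must be saved as UTF-8 for this to hold; ★ takes three bytes in UTF-8.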

For processing Unicode strings, you need at least Perl 5.14:

use v5.14;

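use v5.14;
use strict;
use warnings;
use utf8;    # this source file is UTF-8

# encode printed characters as UTF-8 (avoids "Wide character in print")
binmode STDOUT, ':encoding(UTF-8)';

# processing Unicode string
my $s = 'I ★ you';
$s =~ s/★/♥/;
print "$s\n";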
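One concrete reason the version declaration matters: use v5.12 and later turn on the unicode_strings feature, which applies Unicode rules to every string, including characters 128 to 255 stored as plain 8-bit strings. A minimal sketch (variable names are illustrative) comparing uc with the feature on and with it lexically switched off:

```perl
use v5.14;    # this feature bundle includes unicode_strings
use warnings;

my $e = "\xE9";      # é (U+00E9), stored as a native 8-bit string
my $with = uc $e;    # Unicode rules: becomes "\xC9" (É)

my $without = do {
    no feature 'unicode_strings';
    uc $e;           # legacy rules: bytes 0x80 to 0xFF are not case-mapped
};

printf "with: U+%04X, without: U+%04X\n", ord $with, ord $without;
# prints "with: U+00C9, without: U+00E9"
```

This assumes no use locale is in effect, since locale rules would change the legacy result.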

Jamie W Zawinski on how Perl Unicode sucks

use bytes;  # Larry can take Unicode and shove it up his ass sideways.
            # Perl 5.8.0 causes us to start getting incomprehensible
            # errors about UTF-8 all over the place without this.

from the source code of WebCollage (1998), by Jamie W Zawinski (born 1968)

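When running scripts that process Unicode, call perl with the -C option on the command line. On its own, -C is the same as -CSDL: the standard handles and the default layer for opened files become UTF-8, but only if the locale environment variables indicate a UTF-8 locale. The letters can be combined differently, for example -CSA for the standard handles plus @ARGV.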
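A script can check which -C (or PERL_UNICODE) flags it was started with through the special variable ${^UNICODE}, which holds them as a bitmask. A minimal sketch, assuming the bit values documented for -C in perlrun; the hash and variable names are illustrative:

```perl
#!/usr/bin/perl
# Report which -C / PERL_UNICODE flags are active for this run.
use v5.14;
use warnings;

my %bit = (
    I => 1,     # STDIN is UTF-8
    O => 2,     # STDOUT is UTF-8
    E => 4,     # STDERR is UTF-8
    i => 8,     # UTF-8 is the default PerlIO layer for input streams
    o => 16,    # UTF-8 is the default PerlIO layer for output streams
    A => 32,    # @ARGV elements are UTF-8 strings
    L => 64,    # make the flags above conditional on a UTF-8 locale
);

my $flags = ${^UNICODE};    # 0 when neither -C nor PERL_UNICODE is set
my @on = grep { $flags & $bit{$_} } sort { $bit{$a} <=> $bit{$b} } keys %bit;

say "\${^UNICODE} = $flags";
say 'active: ', (@on ? join(' ', @on) : 'none');
```

Saved under a name such as flags.pl (hypothetical) and run as perl -CSA flags.pl, it should report a value of 39 (7 for S plus 32 for A) and list I O E A.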

Perl Unicode tips from Tom Christiansen

Here are some Unicode tips, gathered from Tom Christiansen's answer at 〔Why does modern Perl avoid UTF-8 by default? http://stackoverflow.com/questions/6162484/why-does-modern-perl-avoid-utf-8-by-default/6163129#6163129〕.

• Set your PERL_UNICODE environment variable to AS. This makes all Perl scripts decode @ARGV as UTF‑8 strings, and sets the encoding of all three of stdin, stdout, and stderr to UTF‑8. Both these are global effects.

• Enable warnings, and make warnings in the utf8 category (such as malformed UTF‑8) fatal:

use warnings;
use warnings qw( FATAL utf8 );

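• Declare that any filehandle opened within this lexical scope assumes UTF‑8 unless you say otherwise; code in other modules is not affected. The :std part also applies this to STDIN, STDOUT, and STDERR.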

use open qw( :encoding(UTF-8) :std );

• If you have a DATA handle, you must explicitly set its encoding. If you want it to be UTF‑8, then say:

binmode(DATA, ":encoding(UTF-8)");
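The first three tips can be combined in one script. A minimal sketch, with two substitutions that are mine rather than Tom Christiansen's: it decodes @ARGV itself with the core Encode module instead of relying on PERL_UNICODE=AS, and it writes and re-reads a temporary file (via the core File::Temp module) to show use open at work. Variable names are illustrative.

```perl
#!/usr/bin/perl
use v5.14;
use warnings;
use warnings qw( FATAL utf8 );          # utf8-category warnings become fatal
use utf8;                               # the string literal below is UTF-8 text
use open qw( :encoding(UTF-8) :std );   # default layer for open(); :std covers STDIN/STDOUT/STDERR
use Encode qw( decode );
use File::Temp qw( tempfile );

# Stand-in for the A in PERL_UNICODE=AS: decode each argument from UTF-8.
@ARGV = map { decode('UTF-8', $_) } @ARGV;

# Create an empty temporary file, deleted when the script exits.
my ($tmp_fh, $path) = tempfile(UNLINK => 1);
close $tmp_fh;

# These opens name no layer, so "use open" supplies :encoding(UTF-8).
open(my $out, '>', $path) or die "cannot write $path: $!";
print {$out} "I ★ you\n";
close $out or die "cannot close $path: $!";

open(my $in, '<', $path) or die "cannot read $path: $!";
my $line = <$in>;
close $in;
chomp $line;

say "read back: $line (", length($line), ' characters)';
say "argument: $_ (", length($_), ' characters)' for @ARGV;
```

Run as perl tips.pl ★ (the file name is hypothetical) from a UTF-8 terminal, it should report the line read back as 7 characters and the argument as 1 character.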
