If you are not familiar with Unicode, first see: UNICODE Basics: What's Character Set, Character Encoding, UTF-8, and All That?
use bytes; # Larry can take Unicode and shove it up his ass sideways.
# Perl 5.8.0 causes us to start getting incomprehensible
# errors about UTF-8 all over the place without this.
From the source code of WebCollage (1998),
by Jamie W Zawinski (b1968)
Starting at about Perl v5.12 (2010), Unicode support is very good.
When you run a script that processes Unicode, call it with the -C option on the command line.
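As a rough illustration of what -C does, a script can inspect the special variable ${^UNICODE}, which holds the numeric value of the -C setting. The sketch below decodes that number back into the letters documented in perlrun. The helper name unicode_letters and the %BIT table are made up for this example.

```perl
use strict;
use warnings;

# Bit values of the single -C letters, as listed in perlrun.
# (S is shorthand for I+O+E, and D is shorthand for i+o.)
my %BIT = (I => 1, O => 2, E => 4, i => 8, o => 16, A => 32, L => 64);

# Hypothetical helper: return the -C letters switched on in $bits,
# ordered by bit value.
sub unicode_letters {
    my ($bits) = @_;
    return grep { $bits & $BIT{$_} }
           sort { $BIT{$a} <=> $BIT{$b} } keys %BIT;
}

# ${^UNICODE} reflects -C on the command line (or PERL_UNICODE).
print "Active -C letters: ",
      (join('', unicode_letters(${^UNICODE})) || '(none)'), "\n";
```

Run it once as perl script.pl and once as perl -CSA script.pl to compare the two reports.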
If your Perl script is encoded in UTF-8, you should declare it, like this: use utf8;. You can then have Unicode in strings, and also in variable and function names. Example:
    # -*- coding: utf-8 -*-
    # perl

    use strict;
    use utf8; # necessary if you want to use Unicode in function or var names

    # processing Unicode string
    my $s = 'I ★ you';
    $s =~ s/★/♥/;
    print "$s\n";

    # variable with Unicode char
    my $ζ = 4;
    print "$ζ\n";

    # function with Unicode char
    sub fζ { return 2; }
    print fζ(), "\n";
Here are some Unicode tips, gathered from Tom Christiansen's answer to "Why does modern Perl avoid UTF-8 by default?" on stackoverflow.com.
• Declare that this source code file is encoded as UTF-8.

    # -*- coding: utf-8 -*-
    # perl
    use utf8;

• Demand a particular Perl version, 5.12 or later. Like this:

    # -*- coding: utf-8 -*-
    # perl
    use v5.12; # minimal for Unicode string feature
    use v5.14; # optimal for Unicode string feature
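To see why the version matters: use v5.12 (or later) switches on the unicode_strings feature, which makes string operations use Unicode rules even for strings that Perl stores internally as single bytes. The sketch below turns the feature on by name in one function only, so that both behaviours can be compared in one file. The function names are invented for this example.

```perl
use strict;
use warnings;

# Without the feature: a byte string with no UTF-8 flag gets
# ASCII-only case mapping, so U+00E9 is left unchanged.
sub uc_legacy {
    my ($s) = @_;
    return uc $s;
}

# With the feature (what "use v5.12" enables implicitly):
# Unicode rules apply, so U+00E9 becomes U+00C9.
sub uc_unicode {
    use feature 'unicode_strings';
    my ($s) = @_;
    return uc $s;
}

my $e_acute = "\xE9";    # LATIN SMALL LETTER E WITH ACUTE
printf "legacy: U+%04X, unicode_strings: U+%04X\n",
    ord(uc_legacy($e_acute)), ord(uc_unicode($e_acute));
```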
• Set your PERL_UNICODE environment variable to AS. This makes all Perl scripts decode @ARGV as UTF-8 strings, and sets the encoding of all three of stdin, stdout, and stderr to UTF-8. Both of these are global effects, not lexical ones.
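If setting the environment variable is not an option, a script can approximate the same two effects itself. The following is only a sketch under that assumption, not the recommendation from the answer: decode_args is a hypothetical helper, and the checks against ${^UNICODE} are there so that nothing gets decoded or layered twice when -C or PERL_UNICODE is already in effect.

```perl
use strict;
use warnings;
use Encode ();

# Hypothetical helper: decode a list of UTF-8 byte strings into
# character strings, dying on malformed input.
sub decode_args {
    my @raw = @_;    # copy, because decode() with a CHECK value may modify its input
    return map { Encode::decode('UTF-8', $_, Encode::FB_CROAK) } @raw;
}

# Bits in ${^UNICODE}, per perlrun: 1/2/4 are stdin/stdout/stderr ("S"),
# 32 is @ARGV ("A"). Skip whatever has already been switched on.
my $bits = ${^UNICODE};
@ARGV = decode_args(@ARGV)          unless $bits & 32;
binmode(STDIN,  ':encoding(UTF-8)') unless $bits & 1;
binmode(STDOUT, ':encoding(UTF-8)') unless $bits & 2;
binmode(STDERR, ':encoding(UTF-8)') unless $bits & 4;

print scalar(@ARGV), " argument(s)\n";
print "  $_ (", length($_), " characters)\n" for @ARGV;
```

Unlike PERL_UNICODE, this only affects the one script that contains it.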
• Enable warnings.

    # -*- coding: utf-8 -*-
    # perl
    use warnings;
    use warnings qw( FATAL utf8 );
• Declare that anything that opens a filehandle within this lexical scope, but not elsewhere, is to assume that the stream is encoded in UTF-8 unless you tell it otherwise. That way you do not affect other modules' or other programs' code.

    # -*- coding: utf-8 -*-
    # perl
    use open qw( :encoding(UTF-8) :std );
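A small round trip can make the effect of this pragma visible. The sketch below writes three Greek letters and a newline to a temporary file and reads them back, with no explicit layer on either open call. The round_trip function and the use of File::Temp are choices made for this example only.

```perl
use strict;
use warnings;
use open qw( :encoding(UTF-8) :std );
use File::Temp qw( tempfile );

# Hypothetical helper: write $text to $path, read it back, and return
# the text read plus the file size in bytes. Neither open() names a
# layer; the "use open" above supplies :encoding(UTF-8).
sub round_trip {
    my ($path, $text) = @_;
    open(my $out, '>', $path) or die "Cannot write $path: $!";
    print {$out} $text;
    close($out) or die "Cannot close $path: $!";

    open(my $in, '<', $path) or die "Cannot read $path: $!";
    my $back = do { local $/; <$in> };
    close($in);
    return ($back, -s $path);
}

# tempfile() opens its own handle; close it and reopen by name so that
# the opens under test happen in this file's lexical scope.
my ($tmp_fh, $path) = tempfile(UNLINK => 1);
close($tmp_fh);

my ($back, $bytes) = round_trip($path, "\x{3B1}\x{3B2}\x{3B3}\n");
print length($back), " characters read back, $bytes bytes on disk\n";
```

The character count and the byte count differ because each Greek letter takes two bytes in UTF-8.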
• If you have a DATA handle, you must explicitly set its encoding. If you want this to be UTF-8, then say: binmode(DATA, ":encoding(UTF-8)");
• Perl supports representing Unicode chars by name. Use the "charnames" module, like this:

    # -*- coding: utf-8 -*-
    # perl
    use utf8;
    use v5.12; # minimal for Unicode string feature
    use charnames qw( :full ); # allow Unicode char be represented by name, e.g. \N{CHARNAME}

    print "\N{GREEK SMALL LETTER ALPHA}"; # same as "α"
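The lookup also works in the other direction: charnames::viacode returns the official name for a code point. The sketch below, which is an addition and not part of the original tips, combines both directions. The guard on binmode is there so that output is not encoded twice when -C or PERL_UNICODE has already set up stdout.

```perl
use strict;
use warnings;
use charnames qw( :full );

# Encode output as UTF-8 unless -C / PERL_UNICODE already did (bit 2 = stdout).
binmode(STDOUT, ':encoding(UTF-8)') unless ${^UNICODE} & 2;

# Name to character, resolved at compile time.
my $alpha = "\N{GREEK SMALL LETTER ALPHA}";

# Code point to name, resolved at run time.
my $name = charnames::viacode(ord $alpha);

printf "U+%04X %s %s\n", ord($alpha), $alpha, $name;
```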