Python & Perl: Converting a File's Encoding

Python

How to convert a file encoding in Python?

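# -*- coding: utf-8 -*-
# python
# Convert a file from GB18030 to UTF-8.

path = 'infile.html'
path2 = 'outfile.html'

# Read the raw bytes, then decode them using the source encoding.
# bytes.decode() works in both Python 2 and Python 3;
# the unicode() builtin exists only in Python 2.
with open(path, 'rb') as f:
    content = f.read().decode('gb18030')

# Encode the text as UTF-8 and write the bytes out.
with open(path2, 'wb') as f:
    f.write(content.encode('utf-8'))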

(thanks to Andrew Clover for help.)
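The script above reads the whole file into memory. For large files, a chunked variant may be preferable. The following is a minimal sketch for Python 3, not a definitive implementation; the function name convert_encoding and its parameters are made up for illustration. It opens both files in text mode and lets Python do the decoding and encoding. With the default strict error handling, bytes that are invalid in the source encoding raise UnicodeDecodeError.

```python
def convert_encoding(src_path, dst_path, from_enc, to_enc="utf-8",
                     chunk_size=64 * 1024):
    """Copy src_path to dst_path, re-encoding from from_enc to to_enc.

    Hypothetical helper for illustration; assumes Python 3.
    """
    # newline="" disables newline translation, so line endings
    # (\n or \r\n) pass through unchanged.
    with open(src_path, "r", encoding=from_enc, newline="") as src, \
         open(dst_path, "w", encoding=to_enc, newline="") as dst:
        while True:
            # read() in text mode returns decoded text,
            # at most chunk_size characters per call.
            chunk = src.read(chunk_size)
            if not chunk:
                break
            dst.write(chunk)
```

Example call, using the same file names as above: convert_encoding('infile.html', 'outfile.html', 'gb18030').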

Perl

How to convert a file encoding in Perl?

Use the command-line utility piconv (typically at /usr/bin/piconv). Type piconv for help. piconv is installed with Perl and is itself written in Perl, so you can also read its source code to see how the conversion is done.

◆ piconv
piconv [-f from_encoding] [-t to_encoding] [-s string] [files...]
piconv -l
piconv -r encoding_alias
  -l,--list
     lists all available encodings
  -r,--resolve encoding_alias
    resolve encoding to its (Encode) canonical name
  -f,--from from_encoding  
     when omitted, the current locale will be used
  -t,--to to_encoding    
     when omitted, the current locale will be used
  -s,--string string         
     "string" will be the input instead of STDIN or files
The following are mainly of interest to Encode hackers:
  -D,--debug          show debug information
  -C N | -c           check the validity of the input
  -S,--scheme scheme  use the scheme for conversion
Those are handy when you can only see ascii characters:
  -p,--perlqq
  --htmlcref
  --xmlcref

To convert character encodings inside a Perl program, use the Encode module. It is bundled with Perl v5.8 and later.

See also: Perl Unicode Tutorial 🐫

See also: 〔How can I convert an input file to UTF-8 encoding in Perl? By brian d foy. @ stackoverflow.com…〕

Other Tools

The GNU command-line tool “iconv” converts between character encodings. Example: iconv -f utf-16 -t utf-8 file1.txt > file2.txt. Use iconv -l for a list of supported encodings.
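Encoding names differ between tools, so it can help to check a name before converting. As a rough Python counterpart to looking up a name with iconv -l or piconv -r, the sketch below asks Python's own codec registry for the canonical name of an alias. The helper canonical_name is made up for illustration, and the names it returns are Python's, which will not always match those used by iconv or Perl's Encode.

```python
import codecs


def canonical_name(encoding):
    """Return Python's canonical codec name for an alias, or None if unknown.

    Hypothetical helper for illustration.
    """
    try:
        # codecs.lookup() raises LookupError for names Python does not know.
        return codecs.lookup(encoding).name
    except LookupError:
        return None


for alias in ("utf8", "UTF-16", "gb18030", "no-such-encoding"):
    print(alias, "->", canonical_name(alias))
```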

If you use Emacs, you can open the file, call set-buffer-file-coding-system with a value such as “utf-8” or “utf-16” (press Tab ↹ to see the available choices), then save the file. 〔➤ Emacs File/Character Encoding/Decoding FAQ〕 〔➤ Emacs & Unicode Tips〕
