What Character Encoding Do Chinese Sites Use?
This page surveys the character encodings used by top Asian websites and gives a short introduction to Chinese encodings.
Global top sites serving Chinese (these are mainland China sites)
|www.baidu.com||百度 (search)||UTF-8 (GB2312 front page)||#5|
All rankings are global rankings, from www.alexa.com/topsites.
In 2005, Yahoo China 〔 http://cn.yahoo.com/ 〕 was using GB2312 encoding; it later switched to UTF-8. Yahoo has since closed cn.yahoo.com, which now redirects to http://sg.search.yahoo.com/
Taiwan, Hong Kong
Most popular Chinese sites in Taiwan (traditional chars):
|tw.yahoo.com||Yahoo! Taiwan 雅虎奇摩||UTF-8||#4|
|www.wretch.cc||無名小站 (Photo Blog, from Yahoo!)||UTF-8||#438|
|www.pixnet.net||痞客邦 (photo blog)||UTF-8||#720|
|www02.eyny.com||伊莉心情車站 (entertainment, movies, forum)||UTF-8||#938|
Note: many top sites for Hong Kong are the same as those for mainland China, typically video sites.
Note: rankings are by domain. For example, “tw.yahoo.com” is globally ranked 4, but that ranking is for the whole domain “yahoo.com”, not just the subdomain “tw.yahoo.com”.
|fc2.com||(blog, web hosting, …)||UTF-8||#53|
|www.rakuten.co.jp||楽天市場 (shopping)||EUC-JP, UTF-8||#74|
|goo.ne.jp||(search; portal)||UTF-8, EUC-JP||#146|
Many Japanese sites seem to use both UTF-8 and EUC-JP: the home page uses one, but many other pages use the other.
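Because a site can serve different encodings on different pages, the encoding has to be checked page by page. The following is a minimal sketch of how a page's declared encoding might be read, assuming you already have the HTTP Content-Type header value and the first bytes of the HTML. The function name `declared_charset` and the regular expressions are illustrative assumptions, not how the survey on this page was carried out, and the sketch only reads what the page declares (it does not verify that the bytes actually match).

```python
import re

# Hypothetical helper patterns: match "charset=..." in a header or a <meta> tag.
_HEADER_RE = re.compile(r'charset\s*=\s*["\']?([A-Za-z0-9_\-]+)', re.IGNORECASE)
_META_RE = re.compile(
    rb'<meta[^>]+charset\s*=\s*["\']?\s*([A-Za-z0-9_\-]+)', re.IGNORECASE
)


def declared_charset(content_type, body):
    """Return the lowercased charset a page declares, or None if it declares none.

    content_type: value of the HTTP Content-Type header (str or None).
    body: raw bytes of the page; only the first 4096 bytes are scanned.
    The HTTP header is checked first, then <meta> tags in the HTML.
    """
    if content_type:
        m = _HEADER_RE.search(content_type)
        if m:
            return m.group(1).lower()
    m = _META_RE.search(body[:4096])
    if m:
        return m.group(1).decode("ascii").lower()
    return None


if __name__ == "__main__":
    old_style = b'<meta http-equiv="Content-Type" content="text/html; charset=EUC-JP">'
    new_style = b'<html><head><meta charset="utf-8"></head>'
    print(declared_charset("text/html", old_style))
    print(declared_charset(None, new_style))
```

Running this check on a site's home page and on a few inner pages is one way to notice a mix such as UTF-8 on one and EUC-JP on the other.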
South Korea Sites
|www.nate.com||SKT의 유·무선 종합 포털 (portal)||EUC-KR||#1068|
Note: many top sites for South Korea are the same as those for mainland China.
Top 200 sites in India are all English. The vast majority are just normal USA sites, for example, Yahoo, Twitter, MSN, etc. (Note: the official languages of India are English and Hindi, but English is in practice the language of business and science.)
Intro to Chinese Encoding
Here's a summary of the major Chinese encodings:
- GB2312 → Dated . Simplified Characters only. Was used in mainland China.
- GBK → Dated . Extended GB2312. Includes traditional chars.
- GB18030 → Dated . Extended both GB2312 and GBK. Charset equivalent to Unicode. Contains both simplified and traditional characters. Used in mainland China.
- BIG5 → Dated . Traditional chars only. Invented and used in Taiwan, before Unicode became popular.
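The coverage differences in the list above can be checked with Python's built-in codecs. This is a small sketch, assuming the codec names `gb2312`, `gbk`, `gb18030`, `big5`, and `utf-8` from the Python standard library. The two sample characters (simplified 汉 and its traditional form 漢) and the helper names are my own choices for illustration.

```python
# Sample characters chosen for illustration: the same word in both scripts.
SAMPLES = {
    "simplified": "汉",
    "traditional": "漢",
}
CODECS = ["gb2312", "gbk", "gb18030", "big5", "utf-8"]


def can_encode(text, codec):
    """Return True if every character of text exists in the given codec."""
    try:
        text.encode(codec)
        return True
    except UnicodeEncodeError:
        return False


def coverage_table(samples=SAMPLES, codecs=CODECS):
    """Map each sample label to a {codec: encodable?} dictionary."""
    return {
        label: {codec: can_encode(ch, codec) for codec in codecs}
        for label, ch in samples.items()
    }


if __name__ == "__main__":
    for label, row in coverage_table().items():
        print(label, row)
```

With these samples, the simplified character is rejected by BIG5 and the traditional one by GB2312, while GBK, GB18030, and UTF-8 accept both. A Chinese character takes 2 bytes in GB2312, GBK, or BIG5 and 3 bytes in UTF-8.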
Taiwan sites almost all use UTF-8. Very old ones still use BIG5.
Mainland China sites mostly still use GBK or GB2312, but a few newer ones use UTF-8.
Many top Japanese and Korean sites also use UTF-8, but some use EUC (Extended Unix Code) variants.
This suggests that UTF-8 will probably dominate in the future.
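Text saved from a GB2312, GBK, or BIG5 site can be converted to UTF-8 by decoding with the legacy codec and encoding again. The sketch below assumes Python's standard codecs. The function name `to_utf8` and its `declared` parameter are hypothetical. Since GB18030 extends both GB2312 and GBK, the sketch decodes any GB-family input with `gb18030`, which is a design choice here rather than a requirement.

```python
GB_FAMILY = {"gb2312", "gbk", "gb18030"}


def to_utf8(raw, declared="gb2312"):
    """Decode bytes in a legacy encoding and return the same text as UTF-8 bytes.

    raw: bytes as served by the site.
    declared: the charset the page declares (for example "gb2312" or "big5").
    """
    name = declared.lower()
    # Pages labelled GB2312 sometimes contain GBK characters, so decode the
    # whole GB family with its superset, GB18030.
    codec = "gb18030" if name in GB_FAMILY else name
    return raw.decode(codec).encode("utf-8")


if __name__ == "__main__":
    mainland = "汉字".encode("gb2312")   # 4 bytes
    taiwan = "漢字".encode("big5")       # 4 bytes
    print(to_utf8(mainland).decode("utf-8"))
    print(to_utf8(taiwan, "big5").decode("utf-8"))
```

If the bytes do not match the declared encoding, `decode` raises `UnicodeDecodeError`, which is usually preferable to silently producing garbled text.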