Chinese Websites Character Encoding Survey, Year 2012

By Xah Lee.

2022-10-23 update: Chinese Websites Encoding survey, Year 2022

This page surveys the character encodings used by top Asian websites.

Asian Sites

China

Global top sites serving Chinese (these are mainland China sites)

Site | Name, Description | Encoding | Rank
www.baidu.com | 百度 (search) | UTF-8 (GB2312 front page) | #5
www.qq.com | 腾讯 (portal) | GB2312 | #9
www.taobao.com | 淘宝 (shopping) | GBK | #14
www.sina.com.cn | 新浪新闻 (portal) | GB2312 | #16
www.163.com | 网易 (portal) | GB2312 | #28
www.weibo.com | 新浪微博 (microblog) | UTF-8 | #30
www.soso.com | 搜搜 (search) | GB2312 | #33
www.sohu.com | 搜狐网 (search) | GBK | #36
www.tudou.com | 土豆网 (video) | GBK | #54
www.youku.com | 优酷 (video) | UTF-8 | #55

All rankings are global rankings, from http://www.alexa.com/topsites/, as of 2012-08-08.

In 2005, Yahoo China (http://cn.yahoo.com/) was using GB2312 encoding. It later switched to UTF-8. Yahoo has since closed cn.yahoo.com, which now redirects to http://sg.search.yahoo.com/

Taiwan, Hong Kong

Most popular Chinese sites in Taiwan (traditional chars):

Site | Name, Description | Encoding | Rank
tw.yahoo.com | Yahoo! Taiwan 雅虎奇摩 | UTF-8 | #4
www.wretch.cc | 無名小站 (Photo Blog, from Yahoo!) | UTF-8 | #438
www.pixnet.net | 痞客邦 (photo blog) | UTF-8 | #720
www02.eyny.com | 伊莉心情車站 (entertainment, movies, forum) | UTF-8 | #938
gamer.com.tw | 巴哈姆特電玩資訊站 (gaming) | UTF-8 | #985
pchome.com.tw | (home computing) | UTF-8 | #1132
udn.com | 聯合新聞網 (news) | BIG5 | #979
www.yam.com | yam天空,蕃薯藤 (portal) | UTF-8 | #1080

Note: many top sites for Hong Kong are the same as those for mainland China, typically video sites.

Note: rankings are by domain. For example, “tw.yahoo.com” is globally ranked 4, but that ranking is for the whole domain “yahoo.com”, not just the subdomain “tw.yahoo.com”.

Japan Sites

Site | Name, Description | Encoding | Rank
fc2.com | (blog, web hosting, etc) | UTF-8 | #53
www.rakuten.co.jp | 楽天市場 (shopping) | EUC-JP, UTF-8 | #74
ameblo.jp | (blog) | UTF-8 | #84
www.livedoor.com | (portal) | UTF-8 | #106
goo.ne.jp | (search; portal) | UTF-8, EUC-JP | #146
www.nicovideo.jp | (video) | UTF-8 | #223

Many Japanese sites seem to use both UTF-8 and EUC-JP: the home page uses one, but many other pages use the other.
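One way to check which encoding a given page declares is to look at the HTTP Content-Type header and, failing that, the meta tag near the top of the HTML. The following is a minimal sketch of that check, not necessarily how this survey was done. The function name declared_charset and the two sample pages are hypothetical, made up for illustration.

```python
import re

def declared_charset(content_type_header, html_bytes):
    """Return the declared charset in lowercase, or None if none is declared.

    The HTTP Content-Type header is checked first; if it names no charset,
    the first 1024 bytes of the page are searched for a <meta> declaration.
    """
    # Case 1: HTTP header, for example "text/html; charset=EUC-JP"
    if content_type_header:
        m = re.search(r'charset\s*=\s*["\']?([\w.:-]+)',
                      content_type_header, re.I)
        if m:
            return m.group(1).lower()

    # Case 2: <meta charset="..."> or
    # <meta http-equiv="Content-Type" content="text/html; charset=...">
    head = html_bytes[:1024]
    m = re.search(rb'<meta[^>]*charset\s*=\s*["\']?([\w.:-]+)', head, re.I)
    if m:
        return m.group(1).decode("ascii").lower()

    return None

# Hypothetical sample pages: a home page and an inner page of the same site
# that declare different encodings.
home_page = (b'<html><head><meta http-equiv="Content-Type" '
             b'content="text/html; charset=EUC-JP"></head></html>')
inner_page = b'<html><head><meta charset="utf-8"></head></html>'

print(declared_charset(None, home_page))
print(declared_charset(None, inner_page))
```

A declared charset is only a claim made by the server or the page. The actual bytes may differ, so a thorough check would also try decoding the page with the declared encoding.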

South Korea Sites

SiteName, DescriptionEncodingRank
www.daum.net다음daum (portal)UTF-8#308
www.nate.comSKT의 유·무선 종합 포털 (portal)EUC-KR#1068

Note: many top sites for South Korea are the same as those for mainland China.

India Sites

The top 200 sites in India are all in English. The vast majority are just normal USA sites, for example, Yahoo, Twitter, MSN, etc. (Note: the official languages of India are Hindi and English, but English is practically the language for all business and science.)

Intro to Chinese Encoding

Summary

Almost all Taiwan sites use UTF-8. Very old ones still use BIG5.

Mainland China sites mostly still use GBK or GB2312, but a few newer ones use UTF-8.
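GB2312 and GBK appear as separate entries in the table, but they are closely related: GBK is a superset of GB2312, so text that fits in GB2312 produces the same bytes under both. The sketch below illustrates this with Python's built-in codecs. The sample strings and the helper can_encode are chosen for illustration only.

```python
text = "中文"  # sample string, "Chinese (language)"

gb2312_bytes = text.encode("gb2312")
gbk_bytes = text.encode("gbk")
utf8_bytes = text.encode("utf-8")

# Characters covered by GB2312 get identical bytes in GBK.
print(gb2312_bytes == gbk_bytes)

# GB2312 and GBK use 2 bytes per Chinese character; UTF-8 uses 3 for these.
print(len(gbk_bytes), len(utf8_bytes))

def can_encode(s, encoding):
    """Return True if every character of s exists in the given encoding."""
    try:
        s.encode(encoding)
        return True
    except UnicodeEncodeError:
        return False

# 镕 (U+9555) is a character that GBK includes but GB2312 lacks.
print(can_encode("镕", "gb2312"), can_encode("镕", "gbk"))
```

This is also why a page labeled GB2312 can usually be decoded safely as GBK, while the reverse may fail on characters outside the smaller set.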

Many top sites in Japan and Korea also use UTF-8, but some use EUC (Extended Unix Code) variants such as EUC-JP and EUC-KR.

This suggests that UTF-8 will probably dominate in the future.