This page is a survey of the encodings used by top Asian websites, plus a short intro to Chinese encodings.
In 2005, Yahoo China (cn.yahoo.com) used GB2312 encoding. Now it uses UTF-8.
Here's a check of the most popular Asian web sites.
Top global sites serving Chinese (these are mainland China sites):
| Site | Name, Description | Encoding | Rank |
|---|---|---|---|
| www.baidu.com | 百度 (search) | UTF-8 (GB2312 front page) | #5 |
| www.qq.com | 腾讯 (portal) | GB2312 | #9 |
| www.taobao.com | 淘宝 (shopping) | GBK | #14 |
| www.sina.com.cn | 新浪新闻 (portal) | GB2312 | #16 |
| www.163.com | 网易 (portal) | GB2312 | #28 |
| www.weibo.com | 新浪微博 (microblog) | UTF-8 | #30 |
| www.soso.com | 搜搜 (search) | GB2312 | #33 |
| www.sohu.com | 搜狐网 (search) | GBK | #36 |
| www.tudou.com | 土豆网 (video) | GBK | #54 |
| www.youku.com | 优酷 (video) | UTF-8 | #55 |
All rankings are global rankings, from www.alexa.com/topsites, as of .
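The page doesn't say how each site's encoding was determined. One common way is to read the charset the page declares, either in the HTTP `Content-Type` header or in an HTML `<meta>` tag. Below is a minimal sketch of that check, assuming the header value and the raw page bytes have already been fetched; the function name `declared_charset` and the regex are hypothetical illustrations, not a full implementation of the browser encoding-sniffing rules.

```python
import re
from typing import Optional

# Hypothetical helper regex: matches both <meta charset="..."> and the older
# <meta http-equiv="Content-Type" content="text/html; charset=..."> form.
META_CHARSET = re.compile(rb"<meta[^>]+charset=[\"']?([A-Za-z0-9_-]+)", re.IGNORECASE)


def declared_charset(content_type: Optional[str], html: bytes) -> Optional[str]:
    """Return the charset a page declares, lowercased, or None if it declares none.

    The HTTP Content-Type header takes precedence over the HTML meta tag.
    """
    if content_type:
        for part in content_type.split(";"):
            key, _, value = part.strip().partition("=")
            if key.lower() == "charset" and value:
                return value.strip("\"'").lower()
    # Only the first 1024 bytes are scanned, roughly mirroring the HTML prescan.
    match = META_CHARSET.search(html[:1024])
    if match:
        return match.group(1).decode("ascii").lower()
    return None


print(declared_charset("text/html; charset=GBK", b""))
print(declared_charset(None, b'<meta http-equiv="Content-Type" content="text/html; charset=gb2312">'))
```

A declared charset is only a claim: a page can declare one encoding and actually serve another, so a real survey would also want to check that the bytes decode cleanly.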
Most popular Chinese sites in Taiwan (traditional characters):
| Site | Name, Description | Encoding | Rank |
|---|---|---|---|
| tw.yahoo.com | Yahoo! Taiwan 雅虎奇摩 | UTF-8 | #4 |
| www.wretch.cc | 無名小站 (photo blog, from Yahoo!) | UTF-8 | #438 |
| www.pixnet.net | 痞客邦 (photo blog) | UTF-8 | #720 |
| www02.eyny.com | 伊莉心情車站 (entertainment, movies, forum) | UTF-8 | #938 |
| gamer.com.tw | 巴哈姆特電玩資訊站 (gaming) | UTF-8 | #985 |
| pchome.com.tw | (home computing) | UTF-8 | #1132 |
| udn.com | 聯合新聞網 (news) | BIG5 | #979 |
| www.yam.com | yam天空,蕃薯藤 (portal) | UTF-8 | #1080 |
Note: many top sites in Hong Kong are the same as in mainland China, typically video sites.
Note: rankings are by domain. For example, “tw.yahoo.com” is ranked #4 globally, but that ranking is for the whole domain “yahoo.com”, not just the subdomain “tw.yahoo.com”.
Most popular sites in Japan:
| Site | Name, Description | Encoding | Rank |
|---|---|---|---|
| fc2.com | (blog, web hosting, …) | UTF-8 | #53 |
| www.rakuten.co.jp | 楽天市場 (shopping) | EUC-JP, UTF-8 | #74 |
| ameblo.jp | (blog) | UTF-8 | #84 |
| www.livedoor.com | (portal) | UTF-8 | #106 |
| goo.ne.jp | (search; portal) | UTF-8, EUC-JP | #146 |
| www.nicovideo.jp | (video) | UTF-8 | #223 |
Many Japanese sites seem to use both UTF-8 and EUC-JP: the home page uses one, while many other pages use the other.
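Mixing UTF-8 and EUC-JP on one site means the same text has two different byte representations, and a client that guesses the wrong one gets garbled text. A minimal Python sketch using only the standard `utf-8` and `euc_jp` codecs; the helper name `encoded_sizes` is hypothetical.

```python
def encoded_sizes(text: str, encodings=("utf-8", "euc_jp")) -> dict[str, int]:
    """Map each encoding name to the number of bytes `text` takes in it."""
    return {enc: len(text.encode(enc)) for enc in encodings}


text = "楽天市場"  # Rakuten Ichiba, from the Japan table
print(encoded_sizes(text))

# Reading EUC-JP bytes as if they were UTF-8 does not round-trip:
# the result is garbled text (mojibake), not the original string.
garbled = text.encode("euc_jp").decode("utf-8", errors="replace")
print(garbled == text)
```

Each of these kanji takes 3 bytes in UTF-8 but 2 bytes in EUC-JP, so the UTF-8 form is 12 bytes versus 8. Note that the wrong-codec decode does not necessarily raise an error, which is why mismatched pages tend to show mojibake rather than fail loudly.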
Most popular sites in South Korea:
| Site | Name, Description | Encoding | Rank |
|---|---|---|---|
| www.daum.net | 다음daum (portal) | UTF-8 | #308 |
| www.nate.com | SKT의 유·무선 종합 포털 (SK Telecom's wired and wireless portal) | EUC-KR | #1068 |
Note: many top sites in South Korea are the same as in mainland China.
The top 200 sites in India are all in English. The vast majority are just regular US sites, e.g. Yahoo, Twitter, MSN. (Note: India's official languages are Hindi and English, but in practice English is the language of nearly all business and science.)
Almost all Taiwan sites use UTF-8; only very old ones still use BIG5.
Mainland China sites mostly still use GBK or GB2312, though a few newer ones use UTF-8.
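GBK is a superset of GB2312: GB2312 covers only simplified characters (plus symbols and kana), while GBK adds the remaining CJK ideographs, including traditional forms. A minimal sketch using Python's built-in `gb2312`, `gbk`, and `big5` codecs; the helper `fits` is a hypothetical name.

```python
def fits(text: str, encoding: str) -> bool:
    """Return True if every character of `text` can be encoded in `encoding`."""
    try:
        text.encode(encoding)
    except UnicodeEncodeError:
        return False
    return True


# 中文 is common to both scripts; 無 is the traditional form of simplified 无.
for sample in ("中文", "無"):
    print(sample, {enc: fits(sample, enc) for enc in ("gb2312", "gbk", "big5")})

# The legacy double-byte encodings use 2 bytes per Chinese character, UTF-8 uses 3.
print(len("中文".encode("gb2312")), len("中文".encode("utf-8")))
```

With these codecs, 無 should be rejected by `gb2312` but accepted by `gbk` and `big5`, which illustrates why a GB2312 page can't directly contain traditional text while a GBK page can.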
Many top Japanese and Korean sites also use UTF-8, but some use EUC (Extended Unix Code) variants such as EUC-JP and EUC-KR.
This suggests that UTF-8 will probably dominate in the future.
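One reason for expecting UTF-8 to win is that each legacy encoding covers only one region's characters, while UTF-8 covers all of Unicode. A minimal sketch that tries a mixed simplified-Chinese and Korean string against the encodings from the tables above; the helper `encodings_that_fit` and the sample string are hypothetical illustrations.

```python
CANDIDATES = ("gb2312", "big5", "euc_jp", "euc_kr", "utf-8")


def encodings_that_fit(text: str, candidates=CANDIDATES) -> list[str]:
    """Return the candidate encodings that can represent every character of `text`."""
    result = []
    for enc in candidates:
        try:
            text.encode(enc)
        except UnicodeEncodeError:
            continue  # this encoding lacks at least one character in `text`
        result.append(enc)
    return result


# "我们" (simplified Chinese) and "우리" (Korean) both mean "we".
print(encodings_that_fit("我们 / 우리"))
```

Here the Hangul should rule out `gb2312`, `big5`, and `euc_jp`, and the simplified 们 should rule out `euc_kr` and `big5`, leaving only `utf-8` able to hold the whole string.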
Here's a summary of the major Chinese encodings: