Computer Languages Characters Frequency
All Languages Combined
The frequency of punctuation characters used in programming languages.
![char freq 2013 XW4vB](i/char_freq_2013_XW4vB.png)
- Total number of files processed: 13,841
- Total number of punctuation characters counted: 14,084,715
Percentage of languages:
- 19.8% C
- 18.5% Python
- 13.5% PHP
- 12.9% Ruby
- 9.8% Perl
- 9.4% C++
- 7.3% Java
- 5.2% BASH
- 3.2% JavaScript
- 0.3% CSS
- 0.1% Emacs Lisp
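The kind of count behind these charts can be sketched in a few lines of Python. This is only a minimal illustration, not the script used for the study: the function names `count_punctuation` and `percentages` are made up here, and "punctuation" is assumed to mean the ASCII set in `string.punctuation`.

```python
from collections import Counter
import string


def count_punctuation(text):
    """Count each ASCII punctuation character in text (assumed definition of 'punctuation')."""
    return Counter(ch for ch in text if ch in string.punctuation)


def percentages(counts):
    """Turn raw counts into (char, percent of total) pairs, most frequent first."""
    total = sum(counts.values())
    if total == 0:
        return []
    return [(ch, 100.0 * n / total) for ch, n in counts.most_common()]


# Hypothetical one-line sample standing in for a source file.
sample = "if (x > 0) { return f(x, y); }"
counts = count_punctuation(sample)
for ch, pct in percentages(counts):
    print(f"{ch} {pct:.1f}%")
```

To process a whole project, the same counter can be updated once per file and the percentages taken at the end.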
JavaScript
![char freq js 2018-08-27 d9cf0](i/char_freq_js_2018-08-27_d9cf0.png)
All of these are actually generated pre-ES2015 code.
![char freq react js 2018-08-27 96d1a](i/char_freq_react_js_2018-08-27_96d1a.png)
![char freq d3js 2018-08-27 f84db](i/char_freq_d3js_2018-08-27_f84db.png)
![char freq js FYDZ](i/char_freq_js_FYDZ.png)
Sample syntax JS: Raining Hearts
Java
![comp lang char frequency java](i/char_freq_java_DZc2t.png)
Sample syntax Complex Numbers in Java
Golang
![char freq golang 2020-06-29 7498t](i/char_freq_golang_2020-06-29_7498t.png)
Lines starting with the comment marker “// ” are removed.
Using other source code gives similar results.
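The handling of comment lines noted above, read here as dropping them before counting, might look like the following. It is a sketch under assumptions: the helper name `strip_comment_lines` is hypothetical, and it ignores leading indentation when checking for the `// ` marker, which the original note does not specify. Trailing comments on code lines are left in place.

```python
def strip_comment_lines(source, marker="// "):
    """Drop lines whose first non-blank text is the comment marker.

    Assumption: indentation before the marker is ignored.
    Comments that follow code on the same line are kept.
    """
    kept = [
        line
        for line in source.splitlines()
        if not line.lstrip().startswith(marker)
    ]
    return "\n".join(kept)


# Hypothetical Go snippet used only to exercise the helper.
go_src = (
    "package main\n"
    "// hello, world!\n"
    "func main() {\n"
    "\t// indented comment\n"
    "\tx := 1 // trailing\n"
    "}\n"
)
stripped = strip_comment_lines(go_src)
print(stripped)
```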
Sample syntax Golang: Script to Find Replace Multi-Pairs of Regex in a Directory
C++
![comp lang char frequency cpp](i/char_freq_cpp_jTjQ9.png)
C
![comp lang char frequency c](i/char_freq_c_9MQkS.png)
PHP
![comp lang char frequency php](i/char_freq_php_tq5DR.png)
Sample syntax PHP: Send Mail with Attachment
Python
![comp lang char frequency python](i/char_freq_python_RmctH.png)
Sample syntax Python: Find Replace Regex in Dir
Ruby
![comp lang char frequency ruby](i/char_freq_ruby_zjFw2.png)
Sample syntax Ruby Tutorial
Perl
![comp lang char frequency perl](i/char_freq_perl_8VwHx.png)
Sample syntax Perl: Find Replace String Pairs in Directory
Bash
![comp lang char frequency bash](i/char_freq_bash_VHCSw.png)
CSS
![comp lang char frequency css](i/char_freq_css_ZZTyx.png)
Sample syntax Atomic CSS
Wolfram language, Mathematica
![comp lang char frequency wl](i/char_freq_wl_bHy3f.png)
Sample syntax Geometric Inversion, 2D Grid, Polygon
Emacs Lisp
![comp lang char frequency elisp 2024-04-03 vXp](i/char_freq_elisp_2024-04-03_vXp.png)
Source is dired.el in Emacs 29.
Haskell
![comp lang char frequency haskell 2019-05-14 gs527](i/char_freq_haskell_2019-05-14_gs527.png)
About Source Input
After this study, I realized that the size of the input does not matter much. It is not necessary to gather thousands of source code files. 20 or 50 files from a generic project are sufficient.
For certain languages, different projects do favor certain characters, but again, the difference is not significant overall.
For example, for Python, picking 20 files from the standard library is good enough. There is no need to go out of the way to get source code from different projects.
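As a rough illustration of that last suggestion, the sketch below picks 20 top-level `.py` files from the standard library of the running Python and counts their punctuation. The function name `sample_stdlib_counts`, the fixed random seed, and the use of `string.punctuation` as the character set are all assumptions made for this example.

```python
import random
import string
import sysconfig
from collections import Counter
from pathlib import Path


def sample_stdlib_counts(n_files=20, seed=0):
    """Count punctuation in a random sample of top-level standard library modules."""
    stdlib_dir = Path(sysconfig.get_paths()["stdlib"])
    files = sorted(stdlib_dir.glob("*.py"))
    rng = random.Random(seed)  # fixed seed so the sample is repeatable
    chosen = rng.sample(files, min(n_files, len(files)))
    counts = Counter()
    for path in chosen:
        text = path.read_text(encoding="utf-8", errors="replace")
        counts.update(ch for ch in text if ch in string.punctuation)
    return counts, chosen


counts, chosen = sample_stdlib_counts()
total = sum(counts.values())
for ch, n in counts.most_common(10):
    print(f"{ch} {100.0 * n / total:.1f}%")
```

Running it with a few different seeds is a quick way to check whether the ranking of the top characters stays stable across samples.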