Computer Languages Characters Frequency
All Languages Combined
The frequency of punctuation characters used in programming languages.
- Total number of files processed: 13,841
- Total number of punctuation characters counted: 14,084,715
Percentage of languages:
- 19.8% C
- 18.5% Python
- 13.5% PHP
- 12.9% Ruby
- 9.8% Perl
- 9.4% C++
- 7.3% Java
- 5.2% Bash
- 3.2% JavaScript
- 0.3% CSS
- 0.1% Emacs Lisp
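The counts summarized above could be produced along the following lines. This is only a minimal sketch, not the script used for this study: it assumes "punctuation" means the ASCII set in Python's `string.punctuation`, and the names `count_punctuation` and `frequency_table` are made up for illustration.

```python
import string
from collections import Counter

# Assumption: "punctuation" is the ASCII set !"#$%&'()*+,-./:;<=>?@[\]^_`{|}~
PUNCTUATION = set(string.punctuation)


def count_punctuation(text):
    """Return a Counter of the punctuation characters found in text."""
    return Counter(ch for ch in text if ch in PUNCTUATION)


def frequency_table(counts):
    """Return (char, count, percent) rows, most frequent first."""
    total = sum(counts.values())
    # When counts is empty, most_common() is empty too, so no division happens.
    return [(ch, n, 100.0 * n / total) for ch, n in counts.most_common()]
```

To combine many files, add the per-file counters together (for example with `Counter.update`) and build the table once at the end.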
JavaScript
All of these source files are actually generated, pre-ES2015 code.
Sample syntax JS DOM: Falling Snow Effect
Java
Sample syntax Java: Complex Numbers
Golang
Lines starting with the comment “// ” are removed.
Using other sources gives similar results.
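Dropping the “// ” comment lines before counting might look like the sketch below. The function name `strip_line_comments` is hypothetical, and treating indented comment lines as comment lines too is an assumption; the original filtering may have been stricter.

```python
def strip_line_comments(source, marker="// "):
    """Return source without the lines that begin with marker.

    Assumption: leading whitespace before the marker is ignored, so
    indented comment lines are dropped as well. Trailing comments on
    code lines are left untouched.
    """
    kept = [
        line
        for line in source.splitlines()
        if not line.lstrip().startswith(marker)
    ]
    return "\n".join(kept)
```

Only whole-line comments are dropped, so punctuation inside a trailing comment on a code line is still counted.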
Sample syntax Golang: Regex Find Replace Text in Directory 📜
C++
C
PHP
Sample syntax PHP: Send Mail with Attachment
Python
Sample syntax Python: Find Replace Regex in Dir
Ruby
Sample syntax Ruby Tutorial
Perl
Sample syntax Perl: Find Replace Text in Directory 📜
Bash
CSS
Sample syntax CSS: Atomic Style
Wolfram Language, Mathematica
Sample syntax Geometric Inversion, 2D Grid, Polygon
Emacs Lisp
The source is dired.el from Emacs 29.
Haskell
About Source Input
After this study, I realized that the size of the input does not matter much. It is not necessary to gather thousands of source code files; 20 to 50 files from a generic project are sufficient.
For certain languages, different projects do favor certain characters, but again, the overall difference is not significant.
For example, for Python, just picking 20 files from the standard library is good enough. There is no need to go out of the way to get source from different projects.
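One way to check this kind of claim on your own files is to draw a small random sample, compute each character's share of all punctuation, and compare it with the shares from a larger set. The sketch below is illustrative only. The helper names, the fixed seed, the ASCII punctuation set, and the use of total variation distance as the comparison are all assumptions, not part of the original study.

```python
import random
import string
from collections import Counter
from pathlib import Path

# Assumption: ASCII punctuation only.
PUNCTUATION = set(string.punctuation)


def sample_source_files(root, pattern="*.py", k=20, seed=0):
    """Pick up to k files matching pattern under root, reproducibly."""
    files = sorted(Path(root).rglob(pattern))
    rng = random.Random(seed)
    return rng.sample(files, min(k, len(files)))


def punctuation_shares(paths):
    """Return {char: fraction of all punctuation} over the given files."""
    counts = Counter()
    for path in paths:
        text = Path(path).read_text(encoding="utf-8", errors="replace")
        counts.update(ch for ch in text if ch in PUNCTUATION)
    total = sum(counts.values())
    return {ch: n / total for ch, n in counts.items()} if total else {}


def total_variation(p, q):
    """Half the L1 distance between two share tables (0 = identical, 1 = disjoint)."""
    chars = set(p) | set(q)
    return 0.5 * sum(abs(p.get(ch, 0.0) - q.get(ch, 0.0)) for ch in chars)
```

For Python, the standard library directory can be found with `sysconfig.get_paths()["stdlib"]` and passed as `root`. A small distance between the 20-file sample and the full set would be consistent with the observation above.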