Programming Style: Variable Naming: English Words Considered Harmful


Knuth's literate programming aims to turn code into prose. I want to turn code into symbolic logic. This page collects some thoughts on naming variables in computer programs.

In Emacs Lisp, I usually use camelCase for my local variables. Here's an example:

(defun read-lines (filePath)
  "Return a list of lines of a file at FILEPATH."
  (with-temp-buffer
    (insert-file-contents filePath)
    (split-string (buffer-string) "\n" t)))

Some Lisp coders question the use of camelCase because it is not conventional Lisp style. Here's why I use camelCase, along with some thoughts on naming variables.

Distinction from Language Keywords

It provides an easy way to distinguish variables from built-in symbols. This matters because emacs-lisp-mode's syntax coloring is incomplete. 〔➤ Emacs Lisp Mode Syntax Coloring Problem〕

In particular, all my local variables are in camelCase. I could use pot_hole_casing, but that's more typing and less visually distinct from Lisp's hyphen-word-style.

Variable Naming: UUID, Referential Transparency, Point-Free Function Syntax, Combinatory Logic, Hygienic Macro

For variables, in recent years I've developed a habit of avoiding names that are also standard English words. So I'd name “file” as {myFile, aFile}, and “files” might become {fileList, fPaths}. “string” would be {strA, myString, inputStr}. The ultimate solution for uniqueness is to append a random number to variable names, so “string” would become “str8277”. The problem is that such names are long and disrupt both reading and typing. Recently I've been toying with the idea of attaching a Unicode character to all variables. ⁖ all my variables would start with “ξ”, so “string” would become “ξstring”. 〔➤ Unicode Support in Ruby, Perl, Python, JavaScript, Java, Emacs Lisp, Mathematica〕 This avoids the readability problem of random strings. 〔➤ Sigil for Function Parameter Names〕
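As a minimal sketch of the “ξ” convention, here is the read-lines function from the top of this page with its parameter renamed to ξfilePath. Only the name changes. The function name demo-read-lines is hypothetical, chosen so it doesn't clobber the original, and the sketch assumes a modern Emacs, which accepts non-ASCII characters such as ξ in symbol names.

```elisp
;; Same body as `read-lines' shown earlier, but the local name carries
;; the "ξ" sigil, so it cannot be mistaken for a built-in symbol such
;; as `string' or `list', and a text search for "ξfilePath" finds only
;; this variable.
(defun demo-read-lines (ξfilePath)
  "Return a list of the non-empty lines of the file at path ξfilePath."
  (with-temp-buffer
    (insert-file-contents ξfilePath)
    ;; The third argument t tells `split-string' to drop empty strings.
    (split-string (buffer-string) "\n" t)))
```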

My reason for avoiding English words is practical: it makes source code transformation easy. (I do a lot of search and find & replace in my source code.)

Imagine that every variable name (or every symbol, every identifier) is unique in the source code. Then you could locate any variable anywhere in the whole project with a plain text search. That makes debugging, code tracing, and code management easier, and it makes refactoring easier too.
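To make the find & replace argument concrete, here is a small sketch that counts literal occurrences of a name in a snippet of code. The helper demo-count-literal is hypothetical. A search for “file” also hits the built-in name insert-file-contents, so a blind replace would corrupt it. The sigiled name “ξfile” matches only the variable.

```elisp
(defun demo-count-literal (needle haystack)
  "Return how many times the literal string NEEDLE occurs in HAYSTACK."
  (let ((case-fold-search nil)          ; exact, case-sensitive matching
        (count 0)
        (start 0))
    ;; `regexp-quote' makes NEEDLE match literally; `match-end' moves
    ;; the search past each hit.
    (while (string-match (regexp-quote needle) haystack start)
      (setq count (1+ count)
            start (match-end 0)))
    count))

(demo-count-literal "file" "(defun f (file) (insert-file-contents file))")
;; → 3: two uses of the variable plus one inside `insert-file-contents'
(demo-count-literal "ξfile" "(defun f (ξfile) (insert-file-contents ξfile))")
;; → 2: only the variable
```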

The idea is similar to referential transparency in computer science. (Referential transparency can be thought of as a form of find & replace: a function call can be replaced by its value without changing what the program does.)
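A minimal sketch of that find & replace view, using hypothetical demo- functions: a call to a pure function can be swapped for its value, while a call to a function with hidden state cannot.

```elisp
;; A pure function: its result depends only on its argument.
(defun demo-square (x)
  "Return X times X."
  (* x x))

;; The call (demo-square 3) can be "found and replaced" by its value 9
;; without changing the result.
(= (+ (demo-square 3) 1)
   (+ 9 1))                             ; → t

;; An impure function breaks this: each call changes hidden state, so
;; no single value can stand in for (demo-next-id) everywhere.
(defvar demo-counter 0 "Hidden state for `demo-next-id'.")
(defun demo-next-id ()
  "Increment `demo-counter' and return its new value."
  (setq demo-counter (1+ demo-counter)))

(= (demo-next-id) (demo-next-id))       ; → nil
```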

The desire for unique identifiers in source code comes in many guises. At the extreme is a desire to eliminate variables completely. For example, if every variable in the source code could somehow be unique, then much of the motivation for lexical scope over dynamic scope would disappear, because no two bindings could clash. Some namespace problems would also be solved. (In particular, elisp does not support namespaces.) Combinatory logic is an effort to get rid of variables in lambda calculus. “Point-free programming” is a syntax for defining functions without writing out their formal parameters. 〔➤ What's Point-free Programing? (point-free function syntax)〕 Unique variable names are also the impetus for hygienic macros.
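Emacs Lisp has no dedicated point-free syntax, but the built-in apply-partially gives a taste of it: it builds a new function from an existing one without ever naming a formal parameter. A minimal sketch, with hypothetical demo- names:

```elisp
;; Pointful: the formal parameter x must be named.
(defun demo-add1 (x)
  "Return X plus 1."
  (+ 1 x))

;; Point-free: the same function, defined without naming any parameter.
;; `apply-partially' returns a function that calls `+' with 1 placed
;; before whatever arguments it later receives.
(defalias 'demo-add1-pf (apply-partially #'+ 1))

(mapcar #'demo-add1 '(1 2 3))           ; → (2 3 4)
(mapcar #'demo-add1-pf '(1 2 3))        ; → (2 3 4)
```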

〔➤ What's Windows CLSID? Second Life UUID?〕
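Emacs Lisp macros are not hygienic, and the standard workaround is exactly the unique-name idea: generate a fresh, uninterned symbol with make-symbol, which can never be the same symbol as one the caller writes. A minimal sketch with a hypothetical macro demo-swap:

```elisp
(defmacro demo-swap (a b)
  "Swap the values of the variables A and B.
The temporary variable is an uninterned symbol from `make-symbol',
so it cannot collide with any variable the caller names, even one
literally called `tmp'."
  (let ((tmp (make-symbol "tmp")))
    `(let ((,tmp ,a))
       (setq ,a ,b
             ,b ,tmp))))

(let ((x 1) (tmp 2))
  (demo-swap x tmp)
  (list x tmp))                         ; → (2 1)
```

Had the macro used a plain symbol tmp instead, (demo-swap x tmp) would expand to (let ((tmp x)) (setq x tmp tmp tmp)) and silently leave both of the caller's variables unchanged.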

Variable Name: English Prose vs Symbolic Logic

Another thing that pushed me toward this naming experiment is the opposite approach: instead of naming your variables with meaningful English words, name them completely abstractly, as in math's x, y, z, α, β, γ.

So I'd name “counter” or “num” just “i” or “n”. (Since these are one-letter strings and too common, following the unique naming idea above I usually name them “ii” or “nn”, or perhaps “ξi”.)

The idea with abstract naming is that it forces you to understand the code as a math expression that specifies an algorithm, rather than as English prose. Readability of source code then comes from coding in a pure functional programming style (⁖ functions with explicit input and output) and from good documentation of each function. To understand a function, you just read its documentation about input and output; the code inside the function is understood through simple functional programming constructs.
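A sketch of this style, using a hypothetical function demo-dot: the parameter names carry no English meaning, and the docstring alone states the input and output.

```elisp
(require 'cl-lib)                       ; for `cl-mapcar'

(defun demo-dot (ξa ξb)
  "Return the dot product of ξa and ξb.
Input: two lists of numbers of equal length.
Output: a number, the sum of the pairwise products."
  ;; `cl-mapcar' walks both lists in parallel, unlike plain `mapcar'.
  (apply #'+ (cl-mapcar #'* ξa ξb)))

(demo-dot '(1 2 3) '(4 5 6))            ; → 32
```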

To view this idea another way: when you read math, you almost never see mathematicians name their variables with multi-letter descriptive words. They usually use a single symbol (a, b, c, x, y, z, α, β, γ, …), yet there's no problem understanding the expression. Your focus and understanding are on the abstract process and structure.

English Prose Style

To illustrate from the opposite view: the problem with English naming is that it often interferes with what the code is actually doing. For example, under common conventions you'll see names like {thisObject, thatTree, fileList, files}. Your focus is on the meaning of these words, not on what the data types actually are or on the function's actual mathematical behavior. The words can be deceptive. ⁖ “file” can be a file handle, a file path, or file content. This is especially a problem when you are reading source code in a language you do not know. ⁖ when you encounter the word “object”, you don't know whether it's a keyword of the language, a keyword in its pattern matching syntax, a keyword for a datatype, or just an arbitrary user-defined name. In typical source code, half of the words are like that, unless the editor's syntax coloring distinguishes the language's keywords.

For example, here's some elisp code whose names follow elisp convention:

(defun do-something-region (begin end)
  "Prints region beginning and ending positions."
  (interactive "r")
  (message "Region begins: %d, end at: %d" begin end))

Are you familiar with elisp? If not, you wouldn't know what those “begin” and “end” are. Maybe they are built-in keywords with special significance to the construct, and if you changed them, the code wouldn't work.

But if the code is like this:

(defun do-something-region (φ1 φ2)
  "Prints region beginning and ending positions."
  (interactive "r")
  (message "Region begins: %d, end at: %d" φ1 φ2))

Then you know that φ1 and φ2 are probably just arbitrary names.

An Example in Emacs Lisp

Here's an example from the Emacs Lisp source code showing the usefulness of math symbols in {function, variable} naming. (GNU Emacs 24.3.1 〔http://git.savannah.gnu.org/cgit/emacs.git/tree/lisp/color.el〕)

(defconst color-cie-ε (/ 216 24389.0))

(defun color-cie-de2000 (color1 color2 &optional kL kC kH)
  "Return the CIEDE2000 color distance between COLOR1 and COLOR2.
Both COLOR1 and COLOR2 should be in CIE L*a*b* format, as
returned by `color-srgb-to-lab' or `color-xyz-to-lab'."
  (pcase-let*
      ((`(,L₁ ,a₁ ,b₁) color1)
       (`(,L₂ ,a₂ ,b₂) color2)
       (kL (or kL 1))
       (kC (or kC 1))
       (kH (or kH 1))
       (C₁ (sqrt (+ (expt a₁ 2.0) (expt b₁ 2.0))))
       (C₂ (sqrt (+ (expt a₂ 2.0) (expt b₂ 2.0))))
       (C̄ (/ (+ C₁ C₂) 2.0))
       (G (* 0.5 (- 1 (sqrt (/ (expt C̄ 7.0)
                               (+ (expt C̄ 7.0) (expt 25 7.0)))))))
       (a′₁ (* (+ 1 G) a₁))
       (a′₂ (* (+ 1 G) a₂))
       (C′₁ (sqrt (+ (expt a′₁ 2.0) (expt b₁ 2.0))))
       (C′₂ (sqrt (+ (expt a′₂ 2.0) (expt b₂ 2.0))))
       (h′₁ (if (and (= b₁ 0) (= a′₁ 0))
                0
              (let ((v (atan b₁ a′₁)))
                (if (< v 0)
                    (+ v (* 2 float-pi))
                  v))))
       (h′₂ (if (and (= b₂ 0) (= a′₂ 0))
                0
              (let ((v (atan b₂ a′₂)))
                (if (< v 0)
                    (+ v (* 2 float-pi))
                  v))))
       (ΔL′ (- L₂ L₁))
       (ΔC′ (- C′₂ C′₁))
       (Δh′ (cond ((= (* C′₁ C′₂) 0)
                   0)
                  ((<= (abs (- h′₂ h′₁)) float-pi)
                   (- h′₂ h′₁))
                  ((> (- h′₂ h′₁) float-pi)
                   (- (- h′₂ h′₁) (* 2 float-pi)))
                  ((< (- h′₂ h′₁) (- float-pi))
                   (+ (- h′₂ h′₁) (* 2 float-pi)))))
       (ΔH′ (* 2 (sqrt (* C′₁ C′₂)) (sin (/ Δh′ 2.0))))
       (L̄′ (/ (+ L₁ L₂) 2.0))
       (C̄′ (/ (+ C′₁ C′₂) 2.0))
       (h̄′ (cond ((= (* C′₁ C′₂) 0)
                  (+ h′₁ h′₂))
                 ((<= (abs (- h′₁ h′₂)) float-pi)
                  (/ (+ h′₁ h′₂) 2.0))
                 ((< (+ h′₁ h′₂) (* 2 float-pi))
                  (/ (+ h′₁ h′₂ (* 2 float-pi)) 2.0))
                 ((>= (+ h′₁ h′₂) (* 2 float-pi))
                  (/ (+ h′₁ h′₂ (* -2 float-pi)) 2.0))))
       (T (+ 1
             (- (* 0.17 (cos (- h̄′ (degrees-to-radians 30)))))
             (* 0.24 (cos (* h̄′ 2)))
             (* 0.32 (cos (+ (* h̄′ 3) (degrees-to-radians 6))))
             (- (* 0.20 (cos (- (* h̄′ 4) (degrees-to-radians 63)))))))
       (Δθ (* (degrees-to-radians 30)
              (exp (- (expt (/ (- h̄′ (degrees-to-radians 275))
                               (degrees-to-radians 25)) 2.0)))))
       (Rc (* 2 (sqrt (/ (expt C̄′ 7.0) (+ (expt C̄′ 7.0) (expt 25.0 7.0))))))
       (Sl (+ 1 (/ (* 0.015 (expt (- L̄′ 50) 2.0))
                   (sqrt (+ 20 (expt (- L̄′ 50) 2.0))))))
       (Sc (+ 1 (* C̄′ 0.045)))
       (Sh (+ 1 (* 0.015 C̄′ T)))
       (Rt (- (* (sin (* Δθ 2)) Rc))))
    (sqrt (+ (expt (/ ΔL′ (* Sl kL)) 2.0)
             (expt (/ ΔC′ (* Sc kC)) 2.0)
             (expt (/ ΔH′ (* Sh kH)) 2.0)
             (* Rt (/ ΔC′ (* Sc kC)) (/ ΔH′ (* Sh kH)))))))
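To try the function, here is a minimal usage sketch. It assumes an Emacs whose color.el provides color-cie-de2000, as GNU Emacs 24.3 does. It converts two sRGB colors, each component a float in [0, 1], to CIE L*a*b* with color-srgb-to-lab, then measures their distance.

```elisp
(require 'color)

;; `color-srgb-to-lab' returns a list (L a b), the format
;; `color-cie-de2000' expects.
(let ((red  (color-srgb-to-lab 1.0 0.0 0.0))
      (blue (color-srgb-to-lab 0.0 0.0 1.0)))
  (list (color-cie-de2000 red red)      ; → 0.0 for identical colors
        (color-cie-de2000 red blue)))   ; a large positive distance
```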

Two Types of Character Sequence in Source Code

Source code is a sequence of characters. When reading source code, you see symbols (operators) and identifiers (function names, variable names, keywords). Identifiers can be divided into 2 types: ① those that cannot be changed without affecting the meaning of the program, and ② those that are arbitrary and can be changed.

The ones in the first category are language keywords. ⁖ {for, while, class, function, extends, Class, this, self, public, static, System, from, begin, end, map, require, import, let, list, defun, lambda, Take, Pattern, Table, Blank, …} These are the critical words in source code, and they are almost always English words. Being able to tell at a glance which words are language keywords greatly helps understanding, especially when you do not yet know the language well. This applies particularly to non-mainstream languages, ⁖ OCaml, PowerShell, Haskell, Erlang, Mathematica, LSL, etc.
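In Emacs Lisp you can ask the running Emacs whether a word already means something to it. A minimal sketch, assuming a recent Emacs (25 or later) where special-form-p and macrop are available; demo-classify-symbol is a hypothetical helper:

```elisp
(defun demo-classify-symbol (sym)
  "Return a keyword saying what SYM currently means to Emacs Lisp.
:arbitrary means SYM has no global definition in this Emacs,
so it is free to use as a local name."
  (cond ((special-form-p sym) :special-form) ; e.g. `let', `if'
        ((macrop sym) :macro)                ; e.g. `defun'
        ((fboundp sym) :function)            ; e.g. `mapcar'
        ((boundp sym) :variable)             ; e.g. `float-pi'
        (t :arbitrary)))

(mapcar #'demo-classify-symbol '(let defun mapcar float-pi φ1))
;; → (:special-form :macro :function :variable :arbitrary),
;;   assuming nothing has defined φ1
```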

The above ideas are just an experiment. Without actually trying them, you never know what's really good or bad.

This essay was originally a post to comp.lang.lisp @ groups.google.com….

What Happens When You Name Your Functions/Variables as Math Symbols?

It's been close to a year since I wrote the article, and I've done some experiments. Here's a short report.

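In the article, I expressed a few points:

① The benefits of naming functions and variables abstractly, with math-like symbols instead of English words.
② The benefits of unique identifiers.
③ The benefits of distinguishing language keywords from user-defined words.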

These can be viewed from a language design perspective or from a practical programming perspective.

For ①, I tried naming all functions and variables with meaningless symbols, as in math. This turned out to be practically impossible for any code longer than 100 lines. (Go ahead and try it on your own code. Only by trying it yourself will you understand many of the issues and details.) The reason is easy to see. Look at some official documentation, ⁖ http://www.ruby-doc.org/core-1.9.3/, and at all the method names listed on the right side. Imagine if all of them were like {α β χ δ …}. Source code like that would be completely incomprehensible. The same holds for the vast majority of languages.

This was an important revelation for me: naming provides ≈99% of documentation, regardless of how rich or complete other forms of documentation are.

This finding has important ramifications, given that naming has nothing to do with how the program behaves. To a compiler, a name serves only as an identifier (ID). To a human, it serves as documentation, a way to understand the code. These two purposes are completely distinct.

How much programmer time has been wasted, and how many bugs caused, by misleading names? 〔➤ The Importance of Terminology's Quality In Computer Languages〕

The other realization is that for a language to use meaningless symbols as function and variable names, the language must be specifically designed for it. You can't do this and expect readable source code in Perl, Ruby, Python, Lisp, JavaScript, Java, etc. One language designed this way is APL. Here's sample APL code from Wikipedia (Conway's Game of Life):

life←{↑1 ⍵∨.∧3 4=+/,¯1 0 1∘.⊖¯1 0 1∘.⌽⊂⍵}

For ② (the benefits of unique identifiers) and ③ (the benefits of distinguishing language keywords from user-defined words), the benefits are still there, but there's no standard solution. Note that globally unique identifiers are commonly needed.
