Python Documentation Problems

By Xah Lee. Date:

The following is a collection of criticisms of the Python programming language's documentation (written in a style intended to sting).

I've started to read the Python tutorial (http://python.org/doc/2.3.3/tut/tut.html). Here are some quick critiques:

Quick examples from “7.1 Fancier Output Formatting” at http://python.org/doc/2.3.3/tut/node9.html

… If the input string is too long, they don't truncate it, but return it unchanged; this will mess up your column lay-out but that's usually better than the alternative, which would be lying about a value. (If you really want truncation you can always add a slice operation, as in "x.ljust( n)[:n]"…

It would be better if it simply said: “If the input string is too long, they don't truncate it, but return it unchanged.”
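The behavior being described fits in a few lines. This is a minimal sketch for a current Python interpreter; the sample strings and the width of 6 are arbitrary choices for illustration, not taken from the tutorial.

```python
# str.ljust pads a short string but never truncates a long one.
short = "abc".ljust(6)
long_ = "abcdefghij".ljust(6)

# Truncation, if wanted, is a separate slice step.
cut = "abcdefghij".ljust(6)[:6]

print(repr(short))  # 'abc   '
print(repr(long_))  # 'abcdefghij'
print(repr(cut))    # 'abcdef'
```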

As another example, lines such as the following are redundant:

Reverse quotes (``) are equivalent to repr(), but their use is discouraged.

Similarly, the many passages mentioning non-critical info, such as warnings or references to other languages, should be deleted.

The tutorial should be simple, concise, to the point, and standalone. Perhaps 1/5th of the tutorial's length should be deleted for the better.

In places, a whole paragraph on some so-called computer science jargon should be deleted. Such paragraphs are there more to showcase inane technicality than to help the reader. (Relatedly, many passages with jargon should be rewritten sans inane jargon. For example, “mutable object”.)

One easy way to understand these principles is to compare Perl's documentation or unix man pages to Python's. The former are often irrelevant, rambling, and not standalone (they are written such that they unnecessarily require the reader to be knowledgeable about lots of other things). The Python doc does not suffer from the obfuscation mentality but, like many computer language manuals, it does suffer from gratuitous jargon verbiage. (This jargon, or the passages about it, is usually there to please the authors themselves.)

An exemplary piece of writing in this direction is the Mathematica manual by Stephen Wolfram. Any intelligent layman can read it straightforwardly, and learn unhindered a language whose features are tantamount to those of the lisp languages. Such documentation is not difficult to write at all (contrary to what the lot of “computer scientists” or IT pundits would say). All it takes is the simple principles outlined above.

Exhibit: Author Masturbation

See: Python Documentation Author Masturbation

Exhibit: extraneous discombobulation

In the doc for the re module, “4.2.2 Matching vs Searching” at http://python.org/doc/2.3.3/lib/matching-searching.html , quote:

Python offers two different primitive operations based on regular expressions: match and search. If you are accustomed to Perl's semantics, the search operation is what you're looking for. See the search() function and corresponding method of compiled regular expression objects.

Its mention of Perl is irrelevant, since the majority of those reading that page will not have expertise with Perl regex. The whole section should be deleted, because it only adds confusion. (The later section 4.2.3, on the search and match methods, plainly indicates their difference.)

(The mention of Perl here, and this gratuitous befuddlement, is a symptom of jealousy and Python fanaticism, altogether innocently done out of the ignorance common to doc writers.)

Note that match may differ from search using a regular expression beginning with "^": "^" matches only at the start of the string, or in MULTILINE mode also immediately following a newline. The ``match'' operation succeeds only if the pattern matches at the start of the string regardless of mode, or at the starting position given by the optional pos argument regardless of whether a newline precedes it.

A detailed explanation of their difference, or any mention of Perl, belongs in a FAQ or similar material.
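For reference, the difference the quoted passage labors over can be shown in a handful of lines. This is a sketch for a current Python interpreter; the sample text and the names m1 to m4 are made up for illustration.

```python
import re

text = "first line\nsecond line"

# match() only tries the pattern at the start of the string.
m1 = re.match("second", text)                    # no match: None

# search() scans forward through the whole string.
m2 = re.search("second", text)                   # found at index 11

# In MULTILINE mode "^" also matches right after a newline,
# so search() finds the second line...
m3 = re.search("^second", text, re.MULTILINE)    # found at index 11

# ...but match() still refuses to look past the start of the string.
m4 = re.match("^second", text, re.MULTILINE)     # no match: None

print(m1, m2.start(), m3.start(), m4)
```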

In section “4.2.6 Examples”, there need to be more, and simpler, examples. [see Perl, Python: Regex Example] The large opening section about some scanf() should be deleted for the same reason as above.

For a completely rewritten version of Python's regex module doc, see: Python: Regex Reference

Exhibit: fluff in code example

Section “11.12.2 SMTP Examples” ( http://python.org/doc/2.3.3/lib/SMTP-example.html ) gives this turgid example of using SMTP:

import smtplib

def prompt(prompt):
    return raw_input(prompt).strip()

fromaddr = prompt("From: ")
toaddrs  = prompt("To: ").split()
print "Enter message, end with ^D (Unix) or ^Z (Windows):"

# Add the From: and To: headers at the start!
msg = ("From: %s\r\nTo: %s\r\n\r\n"
       % (fromaddr, ", ".join(toaddrs)))
while 1:
    try:
        line = raw_input()
    except EOFError:
        break
    if not line:
        break
    msg = msg + line

print "Message length is " + repr(len(msg))

server = smtplib.SMTP('localhost')
server.set_debuglevel(1)
server.sendmail(fromaddr, toaddrs, msg)
server.quit()

In documentation, you want to give examples that are as short and to the point as possible. In this case, the goal is to illustrate how to use smtplib, not how to wrap it in a nice command line interface. A reader who comes to this doc will be harried trying to discern the fluff code from the core. This is especially taxing for a significant portion of doc readers, because they are not already experts. A better example would be like this:

import smtplib
smtpServer='smtp.example.org'
fromAddr='xah@xahlee.org'
toAddr='joe@example.com'
text='''Subject: test test

Hi …
'''

server = smtplib.SMTP(smtpServer)
server.set_debuglevel(1)
server.sendmail(fromAddr, toAddr, text)
server.quit()

This example shows exactly how to use “smtplib”, and just that.

Exhibit and Analysis: pell-mell writing

Python doc “2.3.3 Comparisons” at http://python.org/doc/2.4/lib/comparisons.html , quote:

Comparison operations are supported by all objects. They all have the same priority (which is higher than that of the Boolean operations). Comparisons can be chained arbitrarily; for example, x < y <= z is equivalent to x < y and y <= z, except that y is evaluated only once (but in both cases z is not evaluated at all when x < y is found to be false).

Problem:

“Comparison operations are supported by all objects.”

This is very vague and ambiguous.

The word “object” has a generic English meaning, and may also have a specific technical meaning in a language that supports object-oriented programming. In Python, it does not have a very pronounced technical meaning. For example, there's a chapter in the Python Library Reference titled “2. Built-In Objects”, and under it a section “2.1 Built-in Functions”. Apparently, functions can't possibly be what is meant by “objects” for comparisons.

Now suppose we take “objects” in the sentence to mean sensible items such as numbers, lists, etc. The clause “supported by all objects” is still ambiguous. What is meant by “supported”?

Problem:

They all have the same priority (which is higher than that of the Boolean operations).

This sentence is very stupid, in a multitude of aspects.

The “priority” referred to here means operator precedence.

It tries to say that the comparison operators have higher syntactical connectivity than the boolean operators. E.g. “False and False==False” means “False and (False==False)”, not “(False and False)==False”.

However, the pronoun “they”, in the context of the previous sentence, refers to “comparison operations”, not “operators”. So it leads the reader to think about some “operation precedence”, which is itself a critical and common concept in programming languages, totally separate from operator precedence. Very stupid, confusing writing.

And, from a pure writing aspect, the clause “…(which is …)” is a kind of juvenile latch-on. If the author intends to make that point, say it in its own sentence. For example: “The comparison operators have higher precedence than boolean operators.” It would be better still not to mention this at all. For practical purposes, in the rare case of mixing boolean and comparison operators, parentheses are likely to be used, which is indeed good practice. The proper place for operator precedence is in an appendix of the language spec, with a table showing all operator symbols and their precedence levels, giving a clear view.
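The precedence point from the False example above can be checked directly. This is a small sketch; the variable names are arbitrary.

```python
# Comparison binds tighter than "and", so the unparenthesized form
# groups the same way as the second line, not the third.
unparenthesized = False and False == False
comparison_first = False and (False == False)
boolean_first = (False and False) == False

print(unparenthesized, comparison_first, boolean_first)  # False False True
```

With explicit parentheses, as in the last two lines, the reader never needs to consult a precedence table.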

Problem:

Comparisons can be chained arbitrarily; for example, x < y <= z is equivalent to x < y and y <= z, except that y is evaluated only once (but in both cases z is not evaluated at all when x < y is found to be false).

Drop the word “arbitrarily”. It has no meaning here.

The whole sentence is a piece of verbiage born of pell-mell thinking and writing. Here's a better version:

Comparisons can be chained, like this: x<y<=z. It is equivalent to a sequence of comparisons with “and” in between: (x<y) and (y<=z).
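For the record, the “evaluated only once” caveat in the original sentence only matters when the middle operand has side effects or is expensive to compute. Here is a sketch that makes the evaluation visible; the helper functions y() and z() and the calls list are hypothetical, introduced purely for illustration.

```python
calls = []  # records which operands actually get evaluated

def y():
    calls.append("y")
    return 5

def z():
    calls.append("z")
    return 10

x = 1

# Chained form: the middle operand y() is evaluated once.
chained = x < y() <= z()
chained_calls = list(calls)        # ['y', 'z']

# Spelled-out form: y() is written twice, so it is evaluated twice.
calls.clear()
spelled_out = (x < y()) and (y() <= z())
spelled_calls = list(calls)        # ['y', 'y', 'z']

# Either way, z() is skipped when the first comparison is false.
calls.clear()
short_circuit = 9 < y() <= z()     # 9 < 5 is False
short_calls = list(calls)          # ['y']

print(chained, spelled_out, short_circuit)  # True True False
```

For plain variables such as x, y, z, the two forms give the same result, which is all that most readers need to know.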

Problem:

<> and != are alternate spellings for the same operator. != is the preferred spelling; <> is obsolescent

Bad choice of term “spellings”. Computer language operators are not known as “spellings”.

Better: “<> is equivalent to !=.”

If <> is not likely to go away in future versions, don't even mention “preference”, because it serves no effective purpose. (If one wants to wax philosophical about “programming esthetics”, go nag about it outside of the language documentation.)

In general, when something is obsolete or might go defunct in the future, consider not even mentioning that construct. If necessary, put it in an obscure place, such as an appendix listing obsolete constructs, and not adjacent to critical info. In many places in the Python documentation, this principle is breached.

This is just a quick partial analysis of one episode of the incompetence I have seen in the Python docs, which I've had the pleasure of scanning here and there over the past months. An extreme pain in the ass.

I'm in fact somewhat surprised by this poor quality of writing. The more egregious error is the hardware-oriented organization, aka technical drivel. But that I accept as common in imperative language communities and in the computing industry in general. The poor effectiveness and clarity of the writing itself is what surprised me. As exhibited above, the writing is typical of programmers: filled with latch-on sentences and unclear thinking. (They in general don't have a clear picture of what they are talking about, and in the cases where they do, they don't have the skills to express it effectively. (Just as a footnote: this writing problem isn't entirely the fault of programmers or Python doc writers. In part the English language (or natural languages in general) is to blame, because it is exceptionally convoluted and really takes years to master as an art by itself.))

The Python doc is relatively incompetent, but we can see that the authors did care about quality. This is in contrast to the documentation of unix-related things (unix tools, perl, apache, and so on), where the writers have absolutely no sense of clear writing, and in most cases don't give a damn; in fact they delight in drivel, thinking of it as literary artistry. A criminal of this sort, who does society huge damage, is Larry Wall and the likes of his cohorts in the unix community. (Disclaimer: this is a piece of opinion.)

Addendum: quality writing takes time. The critical part, though, lies not in the mastery of writing itself, but in clarity of thinking about what exactly one wants to say. So, next time you are writing a tech doc, first try to have a precise understanding of the object, then know exactly what it is that you want to say about it, and the writing will come out vastly better. If a precise understanding of the object is not readily at hand (which is natural and common, and nothing to fidget about), being aware of that fact helps greatly in the exposition.

Case Examples