Python Documentation Problems

The following is a collection of criticisms of the Python programming language's documentation. (Written in a style intended to sting.)

I've started to read the Python tutorial (http://python.org/doc/2.3.3/tut/tut.html). Here is a quick critique:

Quick examples from “7.1 Fancier Output Formatting” at http://python.org/doc/2.3.3/tut/node9.html

… If the input string is too long, they don't truncate it, but return it unchanged; this will mess up your column lay-out but that's usually better than the alternative, which would be lying about a value. (If you really want truncation you can always add a slice operation, as in "x.ljust( n)[:n]"…

It would be better if it simply said: “If the input string is too long, they don't truncate it, but return it unchanged.”
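A few lines of example would carry the point better than the aside about lying. The following is only a sketch, written in Python 3 syntax; the sample strings and the width of 5 are made up for illustration:

```python
# Python 3 sketch; the strings and the width are made-up sample values.
short = "ab"
long_ = "abcdefgh"
width = 5

padded = short.ljust(width)             # shorter than width: padded with spaces
unchanged = long_.ljust(width)          # longer than width: returned as is
truncated = long_.ljust(width)[:width]  # add a slice if truncation is wanted

print(repr(padded))
print(repr(unchanged))
print(repr(truncated))
```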

As another example, lines such as the following are redundant:

Reverse quotes (``) are equivalent to repr(), but their use is discouraged.

Similarly, the many places that mention non-critical info, such as warnings or references to other languages, should be deleted.

The tutorial should be simple, concise, to the point, and standalone. Perhaps a fifth of the tutorial's length could be deleted for the better.

In places, a whole paragraph on some so-called computer science jargon should be deleted. Such paragraphs are there more to showcase inane technicality than to help the reader. (Relatedly, many passages with jargon should be rewritten sans the inane jargon. ⁖ mutable object.)

One easy way to understand these principles is to compare Perl's documentation or unix man pages to Python's. The former are often irrelevant, rambling, and not standalone (they are written such that they unnecessarily require the reader to be knowledgeable about lots of other things). Python's docs do not suffer from this obfuscation mentality but, like many computer language manuals, they do suffer from gratuitous jargon and verbiage. (The jargon, and the passages about it, are usually there to please the authors themselves.)

An exemplary piece of writing in this direction is the Mathematica manual by Stephen Wolfram. Any intelligent layman can read it straightforwardly, and learn unhindered a language whose features are tantamount to those of Lisp languages. Such documentation is not difficult to write at all (contrary to what the lot of “computer scientists” and IT pundits would have you believe). All it takes is the simple principles outlined above.

Exhibit: Author Masturbation

In the Python tutorial “9. Classes” at http://python.org/doc/2.3.3/tut/node11.html, it begins thus:

Python's class mechanism adds classes to the language with a minimum of new syntax and semantics. It is a mixture of the class mechanisms found in C++ and Modula-3. As is true for modules, classes in Python do not put an absolute barrier between definition and user, but rather rely on the politeness of the user not to ``break into the definition.'' The most important features of classes are retained with full power, however: the class inheritance mechanism allows multiple base classes, a derived class can override any methods of its base class or classes, a method can call the method of a base class with the same name. Objects can contain an arbitrary amount of private data.

classic masturbation.

In C++ terminology, all class members (including the data members) are public, and all member functions are virtual. There are no special constructors or destructors. As in Modula-3, there are no shorthands for referencing the object's members from its methods: the method function is declared with an explicit first argument representing the object, which is provided implicitly by the call. As in Smalltalk, classes themselves are objects, albeit in the wider sense of the word: in Python, all data types are objects. This provides semantics for importing and renaming. Unlike C++ and Modula-3, built-in types can be used as base classes for extension by the user. Also, like in C++ but unlike in Modula-3, most built-in operators with special syntax (arithmetic operators, subscripting etc.) can be redefined for class instances.

This introduction should be deleted. Nobody gives a shit except a few smug academicians; the author wrote it to please himself. For 99% of readers, it is incomprehensible and irrelevant.

The section that immediately follows it, “9.1 A Word About Terminology”, quote:

Lacking universally accepted terminology to talk about classes, I will make occasional use of Smalltalk and C++ terms. (I would use Modula-3 terms, since its object-oriented semantics are closer to those of Python than C++, but I expect that few readers have heard of it.)

is the epitome of masturbation. The rest of 9.1 goes:

I also have to warn you that there's a terminological pitfall for object-oriented readers: the word ``object'' in Python does not necessarily mean a class instance. Like C++ and Modula-3, and unlike Smalltalk, not all types in Python are classes: the basic built-in types like integers and lists are not, and even somewhat more exotic types like files aren't. However, all Python types share a little bit of common semantics that is best described by using the word object.

Objects have individuality, and multiple names (in multiple scopes) can be bound to the same object. This is known as aliasing in other languages. This is usually not appreciated on a first glance at Python, and can be safely ignored when dealing with immutable basic types (numbers, strings, tuples). However, aliasing has an (intended!) effect on the semantics of Python code involving mutable objects such as lists, dictionaries, and most types representing entities outside the program (files, windows, etc.). This is usually used to the benefit of the program, since aliases behave like pointers in some respects. For example, passing an object is cheap since only a pointer is passed by the implementation; and if a function modifies an object passed as an argument, the caller will see the change -- this eliminates the need for two different argument passing mechanisms as in Pascal.

The entire 9.1 is not necessary.
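What the second quoted paragraph labors to say fits in a few lines of code. A minimal sketch in Python 3 syntax; the function and variable names are made up for illustration:

```python
# Python 3 sketch; append_one, a, and b are hypothetical names.
def append_one(items):
    # modifies the list it was handed; no copy is made
    items.append(1)

a = [0]
b = a            # b is a second name for the same list
append_one(b)

print(a)         # the change is visible through both names
print(a is b)
```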

A large part of the windy “9.2 Python Scopes and Name Spaces” is again masturbatory.

Most texts in computing are written by authors to defend and showcase their existence against their peers. In a tutorial, nobody cares how the language compares to x y z, or what technicality it is all about, or about some humorous snippet of history funny only to the author himself.

Particularly in a tutorial context, you want to write as simply as possible, covering the most useful basic functionality and concepts, and to be self-contained. You do not want to showcase your knowledge of the history of languages or the byways of your linguistic lineage.

For example, take this chapter 9 on classes. It is not difficult to write it without making a show of lingo. One simply writes what Python does and how it functions, without thinking about its relation to xyz languages, or about the framework of the “computer science” establishment and its ways of thinking of namespaces and scopes and dynamics and statics and polymorphism … bags of baggage.

Also, in the computing industry, documentation and tutorials often lack examples, which are especially important in tutorials. Be fewer in words, more in examples. (For example, unix man pages are full of arcane abstract syntactical notations and gut technicalities, while most don't contain a single example of the usage most users seek.)

This does not mean writing for dummies, as in the highly successful series of “xyz for Dummies” commercial publications. These sell because general readers have learned to fear the corpus of textbooks that pile up jargon and intellectualization for the sake of the authors' own esteem and careers, at the expense of the textbooks' educational effectiveness. Dummy books are moronic because they assume the general readers are morons.

PS Another illustrative case is the official Java Tutorial ( http://java.sun.com/docs/books/tutorial/java/TOC.html ). The official Java Tutorial is completely asinine. It waxes rocket science with unhelpful and actually misleading drivel throughout, while writing as if the readers are 3-year-olds who need step-by-step instructions repeated multiple times. (For an example of the inanities of the official Java tutorial, see this exposition on Java's access specifiers: Java's Access Specifiers)

Example of a Tutorial on Objects

In my previous two messages, I gave a critique of the inanity that applies to the vast majority of language documentation and tutorials in the industry. I used the Python tutorial's chapter on classes as an example. I indicated that a proper tutorial should be simple, cover just the common cases, be self-contained, and be example based, documenting the language's functionality as it manifests. An exemplary case of this style, as I indicated, is Stephen Wolfram's Mathematica documentation.

The following is an excerpt from the a-Python-a-day mailing list. As an example, it shows what I mean by covering the language's functionality as is, sans extraneous lingo sideshows. If expanded slightly and edited, it can supplant sections 9.0 to 9.4 of the Python tutorial. Language tutorials should follow this style.

# in Python, one can define a boxed set
# of data and functions, which are
# traditionally known as "class".

# in the following, we define a set of data
# and functions as a class, and name it xxx

class xxx:
     "a class extempore! (^_^)"
     i=1 # i'm a piece of data
     def okaydokey(self): return "okaydokey"
     def square(self,a): return a*a

# in the following,
# we create an object of the class xxx.
# aka "instantiate a class".
x = xxx()

# data or functions defined in a class
# are called the class's attributes or
# methods.
# to use them, append a dot and
# their name after the object's name.
print 'value of attribute i is:', x.i
print "3 squared is:", x.square(3)
print "okaydokey called:", x.okaydokey()

# in the definition of function inside a
# class, the first parameter "self" is
# necessary. (you'll know why when you need to)

# the first line in the class definition
# is the class's documentation. It can
# be accessed thru the __doc__
# attribute.
print "xxx's doc string is:", x.__doc__

# one can change data on the object
x.i = 400

# one can also add new data to the object
x.j=4
print x.j

# or even overwrite a method with something else
x.square = 333
# (the following line will no longer work)
# print "3 squared is:", x.square(3)

# in Python, one must be careful not to
# overwrite data or methods defined in a
# class.

# for an obfuscated treatment with a little
# extra info, see http://python.org/doc/2.3.4/tut/node11.html

Exhibit: extraneous discombobulation

In the doc for the re module, “4.2.2 Matching vs Searching” at http://python.org/doc/2.3.3/lib/matching-searching.html , quote:

Python offers two different primitive operations based on regular expressions: match and search. If you are accustomed to Perl's semantics, the search operation is what you're looking for. See the search() function and corresponding method of compiled regular expression objects.

Its mention of Perl is irrelevant, since the majority of those reading that page will not have expertise with Perl regex. The whole section should be deleted, because it only adds confusion. (The later section 4.2.3, on the search and match methods, plainly indicates their difference.)

(The mention of Perl here, and this gratuitous befuddlement, is a syndrome of jealousy and Python fanaticism. Altogether innocently done, out of the ignorance common among doc writers.)

Note that match may differ from search using a regular expression beginning with "^": "^" matches only at the start of the string, or in MULTILINE mode also immediately following a newline. The ``match'' operation succeeds only if the pattern matches at the start of the string regardless of mode, or at the starting position given by the optional pos argument regardless of whether a newline precedes it.

A detailed explanation of their difference, or the mention of Perl, should be in a FAQ or such material.
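If the difference must be shown on that page at all, a few lines of example do it better than the quoted paragraph. A sketch in Python 3 syntax; the sample text is made up, and only re.search, re.match, and re.MULTILINE from the standard library are used:

```python
# Python 3 sketch; the sample text is made up.
import re

text = "first line\nsecond line"

# search() scans the whole string for the pattern
found = re.search("second", text)
# match() only tries at the start of the string
at_start = re.match("second", text)

# with "^" and MULTILINE, search() can match right after a newline ...
multi_search = re.search("^second", text, re.MULTILINE)
# ... but match() still only tries at the very start of the string
multi_match = re.match("^second", text, re.MULTILINE)

print(found is not None)
print(at_start is None)
print(multi_search.start())
print(multi_match is None)
```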

In section “4.2.6 Examples”, there need to be more, and simpler, examples. 〔➤ Perl & Python: Regex Example〕 The large section at the beginning, about simulating scanf(), should be deleted for the same reason as above.

For a completely rewritten version of Python's regex module doc, see: Python Regex Tutorial: String Pattern Matching

Exhibit: fluff in code example

Section “11.12.2 SMTP Examples” ( http://python.org/doc/2.3.3/lib/SMTP-example.html ) gives this turgid example of using SMTP:

import smtplib

def prompt(prompt):
    return raw_input(prompt).strip()

fromaddr = prompt("From: ")
toaddrs  = prompt("To: ").split()
print "Enter message, end with ^D (Unix) or ^Z (Windows):"

# Add the From: and To: headers at the start!
msg = ("From: %s\r\nTo: %s\r\n\r\n"
       % (fromaddr, ", ".join(toaddrs)))
while 1:
    try:
        line = raw_input()
    except EOFError:
        break
    if not line:
        break
    msg = msg + line

print "Message length is " + repr(len(msg))

server = smtplib.SMTP('localhost')
server.set_debuglevel(1)
server.sendmail(fromaddr, toaddrs, msg)
server.quit()

In documentation, you want to give examples that are as short and to the point as possible. In this case, the goal is to illustrate how to use smtplib, not how to build a nice command line interface in front of it. A reader who comes to this doc will be harried in discerning the fluff code from the core. This is especially taxing for a significant portion of doc readers, because they are not already experts. A better example would be like this:

import smtplib
smtpServer='smtp.example.org'
fromAddr='xah@xahlee.org'
toAddr='jane@example.com'
text='''Subject: test test

Hi …
'''

server = smtplib.SMTP(smtpServer)
server.set_debuglevel(1)
server.sendmail(fromAddr, toAddr, text)
server.quit()

This example shows exactly how to use “smtplib”, and just that.

Exhibit & Analysis: pell-mell writing

Python doc “2.3.3 Comparisons” at http://python.org/doc/2.4/lib/comparisons.html , quote:

Comparison operations are supported by all objects. They all have the same priority (which is higher than that of the Boolean operations). Comparisons can be chained arbitrarily; for example, x < y <= z is equivalent to x < y and y <= z, except that y is evaluated only once (but in both cases z is not evaluated at all when x < y is found to be false).

Problem:

“Comparison operations are supported by all objects.”

This is very vague and ambiguous.

The word “object” has a generic English meaning, and may also have a specific technical meaning in a language that supports Object-Oriented programming. In Python, it does not have a very pronounced technical meaning. For example, there's a chapter in the Python Library Reference titled “2. Built-In Objects”, and under it a section “2.1 Built-in Functions”. Apparently, functions can't possibly be meant as “objects” for comparison.

Now suppose we take the object in the sentence to be sensible items such as numbers, lists etc. The clause “supported by all objects” is ambiguous. What is meant by “supported”?

Problem:

They all have the same priority (which is higher than that of the Boolean operations).

This sentence is very stupid, in a multitude of aspects.

The “priority” referred to here means operator precedence.

It tries to say that the comparison operators bind more tightly, syntactically, than the boolean operators. E.g. “False and False==False” means “False and (False==False)”, and not “(False and False)==False”.
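The two readings give different results, which is easy to check. A minimal sketch in Python 3 syntax; the variable names are made up:

```python
# Python 3 sketch; implicit, explicit, and other are hypothetical names.
implicit = False and False == False      # no parentheses
explicit = False and (False == False)    # how Python actually groups it
other = (False and False) == False       # the grouping Python does NOT use

print(implicit)
print(explicit)
print(other)
```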

However, the pronoun “they”, in the context of the previous sentence, refers to “comparison operations”, not “operators”. So it leads the reader to think about some “operation precedence”, which is itself a critical and common concept in programming languages, totally separate from operator precedence. Very stupid, confusing writing.

And, from a pure writing aspect, the clause “…(which is …)” is some kind of juvenile latch-on. If the author intends to make that point, he should say it in its own sentence. ⁖ “The comparison operators have higher precedence than boolean operators.” It would be better still not to mention this at all. For practical purposes, in the rare case of mixing boolean and comparison operators, parentheses are likely to be used, which is indeed good practice. The proper place for operator precedence is in an appendix of the language spec, with a table showing all operator symbols and their precedence levels, giving a clear view.

Problem:

Comparisons can be chained arbitrarily; for example, x < y <= z is equivalent to x < y and y <= z, except that y is evaluated only once (but in both cases z is not evaluated at all when x < y is found to be false).

Drop the word “arbitrarily”. It has no meaning here.

The whole sentence is one piece of verbiage, born of pell-mell thinking and writing. Here's a better version:

Comparisons can be chained, like this: x<y<=z. It is equivalent to a sequence of comparisons with “and” in between: (x<y) and (y<=z)
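An example can show the equivalence, and also the one detail the original sentence buried in parentheses: in the chained form the middle operand is evaluated only once, and the last operand not at all if the first comparison fails. A sketch in Python 3 syntax; the helper value() and the list calls are made up to record what gets evaluated:

```python
# Python 3 sketch; value() and calls are hypothetical helpers.
calls = []

def value(name, v):
    # record each evaluation so we can see what actually ran
    calls.append(name)
    return v

# chained form: y is evaluated once
chained = value("x", 1) < value("y", 5) <= value("z", 5)
chained_calls = calls[:]

# spelled out with "and": y is evaluated twice
del calls[:]
spelled_out = (value("x", 1) < value("y", 5)) and (value("y", 5) <= value("z", 5))
spelled_calls = calls[:]

# when the first comparison is false, z is never evaluated
del calls[:]
short_circuit = value("x", 9) < value("y", 5) <= value("z", 5)
short_calls = calls[:]

print(chained, chained_calls)
print(spelled_out, spelled_calls)
print(short_circuit, short_calls)
```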

Problem:

<> and != are alternate spellings for the same operator. != is the preferred spelling; <> is obsolescent

Bad choice of the term “spellings”. Computer language operators are not known as “spellings”.

Better: “<> is equivalent to !=.”.

If <> is not likely to go away in future versions, don't even mention “preference”, because it serves no effective purpose. (If one wants to wax philosophical about “programming esthetics”, go nag about it outside of the language documentation.)

In general, when something is obsolete or might go defunct in the future, consider not even mentioning that construct. If necessary, put it in an obscure place, such as an appendix listing obsolete features, and not adjacent to critical info. In many places in the Python documentation, this rule is breached.

This is just a quick, partial analysis of one episode of the incompetence I've seen in the Python docs in the past months, during which I've had the pleasure to scan them here and there. An extreme pain in the ass.

I'm in fact somewhat surprised by this poor quality of writing. The more egregious error is the hardware-oriented organization, aka technical drivel. But that I accept as common in imperative language communities and in the computing industry in general. It is the poor effectiveness and clarity of the writing itself that surprised me. As exhibited above, the writing is typical of programmers: filled with latched-on sentences and unclear thinking. (They in general don't have a clear picture of what they are talking about, and in the cases where they do, they don't have the skills to express it effectively. (Just as a footnote: this writing problem isn't entirely the fault of programmers or Python doc writers. In part the English language (or natural languages in general) is to blame, because natural languages are exceptionally convoluted and really take years to master as an art in itself.))

Though the Python doc is relatively incompetent, we can see that its authors did care about quality. This is in contrast to the documentation of unix related things (unix tools, perl, apache, and so on), where the writers have absolutely no sense of clear writing, and in most cases don't give a damn, and in fact delight in drivel, thinking of it as literary artistry. Criminals of this sort, who do society huge damage, are Larry Wall and his cohorts in the unix community. (Disclaimer: this is a piece of opinion.)

Addendum: quality writing takes time. The critical part, though, lies not in the mastery of writing itself, but in clarity of thinking about what exactly one wants to say. So, next time you are writing a tech doc, first try to gain a precise understanding of the object, and then know exactly what it is that you want to say about it; the writing will then come out vastly better. If a precise understanding of the object is not readily at hand (which is natural and common, and nothing to fidget about), being aware of that fact helps greatly in the exposition.

Case Examples
