Python Doc Problem: sort()
Exhibit: Incompletion and Imprecision
The Python doc “2.3.6.4 Mutable Sequence Types”, at
http://python.org/doc/2.4/lib/typesseq-mutable.html
contains the documentation of the “sort” method of a list. Quote:
Operation: s.sort([cmp[, key[, reverse]]])
Result: sort the items of s in place
Notes: (7), (8), (9), (10)
- (7) The sort() and reverse() methods modify the list in place for economy of space when sorting or reversing a large list. To remind you that they operate by side effect, they don't return the sorted or reversed list.
- (8) The sort() method takes optional arguments for controlling the comparisons.
cmp specifies a custom comparison function of two arguments (list items) which should return a negative, zero or positive number depending on whether the first argument is considered smaller than, equal to, or larger than the second argument: "cmp=lambda x,y: cmp(x.lower(), y.lower())"
key specifies a function of one argument that is used to extract a comparison key from each list element: "cmp=str.lower"
reverse is a boolean value. If set to True, then the list elements are sorted as if each comparison were reversed.
In general, the key and reverse conversion processes are much faster than specifying an equivalent cmp function. This is because cmp is called multiple times for each list element while key and reverse touch each element only once.
Changed in version 2.3: Support for None as an equivalent to omitting cmp was added.
Changed in version 2.4: Support for key and reverse was added.
- (9) Starting with Python 2.3, the sort() method is guaranteed to be stable. A sort is stable if it guarantees not to change the relative order of elements that compare equal -- this is helpful for sorting in multiple passes (for example, sort by department, then by salary grade).
- (10) While a list is being sorted, the effect of attempting to mutate, or even inspect, the list is undefined. The C implementation of Python 2.3 and newer makes the list appear empty for the duration, and raises ValueError if it can detect that the list has been mutated during a sort.
Problems
As a piece of documentation, this is a lousy one.
The questions Python doc writers need to ask when evaluating this piece of doc are these:
- Can an experienced programmer who is expert in several languages but new to Python, and who has read the official Python tutorial, read this doc and know exactly how to use sort with all the options?
- Can this piece of documentation be rewritten fairly easily, so that the answer to the previous question is a resounding yes?
To me, the answers to the above questions are No and Yes. Here are some issues with the doc:
• In the paragraph about the “key” parameter, the illustration given is: cmp=str.lower. It should be key=str.lower.
• This doc lacks examples. One or two examples would help a lot, especially for less experienced programmers (who make up the majority of readers). In particular, it should give a full example of using the comparison function and one with the “key” parameter. Examples are particularly needed here because these parameters are functions, often written with the “lambda” construct. These are unusual and advanced constructs among imperative languages.
• This doc fails to mention what happens when the predicate and the shortcut version conflict, e.g. myList.sort(cmp=lambda x,y: cmp(x[0], y[0]), key=lambda x: str(x[1]) )
• The notation the Python doc has adopted for indicating the syntax of optional parameters does not give a clear view of exactly which combinations of optional parameters can be omitted. The notation s.sort([cmp[, key[, reverse]]]) gives the impression that only trailing arguments can be omitted, which is not true.
• The doc gives no indication of how to omit an optional arg. Should it be nul, Null, 0, or left empty? Since it doesn't give any examples, a doc reader who isn't a Python expert is left to guess at how true/false values are represented in Python.
• On the whole, the way this doc is written does not give a clear picture of the roles of the supplied options, nor how to use them.
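The omission question raised in the list above can be sketched as follows. This is a minimal illustration, assuming a modern Python 3 interpreter, where the “cmp” parameter no longer exists and “key” and “reverse” are passed by keyword; the sample list is made up for the purpose.

```python
# Hypothetical sample data, for illustration only.
words = ["pear", "apple", "Fig", "Banana"]

# No arguments at all: the default ordering is used
# (uppercase letters sort before lowercase ones).
a = list(words)
a.sort()

# Only "reverse" is given; "key" is simply left out.
b = list(words)
b.sort(reverse=True)

# Only "key" is given; "reverse" is left out and defaults to False.
c = list(words)
c.sort(key=str.lower)

# Passing None explicitly for "key" behaves the same as omitting it.
d = list(words)
d.sort(key=None, reverse=True)

print(a)
print(b)
print(c)
print(d)
```

So an optional parameter is omitted by not writing it, and any of them can be skipped, not just the trailing ones, as long as the rest are named.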
Suggestion
Suggested Quick Remedy: add an example of using the “cmp” function, and an example using the “key” function. Add an example of using one of them together with “reverse”. (The examples need not come with much explanation. A one-sentence annotation is better than none.)
Other than that, the way the doc is laid out, with a terse table and run-on footnotes (employed in several places in the Python doc), is not inductive. For a better improvement, there needs to be an overhaul of the organization and the attitude of the entire doc. The organization needs to be programming based, as opposed to implementation or computer science based (in this regard, one can learn from the Perl folks). As to attitude, the writing needs to be Python-as-is, as opposed to computer science framework, as indicated in the early parts of this critique series.
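As a sketch of what such examples might look like: the doc under discussion is for Python 2.4, where one would write names.sort(cmp=...). The code below instead assumes Python 3, where the “cmp” parameter was removed and a comparison function has to be wrapped with functools.cmp_to_key. The list and the function name compare_ignoring_case are invented for illustration.

```python
from functools import cmp_to_key

# Hypothetical sample data, for illustration only.
names = ["bob", "Alice", "dave", "Carol"]

def compare_ignoring_case(x, y):
    """Return a negative number, zero, or a positive number,
    depending on whether x sorts before, equal to, or after y."""
    a, b = x.lower(), y.lower()
    return (a > b) - (a < b)

# Comparison function. Python 2.4 spelling would be:
#   names.sort(cmp=compare_ignoring_case)
# Python 3 has no "cmp" parameter, so the function is wrapped.
by_cmp = list(names)
by_cmp.sort(key=cmp_to_key(compare_ignoring_case))

# "key": a function of one argument that extracts a comparison key.
by_key = list(names)
by_key.sort(key=str.lower)

# "key" combined with "reverse": same ordering, descending.
by_key_desc = list(names)
by_key_desc.sort(key=str.lower, reverse=True)

print(by_cmp)
print(by_key)
print(by_key_desc)
```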
Python Sort Documentation Problem Detail
Python 2.4, released in 2004-11, added a new built-in function, sorted(). There's no mention of it on the doc page of the sort() method.
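For readers who have not met it, a minimal sketch of the difference between the two follows (the list is made up for illustration): sorted() returns a new list and leaves its argument alone, while the sort() method rearranges the list in place and returns None.

```python
# Hypothetical sample data, for illustration only.
scores = [3, 1, 2]

# sorted() builds and returns a new list; the original is untouched.
new_list = sorted(scores)
original_after_sorted = list(scores)

# sort() works in place and returns None.
result = scores.sort()

print(new_list)
print(original_after_sorted)
print(result)
print(scores)
```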
Here's a further example of the extremely low quality of Python's documentation. In particular, what follows focuses on the bad writing skill aspect, and comments on some language design and quality issues of Python.
From the Official Python documentation of the sort() method, at:
http://python.org/doc/2.4.2/lib/typesseq-mutable.html, Quote:
The sort() method takes optional arguments for controlling the comparisons.
It should be “optional parameter” not “optional argument”. Their difference is that “parameter” indicates the variable, while “argument” indicates the actual value.
… for controlling the comparisons.
This is bad writing caused by lack of understanding. No, it doesn't “control the comparison”. The proper way to say it is that “the comparison function specifies an order”.
The sort() and reverse() methods modify the list in place for economy of space when sorting or reversing a large list. To remind you that they operate by side effect, they don't return the sorted or reversed list.
This is an example of tech-geeking drivel. The sort() and reverse() methods are just the way they are. Their design and behavior are really not for some economy, or to remind programmers of something. The Python doc is bulked up with this kind of irrelevant drivel. These littered inanities drag down the quality and effectiveness of the whole doc.
Changed in version 2.4: Support for key and reverse was added.
In general, the key and reverse conversion processes are much faster than specifying an equivalent cmp function. This is because cmp is called multiple times for each list element while key and reverse touch each element only once.
When sorting something, one needs to specify an order. The easiest way is to simply list all the elements as a sequence. That way, their order is clearly laid out. However, this is in general not feasible. Therefore, we devised a mathematically condensed way to specify the order, by defining a function f(x,y) that can take any two elements and tell us which one comes first. This is the gist of sorting a list in any programming language.
The ordering function, being a mathematically condensed way of specifying the order, has some constraints. e.g. the function should not tell us “x < y” and “y < x”. (For a complete list of these constraints, see Sorting in Python and Perl.)
This ordering function is all that sort needs to sort a list. Anything more is interface complexity.
The optional parameters “key” and “reverse” in Python's sort method are interface complexity. What happened here is that a compiler optimization problem was evaded by moving it into the language syntax for programmers to worry about. If the programmer does not use the “key” syntax when sorting a large matrix (provided that he knows in advance something about the list to be sorted or the ordering function), then he is penalized by a severe inefficiency, an order of magnitude in execution time.
This situation, of moving compiler problems to the syntax surface, is common in imperative languages.
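The doc's claim that cmp is called multiple times per element while key touches each element only once can be checked by counting calls. The following is a rough sketch, assuming Python 3 (so the comparison function is wrapped with functools.cmp_to_key rather than passed as cmp=); the data and the counter names are made up. It shows only the call counts, not timings.

```python
from functools import cmp_to_key

# Hypothetical sample data, for illustration only.
data = ["delta", "Alpha", "charlie", "Bravo",
        "echo", "Foxtrot", "golf", "Hotel"]

# Call counters, kept in a dict so the functions below can update them.
calls = {"cmp": 0, "key": 0}

def counting_cmp(x, y):
    """Case-insensitive comparison that records each invocation."""
    calls["cmp"] += 1
    a, b = x.lower(), y.lower()  # lower() runs twice per comparison
    return (a > b) - (a < b)

def counting_key(x):
    """Case-insensitive key extraction that records each invocation."""
    calls["key"] += 1
    return x.lower()             # lower() runs once per element

by_cmp = sorted(data, key=cmp_to_key(counting_cmp))
by_key = sorted(data, key=counting_key)

print(calls)
```

The key function runs exactly once per element. The comparison function runs once per comparison the sort performs, which is at least one fewer than the number of elements and grows faster than the list does.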
Changed in version 2.3: Support for None as an equivalent to omitting cmp was added.
This is an epitome of catering towards morons. myList.sort() is perfect, but Pythoners had to add the myList.sort(None) interface complexity just because idiots need it.
The motivation here is simple: an explicit “None” gives coding monkeys a direct sensory input of the fact that “there is no comparison function”. This is like the double negative in black English “I ain't no gonna do it!”. Logically, “None” is not even correct and leads to bad thinking. What really should be stated in the doc is that “the default ordering function to sort() is the ‘cmp’ function”.
Starting with Python 2.3, the sort() method is guaranteed to be stable. A sort is stable if it guarantees not to change the relative order of elements that compare equal -- this is helpful for sorting in multiple passes (for example, sort by department, then by salary grade).
One is quite surprised to read this. For about a decade of a language's existence, its sort functionality is not smart enough to preserve order?? A sort that preserves original order isn't something difficult to implement. What we have here is sloppiness and poor quality common in Open Source projects.
Also note the extremely low quality of the writing. It employs the jargon “stable sort”, then proceeds to explain what it is; then, in trying to illustrate the situation, it throws in “multiple passes” and the mysterious “by department, by salary”.
Here's a suggested rewrite: “Since Python 2.3, the result of sort() no longer rearranges elements for which the comparison function returns 0.”
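To make the quoted “multiple passes” remark concrete, here is a small sketch, assuming Python 3. The staff records are invented for illustration. The list is sorted by department and then by salary grade, in the order the doc mentions; because the sort is stable, records with equal grade keep the department order produced by the first pass.

```python
# Hypothetical records: (name, department, salary grade).
staff = [
    ("Ann", "Sales", 2),
    ("Bob", "Ops", 1),
    ("Cid", "Sales", 1),
    ("Dee", "Ops", 2),
    ("Eve", "Sales", 1),
]

# Pass 1: sort by department.
staff.sort(key=lambda record: record[1])

# Pass 2: sort by salary grade. Records with the same grade are not
# rearranged, so within each grade the department order survives.
staff.sort(key=lambda record: record[2])

for record in staff:
    print(record)
```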