On Unix File System's Case Sensitivity

by Brian Tiemann, 2001

As for the “rightness” of using a case-preserving, case-insensitive filesystem, though… well, I come from a UNIX-geek background myself, and it was many galling years before I understood why it was designed that way in the first place.

Case-sensitivity seems like a great idea to UNIX-heads. These are people who want every possible command and workflow to have a distinct, deterministic result, the kind of thinking you expect from an academic/research environment. Synonymous workflows that arrive at the same result are anathema to science. Students filling up directories with lab data like there to be a difference between “a.dat” and “A.dat”, and they sort them according to ASCII value rather than orthography. It's a sure-footed, obedient scheme, one where the computer does exactly what the user wants it to do, because the user is one who has the expertise to issue instructions that are very clear and precise and that speak the same internal language the computer does.
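To make the difference between ASCII order and orthographic order concrete, here is a minimal Python sketch. The filenames are made up for illustration, and folding case with `str.casefold` is only a rough stand-in for what a desktop OS does when it sorts a folder.

```python
# Hypothetical lab-data filenames, chosen only to show the two orderings.
names = ["a.dat", "B.dat", "A.dat", "b.dat"]

# Code-point (ASCII) order: every uppercase letter sorts before every lowercase one.
ascii_order = sorted(names)

# Case-insensitive order: names are compared with case folded away.
# Python's sort is stable, so names that tie keep their input order.
folded_order = sorted(names, key=str.casefold)

print(ascii_order)   # ['A.dat', 'B.dat', 'a.dat', 'b.dat']
print(folded_order)  # ['a.dat', 'A.dat', 'B.dat', 'b.dat']
```

In the first listing “B.dat” lands ahead of “a.dat”, which is the behavior a casual user tends to find surprising.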

But that's not who desktop OSes are written for. In a desktop OS, there is no conceivable reason why you would want to have two files in the same folder that are, for all intents and purposes, named the same thing. “Picture1.jpg” is the same thing as “picture1.jpg”. No, really — it is. It's the metaphor by which you organize the people in your address book. Would you consider “john thomas” to be a different person from “John Thomas”? Would you be unconfused by a set of introductions at a party with both these fellows in attendance?

Mac and Windows users have to have filenames read to them over the phone by support techs. They have to be able to write little sticky notes to their mothers about how to open up the mail program, without worrying about how the filenames are capitalized. Haven't you ever fumed over a URL with initial-caps in the folder names in the path, having to fiddle with capitalization until you get a response that's anything but a 404? Haven't you ever been secretly pleased that e-mail addresses aren't case-sensitive?

(Side note: Apache's mod_speling module corrects capitalization errors, but does it the “right” way: it issues a redirect so the browser re-requests the file with the correct capitalization, thus closing the loop as a “case-preserving” scheme. Windows servers just happily serve up the file with the wrong capitalization, leading the client to save a file capitalized differently from the copy on the server.)

Bear in mind that it's MUCH more work for a filesystem to be case-insensitive than -sensitive. A filesystem is case-sensitive by default, in the simplest case; it can only be made case-INsensitive through a lot of extra engineering. In UNIX, all the system has to do is compare and sort filenames by the ASCII values of their characters. In the Mac OS and Windows, the filesystem has to be smart enough to treat various letters as synonyms (A for a, and so on) and sort accordingly. That takes a LOT of code. It's a testament to the completeness of the original Mac OS that in 1984 this was all handled properly, before Windows even brought lower-case letters to the PC side.
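The extra bookkeeping can be sketched with a toy in-memory directory. This is an illustration under loud assumptions, not how HFS+ or any real filesystem is implemented: the class name `CaseInsensitiveDirectory` and its methods are hypothetical, and `str.casefold` stands in for whatever folding rules a real filesystem applies.

```python
class CaseInsensitiveDirectory:
    """Toy directory that is case-preserving but case-insensitive (hypothetical)."""

    def __init__(self):
        # Maps the folded name to (name exactly as the user typed it, contents).
        self._entries = {}

    def create(self, name, data=b""):
        key = name.casefold()
        if key in self._entries:
            existing = self._entries[key][0]
            raise FileExistsError(f"{name!r} collides with existing {existing!r}")
        self._entries[key] = (name, data)

    def lookup(self, name):
        """Return the stored, case-preserved name for any capitalization."""
        return self._entries[name.casefold()][0]

    def listing(self):
        """Return stored names sorted without regard to case."""
        return sorted((n for n, _ in self._entries.values()), key=str.casefold)


folder = CaseInsensitiveDirectory()
folder.create("Picture1.jpg")
folder.create("notes.txt")
found = folder.lookup("picture1.JPG")  # finds the file, keeps its original spelling
```

A case-sensitive directory needs none of this: it can use the raw name as the key and compare it byte for byte.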

It goes well beyond that, too. Look at how the Mac handles foreign, extended characters. Rather than the Windows way, in which you have to pick accented characters from a bizarre grid of upper-ASCII values, the Mac builds such things into the keyboard input routines in a way that's workflow-consistent. Press Option+U, and you get an umlaut. Then enter a vowel, and the umlaut combines with the vowel. Option+U,U — and there's your ü.

And what about sorting? The Mac sorts ü right along with the other U variants. iTunes recognizes Bjork and Björk as the same artist. There's a completeness to the thinking here that Windows still only approximates. Windows sorts all of your folders first in a folder listing, before any files. Why is that? And it still has that infuriating and inexplicable stupidity of not allowing you to create a file with a name that's an all-caps acronym, without helpfully converting it to an initial-caps string for you.
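The ideas of a combining umlaut and of sorting ü with the other U variants can be approximated with Python's standard `unicodedata` module. This is only a sketch: it is not the algorithm the Mac OS or iTunes uses, the helper names `compose` and `sort_key` are invented here, and real collation relies on locale-aware tables rather than simply stripping accents.

```python
import unicodedata


def compose(text):
    """Combine a base letter and a following combining mark into one character."""
    return unicodedata.normalize("NFC", text)


def sort_key(text):
    """Fold case and drop combining marks so accented letters sort with plain ones."""
    decomposed = unicodedata.normalize("NFD", text.casefold())
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


# "u" followed by U+0308 (combining diaeresis) composes to the single character ü.
u_umlaut = compose("u\u0308")

# Under this key, the two spellings of the artist's name compare equal.
same_artist = sort_key("Bj\u00f6rk") == sort_key("Bjork")

# ü sorts among the other U variants instead of after every ASCII letter.
ordered = sorted(["Uz", "\u00fcb", "ua"], key=sort_key)
```

Sorting the same three names by raw code point would instead put “üb” last, after both “Uz” and “ua”.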

It's taken me a long time to come to terms with the appropriateness of a case-preserving, case-insensitive filesystem, but I've done it. It's clear to me now that while it's nice in an academic sense to have deterministic control over filenames to the point where two files that differ only in capitalization can exist in the same folder, it's simply nothing but confusing to a casual user for there to be a distinction. It's an area in which Mac OS X inherits some of the very hard and complete work of the classic Mac OS development in the form of the HFS+ filesystem, but in which some of the recent UI decisions are causing ease-of-use to suffer (for instance, hidden filename extensions, which lead to the never-before-seen problem of multiple files with the same APPARENT filename in the same folder — we can only hope we don't end up like Windows, with six files in a folder all called “setup”). And this is one case where I hope Apple develops with the home user in mind, rather than the academic UNIX geek.

It's telling that over the past several years, in attempts to reassure myself of the usefulness of case-sensitivity, I've asked UNIX geek after UNIX geek to name a single truly compelling reason for it to exist. And to a man, not one of them could think of anything more concrete than “Well… y'know… it's just better.”

No, the Mac way of handling lexicography is by far the most intensive and complete out there. Certainly by comparison to UNIX and Windows. And I've come to appreciate it as a true Mac advantage.

Notes from Xah Lee

This article is an excerpt from Scot Hacker's 2001-01 article Tales of a BeOS Refugee at http://www.birdhouse.org/macos/beos_osx/redux.html. Scot Hacker is the author of The BeOS Bible.