Complexity of Software Engineering; Emacs, Unicode, Unison

By Xah Lee. Date:

This page is a blog post recounting a few experiences that illustrate how seemingly trivial tasks can become tedious and complicated in the software industry.

A Complexity with Emacs

Discovered an Emacs problem. Summary:

Details: I have a Mac running Emacs 22 on OS X, and a PC running Emacs 23 on Windows Vista. When Emacs 23 loads CSS mode, it gives this error:

if: Wrong type argument: integerp, (0 . 8)

The problem seems simple in retrospect, but it wasn't simple at all when you are trying to get things done and they don't work as expected. Here's the story.

Emacs 22 does not come with a CSS mode; Emacs 23 does. There is a CSS mode written by Stefan, which I've been using on the Mac for the past couple of years. The same package is now bundled with Emacs 23, which I'm using on the PC. However, the code in the two files is not identical.

My Emacs setup loads CSS mode. I am migrating from Mac to PC, along with my whole Emacs system, and am setting up a networked Mac and PC that would let me work on either machine harmoniously. On the PC, when I start CSS mode, it gives an error, but not always. You don't know what's wrong. It could be a faakup in the Emacs distribution I'm using on the PC (EmacsW32, which is new to me), a problem in Emacs 23 itself (still in beta), or something in my Emacs customization that works perfectly well on the Mac but not in the PC's entirely new environment. Eventually I realized the cause: sometimes I started plain Emacs 23 without loading my own setup, so it used the bundled CSS mode file and there was no error; but when I loaded my own setup, it loaded my own CSS mode file, and the error appeared. This still seems simple in retrospect, but it wasn't then.

I added a version check to my Emacs init file so that, if Emacs is version 23, my own CSS mode is not loaded. The next day, the same symptom occurred. Eventually I realized the problem lies with Emacs's “load-path” variable.

To use a package you have downloaded, you put it in a directory for elisp libraries and add that directory to Emacs's “load-path” variable, so that Emacs knows where to find it. You usually add your directory before the other paths, so that Emacs loads your package in preference to a package of the same name in its default load path.

This means that whenever Emacs loads CSS mode, it still finds the copy I was using on the Mac, because my directory comes before the default library path, so my version check has no effect. This cannot be fixed easily: I cannot simply remove my library path, because it holds many other packages that are not bundled with Emacs 23.
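The load-path lookup described above can be sketched in a few lines of Python. This is a simplified model under stated assumptions, not Emacs's actual code: the real `load` also tries suffixes such as .elc before .el within each directory, which is ignored here, and the directory names and the `dir_contents` mapping (a stand-in for the file system) are hypothetical. What it shows is the key rule: the first directory in load-path that contains the file wins, so a personal copy placed first shadows the bundled one no matter what version check runs elsewhere.

```python
def locate_library(name, load_path, dir_contents):
    """Return the path of the first '<name>.el' found, scanning load_path in order.

    dir_contents is a hypothetical stand-in for the file system: it maps each
    directory to the set of file names it contains.
    """
    filename = name + ".el"
    for directory in load_path:
        if filename in dir_contents.get(directory, set()):
            return directory + "/" + filename
    return None  # not found anywhere on the path


def find_shadows(load_path, dir_contents):
    """Map each library present in more than one directory to those directories, in search order."""
    hits = {}
    for directory in load_path:
        for filename in sorted(dir_contents.get(directory, set())):
            if filename.endswith(".el"):
                hits.setdefault(filename[:-3], []).append(directory)
    return {lib: dirs for lib, dirs in hits.items() if len(dirs) > 1}


# Hypothetical layout: a personal library dir placed ahead of the bundled one.
load_path = ["~/elisp", "/usr/share/emacs/23/lisp/textmodes"]
dir_contents = {
    "~/elisp": {"css-mode.el", "my-other-package.el"},
    "/usr/share/emacs/23/lisp/textmodes": {"css-mode.el"},
}

print(locate_library("css-mode", load_path, dir_contents))  # -> ~/elisp/css-mode.el
print(find_shadows(load_path, dir_contents))
```

Emacs itself provides a command, `M-x list-load-path-shadows`, that reports this kind of duplicate, which is a quicker way to spot the problem than the trial and error described here.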

In the end, the solution was to check whether the CSS mode bundled with EmacsW32 runs OK on Emacs 22 on my Mac, and if so, simply use it as the single source for both Mac and PC. While doing this, I found out that neither of the two CSS mode files contains a version number at all.
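When files carry no version number, a content hash is one way to tell whether two copies are the same release. Below is a minimal Python sketch using SHA-256 from the standard library; the function names and the paths in the usage note are hypothetical. Note the limit of this approach: a hash tells you whether two copies are byte-for-byte identical, not which one is newer.

```python
import hashlib


def fingerprint_bytes(data):
    """SHA-256 hex digest of raw bytes."""
    return hashlib.sha256(data).hexdigest()


def fingerprint_file(path):
    """SHA-256 hex digest of a file's contents (elisp files are small, so read it whole)."""
    with open(path, "rb") as f:
        return fingerprint_bytes(f.read())


def same_contents(path_a, path_b):
    """True only if the two files are byte-for-byte identical."""
    return fingerprint_file(path_a) == fingerprint_file(path_b)
```

For example, `same_contents("mac/css-mode.el", "pc/css-mode.el")` (hypothetical paths) would return False for the two differing CSS mode files described above.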

All this takes 5 minutes to read, but it was one of a gazillion seemingly trivial problems I hit while setting up my Mac/PC networking environment with Cygwin and all. The setup has taken me a week so far, and I still haven't finished converting my Mac files to the PC. Adding the time spent googling for answers, possibly writing a bug report to the Emacs developers, etc., all in all I'd say this one problem cost me 4 hours.

One might think that if I upgraded Emacs on the Mac to version 23, things would work out more smoothly. But Emacs 23 is still in beta, so there may be many problems. Adding to the complexity, the three main Mac Emacs distributions (Carbon Emacs, Aquamacs Emacs, and NeXT Emacs, aka Emacs.app) each use Emacs 23 as their base to a varying degree. Also, my Mac is a PowerPC running OS X 10.4.x. The hardware is 3 years old and not Intel based, and the OS is one major release behind (the current OS X is 10.5.x). The Mac Emacs distributions support my machine to varying degrees.

Here are the Emacs versions I'm using. On the Mac: “GNU Emacs 22.2.1 (powerpc-apple-darwin8.11.0, Carbon Version 1.6.0) of 2008-04-05 on g5.tokyo.stp.isas.jaxa.jp”. On Windows: “GNU Emacs 23.0.94.1 (i386-mingw-nt6.0.6001) of 2009-05-28 on LENNART-69DE564 (patched)”.

Emacs on Windows and Unicode File Names

When using EmacsW32, dired has problems with files whose names contain Chinese characters on a remote machine. Consequently, moving or renaming such a file gives unpredictable results.

For example, I have a file named [林志玲.jpg] on a Mac. I use EmacsW32 on Windows to view it over the network (using a path like //169.254.145.104/xah/web/). The file name shows up in Emacs dired as _viu0y~a.jpg.

In fact, the file need not be on a remote server for this problem to occur. Just create a file with a Chinese name in Windows, then view it in Emacs with dired.

Note that the Mac's file system is HFS+ and Windows Vista uses NTFS; both encode file names in UTF-16, albeit with slight variations. Working from either OS X or Vista, each shows Chinese file names on the remote machine correctly. However, it appears that Unison treats file names as UTF-8 or ASCII, and dired in Emacs 23 with EmacsW32 also treats file names as ASCII or UTF-8. This creates a curious phenomenon: when you transfer a Chinese-named file from Mac to Vista with Unison, the file name gets screwed up, yet it magically displays correctly in Emacs dired.
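To make the encoding difference concrete, here is a small Python sketch showing how the same file name looks as UTF-16, as UTF-8, and as ASCII. The function name is hypothetical, and little-endian UTF-16 is chosen only for illustration. The last field touches on the "slight variation": HFS+ normalizes names to a decomposed form (a variant of Unicode NFD), which changes accented Latin names but leaves these Chinese characters untouched, so normalization alone does not explain the Chinese-name trouble.

```python
import unicodedata


def encoding_report(name):
    """Summarize how one file name is represented under several encodings."""
    try:
        ascii_bytes = name.encode("ascii")
    except UnicodeEncodeError:
        ascii_bytes = None  # the name cannot be represented in ASCII at all
    return {
        "code_points": len(name),
        # 2 bytes per character here, since all are in the Basic Multilingual Plane; no BOM
        "utf16_bytes": len(name.encode("utf-16-le")),
        "utf8_bytes": len(name.encode("utf-8")),  # 3 bytes per Chinese character
        "ascii": ascii_bytes,
        "nfd_changes_name": unicodedata.normalize("NFD", name) != name,
    }
```

For "林志玲.jpg" this reports 7 code points, 14 UTF-16 bytes, 13 UTF-8 bytes, no ASCII form, and no change under NFD. A tool that assumes one of these representations while the file system hands it another is exactly where names get garbled.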

“GNU Emacs 23.0.94.1 (i386-mingw-nt6.0.6001) of 2009-05-28 on LENNART-69DE564 (patched)”, Mac version: 10.4.11, Windows Vista SP1.

Unison and Unicode File Names

Summary: Unison does not handle Unicode characters in file names.

When using Unison version 2.27 to sync files from Mac to PC, a file named [赵薇_flag.jpg] on the Mac became [èµµè-╪_flag.jpg] on Windows.

Mac version: 10.4.11, Windows Vista SP1.

This may indicate that Unison interprets file names as UTF-8, or just as ASCII. Indeed, Wikipedia says that Unison has problems with non-ASCII file names.
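One plausible mechanism for this kind of garbling is that the name's UTF-8 bytes are passed along unchanged and then read back under a Western single-byte code page. The Python sketch below assumes Windows-1252 (cp1252) as that code page; the function names are hypothetical. Under this assumption the first four characters come out exactly as observed ("èµµè"), but the tail differs: the sketch yields "–‡" where Windows showed "-╪". So the exact code page or display path involved on Windows is not established here.

```python
def misread_utf8(name, codec="cp1252"):
    """Encode `name` as UTF-8, then decode those bytes as if they were `codec`."""
    return name.encode("utf-8").decode(codec, errors="replace")


def repair(garbled, codec="cp1252"):
    """Undo misread_utf8, assuming the garbling was exactly UTF-8 bytes read as `codec`."""
    return garbled.encode(codec).decode("utf-8")


print(misread_utf8("赵薇_flag.jpg"))  # -> èµµè–‡_flag.jpg
```

`repair` only works when the garbling matches the assumed code page exactly. The name actually observed above contains a plain hyphen and "╪", which do not round-trip through cp1252, so this repair would not recover it.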

By contrast, when the file is copied from Mac to Windows or from Windows to Mac using either OS's file manager, with either machine as the local one, the file name is copied correctly.

Setting up Unison itself is not so trivial either. It is trivial in concept, but it actually took hours. I have had Unison installed on my Mac and have used it a few times a year since about 2006, so I'm familiar with it. On the PC, you first have to install Cygwin. I know there are Unison binaries for Windows, but since I use Cygwin, my first choice was to stay within Cygwin, since it provides a whole Unix environment.

Installing Cygwin is another ordeal [see Installing Cygwin Tutorial], but once you have installed Unison in Cygwin and try a test sync between Mac and PC, you run into the problem that sshd must be running on one of the machines, namely the “remote” one. (Setting up a local network between Windows and Mac is yet another ordeal. [see How to Share File Between Mac and Windows])

Then there's the issue of deciding which machine should run sshd, or both. On the Mac, I can turn on sshd in a minute. On Windows, I'm not sure whether Windows Vista Home Premium provides an ssh server, or how to turn it on if it does. As far as I know, Vista Home Premium does not even come with an ssh client. In the process, I also realized that the firewall must be opened for the ssh port. So you spend 30 minutes, or possibly hours here and there, reading up on or probing the Windows Firewall control panel and whatnot other admin tools.
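Before fighting with Unison itself, it can save time to check whether the ssh port on the remote machine is reachable at all through the firewall. Here is a minimal Python sketch, assuming the default ssh port 22; the function name is hypothetical. A successful connection only proves that something is listening and the firewall lets you through, not that it is a working sshd.

```python
import socket


def port_open(host, port=22, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within `timeout` seconds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # refused, unreachable, or timed out (e.g. dropped by a firewall)
        return False
```

For example, `port_open("169.254.145.104")` run from the PC tells you whether the Mac's sshd is reachable at that address. A False result that comes back only after the full timeout often points at a firewall silently dropping packets, while an immediate False usually means nothing is listening.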

After a while, I decided it's easier to just turn on sshd on the Mac and run Unison from the Windows side to the Mac. At least I could get this direction working first, and once it worked, I could play with the other direction. After all this, I tried to run Unison, but it reported that the Unison versions on my Mac and PC differ, so it doesn't work. Jesus. The one on my Mac turned out to be Unison 2.13.x, and the one in Cygwin is 2.31.x. I figured that each new release of Unison probably obsoletes some older versions. So, back to digging through the Unison docs and the web.

The simplest solution that comes to mind is to update Unison on my Mac to the latest version; it hadn't been updated in about 2 years. On the Mac, I use Fink to install Unix programs, and that's where I got Unison. I am fairly familiar with Fink, having used it since 2001. However, after “fink selfupdate” etc., “fink desc unison” reports the latest version as 2.13, which is the one I already have. Odd. Searching the Fink home page indicates they have unison-2.27.57-1007 for my OS 10.4 PowerPC. So why doesn't it show up in my Fink? I remember a case last year where the Fink mirror site I used had a corrupted database, a fault entirely on their end. (See newsgroup post comp.unix.admin, 2006-03-22, “fink selfupdate problem”. Source: groups.google.com)

After spending maybe another 30 minutes, I decided to install a Unison binary from another website. Once that was done, I had Unison 2.27.x on the Mac. I tried to sync again: still no go. So it seems the two Unison installations must be the same version, or very close. Checking the Unison website, it looks like the current stable release is 2.27.x, so my Cygwin's 2.31.x is actually a beta. Hot damn. So, back to Cygwin. Luckily, it offers several versions of Unison, and I easily installed 2.27. Then, finally, the test sync succeeded. Now I went back to getting my files into a state ready to be synced. (Long story there. See: Perl Script for Removing Mac Resource Fork and Mac and Windows File Conversion) When I finally unisoned successfully, there was the Chinese character faakup!
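The version trouble above suggests a quick pre-flight check before syncing. The rule encoded in this Python sketch, that both ends must agree on the first two version components, is my assumption inferred from the results described here (2.13 vs 2.31 and 2.27 vs 2.31 failed, 2.27 vs 2.27 worked); check the Unison manual for the versions you actually run. The function names are hypothetical.

```python
def major_minor(version):
    """Return the first two numeric components, e.g. '2.27.57' -> (2, 27)."""
    parts = version.split(".")
    return int(parts[0]), int(parts[1])  # a trailing '.x' or patch level is ignored


def unison_versions_compatible(local, remote):
    """Assumed rule: both ends of a sync must share the same major.minor version."""
    return major_minor(local) == major_minor(remote)
```

Under this rule, `unison_versions_compatible("2.13.x", "2.31.x")` and `unison_versions_compatible("2.27.x", "2.31.x")` are both False, matching the failures above, while two 2.27 builds are compatible.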

(Note: one might wonder why I need Unison, since I can simply use the built-in file sharing between OS X and Windows, dragging and dropping from either machine. I need Unison because I need two-way sync: sometimes I work on the Mac, sometimes on Windows.)
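Two-way sync is what separates a synchronizer from drag-and-drop copying: for each file it must decide which side changed since the last sync. The Python sketch below shows that decision for one file, comparing each replica against the state recorded at the last sync (Unison keeps archive files for this purpose). It is a heavily simplified illustration of the idea, not Unison's actual algorithm; the names are hypothetical, and each state is some content fingerprint, or None when the file is absent.

```python
from enum import Enum


class Action(Enum):
    NOTHING = "nothing"
    COPY_TO_B = "copy A -> B"
    COPY_TO_A = "copy B -> A"
    CONFLICT = "conflict"


def reconcile(last_synced, a, b):
    """Decide what to do with one file, given its state at the last sync and now on replicas A and B.

    Copying None means propagating a deletion.
    """
    a_changed = a != last_synced
    b_changed = b != last_synced
    if not a_changed and not b_changed:
        return Action.NOTHING
    if a_changed and not b_changed:
        return Action.COPY_TO_B
    if b_changed and not a_changed:
        return Action.COPY_TO_A
    # Both sides changed: harmless only if they ended up identical.
    return Action.NOTHING if a == b else Action.CONFLICT
```

A one-way copy tool has no record of the last-synced state, so it cannot tell "edited on the Mac" from "edited on Windows", which is why a plain drag and drop is not enough when working on both machines.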

Some Thoughts

The point of these short examples is not software bugs as such. They illustrate how seemingly trivial tasks, such as networking, transferring files, running an app on Mac or Windows, or upgrading an app, often involve a lot of subtle complexity. For mom-and-pop users, it simply stops them dead. For a professional programmer, it means a conceptually 10-minute task often ends up as hours of tedium.

In some “theoretical” sense, all these problems are non-problems. But in practice they are real, non-trivial problems. These complexities form a major, multidisciplinary area of pragmatic research that doesn't have a name. I'm trying to think of a name for it. I think it is a mix of software interfaces, version control, release control, formal software specification, automated upgrade systems, etc. Perhaps Software Robustness would be the proper term. The ideal scenario is that if one needs to transfer files from one machine to another, one should just press a button and expect everything to work. Software upgrades should happen automatically behind the scenes, to the degree that users really don't faaking need to know what so-called “version” of the software they are using.

Today, science is progressing extremely fast (at a so-called “exponential” rate), and software has progressed tremendously too. In our context, that means a huge proliferation of protocols and standards: Unicode, networking protocols, version control systems, and automatic update technologies all come into play here. However, measured against the ideal above, these are only the beginning. For that ideal to be realized, we need more protocols, standards, and specifications, and stricter, more unified ones.