Why Must Software Be Rewritten for Multi-Core Processors?

By Xah Lee.

I had a revelation today: software really does have to be rewritten to take advantage of multiple processors.

This may sound stupid, but it is a revelation to me. For the past decade, the question has been on my mind: why should software need to be rewritten to take advantage of multiple processors? In my mind, software is at some fundamental level just algorithms, and algorithms have nothing to do with hardware implementation details such as the number of processors. I always felt that the talk about the need or difficulty of rewriting software for multi-processor (or multi-core, these days) machines must be the product of the idiocy of industrial imperative coding monkeys. In particular, the way some languages such as Java deal with it seems to me extremely stupid; for example, the concept of threads. In my mind, there should be a layer between the software and the hardware, such as the operating system or the compiler, that automatically handles or compiles the software so that it FULLY uses the multiple processors when present. In short, I thought an algorithm really has nothing to do with hardware details.

I never really thought hard about this issue, but today, since I got a quad-core PC, I looked into it, thought about it, and realized the answer. The gist is that an algorithm fundamentally means manipulating some hardware; in fact, an algorithm is a step-by-step instruction for operating some form of hardware, even if that hardware is abstract or virtual. Take the simplest case, 1+1. It is an algorithm, but where is the hardware? You may say it is an abstract concept, or a mathematical model. What you call 1+1 depends on the context, but in our context, those numbers are the hardware. To see this, take a more complex example: listing primes by a sieve. Again, it comes down to “what is a number?” Here, the numbers can be stones, or an arrangement of beads on an abacus: it is hardware! As another example, take sorting. To begin with, you have to have something to sort, and that is hardware.
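To make this concrete, here is a minimal sketch of the prime sieve in Python (the language and the names are my own choice, purely for illustration). The boolean list is the “hardware” the algorithm manipulates; it could just as well be stones or abacus beads.

```python
def primes_up_to(n):
    """Sieve of Eratosthenes. The list `is_prime` is the concrete
    thing being manipulated, the "hardware" of the algorithm."""
    is_prime = [True] * (n + 1)
    is_prime[0:2] = [False, False]
    for p in range(2, int(n ** 0.5) + 1):
        if is_prime[p]:
            # cross out every multiple of p, starting at p*p
            for multiple in range(p * p, n + 1, p):
                is_prime[multiple] = False
    return [p for p in range(2, n + 1) if is_prime[p]]

print(primes_up_to(30))  # [2, 3, 5, 7, 11, 13, 17, 19, 23, 29]
```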

Another way to see this: for a given computing problem, there are infinitely many algorithms that achieve the same thing. Some are better, requiring fewer steps or less storage. All of these are concrete manipulation issues, and the thing being manipulated is ultimately what we have to call hardware. So when the hardware changes, say from one CPU to many CPUs, there is no way for the algorithm to magically change and adapt to the changed hardware. If you need an algorithm that is catered to the new hardware, you need a new algorithm.
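As a trivial instance of “many algorithms, same result”, here is a sketch (in Python, with function names of my own invention) of summing the numbers 1 to n in two ways: a loop that takes roughly n steps, and Gauss's closed form that takes a constant number of steps. Same output, very different manipulation of the underlying machine.

```python
def sum_by_loop(n):
    """About n additions: one step per number."""
    total = 0
    for i in range(1, n + 1):
        total += i
    return total

def sum_by_formula(n):
    """Gauss's closed form n(n+1)/2: a constant number of steps."""
    return n * (n + 1) // 2

assert sum_by_loop(1000) == sum_by_formula(1000) == 500500
```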

One might think there could be an algorithm Omega that takes as input the old hardware O, the new hardware N, and an algorithm A, and outputs an algorithm B, such that B has the same behavior as A but N+B performs better than O+A. This is like asking for Strong AI.

One thing we often forget is that algorithms ultimately translate into manipulating hardware. In a modern digital computer, that means software algorithms ultimately become machine instructions in the CPU, which manipulate the 1s and 0s in registers, or voltages in transistors.

From a more mundane point of view, an automatic system for making software work on multiple processors is a problem of breaking a given algorithm into independent units (so that they can be computed in parallel). The problem of finding an algorithm is entirely different from the problem of breaking an algorithm into independent units. The problem of dissecting a given algorithm into independent units is a whole new branch of mathematics. The problem of writing a unitized algorithm is also relatively new, with the need to analyze a problem in terms of independent units. For example, take factoring. Factoring is a well-defined mathematical problem. There are millions of algorithms for it, and each class has different properties such as number of steps or storage required. However, if we look at these algorithms from the point of view of independent units, that is a new perspective on the classification of algorithms. Software in general is too ill-defined, fuzzy, and complex; the software we use on personal computers, such as web browsers, games, and video players, doesn't even have mathematical models. It doesn't even have mathematical models of its inputs and outputs. To talk about an automatic system for unitizing given software would be more like an AI fantasy. Roughly, a term that describes this area of research is Parallel computing.
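As a small illustration of what “unitizing” means in practice (a sketch only; the chunking scheme and function names are mine, not a standard method), here is counting primes below a limit by splitting the range into independent sub-ranges and handing them to a pool of worker processes. The mathematical problem is unchanged; what is new is the decomposition into units that do not depend on each other.

```python
from concurrent.futures import ProcessPoolExecutor

def is_prime(n):
    """Trial division, deliberately naive, so the cores have real work."""
    if n < 2:
        return False
    for d in range(2, int(n ** 0.5) + 1):
        if n % d == 0:
            return False
    return True

def count_primes(lo, hi):
    """One independent unit: count the primes in [lo, hi)."""
    return sum(1 for n in range(lo, hi) if is_prime(n))

def count_primes_parallel(limit, workers=4):
    # split [2, limit) into independent chunks, one per worker
    step = (limit - 2) // workers + 1
    chunks = [(lo, min(lo + step, limit)) for lo in range(2, limit, step)]
    with ProcessPoolExecutor(max_workers=workers) as pool:
        return sum(pool.map(count_primes, *zip(*chunks)))

if __name__ == "__main__":
    print(count_primes_parallel(100_000))  # 9592 primes below 100000
```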

The Wikipedia article talks about types of parallelism: bit-level parallelism, instruction-level parallelism, data parallelism, and task parallelism. It also discusses hardware-related issues: multi-core, symmetric multiprocessing, distributed computing, clusters, grids. The subjects mentioned there that are closest to this essay are “data parallelism” and “task parallelism” (a small sketch of the distinction follows below). However, neither is high-level enough for what I discuss here. The issue discussed here would be something like “algorithmic parallelism”. Indeed, Wikipedia mentions “Automatic parallelization”, which is exactly what I am talking about here. Quote:

Automatic parallelization of a sequential program by a compiler is the holy grail of parallel computing. Despite decades of work by compiler researchers, automatic parallelization has had only limited success.[40]

Mainstream parallel programming languages remain either explicitly parallel or (at best) partially implicit, in which a programmer gives the compiler directives for parallelization. A few fully implicit parallel programming languages exist — SISAL, Parallel Haskell, and (for FPGAs) Mitrion-C.

It says “A few fully implicit parallel programming languages exist”. If you are a starry-eyed comp lang tech geeker, don't get carried away by what those words might seem to mean.

(Note: Wikipedia has a dedicated article on the subject: Automatic parallelization)
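To make the data-parallelism versus task-parallelism distinction concrete (again a sketch, with functions invented for illustration, not a definitive treatment): data parallelism applies the same operation to many pieces of data at once, while task parallelism runs different operations at the same time.

```python
from concurrent.futures import ProcessPoolExecutor

def square(x):
    return x * x

def total(xs):
    return sum(xs)

def maximum(xs):
    return max(xs)

if __name__ == "__main__":
    data = list(range(1_000_000))
    with ProcessPoolExecutor() as pool:
        # data parallelism: the same function applied over many data items
        squares = list(pool.map(square, range(10)))

        # task parallelism: different functions running at the same time
        f1 = pool.submit(total, data)
        f2 = pool.submit(maximum, data)
        print(squares, f1.result(), f2.result())
```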

Reading those Wikipedia articles solidifies the popular view of why functional languages will have a great time in the multi-processor future. In functional languages, you write software by composing functions, and functions are, by definition, independent units. This leaves the compiler much better able to analyze and dissect the code to produce efficient byte code for multi-processor machines.
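Here is a minimal sketch of why purity helps, in Python again purely for illustration. A pure function depends only on its arguments, so its calls are independent units and can be dispatched to any core in any order; a function that mutates shared state cannot be split up without extra coordination. (The impure version below is shown only for contrast; running it on a process pool would silently give wrong answers, because each worker process gets its own copy of the shared variable.)

```python
from concurrent.futures import ProcessPoolExecutor

def pure_double(x):
    # depends only on its argument: every call is an independent unit,
    # so the calls can run on any core, in any order
    return 2 * x

running_total = 0

def impure_add(x):
    # touches shared state: the calls are no longer independent,
    # so someone (compiler or human) must add coordination before
    # they can be run in parallel
    global running_total
    running_total += x
    return running_total

if __name__ == "__main__":
    with ProcessPoolExecutor() as pool:
        print(list(pool.map(pure_double, range(8))))  # safe to parallelize
```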

For an excellent discussion of parallel computing, see: Guy Steele on Parallel Programming.


I've been studying math and computing fanatically since about 1991. Sometimes, I suddenly get an insight into a question that has bugged me for years. In the past few years, I have written some of these out. Here they are:

Multiplication and Multiplicative Identity
The Geometric Significance of Complex Conjugate
What is Technical Drawing, Descriptive Geometry, Projective Geometry, Linear Algebra