Language Design: Should Array Index Start at 0 or 1?

By Xah Lee.

“When the 0th bus comes, I'll be on the 0th floor at 0th avenue with my 0th wife.”

Which languages' array indexes start at 0?

Non-math languages start at 0. For example, C, Perl, Python, Ruby, Java, Emacs Lisp, JavaScript.

Which languages' array indexes start at 1?

Math languages start at 1: FORTRAN, SASL, MATLAB, Julia, Mathematica, Smalltalk, Lua, Erlang, APL.

Should array index start at 0 or 1?

The computer scientist Edsger W Dijkstra (EWD) thinks it should start at 0. See: [• Why numbering should start at zero By Edsger W Dijkstra. At http://www.cs.utexas.edu/users/EWD/transcriptions/EWD08xx/EWD831.html , Accessed on 2013-02-22 ]. But his argument is not really convincing.

① Dijkstra gave his reasons why a range of integers from A to B should include A and exclude B. For example, he'd have range(2,5) return [2,3,4].

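He prefers that because, with A included and B excluded, the length of the range is simply B - A, and two adjacent ranges share a boundary: the end of one is the start of the next.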

② He then gave his reasons why an index function for extracting an element of an array should start at 0. That is, myArray[0] returns the first element in the array. Dijkstra thinks this is consistent with his ①.
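The properties behind ① and ② can be checked directly in Python, whose range follows Dijkstra's convention. This is only a minimal sketch; the sample bounds and variable names are made up for illustration and are not from his note.

```python
# Sketch of the half-open convention A <= i < B, using Python's built-in range.
a, b, c = 2, 5, 9

first = list(range(a, b))    # [2, 3, 4]
second = list(range(b, c))   # [5, 6, 7, 8]

# Property 1: the number of elements equals the difference of the bounds.
length_matches = len(first) == b - a

# Property 2: adjacent ranges share a boundary, with no gap and no overlap.
joins_cleanly = first + second == list(range(a, c))

# With indexes starting at 0, the valid indexes of an n-element list
# are exactly range(0, n), so the upper bound is the length itself.
items = ["x", "y", "z"]
indexes = list(range(0, len(items)))   # [0, 1, 2]

print(first, second, length_matches, joins_cleanly, indexes)
```

Whether these properties outweigh the cost of reading an excluded end is exactly the point in dispute.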

Python follows EWD's recommendation to a tee. It starts at 0, and the element at the ending index is not included. Example:

# -*- coding: utf-8 -*-
# python3

a = list(range(1,4))
print(a)                 # prints [1, 2, 3]. most annoying.

b = ["a", "b", "c"]
print(b[0:1])                   # ['a']

Python's way is the most painful to work with.

The very fact that list(range(1,4)) gives [1,2,3], not [1,2,3,4], is a source of countless redos.
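To get an inclusive range in Python, you have to add 1 to the end yourself. Here is a small sketch; inclusive_range is a hypothetical helper made up for this illustration and is not part of Python.

```python
def inclusive_range(start, end):
    """Return the integers from start to end, including end.

    Hypothetical helper for illustration; not part of Python's standard library.
    """
    return list(range(start, end + 1))   # the +1 compensates for the excluded end


print(list(range(1, 4)))       # [1, 2, 3]
print(inclusive_range(1, 4))   # [1, 2, 3, 4]
```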

In most other languages, even those that start at 0, the ending index is usually inclusive. For example, here's Perl.

# -*- coding: utf-8 -*-
# perl

use strict;
use warnings;
use Data::Dumper;

my @a = qw(a b c);

my @b = @a[0..1];               # the range 0..1 includes both ends

print Dumper(\@b);              # the dumped array holds 'a' and 'b'

Should Array Index Start at 0 or 1?

Counting should start at 1, and the ending index should be inclusive, because that's just easier to work with.

This can be easily seen when you have a nested list. For example, suppose you have a simple 3D matrix [[[7,5],[84,54]],[[1,6],[96,71]],[[9,b],[3,8]]]. See that b there? What index is it?

In math, the components of vectors and matrices are 1-based, not 0-based.

Nested arrays are heavily used in math, for matrices, tensors, and trees. This is why most math-oriented languages start at 1.
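To make the question about b concrete, here is a short Python sketch that finds b in the nested list and reports its position both ways. The symbol b is written as the string "b", and find_position is a hypothetical helper introduced only for this illustration.

```python
# The nested list from the text, with the symbol b written as the string "b".
matrix = [[[7, 5], [84, 54]], [[1, 6], [96, 71]], [[9, "b"], [3, 8]]]


def find_position(nested, target, path=()):
    """Return the 0-based index path to target in a nested list, or None.

    Hypothetical helper for illustration only.
    """
    for i, item in enumerate(nested):
        if isinstance(item, list):
            found = find_position(item, target, path + (i,))
            if found is not None:
                return found
        elif item == target:
            return path + (i,)
    return None


zero_based = find_position(matrix, "b")         # (2, 0, 1)
one_based = tuple(i + 1 for i in zero_based)    # (3, 1, 2)

print(zero_based)   # Python's answer: matrix[2][0][1]
print(one_based)    # third block, first row, second item
```

Read off by eye, b is in the third block, first row, second item, which is the 1-based answer 3, 1, 2. Python wants 2, 0, 1.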

Why Do Programmers Start Counting at Zero

Here's an excerpt from a well-researched essay: [• Why Do Programmers Start Counting at Zero By Mike Hoye. At http://exple.tive.org/blarg/2013/10/22/citation-needed/ , Accessed on 2014-10-13 ]

The usual arguments involving pointer arithmetic and incrementing by sizeof(struct) and so forth describe features that are nice enough once you've got the hang of them, but they're also post-facto justifications. This is obvious if you take the most cursory look at the history of programming languages; C inherited its array semantics from B, which inherited them in turn from BCPL, and though BCPL arrays are zero-origin, the language doesn't support pointer arithmetic, much less data structures. On top of that other languages that antedate BCPL and C aren't zero-indexed. Algol 60 uses one-indexed arrays, and arrays in Fortran are arbitrarily indexed – they're just a range from X to Y, and X and Y don't even need to be positive integers.

… if your answer started with “because in C…”, you've been repeating a good story you heard one time, without ever asking yourself if it's true.

The fact of it is this: before pointers, structs, C and Unix existed, at a time when other languages with a lot of resources and (by the standard of the day) user populations behind them were one- or arbitrarily-indexed, somebody decided that the right thing was for arrays to start at zero.

So I found that person and asked him.

His name is Dr. Martin Richards; he's the creator of BCPL, now almost 7 years into retirement; you've probably heard of one of his doctoral students Eben Upton, creator of the Raspberry Pi. I emailed him to ask why he decided to start counting arrays from zero, way back then. He replied that…

So: the technical reason we started counting arrays at zero is that in the mid-1960's, you could shave a few cycles off of a program's compilation time on an IBM 7094. The social reason is that we had to save every cycle we could, because if the job didn't finish fast it might not finish at all and you never know when you're getting bumped off the hardware because the President of IBM just called and fuck your thesis, it's yacht-racing time.

Why Python Uses 0-Based Indexing?

[• Why Python Uses 0-Based Indexing? By Guido van Rossum. At https://plus.google.com/115212051037621986145/posts/YTUxbXYZyfi , Accessed on 2014-10-13 ]

I was asked on Twitter why Python uses 0-based indexing, with a link to a new (fascinating) post on the subject (http://exple.tive.org/blarg/2013/10/22/citation-needed/). I recall thinking about it a lot; ABC, one of Python's predecessors, used 1-based indexing, while C, the other big influence, used 0-based. My first few programming languages (Algol, Fortran, Pascal) used 1-based or variable-based. I think that one of the issues that helped me decide was slice notation.

Let's first look at use cases. Probably the most common use cases for slicing are “get the first n items” and “get the next n items starting at i” (the first is a special case of that for i == the first index). It would be nice if both of these could be expressed without awkward +1 or -1 compensations.

Using 0-based indexing, half-open intervals, and suitable defaults (as Python ended up having), they are beautiful: a[:n] and a[i:i+n]; the former is long for a[0:n].

Using 1-based indexing, if you want a[:n] to mean the first n elements, you either have to use closed intervals or you can use a slice notation that uses start and length as the slice parameters. Using half-open intervals just isn't very elegant when combined with 1-based indexing. Using closed intervals, you'd have to write a[i:i+n-1] for the n items starting at i. So perhaps using the slice length would be more elegant with 1-based indexing? Then you could write a[i:n]. And this is in fact what ABC did — it used a different notation so you could write a@i|n. (See http://homepages.cwi.nl/~steven/abc/qr.html#EXPRESSIONS.)

But how does the index:length convention work out for other use cases? TBH this is where my memory gets fuzzy, but I think I was swayed by the elegance of half-open intervals. Especially the invariant that when two slices are adjacent, the first slice's end index is the second slice's start index is just too beautiful to ignore. For example, suppose you split a string into three parts at indices i and j — the parts would be a[:i], a[i:j], and a[j:].

So that's why Python uses 0-based indexing.
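The slice cases described in the excerpt above can be tried out directly. This is a brief Python sketch; the sample string and the values of i, j, and n are arbitrary choices for illustration, and the 1-based part is only emulated on top of Python's own slices.

```python
a = "abcdefgh"
n = 3
i = 2
j = 5

first_n = a[:n]          # first n items: "abc"
next_n = a[i:i + n]      # n items starting at index i: "cde"

# Adjacent slices: the end index of one is the start index of the next,
# so the three parts join back into the original string.
parts = [a[:i], a[i:j], a[j:]]   # ["ab", "cde", "fgh"]
rejoined = "".join(parts)

# The same n items under 1-based closed intervals need a -1 on the end.
start_1 = i + 1                    # 1-based position of the same first item
end_1 = start_1 + n - 1            # inclusive 1-based end
same_items = a[start_1 - 1:end_1]  # converted back to a Python slice: "cde"

print(first_n, next_n, parts, rejoined == a, same_items)
```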