Why Does Struct Datatype Encroach Namespace?

By Xah Lee.

There's something odd about struct as found in Golang, Racket (Scheme Lisp), and C. Normally, a datatype doesn't become a name in the namespace, but a struct does.

For example, here's struct in golang:

type Circle struct {
  x float64
  y float64
  r float64
}

Notice that now Circle is a new name in namespace.

You use it like this:

// declare
var myCir Circle

// set values
myCir.x = 2
myCir.y = 7
myCir.r = 1

or:

// use type inference and literal expression
myCir := Circle{2, 7, 1}

// assign individual field
myCir.x = 4

Struct is basically a list of key/value pairs. From an interface point of view, it is essentially the same as a hash table, dictionary, or association list.

Now, notice that when you create a hash table (a Python or Perl dictionary, a JavaScript object), you don't actually create a new datatype. Instead, the language provides the hashtable datatype, and you just assign an instance of it to a variable. For example:

# python
myCir = {"x": 2, "y": 7, "r": 1}
print(myCir)

// js
var myCir = {"x":2, "y":7, "r":1};
console.log(myCir);

There is some difference between struct and hashtable/dictionary, in that a struct has a fixed number of items. Still, it's curious that every struct creates a new type and a new name in the namespace, even though all structs are essentially the same kind of thing, namely a fixed hashtable.

For example, golang could have:

myCir := {
  x float64 = 2
  y float64 = 7
  r float64 = 1
}
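
As it happens, Go already allows something close to this: a struct type can be written inline without a `type` declaration. The following is a minimal sketch of that feature, not a claim about how Go programs are usually written. The helper `newCircle` is a hypothetical name invented for the illustration.

```go
package main

import "fmt"

// newCircle is a hypothetical helper. It returns a value of an
// anonymous struct type, so no type name such as Circle is
// added to the package namespace.
func newCircle() struct{ x, y, r float64 } {
	return struct{ x, y, r float64 }{2, 7, 1}
}

func main() {
	// anonymous struct type plus literal value, no type declaration
	myCir := struct {
		x float64
		y float64
		r float64
	}{2, 7, 1}

	// fields are still accessed by name
	myCir.x = 4
	fmt.Println(myCir)

	other := newCircle()
	other.r = 3
	fmt.Println(other.x, other.y, other.r)
}
```

The cost is visible in `newCircle`: without a name, the whole field list has to be spelled out again wherever the type is mentioned.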

You might think this is because of the C heritage that golang followed. But Racket (Scheme Lisp) also has struct, and it also introduces a new name into the namespace for every struct defined. Worse, it adds a name for every field of the struct too. If you define a struct with n fields, you get n extra names in the namespace. 〔https://docs.racket-lang.org/guide/define-struct.html〕 (Racket may also be that way due to history, from Scheme and from the existing Lisps of the 1970s.)

The question here is, why is struct that way? Is it simply history, or is there some technical reason in language design, beyond social convention?

Note that the other thing that introduces a name is function definition/declaration. Perhaps a cleaner design would be to allow only variable declaration to introduce names.

Answer

Yuri Khan gave an excellent answer. (See comment, reposted here.)

The type of an array in a statically typed language is relatively simple: it is either “array of N elements of type T”, or “dynamically sized array of elements of type T”. In C, they are spelled as T x[N] and T x[] or T*, respectively. Names can be introduced using typedef but are often deemed unnecessary. Even with nested or multidimensional arrays, the notation is still quite compact.

Structs, on the other hand, can have multiple members of different and complex types. One can imagine a statically typed language where structs have no names by default. In such a hypothetical language, you can say:

var john: record
    name: string;
    birthdate: record
        year: integer;
        month: integer;
        day: integer;
    end;
end;

(This notation is actually valid Pascal.)

Look: we've spent 8 lines just to declare a variable. We'll have to repeat all that boilerplate every time we want to accept a person as a function argument, and we'll have to update every repetition of that when we decide we'd like to also know the gender of our persons. Guess what we do? We assign it a name, to save typing.
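
The same boilerplate cost can be sketched in Go, which permits both forms. This is only an illustration of the point above. The names `birthYearAnon`, `birthYearNamed`, `Date`, and `Person` are hypothetical and do not come from the original discussion.

```go
package main

import "fmt"

// Without a type name, the full field list is repeated
// in every signature that accepts a person.
func birthYearAnon(p struct {
	name      string
	birthdate struct{ year, month, day int }
}) int {
	return p.birthdate.year
}

// With names, each use is a single word, and adding a field
// later means editing one declaration only.
type Date struct{ year, month, day int }

type Person struct {
	name      string
	birthdate Date
}

func birthYearNamed(p Person) int {
	return p.birthdate.year
}

func main() {
	// the variable declaration must repeat the field list too
	var john struct {
		name      string
		birthdate struct{ year, month, day int }
	}
	john.name = "John"
	john.birthdate.year = 1990
	fmt.Println(birthYearAnon(john))

	jane := Person{"Jane", Date{1985, 3, 9}}
	fmt.Println(birthYearNamed(jane))
}
```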

Requiring a name for every struct type also makes it possible to introduce name-based type matching: in order for an argument to be usable with a function, its type must exactly match the type of the parameter. On the other hand, with anonymous structs, the compiler has to implement structural matching, where types are deemed equivalent if they are both structs and have fields of the same names and types, memberwise. Moreover, the programmer also has to perform structural matching in their head.
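
Go happens to show both kinds of matching side by side, so here is a small sketch of the difference. The type `Disk` and the function `area` are hypothetical names added for the example.

```go
package main

import (
	"fmt"
	"math"
)

// Circle and Disk have identical fields but are distinct named types.
type Circle struct{ x, y, r float64 }
type Disk struct{ x, y, r float64 }

// area accepts only Circle. For named types the check is by name.
func area(c Circle) float64 {
	return math.Pi * c.r * c.r
}

func main() {
	c := Circle{2, 7, 1}
	d := Disk{2, 7, 1}

	fmt.Println(area(c))

	// area(d) would not compile: Disk is not Circle,
	// even though the fields are the same.
	// An explicit conversion is accepted because the
	// underlying struct types are identical.
	fmt.Println(area(Circle(d)))

	// A value of an anonymous struct type is matched structurally,
	// field by field, and can be passed where Circle is expected.
	anon := struct{ x, y, r float64 }{0, 0, 2}
	fmt.Println(area(anon))
}
```

With the named types, the compiler only compares two names. With the anonymous value, it has to compare the field lists, which is the extra work described above.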

I am not familiar with Racket, but I recognize the problem of field names from Haskell. In Haskell, a record type introduces one name for the type, and one name for each field, and all of that gets dumped into the module namespace. This makes it very inconvenient to have a Point record with fields x and y.

The static syntax of function declaration is a historical artifact of languages which lacked first-class functions, where you couldn't have function-valued variables and constants. I agree that in a functional language one could require the declaration of functions to be written in constant or variable declaration syntax; however, syntactic distinction makes code more readable.
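
For comparison, Go supports both spellings. The sketch below uses two hypothetical functions, `square` and `cube`, to show declaration syntax next to variable syntax.

```go
package main

import "fmt"

// Declaration syntax: introduces the name square at package level.
func square(n int) int { return n * n }

// Variable syntax: cube is an ordinary variable holding a function value.
var cube = func(n int) int { return n * n * n }

func main() {
	fmt.Println(square(3), cube(3))

	// Because cube is a variable, it can be reassigned. square cannot.
	cube = func(n int) int { return -n * n * n }
	fmt.Println(cube(3))
}
```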

In summary, each struct/record provides a particularly shaped data structure, and each field can have a particular type. It is of great convenience, both to the compiler and to the programmer, to have a name for that structure: it allows easy identification of that particular shape (without needing pattern matching), easy equality testing, and easy definition of a new variable with the same struct.

Struct is like a Java class, but without methods. Each Java class definition also introduces a name.

Why don't the hashtable and dictionary types also introduce a new name for each instance? Because the nature and purpose of a dictionary is that its keys are dynamic and change all the time. If the keys are fixed, it becomes a struct/record, and then there is more need to identify it as that particular shaped datatype, so a name for the type is convenient. In other words, each struct is a structure, or record, used kind of like a fixed record type in a database.
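
Go has both a map type and struct, so the contrast can be sketched in one program. The helper `keysAfterEdit` is a hypothetical name made up for this example.

```go
package main

import "fmt"

type Circle struct{ x, y, r float64 }

// keysAfterEdit shows that the key set of a map can change at run time.
// It returns the number of keys left after one insert and one delete.
func keysAfterEdit() int {
	m := map[string]float64{"x": 2, "y": 7, "r": 1}
	m["z"] = 5     // add a key that was not there before
	delete(m, "r") // remove an existing key
	return len(m)
}

func main() {
	fmt.Println(keysAfterEdit())

	c := Circle{2, 7, 1}
	c.x = 4 // values can change, but the set of fields is fixed
	// c.z = 5 would not compile: Circle has no field z
	fmt.Println(c)
}
```

The map needs no name of its own because `map[string]float64` already describes every such map, whatever keys it holds. The struct's shape is specific to Circle, so a name for it is useful.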

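Then the question is, why don't Python, Ruby, Perl, JavaScript, and elisp have struct/record as a primary datatype? Probably because, for a dynamically typed language, there's less need for a fixed-shape structure. (Some do offer one in secondary form, for example Python's collections.namedtuple and Ruby's Struct class.)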

Why does Racket, a dynamic language, have struct then? Probably because, at this point, it's just a matter of choice, a choice that does not necessarily have a perfect answer. It can go both ways.
