C For Smarties: Types and values

C is (partly) a strongly typed language.

Some people claim that C is not a strongly typed language [1], but I claim it is -- it is just that C compilers are generally weak about type-checking. In other words, C allows the programmer to be cavalier about its rules. I like to call C a ‘weakly strongly typed’ language. Others might just call it a statically-typed language. In any case, it is important to keep track of types.

The definition of a strongly typed language that I will use here is that every variable, function call, and constant has some type chosen at compile time and enforced by the compiler.[2] In languages stricter than C, you would not be able to assign a character to an integer variable, or an integer to a floating-point variable; and you would definitely not be allowed to do many of the things you can do with pointers. Still, C does enforce various type rules, as we shall see.

C’s types are grouped into three major divisions: object types, function types, and incomplete types. (It is tempting to lump incomplete types under the object types, but the C standards separate them.) Object types are essentially those used to declare variables, and function types are those used to declare functions. Incomplete types are just a special case of object types, when some important piece or pieces are missing. An incomplete type can usually be ‘completed’ later, filling in the missing bits, as needed.

These types can be grouped in several other ways. I prefer to start with the division into ‘basic’ and ‘derived’ types. The basic types are simply those in the following list:

char
signed char
unsigned char
short int
unsigned short int
int
unsigned int
long int
unsigned long int
long long int (new in C99)
unsigned long long int (new in C99)
_Bool (new in C99)
float
double
long double
float complex (new in C99)
double complex (new in C99)
long double complex (new in C99)

Excluding float, double, long double, and the three complexes, these are called the integral types. The three variants of char are also called the character types. The last six of these are the floating-point types, split into ‘real’ and ‘complex’ floating-point types.

Many of the integral types have shorter names, because the keyword int can be omitted after short or long. (In C89, you can omit it in even more cases, but C99 has pretty much eliminated ‘implicit int’.) The type char (sometimes called ‘plain char’) has exactly the same characteristics -- minimum and maximum values, and sizeof -- as either signed char or unsigned char, at the implementation's discretion. Often int is ‘similarly similar’, as it were, to either short or long. Nonetheless, all of these types are considered distinct, and it is an error to mix them where the exact type matters, even when the characteristics match: for instance, an int * may not be used where a long * is required, even if int and long have the same size (more on pointers below). (This is one reason C can be considered ‘strongly typed’.)

As a special case, C has enumerated types, which are ‘compatible with’ some integral type, but are nonetheless distinct types. As another special case, C offers the void type, which is just an incomplete type that can never be completed. There are no actual values of type void, which is something of a self-contradiction. (Think of it as the empty set of values. Since all empty sets are identical, it really does not matter what value it has, or does not have.)

These basic types are used to build the derived types. The derived types can sometimes also refer to the incomplete and function types. The derived types are:

Structures and unions. These are simply collections of objects, called members. C99 adds a special feature, the ‘flexible array member’, wherein the last member of a structure can be an array whose size is omitted (an incomplete type); otherwise the members must all have complete types. Unions are just a special case of structures, wherein the various members overlap, so that you can only use one of them at a time.
Functions. These are reasonably conventional -- a function has some fixed set of parameters of fixed types, and a return type. For instance, the remove() function in <stdio.h> takes, as its parameter, a file name, and returns a value of type int indicating success or failure. Thus remove() can be said to be a ‘function returning int’, or, in more detail, a ‘function (of one argument of type const char *) returning int’. (The const here is a ‘type qualifier’.)
Arrays: an array has a size and an element type. The size is sometimes optional, but the element type is always required. The element type must be an object type -- function and incomplete types are not legal here. If the size is a known constant N and the element type is T, the array can be said to be an ‘array N of T’. If N is not known or is omitted, the array can simply be called an ‘array of T’. (In this case the array itself is an incomplete type.) C99 adds a new wrinkle: array sizes no longer need to be constant.
Pointers: a pointer always points to some particular type, called the referenced type. Unlike an array's element type, the referenced type can be any object, function, or incomplete type. The referenced type has to be fixed in advance, though, and -- with a few exceptions to be addressed later -- a pointer of type ‘pointer to T’ can only point to things of type T. For instance, a pointer of type ‘pointer to int’ can only point to ints; attempting to make one point to a double is not a good idea.

Pointers can be confusing, because C's are so unusual and have a strong relationship with C's arrays. We will see much more about this later.

Values

All values are typed. Constants have a type that is determined by their syntax and value. The most obvious are the simple integral and floating-point constants, such as 42 or 3.14159. The constant 42 has type int, and the constant 3.14159 has type double. Very large constants will automatically use a larger type than int if needed, and integral and floating-point constants can be suffixed to give them an alternative type: thus 2.71828F has type float instead of double, while 0xFFUL has type unsigned long int. Hexadecimal and octal constants use slightly different rules than decimal constants when a larger type is needed. (The complete rules are given in the C standard.)

In a departure from most other similar languages (even including C++), character constants like 'a' have type int.

The set of operations allowed on any value, and the meaning of those operations, are determined entirely by the type of that value. (This is the other main requirement to call C a strongly typed language.) For instance, the precise action of the right-shift operation >> may change depending on whether the value being shifted is signed or unsigned. This is why it is crucial to keep track of the type of each value. If you do not know the type as well as the value, you cannot predict the outcome of the operation.

For this reason, when analyzing any particular expression in C, you should write down both the type and the value of each sub-expression. I like to write them in pairs, enclosed in angle brackets:

42 is an <int, 42>
3.14159F is a <float, 3.14159>

Later we will add a third element to these, so as to distinguish between objects and values.

Conversions

C offers relatively free conversions. These change the type and frequently the value as well. The underlying machine's bit patterns, or even the size in bytes, often change in the process. Changing 3.14, a double, to an int drops the fractional part. If sizeof(double) is 8 and sizeof(int) is 2 or 4, changing one to the other even changes the number of bytes.

Many of C's conversions will happen automatically. That is, given something like:

int i = 3;
double d = 3.14159;

the assignments i = d; and d = i; are entirely legal, and cause an automatic conversion. Since character constants have type int, even something simple like:

char c = 'A';

contains an automatic conversion. Conversions are thus quite ordinary and frequent (which is one argument some use to say that C is not strongly typed).

A cast is a syntactic construct that forces a conversion. The syntax is a type-name enclosed in parentheses, and the result of a cast is a new value with a new type. The new type is just the type-name inside the parentheses, and usually, the new value is just the value that would result from an automatic conversion. That is:

(double)i

converts the value in i to type double, just as if it had been assigned to d. There are, however, any number of conversions that are at least questionable, if not outright wrong. For instance, converting a value of type ‘pointer to char’ to a new value of type ‘pointer to int’ is not guaranteed to do anything useful. Such conversions will not happen automatically -- if you create a situation in which one would have to happen automatically, the compiler must produce a diagnostic. The cast is a ‘more powerful’ conversion, in effect telling the compiler: just shut up and do this, even though it is fraught with peril.

This leads to a general rule: Be suspicious of any cast, especially pointer casts. Unfortunately, some C compilers are unnecessarily picky about converting from, e.g., int to short, producing warnings if the programmer does not shut them up with a cast. Equally unfortunately, many C programs contain questionable -- or outright wrong -- pointer conversions, and many C programmers have learned to disable warnings about them by sprinkling casts in liberally. After all, one typical diagnostic is ‘warning: integer to pointer conversion without a cast’, practically begging the programmer to insert a cast blindly, rather than figuring out why the compiler thinks one of the values involved is an integer.

The void * weirdness

Before C89, C did not have any special pointer cases. This led to a problem: there was no ‘generic’ pointer type, yet a function like malloc() had to return a valid pointer without knowing what type it might be. The solution in K&R-1 C was to return a value of type char *, counting on the ability to decompose everything into bytes. Unfortunately, that meant that most uses of malloc() required a cast. ANSI C removed the need for this cast by adding a new generic pointer type. Logically this should be an ‘anyptr’ or some such spelling, but the ANSI committee did not want to add a new keyword. Since the void type is already special -- meaning ‘nothing’ -- they used ‘pointer to void’, or void *, to denote a generic pointer. Instead of ‘pointing to nothing’, this ‘points to anything’.

What makes this particularly weird is that void ** cannot point to just any pointer. The special type void * works as a generic pointer, converting freely to or from any other pointer type, but it is a unique type -- an ‘anyptr’, as it were -- in and of itself. The type void ** can only point to these special pointers. Even though void * can point to any other type (including another void *), void ** can only point to void *.


[1] In fact, Kernighan and Ritchie themselves claim, in the original White Book, that ‘C is not a strongly typed language in the sense of Pascal or Algol-68’ (p. 3). However, they use the reasoning that ‘where strong type checking is desirable, a separate version of the compiler is used. This program is called lint ...’. Modern compilers, especially those with C99 features, often produce most or all of the diagnostics that the original lint program did. If lint was a strongly-typed C ‘compiler’, and a modern C compiler does what lint did, then C itself must be strongly typed.
[2] It is worth noting that the term ‘strongly typed’ is not all that well-defined in the first place; different programming texts have different definitions. Many would call this ‘static typing’, to contrast with ‘dynamic typing’ where a variable name acquires a new type based on the most recent assignment to it. The idea here, though, is not to pin down the terminology, but rather to emphasize that C expressions always have types.