The definition of a strongly typed language is that every variable, function call, and constant has some type chosen at compile time and enforced by the compiler.[2] In languages stricter than C, you would not be able to assign a character to an integer variable, or an integer to a floating-point variable; and you would definitely not be allowed to do many of the things you can do with pointers. Still, C does enforce various type rules, as we shall see.
C’s types are grouped into three major
divisions: object types, function types, and
incomplete types.
(It is tempting to lump incomplete types under the object types,
but the C standards separate them.)
Object types are essentially those used to declare variables, and function
types are those used to declare functions. Incomplete types are just
a special case of object types, when some important piece or pieces are
missing. An incomplete type can usually be ‘completed’ later, filling
in the missing bits, as needed.
These types can be grouped several other ways. I prefer to start
with the division into ‘basic’ and ‘derived’ types. The basic types
are simply those from a list:
Many of the integral types have shorter names, because the keyword int
can be omitted after short or long.
(In C89, you can omit it in even more cases, but C99 has pretty much eliminated
‘implicit int’.) The type char (sometimes called ‘plain char’) has exactly
the same characteristics -- minimum and maximum values, and sizeof -- as
either signed char or unsigned char, at the implementation's discretion.
Often int is ‘similarly similar’, as it were, to either short or long.
Nonetheless, all of these types are considered distinct, and it is an
error to treat them as interchangeable (as with pointers, below) even when
the characteristics match. (This is one reason C can be considered
‘strongly typed’.)
As a special case, C has enumerated types, which are ‘compatible
with’ some integral type, but are nonetheless distinct types. As
another special case, C offers the void
type, which is just an incomplete type that can never be completed.
There are no actual values of type void,
which is something of a self-contradiction. (Think of it as the empty
set of values. Since all null sets are identical, it really does
not matter what value it has, or does not have.)
These basic types are used to build the derived types. The derived
types can sometimes also refer to the incomplete and function types.
The derived types are:
In a departure from most other similar languages (even including C++),
character constants like 'a' have
type int.
The set of operations allowed on any value, and the meaning of those
operations, are determined entirely by the type of that value. (This
is the other main requirement to call C a strongly typed language.)
For instance, the precise action of the right-shift operation >> may
change depending on whether the value being shifted is signed or unsigned.
This is why it is crucial to keep track of the type of each value. If you
do not know the type as well as the value, you cannot predict the outcome
of the operation.
For this reason, when analyzing any particular expression in C, you
should write down both the type and the value of each sub-expression.
I like to write them in pairs, enclosed in angle brackets:
Many of C's conversions will happen automatically. That is, given
something like:
A cast is a syntactic construct that forces a conversion.
The syntax is a type-name enclosed in parentheses, and the result of a
cast is a new value with a new type. The new type is just the type-name
inside the parentheses, and usually, the new value is just the value that
would result from an automatic conversion. That is:
This leads to a general rule: Be suspicious of any cast, especially
pointer casts. Unfortunately, some C compilers are unnecessarily
picky about converting from, e.g., int to short, producing warnings if
the programmer does not shut them up with a cast. Equally unfortunately,
many C programs contain questionable -- or outright wrong -- pointer conversions,
and many C programmers have learned to disable warnings about them by sprinkling
casts in liberally. After all, one typical diagnostic is ‘warning:
integer to pointer conversion without a cast’, practically begging the
programmer to insert a cast blindly, rather than figuring out why the compiler
thinks one of the values involved is an integer.
What makes this particularly weird is that void ** cannot point to just
any pointer. The special type void * works as a generic pointer,
converting freely to or from any other pointer to an object or incomplete
type, but it is a unique type -- an ‘anyptr’, as it were -- in and of
itself. The type void ** can only point to these special pointers. Even
though void * can point to any other type (including another void *),
void ** can only point to void *.
char
signed char
unsigned char
short int
unsigned short int
int
unsigned int
long int
unsigned long int
long long int (new in C99)
unsigned long long int (new in C99)
float
double
long double
float complex (new in C99)
double complex (new in C99)
long double complex (new in C99)
Excluding float, double, long double, and the three complexes, these are
called the integral types. The three variants of char are also called the
character types. The last six of these are the floating-point types,
split into ‘real’ and ‘complex’ floating-point types.
Pointers can be confusing, because C's are so unusual and have a strong
relationship with C's arrays. We will see much more about this later.
Values
All values are typed. Constants have a type that is determined by
their syntax and value. The most obvious are the simple integral
and floating-point constants, such as 42 or 3.14159. The constant 42
has type int, and the constant 3.14159 has type double. Very large
constants will automatically use a larger type than int if needed, and
integral and floating-point constants can be suffixed to give them an
alternative type. Hexadecimal and octal constants use slightly different
rules than decimal constants. Thus, 2.71828F has type float instead of
double, while 0xFFUL has type unsigned long int. (The complete list of
rules can be found elsewhere [insert link].)
42 is an <int, 42>
3.14159F is a <float, 3.14159>
Later we will add a third element to these, so as to distinguish between
objects and values.
Conversions
C offers relatively free conversions. These change not only the type
but also the value. Often the underlying machine's bit patterns,
or even size in bytes, change in the process. Changing 3.14, a double,
to an int drops the fractional
part. If sizeof(double) is
8 and sizeof(int) is 2 or 4, changing
one to the other even changes the number of bytes.
int i = 3;
double d = 3.14159;
the assignments i = d; and d = i; are entirely legal, and cause an
automatic conversion. Since character constants have type int, even
something simple like:
char c = 'A';
contains an automatic conversion. Conversions are thus quite ordinary
and frequent (which is one argument some use to say that C is not strongly
typed).
(double)i
converts the value in i to type double, just as if it had been assigned to d.
There are, however, any number of conversions that are at least questionable,
if not outright wrong. For instance, converting a value of type ‘pointer
to char’ to a new value of type
‘pointer to int’ is not guaranteed
to do anything useful. Such conversions will not happen automatically
-- if you create a situation in which one would have to happen automatically,
the compiler must produce a diagnostic [xref to diagnostics]. The
cast is a ‘more powerful’ conversion, in effect telling the compiler: just
shut up and do this, even though it is fraught with peril.
The void * weirdness
Before C89, C did not have any special pointer cases. This led to
a problem: there was no ‘generic’ pointer type, yet a function like malloc()
had to return a valid pointer without knowing what type it might be.
The solution in K&R-1 C was to return a value of type char *,
counting on the ability to decompose everything into bytes.
Unfortunately, that meant that most uses of malloc()
required a cast. ANSI C removed the need for this cast by adding
a new generic pointer type. Logically this should be an ‘anyptr’
or some such spelling, but the ANSI committee did not want to add a new
keyword. Since the void type
is already special -- meaning ‘nothing’ -- they used ‘pointer to void’,
or void *, to denote a generic
pointer. Instead of ‘pointing to nothing’, this ‘points to anything’.
[1] In fact, Kernighan and Ritchie themselves claim,
in the original White Book, that ‘C is not a strongly typed language in
the sense of Pascal or Algol-68’ (p. 3). However, they use the reasoning
that ‘where strong type checking is desirable, a separate version of the
compiler is used. This program is called lint ...’.
Modern compilers, especially those with C99 features, often produce most
or all of the diagnostics that the original lint program did.
If lint was a strongly-typed C ‘compiler’, and a modern C compiler does
what lint did, then C itself must be strongly typed.
[2] It is worth noting that the term ‘strongly typed’
is not all that well-defined in the first place; different programming
texts have different definitions. Many would call this ‘static
typing’, to contrast with ‘dynamic typing’ where a variable name
acquires a new type based on the most recent assignment to it.
The idea here, though, is not to pin down the terminology, but rather
to emphasize that C expressions always have types.