The C language divides expressions into two major categories, which I call ‘objects’ and ‘values’. (The ANSI/ISO C Standard uses the terms ‘lvalue’ and ‘value’ respectively. Many computer scientists use the name ‘rvalue’ for the latter. Technically, an lvalue ‘names’ an object, while an rvalue is what most of us think of as an ordinary value.) An object is, in effect, a place to store values. Ordinary variables are the simplest examples of objects, but not the only ones.
A C value is actually a pair of things, because each C value carries a type as well. Values are ephemeral: they only last for the duration of a single expression, and then they vanish. If a particular value is to be of any use, it must be saved somewhere (or printed, which simply amounts to saving it in the user's brain instead of in the computer's memory). Objects, on the other hand, last as long as their object lifetime (which may be anything from a few lines of code to the entire program run).
Objects also have types, and in general, any given object can only hold values that have types compatible with that object's type. Automatic objects that have never been assigned a value will contain garbage, and C allows the garbage to be arbitrarily poisonous. The same holds for objects allocated with malloc(). For this reason, it is important to be sure that any particular object has been given a value before attempting to look at its value. Note that static objects always have an initial value, even if you did not assign one.
A side note on garbage
Some systems always zero out ‘fresh’ memory. On these systems, your uninitialized variables may all start out as zero or NULL. This is not a property of the language; it is merely your system being helpful and/or making sure your program does not have access to sensitive information (such as passwords) that was left in memory by a previous program.
In a simple assignment such as:

    x = y;

the machine finds the value of y, and sticks that in x. In general, the value of an object is pretty obvious. Its type is determined by the type of the object, and its actual value is whatever you last stored in it. This is not the case for arrays.
One Rule to ring them all
One Rule to find them
One Rule to point them all
And in the language bind them
-- apologies to J. R. R. Tolkien

Of course, in order to talk about any value, we need to know its type -- a pointer to int is quite different from a pointer to double. Hence, The Rule, which determines both for an array object:

In any value context, an object of type ‘array of T’ is converted to a value of type ‘pointer to T’, pointing to the first element of that array, i.e., the one with subscript 0.

Here T is any valid array-element type. Valid types include array types themselves; it is possible to have an object of type ‘array 3 of array 4 of float’. The Rule turns this into a value of type ‘pointer to array 4 of float’ -- not float **, but rather float (*)[4]. The Rule only applies once, because after that, you no longer have an object. In order to get The Rule to fire again, you have to turn the new value into another object.
(Note that the size of the array, even if it is known, disappears once The Rule is finished. This is why it can often be eliminated or ignored.)
Structure and union objects have structure and union values, and pointers have pointer values. (Functions have return values, of course, but functions are not objects. In fact, attempting to find the ‘value’ of a function without calling it results in an effect similar to The Rule about arrays and pointers.)
Note that C always passes all parameters by value. Since the ‘value’ of an array is actually a pointer to the array's first element, it follows that no C parameter has array type. Thus, when you declare a function as, e.g.:
    void f(char s[]);

you are actually declaring that f() takes a parameter of type ‘pointer to char’, rather than an array (of unspecified size) of char. This ‘type rewrite’ rule will be discussed more later.
As noted above, in a simple ordinary assignment like x = y, the left-hand side of the assignment operator is in ‘object context’. This is how, even if the value of x is 3 before the assignment, the compiler knows to set x instead of ‘setting 3’ (which obviously makes no sense). C has quite a few operators that demand an object context, including all the assignment operators like += and the increment and decrement operators ++ and --. Since all of these need to change some object's value, they have to find the object, not its value.
There are two other places where an expression is in object context. The unary & (address-of) operator finds the address of some object. To do so, it needs the object, rather than its value. That means that if arr is an array, &arr has arr in object context, and The Rule does not apply. The last special case is the sizeof operator. It too uses object context, so that it can find the size of the object.
Knowing The Rule, it is easy to see that sizeof has object context, and that arrays and pointers are very different, by running a simple example program:
    #include <stdio.h>

    int main(void) {
        int a[5];

        printf("sizeof a = %lu\n", (unsigned long)sizeof a);
        printf("sizeof &a = %lu\n", (unsigned long)sizeof &a);
        printf("sizeof a+0 = %lu\n", (unsigned long)sizeof (a+0));
        return 0;
    }

This will almost always print two different numbers for the first two lines of output. (If it prints two identical numbers, it is possible that five ints happen to be exactly the same size as one ‘pointer to array 5 of int’; in that case, changing the 5 to some other constant should produce two different numbers. If the numbers still match, you probably have a broken compiler. In the past, some broken compilers have applied The Rule to array names that follow the sizeof operator.)
The first sizeof operator finds the size of the entire array object, without applying The Rule. The second sizeof operator finds the size of the result of applying the unary & operator, i.e., the size of a pointer to one ‘array 5 of int’. The final line, though, uses the addition operator + to add nothing to the value of the array. Since the ‘value’ of the array is a pointer to its first element, the expression (a+0) first has to find the value -- a pointer to int -- and the sizeof operator then prints the size of a pointer to one int.
It is quite likely that the last two numbers the above program printed are identical. That is, the size of a pointer to the entire array is probably the same as the size of a pointer to the first element of the array. In fact, most modern machines really only have one, or sometimes two, sizes of pointer. A number of older machines had many different ‘flavors’ of pointer, and C compilers for those machines would use them all; on those machines, it might be easier to tell that &a and (a+0) have different types. On modern machines, however, you have to resort to another method to see this.
The easiest is simply to observe the diagnostics, or lack thereof, produced for correct and incorrect programs. A correct program such as:
    #include <stdio.h>

    int main(void) {
        int a[5];
        int (*p1)[5];
        int *p2;

        p1 = &a;
        p2 = a; /* same as (a+0) -- is on the right hand side */
        return 0;
    }

does not require any diagnostics. On the other hand, if you switch the unary & operator from the first assignment to the second, the program violates a constraint, and requires a diagnostic (and a good compiler should produce two diagnostics, one for each erroneous assignment).
Another method is to observe the effect of pointer arithmetic. That, however, is tricky to do portably; we leave this for later.
Suppose you have some declarations:
    int i;
    int *ip;
    int aone[5];
    int atwo[3][5];
    int (*ap)[5];

Here the variable i has type int, ip has type ‘pointer to int’, aone has type ‘array 5 of int’, atwo has type ‘array 3 of array 5 of int’, and ap has type ‘pointer to array 5 of int’.
Now consider these assignments:

    i = 42;
    ip = aone;
    *ip = i;
    ap = atwo;

The first line is pretty simple. On the left is the variable name i. This is an:

    <object, int, i>

Likewise, on the right is the integer constant 42, which is:

    <value, int, 42>

The = (assignment) operator demands an object on its left hand side, and a value on its right. This is exactly what it has. It then converts the value on the right (42) to the required type if needed. In this case, 42 is already the right type, so nothing interesting happens. Finally, it assigns the value to the object -- so i becomes 42.
The second line is not really difficult either. The left and right sides of the assignment operator are, respectively:

    <object, pointer to int, ip>
    <object, array 5 of int, aone>

Of course, the assignment operator needs a value on the right -- so now it is time to find the ‘value’ of aone, i.e., to apply The Rule.
The Rule drops the size of the array (5) and considers only the element type (int). The ‘array of T’ becomes a ‘pointer to T’, pointing to the first element of that array, i.e., the one with subscript 0. Thus, the <object, array 5 of int, aone> becomes a <value, pointer to int, &aone[0]>. Now the left and right sides have the right form, and once again, they also have the right types. The assignment proceeds to set ip to point to aone[0].
The third line is a little more complicated, and more interesting. The left hand side of the assignment operator is, itself, an expression. Before you can figure out what the assignment does or means, you have to work out this sub-expression.
The sub-expression consists of the prefix unary * (‘indirection’) operator, and the variable named ip. The indirection operator demands a value, and that value has to have some pointer type. But ip is not a value; it is an <object, pointer to int, ip>. Pointers are not magic after all -- they have values, just like any other ordinary variable, and the value of a pointer is just its value. We just set that value a moment ago: it points to aone[0]. Thus, this <object> becomes a <value, pointer to int, &aone[0]>.
Now the unary * operator has what it needs, a value of type ‘pointer to T’. The indirection operator simply follows this pointer -- which had better not be garbage or NULL -- to find the object to which it points. The result of the indirection is an <object> and has type T. The name of that object can be a bit problematic, but in this case, we know it is just aone[0], so the final result of this * operator is <object, int, aone[0]>. (We could also call this <object, int, *ip>.)
The right hand side of that line -- *ip = i; -- is of course just the triple <object, int, i>. The left hand side is an object; the value of the right hand side is the value of i, i.e., the pair <int, 42>; so the assignment sets *ip (i.e., aone[0]) to 42.
The last line is left as an exercise.
The C language makes a fairly strong promise about representations: if you take the address of any ordinary object obj, convert that pointer to one of type unsigned char *, and print out sizeof obj bytes, this will print all of the representation bits in that object, along with any ‘padding’ bits. The pointer conversion and subsequent indirection are the ‘trick’ that gets the compiler to interpret the representation bits as unsigned chars.
This trick can be quite useful, but it is also quite limited. The problem is that it shows you a representation of your value, as stored in one particular object. This does not have to be the only representation, even on the one machine on which you try it. For instance, some machines have a scary number of representations of the double value 0.0 (e.g., 4,503,599,627,370,496 possible ways to represent 0.0). Worse, the representation on one machine may bear little or no resemblance to that on another.
The things to remember about representations, then, are: