Pointers: Are They Just Numbers?
Pointers Versus Numbers
There is a discussion of numbers and representations, both integer and
floating-point, on this page. If you
have not read it, you should at least skim it now, with particular
attention to the multiple possible integer representations and the IEEE
floating-point representation.
Every C pointer includes a type, as described here.
At a sufficiently low machine level, most systems have just a few
underlying hardware pointer types: sometimes as few as two (one for
data and one for code), or even just one. In C, however, every
pointer value is at least associated with the type of the object(s) to
which it can point. A machine is allowed (but of course not
required) to make every different pointer type use a different
underlying representation and/or number of bytes in memory. This
complicates the issue, at least conceptually; but we can still ask
the question once, for each different pointer type: is a value now stored
in this object, of type ‘pointer to T’ for some type T,
just a number, or is it something more complicated?
Fundamentally, in C, a valid pointer gives you two separate pieces of
information: the type of the object to which it points, and the
location of that object. In that sense, the type alone makes
it ‘something more complicated’. But let us pretend that
this is not an issue. As mentioned here,
any object can be decomposed into bytes. We can use this trick to
store a pointer value in a variable (i.e., an object) with the correct
type, then decompose that object into bytes. This allows us to
inspect the actual representation of any given pointer value.
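The byte-decomposition trick can be sketched without assuming any
particular pointer size or byte order. The helper names pointer_bytes and
dump_pointer below are made up for illustration, and the sketch handles
only ‘pointer to int’; other pointer types would need their own copies,
since each type may have its own representation.

```c
#include <stdio.h>
#include <string.h>

/*
 * Copy the object representation of an int pointer into buf, which
 * must have room for at least sizeof(int *) bytes.  Returns the
 * number of bytes copied.  (Hypothetical helper, for illustration.)
 */
size_t pointer_bytes(int *p, unsigned char *buf)
{
    memcpy(buf, &p, sizeof p);
    return sizeof p;
}

/* Print the bytes of the pointer object, in memory order, in hex. */
void dump_pointer(int *p)
{
    unsigned char buf[sizeof p];
    size_t i, n = pointer_bytes(p, buf);

    for (i = 0; i < n; i++)
        printf("%02x ", (unsigned)buf[i]);
    printf("\n");
}
```

Calling dump_pointer(&x) for some int x prints however many bytes this
implementation uses for an int pointer. What those bytes mean is a
separate question.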
In this sense, then, pointers are
numbers: a pointer value, once stored in memory, has some underlying
bit pattern. This bit pattern can be interpreted as some sort of
integer, producing some sort of number. But this is where things
get tricky: that is merely an
interpretation, not necessarily ‘the’ interpretation to use.
Suppose, for instance, that pointers happen to be exactly 32 bits long
on a given machine, and further, that the machine has 8-bit bytes and
is little-endian. We can stuff a pointer value into memory, then
extract the four bytes and compose a 32-bit integer:
    int *ip = malloc(20 * sizeof *ip);
    unsigned char *ucp;
    unsigned long l;

    if (ip == NULL)
        panic("out of memory");
    /* view the pointer object itself as a sequence of bytes */
    ucp = (unsigned char *)&ip;
    /* little-endian: byte 0 is the least significant */
    l = ucp[0];
    l |= (unsigned long)ucp[1] << 8;
    l |= (unsigned long)ucp[2] << 16;
    l |= (unsigned long)ucp[3] << 24;
    printf("ip = %p; as an integer, %lu\n", (void *)ip, l);

This code will show you the ‘integer value’ of the 32-bit pointer
stored in ip. But is this the actual
value of ip, or just some interpretation? What if the bits making
up the pointer are actually a 32-bit IEEE single-precision
floating-point number? In this case, to find the ‘true’ value, we
have to look at the sign, mantissa, and exponent bits. A value
that we printed out as (say) 1081081856 might be more correctly printed
as 3.75.
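To make the floating-point reading concrete, here is a minimal sketch
that decodes a 32-bit pattern by hand as an IEEE single-precision
number. The function name decode_ieee_single is an assumption of this
example, and it handles only normal numbers (no zeros, denormals,
infinities, or NaNs).

```c
/*
 * Interpret the low 32 bits of 'bits' as an IEEE single-precision
 * number: 1 sign bit, 8 exponent bits (bias 127), 23 fraction bits.
 * Only normal numbers are handled; this is an illustration, not a
 * complete decoder.
 */
double decode_ieee_single(unsigned long bits)
{
    int sign = (int)((bits >> 31) & 1);
    int exponent = (int)((bits >> 23) & 0xff) - 127;  /* remove bias */
    /* implicit leading 1, plus fraction / 2^23 */
    double value = 1.0 + (double)(bits & 0x7fffffUL) / 8388608.0;

    while (exponent > 0) {
        value *= 2.0;
        exponent--;
    }
    while (exponent < 0) {
        value /= 2.0;
        exponent++;
    }
    return sign ? -value : value;
}
```

For the bit pattern 0x40700000 (decimal 1081081856), the sign bit is 0,
the exponent field is 128, and the fraction is 0.875, so the function
returns 1.875 times 2, or 3.75. Same bits, different interpretation.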
Of course, on this particular machine, it probably really is
1081081856—or 0x40700000. But what if pointer values are in fact
structured, similar to (but not the same way as) floating-point values?
In particular, suppose that this machine has ‘segments’ that can be
mapped in and out, and pointers consist of a 12-bit segment number
plus 20 bits of offset-within-segment. In this case, on this
machine, the segment number is 0x407, and the offset is 0.
Putting these two together gives us the 32-bit number 0x40700000.
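The segment and offset fields of this hypothetical machine can be
pulled apart with a shift and a mask. The names below (segment_of,
offset_of, make_rep) are invented for this sketch, which assumes the
12-bit segment number occupies the high bits and the 20-bit offset the
low bits.

```c
#define OFFSET_BITS  20
#define OFFSET_MASK  ((1UL << OFFSET_BITS) - 1)   /* low 20 bits */
#define SEGMENT_MASK 0xfffUL                      /* 12 bits */

/* Extract the 12-bit segment number from a 32-bit representation. */
unsigned long segment_of(unsigned long rep)
{
    return (rep >> OFFSET_BITS) & SEGMENT_MASK;
}

/* Extract the 20-bit offset-within-segment. */
unsigned long offset_of(unsigned long rep)
{
    return rep & OFFSET_MASK;
}

/* Combine a segment number and an offset into one 32-bit pattern. */
unsigned long make_rep(unsigned long seg, unsigned long off)
{
    return ((seg & SEGMENT_MASK) << OFFSET_BITS) | (off & OFFSET_MASK);
}
```

Here segment_of(0x40700000UL) yields 0x407 and offset_of(0x40700000UL)
yields 0, matching the decomposition in the text.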
Here is where things get particularly sneaky.
On this machine, each segment can be marked invalid. If we call free(ip), the
underlying system will mark segment 0x407 invalid. An attempt to
use the pointer value will then trap (because this machine, unlike
most, is actually designed to catch errors, instead of producing the
wrong answer as fast as possible). But now that segment 0x407 is
invalid, what is the value of this pointer? If we ask the C
compiler to print it directly, it may load it into a pointer register
on the machine, and this may look up the segment number, see that it is
invalid, and trap at runtime:
    printf("ip = %p\n", (void *)ip);

The output never occurs, because the attempt to load the value to send
to printf() traps. A later call to malloc() may make segment
0x407 valid again, but map
it to a different part of RAM, so that the same pointer (0x40700000)
now points to different memory.
On this machine, in other words, the representation
stored in the pointer—the bit pattern in memory—never changes.
What changes is the value
thus represented. The value is constructed, at least in part, by
looking up (part of) the representation in a separate table. Whenever
the table changes, so does the value stored in the pointer.
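This table lookup can be simulated in ordinary C. The sketch below is
only a model of the imagined machine: every name in it (sim_table,
sim_map, sim_unmap, sim_translate) is hypothetical, the ‘physical
addresses’ are plain numbers, and a return value of 0 stands in for the
hardware trap.

```c
#define SIM_OFFSET_BITS 20
#define SIM_NSEGS       4096    /* 2^12 possible segment numbers */

struct sim_segment {
    int valid;                  /* nonzero if currently mapped */
    unsigned long base;         /* simulated physical base address */
};

/* Static storage, so every segment starts out invalid. */
static struct sim_segment sim_table[SIM_NSEGS];

/* Mark a segment valid and map it to the given base address. */
void sim_map(unsigned long seg, unsigned long base)
{
    sim_table[seg % SIM_NSEGS].valid = 1;
    sim_table[seg % SIM_NSEGS].base = base;
}

/* Mark a segment invalid, as free() might on this machine. */
void sim_unmap(unsigned long seg)
{
    sim_table[seg % SIM_NSEGS].valid = 0;
}

/*
 * Translate a 32-bit pointer representation into an address.
 * Returns 1 and stores the address in *addr on success; returns 0
 * (our stand-in for a trap) if the segment is invalid.
 */
int sim_translate(unsigned long rep, unsigned long *addr)
{
    unsigned long seg = (rep >> SIM_OFFSET_BITS) & (SIM_NSEGS - 1);
    unsigned long off = rep & ((1UL << SIM_OFFSET_BITS) - 1);

    if (!sim_table[seg].valid)
        return 0;
    *addr = sim_table[seg].base + off;
    return 1;
}
```

If we map segment 0x407, translate 0x40700000, unmap the segment, and
then map it again at a different base, the same 32-bit pattern first
translates to one address, then fails, then translates to another. The
representation never changed; only the table did.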
Most machines today do not do this sort of thing.
It was more
common on older machines. C does, however, allow it; so if you
want to write strictly portable C code—guaranteeing that your code will
work on a future machine—you should avoid inspecting the values of
invalid pointers. They may not mean what you expect, and in some
cases, the value itself may not even exist.
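One common way to follow this advice is to overwrite a pointer variable
with a null pointer as soon as the memory is freed, since a null pointer
is always safe to examine and compare. The helper name release_ints is
an assumption of this sketch, and it only cleans up the one variable
passed to it; any other copies of the old pointer remain invalid.

```c
#include <stdlib.h>

/*
 * Free the memory *pp points to, then replace the now-invalid pointer
 * value with NULL so later code never inspects it.  Safe to call
 * when *pp is already NULL, because free(NULL) does nothing.
 */
void release_ints(int **pp)
{
    free(*pp);      /* *pp is still valid when it is loaded here */
    *pp = NULL;     /* store only; the old value is never read again */
}
```

After release_ints(&ip), a test such as ip == NULL is well defined on
any machine, including the segmented one imagined above.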
Footnote, on ‘Most machines today do not do this sort of thing’:
Well, not visibly anyway: paging techniques do all of this, but keep
it hidden from ordinary programmers.