Scope, linkage, and name spaces
Storage-class, duration, and storage-class-specifiers
Note that this gets confusing, because there are a number of fairly
‘deep’ concepts that are all mixed together here.
First, scope, linkage, and namespace are properties of identifiers.
Identifiers
Identifiers and their name spaces
Identifiers start with a letter (or an underscore, but most such
names are reserved) and continue with more letters, digits, or
underscores. Identifiers name one of these six items:
- an object (see object vs value)
- a function
- a tag or member (of a struct, union, or enum)
- a typedef-name
- a label (for goto)
- a macro name or parameter (in the preprocessor)
Several of these have their own ‘name spaces’, which means that you
can use the same name for more than one entity and the language will
‘know what you mean’. For instance, you can have a label named top in
a function named top; a reference to top() calls the function, while
goto top refers to the label. The compiler can distinguish between
these because of the syntax of the language: a label always comes right
before a colon, or right after the goto keyword.
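As a small sketch of the label case (the function body and the helper
call_top are my own invention; only the shared name top comes from the
text above), the following should compile cleanly, because the two uses
of top live in different name spaces:

```c
/* A function named top containing a goto-label that is also named top. */
int top(int n)
{
    int count = 0;

top:                    /* label name space: marked by the colon */
    if (count < n) {
        count++;
        goto top;       /* after goto, top can only mean the label */
    }
    return count;       /* number of trips around the loop */
}

/* Ordinary name space: top followed by ( is the function. */
int call_top(void)
{
    return top(3);
}
```

Calling top(5) goes around the label five times and returns 5.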
There are two ‘special’ name spaces for goto-labels and tags
respectively. In addition, every time you create a new type with
struct or union, its members occupy a newly-created name space,
specific to that type. Hence if x and y have different structure
types, and value is a valid member of x, value may or may not be a
valid member of y.
(In a very ancient version of C, long predating the C89 standard, there
was only one namespace for all
structure types, so every member name had to be unique. This is
one reason that some traditional C code will use names like st_size, rather than
just size, for
members of a struct
stat: the prefix and underscore distinguished this st_size from the i_size attached to a struct inode.
Prefix disambiguators like this are no longer necessary, although some
programmers like them stylistically.)
All other identifiers, including the enumeration constants of an enum,
fall into the ‘ordinary’ name space. Names that are in separate name
spaces are different, even if they are spelled the same, as with the
goto-label example above. Thus, whenever we ask whether two names
refer to the same entity, we are assuming that they are in the same
name space. Since most identifiers are in the ‘ordinary’ name space,
and all other cases are obvious from syntax, it is easy to tell if this
assumption is warranted.
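Here is a hedged sketch of the tag, member, and ordinary name spaces
side by side. All of the names (point, count, value, color,
sum_values) are made up for illustration:

```c
struct point { int value; };    /* tag point; member value, in struct point's name space */
struct count { long value; };   /* another member named value, in a different name space */
enum color { RED, GREEN };      /* RED and GREEN are ordinary identifiers */

int sum_values(void)
{
    struct point point = { 3 };   /* ordinary identifier point, distinct from the tag */
    struct count value = { 4L };  /* ordinary identifier value, distinct from both members */

    return point.value + (int)value.value + GREEN;   /* 3 + 4 + 1 */
}
```

Reusing names this heavily is poor style, but the compiler can sort out
every use from the syntax alone.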
Scopes
The scope of an identifier is, roughly, ‘where you can see it’. The C
language is defined such that compilers rarely need to ‘look ahead’ or
‘look back’, so every scope starts at the point where the identifier is
declared, and scopes are classified by where the identifier stops being
in scope, rather than where it first starts being in scope. An
identifier that is otherwise ‘in scope’ can also be obscured, as
described in a moment. When it is obscured, we say it is ‘not in
scope’, even though it will re-appear as soon as whatever is hiding it
goes out of scope.
There are four possible scopes:
- File: an identifier with
file scope can be seen anywhere in the translation unit (from the point
at which it is declared, of course).
- Function: an identifier
with function scope can be seen anywhere within the current
function. This scope is used only for goto labels.
- Block: an identifier
with block scope can be seen only until the current {}-delimited block
ends.
- Function prototype: this
special scope exists solely for function prototypes (more about those
later).
Two identifiers have the same scope if and only if their scopes
terminate at the same point. (This is just another way to say
that scopes are decided based on when the identifier ‘stops being in
scope’.) Note that block and function scope only occur within
functions, and function prototype scope only occurs within prototypes;
so any identifier outside a function, or the name of any function
itself, must necessarily have file scope. The latter occurs
because functions cannot be defined within other functions, only at
file scope.
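The four scopes can be seen together in one short sketch. The names
counter, add, and scopes_demo are hypothetical, chosen only for this
illustration:

```c
int counter = 10;                    /* file scope: visible to the end of the file */

int add(int counter, int other);     /* prototype scope: these two names end at the ')' */

int add(int a, int b)
{
    return a + b;
}

int scopes_demo(void)
{
    int total = counter;             /* the file-scope counter: 10 */
    {
        int counter = 1;             /* block scope: obscures the file-scope counter */
        total += counter;            /* total is now 11 */
        goto done;                   /* function scope: the label is usable before it appears */
    }
done:
    return add(total, counter);      /* file-scope counter is visible again: 11 + 10 */
}
```

The block-scope counter ends at its closing brace, so the final call
sees the file-scope one.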
Except for goto-labels with their special function scope, identifiers
that are defined within a block always have block scope. That
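scope ends at the closing brace that ends the block. (The for-loop
scope introduced in C99 is also a block scope: the whole for statement
acts as a block that ends where the loop ends, and the loop body is a
second block nested inside it. The tricky case is illustrated by this
code fragment:
for (int i = 0; i < 10; i++) { int i = 4; }
That inner i is allowed in C: it belongs to the nested body block and
simply obscures the loop’s i until the closing brace. If the braces
are removed, the fragment is no longer valid C at all, because a
declaration is not a statement and so cannot serve as a loop body. One
should avoid this anyway, if only for style reasons.)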
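For what it is worth, here is a sketch of how the braced form behaves
under C99 and later, where the loop body is a block nested inside the
block formed by the for statement itself. The function name
for_scope_demo is hypothetical:

```c
int for_scope_demo(void)
{
    int seen = 0;

    for (int i = 0; i < 3; i++) {   /* loop i: scope is the whole for statement */
        int i = 4;                  /* body i: a new object that obscures the loop i */
        seen += i;                  /* adds 4 on each of the three trips */
    }
    return seen;
}
```

The i++ and i < 3 in the loop header still refer to the loop’s own i,
so the loop runs three times. (C++ rejects this code; the sketch
applies to C only.)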
Preprocessor macro names have file scope, but are slightly peculiar for
historical reasons. In particular, you can erase them using a #undef directive.
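A minimal sketch of the #undef peculiarity, using a made-up macro named
LIMIT and two made-up functions:

```c
#define LIMIT 5                  /* macro: visible from here on */

int before_undef(void)
{
    return LIMIT;                /* expands to 5 */
}

#undef LIMIT                     /* the macro is erased from this point */

int LIMIT = 7;                   /* now just an ordinary file-scope variable */

int after_undef(void)
{
    return LIMIT;                /* reads the variable: 7 */
}
```

No other kind of identifier can be removed part-way through a file like
this.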
Linkage
Name spaces and scopes are quite abstract, but linkage starts to let
the ‘real world’ in. The essential purpose of linkage is to
decide whether two names in two
separate translation units refer to the same object or function: does a
call to run()
in baseball.c
call the function run()
in nylons.c, or
are runs in nylons different from runs in baseball? Is the
variable score
in the first file the same or different from one of the same name
in some other file?
To answer this, we look at the linkage
of the identifiers in question. There are three possible
linkages: external, internal, and ‘none’. Note that these are still properties of
identifiers, even though the whole point is to answer questions about
objects and functions. At the very end of the compilation
sequence, linkage is ‘resolved’ by the last phase of the
compiler. On most systems, this is actually a completely separate
program called a linker (or link-editor, or linking loader, or loader,
or a number of similar names). The linker, which connects up
(links) separately-compiled files, is usually only handed the
identifiers with ‘external’ linkage.
Identifiers with ‘no linkage’ are easy: since they have no linkage, the
only interesting properties remaining are namespace and scope. If
two identifiers with identical spelling have the same scope (and of
course namespace) and no linkage, they refer to the same thing.
If they have different scopes, they refer to different things.
Thus, in the following code fragment:
#include <stdio.h>

int somefunc(void);                       /* defined elsewhere */

int f(void) {
    int i;                                /* first i */
    for (i = 0; i < 10; i++) {
        printf("start trip %d: ", i);     /* first i */
        {
            int i = somefunc();           /* second i */
            printf("got %d...", i);       /* second i */
        }                                 /* end 2nd i */
        printf("end trip %d\n", i);       /* back to first i */
    }
    return 0;
}                                         /* end 1st i */
there are two different things both named i, both of which have
block scope. The first i’s scope ends at the
end of the function (the last }), and the second
(inner) i has
scope that ends at the first }, so these are
different ‘i’s.
Note that the inner (second) i obscures the outer
one.
Identifiers with internal linkage are contained within a given
translation unit. Since the linker generally never sees them,
these names cannot cross translation-unit boundaries. If the run()
in baseball.c
has internal linkage, and the run()
in nylons.c
also has internal linkage, they must be different entities, even though
they have the same name.
Identifiers with external linkage are linked if they match up. In
this case, there should be just one definition for the name. If the
run() in baseball.c has external linkage, and the function run() in
nylons.c also has external linkage, and you attempt to link the two
translation units, the linker should complain about the duplicate
definition. (Strictly speaking the behavior is undefined, so no
diagnostic is required, but most linkers produce one.)
What if the identifier has internal linkage in one translation unit,
and external linkage in another? This is OK: the internal-linkage
version is available within that one translation unit, and not
elsewhere. The external-linkage version is available in all the
other translation units, or rather, all others that do not also have
their own internal-linkage version of that identifier.
The only situation that goes wrong is when an identifier appears in a
single translation unit with both internal and external linkage.
In this case, the effect is undefined. Some linkers connect the
identifiers seemingly at random (with the actual behavior depending on
the order symbols have in memory and/or linearized data structures);
others behave in more predictable ways, but the situation is best
avoided. A good compiler can detect when this situation
arises, and complain, but compilers are not required to do this.
(We will have an example of a bad situation later.)
Storage durations of objects
We now jump from identifiers, which have the name-space, scope, and
linkage properties described above, to objects, which have
storage-duration. The reason for this odd jump should become
clear in a moment.
There are three storage durations: automatic,
static, and allocated. These are more
fully described elsewhere. When
an object is created by an ordinary variable declaration, it always has
either automatic or static duration. Such a declaration consists
of a type-specifier, followed by optional syntactic bits that act as
modifiers, with an identifier embedded in those modifiers. The
default storage duration depends on the scope of the identifier.
If the identifier has block scope, the object will have automatic
storage duration by default, but you can give it static duration.
If the identifier has file scope, the object will have static storage
duration—automatic duration is not even an option. (C has an auto keyword, but
using it here just gets you a diagnostic. In fact, this keyword
is entirely useless, as we will see.)
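A short sketch of the two durations that ordinary declarations can
produce. The names calls, next_auto, and next_static are hypothetical:

```c
int calls;                       /* file scope: static duration, starts out zero */

int next_auto(void)
{
    int n = 0;                   /* block scope, automatic: a fresh n on every call */
    calls++;
    return ++n;                  /* always 1 */
}

int next_static(void)
{
    static int n = 0;            /* block scope, static: one n, initialized once */
    calls++;
    return ++n;                  /* 1, then 2, then 3, ... */
}
```

Both n identifiers have block scope and no linkage; only the duration
of the objects they name differs.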
Definition vs declaration
At this point, we also need an aside on
the difference between an object or function definition, and a mere declaration. I intend to
write more about this, but for now, let me just say that the heart of
the difference is the same as that between saying that something
exists, and actually pointing it out. For instance, I can declare
that a particular green-eggs-and-ham book exists, but I have not really
defined it until I tell you that it is the one by Dr Seuss. Or in
C, one might write something like extern int GreenEggCount;
to declare that the variable exists, while writing int GreenEggCount = 2;
not only declares it, but also defines it and gives it an initial value
of 2. Note that, in C, a definition always gives you an implied
declaration as well, but the reverse is not true.
A definition that assigns a value, as in this example, is a sort of
‘definite definition’: the variable GreenEggCount is
defined and set to 2. If we omit the initial value, however, the
definition becomes a so-called tentative
definition, in which the variable is held in a sort of
quasi-defined state while the compiler works on the rest of the
translation unit. If the compiler comes across another, more
definite, definition, the variable is defined at that point. But
if the compiler reaches the end of the translation unit, and no
definite definition has happened yet, the tentative definition is made
into a real definition, and the initial value for the variable is zero.
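The three cases can be put in one translation unit, as in this sketch.
GreenEggCount is the variable from the text; HamCount and total_food
are names I have added for illustration:

```c
extern int GreenEggCount;        /* declaration only: says the variable exists */
int HamCount;                    /* tentative definition */
int HamCount;                    /* a second tentative definition is fine */
int GreenEggCount = 2;           /* 'definite' definition, with an initial value */

int total_food(void)
{
    /* HamCount never gets an initializer, so at the end of the
       translation unit it is defined with the value zero. */
    return GreenEggCount + HamCount;
}
```

Repeating a tentative definition is harmless, but a second definition
with an initializer would be an error.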
The static
keyword is the obvious way to select static duration for an object that
would otherwise have automatic duration—and indeed, this is exactly how
one does this in C. But for an object that already has static
duration, the keyword would be useless. This is where C gets too
clever for its own good. Now we must jump back to identifier
properties, and in particular, linkage.
Linkage of objects and functions
Using the static
keyword would seem pointless for an object that already has static
duration. Functions do not—at least not technically—have a
storage duration at all, but one can say that all functions have
‘static duration’ in the sense that they exist by the time the program
enters its main()
function. It would, however, make sense to have a keyword or
two—perhaps something like public and private—to mark
objects and functions as having external and internal linkage
respectively. But this is not the ‘C Way’. The C Way asks:
why have two more keywords when we can just overload the one?
Since objects defined outside functions always have static storage
duration, and functions effectively have static duration, we can use
the static
keyword to mark these identifiers as having internal linkage.
Then, in yet another twist, C’s extern keyword, which
seems like an obvious way to mark identifiers as having external
linkage, has a bizarrely complicated meaning.
What extern
does is this:
- If the identifier is already in scope, look up its current
linkage.
    - If that linkage is none, give the new declaration external
    linkage.
    - Otherwise (that linkage is either internal or external), give
    the new declaration the same linkage as the previous visible
    declaration.
- Otherwise (no previous declaration is visible), give the new
declaration external linkage.
- Then, in any case, suppress any ‘tentative definition’.
That is, if some variable(s) being declared extern would have
been tentatively defined, the definition-ness should be suppressed
entirely, so that this is a mere declaration. But if they are
being definitely defined, with an initial value, the defining happens
anyway. Only tentative
definitions are suppressed.
This definition-suppression effect only applies to objects (ordinary
variables), not functions.
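The lookup steps above can be sketched in a single file. The names
hidden, shared, later, and read_all are hypothetical; the comments note
which branch of the rules applies to each extern declaration:

```c
static int hidden = 5;           /* internal linkage */
int shared = 7;                  /* external linkage */

int read_all(void)
{
    extern int hidden;           /* visible prior declaration is internal: stays internal */
    extern int shared;           /* visible prior declaration is external: stays external */
    extern int later;            /* no prior declaration visible: external */

    return hidden + shared + later;
}

int later = 30;                  /* the definition that "extern int later" refers to */
```

None of the three extern lines defines anything; each one just names an
object that is defined elsewhere in the file.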
Storage class specifiers
With all of the above out of the way, we can finally talk about storage class specifiers, which are
really just five keywords:
- auto
- extern
- register
- static
- typedef
The typedef
keyword is yet another special case, which is best ignored at this
point. Doing so leaves us with four keywords, three of which have
already been mentioned. That leaves the register keyword,
which can be viewed as a special case of the (useless) auto keyword: both
can only be applied to block-scope identifiers for objects, and both
cause those objects to have automatic duration, which they already
have. The register
keyword does have one special effect, though: any object declared using
this keyword cannot have its address taken with the unary & operator.
(Also, in some particularly dumb compilers, or compilers told not to
put any work into optimization, the keyword can be used to attempt some
manual optimization. In general, though, it works better just to
turn on optimization in a good compiler.)
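A minimal sketch of register in use, with a hypothetical function
sum_to. The address-taking line is left as a comment because a
conforming compiler must diagnose it:

```c
int sum_to(int n)
{
    register int total = 0;              /* automatic duration, no linkage */

    for (register int i = 1; i <= n; i++)
        total += i;

    /* int *p = &total;    not allowed: total was declared register */
    return total;
}
```

Removing both register keywords should give the same results with any
reasonable optimizing compiler.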
We can put all of this together to make up a table that describes the
effect of each keyword. Note that only one keyword is allowed for
any given declaration (even if the declaration declares multiple
identifiers), auto
and register
cannot be applied to functions, and the question about definition-ness
only applies to variables, not functions.
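Effects of storage-class specifier keywords
+-------------------------+-----------+-------------------------+-------------------------+
| keyword                 | duration  | linkage                 | definition?             |
+-------------------------+-----------+-------------------------+-------------------------+
| auto     (file scope)   | error     | error                   | error                   |
|          (block scope)  | automatic | none                    | yes                     |
+-------------------------+-----------+-------------------------+-------------------------+
| extern   (file scope)   | static    | varies                  | tentative is suppressed |
|          (block scope)  | static    | varies                  | no (cannot initialize)  |
+-------------------------+-----------+-------------------------+-------------------------+
| register (file scope)   | error     | error                   | error                   |
|          (block scope)  | automatic | none                    | yes                     |
+-------------------------+-----------+-------------------------+-------------------------+
| static   (file scope)   | static    | internal                | tentative or yes        |
|          (block scope)  | static    | none                    | yes                     |
+-------------------------+-----------+-------------------------+-------------------------+
| –none–   (file scope)   | static    | varies [see note below] | tentative or yes        |
|          (block scope)  | automatic | none                    | yes                     |
+-------------------------+-----------+-------------------------+-------------------------+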
In general, using the extern keyword has no effect on linkage; its
real purpose is to suppress tentative definitions. There is one
exception though: for no particularly good reason, if you omit the
extern keyword when declaring a function, the linkage is exactly the
same as if you had included it. If you omit the extern keyword when
declaring a file-scope variable, however, the linkage is always
external. Including the keyword instead makes the compiler look up
the linkage of any visible previous declaration of that variable, and
if there is one and it is marked as having internal linkage, the extern
keyword gives this declaration internal linkage, as if you had written
static instead.
Let me repeat this, because it is so bizarre: for a file-scope
variable, the extern keyword can give the identifier internal linkage.
Omitting it can give that identifier external linkage. So adding
extern can remove external linkage!
There is no good reason for this state of affairs. The rules
appear to be pointlessly complicated. Moreover, in the only case
in which adding the extern
keyword gives a file-scope variable identifier internal linkage,
omitting the keyword produces undefined behavior. In particular,
the code fragment:
static int x;
extern int x;
int x;
has undefined behavior: the first two lines give x internal linkage,
but the third gives x
external linkage. Peculiarly:
static int f(void);
extern int f(void);
int f(void);
is OK, as all three lines give f internal
linkage. (Stylistically, it is best to just use the static keyword every
time. I have no idea why this ‘extern means static’ rule even
exists.)
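The function case can be checked with a compiler. In this sketch the
three declarations are the ones from the text; the definition of f and
the wrapper call_f are hypothetical additions so that there is
something to call:

```c
static int f(void);              /* internal linkage */
extern int f(void);              /* looks up the visible declaration: internal */
int f(void);                     /* for a function, same as writing extern: internal */

int f(void)                      /* so this definition is internal as well */
{
    return 3;
}

int call_f(void)                 /* external-linkage wrapper, callable from other files */
{
    return f();
}
```

The variable version has no such test, since its behavior is undefined;
in practice many compilers simply reject it.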
(More examples go here. What follows is leftover text from a Usenet posting.)
"Varies" means "external unless currently shown to be internal".
The "currently shown" part is tricky at block scope, since it
depends on whether a previous file-scope declaration has been
obscured by a block-scope declaration for the same identifier.
Note that the "auto" keyword is redundant: it is illegal at file
scope, and at block scope, it does the same thing you would get
if you did not use a storage-class specifier keyword.
>The compiler ultimately assumes that your "extern int b" must be
>referring to a different "b" than the one you declare and define in
>main(), i.e. one that has storage class extern and external linkage;
>the compiler can't find such a variable and complains accordingly.
This part is correct.
Now on to the second example...
>> int b = 2; // line 1
>> int main( void )
>> {
>>     if ( a ) {
>>         extern int b; //line 5
>>         b++;
>>     }
>>     return 0;
>> }
>>
>> In contrast to the last example, integer "b" is not defined globally.
>> What linkage type is object "b" now? According to ISO/IEC 9899:1999
>> §6.2.2.4, it should adopt the linkage type of the prior declaration of
>> the same identifier, namely internal linkage as used in line 1. Right?
>Note: Changed "extern b" to "extern int b"
>
>Not quite. First off, in line 1 your "int b" has external linkage: A
>variable defined in file-scope and which isn't "static" has external
>linkage by default.
Right. Look in the table above: no keyword, file scope: we get static
duration, external linkage, and "tentative or yes" definition. The
declaration includes an initial value so the "definition" answer is
"yes".
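On line 5, we have extern at block scope, so (per the table) we get
static duration, "varies" linkage, and "no" for definition. There
is no initializer (one would be illegal here anyway) and "tentative
definition-ness" is suppressed, so this is purely a declaration.
The file-scope "b" from line 1 is visible at that point and has
external linkage, so "varies" resolves to external, and line 5
refers to that same "b".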
For a third and fourth example, consider:
static int x = 42;          /* declaration 1 (also definition) of x */
int main(void) {
    int a = 0;
    {
        extern int x;       /* decl 2: iffy, but legal */
        x++;
    }
}
The "extern int x" line again refers to this entry:
keyword duration linkage definition?
+----------------------+-----------+----------+--------------------+
| extern (file scope) | static | varies | suppress tentative |
| (block scope) | static | varies | no (cannot init.) |
+----------------------+-----------+----------+--------------------+
It is at block scope so the duration is "static" and definition-ness
is "no", but as before, the linkage is "varies". We must play
compiler and ask "is there a previous x in some visible scope that
has some linkage?" The answer is yes: the previous file-scope
"static int x" is visible, and has internal linkage. So here "x"
gets internal linkage, and this refers to the same "x" as the
"static int x = 42".
If we change "int a = 0;" to "int x = 0;" in main(), however, we get:
static int x = 42;          /* decl 1 (also definition) */
int main(void) {
    int x = 0;              /* decl 2 (also definition) */
    {
        extern int x;       /* decl 3: ERROR */
        x++;
    }
}
Here the only visible "x" at the "extern int x" line is the
block-scope "x", which has no linkage. Thus, "extern int x" gives
the third declaration of "x" external linkage, in spite of the
first declaration giving it internal linkage. This triggers
undefined behavior (paragraph 7 of section 6.2.2 of the C99 draft
I keep handy).