Scope, linkage, and name spaces

Storage-class, duration, and storage-class-specifiers

Note that this gets confusing, because there are a number of fairly ‘deep’ concepts that are all mixed together here.

First, scope, linkage, and namespace are properties of identifiers.

Identifiers

Identifiers and their name spaces

Identifiers start with an alphabetic character (or an underscore, but most such names are reserved) and continue with more alphabetic characters, digits, or underscores.  Identifiers name one of these six items:
Several of these have their own ‘name spaces’, which means that you can use the same name for more than one entity and the language will ‘know what you mean’.  For instance, you can have a label named top in a function named top; a reference to top() calls the function, while goto top refers to the label.  The compiler can distinguish between these because of the syntax of the language: a label always comes right before a colon, or right after the goto keyword.
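
A minimal sketch of the label-versus-function case, for illustration only.  The counter and the wrapper call_top() are made-up names that just give the goto and the call something to do.

```c
/* A function named top that contains a label also named top.
 * The two names live in different name spaces, so they do not clash. */
int top(int n)
{
    int trips = 0;          /* hypothetical counter, for illustration */
top:                        /* label name space: "top" before a colon */
    trips++;
    if (trips < n)
        goto top;           /* after goto, "top" means the label */
    return trips;
}

/* Here "top" is followed by "(", so it means the function. */
int call_top(void)
{
    return top(3);
}
```

The compiler never has to guess: the colon, the goto keyword, or the parenthesis tells it which top is meant.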

There are two ‘special’ name spaces for goto-labels and tags respectively.  In addition, every time you create a new type with struct or union, its members occupy a newly-created namespace, specific to that type.  Hence if x and y have different structure types, and value is a valid member of x, value may or may not be a valid member of y.  (In a very ancient version of C, long predating the C89 standard, there was only one namespace for all structure types, so every member name had to be unique.  This is one reason that some traditional C code will use names like st_size, rather than just size, for members of a struct stat: the prefix and underscore distinguished this st_size from the i_size attached to a struct inode.  Prefix disambiguators like this are no longer necessary, although some programmers like them stylistically.)

All other identifiers, including the members of an enum, fall into the ‘ordinary’ name space.  Names that are in separate name spaces are different, even if they are spelled the same, as with the goto-label example above.  Thus, whenever we ask sensible questions that decide if two names refer to the same entity, we are assuming that they are in the same name-space.  Since most identifiers are in the ‘ordinary’ name space, and all other cases are obvious from syntax, it is easy to tell if this assumption is warranted.
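
As a rough sketch of how far this can be pushed, the fragment below spells one name, point, in four different name spaces.  All of the names are invented for illustration, and nobody should write real code this way.

```c
struct point { int point; };   /* tag "point" and member "point" */
struct other { int point; };   /* a second, separate member name space */

int name_space_demo(void)
{
    struct point point;        /* ordinary identifier "point" */
    struct other o = { 5 };
    int n = 0;

    point.point = 1;           /* ordinary name, then member name */
point:                         /* label "point" */
    n += point.point;
    if (n < 3)
        goto point;
    return n + o.point;        /* 3 + 5 */
}
```

Each use is disambiguated by syntax alone: after struct it is a tag, after a dot it is a member, before a colon or after goto it is a label, and anywhere else it is the ordinary identifier.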

Scopes

The scope of an identifier is, roughly, ‘where you can see it’.  The C language is defined such that compilers rarely need to ‘look ahead’ or ‘look back’, so scopes start at the point the identifier appears, and are decided based on when the identifier stops being in scope, rather than where it first starts being in scope.  An identifier that is otherwise ‘in scope’ can also be obscured, as described in a moment.  When it is obscured, we say it is ‘not in scope’, even though it will re-appear as soon as whatever is hiding it is gotten out of the way.

There are four possible scopes: file scope, function scope, block scope, and function prototype scope.

Two identifiers have the same scope if and only if their scopes terminate at the same point.  (This is just another way to say that scopes are decided based on when the identifier ‘stops being in scope’.)  Note that block and function scope only occur within functions, and function prototype scope only occurs within prototypes; so any identifier outside a function, or the name of any function itself, must necessarily have file scope.  The latter occurs because functions cannot be defined within other functions, only at file scope.

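Except for goto-labels with their special function scope, identifiers that are defined within a block always have block scope.  That scope ends at the closing brace that ends the block.  (The new for-loop scope in C99 is also a block scope.  As I read section 6.8.5 of the C99 Standard, the whole loop is one block and the loop body is a second block nested inside it, so the loop variable's scope ends at the end of the loop.  The tricky case is illustrated by this code fragment:
    for (int i = 0; i < 10; i++) { int i = 4; }
The inner i is allowed, and it hides the loop's i within the braces.  If the braces are removed, the fragment no longer compiles, because a declaration is not a statement and so cannot be a loop body by itself.  One should avoid this anyway, if only for style reasons.)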
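
A small sketch of the braced case, assuming a C99 or later compiler; the name for_scope_demo is made up for illustration.  Under C99 section 6.8.5 the loop body is its own block nested inside the loop's block, so the inner declaration hides the loop counter rather than clashing with it.

```c
int for_scope_demo(void)
{
    int sum = 0;

    for (int i = 0; i < 3; i++) {   /* loop counter i */
        int i = 4;                  /* a new i that hides the counter */
        sum += i;                   /* adds 4 on each trip */
    }                               /* inner i ends here; i++ sees the counter */
    return sum;
}
```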

Preprocessor macro names have file scope, but are slightly peculiar for historical reasons.  In particular, you can erase them using a #undef directive.
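
A minimal sketch of the #undef behavior.  LIMIT and the two function names are hypothetical, chosen only to show the macro name disappearing partway through the file.

```c
#define LIMIT 10            /* macro LIMIT, visible from here on */

int before_undef(void)
{
    return LIMIT;           /* expands to 10 */
}

#undef LIMIT                /* the macro name is erased here */

static int LIMIT = 20;      /* now free for use as an ordinary identifier */

int after_undef(void)
{
    return LIMIT;           /* the variable, not the macro */
}
```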

Linkage

Name spaces and scopes are quite abstract, but linkage starts to let the ‘real world’ in.  The essential purpose of linkage is to decide whether two names in two separate translation units refer to the same object or function: does a call to run() in baseball.c call the function run() in nylons.c, or are runs in nylons different from runs in baseball?  Is the variable score in the first file the same or different from one of the same name in some other file?

To answer this, we look at the linkage of the identifiers in question.  There are three possible linkages: external, internal, and ‘none’.  Note that these are still properties of identifiers, even though the whole point is to answer questions about objects and functions.  At the very end of the compilation sequence, linkage is ‘resolved’ by the last phase of the compiler.  On most systems, this is actually a completely separate program called a linker (or link-editor, or linking loader, or loader, or a number of similar names).  The linker, which connects up (links) separately-compiled files, is usually only handed the identifiers with ‘external’ linkage.

Identifiers with ‘no linkage’ are easy: since they have no linkage, the only interesting properties remaining are namespace and scope.  If two identifiers with identical spelling have the same scope (and of course namespace) and no linkage, they refer to the same thing.  If they have different scopes, they refer to different things.  Thus, in the following code fragment:
    int f(void) {
        int i;                                  /* first i */
        for (i = 0; i < 10; i++) {
            printf("start trip %d: ", i);       /* first i */
            {
                int i = somefunc();             /* second i */
                printf("got %d...", i);         /* second i */
            }                                   /* end 2nd i */
            printf("end trip %d\n", i);         /* back to first i */
        }
    }                                           /* end 1st i */
there are two different things both named i, both of which have block scope.  The first i’s scope ends at the end of the function (the last }), and the second (inner) i has scope that ends at the first }, so these are different ‘i’s.  Note that the inner (second) i obscures the outer one.

Identifiers with internal linkage are contained within a given translation unit.  Since the linker generally never sees them, these names cannot cross translation-unit boundaries.  If the run() in baseball.c has internal linkage, and the run() in nylons.c also has internal linkage, they must be different entities, even though they have the same name.

Identifiers with external linkage are linked if they match up.  In this case, there should be just one definition for the name.  If the run() in baseball.c has external linkage, and the function run() in nylons.c also has external linkage, and both are definitions, an attempt to link the two translation units should produce a diagnostic (on most systems it is the linker, not the compiler proper, that complains about the duplicate).

What if the identifier has internal linkage in one translation unit, and external linkage in another?  This is OK: the internal-linkage version is available within that one translation unit, and not elsewhere.  The external-linkage version is available in all the other translation units, or rather, all others that do not also have their own internal-linkage version of that identifier.
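
Here is a sketch of what one of those translation units might look like.  The file name, run(), and score come from the discussion above; play() is a hypothetical helper added so there is something to call.

```c
/* baseball.c (one translation unit) */

static int run(void)        /* internal linkage: private to this file */
{
    return 1;
}

int score = 0;              /* external linkage: visible to the linker */

int play(void)              /* external linkage; hypothetical helper */
{
    score += run();         /* always calls the run() defined above */
    return score;
}
```

A separate nylons.c could define its own run(), with either linkage, without disturbing this one; only score and play are names the linker would try to match up.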

The only situation that goes wrong is when an identifier appears in a single translation unit with both internal and external linkage.  In this case, the effect is undefined.  Some linkers connect the identifiers seemingly at random (with the actual behavior depending on the order symbols have in memory and/or linearized data structures); others behave in more predictable ways, but the situation is best avoided.   A good compiler can detect when this situation arises, and complain, but compilers are not required to do this.  (We will have an example of a bad situation later.)

Storage durations of objects

We now jump from identifiers, which have the name-space, scope, and linkage properties described above, to objects, which have storage-duration.  The reason for this odd jump should become clear in a moment.

There are three storage durations: automatic, static, and allocated.  These are more fully described elsewhere.  When an object is created by an ordinary variable declaration, it always has either automatic or static duration.  Such a declaration consists of a type-specifier, followed by optional syntactic bits that act as modifiers, with an identifier embedded in those modifiers.  The default storage duration depends on the scope of the identifier.  If the identifier has block scope, the object will have automatic storage duration by default, but you can give it static duration.  If the identifier has file scope, the object will have static storage duration—automatic duration is not even an option.  (C has an auto keyword, but using it here just gets you a diagnostic.  In fact, this keyword is entirely useless, as we will see.)
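
The difference between the two durations is easy to see in a sketch.  The names below are invented for illustration.

```c
int file_counter;           /* file scope: static duration, starts at 0 */

int next_automatic(void)
{
    int n = 0;              /* block scope: automatic, recreated on each call */
    return ++n;
}

int next_static(void)
{
    static int n = 0;       /* block scope, static duration: keeps its value */
    return ++n;
}
```

Calling next_automatic() repeatedly keeps returning 1, while next_static() counts upward, because its n is created once and survives between calls.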

Definition vs declaration

At this point, we also need an aside on the difference between an object or function definition, and a mere declaration.  I intend to write more about this, but for now, let me just say that the heart of the difference is the same as that between saying that something exists, and actually pointing it out.  For instance, I can declare that a particular green-eggs-and-ham book exists, but I have not really defined it until I tell you that it is the one by Dr Seuss.  Or in C, one might write something like extern int GreenEggCount; to declare that the variable exists, while writing int GreenEggCount = 2; not only declares it, but also defines it and gives it an initial value of 2.  Note that, in C, a definition always gives you an implied declaration as well, but the reverse is not true.

A definition that assigns a value, as in this example, is a sort of ‘definite definition’: the variable GreenEggCount is defined and set to 2.  If we omit the initial value, however, the definition becomes a so-called tentative definition, in which the variable is held in a sort of quasi-defined state while the compiler works on the rest of the translation unit.  If the compiler comes across another, more definite, definition, the variable is defined at that point.  But if the compiler reaches the end of the translation unit, and no definite definition has happened yet, the tentative definition is made into a real definition, and the initial value for the variable is zero.
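
A short sketch of these cases at file scope.  GreenEggCount is the variable from the text; HamCount, SpamCount, and total_count() are hypothetical names added for illustration.

```c
extern int GreenEggCount;   /* declaration only: says the variable exists */
int GreenEggCount = 2;      /* definition, with initial value 2 */

int HamCount;               /* tentative definition */
int HamCount;               /* another tentative definition: still fine in C */
int HamCount = 5;           /* the definite definition wins */

int SpamCount;              /* tentative only: becomes a definition, value 0 */

int total_count(void)
{
    return GreenEggCount + HamCount + SpamCount;   /* 2 + 5 + 0 */
}
```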

The static keyword is the obvious way to select static duration for an object that would otherwise have automatic duration—and indeed, this is exactly how one does this in C.  But for an object that already has static duration, the keyword would be useless.  This is where C gets too clever for its own good.  Now we must jump back to identifier properties, and in particular, linkage.

Linkage of objects and functions

Using the static keyword would seem pointless for an object that already has static duration.  Functions do not—at least not technically—have a storage duration at all, but one can say that all functions have ‘static duration’ in the sense that they exist by the time the program enters its main() function.  It would, however, make sense to have a keyword or two—perhaps something like public and private—to mark objects and functions as having external and internal linkage respectively.  But this is not the ‘C Way’.  The C Way asks: why have two more keywords when we can just overload the one?

Since objects defined outside functions always have static storage duration, and functions effectively have static duration, we can use the static keyword to mark these identifiers as having internal linkage.  Then, in yet another twist, C’s extern keyword, which seems like an obvious way to mark identifiers as having external linkage, has a bizarrely complicated meaning.

What extern does is this:
This definition-suppression effect only applies to objects (ordinary variables), not functions.

Storage class specifiers

With all of the above out of the way, we can finally talk about storage class specifiers, which are really just five keywords: auto, extern, register, static, and typedef.

The typedef keyword is yet another special case, which is best ignored at this point.  Doing so leaves us with four keywords, three of which have already been mentioned.  That leaves the register keyword, which can be viewed as a special case of the (useless) auto keyword: both can only be applied to block-scope identifiers for objects, and both cause those objects to have automatic duration, which they already have.  The register keyword does have one special effect, though: any object declared using this keyword cannot have its address taken with the unary & operator.  (Also, in some particularly dumb compilers, or compilers told not to put any work into optimization, the keyword can be used to attempt some manual optimization.  In general, though, it works better just to turn on optimization in a good compiler.)
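
A minimal sketch of register in use; sum_to() is a made-up example.  The forbidden address-of is left in a comment, since uncommenting it would stop the code from compiling.

```c
int sum_to(int n)
{
    register int i;         /* automatic duration, no linkage */
    int total = 0;

    for (i = 1; i <= n; i++)
        total += i;
    /* int *p = &i;   <- not allowed: i is declared register */
    return total;
}
```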

We can put all of this together to make up a table that describes the effect of each keyword.  Note that only one keyword is allowed for any given declaration (even if the declaration declares multiple identifiers), auto and register cannot be applied to functions, and the question about definition-ness only applies to variables, not functions.

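Effects of storage-class specifier keywords

+------------------------+-----------+-------------------------+-------------------------+
| keyword                | duration  | linkage                 | definition?             |
+------------------------+-----------+-------------------------+-------------------------+
| auto (file scope)      | error     | error                   | error                   |
| auto (block scope)     | automatic | none                    | yes                     |
+------------------------+-----------+-------------------------+-------------------------+
| extern (file scope)    | static    | varies                  | tentative is suppressed |
| extern (block scope)   | static    | varies                  | no (cannot initialize)  |
+------------------------+-----------+-------------------------+-------------------------+
| register (file scope)  | error     | error                   | error                   |
| register (block scope) | automatic | none                    | yes                     |
+------------------------+-----------+-------------------------+-------------------------+
| static (file scope)    | static    | internal                | tentative or yes        |
| static (block scope)   | static    | none                    | yes                     |
+------------------------+-----------+-------------------------+-------------------------+
| –none– (file scope)    | static    | varies [see note below] | tentative or yes        |
| –none– (block scope)   | automatic | none                    | yes                     |
+------------------------+-----------+-------------------------+-------------------------+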

In general, the real purpose of the extern keyword is to suppress tentative definitions; its effect on linkage is the peculiar part.  A declaration with extern looks up any visible previous declaration of that identifier: if there is one, and it has internal or external linkage, the new declaration gets that same linkage; otherwise the identifier gets external linkage.  For no particularly good reason, if you omit the extern keyword when declaring a function, the linkage is exactly the same as if you had included it.  If you omit the extern keyword when declaring a file-scope variable, however, the linkage is always external.  So if a visible previous declaration of a variable is marked as having internal linkage, the extern keyword gives this declaration internal linkage too, as if you had written static instead, while leaving the keyword out gives it external linkage.

Let me repeat this, because it is so bizarre: for a file-scope variable, the extern keyword can give the identifier internal linkage.  Omitting it can give that identifier external linkage.  So adding extern can remove external linkage!

There is no good reason for this state of affairs.  The rules appear to be pointlessly complicated.  Moreover, in the only case in which adding the extern keyword gives a file-scope variable identifier internal linkage, omitting the keyword produces undefined behavior.  In particular, the code fragment:
    static int x;
    extern int x;
    int x;
has undefined behavior: the first two lines give x internal linkage, but the third gives x external linkage.  Peculiarly:
    static int f(void);
    extern int f(void);
    int f(void);
is OK, as all three lines give f internal linkage.  (Stylistically, it is best to just use the static keyword every time.  I have no idea why this ‘extern means static’ rule even exists.)
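
For comparison, here is a sketch that stays on the legal side of these rules.  It keeps the first two declarations of x, drops the troublesome third, and gives f a body; call_f() is a hypothetical wrapper added for illustration.

```c
static int x = 5;           /* internal linkage */
extern int x;               /* inherits internal linkage from the line above */

static int f(void);         /* internal linkage */
extern int f(void);         /* still internal */
int f(void)                 /* for a function, no keyword acts like extern */
{
    return x + 1;
}

int call_f(void)            /* hypothetical wrapper with external linkage */
{
    return f();
}
```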

more examples go here.  (begin leftover text from usenet posting)

"Varies" means "external unless currently shown to be internal".
The "currently shown" part is tricky at block scope, since it
depends on whether a previous file-scope declaration has been
obscured by a block-scope declaration for the same identifier.

Note that the "auto" keyword is redundant: it is illegal at file
scope, and at block scope, it does the same thing you would get
if you did not use a storage-class specifier keyword.

>The compiler ultimately assumes that your "extern int b" must be
>referring to a different "b" than the one you declare and define in
>main(), i.e. one that has storage class extern and external linkage;
>the compiler can't find such a variable and complains accordingly.

This part is correct.

Now on to the second example...

>> int b = 2; // line 1
>> int main( void )
>> {
>>     if ( a ) {
>>         extern int b; //line 5
>>         b++;
>>     }
>>     return 0;
>> }
>>
>> In contrast to the last example, integer "b" is now defined globally.
>> What linkage type is object "b" now? According to ISO/IEC 9899:1999
>> §6.2.2.4, it should adopt the linkage type of the prior declaration of
>> the same identifier, namely internal linkage as used in line 1. Right?

>Note: Changed "extern b" to "extern int b"
>
>Not quite. First off, in line 1 your "int b" has external linkage: A
>variable defined in file-scope and which isn't "static" has external
>linkage by default.

Right. Look in the table above: no keyword, file scope: we get static
duration, external linkage, and "tentative or yes" definition. The
declaration includes an initial value so the "definition" answer is
"yes".

On line 5, we have extern at block scope, so (per the table) we get
static duration, "varies" linkage, and "no" for definition.  There
is no initializer (one would be illegal here anyway) and no
tentative definition, so this is purely a declaration.  The
file-scope "b" from line 1 is still visible and has external
linkage, so line 5 gets external linkage too and refers to that
same "b".
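
A compilable sketch of this second example, under two assumptions of mine: the undeclared "a" becomes a parameter, and main() is replaced by a hypothetical function bump_b() so the result can be inspected.

```c
int b = 2;                  /* file scope, no keyword: external linkage */

int bump_b(int a)           /* hypothetical stand-in for main() */
{
    if (a) {
        extern int b;       /* block scope: refers to the file-scope b */
        b++;
    }
    return b;
}
```

The block-scope declaration creates no new object; the increment lands on the one b defined at file scope.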

For a third and fourth example, consider:

static int x = 42;      /* declaration 1 (also definition) of x */

int main(void) {
    int a = 0;
    {
        extern int x;   /* decl 2: iffy, but legal */
        x++;
    }
}

The "extern int x" line again refers to this entry:

  keyword                duration    linkage    definition?
+----------------------+-----------+----------+--------------------+
| extern (file scope)  | static    | varies   | suppress tentative |
|        (block scope) | static    | varies   | no (cannot init.)  |
+----------------------+-----------+----------+--------------------+

It is at block scope so the duration is "static" and definition-ness
is "no", but as before, the linkage is "varies". We must play
compiler and ask "is there a previous x in some visible scope that
has some linkage?" The answer is yes: the previous file-scope
"static int x" is visible, and has internal linkage. So here "x"
gets internal linkage, and this refers to the same "x" as the
"static int x = 42".

If we change "int a = 0;" to "int x = 0;" in main(), however, we get:

static int x = 42;      /* decl 1 (also definition) */

int main(void) {
    int x = 0;          /* decl 2 (also definition) */
    {
        extern int x;   /* decl 3: ERROR */
        x++;
    }
}

Here the only visible "x" at the "extern int x" line is the
block-scope "x", which has no linkage. Thus, "extern int x" gives
the third declaration of "x" external linkage, in spite of the
first declaration giving it internal linkage. This triggers
undefined behavior (paragraph 7 of section 6.1.2.2 of the C99 draft
I keep handy).