24.5. Indentation for Programs

The best way to keep a program properly indented is to use Emacs to reindent it as you change it. Emacs has commands to indent properly either a single line, a specified number of lines, or all of the lines inside a single parenthetical grouping.

Emacs also provides a Lisp pretty-printer in the library pp. This program reformats a Lisp object with indentation chosen to look nice.
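As a minimal sketch of how the pretty-printer can be called from Lisp, the following uses the functions pp and pp-to-string from the pp library. The sample forms and the name my-example are made up for illustration only.

```elisp
(require 'pp)

;; Return the pretty-printed text of a Lisp object as a string.
;; `my-example' is a hypothetical function name used only as sample data.
(pp-to-string '(defun my-example (x y) (if (> x y) (list x y) (list y x))))

;; Pretty-print another object to standard output.
(pp '(let ((a 1) (b 2)) (+ a b)))
```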

24.5.1. Basic Program Indentation Commands

TAB

Adjust indentation of current line.

C-j

Equivalent to RET followed by TAB (newline-and-indent).

The basic indentation command is TAB, which gives the current line the correct indentation as determined from the previous lines. The function that TAB runs depends on the major mode; it is lisp-indent-line in Lisp mode, c-indent-line in C mode, etc. These functions understand different syntaxes for different languages, but they all do about the same thing. TAB in any programming-language major mode inserts or deletes whitespace at the beginning of the current line, independent of where point is in the line. If point is inside the whitespace at the beginning of the line, TAB leaves it at the end of that whitespace; otherwise, TAB leaves point fixed with respect to the characters around it.

Use C-q TAB to insert a tab at point.

When entering lines of new code, use C-j (newline-and-indent), which is equivalent to a RET followed by a TAB. C-j creates a blank line and then gives it the appropriate indentation.

TAB indents the second and following lines of the body of a parenthetical grouping each under the preceding one; therefore, if you alter one line's indentation to be nonstandard, the lines below will tend to follow it. This behavior is convenient in cases where you have overridden the standard result of TAB because you find it unaesthetic for a particular line.

Remember that an open-parenthesis, open-brace or other opening delimiter at the left margin is assumed by Emacs (including the indentation routines) to be the start of a function. Therefore, you must never have an opening delimiter in column zero that is not the beginning of a function, not even inside a string. This restriction is vital for making the indentation commands fast; you must simply accept it. See Section 24.4 for more information on this.

24.5.2. Indenting Several Lines

When you wish to reindent several lines of code which have been altered or moved to a different level in the list structure, you have several commands available.

C-M-q

Reindent all the lines within one list (indent-sexp).

C-u TAB

Shift an entire list rigidly sideways so that its first line is properly indented.

C-M-\

Reindent all lines in the region (indent-region).

You can reindent the contents of a single list by positioning point before the beginning of it and typing C-M-q (indent-sexp in Lisp mode, c-indent-exp in C mode; also bound to other suitable commands in other modes). The indentation of the line the sexp starts on is not changed; therefore, only the relative indentation within the list, and not its position, is changed. To correct the position as well, type a TAB before the C-M-q.

If the relative indentation within a list is correct but the indentation of its first line is not, go to that line and type C-u TAB. TAB with a numeric argument reindents the current line as usual, then reindents by the same amount all the lines in the grouping starting on the current line. In other words, it reindents the whole grouping rigidly as a unit. It is clever, though, and does not alter lines that start inside strings, or C preprocessor lines when in C mode.

Another way to specify the range to be reindented is with the region. The command C-M-\ (indent-region) applies TAB to every line whose first character is between point and mark.
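The same operation is available from Lisp by calling indent-region with explicit region boundaries. The command below is only a sketch; my-reindent-buffer is a hypothetical name, not a command that Emacs provides.

```elisp
;; Hypothetical convenience command, shown only as an illustration.
(defun my-reindent-buffer ()
  "Reindent every line of the current buffer using the major mode's rules."
  (interactive)
  ;; Preserve point while the whole buffer is treated as the region.
  (save-excursion
    (indent-region (point-min) (point-max))))
```

Calling it has the same effect as marking the whole buffer and typing C-M-\.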

24.5.3. Customizing Lisp Indentation

The indentation pattern for a Lisp expression can depend on the function called by the expression. For each Lisp function, you can choose among several predefined patterns of indentation, or define an arbitrary one with a Lisp program.

The standard pattern of indentation is as follows: the second line of the expression is indented under the first argument, if that is on the same line as the beginning of the expression; otherwise, the second line is indented underneath the function name. Each following line is indented under the previous line whose nesting depth is the same.

If the variable lisp-indent-offset is non-nil, it overrides the usual indentation pattern for the second line of an expression, so that such lines are always indented lisp-indent-offset more columns than the containing list.

The standard pattern is overridden for certain functions. Functions whose names start with def always indent the second line by lisp-body-indent extra columns beyond the open-parenthesis starting the expression.

The standard pattern can be overridden in various ways for individual functions, according to the lisp-indent-function property of the function name. There are four possibilities for this property:

nil

This is the same as no property; the standard indentation pattern is used.

defun

The pattern used for function names that start with def is used for this function also.

a number, number

The first number arguments of the function are distinguished arguments; the rest are considered the body of the expression. A line in the expression is indented according to whether the first argument on it is distinguished or not. If the argument is part of the body, the line is indented lisp-body-indent more columns than the open-parenthesis starting the containing expression. If the argument is distinguished and is either the first or second argument, it is indented twice that many extra columns. If the argument is distinguished and not the first or second argument, the standard pattern is followed for that line.

a symbol, symbol

symbol should be a function name; that function is called to calculate the indentation of a line within this expression. The function receives two arguments:

state

The value returned by parse-partial-sexp (a Lisp primitive for indentation and nesting computation) when it parses up to the beginning of this line.

pos

The position at which the line being indented begins.

It should return either a number, which is the number of columns of indentation for that line, or a list whose car is such a number. The difference between returning a number and returning a list is that a number says that all following lines at the same nesting level should be indented just like this one; a list says that following lines might call for different indentations. This makes a difference when the indentation is being computed by C-M-q; if the value is a number, C-M-q need not recalculate indentation for the following lines until the end of the list.
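The first three kinds of property value can be set with put, typically in an init file. The following is a sketch under assumptions: my-with-resource and my-define-widget are hypothetical names that stand in for your own macros or functions.

```elisp
;; `my-with-resource' is a hypothetical name used only for illustration.
;; Treat its first argument as distinguished and the rest as the body.
(put 'my-with-resource 'lisp-indent-function 1)

;; Indent calls to the hypothetical `my-define-widget' the way forms
;; whose names start with "def" are indented.
(put 'my-define-widget 'lisp-indent-function 'defun)

;; Setting the property to nil restores the standard pattern.
(put 'my-define-widget 'lisp-indent-function nil)
```

With the first setting, body lines of a my-with-resource form are indented lisp-body-indent columns beyond its open-parenthesis, as described for numeric values above.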

24.5.4. Commands for C Indentation

Here are the commands for indentation in C mode and related modes:

C-c C-q

Reindent the current top-level function definition or aggregate type declaration (c-indent-defun).

C-M-q

Reindent each line in the balanced expression that follows point (c-indent-exp). A prefix argument inhibits error checking and warning messages about invalid syntax.

TAB

Reindent the current line, and/or in some cases insert a tab character (c-indent-command).

If c-tab-always-indent is t, this command always reindents the current line and does nothing else. This is the default.

If that variable is nil, this command reindents the current line only if point is at the left margin or in the line's indentation; otherwise, it inserts a tab (or the equivalent number of spaces, if indent-tabs-mode is nil).

Any other value (not nil or t) means always reindent the line, and also insert a tab if within a comment, a string, or a preprocessor directive.

C-u TAB

Reindent the current line according to its syntax; also rigidly reindent any other lines of the expression that starts on the current line. See Section 24.5.2.

To reindent the whole current buffer, type C-x h C-M-\. This first selects the whole buffer as the region, then reindents that region.

To reindent the current block, use C-M-u C-M-q. This moves to the front of the block and then reindents it all.
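The behavior of TAB described above depends on c-tab-always-indent, which can be set in an init file. This one-line sketch assumes you want the nil behavior; choose the value that suits you.

```elisp
;; Assumed preference: reindent only when point is at the left margin or
;; in the line's indentation; elsewhere, TAB inserts a tab.
(setq c-tab-always-indent nil)
```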

24.5.5. Customizing C Indentation

C mode and related modes use a simple yet flexible mechanism for customizing indentation. The mechanism works in two steps: first it classifies the line syntactically according to its contents and context; second, it associates each kind of syntactic construct with an indentation offset which you can customize.

24.5.5.1. Step 1--Syntactic Analysis

In the first step, the C indentation mechanism looks at the line before the one you are currently indenting and determines the syntactic components of the construct on that line. It builds a list of these syntactic components, each of which contains a syntactic symbol and sometimes also a buffer position. Some syntactic symbols describe grammatical elements, for example statement and substatement; others describe locations amidst grammatical elements, for example class-open and knr-argdecl.

Conceptually, a line of C code is always indented relative to the indentation of some line higher up in the buffer. This is represented by the buffer positions in the syntactic component list.

Here is an example. Suppose we have the following code in a C++ mode buffer (the line numbers don't actually appear in the buffer):

1: void swap (int& a, int& b)
2: {
3:   int tmp = a;
4:   a = b;
5:   b = tmp;
6: }

If you type C-c C-s (which runs the command c-show-syntactic-information) on line 4, it shows the result of the indentation mechanism for that line:

((statement . 32))

This indicates that the line is a statement and it is indented relative to buffer position 32, which happens to be the i in int on line 3. If you move the cursor to line 3 and type C-c C-s, it displays this:

((defun-block-intro . 28))

This indicates that the int line is the first statement in a block, and is indented relative to buffer position 28, which is the brace just after the function header.

Here is another example:

1: int add (int val, int incr, int doit)
2: {
3:   if (doit)
4:     {
5:       return (val + incr);
6:     }
7:   return (val);
8: }

Typing C-c C-s on line 4 displays this:

((substatement-open . 43))

This says that the brace opens a substatement block. By the way, a substatement indicates the line after an if, else, while, do, switch, for, try, catch, finally, or synchronized statement.

Within the C indentation commands, after a line has been analyzed syntactically for indentation, the variable c-syntactic-context contains a list that describes the results. Each element in this list is a syntactic component: a cons cell containing a syntactic symbol and (optionally) its corresponding buffer position. There may be several elements in a component list; typically only one element has a buffer position.

24.5.5.2. Step 2--Indentation Calculation

The C indentation mechanism calculates the indentation for the current line using the list of syntactic components, c-syntactic-context, derived from syntactic analysis. Each component is a cons cell that contains a syntactic symbol and may also contain a buffer position.

Each component contributes to the final total indentation of the line in two ways. First, the syntactic symbol identifies an element of c-offsets-alist, which is an association list mapping syntactic symbols into indentation offsets. Each syntactic symbol's offset adds to the total indentation. Second, if the component includes a buffer position, the column number of that position adds to the indentation. All these offsets and column numbers, added together, give the total indentation.

The following examples demonstrate the workings of the C indentation mechanism:

1: void swap (int& a, int& b)
2: {
3:   int tmp = a;
4:   a = b;
5:   b = tmp;
6: }

Suppose that point is on line 3 and you type TAB to reindent the line. As explained above (Section 24.5.5.1), the syntactic component list for that line is:

((defun-block-intro . 28))

In this case, the indentation calculation first looks up defun-block-intro in the c-offsets-alist alist. Suppose that it finds the integer 2; it adds this to the running total (initialized to zero), yielding an updated total indentation of 2 spaces.

The next step is to find the column number of buffer position 28. Since the brace at buffer position 28 is in column zero, this adds 0 to the running total. Since this line has only one syntactic component, the total indentation for the line is 2 spaces.

1: int add (int val, int incr, int doit)
2: {
3:   if (doit)
4:     {
5:       return(val + incr);
6:     }
7:   return(val);
8: }

If you type TAB on line 4, the same process is performed, but with different data. The syntactic component list for this line is:

((substatement-open . 43))

Here, the indentation calculation's first job is to look up the symbol substatement-open in c-offsets-alist. Let's assume that the offset for this symbol is 2. At this point the running total is 2 (0 + 2 = 2). Then it adds the column number of buffer position 43, which is the i in if on line 3. This character is in column 2 on that line. Adding this yields a total indentation of 4 spaces.
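The arithmetic in these two examples can be sketched in a few lines of Lisp. This is not the code CC Mode actually runs: my-sketch-total-indentation is a hypothetical helper, it handles only integer offsets, and the column of each buffer position is supplied by the caller instead of being read from a buffer.

```elisp
;; Hypothetical helper, not part of CC Mode: sums the offset of each
;; syntactic symbol and the column of each anchor position.
(defun my-sketch-total-indentation (context offsets column-of)
  "Return the total indentation for CONTEXT.
CONTEXT is a list of (SYMBOL . POSITION) components, OFFSETS is an
alist of (SYMBOL . INTEGER), and COLUMN-OF is a function that maps a
buffer position to its column."
  (let ((total 0))
    (dolist (component context total)
      (let ((offset (cdr (assq (car component) offsets)))
            (position (cdr component)))
        ;; A symbol missing from OFFSETS contributes nothing.
        (when offset
          (setq total (+ total offset)))
        ;; A component with a buffer position adds that position's column.
        (when position
          (setq total (+ total (funcall column-of position))))))))

;; Line 4 of the second example: substatement-open anchored at position
;; 43, which is in column 2, with an assumed offset of 2.
(my-sketch-total-indentation
 '((substatement-open . 43))
 '((substatement-open . 2) (defun-block-intro . 2))
 (lambda (_position) 2))
```

The final call adds the offset 2 to the column 2, matching the total of 4 spaces worked out above.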

If a syntactic symbol in the analysis of a line does not appear in c-offsets-alist, it is ignored; if in addition the variable c-strict-syntax-p is non-nil, it is an error.

24.5.5.3. Changing Indentation Style

There are two ways to customize the indentation style for the C-like modes. First, you can select one of several predefined styles, each of which specifies offsets for all the syntactic symbols. For more flexibility, you can customize the handling of individual syntactic symbols. See Section 24.5.5.4 for a list of all defined syntactic symbols.

M-x c-set-style RET style RET

Select predefined indentation style style. Type ? when entering style to see a list of supported styles; to find out what a style looks like, select it and reindent some C code.

C-c C-o symbol RET offset RET

Set the indentation offset for syntactic symbol symbol (c-set-offset). The second argument offset specifies the new indentation offset.

The c-offsets-alist variable controls the amount of indentation to give to each syntactic symbol. Its value is an association list, and each element of the list has the form (syntactic-symbol . offset). By changing the offsets for various syntactic symbols, you can customize indentation in fine detail. To change this alist, use c-set-offset (see below).

Each offset value in c-offsets-alist can be an integer, a function or variable name, a list, or one of the following symbols: +, -, ++, --, *, or /, indicating positive or negative multiples of the variable c-basic-offset. Thus, if you want to change the levels of indentation to be 3 spaces instead of 2 spaces, set c-basic-offset to 3.

Using a function as the offset value provides the ultimate flexibility in customizing indentation. The function is called with a single argument containing the cons of the syntactic symbol and the buffer position, if any. The function should return an integer offset.

If the offset value is a list, its elements are processed according to the rules above until a non-nil value is found. That value is then added to the total indentation in the normal manner. The primary use for this is to combine the results of several functions.

The command C-c C-o (c-set-offset) is the easiest way to set offsets, either interactively or in your ~/.emacs file. First specify the syntactic symbol, then the offset you want. See Section 24.5.5.4 for a list of valid syntactic symbols and their meanings.
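In an init file, calls to c-set-offset might look like the sketch below. It assumes the calls are made from c-mode-common-hook so that they run after the C-like mode has been set up; my-c-offsets is a hypothetical function name, and the particular offsets are illustrative choices, not recommendations.

```elisp
;; A sketch for an init file; `my-c-offsets' is a hypothetical name.
(defun my-c-offsets ()
  "Adjust a few C indentation offsets in the current buffer."
  ;; Make each indentation level 3 columns instead of 2.
  (setq c-basic-offset 3)
  ;; Put the brace that opens a substatement block flush with its statement.
  (c-set-offset 'substatement-open 0)
  ;; Indent case labels by one multiple of `c-basic-offset'.
  (c-set-offset 'case-label '+))

;; Assumption: run the adjustments whenever a C-like mode is entered.
(add-hook 'c-mode-common-hook #'my-c-offsets)
```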

24.5.5.4. Syntactic Symbols

Here is a table of valid syntactic symbols for indentation in C and related modes, with their syntactic meanings. Normally, most of these symbols are assigned offsets in c-offsets-alist.

string

Inside a multi-line string.

c

Inside a multi-line C style block comment.

defun-open

On a brace that opens a function definition.

defun-close

On a brace that closes a function definition.

defun-block-intro

In the first line in a top-level defun.

class-open

On a brace that opens a class definition.

class-close

On a brace that closes a class definition.

inline-open

On a brace that opens an in-class inline method.

inline-close

On a brace that closes an in-class inline method.

extern-lang-open

On a brace that opens an external language block.

extern-lang-close

On a brace that closes an external language block.

func-decl-cont

The region between a function definition's argument list and the defun opening brace (excluding K&R function definitions). In C, you cannot put anything but whitespace and comments between them; in C++ and Java, throws declarations and other things can appear in this context.

knr-argdecl-intro

On the first line of a K&R C argument declaration.

knr-argdecl

In one of the subsequent lines in a K&R C argument declaration.

topmost-intro

On the first line in a topmost construct definition.

topmost-intro-cont

On the topmost definition continuation lines.

member-init-intro

On the first line in a member initialization list.

member-init-cont

On one of the subsequent member initialization list lines.

inher-intro

On the first line of a multiple inheritance list.

inher-cont

On one of the subsequent multiple inheritance lines.

block-open

On a statement block open brace.

block-close

On a statement block close brace.

brace-list-open

On the opening brace of an enum or static array list.

brace-list-close

On the closing brace of an enum or static array list.

brace-list-intro

On the first line in an enum or static array list.

brace-list-entry

On one of the subsequent lines in an enum or static array list.

brace-entry-open

On one of the subsequent lines in an enum or static array list, when the line begins with an open brace.

statement

On an ordinary statement.

statement-cont

On a continuation line of a statement.

statement-block-intro

On the first line in a new statement block.

statement-case-intro

On the first line in a case "block."

statement-case-open

On the first line in a case block starting with brace.

inexpr-statement

On a statement block inside an expression. This is used for a GNU extension to the C language, and for Pike special functions that take a statement block as an argument.

inexpr-class

On a class definition inside an expression. This is used for anonymous classes and anonymous array initializers in Java.

substatement

On the first line after an if, while, for, do, or else.

substatement-open

On the brace that opens a substatement block.

case-label

On a case or default label.

access-label

On a C++ private, protected, or public access label.

label

On any ordinary label.

do-while-closure

On the while that ends a do-while construct.

else-clause

On the else of an if-else construct.

catch-clause

On the catch and finally lines in try...catch constructs in C++ and Java.

comment-intro

On a line containing only a comment introduction.

arglist-intro

On the first line in an argument list.

arglist-cont

On one of the subsequent argument list lines when no arguments follow on the same line as the arglist opening parenthesis.

arglist-cont-nonempty

On one of the subsequent argument list lines when at least one argument follows on the same line as the arglist opening parenthesis.

arglist-close

On the closing parenthesis of an argument list.

stream-op

On one of the lines continuing a stream operator construct.

inclass

On a construct that is nested inside a class definition. The indentation is relative to the open brace of the class definition.

inextern-lang

On a construct that is nested inside an external language block.

inexpr-statement

On the first line of a statement block inside an expression. This is used for the GCC extension to C that uses the syntax ({ … }). It is also used for the special functions that take a statement block as an argument in Pike.

inexpr-class

On the first line of a class definition inside an expression. This is used for anonymous classes and anonymous array initializers in Java.

cpp-macro

On the start of a cpp macro.

friend

On a C++ friend declaration.

objc-method-intro

On the first line of an Objective-C method definition.

objc-method-args-cont

On one of the lines continuing an Objective-C method definition.

objc-method-call-cont

On one of the lines continuing an Objective-C method call.

inlambda

Like inclass, but used inside lambda (i.e. anonymous) functions. Only used in Pike.

lambda-intro-cont

On a line continuing the header of a lambda function, between the lambda keyword and the function body. Only used in Pike.

24.5.5.5. Variables for C Indentation

This section describes additional variables which control the indentation behavior of C mode and related modes.

c-offsets-alist

Association list of syntactic symbols and their indentation offsets. You should not set this directly, only with c-set-offset. See Section 24.5.5.3 for details.

c-style-alist

Variable for defining indentation styles; see below.

c-basic-offset

Amount of basic offset used by + and - symbols in c-offsets-alist.

c-special-indent-hook

Hook for user-defined special indentation adjustments. This hook is called after a line is indented by C mode and related modes.

The variable c-style-alist specifies the predefined indentation styles. Each element has form (name variable-setting…), where name is the name of the style. Each variable-setting has the form (variable . value); variable is one of the customization variables used by C mode, and value is the value for that variable when using the selected style.

When variable is c-offsets-alist, that is a special case: value is added to the front of the value of c-offsets-alist instead of replacing that value outright. Therefore, it is not necessary for value to specify each and every syntactic symbol--only those for which the style differs from the default.

The indentation of lines containing only comments is also affected by the variable c-comment-only-line-offset (see Section 24.19.5).

24.5.5.6. C Indentation Styles

A C style is a collection of indentation style customizations. Emacs comes with several predefined indentation styles for C and related modes, including gnu, k&r, bsd, stroustrup, linux, python, java, whitesmith, ellemtel, cc-mode, and user.

To choose the style you want, use the command M-x c-set-style. Specify a style name as an argument (case is not significant in C style names). The chosen style only affects newly visited buffers, not those you are already editing. You can also set the variable c-default-style to specify the style for various major modes. Its value should be an alist, in which each element specifies one major mode and which indentation style to use for it. For example,

(setq c-default-style
      '((java-mode . "java") (other . "gnu")))

specifies an explicit choice for Java mode, and the default gnu style for the other C-like modes.

The style gnu defines the formatting recommended by the GNU Project; it is the default, so as to encourage the indentation we recommend. However, if you make changes in variables such as c-basic-offset and c-offsets-alist in your ~/.emacs file, your changes override what the gnu style says.

To define a new C indentation style, call the function c-add-style:

(c-add-style name values use-now)

Here name is the name of the new style (a string), and values is an alist whose elements have the form (variable . value). The variables you specify should be among those documented in Section 24.5.5.5.

If use-now is non-nil, c-add-style selects the new style after defining it.
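A call to c-add-style might look like the following sketch. The style name "my-project" and the values chosen are hypothetical; only the variables c-basic-offset and c-offsets-alist and the syntactic symbols come from the sections above.

```elisp
(require 'cc-mode)

;; "my-project" is a hypothetical style name; the values are illustrative.
(c-add-style "my-project"
             '((c-basic-offset . 4)
               ;; Only the symbols that differ from the default are listed.
               (c-offsets-alist . ((substatement-open . 0)
                                   (case-label . +))))
             nil) ; nil for use-now: define the style without selecting it
```

Once defined, the style can be selected by name with M-x c-set-style, or listed in c-default-style.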