Emacs Key Syntax Explained
This page explains the several confusing representations of non-printable characters, such as \n \r \t \f, and ^J ^M ^I ^L, and C-m RET <return> Enter ^m control-m 13 (?\C-m), and how to type non-printable characters in emacs.
The issues involved are:
- Printable representation of non-printable characters (such as ^j for the line feed character).
- Input method of non-printable characters (such as pressing Ctrl+q Ctrl+j for inserting a line feed char).
- Key notation for a non-printable character's input method (such as (kbd "C-q C-j")).
- Printable representation of non-printable characters in programing language strings (such as "\n").
Summary
- C-q is a notation for the keyboard shortcut Ctrl+q. The command invoked by that shortcut is quoted-insert. The quoted-insert command lets you insert a character and suppress the corresponding key's normal function. For example, in the minibuffer, type Ctrl+q Tab to insert a literal tab (otherwise Tab will usually do completion).
- The C-j in C-q C-j is for inputting the ASCII Line Feed char (used in unix as the newline char), which does not have a corresponding key on the keyboard. C-j stands for Line Feed because of a notation that goes with the ASCII standard: each non-printable char is represented by Control followed by a letter, regardless of whether there is a corresponding key on the keyboard. Line Feed has ASCII code 10, and j is the 10th letter, so Line Feed is C-j. Similarly, C-i is for Horizontal Tab, C-m is for Carriage Return, C-[ is for Escape, C-l is for Form Feed, etc.
- The ^M is a display notation for Carriage Return, similar to C-m. The caret ^ is used because, by a convention that goes with ASCII, ^ is the display representation of a Control sequence. Similarly, ^I is for Horizontal Tab, ^J is for Line Feed, ^[ is for Escape, etc.
- The \e, \r, \n, \t are representations of non-printable ASCII chars inside a string in a programing language. This is a C language convention. Emacs lisp adopted this notation.
- The (global-set-key [(control b)] 'cmd) and other variations are emacs's syntax to represent keystrokes in elisp. A syntax for keystrokes is necessary because keys are not ASCII chars (e.g. the F1, F2, Home, PageUp keys). For historical reasons, elisp has several syntaxes to represent the same keystrokes.
Will [schimpan…@gmx.de] wrote:
how can I find an overview on how to enter meta-characters (e.g. esc, return, linefeed, tab, etc.)
(a) in a regular buffer
(b) in the minibuffer when using standard search/replace-functions
(c) in the minibuffer when using search/replace-functions using regular expressions
(d) in the .emacs file when defining keybindings
As far as I can see in all those situations entering meta-characters is addressed in a different way which I find confusing, e.g.
(a) <key> _or_ C-q <key>
(b) C-q C-[, C-q C-m, C-q C-j, C-q C-i
(c) \e, \r, \n, \t
(d) (define-key [(meta c) (control c) (tab c)] “This is confusing!”)
Furthermore, they are displayed in a different way, e.g.
- actual, visible layout
- ^E, ^M, ^L, ^I
- Octals
I would be happy about pages summarizing such information. Any references available?
The following is a detailed explanation.
Suppressing Normal Function of a Key; Literal Data Entry
Your first item:
Ctrl+q ‹key›
Ctrl+q (hold down the Ctrl key, then type q) is the keyboard shortcut to invoke the command quoted-insert. After this command is invoked, your next key press makes emacs insert the character represented by that key, and suppresses that key's normal function.
For example, suppose you are doing string replacement, and you want to replace tabs by returns. When emacs prompts you to type a string to replace, you can't just press the Tab key, because the normal function of the Tab key in emacs is to try command completion (and in other applications, it usually switches you to the next input field). So here you press Ctrl+q first, then press the Tab key. Similarly, you can't press the Return key and expect it to insert a newline character, because normally the Return key will activate the OK button or signal “end of input”.
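The two bindings described above can be inspected from Lisp. This is a minimal sketch, assuming a default configuration in which neither key has been rebound:

```elisp
;; Which command does Ctrl+q run in the global keymap?
(lookup-key global-map (kbd "C-q"))

;; Which command does Tab run in a completing minibuffer?
;; This is the "normal function" that Ctrl+q suppresses.
(lookup-key minibuffer-local-completion-map (kbd "TAB"))
```

In a stock emacs, the first form should evaluate to quoted-insert and the second to minibuffer-complete. A customized setup may report something else.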
This input mechanism usually doesn't exist in other text editors. In other text editors, when you want to enter the ASCII Tab character or Carriage Return character in some pop-up dialog, you often use a special representation such as \t or \r instead. Or, sometimes, you hold down the mouse button, then press the key. Or, they simply provide a graphical menu or check box to let you select the special characters. Literal character input is frequently needed in keyboard macro apps. (See: Mac OS X Keyboard Software • Windows Keyboard Software.)
Data Entry for Non-printable Chars
Ctrl+q Ctrl+[, Ctrl+q Ctrl+m, Ctrl+q Ctrl+j, Ctrl+q Ctrl+i
Here, the key-press combinations Ctrl+[, Ctrl+m, Ctrl+j, etc. are methods to input non-printable characters that may not have a corresponding key on the keyboard.
For example, suppose you want to do string replacement, replacing Carriage Return (ASCII 13) by Line Feed (ASCII 10). Depending on your operating system and software, your keyboard usually has a key that corresponds to just one of these characters. But with this special method to input non-printable characters, you can insert any of them.
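The same replacement can also be written in elisp, where the two characters are spelled with string escapes instead of being typed with Ctrl+q. This is only a sketch: the function name my-cr-to-lf is hypothetical, made up for this example.

```elisp
(defun my-cr-to-lf ()
  "Replace every Carriage Return in the buffer with a Line Feed.
`my-cr-to-lf' is a hypothetical name used only for this example."
  (interactive)
  (save-excursion
    (goto-char (point-min))
    ;; "\r" is ASCII 13, the char typed interactively as Ctrl+q Ctrl+m
    (while (search-forward "\r" nil t)
      ;; "\n" is ASCII 10, the char typed interactively as Ctrl+q Ctrl+j
      (replace-match "\n" t t))))

;; Try it on a scratch buffer so no real file is touched.
(with-temp-buffer
  (insert "one\rtwo\rthree")
  (my-cr-to-lf)
  (buffer-string))
```

The last form returns the text with each Carriage Return turned into a Line Feed, that is, "one\ntwo\nthree".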
Display Representation of Non-printable Chars
When speaking of non-printable characters, some standard character set is implied by the context. Here we are implicitly talking about ASCII, and this applies to emacs. In ASCII, there are 33 non-printable (control) characters: codes 0 to 31, plus 127. Each of these is given a standard abbreviation, and several representations for different purposes. For example, ASCII 13 is the “Carriage Return” character, with standard abbreviation CR, and “^M” as its control-key-input representation (M being the 13th letter of the English alphabet). Ctrl+m is emacs's convention to input the character, and the conventional way to write a control key combination is the caret “^” followed by the character.
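The relation between a control character's code and its caret letter is simple arithmetic: flipping bit 64 of the code gives the character shown after the caret. Here is a small sketch of that rule. The helper my-caret-notation is hypothetical; text-char-description is emacs's own function for this display notation.

```elisp
(defun my-caret-notation (code)
  "Return the caret notation for the ASCII control character CODE.
CODE should be in the range 0..31, or 127.  Hypothetical helper."
  (format "^%c" (logxor code 64)))

(my-caret-notation 13)        ; "^M", Carriage Return
(my-caret-notation 10)        ; "^J", Line Feed
(my-caret-notation 27)        ; "^[", Escape

;; The built-in function gives the same notation.
(text-char-description 13)    ; "^M"
```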
For full detail, look at the non-printable ASCII chars table.
(Note: Emacs has several input methods to enter any non-printable chars in Unicode. See Emacs: Insert Unicode Character.)
String Representation of Non-printable Chars in Programing Languages
\e, \r, \n, \t
This is an ad hoc set of input and display representations for a few non-printable characters, used primarily in programing languages. This set most likely started with the C language, and is today a de facto standard used in {C++, C#, Java, Perl, Python, PHP, JavaScript, emacs lisp, etc}.
There are good reasons that these are preferred over a literal char or the more systematic caret notation. Here are some reasons:
- In programing and text processing, it turns out that only a few of the non-printable chars are particularly useful, e.g. {Line Feed, Carriage Return, Horizontal Tab}.
- The representations {\r, \n, \t} for {Carriage Return, Line Feed, Horizontal Tab} are much simpler and easier to remember than the alphabet-order based caret notation {^M, ^J, ^I}.
- In a programing language, inside a string, it is often preferable to use a visible glyph to represent a non-printable char. For example, "\t" for the invisible " ". This means we need a way to encode non-printable and control chars inside a data string. This is known as an “escape mechanism”. For an escape mechanism, "\t" is preferable over the "^i" notation for the non-printable ASCII Horizontal Tab character, because with the caret notation it is hard to remember which is which, and it is also ambiguous with a literal ^ followed by the letter i.
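The difference between the escape and a literal caret pair can be checked directly in elisp. A short sketch, using only built-in functions:

```elisp
;; Each backslash escape inside a string is a single character.
(string-to-list "\e\r\n\t")   ; (27 13 10 9)

(length "\t")                 ; 1: one Horizontal Tab character
(length "^i")                 ; 2: a caret followed by the letter i

;; "\t" is the character with code 9.
(string= "\t" (string 9))     ; t
```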
Syntax for Keystrokes
In the above, we discussed four aspects of non-printable chars:
- Their printable notation.
- Their input method.
- The notation of their input method.
- Their notation in a computer language's strings.
However, emacs also needs a system to represent keystrokes (as used in its keyboard macro system and keybinding).
Keystroke notation is not just a sequence of characters. For example, the F1 key isn't a character. The Alt modifier key isn't a character, nor does it correspond to any of ASCII's non-printable characters. There are also key combinations (e.g. Ctrl+Alt+↑) and key sequences (e.g. Ctrl+h f). The keys on the number keypad need a different representation than the ones on the main keyboard section.
Emacs's key notation is rather confusing, for historical reasons dating from the 1980s.
Here are examples of multiple representation for the same keystroke (tested in emacs 22):
;; multiple representation for the same key stroke, tested in emacs 22

; equivalent code for a single keystroke
(global-set-key "b" 'backward-char)
(global-set-key [98] 'backward-char)
(global-set-key [?b] 'backward-char)
(global-set-key [(?b)] 'backward-char)
(global-set-key (kbd "b") 'backward-char)

; equivalent code for a named special key: Enter
(global-set-key "\r" 'backward-char)
(global-set-key [?\r] 'backward-char)
(global-set-key [13] 'backward-char)
(global-set-key [(13)] 'backward-char)
(global-set-key [return] 'backward-char)
(global-set-key [?\^M] 'backward-char)
(global-set-key [?\^m] 'backward-char)
(global-set-key [?\C-M] 'backward-char)
(global-set-key [?\C-m] 'backward-char)
(global-set-key [(?\C-m)] 'backward-char)
(global-set-key (kbd "<return>") 'backward-char)
(global-set-key (kbd "RET") 'backward-char)

; equivalent code for binding 1 mod key + 1 letter key: Meta+b
(global-set-key "\M-b" 'backward-char)
(global-set-key [?\M-b] 'backward-char)
(global-set-key [(meta 98)] 'backward-char)
(global-set-key [(meta b)] 'backward-char)
(global-set-key [(meta ?b)] 'backward-char)
(global-set-key (kbd "M-b") 'backward-char)

; equivalent code for binding 1 mod key + 1 special key: Meta+Enter
(global-set-key [M-return] 'backward-char)
(global-set-key [\M-return] 'backward-char)
(global-set-key [(meta return)] 'backward-char)
(global-set-key (kbd "M-<return>") 'backward-char)

; equivalent code for binding Meta + cap letter key: Meta Shift b
(global-set-key (kbd "M-B") 'backward-char)
(global-set-key "\M-\S-b" 'backward-char)
(global-set-key "\S-\M-b" 'backward-char)
(global-set-key "\M-B" 'backward-char)
(global-set-key [?\M-S-b] 'backward-char) ; invalid-read-syntax
(global-set-key [?\M-?\S-b] 'backward-char) ; invalid-read-syntax
(global-set-key [?\M-\S-b] 'backward-char) ; compile but no effect
(global-set-key [?\M-B] 'backward-char)
(global-set-key [\M-B] 'backward-char) ; compile but no effect
(global-set-key [(meta shift b)] 'backward-char)
(global-set-key [(shift meta b)] 'backward-char)
(global-set-key (kbd "M-B") 'backward-char)
(global-set-key (kbd "M-S-b") 'backward-char) ; compile but no effect

; Meta + shifted symbol key.
(global-set-key (kbd "M-@") 'backward-char) ; good
(global-set-key (kbd "M-S-2") 'backward-char) ; compile but no effect
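One way to check whether two notations really name the same key, without touching your global bindings, is to bind one notation in a throwaway keymap and look it up with the other. This is a sketch only: the helper my-same-key-p is hypothetical, and it tests strict identity in a keymap, nothing more.

```elisp
(defun my-same-key-p (key-a key-b)
  "Return t if KEY-A and KEY-B name the same key sequence.
Binds KEY-A in a throwaway keymap, then looks up KEY-B there.
Hypothetical helper, for illustration only."
  (let ((map (make-sparse-keymap)))
    (define-key map key-a 'ignore)
    (eq (lookup-key map key-b) 'ignore)))

(my-same-key-p "b" [98])                  ; t
(my-same-key-p [?\C-m] (kbd "RET"))       ; t
(my-same-key-p [(meta b)] (kbd "M-b"))    ; t
```

One caveat: under this strict test, [return] and (kbd "RET") come out as different sequences. The first is the <return> function key event and the second is the character 13, and emacs translates <return> to RET only when <return> itself has no binding.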
Note: keystroke notation is not a new concept. Here are some examples of syntax from different keyboard related software:
- Mac OS X Keybinding Key Syntax
- AutoHotkey Key Syntax
- Microsoft IntelliType commands.xml Syntax
- Microsoft IntelliType Macros Syntax
- X11's xmodmap: Linux: xmodmap Tutorial
Char as Integers
One of emacs's quirks is that its characters are simply integers.
So, a character “c” is just the integer 99 in emacs lisp.
Now, elisp has a special read syntax for chars, so that the letter “c” in lisp can also be written as ?c instead of 99.
This way, it is easier for programers to insert character data in their program, and easier to read too.
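A few forms to evaluate (for example with Ctrl+x Ctrl+e) that show a char and its integer being the same thing:

```elisp
(= ?c 99)          ; t: ?c and 99 read as the same integer
(integerp ?c)      ; t
(+ ?c 1)           ; 100, the code for the letter d
(string ?c 100)    ; "cd": both arguments are used as characters
```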
A backslash can be added in front of the char, so that ?' can be written as ?\'.
This syntax was introduced in part so that Emacs's editing commands don't get confused (because the apostrophe is lisp syntax to quote symbols).
Many of the control characters in ASCII also have a backslash representation.
Here's a table from the elisp manual:
Character Type (ELISP Manual)
?\a ⇒ 7 ; control-g, C-g
?\b ⇒ 8 ; backspace, <BS>, C-h
?\t ⇒ 9 ; tab, <TAB>, C-i
?\n ⇒ 10 ; newline, C-j
?\v ⇒ 11 ; vertical tab, C-k
?\f ⇒ 12 ; formfeed character, C-l
?\r ⇒ 13 ; carriage return, <RET>, C-m
?\e ⇒ 27 ; escape character, <ESC>, C-[
?\s ⇒ 32 ; space character, <SPC>
?\\ ⇒ 92 ; backslash character, \
?\d ⇒ 127 ; delete character, <DEL>
So, the tab character (ASCII 9) can be written in elisp, as character data, as either 9 or ?\t.
Here is another quote from the manual:
Control characters may be represented using yet another read syntax. This consists of a question mark followed by a backslash, caret, and the corresponding non-control character, in either upper or lower case. For example, both `?\^I' and `?\^i' are valid read syntax for the character C-i, the character whose value is 9.
Instead of the `^', you can use `C-'; thus, `?\C-i' is equivalent to `?\^I' and to `?\^i':
?\^I ⇒ 9
?\C-I ⇒ 9
… The read syntax for meta characters uses `\M-'. For example, `?\M-A' stands for M-A. You can use `\M-' together with octal character codes (see below), with `\C-', or with any other syntax for a character. Thus, you can write M-A as `?\M-A', or as `?\M-\101'. Likewise, you can write C-M-b as `?\M-\C-b', `?\C-\M-b', or `?\M-\002'.
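The equivalences claimed in the quote can be confirmed by comparing the integers that the reader produces. A small sketch; the last line relies on the meta modifier being stored as the 2**27 bit of the character code, which is how current emacs versions represent it:

```elisp
(= ?\M-A ?\M-\101)                ; t: octal 101 is the letter A
(= ?\M-\C-b ?\C-\M-b ?\M-\002)    ; t: three spellings of C-M-b

;; Subtracting the base char leaves only the meta modifier bit.
(= (- ?\M-A ?A) (ash 1 27))       ; t
```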
So now, the tab char can be any of:
9 ?\t ?\^i ?\^I ?\C-i ?\C-I
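That all six spellings read as the same integer can be checked in one line:

```elisp
(= 9 ?\t ?\^i ?\^I ?\C-i ?\C-I)                     ; t
(delete-dups (list 9 ?\t ?\^i ?\^I ?\C-i ?\C-I))    ; (9)
```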
Key Sequence Data Type
Thanks to diszno for a correction on ?t.