A Text Editor Feature: Extend Selection by Semantic Unit

By Xah Lee.

This article introduces a feature of the Mathematica notebook, the editor for the Wolfram Language, that would be useful in any editor and for any language.

In Mathematica, pressing Ctrl+. selects (highlights) the token the cursor is on. Pressing the key again expands the selection to the next larger semantic unit, that is, the smallest unit that encloses the current selection. Each further press extends the selection by one more level.

Example with Mathematica Syntax

[Image: Wolfram Language 2021-05-13]

Here is an example of Mathematica code, with highlights showing how the selection extends: starting at the “n” inside the braces, it grows outward to cover successively higher-level syntactic units.

[Image: syntax highlight Wolfram Language 2021-05-13]

Examples for C-Like Syntax

Here are some examples in a language with C-like syntax (C, C++, C#, Java, JavaScript, and others).

[Image: syntax highlight c 2021-05-13]

Nested Syntax Examples: XML

For a language with nested syntax, suppose we have this XML example:

[Image: syntax highlight html 2021-05-13]

If the cursor is inside a tag's enclosed content, say, on the letter T in the string “Gulliver's Travels” inside the <title> tag, then the repeated extension is obvious. But suppose the cursor is on the t in “alternate” inside the “link” tag. The selection would first cover the whole word “alternate”, then expand to include the double quotes “"alternate"”, then the whole attribute “rel="alternate"”, then the whole link tag, then the whole content of the entry tag, and finally the <entry> tags themselves.

Lisp Example

For Lisp, the syntax is almost purely nested parentheses (the exceptions are characters such as ; ' , @ | # that have special syntactic meaning). Here are some examples of how this feature would work in Lisp.

[Image: syntax highlight lisp 2021-05-13]

Note: Emacs's Lisp mode provides several functions for traversing nested syntax: backward-sexp, forward-sexp, backward-up-list, down-list, backward-list, forward-list, mark-sexp. As a result, the extend-selection-semantic-unit function described above is relatively easy to implement: call one of the sexp-walking functions to move the cursor to the right place, then call mark-sexp.
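The idea in the note above can be sketched outside Emacs. The following Python is only an illustration, not the Emacs implementation: token_at and expand_to_list are hypothetical names, and the sketch assumes the text contains only plain parentheses, ignoring strings, comments, and the special characters mentioned earlier. The first function plays the role of selecting the token under the cursor; the second moves left to the enclosing open paren (roughly what backward-up-list does) and then scans to its match (roughly what mark-sexp does).

```python
DELIMS = "() \t\n"

def token_at(text, pos):
    """Return (start, end) of the run of non-delimiter characters around pos."""
    start = pos
    while start > 0 and text[start - 1] not in DELIMS:
        start -= 1
    end = pos
    while end < len(text) and text[end] not in DELIMS:
        end += 1
    return start, end

def expand_to_list(text, start):
    """Return (start, end) of the innermost parenthesized list that encloses
    the balanced unit beginning at `start`, or None if there is none."""
    # Step 1: walk left to the first unmatched "(".
    depth = 0
    i = start - 1
    while i >= 0:
        if text[i] == ")":
            depth += 1
        elif text[i] == "(":
            if depth == 0:
                break
            depth -= 1
        i -= 1
    else:
        return None  # ran off the left edge: already at top level
    # Step 2: walk right from that "(" to its matching ")".
    depth = 0
    for j in range(i, len(text)):
        if text[j] == "(":
            depth += 1
        elif text[j] == ")":
            depth -= 1
            if depth == 0:
                return i, j + 1
    return None  # unbalanced text

code = "(defun f (x) (+ x 1))"
sel = token_at(code, 16)           # the second "x"
while sel is not None:
    print(code[sel[0]:sel[1]])
    sel = expand_to_list(code, sel[0])
```

Each pass through the loop corresponds to one press of the key: the token, then the list around it, then the list around that, until nothing encloses the selection.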

Summary

In summary, this extend selection feature is a lexical syntax tree walker. Each invocation will go up one level on the syntax tree and select all its branches.
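To make the tree-walking description concrete, here is a minimal sketch that uses Python's standard ast module as a stand-in for an editor's parser. The function name enclosing_units is hypothetical, and the sketch assumes Python 3.8 or later (for end positions) and ASCII source, since the ast column offsets are UTF-8 byte offsets. It lists every syntax node that contains a cursor position, innermost first, which is the sequence of selections that repeated key presses would produce.

```python
import ast

def enclosing_units(source, line, col):
    """Return the source text of every AST node containing the position
    (1-based line, 0-based column), innermost node first."""
    tree = ast.parse(source)
    found = {}
    for node in ast.walk(tree):
        # Only statements and expressions carry position information.
        if getattr(node, "end_lineno", None) is None:
            continue
        start = (node.lineno, node.col_offset)
        end = (node.end_lineno, node.end_col_offset)
        if start <= (line, col) < end:
            found[(start, end)] = node  # nodes sharing a span collapse into one
    # The containing spans are nested, so a later start (or, for equal
    # starts, an earlier end) means a more deeply nested node.
    ordered = sorted(found, key=lambda s: ((-s[0][0], -s[0][1]), s[1]))
    return [ast.get_source_segment(source, found[span]) for span in ordered]

for unit in enclosing_units("total = price * (1 + tax)\n", 1, 22):
    print(unit)
```

Note that an abstract syntax tree has no node for the parentheses themselves, so in this sketch the selection jumps from the inner sum straight to the whole product. An editor that wants a step for the parenthesized group needs a more lexical view of the text.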

Ideally, the editor includes a full parser for the language and can use it to read the source code and regenerate it on the fly, for example to reformat the code. But that is only the ideal situation. Emacs has a full parser for a few languages, such as elisp, XML (nxml mode), and JavaScript (js2 mode), but not for most. Also, parsers often discard comments, which makes them unusable for this purpose. And parsers expect the code to be valid, so they cannot be used on code that is still being edited.

A more practical solution is to base the algorithm on a text-processing approach, using a simple lexical scanner. (This is in fact how all of Emacs's language modes work, even elisp mode.)
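As a rough illustration of the scanner approach, the sketch below finds the token under the cursor with a single regular expression. The pattern and the name token_at are assumptions made for this example and cover only a small slice of C-like syntax (double-quoted strings, // comments, identifiers, numbers, single punctuation characters). The point is that it needs no parser and keeps working on code that is incomplete.

```python
import re

# One alternative per token kind; earlier alternatives win.
TOKEN = re.compile(r'''
    (?P<string>"(?:\\.|[^"\\])*")     # double-quoted string with escapes
  | (?P<comment>//[^\n]*)             # line comment
  | (?P<word>[A-Za-z_]\w*)            # identifier or keyword
  | (?P<number>\d+(?:\.\d+)?)         # integer or decimal number
  | (?P<punct>[^\s\w])                # any other single character
''', re.VERBOSE)

def token_at(text, pos):
    """Return (kind, start, end) of the token covering pos, or None
    if pos falls on whitespace."""
    for m in TOKEN.finditer(text):
        if m.start() <= pos < m.end():
            return m.lastgroup, m.start(), m.end()
    return None

# Incomplete code: the closing brace is missing, which a parser would reject.
code = 'if (x > 0) { print("a) b"); // todo'
kind, start, end = token_at(code, 21)   # cursor on the ")" inside the string
print(kind, code[start:end])
```

Because the string literal is matched as one token, the parenthesis inside it is not mistaken for a bracket. That is the kind of detail a scanner has to handle before bracket matching can be trusted.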

Emacs Implementation

For the Lisp case, it's easy to implement. See an implementation at

For the case of languages with C syntax, a practical solution that works 99% of the time should be easy. The selection will extend somewhat like the following sequence:

The Elisp system has many functions that already understand each of these syntactic units. It's not difficult to put them all together.

For the XML case, with its regular nested syntax of start/end tags where the start tag may contain tokens in sequence, one may need a bit more work than the lisp case, but there's already a full XML parser in the nxml mode.

Solution: expand-region mode

Magnar Sveen has written expand-region mode. See: https://github.com/magnars/expand-region.el