Emacs Lisp: Write a Major Mode for Syntax Coloring

By Xah Lee.

This page shows you how to write an Emacs major mode that does syntax coloring for your own language.

[Figure: emacs mymath major mode. Syntax color your own language.]

Problem

You are writing a major mode for a new language. You want keywords of the language syntax colored.

Suppose your language source code looks like this:

Sin[x]^2 + Cos[y]^2 == 1
Pi^2/6 == Sum[1/x^2,{x,1,Infinity}]

You want the words “Sin”, “Cos”, “Sum”, colored as functions, and “Pi” and “Infinity” colored as constants.

Solution

Save the following in a file.

;; a simple major mode, mymath-mode

(setq mymath-highlights
      '(("Sin\\|Cos\\|Sum" . 'font-lock-function-name-face)
        ("Pi\\|Infinity" . 'font-lock-constant-face)))

(define-derived-mode mymath-mode fundamental-mode "mymath"
  "major mode for editing mymath language code."
  (setq font-lock-defaults '(mymath-highlights)))

Now, open that file in Emacs (or copy and paste the code into a buffer), then Alt+x eval-buffer.

Now, type the following code into a new buffer:

Sin[x]^2 + Cos[y]^2 == 1
Pi^2/6 == Sum[1/x^2,{x,1,Infinity}]

Now, Alt+x mymath-mode. You will see the words colored.

How Does it Work?

The string "Sin\\|Cos\\|Sum" is a regex. font-lock-function-name-face is a predefined face. It holds the default font and coloring spec used for function names.

[see Emacs Lisp: Regex Tutorial]
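
To see what the regex in the first rule matches, you can try it directly with string-match-p. This is only a sketch. The variable names are made up for the illustration, and the second regex, wrapped in \\< and \\> word boundaries, is an assumed refinement that the code above does not use. It is shown because the bare alternation also matches inside longer words.

```elisp
;; Hypothetical names, for illustration only.
(defvar mymath-demo-plain-regex "Sin\\|Cos\\|Sum"
  "The regex from the tutorial: a bare alternation.")

(defvar mymath-demo-bounded-regex "\\<\\(?:Sin\\|Cos\\|Sum\\)\\>"
  "Same alternation wrapped in word boundaries (an assumed refinement).")

;; string-match-p returns the index where the match starts, or nil.
(string-match-p mymath-demo-plain-regex "Sin[x]")    ; 0
(string-match-p mymath-demo-plain-regex "Single")    ; 0, matches inside the word
(string-match-p mymath-demo-bounded-regex "Sin[x]")  ; 0
(string-match-p mymath-demo-bounded-regex "Single")  ; nil, "Sin" is not a whole word
```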

The define-derived-mode form defines your mode, named “mymath-mode”, based on fundamental-mode, which is the most basic mode.

The line (setq font-lock-defaults '(mymath-highlights)) sets up the syntax highlighting for your mode.
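
If you want to check the coloring without looking at the screen, one possible approach is to fontify a temporary buffer and read the face text property. The sketch below assumes Emacs 25.1 or later (for font-lock-ensure). It uses a copy of the mode under the hypothetical name mymath-demo-mode, and a made-up helper mymath-demo-face-at, so it does not interfere with the code above.

```elisp
;; Hypothetical copy of the tutorial's mode under a different name.
(defvar mymath-demo-highlights
  '(("Sin\\|Cos\\|Sum" . 'font-lock-function-name-face)
    ("Pi\\|Infinity" . 'font-lock-constant-face))
  "Font-lock rules. Each element is (REGEX . FACE).")

(define-derived-mode mymath-demo-mode fundamental-mode "mymath-demo"
  "Demo major mode for the mymath language."
  (setq font-lock-defaults '(mymath-demo-highlights)))

(defun mymath-demo-face-at (text pos)
  "Insert TEXT in a temp buffer, fontify it, and return the face at POS."
  (with-temp-buffer
    (insert text)
    (mymath-demo-mode)
    (font-lock-ensure)              ; assumes Emacs 25.1 or later
    (get-text-property pos 'face)))

(mymath-demo-face-at "Sin[x]^2" 1)  ; font-lock-function-name-face
(mymath-demo-face-at "Pi^2/6" 1)    ; font-lock-constant-face
(mymath-demo-face-at "Sin[x]^2" 5)  ; nil, the "x" matches no rule
```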

Writing a Mode for a Language that Has Hundreds of Keywords

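Typically, a language has hundreds of keywords. Writing one regex for all of them by hand is tedious and error-prone. Elisp has a function, regexp-opt, that generates a regex from a list of keyword strings.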
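
Here is a small sketch of regexp-opt on its own, using the three function names from the mymath example. The variable names are hypothetical. The exact regex string that regexp-opt returns can differ between Emacs versions, so the example only checks what it matches.

```elisp
;; Hypothetical variable names, for illustration only.
(defvar mymath-demo-functions '("Sin" "Cos" "Sum")
  "A short keyword list. A real language would have many more.")

;; With 'words as second argument, the generated regex is wrapped in
;; \\< and \\>, so it matches whole words only.
(defvar mymath-demo-functions-regexp
  (regexp-opt mymath-demo-functions 'words))

(string-match-p mymath-demo-functions-regexp "Cos[y]")  ; 0
(string-match-p mymath-demo-functions-regexp "Cosine")  ; nil
```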

Suppose you are writing a mode for the Linden Scripting Language (LSL). LSL has about 553 keywords. First, here's a sample of LSL source code so you get some idea of how we want it colored.

// sample LSL file

// Examples of variable declaration and assignment:
integer score = 0;
string mySay = "i ♥ you";
vector v = <3,4,5>;
list myList= [2,4,7,3];

// Example of defining a function.
// Built-in function names start with “ll” (Linden Library).
integer sum(integer a, integer b)
{
    integer result = a + b;
    return result;
}

default
{
    state_entry()
    {
        llSay(0, mySay);
    }

    touch_start(integer total_number)
    {
        if (score == 1) {
            llSay(0, mySay);
        } else {
            llWhisper(0, "Ouch!");
        }
    }
}

Each type of keyword uses a different color.

Here's the code.

;;; mylsl-mode.el --- sample major mode for editing LSL. -*- coding: utf-8; lexical-binding: t; -*-

;; Copyright © 2017, by you

;; Author: your name ( your email )
;; Version: 2.0.13
;; Created: 26 Jun 2015
;; Keywords: languages
;; Homepage: http://ergoemacs.org/emacs/elisp_syntax_coloring.html

;; This file is not part of GNU Emacs.

;;; License:

;; You can redistribute this program and/or modify it under the terms of the GNU General Public License version 2.

;;; Commentary:

;; short description here

;; full doc on how to use here

;;; Code:

;; create the list for font-lock.
;; each category of keyword is given a particular face
(setq mylsl-font-lock-keywords
      (let* (
            ;; define several categories of keywords
            (x-keywords '("break" "default" "do" "else" "for" "if" "return" "state" "while"))
            (x-types '("float" "integer" "key" "list" "rotation" "string" "vector"))
            (x-constants '("ACTIVE" "AGENT" "ALL_SIDES" "ATTACH_BACK"))
            (x-events '("at_rot_target" "at_target" "attach"))
            (x-functions '("llAbs" "llAcos" "llAddToLandBanList" "llAddToLandPassList"))

            ;; generate regex string for each category of keywords
            (x-keywords-regexp (regexp-opt x-keywords 'words))
            (x-types-regexp (regexp-opt x-types 'words))
            (x-constants-regexp (regexp-opt x-constants 'words))
            (x-events-regexp (regexp-opt x-events 'words))
            (x-functions-regexp (regexp-opt x-functions 'words)))

        `(
          (,x-types-regexp . 'font-lock-type-face)
          (,x-constants-regexp . 'font-lock-constant-face)
          (,x-events-regexp . 'font-lock-builtin-face)
          (,x-functions-regexp . 'font-lock-function-name-face)
          (,x-keywords-regexp . 'font-lock-keyword-face)
          ;; note: order above matters, because once colored, that part won't change.
          ;; in general, put longer words first
          )))

;;;###autoload
(define-derived-mode mylsl-mode c-mode "lsl mode"
  "Major mode for editing LSL (Linden Scripting Language)…"

  ;; code for syntax highlighting
  (setq font-lock-defaults '((mylsl-font-lock-keywords))))

;; add the mode to the `features' list
(provide 'mylsl-mode)

;;; mylsl-mode.el ends here

Now, to run the code, Alt+x eval-buffer. [see Evaluate Emacs Lisp Code]

Open the LSL language sample file given above, then Alt+x mylsl-mode. Here's the result:

[Figure: sample mylsl-mode syntax highlighting result.]

Now, let's study the code above.

Note that font-lock highlighting works on a first-come, first-served basis. Once a sequence of characters is colored, a later rule won't change it. So, the order of your list is important. In general, put longer keywords first. (This won't fix all cases where a keyword matches part of another keyword. If your language has many such keywords, you need to use other forms of font-lock rules to solve this problem. See Search-based Fontification (ELISP Manual).)
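
The effect of rule order can be tried out in a temporary buffer. The following is a sketch under a few assumptions: Emacs 25.1 or later (for font-lock-ensure), the default syntax table (where _ is not a word character, so \\<at\\> matches the “at” in “at_target”), and made-up names starting with order-demo-. It fontifies the same text with the same two rules in both orders.

```elisp
(defvar order-demo-rules nil
  "Hypothetical variable, bound to the rules under test.")

(defun order-demo-face-at (rules text pos)
  "Fontify TEXT using RULES and return the face at POS."
  (let ((order-demo-rules rules))
    (with-temp-buffer
      (insert text)
      ;; The second element t means keywords only,
      ;; with no string or comment coloring.
      (setq-local font-lock-defaults '(order-demo-rules t))
      (font-lock-ensure)            ; assumes Emacs 25.1 or later
      (get-text-property pos 'face))))

(defvar order-demo-long-first
  '(("\\<at_target\\>" . 'font-lock-builtin-face)
    ("\\<at\\>" . 'font-lock-keyword-face))
  "Longer keyword first.")

(defvar order-demo-short-first (reverse order-demo-long-first)
  "Same rules, shorter keyword first.")

(order-demo-face-at order-demo-long-first "at_target" 1)   ; font-lock-builtin-face
(order-demo-face-at order-demo-long-first "at_target" 4)   ; font-lock-builtin-face
(order-demo-face-at order-demo-short-first "at_target" 1)  ; font-lock-keyword-face
(order-demo-face-at order-demo-short-first "at_target" 4)  ; nil
```

With the short keyword first, “at” is colored as a keyword, and the rule for “at_target” is then skipped because part of its match already has a face.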

The `( ,a ,b …) form is lisp's backquote syntax. The list is quoted as a whole, except that any element preceded by a , is evaluated and replaced by its value.
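
A short comparison of plain quote and backquote may help. The helper name backquote-demo is made up for this sketch. The variable x-types-regexp is given a shortened value here, not the one regexp-opt would generate.

```elisp
(defun backquote-demo ()
  "Return a list showing quote versus backquote (hypothetical helper)."
  (let ((x-types-regexp "float\\|integer"))
    (list
     ;; Plain quote: nothing is evaluated, the symbol stays a symbol.
     '(x-types-regexp . font-lock-type-face)
     ;; Backquote: the element after the comma is replaced by its value.
     `(,x-types-regexp . font-lock-type-face))))

(backquote-demo)
;; ((x-types-regexp . font-lock-type-face)
;;  ("float\\|integer" . font-lock-type-face))
```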

In the above, we based our mode on c-mode, because the syntax is similar. Basing your mode on a similar language's mode saves you from coding many features yourself, such as handling of comments and indentation.
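
You can inspect the parent relationship that define-derived-mode records. The sketch below uses a hypothetical mode name, mylsl-demo-mode, and a made-up helper. Calling the helper turns the mode on, which loads c-mode.

```elisp
;; Hypothetical mode name, for illustration only.
(define-derived-mode mylsl-demo-mode c-mode "lsl-demo"
  "Demo mode derived from c-mode.")

;; define-derived-mode records the parent on the child's symbol.
(get 'mylsl-demo-mode 'derived-mode-parent)  ; c-mode

(defun mylsl-demo-inherits-c-p ()
  "Return non-nil if a buffer in `mylsl-demo-mode' counts as a c-mode buffer."
  (with-temp-buffer
    (mylsl-demo-mode)
    (derived-mode-p 'c-mode)))
```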

The line:

(provide 'mylsl-mode)

adds the symbol mylsl-mode to the list stored in the variable features. [see Emacs Lisp: provide, require, features]
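
A minimal sketch of what that means in practice, using a hypothetical feature name so it does not touch mylsl-mode:

```elisp
;; Hypothetical feature name, for illustration only.
(provide 'mylsl-demo-feature)

(featurep 'mylsl-demo-feature)               ; t
(and (memq 'mylsl-demo-feature features) t)  ; t, the symbol is in the list

;; The feature is already present, so require returns at once
;; without looking for a file to load.
(require 'mylsl-demo-feature)                ; mylsl-demo-feature
```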

Font Lock Mode Basics

For many languages, the text to be colored is not a fixed set of strings. For example, in XML, you have the <xyz></xyz> pattern where the xyz can be anything.
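
As a rough idea of what such a rule can look like, here is a sketch that captures the tag name with a regex group and colors only that group. The names starting with xml-demo- and the regex itself are assumptions made for this illustration. Real XML needs more care (attributes, namespaces, and so on).

```elisp
;; Hypothetical rule, for illustration only.  Group 1 captures the
;; tag name, and "</?" allows an optional slash for closing tags.
(defvar xml-demo-tag-regex "</?\\([[:alnum:]]+\\)"
  "Regex matching the start of an opening or closing tag.")

;; (REGEX GROUP FACE): color only group 1, not the whole match.
(defvar xml-demo-highlights
  `((,xml-demo-tag-regex 1 'font-lock-function-name-face))
  "Font-lock rule that colors tag names.")

(defun xml-demo-tag-names (text)
  "Return the tag names found in TEXT, in order of appearance."
  (let ((start 0) names)
    (while (string-match xml-demo-tag-regex text start)
      (push (match-string 1 text) names)
      (setq start (match-end 0)))
    (nreverse names)))

(xml-demo-tag-names "<xyz></xyz>")  ; ("xyz" "xyz")
```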

To handle more complex syntax coloring, continue to

Emacs Lisp: Font Lock Mode Basics