Emacs Lisp: Write a Major Mode for Syntax Coloring
This page shows you how to write an Emacs major mode that does syntax coloring for your own language.

Problem
You are writing a major mode for a new language. You want keywords of the language syntax colored.
Suppose your language source code looks like this:
Sin[x]^2 + Cos[y]^2 == 1
Pi^2/6 == Sum[1/x^2,{x,1,Infinity}]
You want the words “Sin”, “Cos”, “Sum”, colored as functions, and “Pi” and “Infinity” colored as constants.
Solution
Save the following in a file.
;; a simple major mode, mymath-mode

(setq mymath-highlights
      '(("Sin\\|Cos\\|Sum" . 'font-lock-function-name-face)
        ("Pi\\|Infinity" . 'font-lock-constant-face)))

(define-derived-mode mymath-mode fundamental-mode "mymath"
  "major mode for editing mymath language code."
  (setq font-lock-defaults '(mymath-highlights)))

Now, copy and paste the above code into a buffer, then Alt+x eval-buffer.
Now, type the following code into a buffer:
Sin[x]^2 + Cos[y]^2 == 1
Pi^2/6 == Sum[1/x^2,{x,1,Infinity}]
Now, Alt+x mymath-mode, and you will see the words colored.
How Does it Work?
The string "Sin\\|Cos\\|Sum" is a regex that matches any of the three names. font-lock-function-name-face is a predefined face. It holds the default font and coloring spec used for function names.
[see Emacs Lisp: Regex Tutorial]
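To see what the regex matches, you can evaluate a few string-match calls. This is a minimal sketch. The sample strings are made up for illustration. string-match returns the position where the match starts, or nil if there is no match.

```emacs-lisp
;; In a Lisp string, the regex alternation operator \| is written "\\|".
(let ((regex "Sin\\|Cos\\|Sum"))
  (list (string-match regex "Sin[x]^2")    ; match at the start
        (string-match regex "1 + Cos[y]")  ; match further in
        (string-match regex "Tan[x]")      ; no match
        (string-match regex "Sine[x]")))   ; matches inside a longer word
;; (0 4 nil 0)
```

The last result shows that this simple pattern has no word boundaries, so it also matches “Sin” inside “Sine”.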
The line define-derived-mode defines your mode, named “mymath-mode”, based on the fundamental-mode. fundamental-mode is the most basic mode.
The line (setq font-lock-defaults '(mymath-highlights)) sets up the syntax highlighting for your mode.
Writing a Mode for a Language that Has Hundreds of Keywords
Typically, a language has hundreds of keywords. Writing one regex for all of them by hand is error-prone, so elisp has a function, regexp-opt, that generates the regex from a list of keywords.
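Here is a small sketch of what regexp-opt does. The word list and test strings are made up for illustration.

```emacs-lisp
;; regexp-opt takes a list of strings and returns one optimized regex.
;; With the second argument 'words, the result is wrapped in \< and \>,
;; so it matches only whole words.
(let ((regex (regexp-opt '("if" "else" "while") 'words)))
  (list (string-match regex "if (x)")     ; whole word "if"
        (string-match regex "elsewhere")  ; "else" is only part of a word
        (string-match regex "do while"))) ; whole word "while"
;; (0 nil 3)
```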
Suppose you are writing a mode for the Linden Scripting Language (LSL). LSL has about 553 keywords. First, here's a sample of LSL source code so you get some idea of how we want it colored.
// sample LSL file

// Examples of variable declaration and assignment:
integer score = 0;
string mySay = "i ♥ you";
vector v = <3,4,5>;
list myList= [2,4,7,3];

// Example of defining a function.
// built-in function's names start with “ll” (Linden Library).
integer sum(integer a, integer b)
{
    integer result = a + b;
    return result;
}

default
{
    state_entry()
    {
        llSay(0, mySay);
    }

    touch_start(integer total_number)
    {
        if (score == 1)
        {
            llSay(0, mySay);
        }
        else
        {
            llWhisper(0, "Ouch!");
        }
    }
}
Each type of keyword uses a different color.
Here's the code.
;;; mylsl-mode.el --- sample major mode for editing LSL. -*- coding: utf-8; lexical-binding: t; -*-

;; Copyright © 2017, by you

;; Author: your name ( your email )
;; Version: 2.0.13
;; Created: 26 Jun 2015
;; Keywords: languages
;; Homepage: http://ergoemacs.org/emacs/elisp_syntax_coloring.html

;; This file is not part of GNU Emacs.

;;; License:

;; You can redistribute this program and/or modify it under the terms of the GNU General Public License version 2.

;;; Commentary:

;; short description here

;; full doc on how to use here

;;; Code:

;; create the list for font-lock.
;; each category of keyword is given a particular face
(setq mylsl-font-lock-keywords
      (let* (
             ;; define several category of keywords
             (x-keywords '("break" "default" "do" "else" "for" "if" "return" "state" "while"))
             (x-types '("float" "integer" "key" "list" "rotation" "string" "vector"))
             (x-constants '("ACTIVE" "AGENT" "ALL_SIDES" "ATTACH_BACK"))
             (x-events '("at_rot_target" "at_target" "attach"))
             (x-functions '("llAbs" "llAcos" "llAddToLandBanList" "llAddToLandPassList"))

             ;; generate regex string for each category of keywords
             (x-keywords-regexp (regexp-opt x-keywords 'words))
             (x-types-regexp (regexp-opt x-types 'words))
             (x-constants-regexp (regexp-opt x-constants 'words))
             (x-events-regexp (regexp-opt x-events 'words))
             (x-functions-regexp (regexp-opt x-functions 'words)))

        `(
          (,x-types-regexp . 'font-lock-type-face)
          (,x-constants-regexp . 'font-lock-constant-face)
          (,x-events-regexp . 'font-lock-builtin-face)
          (,x-functions-regexp . 'font-lock-function-name-face)
          (,x-keywords-regexp . 'font-lock-keyword-face)
          ;; note: order above matters, because once colored, that part won't change.
          ;; in general, put longer words first
          )))

;;;###autoload
(define-derived-mode mylsl-mode c-mode "lsl mode"
  "Major mode for editing LSL (Linden Scripting Language)…"

  ;; code for syntax highlighting
  (setq font-lock-defaults '((mylsl-font-lock-keywords))))

;; add the mode to the `features' list
(provide 'mylsl-mode)

;;; mylsl-mode.el ends here
Now, to run the code, Alt+x eval-buffer.
[see Evaluate Emacs Lisp Code]
Open the LSL language sample file given above, then Alt+x mylsl-mode. Each category of keyword now appears in its own color.

Now, let's study the code above.
Note that the highlighting mechanism of font-lock-defaults works on a first-come, first-served basis. Once a sequence of characters is colored, a later entry in the list won't change it. So, the order of your list is important. In general, put longer keywords first. (This won't fix all cases where a keyword matches part of another keyword. If your language has a lot of such keywords, you need to use other forms to solve this problem. [see Search-based Fontification (ELISP Manual)])
The `( ,a ,b …) form is Lisp's backquote syntax. It works like a quoted list, except that elements preceded by a comma are evaluated and replaced by their values.
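A short sketch of the difference between a plain quote and a backquote. The variable names here are made up for illustration.

```emacs-lisp
(let ((a "x") (b 2))
  (list '(a b c)      ; plain quote: nothing is evaluated
        `(,a ,b c)))  ; backquote: a and b are replaced by their values
;; ((a b c) ("x" 2 c))

;; The same idea, in the shape used by the font-lock list above.
;; my-regex is a hypothetical name standing in for a generated regex.
(let ((my-regex "foo\\|bar"))
  `((,my-regex . 'font-lock-type-face)))
```

In the second form, the regex string is filled in, while the face part stays as written.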
In the above, we based our mode on c-mode, because the syntax is similar. Basing your mode on a similar language's mode saves you from coding many features yourself, such as comment handling and indentation.
The line (provide 'mylsl-mode) adds the symbol mylsl-mode to the list held in the variable features.
[see Emacs Lisp: provide, require, features]
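You can check the effect of provide yourself. In this sketch, my-demo-feature is a hypothetical feature name used only for the example.

```emacs-lisp
;; my-demo-feature is a made-up name, not a real package.
(provide 'my-demo-feature)

(featurep 'my-demo-feature)       ; t
(memq 'my-demo-feature features)  ; non-nil, the symbol is in the list

;; Because the feature is already provided, require returns at once
;; and does not try to load any file.
(require 'my-demo-feature)
```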
Font Lock Mode Basics
For many languages, the text to be colored is not a fixed set of strings. For example, XML has the pattern <xyz>…</xyz>, where the xyz can be anything.
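One possible approach, shown here only as a sketch, is a regex with a capture group, plus a font-lock entry of the form (REGEX SUBEXP FACE) that colors just the captured part. The variable name my-xml-highlights and the tag-name pattern are assumptions made for illustration. Real XML names allow more characters than this pattern does.

```emacs-lisp
;; Hypothetical highlight list for XML-like tags.
;; Group 1 captures the tag name, whatever it is.
;; The 1 tells font-lock to color only group 1, not the whole match.
(defvar my-xml-highlights
  '(("</?\\([A-Za-z][A-Za-z0-9_-]*\\)" 1 'font-lock-function-name-face)))

;; Check what group 1 captures in a sample string:
(let ((text "<xyz>hello</xyz>"))
  (when (string-match (car (car my-xml-highlights)) text)
    (match-string 1 text)))
;; "xyz"
```

Such a list can be given to font-lock-defaults the same way as mymath-highlights above.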

To handle more complex syntax coloring, continue to