ELisp: Syntax Color Source Code in HTML

By Xah Lee. Date: . Last updated: .

This page shows you how to write an Emacs Lisp command to syntax color computer language source code in HTML.

Problem

Write a command “htmlize-pre-block”. When called, it syntax colors the computer language source code under the cursor.

For example, here's an elisp code snippet:

(if (< 3 2) (message "yes") )

Here's what you want:

[image: htmlized HTML source of the snippet, 2018-09-28]

Here's how it looks in a web browser:

[image: the htmlized snippet rendered in a web browser, 2018-09-28]

What we want is a command that htmlizes the current region of text.

Solution

There is an Emacs package that converts any colored text in Emacs into HTML.

The package is htmlize.el, written by Hrvoje Niksic, at https://github.com/hniksic/emacs-htmlize

This package primarily gives you 3 new commands:

  1. htmlize-region. Output to a new buffer.
  2. htmlize-buffer. Output to a new buffer.
  3. htmlize-file. Takes an input file name and outputs to a new file.

This will help us a lot.

Here's an outline of our algorithm:

  1. Grab the text inside the <pre class="lang_name">…</pre> tag the cursor is in.
  2. Create a temp buffer and insert the text into it.
  3. Set the temp buffer to the major mode corresponding to lang_name, and fontify it.
  4. Call htmlize-buffer.
  5. From the htmlize-buffer output, grab the (htmlized) text inside the <pre> tag.
  6. Kill the htmlize output buffer and the temp buffer.
  7. Delete the original text and insert the htmlized text.
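Step 5 is plain string work. Here is a minimal sketch of it, assuming the htmlize output has been captured as a string containing a single <pre> block, and that a newline may follow the opening <pre> tag (the “htmlize-string” code further down makes the same assumption). The name “my-extract-pre-content” is hypothetical, not part of htmlize.el.

```elisp
(defun my-extract-pre-content (html)
  "Return the text between the first <pre ...> tag and </pre> in HTML, or nil.
A single newline right after the opening tag is dropped.
Hypothetical helper for illustration; not part of htmlize.el."
  ;; \\(?:.\\|\n\\)*? is a non-greedy match of any text, newlines included.
  (when (string-match "<pre[^>]*>\n?\\(\\(?:.\\|\n\\)*?\\)</pre>" html)
    (match-string 1 html)))

(my-extract-pre-content
 "<body>\n<pre>\n<span class=\"string\">\"yes\"</span></pre>\n</body>")
;; → "<span class=\"string\">\"yes\"</span>"
```

Working on a string keeps the step easy to test; the function below does the same job with “search-forward” directly in the htmlize output buffer.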

To achieve the above, I split the work into 2 functions:

  1. A function “htmlize-string” that takes a string and a mode name, and returns the htmlized string.
  2. A function “htmlize-pre-block” that grabs the text, calls “htmlize-string”, then replaces the original text with the result.
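Keeping the two apart means the buffer-editing half doesn't care what the transformation is. A minimal sketch of that half, with a function argument standing in for “htmlize-string”; “my-replace-region-with” is a hypothetical name, and “upcase” is only a stand-in so the example runs without htmlize.el:

```elisp
(defun my-replace-region-with (p1 p2 transform)
  "Replace the text between P1 and P2 with (funcall TRANSFORM text).
Leave the buffer untouched if the result equals the original text.
Hypothetical helper for illustration."
  (let* ((input (buffer-substring-no-properties p1 p2))
         (output (funcall transform input)))
    (unless (string-equal input output)
      (delete-region p1 p2)
      (goto-char p1)
      (insert output))))

;; Usage: upcase the first word of a scratch buffer.
(with-temp-buffer
  (insert "abc def")
  (my-replace-region-with 1 4 #'upcase)
  (buffer-string))
;; → "ABC def"
```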

Htmlize String Function

The “htmlize-string” function takes a string and a major mode name, and returns an htmlized string.

(defun xah-html-htmlize-string (@input-str @major-mode-name)
  "Take @input-str and return a htmlized version using @major-mode-name.
The purpose is to syntax color source code in HTML.

If @major-mode-name is a string, it is converted to a symbol. If no such symbol exists, or it is not a function, `fundamental-mode' is used.

This function requires the `htmlize-buffer' from htmlize.el by Hrvoje Niksic.

Version 2018-09-28"
  (let ($output-buff
        $resultStr
        ($majorModeSym (intern-soft @major-mode-name)))
    ;; put code in a temp buffer, set the mode, fontify
    (with-temp-buffer
      (insert @input-str)
      (if (fboundp $majorModeSym)
          (funcall $majorModeSym)
        (fundamental-mode))
      (font-lock-ensure)
      ;; htmlize-buffer returns the new buffer holding the HTML
      (setq $output-buff (htmlize-buffer)))
    ;; extract the fontified source code in htmlize output
    (with-current-buffer $output-buff
      (goto-char (point-min))
      (let ($p1 $p2)
        (setq $p1 (search-forward "<pre>"))
        (setq $p2 (search-forward "</pre>"))
        ;; skip the newline after <pre>; 6 is the length of "</pre>"
        (setq $resultStr (buffer-substring-no-properties (+ $p1 1) (- $p2 6)))))
    (kill-buffer $output-buff)
    $resultStr))

It's called like this:

(xah-html-htmlize-string inputStr "js-mode")
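The mode lookup at the top of that function can be tried on its own. A sketch of the same “intern-soft” plus “fboundp” fallback, pulled into a hypothetical helper “my-resolve-major-mode” (not part of the original code) that returns the function to call:

```elisp
(defun my-resolve-major-mode (mode-name)
  "Return the major mode function named MODE-NAME (a string), else `fundamental-mode'.
`intern-soft' returns nil for names that were never interned, so an unknown
name falls back to `fundamental-mode' without creating a new symbol.
Hypothetical helper for illustration."
  (let ((sym (and mode-name (intern-soft mode-name))))
    (if (fboundp sym) sym #'fundamental-mode)))

(my-resolve-major-mode "emacs-lisp-mode")  ; → emacs-lisp-mode
(my-resolve-major-mode "no-such-mode-xyz") ; → fundamental-mode
(my-resolve-major-mode nil)                ; → fundamental-mode
```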

Language Name to Major Mode Map

We need this so that we can map an HTML class name to an Emacs major mode.

e.g. <pre class="js"> maps to “xah-js-mode” in the table below.

(defvar xah-html-lang-name-map nil "An alist that maps a lang code to an emacs major mode name and a file extension. Each element has this form
 (‹lang code› . [‹emacs major mode name› ‹file extension›])
For example:
 (\"emacs-lisp\" . [\"xah-elisp-mode\" \"el\"])")

(setq xah-html-lang-name-map
      '(
        ("ahk" . ["ahk-mode" "ahk"])

        ("code" . ["fundamental-mode" "txt"])
        ("output" . ["fundamental-mode" "txt"])

        ("bash" . ["sh-mode" "sh"])
        ("unix-config" . ["conf-space-mode" "conf"])
        ("cmd" . ["dos-mode" "bat"])

        ("bbcode" . ["xbbcode-mode" "bbcode"])
        ("markdown" . ["markdown-mode" "md"])
        ("c" . ["c-mode" "c"])
        ("cpp" . ["c++-mode" "cpp"])
        ("common-lisp" . ["lisp-mode" "lisp"])

        ("org-mode" . ["org-mode" "org"])

        ;; ("clojure" . ["clojure-mode" "clj"])
        ("clojure" . ["xah-clojure-mode" "clj"])
        ("typescript" . ["typescript-mode" "ts"])
        ("css" . ["xah-css-mode" "css"])
        ("emacs-lisp" . ["xah-elisp-mode" "el"])
        ("dart" . ["dart-mode" "dart"])
        ("haskell" . ["haskell-mode" "hs"])
        ("golang" . ["go-mode" "go"])
        ("html" . ["xah-html-mode" "html"])
        ("mysql" . ["sql-mode" "sql"])
        ("xml" . ["sgml-mode" "xml"])
        ("html6" . ["xah-html6-mode" "html6"])
        ("java" . ["java-mode" "java"])
        ("js" . ["xah-js-mode" "js"])
        ("nodejs" . ["xah-js-mode" "js"])
        ("lsl" . ["xlsl-mode" "lsl"])
        ("latex" . ["latex-mode" "txt"])
        ("ocaml" . ["tuareg-mode" "ml"])
        ("perl" . ["perl-mode" "pl"])
        ("php" . ["xah-php-mode" "php"])
        ("povray" . ["pov-mode" "pov"])
        ("powershell" . ["powershell-mode" "ps1"])
        ("python" . ["python-mode" "py"])
        ("python3" . ["python-mode" "py3"])
        ("qi" . ["shen-mode" "qi"])
        ("ruby" . ["ruby-mode" "rb"])
        ("scala" . ["scala-mode" "scala"])
        ("apl" . ["gnu-apl-mode" "apl"])
        ("scheme" . ["scheme-mode" "scm"])
        ("racket" . ["racket-mode" "rkt"])
        ("prolog" . ["prolog-mode" "prolog"])
        ("yasnippet" . ["snippet-mode" "yasnippet"])
        ("vbs" . ["visual-basic-mode" "vbs"])
        ("visualbasic" . ["visual-basic-mode" "vbs"])
        ("mathematica" . ["fundamental-mode" "m"])
        ("math" . ["fundamental-mode" "txt"])

        ("slim" . ["slim-mode" "slim"])
        ("yaml" . ["yaml-mode" "yaml"])
        ("haml" . ["haml-mode" "haml"])
        ("sass" . ["sass-mode" "sass"])
        ("scss" . ["xah-css-mode" "css"])

        ("vimrc" . ["vimrc-mode" "vim"])))
(defvar xah-html-lang-mode-list nil "List of supported language mode names.")
(setq xah-html-lang-mode-list (mapcar (lambda (x) (aref (cdr x) 0)) xah-html-lang-name-map))

(defun xah-html-langcode-to-major-mode-name (@lang-code @lang-code-map)
  "Get the `major-mode' name associated with @lang-code.
Return the major mode name as a string. If none is found, return nil.
Version 2017-01-10"
  ;; for an unknown code, assoc returns nil, and (elt (cdr nil) 0) is nil
  (elt (cdr (assoc @lang-code @lang-code-map)) 0))
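Each value in the map is a two-slot vector, so a lookup is “assoc” followed by “aref”. A minimal sketch on a trimmed-down, hypothetical copy of the map (“my-demo-lang-map” and “my-demo-lang-field” are illustration names, not part of the original code). The second slot, the file extension, is not used by the functions on this page:

```elisp
;; Same shape as `xah-html-lang-name-map':
;;  (LANG-CODE . [MAJOR-MODE-NAME FILE-EXTENSION])
(defvar my-demo-lang-map
  '(("js" . ["js-mode" "js"])
    ("python" . ["python-mode" "py"])
    ("emacs-lisp" . ["emacs-lisp-mode" "el"])))

(defun my-demo-lang-field (lang-code map index)
  "Return slot INDEX (0 = mode name, 1 = file extension) for LANG-CODE in MAP.
Return nil if LANG-CODE is not in MAP. Hypothetical helper."
  (let ((entry (assoc lang-code map)))
    (and entry (aref (cdr entry) index))))

(my-demo-lang-field "js" my-demo-lang-map 0)      ; → "js-mode"
(my-demo-lang-field "python" my-demo-lang-map 1)  ; → "py"
(my-demo-lang-field "haskell" my-demo-lang-map 0) ; → nil

;; The completion list, built the same way as `xah-html-lang-mode-list':
(mapcar (lambda (x) (aref (cdr x) 0)) my-demo-lang-map)
;; → ("js-mode" "python-mode" "emacs-lisp-mode")
```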

Htmlize Region

Here's the code to htmlize a region.

(defun xah-html-htmlize-region (@p1 @p2 @mode-name)
  "Htmlize the region from @p1 to @p2 using major mode @mode-name.
This function requires the `htmlize-buffer' from htmlize.el by Hrvoje Niksic.
Version 2016-12-18 (2022-06-20)"
  (interactive
   (list (region-beginning)
         (region-end)
         (completing-read "Choose mode for coloring: " xah-html-lang-mode-list)))
  (let* (($input-str (buffer-substring-no-properties @p1 @p2))
         ($out-str (xah-html-htmlize-string $input-str @mode-name)))
    ;; leave the buffer alone if htmlize changed nothing
    (unless (string-equal $input-str $out-str)
      (delete-region @p1 @p2)
      (goto-char @p1)
      (insert $out-str))))

htmlize-pre-block

Here's the code for the “htmlize-pre-block” function (named xah-html-htmlize-precode here), plus a helper that finds the pre block:

(defun xah-html-get-precode-langCode ()
  "Get the langCode and position boundary of current HTML pre block.
A pre block is text of this form
 <pre class=\"‹langCode›\">…▮…</pre>.
Your cursor must be between the tags.

Returns a vector [langCode pos1 pos2], where pos1 pos2 are the boundary of the text content.
Version 2018-09-28"
  (interactive)
  (let ($langCode $p1 $p2)
    (save-excursion
      (re-search-backward "<pre class=\"\\([-A-Za-z0-9]+\\)\"") ; tag begin position
      (setq $langCode (match-string 1))
      (setq $p1 (search-forward ">")) ; text content begin
      ;; find the closing tag; assumes no nested </pre> inside the block
      (search-forward "</pre>")
      (setq $p2 (match-beginning 0)) ; text content end
      (vector $langCode $p1 $p2))))
(defun xah-html-htmlize-precode (@lang-code-map)
  "Replace the text enclosed by a “pre” tag with htmlized code.

For example, if the cursor is inside the pre tags <pre class=\"‹langCode›\">…▮…</pre>, then after calling, the text inside the pre tag will be htmlized. That is, wrapped with many span tags for syntax coloring.

The opening tag must be of the form <pre class=\"‹langCode›\">.  The ‹langCode› determines what emacs mode is used to colorize the text. See `xah-html-lang-name-map' for possible ‹langCode›.

Cursor will end up right before </pre>.

See also: `xah-html-dehtmlize-precode', `xah-html-toggle-syntax-coloring-markup'.
This function requires the `htmlize-buffer' from htmlize.el by Hrvoje Niksic.
Version 2018-09-28"
  (interactive (list xah-html-lang-name-map))
  (let* (($precodeData (xah-html-get-precode-langCode))
         ($langCode (elt $precodeData 0))
         ($p1 (elt $precodeData 1))
         ($p2 (elt $precodeData 2))
         ($modeName (xah-html-langcode-to-major-mode-name $langCode @lang-code-map)))
    ;; `xah-html-htmlize-region' takes exactly 3 arguments
    (xah-html-htmlize-region $p1 $p2 $modeName)))
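The lang code is captured by the regexp in “xah-html-get-precode-langCode”, which only accepts letters, digits, and hyphens, and a single class name. A quick way to check which opening tags it accepts; “my-pre-open-re” and “my-pre-lang-code” are hypothetical test names, with the regexp copied from the function above:

```elisp
(defconst my-pre-open-re "<pre class=\"\\([-A-Za-z0-9]+\\)\""
  "Same regexp as in `xah-html-get-precode-langCode', copied for testing.")

(defun my-pre-lang-code (tag)
  "Return the lang code in the opening pre TAG, or nil if the regexp rejects it."
  (and (string-match my-pre-open-re tag)
       (match-string 1 tag)))

(my-pre-lang-code "<pre class=\"emacs-lisp\">") ; → "emacs-lisp"
(my-pre-lang-code "<pre class=\"python3\">")    ; → "python3"
(my-pre-lang-code "<pre class=\"c++\">")        ; → nil, "+" is not allowed
(my-pre-lang-code "<pre class=\"js main\">")    ; → nil, one class only
```

Every lang code in “xah-html-lang-name-map” fits this pattern, which is worth keeping in mind when adding new entries.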

Setting Up htmlize.el and CSS

Note: the following is quoted from htmlize.el's header documentation:

htmlize supports three types of HTML output, selected by setting `htmlize-output-type': `css', `inline-css', and `font'. In `css' mode, htmlize uses cascading style sheets to specify colors; it generates classes that correspond to Emacs faces and uses <span class=FACE>…</span> to color parts of text.
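The variable “htmlize-output-type” in that quote defaults to css in htmlize.el. If your init file might change it, a one-line sketch to pin it explicitly:

```elisp
;; Emit <span class="..."> markup (CSS classes) instead of inline styles
;; or <font> tags.  Setting it before htmlize loads is fine: defcustom
;; does not override a variable that already has a value.
(setq htmlize-output-type 'css)
```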

My functions assume you are using the CSS mode output. This means you have to do a one-time manual step: take the CSS code generated in the htmlize output and put it in your own HTML page (or its stylesheet). You can use my CSS code for these languages:

pre {font-family: "Courier", monospace;}

pre.bash,
pre.bbcode,
pre.code,
pre.cpp,
pre.css,
pre.emacs-lisp,
pre.html,
pre.java,
pre.js,
pre.mathematica,
pre.ocaml,
pre.org-mode,
pre.output,
pre.perl,
pre.python,
pre.python3,
pre.ruby,
pre.xml
{
min-width:44%;
white-space:pre-wrap;
background-color:#eeeeee;
border:solid thin grey;
padding:.5rem;
margin:.5rem;
border-radius: 1rem;
}

pre:before {content:"";
position:relative;
top:-1ex;
right:0;
float:right;
color:black;
text-shadow: 0.2ex 0.2ex 0.2ex white;
}

pre.bash {background-color:hsl(22,24%,85%)}
pre.bash:before {content:"bash"}
pre.code {background-color:hsl(0,0%,95%)}
pre.css {background-color:hsl(160,50%,97%)}
pre.css:before {content:"CSS"}
pre.emacs-lisp {background-color:hsl(120,100%,98%)}
pre.emacs-lisp:before {content:"emacs lisp"}
pre.html {background-color:hsl(244,61%,90%)}
pre.html:before {content:"HTML"}
pre.java {background-color:hsl(280,50%,97%)}
pre.java:before {content:"Java"}
pre.js {background-color:hsl(70,50%,95%)}
pre.js:before {content:"JavaScript"}
pre.mathematica {background-color:hsl(103,47%,82%)}
pre.mathematica:before {content:"Wolfram Language"}
pre.ocaml {background-color: hsl(180,50%,97%)}
pre.ocaml:before {content:"OCaml"}
pre.org-mode {background-color:hsl(158,27%,75%)}
pre.output {background-color:hsl(0,0%,95%)}
pre.perl {background-color:hsl(200,50%,98%)}
pre.perl:before {content:"Perl"}
pre.python {background-color:hsl(159,40%,92%)}
pre.python3 {background-color:hsl(165,49%,86%)}
pre.python3:before {content:"Python 3"}
pre.python:before {content:"Python 2"}
pre.ruby {background-color:hsl(90,50%,97%)}
pre.ruby:before {content:"Ruby"}
pre.xml {background-color:hsl(230,50%,95%)}
pre.xml:before {content:"XML"}

pre .bold {font-weight:bold}
pre .builtin {color:#483d8b}
pre .comment {color:#b22222}
pre .comment-delimiter {color:#b22222}
pre .constant {color:#008b8b}
pre .doc {color:#8b2252}
pre .function-name {color:#0000ff}
pre .keyword {color:#a020f0}
pre .preprocessor {color:hsl(314,19%,30%);background-color:silver}
pre .string {color:#8b2252}
pre .type {color:#228b22}
pre .underline {text-decoration:underline}
pre .variable-name {color:#a0522d}
pre .warning {color:#ff0000;font-weight:bold}

span.css-property {color:#a0522d}
span.css-selector {color:#0000ff}

pre.xml span.sgml-namespace {color:#da70d6}
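The class names in this CSS are Emacs face names with the “font-lock-” prefix and “-face” suffix removed, e.g. font-lock-string-face becomes “string”. That appears to be how htmlize.el derives its class names (it also downcases names and replaces unusual characters, which this sketch skips). A hypothetical helper, “my-face-css-class”, that predicts the class for a face, handy when writing CSS for a new mode:

```elisp
(require 'subr-x) ; for `string-remove-prefix' and `string-remove-suffix'

(defun my-face-css-class (face)
  "Guess the CSS class htmlize uses for FACE (a symbol).
Strips a leading \"font-lock-\" and a trailing \"-face\".
Hypothetical helper; not htmlize's own code."
  (string-remove-suffix
   "-face"
   (string-remove-prefix "font-lock-" (symbol-name face))))

(my-face-css-class 'font-lock-string-face)            ; → "string"
(my-face-css-class 'font-lock-comment-delimiter-face) ; → "comment-delimiter"
(my-face-css-class 'css-selector)                     ; → "css-selector"
```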

If your HTML uses UTF-8 encoding, you might add the following to your Emacs init file:

(progn
  ;; load htmlize if it is installed; do nothing otherwise
  (when (require 'htmlize nil t)
    (setq htmlize-convert-nonascii-to-entities nil)
    (setq htmlize-html-charset "utf-8")
    (setq htmlize-untabify nil)))

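The first setting stops htmlize from converting non-ASCII characters into ugly HTML entities, the second declares UTF-8 as the page's charset, and the third keeps tabs instead of converting them to spaces. For example, if you have a bullet char “•” (Unicode U+2022), you will see the character as is instead of &#x2022;.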
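For a sense of what that entity conversion amounts to, here is a rough illustration that rewrites every non-ASCII character as a hexadecimal character reference, the form shown above. “my-nonascii-to-entities” is a hypothetical name and not htmlize's implementation:

```elisp
(defun my-nonascii-to-entities (str)
  "Return STR with each non-ASCII character replaced by &#xHEX;.
Rough illustration of entity conversion; not htmlize's implementation."
  ;; mapconcat walks the string character by character
  (mapconcat (lambda (ch)
               (if (> ch 127)
                   (format "&#x%x;" ch)
                 (char-to-string ch)))
             str ""))

(my-nonascii-to-entities "a • b") ; → "a &#x2022; b"
```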


Dehtmlize Text, and Move Code to a File

It's also convenient to have a command that removes all the span tags, leaving just the plain source code.
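A minimal sketch of that idea, assuming the markup consists only of <span …> tags plus the three escapes &lt; &gt; &amp;. “my-dehtmlize-string” is a hypothetical name, not the “xah-html-dehtmlize-precode” command mentioned earlier:

```elisp
(defun my-dehtmlize-string (html)
  "Strip <span ...> and </span> tags from HTML and decode &lt; &gt; &amp;.
Assumes spans are the only tags present. Hypothetical helper."
  (let ((text (replace-regexp-in-string "</?span[^>]*>" "" html)))
    (setq text (replace-regexp-in-string "&lt;" "<" text t t))
    (setq text (replace-regexp-in-string "&gt;" ">" text t t))
    ;; decode &amp; last, so "&amp;lt;" becomes "&lt;" rather than "<"
    (replace-regexp-in-string "&amp;" "&" text t t)))

(my-dehtmlize-string
 "(<span class=\"keyword\">if</span> (&lt; 3 2) (message <span class=\"string\">\"yes\"</span>))")
;; → "(if (< 3 2) (message \"yes\"))"
```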

Another useful command moves the current pre code block into a file, so you can actually run the code.
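Writing a block to a file mostly comes down to picking the right extension, which the second slot of “xah-html-lang-name-map” provides. A sketch under that assumption, with a hypothetical “my-write-code-to-temp-file” and a one-entry map so it runs on its own. It relies on “make-temp-file” accepting the initial file contents as its fourth argument, which needs Emacs 26 or later:

```elisp
(defun my-write-code-to-temp-file (code lang-code map)
  "Write CODE to a new temp file whose extension comes from LANG-CODE in MAP.
MAP has the shape of `xah-html-lang-name-map'; unknown codes get \"txt\".
Return the file name. Hypothetical helper."
  (let* ((entry (assoc lang-code map))
         (ext (if entry (aref (cdr entry) 1) "txt")))
    ;; (make-temp-file PREFIX DIR-FLAG SUFFIX TEXT)
    (make-temp-file "pre-block-" nil (concat "." ext) code)))

;; Usage: write, read back, clean up.
(let ((file (my-write-code-to-temp-file
             "print(1)\n" "python" '(("python" . ["python-mode" "py"])))))
  (prog1 (list (file-name-extension file)
               (with-temp-buffer (insert-file-contents file) (buffer-string)))
    (delete-file file)))
;; → ("py" "print(1)\n")
```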

For these commands, see Emacs: Xah HTML Mode

JavaScript Solution

Google has an open source library that uses JavaScript to color code in HTML on the fly, instead of using bulky markup. For details, see: Syntax Coloring with Google-Code-Prettify