Syntax Coloring with Google-Code-Prettify
This page gives some examples using Google-code-prettify technology, and evaluates its quality.
Google-code-prettify (GCP) is a JavaScript library that syntax colors computer language source code in HTML. It does the job on the fly, in the browser.
see [2017-06-17 https://github.com/google/code-prettify ]
Basic Use
It is very easy to use. All you have to do is to download the JavaScript files. Then, in your web page, add these lines:
<link rel="stylesheet" type="text/css" href="gcp/prettify.css">
<script src="gcp/prettify.js"></script>
<body onload="prettyPrint()">
Then, in places you want your source code to be colored, wrap it with a “pre” tag like this:
<pre class="prettyprint"> x = 1+1; # something something </pre>
Advantages:
- Your source code is still readable in the raw HTML. This means that if you need to edit or update it, you can do so easily.
- Easy to use. You don't need to run some script to generate span markup every time you have a source code snippet you want to publish.
- Easy to install. Works in all major browsers.
- Supports all major langs (for example, any lang with C-like syntax). Also supports some wiki syntax. It also turns URLs into links.
Disadvantages:
- Requires JavaScript to be turned on. If JavaScript is off in the browser, readers won't see colored syntax. (About 5% of browsers or fewer had JavaScript turned off as of 2009.)
- The coloring is often incorrect.
- If your lang has complex syntax, such as Perl, or the code contains complex regex, then GCP does not do well.
- You still need to encode characters such as LESS-THAN SIGN. For example, change i<5 to i&lt;5. If you don't, browsers may not show parts of the text.
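The encoding step in the last item can be automated before the code is pasted into the page. Here is a minimal sketch in Python, using the standard library's html.escape. The function name to_prettyprint_block is made up for this illustration and is not part of GCP.

```python
import html


def to_prettyprint_block(source):
    """Wrap source in a <pre class="prettyprint"> element with &, <, > encoded.

    Hypothetical helper for illustration; GCP itself does not provide it.
    """
    # quote=False leaves " and ' untouched, so only &, < and > are replaced.
    escaped = html.escape(source, quote=False)
    return '<pre class="prettyprint">' + escaped + "</pre>"


if __name__ == "__main__":
    print(to_prettyprint_block("for (i = 0; i<5; i++) { total += i; }"))
```

The browser decodes the entities again when it renders the page, so readers still see i<5 in the colored output.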
Examples
Here are some examples with different languages. For comparison, each example has a version using HTML span, done by the htmlize elisp package in Emacs. The GCP version used here is “small-21-Jul-2010”.
Java:
Python:
Perl:
Emacs Lisp:
Conclusion
Google-code-prettify is suitable for a small number of lines or for non-critical writing such as on wikis or blogs. It is probably not the right tool for large amounts of code that need more precision (for example: computer language documentation, tech books).
For code over two hundred lines, it also takes a second to load. In comparison, bulky HTML with span wraps, which has more coloring and correct syntax, is still instantaneous.
More Comments
It appears that the concept of a simple JavaScript-based parser that syntax colors a number of languages on the fly is too good to be true. Note that, usually, a syntax coloring algorithm is specific to a language. When an editor syntax colors Java code, it has code that deals with Java syntax; when it syntax colors Python, it calls code that deals with Python syntax; when it syntax colors C#, F#, HTML, CSS, LaTeX, etc, there is code that deals with that particular language's syntax. Some programmer's editors have one single generic syntax coloring module that reads in a language-specific syntax file for dealing with a particular lang. This way, the program knows all the special keywords of a particular lang and their roles, and can thus color them properly according to their semantic role. It doesn't matter how it is implemented; the point here is that they deal with each language specifically.
In contrast, GCP is a generic parser that attempts to deal with all languages, with some special code that acts as helpers for a particular lang whose syntax is sufficiently different from C-like langs. This generic approach seems magical, but so far GCP's generic approach does not seem to perform anywhere close to the lang-specific approach.
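To make the "one generic module plus a language-specific syntax file" idea concrete, here is a small sketch in Python. It is not how GCP or any particular editor is implemented. The KEYWORDS and LINE_COMMENT tables, the function name highlight, and the span class names are all assumptions made up for this illustration, and a real syntax file carries far more than this.

```python
import html
import re

# Hypothetical per-language keyword tables standing in for syntax files.
KEYWORDS = {
    "python": {"def", "return", "if", "else", "import", "for", "in"},
    "java": {"class", "public", "static", "void", "return", "if", "else", "int"},
}

# Hypothetical per-language line-comment markers.
LINE_COMMENT = {"python": "#", "java": "//"}


def highlight(source, lang):
    """Wrap keywords, double-quoted strings and line comments in <span> tags."""
    comment = re.escape(LINE_COMMENT[lang])
    token = re.compile(
        r"(?P<comment>" + comment + r"[^\n]*)"
        r'|(?P<string>"(?:\\.|[^"\\\n])*")'
        r"|(?P<word>[A-Za-z_]\w*)"
    )
    out = []
    pos = 0
    for m in token.finditer(source):
        # Text between tokens is passed through, with &, <, > encoded.
        out.append(html.escape(source[pos:m.start()], quote=False))
        text = html.escape(m.group(), quote=False)
        kind = m.lastgroup
        if kind == "word":
            # Only words listed in this language's table are colored.
            kind = "keyword" if m.group() in KEYWORDS[lang] else None
        out.append(f'<span class="{kind}">{text}</span>' if kind else text)
        pos = m.end()
    out.append(html.escape(source[pos:], quote=False))
    return "".join(out)


if __name__ == "__main__":
    print(highlight("if x < 1: return x  # done", "python"))
```

The generic part is the loop; everything the loop knows about a language comes from the two tables. The same word "def" gets colored for Python but not for Java, which is the kind of per-language knowledge a purely generic scanner lacks.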
Another thing notable with GCP is that it uses dynamic HTML technology to color text in HTML. In particular, it seems to me, GCP does not read in the text and replace it with an HTML marked-up version for your browser to render. I'm not exactly sure how GCP works technically, but this approach seems much simpler and advantageous. However, this approach also has severe problems. When your code contains chars such as < >, they cause a lot of problems in browsers when they are not encoded as &lt; &gt;. If you take the time to pre-process your source code to encode these chars before putting it in HTML with GCP, then GCP loses its major advantage of not requiring a pre-processing step before pasting the code into an HTML page.
When you have text such as x<y (without space in between), the “less than” char MUST be encoded, else it is not valid HTML. This issue causes practical problems too. For example, languages such as Perl, PHP, Python use regex heavily, and often these regexes parse URLs or HTML tags. For example:
# Python
import re
text = r'''<p>look at this <img src="./some.gif" width="30" height="20"> ...</p>'''
# change the .gif extension in src attributes to .png
new = re.sub(r'src\s*=\s*"([^"]+)\.gif"', r'src="\1.png"', text)
print(new)
If you don't encode those < >, browsers will freak out. Because GCP does not change the text, these regexes are passed to the browser directly, and the browser will freak out when encountering a raw regex, resulting in broken links or missing text following that point.
The source code examples in this page are basically randomly picked from my programming language tutorials, and 3 out of 4 examples had coloring problems. But more seriously, GCP damaged the HTML links that come after the pre block. For example, if you just have this code:
if (x<y) {print 5;}
then GCP will render it like this:
if (x
and break all your HTML links after this pre block. (This may have been fixed in the current version.)
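One way to catch this kind of breakage before publishing is to scan the page source for prettyprint blocks that still contain raw markup characters. The following is a rough sketch in Python, not a real HTML parser. The function name find_unescaped is made up for this illustration, and it assumes the opening tag is written exactly as <pre class="prettyprint"> and that the code itself does not contain the text </pre>.

```python
import re

# Each <pre class="prettyprint"> ... </pre> block (non-greedy, may span lines).
PRE_BLOCK = re.compile(r'<pre class="prettyprint">(.*?)</pre>', re.DOTALL)

# A raw "<" or ">", or an "&" that does not start an entity such as &lt; or &#60;.
RAW_CHAR = re.compile(r"[<>]|&(?!#?\w+;)")


def find_unescaped(page):
    """Return the body of every prettyprint block that still holds raw <, > or &."""
    return [body for body in PRE_BLOCK.findall(page) if RAW_CHAR.search(body)]


if __name__ == "__main__":
    page = (
        '<pre class="prettyprint">if (x<y) {print 5;}</pre>'
        '<a href="a.html">a link</a>'
        '<pre class="prettyprint">if (x&lt;y) {print 5;}</pre>'
    )
    for body in find_unescaped(page):
        print("needs encoding:", body)
```

Only the first block is reported here; the second one already has its LESS-THAN SIGN encoded.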
Theoretically, a JavaScript-based parser that syntax colors any number of languages on the fly in a reasonable amount of time is not impossible. But GCP hasn't reached that state of maturity. If you look at the main source code of GCP, it is only 1.4k lines, too small to be realistic. In contrast, for example, js2-mode and nxml mode in Emacs each contain 10k lines of elisp (of course, they do much more than syntax coloring). The htmlize-mode, which translates Emacs's text properties info into html/css (given an already syntax colored buffer in Emacs), is itself 1.7k lines of elisp.