Syntax Design: Use of Unicode Matching Brackets as Specialized Delimiters
In my tech blogs, i often give instructions involving the graphical menu. For example, i'd say: it's at the menu “File ▸ Open”. Today i decided to use a special delimiter to indicate menus. The delimiter is the Unicode 〖WHITE LENTICULAR BRACKET〗 (U+3016, U+3017). So, the menu would be written as 〖File ▸ Open〗. I just spent a couple of hours changing all mentions of menus on my site to use the new delimiter. (In my Emacs Tutorial alone, there are 64 menu mentions, spread among ~300 files.)
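If the old menu mentions all followed the “File ▸ Open” pattern, the switch can be sketched as a single regular-expression substitution. This is a hypothetical sketch, not the script actually used for the site; it assumes that a curly-quoted string containing the ▸ separator is always a menu path, and that other curly-quoted text must be left alone.

```python
import re

# Assumption: a menu path is a curly-quoted string that contains the
# ▸ separator, e.g. “File ▸ Open”. Quotes without ▸ are not menus.
MENU_IN_QUOTES = re.compile(r"“([^“”]*▸[^“”]*)”")


def convert_menus(text: str) -> str:
    """Rewrite “File ▸ Open” style menu mentions as 〖File ▸ Open〗."""
    return MENU_IN_QUOTES.sub(r"〖\1〗", text)


if __name__ == "__main__":
    print(convert_menus("it's at the menu “File ▸ Open”, not a “quote”."))
    # it's at the menu 〖File ▸ Open〗, not a “quote”.
```

Requiring ▸ inside the quotes is what keeps ordinary quotations untouched; a menu with a single item and no ▸ would be missed and need a manual fix.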
Here is a summary of my usage of special Unicode brackets:
- ANGLE BRACKET. Article title. For example, 〈Xah's Emacs Lisp Tutorial〉.
- DOUBLE ANGLE BRACKET. Book title. For example, 《Basic Economics》.
- BLACK LENTICULAR BRACKET. Key combinations. For example, 【Ctrl+c】.
- WHITE LENTICULAR BRACKET. Menu. For example, 〖File ▸ Open〗.
- TORTOISE SHELL BRACKET. File names, path, URL. For example, 〔~/Documents/notes.txt〕.
- CORNER BRACKET. Computer code, or math expression. For example, 「x = 3;」.
- ANGLE QUOTATION MARK. A variable in computer language syntax description. For example, file = ‹file path›.
- DOUBLE ANGLE QUOTATION MARK. Verbatim quote. For example, «Every society honors its live conformists and its dead troublemakers.» (Mignon McLaughlin)
- DOUBLE QUOTATION MARK. Generic delimiter. For example, “something”.
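The list above uses short labels; the official Unicode character names are longer. One way to double-check exactly which code point each opening bracket is (a checking aid only, not part of the author's workflow) is Python's standard unicodedata module. The usage strings in the dictionary simply restate the list.

```python
import unicodedata

# Opening character of each pair, with the usage assigned to it in the list above.
BRACKETS = {
    "〈": "article title",
    "《": "book title",
    "【": "key combination",
    "〖": "menu",
    "〔": "file name, path, URL",
    "「": "computer code or math expression",
    "‹": "variable in a syntax description",
    "«": "verbatim quote",
    "“": "generic delimiter",
}

for ch, use in BRACKETS.items():
    # Print code point, official Unicode name, and assigned usage.
    print(f"U+{ord(ch):04X}  {unicodedata.name(ch):45} {use}")
```

For example, ‹ comes out as SINGLE LEFT-POINTING ANGLE QUOTATION MARK (U+2039), and 〖 as LEFT WHITE LENTICULAR BRACKET (U+3016), matching the code point given earlier.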
Why Are These Brackets Chosen?
There are many other brackets in Unicode. 〔see Matching Brackets in Unicode〕 I chose these brackets, and the meanings i give them, carefully. The reasons are:
- The meaning i assign to each bracket must be compatible with the semantics Unicode gives that character.
- It must be a fairly common character, so that most browsers, editors, fonts, and other tools can display it. (For example, a Windows or Mac machine made 4 years ago should have no problem displaying them out of the box.)
All the brackets i've used are common ones. “Curly quotes” and ‹angle/French quotes› are widely used in Western languages. The 〈〉《》【】〖〗「」〔〕 are used daily in Chinese and Japanese. 〔see Intro to Chinese Punctuation〕 Because these languages are widely used in computing in China and Japan, the brackets are also widely supported even in non-Asian countries.
If a font or tool has any support for Unicode, these brackets are probably among the top 100 or so symbols supported.
Are These Delimiters Necessary?
Not all of them, but they provide meaningful information, as a visual enhancement and especially for computer processing.
For example, once you realize that the lenticular bracket, as in 【Ctrl+x】, marks a keyboard shortcut, you can recognize all the keys on a page at a glance.
For another example, with these markers, i can easily write a program that extracts all the book titles, keyboard shortcuts, program menus, or code snippets mentioned in my website articles (a few thousand files). Without these markers, the problem is non-trivial.
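Here is a minimal sketch of such an extraction program. The kind names (article_title, key, menu, and so on) and the assumption that the site is a tree of UTF-8 .html files are illustrative, not a description of the actual site tooling; each regex takes the opening bracket and everything up to the first matching closing bracket.

```python
from __future__ import annotations

import re
from pathlib import Path

# Bracket pairs and the kind of item each marks (kind names are hypothetical).
DELIMITERS = {
    "article_title": ("〈", "〉"),
    "book_title": ("《", "》"),
    "key": ("【", "】"),
    "menu": ("〖", "〗"),
    "path": ("〔", "〕"),
    "code": ("「", "」"),
}

# One regex per kind: opening bracket, then everything up to the first
# closing bracket. Nesting of the same bracket is not handled.
PATTERNS = {
    kind: re.compile(re.escape(o) + "([^" + re.escape(c) + "]*)" + re.escape(c))
    for kind, (o, c) in DELIMITERS.items()
}


def extract(text: str) -> dict[str, list[str]]:
    """Return all delimited items in text, grouped by kind."""
    return {kind: pat.findall(text) for kind, pat in PATTERNS.items()}


def extract_site(root: str) -> dict[str, list[str]]:
    """Run extract() over every .html file under root (assumed UTF-8)."""
    totals: dict[str, list[str]] = {kind: [] for kind in DELIMITERS}
    for path in sorted(Path(root).rglob("*.html")):
        for kind, items in extract(path.read_text(encoding="utf-8")).items():
            totals[kind].extend(items)
    return totals
```

For example, extract("Press 【Ctrl+c】, then 〖File ▸ Open〗 in 〔~/notes.txt〕.") returns ["Ctrl+c"] under "key", ["File ▸ Open"] under "menu", and ["~/notes.txt"] under "path". The precision is only as good as the convention: this article itself also uses 〔…〕 for “see” references, so those would be picked up as paths too.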
Here is an example of the benefit of computer recognition: suppose that in my Emacs Tutorial, i want to add interactive annotations for all the Emacs key shortcuts mentioned. (Emacs has a few hundred key shortcuts by default.) When the user hovers the mouse over an Emacs key shortcut in the article, a pop-up box should show the name of the associated command. When keys are marked with a delimiter dedicated to that purpose, such as 【Ctrl+x】, a program can trivially identify all of them.
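A minimal sketch of that annotation step, assuming the simplest possible pop-up: the HTML title attribute, which browsers show as a tooltip on hover. The three bindings are real Emacs defaults, but the table is a tiny hand-picked excerpt, and the key spelling “Ctrl+x Ctrl+f” and the class name "key" are assumptions; a real table would be generated from Emacs itself.

```python
from __future__ import annotations

import html
import re

# Tiny excerpt of Emacs default bindings (illustrative only).
KEY_TO_COMMAND = {
    "Ctrl+x Ctrl+f": "find-file",
    "Ctrl+x Ctrl+s": "save-buffer",
    "Ctrl+g": "keyboard-quit",
}

KEY = re.compile(r"【([^】]*)】")


def annotate_keys(text: str) -> str:
    """Wrap each known 【key】 in a span whose title names the command.

    Browsers display the title attribute as a tooltip on mouse hover.
    Keys missing from the table are left untouched.
    """
    def repl(m: re.Match) -> str:
        command = KEY_TO_COMMAND.get(m.group(1))
        if command is None:
            return m.group(0)
        return f'<span class="key" title="{html.escape(command)}">{m.group(0)}</span>'

    return KEY.sub(repl, text)
```

For example, annotate_keys("Press 【Ctrl+x Ctrl+s】 to save.") returns 'Press <span class="key" title="save-buffer">【Ctrl+x Ctrl+s】</span> to save.'. A richer pop-up would use JavaScript, but the identification step stays the same one-line regex.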
What About Using HTML Markup Instead?
HTML markup is great, and it serves the same purpose. I have dithered over whether to use HTML markup, special Unicode brackets, or a mixture of both. I've experimented with this over the past 2 years. Right now, i use a mixture of both.
Here is a sample HTML markup snippet:
Computer code: <code>x = 3;</code>
Keyboard shortcut: <span class="keyboard_shortcut">Ctrl+c</span>
Book title: <span class="book_title">Emacs Tutorial</span>
Here is a CSS definition that automatically colors the text, and also inserts the brackets for display, for any text marked up with the “code” tag:
code {color:red; font-family:"DejaVu Sans Mono",monospace}
code:before, code:after {color:black; background-color:white}
code:before {content:"「"}
code:after {content:"」"}
The advantage of HTML markup is that it's a more elaborate system.
For example, you can color the text and specify its font and size. You can add brackets if you want. The markup is also more precise. For example, with <span class="book_title">…</span>, there's a 99.99% certainty that the enclosed text is a book title, while text enclosed by the bracket 《…》 could mean something else (just look at this page you are reading, where the text inside that bracket is not necessarily a book title).
The disadvantage is that it's much more verbose, and makes the raw source code much harder to read.
Right now, all my book titles, article titles, and computer code snippets are marked up with HTML, and CSS is used to display the specialized brackets as visual cues.
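Going from the bracket notation to this kind of HTML markup can be automated. The sketch below is hypothetical: it reuses the class names book_title and keyboard_shortcut and the code tag from the samples in this article, and it drops the brackets on the assumption that CSS re-inserts them, as the code:before and code:after rules do.

```python
import html
import re

# Bracket pair -> (opening tag, closing tag). Brackets are dropped because
# the stylesheet is assumed to add them back for display.
MARKUP = {
    ("《", "》"): ('<span class="book_title">', "</span>"),
    ("【", "】"): ('<span class="keyboard_shortcut">', "</span>"),
    ("「", "」"): ("<code>", "</code>"),
}


def brackets_to_html(text: str) -> str:
    """Convert bracket-delimited plain text into HTML markup."""
    # Escape &, <, > in the plain text first so code like "a < b" stays literal.
    out = html.escape(text, quote=False)
    for (o, c), (start, end) in MARKUP.items():
        pattern = re.compile(re.escape(o) + "([^" + re.escape(c) + "]*)" + re.escape(c))
        # A function replacement avoids backslash surprises in the inserted text.
        out = pattern.sub(lambda m, s=start, e=end: s + m.group(1) + e, out)
    return out
```

For example, brackets_to_html("Type 「x = 3;」, press 【Ctrl+c】.") returns 'Type <code>x = 3;</code>, press <span class="keyboard_shortcut">Ctrl+c</span>.'. Whether to drop or keep the brackets in the output is exactly the design question discussed in the next section.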
A Finer Point: Are Delimiter Brackets Semantically Meaningful or Just for Visual Enhancement?
Suppose you use CSS. For example, a book title is wrapped in an HTML tag like this:
<span class="book_title">The Story Of My Life</span>
and here's CSS code to add color:
span.book_title{color:red}
You can also add brackets:
span.book_title:before{content:"《"} span.book_title:after{content:"》"}
If you want the text to be colored, you must use CSS. However, you can put the brackets in the text itself, without relying on CSS, like this:
《<span class="book_title">The Story Of My Life</span>》
The question for me was, should the bracket be part of the text or added by CSS? Which format should i choose?
The answer depends on whether the bracket is considered just a visual enhancement or semantically meaningful. If it's just a visual enhancement, then it should be part of CSS (Cascading Style Sheets), as implied by the word “style” in its name. When CSS is off, readers won't see the bracket, and that doesn't matter. However, if the bracket is considered semantically meaningful, then it should not be in CSS. That way, whether CSS is on or off, you still see the bracket.
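The difference shows up in any tool that reads the page without applying CSS. As a rough illustration, the sketch below uses Python's standard html.parser to keep only the character data, which approximates what a reader with CSS off, or a plain-text extractor, would get; the class and function names are illustrative.

```python
from __future__ import annotations

from html.parser import HTMLParser


class TextOnly(HTMLParser):
    """Collect only character data, ignoring tags and any styling."""

    def __init__(self) -> None:
        super().__init__()
        self.parts: list[str] = []

    def handle_data(self, data: str) -> None:
        self.parts.append(data)


def visible_text(markup: str) -> str:
    parser = TextOnly()
    parser.feed(markup)
    parser.close()  # flush any buffered trailing text
    return "".join(parser.parts)


css_only = '<span class="book_title">The Story Of My Life</span>'
in_text = '《<span class="book_title">The Story Of My Life</span>》'

print(visible_text(css_only))  # The Story Of My Life
print(visible_text(in_text))   # 《The Story Of My Life》
```

With the CSS-only version, the brackets exist only in the stylesheet, so they vanish; with the in-text version, they survive any processing that keeps the text.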
There are opposing views on whether the bracket should be in text or added by CSS.
(1) The brackets are semantically meaningful, and thus should be part of the text. For example, in Chinese, book titles are enclosed by double angle brackets 《》. They are semantically meaningful, not just decoration. In the same way, matched pairs in Western text, such as “curly quotes”, «French quotes», and various brackets (parentheses), [square brackets], {braces}, are almost always semantically meaningful. If you remove them, it affects the text in major ways.
(2) A bracket in the text is redundant when the text is already marked up. Therefore, in this view, one should add the brackets by CSS and not in the text. Although CSS is meant for appearance, in practice appearance and layout are often intertwined with semantics to various degrees. Positioning (layout) and sizes often add subtle but non-trivial semantics to a page. In practice, a significant percentage of web pages would probably become unreadable, or have their meaning affected, if you turned off CSS; and probably less than 0.01% of pages are ever read without CSS. The bottom line of this reasoning is that if you use the HTML/CSS tech bundle, then you shouldn't add the bracket in the text, because the text is already precisely marked up. Add the bracket by CSS.
Right now i haven't decided which is “better”. More precisely, i think one way might be more suitable than the other once a more precise goal or purpose is given. For my current purpose, online articles, it doesn't matter much.
An example where it might matter is when a document is defined in XML, or when an HTML article is the basis for a printed publication that goes through further processing. For example, the finely printed book A New Kind of Science is based on the Mathematica notebook format (see also: Notes on A New Kind of Science). Some books are based on HTML/CSS tech, for example Håkon Wium Lie's book. Some books are based on unix's troff system (man pages). Then there are systems expressly designed for publishing, layout, and typesetting: QuarkXPress, Adobe InDesign (PageMaker), DocBook, LaTeX, etc. When using one of these specialized technologies, especially with a particular format or standard set by some publication's or organization's conventions, many subtle questions of syntax detail are naturally answered.