Programming Language Design: The Syntax Sugar Problem: Irregularity vs Convenience


One of the idiocies of the HTML spec is that the “pre” tag discards the first blank line, that is, a line break that comes right after the start tag.

For example, if you have:

<pre style="border:solid thin red">
x = 3
</pre>

Here's how your browser renders it:

x = 3

The first blank line is ignored. However, only the FIRST blank line is ignored. If you have 2 blank lines at the beginning, it'll be rendered with 1 blank line:

<pre style="border:solid thin red">

x = 3
</pre>

x = 3
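Any program that reads or writes “pre” content has to carry this special case around. Here is a minimal sketch of the rule in Python, assuming the raw text between the start and end tags has already been extracted as a string. The function name `pre_text` is made up for illustration, and this is not how any particular browser implements it.

```python
def pre_text(raw: str) -> str:
    """Apply the one special case: drop a single line break right after <pre>.

    `raw` is assumed to be the exact text between the start and end tags.
    Everything after that first line break is kept verbatim.
    """
    if raw.startswith("\r\n"):
        return raw[2:]
    if raw.startswith("\n"):
        return raw[1:]
    return raw


one_blank = "\nx = 3\n"    # content of the first example above
two_blank = "\n\nx = 3\n"  # content of the second example above

print(repr(pre_text(one_blank)))  # 'x = 3\n'
print(repr(pre_text(two_blank)))  # '\nx = 3\n'
```

Only one line break is removed, so the second example keeps one blank line, matching what the browser shows.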

They do this because it's convenient for the coder: one likes to see the pre content aligned to the left in the raw HTML.

For example, you'd rather write it this way:

<pre>
1
2
3
</pre>

than

<pre>1
2
3</pre>

This is an idiocy because it mixes convenience with syntax.

According to http://www.w3.org/TR/html4/appendix/notes.html#notes-line-breaks, quote:

SGML (see [ISO8879], section 7.6.1) specifies that a line break immediately following a start tag must be ignored, as must a line break immediately before an end tag. This applies to all HTML elements without exception.

The problem comes when you have programs that deal with code. That's why, in programming and computing tech, there are a hundred exceptions and irregularities, and thus bugs and headaches. The worst offender is unix shell syntax. 〔➤ Unix Shell Syntax Irregularities Galore〕

At first, syntax conveniences like these are nice. The rules are lax, and you use them without problems. But then, once the language grows, and you deal with many languages, you find exceptions and special rules everywhere, you can't remember which rules they thought were convenient at the time, and there is no simple systematic rule about them. Each one becomes an ad hoc syntax soup of hell.

C = Syntax Soup

Almost all languages abuse syntax sugar for convenience, to various degrees. C language syntax is the worst. It basically has no design. Most of the syntax “design” is based on the user's typing convenience at the time. 〔➤ Programing: Why I Hate C〕

LISP = Irregular Monster Hidden

Even lisp didn't escape this problem. Contrary to popular belief, there are quite a few irregularities in lisp syntax. 〔➤ Fundamental Problems of Lisp〕

XML Syntax is Regular??

Even XML, whose syntax is more regular than lisp, cannot escape irregularities.

Here's a sample of valid XML, from an Atom webfeed.

<?xml version="1.0" encoding="utf-8"?>
<feed xmlns="http://www.w3.org/2005/Atom" xml:base="http://xahlee.info/comp/">

  <title>…</title>
  <subtitle>…</subtitle>
  <link rel="self" href="blog.xml"/>
  <link rel="alternate" href="blog.html"/>
  <updated>…</updated>
  <author>
    <name>…</name>
    <uri>…</uri>
  </author>
  <id>…</id>
  <icon>…</icon>
  <rights>…</rights>

  <entry>
    <title>…</title>
    <id>…</id>
    <updated>…</updated>
    <summary>…</summary>
    <content type="xhtml">
      <div xmlns="http://www.w3.org/1999/xhtml">
        <p><a href="…">…</a></p>
        <p>…</p>
        <p>…</p>
      </div>
    </content>
    <link rel="alternate" href="…"/>
  </entry>

</feed>

Can you spot the syntax irregularity?

Sin: Omitting Ending Tags

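Another major problem of HTML irregularity is omitted ending tags. HTML, due to its SGML baggage, has tags that are written with no ending tag at all (such as “img” and “br”), and others whose ending tag may be left out. When XML was hot, XHTML fixed the problem by requiring every tag to be closed, with a special syntax of a slash at the end for tags that have no content. For example: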

<img src="cat.jpg" alt="cat" />

But the maverick HTML5, started by the commercial Apple, Mozilla, and Opera, thwarted all this, and in fact wanted to kill XML and is succeeding.
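The payoff of the strict XML rule is that a machine can check it with no special knowledge of which tags may skip their ending. A small sketch using Python's standard `xml.etree.ElementTree` parser follows. The helper name `is_well_formed` and the sample fragments are just for illustration.

```python
import xml.etree.ElementTree as ET


def is_well_formed(fragment: str) -> bool:
    """Return True if the fragment parses as XML, False otherwise."""
    try:
        ET.fromstring(fragment)
        return True
    except ET.ParseError:
        return False


closed = '<p><img src="cat.jpg" alt="cat" /></p>'   # slash at the end, XML style
unclosed = '<p><img src="cat.jpg" alt="cat"></p>'   # HTML style, no ending tag

print(is_well_formed(closed))    # True
print(is_well_formed(unclosed))  # False
```

An HTML parser, by contrast, has to know a list of tag names to decide whether the second form is fine.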

A big offender is Google, telling users to omit ending tags in their HTML5 style guide. The consequence is that people will omit ending tags that are not allowed to be omitted, and we are back to syntax-soup quirks-mode hell.

How to Solve the Syntax Sugar Problem?

This problem should be solved by clear separation of issues. For example, XML takes the regularity approach, and you can have editors that represent the data to the user in a most easy-to-read format, or structural editors.

Another approach is Mathematica, where you have a systematic syntax layer. So, at the bottom layer, it's purely nested like XML and LISP, but without irregularities, and there is another layer on top that supports all the syntax warts we humans have gotten used to, as in traditional math notation and infix notation. Yet there are simple, regular, systematic transformation rules that convert between these two layers easily.

Instead of syntax sugar, you should have a 100% regular syntax, or a layer with systematic rules, and let the editor deal with it, and present code to the user in a different layer.
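To make the two-layer idea concrete, here is a rough sketch in Python, not Mathematica. It uses Python's own `ast` module to read a small infix expression (the convenient top layer) and prints a uniformly nested `head[arg, arg]` form (the regular bottom layer). The head names `Plus`, `Times`, and `Power` are borrowed from Mathematica for flavor, but the mapping table and the function names are assumptions made for this example, and the output will not match Mathematica's real FullForm in every case (for instance, it never flattens `a + b + c` into one `Plus`).

```python
import ast

# Illustrative mapping from Python operator nodes to Mathematica-style heads.
HEADS = {ast.Add: "Plus", ast.Mult: "Times", ast.Pow: "Power"}


def to_nested(source: str) -> str:
    """Convert an infix expression into a uniformly nested head[arg, ...] string."""
    return _walk(ast.parse(source, mode="eval").body)


def _walk(node: ast.AST) -> str:
    # Binary operators become head[left, right].
    if isinstance(node, ast.BinOp) and type(node.op) in HEADS:
        head = HEADS[type(node.op)]
        return f"{head}[{_walk(node.left)}, {_walk(node.right)}]"
    # Variable names and numbers are the leaves.
    if isinstance(node, ast.Name):
        return node.id
    if isinstance(node, ast.Constant) and isinstance(node.value, (int, float)):
        return repr(node.value)
    raise ValueError(f"unsupported syntax: {ast.dump(node)}")


print(to_nested("a + b * c"))   # Plus[a, Times[b, c]]
print(to_nested("x ** 2 + 1"))  # Plus[Power[x, 2], 1]
```

Operator precedence lives entirely in the top layer. Once the expression is in the nested form, every tool that touches it deals with one rule only.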
