HTML 4 Transitional to HTML 4 Strict Transition

By Xah Lee. Date:

In 2008-05, i started to convert my whole site from HTML 4 Transitional to HTML 4 Strict. It was a huge, non-trivial, labor-intensive manual job. It could not be fully automated, because HTML tags do not have strictly defined semantics the way some XML vocabularies do. (For example, you could put anything in a “h1” tag, even if it is not really a header. Even if you use it correctly as a header, the meaning is not well defined: “h1” could be the title of a book, the name of a chapter, the name of a section of a chapter, or the title of a page. Also, it's perfectly valid to have a “h3” tag without “h2” or “h1”.)

The idea of the transition is that i would gradually migrate to stricter and more semantically well-defined uses of HTML: first to XHTML, then from there to absolutely well-defined tags (such as HTML Microformats). Eventually every tag in every page would have a syntactic and semantic grammar, so that a dumb machine can read off a personal DTD and parse the page or transform it to any form or layout.

The transition took me a few weeks of full-time work, with as much automation as i could manage with the help of many custom Emacs Lisp scripts. But still, about 1/3 of my 4000 pages were left unconverted, even though my HTML was all W3C valid to begin with.

I lived with that, and every week i manually converted a few more pages, as i happened to revisit them.

There are a few major pains in HTML 4 Strict. One is that target="_blank" is gone. So, if you want a link to open in its own window, a perfectly legit need, you now have to resort to JavaScript.
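
As a rough illustration of the kind of automated cleanup this calls for, here is a minimal Python sketch (not the Emacs Lisp actually used) that strips target="_blank" from “a” tags and leaves a marker attribute in its place. The function name strip_target_blank and the rel="external" marker are assumptions made for this example; the page would still need a small JavaScript snippet, not shown, that opens the marked links in a new window.

```python
import re

# target="_blank" (single or double quoted), plus the whitespace before it.
_TARGET_BLANK = re.compile(r'''\s+target\s*=\s*(["'])_blank\1''', re.IGNORECASE)
# An opening <a ...> tag. Assumes no ">" appears inside attribute values.
_A_TAG = re.compile(r'<a\b[^>]*>', re.IGNORECASE)


def strip_target_blank(html, marker='rel="external"'):
    """Replace target="_blank" on <a> tags with a marker attribute.

    The marker is a hypothetical convention for this sketch. The function
    assumes the tag does not already carry the marker's attribute.
    """
    def fix(match):
        tag = match.group(0)
        if not _TARGET_BLANK.search(tag):
            return tag  # link does not ask for a new window; leave it alone
        tag = _TARGET_BLANK.sub('', tag)
        return tag[:-1] + ' ' + marker + '>'

    return _A_TAG.sub(fix, html)


if __name__ == "__main__":
    print(strip_target_blank('<a href="x.html" target="_blank">x</a>'))
```

Regex-based rewriting like this is fragile on messy markup, so it only makes sense on pages that are already valid, and the result should be re-validated afterwards.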

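The other major difference is that “img” tags can no longer sit directly in “body”; each needs to be inside a block-level element such as “div” or “p”. Also, text inside “blockquote” needs to be wrapped in a block-level element such as “p” or “pre”; it cannot be just bare text.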
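
These two rules are mechanical enough to check by script. Below is a minimal Python sketch, using only the standard library's html.parser, that reports “img” tags and bare text sitting directly inside “body” or “blockquote”. The names StrictChecker and check are made up for this example, and the sketch assumes every non-void tag is explicitly closed (no omitted “/p”). It is not a replacement for the W3C validator.

```python
from html.parser import HTMLParser

# Elements with no end tag in HTML 4; they are never pushed on the stack.
VOID = {"img", "br", "hr", "input", "meta", "link", "area", "base", "col", "param"}
# Elements whose direct content must be block-level in HTML 4 Strict.
BLOCK_ONLY = {"body", "blockquote"}


class StrictChecker(HTMLParser):
    """Collects two kinds of HTML 4 Strict problems (illustrative only)."""

    def __init__(self):
        super().__init__()
        self.stack = []     # currently open elements, innermost last
        self.problems = []  # human-readable problem descriptions

    def parent(self):
        return self.stack[-1] if self.stack else None

    def handle_starttag(self, tag, attrs):
        if tag == "img" and self.parent() in BLOCK_ONLY:
            self.problems.append("img directly inside " + self.parent())
        if tag not in VOID:
            self.stack.append(tag)

    def handle_endtag(self, tag):
        # Pop back to the matching open tag; ignore stray end tags.
        if tag in self.stack:
            while self.stack.pop() != tag:
                pass

    def handle_data(self, data):
        if data.strip() and self.parent() in BLOCK_ONLY:
            self.problems.append("bare text directly inside " + self.parent())


def check(html):
    """Return a list of problems found in the given HTML string."""
    checker = StrictChecker()
    checker.feed(html)
    checker.close()
    return checker.problems


if __name__ == "__main__":
    for problem in check('<body><blockquote>hello</blockquote><img src="a.png" alt="a"></body>'):
        print(problem)
```

A page where the quote is wrapped in “p” and the image in “div” produces an empty list.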