I learned the term Microformat today. Basically, you just use your HTML's “class” attribute and tag structure to represent structured data, so that the data can be parsed and manipulated easily. It is more or less a home-cooked method of using HTML/XHTML to achieve the purposes of specialized XML. (The concept of microformat is similar to the ad hoc line-based text file formats used by many programs. (⁖ unix config files))
I've been using microformat for my English Vocabulary project. For example, see the source code for this page: Vocabulary Study: Hyphenated Wonders. Effectively, i created a microformat for vocabulary citation. Namely, each entry is a word entry, with containers for usage example, cited source, and definition. Here's an example:
<div class="ent">
<p class="wd">‹a word›</p>
<div class="ex">
<div class="bdy">‹some example usage involving the word›</div>
<div class="src">‹source of the above example usage›</div>
</div>
<div class="def">‹a word› = ‹word's definition›</div>
</div>
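As a sketch of the machine parsing such a microformat enables, here's how one might extract the fields of an entry using Python's standard html.parser. (This is not my actual code; the sample word and text are invented for illustration.)

```python
# Minimal sketch: pull the fields out of a vocabulary "ent" entry
# by tracking the "class" attribute of the enclosing tag.
from html.parser import HTMLParser

SAMPLE = '''
<div class="ent">
<p class="wd">serendipity</p>
<div class="ex">
<div class="bdy">Found the book by pure serendipity, by chance.</div>
<div class="src">invented sample source</div>
</div>
<div class="def">serendipity = a happy accident</div>
</div>
'''

class EntryParser(HTMLParser):
    def __init__(self):
        super().__init__()
        self.stack = []   # class names of currently open tags
        self.fields = {}  # collected text, keyed by class name

    def handle_starttag(self, tag, attrs):
        self.stack.append(dict(attrs).get("class", ""))

    def handle_endtag(self, tag):
        if self.stack:
            self.stack.pop()

    def handle_data(self, data):
        # keep text only when directly inside a field we care about
        if self.stack and self.stack[-1] in ("wd", "bdy", "src", "def"):
            self.fields[self.stack[-1]] = \
                self.fields.get(self.stack[-1], "") + data.strip()

p = EntryParser()
p.feed(SAMPLE)
print(p.fields["wd"])   # the word
print(p.fields["def"])  # the definition line
```

The same walk generalizes to transforming entries, ⁖ emitting them as a plain word list or as XML.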
I've also been using microformat for annotation on literature of my World Literature Classics project. For example, see HTML source code of: What Desires Are Politically Important? (by Bertrand Russell). A loose microformat is basically structured HTML/XHTML. When done in a strict way, it effectively makes the page ready for the semantic web, and it can be machine parsed and transformed easily.
In the past year, i've been gradually cleaning up the 3500+ pages of my website towards more and more structured use of HTML, with the eventual goal that they can be validated by a lexical grammar validator (besides already being syntactically valid HTML/XHTML). This work has been done haphazardly in a gradual manner. Part of it is designing bits of microformat for my website's diverse projects. For example, you'll need various microformats for projects that are math expositions, programming tutorials, annotated literature, art/photo galleries, commentary/essays. Part of the job is converting snippets of existing pages to the newly formed microformat, using a combination of emacs/elisp, perl, and python, primarily based on regex. Part of it is ongoing coding of bits of a grammar validator in elisp. (in elisp so that i can interactively validate as i write new pages.) Part of this started me studying parsers, especially the promising Parsing Expression Grammar, for these purposes. See Pattern Matching vs Lexical Grammar Specification.
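To give a flavor of the regex-based conversion step, here is a toy Python sketch that rewrites an old ad hoc markup snippet into the "ent" vocabulary microformat. (The old format shown is invented for illustration; my real conversions are done in elisp/perl/python over many files.)

```python
# Hypothetical conversion: old "<p><b>word</b>: definition</p>" snippets
# rewritten into the "ent" microformat, using a regex substitution.
import re

old = '<p><b>penumbra</b>: the partially shaded outer region of a shadow.</p>'

new = re.sub(
    r'<p><b>(.+?)</b>: (.+?)</p>',
    r'<div class="ent">\n'
    r'<p class="wd">\1</p>\n'
    r'<div class="def">\1 = \2</div>\n'
    r'</div>',
    old)

print(new)
```

Regex works well for such local, line-oriented rewrites; it is exactly where nesting or context-sensitivity creeps in that a real grammar (⁖ a Parsing Expression Grammar) becomes attractive.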