wrote an html/xml syntax validator (checking well-formedness) in emacs lisp.
ran it over the 11k files on my site.
about 300 of my own files have invalid xml (typically a missing closing p or li). That is, invalid strictly by xml rules, not html rules.
HTML5 is much looser: it can have missing closing tags, and it tolerates bare less-than and greater-than chars as long as they have spaces around them.
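
to make the xml vs html difference concrete, here is a minimal sketch. It is not the emacs lisp validator itself: it is Python, it uses the standard library's xml parser as a stand-in, and the function name `is_well_formed` and the sample strings are made up for illustration. Each sample that HTML5 would let through (missing closing p, missing closing li, bare less-than with spaces) is rejected under strict xml rules.

```python
import xml.etree.ElementTree as ET


def is_well_formed(text):
    """Return (True, None) if text parses as XML, else (False, parser message).

    Hypothetical helper for illustration; it leans on Python's stdlib parser,
    not on the emacs lisp validator described in the post.
    """
    try:
        ET.fromstring(text)
        return True, None
    except ET.ParseError as err:
        return False, str(err)


# Made-up snippets: the first is strict xml, the rest are the kinds of
# things html5 is lenient about but xml is not.
samples = {
    "all tags closed": "<div><p>one</p><p>two</p></div>",
    "missing closing p": "<div><p>one<p>two</div>",
    "missing closing li": "<ul><li>a<li>b</ul>",
    "bare less-than": "<p>3 < 4</p>",
}

for name, text in samples.items():
    ok, err = is_well_formed(text)
    print(name, "->", "ok" if ok else "invalid: " + err)
```

only the first sample passes. Escaping the less-than as `&lt;` makes the last one pass too.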
also, that count does not include the programming language docs and specs mirrored on my site. They light up like a xmas tree.
The invalid ones among the mirrored docs include the emacs manual, elisp manual, css spec, java doc, golang doc. basically, all programming languages' docs and specs are invalid html.