URL Percent Encoding and Ampersand Char
Discovered a subtle issue with automating URL encoding. If a URL contains the ampersand “&” char, and the URL is to go inside an HTML doc as a link, then there is no automated procedure that always determines correctly whether the char should be encoded as %26 or &amp;.
If the char is part of the file name or path, then it should be encoded as %26, but if it is a separator for CGI parameters, then it should be encoded as &amp;.
The ampersand char is a reserved char in percent encoding. Therefore it must be percent encoded when it is used in normal file path names. When it is NOT part of a path name, but is used as a CGI parameter separator, with the GET method of an HTTP request, then it must be left as it is. Now, in HTML, the ampersand char must be encoded as the HTML entity &amp; when adjacent chars are not space (basically). So, in the link, the separator becomes &amp;.
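Here is a minimal Python sketch of the two encoding layers described above. The "/docs/" prefix, the file name, the parameters, and the function name build_href are all made up for illustration. Note that the code can get it right only because the file name and the parameters are handed in separately, so the role of each “&” is known in advance. That is exactly the information missing from a finished URL string.

```python
from html import escape
from urllib.parse import quote, urlencode


def build_href(file_name: str, params: dict) -> str:
    """Build an HTML-ready href from a file name and query parameters.

    Hypothetical helper: the "/docs/" prefix is an assumption for the example.
    """
    # Layer 1a: percent-encode the file name, so a literal "&" becomes %26.
    path = "/docs/" + quote(file_name, safe="")
    # Layer 1b: join the parameters with a bare "&" as the separator.
    query = urlencode(params)
    url = path + "?" + query
    # Layer 2: escape for an HTML attribute, so the separator "&" becomes "&amp;".
    return escape(url, quote=True)


print(build_href("Tom & Jerry.html", {"a": "1", "b": "2"}))
# /docs/Tom%20%26%20Jerry.html?a=1&amp;b=2
```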
Of course i knew the above, but my realization is that the purpose of the char in a URL cannot be syntactically determined with 100% accuracy.
This is interesting to me because i work in HTML using emacs, and i have written personal elisp code that automatically turns a URL into a link. The situation is that this lisp code cannot, in theory, do that with 100% accuracy.
This problem is easily solved in practice. Just look for the “?” char. Any “&” before the “?” should be “%26”, and after should be “&”. Of course, the ampersand after the question mark may be part of the parameter name and not a parameter separator, but in practice that never happens.
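The “?” heuristic above can be sketched in a few lines. This is Python, not the elisp code mentioned earlier, and the function name href_for_url is made up for the example. It assumes the input is a raw URL that has not been HTML-escaped yet (a URL that already contains &amp; would get escaped twice), and it treats every “&” after the first “?” as a separator, which is the “never happens in practice” assumption.

```python
def href_for_url(url: str) -> str:
    """Encode "&" for use in an HTML link, using the "?" heuristic.

    Assumes `url` is raw text, not already HTML-escaped.
    """
    # Split at the first "?"; with no "?" the whole string counts as path.
    path, question_mark, query = url.partition("?")
    # Before the "?": the ampersand is part of a path name, so use %26.
    # After the "?": the ampersand is taken as a separator, so use &amp;.
    return path.replace("&", "%26") + question_mark + query.replace("&", "&amp;")


print(href_for_url("http://example.com/a&b/c.html?x=1&y=2"))
# http://example.com/a%26b/c.html?x=1&amp;y=2
```

A URL with no “?” at all is treated as pure path, so every “&” in it becomes %26.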
Of course, in practice, all this matters shit. Just use & plainly and it works in all browsers.