URL Percent Encoding and Ampersand Char
Discovered a subtle issue with automating URL encoding. In a URL, if you have the ampersand “&” char, and if this URL is to be inside an HTML doc as a link, then there is no automated procedure that determines correctly whether the char should be encoded as %26 or &amp;.
If the char is used as part of the file name or path, then it should be encoded as %26, but if it is used as a separator for CGI parameters, then it should be encoded as &amp;.
The ampersand char is a reserved char in
[ Percent encoding ] [ https://en.wikipedia.org/wiki/Percent-encoding ].
Therefore it must be percent encoded if it is used in normal file path names. When it is NOT part of a path name, but is used as a
[ CGI ] [ https://en.wikipedia.org/wiki/Common_Gateway_Interface ] parameter separator, with the GET method of
[ HTTP request ] [ https://en.wikipedia.org/wiki/HTTP ], then it must be left as it is. Now, in HTML, the ampersand char must be encoded as the HTML entity &amp; when adjacent chars are not space (basically). So, the separator becomes &amp; in the HTML source.
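The two layers of encoding can be seen in a small Python sketch. The host, path, and parameter names here are made up for illustration; the only library calls used are `urllib.parse.quote`, `urllib.parse.urlencode`, and `html.escape` from the standard library.

```python
from html import escape
from urllib.parse import quote, urlencode

# Hypothetical path and parameters, chosen only for illustration.
path = "/docs/R&D notes.html"
params = {"page": "2", "lang": "en"}

# An ampersand inside the path is data, so it gets percent encoded.
encoded_path = quote(path)  # "/" is kept as is by default

# An ampersand between parameters is a separator, so urlencode leaves it literal.
query = urlencode(params)

url = "http://example.com" + encoded_path + "?" + query

# For use inside an HTML attribute, the remaining literal "&" becomes "&amp;".
href = escape(url, quote=True)

print(url)   # http://example.com/docs/R%26D%20notes.html?page=2&lang=en
print(href)  # http://example.com/docs/R%26D%20notes.html?page=2&amp;lang=en
```

Note that this works only because the code already knows which part is the path and which part is the parameters. Given just the finished URL string, that knowledge is gone.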
Of course i knew the above, but my realization is that the purpose of an ampersand in a URL cannot be syntactically determined with 100% accuracy.
This is interesting to me because i work in HTML using emacs, and i have written personal elisp code that automatically turns a URL into a link. In theory, that lisp code cannot do so with 100% accuracy.
This problem is easily solved in practice. Just look for the “?” char. Any “&” before the “?” should be “%26”, and any after it should be “&amp;”. Of course, an ampersand after the question mark may be part of a parameter name and not a parameter separator, but in practice that never happens.
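Here is a minimal sketch of that heuristic in Python. It is not my elisp code; the function name `url_to_href` is made up for this example, and it assumes the input is a raw URL that has not been HTML-escaped already.

```python
from html import escape


def url_to_href(raw_url: str) -> str:
    """Heuristic: "&" before the first "?" is data, after it is a separator."""
    # Split at the first "?"; if there is none, everything lands in `before`.
    before, question_mark, after = raw_url.partition("?")
    # Ampersands in the path part are data, so percent encode them.
    before = before.replace("&", "%26")
    # Ampersands that remain are treated as separators and become "&amp;".
    return escape(before + question_mark + after, quote=True)


print(url_to_href("http://example.com/a&b/c.html?x=1&y=2"))
# http://example.com/a%26b/c.html?x=1&amp;y=2
```

The weak spot is the same one named above: for a URL such as `http://example.com/s?q=a&b`, the code has no way to know whether `b` is a second parameter or the tail of the value of `q`, and it always assumes a separator.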
Of course, in practice, all this matters shit. Just use & plainly and it works in all browsers.