URL Percent Encoding and Ampersand Char

By Xah Lee.

Discovered a subtle issue with automating URL encoding. If a URL contains the ampersand “&” char, and the URL is to be placed inside an HTML doc as a link, then there's no automated procedure that correctly determines whether the char should be encoded as %26 or &amp;.

If the char is used as part of the file name or path, then it should be encoded as %26, but if it is used as a separator for CGI parameters, then it should be encoded as &amp;.

The ampersand char is a reserved char in [ Percent encoding ] [ https://en.wikipedia.org/wiki/Percent-encoding ]. Therefore it must be percent encoded as %26 if it is used in normal file path names. When it is NOT part of a path name, but is used as a [ CGI ] [ https://en.wikipedia.org/wiki/Common_Gateway_Interface ] parameter separator with the GET method of an [ HTTP request ] [ https://en.wikipedia.org/wiki/HTTP ], then it must be left as it is in the URL. Now, in HTML, the ampersand char must be encoded as the HTML entity &amp; when the adjacent chars are not spaces (basically). So, in the HTML source, the separator becomes &amp;.
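To make the two roles concrete, here is a minimal Python sketch using only the standard library. The file name `cats&dogs.html`, the `/docs/` directory, and the parameters `a` and `b` are made up for illustration. It assumes you already know which ampersand is data and which is a separator, which is exactly the knowledge an automated tool lacks.

```python
from html import escape
from urllib.parse import quote, urlencode

# Hypothetical file name containing a literal ampersand (illustration only).
# safe="" makes quote() percent-encode every reserved char, including "&".
path = "/docs/" + quote("cats&dogs.html", safe="")

# Hypothetical CGI parameters. urlencode() joins them with a plain "&".
query = urlencode({"a": "1", "b": "2"})

# The URL as sent over HTTP: %26 in the path, plain & between parameters.
url = path + "?" + query

# The same URL as it would appear inside an HTML href attribute:
# the separator "&" becomes "&amp;", while "%26" is left alone.
href = escape(url, quote=True)

print(url)   # /docs/cats%26dogs.html?a=1&b=2
print(href)  # /docs/cats%26dogs.html?a=1&amp;b=2
```

Note that the two encodings happen at different layers: percent encoding belongs to the URL, the entity belongs to the HTML source.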

Of course i knew the above, but my realization is that the purpose of the char in a URL cannot be syntactically determined with 100% accuracy.
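A small Python sketch of the ambiguity, assuming a made-up query string `q=AT&T`. Nothing in the raw string says whether the author meant one parameter whose value is “AT&T” or two parameters separated by “&”. A parser can only pick one reading.

```python
from urllib.parse import parse_qs, urlencode

# Hypothetical raw query string typed by a human (illustration only).
raw = "q=AT&T"

# A parser has to treat "&" as a separator: two parameters, "q" and "T".
as_separator = parse_qs(raw, keep_blank_values=True)

# If the ampersand was meant as data, it had to be written as %26.
as_data = urlencode({"q": "AT&T"})
round_trip = parse_qs(as_data)

print(as_separator)  # {'q': ['AT'], 'T': ['']}
print(as_data)       # q=AT%26T
print(round_trip)    # {'q': ['AT&T']}
```

Both readings are syntactically valid, so only the intent of whoever wrote the URL settles which one is right.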

This is interesting to me because i work in HTML using emacs, and i have written personal elisp code that automatically turns a URL into a link. In theory, this lisp code cannot do that with 100% accuracy.

This problem is easily solved in practice. Just look for the “?” char. Any “&” before the “?” should be “%26”, and any “&” after it should be “&amp;”. Of course, an ampersand after the question mark may be part of a parameter name or value and not a parameter separator, but in practice that never happens.
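The “?” heuristic can be sketched in a few lines of Python. This is not my elisp code, just an illustration of the same rule. The function name `encode_ampersands_for_href` and the example.com URLs are made up, and the sketch assumes the input is a raw URL with plain “&” chars (not already containing %26 or &amp;).

```python
def encode_ampersands_for_href(url: str) -> str:
    """Encode "&" for use in an HTML href, using the "?" heuristic.

    Before the first "?": "&" is treated as path data and becomes "%26".
    After the first "?": "&" is treated as a separator and becomes "&amp;".
    Assumes the input has not been encoded already.
    """
    # partition() splits at the first "?"; if there is none,
    # sep and query are empty strings and the whole URL is the path.
    path, sep, query = url.partition("?")
    return path.replace("&", "%26") + sep + query.replace("&", "&amp;")


# Hypothetical URLs for illustration.
with_query = encode_ampersands_for_href("http://example.com/a&b/c.html?x=1&y=2")
no_query = encode_ampersands_for_href("http://example.com/a&b.html")

print(with_query)  # http://example.com/a%26b/c.html?x=1&amp;y=2
print(no_query)    # http://example.com/a%26b.html
```

The heuristic guesses wrong for an ampersand that is data inside a parameter, since it cannot tell that case apart from a separator.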

Of course, in practice, all this matters shit. Just use & plainly and it works in all browsers.
