Programing Language: Self-Reference Problem in String Syntax

By Xah Lee.

self-reference problem in programing

There are many self-reference issues in programing, and in math.

Here in particular, we are thinking about designing a string syntax that needs to contain the text of the source code itself, or about using a regex to search for a regex.

string delimiter self ref problem

One thing I noticed in the past 10 years of coding is the self-reference problem. E.g. suppose you design a language's string delimiter. Say, Python's triple quote:

"""big text here"""

The triple quote is meant to make it unlikely that the delimiter appears in the text.

Good solution, until you start to process Python source code itself, which contains triple quotes.
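A minimal sketch of the failure, in Python. The names `inner_source`, `naive_wrapper`, and `compiles` are made up for illustration. It wraps a tiny Python program, one that itself uses a triple quote, inside another triple-quoted literal, and checks whether the result still parses.

```python
# A small Python program whose own text contains a triple quote.
inner_source = 'doc = """big text here"""\n'

# Naive attempt: put that program, unchanged, inside another
# triple-quoted string literal.
naive_wrapper = 'text = """' + inner_source + '"""\n'


def compiles(source: str) -> bool:
    """Return True if `source` parses as Python source code."""
    try:
        compile(source, "<example>", "exec")
        return True
    except SyntaxError:
        return False


print(compiles(inner_source))   # the inner program on its own
print(compiles(naive_wrapper))  # the same program wrapped in triple quotes
```

The inner triple quote ends the outer string early, so the wrapped version is no longer valid Python. Switching the outer delimiter to `'''` only moves the problem: the text may contain that too.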

See also: Programing Language Design: String Syntax

using regex to search regex problem

A similar problem is when you use a regex to match a string that is itself a regex. The escaping becomes a problem that is practically unsolvable. I think it is still mathematically possible in general, but not in practice, because every layer the pattern passes through adds its own escaping rules. E.g. emacs in Cygwin trying to pass a regex that matches a file path regex to a Windows .exe
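A small Python illustration of one escaping layer, using the standard `re` module. The pattern `\.\d+` and the sample sentence are made up for the example.

```python
import re

# The regex we want to find as text (it matches a dot followed by digits).
target_regex = r"\.\d+"

# A sentence that talks about that regex.
text = r"Use the pattern \.\d+ to match a fraction part."

# Used as a pattern, the regex looks for ".digits", not for its own text.
print(re.search(target_regex, text))

# To match the regex as text, every meta character must be escaped again.
escaped = re.escape(target_regex)
print(escaped)
print(re.search(escaped, text).group(0))

# Each further layer (a shell, another language's string literal)
# would need yet another round of escaping on top of this one.
print(len(target_regex), len(escaped), len(re.escape(escaped)))
```

This is only one layer. A real pipeline such as editor to shell to external program stacks several, each with different rules.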

ampersand encoding and html entities problem

Other problems are HTML entity encoding/decoding and URL percent encoding/decoding, and especially the two together, e.g. an ampersand in a URL. The HTML ampersand is especially nasty. It has the property that it's mathematically impossible to tell, from the string alone, if a string is in decoded or encoded state.
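A short Python sketch of the two layers stacking, using the standard `urllib.parse` and `html` modules. The URL `https://example.com/search` and the parameters are hypothetical.

```python
import html
from urllib.parse import urlencode

# Hypothetical query parameters; one value contains an ampersand.
params = {"q": "AT&T", "lang": "en"}

# Layer 1: percent encoding. The ampersand inside the value becomes %26,
# while the ampersand between parameters stays a separator.
query = urlencode(params)
url = "https://example.com/search?" + query

# Layer 2: HTML entity encoding, needed to put the URL in an href attribute.
href = html.escape(url)

print(query)
print(href)
```

The same character now has three spellings (`&`, `%26`, `&amp;`), and which one is right depends on which layer you are in.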

The impossibility shows up when your string is HTML text that discusses ampersand encoding/decoding, e.g. a tutorial on it. The same characters can be either the encoded form of one text or the decoded form of another, so you need general AI, or a human, to tell.
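The ambiguity can be shown in a few lines of Python with the standard `html` module. The two sample sentences are made up for the example.

```python
import html

# Case A: already-decoded text from a tutorial that teaches the entity.
tutorial_text = "Write &amp; to show an ampersand."

# Case B: the encoded form of an ordinary sentence.
encoded_sentence = html.escape("Write & to show an ampersand.")

# The two strings are identical, so nothing in the string itself
# says which state it is in.
print(tutorial_text == encoded_sentence)

# Guessing wrong either destroys the tutorial or double-encodes the sentence.
print(html.unescape(tutorial_text))
print(html.escape(encoded_sentence))
```

Only outside knowledge of where the string came from settles whether to decode, encode, or leave it alone.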

See also: Google Code Prettify and Ampersand Encoding

solution

The solution for the string delimiter is the here-string.

The solution for escape mechanisms, HTML entities, and ampersand encoding is to simply not allow them.

The solution to regex regexing regex is to not allow meta characters in regex.

All these are related. Basically, you want to ban any escape mechanism, and use a random-string-like bracketing syntax.
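A rough Python sketch of what random-string bracketing could look like. The `<<tag ... tag>>` format and the names `wrap` and `unwrap` are invented for illustration, not an existing syntax. The idea is that the delimiter is chosen after seeing the text, so it never collides and nothing inside needs escaping.

```python
import secrets


def wrap(text: str) -> str:
    """Bracket `text` with a random tag that does not occur in it."""
    while True:
        tag = secrets.token_hex(8)
        if tag not in text:
            break
    # Hypothetical format: opening line names the tag, closing line repeats it.
    return f"<<{tag}\n{text}\n{tag}>>"


def unwrap(wrapped: str) -> str:
    """Recover the original text, reading the tag from the first line."""
    header, rest = wrapped.split("\n", 1)
    tag = header[2:]
    closing = f"\n{tag}>>"
    if not rest.endswith(closing):
        raise ValueError("closing tag does not match opening tag")
    return rest[: -len(closing)]


# Text full of things that normally need escaping passes through untouched,
# and wrapped text can itself be wrapped again.
sample = 'x = """big text here"""\n&amp; \\.\\d+'
print(unwrap(wrap(sample)) == sample)
print(unwrap(unwrap(wrap(wrap(sample)))) == sample)
```

Nesting works because each outer layer picks a tag that is absent from everything inside it, including the inner tags.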