ELisp: Problems of thing-at-point

By Xah Lee.

This page discusses some problems with the function thing-at-point.

For tutorial, see: ELisp: thing-at-point

thing-at-point Behavior Depends on Syntax Table

When you call (thing-at-point 'word), what string you get exactly depends on the syntax table of the current buffer.

For example, if you always want your “symbol” to mean any alphanumeric plus hyphen -, you can't rely on thing-at-point to give you the right thing, because it may include low line _, or may not include hyphen, or may include apostrophe ', depending on the current major mode's syntax table.

You might think that depending on the syntax table is great, because it provides an abstract layer that lets each language define its own syntactic units. But in practice, computer languages, and other modes such as dired or irc, do not have concepts that fit neatly into {“symbol”, “word”}.

This problem also applies to the other “things”, such as 'sentence, 'paragraph, and all the rest.

Here's a test.

(defun f ()
  "print current symbol."
  (interactive)
  (message "%s" (thing-at-point 'symbol)))

Eval the code. [see Evaluate Emacs Lisp Code]

Then, put the following in a buffer:

aa_bb-cc

Put your cursor between the two “b”, then call “f”. Try it in buffers with different major modes, and compare the results.

This is so as of 2021-03-06 GNU Emacs 27.1.
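The same effect can be reproduced without switching major modes. The sketch below runs thing-at-point on the same text and cursor position under two syntax tables: the standard one (used by fundamental-mode), and a copy in which hyphen is punctuation, as it is in many programming-language modes. The names my-symbol-with-table and my-hyphen-punct-table are hypothetical; the expected strings assume the standard syntax table treats both _ and - as symbol constituents.

```elisp
;; Sketch: same text, same cursor position, two syntax tables.
(defun my-symbol-with-table (text pos table)
  "Return (thing-at-point 'symbol) for TEXT with point at POS under TABLE.
Hypothetical helper, for illustration only."
  (with-temp-buffer
    (insert text)
    (goto-char pos)
    (with-syntax-table table
      (thing-at-point 'symbol t))))

(defvar my-hyphen-punct-table
  (let ((tbl (make-syntax-table)))      ; parent is the standard syntax table
    (modify-syntax-entry ?- "." tbl)    ; make hyphen a punctuation char
    tbl)
  "Hypothetical syntax table in which hyphen is punctuation.")

;; Position 5 is between the two "b" in "aa_bb-cc".
(message "%S" (my-symbol-with-table "aa_bb-cc" 5 (standard-syntax-table)))
;; → "aa_bb-cc"
(message "%S" (my-symbol-with-table "aa_bb-cc" 5 my-hyphen-punct-table))
;; → "aa_bb"
```

Only the syntax table changed between the two calls, yet the “symbol” is different.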

Return nil Problem

When you ask for a symbol, thing-at-point (and bounds-of-thing-at-point) return nil if the cursor is on a blank spot or an empty line.

This is a problem because it means you have to check the return value every time.

It would be better if it just returned the current cursor position as both the beginning and the end.
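A minimal sketch of that idea, assuming an empty string is an acceptable answer for “nothing here”: wrap bounds-of-thing-at-point so that it falls back to (point . point). The names my-bounds-or-point and my-thing-or-empty are hypothetical, not part of Emacs.

```elisp
(defun my-bounds-or-point (thing)
  "Return the bounds of THING at point as a cons (BEG . END).
When there is no THING at point, return (POINT . POINT) instead of nil.
Hypothetical helper, not part of Emacs."
  (or (bounds-of-thing-at-point thing)
      (cons (point) (point))))

(defun my-thing-or-empty (thing)
  "Return THING at point as a plain string, or \"\" when there is none."
  (let ((bds (my-bounds-or-point thing)))
    (buffer-substring-no-properties (car bds) (cdr bds))))
```

The caller can then treat the result as a string unconditionally; checking for “nothing” becomes (equal s "") instead of a nil test.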

Return a Cons Pair Problem

bounds-of-thing-at-point returns a cons pair (start . end). Because a dotted pair is not a proper list, this makes it hard to use seq-setq and seq-let.
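Two workarounds, sketched under the assumption that you want the symbol at point as a plain string: convert the pair to a proper list before handing it to seq-let, or skip seq-let and match the dotted pair directly with a backquote pcase pattern, which simply does not match (giving nil) when there is no symbol. The names my-bounds-list, my-symbol-at-point-seq, and my-symbol-at-point-pcase are hypothetical.

```elisp
(require 'seq)

(defun my-bounds-list (thing)
  "Return the bounds of THING at point as a list (BEG END), or nil.
Hypothetical helper that converts the cons pair from
`bounds-of-thing-at-point' into a proper list."
  (let ((bds (bounds-of-thing-at-point thing)))
    (when bds
      (list (car bds) (cdr bds)))))

(defun my-symbol-at-point-seq ()
  "Return the symbol at point as a plain string, or nil, using `seq-let'."
  (seq-let (beg end) (my-bounds-list 'symbol)
    (when beg
      (buffer-substring-no-properties beg end))))

(defun my-symbol-at-point-pcase ()
  "Same result, matching the cons pair directly with a `pcase' pattern."
  (pcase (bounds-of-thing-at-point 'symbol)
    (`(,beg . ,end) (buffer-substring-no-properties beg end))))
```

The pcase version avoids the conversion step entirely, at the cost of a less familiar syntax.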

Inconsistent Behavior for line

When you call (thing-at-point 'line), it returns the line including the newline character. However, if the line is the last line of the buffer and has no newline, then no newline is included.

This means you have to write extra code to check for the newline char.

This is so from 2021-03-06 GNU Emacs 27.1 to at least 2024-03-24 Emacs 29.
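The extra code can be a small wrapper that removes a trailing newline, if there is one. A sketch, assuming you always want the line without its newline; my-line-at-point is a hypothetical name.

```elisp
(require 'subr-x) ; for `string-remove-suffix'

(defun my-line-at-point ()
  "Return the current line as a plain string, never ending in a newline.
Hypothetical wrapper around (thing-at-point 'line)."
  (let ((line (thing-at-point 'line t)))
    (when line
      (string-remove-suffix "\n" line))))
```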

Here's a test.

(defun mytest ()
  "print current line."
  (interactive)
  (message "[%s]" (thing-at-point 'line)))

Then, put the following in a buffer:

this line
last line

Make sure there is no newline char at the last line.

Then call “mytest” on each line.
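The same check can be run non-interactively, for example under emacs --batch. This sketch puts the two-line text into a temporary buffer without a final newline and calls thing-at-point 'line on each line; my-line-at is a hypothetical helper.

```elisp
(defun my-line-at (text n)
  "Return (thing-at-point 'line) on line N (1-based) of TEXT.
Hypothetical helper for illustration."
  (with-temp-buffer
    (insert text)
    (goto-char (point-min))
    (forward-line (1- n))
    (thing-at-point 'line t)))

;; "last line" has no newline after it.
(message "%S" (my-line-at "this line\nlast line" 1)) ; → "this line\n"
(message "%S" (my-line-at "this line\nlast line" 2)) ; → "last line"
```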

If you always want the line without the newline character, it is better to use:

(buffer-substring-no-properties
 (line-beginning-position)
 (line-end-position))

[see ELisp: Functions on Line]

thing-at-point 'filename problem

If the file name contains non-ASCII characters, it returns the wrong name.

Try it on a line containing this text: "some→thing.txt". Non-ASCII file names happen, for example, when you download some Wikipedia images: the file name may contain an en-dash or é.

(defun xxtest ()
  "print `thing-at-point' filename"
  (interactive)
  (message "[%s]" (thing-at-point 'filename)))
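One way around this, and not what thing-at-point does, is to skip outward from the cursor until a delimiter and take whatever lies between. The sketch below assumes that whitespace, quotes, and angle brackets are the delimiters you care about; adjust the character set for your text. my-path-at-point is a hypothetical name.

```elisp
(defun my-path-at-point ()
  "Return the text around point up to whitespace, quotes, or angle brackets.
Hypothetical helper: unlike (thing-at-point 'filename), every other
character, including non-ASCII ones, is kept."
  (let ((not-delim "^ \t\n\"'<>"))     ; skip-chars set: anything except these
    (save-excursion
      (let* ((beg (progn (skip-chars-backward not-delim) (point)))
             (end (progn (skip-chars-forward not-delim) (point))))
        (buffer-substring-no-properties beg end)))))
```

With the cursor anywhere inside the file name on a line such as href="some→thing.txt", it returns some→thing.txt in full.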

thing-at-point 'url problem (fixed as of emacs 24.4)

What thing-at-point returns is not necessarily the exact text under the cursor. When the URL you want to grab does not start with “http”, it adds “http://” in front.

This is FIXED as of Emacs 24.4.1.

For example, if the text under cursor is

example_org/emacs/elisp.html

it'll return

http://example_org/emacs/elisp.html

This is very annoying.

Sometimes i just want to grab a sequence of chars that may be a file path or a URL, in HTML text such as href="my_cat.html" or href="http://example/my_cat.html". You do not know which in advance, but after you get the thing you can test it by checking for “http” or other things. But if you use thing-at-point with 'filename or 'url, it changes the string in ways you didn't expect.
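Once you have the raw string, the test described above takes one regexp. A sketch, assuming that anything starting with http:// or https:// counts as a URL and everything else as a file path; my-href-kind is a hypothetical name.

```elisp
(defun my-href-kind (str)
  "Return `url' if STR starts with http:// or https://, else `path'.
Hypothetical classifier for text grabbed verbatim from the buffer."
  (if (string-match-p "\\`https?://" str)
      'url
    'path))

(my-href-kind "my_cat.html")                ; → path
(my-href-kind "http://example/my_cat.html") ; → url
```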

(thing-at-point 'url) gets confused if the URL contains parentheses, e.g. http://en.wikipedia.org/wiki/Oz_(programming_language). (This is fixed in emacs 23.2. [see Emacs 23.2 (Released 2010-05)] )

Here's test code.

(defun f ()
  "print `thing-at-point' url"
  (interactive)
  (message "[%s]" (thing-at-point 'url)))

Get Text Selection or Unit at Current Cursor Position

Starting with emacs 23.x, text selection is highlighted by default. (this means: transient-mark-mode is on by default. [see ELisp: Mark, Region, Active Region]) There's a new user interface idiom. When there is a text selection, the command will act on the text selection. Otherwise, the command acts on the current word, line, paragraph, buffer, etc, whichever is appropriate for the command. This is great because users don't have to think about whether to call the “-region” version of the command. [see Emacs 23 (Released 2009-07)]

When you write a command to do this, the code typically looks like this:

;; get current selection or word
(let (bds p1 p2 inputStr resultStr)

  ;; get boundary
  (if (use-region-p)
      (setq bds (cons (region-beginning) (region-end)))
    (setq bds (bounds-of-thing-at-point 'word)))

  ;; bds is nil when there is no word at cursor
  (when bds
    (setq p1 (car bds))
    (setq p2 (cdr bds))

    ;; grab the string
    (setq inputStr (buffer-substring-no-properties p1 p2))

    ;; do something with inputStr here (upcase is a placeholder)
    (setq resultStr (upcase inputStr))

    (delete-region p1 p2) ; delete the region
    (insert resultStr)))  ; insert new string

It takes about 6 lines to get the boundary and the string. If you are grabbing a line, you need a few more lines to check for the end-of-line character.
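The same pattern can be packaged as a complete command. This is a sketch with hypothetical names (my-region-or-word-bounds, my-upcase-region-or-word), using upcase as a stand-in for the real transformation. It returns the bounds as a list and does nothing when there is neither a selection nor a word at the cursor.

```elisp
(defun my-region-or-word-bounds ()
  "Return (BEG END) of the active region, else of the word at point, else nil.
Hypothetical helper for illustration."
  (if (use-region-p)
      (list (region-beginning) (region-end))
    (let ((bds (bounds-of-thing-at-point 'word)))
      (when bds
        (list (car bds) (cdr bds))))))

(defun my-upcase-region-or-word ()
  "Upcase the active region, or the word at point if there is no region.
Do nothing when there is neither."
  (interactive)
  (let ((bds (my-region-or-word-bounds)))
    (when bds
      (let* ((p1 (nth 0 bds))
             (p2 (nth 1 bds))
             (input (buffer-substring-no-properties p1 p2)))
        (save-excursion
          (delete-region p1 p2)
          (goto-char p1)
          (insert (upcase input)))))))
```

Splitting out the bounds function keeps the “selection or word” decision in one place, so other commands can reuse it with a different transformation.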

Alternative Solution: xah-get-thing.el

Because i need to grab the text so often, i got tired of repeatedly writing these 10 or so lines. I wrote a function that does this. See: Emacs: xah-get-thing.el.
