Emacs Lisp: Command to Extract URL
Here's a command that extracts all URLs (the values of href and src attributes) from the current block or selection of an HTML file.
Put this in your Emacs init file:
(defun xah-html-extract-url (Begin End &optional FullPathQ)
  "Extract URLs in current block or selection to `kill-ring'.

When called interactively, copy result to `kill-ring', each URL in a line.
If the URL is a local file relative path, convert it to full path.

If `universal-argument' is called first, don't convert relative URL to full path.

This command extracts all text of the forms

<‹letter› … href=‹path› …>
<‹letter› … src=‹path› …>

The quote for ‹path› may be double or single quote.

When called in lisp code, Begin End are region begin/end positions.
Optional FullPathQ, if true, convert local links to full path.

Returns a list of strings.

URL `http://xahlee.info/emacs/emacs/elisp_extract_url_command.html'
Version: 2021-02-20 2021-03-22 2021-04-30 2021-08-11 2022-09-30"
  (interactive
   (let ($p1 $p2)
     (let (($bds (xah-get-bounds-of-thing-or-region 'block)))
       (setq $p1 (car $bds) $p2 (cdr $bds)))
     (list $p1 $p2 (not current-prefix-arg))))
  (let (($regionText (buffer-substring-no-properties Begin End))
        ($urlList (list)))
    (with-temp-buffer
      (insert $regionText)
      ;; put a newline before every "<", so each tag starts its own line
      (goto-char (point-min))
      (while (search-forward "<" nil t)
        (replace-match "\n<" t t))
      ;; collect the quoted value of every href= or src= attribute
      (goto-char (point-min))
      (while (re-search-forward
              "<[A-Za-z]+.+?\\(href\\|src\\)[[:blank:]]*?=[[:blank:]]*?\\([\"']\\)\\([^\"']+?\\)\\2" nil t)
        (push (match-string-no-properties 3) $urlList)))
    (setq $urlList (reverse $urlList))
    ;; expand relative local paths against the current file's directory
    (when FullPathQ
      (setq $urlList
            (mapcar
             (lambda ($x)
               (if (string-match "^http:\\|^https:" $x)
                   $x
                 (expand-file-name
                  $x
                  (file-name-directory
                   (if (buffer-file-name) (buffer-file-name) default-directory)))))
             $urlList)))
    (when (called-interactively-p 'any)
      (let (($printedResult (mapconcat 'identity $urlList "\n")))
        (kill-new $printedResult)
        (message "%s" $printedResult)))
    $urlList))
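The core of the command is the regex loop inside `with-temp-buffer`. Here is a minimal sketch that isolates that loop so you can try it on a string without selecting a region. The name `my-html-urls-in-string` is hypothetical and is not part of Xah HTML Mode. The sketch also skips the relative-to-full-path conversion.

```elisp
;; Hypothetical helper, not part of Xah HTML Mode: applies the same
;; regex as `xah-html-extract-url' to a string instead of a region,
;; and does not convert relative paths to full paths.
(defun my-html-urls-in-string (html)
  "Return a list of href/src attribute values found in string HTML, in order."
  (let ((urls nil))
    (with-temp-buffer
      (insert html)
      ;; Put a newline before every "<", as the original command does.
      ;; Since "." does not match a newline, the ".+?" in the regex
      ;; below cannot run past the line its tag starts on.
      (goto-char (point-min))
      (while (search-forward "<" nil t)
        (replace-match "\n<" t t))
      ;; Group 2 is the opening quote, group 3 the attribute value;
      ;; the backreference \\2 requires the same closing quote.
      (goto-char (point-min))
      (while (re-search-forward
              "<[A-Za-z]+.+?\\(href\\|src\\)[[:blank:]]*?=[[:blank:]]*?\\([\"']\\)\\([^\"']+?\\)\\2"
              nil t)
        (push (match-string-no-properties 3) urls)))
    ;; `push' collects in reverse, so restore document order.
    (nreverse urls)))
```

For example, `(my-html-urls-in-string "<a href=\"x.html\">x</a>")` returns `("x.html")`.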
You need Emacs: xah-get-thing.el (it provides xah-get-bounds-of-thing-or-region).
This command is part of Emacs: Xah HTML Mode.