Elisp: Create Sitemap
This page shows how to use emacs lisp to create a sitemap.
Problem
Write a elisp script to generate a sitemap. That is: create a file of sitemap format that lists all files in a directory.
A sitemap is a XML file that lists URLs of all files in a website for web crawlers to crawl.
A sitemap file looks like this:
<?xml version="1.0" encoding="UTF-8"?> <urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9"> <url> <loc>http://www.example.com/</loc> <lastmod>2005-01-01</lastmod> <changefreq>monthly</changefreq> <priority>0.8</priority> </url> </urlset>
- The file can have many
<url>…</url>
item. - Each
<url>
container represent a file and other info. - The
<loc>
is a URL of the file. - The
<lastmod>
,<changefreq>
,<priority>
are optional. - A sitemap file can list a max of 50k URLs.
The purpose of sitemap file is for web crawlers to easily know all files that exist on your site.
Solution
The general plan is very simple. Here's one way to do it.
- Create a new file, insert XML header tags.
- Traverse the web root dir. For each file, determine whether it should be listed in the sitemap.
- If so, generate the proper URL tag and insert it into the new file.
- When done visiting files, insert the XML footer tags. Save the file.
;; -*- coding: utf-8; lexical-binding: t; -*- ;; version: 2023-08-25 ;; http://xahlee.info/emacs/emacs/make_sitemap.html (require 'seq) (setq xah-web-root-path "~/web/" ) ;; require ;; xahsite-external-docs (defun xahsite-generate-sitemap (DomainName) "Generate a sitemap.xml.gz file of xahsite at doc root. DomainName must match a existing one. Created: 2018-09-17 Version: 2023-08-25" (interactive (list (completing-read "choose:" '( "xahlee.info" "xahlee.org" ) nil t))) (let ( xpageMovedQ (xsitemapFileName "sitemap.xml" ) (xwebsiteDocRootPath (concat xah-web-root-path (replace-regexp-in-string "\\." "_" DomainName t t) "/"))) ;; (print (concat "begin: " (format-time-string "%Y-%m-%dT%T"))) (let ( (xfilePath (concat xwebsiteDocRootPath xsitemapFileName )) (xsitemapBuffer (generate-new-buffer "sitemapbuff"))) (with-current-buffer xsitemapBuffer (set-buffer-file-coding-system 'unix) (insert "<?xml version=\"1.0\" encoding=\"UTF-8\"?> <urlset xmlns=\"http://www.sitemaps.org/schemas/sitemap/0.9\"> ")) D (mapc (lambda (xf) (setq xpageMovedQ nil) (when (not (or (string-match "/xx" xf) ; ; dir/file starting with xx are not public (string-match "403error.html" xf) (string-match "404error.html" xf))) (with-temp-buffer (insert-file-contents xf nil 0 100) (when (search-forward "page_moved_64598" nil t) (setq xpageMovedQ t))) (when (not xpageMovedQ) (with-current-buffer xsitemapBuffer (insert "<url><loc>" "http://" DomainName "/" (substring xf (length xwebsiteDocRootPath)) "</loc></url>\n" ))))) (seq-filter (lambda (path) (not (seq-some (lambda (x) (string-match x path)) xahsite-external-docs ))) (directory-files-recursively xwebsiteDocRootPath "\\.html$" ))) (with-current-buffer xsitemapBuffer (insert "</urlset>\n") (write-region (point-min) (point-max) xfilePath nil 3) (kill-buffer )) (find-file xfilePath)) ;; (print (concat "done: " (format-time-string "%Y-%m-%dT%T"))) ))
On a site with 7091 html files (10 times more if counting image files etc), the script takes 3 seconds to run. (timing based on running it a second time, thus not counting disk reading time. )