Xah Talk Show 2021-11-25 Coding Session. Fix Link in HTML Files, WolframLang

for each html file in a directory, find all links that use a fully qualified url of http://xahlee.info/ and change them to relative links.

for example:

change this:

<a href="http://xahlee.info/M/WolframLang_syntax.html" data-accessed="2021-11-12">http://xahlee.info/M/WolframLang_syntax.html</a>

to:

<a href="../../M/WolframLang_syntax.html">WolframLang Syntax/Operators</a>

coding notes

rootDir = "c:/Users/xah/web/xahlee_info/emacs/emacs/";

files = FileNames[ "*.html", rootDir , Infinity ] ;

domainUrlToFilePath::usage= "domainUrlToFilePath[url] returns the full local file path that corresponds to url.";

domainUrlToFilePath = Function[ {url}, StringReplace[url, "http://xahlee.info/" -> "c:/Users/xah/web/xahlee_info/" ] ];
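
a minimal sketch of a stricter variant of the mapping above, assuming the domain prefix should be replaced only when it sits at the very start of the url. the name urlToPathSketch is made up for this note and is not part of the session code.

```wolfram
(* sketch: map a http://xahlee.info/ url to a local full path. *)
(* StartOfString anchors the match, so the prefix is replaced only at the beginning. *)
urlToPathSketch = Function[ {url},
StringReplace[ url, StartOfString ~~ "http://xahlee.info/" -> "c:/Users/xah/web/xahlee_info/" ] ];

urlToPathSketch[ "http://xahlee.info/M/WolframLang_syntax.html" ]
(* "c:/Users/xah/web/xahlee_info/M/WolframLang_syntax.html" *)
```

a url that does not start with the prefix comes back unchanged, which makes it easy to skip links that point to other sites.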

getRelativePath::usage= "getRelativePath[inputPath, dirPath] returns the path of the file inputPath relative to the directory dirPath.";

getRelativePath = Function[
{inputPath, dirPath},
doSomething
 ]

(*

need the longest common prefix: like LongestCommonSequence, but the match must start from the beginning of both paths

• inputPath = "c:/Users/xah/web/xahlee_info/M/WolframLang_syntax.html"
• dirPath = "c:/Users/xah/web/xahlee_info/emacs/emacs/"
• result = "../../M/WolframLang_syntax.html"

• algorithm notes:
• inputPathLevel = count slashes
• dirPathLevel = count slashes
• levelDiff = inputPathLevel - dirPathLevel

• commonRoot = find the string that is the common beginning of inputPath and dirPath
• levelRoot = count the slashes in commonRoot
• levelDirPath = count the slashes in dirPath
• levelInputPath = count the slashes in inputPath

• if inputPath is somewhere under dirPath, then result is inputPath with the dirPath prefix removed
• if inputPath is directly in dirPath, then result is the file name of inputPath
• if dirPath is not a parent of inputPath, then add one ../ for each level of dirPath below commonRoot, then add the rest of inputPath after commonRoot
*)
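
the algorithm notes above can be sketched as follows. this is one possible approach, not the finished getRelativePath: it compares path components instead of counting slashes, and it assumes both arguments are full paths that use "/" as separator, with dirPath naming a directory. the name relativePathSketch is made up for this note.

```wolfram
(* sketch: path of file inputPath relative to directory dirPath. *)
relativePathSketch = Function[ {inputPath, dirPath},
Module[ {inParts, dirParts, n = 0},
inParts = StringSplit[ inputPath, "/" ];
dirParts = StringSplit[ dirPath, "/" ];
(* n = number of leading components shared by both paths. *)
(* the file name, the last element of inParts, is never counted as shared. *)
While[ n < Min[ Length[inParts] - 1, Length[dirParts] ] && inParts[[n + 1]] === dirParts[[n + 1]], n++ ];
(* one "../" for each dirPath level below the common root, then the rest of inputPath. *)
StringJoin[ Table[ "../", {Length[dirParts] - n} ], StringRiffle[ Drop[inParts, n], "/" ] ] ] ];

relativePathSketch[ "c:/Users/xah/web/xahlee_info/M/WolframLang_syntax.html", "c:/Users/xah/web/xahlee_info/emacs/emacs/" ]
(* "../../M/WolframLang_syntax.html" *)
```

when inputPath is inside dirPath, no "../" is added, so the result is just the remaining components of inputPath.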

processFile::usage= "processFile[path] changes all links to http://xahlee.info/ in the file at path into relative links, and makes a backup.";

processFile = Function[ {currentHtmlFilePath},
(*
need:
• current href value
• convert href value domain path to full file path
*)
Module[ {
fileContent = ReadString[ currentHtmlFilePath ],
currentDirPath= FileNameDrop[ currentHtmlFilePath ]
},
StringReplace[fileContent,
RegularExpression[
"<a href=\"http://xahlee\\.info/([^\"]+)\" data-accessed=\"([0-9][0-9][0-9][0-9]-[0-9][0-9]-[0-9][0-9])\">"
 ]
->
"a function, that takes a domain url, change it to local file full path, and change it to relative path, and return that here"

]
(* FileBaseName[ currentHtmlFilePath ]  *)
(* WriteString[ "c:/Users/xah/some.txt", "something" ] *)
]
];
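
a minimal sketch of the replacement step in processFile, to show how a function can be called on the captured part of the url. with RegularExpression on the left side, the right side needs :> (not ->) so that "$1" is filled in with the first captured group before the right side is evaluated. hrefSketch and fixLinksSketch are made-up names. hrefSketch is only a stand-in that assumes the html file sits two directories below the site root, as in the example at the top. the sketch changes only the opening tag and leaves the link text as is.

```wolfram
(* sketch: stand-in for the url-to-relative-path step. *)
(* assumes the current html file is two directories below the site root. *)
hrefSketch = Function[ {urlPath}, "../../" <> urlPath ];

(* sketch: rewrite opening <a> tags in a string of html. *)
(* "$1" is the part of the url after http://xahlee.info/ *)
fixLinksSketch = Function[ {htmlText},
StringReplace[ htmlText,
RegularExpression[ "<a href=\"http://xahlee\\.info/([^\"]+)\" data-accessed=\"[0-9]{4}-[0-9]{2}-[0-9]{2}\">" ]
:> "<a href=\"" <> hrefSketch[ "$1" ] <> "\">" ] ];

fixLinksSketch[ "<a href=\"http://xahlee.info/M/WolframLang_syntax.html\" data-accessed=\"2021-11-12\">x</a>" ]
(* "<a href=\"../../M/WolframLang_syntax.html\">x</a>" *)
```

links without a data-accessed attribute do not match the pattern and are left alone.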

Map[ processFile  , files ]

(*
regular expression to represent
http://xahlee.info/M/WolframLang_syntax.html

*)

xah_talk_show_2021-11-25.txt