This page is a tutorial on how to copy or sync between 2 machines using {rsync, unison, curl, wget}, and unix utilities for comparing directories.
Note: the common unix shell utilities on this page, such as {find, xargs, ps, diff, basename, …}, are the GNU versions. Most Linux distributions are bundled with the GNU versions (as opposed to the BSD, Solaris, or other unix versions). GNU and BSD versions differ in options and features, and are not practically compatible. On Mac OS X, some utils are GNU while others are BSD. One easy way to tell is to run the tool with “--version” and check whether its output says GNU.
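As a rough sketch of the “--version” check, the following shell snippet reports which of a few tools look like GNU versions. The helper name is_gnu and the list of tools are made up for illustration; a tool that doesn't understand “--version” is simply reported as not GNU.

```shell
# is_gnu TOOL: succeed if TOOL's --version output mentions GNU.
# (is_gnu is a hypothetical helper name, not a standard command.)
is_gnu() {
    # Non-GNU tools usually reject --version, so errors are discarded;
    # </dev/null keeps a tool that reads stdin from waiting for input.
    "$1" --version 2>/dev/null </dev/null | grep -q 'GNU'
}

for tool in find xargs diff basename; do
    if is_gnu "$tool"; then
        echo "$tool: GNU"
    else
        echo "$tool: not GNU (or not installed)"
    fi
done
```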
How to copy a local directory to a remote machine, in one shot?
For one-way copying (or updating), use rsync. The remote machine must have rsync installed. Example:
rsync -z -a -v -t --rsh="ssh -l mary" ~/web/ mary@example.org:~/
This will copy the contents of the local dir 〔~/web/〕 to the remote dir 〔~/〕 on the machine with domain name “example.org”, using login “mary” thru the ssh protocol. (The trailing slash on 〔~/web/〕 matters: rsync copies what is inside the dir, not the dir itself.) The “-z” is to use compression. The “-a” is for archive mode, which basically makes each file's metadata (owner/perm/timestamp) the same as the local file's (when possible) and recurses (i.e. uploads the whole dir). The “-t” preserves timestamps; it is already implied by “-a”. The “-v” is for verbose mode, which basically makes rsync print out which files are being updated. (rsync does not upload files that are already on the destination and identical.)
For example, here's what I use to sync/upload my website from my local machine to my server.
rsync -z -a -v -t --exclude="*~" --exclude=".DS_Store" --exclude=".bash_history" --exclude="*/_curves_robert_yates/*.png" --exclude="logs/*" --exclude="xlogs/*" --delete --rsh="ssh -l u40651121" ~/web/ u40651121@s168753656.onlinehome.us:~/
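The “--exclude” tells it to disregard any files matching that pattern (i.e. if a file matches, rsync neither uploads it nor deletes it on the remote server). The “--delete” removes files on the remote server that no longer exist in the local dir.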
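Since “--delete” removes files, it can help to preview a run first. The sketch below uses rsync's “--dry-run” option on two throwaway local dirs (all names are made up for illustration) to show what would be copied, skipped, or deleted, without changing anything. It assumes rsync is installed, and skips the preview otherwise.

```shell
# Preview an rsync --delete run safely, using throwaway local dirs.
# All paths and file names below are made up for illustration.
src=$(mktemp -d)
dst=$(mktemp -d)
echo "page"   > "$src/index.html"
echo "backup" > "$src/index.html~"   # matches --exclude="*~"
echo "stale"  > "$dst/old.html"      # exists only on the destination

if command -v rsync >/dev/null 2>&1; then
    # --dry-run lists what would be copied or deleted, but changes nothing.
    rsync -a -v --dry-run --delete --exclude="*~" "$src/" "$dst/"
else
    echo "rsync is not installed; skipping the preview"
fi

# Because of --dry-run, the destination still holds only its original file.
dst_listing=$(ls "$dst")
echo "destination now contains: $dst_listing"
rm -rf "$src" "$dst"
```

Once the preview looks right, the same command without “--dry-run” does the real sync.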
Here's an example of syncing Windows and Mac.
rsync -z -r -v --delete --rsh="ssh -l xah" ~/web/ xah@169.254.125.147:~/web/
Note that “-r” is used instead of “-a”. The “-r” means recursive: all subdirectories and files. Don't use “-a”, because it will also sync file owner, group, permissions, and other metadata. Windows and unix have different permission systems and file systems, so “-a” is usually not what you want. (For a short intro to perm systems on unix and Windows, see: Unix & Windows File Permission Systems)
You can create a bash alias for the long command (for example, alias l="ls -al"), or search bash's command history by pressing 【Ctrl+r】 then typing “rsync”.
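A minimal sketch of the alias approach, suitable for 〔~/.bashrc〕, plus a shell function as an alternative when you want to pass extra options. The names upweb and syncweb are hypothetical, and the login and host are the placeholders from the first example.

```shell
# Alias: a fixed shorthand for the whole command.
alias upweb='rsync -z -a -v --rsh="ssh -l mary" ~/web/ mary@example.org:~/'

# Function: the same command, but any extra options are passed on to rsync.
syncweb() {
    rsync -z -a -v --rsh="ssh -l mary" "$@" ~/web/ mary@example.org:~/
}
```

With these defined, syncweb --dry-run previews the upload, and syncweb --delete also removes remote files that are gone locally.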
How to download an entire website for offline reading?
Use wget. Example:
wget --wait=9 --recursive --level=2 http://example.org/
This will download all files from example.org, up to 2 levels deep, with 9 seconds between each fetch (so you don't spam the server). Some sites check the user agent, so you might add this option: “--user-agent='Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1; .NET CLR 1.1.4322)'”.
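For offline reading, a few more wget options are often useful. The sketch below wraps the command in a shell function (the name mirror_site is hypothetical). It adds “--convert-links” (rewrite links to point at the local copies), “--page-requisites” (also fetch the images and stylesheets a page needs), and “--no-parent” (don't climb above the starting dir). These extras are an assumption about what is wanted, not part of the command above.

```shell
# mirror_site URL: fetch URL and the pages it links to, 2 levels deep.
# (mirror_site is a hypothetical helper name.)
mirror_site() {
    wget --wait=9 --recursive --level=2 \
         --convert-links --page-requisites --no-parent \
         "$1"
}

# Example call (needs network access, so it is left commented out):
# mirror_site http://example.org/
```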
How to download just one single file from a website?
Use cURL. Example:
curl -O http://example.org/somedir/largeMovie.mov
will download largeMovie.mov to your current dir.
Curl can also be used to download a series of files with a pattern in their names. For example, curl -O "http://example.org/somedir/girl[01-20].jpg" will download all files in somedir named girl01.jpg, girl02.jpg, …, girl20.jpg. If you use girl[1-20].jpg, then it'll be girl1.jpg, girl2.jpg, etc. (Quote the URL so that the shell doesn't treat the brackets as a glob pattern.)
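To see which file names a numeric range stands for, the sketch below emulates the expansion with printf and seq. This is only a shell imitation for illustration; curl does its own expansion, and no download happens here.

```shell
# File names requested by girl[01-20].jpg (zero-padded to 2 digits).
padded=$(printf 'girl%02d.jpg\n' $(seq 1 20))
# File names requested by girl[1-20].jpg (no padding).
plain=$(printf 'girl%d.jpg\n' $(seq 1 20))

echo "$padded" | head -n 3
echo "$plain" | head -n 3

# The real download; the URL is quoted so the shell leaves the brackets alone.
# curl -O "http://example.org/somedir/girl[01-20].jpg"
```

The first three lines printed are girl01.jpg, girl02.jpg, girl03.jpg, and the next three are girl1.jpg, girl2.jpg, girl3.jpg.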
Other useful options are “--referer http://example.org/” and “--user-agent "Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 5.1; .NET CLR 1.1.4322)"”. These can be used in case the site blocks requests by referer or browser.
Note: curl cannot download an entire website recursively the way wget can.
How to tell if 2 binary files have identical content?
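Use cmp: cmp ~/myfile1 ~/myfile2. It prints nothing if the files are identical, and otherwise reports the first byte that differs. This is particularly useful for binary files.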
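In scripts, it is usually cmp's exit status that matters: 0 if the files are identical, 1 if they differ. A small sketch with throwaway temp files (the names and contents are made up):

```shell
a=$(mktemp); b=$(mktemp); c=$(mktemp)
printf 'same bytes\n'  > "$a"
printf 'same bytes\n'  > "$b"
printf 'other bytes\n' > "$c"

# -s (silent) suppresses all output; only the exit status is used.
if cmp -s "$a" "$b"; then same_ab=yes; else same_ab=no; fi
if cmp -s "$a" "$c"; then same_ac=yes; else same_ac=no; fi

echo "a vs b identical: $same_ab"
echo "a vs c identical: $same_ac"
rm -f "$a" "$b" "$c"
```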
How to compare the differences between 2 text files?
Use diff: diff ~/myfile1 ~/myfile2.
Some useful options are: “-i --ignore-case”, “-E --ignore-tab-expansion”, “-b --ignore-space-change”, “-w --ignore-all-space”, “-B --ignore-blank-lines”, “--strip-trailing-cr”
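As a quick illustration of how these options change the result, the sketch below compares two made-up one-line files that differ only in letter case and in the amount of whitespace. diff exits with status 0 when it finds no differences and 1 when it finds some.

```shell
f1=$(mktemp); f2=$(mktemp)
printf 'Hello  World\n' > "$f1"    # capitalized, two spaces
printf 'hello world\n'  > "$f2"    # lowercase, one space

# Plain diff sees the lines as different.
plain_status=0
diff "$f1" "$f2" >/dev/null || plain_status=$?

# With -i (ignore case) and -b (ignore changes in the amount of
# whitespace), the two files count as the same.
relaxed_status=0
diff -i -b "$f1" "$f2" >/dev/null || relaxed_status=$?

echo "plain diff status:    $plain_status"
echo "diff -i -b status:    $relaxed_status"
rm -f "$f1" "$f2"
```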
How to test if 2 directories have identical content? (same subdirs and all files in any subdir)
Use diff -r --brief ~/mydir1 ~/mydir2. The “-r” means recursive (all subdirs), and the “--brief” means only report whether files differ (as opposed to how they differ) or exist on only one side.
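A small sketch of what the output looks like, using two throwaway dirs (all names are made up). As with plain diff, the exit status is 0 if the dirs match and 1 if they don't. This assumes GNU diff.

```shell
d1=$(mktemp -d); d2=$(mktemp -d)
mkdir "$d1/sub" "$d2/sub"
echo "one" > "$d1/sub/a.txt"
echo "two" > "$d2/sub/a.txt"      # same name, different content
echo "x"   > "$d1/only_here.txt"  # missing from the second dir

# LC_ALL=C keeps the messages in English regardless of locale.
status=0
report=$(LC_ALL=C diff -r --brief "$d1" "$d2") || status=$?

echo "$report"
echo "exit status: $status"
rm -rf "$d1" "$d2"
```

This prints one “Only in …” line for only_here.txt and one “Files … differ” line for a.txt.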