UNIX Tar Problem: File Length Truncation, Unicode Name Support

By Xah Lee. Date: . Last updated: .

Just discovered that GNU tar has a --help option. So, instead of typing man tar, you can type tar --help. Not sure how long it has been there.

Much better. I always hated the “man” fck. You can never be sure the man page corresponds to the version you are using, and because the doc is separate from the program, it's also a pain for devs to maintain and tends to get out of sync.

Another thing about tar is that i never figured out why its syntax doesn't use the dash. You use tar xvf myfile.tar instead of tar -xvf myfile.tar. Many years ago, the form with the dash wouldn't work. Not sure all tar programs support it even today.

Also, you can't talk about tar without talking about the unix line truncation problem. Tar used to truncate your file names if the path was long (for example, ~120 chars). See: Unix, RFC, Line Truncation. What's the max length of a file path before it gets truncated, and is it done silently?

Something i've wanted to test but never got to: does the current version of tar preserve file names that have lots of Unicode characters? (For example, Chinese, math symbols.)

According to the Wikipedia article tar (file format), there seems to be a newer spec in “POSIX.1-2001” that addresses file name length and charset encoding, and it was implemented by GNU tar in 2004.
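Here is a minimal sketch for testing the long-name and Unicode questions without touching the disk. It uses Python's standard tarfile module as a stand-in, not GNU tar itself, so it only shows what the archive formats can hold. The helper roundtrip_name and the sample names are made up for illustration. The assumption is that the POSIX.1-2001 (pax) format stores names that are long or non-ASCII in extended header records, while the older ustar format has fixed-size name fields.

```python
import io
import tarfile


def roundtrip_name(name, fmt):
    """Write one empty member called `name` to an in-memory tar, then read the name back."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w", format=fmt) as tf:
        tf.addfile(tarfile.TarInfo(name))  # size 0, so no file data is needed
    buf.seek(0)
    with tarfile.open(fileobj=buf, mode="r") as tf:
        return tf.getnames()[0]


# Hypothetical test names: a deep path whose last component alone exceeds
# 100 bytes, and a name made of Chinese characters and math symbols.
long_name = "/".join(["d" * 50] * 6) + "/" + "f" * 120
unicode_name = "数学/∑∫√_文件.txt"

for name in (long_name, unicode_name):
    same = roundtrip_name(name, tarfile.PAX_FORMAT) == name
    print("pax preserved name:", same)

# The older ustar format cannot hold the long name. Python's writer
# raises an error here instead of truncating silently.
try:
    roundtrip_name(long_name, tarfile.USTAR_FORMAT)
except ValueError as err:
    print("ustar refused:", err)
```

With this library, at least, the long path is refused loudly rather than silently cut. Whether a particular tar binary behaves the same way is something you would still have to test on that binary.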

The Wikipedia article turns out to be quite informative. One thing it mentions is the “tarbomb”. That is, when you untar, the files get scattered all over your dir, or even into parent dirs, and OVERWRITE your files. This is an extreme pain in the a**, and it still happens today.
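One way to protect yourself is to look at the member names before extracting. The following is a rough sketch, again using Python's tarfile module. The function name tarbomb_report and the sample archive are hypothetical. It flags members whose paths are absolute or climb out through "..", and it collects the top-level entries so you can see whether everything sits under a single directory. It does not check symlinks or hard links, which can also be abused.

```python
import io
import posixpath
import tarfile


def tarbomb_report(tf):
    """Return (escaping_names, top_level_entries) for an open TarFile."""
    escaping = []
    top_level = set()
    for member in tf.getmembers():
        name = posixpath.normpath(member.name)
        if name == ".":
            continue  # the archive root itself
        if name.startswith("/") or name == ".." or name.startswith("../"):
            escaping.append(member.name)  # would land outside the target dir
        else:
            top_level.add(name.split("/")[0])
    return escaping, top_level


# Build a small in-memory archive with two harmless and two hostile names.
buf = io.BytesIO()
with tarfile.open(fileobj=buf, mode="w") as tf:
    for name in ["notes.txt", "src/main.c", "../evil.txt", "/tmp/abs.txt"]:
        tf.addfile(tarfile.TarInfo(name))
buf.seek(0)

with tarfile.open(fileobj=buf, mode="r") as tf:
    escaping, top_level = tarbomb_report(tf)

print("escaping:", escaping)
print("top-level entries:", sorted(top_level))
```

More than one top-level entry means the archive will scatter files into the current dir, so extract it into a fresh empty directory. Any escaping entry is a reason not to extract blindly at all. Recent Python versions (3.12 and later) also accept a filter argument to extractall, such as filter="data", which rejects such members for you.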

Another interesting problem is that tar has no table of contents, so there is no random access. If you need to list the files or extract one file, the program has to read thru the archive from the beginning.
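To make the missing table of contents concrete, here is a small sketch that writes three members into an in-memory tar and prints where each header and each data block sits. The file names and payloads are made up. It assumes an uncompressed ustar archive, where every header is a 512-byte block and data is padded to a multiple of 512 bytes.

```python
import io
import tarfile

# Write three small members into an uncompressed, in-memory ustar archive.
buf = io.BytesIO()
with tarfile.open(fileobj=buf, mode="w", format=tarfile.USTAR_FORMAT) as tf:
    for name, payload in [("a.txt", b"x" * 10), ("b.txt", b"y" * 2000), ("c.txt", b"z")]:
        info = tarfile.TarInfo(name)
        info.size = len(payload)
        tf.addfile(info, io.BytesIO(payload))

# Listing the archive means walking it header by header from the start.
buf.seek(0)
with tarfile.open(fileobj=buf, mode="r") as tf:
    layout = [(m.name, m.offset, m.offset_data, m.size) for m in tf]

for name, header_at, data_at, size in layout:
    print(f"{name}: header at byte {header_at}, data at byte {data_at}, {size} bytes")
```

Each header sits right in front of its own data, and nothing in the archive says where the next header is except the size field of the current one. So to find c.txt a reader has to hop over a.txt and b.txt first, and with a compressed tar it has to decompress everything before that point too.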

Here's another good resource discussing tar's problems: “New file format?” at http://duplicity.nongnu.org/new_format.html.

In recent months i read that Google still uses tape drives as one of their backups. I wonder if they use tar as the file format.

Alright, today, i'm deprecating tar for any personal use. If you are making decisions for yourself, i suggest zip as a replacement. Zip is open source and well supported, and has been adopted by Java (in its jar file) and others. Gzip is also well supported by the industry. (For example, it is adopted in Sitemap. [see Creating A Sitemap With Emacs Lisp])
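For comparison, here is a short sketch of the same kind of test with zip, using Python's standard zipfile module. The member names are invented. A zip archive keeps a central directory at the end of the file, so a reader can list names or pull out one member without reading the others, and the module marks non-ASCII names as UTF-8.

```python
import io
import zipfile

# Write three members, one with a Chinese and math-symbol name.
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w", compression=zipfile.ZIP_DEFLATED) as zf:
    zf.writestr("docs/readme.txt", "hello")
    zf.writestr("数学/∑.txt", "sum")
    zf.writestr("big.bin", b"\0" * 100_000)

# Reading starts from the central directory at the end of the archive,
# so one member can be fetched by name without scanning the rest.
with zipfile.ZipFile(buf) as zf:
    names = zf.namelist()
    picked = zf.read("数学/∑.txt")

print(names)
print(picked)
```

This only shows what the format and this one library do. Other zip tools, especially old ones, may still mangle non-ASCII names, so test the tools you actually use.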

See also: ZIP, Open Source, Mother-Son Relationship.
