Why LD_LIBRARY_PATH is bad

By David Barr.

Background

This is one system administrator's point of view why LD_LIBRARY_PATH, as frequently used, is bad. This is written from a SunOS 4.x/5.x (and to some extent Linux) point of view, but this also applies to most other UNIXes.

What LD_LIBRARY_PATH does

LD_LIBRARY_PATH is an environment variable you set to give the run-time shared library loader (ld.so) an extra set of directories to look for when searching for shared libraries. Multiple directories can be listed, separated with a colon (:). This list is prepended to the existing list of compiled-in loader paths for a given executable, and any system default loader paths.

For security reasons, LD_LIBRARY_PATH is ignored at runtime for executables that have their setuid or setgid bit set. This severely limits the usefulness of LD_LIBRARY_PATH.

Why was it invented?

There were a couple good reasons why it was invented:

As an often unwanted side effect, LD_LIBRARY_PATH will also be searched at link (ld) stage after directories specified with -L (also if no -L flag is given).

Some good examples of how LD_LIBRARY_PATH is used:

How has it been corrupted?

Too often people use it as a crutch for not doing the right thing (i.e. relying on the compiled in path). Often programs (even commercial ones) are compiled without any run-time loader paths at all, forcing you to have LD_LIBRARY_PATH set or else the program won't run.

LD_LIBRARY_PATH is one of those insidious things that once it gets set globally for a user, things tend to happen which cause people to rely on it being set. Eventually when LD_LIBRARY_PATH needs to be changed or removed, mass breakage will occur!

How does the shared loader work?

SunOS 4.x uses major and minor revision numbers. If you have a library “Xt”, then it's named something like “libXt.so.4.10” (Major version 4, minor 10). If you update the library (to correct a bug, for example), you would install libX11.so.4.11 and applications would automatically use the new version. To do this, the loader must do a readdir() for every directory in the loader path and glob out the correct file name. This is quite expensive especially if the directories are large, contain symlinks, and/or are located over NFS.

Linux, SunOS 5.x and most other SYSV variants use only major revision numbers. A library “Xt” is just named something like “libXt.so.4”. (Linux confuses things by generally using major/minor library file names, but always include a symlink that is the actual library path referenced. So, for example, a library “libXt.so.6” is actually a symlink to “libXt.so.6.0”. The linker/loader actually looks for “libXt.so.6”.)

The loader works essentially the same except that you don't have minor library updates (you update the existing library) and the loader just does a stat() for each directory in the loader path. (This is much faster)

The bad old days before separate run-time vs link-time paths

Nowadays you specify the run-time path for an executable at link stage with the -R (or sometimes -rpath) flag to “ld”. There's also LD_RUN_PATH which is an environment variable which acts to “ld” just like specifying -R.

Before all this you had only -L, which applied not only during compile-time, but during run time as well. There was no way to say “use this directory during compile time” but “use this other directory at run time”. There were some rather spectacular failure modes that one could get in to because of this. For example, say you are building X11R6 in an NFS automounted directory /home/snoopy/src. X11R6 is made up of shared libraries as well as programs. The programs are compiled against the libraries when they are located in the build tree, not in their final installed location. Since the linker must resolve symbols at link time, you need a -L path that includes the link-time path in addition to the final run-time path of, say, /usr/local/X11R6/lib. Now all the programs which use shared libraries will look first in /home/snoopy/src for their libraries and then in the correct place. Now every time an X11R6 app starts up it NFS automounts its build directory! You probably removed the temporary build directory ages ago, but the linker will still search there. What's worse, say snoopy is down or no longer exists, no X11R6 apps will run! Bummer! Happily this all has been fixed, assuming your OS has a modern linker/loader. It also is worked around by specifying the final run time path first, before the build path in the -L options.

Evil Case Study #1

My first experience with this breakage was under SunOS 4.x, with OpenWindows. For some dumb reason, a few Sun OpenWindows apps were not compiled with correct run-time loader paths, forcing you to have LD_LIBRARY_PATH set all the time. Remember, at this time, in the global OpenWindows startup scripts the system would automatically set your LD_LIBRARY_PATH to be $OPENWINHOME/lib.

Okay, how did it break? Well, it just so happens that this site also had compiled X11R4 from source, in /usr/local/X11R4. Things got really confusing because if you ever wanted to run the X11R4 apps, they would run against the OpenWindows libraries in /usr/openwin/lib, not the libraries in /usr/local/X11R4/lib! Things got even more confusing once X11R5 and then X11R6 came out. Now we had four different and often incompatible versions of a given shared library.

Hm. What do you do? If you set LD_LIBRARY_PATH to put OpenWindows first, then at best it will slow things down (since most people were running X11R5 and X11R6 stuff, searching for libraries in /usr/openwin/lib was a waste). At worst it caused spurious warnings (“ld.so: warning: libX11.x.y has older revision than expected z”) or caused apps to break altogether due to incompatibilities. It was also confusing to lots of people trying to compile X apps and forget to use -L.

What did I do? I whipped out emacs and binary edited the few OpenWindows apps which didn't have a correct run-time path compiled in, and changed to the correct location in /usr/openwin/lib. (it should be noted that these tended to be apps which were fixed with system patches… alas it seems guys who build the patched versions didn't have the same environment as the FCS guys). I then changed all the startup scripts and removed any setenv LD_LIBRARY_PATH statements. I even put in an unsetenv LD_LIBRARY_PATH in my own .cshrc for good measure.

Evil Case Study #2

(based on a true story).

Due to licensing issues, it's common for commercial apps to ship in binary form a copy of the shared Motif library. Motif is a commercial product, and not all OS's come with it. It's a common toolkit for commercial programs to write applications against. It's also an evolving product, with ongoing bugfixes and new features.

Say application WidgetMan is one such application. In its startup script, it sets LD_LIBRARY_PATH to point to its copy of Motif so it uses that one when it runs. As it happens, WidgetMan is designed to launch other programs too. Unfortunately, when WidgetMan launches other apps, they inherit the LD_LIBRARY_PATH setting and some Motif based apps now break when run from WidgetMan because WidgetMan's Motif is incompatible with (but the same library version as) the system Motif library. Bummer!

Imagine if you had followed what some clueless commercial install apps tell you to do and set LD_LIBRARY_PATH globally!

Half-hearted attempts to improve things

Some OS's (⁖ Linux) have a configurable loader. You can configure what run-time paths to look in by modifying /etc/ld.so.conf. This is almost as bad a LD_LIBRARY_PATH! Install scripts should never modify this file! This file should contain only the standard library locations as shipped with the OS.

Canonical rules for handling LD_LIBRARY_PATH

  1. Never ever set LD_LIBRARY_PATH globally.
  2. If you must ship binaries that use shared libraries and want to allow your clients to install the program outside a 'standard' location, do one of the following:
  3. If you are forced to set LD_LIBRARY_PATH, do so only as part of a wrapper.

Some software packages make you install a symlink from the standard location pointing to the real location. While this “works”, it does not solve the problem. What if you need to have two versions installed? Not to mention the fact that many vendors seem to choose stupid locations as their “standard” location (like putting them in / or /usr). This also typically makes things difficult for network installations, since even though you install an application on a network directory, you need to go around to every computer on the network and make a symlink.

Thoughts on improving LD_LIBRARY_PATH implementations in UNIX

• Remove the link-time aspect of LD_LIBRARY_PATH. (Solaris's ld will do this with the -i flag). Too often people just lazily set LD_LIBRARY_PATH so they don't have to specify -L, causing bad consequences at run time for other apps. Or on the flip side people will set LD_LIBRARY_PATH to fix some brokenness at run time with some app, but it will lead to confusion or breakage at compile time for some other app if they don't specify a correct -L path. It would be much cleaner if LD_LIBRARY_PATH only had influence at run-time. If necessary, invent some other environment variable for the job (LD_LINK_PATH ?).

• Have OS's ship with programs which allow one to safely change an executable's run-time linker path.

• Implement -s option to ldd which prints this run-time path for a given executable. (You can also see this with dump -Lv in Solaris.)

• Solaris 7 has a neat idea. There you can can specify a run time path which is also evaluated at run time. You link with an rpath of $ORIGIN/../lib. Here, $ORIGIN evaluates at run time to be the installation path of the binary. Now you can move the installation tree to another location entirely and everything will still work. We need this in other OS's! Unfortunately, at least in Solaris 7, $ORIGIN is considered a “relative” path (you can subvert it if you have a writable directory on the same filesystem because UNIX lets you hard link even a setuid executable) so it is ignored on setuid/setgid binaries. Sun has fixed this in Solaris 8. You can specify with crle(1) paths that are “trustworthy”.

Notes from Xah Lee

This article is written by David Barr 〔barr@visi.com〕, accessed on 2005-07. http://www.visi.com/~barr/ldpath.html If i recall correctly, this document exists as early as 2001.

The HTML has been edited so that it is valid HTML.

I was a sys admin and web application developer at Netopia in 2001. (Netopia is acquired by Motorola in 2007) At the time, there are serious disasters and bugs in our product and installation system, that i eventually traced to LD_LIBRARY_PATH, in another program used in our system.

In general, the whole concept of Unix's environment variables are bad. It is a symptom of unix's “solve it simply, quickly, brainlessly” philosophy.

blog comments powered by Disqus