MySQL and PostgreSQL Compared
By Tim Perdue, 2000-07-30
http://www.phpbuilder.com/print/columns/tim20000705.php3 (accessed 2012-09-21)

Which database do I use: Postgres or MySQL? This age-old question has plagued developers for, what, at least a couple of years now. I've used both databases extensively (MySQL for about one year and Postgres for about two) and was curious whether the performance differences between the two were as stark as the MySQL website suggests.

I had actually benchmarked the two databases back in September 1999, when we were laying the groundwork for SourceForge. At the time, the performance difference was so stark that we had to go with MySQL, even though I had always used Postgres for my own work. The rest of the developers were used to MySQL, and that pretty much cinched the decision.

This time around, rather than using some contrived benchmarking scheme, I wanted to grab a "real life" page from a real web site and see how it performed on the two databases. The page in question was the discussion forum on SourceForge. It involves some relatively straightforward joins of three tables, each with 20,000-30,000 rows of data. It also involves some recursive queries to show nested messages, so the database, not PHP, is the true bottleneck on this page.

To get started, I dumped real data out of the production database, modified the table SQL, and imported it all into MySQL 3.22.30 and PostgreSQL 7.0.2 on Red Hat Linux 6.2, running on a VA Linux 4100 quad-Xeon server with 1 GB of RAM.

The first problem I ran into was that Postgres has an arcane limit of 8k of data per row. In a message board you're occasionally going to surpass 8k of data in a row, so Postgres choked on the import. To get around this, I just dropped the "body" of each message and re-imported the data. The Postgres development team is aware of this limitation and is fixing it in v7.1; they also noted that you can recompile Postgres to support rows up to 32k, though at a possible cost to overall performance.

At this point I ran into another small issue with Postgres: its "serial" data type (the equivalent of MySQL's auto_increment) creates a "sequence" which does not get dropped when its parent table is dropped. So if you try to re-create the table, you get a name conflict on the sequence. A lot of new users would be confused by this, so Postgres loses a couple of points here. Also, MySQL is "smart" enough to advance its auto_increment counter when you import data, whereas a Postgres sequence does not get reset on import, causing all new inserts to fail. (A workaround sketch appears at the end of the Methodology section below.)

Methodology

To make this as realistic as possible, I took an actual page from a web site and made it portable across both MySQL and Postgres. This basically meant replacing all mysql_query() calls with pg_exec(). The page involves a lot of selects and joins, as most pages on a typical web site do.

Once the test page was up and debugged, I ran "ab", the Apache benchmarking utility, from my workstation across my 100-Mbit LAN to the quad-Xeon machine. To get an idea of scalability under load, I varied ab's "concurrent connections" from 10 to 120 while holding the number of page views steady at 1000.

To more closely simulate real-world use, I set up a random-number generator in the script that inserts a row into the database on 10% of the page views. My own numbers on PHPBuilder show that about 10% of all pages in the discussion forums are for posting new messages. Further, as mentioned above, I used real data from a production database. You can't get a whole lot more realistic than this scenario.
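To give a feel for the harness, here's a minimal sketch of the two pieces just described: the mysql_query()/pg_exec() portability shim and the 10% random insert. Function, table, and column names here are my own inventions for illustration, not SourceForge's actual schema.

    <?php
    // Route every query through one function so the same page can run
    // against either database. $conn and $use_postgres set at page top.
    function db_query($sql) {
        global $conn, $use_postgres;
        if ($use_postgres) {
            return pg_exec($conn, $sql);      // Postgres
        }
        return mysql_query($sql, $conn);      // MySQL
    }

    // The "10% insert" test: on roughly one page view in ten,
    // post a message instead of just reading the forum.
    mt_srand((double)microtime() * 1000000);  // old PHP needs explicit seeding
    if (mt_rand(1, 10) == 1) {
        db_query("INSERT INTO forum (group_id, subject, post_date) " .
                 "VALUES (1, 'benchmark post', now())");
    }
    ?>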
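The ab runs themselves looked something like this (host and path are placeholders), with -c stepped from 10 up to 120 between runs:

    ab -n 1000 -c 50 http://benchmark-box/sf/forum.php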
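And one more housekeeping sketch before the numbers: the sequence workaround promised above. In Postgres 7.0 the sequence behind a serial column has to be dropped and re-pointed by hand; all names here are hypothetical (Postgres names the sequence <table>_<column>_seq).

    DROP TABLE forum;
    DROP SEQUENCE forum_msg_id_seq;   -- not dropped automatically with the table

    -- After importing data, point the sequence past the highest existing key:
    SELECT max(msg_id) FROM forum;    -- suppose this returns 27182
    SELECT setval('forum_msg_id_seq', 27182);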
The Numbers

(The raw test results are available separately [1].)

The most interesting thing about my results was seeing how much load Postgres could withstand before giving any errors. In fact, Postgres seemed to scale three times higher than MySQL before throwing any errors at all. MySQL begins collapsing at about 40-50 concurrent connections, whereas Postgres handily scaled to 120 before balking. My guess is that Postgres could have gone far past 120 connections with enough memory and CPU.

On the surface this can look like a huge win for Postgres, but if you examine the results in more detail, you'll see that Postgres took up to two to three times longer than MySQL to generate each page, so it needs to scale two to three times higher just to break even. In terms of the maximum number of pages generated concurrently without errors, then, it's pretty much a dead heat between the two databases. In terms of generating one page at a time, MySQL does it up to two to three times faster.

Another interesting point was that MySQL crumbles faster in the "10% insert" test described above. Some research reveals why: MySQL locks the entire table when an insert occurs, while Postgres has a rather nifty "better than row-level locking" feature. This difference quickly causes MySQL to pile up concurrent connections and collapse. The same is true if you run a large select on a table while another process is inserting into it: Postgres is completely unfazed, while MySQL piles up connections until it falls apart like a house of cards.

For those of you wondering about persistent connections in PHP: they don't appear to benefit MySQL much, whereas they are a clear boon for Postgres. In fact, Postgres benchmarked as much as 30% faster just by using persistent connections (a one-line illustration appears at the end of this section). That tells me that Postgres has a tremendous amount of overhead in its connection-opening and authentication process. Some of this may be the fault of Linux and its relatively lame process scheduler. Still, MySQL on the same box beat it handily no matter how you look at it.

MySQL

The numbers for MySQL ring true with what most people already know: it's a fast, if lightweight, database that will serve well for the vast majority of web sites. However, if you plan on having a high-traffic site (say, greater than 500,000 pages per day), forget MySQL, as it tends to fold up and die under load. Anyone who has ever visited Slashdot can attest to the fragility of its setup (mod_perl and MySQL). But again, the vast majority of web sites fall well under the 15 pages per second that MySQL demonstrated here. If you ever surpass a sustained 15 pages per second, you'll be delighted to fork over the cash for a bigger server or an Oracle license.

Wins

Obviously, the advantage MySQL has over Postgres is performance. It also ships with more powerful admin tools in the distribution (mysqladmin lets you watch processes and queries in progress), plus hot backup, a file-corruption recovery tool, and a couple of others. I'm also a fan of MySQL's command-line tools: you can see database and table structures using DESCRIBE and SHOW commands, whereas Postgres' equivalents are less obvious (\d to show a list of tables, for instance).
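For anyone who hasn't sat in front of both clients, the difference looks roughly like this (the table name is made up):

    mysql> SHOW TABLES;          (list tables in the current database)
    mysql> DESCRIBE forum;       (columns and types for one table)

    forum=> \d                   (psql: list tables, sequences, and indexes)
    forum=> \d forum             (psql: describe one table)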
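And, as promised, the persistent-connection change that bought Postgres its 30%: in PHP it is literally one function name. The connection parameters here are hypothetical.

    // Opens a new backend process on every page view:
    $conn = pg_connect("dbname=forum user=web");

    // Reuses a backend across page views served by the same Apache child:
    $conn = pg_pconnect("dbname=forum user=web");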
Limitations

The first thing you hear from hard-core database gurus is that MySQL lacks transactions, rollbacks, and subselects. You'll really miss transactions if you're trying to write a banking application, an accounting application, or any counter that needs to increment reliably over time. Forget attempting any of those with released versions of MySQL (it should be noted that the unstable 3.23.x series of MySQL now includes transaction support).

For many, if not most, web sites out there, MySQL's limitations can be overcome with a little elbow grease on the part of the developer. The feature you'll miss most is the powerful subselect syntax present in almost every other production database. If I had a nickel for every time I could have used a subselect in MySQL, I'd be able to buy a case or two of beer. In other words, this missing feature is a pain in the neck, but it can be worked around (a sketch follows at the end of this section).

Stability

MySQL loses points in the long-term stability department. Simply put, MySQL gives up the ghost randomly and for no obvious reason after running for semi-long periods of time (say 30-60 days). Many developers compile MySQL statically for just that reason, and doing so has helped some people. The problem can also be papered over with a good pager or a simple crontab entry that kills and restarts MySQL monthly (an example appears below). Not that I find that at all acceptable, but it is a solution.

Where MySQL loses points on daemon robustness, it makes up for them by apparently never corrupting its data files. The last thing you want is your precious data files fouled randomly, and MySQL does well here: in over a year of running it, I haven't seen a single case of database or index corruption. In the same timeframe, I have done two or three recoveries of a couple of different Postgres databases. (Regardless, backups are always your best friend, as shown by the database fiasco here on PHPBuilder.)

PostgreSQL

The results for Postgres might surprise a few people, as Postgres has something of a negative reputation among web developers (initial releases had widely-rumored issues on top of laggard performance). In my experience, and according to these benchmarks, most of that reputation is unfounded. In fact, it appears that PostgreSQL withstands up to three times the load that MySQL can before throwing any errors -- on the same hardware/OS combination.

Postgres happily chugs along at roughly 10 pages per second, enough to serve about 400,000 pages per day, assuming a regular traffic curve with the peak at twice the trough. That's an awful lot of pages, far beyond what most people will ever see on their web sites, and most of the pages on your site will not be as complex as the one in this test. As with MySQL, you'll be happy to pay for a hardware upgrade if you pass this ceiling. Because of Postgres' architecture, it could probably continue to scale as you add processors and RAM.

Wins

Postgres has some extremely advanced features when set next to MySQL. I don't use most of them myself, but they are there for the truly hardcore developers, and many developers don't even realize what they're missing.

An example of where you should be using a transaction is any sequence of more than one update, insert, or delete. For instance, suppose your script inserts a new user into your user table, then inserts a row in another table, and then updates a flag somewhere else. If the first insert succeeds but the second fails, what do you do? With Postgres, you can roll back the entire operation and show an appropriate error. With MySQL, you wind up in an invalid state unless you program in a bunch of extra logic to handle the situation.
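Before moving on, here are sketches of the two MySQL work-arounds promised above. First, the missing subselect: run the inner query yourself and splice its results into the outer one. Table and column names are hypothetical.

    <?php
    // With subselects (Postgres) this would be a single query:
    //   SELECT * FROM forum WHERE posted_by IN
    //     (SELECT user_id FROM user_group WHERE group_id = 42);
    //
    // The MySQL 3.22 version: two queries and some string glue.
    $res = mysql_query("SELECT user_id FROM user_group WHERE group_id = 42", $conn);
    $ids = array();
    while ($row = mysql_fetch_row($res)) {
        $ids[] = $row[0];
    }
    if (count($ids) > 0) {
        $list = implode(",", $ids);
        $res = mysql_query("SELECT * FROM forum WHERE posted_by IN ($list)", $conn);
    }
    ?>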
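Second, the crontab band-aid for the uptime problem. The init-script path is a guess based on Red Hat 6.2; adjust for your system.

    # min hour dom mon dow -- restart MySQL at 4 AM on the 1st of each month
    0 4 1 * * /etc/rc.d/init.d/mysql restart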
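And to make the transaction discussion concrete, here is roughly what that multi-step scenario looks like under Postgres. All names are invented for illustration.

    <?php
    // Either all three statements take effect, or none of them do.
    pg_exec($conn, "BEGIN");
    $r1 = pg_exec($conn, "INSERT INTO users (user_name) VALUES ('tim')");
    $r2 = pg_exec($conn, "INSERT INTO user_prefs (user_id, theme) " .
                         "VALUES (currval('users_user_id_seq'), 'default')");
    $r3 = pg_exec($conn, "UPDATE site_stats SET user_count = user_count + 1");
    if ($r1 && $r2 && $r3) {
        pg_exec($conn, "COMMIT");
    } else {
        pg_exec($conn, "ROLLBACK");  // undo everything since BEGIN
    }
    ?>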
In real-world use, most queries don't fail unless you're a lousy programmer, and if the second query did fail, the results may not be dire (unless we're talking about an accounting, banking, or other critical application where there can be no risk of incorrect data).

Anyway, foreign-key support is now in Postgres 7.0+, which means that when you insert a row, the database can do some fairly impressive validation checks. The same goes for deletes: Postgres just plain won't let you delete a row if another table depends on it (a schema sketch appears at the end of this section). I love this idea and can envision rewriting entire web sites just to take advantage of it.

Triggers and views are interesting and powerful tools available in Postgres but not MySQL. I haven't used either one, but I can think of a hundred uses for views if I were to redesign SourceForge from the ground up on Postgres.

Limitations

The primary limitation of Postgres is not its performance (most web sites will never run into that barrier) but hard-coded limits like the 8k row size, which probably dates back to its earliest days. When I designed Geocrawler.com on Postgres, I had to segment large emails into 8k chunks to work around this lame limitation. Also, by default Postgres is compiled to support only 32 connections, which is not enough for a high-traffic site, especially when you consider that Postgres delivers each page more slowly than MySQL.

One other limitation may bug a lot of PHP users: Postgres has no equivalent of MySQL's mysql_insert_id() function. That is, when you insert a row into MySQL, MySQL will hand you back the primary-key ID of that row. There is a rather round-about way of doing the same in Postgres (illustrated at the end of this section), but it's a headache and is probably slow if used a lot.

Stability

Postgres runs smoothly for extended periods of time without trouble. My Postgres 6.5.3 install has run for 90 days without blinking on my tired old PowerMac 8500 while serving roughly 50,000-100,000 pages per day. And when Postgres gets loaded, it just bogs down; it doesn't quit and give up the ghost under stress.

The problem with Postgres is that when you do have a problem, it's usually really bad: a fubar database file or, more commonly, a corrupted index (which can usually be dropped and rebuilt). I have hit other serious problems with older versions of Postgres (6.4.x), where duplicate values were inserted into a primary-key column (something that should be impossible under any circumstance). There have also been cases where you wind up with "half-baked" indexes, tables, etc. that you cannot drop or otherwise get rid of. I have not seen any of these on Postgres 7, but I haven't used it enough to know.
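Here is the foreign-key sketch promised above, using an invented two-table schema:

    -- Postgres 7.0+: the REFERENCES constraint enforces the relationship.
    CREATE TABLE users (
        user_id   serial PRIMARY KEY,
        user_name text
    );
    CREATE TABLE forum (
        msg_id    serial PRIMARY KEY,
        posted_by int REFERENCES users (user_id),
        subject   text
    );
    -- An INSERT with a posted_by that matches no user now fails, and a
    -- DELETE of a user who still owns messages in forum is refused.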
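And the round-about mysql_insert_id() substitute: after the insert, ask the serial column's sequence for the value it just handed out. Names are hypothetical.

    <?php
    // MySQL: the key comes straight back.
    mysql_query("INSERT INTO users (user_name) VALUES ('tim')", $conn);
    $id = mysql_insert_id($conn);

    // Postgres: query the sequence behind the serial column.
    pg_exec($conn, "INSERT INTO users (user_name) VALUES ('tim')");
    $res = pg_exec($conn, "SELECT currval('users_user_id_seq')");
    $id  = pg_result($res, 0, 0);  // currval is per-session, so this is safe
    ?>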
Conclusion

These tests pretty much confirmed what I already knew: both databases serve quite well for the vast majority of web sites out there, and both are extremely fast compared to desktop databases like FileMaker and MS Access. Both are now free and supported by active developer communities.

To choose between the two, you first need to understand your own scalability limits and whether you need the transaction support of Postgres or the large-text-field support of MySQL. You may need both, in which case you'll have to wait for future stable releases of each. It's interesting to note that the two databases appear to be converging toward the middle: while MySQL is working on transaction support and slowly adding features like subselects, Postgres is making headway in the performance and stability departments.

Finally, for the hardest-core developers, Postgres can be pretty slick. Foreign keys, views, subselects, and transactions are all pretty cool -- if you need them and will actually use them. If you don't, you're probably better off with MySQL and its superior performance.

--Tim

References

1. Raw test results: http://www.phpbuilder.com/print/columns/tim20000705-res.php3