MySQL and PostgreSQL Compared
By Tim Perdue, 2000-07-30
http://www.phpbuilder.com/print/columns/tim20000705.php3 (accessed 2012-09-21)

Which database do I use: Postgres or MySQL? This age-old question has plagued developers for, what, at least a couple of years now. I've used both databases extensively (MySQL for about one year and Postgres for about two) and was curious whether the performance differences between the two were as stark as the MySQL website suggests.

I had actually benchmarked the two databases back in September 1999, when we were laying the groundwork for SourceForge. At the time, the performance difference was so stark that we had to go with MySQL, even though I had always used Postgres for my own work. The rest of the developers were used to MySQL, and that pretty much cinched the decision.

This time around, rather than using some contrived benchmarking scheme, I wanted to grab a "real life" page from a real web site and see how it performed on the two databases. The page in question was the discussion forum on SourceForge. It involves some relatively straightforward joins of three tables, each with 20,000-30,000 rows of data. It also involves some recursive queries to show nested messages, so the database, not PHP, is the true bottleneck on this page.

To get started, I dumped real data out of the production database, modified the table SQL, and imported it all into MySQL 3.22.30 and PostgreSQL 7.0.2 on Red Hat Linux 6.2, running on a VA Linux 4100 quad-Xeon server with 1 GB of RAM.

The first problem I ran into was that Postgres has an arcane limit of 8k of data per row. In a message board you're occasionally going to surpass 8k of data in a row, so Postgres choked on the import. To get around this, I just dropped the "body" of each message and re-imported the data. The Postgres development team is aware of this limitation and is fixing it in v7.1; they also noted that you can recompile Postgres to support rows up to 32k, though at a possible cost to overall performance.

At this point I ran into another small issue with Postgres: its "serial" data type (the equivalent of MySQL's auto_increment) creates a "sequence" which does not get dropped when its parent table is dropped. So if you try to re-create the table, you get a name conflict on the sequence. A lot of new users would be confused by this, so Postgres loses a couple of points here. Also, MySQL is "smart" enough to advance its auto_increment counter when you import data, whereas a Postgres sequence does not get reset on import, causing all new inserts to fail. (A workaround sketch appears at the end of the Methodology section below.)

Methodology

To make this as realistic as possible, I took an actual page from a web site and made it portable across both MySQL and Postgres. This basically meant replacing all mysql_query() calls with pg_exec(). The page involves a lot of selects and joins, as most pages on a typical web site do.

Once the test page was up and debugged, I ran "ab", the Apache benchmarking utility, from my workstation across my 100-Mbit LAN to the quad-Xeon machine. To get an idea of scalability under load, I varied ab's "concurrent connections" from 10 to 120 while holding the number of page views steady at 1000.

To more closely simulate real-world use, I set up a random-number generator in the script that inserts a row into the database on 10% of the page views. My own numbers on PHPBuilder show that about 10% of all pages in the discussion forums are for posting new messages. Further, as mentioned above, I used real data from a production database. You can't get a whole lot more realistic than this scenario.
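To give a feel for the harness, here's a minimal sketch of the two pieces just described: the mysql_query()/pg_exec() portability shim and the 10% random insert. Function, table, and column names here are my own inventions for illustration, not SourceForge's actual schema.

    <?php
    // Route every query through one function so the same page can run
    // against either database. $conn and $use_postgres set at page top.
    function db_query($sql) {
        global $conn, $use_postgres;
        if ($use_postgres) {
            return pg_exec($conn, $sql);      // Postgres
        }
        return mysql_query($sql, $conn);      // MySQL
    }

    // The "10% insert" test: on roughly one page view in ten,
    // post a message instead of just reading the forum.
    mt_srand((double)microtime() * 1000000);  // old PHP needs explicit seeding
    if (mt_rand(1, 10) == 1) {
        db_query("INSERT INTO forum (group_id, subject, post_date) " .
                 "VALUES (1, 'benchmark post', now())");
    }
    ?>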
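The ab runs themselves looked something like this (host and path are placeholders), with -c stepped from 10 up to 120 between runs:

    ab -n 1000 -c 50 http://benchmark-box/sf/forum.php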
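And one more housekeeping sketch before the numbers: the sequence workaround promised above. In Postgres 7.0 the sequence behind a serial column has to be dropped and re-pointed by hand; all names here are hypothetical (Postgres names the sequence <table>_<column>_seq).

    DROP TABLE forum;
    DROP SEQUENCE forum_msg_id_seq;   -- not dropped automatically with the table

    -- After importing data, point the sequence past the highest existing key:
    SELECT max(msg_id) FROM forum;    -- suppose this returns 27182
    SELECT setval('forum_msg_id_seq', 27182);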
The Numbers

(The raw test results are available separately [1].)

The most interesting thing about my results was seeing how much load Postgres could withstand before giving any errors. In fact, Postgres seemed to scale three times higher than MySQL before throwing any errors at all. MySQL begins collapsing at about 40-50 concurrent connections, whereas Postgres handily scaled to 120 before balking. My guess is that Postgres could have gone far past 120 connections with enough memory and CPU.

On the surface this can look like a huge win for Postgres, but if you examine the results in more detail, you'll see that Postgres took up to two to three times longer than MySQL to generate each page, so it needs to scale two to three times higher just to break even. In terms of the maximum number of pages generated concurrently without errors, then, it's pretty much a dead heat between the two databases. In terms of generating one page at a time, MySQL does it up to two to three times faster.

Another interesting point was that MySQL crumbles faster in the "10% insert" test described above. Some research reveals why: MySQL locks the entire table when an insert occurs, while Postgres has a rather nifty "better than row-level locking" feature. This difference quickly causes MySQL to pile up concurrent connections and collapse. The same is true if you run a large select on a table while another process is inserting into it: Postgres is completely unfazed, while MySQL piles up connections until it falls apart like a house of cards.

For those of you wondering about persistent connections in PHP: they don't appear to benefit MySQL much, whereas they are a clear boon for Postgres. In fact, Postgres benchmarked as much as 30% faster just by using persistent connections (a one-line illustration appears at the end of this section). That tells me that Postgres has a tremendous amount of overhead in its connection-opening and authentication process. Some of this may be the fault of Linux and its relatively lame process scheduler. Still, MySQL on the same box beat it handily no matter how you look at it.

MySQL

The numbers for MySQL ring true with what most people already know: it's a fast, if lightweight, database that will serve well for the vast majority of web sites. However, if you plan on having a high-traffic site (say, greater than 500,000 pages per day), forget MySQL, as it tends to fold up and die under load. Anyone who has ever visited Slashdot can attest to the fragility of its setup (mod_perl and MySQL). But again, the vast majority of web sites fall well under the 15 pages per second that MySQL demonstrated here. If you ever surpass a sustained 15 pages per second, you'll be delighted to fork over the cash for a bigger server or an Oracle license.

Wins

Obviously, the advantage MySQL has over Postgres is performance. It also ships with more powerful admin tools in the distribution (mysqladmin lets you watch processes and queries in progress), plus hot backup, a file-corruption recovery tool, and a couple of others. I'm also a fan of MySQL's command-line tools: you can see database and table structures using DESCRIBE and SHOW commands, whereas Postgres' equivalents are less obvious (\d to show a list of tables, for instance).
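For anyone who hasn't sat in front of both clients, the difference looks roughly like this (the table name is made up):

    mysql> SHOW TABLES;          (list tables in the current database)
    mysql> DESCRIBE forum;       (columns and types for one table)

    forum=> \d                   (psql: list tables, sequences, and indexes)
    forum=> \d forum             (psql: describe one table)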
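And, as promised, the persistent-connection change that bought Postgres its 30%: in PHP it is literally one function name. The connection parameters here are hypothetical.

    // Opens a new backend process on every page view:
    $conn = pg_connect("dbname=forum user=web");

    // Reuses a backend across page views served by the same Apache child:
    $conn = pg_pconnect("dbname=forum user=web");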
Limitations

The first thing you hear from hard-core database gurus is that MySQL lacks transactions, rollbacks, and subselects. You'll really miss transactions if you're trying to write a banking application, an accounting application, or any counter that needs to increment reliably over time. Forget attempting any of those with released versions of MySQL (it should be noted that the unstable 3.23.x series of MySQL now includes transaction support).

For many, if not most, web sites out there, MySQL's limitations can be overcome with a little elbow grease on the part of the developer. The feature you'll miss most is the powerful subselect syntax present in almost every other production database. If I had a nickel for every time I could have used a subselect in MySQL, I'd be able to buy a case or two of beer. In other words, this missing feature is a pain in the neck, but it can be worked around (a sketch follows at the end of this section).

Stability

MySQL loses points in the long-term stability department. Simply put, MySQL gives up the ghost randomly and for no obvious reason after running for semi-long periods of time (say 30-60 days). Many developers compile MySQL statically for just that reason, and doing so has helped some people. The problem can also be papered over with a good pager or a simple crontab entry that kills and restarts MySQL monthly (an example appears below). Not that I find that at all acceptable, but it is a solution.

Where MySQL loses points on daemon robustness, it makes up for them by apparently never corrupting its data files. The last thing you want is your precious data files fouled randomly, and MySQL does well here: in over a year of running it, I haven't seen a single case of database or index corruption. In the same timeframe, I have done two or three recoveries of a couple of different Postgres databases. (Regardless, backups are always your best friend, as shown by the database fiasco here on PHPBuilder.)

PostgreSQL

The results for Postgres might surprise a few people, as Postgres has something of a negative reputation among web developers (initial releases had widely-rumored issues on top of laggard performance). In my experience, and according to these benchmarks, most of that reputation is unfounded. In fact, it appears that PostgreSQL withstands up to three times the load that MySQL can before throwing any errors -- on the same hardware/OS combination.

Postgres happily chugs along at roughly 10 pages per second, enough to serve about 400,000 pages per day, assuming a regular traffic curve with the peak at twice the trough. That's an awful lot of pages, far beyond what most people will ever see on their web sites, and most of the pages on your site will not be as complex as the one in this test. As with MySQL, you'll be happy to pay for a hardware upgrade if you pass this ceiling. Because of Postgres' architecture, it could probably continue to scale as you add processors and RAM.

Wins

Postgres has some extremely advanced features when set next to MySQL. I don't use most of them myself, but they are there for the truly hardcore developers, and many developers don't even realize what they're missing.

An example of where you should be using a transaction is any sequence of more than one update, insert, or delete. For instance, suppose your script inserts a new user into your user table, then inserts a row in another table, and then updates a flag somewhere else. If the first insert succeeds but the second fails, what do you do? With Postgres, you can roll back the entire operation and show an appropriate error. With MySQL, you wind up in an invalid state unless you program in a bunch of extra logic to handle the situation.
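Before moving on, here are sketches of the two MySQL work-arounds promised above. First, the missing subselect: run the inner query yourself and splice its results into the outer one. Table and column names are hypothetical.

    <?php
    // With subselects (Postgres) this would be a single query:
    //   SELECT * FROM forum WHERE posted_by IN
    //     (SELECT user_id FROM user_group WHERE group_id = 42);
    //
    // The MySQL 3.22 version: two queries and some string glue.
    $res = mysql_query("SELECT user_id FROM user_group WHERE group_id = 42", $conn);
    $ids = array();
    while ($row = mysql_fetch_row($res)) {
        $ids[] = $row[0];
    }
    if (count($ids) > 0) {
        $list = implode(",", $ids);
        $res = mysql_query("SELECT * FROM forum WHERE posted_by IN ($list)", $conn);
    }
    ?>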
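Second, the crontab band-aid for the uptime problem. The init-script path is a guess based on Red Hat 6.2; adjust for your system.

    # min hour dom mon dow -- restart MySQL at 4 AM on the 1st of each month
    0 4 1 * * /etc/rc.d/init.d/mysql restart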
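And to make the transaction discussion concrete, here is roughly what that multi-step scenario looks like under Postgres. All names are invented for illustration.

    <?php
    // Either all three statements take effect, or none of them do.
    pg_exec($conn, "BEGIN");
    $r1 = pg_exec($conn, "INSERT INTO users (user_name) VALUES ('tim')");
    $r2 = pg_exec($conn, "INSERT INTO user_prefs (user_id, theme) " .
                         "VALUES (currval('users_user_id_seq'), 'default')");
    $r3 = pg_exec($conn, "UPDATE site_stats SET user_count = user_count + 1");
    if ($r1 && $r2 && $r3) {
        pg_exec($conn, "COMMIT");
    } else {
        pg_exec($conn, "ROLLBACK");  // undo everything since BEGIN
    }
    ?>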
In real-world use, most queries don't fail unless you're a lousy programmer, and if the second query did fail, the results may not be dire (unless we're talking about an accounting, banking, or other critical application where there can be no risk of incorrect data).

Anyway, foreign-key support is now in Postgres 7.0+, which means that when you insert a row, the database can do some fairly impressive validation checks. The same goes for deletes: Postgres just plain won't let you delete a row if another table depends on it (a schema sketch appears at the end of this section). I love this idea and can envision rewriting entire web sites just to take advantage of it.

Triggers and views are interesting and powerful tools available in Postgres but not MySQL. I haven't used either one, but I can think of a hundred uses for views if I were to redesign SourceForge from the ground up on Postgres.

Limitations

The primary limitation of Postgres is not its performance (most web sites will never run into that barrier) but hard-coded limits like the 8k row size, which probably dates back to its earliest days. When I designed Geocrawler.com on Postgres, I had to segment large emails into 8k chunks to work around this lame limitation. Also, by default Postgres is compiled to support only 32 connections, which is not enough for a high-traffic site, especially when you consider that Postgres delivers each page more slowly than MySQL.

One other limitation may bug a lot of PHP users: Postgres has no equivalent of MySQL's mysql_insert_id() function. That is, when you insert a row into MySQL, MySQL will hand you back the primary-key ID of that row. There is a rather round-about way of doing the same in Postgres (illustrated at the end of this section), but it's a headache and is probably slow if used a lot.

Stability

Postgres runs smoothly for extended periods of time without trouble. My Postgres 6.5.3 install has run for 90 days without blinking on my tired old PowerMac 8500 while serving roughly 50,000-100,000 pages per day. And when Postgres gets loaded, it just bogs down; it doesn't quit and give up the ghost under stress.

The problem with Postgres is that when you do have a problem, it's usually really bad: a fubar database file or, more commonly, a corrupted index (which can usually be dropped and rebuilt). I have hit other serious problems with older versions of Postgres (6.4.x), where duplicate values were inserted into a primary-key column (something that should be impossible under any circumstance). There have also been cases where you wind up with "half-baked" indexes, tables, etc. that you cannot drop or otherwise get rid of. I have not seen any of these on Postgres 7, but I haven't used it enough to know.
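Here is the foreign-key sketch promised above, using an invented two-table schema:

    -- Postgres 7.0+: the REFERENCES constraint enforces the relationship.
    CREATE TABLE users (
        user_id   serial PRIMARY KEY,
        user_name text
    );
    CREATE TABLE forum (
        msg_id    serial PRIMARY KEY,
        posted_by int REFERENCES users (user_id),
        subject   text
    );
    -- An INSERT with a posted_by that matches no user now fails, and a
    -- DELETE of a user who still owns messages in forum is refused.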
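And the round-about mysql_insert_id() substitute: after the insert, ask the serial column's sequence for the value it just handed out. Names are hypothetical.

    <?php
    // MySQL: the key comes straight back.
    mysql_query("INSERT INTO users (user_name) VALUES ('tim')", $conn);
    $id = mysql_insert_id($conn);

    // Postgres: query the sequence behind the serial column.
    pg_exec($conn, "INSERT INTO users (user_name) VALUES ('tim')");
    $res = pg_exec($conn, "SELECT currval('users_user_id_seq')");
    $id  = pg_result($res, 0, 0);  // currval is per-session, so this is safe
    ?>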
Conclusion

These tests pretty much confirmed what I already knew: both databases serve quite well for the vast majority of web sites out there, and both are extremely fast compared to desktop databases like FileMaker and MS Access. Both are now free and supported by active developer communities.

To choose between the two, you first need to understand your own scalability limits and whether you need the transaction support of Postgres or the large-text-field support of MySQL. You may need both, in which case you'll have to wait for future stable releases of each. It's interesting to note that the two databases appear to be converging toward the middle: while MySQL is working on transaction support and slowly adding features like subselects, Postgres is making headway in the performance and stability departments.

Finally, for the hardest-core developers, Postgres can be pretty slick. Foreign keys, views, subselects, and transactions are all pretty cool -- if you need them and will actually use them. If you don't, you're probably better off with MySQL and its superior performance.

--Tim

References

1. Raw test results: http://www.phpbuilder.com/print/columns/tim20000705-res.php3