Quick Database Comparison |
Prev | Next |
After giving this presentation several people seemed interested in discussing the relative merits of several major databases. Although I am not an expert is any of the three databases I will discuss (I'm good with ORACLE, but could not be a Senior DBA), I do keep my ear to the ground, and have a reasonable idea of the strengths and weaknesses of the three I will discuss.
MySQLMySQL is the most commonly deployed Open Source database in the world. It is used by sites as popular as Slashdot which has hundreds of thousands of readers and thousands of writers a day. Some people think that MySQL is the greatest thing since sliced bread and swear by it for every project. This is probably because they haven't worked on a database with hundreds of thousands of writes per hour, the sort of situation which taxes industrial strength databases.ORACLE
MySQL is under massive development and seems to pop up important new features every few months. In the time it takes for you to request this page some of my criticisms may already have been addressed in some bleeding edge version of MySQL.
The Good: MySQL is quite possibly the fastest database out there for readers. It is blazing fast with small and medium size data sets, I cannot say how it stands up to really large data sets, but I've heard rumour that it doesn't have a cap on the size of the database. As I said it is constantly under development and new features are constantly cropping up. Given that it is so widely deployed there is a lot of knowledge about how to contend with it's shortcomings, and these changes probably get a lot of bug reports and fixes.
The Bad: MySQL was for a while far behind the SQL-92 standard in features. It did not have outer joins, or subselects two very important features for anyone dealing with some seriously funky data models. I believe that both these features have been released, or are at least scheduled for release. It was not ACID compliant, meaning that it did not support transactions. While there aren't a ton of problems which require transactions, when you do need them you need them fairly badly. It's worst feature is still a problem, and one that I don't see being solved anytime soon .. that is it's liberal use of table level locking. It's all fine and good that reads are blazing fast, but if readers have to wait for writers then this database will always be relegated to the minor leagues. If you've been paying attention to all my talk about Slashdot you will inevitably say, "Slashdot gets a lot of writes and it seems to scale". As I have not reviewed the SlashCode (and probably never will as that much PERL makes my skin crawl) I cannot say with any certainty how they manage to scale that much. My guess would be that it involves clever caching, that they don't have lots of sustained writes, and even if they have 20,000 writes a day, that only amounts to 1 write every 5 seconds which is still not that many.When I used to work for arsDigita we used our close integration with ORACLE as a major selling point. ORACLE is the number one proprietary database in the world (assuming you don't count toy databases like MS ACCESS). It is a workhorse able to stand up to some serious stress almost out of the box. As wonderful as this all sounds, the truth is that once your database starts getting real usage you will inevitably need to hire a DBA. arsDigita didn't have sufficient DBA talent, so it was rare that one of our installations could really hum like a totally tuned ORACLE database can.PostgreSQL
The Good: ORACLE has (pretty much) rightfully earned its place at the head of the pack. It is capable of handling 99.9% of database backed websites needs. It is also overkill for 90% of database backed websites needs, which only proves how dumb most corporations are when it comes to money. It comes with a huge number of features, and even an application server on which to build your Java based website. It has pretty much always been at the head of the pack for implementing important features, and even farther ahead at advertising features it only partially has. Wisely ORACLE has put its money where it's mouth is and uses its own database to manage ORACLE corporation. For the ridiculously huge databases it includes some replication, clustering, and materialized view functionality, none of which I've worked with personally, all of which intimidate me, and all of which probably require a very good DBA to get working properly.
The Bad: Let's play the odds here, your website probably doesn't require ORACLE, and your company probably doesn't want to pay 5k+$ a CPU to install it on your server. Therefore for your needs this enormous pile of bytes is like using a cruise missile to slaughter livestock. Speaking of price, the cost of a good DBA is 80k+$ a year, which definitely increases your variable costs. For those of you who are attracted to OSS for political reasons, it's completely proprietary, and even as Larry Ellison (CEO of ORACLE) says to the world that Linux will eventually beat down Microsoft you don't see him opening ORACLE's source. While ORACLE is as feature rich as anything out there, it's closed source nature means that you won't be adding that needed piece of functionality. Unlike the other OSS databases whose best interest lays in implementing the ANSI-92 and 99 standards, ORACLE can probably ignore that as they have such huge market share.
The Unclear: ORACLE has a great system for using multiple disks to allow mirroring (automatic backup) and to accelerate writes and reads. The only problem is that the optimal setup requires 44 SCSI drives. From an administration and equipment perspective that is an awful lot of hardware to watch over per database.Although I've only played around with PostgreSQL and have yet to use it on a major project, I have used it some and am familiar with some of its strengths/defects. As mentioned earlier it is the first ACID compliant RDBMS (I believe). OpenACS is "meant" to be used with PostgreSQL and is optimized for that purpose.Conclusion
The Good: In case you haven't guessed, ACID compliance is a big deal to me. Once you start working with money transactions become a whole lot more important. PostgreSQL has good write support as readers don't have to wait for writers. Unlike ORACLE which can be tuned to meet your most every need, PostgreSQL doesn't get that much faster with tuning. This is a blessing as your sysadmin can probably learn enough to do most of the administration, but also a curse as your performance will never be equal to that of ORACLE. Take consolation in the fact that as a user you pay 0$ for PostgreSQL.
The Bad: Slower then both ORACLE and MySQL, PostgreSQL will never win when it comes to data retrieval. This is not to say that it is slow, but that it is merely adequate. It's capabilities to stand up to writers and long transactions more then compensates (in my mind). It is not as widely deployed, or as widely developed on as MySQL. This may mean that in the long run MySQL will overtake it in important features. While the documentation seems to indicate that PostgreSQL can stand up to large volumes of data I have yet to see any tests which demonstrate it's performance with 10 million row tables. While this may be debatable, it's lack of a CONNECT BY feature (a SQL feature handy for dealing with trees) may be viewed by some as unfortunate. I personally hate CONNECT BY and find it to be more trouble then it is worth, but some people may enjoy the confusion it usually brings.Ignoring the compatibility with existing software, which of these do think is the best? It is totally a question of your situation. If you are going to have a situation where your database doesn't really need ACID-compliance and if it's going to be write-light then MySQL is a natural choice. If on the other hand you're not going to have a huge database and your numbers of writes versus reads is unknown (or if it's not going to be write-light) then PostgreSQL is probably your best choice. If you've got money to spare, and your database is going to store an enormous amount of data then ORACLE is definitely a good choice. If you have lots of money to spare and are serious about having the sort of database that you could run an airline reservation system on, then use IBM's DB2. My friend, an ORACLE DBA, who swears that most databases "Suck donkeyballs through nanotubes" thinks that DB2 is better for a totally ridiculous project, the kind of which any self-respecting software engineer might love to try, but hate to get.