Functions of an RDBMS

Prev | Next

In a world of individual users and small data sets the flat file (a standard text file) would probably be a reasonable way to store all your data. Who is not charmed by a file that looks like:
  
   Tristan Cohen, 123 Kukamunga Drive, Charleston, NC, USA
   Bob Lablaw, 251 Street Road, Yangtze-ville, ShuangHuan Provice, CHINA
   Elvis Presley, 12 Peanut Butter and Banana Road, Graceland, NE, JAPAN

This file stores basic data about a group of three users and is adequate for simple storage. What happens to this file when you have 10,000 people trying to edit it at the same time? Even worse, if you've solved the delicate locking semantics issue .. what happens when the file is 10 million lines long and you want to find out the address of 'Jim Beam'? The RDBMS aims to solve this issue for you. Keeping with the Open Source ideal the people at OpenACS focus their efforts on working with PostgreSQL, the world's first and foremost ACID-compliant RDBMS. They also support ORACLE, the world's most commonly deployed commercial RDBMS.

People often come up to me on the street and ask, "Tristan, what is an ACID-compliant RDBMS?". Not wanting to write what has been already said better before I would point them to Philip's description of ACID compliance. I have kindly included the excerpt here.

Atomicity

Results of a transaction's execution are either all committed or all rolled back. All changes take effect, or none do. That means, for Joe User's money transfer, that both his savings and checking balances are adjusted or neither are.

Consistency

The database is transformed from one valid state to another valid state. This defines a transaction as legal only if it obeys user-defined integrity constraints. Illegal transactions aren't allowed and, if an integrity constraint can't be satisfied then the transaction is rolled back. For example, suppose that you define a rule that, after a transfer of more than $10,000 out of the country, a row is added to an audit table so that you can prepare a legally required report for the IRS. Perhaps for performance reasons that audit table is stored on a separate disk from the rest of the database. If the audit table's disk is off-line and can't be written, the transaction is aborted.

Isolation

The results of a transaction are invisible to other transactions until the transaction is complete. For example, if you are running an accounting report at the same time that Joe is transferring money, the accounting report program will either see the balances before Joe transferred the money or after, but never the intermediate state where checking has been credited but savings not yet debited.

Durability

Once committed (completed), the results of a transaction are permanent and survive future system and media failures. If the airline reservation system computer gives you seat 22A and crashes a millisecond later, it won't have forgotten that you are sitting in 22A and also give it to someone else. Furthermore, if a programmer spills coffee into a disk drive, it will be possible to install a new disk and recover the transactions up to the coffee spill, showing that you had seat 22A.

pilfered from here

Having an ACID RBDMS is no guarantee that it is good for the web. Even though this guarantees the integrity of your data, it says nothing about retrieving data, or the speed of it's implementation.

SQL is a declarative language defined by an occasionally refined ANSI standard. Most programming languages are imperative, meaning that you tell the computer how you want it to accomplish a particular task. SQL is declarative meaning you say what you want and you leave the database engine to figuring out how to get the information. The standard as defined by the kind folks at ANSI is constantly ignored by each of the RDBMS manufacturers. It is a large and difficult standard to implement so admittedly this is a tough job for these companies/organizations to keep up with. That being said, this does not stop them from updating the standard every few years. No RDBMS is entirely ANSI-92 (nee 1992), and they are even less ANSI-99 (nee 1999) compliant. Working under the assumption that standards are built to be largely ignored companies like ORACLE actually implement their own versions of SQL which have important features (eg. OUTER JOINS), but they are implemented using custom syntax. This means that while ANSI-SQL is supposed to make most data models and queries cross platform, in practice SQL must be rewritten for each RDBMS.

Another major issue in RDBMS's is indexing. Even if a hacker is experienced in multi-process/thread environments and can implement a proper ACID-compliant database, that doesn't mean they are familiar the mathematical algorithms needed to make data retrieval fast. Once you have a million users who have each on average purchased 5 items it becomes difficult to determine who has purchased 2 or more "Teddy Ruxpins" in under a second. The quality of a RDMS's indexing implementation is all important if your site will ever have even a medium amount of data.

Another major issue facing someone trying to choose an RDBMS is it's behaviour when dealing with simultaneous writes and reads. One way to implement transactions is to perform table wide locks. This means that when someone is writing to a table no one can write or read from that table. Sure this guarantees Isolation, but it also means that when someone is doing a huge hour long transaction your database a part of your database may be effectively locked down. For a while even major RDBMS's like Microsoft's SQL Server had this type of locking. Both ORACLE and PostgreSQL implement what is called "row-level locking". While a table's row is locked (which is only while it is being written to) no one else can write to this row. As an individual row-write is fast, this means that the wait time is near 0. These database implementations guarantee that readers don't wait for writers, so go on, be inefficient, have lots of 1 hour transactions, readers won't wait a second (although your redo logs may become swollen, but that is another matter).

tristancohen@yahoo.com