J. Holliday, D. Agrawal and A. El Abbadi
Department of Computer Science
University of California, Santa Barbara, CA 93106
Transaction processing [1,3] is an important application of computer technology. Transactions are used in a wide range of applications, such as banking systems, airline reservation systems, and organizational information systems. One of the primary goals of a transaction processing system is to allow several users to interact with the data in the system simultaneously while preserving the illusion that each user is executing alone. In order to do this, users interact with the data through transactions. In this article, we present a brief description of standard models for transactions and transaction processing. Additional information about this topic is readily available in several books on transaction processing including [1,3].
A system that uses transactions can be thought of as a collection of objects or data resources with a persistent state that satisfies a set of integrity constraints. For example, in a bank, an integrity constraint may be that no account can have a negative balance. Users interact with the system by executing programs that perform the functions they desire. The execution of such a program results in a partially ordered set of read and write operations; this set of operations is called a transaction. For example, a user wishing to withdraw $50 from an account x may execute a program resulting in the following transaction:
read_1[x,200] write_1[x,150] commit_1
We use read_1[x,v] to denote a read operation executed by transaction 1 that returns the value v of an object x; write_1[x,v] denotes a write operation executed by transaction 1 that assigns the value v to x. A transaction terminates by executing either a commit or an abort operation. A commit operation implies that the transaction was successful, and hence all its updates should be incorporated into the persistent state of the data. An abort operation indicates that the transaction has failed, and hence requires the system to undo all its effects on the persistent state.
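To make the model concrete, the following small Python sketch plays out the withdrawal transaction above against a toy in-memory store. The Store and Transaction classes are purely illustrative and are not drawn from any particular system: a write is buffered by the transaction and applied to the persistent state only at commit, while an abort simply discards the buffered updates.

class Store:
    def __init__(self, data):
        self.data = dict(data)          # the persistent state

class Transaction:
    def __init__(self, store):
        self.store = store
        self.writes = {}                # buffered updates, applied only on commit

    def read(self, obj):
        # read_i[x,v]: return our own buffered value if we wrote obj, else the stored value
        return self.writes.get(obj, self.store.data[obj])

    def write(self, obj, value):
        # write_i[x,v]: record the new value, but do not touch the store yet
        self.writes[obj] = value

    def commit(self):
        # commit_i: make all buffered updates part of the persistent state
        self.store.data.update(self.writes)

    def abort(self):
        # abort_i: discard all buffered updates; the persistent state is unchanged
        self.writes.clear()

# The withdrawal transaction from the text: read_1[x,200] write_1[x,150] commit_1
store = Store({"x": 200})
t1 = Transaction(store)
balance = t1.read("x")                  # read_1[x,200]
t1.write("x", balance - 50)             # write_1[x,150]
t1.commit()                             # commit_1
assert store.data["x"] == 150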
Traditionally, transactions are expected to satisfy the following four conditions, known as the ACID properties [2,3]. Atomicity refers to the requirement that all the operations of a transaction be treated as a single unit; either all of them are executed, or none of them are. Consistency requires a transaction to be correct, i.e., if executed alone, the transaction takes the system from one consistent state to another. Note that a transaction may (and sometimes must) violate some of the integrity constraints during its execution; however, once it terminates (commits), it must restore the system to a consistent state. When transactions are executed concurrently, the transaction processing system must ensure that the execution of a set of concurrent, individually correct transactions also maintains the consistency of the data. Isolation requires each transaction to observe consistent data, i.e., not to read the intermediate results of other transactions. Finally, durability requires the results of a committed transaction to be made permanent in the system in spite of failures.
The ACID properties of transactions are usually ensured by two kinds of protocols: those that ensure execution atomicity and those that ensure failure atomicity. Execution atomicity refers to the problem of preserving the overall consistency of the data, and hence the consistency property of transactions, even when they are executed concurrently. Protocols that ensure execution atomicity are called concurrency control protocols. Failure atomicity ensures the all-or-nothing property as well as isolation and durability. Protocols that ensure failure atomicity are usually referred to as recovery protocols.
One of the main objectives of a transaction processing system is controlling and coordinating the execution of a set of concurrent transactions accessing shared data and resources. This may also be stated as requiring different transactions to access shared data without interfering with each other while still preserving the integrity constraints of the system.
Consider a bank account s, and two transactions, each withdrawing $200 when the account has an initial balance of $300. We assume that a withdraw transaction is executed by first reading the value of account s, and then writing it with the new value. The operations of withdrawal transaction 1 are allowed to interleave with the operations of withdrawal transaction 2 as follows:
read_1[s,300] read_2[s,300] write_2[s,100] write_1[s,100]
In this execution, both transactions read the same initial value, and then update the account with the new value $100. Note that as far as each transaction is concerned, the integrity constraint of non-negative accounts is not violated. However, in the interleaved execution, the overall result does not reflect the correct execution of two withdraw transactions. In particular, if the two transactions were not interleaved, the second of the two transactions would have violated the integrity constraints, and hence would not have been executed. Alternatively, if in the above interleaved execution the withdraw transactions are replaced with transactions that deposit $200, both transactions would have written $500. In this case, $200 would have disappeared. Again this is not acceptable. This problem is known as the missing or lost update problem.
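The lost update can be reproduced with a few lines of Python that play out the interleaving above step by step (the variable names are purely illustrative):

s = 300                     # initial balance of account s

v1 = s                      # read_1[s,300]
v2 = s                      # read_2[s,300]
s = v2 - 200                # write_2[s,100]
s = v1 - 200                # write_1[s,100]  -- transaction 2's update is lost

print(s)                    # prints 100, although two $200 withdrawals from a
                            # $300 balance should never both have completed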
Consider another type of transaction that transfers money from a savings account to a checking account, and suppose the transfer transaction is executed concurrently with a balance inquiry transaction that computes the total amount in both accounts. In an interleaved execution, the balance inquiry could read one account before the transfer and the other account after the transfer, and thus compute the wrong total. This problem is referred to as the inconsistent retrieval problem.
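The following Python sketch plays out such an interleaving; the initial balances and amounts are assumed purely for illustration:

savings, checking = 100, 50

total = savings             # balance inquiry reads savings = 100 (before the transfer)
savings -= 30               # transfer debits savings
checking += 30              # transfer credits checking
total += checking           # balance inquiry reads checking = 80 (after the transfer)

print(total)                # prints 180, but the true total is 150 at all times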
The examples above illustrate that simply requiring each transaction to be individually correct does not guarantee that the interleaved execution of a set of transactions will be correct. The system could require transactions to execute one at a time, i.e., serially. However, since transactions often access objects that are stored in secondary storage, and perhaps at remote sites, they may take a long time to complete; in such an environment, serial execution results in extremely poor utilization of system resources. If a pair of transactions does not share a common data item, interleaving their operations cannot violate any integrity constraints, and even if they do share some data items, interleaving their operations may still be acceptable. The standard notion of correctness for the execution of a set of transactions is therefore serializability: since a serial execution is correct, we allow any interleaved execution whose behavior cannot be distinguished, as far as the users of the system are concerned, from some serial execution. A concurrency control protocol is used to ensure the serializable execution of transactions. Several efficient concurrency control protocols have been proposed, including two-phase locking, timestamp ordering, optimistic protocols, and serialization graph testing (see Concurrency Control).
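As a rough illustration of how a concurrency control protocol rules out such interleavings, the following Python sketch implements a bare-bones version of strict two-phase locking: a transaction acquires a lock on an object before its first access and releases all of its locks only when it terminates. The LockManager and LockingTransaction classes are hypothetical, and the sketch omits deadlock handling entirely.

import threading

class LockManager:
    def __init__(self):
        self._locks = {}                     # object name -> threading.Lock
        self._guard = threading.Lock()

    def lock(self, obj):
        with self._guard:
            lk = self._locks.setdefault(obj, threading.Lock())
        lk.acquire()                         # blocks while another transaction holds the lock
        return lk

class LockingTransaction:
    def __init__(self, manager, store):
        self.manager, self.store = manager, store
        self.held = {}                       # growing phase: locks acquired on first access

    def _ensure_lock(self, obj):
        if obj not in self.held:
            self.held[obj] = self.manager.lock(obj)

    def read(self, obj):
        self._ensure_lock(obj)
        return self.store[obj]

    def write(self, obj, value):
        self._ensure_lock(obj)
        self.store[obj] = value

    def commit(self):
        for lk in self.held.values():        # shrinking phase: release all locks at the end
            lk.release()
        self.held.clear()

# Under this protocol the lost-update interleaving cannot occur: the second
# withdrawal cannot read account s until the first withdrawal has committed.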
A multi-user transaction processing system must provide the atomicity property of transactions even in the presence of various types of failures. This property is referred to as failure atomicity. A transaction could be aborted by its initiator; for example, if a transaction withdrawing from two bank accounts would leave one of them with a negative balance, the transaction may be aborted. Aborts may also occur as a result of other system-related faults such as divide-by-zero errors, memory faults, or deadlocks. More drastic failures may also occur in a system, such as crashes, disk or media failures, and network failures. In such situations the operations executed by interrupted transactions must be “undone”; that is, the effect of aborted transactions should be as if those transactions were never executed. Similarly, when a system failure occurs we must ensure that the effects of committed transactions are not lost.
We now discuss how failure atomicity of transactions can be guaranteed when transactions abort as a result of faults. Note that a user-initiated abort can also be modeled as a fault in the system. Consider, for example, the following execution of a transaction:
read_1[x,3] write_1[x,5] write_1[y,25]
If the next operation of transaction 1 is abort_1 due to a fault, then the above operations should be undone. That is, the partial effects of transaction 1 on the persistent state of the data must be eliminated. Note that there is no need to perform any undo action for read operations since they do not modify the state of the data. On the other hand, write operations must be undone to restore the data to the correct state. In particular, operation write_1[x,5] should be undone by restoring the value of object x that existed before write_1[x,5]. Similarly, the value of y before write_1[y,25] should be restored.
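The following Python sketch shows one way such an undo could be performed using before-images, assuming (for illustration) that object y initially holds the value 20; the names data, undo_log, and tx_write are hypothetical.

data = {"x": 3, "y": 20}
undo_log = []                              # list of (object, before-image) pairs

def tx_write(obj, value):
    undo_log.append((obj, data[obj]))      # save the before-image, then overwrite
    data[obj] = value

# read_1[x,3] write_1[x,5] write_1[y,25]
tx_write("x", 5)
tx_write("y", 25)

# abort_1: undo the writes in reverse order using the saved before-images
for obj, before in reversed(undo_log):
    data[obj] = before

assert data == {"x": 3, "y": 20}           # all partial effects are eliminated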
Failure atomicity of transactions in the absence of concurrency can be accomplished easily using a simple bookkeeping technique. Read operations do not require any undo action, whereas write operations require restoration of the before-images of the data objects involved [1,2]. This is achieved by requiring that a transaction save the before-image of each object on nonvolatile storage before performing a write operation on that object. Before-images are usually stored in a structure called a log on nonvolatile storage. Guaranteeing failure atomicity of transactions in a concurrent or distributed environment is more complex. A distributed transaction uses an atomic commit protocol (see Two Phase Commit and Atomic Commit Protocols) to ensure that either all sites involved commit the transaction or all sites abort it. In a concurrent environment, when a transaction aborts, all its effects on the data as well as on other concurrently executing transactions must be eliminated. Consider, for example, the following execution of two transactions:
read_1[x,3] write_1[x,6] read_2[x,6] write_2[z,18] commit_2
In the above execution, if transaction 1 aborts, then we have the following problem. Since the effects of transaction 1 must be eliminated from the data, transaction 2, through its read operation read_2[x,6], has observed a value of x that was never part of the committed state. The only alternative left is to withdraw the commitment of transaction 2. However, this approach to guaranteeing failure atomicity violates the durability property, which states that once a transaction commits, all its effects become permanent. In order to circumvent this problem, the concurrent execution of transactions is constrained: if a transaction T reads a value of an object written by transaction T', then the commit action of T can be executed only after T' has committed. Executions that satisfy this property are referred to as recoverable executions. If transactions are further restricted to read only values written by committed transactions, then the execution is said to avoid cascading aborts, because the abort of one transaction will never necessitate the abort of others. Additional constraints can ensure that restoring the before-images of the objects written by a transaction eliminates all the effects of that aborted transaction; these require that if a transaction T writes an object, then that object can be read or written by other transactions only after T has either committed or aborted. Executions that satisfy this constraint are referred to as strict executions. Strict executions greatly simplify the task of ensuring failure atomicity, but at the expense of reduced concurrency.
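As an illustration, the following Python sketch checks whether a schedule is recoverable in the above sense, i.e., whether every transaction commits only after all the transactions it read from have committed. The schedule encoding and the function name is_recoverable are assumptions made purely for illustration.

def is_recoverable(schedule):
    # schedule: list of (op, tx, obj) tuples, op in {"r", "w", "c"}; obj is None for commits
    last_writer = {}          # object -> transaction that wrote it most recently
    reads_from = {}           # transaction -> set of transactions it read values from
    committed = set()
    for op, tx, obj in schedule:
        if op == "w":
            last_writer[obj] = tx
        elif op == "r":
            writer = last_writer.get(obj)
            if writer is not None and writer != tx:
                reads_from.setdefault(tx, set()).add(writer)
        elif op == "c":
            # tx may commit only if every transaction it read from has already committed
            if any(w not in committed for w in reads_from.get(tx, ())):
                return False
            committed.add(tx)
    return True

# The execution above: transaction 2 reads x from the still-active transaction 1
# and then commits, so the execution is not recoverable.
schedule = [("r", 1, "x"), ("w", 1, "x"), ("r", 2, "x"), ("w", 2, "z"), ("c", 2, None)]
print(is_recoverable(schedule))    # prints False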
Now we consider the problem of ensuring failure atomicity of transactions in the presence of system failures. Transaction processing systems usually use two types of memory: volatile memory, which is usually small but has fast access times, and nonvolatile memory, such as disk, which has slower access times but much larger capacity. The data is stored in nonvolatile memory, and part of it is cached or buffered in volatile memory. A transaction usually executes by reading and updating the cache. Ideally, when a transaction commits, all its changes should be incorporated into the data on nonvolatile memory. Since disk accesses are generally slow, coherency between the cache and the data itself is not strictly maintained. Hence, the nonvolatile data may contain values written by uncommitted transactions (which may yet abort), and the cache may contain values written by committed transactions that have not yet been incorporated into nonvolatile memory. A system failure is modeled by assuming that the state of the system in volatile memory is lost, while the state in nonvolatile memory survives the failure. Recovery from a system failure involves the following:
1. The effects of transactions that were in committed state at the time of the system failure must be incorporated into the data.
2. The effects of transactions that were aborted or still active at the time of the system failure must be eliminated. Note that transactions that were active when the failure occurred are considered aborted, since their internal state is lost in the failure.
Recovery from system failures can be categorized into two processes:
· Redo Process -- For each committed transaction, redo the write operations that are not incorporated into the nonvolatile data.
· Undo Process -- For each active and aborted transaction, undo the write operations that were incorporated into the nonvolatile data.
In order to successfully recover from system failures, additional bookkeeping needs to be performed. In particular, the system maintains, on nonvolatile storage, a list of committed transactions, a list of aborted transactions, and a log of the write operations performed on each data object. A restart procedure is invoked when the system recovers after a failure; it reconstructs the persistent state using the two lists and the log. A comprehensive treatment of recovery from system failures can be found in [2,4].
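A highly simplified restart procedure along these lines might look as follows in Python. The data layout, a log of per-object before- and after-images together with a set of committed transaction identifiers, is an assumption made for illustration and is not a prescription of how the lists and log are actually organized.

def restart(nonvolatile_data, log, committed):
    # Redo pass: reapply, in log order, every write of a committed transaction.
    for tx, obj, before, after in log:
        if tx in committed:
            nonvolatile_data[obj] = after
    # Undo pass: remove, in reverse log order, every write of an active or
    # aborted (i.e., not committed) transaction.
    for tx, obj, before, after in reversed(log):
        if tx not in committed:
            nonvolatile_data[obj] = before
    return nonvolatile_data

# Example: transaction 1 committed, but its write to x was never flushed to disk;
# transaction 2 was still active, yet its write to y had already reached the disk.
data = {"x": 3, "y": 40}                       # state found on disk after the crash
log = [(1, "x", 3, 6), (2, "y", 10, 40)]       # (transaction, object, before, after)
print(restart(data, log, committed={1}))       # prints {'x': 6, 'y': 10}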
[1] P. Bernstein and E. Newcomer, Principles of Transaction Processing, Morgan Kaufmann, 1997.
[2] P. Bernstein, V. Hadzilacos, and N. Goodman, Concurrency Control and Recovery in Database Systems, Addison-Wesley, Reading, MA, 1987.
[3] A. Elmagarmid, editor, Database Transaction Models for Advanced Applications, Morgan Kaufmann, 1995.
[4] J. Gray and A. Reuter, Transaction Processing: Concepts and Techniques, Morgan Kaufmann, 1993.