J.
Holliday, D. Agrawal and A. El Abbadi
Department
of Computer Science
University
of California, Santa Barbara, CA 93106
Transaction
processing [1,3] is an important application of computer technology. Transactions are used in a wide range of
applications, such as banking systems, airline reservation systems, and
organizational information systems. One
of the primary goals of a transaction processing system is to allow several
users to interact with the data in the system simultaneously while preserving
the illusion that each user is executing alone. In order to do this, users interact with the data through
transactions. In this article, we
present a brief description of standard models for transactions and transaction
processing. Additional information
about this topic is readily available in several books on transaction
processing including [1,3].
A
system that uses transactions can be thought of as a collection of objects or
data resources with a persistent state that satisfies a set of integrity
constraints. For example, in a bank, an
integrity constraint may be that no account can have a negative balance. Users interact with the system by executing
programs that perform the functions desired by the users. The execution of such programs results in a
partially ordered set of read and write operations. This set of operations is called a transaction. For example, a user wishing to withdraw $50
from an account x, may execute a program resulting in the following
transaction:
read_1[x,200]
write_1[x,150] commit_1
We use read_1
[x,v] to refer to a read operation executed by transaction 1, where the
read value is v; write_1[x,v]
refers to a write operation executed by transaction 1 with value v. A transaction terminates by either executing
a commit or an abort operation.
A commit operation implies that the transaction was successful, and
hence all its updates should be incorporated into the persistent state of the
data. An abort operation indicates that
the transaction has failed, and hence requires the system to cancel or abolish
all its effects on the persistent state.
Traditionally,
transactions are expected to satisfy the following four conditions, known as
the ACID properties [2,3]. Atomicity,
refers to the fact that all the operations of a transaction must be treated as
a single unit; hence, either all the operations are executed, or none. Consistency requires a transaction to
be correct, i.e., if executed alone, the transaction takes the system from one
consistent state to another. Note that
a transaction may (or by necessity must) violate some of the integrity
constraints during its execution.
However, once it terminates (commits), it must restore the system to a
consistent state. When transactions are
executed concurrently, the transaction processing system must ensure that the
execution of a set of concurrent and correct transactions also maintains the
consistency of the data. Isolation
requires each transaction to observe consistent data, i.e., not to read the
intermediate results of other transactions.
Finally, durability requires the results of a committed
transaction to be made permanent in the system in spite of failures.
The
ACID properties of transactions are usually ensured using two algorithms or
protocols that ensure execution atomicity, and failure atomicity. Execution atomicity refers to the problem of
ensuring the overall consistency of the data, and hence the consistency
property of transactions, even when they are executed concurrently. Protocols that ensure execution atomicity
are called concurrency control protocols. Failure atomicity ensures the all-or-nothing as well as isolation
and durability properties. Protocols
that ensure failure atomicity are usually referred to as recovery
protocols.
One of the
main objectives of a transaction processing system is controlling and
coordinating the execution of a set of concurrent transactions accessing shared
data and resources. This may also be
stated as requiring different transactions to access shared data without
interfering with each other and still preserving the integrity constraints of
the system.
Consider
a bank account s, and two transactions, each withdrawing $200 when the
account has an initial balance of $300.
We assume that a withdraw transaction is executed by first reading the
value of account s, and then writing it with the new value. The operations of withdrawal transaction 1
are allowed to interleave with the operations of withdrawal transaction 2 as
follows:
read_1[s,300]
read_2[s,300] write_2[s,100] write_1[s,100]
In this
execution, both transactions read the same initial value, and then update the
account with the new value $100. Note
that as far as each transaction is concerned, the integrity constraint of
non-negative accounts is not violated.
However, in the interleaved execution, the overall result does not
reflect the correct execution of two withdraw transactions. In particular, if the two transactions were
not interleaved, the second of the two transactions would have violated the
integrity constraints, and hence would not have been executed. Alternatively, if in the above interleaved
execution the withdraw transactions are replaced with transactions that deposit
$200, both transactions would have written $500. In this case, $200 would have disappeared. Again this is not acceptable. This problem is known as the missing
or lost update problem.
Consider
another type of transaction that transfers money from a savings account to a
checking account. Also, the transfer
transaction was executed concurrently with a balance inquiry transaction that
computes the total amount in both accounts.
In an interleaved execution, the balance transaction could read one
account before the transfer and one account after the transfer and thus compute
the wrong total. This problem is
referred to as the inconsistent retrieval problem.
The
examples above illustrate that simply requiring each transaction to be
individually correct does not guarantee that the interleaved execution of a set
of transactions will be correct. The
system could require transactions to execute one at a time or serially. However, since transactions often require
access to objects that are stored in secondary storage and perhaps at remote
sites, transactions will take a long time to complete. In such an environment, serial execution
will result in extremely poor utilization of system resources. If a pair of transactions does not share a
common data item, interleaving their operations cannot violate any integrity
constraints. Even if a pair of
transactions shares some data items, interleaving the execution of their
operations may be acceptable. Thus the
main notion of correctness for the execution of a set of transactions is serializability[2]. Since a serial execution is correct, we
would like to allow any interleaved execution whose behavior cannot be
distinguished from a serial execution as far as the users of the system are
concerned. A concurrency control
protocol is used to ensure the serializable execution of transactions. Several efficient concurrency control
protocols have been proposed including two-phase locking, timestamp ordering,
optimistic, and serialization graph testing (see Concurrency Control).
A multi-user
transaction processing system must provide the atomic property of transactions
even in the presence of various types of failures. This property is referred to
as failure atomicity. A
transaction could be aborted by its initiator.
For example, if the withdrawal of two bank accounts would result in a
negative balance in one of them then the transaction may be aborted. Aborts may also occur as a result of other
system related faults such as divide-by-zero, memory fault, or deadlock. There can be other kinds of more drastic
failures that may occur in a system, e.g., crash failures, disk or media
failure, network failure, etc. In such
situations the operations executed by interrupted transactions must be “undone”.
That is, the effect of aborted transactions should be as if those transactions
were never executed. Similarly, when a
system failure occurs we must ensure that the effects of committed transactions
are not lost.
We now
discuss how failure atomicity of transactions can be guaranteed when
transactions abort as a result of faults. Note that a user initiated abort can
also be modeled as a fault in the system. Consider, for example, the following
execution of a transaction:
read_1[x,3]
write_1[x,5] write_1[y,25]
If the
next operation of transaction 1 is abort_1 due to a fault, then the
above operations should be undone. That
is, the partial effects of transaction 1 on the persistent state of the data
must be eliminated. Note that there is
no need to perform any undo action for read operations since they do not modify
the state of the data. On the other
hand, write operations must be undone to restore the data to the correct
state. In particular, operation write_1[x,5]
should be undone by restoring the value of object x that existed before write_1[x,5].
Similarly, the value of y before write_1[y,25] should be
restored.
Failure
atomicity of transactions in the absence of concurrency can be easily
accomplished by using a simple bookkeeping technique. Read operations do not require any undo action whereas write
operations require restoration of the before-image of the data objects
involved in the write operations [1,2].
This can be accomplished by requiring that transactions save the
before-image of each object on nonvolatile storage before performing a write
operation on that object. Before-images
are usually stored in a structure called a log on nonvolatile
storage. Guaranteeing failure atomicity
of transactions in a concurrent or distributed environment is more complex. A distributed transaction uses an atomic
commit protocol (see Two Phase Commit and Atomic Commit Protocols) to ensure
that all sites involved commit the transaction or all sites abort the
transaction. In a concurrent
environment, when a transaction aborts, all its effects on the data as well as
other concurrently executing transactions should be eliminated. Consider, for example, the following
execution of two transactions:
read_1[x,3]
write_1[x,6] read_2[x,6] write_2[z,18] commit_2
In the
above execution, if transaction 1 aborts, then we have the following
problem. Since the effects of
transaction 1 must be eliminated from the data, transaction 2 through its read
action read_2[x] observes a value of x that did not exist. The only alternative left now is to withdraw
the commitment of transaction 2.
However, this approach of guaranteeing failure atomicity violates the
property of durability that states that once a transaction commits, all
its effects become permanent. In order
to circumvent this problem, the concurrent execution of transactions is
constrained. The constraint requires
that if a transaction T reads a value of an object written by
transaction T' then the commit action of T can be executed only
after T' has committed.
Executions that satisfy this property are referred to as recoverable
executions. If transactions are
restricted to only read values written by committed transactions, then the
execution is said to avoid cascading aborts because the abort of one
transaction will never necessitate the abort of others. Further constraints can ensure that
restoring the before-image of the objects written by the transaction will
eliminate all the effects of an aborted transaction. These can be stated as requiring that if a transaction T
writes an object, then that object can be read or written by other transactions
only after T has either committed or aborted. Executions that satisfy the above constraints are referred to as strict
executions. Strict executions greatly
simplify the task of ensuring failure atomicity; however, it is at the expense
of reduced concurrency.
Now we
consider the problem of ensuring failure atomicity of transactions in the
presence of system failures.
Transaction processing systems usually use two types of memory,
volatile, which is usually small but has fast access time, and nonvolatile,
such as a disk, which has slower access time, but has much larger
capacity. The data is stored in
nonvolatile memory, and part of the data is cached or buffered in the volatile
memory. A transaction usually executes
by reading and updating the cache.
Ideally, when a transaction commits, all its changes should be
incorporated into the data on nonvolatile memory. Since disk accesses are generally slower, coherency between the
cache and the data itself is not strictly maintained. Hence, the nonvolatile data may contain data written by
uncommitted transactions (which may potentially abort) and the cache may
contain data, which has not yet been incorporated into the nonvolatile memory,
but that was written by committed transactions. A system failure is modeled by assuming that the state of the
system in the volatile memory is lost, but the state in the nonvolatile memory
survives following the system failure.
The recovery from system failure involves the following:
1. The effects of transactions that
were in committed state at the time of the system failure must be incorporated
into the data.
2. The effects of transactions that
were aborted or were active at the time of the system failure must be
eliminated. Note that transactions that were active when the failure occurs are
considered aborted since their internal state is lost due to the failure.
Recovery
from system failures can be categorized into two processes:
·
Redo Process
-- For each committed transaction, redo the write operations that are not
incorporated into the nonvolatile data.
·
Undo
Process -- For each active and aborted transaction, undo the write operations
that were incorporated into the nonvolatile data.
In
order to successfully recover from system failures, additional bookkeeping
needs to be performed. In particular,
the system maintains on nonvolatile storage, a list of committed transactions,
a list of aborted transactions, and a log of write operations performed on each
data object. A restart procedure is
invoked when the system recovers after a system failure and the procedure
recovers the persistent state by using the two lists and the log. A comprehensive treatment for recovery from
system failures can be found in [2,4].
[1] P.
Bernstein and E. Newcomer,
Principles
of Transaction Processing,
Morgan
Kaufmann, 1997.
[2] P.
Bernstein, V. Hadzilacos, and N. Goodman,
Concurrency
Control and Recovery in Database Systems,
Addison
Wesley, Reading, MA. 1987
[3] A.
Elmagarmid, editor,
Database
Transaction Models for Advanced Applications,
Morgan
Kaufmann, 1995
[4] J.
Gray and A. Reuter,
Transaction
Processing: Concepts and Techniques,
Morgan
Kaufmann, 1993
ACID
properties see Transaction Processing
Durability
see Transaction Processing
Execution
Atomicity see Transaction Processing
Failure
Atomicity see Transaction Processing
Isolation
see Transaction Processing
Redo
Process see Transaction Processing
Transaction
Failures see Transaction Processing
Undo
Process see Transaction Processing