跳转至

Chapter 15 Concurrency Control Theory

A DBMS's concurrency control and recovery components permeate(贯穿) throughout the design of its entire architecture.

alt text

alt text

TRANSACTIONS

A transaction is the execution of a sequence of one or more operations (e.g., SQL queries) on a database to perform some higher-level function.

For example, "fetch $100 from A bank and deposit it into B bank" is a transaction.

It is the basic unit of change in a DBMS:

  • Partial transactions are not allowed!

For example, just "fetch $100 from A bank" or "deposit it into B bank" is a partial transaction and it's forbidden.

STRAWMAN SYSTEM

Execute each txn one-by-one (i.e., serial order) as they arrive at the DBMS.

  • One and only one txn can be running at the same time in the DBMS.

Before a txn starts, copy the entire database to a new file (tmp_file) and make all changes to that file (tmp_file).

  • If the txn completes successfully, overwrite the original file with the new one (tmp_file replaces ori_file).
  • If the txn fails, just remove the dirty copy (still ori_file).

A (potentially) better approach is to allow concurrent execution of independent transactions.

Arbitrary(任意的) interleaving(交错) of operations can lead to:

  • Temporary Inconsistency (ok, unavoidable)
  • Permanent Inconsistency (bad!)

We need formal correctness criteria to determine whether an interleaving is valid.

在并发操作中,操作的任意交错可能会导致临时不一致(这是不可避免的,可以接受),但如果不加以控制,可能会导致永久性不一致(这是非常糟糕的情况)。为了避免这种情况,我们需要制定正式的正确性标准来判断操作的交错是否有效。

DEFINITIONS

A txn may carry out many operations on the data retrieved from the database

The DBMS is only concerned about what data is read/written from/to the database.

  • Changes to the "outside world" are beyond the scope of the DBMS.

FORMAL DEFINITIONS

Database: A fixed set of named data objects (e.g., A, B, C, ......)

  • We do not need to define what these objects are now.
  • We will discuss how to handle inserts/deletes next week.

Transaction: A sequence of read and write operations (R(A), W(B), ......)

  • DBMS's abstract view of a user program

TRANSACTIONS IN SQL

  1. A new txn starts with the BEGIN command.
  2. The txn stops with either COMMIT or ABORT:
    • If commit, the DBMS either saves all the txn's changes or aborts it.
    • If abort, all changes are undone so that it's like as if the txn never executed at all.

Abort can be either self-inflicted(自身造成的) or caused by the DBMS.

CORRECTNESS CRITERIA: ACID

这些概念是数据库事务(transactions)的ACID特性,描述了事务处理的四个基本属性:原子性、一致性、隔离性和持久性。

这些特性共同保证了数据库在并发和故障情况下的数据完整性和可靠性。

I believe these concepts are of great significance in this chapter!

Atomicity

All actions in txn happen, or none happen.

"All or nothing…"

事务中的所有操作要么全部执行,要么全部不执行。不会出现只执行部分操作的情况。

这意味着事务中的操作要么全部成功,要么全部失败。如果在执行过程中出现错误,事务会回滚,恢复到事务开始前的状态。

Consistency

If each txn is consistent and the DB starts consistent, then it ends up consistent.

"It looks correct to me…"

这意味着事务在执行过程中保持数据的完整性和约束。如果每个事务都能维护数据库的一致性,那么整个数据库系统将始终处于一致状态。

Isolation

Execution of one txn is isolated from that of other txns.

"All by myself…"

这意味着一个事务的中间状态对其他事务是不可见的。事务之间是独立的,保证了并发事务的正确执行。

Durability

If a txn commits, its effects persist.

"I will survive…"

一旦事务提交,它的结果是永久性的,不会因为系统故障而丢失。

这意味着事务一旦提交,其改变将永久保存在数据库中。即使系统崩溃或重启,已提交的事务的结果也不会丢失。

ATOMICITY OF TRANSACTIONS

Two possible outcomes of executing a txn:

  • Commit after completing all its actions.
  • Abort (or be aborted by the DBMS) after executing some actions.

DBMS guarantees that txns are atomic.

  • From user's point of view: txn always either executes all its actions or executes no actions at all.

alt text

Approach #1: Logging

  • DBMS logs all actions so that it can undo the actions of aborted transactions.
  • Maintain undo records both in memory and on disk.
  • Think of this like the black box in airplanes...

Logging is used by almost every DBMS.

  • Audit Trail
  • Efficiency Reasons

Approach #2: Shadow Paging

  • DBMS makes copies of pages and txns make changes to those copies. Only when the txn commits is the page made visible to others.
  • Originally from IBM System R.

Few systems do this, because it wastes:

  • CouchDB
  • Tokyo Cabinet
  • LMDB (OpenLDAP)

CONSISTENCY

The "world" represented by the database is logically correct. All questions asked about the data are given logically correct answers.

DATABASE CONSISTENCY

The database accurately models the real world and follows integrity constraints.

Transactions in the future see the effects of transactions committed in the past inside of the database.

TRANSACTION CONSISTENCY

If the database is consistent before the transaction starts (running alone), it will also be consistent after.

Transaction consistency is the application's responsibility. DBMS cannot control this.

  • We won't discuss this issue further…

MECHANISMS FOR ENSURING ISOLATION

A concurrency control protocol is how the DBMS decides the proper interleaving(交错) of operations from multiple transactions.

Two categories of protocols:

  • Pessimistic: Don't let problems arise in the first place.
  • Optimistic: Assume conflicts are rare, deal with them after they happen.

alt text

There is no guarantee that \(T_1\) will execute before \(T_2\) or vice-versa, if both are submitted together. But the net effect (最终成效) must be equivalent to these two transactions running serially(连续地) in some order.

alt text

alt text

alt text

INTERLEAVING TRANSACTIONS

We interleave txns to maximize concurrency.

  • Slow disk/network I/O.
  • Multi-core CPUs.

When one txn(资源/任务) stalls(停滞) because of a resource (e.g., page fault), another txn can continue executing and make forward progress.

EXAMPLE (GOOD)

alt text

EXAMPLE (BAD)

alt text

alt text

How do we judge whether a schedule is correct

If the schedule is equivalent to some serial execution.

FORMAL PROPERTIES OF SCHEDULES

Serial Schedule(串行调度)

  • A schedule that does not interleave the actions of different transactions.

Equivalent Schedules(等价调度)

  • For any database state, the effect of executing the first schedule is identical to the effect of executing the second schedule.
  • Doesn't matter what the arithmetic operations are!
Note

等价调度意味着虽然两个调度中的操作可能以不同的顺序交错执行,但它们的最终结果是相同的。换句话说,不管事务的操作如何排列,只要执行结果一致,这两个调度就是等价的

Serializable Schedule(可串行化调度)

  • A schedule that is equivalent to some serial execution of the transactions.
  • If each transaction preserves consistency, every serializable schedule preserves consistency.

Serializability is a less intuitive notion of correctness compared to txn initiation time or commit order, but it provides the DBMS with more flexibility in scheduling operations. (More flexibility means better parallelism.)

CONFLICTING OPERATIONS

We need a formal notion of equivalence that can be implemented efficiently based on the notion of "conflicting" operations.

Two operations conflict if:

  • They are by different transactions,
  • They are on the same object and one of them is a write.
Interleaved Execution Anomalies
  • Read-Write Conflicts (R-W)
  • Write-Read Conflicts (W-R)
  • Write-Write Conflicts (W-W)

READ-WRITE CONFLICTS

Unrepeatable Read: Txn gets different values when reading the same object multiple times.

alt text

WRITE-READ CONFLICTS

Dirty Read: One txn reads data written by another txn that has not committed yet.

alt text

WRITE-WRITE CONFLICTS

Lost Update: One txn overwrites uncommitted data from another uncommitted txn.

alt text

CONFLICT SERIALIZABILITY INTUITION

这里介绍一种很简单且直观的避免冲突的设计(事实上,我们用的并不多,因为它实在是太简单了...)

alt text

alt text

alt text

alt text

alt text

但是像下面这样的显然是不行的:

alt text

SERIALIZABILITY

Swapping operations is easy when there are only two txns in the schedule. It's cumbersome(繁琐的) when there are many txns.

Are there any faster algorithms to figure this out other than transposing operations?

DEPENDENCY GRAPHS

alt text

Dependency Graphs Details

What do we need to care about when designing a dependency graph?

  • R and R - It's ok, don't need to care.
  • W and R - Need to care.
  • W and W - Need to care.

Examples

alt text

alt text

alt text

alt text

alt text

alt text

VIEW SERIALIZABILITY

Alternative (broader) notion of serializability.

Schedules \(S_1\) and \(S_2\) are view equivalent if:

  • If \(T_1\) reads initial value of A in \(S_1\) , then \(T_1\) also reads initial value of A in \(S_2\).
  • If \(T_1\) reads value of A written by \(T_2\) in \(S_1\) , then \(T_1\) also reads value of A written by \(T_2\) in \(S_2\).
  • If \(T_1\) writes final value of A in \(S_1\) , then \(T_1\) also writes final value of A in \(S_2\).

alt text

alt text

视图可串行化 (View Serializability)

定义

  • 视图可串行化允许更多的调度,相较于冲突可串行化。这是因为视图可串行化的定义更宽松,只要调度的视图等价于某个串行调度,就认为是可串行化的。

解释

  • 视图可串行化考虑了事务的最终结果是否与某个串行调度的结果相同,而不是关心操作的具体顺序。它关注的是读取和写入数据的最终一致性。
  • 然而,尽管视图可串行化允许更多的调度,但是在实际系统中很难有效地强制执行。因为判断一个调度是否视图可串行化需要复杂的计算,涉及到对所有可能的读写操作进行分析。

冲突可串行化 (Conflict Serializability)

定义

  • 冲突可串行化是一种更严格的可串行化标准,只允许那些调度中的操作没有冲突的调度。

解释

  • 冲突可串行化的定义依赖于事务间操作的冲突。两个操作冲突,如果它们访问相同的数据,并且至少有一个是写操作。
  • 这种方法更容易实现,因为可以通过检测和避免冲突来确保事务的可串行化。

共同的限制

  • 无论是视图可串行化还是冲突可串行化,都不能涵盖所有你可能认为是“可串行化”的调度。这是因为这两种定义都没有理解操作或数据的实际含义。

In practice, Conflict Serializability is what systems support because it can be enforced efficiently.

To allow more concurrency, some special cases get handled separately at the application level.

alt text

TRANSACTION DURABILITY

All the changes of committed transactions should be persistent.

  • No torn(中断) updates.
  • No changes from failed transactions.

The DBMS can use either logging or shadow paging to ensure that all changes are durable.