Chapter 9 Index Concurrency Control¶
So far we have (mostly) assumed that the data structures we discussed are accessed by only a single thread.
But a DBMS needs to allow multiple threads to safely access data structures to take advantage of additional CPU cores and hide disk I/O stalls.
CONCURRENCY CONTROL¶
A concurrency control protocol is the method that the DBMS uses to ensure “correct” results for concurrent operations on a shared object.
A protocol's correctness criteria can vary:
- Logical Correctness: Can a thread see the data that it is supposed to see?
- Physical Correctness: Is the internal representation of the object sound?
LOCKS VS. LATCHES¶
- Locks protect the database's logical contents (e.g., tuples) from other transactions. They are held for the transaction's duration and must support rollback.
- Latches protect the critical sections of the DBMS's internal data structures from other threads. They are held only for the duration of the operation and do not need to support rollback.
LATCH MODES¶
Read Mode
- Multiple threads can read the same object at the same time.
- A thread can acquire the read latch if another thread has it in read mode.
Write Mode
- Only one thread can access the object.
- A thread cannot acquire a write latch if another thread has it in any mode.
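The two modes map directly onto shared vs. exclusive ownership of a reader-writer lock. Below is a minimal sketch, assuming a C++17 codebase; the Page struct and function names are hypothetical.

```cpp
#include <cstdint>
#include <shared_mutex>

// Hypothetical in-memory page; the latch protects its contents.
struct Page {
    mutable std::shared_mutex latch;   // one latch per object
    uint64_t value = 0;
};

// Read mode: many threads may hold the latch at the same time.
uint64_t ReadPage(const Page &page) {
    std::shared_lock guard(page.latch);
    return page.value;
}

// Write mode: exclusive; blocks until every other holder releases the latch.
void WritePage(Page &page, uint64_t v) {
    std::unique_lock guard(page.latch);
    page.value = v;
}
```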
LATCH IMPLEMENTATIONS¶
Approach #1: Blocking OS Mutex
- Simple to use
- Non-scalable (about 25ns per lock/unlock invocation)
- Example: std::mutex --> pthread_mutex --> futex
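A minimal sketch of Approach #1, using std::mutex as a blocking latch (the counter and function name here are hypothetical):

```cpp
#include <mutex>

std::mutex latch;        // blocking OS mutex used as a latch
int shared_counter = 0;  // hypothetical shared state protected by the latch

void Increment() {
    std::lock_guard<std::mutex> guard(latch);  // blocks if another thread holds the latch
    ++shared_counter;                          // critical section
}                                              // latch released when guard goes out of scope
```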
Approach #2: Reader-Writer Latches
- Allows for concurrent readers. Must manage read/write queues to avoid starvation.
- Can be implemented on top of spinlocks.
- Example: std::shared_mutex, pthread_rwlock
A reader arrives and acquires the latch in read mode; readCount is set to 1.
Another reader arrives; since multiple threads can read the same object at the same time, it also acquires the latch and readCount becomes 2.
A writer arrives; a thread cannot acquire the write latch while any other thread holds the latch, so the writer is blocked and writeCount is set to 1.
Another reader arrives after the waiting writer; to avoid starving the writer, it is blocked and only proceeds after the writer has finished. A spinlock-based implementation of this bookkeeping is sketched below.
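A minimal sketch of a reader-writer latch built on top of a spinlock, keeping counters as in the walkthrough above. The RWLatch class is hypothetical (in practice std::shared_mutex or pthread_rwlock would be used); it blocks new readers while a writer is waiting so writers do not starve.

```cpp
#include <atomic>
#include <thread>

// Hypothetical write-preferring reader-writer latch built on a spinlock.
class RWLatch {
    std::atomic<bool> spin{false};  // spinlock guarding the counters below
    int read_count = 0;             // readers currently holding the latch
    int writers_waiting = 0;        // writers queued; blocks new readers to avoid starvation
    bool writer_active = false;

    void Lock()   { while (spin.exchange(true, std::memory_order_acquire)) std::this_thread::yield(); }
    void Unlock() { spin.store(false, std::memory_order_release); }

 public:
    void RLock() {
        for (;;) {
            Lock();
            if (!writer_active && writers_waiting == 0) { ++read_count; Unlock(); return; }
            Unlock();                     // a writer is active or waiting: back off and retry
            std::this_thread::yield();
        }
    }
    void RUnlock() { Lock(); --read_count; Unlock(); }

    void WLock() {
        Lock(); ++writers_waiting; Unlock();   // announce the writer so new readers wait
        for (;;) {
            Lock();
            if (!writer_active && read_count == 0) {
                --writers_waiting; writer_active = true; Unlock(); return;
            }
            Unlock();
            std::this_thread::yield();
        }
    }
    void WUnlock() { Lock(); writer_active = false; Unlock(); }
};
```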
HASH TABLE LATCHING¶
It is easy to support concurrent access to a hash table because of the limited ways in which threads access the data structure:
- All threads move in the same direction and only access a single page/slot at a time.
- Deadlocks are not possible.
To resize the table, take a global write latch on the entire table (e.g., in the header page).
Approach #1: Page Latches
- Each page has its own reader-writer latch that protects entire contents.
- Threads acquire either a read or write latch before they access a page.
Approach #2: Slot Latches
- Each slot has its own latch.
- Can use a single-mode latch to reduce meta-data and computational overhead.
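A minimal sketch of Approach #1 (page latches) for a non-resizable table, with a global table latch that a resize would take in write mode. The HashTable/Bucket names and layout are hypothetical.

```cpp
#include <array>
#include <functional>
#include <optional>
#include <shared_mutex>
#include <string>
#include <utility>
#include <vector>

// Hypothetical bucket page: one reader-writer latch protects the whole page.
struct Bucket {
    std::shared_mutex latch;
    std::vector<std::pair<std::string, std::string>> slots;
};

class HashTable {
    std::shared_mutex table_latch_;      // taken in write mode only by a resize
    std::array<Bucket, 256> buckets_;

    Bucket &BucketFor(const std::string &key) {
        return buckets_[std::hash<std::string>{}(key) % buckets_.size()];
    }

 public:
    std::optional<std::string> Find(const std::string &key) {
        std::shared_lock table_guard(table_latch_);   // only blocks a concurrent resize
        Bucket &b = BucketFor(key);
        std::shared_lock page_guard(b.latch);         // read latch on the single page we touch
        for (auto &[k, v] : b.slots)
            if (k == key) return v;
        return std::nullopt;
    }

    void Insert(const std::string &key, const std::string &value) {
        std::shared_lock table_guard(table_latch_);
        Bucket &b = BucketFor(key);
        std::unique_lock page_guard(b.latch);         // write latch on that page only
        b.slots.emplace_back(key, value);
    }
    // A resize would acquire table_latch_ in write mode, excluding all other operations.
};
```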
B+TREE CONCURRENCY CONTROL¶
We want to allow multiple threads to read and update a B+Tree at the same time.
We need to protect against two types of problems:
- Threads trying to modify the contents of a node at the same time.
- One thread traversing the tree while another thread splits/merges nodes.
LATCH CRABBING / COUPLING¶
A protocol that allows multiple threads to access/modify the B+Tree at the same time:
- Get latch for parent
- Get latch for child
- Release latch for parent if “safe”
A safe node is one that will not split or merge when updated.
- Not full (on insertion)
- More than half-full (on deletion)
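A minimal sketch of the safety check, assuming a hypothetical node layout with key_count and max_keys fields:

```cpp
// Hypothetical B+Tree node; only the fields needed for the safety check are shown.
struct Node {
    int key_count = 0;   // keys currently stored in the node
    int max_keys  = 0;   // capacity of the node
};

// A node is "safe" if the current operation cannot cause it to split or merge.
bool IsSafe(const Node &node, bool is_insert) {
    if (is_insert)
        return node.key_count < node.max_keys;    // room for one more key: no split
    return node.key_count > node.max_keys / 2;    // more than half-full: removing one key cannot underflow it
}
```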
THEORY¶
Find:
Start at the root (acquire an R latch on it) and traverse down the tree:
- Acquire R latch on child.
- Then unlatch parent.
- Repeat until we reach the leaf node.
Insert/Delete:
Start at the root and go down, obtaining W latches as needed. Once the child is latched, check if it is safe:
- If the child is safe, release all latches held on its ancestors (see the sketch below).
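A minimal sketch of latch coupling for a read-only lookup, using a hypothetical BTNode type with an embedded reader-writer latch; the insert/delete path is the same idea with write latches plus the safety check.

```cpp
#include <algorithm>
#include <shared_mutex>
#include <vector>

// Hypothetical B+Tree node with an embedded reader-writer latch.
struct BTNode {
    std::shared_mutex latch;
    bool is_leaf = false;
    std::vector<int> keys;
    std::vector<BTNode *> children;   // for inner nodes: children.size() == keys.size() + 1

    // Pick the child subtree whose key range contains `key`.
    BTNode *ChildFor(int key) {
        size_t i = std::upper_bound(keys.begin(), keys.end(), key) - keys.begin();
        return children[i];
    }
};

// Latch crabbing for a lookup: latch the child, then release the parent.
BTNode *FindLeaf(BTNode *root, int key) {
    BTNode *node = root;
    node->latch.lock_shared();                 // R-latch the root
    while (!node->is_leaf) {
        BTNode *child = node->ChildFor(key);
        child->latch.lock_shared();            // R-latch the child before releasing the parent
        node->latch.unlock_shared();           // for reads, the parent can be released immediately
        node = child;
    }
    return node;                               // the leaf is still R-latched; the caller unlatches it
}
```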
EXAMPLE¶
FIND 38
First, R-latch A so that no other thread can change A.
Then, R-latch B and release the latch on A; for reads, the parent can be released as soon as the child is latched.
Likewise, R-latch D and unlatch B.
Finally, R-latch H and unlatch D.
We find 38 in H (a leaf node).
DELETE 38
First, W-latch A so that no other thread can change A.
Then, W-latch B.
We cannot release A because B is not "safe": deleting below it might cause B to merge.
Next, W-latch D. We know that deleting 38 will not cause D to merge with C, so D is safe.
We can therefore release the latches on A and B.
Finally, W-latch H and delete 38 from H.
INSERT 45 (easy case)
INSERT 25 (hard)
First, W-latch A so that no other thread can change A.
Then, W-latch B. B is not full, so an insert below it cannot propagate a split into B; B is safe and we can unlatch A.
Next, W-latch C. C is also not full, so it is safe and we can unlatch B.
Finally, W-latch F. We need to split F, so we must keep holding the W-latch on its parent C.
We split node F and update C accordingly, then release the latches on C and F.
BETTER LATCHING ALGORITHM¶
Observation
What was the first step that all the update examples did on the B+Tree?
"Get W/R-latch on root" !
Taking a write latch on the root every time becomes a bottleneck with higher concurrency.
THEORY¶
The optimistic approach is based on the following:
- Most modifications to a B+Tree will not require a split or merge.
- So instead of assuming that there will be a split/merge, optimistically traverse the tree using read latches.
- If the guess turns out to be wrong, repeat the traversal with the pessimistic algorithm.
HOW TO EXECUTE¶
- Search:
- Same as before.
- Insert/Delete:
- Set latches as if for search, get to leaf, and set W latch on leaf.
- If leaf is not safe, release all latches, and restart thread using previous insert/delete protocol with write latches.
This approach optimistically assumes that only the leaf node will be modified; if that guess is wrong, the R latches acquired on the first pass down to the leaf were wasted. A sketch of the optimistic insert path follows.
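A minimal sketch of the optimistic path for an insert, reusing the hypothetical BTNode from the crabbing sketch above; the helper functions are assumed to exist elsewhere in the tree implementation.

```cpp
// Hypothetical helpers assumed to exist elsewhere in the tree implementation.
bool LeafHasRoom(BTNode *leaf);                             // true if an insert cannot split the leaf
void InsertIntoLeaf(BTNode *leaf, int key, int value);      // plain in-leaf insert
void PessimisticInsert(BTNode *root, int key, int value);   // full write-latch crabbing

// Optimistic insert: descend with read latches as if searching, write-latch only the
// leaf, and fall back to the pessimistic protocol if the leaf turns out to be unsafe.
// (Assumes the root is an inner node.)
void OptimisticInsert(BTNode *root, int key, int value) {
    BTNode *node = root;
    node->latch.lock_shared();
    while (!node->is_leaf) {
        BTNode *child = node->ChildFor(key);
        if (child->is_leaf) child->latch.lock();          // W-latch the leaf only
        else                child->latch.lock_shared();   // R-latch inner nodes
        node->latch.unlock_shared();
        node = child;
    }
    if (LeafHasRoom(node)) {                 // the leaf is safe: the guess paid off
        InsertIntoLeaf(node, key, value);
        node->latch.unlock();
        return;
    }
    node->latch.unlock();                    // guessed wrong: release everything and
    PessimisticInsert(root, key, value);     // restart with write-latch crabbing
}
```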
LEAF NODE SCAN¶
The threads in all the examples so far have acquired latches in a "top-down" manner.
- A thread can only acquire a latch on a node that is below its current node.
- If the desired latch is unavailable, the thread must wait until it becomes available.
But what if threads want to move from one leaf node to another leaf node?
Latches do not support deadlock detection or avoidance. The only way we can deal with this problem is through coding discipline: a thread that cannot acquire a latch must give up and retry rather than wait.
The leaf node sibling latch acquisition protocol must support a “no-wait” mode.
The DBMS's data structures must cope with failed latch acquisitions.
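A minimal sketch of one step of a leaf scan under the no-wait rule, using a hypothetical LeafNode with a right-sibling pointer; on a failed try-lock the thread releases what it holds and reports failure so the caller can abort and retry the scan.

```cpp
#include <shared_mutex>

// Hypothetical leaf node: a latch plus a pointer to the right sibling.
struct LeafNode {
    std::shared_mutex latch;
    LeafNode *next = nullptr;
};

// Advance a scan from `leaf` (held in read mode) to its right sibling.
// Returns false if the sibling latch could not be acquired without waiting.
bool TryAdvance(LeafNode *&leaf) {
    LeafNode *sibling = leaf->next;
    if (sibling != nullptr && !sibling->latch.try_lock_shared()) {
        leaf->latch.unlock_shared();   // failed acquisition: give up everything we hold
        leaf = nullptr;
        return false;                  // the caller must restart the scan
    }
    leaf->latch.unlock_shared();       // the sibling (if any) is latched; release the current leaf
    leaf = sibling;
    return true;
}
```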
CONCLUSION¶
Making a data structure thread-safe is notoriously difficult in practice.
We focused on B+Trees, but the same high-level techniques are applicable to other data structures.