The REPEATABLE READ
isolation level in Postgres is a great fit for a lot of applications. It strikes a balance between the performance overhead required for SERIALIZABLE
and the data guarantees that the default level, READ COMMITED
, lacks. It’s not a silver bullet though; let’s explore some nuances.
Our testbed
For the examples below, we’ll be working with a single table that has two columns: ID
and V
. We’ll skip using the table name in our queries for simplicity.
The pre-existing data is two rows: ID: 1, V: 'a'
and ID: 2, V: 'x'
.
When does the snapshot boundary start?
Let’s start with the more intuitive case:
Transaction 1 | Transaction 2 | |
---|---|---|
0 | BEGIN TRANSACTION | |
1 | SELECT V WHERE ID = 1 | |
2 | BEGIN TRANSACTION | |
3 | UPDATE V = 'b' WHERE ID = 1 | |
4 | COMMIT | |
5 | COMMIT |
Transaction 1 | Transaction 2 | |
---|---|---|
0 | BEGIN TRANSACTION | |
1 | BEGIN TRANSACTION | |
2 | UPDATE V = 'b' WHERE ID = 1 | |
3 | COMMIT | |
4 | SELECT V WHERE ID = 1 | |
5 | COMMIT |
Transaction 1 | Transaction 2 | |
---|---|---|
0 | BEGIN TRANSACTION | |
1 | SELECT current_timestamp | |
2 | BEGIN TRANSACTION | |
3 | UPDATE V = 'b' WHERE ID = 1 | |
4 | COMMIT | |
5 | SELECT V WHERE ID = 1 | |
6 | COMMIT |
Transaction 1 | Transaction 2 | |
---|---|---|
0 | BEGIN TRANSACTION | |
1 | SELECT V WHERE ID = 1 | |
2 | BEGIN TRANSACTION | |
3 | UPDATE V = 'b' WHERE ID = 1 | |
4 | COMMIT | |
5 | SELECT V WHERE ID = 1 | |
6 | COMMIT |
What happens if we update data concurrently?
We’ve now learned that Postgres doesn’t establish a transaction’s snapshot when the transaction is started - instead, it marks it when the first non-transaction-control statement has executed. Let’s see how that affects concurrent updates.
Transaction 1 | Transaction 2 | |
---|---|---|
0 | BEGIN TRANSACTION | |
1 | BEGIN TRANSACTION | |
2 | UPDATE V = 'b' WHERE ID = 1 | |
3 | COMMIT | |
4 | UPDATE V = 'c' WHERE ID = 1 | |
5 | COMMIT |
What happens if we wait to commit T1 until the update in T2 has happened?
Transaction 1 | Transaction 2 | |
---|---|---|
0 | BEGIN TRANSACTION | |
1 | BEGIN TRANSACTION | |
2 | UPDATE V = 'b' WHERE ID = 1 | |
3 | UPDATE V = 'c' WHERE ID = 1 | |
4 | COMMIT | |
5 | COMMIT |
Does the same hold true if we update different rows?
Transaction 1 | Transaction 2 | |
---|---|---|
0 | BEGIN TRANSACTION | |
1 | BEGIN TRANSACTION | |
2 | UPDATE V = 'b' WHERE ID = 1 | |
3 | UPDATE V = 'y' WHERE ID = 2 | |
4 | COMMIT | |
5 | COMMIT |
SELECT FOR UPDATE
Sometimes, you might want to query for data, do some computation on that data, and then write a different row. As we’ve just seen, Postgres only locks rows that you modify.
There’s a special modifier you can add to SELECT
queries to trigger blocks on rows that aren’t
being updated in a given transaction. Let’s compare the difference:
Transaction 1 | Transaction 2 | |
---|---|---|
0 | BEGIN TRANSACTION | |
1 | BEGIN TRANSACTION | |
2 | SELECT V WHERE ID = 2 | |
3 | SELECT V WHERE ID = 1 | |
4 | UPDATE V = 'b' WHERE ID = 1 | |
5 | UPDATE V = 'y' WHERE ID = 2 | |
6 | COMMIT | |
7 | COMMIT |
Transaction 1 | Transaction 2 | |
---|---|---|
0 | BEGIN TRANSACTION | |
1 | BEGIN TRANSACTION | |
2 | SELECT V WHERE ID = 2 FOR UPDATE | |
3 | SELECT V WHERE ID = 1 FOR UPDATE | |
4 | UPDATE V = 'b' WHERE ID = 1 | |
5 | UPDATE V = 'y' WHERE ID = 2 | |
6 | COMMIT | |
7 | COMMIT |
Are inserts special?
As we just saw, Postgres locks per row. Sometimes however, we may want to lock a batch of rows meeting a condition. Let’s add a new column to our testbed for a “group id” and call it GID
. Let’s also set our current two rows to have GID = 1
and allow ID
to be autogenerated on inserts if not provided.
We’re wanting to implement a feature such that any GID can only have a maximum of three rows, so we’ll leverage the FOR UPDATE
we just learned about:
Transaction 1 | Transaction 2 | |
---|---|---|
0 | BEGIN TRANSACTION | |
1 | BEGIN TRANSACTION | |
2 | SELECT current_timestamp | |
3 | SELECT * WHERE GID = 1 FOR UPDATE | |
4 | Verify count < 3 | |
5 | INSERT (GID,V) VALUES (1,'g') | |
6 | COMMIT | |
7 | SELECT * WHERE GID = 1 FOR UPDATE | |
8 | Verify count < 3 | |
9 | INSERT (GID,V) VALUES (1,'h') | |
10 | COMMIT |
The above problem is actually a bit tricky to solve. You might reach for doing it in a single query, e.g.
INSERT INTO x (gid, v)
SELECT 1, 'g'
WHERE (SELECT COUNT(*) FROM X WHERE gid = 1) < 3;
but this still won’t enforce the count limit under REPEATABLE READ
(it would work under SERIALIZABLE
as that would protect the entire table). The only safe way I’m aware of to achieve this under REPEATABLE READ
is to use advisory locks, but that’s a topic for another time.