Postgres repeatable read nuances

February 6, 2023

The REPEATABLE READ isolation level in Postgres is a great fit for a lot of applications. It strikes a balance between the performance overhead required for SERIALIZABLE and the data guarantees that the default level, READ COMMITED, lacks. It’s not a silver bullet though; let’s explore some nuances.

Our testbed

For the examples below, we’ll be working with a single table that has two columns: ID and V. We’ll skip using the table name in our queries for simplicity.

The pre-existing data is two rows: ID: 1, V: 'a' and ID: 2, V: 'x'.

When does the snapshot boundary start?

Let’s start with the more intuitive case:

Transaction 1 Transaction 2
0 BEGIN TRANSACTION
1 SELECT V WHERE ID = 1
2 BEGIN TRANSACTION
3 UPDATE V = 'b' WHERE ID = 1
4 COMMIT
5 COMMIT
What V did we get back from Transaction 1?
Transaction 1 Transaction 2
0 BEGIN TRANSACTION
1 BEGIN TRANSACTION
2 UPDATE V = 'b' WHERE ID = 1
3 COMMIT
4 SELECT V WHERE ID = 1
5 COMMIT
What V did we get back from Transaction 1?
Transaction 1 Transaction 2
0 BEGIN TRANSACTION
1 SELECT current_timestamp
2 BEGIN TRANSACTION
3 UPDATE V = 'b' WHERE ID = 1
4 COMMIT
5 SELECT V WHERE ID = 1
6 COMMIT
What V did we get back from Transaction 1?
Transaction 1 Transaction 2
0 BEGIN TRANSACTION
1 SELECT V WHERE ID = 1
2 BEGIN TRANSACTION
3 UPDATE V = 'b' WHERE ID = 1
4 COMMIT
5 SELECT V WHERE ID = 1
6 COMMIT
What V did we get back in Step 5 in Transaction 1?

What happens if we update data concurrently?

We’ve now learned that Postgres doesn’t establish a transaction’s snapshot when the transaction is started - instead, it marks it when the first non-transaction-control statement has executed. Let’s see how that affects concurrent updates.

Transaction 1 Transaction 2
0 BEGIN TRANSACTION
1 BEGIN TRANSACTION
2 UPDATE V = 'b' WHERE ID = 1
3 COMMIT
4 UPDATE V = 'c' WHERE ID = 1
5 COMMIT
Did Transaction 2 update V to 'c'?

What happens if we wait to commit T1 until the update in T2 has happened?

Transaction 1 Transaction 2
0 BEGIN TRANSACTION
1 BEGIN TRANSACTION
2 UPDATE V = 'b' WHERE ID = 1
3 UPDATE V = 'c' WHERE ID = 1
4 COMMIT
5 COMMIT
What happened?

Does the same hold true if we update different rows?

Transaction 1 Transaction 2
0 BEGIN TRANSACTION
1 BEGIN TRANSACTION
2 UPDATE V = 'b' WHERE ID = 1
3 UPDATE V = 'y' WHERE ID = 2
4 COMMIT
5 COMMIT
What happened?

SELECT FOR UPDATE

Sometimes, you might want to query for data, do some computation on that data, and then write a different row. As we’ve just seen, Postgres only locks rows that you modify. There’s a special modifier you can add to SELECT queries to trigger blocks on rows that aren’t being updated in a given transaction. Let’s compare the difference:

Transaction 1 Transaction 2
0 BEGIN TRANSACTION
1 BEGIN TRANSACTION
2 SELECT V WHERE ID = 2
3 SELECT V WHERE ID = 1
4 UPDATE V = 'b' WHERE ID = 1
5 UPDATE V = 'y' WHERE ID = 2
6 COMMIT
7 COMMIT
What happened?
Transaction 1 Transaction 2
0 BEGIN TRANSACTION
1 BEGIN TRANSACTION
2 SELECT V WHERE ID = 2 FOR UPDATE
3 SELECT V WHERE ID = 1 FOR UPDATE
4 UPDATE V = 'b' WHERE ID = 1
5 UPDATE V = 'y' WHERE ID = 2
6 COMMIT
7 COMMIT
What happened?

Are inserts special?

As we just saw, Postgres locks per row. Sometimes however, we may want to lock a batch of rows meeting a condition. Let’s add a new column to our testbed for a “group id” and call it GID. Let’s also set our current two rows to have GID = 1 and allow ID to be autogenerated on inserts if not provided.

We’re wanting to implement a feature such that any GID can only have a maximum of three rows, so we’ll leverage the FOR UPDATE we just learned about:

Transaction 1 Transaction 2
0 BEGIN TRANSACTION
1 BEGIN TRANSACTION
2 SELECT current_timestamp
3 SELECT * WHERE GID = 1 FOR UPDATE
4 Verify count < 3
5 INSERT (GID,V) VALUES (1,'g')
6 COMMIT
7 SELECT * WHERE GID = 1 FOR UPDATE
8 Verify count < 3
9 INSERT (GID,V) VALUES (1,'h')
10 COMMIT
What happened?

The above problem is actually a bit tricky to solve. You might reach for doing it in a single query, e.g.

INSERT INTO x (gid, v)
SELECT 1, 'g'
WHERE (SELECT COUNT(*) FROM X WHERE gid = 1) < 3;

but this still won’t enforce the count limit under REPEATABLE READ (it would work under SERIALIZABLE as that would protect the entire table). The only safe way I’m aware of to achieve this under REPEATABLE READ is to use advisory locks, but that’s a topic for another time.