pg_repack
principle
pg_repack 1.5.0 – Reorganize tables in PostgreSQL databases with minimal locks
https://github.com/reorg/pg_repack
- create a log table to record changes made to the original table
- add a trigger onto the original table, logging INSERTs, UPDATEs and DELETEs into our log table
- create a new table containing all the rows in the old table
- build indexes on this new table
- apply all changes which have accrued in the log table to the new table
- swap the tables, including indexes and toast tables, using the system catalogs
- drop the original table
The basic idea is:
- transport the existing data with an old snapshot
- record the incremental changes into a log table and replay them

This idea is so general that almost every online-DDL capability in PostgreSQL (provided via extensions) works this way.
details
Although the idea is simple, there are many challenging problems, such as how to ensure there is no duplicated or lost data in either the existing part or the incremental part. The code-level details are shown below.

All 7 steps are driven through 2 connections; see the function `repack_one_table` for details:
- create a log table to record changes made to the original table
- add a trigger onto the original table, logging INSERTs, UPDATEs and DELETEs into our log table
    - `conn1` starts a transaction and acquires an advisory lock to prevent potential conflicts with other repack processes
    - `conn1` takes the ACCESS EXCLUSIVE lock on the original table, call it `tbl`
    - `conn1` creates the trigger on `tbl` and the corresponding log table where the incremental changes will be stored
    - (Just a comment: if we released the exclusive lock here, we might not be able to acquire a shared lock later if another process grabbed an exclusive lock in the interval, which would leave us no way to either continue or revert what we have done. So we must hold a lock on `tbl` during the whole process. 👌)
    - `conn2` tries to acquire an ACCESS SHARE lock on `tbl`. Since `conn1`'s transaction hasn't finished, this acquisition blocks.
    - `conn1` kills all connections that try to perform a DDL operation, recognizable because they are waiting for an ACCESS EXCLUSIVE lock. Then `conn1` commits.
    - Now `conn2` gets the ACCESS SHARE lock on `tbl`, which ensures that no other DDL can run on `tbl` ✌️ (a sketch of what the two connections roughly execute follows)
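To make the dance concrete, here is a minimal sketch of what the two connections might execute in steps 1–2. All names here (`repack_log`, `repack_log_fn`, `z_repack_trigger`, the demo `tbl`, the advisory-lock key `42`) are illustrative assumptions, not pg_repack's real generated DDL:

```sql
-- Demo stand-in for the original relation.
CREATE TABLE tbl (id int PRIMARY KEY, payload text);

-- conn1: serialize concurrent repack runs (key 42 is arbitrary here).
SELECT pg_advisory_lock(42);

-- conn1: take the exclusive lock and install the change-capture machinery.
BEGIN;
LOCK TABLE tbl IN ACCESS EXCLUSIVE MODE;

-- Hypothetical log-table shape: one row per change, replayed in id order.
CREATE TABLE repack_log (
    id      bigserial PRIMARY KEY,
    pk      int,   -- old primary key (NULL for INSERT)
    new_row tbl    -- new row image   (NULL for DELETE)
);

CREATE FUNCTION repack_log_fn() RETURNS trigger LANGUAGE plpgsql AS $$
BEGIN
    IF TG_OP = 'DELETE' THEN
        INSERT INTO repack_log(pk, new_row) VALUES (OLD.id, NULL);
    ELSIF TG_OP = 'UPDATE' THEN
        INSERT INTO repack_log(pk, new_row) VALUES (OLD.id, NEW);
    ELSE  -- INSERT
        INSERT INTO repack_log(pk, new_row) VALUES (NULL, NEW);
    END IF;
    RETURN NULL;
END $$;

CREATE TRIGGER z_repack_trigger
    AFTER INSERT OR UPDATE OR DELETE ON tbl
    FOR EACH ROW EXECUTE FUNCTION repack_log_fn();

-- conn2, in parallel, in its own transaction:
--   BEGIN; LOCK TABLE tbl IN ACCESS SHARE MODE;  -- blocks until conn1 commits

-- conn1: terminate backends queued for ACCESS EXCLUSIVE on tbl, so no DDL
-- sneaks in between our COMMIT and conn2's ACCESS SHARE being granted.
SELECT pg_terminate_backend(l.pid)
  FROM pg_locks AS l
 WHERE l.relation = 'tbl'::regclass
   AND NOT l.granted
   AND l.mode = 'AccessExclusiveLock';

COMMIT;  -- conn2's ACCESS SHARE is granted now; DDL on tbl stays blocked
```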
- create a new table containing all the rows in the old table
    - `conn1` begins a SERIALIZABLE transaction (REPEATABLE READ, at least)
    - `conn1` gets the `vxids` of the currently active transactions
    - `conn1` deletes all data in the log table under the current snapshot (a DELETE, not a TRUNCATE, so rows invisible to the snapshot survive). This is a very skillful technique:
        - This table shows the secret:

          |           | `tbl`            | log table        |
          | --------- | ---------------- | ---------------- |
          | visible   | existing data    | empty            |
          | invisible | incremental data | incremental data |

        - All existing data is visible in `tbl` through the current snapshot
        - All incremental data is invisible in the log table and in `tbl` (the latter doesn't matter)
        - So there is no lost or duplicated data
    - `conn1` copies all data in `tbl` to a temp table, `tbl-tmp` for example
    - `conn1` commits (a sketch follows)
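A sketch of step 3, continuing the hypothetical names from the previous sketch (`tbl_tmp` stands in for `tbl-tmp`); pg_repack's real vxid query and CREATE TABLE options are more elaborate:

```sql
-- conn1: freeze a snapshot and copy the existing data under it.
BEGIN ISOLATION LEVEL SERIALIZABLE;

-- Every live transaction holds a lock on its own virtualxid, so this
-- lists the in-flight transactions the replay loop must wait for.
SELECT array_agg(virtualtransaction)
  FROM pg_locks
 WHERE locktype = 'virtualxid'
   AND pid <> pg_backend_pid();

-- DELETE, not TRUNCATE: only log rows visible to our snapshot vanish.
-- Changes from still-running transactions stay in the log, invisible
-- here, so each change is replayed exactly once.
DELETE FROM repack_log;

-- Copy every row visible to the snapshot into the new table.
CREATE TABLE tbl_tmp AS SELECT * FROM ONLY tbl;

COMMIT;
```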
- build indexes on this new table. (I don't care about this.)
- apply all changes which have accrued in the log table to the new table
    - `conn1` applies at most 1000 records from the log table per batch, repeating until:
        - only a few records remain, AND
        - all transactions in `vxids` have finished. This wait is to keep the ISOLATION, but it still has some edge cases. #TODO
    - (Just a comment: now we believe there are only a few records left in the log table.)
    - `conn2` acquires the ACCESS EXCLUSIVE lock. Note that no other process can do that.
    - `conn2` applies all remaining data in the log table (a sketch of one replay batch follows)
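A plain plpgsql sketch of one replay batch over the hypothetical shapes above; pg_repack does this in a C function driven from the client, and replays an UPDATE as its (old pk, new row) pair, imitated here as DELETE + INSERT. Strict `id` order matters: an INSERT followed by a DELETE of the same row must not be reordered.

```sql
-- Apply up to 1000 logged changes onto tbl_tmp, strictly in log order.
DO $$
DECLARE
    r repack_log%ROWTYPE;
BEGIN
    FOR r IN SELECT * FROM repack_log ORDER BY id LIMIT 1000 LOOP
        IF r.pk IS NOT NULL THEN       -- DELETE, or the old half of an UPDATE
            DELETE FROM tbl_tmp WHERE id = r.pk;
        END IF;
        IF r.new_row IS NOT NULL THEN  -- INSERT, or the new half of an UPDATE
            INSERT INTO tbl_tmp SELECT (r.new_row).*;
        END IF;
        DELETE FROM repack_log WHERE id = r.id;  -- drop the replayed entry
    END LOOP;
END $$;

-- conn1 repeats this while checking that every vxid recorded earlier has
-- disappeared from pg_locks; once few records remain, conn2 takes the
-- ACCESS EXCLUSIVE lock and runs a final batch to drain the log completely.
```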
- swap the tables, including indexes and toast tables, using the system catalogs
    - `conn2` swaps `relfilenode` between `tbl-tmp` and `tbl`
    - `conn2` commits
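The swap itself happens in C by rewriting the catalogs directly; from SQL you can only watch `relfilenode` change hands. The `repack.repack_swap` call shape below is indicative only, not a complete invocation:

```sql
-- The table keeps its OID and name, but its physical storage
-- (relfilenode) is exchanged with the freshly built copy.
SELECT oid, relname, relfilenode
  FROM pg_class
 WHERE relname IN ('tbl', 'tbl_tmp');

-- conn2, holding ACCESS EXCLUSIVE, lets pg_repack's C function flip the
-- relfilenodes (plus toast and index entries) in the catalogs, roughly:
--   SELECT repack.repack_swap('tbl'::regclass);
COMMIT;
```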
- drop the original table
    - `conn1` drops the current `tbl-tmp` (after the swap it holds the old data)
    - `conn1` analyzes the current `tbl`
    - `conn1` releases the advisory lock
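And the cleanup, continuing the sketch's names and advisory-lock key:

```sql
-- conn1: final cleanup after the swap.
DROP TABLE tbl_tmp;
ANALYZE tbl;                    -- fresh planner statistics for the new storage
SELECT pg_advisory_unlock(42);  -- the key taken at the very beginning
```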