pg_repack
principle
pg_repack 1.5.0 – Reorganize tables in PostgreSQL databases with minimal locks
https://github.com/reorg/pg_repack
- create a log table to record changes made to the original table
- add a trigger onto the original table, logging INSERTs, UPDATEs and DELETEs into our log table
- create a new table containing all the rows in the old table
- build indexes on this new table
- apply all changes which have accrued in the log table to the new table
- swap the tables, including indexes and toast tables, using the system catalogs
- drop the original table
The basic idea is
- transport the existent data with an old snapshot
- record the incremental data into a log table, then replay the records
This idea is so general that almost every online-DDL capability in PostgreSQL (provided through extensions) takes the same approach.
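To make the idea concrete before the details, here is its shape on a toy table. Everything here is invented for illustration (`tbl`, `tbl_tmp`), not pg_repack's actual SQL:

```sql
-- record the incremental data: a trigger writes every change on tbl
-- into a log table (sketched in the "details" section below)

-- transport the existent data with an old snapshot
BEGIN ISOLATION LEVEL REPEATABLE READ;
CREATE TABLE tbl_tmp AS SELECT * FROM tbl;
COMMIT;

-- finally: replay the log onto tbl_tmp, swap the two tables, drop the old one
```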
details
Although the idea is simple, there are many challenges in practice, such as ensuring that no data is duplicated or lost across the existent part and the incremental part. The code-level details are shown below.
All 7 steps are driven through 2 connections; see the function repack_one_table for details:
steps 1 & 2: create the log table and add the trigger onto the original table
- `conn1` starts a transaction and acquires an advisory lock to prevent potential conflicts with other repack processes
- `conn1` gets the ACCESS EXCLUSIVE lock on the original table, `tbl` for example
- `conn1` creates the trigger on `tbl` and the corresponding `log-table` where the incremental changes will be stored
- (Just a comment: if we released the exclusive lock here, we might be unable to acquire a share lock later, because another process could take an exclusive lock in the interval; that would leave us no way either to continue or to revert what we have done. So we must hold a lock during the whole process. 👌)
- `conn2` tries to acquire the ACCESS SHARE lock on `tbl`. Since `conn1`'s transaction hasn't finished, this lock acquisition blocks
- `conn1` kills all connections that are trying to perform a DDL operation, recognizable because they are waiting for an ACCESS EXCLUSIVE lock. Then `conn1` commits
- now `conn2` gets the ACCESS SHARE lock on `tbl`, which ensures that no other DDL can run on `tbl` ✌️ (sketched below)
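A minimal sketch of these two steps in plain SQL. Loud assumptions: the names (`tbl`, `log_tbl`, `log_changes`, the advisory-lock key) are invented for illustration; real pg_repack generates names like `repack.log_<oid>` and logs through a C trigger function, not plpgsql.

```sql
-- toy table standing in for the one being repacked
CREATE TABLE tbl (id int PRIMARY KEY, val text);

-- conn1: serialize repack processes on this table (illustrative key)
SELECT pg_advisory_lock('tbl'::regclass::oid::bigint);

BEGIN;
LOCK TABLE tbl IN ACCESS EXCLUSIVE MODE;

-- the log table: one row per change, replayed later in id order
CREATE TABLE log_tbl (
    id   bigserial PRIMARY KEY,
    pk   int,   -- old primary key; NULL for an INSERT
    data tbl    -- new row image;   NULL for a DELETE
);

-- hypothetical plpgsql stand-in for pg_repack's C trigger function
CREATE FUNCTION log_changes() RETURNS trigger AS $$
BEGIN
    IF TG_OP = 'INSERT' THEN
        INSERT INTO log_tbl (pk, data) VALUES (NULL, NEW);
    ELSIF TG_OP = 'UPDATE' THEN
        INSERT INTO log_tbl (pk, data) VALUES (OLD.id, NEW);
    ELSE -- DELETE
        INSERT INTO log_tbl (pk, data) VALUES (OLD.id, NULL);
    END IF;
    RETURN NULL; -- return value of an AFTER row trigger is ignored
END $$ LANGUAGE plpgsql;

CREATE TRIGGER repack_trigger
AFTER INSERT OR UPDATE OR DELETE ON tbl
FOR EACH ROW EXECUTE FUNCTION log_changes();

COMMIT; -- conn2's blocked "LOCK TABLE tbl IN ACCESS SHARE MODE" now succeeds
```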
step 3: create a new table containing all the rows in the old table
- `conn1` begins a SERIALIZABLE transaction (REPEATABLE READ would suffice, at least)
- `conn1` gets the `vxids` of the currently active transactions
- `conn1` deletes all data in the `log-table` with the current snapshot (a plain DELETE rather than a TRUNCATE, so only rows visible to the snapshot are removed). This is a very skillful technique; the table shows the secret:

  |           | tbl              | log table        |
  |-----------|------------------|------------------|
  | visible   | existent data    | empty            |
  | invisible | incremental data | incremental data |

  - all existent data is visible in `tbl` through the current snapshot
  - all incremental data is invisible, both in the `log-table` and in `tbl` (the latter isn't important)
  - so there is no lost or duplicated data
- `conn1` copies all data in `tbl` to a temp table, `tbl-tmp` for example
- `conn1` commits (sketched below)
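A sketch of step 3, continuing the toy names from above; the vxid query is a simplification of the one pg_repack really runs:

```sql
-- conn1
BEGIN ISOLATION LEVEL SERIALIZABLE; -- freeze the snapshot for this step

-- remember which transactions were already running: their log rows are
-- invisible to our snapshot, so the replay must wait for them to finish
SELECT array_agg(virtualtransaction)
FROM pg_locks
WHERE locktype = 'virtualxid' AND pid <> pg_backend_pid();

-- MVCC DELETE, not TRUNCATE: only the log rows visible to our snapshot
-- disappear, i.e. the changes already contained in the data copied below;
-- rows written by still-running transactions survive for the replay
DELETE FROM log_tbl;

-- copy the existent data exactly as the snapshot sees it
CREATE TABLE tbl_tmp AS SELECT * FROM tbl;

COMMIT;
```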
step 4: build indexes on this new table
- (I don't care about this part.)
step 5: apply all changes which have accrued in the log table to the new table
- `conn1` applies at most 1000 records from the `log-table` per round, repeating until
  - only a few records remain, AND
  - all transactions in `vxids` have finished. This wait is needed to keep the isolation guarantee, but it still has some corner cases. #TODO
- (Just a comment: now we believe that only a few records are left in the `log-table`.)
- `conn2` acquires the ACCESS EXCLUSIVE lock. Note that no other process can beat it to that lock: a competing request would be blocked by the ACCESS SHARE lock `conn2` already holds
- `conn2` applies all remaining data in the `log-table` (a stand-in for the replay is sketched below)
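The replay can be pictured with the following plpgsql stand-in for pg_repack's C function repack.repack_apply; the real thing builds the column lists dynamically, this sketch hard-codes the toy columns:

```sql
-- conn1: replay up to 1000 logged changes, strictly in log order
DO $$
DECLARE
    r record;
BEGIN
    FOR r IN SELECT * FROM log_tbl ORDER BY id LIMIT 1000 LOOP
        IF r.pk IS NULL THEN                              -- logged INSERT
            INSERT INTO tbl_tmp SELECT (r.data).id, (r.data).val;
        ELSIF r.data IS NULL THEN                         -- logged DELETE
            DELETE FROM tbl_tmp WHERE id = r.pk;
        ELSE                                              -- logged UPDATE
            DELETE FROM tbl_tmp WHERE id = r.pk;
            INSERT INTO tbl_tmp SELECT (r.data).id, (r.data).val;
        END IF;
        DELETE FROM log_tbl WHERE id = r.id;              -- consume the row
    END LOOP;
END $$;
-- repeated until only a few rows remain in log_tbl AND every vxid
-- recorded in step 3 has finished
```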
step 6: swap the tables, including indexes and toast tables, using the system catalogs
- `conn2` swaps the `relfilenode` of `tbl-tmp` and `tbl` (sketched below)
- `conn2` commits
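The swap itself runs inside the extension: repack.repack_swap(oid) is the SQL-callable C function that edits the catalogs. The surrounding statements here are simplified:

```sql
-- conn2 (already holding the ACCESS SHARE lock from step 2)
BEGIN;
LOCK TABLE tbl IN ACCESS EXCLUSIVE MODE; -- writers are now shut out
-- ...the final replay of the few remaining log rows happens here...
SELECT repack.repack_swap('tbl'::regclass); -- swaps the relfilenodes of
                                            -- heap, toast and indexes
COMMIT;
```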
step 7: drop the original table
- `conn1` drops the current `tbl-tmp`
- `conn1` analyzes the current `tbl`
- `conn1` releases the advisory lock (sketched below)
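And the cleanup, again with the toy names; pg_repack does this through its repack.repack_drop function, which also removes the log table and the trigger:

```sql
-- conn1
DROP TABLE tbl_tmp; -- after the swap this name points at the *old* data
ANALYZE tbl;        -- fresh statistics for the rewritten table
SELECT pg_advisory_unlock('tbl'::regclass::oid::bigint);
```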
written by mobilephone724