Revisiting pg_dump data ordering
Many, MANY years ago - more than I would like to admit - I was one of the people answering this particular Stack Overflow question about the order of data in the output of pg_dump.
In my answer I was pointing out then-recent patch to pg_dump that was providing the --ordered
param that would cause
dumping of the table data in a deterministic order, instead of reflecting the
current on-disk structure of the table.
That patch did not gain enough traction and was not accepted into the code base. And as far as I can see, no addition of this kind was made since then.
I'm revisiting this topic today by accident. But, by a funny coincidence,
tomorrow, to the day, is the anniversary of that Stack Overflow answer, and, by
another coincidence, I could have recalled this just a few weeks ago, when I
was testing PostgreSQL instance migration (and comparing two pg_dump
outputs
- dealing with this exact ordering issue).
So just as a quick reminder and a demonstration - if we run this...
psql -tAc "select version()"
# Enter test data
psql -c "drop table if exists test; create table test (a int); insert into test (a) values (1), (2)"
# Compare two dumps
echo "-- diff 1 --"
diff <(pg_dump) <(pg_dump)
# Compare two dumps after updating the first tuple
BEFORE="$(pg_dump)"
psql -c "update test set a = 1 where a = 1"
echo "-- diff 2 --"
diff -c1 <(echo "$BEFORE") <(pg_dump)
... the result will be:
PostgreSQL 17.5 on x86_64-pc-linux-gnu, compiled by gcc (Debian 12.2.0-14+deb12u1) 12.2.0, 64-bit
DROP TABLE
CREATE TABLE
INSERT 0 2
-- diff 1 --
UPDATE 1
-- diff 2 --
*** /dev/fd/63 Sat Aug 9 13:57:57 2025
--- /dev/fd/62 Sat Aug 9 13:57:57 2025
***************
*** 39,42 ****
COPY public.test (a) FROM stdin;
- 1
2
\.
--- 39,42 ----
COPY public.test (a) FROM stdin;
2
+ 1
\.
***************
*** 47 ****
--- 47,48 ----
--
+
The UPDATE of the row with "1" caused its tuple to now be located after the one
with row containing "2", and our pg_dump
output changed accordingly.
As a side note: here we also see that an UPDATE
that does not change the actual
row values is still generating a new physical tuple. This is worth knowing on
its own.
For my recent use case, comparing pg_dump
outputs was just an extra
sanity check of the migration process, so simply sorting the output files
themselves, and getting empty diff as a result, was good enough.