#postgresql #pg_dump

Revisiting pg_dump data ordering

Many, MANY years ago - more than I would like to admit - I was one of the people answering this particular Stack Overflow question about the order of data in the output of pg_dump.

In my answer I was pointing out then-recent patch to pg_dump that was providing the --ordered param that would cause dumping of the table data in a deterministic order, instead of reflecting the current on-disk structure of the table.

That patch did not gain enough traction and was not accepted into the code base. And as far as I can see, no addition of this kind was made since then.

I'm revisiting this topic today by accident. But, by a funny coincidence, tomorrow, to the day, is the anniversary of that Stack Overflow answer, and, by another coincidence, I could have recalled this just a few weeks ago, when I was testing PostgreSQL instance migration (and comparing two pg_dump outputs - dealing with this exact ordering issue).

So just as a quick reminder and a demonstration - if we run this...

psql -tAc "select version()"

# Enter test data
psql -c "drop table if exists test; create table test (a int); insert into test (a) values (1), (2)"

# Compare two dumps
echo "-- diff 1 --"
diff <(pg_dump) <(pg_dump)

# Compare two dumps after updating the first tuple
BEFORE="$(pg_dump)"
psql -c "update test set a = 1 where a = 1"
echo "-- diff 2 --"
diff -c1  <(echo "$BEFORE") <(pg_dump)

... the result will be:

PostgreSQL 17.5 on x86_64-pc-linux-gnu, compiled by gcc (Debian 12.2.0-14+deb12u1) 12.2.0, 64-bit
DROP TABLE
CREATE TABLE
INSERT 0 2
-- diff 1 --
UPDATE 1
-- diff 2 --
*** /dev/fd/63  Sat Aug  9 13:57:57 2025
--- /dev/fd/62  Sat Aug  9 13:57:57 2025
***************
*** 39,42 ****
  COPY public.test (a) FROM stdin;
- 1
  2
  \.
--- 39,42 ----
  COPY public.test (a) FROM stdin;
  2
+ 1
  \.
***************
*** 47 ****
--- 47,48 ----
  --
+

The UPDATE of the row with "1" caused its tuple to now be located after the one with row containing "2", and our pg_dump output changed accordingly.

As a side note: here we also see that an UPDATE that does not change the actual row values is still generating a new physical tuple. This is worth knowing on its own.

For my recent use case, comparing pg_dump outputs was just an extra sanity check of the migration process, so simply sorting the output files themselves, and getting empty diff as a result, was good enough.