#postgresql #timezones #sql

PostgreSQL does not store time zone in `timestamp with time zone` data type

PostgreSQL has two variants of a type called "timestamp": timestamp with time zone and timestamp without time zone. At a first glance, it looks as if those two data types must differ in terms of what and how is being stored, but that's not exactly the case.

In PostgreSQL, with time zone variant doesn't physically store any time zone information. This fact is a key to understand how those two types do differ and what are their use cases. But that's for another story - here we focus on storage format aspect.

Documentation on Date/Time types has this to say about what we discuss here: both types have storge size of 8 bytes and for timestamp with time zone "the value is stored internally as UTC, and the originally stated or assumed time zone is not retained."

As this fact of not storing time zone anywhere is somewhat burried within the docs and is quite confusing, it's worth taking a look at how this is actually true. In fact, we will see not only how time zone information is lost at INSERT time, but also that instances of both variants of timestamp have exactly same physical representation inside the data page.

First, we will load a single row containing one without time zone and one with time zone timestamp value:

$ psql <<EOF
-- Create table
drop table if exists test;
create table test (
    tnotz   timestamp without time zone,
    ttz     timestamp with time zone
);

-- Insert actual data
insert into test (tnotz, ttz) values (
    '2000-01-01 01:00:00',
    '2000-01-01 01:00:00' at time zone 'utc'
);

-- Flush WAL so that table data file is up to date
checkpoint;
EOF
DROP TABLE
CREATE TABLE
INSERT 0 1
CHECKPOINT

And now let's take a peek inside the table data file:

$ psql -tAc "select version()"
PostgreSQL 17.4 on x86_64-pc-linux-gnu, compiled by gcc (Debian 12.2.0-14) 12.2.0, 64-bit

$ hexdump -C "$PGDATA/$(psql -tAc "SELECT pg_relation_filepath('test')")"
00000000  00 00 00 00 68 af b6 01  83 1d 00 00 1c 00 d8 1f  |....h...........|
00000010  00 20 04 20 00 00 00 00  d8 9f 50 00 00 00 00 00  |. . ......P.....|
00000020  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|
*
00001fd0  00 00 00 00 00 00 00 00  e8 02 00 00 00 00 00 00  |................|
00001fe0  00 00 00 00 00 00 00 00  01 00 02 00 00 08 18 00  |................|
00001ff0  00 a4 93 d6 00 00 00 00  00 a4 93 d6 00 00 00 00  |................|
00002000

First three rows above are the page header plus some of the subsequent empty space. After that we encounter our only table row. It starts with its own header and then we can see our two columns, both having exactly same physical representation on 8 bytes: 00 a4 93 d6 00 00 00 00. Both instances of this sequence are present, one after the other, on the last line of hexdump output.

Those 8 bytes represent the number of microseconds since Jan 1st 2000. Both timestamps that we've inserted earlier are pointing to exactly one hour past that moment, so what we see is simply the number of microseconds within one hour:

$ python -c "print(int.from_bytes(bytes.fromhex('00 a4 93 d6 00 00 00 00'), 'little') == 1 * 60 * 60 * 1_000_000)"
True

As you can see, both timestamps are stored in the same way - as the same epoch value. And the time zone is nowhere to be seen. The difference between those two data types lies in the read and write semantics - likely to be explored on some other occasion.