Hands-on with PostgreSQL Authorization, Part 2.5: How To Benchmark PostgreSQL Queries Well

There are different approaches to timing queries in PostgreSQL. We'll discuss each and the implications that come with them.

Greg Schaberg

,

Senior Infrastructure and Web

Programming Insights

May 18, 2022

As I was digging into writing the third article of my series on PostgreSQL Authorization (part 1, part 2), I realized that, in order to measure the performance impacts of different Row-Level Security policies, I first needed to understand how to accurately time queries. This article will be a short detour from the aforementioned series — we’ll explore and discuss different approaches to timing queries in PostgreSQL and mention important implications to be aware of when using these different techniques.

I have a few goals and disclaimers to outline before we dive in:

I want to measure time taken by a single SELECT query. I’m not looking to measure system throughput or real-world load scenarios (i.e. I won’t consider update/insert/delete performance, locking, contention, etc.).
I want to measure overall query duration — not just parsing or planning or execution.
I would prefer to have percentile or histogram results, not just an average.
I’m aiming to learn how to compare one query to another query to determine which is faster, so whatever tool we use for timing queries shouldn’t introduce systematic error that might mislead us about which query is faster. This partly comes down to testing methodology, which we’ll discuss in more detail soon.

Hands-On With an Example Schema

Of course, we need to actually try things out to measure performance! You can follow along (and let me know if you see different results!) using Docker and the repo at https://gitlab.com/tangram-vision/oss/tangram-visions-blog/-/tree/main/2022.05.17_HowToBenchmarkPostgreSQLQueriesWell

As described in the repo's README, you can run the command below, which uses the official Postgres Docker image to run a PostgreSQL database locally. The first volume mount will load the schema.sql file, which will create a simple table and fill it with 10 million rows. The -c config commands to postgres will make the pg_stat_statements extension available.

docker run --name=postgres \
    --rm \
    --volume=$(pwd)/schema.sql:/docker-entrypoint-initdb.d/schema.sql \<br> &nbsp; &nbsp;--volume=$(pwd):/repo \
    --env=PSQLRC=/repo/.psqlrc \
    --env=POSTGRES_PASSWORD=foo \
    postgres:latest \
    -c shared_preload_libraries='pg_stat_statements' \
    -c pg_stat_statements.track_planning

To open a psql prompt in that Docker container, run the following in another terminal:

docker exec --interactive --tty \
    --user=postgres \
    --workdir=/repo \
    postgres psql --username

General Benchmarking Concerns

When measuring database/query performance, there are many things that can introduce systematic error or produce low-confidence results. Let’s talk about some!

If possible, avoid connections and transferring data over the network. Even traffic on the loopback interface must travel through the network stack and suffer its associated copying and context switches, which will add noise to measurements. Prefer a UNIX domain socket, which is faster/simpler than TCP.
Database performance depends heavily on caching. Avoid comparing cold caches and cache misses against warm caches and cache hits. Run the query a few times before measuring runtime or throw away a few measurement outliers or use a percentile statistic rather than a mean or maximum. You can enable the track_io_timing setting in PostgreSQL and use EXPLAIN with the BUFFERS option to see if a query is reading from disk.
On your test machine, limit background activity such as open applications, heavy cronjobs, or antivirus scans. I ran a ~1.5 second query 100 times on my machine with background apps (slack, firefox, chrome, and vscode) open and with them closed — see the histogram below for a comparison, which shows that having extra apps open added considerable noise to query durations.

Use large enough data that queries take some time to finish. Also, run tests many times. Comparing queries that take <1 ms by running each of them 5 times will not give conclusive results — the measurements will be dominated by noise or happenstance error.
Make sure the query planner has up-to-date statistics by running the ANALYZE command before running tests.
If your tests involve deleting rows, be consistent about vacuuming, or removing deleted rows from the table. If a table is auto-vacuumed in the middle of a test, that test could be thrown off significantly by the extra I/O caused by vacuuming, and all subsequent tests will operate on a database table with a different disk layout and size.

Query Timing Approaches

There are a lot of different ways to time queries with PostgreSQL! I’ve picked a few different tools and approaches to demonstrate and discuss.

In the rest of this article, I’ll test different methods of timing queries. For all of these tests, I’ll be running 2 queries on a table called “artists”:

SELECT * FROM artists
SELECT count(1) FROM artists

The artists table contains 10 million rows. Each row is a 4-byte integer primary key column and a 13-byte text column (timings seem similar whether the text column uses EXTENDED or PLAIN storage). I run each query 5 times to warm the cache (ignoring the results) and then 100 times for the actual test.

psql \timing

A common first foray into timing queries is using psql and its \timing option. Here’s what it looks like:

-- Query output is not shown for brevity
=# \timing
Timing is on.
=# SELECT * FROM artists;
Time: 1550.342 ms (00:01.550)
=# SELECT count(1) FROM artists;
Time: 175.526

Simple! But note that the \timing option is a psql option and runs the timer in psql. That means the output time includes network latency (from the PostgreSQL server process to the psql client), which can vary a lot and throw off our comparisons. Network latency and speed are important concerns in real-world scenarios, but we don’t want spurious network-related noise throwing off our comparison of which versions of a query execute faster in PostgreSQL. If you’re connecting over a UNIX domain socket (as we are with the Docker approach described above), then network noise/overhead is less of a concern. Nevertheless, we’ll generally prefer other approaches in this article, most of which measure query duration server-side.

EXPLAIN ANALYZE

EXPLAIN is a command that explains how PostgreSQL will execute a query. It has lots of useful options, such as:

ANALYZE — actually runs the query and reports some stats
SETTINGS — shows non-default settings that affect the query
BUFFERS — shows caching info such as hits and misses and how much data involved disk I/O
TIMING — shows time spent in each step of the query

There’s a lot written about EXPLAIN around the internet, so I won’t rehash the details of the command. If you want to learn more, check out the docs about the command and using EXPLAIN and depesz’s series on understanding EXPLAIN output.

For our purposes, we’ll use EXPLAIN ANALYZE to run the query and tell us how long it took.

=# EXPLAIN ANALYZE SELECT * FROM artists;
                                                      QUERY PLAN                                                       
-----------------------------------------------------------------------------------------------------------------------
 Seq Scan on artists  (cost=0.00..163695.00 rows=10000000 width=17) (actual time=0.044..486.527 rows=10000000 loops=1)
 Planning Time: 0.079 ms
 Execution Time: 705.450 ms
(3 rows)

=# EXPLAIN ANALYZE SELECT count(1) FROM artists;
                                                                  QUERY PLAN
-----------------------------------------------------------------------------------------------------------------------------------------------
 Finalize Aggregate  (cost=116778.55..116778.56 rows=1 width=8) (actual time=298.815..302.724 rows=1 loops=1)
   ->  Gather  (cost=116778.33..116778.54 rows=2 width=8) (actual time=298.721..302.716 rows=3 loops=1)
         Workers Planned: 2
         Workers Launched: 2
         ->  Partial Aggregate  (cost=115778.33..115778.34 rows=1 width=8) (actual time=280.831..280.832 rows=1 loops=3)
               ->  Parallel Seq Scan on artists  (cost=0.00..105361.67 rows=4166667 width=0) (actual time=0.024..162.195 rows=3333333 loops=3)
 Planning Time: 0.073 ms
 JIT:
   Functions: 8
   Options: Inlining false, Optimization false, Expressions true, Deforming true
   Timing: Generation 0.866 ms, Inlining 0.000 ms, Optimization 0.337 ms, Emission 5.465 ms, Total 6.668 ms
 Execution Time: 303.638 ms
(12 rows)

Lots going on here! We see that the command provides an “Execution Time” measurement toward the bottom, which we could use for benchmarking. Comparing these times to the ones we got in the previous section using \timing is curious:

Query	SELECT * FROM artists	SELECT count(1) FROM artists
psql \timing	1550.342	175.526
EXPLAIN ANALYZE	705.450	303.638

‍We know that psql’s \timing includes network-related overhead, so we could claim that as the reason that EXPLAIN ANALYZE is faster for the first query. But, EXPLAIN ANALYZE is slower for the second query! Clearly, network-related overhead isn’t the whole story.

You may notice that the EXPLAIN ANALYZE output includes “actual time” measurements for every node. How does it do that? We might expect that a node tracks the time when it starts running and when it stops, and then it reports the difference as the time in that node. But, PostgreSQL doesn’t work like that — it follows the Volcano model, meaning that each node produces a stream of rows and its parent node pulls rows as needed. That’s how a query like SELECT * FROM artists LIMIT 5 is efficient: the LIMIT node pulls 5 rows from its child node (a sequential scan of the table) and the query is complete — the sequential scan doesn’t scan the entire table before returning results to the LIMIT operation. This means that the “actual time” measurements for a node are not from timing the entire node (because the node doesn’t know when or if it might need to produce more rows), they’re from timing how long it takes the node to produce each row. So, clock_gettime is called before and after producing each row (i.e. twice per row) for each node in the query plan, which you can confirm with a command like sudo ltrace -p <postgres_pid> -c -e clock_gettime (thanks to this article for the tip). A flamegraph also does a good job of visualizing how much time is spent calling clock_gettime .

Perf flamegraph of executing `EXPLAIN ANALYZE SELECT * FROM artists` .

The PostgreSQL docs warn about this timing overhead, and even provide a tool for profiling the overhead of your particular system: pg_test_timing. Here are the results on my machine, which show relatively low timing overhead:

$ pg_test_timing 
Testing timing overhead for 3 seconds.
Per loop time including overhead: 23.40 ns
Histogram of timing durations:
  < us   % of total      count
     1     97.66239  125227359
     2      2.33714    2996796
     4      0.00021        272
     8      0.00014        180
    16      0.00011        146
    32      0.00000          6
    64      0.00000          0
   128      0.00000          1

Different time sources have different overhead, so EXPLAIN ANALYZE timing results can vary a lot across operating systems and virtualization setups. Luckily, we can get more consistent results by asking PostgreSQL to skip timing all the query nodes. To do so, we turn off the TIMING option:

=# EXPLAIN (ANALYZE, TIMING FALSE) SELECT * FROM artists;
-- Query plan output is not shown for brevity
 Execution Time: 385.025 ms

=# EXPLAIN (ANALYZE, TIMING FALSE) SELECT count(1) FROM artists;
-- Query plan output is not shown for brevity
 Execution Time: 198.411

Now our table comparing query timings looks as follows:

Query	SELECT * FROM artists	SELECT count(1) FROM artists
psql \timing	1550.342	175.526
EXPLAIN ANALYZE	705.450	303.638
EXPLAIN (ANALYZE, TIMING FALSE)	385.025	198.411

Disabling TIMING cut execution time of the first query almost in half, which matches what we would expect based on how much of the above flamegraph was taken up by calls to clock_gettime.

Logging Options

PostgreSQL provides a few different config options that log query duration. These are useful settings if you want to track or analyze query durations for an active database, but we can also use them to benchmark in a local development database. These approaches time the overall query (not each and every node in the query plan like EXPLAIN ANALYZE does!), so timing overhead should be minimal — we should get accurate timings that let us compare queries’ performance.

log_duration

To log the duration of all queries with the log_duration option, set log_duration = on in postgresql.conf or in your psql session. Example usage and output follows:

-- Reminder: Lines starting with =# are commands entered into a psql prompt.
=# SET log_duration = on;
=# SELECT * FROM artists;

-- In postgres process stdout/log:
2022-05-01 18:56:57.418 UTC [82] LOG:  duration: 1642.956 ms

=# SELECT count(1) FROM artists;

-- In postgres process stdout/log:
2022-05-01 18:57:14.470 UTC [82] LOG:  duration: 194.590

💡 If you want to see database log messages inline in your psql process (rather than having to look at the postgres process’s stdout/log, you can change the client_min_messages setting to LOG by running SET client_min_messages = LOG. However, all the log messages seem to be accompanied by lines like LOCATION: exec_simple_query, postgres.c:1297, so it’s a bit messier to read than the database log.

log_min_duration_statement

Use the log_min_duration_statement option to log queries that take longer than some threshold (say, 250 ms). To use this option, set log_min_duration_statement = 250 in postgresql.conf or in your psql session. By setting the threshold to zero, the duration of all queries will be logged (similar to using log_duration). However, log_duration only logs the query duration, whereas log_min_duration_statement will also log the text of the query.

=# SET log_min_duration_statement = 0;
=# SELECT * FROM artists;

-- In postgres process stdout/log:
2022-05-01 19:08:44.395 UTC [82] LOG:  duration: 1634.927 ms  statement: SELECT * FROM artists;

=# SELECT count(1) FROM artists;

-- In postgres process stdout/log:
2022-05-01 19:08:56.390 UTC [82] LOG:  duration: 191.741 ms  statement: SELECT count(1) FROM

To disable log_min_duration_statement, set it to -1.

log_statement_stats

If you want more detailed information about query duration, memory usage, page faults, context switches, etc. then try out the log_statement_stats options, of which there are a few variations:

log_statement_stats — Log stats for the query overall (can’t be enabled if any of the following options are enabled)
log_parser_stats — Log stats for the query parsing stage
log_planner_stats — Log stats for the query planning stage
log_executor_stats — Log stats for the query execution stage

The last 3 options can all be enabled together. Below is an example of using log_statement_stats.

=# SET log_statement_stats = on;
=# SELECT * FROM artists;

-- In postgres process stdout/log:
2022-05-01 19:47:05.003 UTC [79] LOG:  QUERY STATISTICS
2022-05-01 19:47:05.003 UTC [79] DETAIL:  ! system usage stats:
    !   1.447567 s user, 0.107958 s system, 1.573427 s elapsed
    !   [182.988793 s user, 16.027573 s system total]
    !   187488 kB max resident size
    !   0/0 [0/0] filesystem blocks in/out
    !   0/0 [10892/6708] page faults/reclaims, 0 [0] swaps
    !   0 [0] signals rcvd, 0/0 [0/0] messages rcvd/sent
    !   4/145 [2251/18917] voluntary/involuntary context switches
2022-05-01 19:47:05.003 UTC [79] STATEMENT:  select * from artists;

=# SELECT count(1) FROM artists;

-- In postgres process stdout/log:
2022-05-01 19:54:26.231 UTC [79] LOG:  QUERY STATISTICS
2022-05-01 19:54:26.231 UTC [79] DETAIL:  ! system usage stats:
    !   0.154934 s user, 0.015904 s system, 0.175565 s elapsed
    !   [199.261753 s user, 17.894312 s system total]
    !   187516 kB max resident size
    !   0/0 [0/0] filesystem blocks in/out
    !   0/10 [10893/7761] page faults/reclaims, 0 [0] swaps
    !   0 [0] signals rcvd, 0/0 [0/0] messages rcvd/sent
    !   5/28 [3132/20791] voluntary/involuntary context switches
2022-05-01 19:54:26.231 UTC [79] STATEMENT:  select count(1) from

Note that the stats inside square brackets are cumulative for the session.

pg_stat_statements

The pg_stat_statements module tracks statistics for all queries run by the PostgreSQL server. It must be loaded by setting shared_preload_libraries='pg_stat_statements' on the command line or in postgresql.conf. Then, run the SQL statement CREATE EXTENSION pg_stat_statements and you’ll have a pg_stat_statements view containing rows like the below:

=# SELECT * FROM pg_stat_statements WHERE query = 'SELECT * FROM artists';
-[ RECORD 1 ]-------+----------------------
userid              | 10
dbid                | 13757
toplevel            | t
queryid             | 2059596454762915818
query               | SELECT * FROM artists
plans               | 100
total_plan_time     | 4.6373750000000005
min_plan_time       | 0.030738
max_plan_time       | 0.07399
mean_plan_time      | 0.04637375000000001
stddev_plan_time    | 0.0066957018263584574
calls               | 100
total_exec_time     | 158575.761936
min_exec_time       | 1570.0188819999998
max_exec_time       | 1624.1295579999999
mean_exec_time      | 1585.7576193599996
stddev_exec_time    | 10.636442294647289
rows                | 1000000000
shared_blks_hit     | 292800
shared_blks_read    | 6076700
shared_blks_dirtied | 0
shared_blks_written | 0
local_blks_hit      | 0
local_blks_read     | 0
local_blks_dirtied  | 0
local_blks_written  | 0
temp_blks_read      | 0
temp_blks_written   | 0
blk_read_time       | 0
blk_write_time      | 0
wal_records         | 0
wal_fpi             | 0
wal_bytes           | 0

-- Irrelevant columns omitted from below output for brevity
=# SELECT * FROM pg_stat_statements WHERE query = 'SELECT count($1) FROM artists';
-[ RECORD 1 ]-------+------------------------------
query               | SELECT count($1) FROM artists
plans               | 100
total_plan_time     | 4.789295
min_plan_time       | 0.035157
max_plan_time       | 0.056928
mean_plan_time      | 0.04789295
stddev_plan_time    | 0.0031252439596773894
calls               | 100
total_exec_time     | 17308.772363999997
min_exec_time       | 170.66001599999998
max_exec_time       | 176.27587
mean_exec_time      | 173.08772363999998
stddev_exec_time    | 1.3255601605010174
rows                | 100

💡 Enabling track_io_timing will populate the blk_read_time and blk_write_time columns.

Similar to the logging options, pg_stat_statements is a handy tool to enable on an active database, so you can keep an eye on which queries run slowly and/or are called often.

For the purpose of benchmarking queries, we can warm the cache, call SELECT pg_stat_statements_reset() to reset the stats, call our query repeatedly, then see the resulting timing stats in the pg_stat_statements view. The view is convenient, because we don’t have to parse log data and calculate all the stats ourselves (which I’ve been doing in a spreadsheet for all the other timing approaches so far!). But, we can’t produce percentile data or histograms when we only have aggregated values.

pg_bench

PostgreSQL has a dedicated benchmarking tool called pg_bench, which we can configure to test specific queries against each other. First, we write scripts containing the queries we want pg_bench to run — this is simple in our case of testing a single query at a time:

-- scripts/pg_bench_script.sql
SELECT * FROM

-- scripts/pg_bench_count_script.sql
SELECT count(1) FROM

Then, we call the pg_bench binary (pgbench) that’s included with PostgreSQL, passing command-line arguments to log all transactions and to run 100 transactions. (As far as I can tell, pg_bench uses the word “transaction” to simply mean “one execution of the script” — it doesn’t wrap the script in a database transaction.)

-- We use psql’s \! command to execute the pgbench binary in a shell,
-- rather than running a new docker command.

=# \! pgbench -f scripts/pg_bench_script.sql --log --transactions=100 --username=postgres postgres
pgbench (14.1 (Debian 14.1-1.pgdg110+1))
starting vacuum...pgbench: error: ERROR:  relation "pgbench_branches" does not exist
pgbench: (ignoring this error and continuing anyway)
pgbench: error: ERROR:  relation "pgbench_tellers" does not exist
pgbench: (ignoring this error and continuing anyway)
pgbench: error: ERROR:  relation "pgbench_history" does not exist
pgbench: (ignoring this error and continuing anyway)
end.
transaction type: scripts/pg_bench_script.sql
scaling factor: 1
query mode: simple
number of clients: 1
number of threads: 1
number of transactions per client: 100
number of transactions actually processed: 100/100
latency average = 1724.637 ms
initial connection time = 2.352 ms
tps = 0.579828 (without initial connection time)

=# \! pgbench -f scripts/pg_bench_count_script.sql --log --transactions=100 --username=postgres postgres
pgbench (14.1 (Debian 14.1-1.pgdg110+1))
-- Extra output omitted for brevity
latency average = 178.929 ms
initial connection time = 2.319 ms
tps = 5.588447 (without initial connection time)

You’ll notice some errors in the output — that’s because pg_bench expects to run against a preset schema based on the TPC-B benchmark (first established back in 1990 and marked obsolete in 1995!). Other than the output noise, pg_bench is perfectly happy to run whatever script you give it, and we can see that it ran our query 100 times and reported the average latency (how long it took to run the query).

Similar to pg_stat_statements, pg_bench provides already-computed stats. But, with the --log command-line option we provided, pg_bench writes the time taken for each transaction into a log file, and we can use those measurements to derive more stats on our own.

In the end, pg_bench is somewhat useful in our quest to benchmark and compare queries. But, it’s a better fit for load testing than query comparison — it lets you provide a mix of queries (you can simulate a realistic load of reads and writes), run many clients in parallel, target a specific transaction rate, and much more.

SQL Function with clock_timestamp

The final approach we’ll try is writing a PL/pgSQL function to compare clock_timestamp values before and after running our query. I’ve modified this StackOverflow answer to end up with the below function:

-- scripts/clock_timestamp_function.sql
CREATE OR REPLACE FUNCTION bench(query TEXT, iterations INTEGER = 100)
RETURNS TABLE(avg FLOAT, min FLOAT, q1 FLOAT, median FLOAT, q3 FLOAT, p95 FLOAT, max FLOAT) AS $$
DECLARE
  _start TIMESTAMPTZ;
  _end TIMESTAMPTZ;
  _delta DOUBLE PRECISION;
BEGIN
  CREATE TEMP TABLE IF NOT EXISTS _bench_results (
      elapsed DOUBLE PRECISION
  );

  -- Warm the cache
  FOR i IN 1..5 LOOP
    EXECUTE query;
  END LOOP;

  -- Run test and collect elapsed time into _bench_results table
  FOR i IN 1..iterations LOOP
    _start = clock_timestamp();
    EXECUTE query;
    _end = clock_timestamp();
    _delta = 1000 * ( extract(epoch from _end) - extract(epoch from _start) );
    INSERT INTO _bench_results VALUES (_delta);
  END LOOP;

  RETURN QUERY SELECT
    avg(elapsed),
    min(elapsed),
    percentile_cont(0.25) WITHIN GROUP (ORDER BY elapsed),
    percentile_cont(0.5) WITHIN GROUP (ORDER BY elapsed),
    percentile_cont(0.75) WITHIN GROUP (ORDER BY elapsed),
    percentile_cont(0.95) WITHIN GROUP (ORDER BY elapsed),
    max(elapsed)
    FROM _bench_results;
  DROP TABLE IF EXISTS _bench_results;

END
$$
LANGUAGE

We run the query many times, measuring the duration and storing results in a temporary table, which we use to calculate whatever statistics we want and return them to the function caller. Calling it looks like:

=# SELECT * FROM bench('SELECT * FROM artists');
    avg   |   min   |    q1     | median  |        q3         |    p95    |   max
----------+---------+-----------+---------+-------------------+-----------+---------
869.59934 | 868.069 | 869.15325 | 869.546 | 870.0382500000001 | 870.59285 | 872.134 

=# SELECT * FROM bench('SELECT count(1) FROM artists');
    avg    |   min   |    q1     |  median  |    q3    |   p95   |   max
-----------+---------+-----------+----------+----------+---------+---------
 177.44266 | 175.186 | 176.81625 | 177.1125 | 177.8295 | 180.107 | 183.201

This approach is flexible and customizable — we don’t have to parse out individual measurements or calculate statistics by hand, and we can include whatever statistics we care about. We could even make the function output a graphical histogram as described in these articles:

Conclusion

Whew! PostgreSQL has a lot of different ways to time queries and we didn’t even cover some other options, like pg_stat_monitor or dynamic tracing.

For fun, let’s compare our average query duration (in ms) for each query and each timing approach:

Query	SELECT * FROM artists	SELECT count(1) FROM artists
psql \timing	1550.342	175.526
EXPLAIN ANALYZE	705.450	303.638
EXPLAIN (ANALYZE, TIMING FALSE)	385.025	198.411
log_duration	1642.956	194.590
log_min_duration_statement	1634.927	191.741
log_statement_stats	1573.427	175.565
pg_stat_statements	1585.758	173.088
pg_bench	1724.638	178.929
function with clock_timestamp	869.599	177.443

‍Some concluding observations:

Despite taking precautions mentioned in the “General Benchmarking Concerns” section, there still appears to be quite a bit of noise! For example, we’d expect the client-side psql \timing approach to report longer query durations than approaches that measure time on the server, but these results show the opposite! This reinforces the message in the pg_bench documentation about getting reproducible benchmark results: “never believe any test that runs for only a few seconds… In some cases you could need hours to get numbers that are reproducible”.
In general, don’t compare query durations derived from different timing approaches — many use different timing mechanisms and start/stop timers in different locations in the code, so they’re not directly comparable.
The 3 logging approaches are pretty similar. In fact, they all end their timers and print results in the same location in the PostgreSQL code, though they use slightly different timer start locations and timing methods (on Linux: clock_gettime for log_duration and log_min_duration_statement vs getrusage for log_statement_stats).
The “function with clock_timestamp” and EXPLAIN approaches are much faster than all the other methods when a lot of data is returned to the database client, so there must be some allocation or serialization step that isn’t included in the times for these approaches.
Why is the “EXPLAIN (ANALYZE, TIMING FALSE)” approach so much faster than “function with clock_timestamp”? They both execute the query and (presumably) skip allocating or serializing the results. I didn’t get a chance to delve into this question, but I’d love to know what the difference is — let me know if you have some insight!

In future articles, I’ll use one of these logging approaches to measure and compare queries. It’s not as convenient as the “function with clock_timestamp” approach, but it seems more representative of real world performance, because it includes the time needed to return data for the client (an important part of what a database does!).

Thanks for reading! I hope you learned something new and will return for the next article in the PostgreSQL Authorization series. If you have any corrections to this article or suggestions/questions for future articles, please let me know or send us a tweet. If you’re curious to learn more about how we improve perception sensors, visit us at Tangram Vision.

‍—

Edits:

June 2, 2025: Removed hiring links.‍‍

❮

C++ & Rust: (Interior) Mutability, Moving and Ownership

❮

C++ & Rust: (Interior) Mutability, Moving and Ownership

C++ & Rust: Generics and Specialization

❯

C++ & Rust: Generics and Specialization

❯

Hands-on with PostgreSQL Authorization, Part 2.5: How To Benchmark PostgreSQL Queries Well

Greg Schaberg

,

Senior Infrastructure and Web

Hands-On With an Example Schema

General Benchmarking Concerns

Query Timing Approaches

psql \timing

EXPLAIN ANALYZE

Perf flamegraph of executing EXPLAIN ANALYZE SELECT * FROM artists .

Logging Options

log_duration

log_min_duration_statement

log_statement_stats

pg_stat_statements

pg_bench

SQL Function with clock_timestamp

Conclusion

Tangram Newsletter

PRODUCT

COMPANY

RESOURCES

Copyright 2025, Tangram Robotics Inc.

Tangram Newsletter

PRODUCT

COMPANY

RESOURCES

Copyright 2025, Tangram Robotics Inc.

Tangram Newsletter

PRODUCT

COMPANY

RESOURCES

Copyright 2025, Tangram Robotics Inc.

Perf flamegraph of executing `EXPLAIN ANALYZE SELECT * FROM artists` .