Zápisník Josefa Rouska

Python a React.js programátor, zaměřený na Shopify integrace

Speed up your reports using materialized views

We’re collecting anonymous customer events in our Shopify app into a single table. This table grew over time to sufficiently large volume, approximately 30 million, to make our reports take too much time for large customers. The first and easiest step to a better performance I could think of is to precompute the report. Materialized views seemed like a reasonable approach. So I went ahead and asked DB specialist if it’s a good idea.

Materialized views on PostgreSQL are a way to speed up your complex queries by saving the intermediate results. It’s a temporary table which can be refreshed with a single command.

I’ll walk you through the process using one of the queries as an example. This is a simplified version of what the query looked like.

SELECT
  count(distinct token) FILTER (where event_type = 'view') as views,
FROM
  bussiness_event
WHERE
  shop_id = %s
  and
  (
    %s is null or
    (
        created > %s::timestamptz
        and created < %s::timestamptz
    )
  )

In order to reduce number of rows we had to find a time unit that works for our use case. In our case we store metrics grouped by day, meaning we store daily reports for each shop.

CREATE MATERIALIZED VIEW analytics_summary AS
SELECT
  event.shop_id,
  event.created::date as event_date,
  count(distinct token) FILTER (where event_type = 'view') as views
FROM
  bussiness_event event
GROUP by
  event.shop_id,
  event.created::date
;

Then to use the view we could do this.

SELECT
  sum(views)::int as views
FROM
  analytics_summary
WHERE
  shop_id = %s
  and
  (
    %s is null or
    (
        event_date > %s::timestamptz
        and event_date < %s::timestamptz
    )
  )

The query is almost instant but the data are stale. To keep the report live we’ve used UNION ALL to include data for last day.

SELECT
  sum(views)::int as views
FROM (
    SELECT
      shop_id,
      created::date as event_date,
      count(distinct token) FILTER (where event_type = 'view') as views,
    FROM
      bussiness_event
    WHERE
      created >= now()::date - interval '2 days'
    GROUP BY 1,2

    UNION ALL

    SELECT
      shop_id,
      event_date::date,
      views
    FROM
      analytics_summary
    WHERE
      event_date < now()::date - interval '2 days'
) unioned
WHERE
  shop_id = %s
  and
  (
    %s is null or
    (
        event_date > %s::timestamptz
        and event_date < %s::timestamptz
    )
  )
GROUP BY shop_id

Are you wondering what does GROUP BY 1,2 do? Well, I did because I never saw this. It’s the equivalent of GROUP BY shop_id, created::date .

To update the view we have following query scheduled to run daily.

REFRESH MATERIALIZED VIEW analytics_offers;