PostgreSQL 9.5 リリース : BRIN インデックスが気になる

こんにちは。松本です。

PostgreSQL 9.5 がリリースされましたね！

このバージョンアップによりいくつかの新機能が追加されましたが、中でも私が注目したいのは、アナウンスでビッグデータ向け新機能として紹介されている BRIN インデックスです。

BRIN Indexing: This new type of index supports creating tiny, but effective indexes for very large, "naturally ordered" tables. For example, tables containing logging data with billions of rows could be indexed and searched in 5% of the time required by standard BTree indexes.

"BRIN" は "Block Range Index" の略です。

PostgreSQL ではテーブルデータをブロックと呼ばれる単位で格納し、インデックスはレコードがどのブロックのどの場所に格納されているかを情報として持っています。BRIN インデックスでは、連続して並ぶ複数のブロックをまとめて扱い、「ブロック範囲」ごとにその範囲に含まれるレコードのカラム（式）最大値と最小値を保持します。この仕組みにより、数十億行レベルの巨大なデータへのアクセスを高速に行うことを可能にしています。

ということで、実際に BRIN を試してみる

まずはテーブル作成とレコード追加。

postgres=# CREATE TABLE brin_test_table (id serial, ts timestamp);
CREATE TABLE
postgres=# INSERT INTO brin_test_table (ts) SELECT generate_series(timestamp '2015-01-01 00:00:00', timestamp '2015-12-31 23:59:59', interval '1 seconds');

INSERT 0 31536000

postgres=# CREATE TABLE brin_test_table (id serial, ts timestamp);

CREATE TABLE

postgres=# INSERT INTO brin_test_table (ts) SELECT generate_series(timestamp '2015-01-01 00:00:00', timestamp '2015-12-31 23:59:59', interval '1 seconds');

INSERT 0 31536000

約 3,000 万行ほどのレコードが登録されました。

この状態で SELECT するとこうなります。

postgres=# EXPLAIN ANALYZE SELECT * FROM brin_test_table WHERE ts = timestamp '2015-06-01 00:00:00';
                                                      QUERY PLAN
----------------------------------------------------------------------------------------------------------------------
 Seq Scan on brin_test_table  (cost=0.00..564665.30 rows=1 width=12) (actual time=8320.332..14432.488 rows=1 loops=1)
   Filter: (ts = '2015-06-01 00:00:00'::timestamp without time zone)
   Rows Removed by Filter: 31535999
 Planning time: 0.043 ms
 Execution time: 14432.512 ms
(5 rows)

postgres=# EXPLAIN ANALYZE SELECT * FROM brin_test_table WHERE ts = timestamp '2015-06-01 00:00:00';

QUERY PLAN

----------------------------------------------------------------------------------------------------------------------

Seq Scan on brin_test_table (cost=0.00..564665.30 rows=1 width=12) (actual time=8320.332..14432.488 rows=1 loops=1)

Filter: (ts = '2015-06-01 00:00:00'::timestamp without time zone)

Rows Removed by Filter: 31535999

Planning time: 0.043 ms

Execution time: 14432.512 ms

(5 rows)

次に BRIN インデックスを作ります。

PostgreSQL のブロックサイズのデフォルト値は 8KB で、これを変更するには PostgreSQL をソースコードからコンパイルする (--with-blocksize オプション) 必要がありますが、BRIN のブロック範囲のサイズはインデックス作成時の pages_per_range パラメーターで簡単に設定できます。

ブロック範囲サイズが小さいほどインデックスされる情報が細やかになり、インデックススキャン中に、より多くのブロックがスキップされますが、インデックス自体のサイズは大きくなります。

今回はともにデフォルト設定のままとします（ブロックサイズ 8KB、ブロック範囲サイズ 128）。

postgres=# CREATE INDEX brin_test_index ON brin_test_table USING BRIN (ts);
CREATE INDEX
postgres=# SHOW block_size;
 block_size
------------
 8192
(1 row)

postgres=# CREATE INDEX brin_test_index ON brin_test_table USING BRIN (ts);

CREATE INDEX

postgres=# SHOW block_size;

block_size

------------

8192

(1 row)

ブロックサイズが 8KB になっていることがわかります。

ここで BRIN インデックスがどのように作成されたか、contrib の pageinspect を使ってみてみます。

まずは、pageinspect のインストール。

postgres=# SELECT * FROM pg_available_extensions WHERE name = 'pageinspect';
    name     | default_version | installed_version |                        comment
-------------+-----------------+-------------------+-------------------------------------------------------
 pageinspect | 1.3             |                   | inspect the contents of database pages at a low level
(1 row)


postgres=# CREATE EXTENSION pageinspect;
CREATE EXTENSION
postgres=# SELECT * FROM pg_available_extensions WHERE name = 'pageinspect';
    name     | default_version | installed_version |                        comment
-------------+-----------------+-------------------+-------------------------------------------------------
 pageinspect | 1.3             | 1.3               | inspect the contents of database pages at a low level
(1 row)

postgres=# SELECT * FROM pg_available_extensions WHERE name = 'pageinspect';

name | default_version | installed_version | comment

-------------+-----------------+-------------------+-------------------------------------------------------

pageinspect | 1.3 | | inspect the contents of database pages at a low level

(1 row)

postgres=# CREATE EXTENSION pageinspect;

CREATE EXTENSION

postgres=# SELECT * FROM pg_available_extensions WHERE name = 'pageinspect';

name | default_version | installed_version | comment

-------------+-----------------+-------------------+-------------------------------------------------------

pageinspect | 1.3 | 1.3 | inspect the contents of database pages at a low level

(1 row)

インデックス情報を見ます。

postgres=# SELECT * FROM brin_page_items(get_raw_page('brin_test_index', 2), 'brin_test_index');
 itemoffset | blknum | attnum | allnulls | hasnulls | placeholder |                    value
------------+--------+--------+----------+----------+-------------+----------------------------------------------
          1 |      0 |      1 | f        | f        | f           | {2015-01-01 00:00:00 .. 2015-01-01 06:34:39}
          2 |    128 |      1 | f        | f        | f           | {2015-01-01 06:34:40 .. 2015-01-01 13:09:19}
          3 |    256 |      1 | f        | f        | f           | {2015-01-01 13:09:20 .. 2015-01-01 19:43:59}
          4 |    384 |      1 | f        | f        | f           | {2015-01-01 19:44:00 .. 2015-01-02 02:18:39}
          5 |    512 |      1 | f        | f        | f           | {2015-01-02 02:18:40 .. 2015-01-02 08:53:19}
          6 |    640 |      1 | f        | f        | f           | {2015-01-02 08:53:20 .. 2015-01-02 15:27:59}
          7 |    768 |      1 | f        | f        | f           | {2015-01-02 15:28:00 .. 2015-01-02 22:02:39}
          8 |    896 |      1 | f        | f        | f           | {2015-01-02 22:02:40 .. 2015-01-03 04:37:19}
          9 |   1024 |      1 | f        | f        | f           | {2015-01-03 04:37:20 .. 2015-01-03 11:11:59}
         10 |   1152 |      1 | f        | f        | f           | {2015-01-03 11:12:00 .. 2015-01-03 17:46:39}
--More--

postgres=# SELECT * FROM brin_page_items(get_raw_page('brin_test_index', 2), 'brin_test_index');

------------+--------+--------+----------+----------+-------------+----------------------------------------------

1 | 0 | 1 | f | f | f | {2015-01-01 00:00:00 .. 2015-01-01 06:34:39}

2 | 128 | 1 | f | f | f | {2015-01-01 06:34:40 .. 2015-01-01 13:09:19}

3 | 256 | 1 | f | f | f | {2015-01-01 13:09:20 .. 2015-01-01 19:43:59}

4 | 384 | 1 | f | f | f | {2015-01-01 19:44:00 .. 2015-01-02 02:18:39}

5 | 512 | 1 | f | f | f | {2015-01-02 02:18:40 .. 2015-01-02 08:53:19}

6 | 640 | 1 | f | f | f | {2015-01-02 08:53:20 .. 2015-01-02 15:27:59}

7 | 768 | 1 | f | f | f | {2015-01-02 15:28:00 .. 2015-01-02 22:02:39}

8 | 896 | 1 | f | f | f | {2015-01-02 22:02:40 .. 2015-01-03 04:37:19}

9 | 1024 | 1 | f | f | f | {2015-01-03 04:37:20 .. 2015-01-03 11:11:59}

10 | 1152 | 1 | f | f | f | {2015-01-03 11:12:00 .. 2015-01-03 17:46:39}

--More--

128 ブロックずつ、サマリ情報が作成されています。

さて、SELECT はどうなったかな？

postgres=# EXPLAIN ANALYZE SELECT * FROM brin_test_table WHERE ts = timestamp '2015-06-01 00:00:00';
                                                         QUERY PLAN

----------------------------------------------------------------------------------------------------------------------------
 Bitmap Heap Scan on brin_test_table  (cost=28.01..32.02 rows=1 width=12) (actual time=2.775..2.894 rows=1 loops=1)
   Recheck Cond: (ts = '2015-06-01 00:00:00'::timestamp without time zone)
   Rows Removed by Index Recheck: 23679
   Heap Blocks: lossy=128
   ->  Bitmap Index Scan on brin_test_index  (cost=0.00..28.01 rows=1 width=0) (actual time=0.413..0.413 rows=1280 loops=1)
         Index Cond: (ts = '2015-06-01 00:00:00'::timestamp without time zone)
 Planning time: 0.083 ms
 Execution time: 2.923 ms
(8 rows)

postgres=# EXPLAIN ANALYZE SELECT * FROM brin_test_table WHERE ts = timestamp '2015-06-01 00:00:00';

QUERY PLAN

----------------------------------------------------------------------------------------------------------------------------

Bitmap Heap Scan on brin_test_table (cost=28.01..32.02 rows=1 width=12) (actual time=2.775..2.894 rows=1 loops=1)

Recheck Cond: (ts = '2015-06-01 00:00:00'::timestamp without time zone)

Rows Removed by Index Recheck: 23679

Heap Blocks: lossy=128

-> Bitmap Index Scan on brin_test_index (cost=0.00..28.01 rows=1 width=0) (actual time=0.413..0.413 rows=1280 loops=1)

Index Cond: (ts = '2015-06-01 00:00:00'::timestamp without time zone)

Planning time: 0.083 ms

Execution time: 2.923 ms

(8 rows)

BRIN インデックスが効いているのが見て取れます。

ブロック範囲サイズを変えての比較

BRIN インデックスについて、ドキュメントには次のように書かれています。

BRIN indexes can satisfy queries via regular bitmap index scans, and will return all tuples in all pages within each range if the summary info stored by the index is consistent with the query conditions. The query executor is in charge of rechecking these tuples and discarding those that do not match the query conditions — in other words, these indexes are lossy. Because a BRIN index is very small, scanning the index adds little overhead compared to a sequential scan, but may avoid scanning large parts of the table that are known not to contain matching tuples.

(snip)

The size of the block range is determined at index creation time by the pages_per_range storage parameter. The number of index entries will be equal to the size of the relation in pages divided by the selected value for pages_per_range. Therefore, the smaller the number, the larger the index becomes (because of the need to store more index entries), but at the same time the summary data stored can be more precise and more data blocks can be skipped during an index scan.

ここから、BRIN インデックスは、「クエリの対象となるブロック範囲に含まれるレコードすべてを返し、クエリ実行部がその中から条件に一致しないレコードを除外する（recheck）」という点と、「ブロック範囲サイズを小さく設定するとインデックスサイズが大きくなる」という点について見てみます。

pages_per_range パラメーターを 64, 128, 256 とした場合それぞれのインデックスの概要と EXPLAIN ANALYZE 結果です。

pages_per_range = 64

postgres=# \di+ brin_test_index
                                  List of relations
 Schema |      Name       | Type  |  Owner   |      Table      |  Size  | Description
--------+-----------------+-------+----------+-----------------+--------+-------------
 public | brin_test_index | index | postgres | brin_test_table | 128 kB |
(1 row)

postgres=# EXPLAIN ANALYZE SELECT * FROM brin_test_table WHERE ts = timestamp '2015-06-01 00:00:00';
                                                        QUERY PLAN
---------------------------------------------------------------------------------------------------------------------------
 Bitmap Heap Scan on brin_test_table  (cost=52.01..56.02 rows=1 width=12) (actual time=1.757..1.867 rows=1 loops=1)
   Recheck Cond: (ts = '2015-06-01 00:00:00'::timestamp without time zone)
   Rows Removed by Index Recheck: 11839
   Heap Blocks: lossy=64
   ->  Bitmap Index Scan on brin_test_index  (cost=0.00..52.01 rows=1 width=0) (actual time=0.795..0.795 rows=640 loops=1)
         Index Cond: (ts = '2015-06-01 00:00:00'::timestamp without time zone)
 Planning time: 0.103 ms
 Execution time: 1.895 ms
(8 rows)

postgres=# \di+ brin_test_index

List of relations

--------+-----------------+-------+----------+-----------------+--------+-------------

(1 row)

postgres=# EXPLAIN ANALYZE SELECT * FROM brin_test_table WHERE ts = timestamp '2015-06-01 00:00:00';

QUERY PLAN

---------------------------------------------------------------------------------------------------------------------------

Bitmap Heap Scan on brin_test_table (cost=52.01..56.02 rows=1 width=12) (actual time=1.757..1.867 rows=1 loops=1)

Recheck Cond: (ts = '2015-06-01 00:00:00'::timestamp without time zone)

Rows Removed by Index Recheck: 11839

Heap Blocks: lossy=64

-> Bitmap Index Scan on brin_test_index (cost=0.00..52.01 rows=1 width=0) (actual time=0.795..0.795 rows=640 loops=1)

Index Cond: (ts = '2015-06-01 00:00:00'::timestamp without time zone)

Planning time: 0.103 ms

Execution time: 1.895 ms

(8 rows)

pages_per_range = 128

postgres=# \di+ brin_test_index
                                  List of relations
 Schema |      Name       | Type  |  Owner   |      Table      | Size  | Description
--------+-----------------+-------+----------+-----------------+-------+-------------
 public | brin_test_index | index | postgres | brin_test_table | 80 kB |
(1 row)

postgres=# EXPLAIN ANALYZE SELECT * FROM brin_test_table WHERE ts = timestamp '2015-06-01 00:00:00';
                                                         QUERY PLAN
----------------------------------------------------------------------------------------------------------------------------
 Bitmap Heap Scan on brin_test_table  (cost=28.01..32.02 rows=1 width=12) (actual time=2.349..2.463 rows=1 loops=1)
   Recheck Cond: (ts = '2015-06-01 00:00:00'::timestamp without time zone)
   Rows Removed by Index Recheck: 23679
   Heap Blocks: lossy=128
   ->  Bitmap Index Scan on brin_test_index  (cost=0.00..28.01 rows=1 width=0) (actual time=0.416..0.416 rows=1280 loops=1)
         Index Cond: (ts = '2015-06-01 00:00:00'::timestamp without time zone)
 Planning time: 0.126 ms
 Execution time: 2.482 ms
(8 rows)

postgres=# \di+ brin_test_index

List of relations

--------+-----------------+-------+----------+-----------------+-------+-------------

(1 row)

postgres=# EXPLAIN ANALYZE SELECT * FROM brin_test_table WHERE ts = timestamp '2015-06-01 00:00:00';

QUERY PLAN

----------------------------------------------------------------------------------------------------------------------------

Bitmap Heap Scan on brin_test_table (cost=28.01..32.02 rows=1 width=12) (actual time=2.349..2.463 rows=1 loops=1)

Recheck Cond: (ts = '2015-06-01 00:00:00'::timestamp without time zone)

Rows Removed by Index Recheck: 23679

Heap Blocks: lossy=128

-> Bitmap Index Scan on brin_test_index (cost=0.00..28.01 rows=1 width=0) (actual time=0.416..0.416 rows=1280 loops=1)

Index Cond: (ts = '2015-06-01 00:00:00'::timestamp without time zone)

Planning time: 0.126 ms

Execution time: 2.482 ms

(8 rows)

pages_per_range = 256

postgres=# \di+ brin_test_index
                                  List of relations
 Schema |      Name       | Type  |  Owner   |      Table      | Size  | Description
--------+-----------------+-------+----------+-----------------+-------+-------------
 public | brin_test_index | index | postgres | brin_test_table | 64 kB |
(1 row)

postgres=# EXPLAIN ANALYZE SELECT * FROM brin_test_table WHERE ts = timestamp '2015-06-01 00:00:00';
                                                         QUERY PLAN
----------------------------------------------------------------------------------------------------------------------------
 Bitmap Heap Scan on brin_test_table  (cost=20.01..24.02 rows=1 width=12) (actual time=2.251..5.061 rows=1 loops=1)
   Recheck Cond: (ts = '2015-06-01 00:00:00'::timestamp without time zone)
   Rows Removed by Index Recheck: 47359
   Heap Blocks: lossy=256
   ->  Bitmap Index Scan on brin_test_index  (cost=0.00..20.01 rows=1 width=0) (actual time=0.214..0.214 rows=2560 loops=1)
         Index Cond: (ts = '2015-06-01 00:00:00'::timestamp without time zone)
 Planning time: 0.101 ms
 Execution time: 5.082 ms
(8 rows)

postgres=# \di+ brin_test_index

List of relations

--------+-----------------+-------+----------+-----------------+-------+-------------

(1 row)

postgres=# EXPLAIN ANALYZE SELECT * FROM brin_test_table WHERE ts = timestamp '2015-06-01 00:00:00';

QUERY PLAN

----------------------------------------------------------------------------------------------------------------------------

Bitmap Heap Scan on brin_test_table (cost=20.01..24.02 rows=1 width=12) (actual time=2.251..5.061 rows=1 loops=1)

Recheck Cond: (ts = '2015-06-01 00:00:00'::timestamp without time zone)

Rows Removed by Index Recheck: 47359

Heap Blocks: lossy=256

-> Bitmap Index Scan on brin_test_index (cost=0.00..20.01 rows=1 width=0) (actual time=0.214..0.214 rows=2560 loops=1)

Index Cond: (ts = '2015-06-01 00:00:00'::timestamp without time zone)

Planning time: 0.101 ms

Execution time: 5.082 ms

(8 rows)

インデックスサイズを比較して見ると、128KB, 80KB, 64KB というように、ブロック範囲サイズが小さいほど大きくなっていることがわかります。

recheck によって除外されたレコード数は 11,839 件、23,679 件、47,359 件となり、ブロック範囲サイズが小さいほど少なく、recheck 処理の効率が向上することがわかります。ブロック範囲サイズが 128 のケースを例にとると、インデックススキャンによって選択されたブロック数は 128 であり、このサンプルでは 1 ブロック範囲（128 ブロック）辺り 23,680 件のレコードを含むので、ここから除外された 23,679 件を引くと該当するレコードは 1 件、という結果になります。

ここで、BRIN インデックスのコストについて見てみると 52.01, 28.01, 20.01 となり、ブロック範囲サイズが小さいほど大きくなっていますが、このオーバーヘッドは多くの場合、無視できるほど小さいと考えられます。

EXPLAIN ANALYZE 結果として出力されている実行時間に関する数値は、様々な要因によって影響を受けるものなので、ここでは参考程度に見てください。

最後に

今回、お試し環境として AWS の EC2(Amazon Linux) t2.micro を使いました。こういう時、何の気兼ねもなく、すぐに環境を用意できるのは本当に便利ですね。

因みに、この記事を書くにあたり上記環境に PostgreSQL 9.5 をソースコードからインストールしましたが、Amazon Linux でコンパイル作業をする場合は Development Tools をインストールしておくと便利です。

[ec2-user ~]$ sudo yum groupinstall "Development Tools"

1	[ec2-user ~]$ sudo yum groupinstall "Development Tools"