MySQL 8.0のSQLの新機能を解説

MySQL 8.0のWindow関数のサンプル集

MySQL 8.0のSQLの新機能を解説第1回

山岸賢治[著]

更新日: 2019/05/30
公開日: 2019/03/08

ポスト

ダウンロードサンプルファイル (5.9 KB)

　本連載では、MySQL 8.0のリリースを記念して、3回にわたってSQLの新機能を解説します。第1回の本稿では、Window関数について解説します。

ポスト

はじめに

　本稿では、MySQL 8.0の新機能であるWindow関数について解説します。前半は、学習効率が良いと思われる順序で、Window関数の各機能をイメージを交えて説明します。後半は、PostgreSQLとOracleのWindow関数の機能をMySQL 8.0で模倣する方法を解説します。

対象読者

MySQLのWindow関数の理解を深めたい方

必要な環境

　本稿で扱うSQLは、MySQL 8.0.15で動作確認しました。

Window関数とは

　Window関数は、select句とorder by句で使うことができます。とはいえ、order by句で使うことは、ほとんどありません。

Window関数のメリット

　自己結合や相関サブクエリを使ったり、親言語（PHPやRubyなど）やストアドプロシージャで求めていた結果を、SQLで容易に求めることができるようになります。帳票作成やデータ分析において、特に使います。

MySQL 8.0のWindow関数の使用頻度

　使用頻度の高いものからおさえていくと学習効率が良いです。

最頻出	count　max　min　sum　Row_Number
頻出	Lag　Lead　Rank　dense_rank
たまに	avg　First_Value　Last_Value　nth_Value　JSON_ArrayAgg
レア	NTile

　本稿では、以下の機能を順に、サンプルとともに紹介していきます。

1. Select文の件数取得

　Select文の結果の件数がほしいことは、よくあります。そんな時に使うのが、Window関数のcount関数です。以下のように使います。

create table CountT(ID int,Val int);
insert into CountT values(1,10),
                         (1,20),
                         (2,10),
                         (2,30),
                         (2,50);

Sample01

select ID,Val,
count(*) over() as recordCount
  from CountT;

+------+------+-------------+
| ID   | Val  | recordCount |
+------+------+-------------+
|    1 |   10 |           5 |
|    1 |   20 |           5 |
|    2 |   10 |           5 |
|    2 |   30 |           5 |
|    2 |   50 |           5 |
+------+------+-------------+

Sample01のサブクエリを使った代替方法

select ID,Val,
(select count(*) from CountT) as recordCount
  from CountT;

　脳内のイメージは、このようになります

　where句やgroup by句、having句があっても、select文の結果の件数が求まります。

Sample02

select ID,Val,
count(*) over() as recordCount
  from CountT
 where Val in(10,20);

Sample03

select ID,max(Val) as maxVal,
count(*) over() as recordCount
  from CountT
group by ID
having max(Val) = 20;

　distinct指定、もしくは、Limit句があるselect文の結果の件数を求めるには、インラインビューを使えば良いです。MySQLのselect文は、下記の順序で動作するからです。

　from句
　where句
　group by句
　having句
　select句
　order by句
　distinct
　Limit句

create table disT(ColA int,ColB int);
insert into disT values(1,null),
                       (1,   3),
                       (1,   3),
                       (2,null),
                       (2,null);

Sample04

select ColA,ColB,count(*) over() as recordCount
from (select distinct ColA,ColB
        from disT
      order by ColA,ColB Limit 2) tmp
order by ColA,ColB;

2. 最大値の行の取得

　IDごとに、Valが最大値の行を取得します。サンプルは以下の通りです。

create table MaxT(ID int,Val int,extraCol char(1));
insert into MaxT values(1,10,'A'),
                       (1,20,'B'),
                       (2,10,'C'),
                       (2,30,'D'),
                       (2,50,'E'),
                       (3,70,'F'),
                       (3,70,'G');

単純に、IDごとのValの最大値がほしいなら、以下で可

select ID,max(Val)
  from MaxT
group by ID;

Sample05

select ID,Val,extraCol
from (select ID,Val,extraCol,
      max(Val) over(partition by ID) as maxVal
        from MaxT) tmp
 where Val = maxVal;

+------+------+----------+
| ID   | Val  | extraCol |
+------+------+----------+
|    1 |   20 | B        |
|    2 |   50 | E        |
|    3 |   70 | F        |
|    3 |   70 | G        |
+------+------+----------+

Sample05の相関サブクエリを使った代替方法

select ID,Val,extraCol
  from MaxT a
 where Val = (select max(b.Val)
                from MaxT b
               where b.ID = a.ID);

　Window関数が使えるのは、select句かorder by句です。なので、Window関数の結果をwhere句で使うには、インラインビューを使う必要があります。

　脳内のイメージは、このようになります。partition byで、脳内で赤線を引くと分かりやすいです。

3. 順位を付ける

　順位や連番を付けたい時に使うのが、Window関数のRow_Number関数、rank関数、dense_rank関数です。以下のようになります。

create table rnT(ID int,score int);
insert into rnT values(1,100),
                      (1,100),
                      (1, 90),
                      (1, 80),
                      (2,100),
                      (2, 70),
                      (2, 70);

Sample06

select ID,score,
Row_Number() over(partition by ID order by score desc) as "Row_Number",
      rank() over(partition by ID order by score desc) as "rank",
dense_rank() over(partition by ID order by score desc) as "dense_rank"
  from rnT
order by ID,score desc;

+------+-------+------------+------+------------+
| ID   | score | Row_Number | rank | dense_rank |
+------+-------+------------+------+------------+
|    1 |   100 |          1 |    1 |          1 |
|    1 |   100 |          2 |    1 |          1 |
|    1 |    90 |          3 |    3 |          2 |
|    1 |    80 |          4 |    4 |          3 |
|    2 |   100 |          1 |    1 |          1 |
|    2 |    70 |          2 |    2 |          2 |
|    2 |    70 |          3 |    2 |          2 |
+------+-------+------------+------+------------+

　Row_Number関数は、1から始まって、必ず連番になります。rank関数は、同点があると順位が飛びます。dense_rank関数は、同点があっても順位が飛びません。denseは、「密集した」という意味です。

Sample06の相関サブクエリを使った代替方法 (Row_Number関数以外)

select ID,score,
(select count(*)+1 from rnT b
  where b.ID = a.ID and b.score > a.score) as "rank",
(select count(distinct b.score)+1 from rnT b
  where b.ID = a.ID and b.score > a.score) as "dense_rank"
from rnT a
order by ID,"rank";

　脳内のイメージは、このようになります。partition byで、脳内で赤線を引くと分かりやすいです。

4. 最大値の行の取得（ソートキーが複数）

　「2. 最大値の行の取得」では、ソートキーが1つだったので、Window関数のmax関数を使いましたが、ソートキーが複数になると、順位を付ける関数を使う必要があります。

　IDごとで、SortKey1が最大の行の中で、SortKey2が最大の行を取得するには、以下のようになります。

create table multiSortKeyT(ID int,SortKey1 int,SortKey2 int,extraCol char(3));
insert into multiSortKeyT values(1,10, 2,'AAA'),
                                (1,10, 3,'BBB'),
                                (1,30, 1,'CCC'),
                                (2,20, 1,'DDD'),
                                (2,50, 2,'EEE'),
                                (2,50, 2,'FFF'),
                                (3,60, 1,'GGG'),
                                (3,60, 2,'HHH'),
                                (3,60, 3,'III'),
                                (4,10,20,'JJJ');

Sample07

select ID,SortKey1,SortKey2,extraCol
from (select ID,SortKey1,SortKey2,extraCol,
      rank() over(partition by ID order by SortKey1 desc,SortKey2 desc) as rn
        from multiSortKeyT) a
 where rn = 1
order by ID,extraCol;

+------+----------+----------+----------+
| ID   | SortKey1 | SortKey2 | extraCol |
+------+----------+----------+----------+
|    1 |       30 |        1 | CCC      |
|    2 |       50 |        2 | EEE      |
|    2 |       50 |        2 | FFF      |
|    3 |       60 |        3 | III      |
|    4 |       10 |       20 | JJJ      |
+------+----------+----------+----------+

Sample07の相関サブクエリを使った代替方法1

select ID,SortKey1,SortKey2,extraCol
  from multiSortKeyT a
 where (SortKey1,SortKey2) = (select b.SortKey1,b.SortKey2
                                from multiSortKeyT b
                               where b.ID = a.ID
                              order by b.SortKey1 desc,b.SortKey2 desc Limit 1)
order by ID,extraCol;

Sample07の相関サブクエリを使った代替方法2

select ID,SortKey1,SortKey2,extraCol
  from multiSortKeyT a
 where not exists(select 1 from multiSortKeyT b
                   where b.ID = a.ID
                     and (b.SortKey1 > a.SortKey1
                       or b.SortKey1 = a.SortKey1 and b.SortKey2 > a.SortKey2))
order by ID,extraCol;

　脳内のイメージは、このようになります。partition byで、脳内で赤線を引くと分かりやすいです。