如果Aggregate操作中同时包含Distinct与非Distinct操作,优化器可以将该操作改写成两个不包含Distinct的Aggregate 假设schema如下
1
2
3
4
create table animal(gkey varchar(128),
cat varchar(128),
dog varchar(128),
price double);
animal表中的数据如下
| gkey | cat | dog | price |
|---|---|---|---|
| a | ca1 | cb1 | 10 |
| a | ca1 | cb2 | 5 |
| b | ca1 | cb1 | 13 |
测试语句如下
1
2
3
4
5
6
7
8
SELECT
gkey, SUM(price),
COUNT(DISTINCT cat),
COUNT(DISTINCT dog)
FROM
animal
GROUP BY
gkey
该测试语句拥有3个aggregate,其中两个包含distinct,优化策略如下 首先将animal表格的每行扩展成3行,并添加新的一列grid,类型为整形,记新的表为animal2
| gkey | cat | dog | price | grid |
|---|---|---|---|---|
| $gkey | null | null | $price | 0 |
| $gkey | $cat | null | null | 1 |
| $gkey | null | $dog | null | 2 |
表animal2数据如下
| gkey | cat | dog | price | grid |
|---|---|---|---|---|
| a | null | null | 10 | 0 |
| a | ca1 | null | null | 1 |
| a | null | cb1 | null | 2 |
| a | null | null | 5 | 0 |
| a | ca1 | null | null | 1 |
| a | null | cb2 | null | 2 |
| b | null | null | 13 | 0 |
| b | ca1 | null | null | 1 |
| b | null | cb1 | null | 2 |
在animal2上按照gkey,cat,dog,grid做group by创建视图
1
2
3
4
5
6
7
8
CREATE VIEW v_animal(v_gkey, v_cat, v_dog, v_grid, v_total ) AS
SELECT
gkey, cat, dog, grid
SUM(price) as total
FROM
animal2
GROUP BY
gkey, cat, dog, grid
然后在v_animal上再一次做aggregate操作,语句如下
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
SELECT
v_gkey AS gkey
SUM(CASE
WHEN grid==0 THEN v_total
ELSE null
END) AS total
COUNT(CASE
WHEN grid==1 THEN v_cat
ELSE null
END) AS cat_total
COUNT(CASE
WHEN grid==2 THEN v_dog
ELSE null
END) AS dog_total
FROM
v_animal
GROUP BY
v_gkey