Posts 将 distinct agg 改写成 非 distinct agg
Post
Cancel

将 distinct agg 改写成 非 distinct agg

如果Aggregate操作中同时包含Distinct与非Distinct操作,优化器可以将该操作改写成两个不包含DistinctAggregate 假设schema如下

1
2
3
4
create table animal(gkey varchar(128), 
                    cat varchar(128), 
                    dog varchar(128), 
                    price double);

animal表中的数据如下

gkeycatdogprice
aca1cb110
aca1cb25
bca1cb113

测试语句如下

1
2
3
4
5
6
7
8
SELECT 
	gkey, SUM(price), 
	COUNT(DISTINCT cat),
	COUNT(DISTINCT dog)
FROM 
	animal
GROUP BY
	gkey

该测试语句拥有3个aggregate,其中两个包含distinct,优化策略如下 首先将animal表格的每行扩展成3行,并添加新的一列grid,类型为整形,记新的表为animal2

gkeycatdogpricegrid
$gkeynullnull$price0
$gkey$catnullnull1
$gkeynull$dognull2

animal2数据如下

gkeycatdogpricegrid
anullnull100
aca1nullnull1
anullcb1null2
anullnull50
aca1nullnull1
anullcb2null2
bnullnull130
bca1nullnull1
bnullcb1null2

animal2上按照gkey,cat,dog,gridgroup by创建视图

1
2
3
4
5
6
7
8
CREATE VIEW v_animal(v_gkey, v_cat, v_dog, v_grid, v_total ) AS
SELECT
	gkey, cat, dog, grid
	SUM(price) as total
FROM
	animal2
GROUP BY
	gkey, cat, dog, grid

然后在v_animal上再一次做aggregate操作,语句如下

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
SELECT
	v_gkey AS gkey
	SUM(CASE
			WHEN grid==0 THEN v_total
			ELSE null
		END) AS total
	COUNT(CASE
			WHEN grid==1 THEN v_cat
			ELSE null
		END) AS cat_total
	COUNT(CASE
			WHEN grid==2 THEN v_dog
			ELSE null
		END) AS dog_total
FROM
	v_animal
GROUP BY
	v_gkey

参考资料: RewriteDistinctAggregates.scala

This post is licensed under CC BY 4.0 by the author.