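Since the week before New Year's Day I have received 9 SparkSQL optimization consultations in total, and 4 of those cases involved count(distinct). ...As we know, SparkSQL handles count(distinct) in two different ways, depending on the case: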
with one count distinct
more than one count distinct
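The second case is usually the expensive one. In Spark's optimizer, the `RewriteDistinctAggregates` rule handles a query with more than one distinct aggregate group by inserting an Expand operator. Every input row is emitted once per distinct group (plus once more if there are non-distinct aggregates) and tagged with a group id, and the result is then aggregated twice. With a single count distinct, no Expand is needed. The Python below is only a toy simulation of that idea on an in-memory list, not Spark code. The function name and row format are hypothetical, and grouping keys are omitted for brevity.

```python
from collections import defaultdict


def count_distinct_via_expand(rows, distinct_cols):
    """Toy simulation of an Expand-based rewrite for several count(distinct ...).

    rows: list of dicts (hypothetical in-memory table, no GROUP BY keys).
    Returns ({col: distinct_count}, number_of_expanded_rows).
    """
    # Expand: emit each input row once per distinct column, tagged with a group id.
    # Spark's Expand keeps all columns and nulls out the unused ones; here we keep
    # only (gid, value) because that is all the later steps need.
    expanded = [(gid, row[col]) for row in rows for gid, col in enumerate(distinct_cols)]

    # First aggregation: deduplicate on (gid, value).
    deduped = set(expanded)

    # Second aggregation: count non-null values per gid (count ignores nulls).
    counts = defaultdict(int)
    for gid, value in deduped:
        if value is not None:
            counts[gid] += 1

    result = {col: counts[gid] for gid, col in enumerate(distinct_cols)}
    return result, len(expanded)


rows = [{"a": 1, "b": 1}, {"a": 1, "b": 3}, {"a": 2, "b": 1}]
counts, expanded_rows = count_distinct_via_expand(rows, ["a", "b"])
print(counts, expanded_rows)  # {'a': 2, 'b': 2} 6
```

With two distinct columns the 3 input rows become 6 expanded rows. With N distinct groups the data volume entering the shuffle grows roughly N times, which is why queries with many count(distinct) expressions slow down.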
For these two cases, SparkSQL... If the SQL contains no non-distinct aggregates, for example when the SQL is:
select
count(distinct a) as a_num,
count(distinct b) as b_num...(distinct if(b=1,a,null)) as a_num1,
count(distinct if(b=3,a,null)) as a_num2,
count(distinct if(b...max(if(b=1,1,0)) as b1_flag,
max(if(b=3,1,0)) as b3_flag,
max(if(b=4,1,0)) as b4_flag
from testdata2 group
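The trailing `max(if(b=k,1,0))` columns compute a per-group flag. This excerpt cuts off the `group` clause, so assume it groups by `a`. Under that assumption, summing each flag over the groups gives the same number as the matching `count(distinct if(b=k,a,null))`, without Expand's row replication. The Python below checks that equivalence on a small hypothetical sample. The helper names are made up for illustration.

```python
def count_distinct_if(rows, k):
    """count(distinct if(b = k, a, null)): distinct non-null a among rows where b = k."""
    return len({r["a"] for r in rows if r["b"] == k and r["a"] is not None})


def sum_of_flags(rows, k):
    """Group by a, compute max(if(b = k, 1, 0)) per group, then sum the flags.

    Assumes a is never null; a null a would form its own group and be counted.
    """
    flags = {}
    for r in rows:
        hit = 1 if r["b"] == k else 0
        flags[r["a"]] = max(flags.get(r["a"], 0), hit)
    return sum(flags.values())


# Hypothetical sample rows in the shape of testdata2 (columns a, b).
rows = [
    {"a": 1, "b": 1}, {"a": 1, "b": 1}, {"a": 1, "b": 3},
    {"a": 2, "b": 4}, {"a": 3, "b": 1},
]
for k in (1, 3, 4):
    print(k, count_distinct_if(rows, k), sum_of_flags(rows, k))
```

The equivalence breaks if `a` can be null. count(distinct ...) skips nulls, but grouping by `a` keeps a null group whose flag may be 1, so filter out null `a` first or exclude that group from the sum.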