例如,非标准的二维表是:再流表A(id,,time,col1,col2,col2)。Col10)和dfs表B( id,.值). I想要对表A中的记录进行相应的计算,其中id列匹配维表B中的id范围,当value>5时,我可以使用什么功能来实现?
发布于 2021-12-10 08:43:23
假设我们有表A和表B
n = 5000
tableB = table("no."+string(1..n) as `id, rand(1..10, n) as value)
rowA = 100000000
rowA = 100000000
tableA = table(rand("no."+string(1..n), rowA) as `id, rand(1..100000, rowA) as `col1, rand(1..100000, rowA) as `col2, rand(1..100000, rowA) as `col3)对维度表使用公共联接方法:
select tableA.* from lj(tableA, tableB, `id) where tableB.value > 5使用字典方法而不是联接方法。结果与上面的示例相同。
dict_B = dict(tableB.id, tableB.value)
select * from tableA where dict_B[tableA.id] > 5用字典代替维度表的连接方法,不仅提高了性能,而且使代码更加简洁,也可以在流计算引擎中使用。
// Simulated table A, table B
n = 5000
tableB = table("no."+string(1..n) as `id, rand(1..10, n) as value)
rowA = 1000000
streamtableA_temp = streamTable(rand("no."+string(1..n), rowA) as `id, rand(1..100000, rowA) as `col1, rand(1..100000, rowA) as `col2, rand(1..100000, rowA) as `col3, rand(1..100000, rowA) as `col4, rand(1..100000, rowA) as `col5, rand(1..100000, rowA) as `col6)
share streamtableA_temp as streamtableA
// create a cross-sectional engine, filter table A for calculation
// Here, dict_B[id] is passed in the filter parameter, which realizes filtering table A according to the records in table B
output = table(1:0, `code`v1`v2`v3`b_value, [STRING,INT, INT,DOUBLE,INT])
rse = createReactiveStateEngine(name="ReactiveStateEngine1", metrics =[<col1>, <col2-col3>, <(col4+col5)\col6>, <dict_B[id]>], dummyTable=streamtableA, outputTable=output, keyColumn=`id, filter= < dict_B[id]>5 >)
// subscribe to table A
subscribeTable(tableName = "streamtableA", actionName="filterAndCalculate", offset=0, handler=append!{rse}, msgAsTable=true, reconnect=true)https://stackoverflow.com/questions/70212484
复制相似问题