专栏首页码匠的流水账kafka streams的join实例

kafka streams的join实例

本文简单介绍一下kafka streams的join操作

join

A join operation merges two streams based on the keys of their data records, and yields a new stream. A join over record streams usually needs to be performed on a windowing basis because otherwise the number of records that must be maintained for performing the join may grow indefinitely.

实例

        KStreamBuilder builder = new KStreamBuilder();
        KStream<String, String> left = builder.stream("intpu-left");
        KStream<String, String> right = builder.stream("intpu-right");

        KStream<String, String> all = left.selectKey((key, value) -> value.split(",")[1])
                .join(right.selectKey((key, value) -> value.split(",")[0]), new ValueJoiner<String, String, String>() {
            @Override
            public String apply(String value1, String value2) {
                return value1 + "--" + value2;
            }
        }, JoinWindows.of(30000));

        all.print();

由于join操作是根据key来,所以通常一般要再次映射一下key

测试

sh bin/kafka-topics.sh --create --topic intpu-left --replication-factor 1 --partitions 3 --zookeeper localhost:2181

sh bin/kafka-topics.sh --create --topic intpu-right --replication-factor 1 --partitions 3 --zookeeper localhost:2181

sh bin/kafka-console-producer.sh --broker-list localhost:9092 --topic intpu-left
sh bin/kafka-console-producer.sh --broker-list localhost:9092 --topic intpu-right

左边输入诸如

1,a
2,b
3,c
3,c
4,d
1,a
2,b
3,c
1,a
2,b
3,c
4,e
5,h
6,f
7,g

右边输入诸如

a,hello
b,world
c,hehehe
c,aaa
d,eee
a,cccc
b,aaaaaa
c,332435
a,dddd
b,2324
c,ddddd
e,23453
h,2222222
f,0o0o0o0
g,ssss

输出实例

[KSTREAM-MERGE-0000000014]: a , 1,a--a,dddd
[KSTREAM-MERGE-0000000014]: b , 2,b--b,2324
2017-10-17 22:17:34.578  INFO   --- [ StreamThread-1] o.a.k.s.p.internals.StreamThread         : stream-thread [StreamThread-1] Committing all tasks because the commit interval 30000ms has elapsed
2017-10-17 22:17:34.578  INFO   --- [ StreamThread-1] o.a.k.s.p.internals.StreamThread         : stream-thread [StreamThread-1] Committing task StreamTask 0_0
2017-10-17 22:17:34.585  INFO   --- [ StreamThread-1] o.a.k.s.p.internals.StreamThread         : stream-thread [StreamThread-1] Committing task StreamTask 0_1

join类别

这里使用的是inner join,也有left join,也有outer join。如果要记录在时间窗口没有匹配上的记录,可以使用outer join,额外存储下来,然后再根据已经匹配的记录再过滤一次。

输出实例

[KSTREAM-MERGE-0000000014]: f , null--f,ddddddd
[KSTREAM-MERGE-0000000014]: f , 4,f--f,ddddddd
2017-10-17 22:31:12.530  INFO   --- [ StreamThread-1] o.a.k.s.p.internals.StreamThread         : stream-thread [StreamThread-1] Committing all tasks because the commit interval 30000ms has elapsed
2017-10-17 22:31:12.530  INFO   --- [ StreamThread-1] o.a.k.s.p.internals.StreamThread         : stream-thread [StreamThread-1] Committing task StreamTask 0_0
2017-10-17 22:31:12.531  INFO   --- [ StreamThread-1] o.a.k.s.p.internals.StreamThread         : stream-thread [StreamThread-1] Committing task StreamTask 0_1
2017-10-17 22:31:12.531  INFO   --- [ StreamThread-1] o.a.k.s.p.internals.StreamThread         : stream-thread [StreamThread-1] Committing task StreamTask 1_0
2017-10-17 22:31:12.531  INFO   --- [ StreamThread-1] o.a.k.s.p.internals.StreamThread         : stream-thread [StreamThread-1] Committing task StreamTask 0_2
2017-10-17 22:31:12.533  INFO   --- [ StreamThread-1] o.a.k.s.p.internals.StreamThread         : stream-thread [StreamThread-1] Committing task StreamTask 1_1
2017-10-17 22:31:12.533  INFO   --- [ StreamThread-1] o.a.k.s.p.internals.StreamThread         : stream-thread [StreamThread-1] Committing task StreamTask 2_0
2017-10-17 22:31:12.539  INFO   --- [ StreamThread-1] o.a.k.s.p.internals.StreamThread         : stream-thread [StreamThread-1] Committing task StreamTask 1_2
2017-10-17 22:31:12.540  INFO   --- [ StreamThread-1] o.a.k.s.p.internals.StreamThread         : stream-thread [StreamThread-1] Committing task StreamTask 2_1
2017-10-17 22:31:12.541  INFO   --- [ StreamThread-1] o.a.k.s.p.internals.StreamThread         : stream-thread [StreamThread-1] Committing task StreamTask 2_2
[KSTREAM-MERGE-0000000014]: g , 5,g--null
[KSTREAM-MERGE-0000000014]: h , 6,h--null
[KSTREAM-MERGE-0000000014]: h , 6,h--h,ddddddd

小结

kafka streams的join操作,非常适合不同数据源的实时匹配操作。

本文分享自微信公众号 - 码匠的流水账(geek_luandun),作者:patterncat

原文出处及转载信息见文内详细说明,如有侵权,请联系 yunjia_community@tencent.com 删除。

原始发表时间:2017-10-17

本文参与腾讯云自媒体分享计划,欢迎正在阅读的你也加入,一起分享。

我来说两句

0 条评论
登录 后参与评论

相关文章

  • kafka stream errorlog报警实例

    log4j-core-2.7-sources.jar!/org/apache/logging/log4j/core/appender/mom/kafka/Kaf...

    codecraft
  • reactor3 flux的map与flatMap的区别

    flatMap的转换Function要求返回一个Publisher,这个Publisher代表一个作用于元素的异步的转换操作;而map仅仅是同步的元素转换操作。

    codecraft
  • 聊聊kingbus的binlog_progress.go

    kingbus的binlog_progress.go提供了newBinlogProgress、updateProcess方法用于存储binglogProgres...

    codecraft
  • 前端基础-Vue.js组件

    https://cn.vuejs.org/v2/guide/components.html

    cwl_java
  • 响应式网页

    安装react-bootstrap(react-bootstrap标签自定义,属性和bootstrap相同)

    sofu456
  • 010.Debian系统基本操作

    debian默认不允许使用root用户远程登录,需要修改/etc/ssh/sshd_config文件:

    CoderJed
  • java支付宝开发-异常-01-"sub_code":"isv.invalid-app-id","sub_msg":"无效的AppID参数"

    后来检查了一下,发现是因为 我支付宝网关写错了。沙箱环境和正式环境 的支付宝网关不同,如下

    shirayner
  • reactor3 flux的map与flatMap的区别

    flatMap的转换Function要求返回一个Publisher,这个Publisher代表一个作用于元素的异步的转换操作;而map仅仅是同步的元素转换操作。

    codecraft
  • 谈谈IE针对Ajax请求结果的缓存

    在默认情况下,IE会针对请求地址缓存Ajax请求的结果。换句话说,在缓存过期之前,针对相同地址发起的多个Ajax请求,只有第一次会真正发送到服务端。在某些情况下...

    蒋金楠
  • Zabbix常用监控项整理

    最近整理了一份常用Zabbix监控项说明,主要包括常见Windows & Linux监控,如下:

    拓荒者

扫码关注云+社区

领取腾讯云代金券