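This is the final installment of the "Flink on Yarn Trilogy" series. To recap briefly, the earlier installments prepared the environment.
Flink, Yarn, and HDFS are now all in place, so the next step is to submit Flink jobs to Yarn and run them there.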
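Before the hands-on part, here is a quick look at Flink on YARN. As the figure below shows, Flink on Yarn can be used in two modes, Job Mode and Session Mode:
Session Mode: a Flink cluster is initialized in YARN ahead of time, and every Flink job submitted afterwards goes to that cluster, as shown in the figure below:
Job Mode: each submitted Flink job gets its own dedicated Flink cluster, and the resources are released once the job finishes, as shown in the figure below: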
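Next we work through each of the two modes in turn.
The job submitted in the following steps is the classic WordCount. First prepare a text file in HDFS; every Flink job submitted later reads this file and counts how many times each word occurs in it. The steps for preparing the text are as follows: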
wget https://raw.githubusercontent.com/zq2599/blog_demos/master/files/GoneWiththeWind.txt
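The download only places the file on the local disk; it still has to be copied into HDFS. A minimal sketch of that step follows, assuming the `hdfs` CLI is on the PATH and that the target is the `/input` directory on the NameNode address used by the job commands later in this article. The variable names and the `input_uri` helper are illustrative only.

```shell
#!/usr/bin/env bash
# Assumed values, taken from the job commands later in this article.
NAMENODE="hdfs://192.168.50.134:8020"
INPUT_DIR="/input"
FILE="GoneWiththeWind.txt"

# input_uri is a hypothetical helper: it prints the full HDFS URI of the file.
input_uri() {
  printf '%s%s/%s\n' "$NAMENODE" "$INPUT_DIR" "$FILE"
}

if command -v hdfs >/dev/null 2>&1 && [ -f "$FILE" ]; then
  hdfs dfs -mkdir -p "$INPUT_DIR"          # create the directory if missing
  hdfs dfs -put -f "$FILE" "$INPUT_DIR/"   # upload, overwriting an old copy
  hdfs dfs -ls "$INPUT_DIR"                # confirm the file is there
else
  echo "hdfs CLI or $FILE not found; skipping upload"
fi

echo "Jobs will read: $(input_uri)"
```

If the listing shows `GoneWiththeWind.txt` under `/input`, the path matches the `-input` argument used below.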
The preparation is complete, so we can now try submitting a job.
./bin/yarn-session.sh -n 2 -jm 1024 -tm 1024
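As a reading aid for the command above, the sketch below spells out what each option means in the Flink version used in this series: `-n` is the number of TaskManager containers, while `-jm` and `-tm` are the JobManager and TaskManager memory in MB. The variable names and the `session_cmd` helper are hypothetical, and later Flink releases changed these options, so it is worth checking `./bin/yarn-session.sh -h` on your own installation.

```shell
#!/usr/bin/env bash
# Illustrative variables (hypothetical names); values match the command above.
TASK_MANAGERS=2     # -n : number of TaskManager containers to request
JM_MEMORY_MB=1024   # -jm: JobManager container memory, in MB
TM_MEMORY_MB=1024   # -tm: memory per TaskManager container, in MB

# session_cmd is a hypothetical helper that only prints the command line.
session_cmd() {
  printf './bin/yarn-session.sh -n %s -jm %s -tm %s\n' \
    "$TASK_MANAGERS" "$JM_MEMORY_MB" "$TM_MEMORY_MB"
}

session_cmd

# Once the session is up, it should be listed as a running YARN application.
if command -v yarn >/dev/null 2>&1; then
  yarn application -list -appStates RUNNING
else
  echo "yarn CLI not found; skipping application listing"
fi
```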
bin/flink run ./examples/batch/WordCount.jar \
-input hdfs://192.168.50.134:8020/input/GoneWiththeWind.txt \
-output hdfs://192.168.50.134:8020/wordcount-result.txt
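To sanity-check what the job writes, a rough local equivalent of WordCount can be built from standard shell tools. This is only a sketch: it assumes the bundled example lowercases the text, splits on non-word characters, and writes one `word count` pair per line, which may not hold for every Flink version. The `wordcount` function name is made up for this illustration.

```shell
#!/usr/bin/env bash
# wordcount (hypothetical helper): reads text on stdin, prints "word count"
# pairs sorted by word. LC_ALL=C keeps the character classes ASCII-only.
wordcount() {
  LC_ALL=C tr '[:upper:]' '[:lower:]' \
    | LC_ALL=C tr -cs '[:alnum:]_' '\n' \
    | grep -v '^$' \
    | LC_ALL=C sort \
    | uniq -c \
    | awk '{print $2, $1}'
}

# Small demonstration on a single sentence.
printf 'To be, or not to be\n' | wordcount
```

Running `wordcount < GoneWiththeWind.txt | head` on the downloaded file gives a few pairs that can be compared by eye with the output of `hdfs dfs -cat` on the result path.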
That completes the Session Mode exercise; next we try Job Mode.
bin/flink run -m yarn-cluster \
-yn 2 \
-yjm 1024 \
-ytm 1024 \
./examples/batch/WordCount.jar \
-input hdfs://192.168.50.134:8020/input/GoneWiththeWind.txt \
-output hdfs://192.168.50.134:8020/wordcount-result-1.txt
bin/flink run -m yarn-cluster \
-yn 2 \
-yjm 1024 \
-ytm 1024 \
./examples/batch/WordCount.jar \
-input hdfs://192.168.50.134:8020/input/GoneWiththeWind.txt \
-output hdfs://192.168.50.134:8020/wordcount-result-2.txt
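The two submissions above differ only in the output path. As a small sketch, the hypothetical helper `job_cmd` below prints one Job Mode command per suffix, so the pattern is easy to see and to extend. The `NAMENODE` variable and the helper name are illustrative; the flags are copied unchanged from the commands above.

```shell
#!/usr/bin/env bash
# Assumed NameNode address, copied from the commands above.
NAMENODE="hdfs://192.168.50.134:8020"

# job_cmd (hypothetical helper): prints a Job Mode submission whose output
# file name ends with the suffix given as $1. It does not run anything.
job_cmd() {
  printf 'bin/flink run -m yarn-cluster -yn 2 -yjm 1024 -ytm 1024 ' 
  printf './examples/batch/WordCount.jar '
  printf -- '-input %s/input/GoneWiththeWind.txt ' "$NAMENODE"
  printf -- '-output %s/wordcount-result-%s.txt\n' "$NAMENODE" "$1"
}

# Print the two submissions used in this article.
for i in 1 2; do
  job_cmd "$i"
done
```

Because the helper only prints, the output can be reviewed first and then pasted into a terminal in the Flink installation directory. Each submission starts its own YARN application, in line with the Job Mode description earlier.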