Added dependency POM details:
<dependencies>
    <dependency>
        <groupId>org.apache.flink</groupId>
        <artifactId>flink-clients_2.11</artifactId>
        <version>1.7.1</version>
    </dependency>
    <dependency>
        <groupId>org.apache.flink</groupId>
        <artifactId>flink-streaming-java_2.11</artifactId>
        <version>1.7.1</version>
        <scope>provided</scope>
    </dependency>
    <dependency>
        <groupId>org.apache.flink</groupId>
        <artifactId>flink-core</artifactId>
        <version>1.7.1</version>
    </dependency>
    <dependency>
        <groupId>org.apache.flink</groupId>
        <artifactId>flink-java</artifactId>
        <version>1.7.1</version>
    </dependency>
    <dependency>
        <groupId>org.apache.flink</groupId>
        <artifactId>flink-runtime_2.11</artifactId>
        <version>1.7.1</version>
    </dependency>
    <dependency>
        <groupId>org.apache.flink</groupId>
        <artifactId>flink-table_2.11</artifactId>
        <version>1.7.1</version>
        <scope>provided</scope>
    </dependency>
    <dependency>
        <groupId>org.apache.flink</groupId>
        <artifactId>flink-connector-kafka-0.10_2.11</artifactId>
        <version>1.7.1</version>
    </dependency>
    <dependency>
        <groupId>org.apache.flink</groupId>
        <artifactId>flink-connector-filesystem_2.11</artifactId>
        <version>1.7.1</version>
    </dependency>
    <dependency>
        <groupId>org.apache.flink</groupId>
        <artifactId>flink-hadoop-compatibility_2.11</artifactId>
        <version>1.7.1</version>
        <scope>test</scope>
    </dependency>
    <dependency>
        <groupId>org.apache.flink</groupId>
        <artifactId>flink-s3-fs-hadoop</artifactId>
        <version>1.7.1</version>
    </dependency>
    <dependency>
        <groupId>org.apache.flink</groupId>
        <artifactId>flink-shaded-hadoop</artifactId>
        <version>1.7.1</version>
        <type>pom</type>
    </dependency>
    <dependency>
        <groupId>org.apache.hadoop</groupId>
        <artifactId>hadoop-aws</artifactId>
        <version>2.8.5</version>
    </dependency>
    <dependency>
        <groupId>org.apache.hadoop</groupId>
        <artifactId>hadoop-hdfs</artifactId>
        <version>2.8.5</version>
    </dependency>
    <dependency>
        <groupId>org.apache.hadoop</groupId>
        <artifactId>hadoop-common</artifactId>
        <version>2.8.5</version>
    </dependency>
    <dependency>
        <groupId>com.amazonaws</groupId>
        <artifactId>aws-java-sdk-s3</artifactId>
        <version>1.11.529</version>
    </dependency>
    <dependency>
        <groupId>org.apache.flink</groupId>
        <artifactId>flink-streaming-connectors</artifactId>
        <version>1.1.5</version>
        <type>pom</type>
    </dependency>
</dependencies>

Exception stack trace:

java.lang.UnsupportedOperationException: Recoverable writers on Hadoop are only supported for HDFS and for Hadoop version 2.7 or newer
    at org.apache.flink.runtime.fs.hdfs.HadoopRecoverableWriter.<init>(HadoopRecoverableWriter.java:57)
    at org.apache.flink.runtime.fs.hdfs.HadoopFileSystem.createRecoverableWriter(HadoopFileSystem.java:202)
    at org.apache.flink.core.fs.SafetyNetWrapperFileSystem.createRecoverableWriter(SafetyNetWrapperFileSystem.java:69)
    at org.apache.flink.streaming.api.functions.sink.filesystem.Buckets.<init>(Buckets.java)
    at org.apache.flink.streaming.api.functions.sink.filesystem.StreamingFileSink$RowFormatBuilder.createBuckets(StreamingFileSink.java:242)
    at org.apache.flink.streaming.api.functions.sink.filesystem.StreamingFileSink.initializeState(StreamingFileSink.java:327)
    at org.apache.flink.streaming.util.functions.StreamingFunctionUtils.tryRestoreFunction(StreamingFunctionUtils.java:178)
    at org.apache.flink.streaming.util.functions.StreamingFunctionUtils.restoreFunctionState(StreamingFunctionUtils.java:160)
    at org.apache.flink.streaming.api.operators.AbstractUdfStreamOperator.initializeState(AbstractUdfStreamOperator.java:96)
    at org.apache.flink.streaming.api.operators.AbstractStreamOperator.initializeState(AbstractStreamOperator.java:278)
    at org.apache.flink.streaming.runtime.tasks.StreamTask.initializeState(StreamTask.java:738)
    at org.apache.flink.streaming.runtime.tasks.StreamTask.invoke(StreamTask.java:289)
    at org.apache.flink.runtime.taskmanager.Task.run(Task.java:704)
    at java.lang.Thread.run(Thread.java:748)
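For reference, a minimal sketch (not the original poster's code) of the kind of Flink 1.7 job that hits this exception when the s3 scheme falls back to the plain HadoopFileSystem; the bucket name and sample elements are placeholders:

import org.apache.flink.api.common.serialization.SimpleStringEncoder;
import org.apache.flink.core.fs.Path;
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.streaming.api.functions.sink.filesystem.StreamingFileSink;

public class S3SinkJob {
    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

        DataStream<String> lines = env.fromElements("a", "b", "c");

        // StreamingFileSink asks the target file system for a RecoverableWriter.
        // If "s3://" resolves to the generic HadoopFileSystem instead of a real
        // S3 file system, initializeState throws the UnsupportedOperationException
        // shown in the stack trace above.
        StreamingFileSink<String> sink = StreamingFileSink
                .forRowFormat(new Path("s3://my-bucket/output"),
                              new SimpleStringEncoder<String>("UTF-8"))
                .build();

        lines.addSink(sink);
        env.execute("s3-streaming-file-sink");
    }
}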
Posted on 2019-10-09 18:02:05
Flink uses something called a ServiceLoader to load the components it needs to interface with pluggable file systems. If you want to see where Flink does this in code, look at org.apache.flink.core.fs.FileSystem. Note the initialize function, which uses the RAW_FACTORIES variable. RAW_FACTORIES is created by the loadFileSystems function, and you can see that it uses Java's ServiceLoader.
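As a rough illustration of the mechanism (this is not Flink's actual loadFileSystems code, just the pattern it relies on), a plain Java ServiceLoader lookup over Flink's FileSystemFactory interface looks like this:

import java.util.ServiceLoader;
import org.apache.flink.core.fs.FileSystemFactory;

public class ListFileSystemFactories {
    public static void main(String[] args) {
        // ServiceLoader scans the classpath for
        // META-INF/services/org.apache.flink.core.fs.FileSystemFactory entries;
        // each discovered factory advertises the URI scheme it serves.
        for (FileSystemFactory factory : ServiceLoader.load(FileSystemFactory.class)) {
            System.out.println(factory.getScheme() + " -> " + factory.getClass().getName());
        }
    }
}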
The file system components need to be set up before your application starts within Flink. This means your Flink application does not need to bundle these components; they should be provided for your application.
EMR does not provide the S3 file system components that Flink needs in order to use S3 as a streaming file sink. The exception is thrown not because a version is too low, but because Flink loaded the HadoopFileSystem in the absence of a FileSystem matching the s3 scheme (see the scheme-fallback logic in org.apache.flink.core.fs.FileSystem).
You can check whether your file systems are being loaded by enabling the DEBUG log level for your Flink application, which EMR lets you do in its configuration:
{
    "Classification": "flink-log4j",
    "Properties": {
        "log4j.rootLogger": "DEBUG,file"
    }
},
{
    "Classification": "flink-log4j-yarn-session",
    "Properties": {
        "log4j.rootLogger": "DEBUG,stdout"
    }
}

The relevant logs are available in the YARN Resource Manager, in the logs for each individual node. Searching for the string "Added file system" will help you find every file system that was loaded successfully.
Also handy in this investigation was SSHing into the master node and using the flink-scala REPL, where I could see which FileSystem Flink decided to load when given a file URI.
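If you prefer not to use the REPL, the same check can be done from a small Java program; this is a sketch, and the bucket name is a placeholder:

import java.net.URI;
import org.apache.flink.core.fs.FileSystem;

public class WhichFileSystem {
    public static void main(String[] args) throws Exception {
        // Ask Flink which FileSystem implementation it resolves for an s3:// URI.
        FileSystem fs = FileSystem.get(new URI("s3://my-bucket/"));
        // Without a matching factory this prints the Hadoop fallback
        // (org.apache.flink.runtime.fs.hdfs.HadoopFileSystem); with
        // flink-s3-fs-hadoop on the lib path it prints the shaded S3 file system.
        System.out.println(fs.getClass().getName());
    }
}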
The solution is to drop a JAR with the S3 file system implementation into /usr/lib/flink/lib/ before the Flink application starts. This can be done with a bootstrap action that fetches either flink-s3-fs-hadoop or flink-s3-fs-presto (depending on which implementation you use). My bootstrap action script looks something like this:
sudo mkdir -p /usr/lib/flink/lib
cd /usr/lib/flink/lib
sudo curl -O https://search.maven.org/remotecontent?filepath=org/apache/flink/flink-s3-fs-hadoop/1.8.1/flink-s3-fs-hadoop-1.8.1.jar

Posted on 2019-04-08 06:34:11
In order to use Flink's StreamingFileSink with exactly-once guarantees, you need to use Hadoop >= 2.7. Versions below 2.7 are not supported, so please make sure you are running an up-to-date Hadoop version on EMR.
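A brief sketch of what that looks like in code: the sink's exactly-once behavior also depends on checkpointing being enabled, since part files are finalized when a checkpoint completes. The bucket path and checkpoint interval below are placeholders:

import org.apache.flink.api.common.serialization.SimpleStringEncoder;
import org.apache.flink.core.fs.Path;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.streaming.api.functions.sink.filesystem.StreamingFileSink;

public class ExactlyOnceS3Job {
    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

        // StreamingFileSink moves part files from in-progress to finished when a
        // checkpoint completes, so checkpointing must be on for exactly-once.
        env.enableCheckpointing(60_000L); // checkpoint every 60 seconds

        env.fromElements("a", "b", "c")
           .addSink(StreamingFileSink
                   .forRowFormat(new Path("s3://my-bucket/output"),
                                 new SimpleStringEncoder<String>("UTF-8"))
                   .build());

        env.execute("exactly-once-s3-job");
    }
}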
https://stackoverflow.com/questions/55517566