
[1133] Flink Problem Collection


Problem 1: Could not get job jar and dependencies from JAR file: JAR file does not exist: -yn

Cause: the -yn parameter was deprecated after Flink 1.8; the ResourceManager now automatically starts as many containers as needed to satisfy the job's requested parallelism. Fix: just remove the parameter.
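For example, a submission that used to pass -yn would now look like this (a sketch; the jar name, memory sizes, and parallelism are placeholders):

# before (rejected on newer Flink): flink run -m yarn-cluster -yn 4 -yjm 1024m -ytm 2048m ./my-job.jar
# after: drop -yn and let the requested parallelism drive container allocation
flink run -m yarn-cluster -yjm 1024m -ytm 2048m -p 4 ./my-job.jar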

Problem 2: java.lang.IllegalStateException: No Executor found. Please make sure to export the HADOOP_CLASSPATH environment variable or have hadoop in your classpath.

Method 1: configure the environment variable

vi /etc/profile

# add the following line
export HADOOP_CLASSPATH=`hadoop classpath`

# make the change take effect
source /etc/profile

Method 2: download the flink-shaded-hadoop-2-uber jar matching your Hadoop version and place it in Flink's lib directory.
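For example (a sketch; the 2.7.5-7.0 build below is the one linked under Problem 7, pick whichever matches your Hadoop version):

cd $FLINK_HOME/lib
wget https://repo.maven.apache.org/maven2/org/apache/flink/flink-shaded-hadoop-2-uber/2.7.5-7.0/flink-shaded-hadoop-2-uber-2.7.5-7.0.jar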

Problem 3: Failed to execute goal net.alchim31.maven:scala-maven-plugin:3.2.2:compile (default) on project book-stream: wrap: org.apache.commons.exec.ExecuteException: Process exited with an error: 1 (Exit value: 1)


There are many possible causes for this error, so the key is to read the actual error details. In my case, Scala code was calling Java methods, but the build was configured to package only the Scala source directory, so the Java classes could not be found. The offending build line is shown below; comment it out or delete it. With no sourceDirectory specified, all source directories are included and the problem goes away.

<sourceDirectory>src/main/scala</sourceDirectory>
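After the deletion, the build section simply falls back to Maven's defaults, so both src/main/java and src/main/scala are picked up; roughly (a sketch):

<build>
    <!-- removed: <sourceDirectory>src/main/scala</sourceDirectory> -->
    <plugins>
        <!-- scala / java compiler plugins unchanged -->
    </plugins>
</build>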

Problem 4: org.apache.flink.client.program.ProgramInvocationException: The main method caused an error: Could not find a suitable table factory for 'org.apache.flink.table.planner.delegation.ParserFactory' in the classpath.

This error likewise means a required dependency was not packaged into the job jar (or the dependency needs to be placed in Flink's lib directory).

I switched the Maven build to the following plugins:

 <build>
        <plugins>
 
            <plugin>
                <groupId>org.scala-tools</groupId>
                <artifactId>maven-scala-plugin</artifactId>
                <version>2.15.2</version>
                <executions>
                    <execution>
                        <goals>
                            <goal>compile</goal>
                            <goal>testCompile</goal>
                        </goals>
                    </execution>
                </executions>
            </plugin>
 
            <plugin>
                <artifactId>maven-compiler-plugin</artifactId>
                <version>3.6.0</version>
                <configuration>
                    <source>1.8</source>
                    <target>1.8</target>
                </configuration>
            </plugin>
 
            <plugin>
                <groupId>org.apache.maven.plugins</groupId>
                <artifactId>maven-surefire-plugin</artifactId>
                <version>2.19</version>
                <configuration>
                    <skip>true</skip>
                </configuration>
            </plugin>
 
        </plugins>
    </build>
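If you would rather ship the dependency inside the job jar, a maven-shade-plugin section along these lines builds a fat jar (a sketch, not the author's setup; the ServicesResourceTransformer matters here because table factories such as ParserFactory are discovered through META-INF/services files, which plain jar merging clobbers):

<plugin>
    <groupId>org.apache.maven.plugins</groupId>
    <artifactId>maven-shade-plugin</artifactId>
    <version>3.2.4</version>
    <executions>
        <execution>
            <phase>package</phase>
            <goals>
                <goal>shade</goal>
            </goals>
            <configuration>
                <transformers>
                    <!-- merge META-INF/services entries so Flink's SPI lookups keep working -->
                    <transformer implementation="org.apache.maven.plugins.shade.resource.ServicesResourceTransformer"/>
                </transformers>
            </configuration>
        </execution>
    </executions>
</plugin>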

Problem 5: Multiple versions of scala libraries detected!

Expected all dependencies to require Scala version: 2.11.12
org.apache.flink:flink-runtime_2.11:1.13.2 requires scala version: 2.11.12
org.apache.flink:flink-scala_2.11:1.13.2 requires scala version: 2.11.12
org.apache.flink:flink-scala_2.11:1.13.2 requires scala version: 2.11.12
org.scala-lang:scala-reflect:2.11.12 requires scala version: 2.11.12
org.apache.flink:flink-streaming-scala_2.11:1.13.2 requires scala version: 2.11.12
org.apache.flink:flink-streaming-scala_2.11:1.13.2 requires scala version: 2.11.12
org.scala-lang:scala-compiler:2.11.12 requires scala version: 2.11.12
org.scala-lang.modules:scala-xml_2.11:1.0.5 requires scala version: 2.11.7

This is caused by an old version of the scala-maven-plugin.

Starting from Scala 2.10, all changes within a bugfix/patch version should be backward compatible, so these warnings don't really have a point in this case. But they are still very important when, say, you somehow end up with both Scala 2.9 and 2.11 libraries. Since version 3.1.6 of the plugin, you can fix this using the scalaCompatVersion configuration.

Method 1: specify a matching scalaCompatVersion

 <configuration>
        <scalaCompatVersion>${scala.binary.version}</scalaCompatVersion>
        <scalaVersion>${scala.version}</scalaVersion> 
 </configuration>

The complete plugin section:

<plugin>
    <groupId>net.alchim31.maven</groupId>
    <artifactId>scala-maven-plugin</artifactId>
    <version>3.1.6</version>
    <configuration>
        <scalaCompatVersion>${scala.binary.version}</scalaCompatVersion>
        <scalaVersion>${scala.version}</scalaVersion>
    </configuration>
    <executions>
        <execution>
            <goals>
                <goal>compile</goal>
            </goals>
        </execution>
    </executions>
</plugin>

Method 2: upgrade the build plugin to a 4.x version

<plugin>    
    <groupId>net.alchim31.maven</groupId>    
    <artifactId>scala-maven-plugin</artifactId>    
    <version>4.2.0</version>    
    <executions>        
        <execution>            
            <goals>                
                <goal>compile</goal>            
            </goals>        
        </execution>    
    </executions>
</plugin>
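Both snippets assume the version properties are defined in the POM; a typical definition matching the 2.11.12 / Flink 1.13.2 combination from the warning (an assumption, adjust to your setup) would be:

<properties>
    <scala.version>2.11.12</scala.version>
    <scala.binary.version>2.11</scala.binary.version>
</properties>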

Problem 6: cannot be cast to com.google.protobuf.Message

Caused by: java.lang.ClassCastException: org.apache.hadoop.yarn.proto.YarnServiceProtos$RegisterApplicationMasterRequestProto cannot be cast to com.google.protobuf.Message
	at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:227)
	at com.sun.proxy.$Proxy14.registerApplicationMaster(Unknown Source)
	at org.apache.hadoop.yarn.api.impl.pb.client.ApplicationMasterProtocolPBClientImpl.registerApplicationMaster(ApplicationMasterProtocolPBClientImpl.java:106)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:498)
	at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:256)
	at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:104)
	at com.sun.proxy.$Proxy15.registerApplicationMaster(Unknown Source)
	at org.apache.hadoop.yarn.client.api.impl.AMRMClientImpl.registerApplicationMaster(AMRMClientImpl.java:222)
	at org.apache.hadoop.yarn.client.api.impl.AMRMClientImpl.registerApplicationMaster(AMRMClientImpl.java:214)
	at org.apache.hadoop.yarn.client.api.async.impl.AMRMClientAsyncImpl.registerApplicationMaster(AMRMClientAsyncImpl.java:138)
	at org.apache.flink.yarn.YarnResourceManager.createAndStartResourceManagerClient(YarnResourceManager.java:205)
	at org.apache.flink.yarn.YarnResourceManager.initialize(YarnResourceManager.java:234)
	... 11 common frames omitted

This is usually caused by a conflict between the Hadoop jars in your own project and those on the Flink cluster. Fix: exclude the Hadoop-related jars from your project so they are not packaged into the job jar, e.g. by marking them as provided:

        <dependency>
            <groupId>org.apache.hadoop</groupId>
            <artifactId>hadoop-common</artifactId>
            <version>${hadoop.version}</version>
            <scope>provided</scope>
        </dependency>
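With the scope set to provided, the Hadoop classes should no longer appear in the packaged jar; a quick way to confirm (a sketch, assuming the jar path):

mvn clean package
jar tf target/my-job.jar | grep 'org/apache/hadoop' || echo 'no bundled hadoop classes'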

Problem 7: Submitting a Flink application to the cluster fails with: org.apache.flink.core.fs.UnsupportedFileSystemSchemeException: Hadoop is not in the classpath/dependencies


This happens when the Flink job touches the HDFS file system (for example, checkpointing to HDFS) but the Hadoop-related dependencies and configuration are missing.

Fix: 1. Add the following to the environment variables (don't forget to re-source them and then restart Flink; if refreshing the variables has no effect, reboot the machine):

HADOOP_HOME=xxx
export HADOOP_HOME
export HADOOP_CLASSPATH=`hadoop classpath`

2. If step 1 is definitely in place but the error persists, download a jar and put it in Flink's lib directory.

Download link for flink-shaded-hadoop-2-uber-2.7.5-7.0: https://repo.maven.apache.org/maven2/org/apache/flink/flink-shaded-hadoop-2-uber/2.7.5-7.0/flink-shaded-hadoop-2-uber-2.7.5-7.0.jar

All published flink-shaded-hadoop-2-uber builds: https://repo.maven.apache.org/maven2/org/apache/flink/flink-shaded-hadoop-2-uber/

Pick the build matching your Hadoop version; mine here is 2.9.2.


After dropping the jar in, restart the Flink cluster. On some systems a full OS reboot is needed; in my case the jar alone did not fix it, but everything worked after a reboot.
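For context, the kind of job that triggers this is one that writes checkpoints to HDFS; a minimal Scala sketch (host, port, path, and interval are placeholders; setCheckpointStorage is the Flink 1.13+ API):

import org.apache.flink.streaming.api.scala._

object HdfsCheckpointJob {
  def main(args: Array[String]): Unit = {
    val env = StreamExecutionEnvironment.getExecutionEnvironment
    env.enableCheckpointing(10000) // checkpoint every 10 seconds
    // Writing to an hdfs:// URI is exactly what requires Hadoop on the classpath
    env.getCheckpointConfig.setCheckpointStorage("hdfs://namenode:8020/flink/checkpoints")

    env.fromElements(1, 2, 3).map(_ * 2).print()
    env.execute("hdfs-checkpoint-demo")
  }
}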

Problem 8: java.lang.NoSuchMethodError: org.apache.commons.cli.Option.builder

Full error message:

Exception in thread "main" java.lang.NoSuchMethodError: org.apache.commons.cli.Option.builder(Ljava/lang/String;)Lorg/apache/commons/cli/Option$Builder;
	at org.apache.flink.runtime.entrypoint.parser.CommandLineOptions.<clinit>(CommandLineOptions.java:27)
	at org.apache.flink.runtime.entrypoint.DynamicParametersConfigurationParserFactory.options(DynamicParametersConfigurationParserFactory.java:43)
	at org.apache.flink.runtime.entrypoint.DynamicParametersConfigurationParserFactory.getOptions(DynamicParametersConfigurationParserFactory.java:50)
	at org.apache.flink.runtime.entrypoint.parser.CommandLineParser.parse(CommandLineParser.java:42)
	at org.apache.flink.runtime.entrypoint.ClusterEntrypointUtils.parseParametersOrExit(ClusterEntrypointUtils.java:63)
	at org.apache.flink.yarn.entrypoint.YarnJobClusterEntrypoint.main(YarnJobClusterEntrypoint.java:89)

Cause: the commons-cli version pulled in by the dependencies is too old, so the newer method is missing at runtime.

Fix: exclude commons-cli from the Hadoop dependency and add a newer version explicitly (Option.builder(String) was introduced in commons-cli 1.3):

代码语言:javascript
复制
        <dependency>
            <groupId>org.apache.hadoop</groupId>
            <artifactId>hadoop-common</artifactId>
            <version>3.2.2</version>
            <exclusions>
                <exclusion>
                    <groupId>commons-cli</groupId>
                    <artifactId>commons-cli</artifactId>
                </exclusion>
            </exclusions>
        </dependency>
        
        <dependency>
            <groupId>commons-cli</groupId>
            <artifactId>commons-cli</artifactId>
            <version>1.3.1</version>
        </dependency>

Problem 9: Exception in thread "Thread-8" java.lang.IllegalStateException: Trying to access closed classloader. Please check if you store classloaders directly or indirectly in static fields. If the stacktrace suggests that the leak occurs in a third party library and cannot be fixed immediately, you can disable this check with the configuration 'classloader.check-leaked-classloader'.

Fix: add the following to flink-conf.yaml:

classloader.check-leaked-classloader: false

Problem 10: Could not deploy Yarn job cluster

The job fails at submission time with "Could not deploy Yarn job cluster". Reading further down the log reveals the actual cause: the configured memory request exceeded YARN's limits.

Fix: adjust the memory limits via these two settings:

yarn.scheduler.maximum-allocation-mb
yarn.nodemanager.resource.memory-mb
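Both are set in yarn-site.xml; for example (a sketch; the 8192 MB figure is arbitrary and should be sized to your nodes):

<property>
    <name>yarn.scheduler.maximum-allocation-mb</name>
    <value>8192</value>
</property>
<property>
    <name>yarn.nodemanager.resource.memory-mb</name>
    <value>8192</value>
</property>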


Problem 11: org.apache.flink.client.deployment.ClusterDeploymentException: Couldn't deploy Yarn

The telltale line in this kind of failure is: Current usage: 75.1 MB of 1 GB physical memory used; 2.1 GB of 2.1 GB virtual memory used. Killing container.

Taken literally, this says the container ran out of memory; the actual trigger is YARN's virtual-memory check during Flink-on-YARN startup (the 2.1 GB limit is 1 GB of physical memory times the default vmem-pmem ratio of 2.1).

So the simplest fix is to turn the check off in the configuration.

Edit etc/hadoop/yarn-site.xml:

<property> 
    <name>yarn.nodemanager.vmem-check-enabled</name> 
    <value>false</value> 
</property>
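Alternatively, if you would rather keep the check enabled, raising the ratio instead should work; yarn.nodemanager.vmem-pmem-ratio defaults to 2.1, which is exactly where the 2.1 GB limit above came from (a sketch; the value 4 is an arbitrary example):

<property>
    <name>yarn.nodemanager.vmem-pmem-ratio</name>
    <value>4</value>
</property>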

Problem 12: org.apache.hadoop.yarn.client.RMProxy - Connecting to ResourceManager at /0.0.0.0:8032

The client keeps failing with "Retrying connect to server: 0.0.0.0/0.0.0.0:8032. Already tried 0 time(s)".

2021-03-19 07:43:15,103 WARN  org.apache.flink.runtime.util.HadoopUtils                     - Could not find Hadoop configuration via any of the supported methods (Flink configuration, environment variables).
2021-03-19 07:43:15,545 WARN  org.apache.hadoop.util.NativeCodeLoader                       - Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
2021-03-19 07:43:15,657 INFO  org.apache.flink.runtime.security.modules.HadoopModule        - Hadoop user set to ryxiong (auth:SIMPLE), credentials check status: true
2021-03-19 07:43:15,715 INFO  org.apache.flink.runtime.security.modules.JaasModule          - Jaas file will be created as /tmp/jaas-1195372589162118065.conf.
2021-03-19 07:43:15,734 WARN  org.apache.flink.yarn.cli.FlinkYarnSessionCli                 - The configuration directory ('/opt/module/flink-1.10.1/conf') already contains a LOG4J config file.If you want to use logback, then please delete or rename the log configuration file.
2021-03-19 07:43:15,802 INFO  org.apache.hadoop.yarn.client.RMProxy                         - Connecting to ResourceManager at /0.0.0.0:8032

2021-03-19 07:43:27,189 INFO  org.apache.hadoop.ipc.Client                                  - Retrying connect to server: 0.0.0.0/0.0.0.0:8032. Already tried 0 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
2021-03-19 07:43:28,195 INFO  org.apache.hadoop.ipc.Client                                  - Retrying connect to server: 0.0.0.0/0.0.0.0:8032. Already tried 1 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
2021-03-19 07:43:29,201 INFO  org.apache.hadoop.ipc.Client                                  - Retrying connect to server: 0.0.0.0/0.0.0.0:8032. Already tried 2 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)

Causes:

1. Check whether the Hadoop cluster is actually running; if it is not, nothing can connect to YARN.

2. Flink runs on YARN and must be able to locate the Hadoop configuration, because it has to reach the YARN ResourceManager and HDFS. If the cluster is up and you still cannot connect, check whether the Hadoop environment variables are set correctly.

Fix:

1. Start the Hadoop cluster.

2. Configure the Hadoop environment variables:

# HADOOP_HOME
export HADOOP_HOME=/opt/module/hadoop-2.7.2
export PATH=$PATH:$HADOOP_HOME/bin
export PATH=$PATH:$HADOOP_HOME/sbin
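After re-sourcing the profile, a quick sanity check (a sketch) tells you whether the client now sees the real ResourceManager:

source /etc/profile
hadoop classpath   # should print the Hadoop jars rather than an error
yarn node -list    # should list NodeManagers instead of retrying 0.0.0.0:8032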

Problem 13: Fixing the Hadoop startup errors LOG4J.PROPERTIES IS NOT FOUND…CORE-SITE.XML NOT FOUND

Description: after disabling Kerberos authentication on a CDH cluster, a functional check of the services failed: listing the HDFS file system reported that core-site.xml could not be found.

[root@utility ~]# hadoop fs -ls /
WARNING: log4j.properties is not found. HADOOP_CONF_DIR may be incomplete.
Exception in thread "main" java.lang.RuntimeException: core-site.xml not found
        at org.apache.hadoop.conf.Configuration.loadResource(Configuration.java:2867)
        at org.apache.hadoop.conf.Configuration.loadResources(Configuration.java:2815)
        at org.apache.hadoop.conf.Configuration.getProps(Configuration.java:2692)
        at org.apache.hadoop.conf.Configuration.set(Configuration.java:1329)
        at org.apache.hadoop.conf.Configuration.set(Configuration.java:1301)
        at org.apache.hadoop.conf.Configuration.setBoolean(Configuration.java:1642)
        at org.apache.hadoop.util.GenericOptionsParser.processGeneralOptions(GenericOptionsParser.java:339)
        at org.apache.hadoop.util.GenericOptionsParser.parseGeneralOptions(GenericOptionsParser.java:569)
        at org.apache.hadoop.util.GenericOptionsParser.<init>(GenericOptionsParser.java:174)
        at org.apache.hadoop.util.GenericOptionsParser.<init>(GenericOptionsParser.java:156)
        at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
        at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:90)
        at org.apache.hadoop.fs.FsShell.main(FsShell.java:389)

The message says core-site.xml cannot be found, which was puzzling: core-site.xml and the other configuration files clearly existed and contained no errors. After some digging, the fix turned out to be an environment-variable issue: HADOOP_CONF_DIR must point at your own Hadoop configuration directory (the default resolves to a wrong path, hence the error).

vi /etc/profile
 
export HADOOP_HOME=/opt/cloudera/parcels/CDH/lib/hadoop
export HADOOP_CONF_DIR=$HADOOP_HOME/etc/hadoop
export PATH=$PATH:$HADOOP_HOME/bin

Save, exit, and make the change take effect immediately:

source /etc/profile

Re-running the command now works normally.


If the error persists after this change, check whether the HADOOP_CONF_DIR path is also configured in hadoop-env.sh; if not, add it there and save, which should resolve the issue.


Note: when this error appears, the main places to check are the HADOOP_CONF_DIR paths configured in hadoop-env.sh, mapred-env.sh, and yarn-env.sh under etc/hadoop, as sketched below.
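The line to look for is the same in each of those files; using the CDH path from above (a sketch):

# etc/hadoop/hadoop-env.sh (and likewise mapred-env.sh / yarn-env.sh)
export HADOOP_CONF_DIR=/opt/cloudera/parcels/CDH/lib/hadoop/etc/hadoop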

Problem 14: java.lang.NoClassDefFoundError: org/apache/commons/logging/LogFactory

Fix. Method 1: add the commons-logging.jar to the project.

Method 2: in a Maven project, add the commons-logging dependency directly to pom.xml, as follows:

<dependency>
	<groupId>commons-logging</groupId>
	<artifactId>commons-logging</artifactId>
	<version>1.2</version>
</dependency>

Note: it needs to be added at the beginning of the <dependencies> tag; if it is added elsewhere, the problem may persist.
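That is, place it at the top of the block (a sketch):

<dependencies>
    <!-- commons-logging first, ahead of anything that drags in an older copy -->
    <dependency>
        <groupId>commons-logging</groupId>
        <artifactId>commons-logging</artifactId>
        <version>1.2</version>
    </dependency>
    <!-- remaining project dependencies follow -->
</dependencies>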
