I am trying to run Hive on MR to execute SQL, but it fails halfway through with the following error:
Application application_1570514228864_0001 failed 2 times due to AM Container for appattempt_1570514228864_0001_000002 exited with exitCode: -1000
Failing this attempt.Diagnostics: [2019-10-08 13:57:49.272]Failed to download resource { { s3a://tpcds/tmp/hadoop-yarn/staging/root/.staging/job_1570514228864_0001/libjars, 1570514262820, FILE, null },pending,[(container_1570514228864_0001_02_000001)],1132444167207544,DOWNLOADING} java.io.IOException: Resource s3a://tpcds/tmp/hadoop-yarn/staging/root/.staging/job_1570514228864_0001/libjars changed on src filesystem (expected 1570514262820, was 1570514269265
From my point of view, the key message in the error log is libjars changed on src filesystem (expected 1570514262820, was 1570514269265. There are several posts about this issue on SO, but they remain unanswered, such as thread1 and thread2.
I also found some valuable hints from an Apache JIRA and a Red Hat Bugzilla entry. I synchronized the clocks across all related nodes via NTP, but the same problem persists.
Any comments are welcome, thanks.

Posted 2019-10-16 11:40:50
I still don't know why the timestamps of the resource files are inconsistent, and AFAIK there is no way to fix this through configuration. However, I managed to find a workaround that skips the problem. Let me summarize it here for anyone who may run into the same issue.
By examining the error log and searching the Hadoop source code, we can trace the problem to hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/util/FSDownload.java.
Simply comment out the exception-throwing statement:
private void verifyAndCopy(Path destination)
    throws IOException, YarnException {
  final Path sCopy;
  try {
    sCopy = resource.getResource().toPath();
  } catch (URISyntaxException e) {
    throw new IOException("Invalid resource", e);
  }
  FileSystem sourceFs = sCopy.getFileSystem(conf);
  FileStatus sStat = sourceFs.getFileStatus(sCopy);
  if (sStat.getModificationTime() != resource.getTimestamp()) {
    /*
    throw new IOException("Resource " + sCopy +
        " changed on src filesystem (expected " + resource.getTimestamp() +
        ", was " + sStat.getModificationTime());
    */
    LOG.debug("[Gearon][Info] The timestamp is not consistent among resource files.\n" +
        "Stop throwing the exception; it doesn't affect other modules.");
  }
  if (resource.getVisibility() == LocalResourceVisibility.PUBLIC) {
    if (!isPublic(sourceFs, sCopy, sStat, statCache)) {
      throw new IOException("Resource " + sCopy +
          " is not publicly accessible and as such cannot be part of the" +
          " public cache.");
    }
  }
  downloadAndUnpack(sCopy, destination);
}
Then build hadoop-yarn-project and copy the resulting hadoop-yarn-common-x.jar to $HADOOP_HOME/share/hadoop/yarn.
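The rebuild-and-replace step might look like the following sketch. The source checkout path, the Hadoop version (3.1.2 here), and the target directory are assumptions; adjust them to your release, and repeat the copy on every node:

```shell
# Rebuild only the hadoop-yarn-common module after patching FSDownload.java
# (skipping tests to save time).
cd hadoop-src/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common
mvn -DskipTests clean package

# Back up the stock jar, then drop in the patched one.
cp "$HADOOP_HOME/share/hadoop/yarn/hadoop-yarn-common-3.1.2.jar" \
   "$HADOOP_HOME/share/hadoop/yarn/hadoop-yarn-common-3.1.2.jar.bak"
cp target/hadoop-yarn-common-3.1.2.jar "$HADOOP_HOME/share/hadoop/yarn/"
```

Restart the YARN daemons afterwards so the NodeManagers pick up the patched jar.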
I'll leave this post here, and thanks in advance for any further explanation of how to fix this without changing the Hadoop source code.
Posted 2021-02-25 06:15:38

I had to do the same thing. This should be configurable, because even a small delay causes the execution to fail, and that can happen whenever someone switches the Hadoop file system to S3 and runs an MR program. Note: please make sure you are using the same JDK version mentioned in the Apache Hadoop documentation, otherwise you may run into errors.
https://stackoverflow.com/questions/58300578