How to set the SPARK_LOCAL_DIRS parameter using the spark-env.sh file

Content sourced from Stack Overflow; translated and used under the CC BY-SA 3.0 license.


I'm trying to change the location that Spark writes temporary files to. Everything I've found online says to do this by setting the SPARK_LOCAL_DIRS parameter in the spark-env.sh file, but I'm not having any luck getting the change to actually take effect.

Here's what I've done:

  1. Created a two-worker test cluster using Amazon EC2 instances. I'm using Spark 2.2.0 with R's sparklyr package as a front-end wrapper. The worker nodes are spun up using an autoscaling group.
  2. Created a directory to store temporary files in at /tmp/jaytest. There is one of these in each worker and one in the master.
  3. Puttied into the spark master machine and the two workers, navigated to /home/ubuntu/spark-2.2.0-bin-hadoop2.7/conf/spark-env.sh, and modified the file to contain this line: SPARK_LOCAL_DIRS="/tmp/jaytest" (see the sketch after this list).
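
For reference, here is a minimal sketch of steps 2 and 3 as shell commands, using the paths from the question; the same would be run on the master and on both workers:

    # Create the scratch directory (step 2) and point Spark at it (step 3).
    # spark-env.sh is sourced by Spark's launch scripts, so a plain
    # assignment like this is exported when the daemons start.
    mkdir -p /tmp/jaytest
    echo 'SPARK_LOCAL_DIRS="/tmp/jaytest"' >> /home/ubuntu/spark-2.2.0-bin-hadoop2.7/conf/spark-env.sh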

Permissions for each of the spark-env.sh files are -rwxr-xr-x, and for the jaytest folders are drwxrwxr-x.

As far as I can tell this is in line with all the advice I've read online. However, when I load some data into the cluster it still ends up in /tmp, rather than /tmp/jaytest.
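
One way to confirm which directory is actually in effect: while a job is running, Spark creates its scratch directories (named blockmgr-* and spark-*) under whichever local dir won, so a quick listing on a worker shows where they land:

    # If SPARK_LOCAL_DIRS took effect, these show up under /tmp/jaytest;
    # otherwise they fall back to the default /tmp.
    ls -d /tmp/blockmgr-* /tmp/spark-* /tmp/jaytest/* 2>/dev/null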

I have also tried setting the spark.local.dir parameter to the same directory, but also no luck.
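
As a sketch of that alternative, spark.local.dir can be set as a property in conf/spark-defaults.conf rather than as an environment variable, though per the documentation quoted in the answer below it is likewise overridden by whatever the cluster manager sets:

    # Same setting expressed as a Spark property; one line in
    # spark-defaults.conf on each node.
    echo 'spark.local.dir /tmp/jaytest' >> /home/ubuntu/spark-2.2.0-bin-hadoop2.7/conf/spark-defaults.conf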

Can someone please advise on what I might be missing here?

Edit: I'm running this as a standalone cluster (as the answer below indicates that the correct parameter to set depends on the cluster type).
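
One step that is easy to miss on a standalone cluster: spark-env.sh is only read when the daemons launch, so the master and workers must be restarted after the edit for SPARK_LOCAL_DIRS to take effect. A sketch, run from the master (assuming passwordless SSH to the workers, which Spark's standalone scripts rely on):

    # Restart the standalone master and every worker listed in conf/slaves
    # so they re-source spark-env.sh and pick up SPARK_LOCAL_DIRS.
    /home/ubuntu/spark-2.2.0-bin-hadoop2.7/sbin/stop-all.sh
    /home/ubuntu/spark-2.2.0-bin-hadoop2.7/sbin/start-all.sh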

Answer:

As per the Spark documentation, it is clear that if you have configured YARN as the cluster manager, it will override the spark-env.sh setting. Can you check the local dir folder setting in the yarn-env or yarn-site file?

"this will be overridden by SPARK_LOCAL_DIRS (Standalone, Mesos) or LOCAL_DIRS (YARN) environment variables set by the cluster manager." source - https://spark.apache.org/docs/2.3.1/configuration.html
