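I am trying to schedule some R scripts in Airflow, and my scripts use the rJava library. rJava and xlsx work fine in an R terminal, but not in the Airflow environment. I get this error: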
libjvm.so: cannot open shared object file: No such file or directory
In my ~/.bashrc file:
export JAVA_HOME=/usr/lib/jvm/java-8-openjdk-amd64/bin/jar
export LD_LIBRARY_PATH=/usr/lib/jvm/java-8-openjdk-amd64/lib/amd64:/usr/lib/jvm/java-8-openjdk-amd64/jre/lib/amd64/server
In my ~/.profile file:
export JAVA_HOME=/usr/lib/jvm/java-8-openjdk-amd64/bin/jar
export HADOOP_HOME='/home/ubuntu/spark-2.2.0-bin-hadoop2.7/hadoop-2.7.4'
export HADOOP_COMMON_LIB_NATIVE_DIR=$HADOOP_HOME/lib/native
export HADOOP_OPTS="-Djava.library.path=$HADOOP_HOME/lib"
export LD_LIBRARY_PATH=$HADOOP_HOME/lib/native/:$LD_LIBRARY_PATH
export LD_LIBRARY_PATH=/usr/lib/jvm/java-8-openjdk-amd64/jre/lib/amd64/server:$LD_LIBRARY_PATH
In my /etc/environment:
JAVA_HOME="/usr/lib/jvm/java-8-openjdk-amd64/bin/jar";
LD_LIBRARY_PATH="/usr/lib/jvm/java-8-openjdk-amd64/lib/amd64:/usr/lib/jvm/java-8-openjdk-amd64/jre/lib/amd64/server";
I also tried adding these lines at the top of the R script, before loading rJava:
system('export JAVA_HOME=/usr/lib/jvm/java-8-openjdk-amd64/bin/jar')
system('export LD_LIBRARY_PATH=/usr/lib/jvm/java-8-openjdk-amd64/lib/amd64:/usr/lib/jvm/java-8-openjdk-amd64/jre/lib/amd64/server')
Even so, I still get the error saying libjvm.so is missing, although I can see the file in /usr/lib/jvm/java-8-openjdk-amd64/jre/lib/amd64/server.
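The system('export ...') calls most likely have no effect because each one starts a separate child shell, and a variable exported there disappears when that shell exits; it is never copied back into the calling process. The same behaviour can be reproduced in Python (the language of the DAG file). This is only a minimal sketch, and DEMO_VAR is a made-up variable name used for illustration.

```python
import os
import subprocess

# DEMO_VAR is a hypothetical variable name, used only for this demonstration.
os.environ.pop("DEMO_VAR", None)

# Like R's system('export ...'): the export happens in a short-lived child shell.
subprocess.run("export DEMO_VAR=1", shell=True, check=True)

# The parent process never sees the value.
seen_in_parent = os.environ.get("DEMO_VAR")
print(seen_in_parent)  # None

# A child does see a variable that is part of the environment it starts with.
child_env = dict(os.environ, DEMO_VAR="1")
result = subprocess.run(
    ["sh", "-c", 'printf %s "$DEMO_VAR"'],
    env=child_env,
    stdout=subprocess.PIPE,
    check=True,
)
seen_in_child = result.stdout.decode()
print(seen_in_child)  # 1
```

In R the in-process counterpart is Sys.setenv(), but LD_LIBRARY_PATH is normally read by the dynamic loader when a process starts, so it generally has to be in place before Rscript is launched.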
When I check the logs in Airflow, the DAG is running the script in Temporary script location: /tmp/airflowtmp7Ws3X2//tmp/airflowtmp7Ws3X2/nz-property-report6vTyGr
I think it is not picking up the environment variables, and I get this error:
Loading required package: xlsx
[2018-08-09 21:39:23,755] {base_task_runner.py:98} INFO - Subtask: [2018-08-09 21:39:23,755] {bash_operator.py:101} INFO - Error: package or namespace load failed for ‘xlsx’:
[2018-08-09 21:39:23,755] {base_task_runner.py:98} INFO - Subtask: [2018-08-09 21:39:23,755] {bash_operator.py:101} INFO - .onLoad failed in loadNamespace() for 'rJava', details:
[2018-08-09 21:39:23,755] {base_task_runner.py:98} INFO - Subtask: [2018-08-09 21:39:23,755] {bash_operator.py:101} INFO - call: dyn.load(file, DLLpath = DLLpath, ...)
[2018-08-09 21:39:23,755] {base_task_runner.py:98} INFO - Subtask: [2018-08-09 21:39:23,755] {bash_operator.py:101} INFO - error: unable to load shared object '/home/ubuntu/R/x86_64-pc-linux-gnu-library/3.4/rJava/libs/rJava.so':
[2018-08-09 21:39:23,756] {base_task_runner.py:98} INFO - Subtask: [2018-08-09 21:39:23,755] {bash_operator.py:101} INFO - libjvm.so: cannot open shared object file: No such file or directory
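The last log line means the dynamic loader could not find libjvm.so in any directory it searched. One way to test the suspicion about environment variables is to check, from inside the environment the task runs in, which LD_LIBRARY_PATH entries actually contain the file. A minimal sketch, where dirs_containing is a hypothetical helper name:

```python
import os


def dirs_containing(filename, search_path):
    """Return the entries of a path-separator-delimited search path that contain filename."""
    hits = []
    for directory in search_path.split(os.pathsep):
        # Skip empty entries produced by leading, trailing, or doubled separators.
        if directory and os.path.isfile(os.path.join(directory, filename)):
            hits.append(directory)
    return hits


if __name__ == "__main__":
    ld_path = os.environ.get("LD_LIBRARY_PATH", "")
    print("LD_LIBRARY_PATH =", repr(ld_path))
    print("libjvm.so found in:", dirs_containing("libjvm.so", ld_path))
```

Running this once from a BashOperator task and once from an interactive shell should show whether the two environments differ. Note that LD_LIBRARY_PATH is only one of the places the loader looks, so an empty result here is a hint rather than proof.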
Can anyone help me use rJava in an R script run from Airflow?
Edit: as requested, here is my DAG script:
import airflow
from airflow import DAG
from airflow.operators.bash_operator import BashOperator
#from airflow.models import DAG
from datetime import datetime

dag = DAG(
    dag_id='property_report',
    schedule_interval=None,
)

task = BashOperator(
    task_id='report',
    dag=dag,
    bash_command="Rscript /home/ubuntu/airflow/dags/scripts/r-scripts/recreate_lastmonthreport_from_snapshotdata.R",
    start_date=airflow.utils.dates.days_ago(1),
    owner='airflow')
Posted on 2018-08-10 06:42:06
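Just to help anyone looking for an answer to this question: I only had to run the webserver and the scheduler in two separate screen sessions and restart them, using the source ~/.bashrc command. After that, the environment variables were picked up correctly.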
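This works because the scheduler and the task processes it launches generally inherit the environment of the shell that started them, so variables exported in ~/.bashrc only reach the tasks if that file was sourced before the restart. An alternative that does not depend on how the services were started is to pass the variables explicitly through BashOperator's env argument. The sketch below rests on assumptions: build_java_env is a hypothetical helper, and JAVA_HOME is set to the JDK root directory rather than the .../bin/jar path used in the question. Because a non-None env replaces the inherited environment instead of extending it, the helper starts from a copy of os.environ.

```python
import os

# Assumed locations, based on the paths quoted in the question.
JAVA_HOME = "/usr/lib/jvm/java-8-openjdk-amd64"
JVM_LIB_DIR = JAVA_HOME + "/jre/lib/amd64/server"  # directory holding libjvm.so


def build_java_env(base_env=None):
    """Return a copy of base_env with JAVA_HOME set and JVM_LIB_DIR on LD_LIBRARY_PATH."""
    env = dict(os.environ if base_env is None else base_env)
    env["JAVA_HOME"] = JAVA_HOME
    # Keep existing entries, and put the JVM directory first if it is missing.
    existing = [p for p in env.get("LD_LIBRARY_PATH", "").split(":") if p]
    if JVM_LIB_DIR not in existing:
        existing.insert(0, JVM_LIB_DIR)
    env["LD_LIBRARY_PATH"] = ":".join(existing)
    return env


# In the DAG file the result would be passed to the operator, for example:
#   task = BashOperator(
#       task_id='report',
#       dag=dag,
#       bash_command="Rscript /home/ubuntu/airflow/dags/scripts/r-scripts/recreate_lastmonthreport_from_snapshotdata.R",
#       env=build_java_env(),
#   )
```

Keeping the variables in the DAG file makes the task independent of which shell started the scheduler, at the cost of hard-coding machine-specific paths.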
https://stackoverflow.com/questions/51763693