首页
学习
活动
专区
圈层
工具
发布
首页
学习
活动
专区
圈层
工具
MCP广场
社区首页 >问答首页 >Dataproc:限制长期运行/流火花作业的日志大小

Dataproc:限制长期运行/流火花作业的日志大小
EN

Stack Overflow用户
提问于 2022-09-19 23:40:41
回答 1查看 155关注 0票数 2

我在GCP Dataproc上有一个Spark结构化的流媒体工作--它从Kafka获取数据,处理数据并将数据推回kafka主题。

几个问题:

  1. 星火是否把所有的原木(包括。信息,警告等)进入标准?我注意到的是stdout是空的,而所有的日志都放在stderr中。
  2. 是否有办法使stderr中的数据过期(即过期旧日志)?因为我有一个长时间运行的流作业,所以随着时间的推移,stderr会被填满,节点/VM变得不可用。

请指教。

以下是纱线日志命令的输出:

代码语言:javascript
运行
复制
root@versa-structured-stream-v1-w-1:/home/karanalang# yarn logs -applicationId application_1663623368960_0008 -log_files stderr -size -500
2022-09-19 23:25:34,876 INFO client.RMProxy: Connecting to ResourceManager at versa-structured-stream-v1-m/10.142.0.62:8032
2022-09-19 23:25:35,144 INFO client.AHSProxy: Connecting to Application History server at versa-structured-stream-v1-m/10.142.0.62:10200
Can not find any log file matching the pattern: [stderr] for the container: container_e01_1663623368960_0008_01_000003 within the application: application_1663623368960_0008
Container: container_e01_1663623368960_0008_01_000002 on versa-structured-stream-v1-w-2.c.versa-sml-googl.internal:8026
LogAggregationType: LOCAL
=======================================================================================================================
LogType:stderr
LogLastModifiedTime:Mon Sep 19 23:25:35 +0000 2022
LogLength:43251469683
LogContents:
 applianceName=usa-isn0784-rt01, tenantName=NOV, mstatsTimeBlock=1663507200, tenantId=2, vsnId=0, mstatsTotSentOctets=11596, mstatsTotRecvdOctets=24481, mstatsTotSessDuration=300000, mstatsTotSessCount=1, mstatsType=sdwan-acc-ckt-app-stats, appId=https, site=usa-isn0784-rt01, accCkt=WAN-DIA, siteId=442, accCktId=1, user=10.126.117.196, risk=3, productivity=3, family=general-internet, subFamily=web, bzTag=Unknown,topic=syslog.ueba-us4.v1.versa.demo3,customer=versa  type(row) is ->  <class 'str'>
End of LogType:stderr.This log file belongs to a running container (container_e01_1663623368960_0008_01_000002) and so may not be complete.
***********************************************************************


Container: container_e01_1663623368960_0008_01_000001 on versa-structured-stream-v1-w-1.c.versa-sml-googl.internal:8026
LogAggregationType: LOCAL
=======================================================================================================================
LogType:stderr
LogLastModifiedTime:Mon Sep 19 22:54:55 +0000 2022
LogLength:17367929
LogContents:
on syslog.ueba-us4.v1.versa.demo3-2
22/09/19 22:52:52 INFO org.apache.kafka.clients.consumer.internals.SubscriptionState: [Consumer clientId=consumer-spark-kafka-source-0f984ad9-f663-4ce1-9ef1-349419f3e6ec-1714963016-executor-1, groupId=spark-kafka-source-0f984ad9-f663-4ce1-9ef1-349419f3e6ec-1714963016-executor] Resetting offset for partition syslog.ueba-us4.v1.versa.demo3-2 to offset 449568676.
22/09/19 22:54:55 ERROR org.apache.spark.executor.CoarseGrainedExecutorBackend: RECEIVED SIGNAL TERM
End of LogType:stderr.
***********************************************************************


root@versa-structured-stream-v1-w-1:/home/karanalang# yarn logs -applicationId application_1663623368960_0008 -log_files stderr -size -500
2022-09-19 23:26:01,439 INFO client.RMProxy: Connecting to ResourceManager at versa-structured-stream-v1-m/10.142.0.62:8032
2022-09-19 23:26:01,696 INFO client.AHSProxy: Connecting to Application History server at versa-structured-stream-v1-m/10.142.0.62:10200
Can not find any log file matching the pattern: [stderr] for the container: container_e01_1663623368960_0008_01_000003 within the application: application_1663623368960_0008
Container: container_e01_1663623368960_0008_01_000002 on versa-structured-stream-v1-w-2.c.versa-sml-googl.internal:8026
LogAggregationType: LOCAL
=======================================================================================================================
LogType:stderr
LogLastModifiedTime:Mon Sep 19 23:26:02 +0000 2022
LogLength:44309782124
LogContents:
, tenantId=3, vsnId=0, mstatsTotSentOctets=48210, mstatsTotRecvdOctets=242351, mstatsTotSessDuration=300000, mstatsTotSessCount=34, mstatsType=dest-stats, destIp=165.225.216.24, mstatsAttribs=,topic=syslog.ueba-us4.v1.versa.demo3,customer=versa  type(row) is ->  <class 'str'>
22/09/19 23:26:02 WARN org.apache.spark.sql.kafka010.consumer.KafkaDataConsumer: KafkaDataConsumer is not running in UninterruptibleThread. It may hang when KafkaDataConsumer's methods are interrupted because of KAFKA-1894
End of LogType:stderr.This log file belongs to a running container (container_e01_1663623368960_0008_01_000002) and so may not be complete.
***********************************************************************


Container: container_e01_1663623368960_0008_01_000001 on versa-structured-stream-v1-w-1.c.versa-sml-googl.internal:8026
LogAggregationType: LOCAL
=======================================================================================================================
LogType:stderr
LogLastModifiedTime:Mon Sep 19 22:54:55 +0000 2022
LogLength:17367929
LogContents:
on syslog.ueba-us4.v1.versa.demo3-2
22/09/19 22:52:52 INFO org.apache.kafka.clients.consumer.internals.SubscriptionState: [Consumer clientId=consumer-spark-kafka-source-0f984ad9-f663-4ce1-9ef1-349419f3e6ec-1714963016-executor-1, groupId=spark-kafka-source-0f984ad9-f663-4ce1-9ef1-349419f3e6ec-1714963016-executor] Resetting offset for partition syslog.ueba-us4.v1.versa.demo3-2 to offset 449568676.
22/09/19 22:54:55 ERROR org.apache.spark.executor.CoarseGrainedExecutorBackend: RECEIVED SIGNAL TERM
End of LogType:stderr.

更新:基于@Dagang的笔记,我在log4j.properties中使用了log4j.properties。而新的日志文件正在被创建。然而-一些数据仍然进入性病的错误。

以下是更新的代码:

代码语言:javascript
运行
复制
spark-submit

gcloud dataproc jobs submit pyspark process-appstat.py \
  --cluster $CLUSTER  \
  --properties ^#^spark.jars.packages=org.apache.spark:spark-sql-kafka-0-10_2.12:3.1.2,org.mongodb.spark:mongo-spark-connector_2.12:3.0.2#spark.dynamicAllocation.enabled=true#spark.dynamicAllocation.executorIdleTimeout=120s#spark.shuffle.service.enabled=true#spark.executor.extraJavaOptions=-Dlog4j.configuration=file:log4j-executor.properties#spark.driver.extraJavaOptions=-Dlog4j.configuration=file:log4j-driver.properties\
  --jars=gs://dataproc-spark-jars/spark-avro_2.12-3.1.3.jar,gs://dataproc-spark-jars/isolation-forest_2.4.3_2.12-2.0.8.jar,gs://dataproc-spark-jars/spark-bigquery-with-dependencies_2.12-0.23.2.jar,gs://dataproc-spark-jars/mongo-spark-connector_2.12-3.0.2.jar,gs://dataproc-spark-jars/bson-4.0.5.jar,gs://dataproc-spark-jars/mongodb-driver-sync-4.0.5.jar,gs://dataproc-spark-jars/mongodb-driver-core-4.0.5.jar \
  --files=gs://kafka-certs/versa-kafka-gke-ca.p12,gs://kafka-certs/syslog-vani-noacl.p12,gs://kafka-certs/alarm-compression-user.p12,gs://kafka-certs/alarm-compression-user-test.p12,gs://kafka-certs/appstats-user.p12,gs://kafka-certs/appstats-user-test.p12,gs://kafka-certs/insights-user.p12,gs://kafka-certs/insights-user-test.p12,gs://kafka-certs/intfutil-user.p12,gs://kafka-certs/intfutil-user-test.p12,gs://dataproc-spark-configs/metrics.properties,gs://dataproc-spark-configs/params.cfg,gs://kafka-certs/appstat-anomaly-user.p12,gs://kafka-certs/appstat-anomaly-user-test.p12,gs://kafka-certs/appstat-agg-user.p12,gs://kafka-certs/appstat-agg-user-test.p12,gs://kafka-certs/alarmblock-user.p12,gs://kafka-certs/alarmblock-user-test.p12,gs://kafka-certs/versa-alarmblock-test-user.p12,gs://kafka-certs/versa-bandwidth-test-user.p12,gs://kafka-certs/versa-appstat-test-user.p12,gs://kafka-certs/versa-alarmblock-user.p12,gs://kafka-certs/versa-bandwidth-user.p12,gs://kafka-certs/versa-appstat-user.p12,gs://dataproc-spark-configs/log4j-executor.properties,gs://dataproc-spark-configs/log4j-driver.properties  \
  --region $REGION \
  --py-files streams.zip,utils.zip \
  -- isdebug=$isdebug


log4j-executor.properties:
--------------------------

# Set everything to be logged to the console
# log4j.rootCategory=INFO, console
# log4j.appender.console=org.apache.log4j.ConsoleAppender
# log4j.appender.console.target=System.err
# log4j.appender.console.layout=org.apache.log4j.PatternLayout
# log4j.appender.console.layout.ConversionPattern=%d{yy/MM/dd HH:mm:ss} %p %c: %m%n

# logging to rolling_file, using RolligFileAppender
log4j.rootLogger=INFO, rolling_file

log4j.appender.rolling_file=org.apache.log4j.RollingFileAppender
log4j.appender.rolling_file.File=${spark.yarn.app.container.log.dir}/versa-ss-executor.log
log4j.appender.rolling_file.MaxFileSize=100MB
log4j.appender.rolling_file.MaxBackupIndex=10
log4j.appender.rolling_file.layout=org.apache.log4j.PatternLayout
log4j.appender.rolling_file.layout.ConversionPattern=%d{yy/MM/dd HH:mm:ss} %p %c: %m%n

# Settings to quiet third party logs that are too verbose
log4j.logger.org.apache.spark.repl.SparkILoop$SparkILoopInterpreter=INFO
log4j.logger.org.apache.spark.repl.SparkIMain$exprTyper=INFO
log4j.logger.org.eclipse.jetty.util.component.AbstractLifeCycle=ERROR
log4j.logger.org.eclipse.jetty=WARN

# Allow INFO logging from Spark Env for EFM
log4j.logger.org.apache.spark.SparkEnv=INFO

# Spark 3.x
log4j.logger.org.sparkproject.jetty.server.handler.ContextHandler=WARN

# Spark 2.x
log4j.logger.org.spark_project.jetty.server.handler.ContextHandler=WARN

# Reduce verbosity for other spammy core classes
log4j.logger.org.apache.hadoop.conf.Configuration.deprecation=WARN
log4j.logger.org.apache.hadoop.mapreduce.lib.output.FileOutputCommitter=WARN
log4j.logger.org.apache.spark.ExecutorAllocationManager=ERROR
log4j.logger.org.apache.spark=WARN


log4j-driver.properties:
-------------------------

log4j.rootLogger=INFO, rolling_file

log4j.appender.rolling_file=org.apache.log4j.RollingFileAppender
log4j.appender.rolling_file.File=${spark.yarn.app.container.log.dir}/versa-ss-driver.log
log4j.appender.rolling_file.MaxFileSize=100MB
log4j.appender.rolling_file.MaxBackupIndex=10
log4j.appender.rolling_file.layout=org.apache.log4j.PatternLayout
log4j.appender.rolling_file.layout.ConversionPattern=%d{yy/MM/dd HH:mm:ss} %p %c: %m%n



# Settings to quiet third party logs that are too verbose
log4j.logger.org.apache.spark.repl.SparkILoop$SparkILoopInterpreter=INFO
log4j.logger.org.apache.spark.repl.SparkIMain$exprTyper=INFO
log4j.logger.org.eclipse.jetty.util.component.AbstractLifeCycle=ERROR
log4j.logger.org.eclipse.jetty=WARN

# Allow INFO logging from Spark Env for EFM
log4j.logger.org.apache.spark.SparkEnv=INFO

# Spark 3.x
log4j.logger.org.sparkproject.jetty.server.handler.ContextHandler=WARN

# Spark 2.x
log4j.logger.org.spark_project.jetty.server.handler.ContextHandler=WARN

# Reduce verbosity for other spammy core classes
log4j.logger.org.apache.hadoop.conf.Configuration.deprecation=WARN
log4j.logger.org.apache.hadoop.mapreduce.lib.output.FileOutputCommitter=WARN
log4j.logger.org.apache.spark.ExecutorAllocationManager=ERROR
log4j.logger.org.apache.spark=WARN

对于这需要做些什么,有什么想法吗?

关于-> ${spark.yarn.app.container.log.dir}的问题,它被翻译到什么位置?

当我登录worker节点并检查它时,我得到以下信息:

代码语言:javascript
运行
复制
karanalang@versa-structured-stream-v1-w-0:~$ echo $spark.yarn.app.container.log.dir
.yarn.app.container.log.dir


In yarn-site.xml:

Here are the relevant configs:

<property>
    <name>yarn.nodemanager.local-dirs</name>
    <value>/hadoop/yarn/nm-local-dir</value>
    <description>
      Directories on the local machine in which to application temp files.
    </description>
  </property>
  <property>
<property>
    <name>yarn.nodemanager.remote-app-log-dir</name>
    <value>gs://dataproc-temp-us-east1-939354532596-4ln8c3y1/fe57047f-13d9-4b9b-8bce-baa4a911aa65/yarn-logs</value>
    <description>
      The remote path, on the default FS, to store logs.
    </description>
  </property>  
 

However the logs are in the location below:

root@versa-structured-stream-v1-w-0:/# find . -name versa-ss-executor.log
./var/log/hadoop-yarn/userlogs/application_1664926662510_0002/container_1664926662510_0002_01_000001/versa-ss-executor.log
./var/log/hadoop-yarn/userlogs/application_1664926662510_0003/container_1664926662510_0003_01_000179/versa-ss-executor.log
./var/log/hadoop-yarn/userlogs/application_1664926662510_0003/container_1664926662510_0003_01_000250/versa-ss-executor.log
./var/log/hadoop-yarn/userlogs/application_1664926662510_0003/container_1664926662510_0003_01_000299/versa-ss-executor.log

位置-/var/log/hadoop-纱线/userlogs摘自何处(它不在Sear-site.sml中)?

EN

回答 1

Stack Overflow用户

发布于 2022-09-20 04:54:39

短答案

可以使用自定义log4j配置和RollingFileAppender来限制长期运行的作业的日志大小。

长答案:

Dataproc上星火的默认log4j配置位于/etc/spark/conf/log4j.properties。它在信息级别将根记录器配置为stderr。但是在运行时,Dataproc代理将驱动程序日志(在客户端模式下)定向到GCS并流回客户端,执行器日志(以及集群模式中的驱动程序日志)将被纱线重定向到容器的纱线日志dir中的stderr文件。考虑使用/etc/spark/conf/log4j.properties作为自定义配置的模板。

在自定义配置中,可以配置要写入RollingFileAppender的日志,例如,

代码语言:javascript
运行
复制
log4j.rootLogger=INFO, rolling_file

log4j.appender.rolling_file=org.apache.log4j.RollingFileAppender
log4j.appender.rolling_file.File=${spark.yarn.app.container.log.dir}/my_app.log
log4j.appender.rolling_file.MaxFileSize=100MB
log4j.appender.rolling_file.MaxBackupIndex=10
...

注意,对于执行器(以及集群模式下的驱动程序),log4j.appender.rolling_file.File的值需要是${spark.yarn.app.container.log.dir}下的路径,请参见此问题和此文档

将您的log4j配置上传到GCS桶中,驱动程序和执行器可能共享也可能不共享相同的配置。在您的示例中,您可能希望只更新executor log4j配置,只需使用默认的驱动程序即可。

然后使用以下方式之一提交带有自定义log4j配置的作业:

  1. 文件名必须是log4j.properties,驱动程序和执行器将共享相同的配置:
代码语言:javascript
运行
复制
gcloud dataproc jobs submit spark ... \
  --files gs://my-bucket/log4j.properties
  1. 文件名不必是log4j.properties,驱动程序和执行器可以有不同的配置:
代码语言:javascript
运行
复制
gcloud dataproc jobs submit spark ... \
  --files gs://my-bucket/my-log4j.properties \
  --properties 'spark.executor.extraJavaOptions=-Dlog4j.configuration=file:my-log4j.properties'

预期将有滚动日志在纱线容器日志and下(通过yarn.nodemanager.log-dirs配置,在Dataproc上具有默认值/var/log/hadoop-yarn/userlogs ),这些日志将自动聚合并存储在GCS和Cloud中。

票数 1
EN
页面原文内容由Stack Overflow提供。腾讯云小微IT领域专用引擎提供翻译支持
原文链接:

https://stackoverflow.com/questions/73780259

复制
相关文章

相似问题

领券
问题归档专栏文章快讯文章归档关键词归档开发者手册归档开发者手册 Section 归档