
Several Ways to Get YARN Application Information

陈猿解码
Published 2023-02-28 14:59:51
Part of the column: 陈猿解码

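Troubleshooting, routine inspections, and certain development tasks all involve looking up information about YARN applications: the application's ID, type, name, and start time, the containers that belong to it, and the contents of each container's log files. This article walks through several ways to query that information.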

【Viewing via the RM Web UI】

This is the simplest approach: open the RM web page in a browser to see every application and its logs.

1. Viewing applications

The RM home page lists all applications. You can also click through to see only the applications in a particular state.

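2. Viewing application details

Clicking an application in the list shows its details, such as the submitting user, the application name, type, priority, current state, submission time, start time, and finish time.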

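3. Viewing the AM logs

Clicking the Logs link of an attempt opens the AM log page, which shows the AM's log files together with part of their contents. Logs that are too long are only shown in part; click "here" to see the full content.

For an AM that is still running, clicking Logs does not show file contents. Instead, it lists all of that AM's log files.

To see the contents of a file, click it.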

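Summary: viewing logs through the RM web UI is the simplest and most direct of the approaches, and is typically used during troubleshooting to quickly check an application's ID and state. Its shortcoming is that it only shows the AM logs; the logs of the task containers cannot be viewed this way.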

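【Querying via the Command Line】

The command-line tool that ships with YARN can list all applications, the attempts of an application, the containers requested by each attempt, the log files of each container, and the full contents of those log files.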

1. Listing all applications

The following command lists applications, optionally narrowed to a specific set:

yarn application -list
# Optional flags
# -appStates: used with -list to filter by application state; valid values are ALL/NEW/NEW_SAVING/SUBMITTED/ACCEPTED/RUNNING/FINISHED/FAILED/KILLED
# -appTypes: used with -list to filter by application type; common types are MAPREDUCE/SPARK/FLINK
# -appTags: used with -list to filter by application tag

For example:

[root@hdp-hadoop-hdp-resourcemanager-0 ~]# yarn application -list -appStates ALL
Total number of applications (application-types: [], states: [NEW, NEW_SAVING, SUBMITTED, ACCEPTED, RUNNING, FINISHED, FAILED, KILLED] and tags: []):4
                Application-Id Application-Name Application-Type User Queue State Final-State Progress Tracking-URL
application_1653966005529_0003         Hive on Spark SPARK hadoop default ACCEPTED UNDEFINED 0% N/A
application_1653961942912_0001  hadoop-mapreduce-client-jobclient-2.10.1-tests.jar MAPREDUCE hadoop default FINISHED SUCCEEDED 100% http://hdp-hadoop-hdp-history-0.hdp-hadoop-hdp-nodemanager.yarnonk8s.svc.cluster.local:19888/jobhistory/job/job_1653961942912_0001
application_1653966005529_0002         Hive on Spark SPARK hadoop default ACCEPTED UNDEFINED 0% N/A
application_1653966005529_0001     GenTable+all_5120               MAPREDUCE hadoop default RUNNING UNDEFINED 61.21% http://hdp-hadoop-hdp-nodemanager-9.hdp-hadoop-hdp-nodemanager.yarnonk8s.svc.cluster.local:33937
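
The listing above is easy to post-process in a script. Here is a minimal Python sketch (not part of the original commands) that splits one data row into fields. It assumes that the type, user, queue, state, final state, progress, and tracking URL columns never contain whitespace, so only the application name can span several tokens. `AppRow` and `parse_app_row` are hypothetical names chosen for illustration.

```python
from typing import NamedTuple, Optional


class AppRow(NamedTuple):
    app_id: str
    name: str
    app_type: str
    user: str
    queue: str
    state: str
    final_state: str
    progress: str
    tracking_url: str


def parse_app_row(line: str) -> Optional[AppRow]:
    """Parse one data row of `yarn application -list`; return None for other lines."""
    tokens = line.split()
    # A data row starts with an application ID and holds at least
    # 1 (ID) + 1 (name) + 7 (single-token columns) tokens.
    if len(tokens) < 9 or not tokens[0].startswith("application_"):
        return None
    # Assumption: the last seven columns are single tokens, so everything
    # between the ID and those columns is the (possibly multi-word) name.
    name = " ".join(tokens[1:-7])
    return AppRow(tokens[0], name, *tokens[-7:])
```

Feeding every line of the command output through `parse_app_row` and dropping the `None` results skips the summary and header lines. If a queue or user name on your cluster can contain spaces, this heuristic would need adjusting.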

2. Listing an application's attempts

The following command lists the attempts of a given application (an application may have been retried several times):

yarn applicationattempt -list <ApplicationID>
# ApplicationID is the ID of the application to query

Example:

[root@hdp-hadoop-hdp-resourcemanager-0 ~]# yarn applicationattempt -list application_1653966005529_0021
Total number of application attempts :1
         ApplicationAttempt-Id State AM-Container-Id Tracking-URL
appattempt_1653966005529_0021_000001                 RUNNING container_e613_1653966005529_0021_01_000001     http://hdp-hadoop-hdp-resourcemanager-1.hdp-hadoop-hdp-nodemanager.yarnonk8s.svc.cluster.local:8088/proxy/application_1653966005529_0021/

3. Listing the containers requested by an attempt

The command is:

yarn container -list <ApplicationAttemptID>

Example:

[root@hdp-hadoop-hdp-resourcemanager-0 ~]# yarn container -list appattempt_1653966005529_0021_000001
22/06/01 11:20:16 INFO client.ConfiguredRMFailoverProxyProvider: Failing over to rm2
Total number of containers :9
                  Container-Id Start Time Finish Time State Host Node Http Address LOG-URL
container_e613_1653966005529_0021_01_000005 Wed Jun 01 10:44:41 +0800 2022 N/A RUNNING hdp-hadoop-hdp-nodemanager-2.hdp-hadoop-hdp-nodemanager.yarnonk8s.svc.cluster.local:9100 http://hdp-hadoop-hdp-nodemanager-2.hdp-hadoop-hdp-nodemanager.yarnonk8s.svc.cluster.local:8042 http://hdp-hadoop-hdp-nodemanager-2.hdp-hadoop-hdp-nodemanager.yarnonk8s.svc.cluster.local:8042/node/containerlogs/container_e613_1653966005529_0021_01_000005/hadoop
container_e613_1653966005529_0021_01_000006 Wed Jun 01 10:44:41 +0800 2022 N/A RUNNING hdp-hadoop-hdp-nodemanager-7.hdp-hadoop-hdp-nodemanager.yarnonk8s.svc.cluster.local:9100 http://hdp-hadoop-hdp-nodemanager-7.hdp-hadoop-hdp-nodemanager.yarnonk8s.svc.cluster.local:8042 http://hdp-hadoop-hdp-nodemanager-7.hdp-hadoop-hdp-nodemanager.yarnonk8s.svc.cluster.local:8042/node/containerlogs/container_e613_1653966005529_0021_01_000006/hadoop
container_e613_1653966005529_0021_01_000007 Wed Jun 01 10:44:41 +0800 2022 N/A RUNNING hdp-hadoop-hdp-nodemanager-8.hdp-hadoop-hdp-nodemanager.yarnonk8s.svc.cluster.local:9100 http://hdp-hadoop-hdp-nodemanager-8.hdp-hadoop-hdp-nodemanager.yarnonk8s.svc.cluster.local:8042 http://hdp-hadoop-hdp-nodemanager-8.hdp-hadoop-hdp-nodemanager.yarnonk8s.svc.cluster.local:8042/node/containerlogs/container_e613_1653966005529_0021_01_000007/hadoop
container_e613_1653966005529_0021_01_000008 Wed Jun 01 10:44:41 +0800 2022 N/A RUNNING hdp-hadoop-hdp-nodemanager-3.hdp-hadoop-hdp-nodemanager.yarnonk8s.svc.cluster.local:9100 http://hdp-hadoop-hdp-nodemanager-3.hdp-hadoop-hdp-nodemanager.yarnonk8s.svc.cluster.local:8042 http://hdp-hadoop-hdp-nodemanager-3.hdp-hadoop-hdp-nodemanager.yarnonk8s.svc.cluster.local:8042/node/containerlogs/container_e613_1653966005529_0021_01_000008/hadoop
container_e613_1653966005529_0021_01_000001 Wed Jun 01 10:44:35 +0800 2022 N/A RUNNING hdp-hadoop-hdp-nodemanager-9.hdp-hadoop-hdp-nodemanager.yarnonk8s.svc.cluster.local:9100 http://hdp-hadoop-hdp-nodemanager-9.hdp-hadoop-hdp-nodemanager.yarnonk8s.svc.cluster.local:8042 http://hdp-hadoop-hdp-nodemanager-9.hdp-hadoop-hdp-nodemanager.yarnonk8s.svc.cluster.local:8042/node/containerlogs/container_e613_1653966005529_0021_01_000001/hadoop
container_e613_1653966005529_0021_01_000002 Wed Jun 01 10:44:40 +0800 2022 N/A RUNNING hdp-hadoop-hdp-nodemanager-0.hdp-hadoop-hdp-nodemanager.yarnonk8s.svc.cluster.local:9100 http://hdp-hadoop-hdp-nodemanager-0.hdp-hadoop-hdp-nodemanager.yarnonk8s.svc.cluster.local:8042 http://hdp-hadoop-hdp-nodemanager-0.hdp-hadoop-hdp-nodemanager.yarnonk8s.svc.cluster.local:8042/node/containerlogs/container_e613_1653966005529_0021_01_000002/hadoop
container_e613_1653966005529_0021_01_000003 Wed Jun 01 10:44:40 +0800 2022 N/A RUNNING hdp-hadoop-hdp-nodemanager-9.hdp-hadoop-hdp-nodemanager.yarnonk8s.svc.cluster.local:9100 http://hdp-hadoop-hdp-nodemanager-9.hdp-hadoop-hdp-nodemanager.yarnonk8s.svc.cluster.local:8042 http://hdp-hadoop-hdp-nodemanager-9.hdp-hadoop-hdp-nodemanager.yarnonk8s.svc.cluster.local:8042/node/containerlogs/container_e613_1653966005529_0021_01_000003/hadoop
container_e613_1653966005529_0021_01_000004 Wed Jun 01 10:44:41 +0800 2022 N/A RUNNING hdp-hadoop-hdp-nodemanager-1.hdp-hadoop-hdp-nodemanager.yarnonk8s.svc.cluster.local:9100 http://hdp-hadoop-hdp-nodemanager-1.hdp-hadoop-hdp-nodemanager.yarnonk8s.svc.cluster.local:8042 http://hdp-hadoop-hdp-nodemanager-1.hdp-hadoop-hdp-nodemanager.yarnonk8s.svc.cluster.local:8042/node/containerlogs/container_e613_1653966005529_0021_01_000004/hadoop
container_e613_1653966005529_0021_01_000009 Wed Jun 01 10:44:41 +0800 2022 N/A RUNNING hdp-hadoop-hdp-nodemanager-4.hdp-hadoop-hdp-nodemanager.yarnonk8s.svc.cluster.local:9100 http://hdp-hadoop-hdp-nodemanager-4.hdp-hadoop-hdp-nodemanager.yarnonk8s.svc.cluster.local:8042 http://hdp-hadoop-hdp-nodemanager-4.hdp-hadoop-hdp-nodemanager.yarnonk8s.svc.cluster.local:8042/node/containerlogs/container_e613_1653966005529_0021_01_000009/hadoop
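
The IDs in these listings nest inside each other: a container ID such as container_e613_1653966005529_0021_01_000001 carries the cluster timestamp, the application sequence number, and the attempt number. The sketch below recovers the application ID and attempt ID from a container ID. It assumes the layout seen in this article's output, with an optional `e<epoch>_` prefix (present in the CLI examples, absent in the `container_1653899571088_0001_01_000001` form that also appears). `split_container_id` is a hypothetical helper name.

```python
import re
from typing import NamedTuple

# container_[e<epoch>_]<clusterTimestamp>_<appSeq>_<attemptNo>_<containerNo>
_CONTAINER_ID = re.compile(
    r"^container_(?:e(?P<epoch>\d+)_)?"
    r"(?P<ts>\d+)_(?P<app>\d+)_(?P<attempt>\d+)_(?P<num>\d+)$"
)


class ContainerIdParts(NamedTuple):
    application_id: str
    attempt_id: str
    container_number: int


def split_container_id(container_id: str) -> ContainerIdParts:
    """Derive the owning application and attempt IDs from a container ID."""
    match = _CONTAINER_ID.match(container_id)
    if match is None:
        raise ValueError(f"not a container ID: {container_id!r}")
    ts, app = match["ts"], match["app"]
    # Attempt IDs print the attempt number zero-padded to six digits,
    # as in appattempt_1653966005529_0021_000001 above.
    attempt = int(match["attempt"])
    return ContainerIdParts(
        application_id=f"application_{ts}_{app}",
        attempt_id=f"appattempt_{ts}_{app}_{attempt:06d}",
        container_number=int(match["num"]),
    )
```

This is handy when a log line or alert only mentions a container ID and you need the application to pass to the other commands.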

4. Listing a container's log files

The command is:

yarn logs -containerId <ContainerID> -show_container_log_info
# ContainerID can be the ID of any container: the AM container or one of the task containers

A simple example:

[root@hdp-hadoop-hdp-resourcemanager-0 ~]# yarn logs -containerId container_e613_1653966005529_0021_01_000001 -show_container_log_info
22/06/01 11:23:27 INFO client.ConfiguredRMFailoverProxyProvider: Failing over to rm2
Container: container_e613_1653966005529_0021_01_000001 on hdp-hadoop-hdp-nodemanager-9.hdp-hadoop-hdp-nodemanager.yarnonk8s.svc.cluster.local:9100
==================================================================================================================================================
                       LogFile LogLength LastModificationTime LogAggregationType
====================================================================================================================================================================================================================================================================================================
                 prelaunch.out 70 Wed Jun 01 10:44:36 +0800 2022 LOCAL
                 prelaunch.err 0 Wed Jun 01 10:44:35 +0800 2022 LOCAL
                        stdout 0 Wed Jun 01 10:44:36 +0800 2022 LOCAL
                        stderr 1462151 Wed Jun 01 11:23:27 +0800 2022 LOCAL
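
To pick out the interesting files from this table (for instance, the non-empty ones), the output can be parsed with a small script. The following is a rough sketch that assumes each data row looks like the ones above: a file name, an integer length, a timestamp, and a single-word aggregation type. `parse_log_sizes` is a hypothetical helper name.

```python
import re
from typing import Dict

# <file name> <length in bytes> <modification time ...> <aggregation type>
_LOG_ROW = re.compile(
    r"^\s*(?P<name>\S+)\s+(?P<length>\d+)\s+(?P<mtime>.+?)\s+(?P<kind>\S+)\s*$"
)


def parse_log_sizes(output: str) -> Dict[str, int]:
    """Map log file name -> length from `yarn logs ... -show_container_log_info` output."""
    sizes: Dict[str, int] = {}
    for line in output.splitlines():
        match = _LOG_ROW.match(line)
        # Header, separator, and client INFO lines do not have an integer
        # in the second column, so they fail to match and are skipped.
        if match:
            sizes[match["name"]] = int(match["length"])
    return sizes
```

For the output above, filtering the result for lengths greater than zero would leave prelaunch.out and stderr as the files worth fetching.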

5. Viewing the contents of one log file of a container

The command is:

yarn logs -containerId <ContainerID> -logFiles <LogFileName>
# LogFileName is one of the files listed by the previous command

A simple example:

[root@hdp-hadoop-hdp-resourcemanager-0 ~]# yarn logs -containerId container_e613_1653966005529_0021_01_000001 -logFiles prelaunch.out
22/06/01 11:26:15 INFO client.ConfiguredRMFailoverProxyProvider: Failing over to rm2
Container: container_e613_1653966005529_0021_01_000001 on hdp-hadoop-hdp-nodemanager-9.hdp-hadoop-hdp-nodemanager.yarnonk8s.svc.cluster.local:9100
LogAggregationType: LOCAL
==================================================================================================================================================
LogType:prelaunch.out
LogLastModifiedTime:Wed Jun 01 10:44:36 +0800 2022
LogLength:70
LogContents:
Setting up env variables
Setting up job resources
Launching container
End of LogType:prelaunch.out.This log file belongs to a running container (container_e613_1653966005529_0021_01_000001) and so may not be complete.
******************************************************************************
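
The log text is wrapped in metadata lines (LogType, LogLength, LogContents, End of LogType). If only the payload is needed, it can be cut out as sketched below. This assumes the framing shown in the output above and is not an official parser; in particular, a log line that itself begins with "End of LogType:" would end the section early. `split_log_sections` is a hypothetical helper name.

```python
from typing import Dict, List, Optional


def split_log_sections(output: str) -> Dict[str, str]:
    """Map each LogType in `yarn logs` output to the text under its LogContents: marker."""
    sections: Dict[str, str] = {}
    current: Optional[str] = None  # LogType whose header is being read
    collecting = False             # True once LogContents: has been seen
    lines: List[str] = []
    for line in output.splitlines(keepends=True):
        if collecting:
            if line.startswith("End of LogType:"):
                sections[current] = "".join(lines)
                current, collecting = None, False
            else:
                lines.append(line)
        elif line.startswith("LogType:"):
            current = line[len("LogType:"):].strip()
        elif current is not None and line.strip() == "LogContents:":
            collecting, lines = True, []
    return sections
```

Applied to the example above, the result would have a single key, prelaunch.out, holding the three "Setting up ..." and "Launching container" lines.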

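To sum up: the command line exposes every application, the containers of each application, and the full logs of each container. The drawback is that it depends heavily on the Hadoop environment, so these commands can only be run on nodes where Hadoop is set up.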

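【Viewing via REST】

Viewing application information and container logs through the RM web UI also comes down to sending HTTP requests, but the responses are HTML pages.

The same information is available through REST requests, whose URLs differ slightly from those of the web pages. The response can be returned as XML or JSON, as needed.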

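1. Getting application information

The main purpose of this step is to find the NM node where the application's attempt runs, because all subsequent requests are sent directly to an NM.

The request URL is: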

http://$RMAddr/ws/v1/cluster/apps
# An application ID can be appended after apps to get a single application
http://$RMAddr/ws/v1/cluster/apps/$ApplicationID

For example:

curl -X GET -H "Accept:application/json" "http://172.168.3.31:8088/ws/v1/cluster/apps"

{
  "apps": {
    "app": [{
      "id": "application_1653899571088_0001",
      "user": "root",
      "name": "hadoop-mapreduce-client-jobclient-2.10.1-tests.jar",
      "queue": "default",
      "state": "FINISHED",
      "finalStatus": "SUCCEEDED",
      "progress": 100.0,
      "trackingUI": "History",
      "trackingUrl": "http://172.168.3.31:8088/proxy/application_1653899571088_0001/",
      "diagnostics": "",
      "clusterId": 1653899571088,
      "applicationType": "MAPREDUCE",
      "applicationTags": "",
      "priority": 0,
      "startedTime": 1653900968047,
      "launchTime": 1653900969192,
      "finishedTime": 1653901001551,
      "elapsedTime": 33504,
      "amContainerLogs": "http://172.168.3.53:8042/node/containerlogs/container_1653899571088_0001_01_000001/root",
      "amHostHttpAddress": "172.168.3.53:8042",
      "amRPCAddress": "172.168.3.53:38418",
      "allocatedMB": -1,
      "allocatedVCores": -1,
      "reservedMB": -1,
      "reservedVCores": -1,
      "runningContainers": -1,
      "memorySeconds": 93806,
      "vcoreSeconds": 50,
      "queueUsagePercentage": 0.0,
      "clusterUsagePercentage": 0.0,
      "resourceSecondsMap": {
        "entry": {
          "key": "memory-mb",
          "value": "93806"
        },
        "entry": {
          "key": "yarn.io/gpu",
          "value": "0"
        },
        "entry": {
          "key": "vcores",
          "value": "50"
        }
      },
      "preemptedResourceMB": 0,
      "preemptedResourceVCores": 0,
      "numNonAMContainerPreempted": 0,
      "numAMContainerPreempted": 0,
      "preemptedMemorySeconds": 0,
      "preemptedVcoreSeconds": 0,
      "preemptedResourceSecondsMap": null,
      "logAggregationStatus": "SUCCEEDED",
      "unmanagedApplication": false,
      "amNodeLabelExpression": "",
      "timeouts": {
        "timeout": [{
          "type": "LIFETIME",
          "expiryTime": "UNLIMITED",
          "remainingTimeInSeconds": -1
        }]
      }
    }]
  }
}
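
In a script, the field of interest in this response is amHostHttpAddress, which gives the NM address to use in the following steps. Below is a minimal sketch using only the Python standard library. `fetch_json` and `am_nodes` are hypothetical helper names, and treating a missing or null "apps" value as an empty list is a defensive assumption rather than documented behavior.

```python
import json
import urllib.request
from typing import Dict


def fetch_json(url: str, timeout: float = 10.0) -> dict:
    """GET a REST URL with an Accept: application/json header and decode the body."""
    request = urllib.request.Request(url, headers={"Accept": "application/json"})
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


def am_nodes(apps_response: dict) -> Dict[str, str]:
    """Map application ID -> amHostHttpAddress from a /ws/v1/cluster/apps response."""
    # Assumption: tolerate {"apps": null} or a missing key as "no applications".
    apps = (apps_response.get("apps") or {}).get("app") or []
    return {
        app["id"]: app["amHostHttpAddress"]
        for app in apps
        if app.get("amHostHttpAddress")
    }
```

With the RM address from the curl example, usage would look like `am_nodes(fetch_json("http://172.168.3.31:8088/ws/v1/cluster/apps"))`. Note that Python's json module keeps only the last value when a key is repeated, so the repeated "entry" keys inside resourceSecondsMap collapse to one.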

2. Getting an application's containers

The request URL is:

http://$NMAddr/ws/v1/node/apps/$ApplicationID
# $NMAddr is the IP:PORT that the NodeManager web service listens on
# $ApplicationID is the ID of the application to query

Example:

curl -X GET -H "Accept:application/json" "http://172.16.40.206:8042/ws/v1/node/apps/application_1653966005529_0025"
{
  "app": {
    "id": "application_1653966005529_0025",
    "state": "RUNNING",
    "user": "hadoop",
    "containerids": [
            "container_e613_1653966005529_0025_01_000002",
            "container_e613_1653966005529_0025_01_000001"]
  }
}

3. Getting container details

The main goal here is to get the container's list of log files.

A simple example:

curl -X GET -H "Accept:application/json" "http://172.16.40.206:8042/ws/v1/node/containers/container_e613_1653966005529_0025_01_000001"

{
  "container": {
    "id": "container_e613_1653966005529_0025_01_000001",
    "state": "RUNNING",
    "exitCode": -1000,
    "diagnostics": "",
    "user": "hadoop",
    "totalMemoryNeededMB": 3072,
    "totalVCoresNeeded": 1,
    "executionType": "GUARANTEED",
    "containerLogsLink": "http://172.16.40.206:8042/node/containerlogs/container_e613_1653966005529_0025_01_000001/hadoop",
    "nodeId": "hdp-hadoop-hdp-nodemanager-8.hdp-hadoop-hdp-nodemanager.yarnonk8s.svc.cluster.local:9100",
    "containerLogFiles": [
            "prelaunch.out", 
            "prelaunch.err", 
            "stdout", 
            "stderr"
        ]
  }
}

4. Getting the contents of a log

With the list of log files in hand, the contents of each log can be requested.

For example:

[root@hdp-hadoop-hdp-resourcemanager-0 ~]# curl -X GET "http://172.16.40.206:8042/ws/v1/node/containers/container_e613_1653966005529_0025_01_000001/logs/prelaunch.out"

Container: container_e613_1653966005529_0025_01_000001 on 172.16.40.206:9100
LogAggregationType: LOCAL
==================================================================================================================================================
LogType:prelaunch.out
LogLastModifiedTime:Wed Jun 01 13:43:26 +0800 2022
LogLength:70
LogContents:
Setting up env variables
Setting up job resources
Launching container
End of LogType:prelaunch.out.This log file belongs to a running container (container_e613_1653966005529_0025_01_000001) and so may not be complete.
******************************************************************************
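
Steps 2 to 4 can be chained in a script. The sketch below builds the log URLs for every container of an application on one NM, using only the URL patterns and JSON fields shown in the examples above. The `fetch_json` parameter is a hypothetical callable that performs the GET and returns the decoded JSON; passing it in keeps the function easy to test without a cluster. As far as the examples show, the NM only reports containers running on that node, so the call would need to be repeated for each NM of interest.

```python
from typing import Callable, Dict, List


def container_log_urls(
    nm_addr: str,
    application_id: str,
    fetch_json: Callable[[str], dict],
) -> Dict[str, List[str]]:
    """Return {container ID: [log URL, ...]} for one application on one NodeManager."""
    base = f"http://{nm_addr}/ws/v1/node"
    # Step 2: container IDs of the application on this NM.
    app = fetch_json(f"{base}/apps/{application_id}")["app"]
    result: Dict[str, List[str]] = {}
    for container_id in app.get("containerids", []):
        # Step 3: log file names of each container.
        info = fetch_json(f"{base}/containers/{container_id}")["container"]
        # Step 4: one URL per log file; fetching it returns the log text.
        result[container_id] = [
            f"{base}/containers/{container_id}/logs/{name}"
            for name in info.get("containerLogFiles", [])
        ]
    return result
```

Each returned URL has the same shape as the one passed to curl in step 4, so the log text can be downloaded with any HTTP client.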

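【Summary】

To sum up: this article covered several ways to get YARN application information (application details, container details, and container logs). Which one to use depends on the scenario.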

Originally published on 2022-06-02 via the WeChat official account 陈猿解码.
