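When locating problems, running routine inspections, or working on certain development tasks, you often need to look up information about YARN jobs: the application's ID, type, name, and start time, the containers that belong to the app, and the contents of each container's log files. This article walks through several ways to query this information.
【Viewing through the RM web UI】
This is the simplest way: open the RM web page in a browser to see every app and its logs.
1. Viewing applications
The RM home page lists all applications, and you can click through to view applications in a particular state as needed.
2. Viewing an application's details
In the list shown above, click an app to see its details, such as the submitting user, the app's name, type, priority, and current state, and its submission, launch, and finish times.
3. Viewing the AM logs
Click the attempt's Logs link (marked with a red box in the figure above) to jump to the AM log page, which lists the AM's log files together with part of their contents. Some logs are too long to be shown in full, so only an excerpt is displayed; click "here" to view the complete content.
Note that for a running AM, clicking Logs does not show file contents; it shows the list of that AM's log files instead.
To view the contents of a file, click that file.
Summary: viewing job logs from the RM web UI is the simplest and most direct of these methods, and it is typically used during troubleshooting to quickly check a job's ID and state. Its drawback is that it only shows the AM logs; the logs of the job's other containers cannot be viewed this way.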
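【Querying from the command line】
YARN's own command-line tool can list all applications, the attempts of an application, the containers requested by each attempt, the log files of each container, and the full contents of those log files.
1. Listing applications
The following command lists a set of applications:
yarn application -list
# Optional parameters
# -appStates: used with -list to show applications in the given states; valid states are ALL/NEW/NEW_SAVING/SUBMITTED/ACCEPTED/RUNNING/FINISHED/FAILED/KILLED
# -appTypes: used with -list to show applications of the given types; common types are MAPREDUCE/SPARK/FLINK
# -appTags: used with -list to show applications with the given tags
For example: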
[root@hdp-hadoop-hdp-resourcemanager-0 ~]# yarn application -list -appStates ALL
Total number of applications (application-types: [], states: [NEW, NEW_SAVING, SUBMITTED, ACCEPTED, RUNNING, FINISHED, FAILED, KILLED] and tags: []):4
Application-Id Application-Name Application-Type User Queue State Final-State Progress Tracking-URL
application_1653966005529_0003 Hive on Spark SPARK hadoop default ACCEPTED UNDEFINED 0% N/A
application_1653961942912_0001 hadoop-mapreduce-client-jobclient-2.10.1-tests.jar MAPREDUCE hadoop default FINISHED SUCCEEDED 100% http://hdp-hadoop-hdp-history-0.hdp-hadoop-hdp-nodemanager.yarnonk8s.svc.cluster.local:19888/jobhistory/job/job_1653961942912_0001
application_1653966005529_0002 Hive on Spark SPARK hadoop default ACCEPTED UNDEFINED 0% N/A
application_1653966005529_0001 GenTable+all_5120 MAPREDUCE hadoop default RUNNING UNDEFINED 61.21% http://hdp-hadoop-hdp-nodemanager-9.hdp-hadoop-hdp-nodemanager.yarnonk8s.svc.cluster.local:33937
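When this listing has to be consumed by a script rather than read by eye, the command can be wrapped and its output parsed. The following is a minimal Python sketch, not part of the original article. It assumes that the real CLI separates columns with tab characters (the listing above shows them flattened to spaces) and that `-appStates` and `-appTypes` take comma-separated values; the function names are hypothetical.

```python
import subprocess


def build_list_command(states=None, app_types=None):
    """Build the argv for `yarn application -list` with optional filters."""
    cmd = ["yarn", "application", "-list"]
    if states:
        cmd += ["-appStates", ",".join(states)]
    if app_types:
        cmd += ["-appTypes", ",".join(app_types)]
    return cmd


def parse_application_list(output):
    """Parse the CLI output into a list of dicts keyed by the header names.

    Assumption: columns are tab-separated, so names that contain spaces
    (for example "Hive on Spark") stay in one cell.
    """
    apps = []
    header = None
    for line in output.splitlines():
        cells = [cell.strip() for cell in line.split("\t")]
        if header is None:
            # Skip log lines and the "Total number of applications" line.
            if cells[0] == "Application-Id":
                header = cells
            continue
        if cells[0].startswith("application_"):
            apps.append(dict(zip(header, cells)))
    return apps


def list_applications(states=None, app_types=None):
    """Run the command; needs a host where the yarn CLI is available."""
    result = subprocess.run(
        build_list_command(states, app_types),
        capture_output=True, text=True, check=True,
    )
    return parse_application_list(result.stdout)
```

For instance, `list_applications(states=["RUNNING"])` would return one dict per running application, with keys such as `Application-Id` and `State`.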
2. Listing an application's attempts
The following command lists the attempts of a given application (a job may have been retried several times):
yarn applicationattempt -list <ApplicationID>
# ApplicationID is the ID of the app to query
Example:
[root@hdp-hadoop-hdp-resourcemanager-0 ~]# yarn applicationattempt -list application_1653966005529_0021
Total number of application attempts :1
ApplicationAttempt-Id State AM-Container-Id Tracking-URL
appattempt_1653966005529_0021_000001 RUNNING container_e613_1653966005529_0021_01_000001 http://hdp-hadoop-hdp-resourcemanager-1.hdp-hadoop-hdp-nodemanager.yarnonk8s.svc.cluster.local:8088/proxy/application_1653966005529_0021/
3. Listing the containers requested by an attempt
The command is:
yarn container -list <ApplicationAttemptID>
Example:
[root@hdp-hadoop-hdp-resourcemanager-0 ~]# yarn container -list appattempt_1653966005529_0021_000001
22/06/01 11:20:16 INFO client.ConfiguredRMFailoverProxyProvider: Failing over to rm2
Total number of containers :9
Container-Id Start Time Finish Time State Host Node Http Address LOG-URL
container_e613_1653966005529_0021_01_000005 Wed Jun 01 10:44:41 +0800 2022 N/A RUNNING hdp-hadoop-hdp-nodemanager-2.hdp-hadoop-hdp-nodemanager.yarnonk8s.svc.cluster.local:9100 http://hdp-hadoop-hdp-nodemanager-2.hdp-hadoop-hdp-nodemanager.yarnonk8s.svc.cluster.local:8042 http://hdp-hadoop-hdp-nodemanager-2.hdp-hadoop-hdp-nodemanager.yarnonk8s.svc.cluster.local:8042/node/containerlogs/container_e613_1653966005529_0021_01_000005/hadoop
container_e613_1653966005529_0021_01_000006 Wed Jun 01 10:44:41 +0800 2022 N/A RUNNING hdp-hadoop-hdp-nodemanager-7.hdp-hadoop-hdp-nodemanager.yarnonk8s.svc.cluster.local:9100 http://hdp-hadoop-hdp-nodemanager-7.hdp-hadoop-hdp-nodemanager.yarnonk8s.svc.cluster.local:8042 http://hdp-hadoop-hdp-nodemanager-7.hdp-hadoop-hdp-nodemanager.yarnonk8s.svc.cluster.local:8042/node/containerlogs/container_e613_1653966005529_0021_01_000006/hadoop
container_e613_1653966005529_0021_01_000007 Wed Jun 01 10:44:41 +0800 2022 N/A RUNNING hdp-hadoop-hdp-nodemanager-8.hdp-hadoop-hdp-nodemanager.yarnonk8s.svc.cluster.local:9100 http://hdp-hadoop-hdp-nodemanager-8.hdp-hadoop-hdp-nodemanager.yarnonk8s.svc.cluster.local:8042 http://hdp-hadoop-hdp-nodemanager-8.hdp-hadoop-hdp-nodemanager.yarnonk8s.svc.cluster.local:8042/node/containerlogs/container_e613_1653966005529_0021_01_000007/hadoop
container_e613_1653966005529_0021_01_000008 Wed Jun 01 10:44:41 +0800 2022 N/A RUNNING hdp-hadoop-hdp-nodemanager-3.hdp-hadoop-hdp-nodemanager.yarnonk8s.svc.cluster.local:9100 http://hdp-hadoop-hdp-nodemanager-3.hdp-hadoop-hdp-nodemanager.yarnonk8s.svc.cluster.local:8042 http://hdp-hadoop-hdp-nodemanager-3.hdp-hadoop-hdp-nodemanager.yarnonk8s.svc.cluster.local:8042/node/containerlogs/container_e613_1653966005529_0021_01_000008/hadoop
container_e613_1653966005529_0021_01_000001 Wed Jun 01 10:44:35 +0800 2022 N/A RUNNING hdp-hadoop-hdp-nodemanager-9.hdp-hadoop-hdp-nodemanager.yarnonk8s.svc.cluster.local:9100 http://hdp-hadoop-hdp-nodemanager-9.hdp-hadoop-hdp-nodemanager.yarnonk8s.svc.cluster.local:8042 http://hdp-hadoop-hdp-nodemanager-9.hdp-hadoop-hdp-nodemanager.yarnonk8s.svc.cluster.local:8042/node/containerlogs/container_e613_1653966005529_0021_01_000001/hadoop
container_e613_1653966005529_0021_01_000002 Wed Jun 01 10:44:40 +0800 2022 N/A RUNNING hdp-hadoop-hdp-nodemanager-0.hdp-hadoop-hdp-nodemanager.yarnonk8s.svc.cluster.local:9100 http://hdp-hadoop-hdp-nodemanager-0.hdp-hadoop-hdp-nodemanager.yarnonk8s.svc.cluster.local:8042 http://hdp-hadoop-hdp-nodemanager-0.hdp-hadoop-hdp-nodemanager.yarnonk8s.svc.cluster.local:8042/node/containerlogs/container_e613_1653966005529_0021_01_000002/hadoop
container_e613_1653966005529_0021_01_000003 Wed Jun 01 10:44:40 +0800 2022 N/A RUNNING hdp-hadoop-hdp-nodemanager-9.hdp-hadoop-hdp-nodemanager.yarnonk8s.svc.cluster.local:9100 http://hdp-hadoop-hdp-nodemanager-9.hdp-hadoop-hdp-nodemanager.yarnonk8s.svc.cluster.local:8042 http://hdp-hadoop-hdp-nodemanager-9.hdp-hadoop-hdp-nodemanager.yarnonk8s.svc.cluster.local:8042/node/containerlogs/container_e613_1653966005529_0021_01_000003/hadoop
container_e613_1653966005529_0021_01_000004 Wed Jun 01 10:44:41 +0800 2022 N/A RUNNING hdp-hadoop-hdp-nodemanager-1.hdp-hadoop-hdp-nodemanager.yarnonk8s.svc.cluster.local:9100 http://hdp-hadoop-hdp-nodemanager-1.hdp-hadoop-hdp-nodemanager.yarnonk8s.svc.cluster.local:8042 http://hdp-hadoop-hdp-nodemanager-1.hdp-hadoop-hdp-nodemanager.yarnonk8s.svc.cluster.local:8042/node/containerlogs/container_e613_1653966005529_0021_01_000004/hadoop
container_e613_1653966005529_0021_01_000009 Wed Jun 01 10:44:41 +0800 2022 N/A RUNNING hdp-hadoop-hdp-nodemanager-4.hdp-hadoop-hdp-nodemanager.yarnonk8s.svc.cluster.local:9100 http://hdp-hadoop-hdp-nodemanager-4.hdp-hadoop-hdp-nodemanager.yarnonk8s.svc.cluster.local:8042 http://hdp-hadoop-hdp-nodemanager-4.hdp-hadoop-hdp-nodemanager.yarnonk8s.svc.cluster.local:8042/node/containerlogs/container_e613_1653966005529_0021_01_000009/hadoop
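The IDs in the listings above share a common structure: a container ID embeds the cluster timestamp, the application sequence number, and the attempt number, so the application ID and attempt ID can be derived from a container ID without another query. The sketch below illustrates this in Python. It is inferred from the ID formats visible in this article (including the optional `e613`-style epoch segment), and the function name is hypothetical.

```python
import re

# container_[e<epoch>_]<clusterTimestamp>_<appSeq>_<attempt>_<containerSeq>
_CONTAINER_ID = re.compile(
    r"^container_(?:e(?P<epoch>\d+)_)?"
    r"(?P<ts>\d+)_(?P<app>\d+)_(?P<attempt>\d+)_(?P<seq>\d+)$"
)


def split_container_id(container_id):
    """Derive the application ID and attempt ID from a container ID."""
    match = _CONTAINER_ID.match(container_id)
    if match is None:
        raise ValueError(f"not a container ID: {container_id!r}")
    ts = match["ts"]
    app = int(match["app"])
    attempt = int(match["attempt"])
    return {
        # Application IDs pad the sequence number to 4 digits.
        "application_id": f"application_{ts}_{app:04d}",
        # Attempt IDs pad the attempt number to 6 digits.
        "attempt_id": f"appattempt_{ts}_{app:04d}_{attempt:06d}",
        # The epoch segment is absent on some clusters; report 0 then.
        "epoch": int(match["epoch"]) if match["epoch"] else 0,
        "sequence": int(match["seq"]),
    }
```

Applied to `container_e613_1653966005529_0021_01_000005`, this yields the application `application_1653966005529_0021` and the attempt `appattempt_1653966005529_0021_000001` used in the examples above.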
4. Listing a container's log files
The command is:
yarn logs -containerId <ContainerID> -show_container_log_info
# ContainerID can be the ID of any container: the AM or one of the job's task containers
A simple example:
[root@hdp-hadoop-hdp-resourcemanager-0 ~]# yarn logs -containerId container_e613_1653966005529_0021_01_000001 -show_container_log_info
22/06/01 11:23:27 INFO client.ConfiguredRMFailoverProxyProvider: Failing over to rm2
Container: container_e613_1653966005529_0021_01_000001 on hdp-hadoop-hdp-nodemanager-9.hdp-hadoop-hdp-nodemanager.yarnonk8s.svc.cluster.local:9100
==================================================================================================================================================
LogFile LogLength LastModificationTime LogAggregationType
====================================================================================================================================================================================================================================================================================================
prelaunch.out 70 Wed Jun 01 10:44:36 +0800 2022 LOCAL
prelaunch.err 0 Wed Jun 01 10:44:35 +0800 2022 LOCAL
stdout 0 Wed Jun 01 10:44:36 +0800 2022 LOCAL
stderr 1462151 Wed Jun 01 11:23:27 +0800 2022 LOCAL
5. Viewing the contents of one of a container's log files
The command is:
yarn logs -containerId <ContainerID> -logFiles <LogFileName>
# LogFileName is one of the files listed by the previous command
A simple example:
[root@hdp-hadoop-hdp-resourcemanager-0 ~]# yarn logs -containerId container_e613_1653966005529_0021_01_000001 -logFiles prelaunch.out
22/06/01 11:26:15 INFO client.ConfiguredRMFailoverProxyProvider: Failing over to rm2
Container: container_e613_1653966005529_0021_01_000001 on hdp-hadoop-hdp-nodemanager-9.hdp-hadoop-hdp-nodemanager.yarnonk8s.svc.cluster.local:9100
LogAggregationType: LOCAL
==================================================================================================================================================
LogType:prelaunch.out
LogLastModifiedTime:Wed Jun 01 10:44:36 +0800 2022
LogLength:70
LogContents:
Setting up env variables
Setting up job resources
Launching container
End of LogType:prelaunch.out.This log file belongs to a running container (container_e613_1653966005529_0021_01_000001) and so may not be complete.
******************************************************************************
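The output above wraps the actual log content in a small header (LogType, LogLastModifiedTime, LogLength, LogContents) and a trailer line. To extract only the log body in a script, one option is to read LogLength and take exactly that many characters after the LogContents line. The Python sketch below assumes the layout shown in this example; the function name is hypothetical, and because LogLength is a byte count, the slice is only exact for single-byte (ASCII) log content.

```python
import re

# Header that precedes each log body in the output shown above.
_LOG_HEADER = re.compile(
    r"LogType:(?P<name>[^\n]+)\n"
    r"(?:LogLastModifiedTime:[^\n]*\n)?"
    r"LogLength:(?P<length>\d+)\n"
    r"LogContents:\n"
)


def parse_log_dump(text):
    """Split `yarn logs` output into a {log file name: contents} dict."""
    logs = {}
    pos = 0
    while True:
        match = _LOG_HEADER.search(text, pos)
        if match is None:
            return logs
        length = int(match["length"])
        start = match.end()
        # Take exactly LogLength characters; this keeps lines of the log
        # that happen to look like a header from confusing the parser.
        logs[match["name"].strip()] = text[start:start + length]
        pos = start + length
```

For the example above, the extracted body of `prelaunch.out` is the three lines from "Setting up env variables" through "Launching container", which add up to the reported 70 characters.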
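Summary: the command line can show information about every app, the containers in each app, and each container's logs in detail. Its drawback is a hard dependency on the Hadoop environment: the commands can only be run on nodes that have a Hadoop environment.
【Viewing through REST】
Viewing app information and container logs through the RM web UI also comes down to sending HTTP requests, but the responses are HTML pages.
The same information is available through REST requests, whose URLs differ slightly from those of the web UI. The response can be returned as XML or JSON as needed.
1. Getting application information
This step mainly serves to find the NM node on which the application's attempt runs, because the subsequent requests are sent directly to that NM.
The request URL is:
http://$RMAddr/ws/v1/cluster/apps
# You can also append an application ID after apps to get the information for a single application
http://$RMAddr/ws/v1/cluster/apps/$ApplicationID
For example: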
curl -X GET -H "Accept:application/json" "http://172.168.3.31:8088/ws/v1/cluster/apps"
{
"apps": {
"app": [{
"id": "application_1653899571088_0001",
"user": "root",
"name": "hadoop-mapreduce-client-jobclient-2.10.1-tests.jar",
"queue": "default",
"state": "FINISHED",
"finalStatus": "SUCCEEDED",
"progress": 100.0,
"trackingUI": "History",
"trackingUrl": "http://172.168.3.31:8088/proxy/application_1653899571088_0001/",
"diagnostics": "",
"clusterId": 1653899571088,
"applicationType": "MAPREDUCE",
"applicationTags": "",
"priority": 0,
"startedTime": 1653900968047,
"launchTime": 1653900969192,
"finishedTime": 1653901001551,
"elapsedTime": 33504,
"amContainerLogs": "http://172.168.3.53:8042/node/containerlogs/container_1653899571088_0001_01_000001/root",
"amHostHttpAddress": "172.168.3.53:8042",
"amRPCAddress": "172.168.3.53:38418",
"allocatedMB": -1,
"allocatedVCores": -1,
"reservedMB": -1,
"reservedVCores": -1,
"runningContainers": -1,
"memorySeconds": 93806,
"vcoreSeconds": 50,
"queueUsagePercentage": 0.0,
"clusterUsagePercentage": 0.0,
"resourceSecondsMap": {
"entry": {
"key": "memory-mb",
"value": "93806"
},
"entry": {
"key": "yarn.io/gpu",
"value": "0"
},
"entry": {
"key": "vcores",
"value": "50"
}
},
"preemptedResourceMB": 0,
"preemptedResourceVCores": 0,
"numNonAMContainerPreempted": 0,
"numAMContainerPreempted": 0,
"preemptedMemorySeconds": 0,
"preemptedVcoreSeconds": 0,
"preemptedResourceSecondsMap": null,
"logAggregationStatus": "SUCCEEDED",
"unmanagedApplication": false,
"amNodeLabelExpression": "",
"timeouts": {
"timeout": [{
"type": "LIFETIME",
"expiryTime": "UNLIMITED",
"remainingTimeInSeconds": -1
}]
}
}]
}
}
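Since the purpose of this step is to find the NM that hosts the AM, a script only needs the `amHostHttpAddress` field from the response. The following Python sketch uses only the standard library. The function names are hypothetical, and the `states` and `applicationTypes` query parameters are filters of the RM apps endpoint that the original example does not use.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request, urlopen


def apps_url(rm_addr, app_id=None, **filters):
    """Build the RM REST URL for all apps, or for one app if app_id is set."""
    base = f"http://{rm_addr}/ws/v1/cluster/apps"
    if app_id:
        return f"{base}/{app_id}"
    query = urlencode({k: v for k, v in filters.items() if v is not None})
    return f"{base}?{query}" if query else base


def fetch_json(url, timeout=10):
    """GET a URL and decode the JSON body (same Accept header as the curl call)."""
    request = Request(url, headers={"Accept": "application/json"})
    with urlopen(request, timeout=timeout) as response:
        return json.load(response)


def am_nodes(payload):
    """Map each application ID to the NM HTTP address hosting its AM."""
    # "apps" can be null when no application matches the query.
    apps = (payload.get("apps") or {}).get("app") or []
    return {app["id"]: app.get("amHostHttpAddress") for app in apps}
```

For example, `am_nodes(fetch_json(apps_url("172.168.3.31:8088", states="RUNNING")))` would return the NM address to use in the following steps for each running application.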
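2. Getting an application's containers
The request URL is:
http://$NMAddr/ws/v1/node/apps/$ApplicationID
# $NMAddr is the IP:PORT on which the NodeManager's web service listens
# $ApplicationID is the ID of the application
# Note: an NM only reports the containers of the application that run on that node
Example: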
curl -X GET -H "Accept:application/json" "http://172.16.40.206:8042/ws/v1/node/apps/application_1653966005529_0025"
{
"app": {
"id": "application_1653966005529_0025",
"state": "RUNNING",
"user": "hadoop",
"containerids": [
"container_e613_1653966005529_0025_01_000002",
"container_e613_1653966005529_0025_01_000001"]
}
}
3. Getting a container's details
The main purpose here is to get the list of the container's log files.
A simple example:
curl -X GET -H "Accept:application/json" "http://172.16.40.206:8042/ws/v1/node/containers/container_e613_1653966005529_0025_01_000001"
{
"container": {
"id": "container_e613_1653966005529_0025_01_000001",
"state": "RUNNING",
"exitCode": -1000,
"diagnostics": "",
"user": "hadoop",
"totalMemoryNeededMB": 3072,
"totalVCoresNeeded": 1,
"executionType": "GUARANTEED",
"containerLogsLink": "http://172.16.40.206:8042/node/containerlogs/container_e613_1653966005529_0025_01_000001/hadoop",
"nodeId": "hdp-hadoop-hdp-nodemanager-8.hdp-hadoop-hdp-nodemanager.yarnonk8s.svc.cluster.local:9100",
"containerLogFiles": [
"prelaunch.out",
"prelaunch.err",
"stdout",
"stderr"
]
}
}
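The `containerLogFiles` array in this response is what drives the log requests in the next step: each file name is appended to the container's `/logs/` path on the same NM. A minimal Python sketch of that mapping follows; the function name is hypothetical, and the URL pattern is the one used in this article's own curl example.

```python
from urllib.parse import quote


def container_log_urls(nm_addr, container_payload):
    """Map each log file name of a container to its NM REST log URL."""
    info = container_payload["container"]
    container_id = info["id"]
    return {
        # quote() guards against unusual characters in a file name.
        name: (
            f"http://{nm_addr}/ws/v1/node/containers/"
            f"{container_id}/logs/{quote(name)}"
        )
        for name in info.get("containerLogFiles", [])
    }
```

Passing the response above together with `172.16.40.206:8042` produces one URL per log file, for example the `prelaunch.out` URL requested in the next step.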
4. Getting the contents of a log file
With the container's log file list in hand, you can request the contents of a specific log.
For example:
[root@hdp-hadoop-hdp-resourcemanager-0 ~]# curl -X GET "http://172.16.40.206:8042/ws/v1/node/containers/container_e613_1653966005529_0025_01_000001/logs/prelaunch.out"
Container: container_e613_1653966005529_0025_01_000001 on 172.16.40.206:9100
LogAggregationType: LOCAL
==================================================================================================================================================
LogType:prelaunch.out
LogLastModifiedTime:Wed Jun 01 13:43:26 +0800 2022
LogLength:70
LogContents:
Setting up env variables
Setting up job resources
Launching container
End of LogType:prelaunch.out.This log file belongs to a running container (container_e613_1653966005529_0025_01_000001) and so may not be complete.
******************************************************************************
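【Summary】
This article covered several ways to obtain YARN job information (app information, container information, and container logs): the RM web UI, the yarn command line, and the REST API. Which one to use depends on the scenario.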