前往小程序,Get更优阅读体验!
立即前往
首页
学习
活动
专区
圈层
工具
发布
首页
学习
活动
专区
圈层
工具
社区首页 >专栏 >聊聊flink的slot.request.timeout配置

聊聊flink的slot.request.timeout配置

原创
作者头像
code4it
发布于 2019-03-04 05:21:36
发布于 2019-03-04 05:21:36
2.2K00
代码可运行
举报
文章被收录于专栏:码匠的流水账码匠的流水账
运行总次数:0
代码可运行

本文主要研究一下flink的slot.request.timeout配置

JobManagerOptions

flink-release-1.7.2/flink-core/src/main/java/org/apache/flink/configuration/JobManagerOptions.java

代码语言:javascript
代码运行次数:0
运行
AI代码解释
复制
@PublicEvolving
public class JobManagerOptions {
    //....../**
     * The timeout in milliseconds for requesting a slot from Slot Pool.
     */
    public static final ConfigOption<Long> SLOT_REQUEST_TIMEOUT =
        key("slot.request.timeout")
        .defaultValue(5L * 60L * 1000L)
        .withDescription("The timeout in milliseconds for requesting a slot from Slot Pool.");//......
}
  • slot.request.timeout默认为5分钟

SlotManagerConfiguration

flink-release-1.7.2/flink-runtime/src/main/java/org/apache/flink/runtime/resourcemanager/slotmanager/SlotManagerConfiguration.java

代码语言:javascript
代码运行次数:0
运行
AI代码解释
复制
public class SlotManagerConfiguration {private static final Logger LOGGER = LoggerFactory.getLogger(SlotManagerConfiguration.class);private final Time taskManagerRequestTimeout;
    private final Time slotRequestTimeout;
    private final Time taskManagerTimeout;public SlotManagerConfiguration(
            Time taskManagerRequestTimeout,
            Time slotRequestTimeout,
            Time taskManagerTimeout) {
        this.taskManagerRequestTimeout = Preconditions.checkNotNull(taskManagerRequestTimeout);
        this.slotRequestTimeout = Preconditions.checkNotNull(slotRequestTimeout);
        this.taskManagerTimeout = Preconditions.checkNotNull(taskManagerTimeout);
    }public Time getTaskManagerRequestTimeout() {
        return taskManagerRequestTimeout;
    }public Time getSlotRequestTimeout() {
        return slotRequestTimeout;
    }public Time getTaskManagerTimeout() {
        return taskManagerTimeout;
    }public static SlotManagerConfiguration fromConfiguration(Configuration configuration) throws ConfigurationException {
        final String strTimeout = configuration.getString(AkkaOptions.ASK_TIMEOUT);
        final Time rpcTimeout;try {
            rpcTimeout = Time.milliseconds(Duration.apply(strTimeout).toMillis());
        } catch (NumberFormatException e) {
            throw new ConfigurationException("Could not parse the resource manager's timeout " +
                "value " + AkkaOptions.ASK_TIMEOUT + '.', e);
        }
​
        final Time slotRequestTimeout = getSlotRequestTimeout(configuration);
        final Time taskManagerTimeout = Time.milliseconds(
                configuration.getLong(ResourceManagerOptions.TASK_MANAGER_TIMEOUT));return new SlotManagerConfiguration(rpcTimeout, slotRequestTimeout, taskManagerTimeout);
    }private static Time getSlotRequestTimeout(final Configuration configuration) {
        final long slotRequestTimeoutMs;
        if (configuration.contains(ResourceManagerOptions.SLOT_REQUEST_TIMEOUT)) {
            LOGGER.warn("Config key {} is deprecated; use {} instead.",
                ResourceManagerOptions.SLOT_REQUEST_TIMEOUT,
                JobManagerOptions.SLOT_REQUEST_TIMEOUT);
            slotRequestTimeoutMs = configuration.getLong(ResourceManagerOptions.SLOT_REQUEST_TIMEOUT);
        } else {
            slotRequestTimeoutMs = configuration.getLong(JobManagerOptions.SLOT_REQUEST_TIMEOUT);
        }
        return Time.milliseconds(slotRequestTimeoutMs);
    }
}
  • SlotManagerConfiguration的getSlotRequestTimeout方法会从配置文件读取JobManagerOptions.SLOT_REQUEST_TIMEOUT

SlotManager

flink-release-1.7.2/flink-runtime/src/main/java/org/apache/flink/runtime/resourcemanager/slotmanager/SlotManager.java

代码语言:javascript
代码运行次数:0
运行
AI代码解释
复制
public class SlotManager implements AutoCloseable {
    private static final Logger LOG = LoggerFactory.getLogger(SlotManager.class);/** Scheduled executor for timeouts. */
    private final ScheduledExecutor scheduledExecutor;/** Timeout for slot requests to the task manager. */
    private final Time taskManagerRequestTimeout;/** Timeout after which an allocation is discarded. */
    private final Time slotRequestTimeout;/** Timeout after which an unused TaskManager is released. */
    private final Time taskManagerTimeout;/** Map for all registered slots. */
    private final HashMap<SlotID, TaskManagerSlot> slots;/** Index of all currently free slots. */
    private final LinkedHashMap<SlotID, TaskManagerSlot> freeSlots;/** All currently registered task managers. */
    private final HashMap<InstanceID, TaskManagerRegistration> taskManagerRegistrations;/** Map of fulfilled and active allocations for request deduplication purposes. */
    private final HashMap<AllocationID, SlotID> fulfilledSlotRequests;/** Map of pending/unfulfilled slot allocation requests. */
    private final HashMap<AllocationID, PendingSlotRequest> pendingSlotRequests;private final HashMap<TaskManagerSlotId, PendingTaskManagerSlot> pendingSlots;/** ResourceManager's id. */
    private ResourceManagerId resourceManagerId;/** Executor for future callbacks which have to be "synchronized". */
    private Executor mainThreadExecutor;/** Callbacks for resource (de-)allocations. */
    private ResourceActions resourceActions;private ScheduledFuture<?> taskManagerTimeoutCheck;private ScheduledFuture<?> slotRequestTimeoutCheck;/** True iff the component has been started. */
    private boolean started;public SlotManager(
            ScheduledExecutor scheduledExecutor,
            Time taskManagerRequestTimeout,
            Time slotRequestTimeout,
            Time taskManagerTimeout) {
        this.scheduledExecutor = Preconditions.checkNotNull(scheduledExecutor);
        this.taskManagerRequestTimeout = Preconditions.checkNotNull(taskManagerRequestTimeout);
        this.slotRequestTimeout = Preconditions.checkNotNull(slotRequestTimeout);
        this.taskManagerTimeout = Preconditions.checkNotNull(taskManagerTimeout);
​
        slots = new HashMap<>(16);
        freeSlots = new LinkedHashMap<>(16);
        taskManagerRegistrations = new HashMap<>(4);
        fulfilledSlotRequests = new HashMap<>(16);
        pendingSlotRequests = new HashMap<>(16);
        pendingSlots = new HashMap<>(16);
​
        resourceManagerId = null;
        resourceActions = null;
        mainThreadExecutor = null;
        taskManagerTimeoutCheck = null;
        slotRequestTimeoutCheck = null;
​
        started = false;
    }public void start(ResourceManagerId newResourceManagerId, Executor newMainThreadExecutor, ResourceActions newResourceActions) {
        LOG.info("Starting the SlotManager.");this.resourceManagerId = Preconditions.checkNotNull(newResourceManagerId);
        mainThreadExecutor = Preconditions.checkNotNull(newMainThreadExecutor);
        resourceActions = Preconditions.checkNotNull(newResourceActions);
​
        started = true;
​
        taskManagerTimeoutCheck = scheduledExecutor.scheduleWithFixedDelay(
            () -> mainThreadExecutor.execute(
                () -> checkTaskManagerTimeouts()),
            0L,
            taskManagerTimeout.toMilliseconds(),
            TimeUnit.MILLISECONDS);
​
        slotRequestTimeoutCheck = scheduledExecutor.scheduleWithFixedDelay(
            () -> mainThreadExecutor.execute(
                () -> checkSlotRequestTimeouts()),
            0L,
            slotRequestTimeout.toMilliseconds(),
            TimeUnit.MILLISECONDS);
    }/**
     * Suspends the component. This clears the internal state of the slot manager.
     */
    public void suspend() {
        LOG.info("Suspending the SlotManager.");// stop the timeout checks for the TaskManagers and the SlotRequests
        if (taskManagerTimeoutCheck != null) {
            taskManagerTimeoutCheck.cancel(false);
            taskManagerTimeoutCheck = null;
        }if (slotRequestTimeoutCheck != null) {
            slotRequestTimeoutCheck.cancel(false);
            slotRequestTimeoutCheck = null;
        }for (PendingSlotRequest pendingSlotRequest : pendingSlotRequests.values()) {
            cancelPendingSlotRequest(pendingSlotRequest);
        }
​
        pendingSlotRequests.clear();
​
        ArrayList<InstanceID> registeredTaskManagers = new ArrayList<>(taskManagerRegistrations.keySet());for (InstanceID registeredTaskManager : registeredTaskManagers) {
            unregisterTaskManager(registeredTaskManager);
        }
​
        resourceManagerId = null;
        resourceActions = null;
        started = false;
    }public boolean registerSlotRequest(SlotRequest slotRequest) throws SlotManagerException {
        checkInit();if (checkDuplicateRequest(slotRequest.getAllocationId())) {
            LOG.debug("Ignoring a duplicate slot request with allocation id {}.", slotRequest.getAllocationId());return false;
        } else {
            PendingSlotRequest pendingSlotRequest = new PendingSlotRequest(slotRequest);
​
            pendingSlotRequests.put(slotRequest.getAllocationId(), pendingSlotRequest);try {
                internalRequestSlot(pendingSlotRequest);
            } catch (ResourceManagerException e) {
                // requesting the slot failed --> remove pending slot request
                pendingSlotRequests.remove(slotRequest.getAllocationId());throw new SlotManagerException("Could not fulfill slot request " + slotRequest.getAllocationId() + '.', e);
            }return true;
        }
    }private void checkSlotRequestTimeouts() {
        if (!pendingSlotRequests.isEmpty()) {
            long currentTime = System.currentTimeMillis();
​
            Iterator<Map.Entry<AllocationID, PendingSlotRequest>> slotRequestIterator = pendingSlotRequests.entrySet().iterator();while (slotRequestIterator.hasNext()) {
                PendingSlotRequest slotRequest = slotRequestIterator.next().getValue();if (currentTime - slotRequest.getCreationTimestamp() >= slotRequestTimeout.toMilliseconds()) {
                    slotRequestIterator.remove();if (slotRequest.isAssigned()) {
                        cancelPendingSlotRequest(slotRequest);
                    }
​
                    resourceActions.notifyAllocationFailure(
                        slotRequest.getJobId(),
                        slotRequest.getAllocationId(),
                        new TimeoutException("The allocation could not be fulfilled in time."));
                }
            }
        }
    }//......}
  • SlotManager的构造器接收slotRequestTimeout参数;它维护了pendingSlotRequests的map;start方法会注册slotRequestTimeoutCheck,每隔slotRequestTimeout的时间调度一次,执行的是checkSlotRequestTimeouts方法;suspend方法会cancel这些pendingSlotRequest,然后情况pendingSlotRequests的map
  • registerSlotRequest方法会先执行checkDuplicateRequest判断是否有重复,没有重复的话,则将该slotRequest维护到pendingSlotRequests,然后调用internalRequestSlot进行分配,如果出现异常则从pendingSlotRequests中异常,然后抛出SlotManagerException
  • checkSlotRequestTimeouts则会遍历pendingSlotRequests,然后根据slotRequest.getCreationTimestamp()及当前时间判断时间差是否大于等于slotRequestTimeout,已经超时的话,则会从pendingSlotRequests中移除该slotRequest,然后进行cancel,同时触发resourceActions.notifyAllocationFailure

小结

  • SlotManagerConfiguration的getSlotRequestTimeout方法会从配置文件读取JobManagerOptions.SLOT_REQUEST_TIMEOUT;slot.request.timeout默认为5分钟
  • SlotManager的构造器接收slotRequestTimeout参数;它维护了pendingSlotRequests的map;start方法会注册slotRequestTimeoutCheck,每隔slotRequestTimeout的时间调度一次,执行的是checkSlotRequestTimeouts方法;suspend方法会cancel这些pendingSlotRequest,然后情况pendingSlotRequests的map
  • registerSlotRequest方法会先执行checkDuplicateRequest判断是否有重复,没有重复的话,则将该slotRequest维护到pendingSlotRequests,然后调用internalRequestSlot进行分配,如果出现异常则从pendingSlotRequests中异常,然后抛出SlotManagerException;checkSlotRequestTimeouts则会遍历pendingSlotRequests,然后根据slotRequest.getCreationTimestamp()及当前时间判断时间差是否大于等于slotRequestTimeout,已经超时的话,则会从pendingSlotRequests中移除该slotRequest,然后进行cancel,同时触发resourceActions.notifyAllocationFailure

doc

原创声明:本文系作者授权腾讯云开发者社区发表,未经许可,不得转载。

如有侵权,请联系 cloudcommunity@tencent.com 删除。

原创声明:本文系作者授权腾讯云开发者社区发表,未经许可,不得转载。

如有侵权,请联系 cloudcommunity@tencent.com 删除。

评论
登录后参与评论
暂无评论
推荐阅读
编辑精选文章
换一批
聊聊flink的slot.idle.timeout配置
flink-release-1.7.2/flink-core/src/main/java/org/apache/flink/configuration/JobManagerOptions.java
code4it
2019/03/05
1.4K0
聊聊flink的slot.idle.timeout配置
聊聊flink的jobstore配置
flink-1.7.2/flink-core/src/main/java/org/apache/flink/configuration/JobManagerOptions.java
code4it
2019/03/09
1.3K0
聊聊flink的jobstore配置
聊聊flink的RestClientConfiguration
flink-release-1.7.2/flink-runtime/src/main/java/org/apache/flink/runtime/rest/RestClientConfiguration.java
code4it
2019/03/07
1.3K0
聊聊flink的RestClientConfiguration
[源码解析] Flink的Slot究竟是什么?(1)
Flink的Slot概念大家应该都听说过,但是可能很多朋友还不甚了解其中细节,比如具体Slot究竟代表什么?在代码中如何实现?Slot在生成执行图、调度、分配资源、部署、执行阶段分别起到什么作用?本文和下文将带领大家一起分析源码,为你揭开Slot背后的机理。
罗西的思考
2020/09/07
3.2K0
[源码解析] Flink的Slot究竟是什么?(1)
[源码解析] Flink的Slot究竟是什么?(2)
Flink的Slot概念大家应该都听说过,但是可能很多朋友还不甚了解其中细节,比如具体Slot究竟代表什么?在代码中如何实现?Slot在生成执行图、调度、分配资源、部署、执行阶段分别起到什么作用?本文和上文将带领大家一起分析源码,为你揭开Slot背后的机理。
罗西的思考
2020/09/07
1.3K0
[源码解析] Flink的Slot究竟是什么?(2)
聊聊flink的RestartStrategies
flink-core-1.7.1-sources.jar!/org/apache/flink/api/common/restartstrategy/RestartStrategies.java
code4it
2019/02/11
9730
聊聊flink的RestartStrategies
聊聊flink的ScheduledExecutor
java.base/java/util/concurrent/Executor.java
code4it
2019/03/13
1.1K0
聊聊flink的ScheduledExecutor
聊聊flink taskmanager的jvm-exit-on-oom配置
本文主要研究一下flink taskmanager的jvm-exit-on-oom配置
code4it
2019/02/23
1.2K0
聊聊flink taskmanager的jvm-exit-on-oom配置
聊聊flink的CheckpointScheduler
flink-runtime_2.11-1.7.0-sources.jar!/org/apache/flink/runtime/checkpoint/CheckpointCoordinatorDeActivator.java
code4it
2018/12/07
1K0
聊聊flink的CheckpointScheduler
聊聊flink的MetricQueryServiceGateway
flink-1.7.2/flink-runtime/src/main/java/org/apache/flink/runtime/webmonitor/retriever/MetricQueryServiceGateway.java
code4it
2019/03/17
4750
聊聊flink的MetricQueryServiceGateway
聊聊flink的RpcServer
flink-release-1.7.2/flink-runtime/src/main/java/org/apache/flink/runtime/rpc/RpcGateway.java
code4it
2019/03/14
8940
聊聊flink的RpcServer
聊聊flink的SourceFunction
flink-streaming-java_2.11-1.6.2-sources.jar!/org/apache/flink/streaming/api/functions/source/SourceFunction.java
code4it
2018/12/18
1.9K0
聊聊flink的RpcService
flink-release-1.7.2/flink-runtime/src/main/java/org/apache/flink/runtime/rpc/RpcService.java
code4it
2019/03/12
1.3K0
聊聊flink的RpcService
聊聊flink的InputFormatSourceFunction
flink-streaming-java_2.11-1.6.2-sources.jar!/org/apache/flink/streaming/api/environment/StreamExecutionEnvironment.java
code4it
2018/11/29
1.5K0
聊聊flink的InputFormatSourceFunction
聊聊flink的SourceFunction
flink-streaming-java_2.11-1.6.2-sources.jar!/org/apache/flink/streaming/api/functions/source/SourceFunction.java
code4it
2018/11/27
3.5K0
聊聊flink的SourceFunction
聊聊flink的StateTtlConfig
flink-core-1.7.0-sources.jar!/org/apache/flink/api/common/state/StateTtlConfig.java
code4it
2018/12/24
9290
聊聊flink的StateTtlConfig
聊聊flink的log.file配置
flink-release-1.6.2/flink-dist/src/main/flink-bin/conf/log4j.properties
code4it
2018/11/22
6K0
聊聊flink的log.file配置
聊聊flink jdbc的ParameterValuesProvider
本文主要研究一下flink jdbc的ParameterValuesProvider
code4it
2019/04/23
9150
聊聊flink jdbc的ParameterValuesProvider
聊聊flink KeyedStream的intervalJoin操作
flink-streaming-java_2.11-1.7.0-sources.jar!/org/apache/flink/streaming/api/datastream/KeyedStream.java
code4it
2019/01/11
1.3K0
聊聊flink KeyedStream的intervalJoin操作
聊聊flink的FencedAkkaInvocationHandler
本文主要研究一下flink的FencedAkkaInvocationHandler
code4it
2019/03/15
7880
聊聊flink的FencedAkkaInvocationHandler
相关推荐
聊聊flink的slot.idle.timeout配置
更多 >
领券
社区富文本编辑器全新改版!诚邀体验~
全新交互,全新视觉,新增快捷键、悬浮工具栏、高亮块等功能并同时优化现有功能,全面提升创作效率和体验
问题归档专栏文章快讯文章归档关键词归档开发者手册归档开发者手册 Section 归档
查看详情【社区公告】 技术创作特训营有奖征文