前往小程序,Get更优阅读体验!
立即前往
首页
学习
活动
专区
工具
TVP
发布
社区首页 >专栏 >聊聊flink的MemoryStateBackend

聊聊flink的MemoryStateBackend

原创
作者头像
code4it
发布2018-12-10 14:41:02
7780
发布2018-12-10 14:41:02
举报
文章被收录于专栏:码匠的流水账码匠的流水账

本文主要研究一下flink的MemoryStateBackend

StateBackend

flink-runtime_2.11-1.7.0-sources.jar!/org/apache/flink/runtime/state/StateBackend.java

代码语言:javascript
复制
@PublicEvolving
public interface StateBackend extends java.io.Serializable {
​
    // ------------------------------------------------------------------------
    //  Checkpoint storage - the durable persistence of checkpoint data
    // ------------------------------------------------------------------------
​
    /**
     * Resolves the given pointer to a checkpoint/savepoint into a checkpoint location. The location
     * supports reading the checkpoint metadata, or disposing the checkpoint storage location.
     *
     * <p>If the state backend cannot understand the format of the pointer (for example because it
     * was created by a different state backend) this method should throw an {@code IOException}.
     *
     * @param externalPointer The external checkpoint pointer to resolve.
     * @return The checkpoint location handle.
     *
     * @throws IOException Thrown, if the state backend does not understand the pointer, or if
     *                     the pointer could not be resolved due to an I/O error.
     */
    CompletedCheckpointStorageLocation resolveCheckpoint(String externalPointer) throws IOException;
​
    /**
     * Creates a storage for checkpoints for the given job. The checkpoint storage is
     * used to write checkpoint data and metadata.
     *
     * @param jobId The job to store checkpoint data for.
     * @return A checkpoint storage for the given job.
     *
     * @throws IOException Thrown if the checkpoint storage cannot be initialized.
     */
    CheckpointStorage createCheckpointStorage(JobID jobId) throws IOException;
​
    // ------------------------------------------------------------------------
    //  Structure Backends 
    // ------------------------------------------------------------------------
​
    /**
     * Creates a new {@link AbstractKeyedStateBackend} that is responsible for holding <b>keyed state</b>
     * and checkpointing it. Uses default TTL time provider.
     *
     * <p><i>Keyed State</i> is state where each value is bound to a key.
     *
     * @param <K> The type of the keys by which the state is organized.
     *
     * @return The Keyed State Backend for the given job, operator, and key group range.
     *
     * @throws Exception This method may forward all exceptions that occur while instantiating the backend.
     */
    default <K> AbstractKeyedStateBackend<K> createKeyedStateBackend(
            Environment env,
            JobID jobID,
            String operatorIdentifier,
            TypeSerializer<K> keySerializer,
            int numberOfKeyGroups,
            KeyGroupRange keyGroupRange,
            TaskKvStateRegistry kvStateRegistry) throws Exception {
        return createKeyedStateBackend(
            env,
            jobID,
            operatorIdentifier,
            keySerializer,
            numberOfKeyGroups,
            keyGroupRange,
            kvStateRegistry,
            TtlTimeProvider.DEFAULT
        );
    }
​
    /**
     * Creates a new {@link AbstractKeyedStateBackend} that is responsible for holding <b>keyed state</b>
     * and checkpointing it.
     *
     * <p><i>Keyed State</i> is state where each value is bound to a key.
     *
     * @param <K> The type of the keys by which the state is organized.
     *
     * @return The Keyed State Backend for the given job, operator, and key group range.
     *
     * @throws Exception This method may forward all exceptions that occur while instantiating the backend.
     */
    default <K> AbstractKeyedStateBackend<K> createKeyedStateBackend(
        Environment env,
        JobID jobID,
        String operatorIdentifier,
        TypeSerializer<K> keySerializer,
        int numberOfKeyGroups,
        KeyGroupRange keyGroupRange,
        TaskKvStateRegistry kvStateRegistry,
        TtlTimeProvider ttlTimeProvider
    ) throws Exception {
        return createKeyedStateBackend(
            env,
            jobID,
            operatorIdentifier,
            keySerializer,
            numberOfKeyGroups,
            keyGroupRange,
            kvStateRegistry,
            ttlTimeProvider,
            new UnregisteredMetricsGroup());
    }
​
    /**
     * Creates a new {@link AbstractKeyedStateBackend} that is responsible for holding <b>keyed state</b>
     * and checkpointing it.
     *
     * <p><i>Keyed State</i> is state where each value is bound to a key.
     *
     * @param <K> The type of the keys by which the state is organized.
     *
     * @return The Keyed State Backend for the given job, operator, and key group range.
     *
     * @throws Exception This method may forward all exceptions that occur while instantiating the backend.
     */
    <K> AbstractKeyedStateBackend<K> createKeyedStateBackend(
        Environment env,
        JobID jobID,
        String operatorIdentifier,
        TypeSerializer<K> keySerializer,
        int numberOfKeyGroups,
        KeyGroupRange keyGroupRange,
        TaskKvStateRegistry kvStateRegistry,
        TtlTimeProvider ttlTimeProvider,
        MetricGroup metricGroup) throws Exception;
    
    /**
     * Creates a new {@link OperatorStateBackend} that can be used for storing operator state.
     *
     * <p>Operator state is state that is associated with parallel operator (or function) instances,
     * rather than with keys.
     *
     * @param env The runtime environment of the executing task.
     * @param operatorIdentifier The identifier of the operator whose state should be stored.
     *
     * @return The OperatorStateBackend for operator identified by the job and operator identifier.
     *
     * @throws Exception This method may forward all exceptions that occur while instantiating the backend.
     */
    OperatorStateBackend createOperatorStateBackend(Environment env, String operatorIdentifier) throws Exception;
}
  • StateBackend接口定义了有状态的streaming应用的state是如何stored以及checkpointed
  • flink目前内置支持MemoryStateBackend、FsStateBackend、RocksDBStateBackend三种,如果没有配置默认为MemoryStateBackend;在flink-conf.yaml里头可以进行全局的默认配置,不过具体每个job还可以通过StreamExecutionEnvironment.setStateBackend来覆盖全局的配置
  • MemoryStateBackend可以在构造器中指定大小,默认是5MB,可以增大但是不能超过akka frame size;FsStateBackend模式把TaskManager的state存储在内存,但是可以把checkpoint的state存储到filesystem中(比如HDFS);RocksDBStateBackend把working state存储在RocksDB中,checkpoint的state存储在filesystem
  • StateBackend接口定义了createCheckpointStorage、createKeyedStateBackend、createOperatorStateBackend方法;同时继承了Serializable接口;StateBackend接口的实现要求是线程安全的
  • StateBackend有个直接实现的抽象类AbstractStateBackend,而AbstractFileStateBackend及RocksDBStateBackend继承了AbstractStateBackend,之后MemoryStateBackend、FsStateBackend都继承了AbstractFileStateBackend

AbstractStateBackend

flink-runtime_2.11-1.7.0-sources.jar!/org/apache/flink/runtime/state/AbstractStateBackend.java

代码语言:javascript
复制
/**
 * An abstract base implementation of the {@link StateBackend} interface.
 *
 * <p>This class has currently no contents and only kept to not break the prior class hierarchy for users.
 */
@PublicEvolving
public abstract class AbstractStateBackend implements StateBackend, java.io.Serializable {
​
    private static final long serialVersionUID = 4620415814639230247L;
​
    // ------------------------------------------------------------------------
    //  State Backend - State-Holding Backends
    // ------------------------------------------------------------------------
​
    @Override
    public abstract <K> AbstractKeyedStateBackend<K> createKeyedStateBackend(
        Environment env,
        JobID jobID,
        String operatorIdentifier,
        TypeSerializer<K> keySerializer,
        int numberOfKeyGroups,
        KeyGroupRange keyGroupRange,
        TaskKvStateRegistry kvStateRegistry,
        TtlTimeProvider ttlTimeProvider,
        MetricGroup metricGroup) throws IOException;
​
    @Override
    public abstract OperatorStateBackend createOperatorStateBackend(
            Environment env,
            String operatorIdentifier) throws Exception;
}
  • AbstractStateBackend声明实现StateBackend及Serializable接口,这里没有新增其他内容

AbstractFileStateBackend

flink-runtime_2.11-1.7.0-sources.jar!/org/apache/flink/runtime/state/filesystem/AbstractFileStateBackend.java

代码语言:javascript
复制
@PublicEvolving
public abstract class AbstractFileStateBackend extends AbstractStateBackend {
​
    private static final long serialVersionUID = 1L;
​
    // ------------------------------------------------------------------------
    //  State Backend Properties
    // ------------------------------------------------------------------------
​
    /** The path where checkpoints will be stored, or null, if none has been configured. */
    @Nullable
    private final Path baseCheckpointPath;
​
    /** The path where savepoints will be stored, or null, if none has been configured. */
    @Nullable
    private final Path baseSavepointPath;
​
    //......
​
    // ------------------------------------------------------------------------
    //  Initialization and metadata storage
    // ------------------------------------------------------------------------
​
    @Override
    public CompletedCheckpointStorageLocation resolveCheckpoint(String pointer) throws IOException {
        return AbstractFsCheckpointStorage.resolveCheckpointPointer(pointer);
    }
​
    // ------------------------------------------------------------------------
    //  Utilities
    // ------------------------------------------------------------------------
​
    /**
     * Checks the validity of the path's scheme and path.
     *
     * @param path The path to check.
     * @return The URI as a Path.
     *
     * @throws IllegalArgumentException Thrown, if the URI misses scheme or path.
     */
    private static Path validatePath(Path path) {
        final URI uri = path.toUri();
        final String scheme = uri.getScheme();
        final String pathPart = uri.getPath();
​
        // some validity checks
        if (scheme == null) {
            throw new IllegalArgumentException("The scheme (hdfs://, file://, etc) is null. " +
                    "Please specify the file system scheme explicitly in the URI.");
        }
        if (pathPart == null) {
            throw new IllegalArgumentException("The path to store the checkpoint data in is null. " +
                    "Please specify a directory path for the checkpoint data.");
        }
        if (pathPart.length() == 0 || pathPart.equals("/")) {
            throw new IllegalArgumentException("Cannot use the root directory for checkpoints.");
        }
​
        return path;
    }
​
    @Nullable
    private static Path parameterOrConfigured(@Nullable Path path, Configuration config, ConfigOption<String> option) {
        if (path != null) {
            return path;
        }
        else {
            String configValue = config.getString(option);
            try {
                return configValue == null ? null : new Path(configValue);
            }
            catch (IllegalArgumentException e) {
                throw new IllegalConfigurationException("Cannot parse value for " + option.key() +
                        " : " + configValue + " . Not a valid path.");
            }
        }
    }
}
  • AbstractFileStateBackend继承了AbstractStateBackend,它有baseCheckpointPath、baseSavepointPath两个属性,允许为null,路径的格式为hdfs://或者file://开头;resolveCheckpoint方法用于解析checkpoint或savepoint的location,这里使用AbstractFsCheckpointStorage.resolveCheckpointPointer(pointer)来完成

MemoryStateBackend

flink-runtime_2.11-1.7.0-sources.jar!/org/apache/flink/runtime/state/memory/MemoryStateBackend.java

代码语言:javascript
复制
@PublicEvolving
public class MemoryStateBackend extends AbstractFileStateBackend implements ConfigurableStateBackend {
​
    private static final long serialVersionUID = 4109305377809414635L;
​
    /** The default maximal size that the snapshotted memory state may have (5 MiBytes). */
    public static final int DEFAULT_MAX_STATE_SIZE = 5 * 1024 * 1024;
​
    /** The maximal size that the snapshotted memory state may have. */
    private final int maxStateSize;
​
    /** Switch to chose between synchronous and asynchronous snapshots.
     * A value of 'UNDEFINED' means not yet configured, in which case the default will be used. */
    private final TernaryBoolean asynchronousSnapshots;
​
    // ------------------------------------------------------------------------
​
    /**
     * Creates a new memory state backend that accepts states whose serialized forms are
     * up to the default state size (5 MB).
     *
     * <p>Checkpoint and default savepoint locations are used as specified in the
     * runtime configuration.
     */
    public MemoryStateBackend() {
        this(null, null, DEFAULT_MAX_STATE_SIZE, TernaryBoolean.UNDEFINED);
    }
​
    /**
     * Creates a new memory state backend that accepts states whose serialized forms are
     * up to the default state size (5 MB). The state backend uses asynchronous snapshots
     * or synchronous snapshots as configured.
     *
     * <p>Checkpoint and default savepoint locations are used as specified in the
     * runtime configuration.
     *
     * @param asynchronousSnapshots Switch to enable asynchronous snapshots.
     */
    public MemoryStateBackend(boolean asynchronousSnapshots) {
        this(null, null, DEFAULT_MAX_STATE_SIZE, TernaryBoolean.fromBoolean(asynchronousSnapshots));
    }
​
    /**
     * Creates a new memory state backend that accepts states whose serialized forms are
     * up to the given number of bytes.
     *
     * <p>Checkpoint and default savepoint locations are used as specified in the
     * runtime configuration.
     *
     * <p><b>WARNING:</b> Increasing the size of this value beyond the default value
     * ({@value #DEFAULT_MAX_STATE_SIZE}) should be done with care.
     * The checkpointed state needs to be send to the JobManager via limited size RPC messages, and there
     * and the JobManager needs to be able to hold all aggregated state in its memory.
     *
     * @param maxStateSize The maximal size of the serialized state
     */
    public MemoryStateBackend(int maxStateSize) {
        this(null, null, maxStateSize, TernaryBoolean.UNDEFINED);
    }
​
    /**
     * Creates a new memory state backend that accepts states whose serialized forms are
     * up to the given number of bytes and that uses asynchronous snashots as configured.
     *
     * <p>Checkpoint and default savepoint locations are used as specified in the
     * runtime configuration.
     *
     * <p><b>WARNING:</b> Increasing the size of this value beyond the default value
     * ({@value #DEFAULT_MAX_STATE_SIZE}) should be done with care.
     * The checkpointed state needs to be send to the JobManager via limited size RPC messages, and there
     * and the JobManager needs to be able to hold all aggregated state in its memory.
     *
     * @param maxStateSize The maximal size of the serialized state
     * @param asynchronousSnapshots Switch to enable asynchronous snapshots.
     */
    public MemoryStateBackend(int maxStateSize, boolean asynchronousSnapshots) {
        this(null, null, maxStateSize, TernaryBoolean.fromBoolean(asynchronousSnapshots));
    }
​
    /**
     * Creates a new MemoryStateBackend, setting optionally the path to persist checkpoint metadata
     * to, and to persist savepoints to.
     *
     * @param checkpointPath The path to write checkpoint metadata to. If null, the value from
     *                       the runtime configuration will be used.
     * @param savepointPath  The path to write savepoints to. If null, the value from
     *                       the runtime configuration will be used.
     */
    public MemoryStateBackend(@Nullable String checkpointPath, @Nullable String savepointPath) {
        this(checkpointPath, savepointPath, DEFAULT_MAX_STATE_SIZE, TernaryBoolean.UNDEFINED);
    }
​
    /**
     * Creates a new MemoryStateBackend, setting optionally the paths to persist checkpoint metadata
     * and savepoints to, as well as configuring state thresholds and asynchronous operations.
     *
     * <p><b>WARNING:</b> Increasing the size of this value beyond the default value
     * ({@value #DEFAULT_MAX_STATE_SIZE}) should be done with care.
     * The checkpointed state needs to be send to the JobManager via limited size RPC messages, and there
     * and the JobManager needs to be able to hold all aggregated state in its memory.
     *
     * @param checkpointPath The path to write checkpoint metadata to. If null, the value from
     *                       the runtime configuration will be used.
     * @param savepointPath  The path to write savepoints to. If null, the value from
     *                       the runtime configuration will be used.
     * @param maxStateSize   The maximal size of the serialized state.
     * @param asynchronousSnapshots Flag to switch between synchronous and asynchronous
     *                              snapshot mode. If null, the value configured in the
     *                              runtime configuration will be used.
     */
    public MemoryStateBackend(
            @Nullable String checkpointPath,
            @Nullable String savepointPath,
            int maxStateSize,
            TernaryBoolean asynchronousSnapshots) {
​
        super(checkpointPath == null ? null : new Path(checkpointPath),
                savepointPath == null ? null : new Path(savepointPath));
​
        checkArgument(maxStateSize > 0, "maxStateSize must be > 0");
        this.maxStateSize = maxStateSize;
​
        this.asynchronousSnapshots = asynchronousSnapshots;
    }
​
    /**
     * Private constructor that creates a re-configured copy of the state backend.
     *
     * @param original The state backend to re-configure
     * @param configuration The configuration
     */
    private MemoryStateBackend(MemoryStateBackend original, Configuration configuration) {
        super(original.getCheckpointPath(), original.getSavepointPath(), configuration);
​
        this.maxStateSize = original.maxStateSize;
​
        // if asynchronous snapshots were configured, use that setting,
        // else check the configuration
        this.asynchronousSnapshots = original.asynchronousSnapshots.resolveUndefined(
                configuration.getBoolean(CheckpointingOptions.ASYNC_SNAPSHOTS));
    }
​
    // ------------------------------------------------------------------------
    //  Properties
    // ------------------------------------------------------------------------
​
    /**
     * Gets the maximum size that an individual state can have, as configured in the
     * constructor (by default {@value #DEFAULT_MAX_STATE_SIZE}).
     *
     * @return The maximum size that an individual state can have
     */
    public int getMaxStateSize() {
        return maxStateSize;
    }
​
    /**
     * Gets whether the key/value data structures are asynchronously snapshotted.
     *
     * <p>If not explicitly configured, this is the default value of
     * {@link CheckpointingOptions#ASYNC_SNAPSHOTS}.
     */
    public boolean isUsingAsynchronousSnapshots() {
        return asynchronousSnapshots.getOrDefault(CheckpointingOptions.ASYNC_SNAPSHOTS.defaultValue());
    }
​
    // ------------------------------------------------------------------------
    //  Reconfiguration
    // ------------------------------------------------------------------------
​
    /**
     * Creates a copy of this state backend that uses the values defined in the configuration
     * for fields where that were not specified in this state backend.
     *
     * @param config the configuration
     * @return The re-configured variant of the state backend
     */
    @Override
    public MemoryStateBackend configure(Configuration config) {
        return new MemoryStateBackend(this, config);
    }
​
    // ------------------------------------------------------------------------
    //  checkpoint state persistence
    // ------------------------------------------------------------------------
​
    @Override
    public CheckpointStorage createCheckpointStorage(JobID jobId) throws IOException {
        return new MemoryBackendCheckpointStorage(jobId, getCheckpointPath(), getSavepointPath(), maxStateSize);
    }
​
    // ------------------------------------------------------------------------
    //  state holding structures
    // ------------------------------------------------------------------------
​
    @Override
    public OperatorStateBackend createOperatorStateBackend(
            Environment env,
            String operatorIdentifier) throws Exception {
​
        return new DefaultOperatorStateBackend(
                env.getUserClassLoader(),
                env.getExecutionConfig(),
                isUsingAsynchronousSnapshots());
    }
​
    @Override
    public <K> AbstractKeyedStateBackend<K> createKeyedStateBackend(
            Environment env,
            JobID jobID,
            String operatorIdentifier,
            TypeSerializer<K> keySerializer,
            int numberOfKeyGroups,
            KeyGroupRange keyGroupRange,
            TaskKvStateRegistry kvStateRegistry,
            TtlTimeProvider ttlTimeProvider,
            MetricGroup metricGroup) {
​
        TaskStateManager taskStateManager = env.getTaskStateManager();
        HeapPriorityQueueSetFactory priorityQueueSetFactory =
            new HeapPriorityQueueSetFactory(keyGroupRange, numberOfKeyGroups, 128);
        return new HeapKeyedStateBackend<>(
                kvStateRegistry,
                keySerializer,
                env.getUserClassLoader(),
                numberOfKeyGroups,
                keyGroupRange,
                isUsingAsynchronousSnapshots(),
                env.getExecutionConfig(),
                taskStateManager.createLocalRecoveryConfig(),
                priorityQueueSetFactory,
                ttlTimeProvider);
    }
​
    // ------------------------------------------------------------------------
    //  utilities
    // ------------------------------------------------------------------------
​
    @Override
    public String toString() {
        return "MemoryStateBackend (data in heap memory / checkpoints to JobManager) " +
                "(checkpoints: '" + getCheckpointPath() +
                "', savepoints: '" + getSavepointPath() +
                "', asynchronous: " + asynchronousSnapshots +
                ", maxStateSize: " + maxStateSize + ")";
    }
}
  • MemoryStateBackend继承了AbstractFileStateBackend,实现ConfigurableStateBackend接口(configure方法);它将TaskManager的working state及JobManager的checkpoint state存储在JVM heap中(但是为了高可用,也可以设置checkpoint state存储到filesystem);MemoryStateBackend仅仅用来做实验用途,比如本地启动或者所需的state非常小,对于生产需要改为使用FsStateBackend(将TaskManager的working state存储在内存,但是将JobManager的checkpoint state存储到文件系统以支持更大的state存储)
  • MemoryStateBackend有个maxStateSize属性(默认DEFAULT_MAX_STATE_SIZE为5MB),每个state的大小不能超过maxStateSize,一个task的所有state不能超过RPC系统的限制(默认是10MB,可以修改但不建议),所有retained checkpoints的state大小总和不能超过JobManager的JVM heap大小;另外如果创建MemoryStateBackend时未指定checkpointPath及savepointPath,则会从flink-conf.yaml中读取全局默认值;MemoryStateBackend里头还有一个asynchronousSnapshots属性,是TernaryBoolean类型(TRUE、FALSE、UNDEFINED),其中UNDEFINED表示没有配置,将会使用默认值
  • MemoryStateBackend的createCheckpointStorage创建的是MemoryBackendCheckpointStorage;createOperatorStateBackend方法创建的是OperatorStateBackend;createKeyedStateBackend方法创建的是HeapKeyedStateBackend

小结

  • StateBackend接口定义了有状态的streaming应用的state是如何stored以及checkpointed;目前内置支持MemoryStateBackend、FsStateBackend、RocksDBStateBackend三种,如果没有配置默认为MemoryStateBackend;在flink-conf.yaml里头可以进行全局的默认配置,不过具体每个job还可以通过StreamExecutionEnvironment.setStateBackend来覆盖全局的配置
  • StateBackend接口定义了createCheckpointStorage、createKeyedStateBackend、createOperatorStateBackend方法;同时继承了Serializable接口;StateBackend接口的实现要求是线程安全的;StateBackend有个直接实现的抽象类AbstractStateBackend,而AbstractFileStateBackend及RocksDBStateBackend继承了AbstractStateBackend,之后MemoryStateBackend、FsStateBackend都继承了AbstractFileStateBackend
  • MemoryStateBackend继承了AbstractFileStateBackend,实现ConfigurableStateBackend接口(configure方法);它将TaskManager的working state及JobManager的checkpoint state存储在JVM heap中;MemoryStateBackend的createCheckpointStorage创建的是MemoryBackendCheckpointStorage;createOperatorStateBackend方法创建的是OperatorStateBackend;createKeyedStateBackend方法创建的是HeapKeyedStateBackend

doc

原创声明:本文系作者授权腾讯云开发者社区发表,未经许可,不得转载。

如有侵权,请联系 cloudcommunity@tencent.com 删除。

原创声明:本文系作者授权腾讯云开发者社区发表,未经许可,不得转载。

如有侵权,请联系 cloudcommunity@tencent.com 删除。

评论
登录后参与评论
0 条评论
热度
最新
推荐阅读
目录
  • StateBackend
    • AbstractStateBackend
      • AbstractFileStateBackend
      • MemoryStateBackend
      • 小结
      • doc
      领券
      问题归档专栏文章快讯文章归档关键词归档开发者手册归档开发者手册 Section 归档