
Tracing Coach's execution flow

CreateAMind
Published 2018-07-20 17:30:47
Included in the column: CreateAMind

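env: the environment is built first from the tuning parameters (the preset), and the agent is then constructed on top of it.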

```python
# Build the environment described by the preset, then the agent class named in it
env_instance = create_environment(tuning_parameters)
agent = eval(tuning_parameters.agent.type + '(env_instance, tuning_parameters)')
```
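Coach turns the string in `tuning_parameters.agent.type` into a constructor call with `eval`. A minimal sketch of the same dispatch without `eval`, assuming a hand-maintained name-to-class table; `Agent`, `DQNAgent`, `ActorCriticAgent` and `create_agent` here are hypothetical stand-ins, not Coach's own classes:

```python
from types import SimpleNamespace


# Hypothetical stand-ins; the real agent classes live in Coach's code base.
class Agent:
    def __init__(self, env, tuning_parameters):
        self.env = env
        self.tp = tuning_parameters


class DQNAgent(Agent):
    pass


class ActorCriticAgent(Agent):
    pass


# Explicit name -> class table used instead of eval()
AGENT_CLASSES = {cls.__name__: cls for cls in (DQNAgent, ActorCriticAgent)}


def create_agent(env_instance, tuning_parameters):
    """Build the agent named by tuning_parameters.agent.type without eval()."""
    agent_type = tuning_parameters.agent.type
    try:
        agent_cls = AGENT_CLASSES[agent_type]
    except KeyError:
        raise ValueError(f"unknown agent type: {agent_type!r}") from None
    return agent_cls(env_instance, tuning_parameters)


tp = SimpleNamespace(agent=SimpleNamespace(type="DQNAgent"))
agent = create_agent(env_instance=None, tuning_parameters=tp)
print(type(agent).__name__)  # DQNAgent
```

An explicit table fails with a clear error on a typo in the preset and cannot execute arbitrary code, at the cost of registering every agent class by hand.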
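```python
def create_environment(tuning_parameters):
    # Validate the preset's env.type and get the name of the matching wrapper class
    env_type_name, env_type = EnvTypes().verify(tuning_parameters.env.type)
    # env_type is a class name string: eval() resolves it to the class, which is then instantiated
    env = eval(env_type)(tuning_parameters)
    return env
```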
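`EnvTypes().verify` returns a pair: the validated type name and the wrapper class name that `eval` then resolves. A hedged sketch of that lookup step, assuming a plain dictionary; `EnvTypes` below is a hypothetical reimplementation that stores classes directly, so no `eval` is needed:

```python
from types import SimpleNamespace


class GymEnvironmentWrapper:
    """Hypothetical stand-in for Coach's Gym wrapper."""

    def __init__(self, tuning_parameters):
        self.tp = tuning_parameters


class EnvTypes:
    # Hypothetical table: value of env.type in the preset -> wrapper class
    _types = {"Gym": GymEnvironmentWrapper}

    def verify(self, env_type):
        """Return (name, wrapper_class) or raise on an unknown env type."""
        if env_type not in self._types:
            raise ValueError(f"unknown env type {env_type!r}; valid types: {sorted(self._types)}")
        return env_type, self._types[env_type]


def create_environment(tuning_parameters):
    env_type_name, env_cls = EnvTypes().verify(tuning_parameters.env.type)
    return env_cls(tuning_parameters)


tp = SimpleNamespace(env=SimpleNamespace(type="Gym"))
env = create_environment(tp)
print(type(env).__name__)  # GymEnvironmentWrapper
```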
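```python
class GymEnvironmentWrapper(EnvironmentWrapper):
    def __init__(self, tuning_parameters):
        # Set the generic defaults defined in the base class first
        EnvironmentWrapper.__init__(self, tuning_parameters)

        # env parameters
        ...  # remainder omitted in the original excerpt
```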
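The wrapper first runs the base constructor, which fills every field with a generic default, and then overwrites the env-specific ones. A trimmed-down sketch of that two-step pattern; `BaseWrapper`, `FakeDiscreteEnv` and `SketchGymWrapper` are hypothetical, and a real wrapper would read these values from the underlying gym env:

```python
from types import SimpleNamespace


class BaseWrapper:
    """Trimmed-down base: only the fields this sketch needs."""

    def __init__(self, tuning_parameters):
        self.tp = tuning_parameters
        self.actions = {}
        self.discrete_controls = True
        self.action_space_size = 0


class FakeDiscreteEnv:
    """Hypothetical stand-in for an env with 4 discrete actions."""
    n_actions = 4


class SketchGymWrapper(BaseWrapper):
    def __init__(self, tuning_parameters):
        # 1) let the base class set generic defaults
        BaseWrapper.__init__(self, tuning_parameters)
        # 2) then overwrite them with env-specific parameters
        self.env = FakeDiscreteEnv()
        self.discrete_controls = True
        self.action_space_size = self.env.n_actions
        self.actions = {i: i for i in range(self.action_space_size)}


w = SketchGymWrapper(SimpleNamespace())
print(w.action_space_size, w.actions)  # 4 {0: 0, 1: 1, 2: 2, 3: 3}
```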
```python
import time

# RunPhase (enum of run phases) and Renderer are defined elsewhere in Coach.


class EnvironmentWrapper(object):
    def __init__(self, tuning_parameters):
        """
        :param tuning_parameters:
        :type tuning_parameters: Preset
        """
        # env initialization: generic defaults, overwritten by concrete wrappers
        self.game = []
        self.actions = {}
        self.state = []
        self.reward = 0
        self.done = False
        self.default_action = 0
        self.last_action_idx = 0
        self.episode_idx = 0
        self.last_episode_time = time.time()
        self.info = []
        self.action_space_low = 0
        self.action_space_high = 0
        self.action_space_abs_range = 0
        self.actions_description = {}
        self.discrete_controls = True
        self.action_space_size = 0
        self.key_to_action = {}
        self.width = 1
        self.height = 1
        self.is_state_type_image = True
        self.measurements_size = 0
        self.phase = RunPhase.TRAIN
        # values copied from the preset
        self.tp = tuning_parameters
        self.record_video_every = self.tp.visualization.record_video_every
        self.env_id = self.tp.env.level
        self.video_path = self.tp.visualization.video_path
        self.is_rendered = self.tp.visualization.render
        self.seed = self.tp.seed
        self.frame_skip = self.tp.env.frame_skip
        self.human_control = self.tp.env.human_control
        self.wait_for_explicit_human_action = False
        # human control needs a visible window, so it forces rendering
        self.is_rendered = self.is_rendered or self.human_control
        self.game_is_open = True
        self.renderer = Renderer()

    # The accessors below fail on purpose: concrete wrappers must override them.
    @property
    def measurements(self):
        assert False

    @measurements.setter
    def measurements(self, value):
        assert False

    @property
    def observation(self):
        assert False

    @observation.setter
    def observation(self, value):
        assert False

    def _idx_to_action(self, action_idx):
        ...  # docstring and body are cut off in the original excerpt
```
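The base class stores `frame_skip` from the preset. Frame skipping usually means repeating the chosen action for several environment steps and summing the rewards. A minimal sketch of that technique, assuming a toy env whose `step` returns `(reward, done)`; `CountingEnv` and `step_with_frame_skip` are hypothetical and not Coach's step implementation:

```python
class CountingEnv:
    """Hypothetical env: reward 1 per step, episode ends after `length` steps."""

    def __init__(self, length):
        self.length = length
        self.t = 0

    def step(self, action):
        self.t += 1
        done = self.t >= self.length
        return 1.0, done


def step_with_frame_skip(env, action, frame_skip):
    """Repeat `action` up to frame_skip times, summing reward; stop early on done."""
    total_reward, done = 0.0, False
    for _ in range(frame_skip):
        reward, done = env.step(action)
        total_reward += reward
        if done:
            break
    return total_reward, done


env = CountingEnv(length=5)
print(step_with_frame_skip(env, action=0, frame_skip=4))  # (4.0, False)
print(step_with_frame_skip(env, action=0, frame_skip=4))  # (1.0, True)
```

Stopping as soon as `done` is set keeps the repeated action from running past the end of an episode.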
```python
# from gym.envs.registration
class EnvRegistry(object):
    """Register an env by ID. IDs remain stable over time and are ..."""
    ...  # rest of the class omitted in the original excerpt
```

That registry holds 797 envs.
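The registry maps each stable ID to a spec that knows how to build the env, which is what makes counting the registered envs possible. A simplified sketch of such a registry; `SimpleEnvRegistry`, `EnvSpec` and `DummyEnv` are hypothetical, and unlike gym, where `entry_point` is often a `module:Class` string, `entry_point` here is always a callable:

```python
class DummyEnv:
    """Hypothetical env used only to exercise the registry."""

    def __init__(self, length=10):
        self.length = length


class EnvSpec:
    def __init__(self, env_id, entry_point, **kwargs):
        self.id = env_id
        self.entry_point = entry_point  # a callable in this sketch
        self.kwargs = kwargs

    def make(self):
        return self.entry_point(**self.kwargs)


class SimpleEnvRegistry:
    def __init__(self):
        self.env_specs = {}

    def register(self, env_id, entry_point, **kwargs):
        # An ID may be registered only once, so it always means the same env
        if env_id in self.env_specs:
            raise ValueError(f"cannot re-register id: {env_id}")
        self.env_specs[env_id] = EnvSpec(env_id, entry_point, **kwargs)

    def make(self, env_id):
        if env_id not in self.env_specs:
            raise KeyError(f"no registered env with id: {env_id}")
        return self.env_specs[env_id].make()

    def all(self):
        return list(self.env_specs.values())


registry = SimpleEnvRegistry()
registry.register("Dummy-v0", entry_point=DummyEnv, length=5)
registry.register("DummyLong-v0", entry_point=DummyEnv, length=500)
env = registry.make("Dummy-v0")
print(env.length, len(registry.all()))  # 5 2
```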

agent:

Note this check in the agent code:

```python
if 'action_intrinsic_reward' in action_info.keys():
    ...  # body not shown in the original excerpt
```
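The excerpt shows only the condition. One common use of such a key, assumed here rather than taken from Coach, is to add an intrinsic (exploration) bonus to the environment's reward; `total_reward` is a hypothetical helper:

```python
def total_reward(env_reward, action_info):
    """Add an optional intrinsic bonus carried in action_info to the env reward.

    Assumption: 'action_intrinsic_reward' is treated as an additive bonus;
    the original excerpt does not show what Coach does with it.
    """
    return env_reward + action_info.get('action_intrinsic_reward', 0.0)


print(total_reward(1.0, {}))  # 1.0
print(total_reward(1.0, {'action_intrinsic_reward': 0.5}))  # 1.5
```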

Next comes the inheritance chain of three classes and the methods each one overrides.

```python
class PolicyOptimizationAgent(Agent):
    ...  # class body omitted in the original excerpt
```
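Assuming the chain is `Agent` → `PolicyOptimizationAgent` → a concrete algorithm class, the base class can own the control flow while subclasses override individual steps. A toy sketch of how such overrides resolve; the method names `act` and `train` and the leaf class `MyPolicyGradientAgent` are hypothetical, and only `Agent`, `PolicyOptimizationAgent` and `improve` appear in the post:

```python
class Agent:
    def improve(self):
        # Fixed control flow in the base class; each step is looked up on self
        return [self.act(), self.train()]

    def act(self):
        return "Agent.act"

    def train(self):
        raise NotImplementedError("subclasses implement the learning step")


class PolicyOptimizationAgent(Agent):
    def train(self):
        return "PolicyOptimizationAgent.train"


class MyPolicyGradientAgent(PolicyOptimizationAgent):  # hypothetical leaf class
    def act(self):
        return "MyPolicyGradientAgent.act"


print(MyPolicyGradientAgent().improve())
# ['MyPolicyGradientAgent.act', 'PolicyOptimizationAgent.train']
print([cls.__name__ for cls in MyPolicyGradientAgent.__mro__])
# ['MyPolicyGradientAgent', 'PolicyOptimizationAgent', 'Agent', 'object']
```

Each call is resolved along the method resolution order, so the leaf's `act` and the intermediate class's `train` are picked up without the base class knowing about either.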

Once the agent is initialized, the main loop starts running:

```python
import sys

# Instantiate the agent class whose name is given in the preset
agent = eval(tuning_parameters.agent.type + '(env_instance, tuning_parameters)')
# Start the training or evaluation
if tuning_parameters.evaluate:
    agent.evaluate(sys.maxsize, keep_networks_synced=True)  # evaluate forever
else:
    agent.improve()
```
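The entry point picks one of two modes: `evaluate` with `sys.maxsize` episodes (effectively forever) or `improve` for training. A toy sketch of that dispatch; `SketchAgent`, `run`, the counters, and the meaning given to `keep_networks_synced` (refreshing weights before each episode) are assumptions, not Coach's code:

```python
import sys
from types import SimpleNamespace


class SketchAgent:
    """Hypothetical agent exposing the two entry points called in the post."""

    def __init__(self, num_training_iterations):
        self.num_training_iterations = num_training_iterations
        self.training_steps = 0
        self.episodes_evaluated = 0
        self.syncs = 0

    def improve(self):
        # Training mode: a fixed number of act/train iterations
        for _ in range(self.num_training_iterations):
            self.training_steps += 1

    def evaluate(self, num_episodes, keep_networks_synced=False):
        # Evaluation mode: no learning; optionally refresh weights each episode
        for _ in range(num_episodes):
            if keep_networks_synced:
                self.syncs += 1  # stand-in for copying the latest weights
            self.episodes_evaluated += 1


def run(agent, tuning_parameters):
    if tuning_parameters.evaluate:
        agent.evaluate(sys.maxsize, keep_networks_synced=True)  # never returns in practice
    else:
        agent.improve()


agent = SketchAgent(num_training_iterations=3)
run(agent, SimpleNamespace(evaluate=False))
print(agent.training_steps)  # 3
```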

That is the basic flow. Later posts will go into more detail.

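This article is part of the Tencent Cloud self-media sync program and was shared from a WeChat public account.
Originally published: 2018-03-30. For copyright concerns, contact cloudcommunity@tencent.com to have it removed.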

This article is shared from the CreateAMind WeChat public account.


