I am trying to train a reinforcement learning agent with this code, using gym and tflearn:
from tflearn import *
import gym
import numpy as np

env = gym.make('CartPole-v0')
x = []
y = []
max_reward = 0
for i in range(1000):
    env.reset()
    while True:
        action = env.action_space.sample()
        observation, reward, done, info = env.step(action)
        if done:
            break
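The loop above never records any data into x, y, or max_reward. A minimal sketch of the usual pattern for this setup (keep observation/action pairs from episodes whose total reward beats a cutoff) is below. The StubEnv class and REWARD_THRESHOLD are my assumptions, not part of the original: the stub only mimics gym's reset/step API so the sketch runs without gym installed.

```python
import random

class StubEnv:
    """Stand-in for gym's CartPole API (assumption, for illustration only)."""
    def reset(self):
        self.steps = 0
        self.horizon = random.randint(5, 30)  # episode length varies, like a real env
        return [0.0, 0.0, 0.0, 0.0]

    def step(self, action):
        self.steps += 1
        # observation, reward, done, info -- same 4-tuple shape gym returns
        return [0.0, 0.0, 0.0, 0.0], 1.0, self.steps >= self.horizon, {}

REWARD_THRESHOLD = 5  # assumed cutoff for "good" episodes
env = StubEnv()
x, y = [], []
for _ in range(100):
    observation = env.reset()
    episode, total = [], 0.0
    while True:
        action = random.randint(0, 1)          # random policy, as in the question
        episode.append((observation, action))  # pair each obs with the action taken
        observation, reward, done, info = env.step(action)
        total += reward
        if done:
            break
    if total >= REWARD_THRESHOLD:              # keep only high-reward episodes
        x.extend(obs for obs, act in episode)
        y.extend(act for obs, act in episode)
print(len(x) == len(y))
```

With a real gym environment you would replace StubEnv with gym.make('CartPole-v0') and the random policy with env.action_space.sample(); the collection logic stays the same.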
TensorFlow has the following function:

tf.matmul

It multiplies two matrices (or batches of matrices) and produces a matrix.
However, I need to do the following:

# dense dim: (?, 227)
dense_part = tf.nn.relu(some stuff here)
# softmax matrix dim: (?, 227, 19) or (?, 19, 227) or (?, 227, 227), where I
# ....can slice the last dim down to (?, 227, 19)
softmax_matrix = tf.matmul(dense_part, softmax_weight_var)
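One way to get a (?, 227, 19) result from a (?, 227) input is to make softmax_weight_var a rank-3 tensor and contract the feature axis with einsum. The weight shape (227, 227, 19) is an assumption inferred from the comments above; the sketch uses NumPy for clarity, and tf.einsum accepts the same subscript string.

```python
import numpy as np

batch = 4
dense_part = np.random.rand(batch, 227)            # (?, 227), as in the question
softmax_weight_var = np.random.rand(227, 227, 19)  # assumed weight shape

# Contract the 227-dim feature axis of dense_part against the weight's
# first axis: (b, i) x (i, j, k) -> (b, j, k)
softmax_matrix = np.einsum('bi,ijk->bjk', dense_part, softmax_weight_var)
print(softmax_matrix.shape)  # (4, 227, 19)
```

Plain tf.matmul only contracts the last two dimensions, which is why a rank-2 by rank-3 product like this needs einsum (or an explicit reshape) instead.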
I am learning neural networks and implementing them in Python. I started by defining a softmax function, following the solution given in the question "Softmax function - python". Here is my code:

def softmax(A):
    """
    Computes a softmax function.
    Input: A (N, k) ndarray.
    Returns: (N, k) ndarray.
    """
    e = np.exp(A - np.max(A, axis=1, keepdims=True))  # subtract the row max for numerical stability
    return e / np.sum(e, axis=1, keepdims=True)       # normalize each row (axis=1), not each column
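A quick self-contained check of a row-wise softmax: for an (N, k) input of N samples, normalizing along axis=1 makes every row sum to 1, and a row of equal entries maps to a uniform distribution.

```python
import numpy as np

def softmax(A):
    # Row-wise softmax over an (N, k) array
    e = np.exp(A - np.max(A, axis=1, keepdims=True))
    return e / np.sum(e, axis=1, keepdims=True)

A = np.array([[1.0, 2.0, 3.0],
              [0.0, 0.0, 0.0]])
S = softmax(A)
print(S.sum(axis=1))  # both rows sum to 1
```

Summing over axis=0 instead would normalize each column across samples, which is rarely what a per-sample softmax is meant to do.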
To understand how this code works, I wrote out the following. How does the self.hidden variable make use of the variable x in the forward method?
import torch.nn as nn

class Network(nn.Module):
    def __init__(self):
        super().__init__()
        # Inputs to hidden layer linear transformation
        self.hidden = nn.Linear(784, 256)
        # Output layer, 10 units - one for each digit
        self.output = nn.Linear(256, 10)
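To the question above: self.hidden is itself a module, so calling self.hidden(x) inside forward runs that layer's own forward pass, computing x @ W.T + b with the weights stored in the layer. A minimal sketch with a forward method added (the sigmoid/softmax choice is my assumption, matching the common MNIST tutorial this snippet resembles):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class Network(nn.Module):
    def __init__(self):
        super().__init__()
        self.hidden = nn.Linear(784, 256)  # stores a (256, 784) weight and a bias
        self.output = nn.Linear(256, 10)

    def forward(self, x):
        # self.hidden(x) applies the affine map x @ W.T + b to the input x
        x = torch.sigmoid(self.hidden(x))
        # the activated hidden output then feeds the output layer
        return F.softmax(self.output(x), dim=1)

net = Network()
out = net(torch.randn(2, 784))
print(out.shape)  # torch.Size([2, 10])
```

So x is never touched in __init__; the layers built there are only connected to x when forward calls them, and net(input) invokes forward via nn.Module.__call__.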