这篇文章的出发点是自动化选择网络表示学习的参数从而适应不同网络的需求。同时文章也证明了DeepWalk的工作其实等同于矩阵分解。
文章分析了之前一些工作的一个不可避免的问题,那就是模型参数的选择问题。对于不同的网络,模型能达到最好效果的参数是不同的,如果对于每个新出现的问题都要反复试验找到合适的参数,这显然是低效的。这些参数包括DeepWalk中窗口大小w的选取,node2vec中定义的两个跟随机游走有关的参数p、q。参数的选取完全影响着模型的好坏。
这些参数其实构成的是一个关于采样节点的一个概率分布,以DeepWalk中的w为例,对于随机游走的每一个节点v,采样的节点为v的w邻域的2w个节点,而采样每个节点的概率就是
,这实际上就是一个采样节点的均匀分布。本文的思想就是将这些超参数作为要学习的参数,这样就能自动学习出最适合网络的超参数。
具体要做的有以下几点:
本文的主要思想就是通过Attention Model自动化学习网络中的参数,将网络中的参数看成随机游走中对邻居采样的一种概率分布,通过学习最适应网络的这种分布达到好的网络表示结果。
原文:【论文笔记】Watch Your Step https://zhuanlan.zhihu.com/p/46935910
完整代码可以查看google research:https://github.com/google-research/google-research/tree/master/graph_embedding/watch_your_step
def GetOrMakeAdjacencyMatrix():
"""Creates Adjacency matrix and caches it on disk with name a.npy."""
a_file = os.path.join(FLAGS.dataset_dir, 'a.npy')
if os.path.exists(a_file):
return numpy.load(open(a_file, 'rb'))
num_nodes = GetNumNodes()
a = numpy.zeros(shape=(num_nodes, num_nodes), dtype='float32')
train_edges = numpy.load(
open(os.path.join(FLAGS.dataset_dir, 'train.txt.npy'), 'rb'))
a[train_edges[:, 0], train_edges[:, 1]] = 1.0
if not IsDirected():
a[train_edges[:, 1], train_edges[:, 0]] = 1.0
numpy.save(open(a_file, 'wb'), a)
return a
def GetPowerTransitionPairs(highest_power):
return list(IterPowerTransitionPairs(highest_power))
def IterPowerTransitionPairs(highest_power):
"""Yields powers of transition matrix (T, T*T, T*T*T, ...).
It caches them on disk as t_<i>.npy, where <i> is the power. The first power
(i = 1) is not cached as it is trivially computed from the adjacency matrix.
Args:
highest_power: integer representing the highest power of the transition
matrix. This will be the number of yields.
"""
num_nodes = GetNumNodes()
for i in range(highest_power):
if i == 0:
a = GetOrMakeAdjacencyMatrix()
transition = a.T
degree = transition.sum(axis=0)
transition /= degree + 0.0000001
power_array = transition
else:
power_filename = os.path.join(FLAGS.dataset_dir, 't_%i.npy' % (i + 1))
if os.path.exists(power_filename):
power_array = numpy.load(open(power_filename, 'rb'))
else:
power_array = power_array.dot(transition)
print('Computing T^%i ...' % (i + 1)) # pylint: disable=superfluous-parens
numpy.save(open(power_filename, 'wb'), power_array)
print(' ... Saved T^%i' % (i + 1)) # pylint: disable=superfluous-parens
placeholder = tf.placeholder(tf.float32, shape=(num_nodes, num_nodes))
yield (placeholder, power_array)
def GetParametrizedExpectation(references):
r"""Calculates E[D; q_1, q_2, ...]: a parametrized (tensor) matrix D.
Which is defined as:
E[D; q] = P_0 * (Q_1*T + Q_2*T^2 + Q_3*T^3 + ...)
where Q_1, Q_2, ... = softmax(q_1, q_2, ...)
and vector (q_1, q_2, ...) is created as a "trainable variable".
Args:
references: Dict that will be populated as key-value pairs:
'combination': \sum_j Q_j T^j (i.e. E[D] excluding P_0).
'normed': The vector Q_1, Q_2, ... (sums to 1).
'mults': The vector q_1, q_2, ... (Before softmax, does not sum to 1).
Returns:
Tuple (E[D; q], feed_dict) where the first entry contains placeholders and
the feed_dict contains is a dictionary from the placeholders to numpy arrays
of the transition powers.
"""
feed_dict = {}
n = FLAGS.transition_powers
regularizer = FLAGS.context_regularizer
a = GetOrMakeAdjacencyMatrix()
transition = a.T
degree = transition.sum(axis=0)
# transition /= degree + 0.0000001
# transition_pow_n = transition
convex_combination = []
# vector q
mults = tf.Variable(numpy.ones(shape=(n), dtype='float32'))
# vector Q (output of softmax)
normed = tf.squeeze(tf.nn.softmax(tf.expand_dims(mults, 0)), 0)
references['mults'] = mults
references['normed'] = normed
transition_powers = GetPowerTransitionPairs(n)
for k, (placeholder, transition_pow) in enumerate(transition_powers):
feed_dict[placeholder] = transition_pow
convex_combination.append(normed[k] * placeholder)
d_sum = tf.add_n(convex_combination)
d_sum *= degree
tf.losses.add_loss(tf.reduce_mean(mults**2) * regularizer)
references['combination'] = convex_combination
return tf.transpose(d_sum) * GetNumNodes() * 80, feed_dict
# Helper function 1/3 for PercentDelta.
def GetPD(target_num_steps):
global_step = tf.train.get_or_create_global_step()
global_step = tf.cast(global_step, tf.float32)
# gs = 0, target = 1
# gs = num_steps, target = 0.01
# Solve: y = mx + c
# gives: c = 1
# m = dy / dx = (1 - 0.01) / (0 - num_steps) = - 0.99 / num_steps
# Therefore, y = 1 - (0.99/num_steps) * x
return -global_step * 0.99 / target_num_steps + 1
# Helper function 2/3 for PercentDelta.
def PlusEpsilon(x, eps=1e-5):
"""Returns x+epsilon, without changing element-wise sign of x."""
return x + (tf.cast(x < 0, tf.float32) * -eps) + (
tf.cast(x >= 0, tf.float32) * eps)
# Helper function 3/3 for PercentDelta.
def CreateGradMultipliers(loss):
"""Returns a gradient multiplier so that SGD becomes PercentDelta."""
variables = tf.trainable_variables() # tf.global_variables()
gradients = tf.gradients(loss, variables)
multipliers = {}
target_pd = GetPD(FLAGS.max_number_of_steps)
for v, g in zip(variables, gradients):
if g is None:
continue
multipliers[v] = target_pd / PlusEpsilon(
tf.reduce_mean(tf.abs(g / PlusEpsilon(v))))
return multipliers
def CreateEmbeddingDictionary(side, size):
num_nodes = GetNumNodes()
embeddings = numpy.array(
numpy.random.uniform(low=-0.1, high=0.1, size=(num_nodes, size)),
dtype='float32')
embeddings = tf.Variable(embeddings, name=side + 'E')
tf.losses.add_loss(tf.reduce_mean(embeddings**2) * 1e-6)
return embeddings
def CreateObjective(g, target_matrix):
"""Returns the objective function (can be nlgl or rmse)."""
if FLAGS.objective == 'nlgl': # Negative log likelihood
# target_matrix is E[D; q], which is used in the "positive part" of the
# likelihood objective. We use true adjacency for the "negative part", as
# described in our paper.
true_adjacency = tf.Variable(
GetOrMakeAdjacencyMatrix(), name='adjacency', trainable=False)
logistic = tf.sigmoid(g)
return -tf.reduce_mean(
tf.multiply(target_matrix, tf.log(PlusEpsilon(logistic))) +
tf.multiply(1 - true_adjacency, tf.log(PlusEpsilon(1 - logistic))))
elif FLAGS.objective == 'rmse': # Root mean squared error
return tf.reduce_mean((g - target_matrix)**2)
else:
logging.fatal('unknown objective "%s".', FLAGS.objective)
def CreateGFn(net_l, net_r):
return tf.matmul(net_l, tf.transpose(net_r))