Building deep retrieval models

In the featurization tutorial we incorporated multiple features into our models, but the models consist of only an embedding layer. We can add more dense layers to our models to increase their expressive power. In general, deeper models are capable of learning more complex patterns than shallower models. For example, our user model incorporates user ids and timestamps to model user preferences at a point in time. A shallow model (say, a single embedding layer) may only be able to learn the simplest relationships between those features and movies: a given movie is most popular around the time of its release, and a given user generally prefers horror movies to comedies. To capture more complex relationships, such as user preferences evolving over time, we may need a deeper model with multiple stacked dense layers.

Of course, complex models also have their disadvantages. The first is computational cost, as larger models require both more memory and more computation to fit and serve. The second is the requirement for more data: in general, more training data is needed to take advantage of deeper models. With more parameters, deep models might overfit or even simply memorize the training examples instead of learning a function that can generalize. Finally, training deeper models may be harder, and more care needs to be taken in choosing settings like regularization and learning rate.

Finding a good architecture for a real-world recommender system is a complex art, requiring good intuition and careful hyperparameter tuning. For example, factors such as the depth and width of the model, activation function, learning rate, and optimizer can radically change the performance of the model. Modelling choices are further complicated by the fact that good offline evaluation metrics may not correspond to good online performance, and that the choice of what to optimize for is often more critical than the choice of model itself.

Nevertheless, effort put into building and fine-tuning larger models often pays off. In this tutorial, we will illustrate how to build deep retrieval models using TensorFlow Recommenders. We'll do this by building progressively more complex models to see how this affects model performance.

import os
import tempfile

%matplotlib inline
import matplotlib.pyplot as plt

import numpy as np
import tensorflow as tf
import tensorflow_datasets as tfds

import tensorflow_recommenders as tfrs

# Note: newer Matplotlib releases (3.6+) rename this style to
# 'seaborn-v0_8-whitegrid'.
plt.style.use('seaborn-whitegrid')

In this tutorial we will use the models from the featurization tutorial to generate embeddings. Hence we will only be using the user id, timestamp, and movie title features.

ratings = tfds.load("movielens/100k-ratings", split="train")
movies = tfds.load("movielens/100k-movies", split="train")

# Keep only the features used in this tutorial.
ratings = ratings.map(lambda x: {
    "movie_title": x["movie_title"],
    "user_id": x["user_id"],
    "timestamp": x["timestamp"],
})
movies = movies.map(lambda x: x["movie_title"])

We also do some housekeeping to prepare feature vocabularies.

# Gather all timestamps so we can compute their range.
timestamps = np.concatenate(list(ratings.map(lambda x: x["timestamp"]).batch(100)))

max_timestamp = timestamps.max()
min_timestamp = timestamps.min()

# 1,000 evenly spaced bucket boundaries spanning the observed timestamps.
timestamp_buckets = np.linspace(
    min_timestamp, max_timestamp, num=1000,
)

# Vocabularies for the string lookup layers.
unique_movie_titles = np.unique(np.concatenate(list(movies.batch(1000))))
unique_user_ids = np.unique(np.concatenate(list(ratings.batch(1_000).map(
    lambda x: x["user_id"]))))
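
To make the role of timestamp_buckets concrete, the sketch below shows how evenly spaced boundaries turn a continuous timestamp into a bucket index, which is what an embedding table is then indexed by. It uses only the standard library. The helper names make_boundaries and bucket_index and the sample timestamps are hypothetical stand-ins for np.linspace and the discretization layer used in this tutorial, and the real layer's handling of edge cases may differ.

```python
import bisect


def make_boundaries(lo, hi, num):
    """Return `num` evenly spaced boundaries from lo to hi inclusive."""
    step = (hi - lo) / (num - 1)
    return [lo + i * step for i in range(num)]


def bucket_index(value, boundaries):
    """Return the index of the bucket that `value` falls into.

    With n boundaries there are n + 1 buckets (one below the first boundary
    and one at or above the last), so an embedding table indexed by the
    result needs len(boundaries) + 1 rows.
    """
    return bisect.bisect_right(boundaries, value)


# Hypothetical timestamps, for illustration only.
boundaries = make_boundaries(1_000, 2_000, num=5)
indices = [bucket_index(t, boundaries) for t in (900, 1_000, 1_300, 2_500)]
print(boundaries)
print(indices)
```

The largest possible index equals the number of boundaries, which is consistent with the timestamp embedding table being sized len(timestamp_buckets) + 1.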

Model definition

Query model

We start with the user model defined in the featurization tutorial as the first layer of our model, tasked with converting raw input examples into feature embeddings.

class UserModel(tf.keras.Model):

  def __init__(self):
    super().__init__()

    # Note: in recent TensorFlow releases these preprocessing layers are also
    # available directly under tf.keras.layers (e.g. tf.keras.layers.StringLookup).
    self.user_embedding = tf.keras.Sequential([
        tf.keras.layers.experimental.preprocessing.StringLookup(
            vocabulary=unique_user_ids, mask_token=None),
        # One extra row for out-of-vocabulary user ids.
        tf.keras.layers.Embedding(len(unique_user_ids) + 1, 32),
    ])
    self.timestamp_embedding = tf.keras.Sequential([
        tf.keras.layers.experimental.preprocessing.Discretization(timestamp_buckets.tolist()),
        # n boundaries define n + 1 buckets.
        tf.keras.layers.Embedding(len(timestamp_buckets) + 1, 32),
    ])
    self.normalized_timestamp = tf.keras.layers.experimental.preprocessing.Normalization()

    # Learn the mean and variance of the timestamps.
    self.normalized_timestamp.adapt(timestamps)

  def call(self, inputs):
    # Take the input dictionary, pass it through each input layer,
    # and concatenate the result.
    return tf.concat([
        self.user_embedding(inputs["user_id"]),
        self.timestamp_embedding(inputs["timestamp"]),
        self.normalized_timestamp(inputs["timestamp"]),
    ], axis=1)

Defining deeper models will require us to stack more layers on top of this first input. A progressively narrower stack of layers, separated by an activation function, is a common pattern:

                            +----------------------+
                            |      128 x 64        |
                            +----------------------+
                                       | relu
                          +--------------------------+
                          |        256 x 128         |
                          +--------------------------+
                                       | relu
                        +------------------------------+
                        |          ... x 256           |
                        +------------------------------+

Since the expressive power of deep linear models is no greater than that of shallow linear models, we use ReLU activations for all but the last hidden layer. The final hidden layer does not use any activation function: using an activation function would limit the output space of the final embeddings and might negatively impact the performance of the model. For instance, if ReLUs are used in the projection layer, all components in the output embedding would be non-negative.
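
Both claims in the paragraph above can be checked with a few lines of arithmetic. The following is a minimal sketch in plain Python, assuming bias-free dense layers; the weight matrices and the input vector are hypothetical values chosen only for illustration.

```python
def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]


def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    columns = list(zip(*b))
    return [[sum(x * y for x, y in zip(row, col)) for col in columns] for row in a]


def relu(vector):
    return [max(0.0, v) for v in vector]


# Hypothetical weights for two bias-free dense layers.
w1 = [[1.0, -2.0], [0.5, 1.0], [-1.0, 3.0]]   # maps 2 inputs to 3 units
w2 = [[1.0, 0.0, -1.0], [2.0, 1.0, 0.0]]      # maps 3 units to 2 outputs
x = [1.0, 2.0]

# Two stacked linear layers...
stacked = matvec(w2, matvec(w1, x))
# ...give the same result as one linear layer with weights w2 @ w1.
collapsed = matvec(matmul(w2, w1), x)

# With a ReLU between the layers, the result differs from the purely linear stack.
with_relu = matvec(w2, relu(matvec(w1, x)))

# A ReLU on the final layer would clip every negative component to zero.
clipped_output = relu(with_relu)

print(stacked, collapsed, with_relu, clipped_output)
```

The first two results are identical, which is why depth without non-linearities adds nothing, while the last line shows how an activation on the projection layer confines the embedding to the non-negative orthant.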

We're going to try something similar here. To make experimentation with different depths easy, let's define a model whose depth (and width) is defined by a set of constructor parameters.

class QueryModel(tf.keras.Model):
  """Model for encoding user queries."""

  def __init__(self, layer_sizes):
    """Model for encoding user queries.

    Args:
      layer_sizes:
        A list of integers where the i-th entry represents the number of units
        the i-th layer contains.
    """
    super().__init__()

    # We first use the user model for generating embeddings.
    self.embedding_model = UserModel()

    # Then construct the layers.
    self.dense_layers = tf.keras.Sequential()

    # Use the ReLU activation for all but the last layer.
    for layer_size in layer_sizes[:-1]:
      self.dense_layers.add(tf.keras.layers.Dense(layer_size, activation="relu"))

    # No activation for the last layer.
    for layer_size in layer_sizes[-1:]:
      self.dense_layers.add(tf.keras.layers.Dense(layer_size))

  def call(self, inputs):
    feature_embedding = self.embedding_model(inputs)
    return self.dense_layers(feature_embedding)

The layer_sizes parameter gives us the depth and width of the model. We can vary it to experiment with shallower or deeper models.
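
As a quick illustration of how layer_sizes translates into a stack of layers, the sketch below mirrors the two slicing loops from the constructor without building any Keras layers. The helper describe_layers is hypothetical and exists only for this example.

```python
def describe_layers(layer_sizes):
    """Return (units, activation) pairs in the order the dense layers are stacked."""
    # All but the last entry become ReLU layers.
    hidden = [(size, "relu") for size in layer_sizes[:-1]]
    # The last entry (if any) becomes the linear projection layer.
    projection = [(size, None) for size in layer_sizes[-1:]]
    return hidden + projection


print(describe_layers([32]))
print(describe_layers([64, 32]))
print(describe_layers([128, 64, 32]))
```

Because the loops slice the list rather than index into it, an empty layer_sizes simply yields no dense layers instead of raising an error.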

Candidate model

We can adopt the same approach for the movie model. Again, we start with the MovieModel from the featurization tutorial:

class MovieModel(tf.keras.Model):

  def __init__(self):
    super().__init__()

    max_tokens = 10_000

    # Embedding of the whole title, treated as a single categorical feature.
    self.title_embedding = tf.keras.Sequential([
      tf.keras.layers.experimental.preprocessing.StringLookup(
          vocabulary=unique_movie_titles, mask_token=None),
      tf.keras.layers.Embedding(len(unique_movie_titles) + 1, 32)
    ])

    self.title_vectorizer = tf.keras.layers.experimental.preprocessing.TextVectorization(
        max_tokens=max_tokens)

    # Embedding of the title text: one vector per word, averaged over the title.
    self.title_text_embedding = tf.keras.Sequential([
      self.title_vectorizer,
      tf.keras.layers.Embedding(max_tokens, 32, mask_zero=True),
      tf.keras.layers.GlobalAveragePooling1D(),
    ])

    # Learn the word vocabulary from the movie titles.
    self.title_vectorizer.adapt(movies)

  def call(self, titles):
    return tf.concat([
        self.title_embedding(titles),
        self.title_text_embedding(titles),
    ], axis=1)

And expand it with hidden layers:

class CandidateModel(tf.keras.Model):
  """Model for encoding movies."""

  def __init__(self, layer_sizes):
    """Model for encoding movies.

    Args:
      layer_sizes:
        A list of integers where the i-th entry represents the number of units
        the i-th layer contains.
    """
    super().__init__()

    self.embedding_model = MovieModel()

    # Then construct the layers.
    self.dense_layers = tf.keras.Sequential()

    # Use the ReLU activation for all but the last layer.
    for layer_size in layer_sizes[:-1]:
      self.dense_layers.add(tf.keras.layers.Dense(layer_size, activation="relu"))

    # No activation for the last layer.
    for layer_size in layer_sizes[-1:]:
      self.dense_layers.add(tf.keras.layers.Dense(layer_size))

  def call(self, inputs):
    feature_embedding = self.embedding_model(inputs)
    return self.dense_layers(feature_embedding)

Combined model

With both QueryModel and CandidateModel defined, we can put together a combined model and implement our loss and metrics logic. To make things simple, we'll enforce that the model structure is the same across the query and candidate models.

class MovielensModel(tfrs.models.Model):

  def __init__(self, layer_sizes):
    super().__init__()
    self.query_model = QueryModel(layer_sizes)
    self.candidate_model = CandidateModel(layer_sizes)
    self.task = tfrs.tasks.Retrieval(
        metrics=tfrs.metrics.FactorizedTopK(
            candidates=movies.batch(128).map(self.candidate_model),
        ),
    )

  def compute_loss(self, features, training=False):
    # We only pass the user id and timestamp features into the query model. This
    # is to ensure that the training inputs would have the same keys as the
    # query inputs. Otherwise the discrepancy in input structure would cause an
    # error when loading the query model after saving it.
    query_embeddings = self.query_model({
        "user_id": features["user_id"],
        "timestamp": features["timestamp"],
    })
    movie_embeddings = self.candidate_model(features["movie_title"])

    return self.task(
        query_embeddings, movie_embeddings, compute_metrics=not training)
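
To give a feel for what the retrieval task optimizes, here is a rough plain-Python sketch of an in-batch softmax loss. It assumes the task scores every query in a batch against every candidate in the same batch by dot product and treats the matching pair as the correct class, which is our understanding of the default behaviour rather than something this tutorial states. The function name and the toy embeddings are hypothetical.

```python
import math


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def in_batch_softmax_loss(query_embeddings, candidate_embeddings):
    """Sum over the batch of -log softmax(score of the matching pair).

    Row i of each argument comes from the same (user, movie) interaction, so
    the other candidates in the batch act as negatives for query i.
    """
    total = 0.0
    for i, query in enumerate(query_embeddings):
        scores = [dot(query, candidate) for candidate in candidate_embeddings]
        log_normalizer = math.log(sum(math.exp(s) for s in scores))
        total += log_normalizer - scores[i]
    return total


# Toy 2-dimensional embeddings for a batch of two interactions.
queries = [[1.0, 0.0], [0.0, 1.0]]
aligned = in_batch_softmax_loss(queries, [[1.0, 0.0], [0.0, 1.0]])
misaligned = in_batch_softmax_loss(queries, [[0.0, 1.0], [1.0, 0.0]])
print(aligned, misaligned)
```

The loss is lower when each query embedding points in the same direction as its own movie embedding, which is the behaviour the two towers are trained towards.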

Training the model

Prepare the data

We first split the data into a training set and a testing set.

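tf.random.set_seed(42)
# Shuffle once with a fixed order so that take/skip below always see the same split.
shuffled = ratings.shuffle(100_000, seed=42, reshuffle_each_iteration=False)

train = shuffled.take(80_000)
test = shuffled.skip(80_000).take(20_000)

# Training batches are reshuffled every epoch; test batches are cached.
cached_train = train.shuffle(100_000).batch(2048)
cached_test = test.batch(4096).cache()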
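
The split relies on the shuffled order being identical on every pass over the data: if the order changed between passes, take and skip would select different and possibly overlapping examples each time. The sketch below illustrates the same idea with the standard library only; split_once and the toy example list are hypothetical stand-ins for the tf.data pipeline above.

```python
import random


def split_once(examples, num_train, seed=42):
    """Shuffle a copy of `examples` once with a fixed seed, then split it."""
    shuffled = list(examples)
    random.Random(seed).shuffle(shuffled)
    # "take" the first num_train examples, "skip" them for the test set.
    return shuffled[:num_train], shuffled[num_train:]


examples = list(range(10))  # stand-in for the 100,000 ratings
train_ids, test_ids = split_once(examples, num_train=8)
print(train_ids, test_ids)
```

Calling split_once again with the same seed returns the same two lists, so the training and test sets stay disjoint across runs.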

Shallow model

We're ready to try out our first, shallow, model!

num_epochs = 300

model = MovielensModel([32])
model.compile(optimizer=tf.keras.optimizers.Adagrad(0.1))

# Validation runs every 5 epochs, so the history holds one value per validation run.
one_layer_history = model.fit(
    cached_train,
    validation_data=cached_test,
    validation_freq=5,
    epochs=num_epochs,
    verbose=0)

accuracy = one_layer_history.history["val_factorized_top_k/top_100_categorical_accuracy"][-1]
print(f"Top-100 accuracy: {accuracy:.2f}.")

This gives us a top-100 accuracy of around 0.27. We can use this as a reference point for evaluating deeper models.
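
For readers unfamiliar with the metric, top-K accuracy is the fraction of test interactions for which the movie the user actually chose appears among the K highest-scoring candidates. The following is a minimal plain-Python sketch of that definition; the function name and the score table are hypothetical, and the library's own implementation may handle ties and batching differently.

```python
def top_k_accuracy(scores, true_indices, k):
    """Fraction of queries whose true candidate is among the k best-scored ones.

    scores[i][j] is the affinity of query i for candidate j;
    true_indices[i] is the index of the candidate that query i actually chose.
    """
    hits = 0
    for row, true_index in zip(scores, true_indices):
        ranked = sorted(range(len(row)), key=lambda j: row[j], reverse=True)
        if true_index in ranked[:k]:
            hits += 1
    return hits / len(scores)


# Hypothetical scores for 3 queries over 4 candidates.
scores = [
    [0.9, 0.1, 0.3, 0.2],  # true candidate 0 is ranked 1st
    [0.2, 0.8, 0.5, 0.1],  # true candidate 2 is ranked 2nd
    [0.7, 0.6, 0.5, 0.4],  # true candidate 3 is ranked 4th
]
true_indices = [0, 2, 3]

for k in (1, 2, 4):
    print(k, top_k_accuracy(scores, true_indices, k))
```

Under this definition, a top-100 accuracy of 0.27 means that for roughly 27% of the held-out ratings the rated movie was among the 100 candidates the model scored highest.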

Deeper model

What about a deeper model with two layers?

model = MovielensModel([64, 32])
model.compile(optimizer=tf.keras.optimizers.Adagrad(0.1))

two_layer_history = model.fit(
    cached_train,
    validation_data=cached_test,
    validation_freq=5,
    epochs=num_epochs,
    verbose=0)

accuracy = two_layer_history.history["val_factorized_top_k/top_100_categorical_accuracy"][-1]
print(f"Top-100 accuracy: {accuracy:.2f}.")

The accuracy here is 0.29, quite a bit better than the shallow model.

We can plot the validation accuracy curves to illustrate this:
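
The plotting cell is not included in this copy of the article, so the following is only a sketch of how such a plot could be produced, not the author's code. It assumes the history objects returned by model.fit above and relies on the fact that validation metrics are recorded only on epochs where validation ran. The helper names validation_epochs and plot_validation_accuracy are hypothetical.

```python
def validation_epochs(num_validation_runs, validation_freq=5):
    """Epoch numbers at which validation ran (every `validation_freq` epochs)."""
    return [(i + 1) * validation_freq for i in range(num_validation_runs)]


def plot_validation_accuracy(histories, validation_freq=5):
    """Plot one top-100 validation accuracy curve per labelled training run.

    `histories` maps a legend label to a History object returned by model.fit.
    """
    # Imported here so that validation_epochs has no third-party dependency.
    import matplotlib.pyplot as plt

    metric = "val_factorized_top_k/top_100_categorical_accuracy"
    for label, history in histories.items():
        values = history.history[metric]
        plt.plot(validation_epochs(len(values), validation_freq), values, label=label)
    plt.title("Accuracy vs epoch")
    plt.xlabel("epoch")
    plt.ylabel("Top-100 accuracy")
    plt.legend()


print(validation_epochs(3))
```

After both training runs, calling plot_validation_accuracy({"1 layer": one_layer_history, "2 layers": two_layer_history}) would draw the two curves on the same axes.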

Even early on in the training, the larger model has a clear and stable lead over the shallow model, suggesting that adding depth helps the model capture more nuanced relationships in the data. However, even deeper models are not necessarily better. The following model extends the depth to three layers:

model = MovielensModel([128, 64, 32])
model.compile(optimizer=tf.keras.optimizers.Adagrad(0.1))

three_layer_history = model.fit(
    cached_train,
    validation_data=cached_test,
    validation_freq=5,
    epochs=num_epochs,
    verbose=0)

accuracy = three_layer_history.history["val_factorized_top_k/top_100_categorical_accuracy"][-1]
print(f"Top-100 accuracy: {accuracy:.2f}.")

Code link: https://codechina.csdn.net/csdn_codechina/enterprise_technology/-/blob/master/NLP_recommend/Building%20deep%20retrieval%20models.ipynb

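Original-content notice: this article was published on the Cloud+ Community (云+社区) with the author's authorization and may not be reproduced without permission.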
