I know that T5 has K, Q and V vectors in every layer. It also has a feed-forward network. I want to freeze the K, Q and V vectors and only train the feed-forward layers of each T5 layer. I am using the PyTorch library. The model can be a wrapper around the Hugging Face T5 model, or a modified version of it. I know how to freeze all the parameters using the following code:
from transformers import AutoTokenizer, T5ForConditionalGeneration

tokenizer = AutoTokenizer.from_pretrained(underlying_model_name)
model = T5ForConditionalGeneration.from_pretrained(underlying_model_name)
for p in model.parameters():
    p.requires_grad = False  # freezing
Could you point me to how to do this?
This GitHub project might be helpful, but it targets RoBERTa and GPT; can I adapt it to T5?
Posted on 2022-02-10 15:51:20
I adapted a solution based on this discussion from the Hugging Face forums. Basically, you have to specify the names of the modules/PyTorch layers that you want to freeze.
For the specific case of T5, I started by looking at the model summary:
from transformers import T5ForConditionalGeneration

model = T5ForConditionalGeneration.from_pretrained("t5-small")
print(model)
This gives the following (abbreviated) output:
T5ForConditionalGeneration(
  (shared): Embedding(32128, 512)
  (encoder): T5Stack(
    (embed_tokens): Embedding(32128, 512)
    (block): ModuleList(
      (0): T5Block(
        (layer): ModuleList(
          (0): T5LayerSelfAttention(
            (SelfAttention): T5Attention(
              (q): Linear(in_features=512, out_features=512, bias=False)
              (k): Linear(in_features=512, out_features=512, bias=False)
              (v): Linear(in_features=512, out_features=512, bias=False)
              (o): Linear(in_features=512, out_features=512, bias=False)
              (relative_attention_bias): Embedding(32, 8)
            )
            (layer_norm): T5LayerNorm()
            (dropout): Dropout(p=0.1, inplace=False)
          )
          (1): T5LayerFF(
            (DenseReluDense): T5DenseReluDense(
              (wi): Linear(in_features=512, out_features=2048, bias=False)
              (wo): Linear(in_features=2048, out_features=512, bias=False)
              (dropout): Dropout(p=0.1, inplace=False)
            )
            (layer_norm): T5LayerNorm()
            (dropout): Dropout(p=0.1, inplace=False)
          )
        )
      )
      [...]  # abbreviated output
With that, we can generate a list of modules to freeze. In particular, I decided to freeze the entire T5LayerSelfAttention block of the encoder (and, in addition, the T5LayerCrossAttention of the decoder):
# All the encoder modules, which only contain a SelfAttention block (layer[0])
modules_to_freeze = [model.encoder.block[i].layer[0] for i in range(len(model.encoder.block))]
# And the decoder modules, which contain both a SelfAttention (layer[0]) ...
modules_to_freeze.extend([model.decoder.block[i].layer[0] for i in range(len(model.decoder.block))])
# ... and a CrossAttention (layer[1]) block
modules_to_freeze.extend([model.decoder.block[i].layer[1] for i in range(len(model.decoder.block))])
Then simply freeze all the parameters in those modules:
for module in modules_to_freeze:
    for param in module.parameters():
        param.requires_grad = False  # Actual freezing operation
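(Not part of the original answer, but as a practical follow-up sketch:) once these modules are frozen, you would typically hand the optimizer only the parameters that still require gradients, e.g. with torch.optim.AdamW and a placeholder learning rate:
import torch

# Only parameters that still require gradients (feed-forward layers,
# layer norms, embeddings, ...) are passed to the optimizer;
# lr=1e-4 is just a placeholder value.
optimizer = torch.optim.AdamW(
    (p for p in model.parameters() if p.requires_grad),
    lr=1e-4,
)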
You can verify that these are actually frozen in your model by running:
for param in model.parameters():
    print(param.requires_grad)
It should print quite a few False values.
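If printing every flag is too verbose, a small sketch like this (my addition, not from the original answer) summarizes how many parameters are frozen versus still trainable:
frozen = sum(p.numel() for p in model.parameters() if not p.requires_grad)
trainable = sum(p.numel() for p in model.parameters() if p.requires_grad)
print(f"frozen: {frozen:,} | trainable: {trainable:,}")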
If you really only want to freeze K, Q and V, you can adapt the procedure above and select just the modules (or individual parameters) you want.
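For example, a minimal sketch, assuming the q/k/v projection names shown in the model summary above, that freezes only those projections and leaves the feed-forward (T5LayerFF) layers trainable:
# Freeze only the q, k and v projection weights in every attention module
# (encoder self-attention and decoder self-/cross-attention); everything
# else, including the feed-forward layers, stays trainable.
for name, param in model.named_parameters():
    if any(name.endswith(f".{proj}.weight") for proj in ("q", "k", "v")):
        param.requires_grad = False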
https://stackoverflow.com/questions/71048521