I know that T5 has K, Q and V vectors in every layer. It also has a feed-forward network. I want to freeze the K, Q and V vectors and only train the feed-forward layers of each T5 layer. I am using the PyTorch library. The model can be a wrapper around the Hugging Face T5 model, or a modified version of it. I know how to freeze all the parameters using the following code:
from transformers import AutoTokenizer, T5ForConditionalGeneration

tokenizer = AutoTokenizer.from_pretrained(underlying_model_name)
model = T5ForConditionalGeneration.from_pretrained(underlying_model_name)
for p in model.parameters():
    p.requires_grad = False  # freezing
Could you point me to how to do this?
This GitHub project might be helpful, but it targets RoBERTa and GPT; can I adapt it to T5?
Posted on 2022-02-10 15:51:20
I adapted a solution based on this discussion from the Hugging Face forums. Basically, you have to specify the names of the modules/PyTorch layers that you want to freeze.
For the specific case of T5, I started by looking at the model summary:
from transformers import T5ForConditionalGeneration

model = T5ForConditionalGeneration.from_pretrained("t5-small")
print(model)
This gives the following (abbreviated) output:
T5ForConditionalGeneration(
  (shared): Embedding(32128, 512)
  (encoder): T5Stack(
    (embed_tokens): Embedding(32128, 512)
    (block): ModuleList(
      (0): T5Block(
        (layer): ModuleList(
          (0): T5LayerSelfAttention(
            (SelfAttention): T5Attention(
              (q): Linear(in_features=512, out_features=512, bias=False)
              (k): Linear(in_features=512, out_features=512, bias=False)
              (v): Linear(in_features=512, out_features=512, bias=False)
              (o): Linear(in_features=512, out_features=512, bias=False)
              (relative_attention_bias): Embedding(32, 8)
            )
            (layer_norm): T5LayerNorm()
            (dropout): Dropout(p=0.1, inplace=False)
          )
          (1): T5LayerFF(
            (DenseReluDense): T5DenseReluDense(
              (wi): Linear(in_features=512, out_features=2048, bias=False)
              (wo): Linear(in_features=2048, out_features=512, bias=False)
              (dropout): Dropout(p=0.1, inplace=False)
            )
            (layer_norm): T5LayerNorm()
            (dropout): Dropout(p=0.1, inplace=False)
          )
        )
      )
      [...]  # abbreviated output
With that, we can generate a list of modules to freeze. In particular, I decided to freeze the entire T5LayerSelfAttention block of the encoder (and, in addition, the T5LayerCrossAttention of the decoder):
# All the encoder modules, which only contain a SelfAttention block (layer[0])
modules_to_freeze = [model.encoder.block[i].layer[0] for i in range(len(model.encoder.block))]
# And the decoder modules, which contain both a SelfAttention (layer[0]) ...
modules_to_freeze.extend([model.decoder.block[i].layer[0] for i in range(len(model.decoder.block))])
# ... and a CrossAttention (layer[1]) block
modules_to_freeze.extend([model.decoder.block[i].layer[1] for i in range(len(model.decoder.block))])
Then simply freeze all the parameters in those modules:
for module in modules_to_freeze:
    for param in module.parameters():
        param.requires_grad = False  # Actual freezing operation
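(Not part of the original answer, but as a practical follow-up sketch:) once these modules are frozen, you would typically hand the optimizer only the parameters that still require gradients, e.g. with torch.optim.AdamW and a placeholder learning rate:
import torch

# Only parameters that still require gradients (feed-forward layers,
# layer norms, embeddings, ...) are passed to the optimizer;
# lr=1e-4 is just a placeholder value.
optimizer = torch.optim.AdamW(
    (p for p in model.parameters() if p.requires_grad),
    lr=1e-4,
)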
You can verify that these are actually frozen in your model by running:
for param in model.parameters():
    print(param.requires_grad)
It should print quite a few False values.
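If printing every flag is too verbose, a small sketch like this (my addition, not from the original answer) summarizes how many parameters are frozen versus still trainable:
frozen = sum(p.numel() for p in model.parameters() if not p.requires_grad)
trainable = sum(p.numel() for p in model.parameters() if p.requires_grad)
print(f"frozen: {frozen:,} | trainable: {trainable:,}")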
If you really only want to freeze K, Q and V, you can adapt the procedure above and select just the modules (or individual parameters) you want.
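For example, a minimal sketch, assuming the q/k/v projection names shown in the model summary above, that freezes only those projections and leaves the feed-forward (T5LayerFF) layers trainable:
# Freeze only the q, k and v projection weights in every attention module
# (encoder self-attention and decoder self-/cross-attention); everything
# else, including the feed-forward layers, stays trainable.
for name, param in model.named_parameters():
    if any(name.endswith(f".{proj}.weight") for proj in ("q", "k", "v")):
        param.requires_grad = False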
https://stackoverflow.com/questions/71048521