RoBERTa

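Model Description

Bidirectional Encoder Representations from Transformers, or BERT, is a self-supervised pretraining technique that learns to predict intentionally hidden (masked) sections of text. Crucially, the representations learned by BERT have been shown to generalize well to downstream tasks, and when BERT was first released in 2018 it achieved state-of-the-art results on many NLP benchmark datasets.

RoBERTa builds on BERT's language masking strategy and modifies key hyperparameters in BERT, including removing BERT's next-sentence pretraining objective and training with much larger mini-batches and learning rates. RoBERTa was also trained on an order of magnitude more data than BERT, and for longer. As a result, RoBERTa's representations generalize to downstream tasks even better than BERT's.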
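
To make the masking objective concrete, here is a minimal sketch of BERT-style token masking in plain Python. It is an illustration, not fairseq's implementation: the function name `mask_tokens`, the `<mask>` placeholder string, and the toy vocabulary are assumptions, and the 15% selection rate with an 80/10/10 split (mask, random token, unchanged) is taken from the BERT paper rather than from this page.

```python
import random

MASK = "<mask>"  # placeholder symbol, assumed for illustration


def mask_tokens(tokens, vocab, mask_prob=0.15, seed=None):
    """Return (corrupted, targets); targets[i] is None where nothing is predicted."""
    rng = random.Random(seed)
    corrupted, targets = [], []
    for tok in tokens:
        if rng.random() < mask_prob:
            targets.append(tok)  # the model must recover the original token
            roll = rng.random()
            if roll < 0.8:
                corrupted.append(MASK)               # 80%: replace with the mask symbol
            elif roll < 0.9:
                corrupted.append(rng.choice(vocab))  # 10%: replace with a random token
            else:
                corrupted.append(tok)                # 10%: keep the token unchanged
        else:
            targets.append(None)                     # not selected, no loss here
            corrupted.append(tok)
    return corrupted, targets


sentence = "the quick brown fox jumps over the lazy dog".split()
corrupted, targets = mask_tokens(sentence, vocab=sentence, seed=0)
```

Calling `mask_tokens` again with a different seed selects a different set of positions for the same sentence, which is loosely the idea behind re-sampling masks during training instead of fixing them once.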

Requirements

We require a few additional Python dependencies for preprocessing:

pip install regex requests hydra-core omegaconf

Example

Load RoBERTa

import torch
roberta = torch.hub.load('pytorch/fairseq', 'roberta.large')
roberta.eval()  # disable dropout (or leave in train mode to finetune)

Apply Byte-Pair Encoding (BPE) to input text

tokens = roberta.encode('Hello world!')
assert tokens.tolist() == [0, 31414, 232, 328, 2]
assert roberta.decode(tokens) == 'Hello world!'
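
The ids in the assertion above start with 0 and end with 2, which are the beginning-of-sentence (`<s>`) and end-of-sentence (`</s>`) markers added around the BPE ids. The sketch below reproduces only that wrapping step so the layout is easy to see. The helper names `wrap_single` and `wrap_pair` are hypothetical, and the sentence-pair layout (`<s> a </s> </s> b </s>`) is an assumption about how fairseq joins two segments, not something stated on this page.

```python
BOS, EOS = 0, 2  # ids of <s> and </s>, as seen in the assertion above


def wrap_single(bpe_ids):
    """Wrap one sentence: <s> ids </s>."""
    return [BOS] + list(bpe_ids) + [EOS]


def wrap_pair(first_ids, second_ids):
    """Wrap two sentences (assumed layout): <s> a </s> </s> b </s>."""
    return [BOS] + list(first_ids) + [EOS, EOS] + list(second_ids) + [EOS]


# 31414, 232, 328 are the BPE ids for 'Hello world!' taken from the example above
hello_world = wrap_single([31414, 232, 328])
```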

Extract features from RoBERTa

# Extract the last layer's features
last_layer_features = roberta.extract_features(tokens)
assert last_layer_features.size() == torch.Size([1, 5, 1024])

# Extract all layers' features (layer 0 is the embedding layer, so 24 transformer layers + 1)
all_layers = roberta.extract_features(tokens, return_all_hiddens=True)
assert len(all_layers) == 25
assert torch.all(all_layers[-1] == last_layer_features)

Use RoBERTa for sentence-pair classification tasks

# Download RoBERTa already finetuned for MNLI
roberta = torch.hub.load('pytorch/fairseq', 'roberta.large.mnli')
roberta.eval()  # disable dropout for evaluation

with torch.no_grad():
    # Encode a pair of sentences and make a prediction
    tokens = roberta.encode('Roberta is a heavily optimized version of BERT.', 'Roberta is not very optimized.')
    prediction = roberta.predict('mnli', tokens).argmax().item()
    assert prediction == 0  # contradiction

    # Encode another pair of sentences
    tokens = roberta.encode('Roberta is a heavily optimized version of BERT.', 'Roberta is based on BERT.')
    prediction = roberta.predict('mnli', tokens).argmax().item()
    assert prediction == 2  # entailment
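
The integer predictions are easier to read once mapped to label names. The small helper below is a sketch: indices 0 and 2 come from the comments in the example above, while index 1 as "neutral" is an assumption based on the usual three-way MNLI label set. The name `label_for` is hypothetical.

```python
# Index-to-label mapping; index 1 ("neutral") is assumed, see the note above
MNLI_LABELS = {0: "contradiction", 1: "neutral", 2: "entailment"}


def label_for(prediction):
    """Translate a class index returned by argmax().item() into a label name."""
    return MNLI_LABELS[prediction]


first_pair = label_for(0)   # class index of the first sentence pair above
second_pair = label_for(2)  # class index of the second sentence pair above
```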

Register a new (randomly initialized) classification head

roberta.register_classification_head('new_task', num_classes=3)
logprobs = roberta.predict('new_task', tokens)  # tensor([[-1.1050, -1.0672, -1.1245]], grad_fn=<LogSoftmaxBackward>)
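
The `LogSoftmaxBackward` in the output above indicates that `predict` returns log-probabilities. Exponentiating them recovers ordinary probabilities, and for a freshly initialized head these sit close to uniform. The plain-Python sketch below uses the three values printed above; the helper name `to_probabilities` is hypothetical, and the exact numbers will differ on each run because the head is randomly initialized.

```python
import math


def to_probabilities(logprobs):
    """Convert log-probabilities (log-softmax output) to probabilities."""
    return [math.exp(lp) for lp in logprobs]


# Values copied from the example output above
logprobs = [-1.1050, -1.0672, -1.1245]
probs = to_probabilities(logprobs)

# Index of the most likely class under the untrained head
best = max(range(len(probs)), key=probs.__getitem__)
```

Because all three probabilities are near one third, the untrained head carries no useful signal yet; it needs finetuning on labeled data for the new task.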

References

RoBERTa: A Robustly Optimized BERT Pretraining Approach

Model Type: NLP

Submitted by: Facebook AI (fairseq Team)
