Introduction to the GPT-3 Text Generation Model
GPT-3 is a general-purpose pre-trained generative model built on the Transformer decoder-only architecture. It can be applied to a wide range of downstream generation tasks and is particularly noted for its zero-shot generation ability. The model is pre-trained on large amounts of unsupervised data with an autoregressive language-modeling objective. Applicable text generation tasks include text summarization, question generation, data-to-text, and more.
Model Description
GPT-3 uses the Transformer decoder structure with one modification: the original decoder block contains two multi-head attention sub-layers, whereas GPT-3 keeps only the masked multi-head attention and is pre-trained left-to-right with the standard autoregressive language-modeling objective. This model was pre-trained from the GPT-3 codebase on a large amount of Chinese unsupervised data together with downstream task data. We trained models at several parameter scales; the one presented here is the GPT-3 Base model. For details on GPT-3, see: Language Models are Few-Shot Learners
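The masked (causal) self-attention mentioned above is what makes left-to-right autoregressive training possible: each position may attend only to itself and the tokens to its left. A minimal sketch of the mask itself, in pure Python and independent of any model code:

```python
def causal_mask(seq_len):
    # mask[i][j] is True when position i may attend to position j,
    # i.e. only positions j <= i (the token itself and its left context)
    return [[j <= i for j in range(seq_len)] for i in range(seq_len)]

# Visualize a length-4 mask: 'x' = visible, '.' = masked out
for row in causal_mask(4):
    print(''.join('x' if m else '.' for m in row))
```

Row i has exactly i + 1 visible positions: the first token sees only itself, and the last token sees the whole prefix.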
In this project we reproduce a series of Chinese GPT-3 models at different scales, including base/large/1.3B/2.7B/13B/30B/175B; this model is the base version. All versions are listed in the table below:
Model | Layers | Heads | d_model | LR | Batch
---|---|---|---|---|---
base | 12 | 12 | 768 | 6.0e-4 | 0.5M
large | 24 | 16 | 1024 | 3.0e-4 | 0.5M
1.3B | 24 | 32 | 2048 | 2.0e-4 | 2M
2.7B | 32 | 32 | 2560 | 1.6e-4 | 2M
13B | 40 | 40 | 5120 | 1.0e-4 | 6M
30B | 48 | 56 | 7168 | 1.0e-4 | 6M
175B (work in progress) | 96 | 96 | 12288 | 1.2e-4 | 6M
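As a rough sanity check on the table, the non-embedding parameter count of a decoder stack is approximately 12 × Layers × d_model² (about 4·d² for the attention projections plus 8·d² for the 4×-wide feed-forward block); embeddings add roughly vocab_size × d_model on top. A sketch under that approximation (the formula and the vocabulary-size parameter are generic assumptions, not values from this model card):

```python
def approx_params(layers, d_model, vocab_size=0):
    # ~4*d^2 for the Q/K/V/output projections + ~8*d^2 for the MLP, per layer
    non_embedding = 12 * layers * d_model ** 2
    return non_embedding + vocab_size * d_model

# "base" row of the table: 12 layers, d_model 768 -> ~85M non-embedding params
print(f"{approx_params(12, 768) / 1e6:.0f}M")
# "2.7B" row: 32 layers, d_model 2560 -> ~2.5B, consistent with the model's name
print(f"{approx_params(32, 2560) / 1e9:.1f}B")
```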
Intended Use and Applicable Scope
This model is mainly intended for generation and continuation across a variety of input scenarios. For example, users can enter arbitrary content and have the model answer, continue the text, or respond according to an instruction.
How to Use
Once the ModelScope library is installed, the text-generation capability of GPT-3 can be used directly.
Code Example
```python
from modelscope.pipelines import pipeline
from modelscope.utils.constant import Tasks

text_generation_zh = pipeline(Tasks.text_generation, model='damo/nlp_gpt3_text-generation_chinese-base')
result_zh = text_generation_zh("隨著計(jì)算機(jī)視覺的飛速發(fā)展,人臉識別技術(shù)已從簡單場景發(fā)展到復(fù)雜場景,也即姿態(tài)、光照、表情、噪聲、遮擋、化妝、年齡、種族、性別等差異化所呈現(xiàn)的復(fù)雜場景。盡管已有的人臉識別系統(tǒng)在特定約束環(huán)境下的識別成功率較高,")
print(result_zh['text'])
```
Model Limitations and Possible Bias
The model is trained on specific datasets and may therefore produce biased output; users should evaluate the model themselves before deciding how to use it.
Training Data
The training data include Chinese Wikipedia and publicly available web text.
Training Procedure
Preprocessing
The training data only needs to contain a src_txt field. We recommend wrapping the data with MsDataset and training with the ModelScope Trainer.
```python
import tempfile

from datasets import Dataset
from modelscope.msdatasets import MsDataset

# Mock training dataset: only the src_txt field is required
src_dataset_dict = {
    'src_txt': [
        '測試文本1', '測試文本2', '測試文本3'
    ]
}
src_dataset = MsDataset(Dataset.from_dict(src_dataset_dict))
max_epochs = 3
# mkdtemp keeps the work directory alive for the duration of training
tmp_dir = tempfile.mkdtemp()
```
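Because only the src_txt field is required, real training data can also be kept as one JSON object per line (JSON Lines) and reassembled into the column-oriented dict that Dataset.from_dict expects. A minimal round-trip sketch using only the standard library:

```python
import json

samples = [{'src_txt': '測試文本1'}, {'src_txt': '測試文本2'}]

# Serialize one JSON object per line (the usual on-disk JSONL layout) ...
lines = [json.dumps(s, ensure_ascii=False) for s in samples]
# ... and parse them back, as if read from a file
loaded = [json.loads(line) for line in lines]

# Re-assemble the column-oriented dict expected by Dataset.from_dict
src_dataset_dict = {'src_txt': [s['src_txt'] for s in loaded]}
print(src_dataset_dict)
```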
Training
Below is an example of fine-tuning the GPT-3 Chinese base model on a poetry generation dataset.
```python
# Fine-tune a poetry generation model from the ModelScope Chinese GPT-3 base model
from modelscope.msdatasets import MsDataset
from modelscope.trainers import build_trainer
from modelscope.metainfo import Trainers

# Load the dataset and rename its text column to the expected src_txt field
dataset_dict = MsDataset.load('chinese-poetry-collection')
train_dataset = dataset_dict['train'].remap_columns({'text1': 'src_txt'})
eval_dataset = dataset_dict['test'].remap_columns({'text1': 'src_txt'})
print(eval_dataset)

max_epochs = 10
tmp_dir = "./gpt3_poetry"
num_warmup_steps = 100

def noam_lambda(current_step: int):
    # Noam schedule: linear warmup, then inverse-square-root decay
    current_step += 1
    return min(current_step ** (-0.5), current_step * num_warmup_steps ** (-1.5))

def cfg_modify_fn(cfg):
    cfg.train.lr_scheduler = {
        "type": "LambdaLR",
        "lr_lambda": noam_lambda,
        "options": {"by_epoch": False}
    }
    cfg.train.optimizer = {
        "type": "AdamW",
        "lr": 3e-4
    }
    cfg.train.dataloader = {"batch_size_per_gpu": 16, "workers_per_gpu": 1}
    return cfg

kwargs = dict(
    model='damo/nlp_gpt3_text-generation_chinese-base',
    train_dataset=train_dataset,
    eval_dataset=eval_dataset,
    max_epochs=max_epochs,
    work_dir=tmp_dir,
    cfg_modify_fn=cfg_modify_fn)

# Build the trainer and start training
trainer = build_trainer(
    name=Trainers.nlp_base_trainer, default_args=kwargs)
trainer.train()
```
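The noam_lambda function in the training script multiplies the base learning rate by a factor that rises roughly linearly for the first num_warmup_steps steps and then decays as the inverse square root of the step count, peaking at num_warmup_steps ** -0.5. Its behavior can be checked standalone, without ModelScope:

```python
num_warmup_steps = 100

def noam_lambda(current_step: int):
    # Same schedule as in the training script: warmup, then inverse-sqrt decay
    current_step += 1
    return min(current_step ** (-0.5), current_step * num_warmup_steps ** (-1.5))

# The multiplier grows during warmup ...
early = [noam_lambda(s) for s in (0, 49, 99)]
assert early[0] < early[1] < early[2]

# ... and shrinks afterwards
late = [noam_lambda(s) for s in (99, 999, 9999)]
assert late[0] > late[1] > late[2]

# Peak multiplier, reached at the end of warmup: 100 ** -0.5 = 0.1
print(noam_lambda(99))
```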
Training Tips
- For the learning rate, refer to the per-model settings in the table above.
- For scenarios with longer training texts, the number of training epochs can be increased accordingly.
Related Papers and Citation
If the GPT-3 model is helpful to you, please cite the related paper:
```
@inproceedings{NEURIPS2020_1457c0d6,
  author = {Brown, Tom and Mann, Benjamin and Ryder, Nick and Subbiah, Melanie and Kaplan, Jared D and Dhariwal, Prafulla and Neelakantan, Arvind and Shyam, Pranav and Sastry, Girish and Askell, Amanda and Agarwal, Sandhini and Herbert-Voss, Ariel and Krueger, Gretchen and Henighan, Tom and Child, Rewon and Ramesh, Aditya and Ziegler, Daniel and Wu, Jeffrey and Winter, Clemens and Hesse, Chris and Chen, Mark and Sigler, Eric and Litwin, Mateusz and Gray, Scott and Chess, Benjamin and Clark, Jack and Berner, Christopher and McCandlish, Sam and Radford, Alec and Sutskever, Ilya and Amodei, Dario},
  booktitle = {Advances in Neural Information Processing Systems},
  editor = {H. Larochelle and M. Ranzato and R. Hadsell and M.F. Balcan and H. Lin},
  pages = {1877--1901},
  publisher = {Curran Associates, Inc.},
  title = {Language Models are Few-Shot Learners},
  url = {https://proceedings./paper/2020/file/1457c0d6bfcb4967418bfb8ac142f64a-Paper.pdf},
  volume = {33},
  year = {2020}
}
```