
# Reproducing text_classification models

The following models are reproduced here with fastNLP:

char_cnn: paper: Character-level Convolutional Networks for Text Classification

dpcnn: paper: Deep Pyramid Convolutional Neural Networks for Text Categorization

HAN: paper: Hierarchical Attention Networks for Document Classification

LSTM+self_attention: paper: A Structured Self-attentive Sentence Embedding

AWD-LSTM: paper: Regularizing and Optimizing LSTM Language Models

# Dataset sources

IMDB: http://ai.stanford.edu/~amaas/data/sentiment/aclImdb_v1.tar.gz

SST-2: https://firebasestorage.googleapis.com/v0/b/mtl-sentence-representations.appspot.com/o/data%2FSST-2.zip?alt=media&token=aabc5f6b-e466-44a2-b9b4-cf6337f84ac8

SST: https://nlp.stanford.edu/sentiment/trainDevTestTrees_PTB.zip

yelp_full: https://drive.google.com/drive/folders/0Bz8a_Dbh9Qhbfll6bVpmNUtUcFdjYmF2SEpmZUZUcVNiMUw1TWN6RDV3a0JHT3kxLVhVR2M

yelp_polarity: https://drive.google.com/drive/folders/0Bz8a_Dbh9Qhbfll6bVpmNUtUcFdjYmF2SEpmZUZUcVNiMUw1TWN6RDV3a0JHT3kxLVhVR2M
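As an illustration of fetching one of these archives, here is a minimal sketch that downloads and unpacks the IMDB tarball using only the Python standard library. The helper names `archive_name` and `download_and_extract` and the `data` target directory are hypothetical and not part of fastNLP; the training scripts may expect a different layout.

```python
import tarfile
import urllib.request
from pathlib import Path

# IMDB URL as given in the dataset sources above.
IMDB_URL = "http://ai.stanford.edu/~amaas/data/sentiment/aclImdb_v1.tar.gz"


def archive_name(url: str) -> str:
    """Return the file name at the end of a URL, ignoring any query string."""
    return url.split("?", 1)[0].rstrip("/").rsplit("/", 1)[-1]


def download_and_extract(url: str, target_dir: str = "data") -> Path:
    """Download a .tar.gz archive into target_dir and unpack it there.

    target_dir is an assumed location, not one the training scripts require.
    """
    target = Path(target_dir)
    target.mkdir(parents=True, exist_ok=True)
    archive = target / archive_name(url)
    if not archive.exists():  # skip the download if the archive is already present
        urllib.request.urlretrieve(url, str(archive))
    with tarfile.open(str(archive), "r:gz") as tar:
        tar.extractall(str(target))
    return target


if __name__ == "__main__":
    download_and_extract(IMDB_URL)
```

The sketch only handles `.tar.gz` archives; the `.zip` files and the Google Drive folders listed above would need a different approach.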

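| dataset | classes | train samples | dev samples | test samples | reference |
| --- | --- | --- | --- | --- | --- |
| yelp_polarity | 2 | 560k | - | 38k | char_cnn |
| yelp_full | 5 | 650k | - | 50k | char_cnn |
| IMDB | 2 | 25k | - | 25k | IMDB |
| sst-2 | 2 | 67k | 872 | 1.8k | GLUE |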

# Summary of datasets and reproduction results

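Results reproduced with fastNLP vs. results reported in the papers (the number before the / is from the fastNLP implementation, the number after it is the one reported in the paper; - means the paper lists no result on that dataset).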

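| model name | yelp_p | yelp_f | sst-2 | IMDB |
| --- | --- | --- | --- | --- |
| char_cnn | 93.80/95.12 | - | - | - |
| dpcnn | 95.50/97.36 | - | - | - |
| HAN | - | - | - | - |
| LSTM | 95.74/- | 64.16/- | - | 88.52/- |
| AWD-LSTM | 95.96/- | 64.74/- | - | 88.91/- |
| LSTM+self_attention | 96.34/- | 65.78/- | - | 89.53/- |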
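To compare the two numbers in a cell programmatically, the fastNLP/paper notation can be parsed as in the sketch below. The function names `parse_cell` and `gap` are hypothetical helpers for illustration, and treating a bare `-` cell as "no number from either side" is an assumption about how the table is meant to be read.

```python
from typing import Optional, Tuple


def parse_cell(cell: str) -> Tuple[Optional[float], Optional[float]]:
    """Split a 'fastNLP/paper' cell into two floats; '-' becomes None."""

    def to_float(text: str) -> Optional[float]:
        text = text.strip()
        return None if text in ("", "-") else float(text)

    if "/" not in cell:
        # Assumption: a cell without '/' carries at most a fastNLP number.
        return to_float(cell), None
    ours, paper = cell.split("/", 1)
    return to_float(ours), to_float(paper)


def gap(cell: str) -> Optional[float]:
    """Paper score minus reproduced score, or None if either one is missing."""
    ours, paper = parse_cell(cell)
    if ours is None or paper is None:
        return None
    return round(paper - ours, 2)


if __name__ == "__main__":
    for name, cell in [("char_cnn", "93.80/95.12"), ("dpcnn", "95.50/97.36"), ("LSTM", "95.74/-")]:
        print(name, gap(cell))
```

For the yelp_p column this gives a gap of 1.32 for char_cnn and 1.86 for dpcnn, and `None` for the rows where no paper number is listed.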