Introduction

This is a PyTorch implementation of the paper Convolutional Neural Networks for Sentence Classification (Kim, 2014).

  • MRDataset, CNN-non-static model (word2vec vectors trained by Mikolov et al. (2013) on 100 billion words of Google News)
  • Runs on both CPU and GPU
  • The best accuracy is 82.61%, better than the 81.5% reported in the paper (by Jingyuan Liu @ Fudan University; email: fdjingyuan@outlook.com; discussion is welcome!)

Requirements

  • python 3.6
  • pytorch > 0.1
  • numpy
  • gensim

Run

STEP 1: Install the required packages, e.g. gensim (the other needed packages are installed the same way):

pip install gensim

STEP 2: Download the MR dataset and the word2vec resources.

Since the pretrained word2vec file is more than 1.5 GB, it is not included in the repository folders. Once you have downloaded it, remember to modify the path in the function def word_embeddings(path = './GoogleNews-vectors-negative300.bin/'):
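For reference, here is a minimal sketch of how such a function can load the pretrained vectors with gensim. Only the function name and default path come from the README above; the body below is an assumption, not necessarily the repository's exact code:

from gensim.models import KeyedVectors

def word_embeddings(path='./GoogleNews-vectors-negative300.bin'):
    # The GoogleNews dump is a binary word2vec file, hence binary=True.
    # Loading it takes a while and needs several GB of RAM.
    return KeyedVectors.load_word2vec_format(path, binary=True)

word2vec = word_embeddings()
print(word2vec['sentence'].shape)  # (300,): 300-dimensional vectors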

STEP 3: Train the model:

python train.py

You will see output like the following printed to the screen:

Epoch [1/20], Iter [100/192] Loss: 0.7008
Test Accuracy: 71.869159 %
Epoch [2/20], Iter [100/192] Loss: 0.5957
Test Accuracy: 75.700935 %
Epoch [3/20], Iter [100/192] Loss: 0.4934
Test Accuracy: 78.130841 %

......
Epoch [20/20], Iter [100/192] Loss: 0.0364
Test Accuracy: 81.495327 %
Best Accuracy: 82.616822 %
Best Model: models/cnn.pkl
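The loop behind this output lives in train.py. As a rough illustration only, with stand-in data and a stand-in linear model instead of the repo's MRDataset and CNN, the overall shape is:

import torch
import torch.nn as nn
from torch.utils.data import DataLoader, TensorDataset

# Stand-in data: 9600 training samples with batch size 50 gives the
# 192 iterations per epoch seen in the log above.
X, y = torch.randn(9600, 300), torch.randint(0, 2, (9600,))
train_loader = DataLoader(TensorDataset(X, y), batch_size=50, shuffle=True)
test_loader = DataLoader(TensorDataset(X, y), batch_size=50)

model = nn.Linear(300, 2)  # stand-in for the CNN defined in model.py
criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.Adam(model.parameters(), lr=0.0001)

num_epochs = 20
for epoch in range(num_epochs):
    for i, (sents, labels) in enumerate(train_loader):
        optimizer.zero_grad()
        loss = criterion(model(sents), labels)
        loss.backward()
        optimizer.step()
        if (i + 1) % 100 == 0:
            print('Epoch [%d/%d], Iter [%d/%d] Loss: %.4f'
                  % (epoch + 1, num_epochs, i + 1, len(train_loader), loss.item()))
    # evaluate on the test set after every epoch
    correct, total = 0, 0
    with torch.no_grad():
        for sents, labels in test_loader:
            predicted = model(sents).max(dim=1)[1]
            correct += (predicted == labels).sum().item()
            total += labels.size(0)
    print('Test Accuracy: %f %%' % (100.0 * correct / total))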

Hyperparameters

Following the paper and my experiments, I set:

Epoch   Kernel Size   Dropout   Learning Rate   Batch Size
20      h,300,100     0.5       0.0001          50

h = [3, 4, 5], i.e. convolution kernels of shape (h, 300) with 100 feature maps each. Whenever the test accuracy does not improve, the learning rate is multiplied by 0.8.
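One compact way to express that decay rule in PyTorch is the built-in ReduceLROnPlateau scheduler. This is a sketch of the rule, not necessarily how train.py implements it:

import torch
from torch.optim.lr_scheduler import ReduceLROnPlateau

model = torch.nn.Linear(300, 2)  # stand-in for the CNN model
optimizer = torch.optim.Adam(model.parameters(), lr=0.0001)
# mode='max' because we monitor accuracy; factor=0.8 implements the *0.8
# rule; patience=0 decays as soon as one epoch fails to improve.
scheduler = ReduceLROnPlateau(optimizer, mode='max', factor=0.8, patience=0)

for accuracy in [0.718, 0.757, 0.757, 0.781]:  # per-epoch test accuracies
    scheduler.step(accuracy)
    print(optimizer.param_groups[0]['lr'])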

Result

I only tried one dataset, MR; the paper also evaluates on six others (SST-1, SST-2, Subj, TREC, CR, MPQA). The paper proposes four models: CNN-rand, CNN-static, CNN-non-static, and CNN-multichannel. I implemented CNN-non-static: a model initialized with pre-trained word2vec vectors, in which all words, including the unknown ones that are randomly initialized, are fine-tuned together with the pretrained vectors for each task. Among the four models it has almost the best performance and is the most difficult to implement.
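For reference, here is a minimal sketch of what a CNN-non-static model looks like. The repo's actual implementation is in model.py; the class name, shapes, and wiring below are illustrative assumptions, not the repo's exact code:

import torch
import torch.nn as nn
import torch.nn.functional as F

class KimCNN(nn.Module):
    def __init__(self, pretrained, num_classes=2,
                 kernel_sizes=(3, 4, 5), num_filters=100):
        super(KimCNN, self).__init__()
        vocab_size, embed_dim = pretrained.size()
        self.embedding = nn.Embedding(vocab_size, embed_dim)
        self.embedding.weight.data.copy_(pretrained)  # init from word2vec
        # non-static: the weights stay trainable, so the vectors are
        # fine-tuned together with the rest of the model
        self.convs = nn.ModuleList(
            [nn.Conv2d(1, num_filters, (h, embed_dim)) for h in kernel_sizes])
        self.dropout = nn.Dropout(0.5)
        self.fc = nn.Linear(num_filters * len(kernel_sizes), num_classes)

    def forward(self, x):                    # x: (batch, seq_len) word indices
        x = self.embedding(x).unsqueeze(1)   # (batch, 1, seq_len, embed_dim)
        # one 100-map feature per kernel size, max-over-time pooled
        feats = [F.relu(conv(x)).squeeze(3).max(dim=2)[0] for conv in self.convs]
        return self.fc(self.dropout(torch.cat(feats, dim=1)))

# usage with random stand-in vectors (the real ones come from GoogleNews)
model = KimCNN(torch.randn(20000, 300))
logits = model(torch.randint(0, 20000, (50, 40)))  # 50 sentences of length 40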

Dataset   Classes   Best Result                Kim's Paper Result
MR        2         82.617% (CNN-non-static)   81.5% (CNN-non-static)

Reference

  • Kim, Y. (2014). Convolutional Neural Networks for Sentence Classification. EMNLP 2014.
  • Mikolov, T., Sutskever, I., Chen, K., Corrado, G., & Dean, J. (2013). Distributed Representations of Words and Phrases and their Compositionality. NIPS 2013.