## Introduction

This is a PyTorch implementation of the paper *Convolutional Neural Networks for Sentence Classification* (Kim, 2014).
- MR dataset, non-static model (word2vec trained by Mikolov et al. (2013) on 100 billion words of Google News); see the sketch below
- It can run on both CPU and GPU.
- The best accuracy is 82.61%, which is better than the 81.5% reported in the paper. (By Jingyuan Liu @ Fudan University; email: fdjingyuan@outlook.com. Discussion is welcome!)
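For orientation, here is a minimal sketch of a Kim-style CNN in PyTorch. It is not this repository's `model.py`: the class name is made up, and the kernel sizes, filter count, embedding dimension, and dropout are taken from the hyperparameter table later in this README.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class TextCNN(nn.Module):
    """Hypothetical sketch of a Kim-style CNN; see model.py for the real one."""
    def __init__(self, vocab_size, embed_dim=300, num_classes=2,
                 kernel_sizes=(3, 4, 5), num_filters=100, dropout=0.5):
        super().__init__()
        # For CNN-non-static, this embedding is initialized from word2vec
        # and fine-tuned during training.
        self.embedding = nn.Embedding(vocab_size, embed_dim)
        # One 1-D convolution per kernel size h in [3, 4, 5].
        self.convs = nn.ModuleList(
            [nn.Conv1d(embed_dim, num_filters, h) for h in kernel_sizes]
        )
        self.dropout = nn.Dropout(dropout)
        self.fc = nn.Linear(num_filters * len(kernel_sizes), num_classes)

    def forward(self, x):                      # x: (batch, seq_len)
        x = self.embedding(x).transpose(1, 2)  # (batch, embed_dim, seq_len)
        # Convolve, apply ReLU, then max-pool over time for each kernel size.
        pooled = [F.relu(conv(x)).max(dim=2).values for conv in self.convs]
        out = self.dropout(torch.cat(pooled, dim=1))
        return self.fc(out)                    # (batch, num_classes)
```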
## Requirements

- python 3.6
- pytorch > 0.1
- numpy
- gensim
## Run

STEP 1: install the required packages, such as gensim (the other required packages are installed the same way):
```
pip install gensim
```
STEP 2: download the MR dataset and the word2vec resources
- MR dataset: you can download it from https://www.cs.cornell.edu/people/pabo/movie-review-data/rt-polaritydata.tar.gz
- word2vec: you can download the file from https://drive.google.com/file/d/0B7XkCwpI5KDYNlNUTTlSS21pQmM/edit

Since this file is more than 1.5 GB, it is not included in the repository. After downloading it, remember to modify the path in the function `def word_embeddings(path = './GoogleNews-vectors-negative300.bin/'):`.
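For reference, a minimal sketch of what such a loader can look like using gensim's `KeyedVectors.load_word2vec_format` API; the actual `word_embeddings` function in this repository may differ.

```python
from gensim.models import KeyedVectors

def word_embeddings(path='./GoogleNews-vectors-negative300.bin/'):
    # Load the pre-trained Google News vectors (binary word2vec format).
    # Loading takes a few minutes and requires several GB of RAM.
    return KeyedVectors.load_word2vec_format(path, binary=True)
```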
STEP 3: train the model
```
python train.py
```
You will see output printed to the screen, like:
```
Epoch [1/20], Iter [100/192] Loss: 0.7008
Test Accuracy: 71.869159 %
Epoch [2/20], Iter [100/192] Loss: 0.5957
Test Accuracy: 75.700935 %
Epoch [3/20], Iter [100/192] Loss: 0.4934
Test Accuracy: 78.130841 %
......
Epoch [20/20], Iter [100/192] Loss: 0.0364
Test Accuracy: 81.495327 %
Best Accuracy: 82.616822 %
Best Model: models/cnn.pkl
```
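As a rough illustration of how such a log can be produced (a hedged sketch, not the actual `train.py`; `model`, `train_loader`, and `evaluate` stand in for objects the real script defines):

```python
import torch.nn as nn
import torch.optim as optim

# Hypothetical sketch; assumes `model`, `train_loader`, and an
# `evaluate(model)` returning test accuracy in percent already exist.
criterion = nn.CrossEntropyLoss()
optimizer = optim.Adam(model.parameters(), lr=1e-4)

for epoch in range(20):
    for i, (sentences, labels) in enumerate(train_loader, start=1):
        optimizer.zero_grad()
        loss = criterion(model(sentences), labels)
        loss.backward()
        optimizer.step()
        if i % 100 == 0:
            print(f'Epoch [{epoch + 1}/20], Iter [{i}/{len(train_loader)}] '
                  f'Loss: {loss.item():.4f}')
    print(f'Test Accuracy: {evaluate(model):.6f} %')
```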
## Hyperparameters

According to the paper and my experiments, I set:

Epoch | Kernel Size | Dropout | Learning Rate | Batch Size |
---|---|---|---|---|
20 | h, 300, 100 | 0.5 | 0.0001 | 50 |

h = [3, 4, 5]. If the accuracy does not improve, the learning rate is multiplied by 0.8.
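The decay rule can be sketched like this (variable names are hypothetical; see `train.py` for the actual logic):

```python
# Hypothetical sketch of the decay rule described above: whenever test
# accuracy fails to improve, shrink the learning rate by a factor of 0.8.
if accuracy <= best_accuracy:
    for group in optimizer.param_groups:
        group['lr'] *= 0.8
else:
    best_accuracy = accuracy
```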
## Result

I have only tried one dataset: MR. (The paper covers six more datasets: SST-1, SST-2, TREC, CR, MPQA.) The paper describes four models: CNN-rand, CNN-static, CNN-non-static, and CNN-multichannel. I implemented CNN-non-static: a model with pre-trained vectors from word2vec, in which all words, including the unknown ones that are randomly initialized, and the pretrained vectors are fine-tuned for each task. (Among the four models, it has nearly the best performance and is the most difficult to implement.)
Dataset | Class Size | Best Result | Kim's Paper Result |
---|---|---|---|
MR | 2 | 82.617% (CNN-non-static) | 81.5% (CNN-non-static) |