fastNLP

mirror of https://gitee.com/fastnlp/fastNLP.git synced 2024-12-04 13:17:51 +08:00

README.md

Introduction

This is a PyTorch implementation of the paper Convolutional Neural Networks for Sentence Classification (Kim, 2014).

MR dataset, non-static model (word2vec trained by Mikolov et al. (2013) on 100 billion words of Google News)
It runs on both CPU and GPU
The best accuracy is 82.62%, higher than the 81.5% reported in the paper (by Jingyuan Liu @ Fudan University; email: fdjingyuan@outlook.com; discussion is welcome!)

Requirement

python 3.6
pytorch > 0.1
numpy
gensim

Run

STEP 1 Install the required packages, such as gensim (the other packages are installed the same way)

pip install gensim

STEP 2 Download the MR dataset and the word2vec resources

MR dataset: download it from https://www.cs.cornell.edu/people/pabo/movie-review-data/rt-polaritydata.tar.gz
word2vec: download the file from https://drive.google.com/file/d/0B7XkCwpI5KDYNlNUTTlSS21pQmM/edit
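Once the archive is extracted, the sentences still have to be read and labelled. The following is a minimal sketch of one way to do that, not the repository's own `dataset.py`. It assumes the archive contains the two files `rt-polarity.pos` and `rt-polarity.neg` with one sentence per line; the function name `load_mr` and the label convention (1 = positive, 0 = negative) are hypothetical.

```python
from pathlib import Path


def load_mr(data_dir):
    """Return a list of (sentence, label) pairs from the MR polarity files.

    Assumption: `data_dir` holds `rt-polarity.pos` and `rt-polarity.neg`,
    one sentence per line. Labels: 1 = positive, 0 = negative.
    """
    data_dir = Path(data_dir)
    examples = []
    for filename, label in (("rt-polarity.pos", 1), ("rt-polarity.neg", 0)):
        # latin-1 maps every byte to a character, so decoding never fails
        # even if the files are not valid UTF-8.
        with open(data_dir / filename, encoding="latin-1") as f:
            for line in f:
                line = line.strip()
                if line:  # skip blank lines
                    examples.append((line, label))
    return examples
```

A call such as `load_mr("rt-polaritydata")` would then return the combined list, which can be shuffled and split into training and test sets.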

Since this file is larger than 1.5 GB, it is not included in the repository. After downloading it, remember to update the path in the function `def word_embeddings(path = './GoogleNews-vectors-negative300.bin/'):`
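As an illustration of what such a loading step might look like, here is a hedged sketch; it is not the repository's `word_embeddings` function. `load_word2vec` uses gensim's `KeyedVectors.load_word2vec_format`, while `build_embedding_table`, its parameters, and the uniform range of ±0.25 for unknown words are assumptions made for this example.

```python
import random


def load_word2vec(path):
    """Load the binary GoogleNews vectors with gensim (imported lazily)."""
    from gensim.models import KeyedVectors
    return KeyedVectors.load_word2vec_format(path, binary=True)


def build_embedding_table(vocab, vectors, dim=300, scale=0.25, seed=0):
    """Return one row of `dim` floats per word in `vocab`.

    `vectors` is any object supporting `word in vectors` and `vectors[word]`
    (a gensim KeyedVectors instance or a plain dict). Words without a
    pre-trained vector get a random row drawn from U(-scale, scale);
    the value 0.25 is an assumption, not taken from the README.
    """
    rng = random.Random(seed)
    table = []
    for word in vocab:
        if word in vectors:
            row = [float(x) for x in vectors[word]]
        else:
            row = [rng.uniform(-scale, scale) for _ in range(dim)]
        table.append(row)
    return table
```

In a non-static model, a table like this would serve only as the initial value of the embedding layer, which is then updated during training together with the other parameters.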

STEP 3 train the model

python train.py

The training progress is printed to the screen, for example:

Epoch [1/20], Iter [100/192] Loss: 0.7008
Test Accuracy: 71.869159 %
Epoch [2/20], Iter [100/192] Loss: 0.5957
Test Accuracy: 75.700935 %
Epoch [3/20], Iter [100/192] Loss: 0.4934
Test Accuracy: 78.130841 %

......
Epoch [20/20], Iter [100/192] Loss: 0.0364
Test Accuracy: 81.495327 %
Best Accuracy: 82.616822 %
Best Model: models/cnn.pkl
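If the output above is saved to a file, the per-epoch test accuracies can be extracted with a few lines of Python. This is only a sketch that assumes the log format shown above; the helper name `best_test_accuracy` is hypothetical and not part of the repository.

```python
import re

# Matches lines such as "Test Accuracy: 71.869159 %".
ACC_RE = re.compile(r"Test Accuracy:\s*([0-9.]+)\s*%")


def best_test_accuracy(log_text):
    """Return the highest per-epoch test accuracy in `log_text`, or None."""
    accuracies = [float(m.group(1)) for m in ACC_RE.finditer(log_text)]
    return max(accuracies) if accuracies else None
```

Applied to a complete log, the returned value should agree with the `Best Accuracy` line that `train.py` prints at the end.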

Hyperparameters

Based on the paper and my own experiments, I set:

Epochs	Kernel size	Dropout	Learning rate	Batch size
20	`h,300,100`	0.5	0.0001	50

Here h takes the values 3, 4 and 5. If the accuracy does not improve, the learning rate is multiplied by 0.8.
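The settings above can be sketched in plain Python. This is an illustration under stated assumptions, not the code in `train.py`: it reads `h,300,100` as a kernel spanning h words of 300-dimensional embeddings with 100 feature maps per kernel height, assumes max-over-time pooling as in the paper, and assumes that "not improved" means "not better than the best accuracy so far". All names are hypothetical.

```python
KERNEL_HEIGHTS = [3, 4, 5]  # h: number of words covered by each kernel
EMBED_DIM = 300             # width of each kernel (word2vec dimension)
NUM_FILTERS = 100           # feature maps per kernel height


def conv_output_length(sentence_len, h):
    """Window positions for a height-h kernel (stride 1, no padding)."""
    return max(sentence_len - h + 1, 0)


def pooled_feature_size(kernel_heights=KERNEL_HEIGHTS, num_filters=NUM_FILTERS):
    """Size of the sentence vector after max-over-time pooling.

    Each feature map is reduced to a single value, so the result does not
    depend on the sentence length.
    """
    return len(kernel_heights) * num_filters


def decay_on_plateau(lr, accuracy, best_accuracy, factor=0.8):
    """Return (new_lr, new_best) after one evaluation.

    The learning rate is kept when the accuracy improves on the best value
    so far and multiplied by `factor` otherwise.
    """
    if accuracy > best_accuracy:
        return lr, accuracy
    return lr * factor, best_accuracy
```

With these values, a 10-word sentence yields 8, 7 and 6 window positions for h = 3, 4 and 5, and the pooled sentence vector has 3 × 100 = 300 entries regardless of sentence length.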

Result

I have only tried one dataset: MR. (The paper evaluates six other datasets: SST-1, SST-2, Subj, TREC, CR and MPQA.) The paper describes four models: CNN-rand, CNN-static, CNN-non-static and CNN-multichannel. I implemented CNN-non-static: a model with pre-trained word2vec vectors in which all word vectors, including the randomly initialized vectors for unknown words, are fine-tuned for each task. Among the four models it has almost the best performance and is the most difficult to implement.

Dataset	Class size	Best result	Kim's paper result
MR	2	82.617% (CNN-non-static)	81.5% (CNN-non-static)
