xuyige
e0b23b16db
update data loader of matching
2019-06-24 21:44:43 +08:00
Danqing Wang
79762c4c6c
Add Summarization framework
2019-06-24 19:24:14 +08:00
yh_cc
39dd086262
1. Fix a counterintuitive bug in CrossEntropyLoss; 2. Update sequence labeling
2019-06-24 09:56:28 +08:00
xuyige
3593f0a545
fix bugs in matching dataloader
2019-06-23 18:42:57 +08:00
xuyige
d1f531c049
update matching dataloader in reproduction/matching
2019-06-23 18:25:04 +08:00
yh_cc
8f7ed07441
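1. Add a no_create_entry_dataset option to Vocabulary's from_dataset for passing in the dev and test sets
...
2. Adjust the implementations of the various Embedding classes so that unseen words from dev and test use the unk representation
3. Add a dropout_word option to Embedding so that words can be randomly dropped
4. Fix several other minor bugs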
2019-06-21 11:06:35 +08:00
yh
a137038eb2
Fix ELMO and LSTM not working with nn.DataParallel
2019-06-19 19:43:53 +08:00
yh_cc
4533427ea3
Update sequence labeling
2019-06-19 11:14:41 +08:00
yh_cc
4d138ed7f8
Merge branch 'dev0.5.0' of github.com:fastnlp/fastNLP into dev0.5.0
2019-06-18 10:02:29 +08:00
yh_cc
9a8fe42cd4
Add NER data loading and model code; fix a typo in metric; change the default LSTM initialization to set the forget gate to 1.
2019-06-18 10:02:24 +08:00
xuyige
93620e76ed
update framework of matching
2019-06-18 02:04:53 +08:00
xuyige
342b7026d7
Merge remote-tracking branch 'origin/dev0.5.0' into dev0.5.0
2019-06-17 21:48:42 +08:00
xuyige
39388567ad
update matching.py
2019-06-17 21:48:18 +08:00
yh_cc
2f5d8967a3
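1. Adapt to the change that replaces Batch with PyTorch's DataLoader
...
2. Fix a bug in embedding.py
3. ConllReader now skips all DOCSTART tags by default
4. Move BERT's heavy lifting into _bert and expose BertEncoder in bert.py
5. Change include_end_start of allow_transition in crf to false to match the CRF default
6. allow_transition and SpanMetric now support BIOES tags
7. Add formatted print output to datainfo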
2019-06-17 20:18:07 +08:00
yh_cc
839d712467
Extend value_count in field to support nested fields
2019-06-17 16:46:39 +08:00
yhcc
66f51395d7
Merge pull request #166 from fastnlp/batch
...
[new] Make compatible with PyTorch's DataLoader; replace Batch with DataSetIter
2019-06-17 16:42:28 +08:00
lyhuang18
c78811f87f
add TC/MTL16Loader
2019-06-16 23:43:37 +08:00
yh_cc
4b5113cbea
Change prefetch to a deprecated warning
2019-06-15 14:17:48 +08:00
yh_cc
17b5fd0066
1. Remove the assert in Trainer that train_data must be a DataSet
...
2. Remove Trainer's prefetch parameter; document the num_workers parameter in the comments
3. Trainer's default sampler is now RandomSampler
2019-06-15 13:10:28 +08:00
yh_cc
6309eafd25
1. Support handy functions such as split and int in fieldarray
...
2. Major update: support ElmoEmbedding and BertEmbedding
2019-06-12 11:10:33 +08:00
yh_cc
37c50d6625
Merge branch 'dev0.5.0' of github.com:fastnlp/fastNLP into dev0.5.0
2019-06-11 16:42:11 +08:00
Violet Yao
83729dfc39
moved test to reproduction folder
2019-06-09 21:46:34 +08:00
Violet Yao
ad6a55ba26
fixed comment format
2019-06-08 14:32:25 +08:00
Violet Yao
2edb2a1a00
added yelpLoader
2019-06-08 14:27:52 +08:00
yh_cc
9e5c4f665c
Merge branch 'dev0.5.0' of github.com:fastnlp/fastNLP into dev0.5.0
2019-06-08 09:48:11 +08:00
yh_cc
bddce51b05
merge update
2019-06-08 09:47:39 +08:00
xuyige
8e82c91751
update bert for nli in reproduction/matching
2019-06-06 00:09:25 +08:00
xuyige
96687251a8
update reproduction/README.md
2019-06-05 16:10:13 +08:00
xuyige
e643d7aed5
update reproduction/README.md
2019-06-05 16:07:56 +08:00
yunfan
60de1b2c52
add text_classification reproduction dir
2019-06-05 11:05:46 +08:00
xuyige
e05c182b05
firstly add matching in reproduction
2019-06-05 01:01:30 +08:00
yh_cc
d71f0eef13
Chinese word segmentation with SemiCRFRelay for sequence labeling.
2019-06-04 23:40:46 +08:00
yh
07c3533126
First commit of the CWS example
2019-06-04 22:16:58 +08:00
ChenXin
881ce01762
Dev0.4.0 ( #149 )
...
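* 1. CRF now supports bmeso tags 2. Add comments to vocabulary
* Add an error check to BucketSampler
* 1. Fix a bug in ClipGradientCallback; remove the print in LRSchedulerCallback (pbar should be passed in for printing in the future); 2. Add MLP comments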
* update MLP module
* Add metric comments; fix a bug in the trainer save process
* Update README.md
fix tutorial link
* Add ENAS (Efficient Neural Architecture Search)
* add ignore_type in DataSet.add_field
* * AutoPadder will not pad when dtype is None
* add ignore_type in DataSet.apply
* Fix a potential padder bug in fieldarray
* Fix a typo in crf, along with code that could cause numerical instability
* Fix a possible bug in CRF
* change two default init arguments of Trainer into None
* Changes to Callbacks:
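* Add several read-only properties to callback
* Set these properties through the manager
* Optimize the code to lighten the load on @transfer
* * Move ENAS-related code into the automl directory
* Fix a bug in fast_param_mapping
* Trainer now creates the save directory automatically
* Printing a Vocabulary now shows its contents
* * Add an iteration method to vocabulary
* Fix the bug where the CRF value is negative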
* add SQuAD metric
* add sigmoid activate function in MLP
* - add star transformer model
- add ConllLoader, for all kinds of conll-format files
- add JsonLoader, for json-format files
- add SSTLoader, for SST-2 & SST-5
- change Callback interface
- fix batch multi-process when killed
- add README to list models and their performance
* - fix test
* - fix callback & tests
* - update README
* Fix some bugs; adjust callback
* Prepare to release version 0.4.0
* update readme
* support parallel loss
* Prevent multi-GPU setups from computing the loss incorrectly
* update advance_tutorial jupyter notebook
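* 1. Add new loading functions load_with_vocab() and load_without_vocab() to embedding_loader. The main changes from the previous functions: (1) embed_dim no longer needs to be passed in; (2) the format (word2vec or glove) is detected automatically.
2. Add from_dataset() and index_dataset() to vocabulary, so indexing a dataset no longer takes multiple lines.
3. Add a cache_result() decorator to utils for caching a function's return value.
4. Add an update_every attribute to callback
* 1. DataSet.apply() now reports the offending index when it raises an error
2. Vocabulary.from_dataset() and index_dataset() now report the vocab order when they raise an error
3. embedloader now skips any malformed line encountered while reading embeddings.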
* update attention
* doc tools
* fix some doc errors
* Switch to Chinese comments; add a viterbi decoding method
* Sample version
* - add pad sequence for lstm
- add csv, conll, json filereader
- update dataloader
- remove useless dataloader
- fix trainer loss print
- fix tests
* - fix test_tutorial
* Add comments
* Test documentation
* Save local work in progress
* Save local work in progress
* Reorder the docs
* - add document
* Save local work in progress
* update pooling
* update bert
* update documents in MLP
* update documents in snli
* combine self attention module to attention.py
* update documents on losses.py
* Update the DataSet docs
* update documents on metrics
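* 1. Remove the print output in LSTM; 2. Replace use_cuda with device in Trainer and Tester; 3. Expand the Trainer docs
* Add comments to Trainer
* Improve the docs for trainer, callback, etc.; rename some code so that it is hidden from the docs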
* update char level encoder
* update documents on embedding.py
* - update doc
* Add more comments and modify some code
* - update doc
- add get_embeddings
* Change the docs configuration options
* Change embedding to be initialized via init_embed
* 1. Add multi-GPU support to Trainer and Tester
* - add test
- fix jsonloader
* Remove the commenting tutorial
* Add get_field_names to dataset
* Fix bug
* - add Const
- fix bugs
* Revise some comments
* - add model runner for easier test models
- add model tests
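* Change the configuration and structure of docs
* Revise a large part of the core docs. TODO:
1. Improve the trainer and tester docs
2. Look into comment examples and tests
* Mostly finish reviewing the comments in core
* Revise the comments in io
* Change all references to relative-path imports
* Change all references to relative-path imports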
* small change
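* 1. Remove the installation of api/automl from the install file
2. metric has a seq_len bug
3. Fix a naming error in sampler
* Fix bug: support the CPU version of PyTorch
TODO: similar bugs may exist elsewhere
* Revise the references in the docs
* Replace tqdm.autonotebook with tqdm.auto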
* - fix batch & vocab
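* Upload the doc files *.rst
* Upload doc files and several TODOs
* Discuss and consolidate several modules
* Add tests for core and make some small changes
* Remove some redundant docs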
* update init files
* update const files
* update const files
* Add tests for cnn
* fix a little bug
* - update attention
- fix tests
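* Improve tests
* Finish the quick-start tutorial
* Update the docs for renaming sequence_modeling to sequence_labeling
* Rerun apidoc to resolve leftover issues from the rename
* Revise the doc formatting
* Unify the seq_len_to_mask implementations from different places into core.utils.seq_len_to_mask
* Add a one-line hint
* Show dataset_loader in the docs
* Warn that Dataset.read_csv will be replaced by CSVLoader
* Finish the docs linking Callback and Trainer
* Partially update index
* Remove redundant print statements
* Remove the metric used for word segmentation because it may cause errors
* Revise the Chinese names in the docs
* Finish the detailed introduction docs
* ipynb files for the tutorial
* Revise some introduction docs
* Revise the main-page introductions for models and modules
* Add the titlesonly setting
* Change the titles shown in the module docs
* Revise the opening introductions for core and io
* Revise the opening introductions for modules and models
* Use .. todo:: to hide TODO comments that might otherwise be extracted into the docs
* Revise some comments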
* delete an old metric in test
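* Update the test files for tutorials
* Move features not yet ready for release into the legacy folder
* Remove tests that cannot run
* Update the test files for callback
* Remove outdated tutorials and test files
* Change the cache_results parameters
* Update the test files for io; remove some outdated tests
* Fix bug
* Fix the failing tests in test_utils.py
* Fix a compatibility issue with padsequence in PyTorch 1.1; modify Trainer's pbar
* 1. Fix a bug in metric; 2. Add metric tests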
* add model summary
* Add aliases
* Remove the nested layer in encoder
* Change the import order in core and what __all__ exposes
* Change the import order in models and what __all__ exposes
* Rename files
* Change __all__ and the imports in modules
* fix var runn
* Add a clear method to vocab
* Minor tweaks for PEP8 compliance
* Update the cache_results example
* 1. Warn when indices in callback may be None; 2. DataSet now supports indexing with a List
* Fix a typo
* Update README.md
* update documents on bert
* update documents on encoder/bert
* Add a fitlog callback to record experiments with fitlog
* typo
* - update dataset_loader
* Add a link to the fitlog docs.
* Add docs for DataSet Loader
* - add star-transformer reproduction
2019-05-22 18:43:56 +08:00
FengZiYjun
0c5630bd16
Ready for V0.3.1
...
* Upgrade the parser API and model
* update docs: add new pages for tutorials
* upgrade CWS api download source
* add a new method for dataset field access
* add introduction for bert
* add more unit tests for api/processor
* remove unused test data. Add new test data.
2019-02-04 09:44:54 +08:00
FengZiYjun
986541139a
Organize all dataset loaders and add unit tests
2019-02-02 16:46:42 +08:00
FengZiYjun
887fc9281f
update callbacks:
...
* rename callback methods. Use fastai's notation.
* add a new callback method - on_valid_begin
2019-01-25 21:43:24 +08:00
yunfan
c02980e006
Merge branch 'yyff' into dev
2019-01-21 14:55:53 +08:00
yunfan
e93c6f0053
Merge branch 'dev' of https://github.com/choosewhatulike/fastNLP-private into dev
2019-01-21 14:52:24 +08:00
FengZiYjun
b14dd58828
Update POS API
2019-01-19 18:48:57 +08:00
yunfan
de856fb8eb
update reproduction
2019-01-19 16:22:01 +08:00
yunfan
eb55856c78
- fix parser train
2019-01-19 16:07:10 +08:00
FengZiYjun
864c2238f8
Add FieldArray support for list of np.array
2019-01-17 22:42:40 +08:00
FengZiYjun
b93ca9bb30
* Add FieldArray support for list of np.array
...
* Add tests: FieldArray initialization
2019-01-17 15:39:13 +08:00
FengZiYjun
e4f997d52a
refactor type system in FieldArray:
...
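* Refactor the dtype detection code in FieldArray's initialization and append for better code reuse
* Type detection is now entirely FieldArray's responsibility, with DataSet cooperating
Tests:
* Organize the dtype-related test code
* Add tests for all tutorials
Other:
* Complete a full Conll dataset loader
* Upgrade the POS tag model training script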
2019-01-17 12:25:37 +08:00
yh
8091a734ee
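1. Split the pad functionality out of FieldArray; Padder now handles all padding operations.
...
2. FieldArray uses AutoPadder by default; AutoPadder behaves the same as the earlier code without a padder.
3. Introduce EngChar2dPadder for padding characters, to solve the 2D padding problem.
4. Add a padding tutorial.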
2019-01-15 22:21:55 +08:00
yh
1f50b01ffa
conflict solved
2019-01-15 15:16:20 +08:00
yh
6a0a1ed4ad
Add comments to train; add comments to attention; add transformer word segmentation
2019-01-15 14:58:43 +08:00
FengZiYjun
1fdaf236d2
Updates:
...
* Rename: chinese_word_segment ---> Chinese_word_segmentation
* Rename: pos_tag_model ---> POS_tagging
* Add 4 tests for Batch
* Remove the unused chinese_word_segment/run.py
2019-01-15 14:56:01 +08:00
FengZiYjun
c4ba75d160
code optimization
...
* move used readers from reproduction to io/dataset_loader.py
(API shall not call anything from reproduction/)
2019-01-15 14:30:37 +08:00
yunfan
2e9e6c6c20
- fix trainer with validate_every > 0
...
- refine & fix Transformer Encoder
- refine & speed up biaffine parser
2019-01-14 19:13:52 +08:00
FengZiYjun
8df5bce938
fastNLP V0.3
2019-01-12 19:15:20 +08:00
yunfan
62a7556a04
Merge remote-tracking branch 'private/dev' into dev
...
# Conflicts:
# fastNLP/api/api.py
# fastNLP/modules/encoder/variational_rnn.py
2019-01-12 11:26:32 +08:00
yunfan
ba28702e68
update Biaffine Parser, Variational RNN
...
add parser API
2019-01-12 11:22:09 +08:00
yh
145125feb4
Support the high-level API for CWS
2019-01-11 19:55:38 +08:00
yh
400552971c
1. Add GradientClip to callback; 2. Remove _print_train() and _tqdm_train() from Trainer and merge both into _train()
2019-01-11 19:09:01 +08:00
yh
4fd6e12fa9
Merge branch 'dev' of github.com:choosewhatulike/fastNLP-private into dev
2019-01-08 21:49:46 +08:00
yh
56e7641eb8
1. Fix the bug where Trainer's check_code used train_data when checking evaluate
2019-01-08 21:49:31 +08:00
FengZiYjun
525adf1c41
update POS tag training script
2019-01-07 21:50:52 +08:00
FengZiYjun
7ecd8c9c14
finish POS tagging API
2019-01-07 19:45:19 +08:00
FengZiYjun
887c6fec70
add pos-tag training script
2019-01-07 14:22:53 +08:00
yh
e267f925cf
Add train_context.py for CWS
2019-01-03 21:35:28 +08:00
yh
897c43fc3b
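1. Add constraints to CRF to restrict transitions; for example, in BMES, B cannot transition to S
...
2. Add SpanFMetric to metric for measuring sequence labelling performance
3. Partially adjust the word segmentation reproduction task for the new interface.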
2019-01-03 21:33:36 +08:00
FengZiYjun
27e9453d19
* fix processor.py
...
* add code comments
* merge *_saver.py & *_loader.py in io/
* (ancient codes) rename Loss into LossFromTorch
2018-12-06 19:28:27 +08:00
yunfan
2aaa381827
refine git commits
2018-11-27 22:43:29 +08:00
yunfan
d643a7a894
update set_target, batch's as_numpy
2018-11-27 22:23:55 +08:00
FengZiYjun
090f7aef5b
* fixing unit tests
2018-11-27 22:22:19 +08:00
yh
1d5bb0a3b6
bug fix
2018-11-27 22:21:22 +08:00
yh
8906155ca2
Create an Analyzer for api
2018-11-27 22:21:18 +08:00
FengZiYjun
e9d7074ba1
* delete readme_example.py because it is oooooooout of date.
...
* rename preprocess.py into utils.py, because nothing about preprocess in it
* anything in loader/ and saver/ is moved directly into io/
* corresponding unit tests are moved to /test/io
* delete fastnlp.py, because we have new and better APIs
* rename Biaffine_parser/run_test.py to Biaffine_parser/main.py; Otherwise, test will fail.
* A looooooooooot of ancient codes to be refined...........
2018-11-27 22:17:41 +08:00
yunfan
b6a0d33cb1
add parser api
2018-11-27 22:14:22 +08:00
yh
77786509df
Upload POS and CWS development work
2018-11-27 22:13:20 +08:00
yh
7d97e9365d
Add a new processor
2018-11-27 22:11:09 +08:00
yh
1496031182
Add a POS output processor
2018-11-27 22:11:08 +08:00
yh
d5afffee73
Add transitional code for end-to-end processing from POS to the parser
2018-11-27 22:11:08 +08:00
yh_cc
10379e9c74
Currently uses the segapp approach, but accuracy seems poor; trying a CRF 4-tag scheme instead
2018-11-27 22:10:52 +08:00
yunfan
822aaf6286
fix and update tester, trainer, seq_model, add parser pipeline builder
2018-11-27 22:07:20 +08:00
FengZiYjun
4be15a5b43
Save the POS tag script
2018-11-27 21:53:05 +08:00
yh
9667c524a4
Mostly complete predict for CWS
2018-11-11 15:53:33 +08:00
yh
9fc20ac7b8
Add a pipeline for infer
2018-11-11 12:55:30 +08:00
yh
dc7f8ef8d4
bug fix
2018-11-11 12:42:05 +08:00
yh_cc
7df33b23ea
Merge branch 'dataset' of github.com:yhcc/fastNLP into dataset
2018-11-11 00:40:10 +08:00
FengZiYjun
e2b14ed33d
Merge remote-tracking branch 'origin/dataset' into dataset
2018-11-10 21:20:34 +08:00
FengZiYjun
5dd0f74d6d
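- Add the pos_tagger API; the pipeline now runs end to end
...
- Fix bugs in processor
- Update several components in core/; remove redundant parameters from batch
- Fix what looked like a typo in CRF
- Update the POS tag training script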
2018-11-10 21:20:16 +08:00
yh_cc
3e50ca8a72
Create a test context
2018-11-10 20:37:48 +08:00
yh_cc
de3feeaf5a
Relocate the CWS functions
2018-11-10 20:10:13 +08:00
yh_cc
752efc57fd
Merge branch 'dataset' of github.com:yhcc/fastNLP into dataset
2018-11-10 19:59:40 +08:00
yh_cc
ea1c8c1100
Word segmentation accuracy in the current version has reached normal segmentation scores
2018-11-10 19:59:32 +08:00
FengZiYjun
ec9fd32d60
improve trainer: log mean and std of model params, and sum of gradients
2018-11-10 18:49:22 +08:00
FengZiYjun
cd68d78d50
Merge remote-tracking branch 'origin/dataset' into dataset
...
# Conflicts:
# fastNLP/api/pipeline.py
# fastNLP/api/pos_tagger.py
# fastNLP/api/processor.py
# fastNLP/modules/decoder/CRF.py
2018-11-10 17:02:58 +08:00
FengZiYjun
26e3abdf58
- Fix the POS tag training script so that it runs
...
- Create converter.py in api
- Add an initializer to Pipeline so processors can be added all at once
- Delete pos_tagger.py
- Improve the overall code style
2018-11-10 16:58:27 +08:00
yh_cc
10bb2810ab
Merge branch 'dataset' of github.com:yhcc/fastNLP into dataset
2018-11-10 15:34:21 +08:00
yh_cc
73ba3b5eec
bug fix for pipeline
2018-11-10 15:17:58 +08:00
yunfan
a6ab34fd38
fix crf
2018-11-10 14:53:50 +08:00
yh_cc
3cb98ddcf2
Add a BucketSampler to Sampler; CWS training now mostly works
2018-11-10 14:46:38 +08:00
yh_cc
69a138eb18
Fix several issues encountered; add some methods for the word segmentation task
2018-11-10 13:41:19 +08:00
yh_cc
dc0124cf02
Rename model to models
2018-11-10 11:10:14 +08:00
yh_cc
25a53ac5c9
Update processor to accommodate yesterday's unorthodox changes
2018-11-10 10:56:28 +08:00
yh
d818e91380
Make dataset automatically create the corresponding array
2018-11-09 22:11:26 +08:00
yh
515e4f4987
Move processor into processor.py
2018-11-09 22:02:10 +08:00
yh
89ce85b6ed
Merge branch 'dataset' of https://github.com/yhcc/fastNLP into dataset
2018-11-09 20:23:11 +08:00
yh
38aa207ea2
Add CWS converter and io
2018-11-09 20:23:05 +08:00
FengZiYjun
12e9a93b52
Merge remote-tracking branch 'origin/dataset' into dataset
2018-11-09 19:53:08 +08:00
FengZiYjun
79105381f5
- add interfaces for pos_tagging API
...
- update predictor.py to remove unused methods
- update model_loader.py & model_saver.py to support entire model saving & loading
- update pos tagging training script
2018-11-09 19:52:31 +08:00
yh
1b9daa1985
Add some CWS features
2018-11-09 19:25:18 +08:00
yunfan
053249420f
update parser, fix bugs varrnn & vocab
2018-11-09 10:59:36 +08:00
yunfan
3192c9ac66
update trainer
2018-11-08 22:15:58 +08:00
yunfan
c14d9f4d66
update biaffine
2018-11-08 22:13:47 +08:00
yunfan
830d223344
add transformer
2018-11-08 22:12:13 +08:00
yunfan
102259df39
update biaffine parser
2018-11-08 22:12:13 +08:00
yunfan
a4c9786ca4
update dataset & loader
2018-10-17 09:59:56 +08:00
yunfan
637c37d62b
add new model, new module, fix bugs
2018-10-10 16:49:17 +08:00
FengZiYjun
fb806163c3
remove unused codes; add more tests
2018-10-07 15:03:00 +08:00
FengZiYjun
cc15588a77
- add progress bar for data set loading
...
- improve metrics codes
- fix validator bugs in trainer; remove early saving
- run CWS codes
- improve README.md
2018-10-01 20:33:29 +08:00
FengZiYjun
0b86d7cf2b
Merge Preprocessor and DataSet
2018-09-28 21:35:17 +08:00
FengZiYjun
cb11a1f2dc
- analyze codes for language model, unable to run yet
...
- add character vocab in preprocessor
- add dataset loader for language model dataset
- other minor adjustments
- preserve only a little example data for language model
2018-09-23 16:03:20 +08:00
FengZiYjun
28a0683853
1. add tests in test_fastNLP.py & test_sampler.py; increase test coverage to 81%
...
2. changes of names:
aggregation ----> aggregator
interaction ----> interactor
action.py ----> sampler.py
BasePreprocess ---> Preprocessor
BaseTester ----> Tester
BaseTrainer ----> Trainer
3. add more code comments
4. fix bugs in predictor's data_forward
5. in sampler.py, remove Bachifier, fix some codes. but not test
6. remove unused codes in other_modules.py & utils.py
7. update fastnlp.py with new config file names and code comments
8. add data examples in data_for_tests/
2018-09-22 15:33:52 +08:00
yunfan
819c8f05be
fix vocab
2018-09-19 15:10:18 +08:00
yunfan
8f60a4fa01
update MLP
2018-09-18 15:57:44 +08:00
2017alan
b3e8db74a6
add self_attention for yelp classification example.
2018-09-15 17:19:56 +08:00
FengZiYjun
57911f771a
- clean up unused codes
...
- improve code comments
- BaseLoader & its subclasses do not need a data name any more
- update file tree
- add setup.py
2018-09-02 13:32:57 +08:00
FengZiYjun
32a036e8e6
[fix] drop "data" in Tester.make_batch; correct spelling of "show_metrics"
...
[add] PeopleDailyCorpusLoader, to parse PeopleDaily Corpus
[update] add CWS + POS_tag interface at FastNLP, see example in test_fastNLP.py
[update] modify README.md and readme_example.py to the latest version.
2018-09-01 21:33:28 +08:00
FengZiYjun
501ffb26c5
optimize CWS example
...
- see test_fastNLP.py
- update interpret_word_seg_results in fastnlp.py
- delete useless data to increase git clone speed
2018-08-31 11:23:40 +08:00
FengZiYjun
ab55f25e20
Updates to Trainer/Tester/fastnlp
...
1. Tester has a parameter "print_every_step" to control printing. print_every_step == 0 means NO print.
2. Tester's evaluate returns (list of) floats, rather than torch.cuda.tensor
3. Trainer also has a parameter "print_every_step". The same usage.
4. In training, validation steps are not shown.
5. Updates to code comments.
6. fastnlp.py is ready for CWS. test_fastNLP.py works.
2018-08-31 10:46:56 +08:00
FengZiYjun
9d6b0daa99
Prepare for CWS service:
...
- specify the name of the config file and the name of corresponding section where model init params store.
- fastnlp.py needs load_pickle to get dictionary size and the number of labels
- other minor adjustments
2018-08-30 11:45:47 +08:00
Coet
aea53c1833
Merge pull request #43 from FengZiYjun/master
...
New Trainer Initialization Interface
2018-08-24 09:47:40 +08:00
FengZiYjun
2df8eb740a
Updates to core, loader:
...
- add Loss, Optimizer
- change Trainer & Tester initialization interface: two styles of definition provided
- handle Optimizer construction and loss function definition in a hard manner
- add argparse in task-specific scripts. (seq_labeling.py & text_classify.py)
- seq_labeling.py & text_classify.py work
2018-08-22 19:10:12 +08:00
Coet
ceac3f2e1f
Merge pull request #38 from FengZiYjun/new_updates
...
New updates
2018-08-22 10:18:46 +08:00
FengZiYjun
4c8c2dfdb8
updates to core, loader, test:
...
- move preprocess.py from loader/ to core/
- changes to interface of preprocess: 1. add run method, to run the main processing 2. add cross validation split 3. add return value 4. merge subclasses
- Trainer supports cross validation
- add data as arguments in Trainer.train & Tester.test
- add readme.example.py, to run the example program shown in README.md
- other corresponding changes
2018-08-19 16:21:14 +08:00
Coet
fc7dd7eced
Merge pull request #33 from FengZiYjun/master
...
Updates to cores, loader, saver
2018-08-18 16:50:45 +08:00
choosewhatulike
fb20e87321
add chinese word segmentation model
2018-08-17 00:07:38 +08:00
FengZiYjun
4bbeaebe96
Updates to cores, action, loader:
...
- rename Inference to Predictor
- rename Trainer.prepare_input to Trainer.load_train_data, load data_train.pkl only
- add __contains__ method to config Section class
- more code comments
- more elegant make_batch & data_iterator: Samplers return batch samples instead of batch indices
2018-08-15 20:12:20 +08:00
FengZiYjun
8e6db05339
changes to Trainer, Tester & Inference:
...
- rename "POSTrainer", "POSTester" to "SeqLabelTrainer", "SeqLabelTester"
- Trainer & Tester have NO relation with Action
- Inference owns independent "make_batch" & "data_forward"
- Conversion to Tensor & go into cuda are done in "make_batch"
- "make_batch" support maximum/minimum length
2018-08-08 20:40:44 +08:00
FengZiYjun
c1d7c5d7da
changes to action, trainer and tester:
...
- rename "POSTrainer" to "SeqLabelTrainer"
- add text classification test data
- update make_batch in Trainer and Tester
2018-08-07 19:18:56 +08:00
FengZiYjun
743a6d7547
fix bugs in preprocessor
2018-08-01 10:10:55 +08:00
FengZiYjun
ef8ec3b9e4
add cws train script and corresponding config file
2018-07-30 09:52:46 +08:00
FengZiYjun
242e576a30
changes to trainer, tester, preprocessor, etc.
...
- [tester][trainer] add cuda support
- [preprocess] fix label2index for padding label seq
- update README.md
- [test] add test_tester.py
- rename "action" to "core"
2018-07-28 11:57:25 +08:00
FengZiYjun
eb66cbe6c4
restructure module: 4 classes; add modules; move prototype and rename
2018-07-12 21:53:42 +08:00
FengZiYjun
7514be6f30
- add validation loss into trainer.train
...
- restructure: move reproduction outside
- add evaluate in tester
2018-07-11 21:51:35 +08:00
FengZiYjun
32652407df
restructure files & add "modules" directory & add CRF.py
2018-07-01 10:39:36 +08:00
FengZiYjun
3e1d995b3c
update file structures
2018-06-25 14:16:47 +08:00
FengZiYjun
58127d3c4e
start building word seg (generally seq2seq) model
2018-05-30 22:28:22 +08:00
FengZiYjun
fdd26b8e58
add LICENSE, setup.py & requirements.txt
2018-05-25 18:32:02 +08:00
FengZiYjun
3081a57ef9
optimize trainer logic & prepare charlm test
2018-05-24 10:45:01 +08:00
FengZiYjun
6b357bec40
design intermediate controller between trainer and pytorch model
2018-05-23 17:48:26 +08:00
FengZiYjun
7b46f422c7
add base methods for model.base_model
2018-05-22 11:15:27 +08:00
FengZiYjun
4f71d44999
build classes for saver
2018-05-22 10:34:03 +08:00
FengZiYjun
6e1446beb1
first commit
2018-05-21 22:36:11 +08:00