fastNLP

mirror of https://gitee.com/fastnlp/fastNLP.git synced 2024-12-04 21:28:01 +08:00

Author	SHA1	Message	Date
xuyige	e0b23b16db	update data loader of matching	2019-06-24 21:44:43 +08:00
Danqing Wang	79762c4c6c	Add Summarization framework	2019-06-24 19:24:14 +08:00
yh_cc	39dd086262	1. Fix a counterintuitive bug in CrossEntropyLoss; 2. Update sequence labeling	2019-06-24 09:56:28 +08:00
xuyige	3593f0a545	fix bugs in matching dataloader	2019-06-23 18:42:57 +08:00
xuyige	d1f531c049	update matching dataloader in reproduction/matching	2019-06-23 18:25:04 +08:00
yh_cc	8f7ed07441	1. Add a no_create_entry_dataset option to vocabulary's from_dataset for passing in dev and test 2. Adjust the various Embedding implementations so that unseen words from dev and test use the unk representation 3. Add a dropout_word option to Embedding so that words can be randomly dropped 4. Fix several other minor bugs	2019-06-21 11:06:35 +08:00
yh	a137038eb2	Fix ELMO and LSTM being unable to use nn.DataParallel	2019-06-19 19:43:53 +08:00
yh_cc	4533427ea3	Update sequence labeling	2019-06-19 11:14:41 +08:00
yh_cc	4d138ed7f8	Merge branch 'dev0.5.0' of github.com:fastnlp/fastNLP into dev0.5.0	2019-06-18 10:02:29 +08:00
yh_cc	9a8fe42cd4	Add data loading and model code for NER; fix a typo in metric; change the default initialization in LSTM to set the forget gate to 1.	2019-06-18 10:02:24 +08:00
xuyige	93620e76ed	update framework of matching	2019-06-18 02:04:53 +08:00
xuyige	342b7026d7	Merge remote-tracking branch 'origin/dev0.5.0' into dev0.5.0	2019-06-17 21:48:42 +08:00
xuyige	39388567ad	update matching.py	2019-06-17 21:48:18 +08:00
yh_cc	2f5d8967a3	1. Adapt to the change of Batch to PyTorch's DataLoader 2. Fix a bug in embedding.py 3. ConllReader skips all DOCSTART tags by default 4. Move BERT's heavy lifting into _bert and expose BertEncoder in bert.py 5. Change include_end_start of allow_transition in crf to false to match CRF's default 6. allow_transition and SpanMetric support BIOES tags 7. Add formatted print output to datainfo	2019-06-17 20:18:07 +08:00
yh_cc	839d712467	Extend value_count in field to support nested fields	2019-06-17 16:46:39 +08:00
yhcc	66f51395d7	Merge pull request #166 from fastnlp/batch [new] Compatible with PyTorch's DataLoader; replace Batch with DataSetIter	2019-06-17 16:42:28 +08:00
lyhuang18	c78811f87f	add TC/MTL16Loader	2019-06-16 23:43:37 +08:00
yh_cc	4b5113cbea	Change prefetch to a deprecated warning	2019-06-15 14:17:48 +08:00
yh_cc	17b5fd0066	1. Remove the assert in Trainer that train_data must be a DataSet 2. Remove Trainer's prefetch parameter; add the num_workers parameter to the comments 3. The default sampler in Trainer is RandomSampler	2019-06-15 13:10:28 +08:00
yh_cc	6309eafd25	1. Support handy functions such as split and int in fieldarray 2. Major update: support ElmoEmbedding and BertEmbedding	2019-06-12 11:10:33 +08:00
yh_cc	37c50d6625	Merge branch 'dev0.5.0' of github.com:fastnlp/fastNLP into dev0.5.0	2019-06-11 16:42:11 +08:00
Violet Yao	83729dfc39	moved test to reproduction folder	2019-06-09 21:46:34 +08:00
Violet Yao	ad6a55ba26	fixed comment format	2019-06-08 14:32:25 +08:00
Violet Yao	2edb2a1a00	added yelpLoader	2019-06-08 14:27:52 +08:00
yh_cc	9e5c4f665c	Merge branch 'dev0.5.0' of github.com:fastnlp/fastNLP into dev0.5.0	2019-06-08 09:48:11 +08:00
yh_cc	bddce51b05	merge update	2019-06-08 09:47:39 +08:00
xuyige	8e82c91751	update bert for nli in reproduction/matching	2019-06-06 00:09:25 +08:00
xuyige	96687251a8	update reproduction/README.md	2019-06-05 16:10:13 +08:00
xuyige	e643d7aed5	update reproduction/README.md	2019-06-05 16:07:56 +08:00
yunfan	60de1b2c52	add text_classification reproduction dir	2019-06-05 11:05:46 +08:00
xuyige	e05c182b05	firstly add matching in reproduction	2019-06-05 01:01:30 +08:00
yh_cc	d71f0eef13	SemiCRFRelay Chinese word segmentation for sequence labeling.	2019-06-04 23:40:46 +08:00
yh	07c3533126	First commit of the CWS example	2019-06-04 22:16:58 +08:00
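ChenXin	881ce01762	Dev0.4.0 (#149 ) * 1. CRF adds support for bmeso tags 2. Add comments to vocabulary * Add an error check to BucketSampler * 1. Fix a bug in ClipGradientCallback; remove the print in LRSchedulerCallback (a pbar should be passed in for printing later); 2. Add MLP comments * update MLP module * Add metric comments; fix a bug in the trainer save process * Update README.md fix tutorial link * Add ENAS (Efficient Neural Architecture Search) * add ignore_type in DataSet.add_field * * AutoPadder will not pad when dtype is None * add ignore_type in DataSet.apply * Fix a potential padder bug in fieldarray * Fix a typo in crf and places that could cause numerical instability * Fix a possible bug in CRF * change two default init arguments of Trainer into None * Changes to Callbacks: * Add several read-only properties to callback * Set these properties through the manager * Code optimization to reduce the burden on @transfer * * Move ENAS-related code into the automl directory * Fix a bug in fast_param_mapping * Trainer now creates the save directory automatically * Printing a Vocabulary now shows its contents * * Add an iteration method to vocabulary * Fix the bug where the CRF value is negative * add SQuAD metric * add sigmoid activate function in MLP * - add star transformer model - add ConllLoader, for all kinds of conll-format files - add JsonLoader, for json-format files - add SSTLoader, for SST-2 & SST-5 - change Callback interface - fix batch multi-process when killed - add README to list models and their performance * - fix test * - fix callback & tests * - update README * Fix some bugs; adjust callback * Prepare to release version 0.4.0 * update readme * support parallel loss * Prevent multi-GPU setups from causing the loss to be computed incorrectly * update advance_tutorial jupyter notebook * 1. Add new loading functions load_with_vocab() and load_without_vocab to embedding_loader; the main changes from the previous functions are (1) embed_dim no longer needs to be passed in (2) word2vec vs. glove is detected automatically. 2. Add from_dataset() and index_dataset() to vocabulary, avoiding the need to write multiple lines to index a dataset. 3. Add a cache_result() decorator to utils for caching a function's return value. 4. Add an update_every attribute to callback * 1. DataSet.apply() reports the failing index on error 2. Vocabulary.from_dataset() and index_dataset() report the vocab order on error 3. embedloader skips the line when it encounters irregular data while reading embeddings. * update attention * doc tools * fix some doc errors * Switch to Chinese comments; add a viterbi decoding method * Sample version * - add pad sequence for lstm - add csv, conll, json filereader - update dataloader - remove useless dataloader - fix trainer loss print - fix tests * - fix test_tutorial * Add comments * Test documentation * Local stash * Local stash * Reorder the documentation * - add document * Local stash * update pooling * update bert * update documents in MLP * update documents in snli * combine self attention module to attention.py * update documents on losses.py * Update the DataSet documentation * update documents on metrics * 1. Remove the print output in LSTM; 2. Change use_cuda of Trainer and Tester to device; 3. Supplement the Trainer documentation * Add comments for Trainer * Improve the documentation of trainer, callback, etc.; rename some code so that it is hidden from the documentation * update char level encoder * update documents on embedding.py * - update doc * Add comments and modify some code * - update doc - add get_embeddings * Modify the documentation configuration options * Change embedding to init_embed initialization * 1. Add multi-GPU support for Trainer and Tester; * - add test - fix jsonloader * Remove the comment tutorial * Add get_field_names to dataset * Fix bug * - add Const - fix bugs * Modify some comments * - add model runner for easier test models - add model tests * Modify the configuration and structure of docs * Revise a large part of the core documentation. TODO: 1. Complete the trainer and tester documentation 2. Look into comment examples and tests * Comments in core have mostly been checked * Modify the comments in io * Switch all imports to relative paths * Switch all imports to relative paths * small change * 1. Remove the installation of api/automl from the setup file 2. Fix a seq_len bug in metric 3. Fix a naming error in sampler * Fix bug: compatibility with the CPU version of PyTorch. TODO: similar bugs may exist elsewhere * Modify the references in the documentation * Replace tqdm.autonotebook with tqdm.auto * - fix batch & vocab * Upload documentation .rst files; upload documentation files and several TODOs * Discuss and consolidate several modules * Tests for core and some minor changes * Remove some redundant documentation * update init files * update const files * update const files * Add tests for cnn * fix a little bug * - update attention - fix tests * Improve tests * Complete the quick-start tutorial * Update the documentation for renaming sequence_modeling to sequence_labeling * Re-run apidoc to resolve leftovers from the rename * Modify the documentation format * Unify the seq_len_to_mask implementations from different places into core.utils.seq_len_to_mask * Add a one-line hint * Show dataset_loader in the documentation * Note that Dataset.read_csv will be replaced by CSVLoader * Complete the documentation between Callback and Trainer * Partially update index * Remove redundant print * Remove the metric used for word segmentation because it may cause errors * Modify the Chinese names in the documentation * Complete the detailed introduction documentation * ipynb files for the tutorial * Modify some introduction documentation * Modify the home page introductions of models and modules * Add the titlesonly setting * Modify the titles shown in the module documentation * Modify the opening introductions of core and io * Modify the opening introductions of modules and models * Use .. todo:: to hide TODO comments that might be extracted into the documentation * Modify some comments * delete an old metric in test * Modify the test files for tutorials * Move features not yet released into the legacy folder * Remove tests that cannot run * Modify the test files for callback * Remove outdated tutorials and test files * Change the cache_results parameters * Modify the test files for io; remove some outdated tests * Fix bug * Fix the tests failing in test_utils.py * Fix compatibility with padsequence in pytorch1.1; modify Trainer's pbar * 1. Fix a bug in metric; 2. Add metric tests * add model summary * Add aliases * Remove the nested layer in encoder * Modify the import order and the contents exposed by __all__ in core * Modify the import order and the contents exposed by __all__ in models * Rename files * Modify __all__ and imports in modules * fix var runn * Add a clear method to vocab * Minor tweaks for PEP8 compliance * Update the cache_results example * 1. Warn about potentially None indices in callback; 2. DataSet supports indexing by List * Fix a typo * Modify README.md * update documents on bert * update documents on encoder/bert * Add a fitlog callback for recording experiments with fitlog * typo * - update dataset_loader * Add a link to the fitlog documentation. * Add documentation for DataSet Loader * - add star-transformer reproduction	2019-05-22 18:43:56 +08:00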
FengZiYjun	0c5630bd16	Ready for V0.3.1 * Upgrade the parser API and model * update docs: add new pages for tutorials * upgrade CWS api download source * add a new method for dataset field access * add introduction for bert * add more unit tests for api/processor * remove unused test data. Add new test data.	2019-02-04 09:44:54 +08:00
FengZiYjun	986541139a	Organize all dataset loaders and add unit tests	2019-02-02 16:46:42 +08:00
FengZiYjun	887fc9281f	update callbacks: * rename callback methods. Use fastai's notation. * add a new callback method - on_valid_begin	2019-01-25 21:43:24 +08:00
yunfan	c02980e006	Merge branch 'yyff' into dev	2019-01-21 14:55:53 +08:00
yunfan	e93c6f0053	Merge branch 'dev' of https://github.com/choosewhatulike/fastNLP-private into dev	2019-01-21 14:52:24 +08:00
FengZiYjun	b14dd58828	Update POS API	2019-01-19 18:48:57 +08:00
yunfan	de856fb8eb	update reproduction	2019-01-19 16:22:01 +08:00
yunfan	eb55856c78	- fix parser train	2019-01-19 16:07:10 +08:00
FengZiYjun	864c2238f8	Add FieldArray support for list of np.array	2019-01-17 22:42:40 +08:00
FengZiYjun	b93ca9bb30	* Add FieldArray support for list of np.array * Add tests: FieldArray initialization	2019-01-17 15:39:13 +08:00
FengZiYjun	e4f997d52a	refactor type system in FieldArray: * Refactor the dtype detection code in FieldArray's initialization and append for better code reuse * Type checking is now entirely FieldArray's responsibility, with DataSet cooperating. Tests: * Tidy up the dtype-related test code * Add tests for all tutorials. Other: * Complete a full Conll dataset loader * Upgrade the POS tag model training script	2019-01-17 12:25:37 +08:00
yh	8091a734ee	1. Split the padding functionality out of FieldArray; Padder now handles all padding operations. 2. FieldArray uses AutoPadder by default; AutoPadder behaves the same as before, when no padder was used. 3. To handle 2D padding, introduce EngChar2dPadder for padding characters. 4. Add a padding tutorial.	2019-01-15 22:21:55 +08:00
yh	1f50b01ffa	conflict solved	2019-01-15 15:16:20 +08:00
yh	6a0a1ed4ad	Add comments to train; add comments to attention; add transformer word segmentation	2019-01-15 14:58:43 +08:00
FengZiYjun	1fdaf236d2	Updates: * Rename: chinese_word_segment ---> Chinese_word_segmentation * Rename: pos_tag_model ---> POS_tagging * Add 4 tests for Batch * Delete the unused chinese_word_segment/run.py	2019-01-15 14:56:01 +08:00
FengZiYjun	c4ba75d160	code optimization * move used readers from reproduction to io/dataset_loader.py (API shall not call anything from reproduction/)	2019-01-15 14:30:37 +08:00
yunfan	2e9e6c6c20	- fix trainer with validate_every > 0 - refine & fix Transformer Encoder - refine & speed up biaffine parser	2019-01-14 19:13:52 +08:00
FengZiYjun	8df5bce938	fastNLP V0.3	2019-01-12 19:15:20 +08:00
yunfan	62a7556a04	Merge remote-tracking branch 'private/dev' into dev # Conflicts: # fastNLP/api/api.py # fastNLP/modules/encoder/variational_rnn.py	2019-01-12 11:26:32 +08:00
yunfan	ba28702e68	update Biaffine Parser, Variational RNN add parser API	2019-01-12 11:22:09 +08:00
yh	145125feb4	Support the high-level API for CWS	2019-01-11 19:55:38 +08:00
yh	400552971c	1. Add GradientClip to callback; 2. Remove _print_train() and _tqdm_train() from Trainer, merging both into _train()	2019-01-11 19:09:01 +08:00
yh	4fd6e12fa9	Merge branch 'dev' of github.com:choosewhatulike/fastNLP-private into dev	2019-01-08 21:49:46 +08:00
yh	56e7641eb8	1. Fix the bug where Trainer's check_code used train_data when checking evaluate	2019-01-08 21:49:31 +08:00
FengZiYjun	525adf1c41	update POS tag training script	2019-01-07 21:50:52 +08:00
FengZiYjun	7ecd8c9c14	finish POS tagging API	2019-01-07 19:45:19 +08:00
FengZiYjun	887c6fec70	add pos-tag training script	2019-01-07 14:22:53 +08:00
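yh	e267f925cf	Add train_context.py for CWS	2019-01-03 21:35:28 +08:00
yh	897c43fc3b	1. Add constraints to CRF to restrict transitions, e.g. in BMES, B cannot transition to S 2. Add SpanFMetric to metric for measuring sequence labelling performance 3. Partially adjust the word segmentation reproduction task for the new interface.	2019-01-03 21:33:36 +08:00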
FengZiYjun	27e9453d19	* fix processor.py * add code comments * merge _saver.py & _loader.py in io/ * (ancient codes) rename Loss into LossFromTorch	2018-12-06 19:28:27 +08:00
yunfan	2aaa381827	refine git commits	2018-11-27 22:43:29 +08:00
yunfan	d643a7a894	update set_target, batch's as_numpy	2018-11-27 22:23:55 +08:00
FengZiYjun	090f7aef5b	* fixing unit tests	2018-11-27 22:22:19 +08:00
yh	1d5bb0a3b6	bug fix	2018-11-27 22:21:22 +08:00
yh	8906155ca2	Create an Analyzer for the api	2018-11-27 22:21:18 +08:00
FengZiYjun	e9d7074ba1	* delete readme_example.py because it is oooooooout of date. * rename preprocess.py into utils.py, because nothing about preprocess in it * anything in loader/ and saver/ is moved directly into io/ * corresponding unit tests are moved to /test/io * delete fastnlp.py, because we have new and better APIs * rename Biaffine_parser/run_test.py to Biaffine_parser/main.py; Otherwise, test will fail. * A looooooooooot of ancient codes to be refined...........	2018-11-27 22:17:41 +08:00
yunfan	b6a0d33cb1	add parser api	2018-11-27 22:14:22 +08:00
yh	77786509df	Upload POS and CWS development work	2018-11-27 22:13:20 +08:00
yh	7d97e9365d	Add a new processor	2018-11-27 22:11:09 +08:00
yh	1496031182	Add pos output processor	2018-11-27 22:11:08 +08:00
yh	d5afffee73	Add transition code from end-to-end POS processing to the parser	2018-11-27 22:11:08 +08:00
yh_cc	10379e9c74	Currently uses the segapp approach, but accuracy seems poor; try switching to the CRF 4-tag scheme	2018-11-27 22:10:52 +08:00
yunfan	822aaf6286	fix and update tester, trainer, seq_model, add parser pipeline builder	2018-11-27 22:07:20 +08:00
FengZiYjun	4be15a5b43	Save the POS tag script	2018-11-27 21:53:05 +08:00
yh	9667c524a4	CWS predict is mostly complete	2018-11-11 15:53:33 +08:00
yh	9fc20ac7b8	Add the inference pipeline	2018-11-11 12:55:30 +08:00
yh	dc7f8ef8d4	bug fix	2018-11-11 12:42:05 +08:00
yh_cc	7df33b23ea	Merge branch 'dataset' of github.com:yhcc/fastNLP into dataset	2018-11-11 00:40:10 +08:00
FengZiYjun	e2b14ed33d	Merge remote-tracking branch 'origin/dataset' into dataset	2018-11-10 21:20:34 +08:00
FengZiYjun	5dd0f74d6d	- Add pos_tagger API; the pipeline now runs end to end - Fix a processor bug - Update several components in core/ and remove redundant batch parameters - CRF had a typo? Fixed - Update the POS tag training script	2018-11-10 21:20:16 +08:00
yh_cc	3e50ca8a72	Create a test context	2018-11-10 20:37:48 +08:00
yh_cc	de3feeaf5a	Relocate the CWS functions	2018-11-10 20:10:13 +08:00
yh_cc	752efc57fd	Merge branch 'dataset' of github.com:yhcc/fastNLP into dataset	2018-11-10 19:59:40 +08:00
yh_cc	ea1c8c1100	Word segmentation accuracy in the current version has reached normal scores	2018-11-10 19:59:32 +08:00
FengZiYjun	ec9fd32d60	improve trainer: log mean and std of model params, and sum of gradients	2018-11-10 18:49:22 +08:00
FengZiYjun	cd68d78d50	Merge remote-tracking branch 'origin/dataset' into dataset # Conflicts: # fastNLP/api/pipeline.py # fastNLP/api/pos_tagger.py # fastNLP/api/processor.py # fastNLP/modules/decoder/CRF.py	2018-11-10 17:02:58 +08:00
FengZiYjun	26e3abdf58	- Fix the POS tag training script so that it runs - Create converter.py in api - Add an initializer to Pipeline for adding processors all at once - Delete pos_tagger.py - Improve overall code style	2018-11-10 16:58:27 +08:00
yh_cc	10bb2810ab	Merge branch 'dataset' of github.com:yhcc/fastNLP into dataset	2018-11-10 15:34:21 +08:00
yh_cc	73ba3b5eec	bug fix for pipeline	2018-11-10 15:17:58 +08:00
yunfan	a6ab34fd38	fix crf	2018-11-10 14:53:50 +08:00
yh_cc	3cb98ddcf2	Add BucketSampler to Sampler; CWS training mostly works	2018-11-10 14:46:38 +08:00
yh_cc	69a138eb18	Fix several issues encountered; add some methods for the word segmentation task	2018-11-10 13:41:19 +08:00
yh_cc	dc0124cf02	Rename model to models	2018-11-10 11:10:14 +08:00
yh_cc	25a53ac5c9	Adapt processor to yesterday's unconventional changes	2018-11-10 10:56:28 +08:00
yh	d818e91380	Have dataset automatically create the corresponding array	2018-11-09 22:11:26 +08:00
yh	515e4f4987	Move processor into processor.py	2018-11-09 22:02:10 +08:00
yh	89ce85b6ed	Merge branch 'dataset' of https://github.com/yhcc/fastNLP into dataset	2018-11-09 20:23:11 +08:00
yh	38aa207ea2	Add CWS converter and io	2018-11-09 20:23:05 +08:00
FengZiYjun	12e9a93b52	Merge remote-tracking branch 'origin/dataset' into dataset	2018-11-09 19:53:08 +08:00
FengZiYjun	79105381f5	- add interfaces for pos_tagging API - update predictor.py to remove unused methods - update model_loader.py & model_saver.py to support entire model saving & loading - update pos tagging training script	2018-11-09 19:52:31 +08:00
yh	1b9daa1985	Add some CWS functionality	2018-11-09 19:25:18 +08:00
yunfan	053249420f	update parser, fix bugs varrnn & vocab	2018-11-09 10:59:36 +08:00
yunfan	3192c9ac66	update trainer	2018-11-08 22:15:58 +08:00
yunfan	c14d9f4d66	update biaffine	2018-11-08 22:13:47 +08:00
yunfan	830d223344	add transformer	2018-11-08 22:12:13 +08:00
yunfan	102259df39	update biaffine parser	2018-11-08 22:12:13 +08:00
yunfan	a4c9786ca4	update dataset & loader	2018-10-17 09:59:56 +08:00
yunfan	637c37d62b	add new model, new module, fix bugs	2018-10-10 16:49:17 +08:00
FengZiYjun	fb806163c3	remove unused codes; add more tests	2018-10-07 15:03:00 +08:00
FengZiYjun	cc15588a77	- add progress bar for data set loading - improve metrics codes - fix validator bugs in trainer; remove early saving - run CWS codes - improve README.md	2018-10-01 20:33:29 +08:00
FengZiYjun	0b86d7cf2b	Merge Preprocessor and DataSet	2018-09-28 21:35:17 +08:00
FengZiYjun	cb11a1f2dc	- analyze codes for language model, unable to run yet - add character vocab in preprocessor - add dataset loader for language model dataset - other minor adjustments - preserve only a little example data for language model	2018-09-23 16:03:20 +08:00
FengZiYjun	28a0683853	1. add tests in test_fastNLP.py & test_sampler.py; increase test coverage to 81% 2. changes of names: aggregation ----> aggregator interaction ----> interactor action.py ----> sampler.py BasePreprocess ---> Preprocessor BaseTester ----> Tester BaseTrainer ----> Trainer 3. add more code comments 4. fix bugs in predictor's data_forward 5. in sampler.py, remove Bachifier, fix some codes. but not test 6. remove unused codes in other_modules.py & utils.py 7. update fastnlp.py with new config file names and code comments 8. add data examples in data_for_tests/	2018-09-22 15:33:52 +08:00
yunfan	819c8f05be	fix vocab	2018-09-19 15:10:18 +08:00
yunfan	8f60a4fa01	update MLP	2018-09-18 15:57:44 +08:00
2017alan	b3e8db74a6	add self_attention for yelp classification example.	2018-09-15 17:19:56 +08:00
FengZiYjun	57911f771a	- clean up unused codes - improve code comments - BaseLoader & its subclasses does not need a data name any more - update file tree - add setup.py	2018-09-02 13:32:57 +08:00
FengZiYjun	32a036e8e6	[fix] drop "data" in Tester.make_batch; correct spelling of "show_metrics" [add] PeopleDailyCorpusLoader, to parse PeopleDaily Corpus [update] add CWS + POS_tag interface at FastNLP, see example in test_fastNLP.py [update] modify README.md and readme_example.py to the latest version.	2018-09-01 21:33:28 +08:00
FengZiYjun	501ffb26c5	optimize CWS example - see test_fastNLP.py - update interpret_word_seg_results in fastnlp.py - delete useless data to increase git clone speed	2018-08-31 11:23:40 +08:00
FengZiYjun	ab55f25e20	Updates to Trainer/Tester/fastnlp 1. Tester has a parameter "print_every_step" to control printing. print_every_step == 0 means NO print. 2. Tester's evaluate return (list of) floats, rather than torch.cuda.tensor 3. Trainer also has a parameter "print_every_step". The same usage. 4. In training, validation steps are not shown. 5. Updates to code comments. 6. fastnlp.py is ready for CWS. test_fastNLP.py works.	2018-08-31 10:46:56 +08:00
FengZiYjun	9d6b0daa99	Prepare for CWS service: - specify the name of the config file and the name of corresponding section where model init params store. - fastnlp.py needs load_pickle to get dictionary size and the number of labels - other minor adjustments	2018-08-30 11:45:47 +08:00
Coet	aea53c1833	Merge pull request #43 from FengZiYjun/master New Trainer Initialization Interface	2018-08-24 09:47:40 +08:00
FengZiYjun	2df8eb740a	Updates to core, loader: - add Loss, Optimizer - change Trainer & Tester initialization interface: two styles of definition provided - handle Optimizer construction and loss function definition in a hard manner - add argparse in task-specific scripts. (seq_labeling.py & text_classify.py) - seq_labeling.py & text_classify.py work	2018-08-22 19:10:12 +08:00
Coet	ceac3f2e1f	Merge pull request #38 from FengZiYjun/new_updates New updates	2018-08-22 10:18:46 +08:00
FengZiYjun	4c8c2dfdb8	updates to core, loader, test: - move preprocess.py from loader/ to core/ - changes to interface of preprocess: 1. add run method, to run the main processing 2. add cross validation split 3. add return value 4. merge subclasses - Trainer supports cross validation - add data as arguments in Trainer.train & Tester.test - add readme.example.py, to run the example program shown in README.md - other corresponding changes	2018-08-19 16:21:14 +08:00
Coet	fc7dd7eced	Merge pull request #33 from FengZiYjun/master Updates to cores, loader, saver	2018-08-18 16:50:45 +08:00
choosewhatulike	fb20e87321	add chinese word segmentation model	2018-08-17 00:07:38 +08:00
FengZiYjun	4bbeaebe96	Updates to cores, action, loader: - rename Inference to Predictor - rename Trainer.prepare_input to Trainer.load_train_data, load data_train.pkl only - add __contains__ method to config Section class - more code comments - more elegant make_batch & data_iterator: Samplers return batch samples instead of batch indices	2018-08-15 20:12:20 +08:00
FengZiYjun	8e6db05339	changes to Trainer, Tester & Inference: - rename "POSTrainer", "POSTester" to "SeqLabelTrainer", "SeqLabelTester" - Trainer & Tester have NO relation with Action - Inference owns independent "make_batch" & "data_forward" - Conversion to Tensor & go into cuda are done in "make_batch" - "make_batch" support maximum/minimum length	2018-08-08 20:40:44 +08:00
FengZiYjun	c1d7c5d7da	changes to action, trainer and tester: - rename "POSTrainer" to "SeqLabelTrainer" - add text classification test data - update make_batch in Trainer and Tester	2018-08-07 19:18:56 +08:00
FengZiYjun	743a6d7547	fix bugs in preprocessor	2018-08-01 10:10:55 +08:00
FengZiYjun	ef8ec3b9e4	add cws train script and corresponding config file	2018-07-30 09:52:46 +08:00
FengZiYjun	242e576a30	changes to trainer, tester, preprocessor, etc. - [tester][trainer] add cuda support - [preprocess] fix label2index for padding label seq - update README.md - [test] add test_tester.py - rename "action" to "core"	2018-07-28 11:57:25 +08:00
FengZiYjun	eb66cbe6c4	restructure module: 4 classes; add modules; move prototype and rename	2018-07-12 21:53:42 +08:00
FengZiYjun	7514be6f30	- add validation loss into trainer.train - restructure: move reproduction outside - add evaluate in tester	2018-07-11 21:51:35 +08:00
FengZiYjun	32652407df	restructure files & add "modules" directory & add CRF.py	2018-07-01 10:39:36 +08:00
FengZiYjun	3e1d995b3c	update file structures	2018-06-25 14:16:47 +08:00
FengZiYjun	58127d3c4e	start building word seg (generally seq2seq) model	2018-05-30 22:28:22 +08:00
FengZiYjun	fdd26b8e58	add LICENSE, setup.py & requirements.txt	2018-05-25 18:32:02 +08:00
FengZiYjun	3081a57ef9	optimize trainer logic & prepare charlm test	2018-05-24 10:45:01 +08:00
FengZiYjun	6b357bec40	design intermediate controller between trainer and pytorch model	2018-05-23 17:48:26 +08:00
FengZiYjun	7b46f422c7	add base methods for model.base_model	2018-05-22 11:15:27 +08:00
FengZiYjun	4f71d44999	build classes for saver	2018-05-22 10:34:03 +08:00
FengZiYjun	6e1446beb1	first commit	2018-05-21 22:36:11 +08:00

248 Commits